[ https://issues.apache.org/jira/browse/CASSANDRA-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-6255: -------------------------------------- Priority: Minor (was: Major) > Exception count not incremented on OutOfMemoryError (HSHA) > ---------------------------------------------------------- > > Key: CASSANDRA-6255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6255 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Oracle java version "1.7.0_15" > rpc_server_type: hsha > Reporter: Dan Hendry > Priority: Minor > > One of our nodes decided to stop listening on 9160 (netstat -l was showing > nothing and telnet was reporting connection refused). Nodetool status showed > no hosts down and on the offending node nodetool info gave the following: > {noformat} > nodetool info > Token : (invoke with -T/--tokens to see all 256 tokens) > ID : (removed) > Gossip active : true > Thrift active : true > Native Transport active: false > Load : 2.05 TB > Generation No : 1382536528 > Uptime (seconds) : 432970 > Heap Memory (MB) : 8098.05 / 14131.25 > Data Center : DC1 > Rack : RAC2 > Exceptions : 0 > Key Cache : size 536854996 (bytes), capacity 536870912 (bytes), > 41383646 hits, 1710831591 requests, 0.024 recent hit rate, 0 save period in > seconds > Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, > NaN recent hit rate, 0 save period in seconds > {noformat} > After looking at the cassandra log, I saw a bunch of the following: > {noformat} > ERROR [Selector-Thread-16] 2013-10-27 17:36:00,370 CustomTHsHaServer.java > (line 187) Uncaught Exception: > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:691) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371) > at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145) > at > org.apache.cassandra.thrift.CustomTHsHaServer.requestInvoke(CustomTHsHaServer.java:337) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.handleRead(CustomTHsHaServer.java:281) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.select(CustomTHsHaServer.java:224) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.run(CustomTHsHaServer.java:182) > ERROR [Selector-Thread-7] 2013-10-27 17:36:00,370 CustomTHsHaServer.java > (line 187) Uncaught Exception: > java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:691) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371) > at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145) > at > org.apache.cassandra.thrift.CustomTHsHaServer.requestInvoke(CustomTHsHaServer.java:337) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.handleRead(CustomTHsHaServer.java:281) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.select(CustomTHsHaServer.java:224) > at > org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.run(CustomTHsHaServer.java:182) > {noformat} > There wasn't anything else overtly suspicious in the logs except for the > occasional > {noformat} > ERROR [Selector-Thread-0] 2013-10-27 17:35:58,662 TNonblockingServer.java > (line 468) Read an invalid frame size of 0. Are you using TFramedTransport on > the client side? > {noformat} > but that periodically comes up - I have looked into it before but it has > never seemed to have any serious impact. > This ticket is not about *why* an OutOfMemoryError occurred - which is bad > but I don't think I have enough information to reproduce or speculate on a > cause. This ticket is about the fact that an OutOfMemoryError occurred and > nodetool info was reporting Thrift active : true and Exceptions : 0. > Our monitoring systems and investigation processes are both starting to rely > on on the exception count. The fact that it was not accurate here is > disconcerting. -- This message was sent by Atlassian JIRA (v6.1#6144)