Hi Michaels :-),

My guess is this ticket will be closed with a "Won't Fix" resolution.
Cassandra 2.0 is no longer supported, and I have seen similar tickets rejected, like CASSANDRA-10510 <https://issues.apache.org/jira/browse/CASSANDRA-10510>. Would you like to upgrade to the latest 2.1 release and see if you still have the issue?

About your issue, do you stop your node using a command like the following one?

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool disablegossip && sleep 10 && nodetool drain && sleep 10 && sudo service cassandra stop

or even flushing first:

nodetool disablethrift && nodetool disablebinary && sleep 5 && nodetool disablegossip && sleep 10 && nodetool flush && nodetool drain && sleep 10 && sudo service cassandra stop

Are commitlogs empty when you start Cassandra?
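If it helps, here is the same sequence as a small script - just a sketch, assuming the Debian-style "service cassandra" wrapper and the default commitlog location, so adjust both to your setup:

#!/bin/sh
# Stop client traffic first, then leave the ring, flush memtables,
# and drain the commitlog before stopping the process.
nodetool disablethrift   # stop Thrift clients
nodetool disablebinary   # stop native protocol (CQL) clients
sleep 5
nodetool disablegossip   # peers will mark this node down
sleep 10
nodetool flush           # write memtables out as SSTables
nodetool drain           # flush again and shut down the commitlog
sleep 10
sudo service cassandra stop
# After a clean shutdown there should be little or nothing left to
# replay on the next start:
ls /var/lib/cassandra/commitlog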
C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-05-11 5:35 GMT+02:00 Michael Fong <michael.f...@ruckuswireless.com>:

> Hi,
>
> Thanks for your recommendation.
> I also opened a ticket to keep track @
> https://issues.apache.org/jira/browse/CASSANDRA-11748
> Hope this gets someone's attention to take a look. Thanks.
>
> Sincerely,
>
> Michael Fong
>
> -----Original Message-----
> From: Michael Kjellman [mailto:mkjell...@internalcircle.com]
> Sent: Monday, May 09, 2016 11:57 AM
> To: dev@cassandra.apache.org
> Cc: u...@cassandra.apache.org
> Subject: Re: Cassandra 2.0.x OOM during startup - schema version inconsistency after reboot
>
> I'd recommend you create a JIRA! That way you can get some traction on the issue. Obviously an OOM is never correct, even if your process is wrong in some way!
>
> Best,
> kjellman
>
> Sent from my iPhone
>
> > On May 8, 2016, at 8:48 PM, Michael Fong <michael.f...@ruckuswireless.com> wrote:
> >
> > Hi, all,
> >
> > We haven't heard any responses so far, and this issue has troubled us for quite some time. Here is another update:
> >
> > We have noticed several times that the schema version may change after migration and reboot. Here is the scenario:
> >
> > 1. Two-node cluster (nodes 1 & 2).
> >
> > 2. Some schema changes are made, e.g. creating a few new column families. The cluster waits until both nodes have their schema versions in sync (describe cluster) before moving on.
> >
> > 3. Right before node 2 is rebooted, the schema versions are consistent; however, after node 2 reboots and starts servicing, the MigrationManager gossips a different schema version.
> >
> > 4. Afterwards, both nodes start exchanging schema messages indefinitely until one of the nodes dies.
> >
> > We currently suspect the schema change comes from replaying old entries in the commit log. We wish to dig further, but we need expert help on this.
> >
> > I don't know if anyone has seen this before, or if there is anything wrong with our migration flow, though.
> >
> > Thanks in advance.
> >
> > Best regards,
> >
> > Michael Fong
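A quick way to watch the divergence described in step 3 from the outside is to compare the schema UUIDs each node reports - a minimal sketch, reusing the two node addresses from the logs below:

for h in 192.168.88.33 192.168.88.34; do
  # "Schema versions" lists one UUID per distinct schema known in the
  # cluster; more than one UUID means the nodes disagree.
  nodetool -h "$h" describecluster
done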
> >
> > From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> > Sent: Thursday, April 21, 2016 6:41 PM
> > To: u...@cassandra.apache.org; dev@cassandra.apache.org
> > Subject: RE: Cassandra 2.0.x OOM during bootstrap
> >
> > Hi, all,
> >
> > Here is some more information on what happened before the OOM on the rebooted node in a 2-node test cluster:
> >
> > 1. It seems the schema version changed on the rebooted node after reboot:
> >
> > Before reboot,
> > Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> > Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> >
> > After rebooting node 2,
> > Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> >
> > 2. After reboot, both nodes repeatedly send MigrationTasks to each other - we suspect this is related to the schema version (digest) mismatch after node 2 rebooted. Node 2 keeps submitting the migration task 100+ times to the other node:
> >
> > INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> > INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> > INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) Node /192.168.88.33 state jump to normal
> > INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> > DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) Can't send schema pull request: node /192.168.88.33 is down.
> > DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
> > INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
> > DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> > DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
> > INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
> > DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> > DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.33
> > INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 978) InetAddress /192.168.88.33 is now UP
> > DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> > .....
> >
> > On the other hand, node 1 keeps updating its gossip information, followed by receiving migration requests and submitting migration tasks:
> >
> > DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,332 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
> > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> > DEBUG [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
> > INFO [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> > DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 977) removing expire time for endpoint : /192.168.88.34
> > INFO [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> > ......
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,595 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,843 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> > DEBUG [MigrationStage:1] 2016-04-19 11:18:18,878 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> > ......
> > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> > DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> > .....
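The flood in 2. can be quantified without reading the whole log - a sketch, assuming the default log location:

# How many schema pull attempts did this node trigger?
grep -c 'Submitting migration task' /var/log/cassandra/system.log
# And did the node ever settle on a single schema version?
grep 'Gossiping my schema version' /var/log/cassandra/system.log | tail -n 5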
> >
> > Has anyone experienced this scenario? Thanks in advance!
> >
> > Sincerely,
> >
> > Michael Fong
> >
> > From: Michael Fong [mailto:michael.f...@ruckuswireless.com]
> > Sent: Wednesday, April 20, 2016 10:43 AM
> > To: u...@cassandra.apache.org; dev@cassandra.apache.org
> > Subject: Cassandra 2.0.x OOM during bootstrap
> >
> > Hi, all,
> >
> > We have recently encountered a Cassandra OOM issue when Cassandra is brought up, sometimes (but not always), in our 4-node cluster test bed.
> >
> > After analyzing the heap dump, we found the Internal-Response thread pool (JMXEnabledThreadPoolExecutor) filled with thousands of 'org.apache.cassandra.net.MessageIn' objects, occupying more than 2 gigabytes of heap memory.
> >
> > According to documents on the internet, the internal-response thread pool seems to be related to schema checking. Has anyone encountered a similar issue before?
> >
> > We are using Cassandra 2.0.17 and JDK 1.8. Thanks in advance!
> >
> > Sincerely,
> >
> > Michael Fong
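For the pool buildup described in the original message, a heap dump is not the only window into it - the backlog is also visible live while the node starts, for example:

# InternalResponseStage is the pool in question; a large or ever-growing
# Pending count matches the MessageIn accumulation seen in the heap dump.
nodetool tpstats | grep -i internalresponse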