[ https://issues.apache.org/jira/browse/CASSANDRA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thibaut updated CASSANDRA-2054:
-------------------------------

    Attachment: jstack.txt
                gc.log

All nodes were up when the error occurred, this time on node 192.168.0.3. I stopped our application and the node returned to a normal state. (The jstack is from when the node was accessible again.)

/software/cassandra/bin/nodetool -h localhost info
266
Gossip active    : true
Load             : 6.26 GB
Generation No    : 1296040182
Uptime (seconds) : 1208
Heap Memory (MB) : 907.55 / 3005.38

Address       Status State  Load     Owns   Token
                                            ffffffffffffffff
192.168.0.1   Up     Normal 4.6 GB   5.00%  0cc
192.168.0.2   Up     Normal 4.6 GB   5.00%  199
192.168.0.3   Up     Normal 5.35 GB  5.00%  266
192.168.0.4   Up     Normal 2.54 GB  5.00%  333
192.168.0.5   Up     Normal 2.59 GB  5.00%  400
192.168.0.6   Up     Normal 2.55 GB  5.00%  4cc
192.168.0.7   Up     Normal 2.61 GB  5.00%  599
192.168.0.8   Up     Normal 2.59 GB  5.00%  666
192.168.0.9   Up     Normal 2.34 GB  5.00%  733
192.168.0.10  Up     Normal 1.74 GB  5.00%  7ff
192.168.0.11  Up     Normal 2.41 GB  5.00%  8cc
192.168.0.12  Up     Normal 2.73 GB  5.00%  999
192.168.0.13  Up     Normal 3.18 GB  5.00%  a66
192.168.0.14  Up     Normal 3.26 GB  5.00%  b33
192.168.0.15  Up     Normal 3.02 GB  5.00%  c00
192.168.0.16  Up     Normal 2.5 GB   5.00%  ccc
192.168.0.17  Up     Normal 2.42 GB  5.00%  d99
192.168.0.18  Up     Normal 2.31 GB  5.00%  e66
192.168.0.19  Up     Normal 2.28 GB  5.00%  f33
192.168.0.20  Up     Normal 2.33 GB  5.00%  ffffffffffffffff

> Cpu Spike to > 100%.
> ---------------------
>
>                 Key: CASSANDRA-2054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Thibaut
>         Attachments: gc.log, jstack.txt, jstackerror.txt
>
>
> I see sudden spikes of CPU usage where Cassandra will take up an enormous amount of CPU (uptime load > 1000).
> My application executes both reads and writes.
> I tested this with https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.
> I disabled JNA, but this didn't help.
> Jstack won't work anymore when this happens:
>
> -bash-4.1# jstack 27699 > /tmp/jstackerror
> 27699: Unable to open socket file: target process not responding or HotSpot VM not loaded
> The -F option can be used when the target process is not responding
>
> Also, my entire application comes to a halt as long as the node is in this state: the node is still marked as up, but won't respond to any requests (Cassandra is taking up all the CPU on the first node).
>
> /software/cassandra/bin/nodetool -h localhost ring
> Address       Status State  Load     Owns   Token
>                                             ffffffffffffffff
> 192.168.0.1   Up     Normal 3.48 GB  5.00%  0cc
> 192.168.0.2   Up     Normal 3.48 GB  5.00%  199
> 192.168.0.3   Up     Normal 3.67 GB  5.00%  266
> 192.168.0.4   Up     Normal 2.55 GB  5.00%  333
> 192.168.0.5   Up     Normal 2.58 GB  5.00%  400
> 192.168.0.6   Up     Normal 2.54 GB  5.00%  4cc
> 192.168.0.7   Up     Normal 2.59 GB  5.00%  599
> 192.168.0.8   Up     Normal 2.58 GB  5.00%  666
> 192.168.0.9   Up     Normal 2.33 GB  5.00%  733
> 192.168.0.10  Down   Normal 2.39 GB  5.00%  7ff
> 192.168.0.11  Up     Normal 2.4 GB   5.00%  8cc
> 192.168.0.12  Up     Normal 2.74 GB  5.00%  999
> 192.168.0.13  Up     Normal 3.17 GB  5.00%  a66
> 192.168.0.14  Up     Normal 3.25 GB  5.00%  b33
> 192.168.0.15  Up     Normal 3.01 GB  5.00%  c00
> 192.168.0.16  Up     Normal 2.48 GB  5.00%  ccc
> 192.168.0.17  Up     Normal 2.41 GB  5.00%  d99
> 192.168.0.18  Up     Normal 2.3 GB   5.00%  e66
> 192.168.0.19  Up     Normal 2.27 GB  5.00%  f33
> 192.168.0.20  Up     Normal 2.32 GB  5.00%  ffffffffffffffff
>
> The interesting part is that after a while (seconds or minutes), I have seen Cassandra nodes return to a normal state again (without a restart). I have also never seen this happen on two nodes at the same time in the cluster (the node where it happens differs, but it seems to happen on the first node most of the time).
> In the above case, I restarted node 192.168.0.10 and the first node returned to a normal state.
> (I don't know if there is a correlation.)
>
> I attached the jstack of the node in trouble (taken as soon as I could access it with jstack, but I suspect this is the jstack from when the node was running normally again).
>
> The heap usage is still moderate:
>
> /software/cassandra/bin/nodetool -h localhost info
> 0cc
> Gossip active    : true
> Load             : 3.49 GB
> Generation No    : 1295949691
> Uptime (seconds) : 42843
> Heap Memory (MB) : 1570.58 / 3005.38
>
> I will enable GC logging tomorrow.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
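For reference, the GC logging mentioned above is typically enabled on the HotSpot JVMs of this era by appending standard flags in conf/cassandra-env.sh. A sketch, assuming the default JVM_OPTS mechanism; the log path is only an example:

```shell
# Hypothetical addition to conf/cassandra-env.sh. Flag names are
# standard HotSpot options; the log path is an example.
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"   # write GC events to this file
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"                 # per-generation sizes for each collection
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"              # wall-clock timestamps, easier to correlate with a CPU spike
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # total stop-the-world time, useful when the node appears hung
```

With these in place, long safepoint pauses would show up in gc.log even while jstack cannot attach.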