[ https://issues.apache.org/jira/browse/CASSANDRA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987055#action_12987055 ]

Thibaut commented on CASSANDRA-2054:
------------------------------------

The jstack was taken after the node had returned to the "normal" state. I'm
unable to run jstack while the node is consuming all the CPU.

The connections could also all come from my application's threads (it runs
multiple threads) trying to connect while the node is in the "not normal"
state, since the node is still marked as up.
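
Next time it spikes I'll try to capture something anyway, roughly along these
lines (a sketch only; 27699 is just the PID from the report and the output
paths are placeholders):

-bash-4.1# jstack -F 27699 > /tmp/jstack_forced.txt    # forced mode, as the error message suggests
-bash-4.1# kill -3 27699                               # SIGQUIT: the JVM prints a thread dump to wherever its stdout is redirected
-bash-4.1# top -b -H -n 1 -p 27699 > /tmp/threads.txt  # per-thread CPU usage; the thread id in hex matches "nid=" in the dump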


> Cpu Spike to > 100%. 
> ---------------------
>
>                 Key: CASSANDRA-2054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Thibaut
>         Attachments: gc.log, jstack.txt, jstackerror.txt
>
>
> I see sudden spikes of CPU usage where Cassandra will take up an enormous 
> amount of CPU (uptime load > 1000). 
> My application executes both reads and writes.
> I tested this with 
> https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.
> I disabled JNA, but this didn't help.
> Jstack won't work anymore when this happens:
> -bash-4.1# jstack 27699 > /tmp/jstackerror
> 27699: Unable to open socket file: target process not responding or HotSpot 
> VM not loaded
> The -F option can be used when the target process is not responding
> Also, my entire application comes to a halt as long as the node is in this 
> state: the node is still marked as up, but it won't respond to any requests 
> (Cassandra is taking up all the CPU on the first node).
> /software/cassandra/bin/nodetool -h localhost ring
> Address        Status  State   Load     Owns    Token
>                                                 ffffffffffffffff
> 192.168.0.1    Up      Normal  3.48 GB  5.00%   0cc
> 192.168.0.2    Up      Normal  3.48 GB  5.00%   199
> 192.168.0.3    Up      Normal  3.67 GB  5.00%   266
> 192.168.0.4    Up      Normal  2.55 GB  5.00%   333
> 192.168.0.5    Up      Normal  2.58 GB  5.00%   400
> 192.168.0.6    Up      Normal  2.54 GB  5.00%   4cc
> 192.168.0.7    Up      Normal  2.59 GB  5.00%   599
> 192.168.0.8    Up      Normal  2.58 GB  5.00%   666
> 192.168.0.9    Up      Normal  2.33 GB  5.00%   733
> 192.168.0.10   Down    Normal  2.39 GB  5.00%   7ff
> 192.168.0.11   Up      Normal  2.4 GB   5.00%   8cc
> 192.168.0.12   Up      Normal  2.74 GB  5.00%   999
> 192.168.0.13   Up      Normal  3.17 GB  5.00%   a66
> 192.168.0.14   Up      Normal  3.25 GB  5.00%   b33
> 192.168.0.15   Up      Normal  3.01 GB  5.00%   c00
> 192.168.0.16   Up      Normal  2.48 GB  5.00%   ccc
> 192.168.0.17   Up      Normal  2.41 GB  5.00%   d99
> 192.168.0.18   Up      Normal  2.3 GB   5.00%   e66
> 192.168.0.19   Up      Normal  2.27 GB  5.00%   f33
> 192.168.0.20   Up      Normal  2.32 GB  5.00%   ffffffffffffffff
> The interesting part is that after a while (seconds or minutes), I have seen 
> Cassandra nodes return to a normal state again (without a restart). I have 
> also never seen this happen on two nodes in the cluster at the same time 
> (the node where it happens differs, but it seems to happen on the first node 
> most of the time).
> In the above case, I restarted node 192.168.0.10 and the first node returned 
> to a normal state. (I don't know if there is a correlation.)
> I attached the jstack of the node in trouble (taken as soon as I could 
> access it with jstack, but I suspect this is the jstack from when the node 
> was running normally again).
> The heap usage is still moderate:
> /software/cassandra/bin/nodetool -h localhost info
> 0cc
> Gossip active    : true
> Load             : 3.49 GB
> Generation No    : 1295949691
> Uptime (seconds) : 42843
> Heap Memory (MB) : 1570.58 / 3005.38
> I will enable GC logging tomorrow.
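
(Regarding the GC logging mentioned above: enabling it amounts to adding
something along these lines to conf/cassandra-env.sh; the log path is just an
example and may differ on your install.)

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"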

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
