[ https://issues.apache.org/jira/browse/CASSANDRA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2054:
-------------------------------

    Attachment: jstack.txt
                gc.log

All nodes were up when the error occurred, this time on node 192.168.0.3. I 
stopped our application and the node returned to a normal state.

(jstack is from when the node was accessible again)
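
A sketch of how a thread dump can still be forced while a node is in this state (the pid and output path below are examples, not the actual ones):

# plain jstack fails to attach while the node is spinning, so force a dump
jstack -F 27699 > /tmp/jstack-forced.txt

# alternatively, send SIGQUIT; the JVM prints the thread dump to its stdout
# (wherever that was redirected when the node was started)
kill -3 27699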


/software/cassandra/bin/nodetool -h localhost info
266
Gossip active    : true
Load             : 6.26 GB
Generation No    : 1296040182
Uptime (seconds) : 1208
Heap Memory (MB) : 907.55 / 3005.38


Address         Status State   Load            Owns    Token
                                                       ffffffffffffffff
192.168.0.1     Up     Normal  4.6 GB          5.00%   0cc
192.168.0.2     Up     Normal  4.6 GB          5.00%   199
192.168.0.3     Up     Normal  5.35 GB         5.00%   266
192.168.0.4     Up     Normal  2.54 GB         5.00%   333
192.168.0.5     Up     Normal  2.59 GB         5.00%   400
192.168.0.6     Up     Normal  2.55 GB         5.00%   4cc
192.168.0.7     Up     Normal  2.61 GB         5.00%   599
192.168.0.8     Up     Normal  2.59 GB         5.00%   666
192.168.0.9     Up     Normal  2.34 GB         5.00%   733
192.168.0.10    Up     Normal  1.74 GB         5.00%   7ff
192.168.0.11    Up     Normal  2.41 GB         5.00%   8cc
192.168.0.12    Up     Normal  2.73 GB         5.00%   999
192.168.0.13    Up     Normal  3.18 GB         5.00%   a66
192.168.0.14    Up     Normal  3.26 GB         5.00%   b33
192.168.0.15    Up     Normal  3.02 GB         5.00%   c00
192.168.0.16    Up     Normal  2.5 GB          5.00%   ccc
192.168.0.17    Up     Normal  2.42 GB         5.00%   d99
192.168.0.18    Up     Normal  2.31 GB         5.00%   e66
192.168.0.19    Up     Normal  2.28 GB         5.00%   f33
192.168.0.20    Up     Normal  2.33 GB         5.00%   ffffffffffffffff
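
Regarding the attached gc.log: GC logging of this sort is typically turned on with flags along the following lines in conf/cassandra-env.sh (a minimal sketch, assuming the stock env script; the log path is just an example):

# standard HotSpot GC logging options, appended to conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"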


> Cpu Spike to > 100%. 
> ---------------------
>
>                 Key: CASSANDRA-2054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Thibaut
>         Attachments: gc.log, jstack.txt, jstackerror.txt
>
>
> I see sudden spikes of CPU usage where Cassandra takes up an enormous 
> amount of CPU (the load average reported by uptime climbs above 1000). 
> My application executes both reads and writes.
> I tested this with 
> https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.
> I disabled JNA, but this didn't help.
> Jstack won't work anymore when this happens:
> -bash-4.1# jstack 27699 > /tmp/jstackerror
> 27699: Unable to open socket file: target process not responding or HotSpot 
> VM not loaded
> The -F option can be used when the target process is not responding
> Also, my entire application comes to a halt as long as the node is in this 
> state: the node is still marked as up, but it won't respond to any requests 
> (Cassandra is taking up all the CPU on the first node).
> /software/cassandra/bin/nodetool -h localhost ring
> Address         Status State   Load            Owns    Token
>                                                        ffffffffffffffff
> 192.168.0.1     Up     Normal  3.48 GB         5.00%   0cc
> 192.168.0.2     Up     Normal  3.48 GB         5.00%   199
> 192.168.0.3     Up     Normal  3.67 GB         5.00%   266
> 192.168.0.4     Up     Normal  2.55 GB         5.00%   333
> 192.168.0.5     Up     Normal  2.58 GB         5.00%   400
> 192.168.0.6     Up     Normal  2.54 GB         5.00%   4cc
> 192.168.0.7     Up     Normal  2.59 GB         5.00%   599
> 192.168.0.8     Up     Normal  2.58 GB         5.00%   666
> 192.168.0.9     Up     Normal  2.33 GB         5.00%   733
> 192.168.0.10    Down   Normal  2.39 GB         5.00%   7ff
> 192.168.0.11    Up     Normal  2.4 GB          5.00%   8cc
> 192.168.0.12    Up     Normal  2.74 GB         5.00%   999
> 192.168.0.13    Up     Normal  3.17 GB         5.00%   a66
> 192.168.0.14    Up     Normal  3.25 GB         5.00%   b33
> 192.168.0.15    Up     Normal  3.01 GB         5.00%   c00
> 192.168.0.16    Up     Normal  2.48 GB         5.00%   ccc
> 192.168.0.17    Up     Normal  2.41 GB         5.00%   d99
> 192.168.0.18    Up     Normal  2.3 GB          5.00%   e66
> 192.168.0.19    Up     Normal  2.27 GB         5.00%   f33
> 192.168.0.20    Up     Normal  2.32 GB         5.00%   ffffffffffffffff
> The interesting part is that after a while (seconds or minutes), I have seen 
> Cassandra nodes return to a normal state again (without a restart). I have also 
> never seen this happen on 2 nodes at the same time in the cluster (the node 
> where it happens differs, but it seems to happen on the first node most of 
> the time).
> In the above case, I restarted node 192.168.0.10 and the first node returned 
> to a normal state. (I don't know if there is a correlation.)
> I attached the jstack of the node in trouble (taken as soon as I could reach 
> it with jstack again, so I suspect it reflects the node once it was running 
> normally again).
> The heap usage is still moderate:
> /software/cassandra/bin/nodetool -h localhost info
> 0cc
> Gossip active    : true
> Load             : 3.49 GB
> Generation No    : 1295949691
> Uptime (seconds) : 42843
> Heap Memory (MB) : 1570.58 / 3005.38
> I will enable the GC logging tomorrow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
