[ https://issues.apache.org/jira/browse/CASSANDRA-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sunjian updated CASSANDRA-5205:
-------------------------------

    Attachment: the-normal-free-node-no-presure.jpg

One of the normal, idle nodes: it has plenty of free time.

> The first three Cassandra nodes are very busy, GC pauses the world (Real production Env. Exp.)
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5205
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.5
>         Environment: Cassandra 1.1.5 release
> CentOS 5.5
> JDK 1.7u9
> VMware ESXi-based VM: 30 GB RAM, 4x4-core CPU
> Hardware: Dell R720, 2x6-core CPU, 128 GB RAM, running 3 nodes as above
> Data hosted by each node: about 8 GB
>            Reporter: sunjian
>            Priority: Minor
>             Fix For: 1.1.10
>
>         Attachments: the-normal-free-node-no-presure.jpg, the-trouble-maker-node.jpg
>
>
> Hi all,
> I previously had 10 nodes, all CentOS VMs with 16 GB RAM and 8-core CPUs, running Cassandra 1.1.5 with a single user keyspace (RF=3).
> Heap: old generation 8 GB, new generation 2 GB.
> Problems:
> 1. The first three nodes (starting from token 0) are very busy all the time, while the remaining 7 nodes seem to have nothing to do; their CPU and RAM are mostly idle.
> 2. JVM memory usage on all of the first three nodes grows rapidly, and CMS GC fires nearly every second.
> 3. When GC runs, the world seems to stop. Checking with nodetool: when run on the first three nodes, nodetool hangs; when run on the remaining 7 nodes, it shows the first three nodes as down.
> 4. When GC finishes, the node comes back, but it goes down again a few minutes later.
> 5. If I kill the Java process and restart the frozen node, it comes up within minutes, but JVM memory fills up again within minutes as well, and everything above repeats...
> 6. Even if only one of the first three nodes is frozen, client requests fail, although my client requests use CL=QUORUM (with RF=3 a quorum is 2 replicas, so one frozen replica should be tolerated). I am using the Hector client library.
> 7. Disabling the Thrift API on the three nodes changed nothing.
> ---------- Changes ----------
> 0. Stopped incoming user requests (shut down our user service to take the load off Cassandra).
> 1. Decommissioned 4 nodes (one by one).
> 2. Moved tokens to balance the remaining 6 nodes (one by one).
> 3. Upgraded the remaining 6 nodes to 30 GB RAM and 16-core CPUs, with heap: old generation 16 GB, new generation 4 GB.
> 4. Enabled JNA.
> 5. Ran a major compaction and a repair on all 6 nodes.
> 6. Started the new cluster...
> 7. Everything seemed fine at first, but after 5 hours all of the problems came back.
> 8. Because we now have double the RAM, the freeze-and-recover cycle now repeats hourly.
> Some screenshots attached.
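Step 2 of the changes (moving tokens to balance the remaining 6 nodes) can be sketched as computing evenly spaced tokens around the ring. This is a minimal sketch assuming the cluster uses RandomPartitioner, the Cassandra 1.1 default, whose token space runs from 0 to 2^127; the report does not state which partitioner is configured, and the helper name `balanced_tokens` is hypothetical.

```python
from typing import List

# Assumption: RandomPartitioner (Cassandra 1.1 default), tokens in [0, 2**127).
# The report does not say which partitioner the cluster actually uses.
RANDOM_PARTITIONER_RANGE = 2 ** 127


def balanced_tokens(node_count: int) -> List[int]:
    """Hypothetical helper: one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    # Node i gets i/N of the way around the ring, so each node owns 1/N of it.
    return [i * RANDOM_PARTITIONER_RANGE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # The report rebalances the remaining 6 nodes.
    for index, token in enumerate(balanced_tokens(6)):
        print(f"node {index}: {token}")
```

Each token would then be assigned to its node with `nodetool move <token>`, one node at a time, as the report describes. Evenly spaced tokens even out token ownership; they do not by themselves address the GC pressure described above.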