[ https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacek Furmankiewicz updated CASSANDRA-7361:
-------------------------------------------
    Attachment: histogram.png

Histogram report

> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu, RedHat, JDK 1.7
>            Reporter: Jacek Furmankiewicz
>            Priority: Minor
>         Attachments: histogram.png, leaks_report.png
>
>
> We have a long-running batch load process which runs for many hours. It performs a massive amount of writes in large mutation batches (we increase the Thrift frame size to 45 MB).
>
> Everything goes well, but after about 3 hours of processing everything locks up. We start getting NoHostsAvailable exceptions on the Java application side (with Astyanax as our driver), and eventually socket timeouts.
>
> Looking at Cassandra, we can see that it is using nearly the full 8 GB of heap and is unable to free it. It spends most of its time in full GC, but the amount of used memory does not go down.
>
> Here is a long jstat sample showing this over an extended time period:
> http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
>
> This continues even after we shut down our app. Nothing is connected to Cassandra any more, yet it is still stuck in full GC and cannot free up memory.
>
> Running nodetool tpstats shows that nothing is pending and all seems OK:
> {quote}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       69555935         0                 0
> RequestResponseStage              0         0              0         0                 0
> MutationStage                     0         0       73123690         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0             46         0                 0
> MemoryMeter                       0         0           1125         0                 0
> FlushWriter                       0         0            824         0                30
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0             23         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0           1783         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0              1         0                 0
> CompactionExecutor                0         0          74330         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0              0         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
>
> We had this happen on two separate boxes, one running 2.0.6 and the other 2.0.8.
>
> Right now this is a total blocker for us. We are unable to process the customer data and have to abort in the middle of a large processing run.
>
> This is a new customer, so we did not have a chance to see whether this occurred with 1.1 or 1.2 in the past (we moved to 2.0 recently).
>
> We still have the Cassandra process running, so please let us know if there is anything else we could run to give you more insight.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
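
For context, the following is a minimal sketch of the kind of Astyanax batch write loop the report describes. It is not taken from the reporter's application: the keyspace, column family, row/column names, and the batch size of 1000 rows are hypothetical, and it assumes an already configured com.netflix.astyanax.Keyspace instance (Astyanax 1.x against Cassandra 2.0 over Thrift).

{code:java}
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class BatchLoadSketch {

    // Hypothetical column family; the real schema is not part of the report.
    private static final ColumnFamily<String, String> CF_DATA =
            new ColumnFamily<String, String>("data",
                    StringSerializer.get(),    // row key serializer
                    StringSerializer.get());   // column name serializer

    /**
     * Writes a large number of rows in batches, roughly mirroring the
     * "massive amount of writes in large mutation batches" described above.
     */
    public static void load(Keyspace keyspace, int totalRows) throws ConnectionException {
        MutationBatch batch = keyspace.prepareMutationBatch();
        int rowsInBatch = 0;

        for (int i = 0; i < totalRows; i++) {
            batch.withRow(CF_DATA, "row-" + i)
                 .putColumn("payload", "value-" + i);

            if (++rowsInBatch == 1000) {
                batch.execute();                        // sent as one Thrift batch_mutate call
                batch = keyspace.prepareMutationBatch();
                rowsInBatch = 0;
            }
        }
        if (rowsInBatch > 0) {
            batch.execute();                            // flush the final partial batch
        }
    }
}
{code}

Each execute() call is sent as a single Thrift batch_mutate request, which is presumably why the Thrift frame size was raised to 45 MB to accommodate the large batches mentioned in the report.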