[ 
https://issues.apache.org/jira/browse/CASSANDRA-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020052#comment-14020052
 ] 

Jacek Furmankiewicz edited comment on CASSANDRA-7361 at 6/6/14 5:07 PM:
------------------------------------------------------------------------

Well, we definitely have some sort of issue.

I have JNA installed:

{quote}
sudo apt-get install libjna-java
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libjna-java is already the newest version.
libjna-java set to manually installed.
{quote}

But when I start 2.0.8 manually from the bin folder, it says

{quote}
INFO 12:00:08,299 JNA not found. Native methods will be disabled.
{quote}

Considering that JNA is such a critical requirement, and that the row cache here 
is sized at 2x the heap, it seems wrong to me that Cassandra just starts up 
without complaint and then gets itself completely locked up within a few hours.
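
Hedged guess rather than anything verified: the Debian libjna-java package puts 
jna.jar under /usr/share/java, while a tarball install's startup script only picks 
up jars from its own lib directory, so having the package installed alone would not 
be enough. A rough sketch of the check and workaround (every path below is an 
assumption about the local layout):

{quote}
# Sketch only -- /opt/cassandra and /usr/share/java/jna.jar are guesses, adjust paths.
dpkg -L libjna-java | grep -i jna       # find where the package actually put the jar
ls /opt/cassandra/lib | grep -i jna     # if nothing shows up, Cassandra cannot see it
ln -s /usr/share/java/jna.jar /opt/cassandra/lib/jna.jar
# restart and check the startup log; once JNA is picked up, the "JNA not found"
# line should be replaced by a message along the lines of "JNA mlockall successful"
{quote}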



> Cassandra locks up in full GC when you assign the entire heap to row cache
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7361
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7361
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu, RedHat, JDK 1.7
>            Reporter: Jacek Furmankiewicz
>            Priority: Minor
>
> We have a long-running batch load process which runs for many hours. It does a 
> massive amount of writes, in large mutation batches (we increase the Thrift 
> frame size to 45 MB).
> Everything goes well, but after about 3 hours of processing everything locks 
> up. We start getting NoHostsAvailable exceptions on the Java application side 
> (with Astyanax as our driver), and eventually socket timeouts.
> Looking at Cassandra, we can see that it is using nearly the full 8 GB of heap 
> and is unable to free it. It spends most of its time in full GC, but the amount 
> of used memory does not go down.
> Here is a long sample from jstat showing this over an extended period of time:
> http://aep.appspot.com/display/NqqEagzGRLO_pCP2q8hZtitnuVU/
> This continues even after we shut down our app. Nothing is connected to 
> Cassandra any more, yet it is still stuck in full GC and cannot free up 
> memory.
> Running nodetool tpstats shows that nothing is pending, all seems OK:
> {quote}
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       69555935         0                 0
> RequestResponseStage              0         0              0         0                 0
> MutationStage                     0         0       73123690         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0             46         0                 0
> MemoryMeter                       0         0           1125         0                 0
> FlushWriter                       0         0            824         0                30
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0             23         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0           1783         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0              1         0                 0
> CompactionExecutor                0         0          74330         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0              0         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       585
> MUTATION                 75775
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
> {quote}
> We had this happen on 2 separate boxes, one with 2.0.6, the other with 2.0.8.
> Right now this is a total blocker for us. We are unable to process the 
> customer data and have to abort in the middle of a large processing run.
> This is a new customer, so we did not have a chance to see if this occurred 
> with 1.1 or 1.2 in the past (we moved to 2.0 recently).
> We still have the Cassandra process running; please let us know if there is 
> anything else we could run to give you more insight.
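
As context for the title: the row cache size is set by row_cache_size_in_mb in 
cassandra.yaml, and the comment above has it sized at 2x the heap. A hedged sketch 
of a more conservative setting (the numbers are illustrative, not taken from this 
ticket):

{quote}
# cassandra.yaml -- illustrative values only
row_cache_size_in_mb: 512      # keep the row cache a small fraction of the heap,
                               # or 0 to disable it entirely (the 2.0 default)
row_cache_save_period: 0
{quote}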



--
This message was sent by Atlassian JIRA
(v6.2#6252)
