[ https://issues.apache.org/jira/browse/CASSANDRA-10515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977052#comment-14977052 ]

Jeff Griffith edited comment on CASSANDRA-10515 at 10/27/15 7:59 PM:
---------------------------------------------------------------------

[~krummas] [~tjake] Something interesting about this second form of commit log
growth: all nodes had uncontrolled commit log growth, unlike the first example
(many files in L0), where it was isolated to individual nodes. For this latter
case, I think I'm able to relate it to a separate problem with an index out of
bounds exception. Working with [~benedict], it seems we have that one solved.
I'm hopeful that patch will solve this growing commit log problem as well. It
seems like all roads lead to Rome, where Rome is commit log growth :-)

Here is the other JIRA identifying an integer overflow in
AbstractNativeCell.java:
https://issues.apache.org/jira/browse/CASSANDRA-10579
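
To make the failure mode concrete, here is a tiny standalone sketch (purely
illustrative, with made-up names and numbers, not the actual AbstractNativeCell
code) of how 32-bit offset arithmetic can wrap negative on a large off-heap
region and only surface later as an index out of bounds:
{code:java}
// Illustrative only: shows the general int-overflow pattern, not Cassandra's code.
public class OffsetOverflowSketch {
    public static void main(String[] args) {
        int cellCount = 70_000;       // hypothetical number of cells in a region
        int bytesPerCell = 32_000;    // hypothetical per-cell footprint in bytes

        // 70,000 * 32,000 = 2,240,000,000 > Integer.MAX_VALUE, so the int result wraps negative.
        int badOffset = cellCount * bytesPerCell;
        // Promoting to long before multiplying keeps the true value.
        long goodOffset = (long) cellCount * bytesPerCell;

        System.out.println("int arithmetic:  " + badOffset);   // -2054967296
        System.out.println("long arithmetic: " + goodOffset);  //  2240000000

        byte[] region = new byte[16];
        try {
            byte b = region[badOffset];  // negative index from the wrapped offset
            System.out.println(b);
        } catch (ArrayIndexOutOfBoundsException e) {
            // This is the shape of the symptom: the bad arithmetic blows up far from its source.
            System.out.println("Index out of bounds: " + e.getMessage());
        }
    }
}
{code}
Again, this is just the general pattern; the actual code and fix are on
CASSANDRA-10579.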

Still uncertain how to proceed with the first form, which seems to be the
starvation you have described.



> Commit logs back up with move to 2.1.10
> ---------------------------------------
>
>                 Key: CASSANDRA-10515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10515
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: redhat 6.5, cassandra 2.1.10
>            Reporter: Jeff Griffith
>            Assignee: Branimir Lambov
>            Priority: Critical
>              Labels: commitlog, triage
>         Attachments: C5commitLogIncrease.jpg, CommitLogProblem.jpg, 
> CommitLogSize.jpg, MultinodeCommitLogGrowth-node1.tar.gz, RUN3tpstats.jpg, 
> cassandra.yaml, cfstats-clean.txt, stacktrace.txt, system.log.clean
>
>
> After upgrading from Cassandra 2.0.x to 2.1.10, we began seeing problems where
> some nodes exceed the 12G commit log maximum we configured and grow as high as
> 65G or more before the node restarts. Once the commit log files exceed 12G,
> "nodetool compactionstats" hangs. Eventually C* restarts without errors (not
> sure yet whether it is crashing but I'm checking into it), cleanup occurs, and
> the commit logs shrink back down again. Here is the nodetool compactionstats
> output immediately after restart.
> {code}
> jgriffith@prod1xc1.c2.bf1:~$ ndc
> pending tasks: 2185
>    compaction type   keyspace   table      completed          total   unit   progress
>         Compaction   SyncCore   *cf1*    61251208033   170643574558   bytes     35.89%
>         Compaction   SyncCore   *cf2*    19262483904    19266079916   bytes     99.98%
>         Compaction   SyncCore   *cf3*     6592197093     6592316682   bytes    100.00%
>         Compaction   SyncCore   *cf4*     3411039555     3411039557   bytes    100.00%
>         Compaction   SyncCore   *cf5*     2879241009     2879487621   bytes     99.99%
>         Compaction   SyncCore   *cf6*    21252493623    21252635196   bytes    100.00%
>         Compaction   SyncCore   *cf7*    81009853587    81009854438   bytes    100.00%
>         Compaction   SyncCore   *cf8*     3005734580     3005768582   bytes    100.00%
> Active compaction remaining time :        n/a
> {code}
> I was also running periodic "nodetool tpstats" commands, which worked, but the
> stats were not being logged to system.log by the StatusLogger thread until
> after compaction started working again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
