[ 
https://issues.apache.org/jira/browse/CASSANDRA-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775635#comment-13775635
 ] 

Oleg Anastasyev commented on CASSANDRA-6079:
--------------------------------------------

Exactly.
The case here could be:
1. The coordinator writes the batchlog to 2 chosen nodes within the same DC.
2. The coordinator crashes.
3. Those 2 nodes are supposed to run replayAllFailedBatches to replay the 
crashed coordinator's batch,
3.1 but they cannot, because the compaction pool is busy with an ongoing large 
CF compaction and they are blocked waiting on submitUserDefined().get().
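
The blocking in step 3.1 can be reproduced outside Cassandra with plain 
java.util.concurrent. This is only a sketch, not the BatchlogManager code: it 
assumes a single-threaded compaction pool, and the class, method, and variable 
names are hypothetical stand-ins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BlockedReplaySketch {

    // Returns true if the caller is still waiting on the batchlog compaction
    // future after 200 ms, i.e. it is stuck behind the large compaction.
    static boolean replayBlocksBehindCompaction() {
        // Assumption: one compaction thread, to keep the sketch small.
        ExecutorService compactionPool = Executors.newSingleThreadExecutor();
        CountDownLatch largeCompactionDone = new CountDownLatch(1);
        try {
            // Stand-in for the ongoing large CF compaction occupying the pool.
            compactionPool.submit(() -> {
                largeCompactionDone.await();
                return null;
            });
            // Stand-in for the batchlog CF compaction requested during replay.
            Future<?> batchlogCompaction = compactionPool.submit(() -> { });
            try {
                // An untimed get() would block here until the large compaction ends.
                batchlogCompaction.get(200, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            largeCompactionDone.countDown(); // let the "large compaction" finish
            compactionPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("replay blocked: " + replayBlocksBehindCompaction());
    }
}
```

With an untimed get(), as in the scenario above, the replaying thread stays 
parked for as long as the large compaction runs.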


                
> Memtables flush is delayed when having a lot of batchlog activity, making 
> node OOM
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6079
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6079
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oleg Anastasyev
>            Assignee: Oleg Anastasyev
>            Priority: Minor
>             Fix For: 1.2.11, 2.0.2
>
>         Attachments: NoWaitBatchlogCompaction.diff
>
>
> Both MeteredFlusher and BatchlogManager share the same OptionalTasks thread, 
> so while the batchlog manager is processing its tasks, no flushes can occur. 
> Moreover, the batchlog manager waits for batchlog CF compaction to finish.
> Under heavy batchlog activity this prevents memtables from flushing for a 
> long time, causing the node to OOM.
> Fixed this by moving the batchlog manager to its own thread and not waiting 
> for batchlog CF compaction to finish.
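
The starvation described in the issue, and the effect of giving the batchlog 
work its own thread, can be sketched with two scheduled executors. This is a 
minimal illustration under assumptions, not the attached patch: the class and 
method names are hypothetical, and the latches stand in for real flush and 
batchlog work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SharedThreadSketch {

    // Returns true if a flush task gets to run within 300 ms while a
    // long-running batchlog task is still in progress.
    static boolean flushRunsWhileBatchlogBusy(boolean dedicatedBatchlogThread) {
        // Stand-in for the single shared "optional tasks" thread.
        ScheduledExecutorService optionalTasks = Executors.newSingleThreadScheduledExecutor();
        // Either share that thread (the reported problem) or use a separate one.
        ScheduledExecutorService batchlogTasks = dedicatedBatchlogThread
                ? Executors.newSingleThreadScheduledExecutor()
                : optionalTasks;
        CountDownLatch batchlogStarted = new CountDownLatch(1);
        CountDownLatch releaseBatchlog = new CountDownLatch(1);
        CountDownLatch flushed = new CountDownLatch(1);
        try {
            // Batchlog task that stays busy until released, e.g. because it
            // is waiting for a compaction to finish.
            batchlogTasks.execute(() -> {
                batchlogStarted.countDown();
                try {
                    releaseBatchlog.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            batchlogStarted.await();
            // Flush task; on a shared thread it queues behind the batchlog task.
            optionalTasks.execute(flushed::countDown);
            return flushed.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            releaseBatchlog.countDown();
            optionalTasks.shutdown();
            batchlogTasks.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared thread:    flush ran = " + flushRunsWhileBatchlogBusy(false));
        System.out.println("dedicated thread: flush ran = " + flushRunsWhileBatchlogBusy(true));
    }
}
```

On the shared thread the flush task does not start until the batchlog task 
returns; with a dedicated thread it runs right away.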

