[ https://issues.apache.org/jira/browse/CASSANDRA-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830254#comment-15830254 ]

Jan Urbański commented on CASSANDRA-13123:
------------------------------------------

[~jasobrown] I haven't had the chance to try this out in production yet; I'll 
try to do that tomorrow. The initial commitlog replay currently takes up to two 
minutes on each of our nodes. If I understand correctly, after a drain all 
commitlogs except at most two would be deleted, so the initial replay phase 
would be reduced to essentially zero. The shutdown phase might take a bit 
longer, of course, because it will have to wait for those commitlogs to be 
deleted.

The exact improvement depends on the number of CLs left behind after a drain: 
on machines with heavily contended disks it can be a lot, while on lightly 
loaded ones it might be zero.

As for when we drain: we do it on every restart, as part of our standard 
restart procedure.

> Draining a node might fail to delete all inactive commitlogs
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13123
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13123
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Jan Urbański
>            Assignee: Jan Urbański
>             Fix For: 3.8
>
>         Attachments: 13123-2.2.8.txt, 13123-3.0.10.txt, 13123-3.9.txt, 
> 13123-trunk.txt
>
>
> After issuing a drain command, it's possible that not all of the inactive 
> commitlogs are removed.
> The drain command shuts down the CommitLog instance, which in turn shuts down 
> the CommitLogSegmentManager. This has the effect of discarding any pending 
> management tasks it might have, like the removal of inactive commitlogs.
> This in turn leads to an excessive number of commitlogs being left behind 
> after a drain and a lengthy recovery after a restart. With a fleet of dozens 
> of nodes, each leaving several GB of commitlogs after a drain and taking up 
> to two minutes to recover them on restart, the additional time required to 
> restart the entire fleet becomes noticeable.
> This problem is not present in 3.x or trunk because of the CLSM rewrite done 
> in CASSANDRA-8844.
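
The failure mode in the description (shutting down a manager silently drops 
whatever management work is still queued) can be illustrated with plain 
java.util.concurrent. This is a hypothetical analogy, not Cassandra's 
CommitLogSegmentManager code: a single-threaded executor stands in for the 
segment manager, each queued task stands in for deleting one inactive 
commitlog segment, and the names DrainDemo and deletedAfterShutdown are made 
up for the example. ExecutorService.shutdownNow() hands back queued tasks 
without running them, whereas shutdown() followed by awaitTermination() lets 
them finish first.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy only: this is NOT Cassandra's CommitLogSegmentManager.
// A single-threaded executor plays the "segment manager"; each queued task
// plays "delete one inactive commitlog segment".
public class DrainDemo {

    static int deletedAfterShutdown(int segments, boolean graceful) {
        ExecutorService manager = Executors.newSingleThreadExecutor();
        AtomicInteger deleted = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Keep the only worker busy so the deletion tasks stay queued.
            manager.submit(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();

            // Queue one hypothetical "delete segment" task per inactive segment.
            for (int i = 0; i < segments; i++) {
                manager.submit(() -> { deleted.incrementAndGet(); });
            }

            if (graceful) {
                // shutdown(): no new tasks are accepted, but queued tasks still run.
                release.countDown();
                manager.shutdown();
            } else {
                // shutdownNow(): interrupts the worker and returns the queued
                // tasks without running them, so those "deletions" never happen.
                List<Runnable> discarded = manager.shutdownNow();
                System.out.println("discarded " + discarded.size() + " pending task(s)");
            }
            manager.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deleted.get();
    }

    public static void main(String[] args) {
        System.out.println("shutdown():    " + deletedAfterShutdown(5, true) + " of 5 deleted");
        System.out.println("shutdownNow(): " + deletedAfterShutdown(5, false) + " of 5 deleted");
    }
}
```

Running main should report all 5 deletions completing after shutdown() and 
none after shutdownNow(). Waiting for the queue to empty is what makes a 
drain slower, which is the trade-off mentioned in the comment above: a 
slightly longer shutdown in exchange for little or no replay on restart.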



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
