[ 
https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411102#comment-13411102
 ] 

Robert Coli commented on CASSANDRA-1967:
----------------------------------------

Relevant code section now (1.1.1 release) reads :
"
            public boolean accept(File dir, String name)
            {
                // we used to try to avoid instantiating commitlog (thus creating an empty segment ready for writes)
                // until after recover was finished.  this turns out to be fragile; it is less error-prone to go
                // ahead and allow writes before recover(), and just skip active segments when we do.
                return CommitLogSegment.possibleCommitLogFile(name) && !instance.allocator.manages(name);
            }
"

This suggests that the pattern described in this ticket, an explicit flush at 
the end of replay triggering compaction, is no longer a concern.
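
As a rough, self-contained illustration of what that filter does (not the 
actual Cassandra code): the sketch below accepts only files that look like 
commit log segments and are not currently active. The class name, the 
file-name pattern and the "active" set are assumptions made up for this 
example; they stand in for CommitLogSegment.possibleCommitLogFile() and 
instance.allocator.manages(), whose real implementations are not shown here.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class ReplayFilterSketch {
    // Hypothetical naming rule, standing in for CommitLogSegment.possibleCommitLogFile(name).
    private static final Pattern SEGMENT_NAME = Pattern.compile("CommitLog-\\d+\\.log");

    // Builds a filter that accepts commit log files which are NOT in the active set.
    // The active set stands in for instance.allocator.manages(name).
    static FilenameFilter replayFilter(final Set<String> activeSegments) {
        return new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return SEGMENT_NAME.matcher(name).matches() && !activeSegments.contains(name);
            }
        };
    }

    // Applies the filter to a list of file names and returns those that would be replayed.
    static List<String> segmentsToReplay(List<String> names, Set<String> activeSegments) {
        FilenameFilter filter = replayFilter(activeSegments);
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            // The directory argument is unused by this filter, so null is passed.
            if (filter.accept(null, name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> active = new HashSet<String>(Arrays.asList("CommitLog-3.log"));
        List<String> onDisk = Arrays.asList("CommitLog-1.log", "CommitLog-3.log", "notes.txt");
        // CommitLog-3.log is active (already open for new writes), so it is skipped.
        System.out.println(segmentsToReplay(onDisk, active));
    }
}
```

With a filter of this shape, the segment that is already open for new writes 
is simply skipped during replay, so nothing needs to be flushed to make room 
for it.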

A node which has just been restarted might start compacting shortly after 
restart as a side effect of accepting new writes during replay. Those writes 
might fill a memtable and cause it to flush, triggering compaction. Unless the 
heap has been made smaller between restarts, I don't believe a flush can be 
triggered during replay in any other way. If you have changed the size of your 
heap between restarts, it seems reasonable to presume that replay might result 
in a flush.

This situation is the same as normal operation : a node being written to might 
flush and compact. Unless we are very stringent about wanting to ELIMINATE 
ANY CHANCE that a node which has "recently" restarted might start compacting as 
a side effect of restart (and 
https://issues.apache.org/jira/browse/CASSANDRA-2444 doesn't seem interested in 
being that stringent...), I think we are probably at best-case behavior here.

In any case, this particular ticket should probably be resolved, as it no 
longer seems to describe the current state of the code.
                
> commit log replay shouldn't end with a flush
> --------------------------------------------
>
>                 Key: CASSANDRA-1967
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1967
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.3
>            Reporter: Robert Coli
>            Priority: Minor
>
> (Apologies in advance if there is some very compelling reason to flush after 
> replay, of which I am not currently aware. ;D)
> Currently, when a node restarts, the following sequence occurs :
> a) commitlog is replayed
> b) any memtables resulting from a) are flushed 
> c) a new commitlog is opened, new memtables are switched in
> ... (other stuff happens)
> d) node starts taking traffic
> This has side effects, perhaps most seriously the potential of triggering 
> compaction. As a node is likely to struggle performance-wise after 
> restarting, triggering compaction at that time seems like something we might 
> wish to avoid.
> I propose that the sequence be :
> a) commitlog is replayed
> b) a new commitlog is opened, new memtables are switched in 
> ... (other stuff happens)
> c) node starts taking traffic
> Looking through the relevant code, the only code that appears to depend on 
> this flush is at 
> src/java/org/apache/cassandra/db/commitlog/CommitLog.java:112 :
> "
>         // all old segments are recovered and deleted before CommitLog is instantiated.
>         // All we need to do is create a new one.
>         segments.add(new CommitLogSegment());
> "
> Presumably this code would have to be refactored to be aware of the currently 
> open commitlog.

