[ 
https://issues.apache.org/jira/browse/CASSANDRA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-936:
-------------------------------------

    Attachment: 936.txt

Okay, here's what's happening.

The assert in question is
        assert context.position > 
context.getSegment().getHeader().getPosition(id) : "discard called on obsolete 
context " + context;

The first argument, context.position, is "the position in the current commitlog 
segment at the time the currently-finishing flush began."  The second argument 
is "the position we wrote in the commitlog segment header showing where the 
position was for last-completed flush."  If the current flush position is less 
than the last flush position, then the last flush potentially deleted segments 
whose data wasn't turned into sstables yet, which would be Very Bad.  If the 
positions are equal, that would mean we're flushing a CF that hasn't had any 
new data since the last one, which isn't supposed to happen either.

But, there's a discontinuity in the position measuring, and that's when a new 
commitlog context is swapped in.  Let me give a diagram here of the CL 
segement's contents:


XXXXXXXXXXXXXXYYYYYYYYYYYYYYZZZZZZZZZZZZZ
0             H             F            etc

The first H bytes are the header.  From H to position F represents the first 
mutation written to the fresh segment.

Normally, if you force a memtable to flush after a new commitlog segment is 
rolled in, one of two things will happen:

(1) you will flush with a context position of H, and a header 
lastFlushedPosition of 0 (the initial value) or
(2) you will flush with a context position of F or higher, and a header 
lastFlushedPosition of H (after some writes occur)

But, if the flush context gets measured at H before any writes happen, THEN A 
WRITE OCCURS DURING THE FLUSH BEFORE THE DISCARD PHASE, you will have context 
position == lastFlushedPosition == H, because making a write calls turnOn which 
edits lastFlushedAt, even though no flushes have occurred, because 
lastFlushedAt is also semantically startReplayAt, and 0 is not a valid place to 
start mutation replay.  So in the latest stacktrace, H == 169, and it's failing 
in exactly this scenario.

The least invasive fix for this in 0.6 is to change the assert from > to >=.


> Discard Commitlog Exception
> ---------------------------
>
>                 Key: CASSANDRA-936
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-936
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Dan Di Spaltro
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.1
>
>         Attachments: 936.txt
>
>
> 2010-03-30_21:19:02.31041 java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.AssertionError: discard 
> called on obsolete context 
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log',
>  position=8780)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask.get(FutureTask.java:111)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1118)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 2010-03-30_21:19:02.31041     at java.lang.Thread.run(Thread.java:636)
> 2010-03-30_21:19:02.31041 Caused by: java.lang.RuntimeException: 
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.AssertionError: discard called on obsolete context 
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log',
>  position=8780)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 2010-03-30_21:19:02.31041     ... 2 more
> 2010-03-30_21:19:02.31041 Caused by: java.lang.RuntimeException: 
> java.util.concurrent.ExecutionException: java.lang.AssertionError: discard 
> called on obsolete context 
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log',
>  position=8780)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:358)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.ColumnFamilyStore$1.runMayThrow(ColumnFamilyStore.java:371)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> 2010-03-30_21:19:02.31041     ... 6 more
> 2010-03-30_21:19:02.31041 Caused by: java.util.concurrent.ExecutionException: 
> java.lang.AssertionError: discard called on obsolete context 
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log',
>  position=8780)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask.get(FutureTask.java:111)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:350)
> 2010-03-30_21:19:02.31041     ... 8 more
> 2010-03-30_21:19:02.31041 Caused by: java.lang.AssertionError: discard called 
> on obsolete context 
> CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log',
>  position=8780)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:378)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:72)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:344)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> 2010-03-30_21:19:02.31041     at 
> java.util.concurrent.FutureTask.run(FutureTask.java:166)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLogExecutorService.process(CommitLogExecutorService.java:113)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLogExecutorService.access$200(CommitLogExecutorService.java:35)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.db.commitlog.CommitLogExecutorService$1.runMayThrow(CommitLogExecutorService.java:67)
> 2010-03-30_21:19:02.31041     at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> 2010-03-30_21:19:02.31041     ... 1 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to