[ https://issues.apache.org/jira/browse/CASSANDRA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-936: ------------------------------------- Attachment: 936.txt Okay, here's what's happening. The assert in question is assert context.position > context.getSegment().getHeader().getPosition(id) : "discard called on obsolete context " + context; The first argument, context.position, is "the position in the current commitlog segment at the time the currently-finishing flush began." The second argument is "the position we wrote in the commitlog segment header showing where the position was for last-completed flush." If the current flush position is less than the last flush position, then the last flush potentially deleted segments whose data wasn't turned into sstables yet, which would be Very Bad. If the positions are equal, that would mean we're flushing a CF that hasn't had any new data since the last one, which isn't supposed to happen either. But, there's a discontinuity in the position measuring, and that's when a new commitlog context is swapped in. Let me give a diagram here of the CL segement's contents: XXXXXXXXXXXXXXYYYYYYYYYYYYYYZZZZZZZZZZZZZ 0 H F etc The first H bytes are the header. From H to position F represents the first mutation written to the fresh segment. Normally, if you force a memtable to flush after a new commitlog segment is rolled in, one of two things will happen: (1) you will flush with a context position of H, and a header lastFlushedPosition of 0 (the initial value) or (2) you will flush with a context position of F or higher, and a header lastFlushedPosition of H (after some writes occur) But, if the flush context gets measured at H before any writes happen, THEN A WRITE OCCURS DURING THE FLUSH BEFORE THE DISCARD PHASE, you will have context position == lastFlushedPosition == H, because making a write calls turnOn which edits lastFlushedAt, even though no flushes have occurred, because lastFlushedAt is also semantically startReplayAt, and 0 is not a valid place to start mutation replay. So in the latest stacktrace, H == 169, and it's failing in exactly this scenario. The least invasive fix for this in 0.6 is to change the assert from > to >=. > Discard Commitlog Exception > --------------------------- > > Key: CASSANDRA-936 > URL: https://issues.apache.org/jira/browse/CASSANDRA-936 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Dan Di Spaltro > Assignee: Jonathan Ellis > Fix For: 0.6.1 > > Attachments: 936.txt > > > 2010-03-30_21:19:02.31041 java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: discard > called on obsolete context > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log', > position=8780) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask.get(FutureTask.java:111) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1118) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > 2010-03-30_21:19:02.31041 at java.lang.Thread.run(Thread.java:636) > 2010-03-30_21:19:02.31041 Caused by: java.lang.RuntimeException: > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.AssertionError: discard called on obsolete context > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log', > position=8780) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask.run(FutureTask.java:166) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > 2010-03-30_21:19:02.31041 ... 2 more > 2010-03-30_21:19:02.31041 Caused by: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: discard > called on obsolete context > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log', > position=8780) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:358) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.ColumnFamilyStore$1.runMayThrow(ColumnFamilyStore.java:371) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > 2010-03-30_21:19:02.31041 ... 6 more > 2010-03-30_21:19:02.31041 Caused by: java.util.concurrent.ExecutionException: > java.lang.AssertionError: discard called on obsolete context > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log', > position=8780) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask.get(FutureTask.java:111) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:350) > 2010-03-30_21:19:02.31041 ... 8 more > 2010-03-30_21:19:02.31041 Caused by: java.lang.AssertionError: discard called > on obsolete context > CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1269983937410.log', > position=8780) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:378) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:72) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:344) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > 2010-03-30_21:19:02.31041 at > java.util.concurrent.FutureTask.run(FutureTask.java:166) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLogExecutorService.process(CommitLogExecutorService.java:113) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLogExecutorService.access$200(CommitLogExecutorService.java:35) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.db.commitlog.CommitLogExecutorService$1.runMayThrow(CommitLogExecutorService.java:67) > 2010-03-30_21:19:02.31041 at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > 2010-03-30_21:19:02.31041 ... 1 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.