[ https://issues.apache.org/jira/browse/CASSANDRA-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-3579: ---------------------------------------- Attachment: 3579.patch I believe I know what is going on. This is bug with CF having gc_grace == 0 (which hints have). The problem lies in the following line of removeDeleted: {noformat} if ((c.isMarkedForDelete() && c.getLocalDeletionTime() <= gcBefore) || c.timestamp() <= cf.getMarkedForDeleteAt()) { iter.remove(); } {noformat} and more precisely the first condition. When that is executed, we have gcbefore <= now but we *can* have gcbefore == now if gc_grace == 0 (and since the resolution is the second, it's not even a very unlikely race). Now for expiring columns it is further possible that localExpirationTime == gcbefore == now. When that happens, c.isMarkedForDelete() will return false (thus the column will be kept) because this method is defined as: {noformat} public boolean isMarkedForDelete() { return (int) (System.currentTimeMillis() / 1000 ) > localExpirationTime; } {noformat} However, during the second pass, now has changed and it is possible that we have now > gcbefore. But since gcbefore == localExpirationTime, this means that isMarkedForDelete() will be true *and* getLocalDeletionTime() <= gcbefore, so the column will be considered tombstone and gcable. In other word, the current code does not respect the condition that at all time a column is considered gcable only if it is considered deleted. A rather simple fix consist in changing the condition in removeDeleted to be {noformat} if ((c.isMarkedForDelete() && c.getLocalDeletionTime() < gcBefore) || c.timestamp() <= cf.getMarkedForDeleteAt()) { iter.remove(); } {noformat} Note the strict lesser than operator for the first condition. It fixes it because since we know that we always have gcbefore <= now, localExpirationTime < gcBefore always imply that isMarkedForDelete() is true. Attached patch to make that fix (with hopefully useful comments). > AssertionError in hintedhandoff - 1.0.5 > --------------------------------------- > > Key: CASSANDRA-3579 > URL: https://issues.apache.org/jira/browse/CASSANDRA-3579 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.0.5 > Environment: RHEL 6.1 64 bit, 32 GB RAM, 8 GB allocated to JVM, > running XFS filesystem for commit/data directories > Reporter: Ramesh Natarajan > Assignee: Sylvain Lebresne > Fix For: 1.0.7 > > Attachments: 3579.patch > > > We are running a 8 node cassandra cluster running cassandra 1.0.5. > All our CF use leveled compaction. We ran a test where we did a lot > of inserts for 3 days. After that we started to run tests where some > of the reads could ask for information that was inserted a while back. > In this scenario we are seeing this assertion error in HintedHandoff. > ERROR [HintedHandoff:3] 2011-12-05 15:42:04,324 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[HintedHandoff:3,1,main] > java.lang.RuntimeException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330) > at > org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81) > at > org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:326) > ... 6 more > Caused by: java.lang.AssertionError: originally calculated column size > of 470937164 but now it is 470294247 > at > org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at > org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at > org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > ERROR [HintedHandoff:3] 2011-12-05 15:42:04,333 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[HintedHandoff:3,1,main] > java.lang.RuntimeException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > Caused by: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: java.lang.AssertionError: > originally calculated column size of 470937164 but now it is 470294247 > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330) > at > org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:81) > at > org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:353) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > ... 3 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:326) > ... 6 more > Caused by: java.lang.AssertionError: originally calculated column size > of 470937164 but now it is 470294247 > at > org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at > org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at > org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > ... 3 more > ERROR [CompactionExecutor:9931] 2011-12-05 15:42:04,333 > AbstractCassandraDaemon.java (line 133) Fatal exception in thread > Thread[CompactionExecutor:9931,1,main] > java.lang.AssertionError: originally calculated column size of > 470937164 but now it is 470294247 > at > org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:124) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160) > at > org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:158) > at > org.apache.cassandra.db.compaction.CompactionManager$6.call(CompactionManager.java:275) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira