[ https://issues.apache.org/jira/browse/CASSANDRA-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-6181: ---------------------------------------- Attachment: 6181.txt I unfortunately haven't been to reproduce with the commit log from Jeffrey. That being said, looking at the stacktrace more closely, I don't think that this is an infinite loop. Rather, in some insertion cases, we have to iterate over all (or a large part) of the range tombstones and that is currently done recursively so this can blow up the stack. The blow-up does reproduce rather easily in a unit test (with 3K range tombstone, which is not small, but not all that much). I though we would be unlikely to run into that case with the way range tombstones are used in practice, but I suppose that's still possible if you have multiple clustering columns so maybe that's just that. Anyway, I don't really another fix than to rewrite the logic non-recursively. Attaching a patch for this. This is probably a little bit more involved that what I'd like to push in 1.2 at this point, but at same I don't think there is any simpler way to fix this. On the bright side, RangeTombstoneList is relatively well covered by unit tests. [~exabytes18], [~jdamick]: If you guys could check that the attached patch does fix this for you, that would be awesome. > Replaying a commit led to java.lang.StackOverflowError and node crash > --------------------------------------------------------------------- > > Key: CASSANDRA-6181 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6181 > Project: Cassandra > Issue Type: Bug > Environment: 1.2.8 & 1.2.10 - ubuntu 12.04 > Reporter: Jeffrey Damick > Assignee: Sylvain Lebresne > Priority: Critical > Fix For: 1.2.12 > > Attachments: 6181.txt > > > 2 of our nodes died after attempting to replay a commit. I can attach the > commit log file if that helps. > It was occurring on 1.2.8, after several failed attempts to start, we > attempted startup with 1.2.10. This also yielded the same issue (below). > The only resolution was to physically move the commit log file out of the way > and then the nodes were able to start... > The replication factor was 3 so I'm hoping there was no data loss... > {code} > INFO [main] 2013-10-11 14:50:35,891 CommitLogReplayer.java (line 119) > Replaying /ebs/cassandra/commitlog/CommitLog-2-1377542389560.log > ERROR [MutationStage:18] 2013-10-11 14:50:37,387 CassandraDaemon.java (line > 191) Exception in thread Thread[MutationStage:18,5,main] > java.lang.StackOverflowError > at > org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:68) > at > org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:57) > at > org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29) > at > org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:229) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:81) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31) > at > org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:439) > at > org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405) > at > org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472) > at > org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456) > at > org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405) > at > org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472) > at > org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456) > at > org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405) > at > org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472) > .... etc.... over and over until .... > at > org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472) > at > org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456) > at > org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405) > at > org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:144) > at > org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:186) > at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:197) > at > org.apache.cassandra.db.AbstractColumnContainer.addAllWithSizeDelta(AbstractColumnContainer.java:99) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:207) > at org.apache.cassandra.db.Memtable.put(Memtable.java:170) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:745) > at org.apache.cassandra.db.Table.apply(Table.java:388) > at org.apache.cassandra.db.Table.apply(Table.java:353) > at > org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:258) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)