[jira] [Updated] (CASSANDRA-6181) Replaying a commit led to java.lang.StackOverflowError and node crash

Sylvain Lebresne (JIRA) Tue, 22 Oct 2013 02:46:21 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-6181:
----------------------------------------

    Attachment: 6181.txt

I unfortunately haven't been to reproduce with the commit log from Jeffrey.

That being said, looking at the stacktrace more closely, I don't think that 
this is an infinite loop. Rather, in some insertion cases, we have to iterate 
over all (or a large part) of the range tombstones and that is currently done 
recursively so this can blow up the stack. The blow-up does reproduce rather 
easily in a unit test (with 3K range tombstone, which is not small, but not all 
that much). I though we would be unlikely to run into that case with the way 
range tombstones are used in practice, but I suppose that's still possible if 
you have multiple clustering columns so maybe that's just that.

Anyway, I don't really another fix than to rewrite the logic non-recursively.  
Attaching a patch for this. This is probably a little bit more involved that 
what I'd like to push in 1.2 at this point, but at same I don't think there is 
any simpler way to fix this. On the bright side, RangeTombstoneList is 
relatively well covered by unit tests.

[~exabytes18], [~jdamick]: If you guys could check that the attached patch does 
fix this for you, that would be awesome.


> Replaying a commit led to java.lang.StackOverflowError and node crash
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-6181
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6181
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 1.2.8 & 1.2.10 - ubuntu 12.04
>            Reporter: Jeffrey Damick
>            Assignee: Sylvain Lebresne
>            Priority: Critical
>             Fix For: 1.2.12
>
>         Attachments: 6181.txt
>
>
> 2 of our nodes died after attempting to replay a commit.  I can attach the 
> commit log file if that helps.
> It was occurring on 1.2.8, after several failed attempts to start, we 
> attempted startup with 1.2.10.  This also yielded the same issue (below).  
> The only resolution was to physically move the commit log file out of the way 
> and then the nodes were able to start...  
> The replication factor was 3 so I'm hoping there was no data loss...
> {code}
>  INFO [main] 2013-10-11 14:50:35,891 CommitLogReplayer.java (line 119) 
> Replaying /ebs/cassandra/commitlog/CommitLog-2-1377542389560.log
> ERROR [MutationStage:18] 2013-10-11 14:50:37,387 CassandraDaemon.java (line 
> 191) Exception in thread Thread[MutationStage:18,5,main]
> java.lang.StackOverflowError
>         at 
> org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:68)
>         at 
> org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:57)
>         at 
> org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>         at 
> org.apache.cassandra.db.marshal.AbstractType.compareCollectionMembers(AbstractType.java:229)
>         at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:81)
>         at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:439)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472)
> .... etc.... over and over until ....
>         at 
> org.apache.cassandra.db.RangeTombstoneList.weakInsertFrom(RangeTombstoneList.java:472)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertAfter(RangeTombstoneList.java:456)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:405)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:144)
>         at 
> org.apache.cassandra.db.RangeTombstoneList.addAll(RangeTombstoneList.java:186)
>         at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:180)
>         at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:197)
>         at 
> org.apache.cassandra.db.AbstractColumnContainer.addAllWithSizeDelta(AbstractColumnContainer.java:99)
>         at org.apache.cassandra.db.Memtable.resolve(Memtable.java:207)
>         at org.apache.cassandra.db.Memtable.put(Memtable.java:170)
>         at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:745)
>         at org.apache.cassandra.db.Table.apply(Table.java:388)
>         at org.apache.cassandra.db.Table.apply(Table.java:353)
>         at 
> org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:258)
>         at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Updated] (CASSANDRA-6181) Replaying a commit led to java.lang.StackOverflowError and node crash

Reply via email to