[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249033#comment-14249033 ]
Jonathan Ellis commented on CASSANDRA-7275: ------------------------------------------- bq. Killing C* process is harmful as if we have code problem in writeSortedContents or replaceFlushed code it would potentially result in shutdown of the whole cluster or at least of all of the neighbors sharing replica range. I'm much more comfortable with "things die if something goes catastrophically wrong" than "things start returning nonsense on reads" which is what happens if we mark something flushed that actually wasn't. That said, I'd be okay using disk failure policy as a guide. If people opt into best effort behavior and are okay with those implications, so be it. > Errors in FlushRunnable may leave threads hung > ---------------------------------------------- > > Key: CASSANDRA-7275 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7275 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Tyler Hobbs > Assignee: Pavel Yaskevich > Priority: Minor > Fix For: 2.0.12 > > Attachments: 0001-Move-latch.countDown-into-finally-block.patch, > 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch > > > In Memtable.FlushRunnable, the CountDownLatch will never be counted down if > there are errors, which results in hanging any threads that are waiting for > the flush to complete. For example, an error like this causes the problem: > {noformat} > ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line > 198) Exception in thread Thread[FlushWriter:474,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.position(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138) > at > org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103) > at > org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194) > at > org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397) > at > org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350) > at > org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)