[ https://issues.apache.org/jira/browse/CASSANDRA-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253271#comment-14253271 ]
Benedict commented on CASSANDRA-7275:
-------------------------------------

I should clarify, since it sounds like we are not too far apart on this point: I'm suggesting only that the failure is reported to the flush call site. The aim is not for the call site to specialise on the kind of exception, but for a specific call site that can safely cope with _any_ failure to be specialised to do so, with remedial action if necessary. A whitelist would be a subset of this approach, and hence simpler, but only if it's genuinely safe to just drop the problem on the floor. I'm not sufficiently familiar with these system tables to say for sure, but I do recollect problems safely starting a node when compactions_in_progress was not properly maintained, so I expect _some_ remedial action will probably be necessary, perhaps on a case-by-case basis.

> Errors in FlushRunnable may leave threads hung
> ----------------------------------------------
>
>                 Key: CASSANDRA-7275
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7275
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 2.0.12
>
>         Attachments: 0001-Move-latch.countDown-into-finally-block.patch,
> 7252-2.0-v2.txt, CASSANDRA-7275-flush-info.patch
>
>
> In Memtable.FlushRunnable, the CountDownLatch will never be counted down if
> there are errors, which results in hanging any threads that are waiting for
> the flush to complete.
> For example, an error like this causes the problem:
> {noformat}
> ERROR [FlushWriter:474] 2014-05-20 12:10:31,137 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:474,5,main]
> java.lang.IllegalArgumentException
>     at java.nio.Buffer.position(Unknown Source)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:64)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:138)
>     at org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
>     at org.apache.cassandra.db.ColumnFamily.getColumnStats(ColumnFamily.java:439)
>     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:194)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:397)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
>     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> {noformat}
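
For illustration only, here is a minimal Java sketch of the two ideas discussed in this thread: counting the latch down in a finally block so that waiters are always released, and reporting the failure to the flush call site so that the call site can decide on remedial action. This is not Cassandra's Memtable.FlushRunnable and not the contents of any attached patch; the class FlushSketch, the method awaitFlush and the millisecond timeout are hypothetical names and choices made for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a flush task; not Cassandra's actual code. */
final class FlushSketch implements Runnable {
    private final Runnable work; // the write that may throw
    private final CountDownLatch latch = new CountDownLatch(1);
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    FlushSketch(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            work.run();
        } catch (Throwable t) {
            // Record the failure so the waiting call site can see it.
            failure.set(t);
        } finally {
            // Always release waiters, whether or not the work succeeded.
            latch.countDown();
        }
    }

    /**
     * Blocks until the flush has finished, then returns the recorded
     * failure, or null if the flush succeeded. The call site decides
     * what, if anything, to do about a non-null result.
     */
    Throwable awaitFlush(long timeoutMillis) {
        try {
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("flush did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            throw new IllegalStateException("interrupted while waiting for flush", e);
        }
        return failure.get();
    }
}
```

A call site would run the task on another thread, call awaitFlush, and treat a non-null result as the signal to either take remedial action or, where that is genuinely safe, ignore the failure. The failure is stored before the latch is counted down, so a waiter that returns from await is guaranteed to observe it.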