[jira] [Updated] (CASSANDRA-2261) During Compaction, Corrupt SSTables with rows that cause failures should be identified and blacklisted.

Jonathan Ellis (JIRA) Fri, 08 Apr 2011 15:02:47 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jonathan Ellis updated CASSANDRA-2261:
--------------------------------------

    Fix Version/s:     (was: 0.7.5)
                   0.8

Let's target 0.8.  It looks like we're eating exception information -- an 
IOException that gets rethrown/caught as SSTableIOException will not be logged.

What does adding SSTableIOException buy us?

IMO the ideal fix would be to audit our uses of IOE to make sure we're not 
laundering them (by re-throwing as RTE or IOError) where we want to catch them, 
and then just catch IOE for uses like this.
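
A minimal sketch of the "just catch IOE" approach, assuming the read path lets a
plain IOException propagate unwrapped. The names CatchIoeSketch, SSTableReader
and compact are hypothetical and are not taken from the attached patch or from
Cassandra's code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Cassandra's actual compaction code.
class CatchIoeSketch {
    /** Stand-in for reading one SSTable; throws a plain checked IOException on corruption. */
    interface SSTableReader {
        String name();
        void readRows() throws IOException;
    }

    /** Attempts each table and returns the names of those that failed with an IOException. */
    static List<String> compact(List<SSTableReader> tables) {
        List<String> failed = new ArrayList<>();
        for (SSTableReader table : tables) {
            try {
                table.readRows();
            } catch (IOException e) {
                // Caught as a plain IOException: nothing was laundered on the way up,
                // so the original exception is still here to be logged.
                System.err.println("Skipping " + table.name() + ": " + e);
                failed.add(table.name());
            }
        }
        return failed;
    }
}
```

Because the exception is caught under its original type, no dedicated wrapper
type is needed for the caller to tell which SSTable failed.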

(I'd also like to audit to *add* laundering to IOError for unrecoverable 
problems where they happen, instead of polluting the call hierarchy until we 
arbitrarily launder somewhere else.  And maybe even launder to AssertionError 
for "can't happen" IOE like where we are reading from a stream we know is 
in-memory.)
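
The two kinds of laundering described above might look roughly like this. It is
a sketch only: LaunderSketch, DiskWrite, writeOrDie and readIntFromMemory are
hypothetical names, and which call sites count as unrecoverable is exactly what
the audit would have to decide.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOError;
import java.io.IOException;

// Hypothetical helpers illustrating laundering at the point of failure.
class LaunderSketch {
    /** Stand-in for a disk operation whose failure we treat as unrecoverable. */
    interface DiskWrite {
        void run() throws IOException;
    }

    /** Launders to IOError right where the failure happens, so callers need no throws clause. */
    static void writeOrDie(DiskWrite write) {
        try {
            write.run();
        } catch (IOException e) {
            throw new IOError(e); // cause is preserved for logging
        }
    }

    /**
     * Reads an int from a buffer that is already in memory. The caller is
     * assumed to pass at least 4 bytes, so an IOException here "can't happen".
     */
    static int readIntFromMemory(byte[] buffer) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
        try {
            return in.readInt();
        } catch (IOException e) {
            throw new AssertionError(e); // a Throwable argument becomes the cause
        }
    }
}
```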

But, I am not a fan of checked exceptions so maybe that is not the Real Java 
Solution.

> During Compaction, Corrupt SSTables with rows that cause failures should be 
> identified and blacklisted.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2261
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2261
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Benjamin Coverston
>            Assignee: Benjamin Coverston
>            Priority: Minor
>              Labels: not_a_pony
>             Fix For: 0.8
>
>         Attachments: 2261.patch
>
>
> When a compaction of a set of SSTables fails because of corruption, Cassandra 
> will keep trying to compact that SSTable, causing pending compactions to 
> build up.
> One way to mitigate this problem would be to log the error, then identify the 
> specific SSTable that caused the failure, subsequently blacklisting that 
> SSTable and ensuring that it is no longer included in future compactions. For 
> this we could simply store the problematic SSTable's name in memory.
> If it's not possible to identify the SSTable that caused the issue, then 
> perhaps blacklisting the (ordered) permutation of SSTables to be compacted 
> together is something that can be done to solve this problem in a more 
> general case, and avoid issues where two (or more) SSTables have trouble 
> compacting a particular row. For this option we would probably want to store 
> the lists of the bad combinations in the system table somewhere s.t. these 
> can survive a node failure (there have been a few cases where I have seen a 
> compaction cause a node failure).
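
The simpler option in the description, keeping the names of bad SSTables in
memory, could be sketched as below. SSTableBlacklistSketch and its methods are
hypothetical, SSTables are represented by plain name strings, and the set is
lost on restart; persisting it to the system table is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory blacklist keyed by SSTable name; not from the attached patch.
class SSTableBlacklistSketch {
    // Concurrent set, on the assumption that several compaction threads may report failures.
    private final Set<String> blacklisted = ConcurrentHashMap.newKeySet();

    /** Records an SSTable whose compaction failed because of corruption. */
    void markCorrupt(String sstableName) {
        blacklisted.add(sstableName);
    }

    /** Returns the candidates that are still allowed into a compaction, in their original order. */
    List<String> eligible(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String name : candidates) {
            if (!blacklisted.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```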

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
