[jira] [Commented] (CASSANDRA-6008) Getting 'This should never happen' error at startup due to sstables missing

Tyler Hobbs (JIRA) Wed, 11 Dec 2013 15:55:34 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845859#comment-13845859 ]


Tyler Hobbs commented on CASSANDRA-6008:
----------------------------------------

bq. You're right that hasIrrelevantData is just checking for tombstones to 
purge now... it used to check for cells shadowed by tombstones as well. Is it 
another regression that it does not?

{{ColumnFamily.hasIrrelevantData()}} still checks for cells shadowed by 
tombstones.  It just wasn't a good name for the DeletionInfo method, which 
only checks for purgeable tombstones.

bq. Looks to me like dropping the shouldPurge check from LCR.write is a 
regression – shouldPurge is what says "we're sure there's no data in other 
sstables that should be shadowed by this tombstone." Surprised we don't have a 
test that catches that.

The {{shouldPurge}} check in the LCR constructor handles that.  If shouldPurge 
is false, it will leave the tombstone in {{emptyColumnFamily}}.  Then, in 
{{write()}}, {{isMarkedForDelete()}} will be true, meaning the tombstone will 
be written out.
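
For illustration only, here is a toy Java model of that flow.  The class, its 
fields, and the {{expired}} flag are hypothetical stand-ins and not the actual 
LCR code; the sketch only assumes that the constructor drops the tombstone when 
it is both expired and safe to purge, and shows that a tombstone kept by the 
constructor is still seen by the write step.

```java
// Toy model of the constructor/write() interaction described above.
// All names are hypothetical; this is not the real LCR implementation.
final class PurgeFlowSketch {
    static final long NO_TOMBSTONE = Long.MIN_VALUE;

    // Row-level tombstone timestamp that survived the constructor, if any.
    private final long tombstoneTimestamp;

    // Assumption: the tombstone is dropped only when it is expired AND
    // shouldPurge says no other sstable holds data it must still shadow.
    PurgeFlowSketch(long tombstoneTimestamp, boolean expired, boolean shouldPurge) {
        this.tombstoneTimestamp =
            (shouldPurge && expired) ? NO_TOMBSTONE : tombstoneTimestamp;
    }

    boolean isMarkedForDelete() {
        return tombstoneTimestamp != NO_TOMBSTONE;
    }

    // Stand-in for write(): the tombstone is emitted whenever it survived.
    boolean writesTombstone() {
        return isMarkedForDelete();
    }

    public static void main(String[] args) {
        // shouldPurge == false: the tombstone is kept and written out.
        System.out.println(new PurgeFlowSketch(100L, true, false).writesTombstone());
        // shouldPurge == true and expired: nothing left to write.
        System.out.println(new PurgeFlowSketch(100L, true, true).writesTombstone());
    }
}
```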

I can add a test to exercise this, if you'd like.

bq. I'm not actually sure where the bug is in the original code. I see that the 
Reducer fix will result in correctly purging range tombstones now, but I don't 
think that's the dropping-row-tombstones bug you referred to. The code in the 
constructor is cleaner now but I don't see why the original didn't work as 
intended.

There were a few different bugs.

The first and main bug was related to the top-level tombstone being ignored.  I 
believe the original intention of the LCR code was for emptyColumnFamily to 
hold the DeletionInfo (or row tombstone, in earlier forms), resulting in cells 
being deleted during {{removeDeletedAndOldShards()}}.  However, 
emptyColumnFamily is only cloned once (when creating the reducer), and then 
that clone is cleared and reused during each call to {{getReduced()}}, so the 
DeletionInfo was lost after the first round.
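
A minimal sketch of that failure mode, using hypothetical classes rather than 
the real Cassandra ones.  It assumes that clearing the shared container also 
resets its deletion marker, and that a cell counts as deleted when its 
timestamp is less than or equal to the tombstone's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a column family container; not Cassandra's class.
final class ContainerSketch {
    static final long NO_TOMBSTONE = Long.MIN_VALUE;

    long deletedAt = NO_TOMBSTONE;                 // top-level tombstone timestamp
    final List<Long> cellTimestamps = new ArrayList<>();

    // Assumption: clear() resets the deletion marker along with the cells.
    void clear() {
        cellTimestamps.clear();
        deletedAt = NO_TOMBSTONE;
    }

    // Drops cells shadowed by the tombstone and returns how many survive.
    int removeDeleted() {
        cellTimestamps.removeIf(ts -> ts <= deletedAt);
        return cellTimestamps.size();
    }
}

final class ReusedContainerSketch {
    // Runs one merge round per entry in `rounds`, reusing a single container.
    // With restoreTombstone == false the tombstone is set only once, so it is
    // lost as soon as the container is cleared after the first round.
    static int[] survivors(long tombstone, long[][] rounds, boolean restoreTombstone) {
        ContainerSketch shared = new ContainerSketch();
        shared.deletedAt = tombstone;
        int[] out = new int[rounds.length];
        for (int i = 0; i < rounds.length; i++) {
            if (restoreTombstone) {
                shared.deletedAt = tombstone;      // re-apply before each round
            }
            for (long ts : rounds[i]) {
                shared.cellTimestamps.add(ts);
            }
            out[i] = shared.removeDeleted();
            shared.clear();                        // wipes deletedAt as well
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] rounds = { {5L, 15L}, {7L, 20L} };
        System.out.println(Arrays.toString(survivors(10L, rounds, false)));
        System.out.println(Arrays.toString(survivors(10L, rounds, true)));
    }
}
```

With a tombstone at timestamp 10, the first call reports one surviving cell in 
round one but two in round two (the cell at timestamp 7 is wrongly kept), while 
the second call reports one survivor in each round.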

The second bug was that if the row tombstone had expired, it would be purged in 
the LCR constructor, so cells would not be considered deleted later during the 
merge/reduce process.

The last bug was just a minor potential issue I spotted in {{reduce()}} where 
we weren't necessarily picking the range tombstone with the highest timestamp, 
just the last range tombstone we saw.
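
The difference can be sketched as follows.  The method names are hypothetical 
and the tombstones are reduced to bare timestamps; this is not the code from 
the patch, only an illustration of "last seen" versus "highest timestamp".

```java
// Hypothetical sketch: range tombstones are represented only by timestamps.
final class RangeTombstonePickSketch {
    // Behaviour described as the potential issue: whichever tombstone
    // arrives last wins, regardless of its timestamp.
    static long lastSeen(long... timestamps) {
        long kept = Long.MIN_VALUE;
        for (long ts : timestamps) {
            kept = ts;
        }
        return kept;
    }

    // Intended behaviour: keep the tombstone with the highest timestamp.
    static long newest(long... timestamps) {
        long kept = Long.MIN_VALUE;
        for (long ts : timestamps) {
            kept = Math.max(kept, ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(lastSeen(30L, 10L, 20L));
        System.out.println(newest(30L, 10L, 20L));
    }
}
```

The two only disagree when the tombstones arrive out of timestamp order, which 
is why this was a potential issue rather than an observed failure.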

> Getting 'This should never happen' error at startup due to sstables missing
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6008
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: John Carrino
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.4
>
>         Attachments: 6008-2.0-v1.patch, 6008-trunk-v1.patch
>
>
> Exception encountered during startup: "Unfinished compactions reference 
> missing sstables. This should never happen since compactions are marked 
> finished before we start removing the old sstables"
> This happens when sstables that have been compacted away are removed, but 
> they still have entries in the system.compactions_in_progress table.
> Normally this should not happen because the entries in 
> system.compactions_in_progress are deleted before the old sstables are 
> deleted.
> However at startup recovery time, old sstables are deleted (NOT BEFORE they 
> are removed from the compactions_in_progress table) and then after that is 
> done it does a truncate using SystemKeyspace.discardCompactionsInProgress
> We ran into a case where the disk filled up and the node died and was bounced 
> and then failed to truncate this table on startup, and then got stuck hitting 
> this exception in ColumnFamilyStore.removeUnfinishedCompactionLeftovers.
> Maybe on startup we can delete from this table incrementally as we clean 
> stuff up in the same way that compactions delete from this table before they 
> delete old sstables.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
