[ 
https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589293#comment-14589293
 ] 

Stefania commented on CASSANDRA-9591:
-------------------------------------

I've created some dtests (branch attached); the pull request is 
[here|https://github.com/riptano/cassandra-dtest/pull/331].

I integrated the original patch and my suggestions into 2.1 and 2.2. All three 
GitHub branches are attached.

Pending CI, the unit tests and dtests for scrub pass on my box for 2.0 and 2.1 
but not for 2.2. On 2.2 there is a problem: it seems we need the first and last 
keys created in buildSummary() because of the new lifecycle code:

{code}
nosetests -s scrub_test.py:TestScrub.test_standalone_scrub_data_file_only

dtest: DEBUG: Pre-scrub sstables snapshotted into snapshot pre-scrub-1434513958878
WARNING: Missing component: /tmp/dtest-UPhj9E/test/node1/data/ks/users-254da3c014a611e59b004b06169f4ffa/la-1-big-Index.db
Scrubbing BigTableReader(path='/tmp/dtest-UPhj9E/test/node1/data/ks/users-254da3c014a611e59b004b06169f4ffa/la-1-big-Data.db') (863 bytes)
null

dtest: DEBUG: Error scrubbing BigTableReader(path='/tmp/dtest-UPhj9E/test/node1/data/ks/users-254da3c014a611e59b004b06169f4ffa/la-1-big-Data.db'): null
java.lang.NullPointerException
        at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:321)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:184)
        at java.util.Arrays.sort(Arrays.java:1312)
        at java.util.Arrays.sort(Arrays.java:1506)
        at java.util.ArrayList.sort(ArrayList.java:1454)
        at java.util.Collections.sort(Collections.java:141)
        at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:187)
        at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:50)
        at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.<init>(SSTableIntervalTree.java:20)
        at org.apache.cassandra.db.lifecycle.SSTableIntervalTree.build(SSTableIntervalTree.java:30)
        at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:183)
        at org.apache.cassandra.db.lifecycle.View$4.apply(View.java:178)
        at com.google.common.base.Functions$FunctionComposition.apply(Functions.java:211)
        at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:126)
        at org.apache.cassandra.db.lifecycle.Tracker.apply(Tracker.java:99)
        at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:233)
        at org.apache.cassandra.db.lifecycle.LifecycleTransaction.checkpoint(LifecycleTransaction.java:214)
        at org.apache.cassandra.io.sstable.SSTableRewriter.switchWriter(SSTableRewriter.java:285)
        at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:330)
        at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:169)
        at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:179)
        at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:317)
        at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:299)
        at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:124)

--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 10.984s

FAILED (failures=1)
{code}

I think we have two options: either set first and last in SSTableReader even 
when the index is not available, or see whether we can avoid updating the live 
set when we are offline. cc [~benedict] for suggestions re lifecycle code.
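
To illustrate the first option, here is a minimal Python sketch, not Cassandra code. It assumes the NullPointerException comes from sorting sstables whose first/last bounds were never set, and that the bounds could instead be derived by scanning the keys in the data file, assuming they are stored in sorted order. The names SSTable, first_and_last and build_interval_index are hypothetical, and Python's TypeError stands in for the Java NullPointerException.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class SSTable(NamedTuple):
    # Hypothetical stand-in for an sstable reader; not Cassandra's API.
    name: str
    first: Optional[str]  # smallest partition key, None if never set
    last: Optional[str]   # largest partition key, None if never set


def first_and_last(keys_in_file_order: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Derive the bounds by scanning keys, assuming the file stores them sorted."""
    first = last = None
    for key in keys_in_file_order:
        if first is None:
            first = key
        last = key
    return first, last


def build_interval_index(sstables: List[SSTable]) -> List[SSTable]:
    """Order sstables by their (first, last) bounds, as an interval structure must.

    Raises TypeError if an unset (None) bound has to be compared with a real key.
    """
    return sorted(sstables, key=lambda s: (s.first, s.last))
```

With bounds left unset, ordering such an sstable against one with real bounds fails; once the bounds are filled in from a scan of the keys, the ordering succeeds. Whether a full scan is acceptable for offline scrub is a separate question for the lifecycle discussion.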

[~michaelsembwever], could you review my suggestions and, if you are happy with 
them, prepare a final patch for 2.0 and 2.1 that can be committed in your name? 
You can just attach the GitHub branch if that is easier. However, if you change 
anything, even slightly, let us know so we can rerun CI (continuous integration).

Then, could you work with Benedict's suggestion to fix the 2.2 problem? We can 
help you with this if required.




> Scrub (recover) sstables even when -Index.db is missing
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: mck
>            Assignee: mck
>              Labels: sstablescrub
>             Fix For: 2.0.x
>
>         Attachments: 9591-2.0.txt
>
>
> Today SSTableReader needs at minimum 3 files to load an sstable:
>  - -Data.db
>  - -CompressionInfo.db 
>  - -Index.db
> But during the scrub process the -Index.db file isn't actually necessary, 
> unless there's corruption in the -Data.db and we want to be able to skip over 
> corrupted rows. Given that there is still a fair chance that there's nothing 
> wrong with the -Data.db file and we're just missing the -Index.db file this 
> patch addresses that situation.
> So the following patch makes it possible for the StandaloneScrubber 
> (sstablescrub) to recover sstables despite missing -Index.db files.
> This can happen from a catastrophic incident where data directories have been 
> lost and/or corrupted, or wiped and the backup not healthy. I'm aware that 
> normally one depends on replicas or snapshots to avoid such situations, but 
> such catastrophic incidents do occur in the wild.
> I have not tested this patch against normal C* operations and all the other 
> (more critical) ways SSTableReader is used. I'll happily do that and add the 
> needed unit tests if people see merit in accepting the patch.
> Otherwise the patch can live with the issue, in case anyone else needs it. 
> There's also a cassandra distribution bundled with the patch 
> [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz]
>  to make life a little easier for anyone finding themselves in such a bad 
> situation.



