[jira] [Updated] (CASSANDRA-6503) sstables from stalled repair sessions become live after a reboot and can resurrect deleted data

Jason Brown (JIRA) Thu, 23 Jan 2014 05:43:53 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Brown updated CASSANDRA-6503:
-----------------------------------

    Attachment: 6503_2.0-v2.diff

Attached v2 patch has the following changes:

- Changed StreamReceiveTask to keep a collection of SSTableWriters (SSTW) rather 
than SSTableReaders (SSTR). This allows us to convert the SSTWs to SSTRs all 
together after we've received all the streamed files. Also fixed up the code 
paths leading here so they pass SSTWs.

- Also in StreamReceiveTask, added an abort() method, which discards the 
SSTWs it has buffered up. Changed StreamSession so that when a session ends in 
failure, it calls the new StreamReceiveTask.abort() method.

- Split FileMessage into IncomingFileMessage and OutgoingFileMessage. I 
needed to do this because each one has a different subclass of SSTable, and 
also because Java generics don't allow me to return different subclasses from 
StreamMessage.Serializer<V extends StreamMessage>. This necessitated the 
changes in StreamMessage, as I couldn't have one serializer for both 
IncomingFileMessage and OutgoingFileMessage. Creating a new StreamMessage.Type 
(something like FILE_IN and FILE_OUT) just to represent the FILE message 
type's behavior on inbound vs. outbound didn't seem best, so I instead split 
StreamMessage.Type.serializer into two variables: inSerializer and 
outSerializer. For all the other Types, the in and out serializers are the 
same class; for Type.FILE, they reference IncomingFileMessage.serializer and 
OutgoingFileMessage.serializer, respectively. This seemed the cleanest way to 
introduce the now-bifurcated life of Type.FILE/FileMessage.

- Added StreamLockfile to satisfy [~yukim]'s request for a mechanism to remove, 
on restart, the subset of SSTRs that were successfully converted when others 
from its stream session failed. This assumes the process crashed in the middle 
of converting the SSTWs to SSTRs.
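
As a rough illustration of the first two items above (buffering writers and 
discarding them on failure), here is a minimal sketch. It is not the code from 
the attached patch: ReceiveTaskSketch, PendingWriter, LiveTable and their 
methods are hypothetical stand-ins for StreamReceiveTask, SSTW and SSTR, and 
the real classes have different signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for StreamReceiveTask; names are illustrative only.
final class ReceiveTaskSketch {
    /** Stand-in for an SSTW: a streamed file that is not yet live. */
    static final class PendingWriter {
        final String name;
        boolean discarded = false;
        PendingWriter(String name) { this.name = name; }
        /** Finish the file and hand back a readable (live) table. */
        LiveTable closeAndOpenReader() { return new LiveTable(name); }
        /** Throw away the written file so it can never become live. */
        void abort() { discarded = true; }
    }

    /** Stand-in for an SSTR: a table that serves reads. */
    static final class LiveTable {
        final String name;
        LiveTable(String name) { this.name = name; }
    }

    private final int expectedFiles;
    private final List<PendingWriter> buffered = new ArrayList<>();
    private boolean done = false;

    ReceiveTaskSketch(int expectedFiles) { this.expectedFiles = expectedFiles; }

    /**
     * Buffer a received writer. Nothing is converted until every expected
     * file has arrived; until then an empty list is returned.
     */
    synchronized List<LiveTable> received(PendingWriter writer) {
        if (done) throw new IllegalStateException("task already finished or aborted");
        buffered.add(writer);
        if (buffered.size() < expectedFiles) return new ArrayList<>();
        List<LiveTable> live = new ArrayList<>();
        for (PendingWriter w : buffered) live.add(w.closeAndOpenReader());
        buffered.clear();
        done = true;
        return live;
    }

    /** Called when the session fails: discard everything buffered so far. */
    synchronized void abort() {
        for (PendingWriter w : buffered) w.abort();
        buffered.clear();
        done = true;
    }
}
```

The point of the design is that a stalled or failed session never leaves a 
subset of its files readable: either all writers are converted together, or 
all of them are discarded.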

In the first patch, I chose to write the lockfile out to the commitlog 
directory. I did this as it seems like overkill to add another yaml setting 
(and Config/DD change) just for this value. Thus, I wanted to piggyback off 
something else that we already have, and DD.getCommitLogDirectory seemed the 
least worst. I'm open to suggestions on this.
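
To make the lockfile idea concrete, here is a minimal sketch of one way such 
a mechanism could work. It is an assumption-laden illustration, not the 
StreamLockfile from the patch: the class name StreamLockfileSketch, the 
one-path-per-line file format, and the method names are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the file format and method names are assumptions.
final class StreamLockfileSketch {
    private final Path lockfile;

    StreamLockfileSketch(Path lockfile) { this.lockfile = lockfile; }

    /** Record one successfully converted file before converting the next. */
    void append(Path converted) throws IOException {
        String line = converted.toAbsolutePath() + System.lineSeparator();
        Files.write(lockfile, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Every file was converted: the record is no longer needed. */
    void complete() throws IOException {
        Files.deleteIfExists(lockfile);
    }

    /**
     * On restart, a leftover lockfile means the process died mid-conversion.
     * Delete every file it lists, then the lockfile itself, and return the
     * paths that were actually removed.
     */
    List<Path> cleanupOnStartup() throws IOException {
        List<Path> removed = new ArrayList<>();
        if (!Files.exists(lockfile)) return removed;
        for (String line : Files.readAllLines(lockfile, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) continue;
            Path p = Paths.get(line.trim());
            if (Files.deleteIfExists(p)) removed.add(p);
        }
        Files.delete(lockfile);
        return removed;
    }
}
```

In this sketch the lockfile's location is simply whatever path the caller 
passes in, so the choice of directory discussed above stays a separate 
decision.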

Once these changes are incorporated into 2.0 and trunk, I would still like to 
do something for 1.2, but I do not think we need to be as extensive as what 
we're doing for 2.0+. Perhaps leave out the lockfile and the abort(), and just 
keep the deferred conversion of SSTWs to SSTRs until the end of the session 
(basically what the current 1.2 patch does, but I'll check it out again after 
the 2.0 stuff is good).


> sstables from stalled repair sessions become live after a reboot and can 
> resurrect deleted data
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6503
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 1.2.14, 2.0.5
>
>         Attachments: 6503_2.0-v2.diff, 6503_c1.2-v1.patch
>
>
> The sstables streamed in during a repair session don't become active until 
> the session finishes.  If something causes the repair session to hang for 
> some reason, those sstables will hang around until the next reboot, and 
> become active then.  If you don't reboot for 3 months, this can cause data to 
> resurrect, as GC grace has expired, so tombstones for the data in those 
> sstables may have already been collected.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
