[ https://issues.apache.org/jira/browse/CASSANDRA-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Brown updated CASSANDRA-6503:
-----------------------------------
    Attachment: 6503_2.0-v2.diff

The attached v2 patch has the following changes:

- Changed StreamReceiveTask to keep a collection of SSTW (SSTableWriter) rather than SSTR (SSTableReader). This allows us to convert the SSTWs to SSTRs all together, after we've received all the streamed files. Also fixed up the code paths leading here so they pass SSTWs.
- Also in StreamReceiveTask, added an abort() method, which discards the SSTWs it has buffered up. Changed StreamSession so that when a session ends in failure, it calls the new StreamReceiveTask.abort() method.
- Split FileMessage into IncomingFileMessage and OutgoingFileMessage. I needed to do this because each one holds a different subclass of SSTable, and also because Java generics don't allow me to return different subclasses from StreamMessage.Serializer<V extends StreamMessage>. This necessitated the changes in StreamMessage, as I couldn't have one serializer for both IncomingFileMessage and OutgoingFileMessage. As it didn't seem best to create a new StreamMessage.Type (something like FILE_IN and FILE_OUT) just to represent the FILE message type's behavior on inbound vs. outbound, I instead split the StreamMessage.Type serializer field into two variables: inSerializer and outSerializer. For all the other Types, the in and out serializers are the same class; for Type.FILE, they reference IncomingFileMessage.serializer and OutgoingFileMessage.serializer, respectively. This seemed the cleanest way to introduce the now-bifurcated life of Type.FILE/FileMessage.
- Added StreamLockfile to satisfy [~yukim]'s request for a mechanism to remove, on restart, the subset of SSTRs that were successfully converted when others from its stream session failed. This assumes the process crashed in the middle of converting the SSTWs to SSTRs. In the first patch, I chose to write the lockfile out to the commitlog directory.
I did this because it seems like overkill to add another yaml setting (and a Config/DD change) just for this value. Thus, I wanted to piggyback off something that we already have, and DD.getCommitLogDirectory seemed the least worst. I'm open to suggestions on this.

Once these changes are incorporated into 2.0 and trunk, I would still like to do something for 1.2, but I do not think we need to be as extensive as what we're doing for 2.0+. Perhaps leave out the lockfile and the abort(), and just keep the deferral of the SSTW-to-SSTR conversion until the end of the session (basically what the current 1.2 patch does, but I'll check it out again after the 2.0 stuff is good).

> sstables from stalled repair sessions become live after a reboot and can
> resurrect deleted data
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6503
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 1.2.14, 2.0.5
>
>         Attachments: 6503_2.0-v2.diff, 6503_c1.2-v1.patch
>
>
> The sstables streamed in during a repair session don't become active until
> the session finishes. If something causes the repair session to hang for
> some reason, those sstables will hang around until the next reboot, and
> become active then. If you don't reboot for 3 months, this can cause data to
> resurrect, as GC grace has expired, so tombstones for the data in those
> sstables may have already been collected.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
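
For illustration only, the deferred conversion and abort() behavior described in the v2 patch notes could be sketched roughly as below. This is not the code in 6503_2.0-v2.diff: PendingFile and DeferredReceiveTask are hypothetical names standing in for the roles of SSTW and StreamReceiveTask, the "live handle" is just a String, and the lockfile is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a streamed file that is written but not yet live (the SSTW role). */
interface PendingFile {
    /** Makes the file live and returns a handle to it (the SSTR role). */
    String finish();

    /** Deletes the not-yet-live file. */
    void discard();
}

/** Sketch: buffer pending files, then convert them all at once or discard them all. */
final class DeferredReceiveTask {
    private final int expectedFiles;
    private final List<PendingFile> pending = new ArrayList<>();
    private boolean done;

    DeferredReceiveTask(int expectedFiles) {
        this.expectedFiles = expectedFiles;
    }

    /**
     * Buffers one received file. Nothing becomes live until the last expected
     * file has arrived; only then are all buffered files finished together.
     * Returns the live handles on completion, otherwise an empty list.
     */
    synchronized List<String> received(PendingFile file) {
        if (done) {
            throw new IllegalStateException("task already completed or aborted");
        }
        pending.add(file);
        if (pending.size() < expectedFiles) {
            return Collections.emptyList();
        }
        List<String> live = new ArrayList<>();
        for (PendingFile f : pending) {
            live.add(f.finish());
        }
        pending.clear();
        done = true;
        return live;
    }

    /** Called when the session fails: discard everything buffered so far. */
    synchronized void abort() {
        if (done) {
            return;
        }
        for (PendingFile f : pending) {
            f.discard();
        }
        pending.clear();
        done = true;
    }
}
```

The point of the sketch is that a session that stalls or fails never calls finish() on any buffered file, so nothing from it can become live later. It does not cover a crash partway through the finish() loop, which is the case the lockfile is meant to handle.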