[ 
https://issues.apache.org/jira/browse/CASSANDRA-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890762#action_12890762
 ] 

Nirmal Ranganathan commented on CASSANDRA-1189:
-----------------------------------------------

Here are some proposed changes; please comment with feedback. Streaming occurs 
in two situations: 

Source transfers to Destination (Anti-entropy repair, node decommission, 
possibly bulk import)
- In each of these cases, Source has a list of sstable files it needs to 
transfer to the destination.
- Source maintains a list of all the files and creates a session id for 
transferring this set of files.
- Source streams the first file; the header contains a new StreamHeader that 
has the PendingFile info embedded. 
- Destination receives the stream, which carries all the info for the file, 
and once done responds with a StreamStatus message.
- If StreamStatus is success, Source continues with the next file; if not, it 
retransfers the same file. This repeats until all files are complete.
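
A rough sketch of the source-side loop described above, in case it helps the 
discussion. The class name, the use of plain strings for files, and the 
BiPredicate standing in for "stream the file and wait for StreamStatus" are 
all placeholders of mine, not existing Cassandra code. The retry cap is also 
an assumption; the proposal as written retries until success.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch: Source pushes a session's files one at a time and
// only advances when the Destination reports success.
class SourcePushSketch {
    // streamAndAwaitStatus.test(sessionId, file) stands in for streaming one
    // file and reading back the StreamStatus; true means success.
    static List<String> transferAll(long sessionId,
                                    List<String> files,
                                    BiPredicate<Long, String> streamAndAwaitStatus,
                                    int maxAttempts) {
        Deque<String> pending = new ArrayDeque<>(files); // state kept by Source
        List<String> completed = new ArrayList<>();
        while (!pending.isEmpty()) {
            String file = pending.peek();
            boolean ok = false;
            // Retransfer the same file until it succeeds or the cap is hit.
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                ok = streamAndAwaitStatus.test(sessionId, file);
            }
            if (!ok) {
                throw new IllegalStateException("giving up on " + file);
            }
            completed.add(pending.remove());
        }
        return completed;
    }
}
```

Because each call carries the session id, several such loops could run between 
the same pair of nodes without relying on send order.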

(Approach 1) Destination requests from Source (Anti-entropy repair, bootstrap, 
possibly bulk export)
- Destination compiles a list of ranges and sends a StreamRequest message to 
Source, attaching a session id to keep track of the request.
- Source compiles a list of PendingFiles based on the ranges and sends a 
StreamRequestResponse message with the list of files.
- Destination now has the list of files and maintains the transfer state.
- Destination sends a StreamRequest for a file from the list, with the session 
id and file descriptor info attached. 
- Source streams the file to Destination. 
- Based on the transfer status, Destination requests the next file or 
re-requests the same file, until all files are transferred. 
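
To make Approach 1 concrete, here is a minimal sketch of the destination-driven 
loop. The Source interface, its method names, and the string representation of 
ranges and files are hypothetical stand-ins for the StreamRequest / 
StreamRequestResponse exchange; the retry cap is my assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Approach 1: Destination holds the file list and
// drives the transfer one file at a time.
class DestinationPullSketch {
    interface Source {
        // Stands in for StreamRequest(ranges) -> StreamRequestResponse(files).
        List<String> listFiles(long sessionId, List<String> ranges);
        // Stands in for a per-file StreamRequest; true if the transfer succeeded.
        boolean streamFile(long sessionId, String file);
    }

    static List<String> pull(Source source, long sessionId,
                             List<String> ranges, int maxAttempts) {
        // Destination now has the list of files and keeps the state itself.
        List<String> pending = new ArrayList<>(source.listFiles(sessionId, ranges));
        List<String> received = new ArrayList<>();
        for (String file : pending) {
            int attempts = 0;
            // Re-request the same file until the transfer succeeds.
            while (!source.streamFile(sessionId, file)) {
                if (++attempts >= maxAttempts) {
                    throw new IllegalStateException("giving up on " + file);
                }
            }
            received.add(file);
        }
        return received;
    }
}
```

Approach 2 would move the pending list and the retry decision to the Source 
side, leaving Destination to answer each file with a StreamStatus.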

(Approach 2) Destination requests from Source (Anti-entropy repair, bootstrap, 
possibly bulk export)
- Destination compiles a list of ranges and sends a StreamRequest message to 
Source, attaching a session id to keep track of the request.
- Source compiles a list of PendingFiles from the requested ranges. Source 
maintains the transfer state. 
- Source streams file 1 with the attached StreamHeader.
- Destination receives the file and responds with a StreamStatus. 
- Based on the status, Source transfers the next file or re-transfers the same 
file. 

Changes to Protocol for File Streaming:
- Current -> | Protocol magic | Header | Body (File contents) |
- Proposed -> | Protocol magic | Header | StreamHeader size | StreamHeader | 
Body (File contents) |
- The protocol for all other Messages remains the same: the format is 
unchanged, only the content will vary.
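
For illustration, the proposed layout could be written and parsed roughly as 
below. This is only a sketch: the magic value, the 4-byte length prefixes, and 
the length-prefixed Header are assumptions of mine, and the headers are treated 
as opaque byte arrays rather than real serialized objects.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the proposed frame:
// | Protocol magic | Header | StreamHeader size | StreamHeader | Body |
class StreamFrameSketch {
    static final int MAGIC = 0x12345678; // placeholder, not the real magic

    static byte[] writeFrame(byte[] header, byte[] streamHeader, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(
            4 + 4 + header.length + 4 + streamHeader.length + body.length);
        buf.putInt(MAGIC);
        buf.putInt(header.length);       // assumption: Header is length-prefixed
        buf.put(header);
        buf.putInt(streamHeader.length); // the new "StreamHeader size" field
        buf.put(streamHeader);
        buf.put(body);                   // file contents
        return buf.array();
    }

    // Returns { header, streamHeader, body }.
    static byte[][] readFrame(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad protocol magic");
        }
        byte[] header = new byte[buf.getInt()];
        buf.get(header);
        byte[] streamHeader = new byte[buf.getInt()];
        buf.get(streamHeader);
        // Here the body is simply the rest of the buffer; on a real socket
        // its length would have to come from the PendingFile info.
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return new byte[][] { header, streamHeader, body };
    }
}
```

Since the StreamHeader travels with every file, the receiver would not need to 
rely on send order to know which file it is getting.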

Effects of the mentioned changes:
- There can be multiple transfers per source and destination.
- No file ordering is required, which prevents overlapping streams from 
breaking anything.
- Other services can transfer files without a problem. 
- Initiate and Initiate Done will be removed, making the process a little 
cleaner. 
- Facilitates adding a layer on top to do bulk imports/exports.

Questions:
- The current streaming does not seem to maintain persistent state if a node 
fails during streaming. Is that something that needs to be considered?
- Do we want to add checksums?

> Refactor streaming
> ------------------
>
>                 Key: CASSANDRA-1189
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1189
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7
>            Reporter: Gary Dusbabek
>            Assignee: Nirmal Ranganathan
>            Priority: Critical
>             Fix For: 0.7
>
>
> The current architecture is buggy because it makes the assumption that only 
> one stream can be in process between two nodes at a given time, and stream 
> send order never changes.  Because of this, the ACK process gets fouled up 
> when other services wish to stream files.
> The process is somewhat contorted too (request, initiate, initiate done, 
> send).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
