[ 
https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032195#comment-16032195
 ] 

Jason Brown commented on CASSANDRA-12229:
-----------------------------------------

Pushed more changes onto the shame branch (sha 
{{e3dd956fadb8084babdb2d96e222a3b2949589bd}})

- removed waitUntilWritable() from {{ByteBufDataOutputStreamPlus}} and replaced 
with a simple semaphore.
- cleaned up the close/cancel notification in {{StreamingInboundHander}} from 
the {{thread#interrupt()}} to using an atomic variable (polling it from the 
background thread).
- corrected and optimized the way ByteBufs and ByteBuffers are used in the 
stream classes
- restore CASSANDRA-7585 and bring back StreamingConnectionFactory, with 
modified versions of it's implementations and then inject the factory where we 
used to do  it.
- move stream versioning back into StreamMessage, instead of StreamSession 
(where I had moved it)
- minor clean ups and doc additions
- addressed TODO's and error handling

Here are a few responses to some of the concerns raised by [~aweisberg] from 
the current PR.

bq. This is really a generic inbound message handler right? Are there 
potentially races here with multiple threads interacting with 
StreamResultFuture and StreamManager and the global state? Can they register 
different state and then end up with a different StreamResultFuture or session? 

I don't believe this to be the case as the intiator sends out the 
{{StreamInitMessage}} and subsequent control messages on the same (control) 
channel, so you have TCP ordering guarantees there. Further, most of the 
important, state-changing functions related to {{StreamSession}} are already 
{{synchronized}}.

bq. Ideally I think each stream session would get a dedicated control channel 
and a dedicated actor thread to orchestrate control messages. Completely 
decouple the whole thing and turn into a queue of events being processed 
serially. The only concurrency and parallelism is then files being transferred 
in parallel which is pretty clearly shared nothing other than resource limits 
which are generally pretty easy to reason about since there is no risk of 
deadlock.

The trick with that is we would need to send session identifier info in every 
message so that the correct {{StreamSession}} can be identified. Otherwise, you 
have to stick a reference to the {{StreamSession}} somewhere: in the channel 
attributes, in the {{StreamInboundHandler}}, or in the deserialization task 
(like what I currently have). Further, I'm now trying to avoid altering the 
stream message format/protocol, so long as correctness is maintained, because 
there's a boat load of other changes happening here.

bq. ... checksums ...
I did not change the checksum algorithm for the LZ4 compression on internode 
messaging as that would be a change to the (undocumented) internode messaging 
protocol, and would break upgrades. Further, Sylvain didn't want to change the 
internode messaging protocol for this release, so I think we're stuck with the 
32-bit checksumming as that's what we have in the existing 
[{{OutboundTcpConnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L530].


> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files 
> are transferred between nodes in a cluster. However, the low-level details of 
> the current streaming implementation does not line up nicely with a 
> non-blocking model, so I think this is a good time to review some of those 
> details and add in additional goodness. The current implementation assumes a 
> sequential or "single threaded" approach to the sending of stream messages as 
> well as the transfer of files. In short, after several iterative prototypes, 
> I propose the following:
> 1) use a single bi-diredtional connection (instead of requiring to two 
> sockets & two threads)
> 2) send the "non-file" {{StreamMessage}} s (basically anything not 
> {{OutboundFileMessage}}) via the normal internode messaging. This will 
> require a slight bit more management of the session (the ability to look up a 
> {{StreamSession}} from a static function on {{StreamManager}}, but we have 
> have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just 
> be a thing already
> 5) If the entire sstable is to streamed, in addition to the DATA component, 
> transfer all the components of the sstable (primary index, bloom filter, 
> stats, and so on). This way we can avoid the CPU and GC pressure from 
> deserializing the stream into objects. File streaming then amounts to a 
> block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here, 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to