[ https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032195#comment-16032195 ]
Jason Brown commented on CASSANDRA-12229: ----------------------------------------- Pushed more changes onto the shame branch (sha {{e3dd956fadb8084babdb2d96e222a3b2949589bd}}) - removed waitUntilWritable() from {{ByteBufDataOutputStreamPlus}} and replaced with a simple semaphore. - cleaned up the close/cancel notification in {{StreamingInboundHander}} from the {{thread#interrupt()}} to using an atomic variable (polling it from the background thread). - corrected and optimized the way ByteBufs and ByteBuffers are used in the stream classes - restore CASSANDRA-7585 and bring back StreamingConnectionFactory, with modified versions of it's implementations and then inject the factory where we used to do it. - move stream versioning back into StreamMessage, instead of StreamSession (where I had moved it) - minor clean ups and doc additions - addressed TODO's and error handling Here are a few responses to some of the concerns raised by [~aweisberg] from the current PR. bq. This is really a generic inbound message handler right? Are there potentially races here with multiple threads interacting with StreamResultFuture and StreamManager and the global state? Can they register different state and then end up with a different StreamResultFuture or session? I don't believe this to be the case as the intiator sends out the {{StreamInitMessage}} and subsequent control messages on the same (control) channel, so you have TCP ordering guarantees there. Further, most of the important, state-changing functions related to {{StreamSession}} are already {{synchronized}}. bq. Ideally I think each stream session would get a dedicated control channel and a dedicated actor thread to orchestrate control messages. Completely decouple the whole thing and turn into a queue of events being processed serially. The only concurrency and parallelism is then files being transferred in parallel which is pretty clearly shared nothing other than resource limits which are generally pretty easy to reason about since there is no risk of deadlock. The trick with that is we would need to send session identifier info in every message so that the correct {{StreamSession}} can be identified. Otherwise, you have to stick a reference to the {{StreamSession}} somewhere: in the channel attributes, in the {{StreamInboundHandler}}, or in the deserialization task (like what I currently have). Further, I'm now trying to avoid altering the stream message format/protocol, so long as correctness is maintained, because there's a boat load of other changes happening here. bq. ... checksums ... I did not change the checksum algorithm for the LZ4 compression on internode messaging as that would be a change to the (undocumented) internode messaging protocol, and would break upgrades. Further, Sylvain didn't want to change the internode messaging protocol for this release, so I think we're stuck with the 32-bit checksumming as that's what we have in the existing [{{OutboundTcpConnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L530]. > Move streaming to non-blocking IO and netty (streaming 2.1) > ----------------------------------------------------------- > > Key: CASSANDRA-12229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12229 > Project: Cassandra > Issue Type: Improvement > Components: Streaming and Messaging > Reporter: Jason Brown > Assignee: Jason Brown > Fix For: 4.0 > > > As followup work to CASSANDRA-8457, we need to move streaming to use netty. > Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files > are transferred between nodes in a cluster. However, the low-level details of > the current streaming implementation does not line up nicely with a > non-blocking model, so I think this is a good time to review some of those > details and add in additional goodness. The current implementation assumes a > sequential or "single threaded" approach to the sending of stream messages as > well as the transfer of files. In short, after several iterative prototypes, > I propose the following: > 1) use a single bi-diredtional connection (instead of requiring to two > sockets & two threads) > 2) send the "non-file" {{StreamMessage}} s (basically anything not > {{OutboundFileMessage}}) via the normal internode messaging. This will > require a slight bit more management of the session (the ability to look up a > {{StreamSession}} from a static function on {{StreamManager}}, but we have > have most of the pieces we need for this already. > 3) switch to a non-blocking IO model (facilitated via netty) > 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just > be a thing already > 5) If the entire sstable is to streamed, in addition to the DATA component, > transfer all the components of the sstable (primary index, bloom filter, > stats, and so on). This way we can avoid the CPU and GC pressure from > deserializing the stream into objects. File streaming then amounts to a > block-level transfer. > Note: The progress/results of CASSANDRA-11303 will need to be reflected here, > as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org