[ https://issues.apache.org/jira/browse/CASSANDRA-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008698#comment-16008698 ]
Jason Brown commented on CASSANDRA-12229:
-----------------------------------------

[~aweisberg] created a [PR|https://github.com/jasobrown/cassandra/pull/1/files], and added a bunch of comments. I took his feedback and created a new branch and a [new PR|https://github.com/jasobrown/cassandra/pull/2/files] for comments.

Significant changes in this rev:
- Ariel suggested moving the disk IO off the event loop on the sending side and keeping blocking IO behavior for the disk reads. Doing this allowed me to go back and reuse the {{StreamReader}}/{{StreamWriter}} set of classes. Getting the disk reads to happen on the event loop required some back flips, so ditching that code is not a bad thing.
- While I was reverting back to the {{StreamReader}} classes, I could also revert the {{StreamMessage}} changes.

Reverting (and lightly modifying) those classes resulted in nearly the same performance (and there's always more tuning to be done), with a ~40% reduction in the patch set from trunk.

A few oddities need to be cleaned up:
- {{SwappingByteBufDataOutputStreamPlus}} - this is an experiment from an experimental branch from CASSANDRA-8457. The basic idea for this class is sound, but the naming and implementation might be a bit funky.
- restoring a few unit tests
- I've (temporarily) removed the checksumming from {{StreamCompressionSerializer}} as it incurs about a 30% performance penalty on streaming uncompressed sstables. This cost might be covered over once files can be streamed in parallel, but I've pulled it out for now and would like to have a discussion on it.
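
For readers following along, the first change above (blocking disk reads kept on their own thread, with only the network write happening on the event loop) can be sketched roughly as below. This is an illustration only, not code from the patch: {{OffLoopFileSender}}, {{send}} and {{demo}} are hypothetical names, a plain single-threaded {{ExecutorService}} stands in for a netty event loop, and the "network write" just counts bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from Cassandra or netty.
public class OffLoopFileSender {
    // Stand-in for a single-threaded network event loop (an assumption, not netty).
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    // Dedicated thread that is allowed to block on disk reads.
    private final ExecutorService diskReader = Executors.newSingleThreadExecutor();
    // Bytes "written to the socket"; only ever updated on the event loop thread.
    private final AtomicLong bytesWritten = new AtomicLong();

    // Reads the input with blocking IO on the disk thread and hands each chunk
    // to the event loop. The returned future completes with the total bytes sent.
    public Future<Long> send(InputStream file, int chunkSize) {
        return diskReader.submit(() -> {
            byte[] buf = new byte[chunkSize];
            int n;
            while ((n = file.read(buf)) != -1) {          // blocking read, off the event loop
                byte[] chunk = Arrays.copyOf(buf, n);     // copy so the read buffer can be reused
                // Waiting for the write to finish gives crude backpressure:
                // the reader never gets more than one chunk ahead of the "socket".
                eventLoop.submit(() -> { bytesWritten.addAndGet(chunk.length); }).get();
            }
            return bytesWritten.get();
        });
    }

    public void shutdown() {
        diskReader.shutdown();
        eventLoop.shutdown();
    }

    // Sends `size` zero bytes in chunks of `chunkSize` and returns the total sent.
    public static long demo(int size, int chunkSize) throws Exception {
        OffLoopFileSender sender = new OffLoopFileSender();
        try {
            return sender.send(new ByteArrayInputStream(new byte[size]), chunkSize).get();
        } finally {
            sender.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(10_000, 4096));
    }
}
```

The point of the split is that the event loop thread never waits on disk, while the reader thread stays a simple sequential loop. A real implementation would presumably bound in-flight data by channel writability rather than by waiting on each write.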
> Move streaming to non-blocking IO and netty (streaming 2.1)
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12229
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>             Fix For: 4.0
>
>
> As followup work to CASSANDRA-8457, we need to move streaming to use netty.
> Streaming 2.0 (CASSANDRA-5286) brought many good improvements to how files
> are transferred between nodes in a cluster. However, the low-level details of
> the current streaming implementation do not line up nicely with a
> non-blocking model, so I think this is a good time to review some of those
> details and add in additional goodness. The current implementation assumes a
> sequential or "single threaded" approach to the sending of stream messages as
> well as the transfer of files. In short, after several iterative prototypes,
> I propose the following:
> 1) use a single bi-directional connection (instead of requiring two
> sockets & two threads)
> 2) send the "non-file" {{StreamMessage}}s (basically anything not
> {{OutboundFileMessage}}) via the normal internode messaging. This will
> require a slight bit more management of the session (the ability to look up a
> {{StreamSession}} from a static function on {{StreamManager}}), but we
> have most of the pieces we need for this already.
> 3) switch to a non-blocking IO model (facilitated via netty)
> 4) Allow files to be streamed in parallel (CASSANDRA-4663) - this should just
> be a thing already
> 5) If the entire sstable is to be streamed, in addition to the DATA component,
> transfer all the components of the sstable (primary index, bloom filter,
> stats, and so on). This way we can avoid the CPU and GC pressure from
> deserializing the stream into objects. File streaming then amounts to a
> block-level transfer.
> Note: The progress/results of CASSANDRA-11303 will need to be reflected here
> as well.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)