[ https://issues.apache.org/jira/browse/CASSANDRA-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064614#comment-13064614 ]
Yuki Morishita commented on CASSANDRA-2677:
-------------------------------------------

The attached patch lets Cassandra create the sstable, together with its indices and bloom filter (BF), directly from the incoming stream. I left the old path in place to handle the case where a node running an older version streams to a new one.

I don't have a test environment with SSL, so testing in an environment with encryption enabled would be appreciated.

> Optimize streaming to be single-pass
> ------------------------------------
>
>                 Key: CASSANDRA-2677
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2677
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: trunk-2677.txt
>
>
> Streaming currently is a two-pass operation: one to write the Data component
> to disk from the socket, then another to build the index and bloom filter
> from it. This means we do about 2x the i/o we would if we created the index
> and BF during the original write.
> For node movement this was not considered to be a Big Deal because the stream
> target is not a member of the ring, so we can be inefficient without hurting
> live queries. But optimizing node movement to not require un/rebootstrap
> (CASSANDRA-1427) and bulk load (CASSANDRA-1278) mean we can stream to live
> nodes too.
> The main obstacle here is we don't know how many keys will be in the new
> sstable ahead of time, which we need to size the bloom filter correctly. We
> can solve this by including that information (or a close approximation) in
> the stream setup -- the source node can calculate that without hitting disk
> from the in-memory index summary.
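The patch itself is not reproduced here. As a minimal sketch of the single-pass idea described in the issue, assuming a simplified row layout (a length-prefixed key followed by a length-prefixed row), the hypothetical `SinglePassWriter` below writes each row to the data output, records the key's offset in an index, and adds the key to a bloom filter in the same step, so the data never has to be re-read to build those components. `SimpleBloomFilter` is a toy filter, not Cassandra's `BloomFilter` class, and the byte layout is not the real sstable format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy bloom filter using double hashing over a BitSet (illustration only). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a, used as the second, independent hash.
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= b & 0xFF;
            h *= 0x01000193;
        }
        return h;
    }

    // i-th probe position: h1 + i * h2, wrapped into [0, numBits).
    private int slot(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(slot(key, i));
        }
    }

    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(slot(key, i))) {
                return false;
            }
        }
        return true;
    }
}

/** Hypothetical writer that builds data, index and bloom filter in one pass. */
final class SinglePassWriter {
    private final ByteArrayOutputStream dataBytes = new ByteArrayOutputStream();
    private final DataOutputStream data = new DataOutputStream(dataBytes);
    // Key -> offset of its row in the data output; insertion order = stream order.
    private final Map<String, Long> index = new LinkedHashMap<>();
    // Must be sized before the first row arrives.
    private final SimpleBloomFilter filter;

    SinglePassWriter(SimpleBloomFilter filter) {
        this.filter = filter;
    }

    /** Appends one row as it arrives from the stream; nothing is re-read later. */
    void append(String key, byte[] row) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        long position = data.size();
        try {
            data.writeShort(keyBytes.length);
            data.write(keyBytes);
            data.writeInt(row.length);
            data.write(row);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.put(key, position); // index entry recorded in the same pass
        filter.add(keyBytes);     // bloom filter updated in the same pass
    }

    /** Offset of the key's row, or -1 if the key was never appended. */
    long positionOf(String key) {
        return index.getOrDefault(key, -1L);
    }

    boolean mightContain(String key) {
        return filter.mightContain(key.getBytes(StandardCharsets.UTF_8));
    }

    int dataLength() {
        return data.size();
    }
}
```

Here the index and filter are by-products of the write itself; the only thing they need in advance is the filter size, which is exactly the obstacle the issue describes.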
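The issue proposes sending the expected key count (or a close approximation) in the stream setup so the receiver can size the bloom filter before any data arrives. A minimal sketch, assuming the in-memory index summary is an array of sampled tokens holding one sample per `indexInterval` keys; `BloomSizing`, `estimateKeys` and the half-open token-range representation are hypothetical, and the sizing uses the standard bloom filter formulas rather than Cassandra's own sizing code.

```java
/**
 * Hypothetical helpers (not Cassandra's API): estimate how many keys a stream
 * will carry from a sampled index summary, then size a bloom filter for it.
 */
final class BloomSizing {
    private BloomSizing() {}

    /**
     * Estimates the number of keys whose tokens fall in [left, right).
     * Assumes the summary keeps one sampled token per indexInterval keys,
     * so each sample found in range stands for about indexInterval keys.
     */
    static long estimateKeys(long[] sampledTokens, long left, long right, int indexInterval) {
        long samplesInRange = 0;
        for (long token : sampledTokens) {
            if (token >= left && token < right) {
                samplesInRange++;
            }
        }
        // +1 covers the partial interval at the edge of the range, so the
        // estimate errs high; an oversized filter only costs a little memory.
        return (samplesInRange + 1) * (long) indexInterval;
    }

    /** Number of bits m = -n ln p / (ln 2)^2 for n keys at false-positive rate p. */
    static long bitsFor(long expectedKeys, double falsePositiveRate) {
        long n = Math.max(1, expectedKeys);
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Number of hash functions k = (m / n) ln 2, at least one. */
    static int hashesFor(long bits, long expectedKeys) {
        long n = Math.max(1, expectedKeys);
        return Math.max(1, (int) Math.round((double) bits / n * Math.log(2)));
    }
}
```

With an estimate of 1,000 keys and a 1% false-positive target, `bitsFor` returns about 9,586 bits (roughly 1.2 KB) and `hashesFor` returns 7. Overestimating the key count only wastes a little memory, while underestimating it raises the false-positive rate, which is why the sketch rounds the estimate up.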