Optimize streaming to be single-pass
------------------------------------

                 Key: CASSANDRA-2677
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2677
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.1


Streaming is currently a two-pass operation: one pass to write the Data component 
to disk from the socket, then another to build the index and bloom filter from it. 
This means we do about 2x the I/O we would do if we created the index and BF 
during the original write.
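
To illustrate the single-pass idea described above, here is a minimal sketch in 
Java. It does not use Cassandra's actual classes or on-disk format: the class 
name, the row layout (UTF key, int length, value bytes), and the toy bloom 
filter are all assumptions made for illustration. The point is only that the 
index entry and the bloom filter bits are produced while each row is written, 
so the data is never re-read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not Cassandra's real writer or file format.
final class SinglePassStreamWriter {
    static final int HASHES = 3;

    /** Index entries (key -> offset in the data output), built during the write. */
    final List<Map.Entry<String, Long>> index = new ArrayList<>();
    /** Toy bloom filter: a bit set probed with a few derived hashes. */
    final BitSet bloom;
    final int bloomBits;

    SinglePassStreamWriter(int bloomBits) {
        this.bloomBits = bloomBits;
        this.bloom = new BitSet(bloomBits);
    }

    /** Copies `rows` rows from the socket to the data output, indexing as it goes. */
    void receive(DataInputStream socket, DataOutputStream data, int rows) throws IOException {
        for (int i = 0; i < rows; i++) {
            long offset = data.size();          // bytes written so far (int-sized in this sketch)
            String key = socket.readUTF();
            int length = socket.readInt();
            byte[] value = new byte[length];
            socket.readFully(value);

            data.writeUTF(key);                 // write the row to the Data component...
            data.writeInt(length);
            data.write(value);

            index.add(new AbstractMap.SimpleImmutableEntry<>(key, offset)); // ...and index it
            addToBloom(key);                    // ...and update the filter, in the same pass
        }
    }

    private void addToBloom(String key) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        for (int i = 0; i < HASHES; i++) {
            bloom.set(Math.floorMod(h1 + i * h2, bloomBits));
        }
    }

    boolean mightContain(String key) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        for (int i = 0; i < HASHES; i++) {
            if (!bloom.get(Math.floorMod(h1 + i * h2, bloomBits))) return false;
        }
        return true;
    }

    /** Usage example: streams two in-memory rows through the writer. */
    static SinglePassStreamWriter demo() {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            out.writeUTF("k1"); out.writeInt(3); out.write(new byte[] {1, 2, 3});
            out.writeUTF("k2"); out.writeInt(1); out.write(new byte[] {9});

            SinglePassStreamWriter writer = new SinglePassStreamWriter(1024);
            writer.receive(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())),
                           new DataOutputStream(new ByteArrayOutputStream()), 2);
            return writer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the filter size is fixed before the first row arrives, which is why 
the expected key count has to be known up front.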

For node movement this was not considered to be a Big Deal because the stream 
target is not a member of the ring, so we can be inefficient without hurting 
live queries.  But optimizing node movement to not require un/rebootstrap 
(CASSANDRA-1427) and bulk load (CASSANDRA-1278) mean we can stream to live 
nodes too.

The main obstacle here is we don't know how many keys will be in the new 
sstable ahead of time, which we need to size the bloom filter correctly. We can 
solve this by including that information (or a close approximation) in the 
stream setup -- the source node can calculate that without hitting disk from 
the in-memory index summary.
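
As a rough sketch of that estimate, assuming the index summary keeps one 
sampled entry per fixed interval of keys, the source could multiply the number 
of summary entries by the interval and send the result in the stream setup. 
The target can then size the filter with the standard bloom filter formulas 
(m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hashes). The class name, method 
names, and the interval of 128 below are assumptions for illustration, not 
Cassandra's API.

```java
// Hypothetical sketch: names and the sampling interval are illustrative only.
final class BloomSizing {
    /** Assumed sampling interval: one summary entry per 128 index entries. */
    static final int INDEX_INTERVAL = 128;

    /** Approximate key count from the in-memory summary, without touching disk. */
    static long estimateKeys(int summaryEntries, int indexInterval) {
        return (long) summaryEntries * indexInterval;
    }

    /** Bits needed for the expected key count at a target false-positive rate. */
    static long bloomBits(long expectedKeys, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedKeys * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes false positives for that size. */
    static int hashCount(long bits, long expectedKeys) {
        return Math.max(1, (int) Math.round((double) bits / expectedKeys * Math.log(2)));
    }
}
```

The estimate can be off by up to one interval per summarized range, which is 
why a close approximation is enough: a small error in n only shifts the 
false-positive rate slightly.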

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
