[ https://issues.apache.org/jira/browse/CASSANDRA-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-2677:
-----------------------------------------

    Assignee:     (was: Sylvain Lebresne)

> Optimize streaming to be single-pass
> ------------------------------------
>
>                 Key: CASSANDRA-2677
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2677
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.8.2
>
>
> Streaming is currently a two-pass operation: one pass to write the Data
> component to disk from the socket, then another to build the index and bloom
> filter from it. This means we do about 2x the I/O we would if we created the
> index and BF during the original write.
>
> For node movement this was not considered to be a Big Deal because the stream
> target is not a member of the ring, so we can be inefficient without hurting
> live queries. But optimizing node movement to not require un/rebootstrap
> (CASSANDRA-1427) and bulk load (CASSANDRA-1278) mean we can stream to live
> nodes too.
>
> The main obstacle here is that we don't know ahead of time how many keys will
> be in the new sstable, which we need in order to size the bloom filter
> correctly. We can solve this by including that information (or a close
> approximation) in the stream setup -- the source node can calculate that
> without hitting disk from the in-memory index summary.
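
The proposal can be illustrated with a rough sketch. This is not Cassandra code: every name here (estimate_keys, bloom_parameters, BloomFilter, receive_stream) is hypothetical, the sampling interval of 128 is an assumed value, the 1% false-positive target is arbitrary, and the length-prefixed record layout is invented for the example. It only shows the idea: the source derives an approximate key count from its index summary, and the target uses that count to size the bloom filter up front so that data, index, and filter are all built in one pass over the incoming rows.

```python
import hashlib
import math


def estimate_keys(summary_samples: int, index_interval: int = 128) -> int:
    """Source side: approximate the key count from the in-memory index summary.

    Assumes the summary holds one sample per `index_interval` keys, so no
    disk access is needed. The default of 128 is an assumption.
    """
    return summary_samples * index_interval


def bloom_parameters(expected_keys: int, false_positive_rate: float = 0.01):
    """Standard bloom filter sizing: returns (bit count, hash count)."""
    n = max(expected_keys, 1)
    bits = math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n * math.log(2)))
    return bits, hashes


class BloomFilter:
    """Toy bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, bits: int, hashes: int):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        for i in range(self.hashes):
            yield (h1 + i * h2) % self.bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def receive_stream(rows, estimated_keys: int):
    """Target side: build data, index, and bloom filter in a single pass.

    `rows` is an iterable of (key, value) byte pairs standing in for the
    socket; `estimated_keys` arrives with the stream setup.
    """
    bits, hashes = bloom_parameters(estimated_keys)
    bloom = BloomFilter(bits, hashes)
    data = bytearray()
    index = []
    for key, value in rows:
        index.append((key, len(data)))               # offset of this record
        data += len(value).to_bytes(4, "big") + value
        bloom.add(key)                               # no second read of the data
    return bytes(data), index, bloom
```

Because the estimate is only approximate, a real implementation would have to decide what happens when the actual key count is well above it; the sketch simply accepts a higher false-positive rate in that case.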