[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-3003:
--------------------------------------
    Affects Version/s: 1.0
        Fix Version/s: 1.0

How is this looking, Yuki?

> Trunk single-pass streaming doesn't handle large rows correctly
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>             Fix For: 1.0
>
>
> For a normal column family, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter or column index, nor to deserialize
> the columns. It is enough to read the key and row size to feed the index
> writer, and then simply dump the rest to disk directly. This would make
> streaming more efficient, avoid a lot of object creation, and avoid the
> pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs
> to be deserialized (to mark it as 'fromRemote'). However, we don't need to
> do the double pass of LazilyCompactedRow for that. We can simply use an
> SSTableIdentityIterator and deserialize/reserialize input as it comes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
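The pass-through idea for regular column families can be sketched roughly as follows. This is a minimal illustration, not Cassandra's actual streaming code: the class name, the simplified wire layout (UTF key followed by a long row size followed by opaque column bytes), and the `streamRow` helper are all assumptions made for the example. The point it demonstrates is that only the key and row size are parsed (what the index writer needs), while the column bytes are copied to the destination through a small buffer without ever materializing column objects.

```java
import java.io.*;

// Hedged sketch of the proposed pass-through streaming for regular
// (non-counter) column families. The on-wire row layout here is a
// simplification assumed for the example: UTF key, long row size,
// then rowSize bytes of serialized columns.
public class PassThroughStreaming {

    // Reads only the key and row size (enough to feed an index writer),
    // then copies the serialized column bytes straight through without
    // deserializing them or rebuilding bloom filters / column indexes.
    static void streamRow(DataInputStream in, DataOutputStream out) throws IOException {
        String key = in.readUTF();      // row key: needed for the index writer
        long rowSize = in.readLong();   // length of the serialized columns in bytes
        out.writeUTF(key);
        out.writeLong(rowSize);

        byte[] buf = new byte[8192];    // bounded memory regardless of row size
        long remaining = rowSize;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) throw new EOFException("stream truncated mid-row");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a fake serialized row: key + size + 100 KB of opaque column bytes.
        byte[] columns = new byte[100_000];
        ByteArrayOutputStream src = new ByteArrayOutputStream();
        DataOutputStream w = new DataOutputStream(src);
        w.writeUTF("row1");
        w.writeLong(columns.length);
        w.write(columns);

        ByteArrayOutputStream dst = new ByteArrayOutputStream();
        streamRow(new DataInputStream(new ByteArrayInputStream(src.toByteArray())),
                  new DataOutputStream(dst));

        if (dst.size() != src.size()) {
            throw new IllegalStateException("pass-through copy lost bytes");
        }
        System.out.println("copied " + dst.size() + " bytes without deserializing columns");
    }
}
```

Counter column families would not fit this sketch: as the description notes, each counter column must still be deserialized to be marked 'fromRemote', so the copy loop above would be replaced by a per-column deserialize/reserialize step driven by something like SSTableIdentityIterator, still in a single pass.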