[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091838#comment-13091838 ]
Yuki Morishita commented on CASSANDRA-3003:
-------------------------------------------

V2 attached and ready for review.

For Counter columns, instead of padding in place of the removed delta, v2 just "marks" the counter column so that its delta can be cleared later. It does this by multiplying #elt by -1, which keeps the header size available for the later removal.
Marking only occurs when deserializing with "fromRemote"; the actual removal of the delta is done when the data is read again from disk after streaming.

> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>             Fix For: 1.0
>
>         Attachments: 3003-v1.txt, 3003-v2.txt, mylyn-context.zip
>
>
> For normal column families, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
>   ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter of column index, nor to deserialize
> the columns. It is enough to filter the key and row size to feed the index
> writer, but then simply dump the rest on disk directly. This would make
> streaming more efficient, avoid a lot of object creation and avoid the
> pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs
> to be deserialized (to mark them as 'fromRemote'). However, we don't need to
> do the double pass of LazilyCompactedRow for that. We can simply use a
> SSTableIdentityIterator and deserialize/reserialize input as it comes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
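
The marking scheme described in the comment above (negate #elt on receipt, drop the delta header on a later read) can be sketched roughly as follows. This is only an illustration, not the code in 3003-v2.txt: the class name, the method names, and the byte layout (a signed short element count, then that many short entries, then the body) are all assumptions made for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and layout are assumptions, not Cassandra's API.
// Assumed layout: [short elementCount][elementCount x short entry][body bytes]
public class CounterMarkSketch {

    // Returns a copy whose element count is negated. The absolute value is
    // preserved, so the header size can still be computed later.
    static ByteBuffer mark(ByteBuffer context) {
        ByteBuffer copy = ByteBuffer.allocate(context.remaining());
        copy.put(context.duplicate());
        copy.flip();
        short count = copy.getShort(0);
        if (count > 0) {
            copy.putShort(0, (short) -count);
        }
        return copy;
    }

    // True if the element count is negative, i.e. the context was marked.
    static boolean isMarked(ByteBuffer context) {
        return context.getShort(context.position()) < 0;
    }

    // If the context is marked, returns a new buffer with an empty header
    // (count 0) followed by the unchanged body. Otherwise returns the input.
    static ByteBuffer clearIfMarked(ByteBuffer context) {
        short count = context.getShort(context.position());
        if (count >= 0) {
            return context;
        }
        // The negated count still tells us how many header entries to skip.
        int headerBytes = 2 + 2 * (-count);
        int bodyBytes = context.remaining() - headerBytes;

        ByteBuffer cleared = ByteBuffer.allocate(2 + bodyBytes);
        cleared.putShort((short) 0);

        ByteBuffer body = context.duplicate();
        body.position(context.position() + headerBytes);
        cleared.put(body);
        cleared.flip();
        return cleared;
    }
}
```

Under these assumptions, marking changes no lengths, so the streamed row size stays valid, and the cost of rewriting the header is deferred to the later read from disk.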