[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083111#comment-13083111 ]
Jonathan Ellis commented on CASSANDRA-3003:
-------------------------------------------

bq. it would have been nicer to have the cleaning of a counter context not change its size

Can we pad it somehow? (A rough in-place idea is sketched at the bottom of this comment.)

> Trunk single-pass streaming doesn't handle large rows correctly
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>
> For normal column families, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes. We must avoid this for rows that don't fit in the
> inMemoryLimit.
>
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter or the column index, nor to
> deserialize the columns. It is enough to read the key and row size to feed
> the index writer, and then simply dump the rest to disk directly. This would
> make streaming more efficient, avoid a lot of object creation, and avoid the
> pitfall of big rows (see the raw-copy sketch below).
>
> Counter column families are unfortunately trickier, because each column
> needs to be deserialized (to mark it as 'fromRemote'). However, we don't
> need the double pass of LazilyCompactedRow for that. We can simply use an
> SSTableIdentityIterator and deserialize/reserialize the input as it comes
> (see the second sketch below).
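To make the regular-column-family proposal concrete, here is a minimal raw-copy sketch. The framing it assumes (a length-prefixed key, then a long row size, then rowSize bytes of serialized columns) is for illustration only; RawRowStreamer and transferRow are hypothetical names, not Cassandra's API.

{noformat}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch: copy a streamed row to disk without deserializing it.
public final class RawRowStreamer
{
    private static final int CHUNK_SIZE = 64 * 1024;

    public static void transferRow(DataInput in, DataOutput out) throws IOException
    {
        // Read just the key and the on-disk row size -- enough to feed the
        // index writer -- and echo them to the output.
        int keyLength = in.readUnsignedShort();
        byte[] key = new byte[keyLength];
        in.readFully(key);
        long rowSize = in.readLong();

        out.writeShort(keyLength);
        out.write(key);
        out.writeLong(rowSize);

        // Copy the remaining row bytes verbatim: no bloom filter rebuild, no
        // column objects, and bounded memory no matter how large the row is.
        byte[] buffer = new byte[CHUNK_SIZE];
        long remaining = rowSize;
        while (remaining > 0)
        {
            int length = (int) Math.min(buffer.length, remaining);
            in.readFully(buffer, 0, length);
            out.write(buffer, 0, length);
            remaining -= length;
        }
    }
}
{noformat}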
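For counter column families, a sketch of the single-pass deserialize/reserialize loop suggested above. CounterColumn below is a stand-in with just enough state to show the shape; the real 'fromRemote' handling and the SSTableIdentityIterator plumbing live in Cassandra itself.

{noformat}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch: peak memory is one column, not one row.
public final class CounterRowStreamer
{
    static final class CounterColumn
    {
        byte[] name;
        byte[] context; // counter context: shards of (node id, clock, count)
        boolean fromRemote;

        static CounterColumn deserialize(DataInput in) throws IOException
        {
            CounterColumn c = new CounterColumn();
            c.name = readWithLength(in);
            c.context = readWithLength(in);
            return c;
        }

        void serialize(DataOutput out) throws IOException
        {
            writeWithLength(out, name);
            writeWithLength(out, context);
        }
    }

    public static void transferCounterRow(DataInput in, DataOutput out, int columnCount) throws IOException
    {
        for (int i = 0; i < columnCount; i++)
        {
            // Deserialize one column, mark it, and re-emit it immediately --
            // no LazilyCompactedRow-style double pass over the whole row.
            CounterColumn column = CounterColumn.deserialize(in);
            column.fromRemote = true; // real code would handle local shards here
            column.serialize(out);
        }
    }

    private static byte[] readWithLength(DataInput in) throws IOException
    {
        byte[] bytes = new byte[in.readUnsignedShort()];
        in.readFully(bytes);
        return bytes;
    }

    private static void writeWithLength(DataOutput out, byte[] bytes) throws IOException
    {
        out.writeShort(bytes.length);
        out.write(bytes);
    }
}
{noformat}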
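On the padding question: one possibility, purely a sketch, is to keep the context's serialized length fixed by rewriting a header entry to a negative sentinel in place rather than removing it. The layout assumed here (a short count of header entries followed by short indices) is illustrative, not the actual counter context format.

{noformat}
import java.nio.ByteBuffer;

// Hypothetical sketch: "clean" a counter context without changing its size.
public final class CounterContextPadding
{
    public static void markDeltasInPlace(ByteBuffer context)
    {
        int base = context.position();
        int headerCount = context.getShort(base);
        for (int i = 0; i < headerCount; i++)
        {
            int offset = base + 2 + i * 2;
            short index = context.getShort(offset);
            // Flip the entry to a negative sentinel instead of deleting it:
            // the shard becomes inert but the byte length is unchanged.
            if (index >= 0)
                context.putShort(offset, (short) (-index - 1));
        }
    }
}
{noformat}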