[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuki Morishita updated CASSANDRA-3003:
--------------------------------------

    Attachment: 3003-v5.txt

> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>             Fix For: 1.0
>
>         Attachments: 3003-v1.txt, 3003-v2.txt, 3003-v3.txt, 3003-v5.txt, v3003-v4.txt
>
>
> For normal column families, trunk streaming always buffers the whole row into memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no need to even recreate the bloom filter or column index, nor to deserialize the columns. It is enough to read the key and row size to feed the index writer, and then simply dump the rest on disk directly. This would make streaming more efficient, avoid a lot of object creation and avoid the pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs to be deserialized (to mark it as 'fromRemote'). However, we don't need the double pass of LazilyCompactedRow for that. We can simply use an SSTableIdentityIterator and deserialize/reserialize the input as it comes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
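To illustrate the pass-through idea for regular column families, here is a minimal Java sketch (hypothetical names and a simplified wire format, not Cassandra's actual streaming code): it reads only the key and the serialized row size to feed an index writer, then copies the remaining bytes to the output in bounded chunks instead of deserializing the columns.

{noformat}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PassThroughRowStreamer
{
    /**
     * Copies one streamed row from in to out without materializing its columns.
     * Returns the row's serialized size so the caller can update its index.
     * Assumes a simplified layout: UTF key, then a long data size, then raw bytes.
     */
    public static long streamRow(DataInputStream in, DataOutputStream out) throws IOException
    {
        String key = in.readUTF();
        long dataSize = in.readLong();

        out.writeUTF(key);
        out.writeLong(dataSize);

        // Copy the row body in fixed-size chunks so memory use stays bounded
        // no matter how large the row is.
        byte[] buffer = new byte[64 * 1024];
        long remaining = dataSize;
        while (remaining > 0)
        {
            int toRead = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, toRead);
            if (read < 0)
                throw new IOException("Unexpected end of stream while copying row " + key);
            out.write(buffer, 0, read);
            remaining -= read;
        }
        return dataSize;
    }
}
{noformat}

For counters, the copy loop would instead have to deserialize each column, mark it as coming from a remote node, and reserialize it immediately, which still keeps memory bounded without the double pass of LazilyCompactedRow.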