[ https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082991#comment-13082991 ]
Sylvain Lebresne commented on CASSANDRA-3003:
---------------------------------------------

bq. I'm probably missing something, but isn't the problem that this can't be done without two passes for rows that are too large to fit in memory?

Hum, true. What we need to do is deserialize each row with the 'fromRemote' flag on so that the deltas are cleaned up, and then reserialize the result. But that will potentially reduce the serialized size of the columns (and thus change the row's total size and the column index). Now we could imagine remembering the offset of the beginning of the row, loading the column index into memory and updating it during the first pass (it would likely be fine to simply update the index offsets without changing the index structure itself), and seeking back at the end to write the updated data size and column index. However, this unfortunately isn't doable with the current SequentialWriter (and CompressedSequentialWriter), since we cannot seek back (without truncating). In retrospect, it would have been nicer if cleaning a counter context did not change its size :(

So yeah, it sucks. I'm still mildly against moving the cleanup because it "feels wrong" somehow: it feels like it would be better to have that delta cleaning done sooner rather than later. But it may end up being the simplest/most efficient solution.

> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>
> For normal column families, trunk streaming always buffers the whole row into
> memory. It uses
> {noformat}
> ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no
> need to even recreate the bloom filter or the column index, nor to deserialize
> the columns. It is enough to read the key and row size to feed the index
> writer, and then simply dump the rest to disk directly. This would make
> streaming more efficient, avoid a lot of object creation, and avoid the
> pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs
> to be deserialized (to mark it as 'fromRemote'). However, we don't need to
> do the double pass of LazilyCompactedRow for that. We can simply use an
> SSTableIdentityIterator and deserialize/reserialize the input as it comes.
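
As a rough, self-contained illustration of the two single-pass paths discussed above (plain java.io streams, not Cassandra's actual SSTable/streaming classes; the framing, method names, and the clearDelta placeholder are hypothetical): a regular row can be copied to disk verbatim after reading only its key and size, while a counter row has to be streamed column by column, and cleaning the delta can shrink the reserialized column so that the row size and column index written up front no longer match.

{noformat}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SinglePassRowSketch
{
    /** Regular CF: copy the row body verbatim, reading only the key and row size. */
    static void copyRegularRow(DataInputStream in, DataOutputStream out) throws IOException
    {
        byte[] key = new byte[in.readUnsignedShort()]; // hypothetical key framing
        in.readFully(key);
        long rowSize = in.readLong();

        out.writeShort(key.length);
        out.write(key);
        out.writeLong(rowSize);

        byte[] buf = new byte[64 * 1024];
        long remaining = rowSize;
        while (remaining > 0)
        {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0)
                throw new IOException("unexpected end of row body");
            out.write(buf, 0, n);
            remaining -= n;
        }
    }

    /**
     * Counter CF: deserialize each column so its delta can be cleared, then
     * reserialize it. The cleaned context may be shorter, so the bytes written
     * can differ from the row size and column index sent by the source.
     */
    static long rewriteCounterRow(DataInputStream in, DataOutputStream out, int columnCount) throws IOException
    {
        long written = 0;
        for (int i = 0; i < columnCount; i++)
        {
            byte[] name = new byte[in.readUnsignedShort()]; // hypothetical column framing
            in.readFully(name);
            byte[] context = new byte[in.readInt()];
            in.readFully(context);

            byte[] cleaned = clearDelta(context); // stand-in for the 'fromRemote' cleanup

            out.writeShort(name.length);
            out.write(name);
            out.writeInt(cleaned.length);
            out.write(cleaned);
            written += 2 + name.length + 4 + cleaned.length;
        }
        // If 'written' differs from the advertised row size, the data size and
        // column index written earlier would have to be patched, which a
        // forward-only writer (like SequentialWriter) cannot do.
        return written;
    }

    /** Placeholder: the real delta cleanup lives in Cassandra's counter context code. */
    static byte[] clearDelta(byte[] context)
    {
        return context;
    }
}
{noformat}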