[ 
https://issues.apache.org/jira/browse/CASSANDRA-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082991#comment-13082991
 ] 

Sylvain Lebresne commented on CASSANDRA-3003:
---------------------------------------------

bq. I'm probably missing something, but isn't the problem that this can't be 
done without two passes for rows that are too large to fit in memory?

Hum, true. What we need to do is deserialize each row with the 'fromRemote' flag 
on so that the deltas are cleaned up, and then reserialize the result. But that 
will potentially reduce the serialized size of the columns (and thus change the 
row's total size and the column index). Now, we could imagine remembering the 
offset of the beginning of the row, loading the column index into memory and 
updating it during the first pass (it would likely be fine to simply update the 
index offsets without changing the index structure itself), and then seeking back 
at the end to write the updated data size and column index. However, this 
unfortunately won't be doable with the current SequentialWriter (and 
CompressedSequentialWriter), since we cannot seek back (without truncating). 
Retrospectively, it would have been nicer if cleaning a counter context did not 
change its size :(
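
To illustrate the idea (a rough sketch only, written against a plain 
RandomAccessFile rather than the actual SequentialWriter API), the "remember the 
row offset and patch the size afterwards" scheme amounts to exactly the seek-back 
that SequentialWriter cannot do today without truncating:

{noformat}
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekBackSketch
{
    // Write one row: key, a size placeholder, then the cleaned columns, and
    // finally seek back to patch in the real data size once it is known.
    public static void writeRow(RandomAccessFile out, byte[] key, byte[][] cleanedColumns) throws IOException
    {
        out.write(key);

        long sizePosition = out.getFilePointer();
        out.writeLong(0L);                      // placeholder for the row data size

        long dataStart = out.getFilePointer();
        for (byte[] column : cleanedColumns)    // columns with their deltas already cleaned,
            out.write(column);                  // so the serialized size may have shrunk
        long dataEnd = out.getFilePointer();

        out.seek(sizePosition);                 // the seek back that SequentialWriter
        out.writeLong(dataEnd - dataStart);     // cannot currently do without truncating
        out.seek(dataEnd);                      // resume appending after the row
    }
}
{noformat}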

So yeah, it sucks. I'm still mildly a fan of moving the cleanup, because the 
current approach "feels wrong" somehow. It feels like it would be better to have 
that delta cleaning done sooner rather than later. But this may end up being the 
simplest/most efficient solution.

> Trunk single-pass streaming doesn't handle large row correctly
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-3003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Critical
>              Labels: streaming
>
> For normal column families, trunk streaming always buffers the whole row into 
> memory. It uses
> {noformat}
>   ColumnFamily.serializer().deserializeColumns(in, cf, true, true);
> {noformat}
> on the input bytes.
> We must avoid this for rows that don't fit in the inMemoryLimit.
> Note that for regular column families, for a given row, there is actually no 
> need to even recreate the bloom filter or column index, nor to deserialize 
> the columns. It is enough to read the key and row size to feed the index 
> writer, and then simply dump the rest on disk directly. This would make 
> streaming more efficient, avoid a lot of object creation and avoid the 
> pitfall of big rows.
> Counter column families are unfortunately trickier, because each column needs 
> to be deserialized (to mark it as 'fromRemote'). However, we don't need to 
> do the double pass of LazilyCompactedRow for that. We can simply use an 
> SSTableIdentityIterator and deserialize/reserialize the input as it comes.
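
As a rough sketch of the single-pass counter approach described above (the types 
and helpers below are placeholders, not the actual Cassandra serializer API that 
an SSTableIdentityIterator-based implementation would call), streaming one column 
at a time rather than buffering the whole row looks roughly like this:

{noformat}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class CounterStreamSketch
{
    // Deserialize, clean, and reserialize one column at a time so that only a
    // single column is ever held in memory, instead of buffering the whole row.
    public static void streamRow(DataInput in, DataOutput out, int columnCount) throws IOException
    {
        for (int i = 0; i < columnCount; i++)
        {
            Column column = readColumn(in);   // read a single column from the incoming stream
            clearDeltas(column);              // the 'fromRemote' delta cleanup mentioned above
            writeColumn(column, out);         // write it straight back out to the sstable
        }
    }

    // Placeholder type and helpers standing in for the real column serializer.
    static class Column {}
    static Column readColumn(DataInput in) throws IOException { return new Column(); }
    static void clearDeltas(Column column) {}
    static void writeColumn(Column column, DataOutput out) throws IOException {}
}
{noformat}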

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
