[ 
https://issues.apache.org/jira/browse/CASSANDRA-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626139#comment-14626139
 ] 

Benedict commented on CASSANDRA-9708:
-------------------------------------

bq. Constructing an NIODataInputStream to decode a vint

That's not what's happening. We're constructing it to decode an object graph. 
That object graph seems to be sufficiently compressed in some cases to be < 9 
bytes, and that seems to occur only after these changes. The refusal to safely 
decode object graphs encoded in < 9 bytes is definitely something we want to 
avoid, but we may be decoding arbitrarily large graphs via the normal complex 
decode call graph. Refactoring this is a really significant undertaking, and 
one you mentioned in another ticket recently. I'm very much in favour of the 
exploratory work to do so, but in the meantime we have to live with what we 
have.
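
For a sense of scale, here's a back-of-the-envelope sketch of how small such an 
encoding can get once the batched headers land. The layout below (one 
single-byte batch header, then a 16-bit length-prefixed value) is an assumption 
for illustration only, not the exact serialization:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Back-of-the-envelope sketch of why an encoded graph can now be smaller than
// the 9 bytes a worst-case unsigned vint occupies. The layout (one single-byte
// batch header, then a 16-bit length-prefixed value) is assumed purely for
// illustration; it is not the exact on-disk format.
public class TinyEncodingSize
{
    public static void main(String[] args) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);

        out.writeByte(0);                                  // batch header: all flags clear
        byte[] value = "foo".getBytes(StandardCharsets.UTF_8);
        out.writeShort(value.length);                      // assumed 2-byte length prefix
        out.write(value);                                  // 3-byte clustering value

        // Prints 6: comfortably below the 9 bytes the stream insists on having
        // available before it will decode.
        System.out.println("serialized size: " + bytes.size());
    }
}
{code}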


> Serialize ClusteringPrefixes in batches
> ---------------------------------------
>
>                 Key: CASSANDRA-9708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9708
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0.0 rc1
>
>
> Typically we will have very few clustering prefixes to serialize, but in 
> theory they are not constrained (or are they, just to a very large number?). 
> Currently we encode a fat header for all values up front (two bits per 
> value), but those bits will typically be zero, and we will typically have 
> only a handful (perhaps 1 or 2) of values.
> This patch modifies the encoding to batch the prefixes in groups of up to 32, 
> each group preceded by a vint-encoded header. Typically this will result in a 
> single byte per batch, but it will consume up to 9 bytes if some of the 
> values have their flags set. If we have more than 32 columns, we just read 
> another header (see the sketch below). This means we incur no garbage, and we 
> compress the data on disk in many cases where we have more than 4 clustering 
> components.
> I do wonder if we shouldn't impose a limit on clustering columns, though: if 
> you have more than a handful, merge performance is going to disintegrate. 32 
> is probably well in excess of what we should be seeing in the wild anyway.
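
For reference, a minimal sketch of the batching described above. The flag 
packing and the toy vint writer are assumptions for illustration, not the 
actual ClusteringPrefix serializer or Cassandra's vint coding:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative sketch of the batching idea: two flag bits per clustering value,
// packed into a 64-bit header covering up to 32 values, then written as an
// unsigned vint. The flag packing and the toy vint writer below are assumptions
// for illustration; they are not Cassandra's actual serializer or VIntCoding.
public class BatchedHeaderSketch
{
    static final int BATCH_SIZE = 32;   // 32 values x 2 bits = one 64-bit header

    static void writeHeaders(int[] perValueFlags, DataOutputStream out) throws IOException
    {
        for (int start = 0; start < perValueFlags.length; start += BATCH_SIZE)
        {
            long header = 0L;
            int end = Math.min(start + BATCH_SIZE, perValueFlags.length);
            for (int i = start; i < end; i++)
                header |= (perValueFlags[i] & 0b11L) << (2 * (i - start));
            // All-zero flags collapse to a single vint byte; flags set on the
            // later values push the header towards its multi-byte worst case.
            writeUnsignedVInt(header, out);
        }
        // More than 32 values simply means the reader consumes another header.
    }

    // Toy LEB128-style vint: 7 payload bits per byte, high bit = continuation.
    // (Cassandra's real vint caps a 64-bit value at 9 bytes; this stand-in can
    // take one byte more in the worst case, which is fine for illustration.)
    static void writeUnsignedVInt(long value, DataOutputStream out) throws IOException
    {
        while ((value & ~0x7FL) != 0)
        {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }

    public static void main(String[] args) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeHeaders(new int[3], new DataOutputStream(bytes)); // 3 values, no flags set
        System.out.println("header bytes for 3 plain values: " + bytes.size()); // 1
    }
}
{code}

With all flags clear the header collapses to a single byte, and groups beyond 
32 values just produce additional headers, which is where the garbage-free, 
typically-one-byte-per-batch behaviour described above comes from.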



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
