[jira] [Commented] (CASSANDRA-9708) Serialize ClusteringPrefixes in batches

Sylvain Lebresne (JIRA) Thu, 23 Jul 2015 06:21:34 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638790#comment-14638790
 ]


Sylvain Lebresne commented on CASSANDRA-9708:
---------------------------------------------

Mostly lgtm, but {{ClusteringPrefix.Deserializer.skipNext}} should also handle 
{{nextHeader}} (can't rely on {{deserializeOne}} having been run at all, and 
there is the repeated header case).

Nitpicks:
* There's a redundantly added import in {{UnfilteredSerializer}}.
* The style is slightly inconsistent between the {{serializeValuesWithoutSize}} 
and {{deserializeValuesWithoutSize}}: in the first, you're using a for-loop for 
all the index between {{offset}} and {{limit}}, setting {{offset = limit}} at 
the end, while in the second you use a while-loop, incrementing {{offset}} 
directly. Both are obviously equivalent, but it'll make it a tad easier to 
validate that both methods are symmetric of one another for future readers if 
the styles are the same.
* "// no need to module 64, as done for us": the other comment regarding this 
are more clear than this "as done for us" (and typo to "module").
* Duplicating the test and running it with empty values (empty string 
typically) would improve coverage without too much effort (testing the null 
case is harder though cause CQL will get in the way).


> Serialize ClusteringPrefixes in batches
> ---------------------------------------
>
>                 Key: CASSANDRA-9708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9708
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0.0 rc1
>
>
> Typically we will have very few clustering prefixes to serialize, however in 
> theory they are not constrained (or are they, just to a very large number?). 
> Currently we encode a fat header for all values up front (two bits per 
> value), however those bits will typically be zero, and typically we will have 
> only a handful (perhaps 1 or 2) of values.
> This patch modifies the encoding to batch the prefixes in groups of up to 32, 
> along with a header that is vint encoded. Typically this will result in a 
> single byte per batch, but will consume up to 9 bytes if some of the values 
> have their flags set. If we have more than 32 columns, we just read another 
> header. This means we incur no garbage, and compress the data on disk in many 
> cases where we have more than 4 clustering components.
> I do wonder if we shouldn't impose a limit on clustering columns, though: If 
> you have more than a handful merge performance is going to disintegrate. 32 
> is probably well in excess of what we should be seeing in the wild anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9708) Serialize ClusteringPrefixes in batches

Reply via email to