[jira] [Commented] (CASSANDRA-11206) Support large partitions on the 3.0 sstable format

Robert Stupp (JIRA) Mon, 20 Jun 2016 02:40:39 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339248#comment-15339248 ]


Robert Stupp commented on CASSANDRA-11206:
------------------------------------------

bq. RowIndexEntry$serializedSize used to return the size of the index for the 
entire row.
The meaning of this method changed, but the method hasn't been renamed 
accordingly - my bad. It now just returns the serialized size of these fields, 
i.e. without the actual "index payload".

bq. Javadoc for IndexInfo
The only really new thing in the 3.0 index format is the table with the 
offsets to the IndexInfo objects. The rest has changed mostly by switching to 
vint encoding - "hidden" by the note for "ma" _store rows natively_.

bq. Pre_C_11206_RowIndexEntry
You can safely ignore (or even remove) the Pre-C-11206 stuff in 
RowIndexEntryTest. It just felt safer to have it initially as it was meant to 
ensure that the new implementation is binary compatible with the old one.

> Support large partitions on the 3.0 sstable format
> --------------------------------------------------
>
>                 Key: CASSANDRA-11206
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11206
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Jonathan Ellis
>            Assignee: Robert Stupp
>              Labels: docs-impacting
>             Fix For: 3.6
>
>         Attachments: 11206-gc.png, trunk-gc.png
>
>
> Cassandra saves a sample of IndexInfo objects that store the offset within 
> each partition of every 64KB (by default) range of rows.  To find a row, we 
> binary search this sample, then scan the partition of the appropriate range.
> The problem is that this scales poorly as partitions grow: on a cache miss, 
> we deserialize the entire set of IndexInfo, which both creates a lot of GC 
> overhead (as noted in CASSANDRA-9754) and causes non-negligible i/o activity 
> (relative to reading a single 64KB row range) as partitions get truly large.
> We introduced an "offset map" in CASSANDRA-10314 that allows us to perform 
> the IndexInfo bsearch while only deserializing IndexInfo that we need to 
> compare against, i.e. log(N) deserializations.
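
The offset-map idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class OffsetMapSearchSketch, the Entry type (a stand-in for IndexInfo) and the byte layout (variable-length entries followed by a table of int offsets) are hypothetical and do not reflect Cassandra's actual classes or on-disk format. It only shows how a trailing offsets table lets a binary search deserialize just the entries it compares against.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT Cassandra's real classes or serialization format.
final class OffsetMapSearchSketch {
    // Counts deserialized entries so the log(N) behaviour can be observed.
    static int deserializations = 0;

    // Stand-in for IndexInfo: first key of a row range and where that range starts.
    static final class Entry {
        final String firstKey;
        final long dataOffset;
        Entry(String firstKey, long dataOffset) {
            this.firstKey = firstKey;
            this.dataOffset = dataOffset;
        }
    }

    // Assumed layout: [entry 0]...[entry N-1][int offset of entry 0]...[int offset of entry N-1]
    // Each entry is: int key length, key bytes (UTF-8), long data offset.
    static ByteBuffer serialize(List<Entry> entries) {
        int size = 0;
        for (Entry e : entries)
            size += 4 + e.firstKey.getBytes(StandardCharsets.UTF_8).length + 8 + 4;
        ByteBuffer buf = ByteBuffer.allocate(size);
        int[] offsets = new int[entries.size()];
        for (int i = 0; i < entries.size(); i++) {
            offsets[i] = buf.position();
            byte[] key = entries.get(i).firstKey.getBytes(StandardCharsets.UTF_8);
            buf.putInt(key.length).put(key).putLong(entries.get(i).dataOffset);
        }
        for (int off : offsets)
            buf.putInt(off);
        buf.flip();
        return buf;
    }

    // Deserializes only entry i, locating it through the offsets table.
    static Entry deserializeAt(ByteBuffer buf, int count, int i) {
        deserializations++;
        int tableStart = buf.limit() - 4 * count;
        int pos = buf.getInt(tableStart + 4 * i);
        int keyLen = buf.getInt(pos);
        byte[] key = new byte[keyLen];
        for (int k = 0; k < keyLen; k++)
            key[k] = buf.get(pos + 4 + k);
        long dataOffset = buf.getLong(pos + 4 + keyLen);
        return new Entry(new String(key, StandardCharsets.UTF_8), dataOffset);
    }

    // Returns the data offset of the last range whose first key is <= target,
    // or -1 if target sorts before the first range.
    static long findRangeOffset(ByteBuffer buf, int count, String target) {
        int lo = 0, hi = count - 1;
        long result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Entry e = deserializeAt(buf, count, mid);
            if (e.firstKey.compareTo(target) <= 0) {
                result = e.dataOffset;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < 1024; i++)
            entries.add(new Entry(String.format("row%06d", i * 100), i * 65536L));
        ByteBuffer buf = serialize(entries);
        long offset = findRangeOffset(buf, entries.size(), "row050050");
        System.out.println("range offset = " + offset
                + ", entries deserialized = " + deserializations);
    }
}
```

With 1024 entries, the search touches at most 11 of them instead of deserializing all 1024, which is the log(N) behaviour the description refers to.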



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
