[ 
https://issues.apache.org/jira/browse/CASSANDRA-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163981#comment-13163981
 ] 

Jonathan Ellis commented on CASSANDRA-3581:
-------------------------------------------

bq. Nullifying the minimum & maximum column name fields has the effect of 
flagging the SSTable as containing row tombstones.

Okay, but my main concern isn't how to implement that. It's that this probably 
tips the balance to "not worth bothering with the complexity and overhead of 
sorting sstables by min/max column name and doing the pruning dance" if we can 
apply it in such a small number of cases.
                
> Optimize RangeSlice operations for append-mostly use cases
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-3581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>            Priority: Minor
>             Fix For: 1.1
>
>
> Currently, to perform a slice or count with a SliceRange, all of the SSTables 
> containing the requested row must be interrogated to determine if they 
> contain matching column names. SliceRange operations on wide rows which have 
> columns distributed across many SSTable files can turn into a relatively 
> expensive operation involving many disk seeks. On time-series use cases such 
> as the one highlighted below, most of these I/O operations serve only to 
> rule out SSTables that hold no matching columns.
>
> This optimization would require two values to be added to the SSTable header: 
> the minimum and maximum column names (according to the CF comparator) across 
> all rows (including tombstones) within the SSTable. For SliceRange 
> operations, SSTables containing rows with column names entirely outside of 
> the SliceRange would be completely eliminated without even a single disk 
> operation.
>
> Rationale: a very common use case for Cassandra is to use a column family to 
> store time-series data with a row for each metric and a column for each data 
> point with the column name being a TimeUUID. Data is typically read with a 
> bounded time range using a SliceRange. For the described use case, any given 
> SSTable within this ColumnFamily will have a tightly bounded range of minimum 
> and maximum column names across all rows, and there will be little overlap of 
> these column name ranges across different SSTable files. Append-mostly column 
> families with serial column names (as ordered by the comparator) on which 
> SliceRange operations are used can benefit from this optimization, and the 
> cost to use cases that do not fall within this group ranges from negligible 
> to non-existent.
>
> Caveat: even just one row tombstone would throw this off completely. From 
> what I can tell, there's no way to skip an SSTable that contains a row 
> tombstone, and there is also no current way to segregate tombstones. Stu had 
> some interesting ideas in CASSANDRA-2498 about segregating tombstones to 
> separate SSTables, but that's for a later time. The light at the end of the 
> tunnel is that users who benefit from this optimization either do not 
> perform deletes or do them in large batches. These same users would also be 
> able to use slice tombstones instead of row tombstones to preserve the 
> optimized behavior. A full row tombstone would nullify the minimum/maximum 
> values, indicating that the optimization can't be used.
>
> Question for the audience: should there be some kind of cap to the size of 
> the min/max column names kept in the header to keep the internal bearings 
> greased and everyone honest? Something like 256 bytes seems reasonable to me, 
> and we just disable the optimization if the column name size exceeds this 
> limit. Is there a way we could, say, store only the most significant 32 bytes 
> for each end of the name range? I can't think of any.
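
To make the proposal in the quoted description concrete, here is a minimal 
Python sketch of the header bookkeeping and the read-side check. It is not 
Cassandra code: SSTableBounds, may_contain, prune and MAX_NAME_BYTES are all 
hypothetical names made up for illustration. It assumes column names compare 
as raw bytes, which only stands in for the real CF comparator, and it uses 
the 256-byte cap floated in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MAX_NAME_BYTES = 256  # hypothetical cap, the figure floated in the description


@dataclass
class SSTableBounds:
    """Hypothetical header fields for one SSTable (illustration only)."""
    label: str
    min_col: Optional[bytes] = None
    max_col: Optional[bytes] = None
    usable: bool = True  # False after a row tombstone or an oversized name

    def add_column(self, name: bytes) -> None:
        """Widen the recorded range to include one column name."""
        if not self.usable:
            return
        if len(name) > MAX_NAME_BYTES:
            self.nullify()
            return
        # Plain bytes ordering stands in for the CF comparator (assumption).
        if self.min_col is None or name < self.min_col:
            self.min_col = name
        if self.max_col is None or name > self.max_col:
            self.max_col = name

    def add_row_tombstone(self) -> None:
        """A full row tombstone means this SSTable can never be skipped."""
        self.nullify()

    def nullify(self) -> None:
        """Drop the bounds so readers fall back to reading the SSTable."""
        self.usable = False
        self.min_col = None
        self.max_col = None


def may_contain(sst: SSTableBounds,
                start: Optional[bytes],
                finish: Optional[bytes]) -> bool:
    """Return False only when the bounds prove nothing lies in [start, finish].

    None for start or finish means that end of the slice is unbounded.
    """
    if not sst.usable or sst.min_col is None or sst.max_col is None:
        return True  # no trustworthy bounds: the SSTable must be read
    if start is not None and sst.max_col < start:
        return False
    if finish is not None and sst.min_col > finish:
        return False
    return True


def prune(sstables: Iterable[SSTableBounds],
          start: Optional[bytes],
          finish: Optional[bytes]) -> List[SSTableBounds]:
    """Keep only the SSTables a slice over [start, finish] has to touch."""
    return [s for s in sstables if may_contain(s, start, finish)]
```

With fixed-width big-endian timestamps as column names, an SSTable whose 
names all sort before the slice start is dropped without any I/O, while one 
that has seen a row tombstone is always kept. That second case is the one 
the comment above is worried about: every such SSTable is a case where the 
optimization buys nothing.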


        
