Optimize RangeSlice operations for append-mostly use cases
----------------------------------------------------------

                 Key: CASSANDRA-3581
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Rick Branson
            Assignee: Rick Branson
            Priority: Minor
             Fix For: 1.1



Currently, to perform a slice or count with a SliceRange, all of the SSTables 
containing the requested row must be interrogated to determine if they contain 
matching column names. SliceRange operations on wide rows whose columns are 
distributed across many SSTable files can become relatively expensive, 
requiring many disk seeks. In time-series use cases such as the one described 
below, most of these I/O operations serve only to rule out SSTables that hold 
no columns in the requested range.

This optimization would require two values to be added to the SSTable header: 
the minimum and maximum column names (according to the CF comparator) across 
all rows (including tombstones) within the SSTable. For SliceRange operations, 
SSTables whose column names all fall outside the SliceRange would be 
eliminated without a single disk operation.
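A minimal sketch of the proposed read-path check, assuming the two stored values are available per SSTable. The names `SSTableBounds`, `may_contain_slice` and `sstables_to_read` are hypothetical, and plain bytewise ordering stands in for the CF comparator (real comparators such as TimeUUIDType order differently). An SSTable is skipped only when its [min, max] column range cannot overlap the slice; missing bounds mean it must be read as before.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSTableBounds:
    """Hypothetical per-SSTable metadata proposed in this issue."""
    name: str
    min_column: Optional[bytes]  # None: bounds unknown, SSTable must be read
    max_column: Optional[bytes]


def may_contain_slice(meta: SSTableBounds, start: bytes, finish: bytes,
                      reversed_: bool = False) -> bool:
    """Return False only when no column in the SSTable can fall in the slice.

    Empty start/finish mean "unbounded", as with SliceRange. For a reversed
    slice, start is the upper bound and finish the lower bound.
    """
    if meta.min_column is None or meta.max_column is None:
        return True  # no usable bounds: fall back to interrogating the SSTable
    lo, hi = (finish, start) if reversed_ else (start, finish)
    if hi and meta.min_column > hi:
        return False  # every column in the SSTable sorts after the slice
    if lo and meta.max_column < lo:
        return False  # every column in the SSTable sorts before the slice
    return True


def sstables_to_read(metas: List[SSTableBounds], start: bytes, finish: bytes,
                     reversed_: bool = False) -> List[str]:
    # Only SSTables that survive the bounds check cost any disk I/O.
    return [m.name for m in metas if may_contain_slice(m, start, finish, reversed_)]


metas = [
    SSTableBounds("sstable-1", b"\x00\x01", b"\x00\x09"),
    SSTableBounds("sstable-2", b"\x00\x0a", b"\x00\x14"),
    SSTableBounds("sstable-3", None, None),  # bounds nullified, always read
]
print(sstables_to_read(metas, b"\x00\x0b", b"\x00\x0c"))
# ['sstable-2', 'sstable-3']
```

The check is deliberately conservative: it can only remove SSTables from consideration, never add wrong results, because a False answer requires the whole stored range to sit on one side of the slice.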

Rationale: a very common use case for Cassandra is to use a column family to 
store time-series data with a row for each metric and a column for each data 
point with the column name being a TimeUUID. Data is typically read with a 
bounded time range using a SliceRange. For the described use case, any given 
SSTable within this ColumnFamily will have a tightly bound range of minimum and 
maximum column names across all rows, and there will be little overlap of these 
column name ranges across different SSTable files. Append-mostly column 
families with serial column names (as ordered by the comparator) on which 
SliceRange operations are used can benefit from this optimization, and the cost 
to use cases outside this group ranges from negligible to nonexistent.
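A small simulation of the time-series case, assuming a single row whose columns are appended in time order and flushed to a new SSTable every fixed number of columns. As a stand-in for TimeUUID names, each column name here is an 8-byte big-endian timestamp, which sorts chronologically under bytewise comparison; `flush_ranges` and `count_touched` are hypothetical helpers. It shows why the per-SSTable ranges are tight and disjoint, so a bounded time slice overlaps very few SSTables.

```python
import struct
from typing import List, Tuple


def column_name(ts: int) -> bytes:
    # Stand-in for a TimeUUID: an 8-byte big-endian timestamp sorts
    # chronologically under a plain bytewise comparator.
    return struct.pack(">Q", ts)


def flush_ranges(timestamps: List[int], memtable_size: int) -> List[Tuple[bytes, bytes]]:
    """Simulate appends flushed every `memtable_size` columns; return each SSTable's (min, max)."""
    ranges = []
    for i in range(0, len(timestamps), memtable_size):
        names = [column_name(ts) for ts in timestamps[i:i + memtable_size]]
        ranges.append((min(names), max(names)))
    return ranges


def count_touched(ranges: List[Tuple[bytes, bytes]], start: bytes, finish: bytes) -> int:
    # An SSTable must be read only if its [min, max] overlaps [start, finish].
    return sum(1 for lo, hi in ranges if not (lo > finish or hi < start))


# One data point per second for 10,000 seconds, flushed every 1,000 points.
ranges = flush_ranges(list(range(10_000)), memtable_size=1_000)
touched = count_touched(ranges, column_name(9_500), column_name(9_600))
print(len(ranges), touched)  # 10 1
```

Without the stored bounds, the same query would have to consult all 10 SSTables; with them, only the one covering seconds 9,000 to 9,999 is read.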

Caveat: even just one row tombstone would throw this off completely. From what 
I can tell, there's no way to skip an SSTable that contains a row tombstone, 
and there is also no current way to segregate tombstones. Stu had some 
interesting ideas in CASSANDRA-2498 about segregating tombstones to separate 
SSTables, but that's for a later time. The light at the end of the tunnel is 
that users who benefit from this optimization either do not perform deletes 
or do them in large batches. These same users could also use slice tombstones 
instead of row tombstones to preserve the optimized behavior. A full 
row tombstone would nullify the minimum/maximum values, indicating that the 
optimization can't be used.
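A minimal sketch of how the bounds might be collected while an SSTable is written, following the tombstone rules described in the paragraph above. `ColumnNameBoundsCollector` and its methods are hypothetical, and bytewise ordering again stands in for the CF comparator. Column tombstones widen the bounds like live columns. A slice tombstone widens them to cover its own start and end, so the bounds stay usable. A full row tombstone applies to every column name, so it nullifies the bounds.

```python
from typing import Optional, Tuple


class ColumnNameBoundsCollector:
    """Hypothetical collector fed every column/tombstone written to one SSTable."""

    def __init__(self) -> None:
        self.min_name: Optional[bytes] = None
        self.max_name: Optional[bytes] = None
        self.has_row_tombstone = False

    def _extend(self, name: bytes) -> None:
        if self.min_name is None or name < self.min_name:
            self.min_name = name
        if self.max_name is None or name > self.max_name:
            self.max_name = name

    def add_column(self, name: bytes) -> None:
        # Live columns and column tombstones are treated alike.
        self._extend(name)

    def add_slice_tombstone(self, start: bytes, end: bytes) -> None:
        # A slice tombstone only affects names in [start, end], so covering
        # both ends keeps the bounds correct.
        self._extend(start)
        self._extend(end)

    def add_row_tombstone(self) -> None:
        # A row tombstone covers every column name: bounds become unusable.
        self.has_row_tombstone = True

    def bounds(self) -> Optional[Tuple[bytes, bytes]]:
        """Return (min, max), or None when the optimization cannot be used."""
        if self.has_row_tombstone or self.min_name is None or self.max_name is None:
            return None
        return (self.min_name, self.max_name)


c = ColumnNameBoundsCollector()
c.add_column(b"m")
c.add_slice_tombstone(b"c", b"f")
print(c.bounds())  # (b'c', b'm')
c.add_row_tombstone()
print(c.bounds())  # None
```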

Question for the audience: should there be some kind of cap on the size of the 
min/max column names kept in the header to keep the internal bearings greased 
and everyone honest? Something like 256 bytes seems reasonable to me, and we 
just disable the optimization if the column name size exceeds this limit. Is 
there a way we could, say, store only the most significant 32 bytes for each 
end of the name range? I can't think of any.
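A minimal sketch of the proposed cap, under one reading of the question: only the two names actually stored in the header count against the limit, so a long name that sorts between them is harmless. The 256-byte value is the suggestion from the question, not a settled figure; `capped_bounds` and `MAX_BOUND_NAME_BYTES` are hypothetical, and bytewise ordering stands in for the CF comparator. Returning None disables the optimization for that SSTable, which then falls back to being read on every slice.

```python
from typing import Iterable, Optional, Tuple

# Cap suggested in the question; 256 bytes is a proposal, not a decided value.
MAX_BOUND_NAME_BYTES = 256


def capped_bounds(names: Iterable[bytes],
                  cap: int = MAX_BOUND_NAME_BYTES) -> Optional[Tuple[bytes, bytes]]:
    """Return the (min, max) column names to store, or None to disable the optimization."""
    names = list(names)
    if not names:
        return None  # nothing written: no bounds to record
    lo, hi = min(names), max(names)
    if len(lo) > cap or len(hi) > cap:
        return None  # a bound is too large for the header: always read this SSTable
    return (lo, hi)


print(capped_bounds([b"a", b"x" * 300, b"z"]))  # (b'a', b'z')
print(capped_bounds([b"a", b"z" * 300]))        # None
```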

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
