[ https://issues.apache.org/jira/browse/CASSANDRA-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163981#comment-13163981 ]
Jonathan Ellis commented on CASSANDRA-3581:
-------------------------------------------

bq. Nullifying the minimum & maximum column name fields has the effect of flagging the SSTable as containing row tombstones.

Okay, but my main concern isn't w/ how to implement that but that it probably tips the balance to "not worth bothering with the complexity and overhead of sorting sstables by min/max column name and doing the pruning dance" if we can apply it in such a small number of cases.

> Optimize RangeSlice operations for append-mostly use cases
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-3581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Rick Branson
>            Assignee: Rick Branson
>            Priority: Minor
>             Fix For: 1.1
>
>
> Currently, to perform a slice or count with a SliceRange, all of the SSTables
> containing the requested row must be interrogated to determine if they
> contain matching column names. SliceRange operations on wide rows which have
> columns distributed across many SSTable files can turn into a relatively
> expensive operation involving many disk seeks. On time-series use cases such
> as the one highlighted below, most of these I/O operations end up just
> eliminating most of the SSTables.
> This optimization would require two values to be added to the SSTable header:
> the minimum and maximum column names (according to the CF comparator) across
> all rows (including tombstones) within the SSTable. For SliceRange
> operations, SSTables containing rows with column names entirely outside of
> the SliceRange would be completely eliminated without even a single disk
> operation.
> Rationale: a very common use case for Cassandra is to use a column family to
> store time-series data with a row for each metric and a column for each data
> point with the column name being a TimeUUID. Data is typically read with a
> bounded time range using a SliceRange.
> For the described use case, any given
> SSTable within this ColumnFamily will have a tightly bounded range of minimum
> and maximum column names across all rows, and there will be little overlap of
> these column name ranges across different SSTable files. Append-mostly column
> families with serial column names (as ordered by the comparator) on which
> SliceRange operations are used can benefit from this optimization, and the
> cost to use cases that do not fall within this group ranges from negligible to
> non-existent.
> Caveat: even just one row tombstone would throw this off completely. From
> what I can tell, there's no way to skip an SSTable that contains a row
> tombstone, and there is also no current way to segregate tombstones. Stu had
> some interesting ideas in CASSANDRA-2498 about segregating tombstones to
> separate SSTables, but that's for a later time. The light at the end of the
> tunnel is that users who benefit from this optimization either do not
> perform deletes or do them in large batches. These same users would also be
> able to use slice tombstones instead of row tombstones to preserve the
> optimized behavior. A full row tombstone would nullify the minimum/maximum
> values, indicating that the optimization can't be used.
> Question for the audience: should there be some kind of cap to the size of
> the min/max column names kept in the header to keep the internal bearings
> greased and everyone honest? Something like 256 bytes seems reasonable to me,
> and we just disable the optimization if the column name size exceeds this
> limit. Is there a way we could, say, store only the most significant 32 bytes
> for each end of the name range? I can't think of any.
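
For anyone following along, the pruning idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cassandra code: `SSTableBounds`, `may_contain`, `prune`, and `build` are hypothetical names, column names are compared as raw bytes (standing in for the CF comparator), slices are forward-only, and an empty `start` or `finish` is treated as unbounded. The 256-byte cap is the figure floated in the question above, not a settled value.

```python
from dataclasses import dataclass
from typing import List, Optional

# Cap floated in the issue description; a placeholder, not a settled value.
MAX_TRACKED_NAME_BYTES = 256


@dataclass
class SSTableBounds:
    """Hypothetical per-SSTable header fields (not Cassandra's real format)."""
    label: str
    min_column: Optional[bytes] = None
    max_column: Optional[bytes] = None
    usable: bool = True  # False once the bounds can no longer be trusted

    def observe_column(self, name: bytes) -> None:
        """Widen the bounds as a column is written into the SSTable."""
        if not self.usable:
            return
        if len(name) > MAX_TRACKED_NAME_BYTES:
            self.invalidate()  # oversized name: give up on the optimization
            return
        if self.min_column is None or name < self.min_column:
            self.min_column = name
        if self.max_column is None or name > self.max_column:
            self.max_column = name

    def observe_row_tombstone(self) -> None:
        """A full row tombstone nullifies the min/max values."""
        self.invalidate()

    def invalidate(self) -> None:
        self.usable = False
        self.min_column = None
        self.max_column = None


def may_contain(sstable: SSTableBounds, start: bytes, finish: bytes) -> bool:
    """Return False only when the SSTable provably lies outside [start, finish]."""
    if not sstable.usable or sstable.min_column is None:
        return True  # no trustworthy bounds, so the SSTable must be read
    if start and sstable.max_column < start:
        return False  # everything in this SSTable sorts before the slice
    if finish and sstable.min_column > finish:
        return False  # everything in this SSTable sorts after the slice
    return True


def prune(sstables: List[SSTableBounds], start: bytes, finish: bytes) -> List[SSTableBounds]:
    """Keep only the SSTables that still have to be interrogated."""
    return [s for s in sstables if may_contain(s, start, finish)]


def build(label: str, names: List[bytes], row_tombstone: bool = False) -> SSTableBounds:
    """Helper for the example: simulate flushing an SSTable."""
    s = SSTableBounds(label)
    for n in names:
        s.observe_column(n)
    if row_tombstone:
        s.observe_row_tombstone()
    return s


sstables = [
    build("nov", [b"2011-11-01", b"2011-11-30"]),
    build("dec", [b"2011-12-01", b"2011-12-31"]),
    build("deleted", [b"2011-10-15"], row_tombstone=True),
]
survivors = [s.label for s in prune(sstables, b"2011-12-05", b"2011-12-10")]
print(survivors)  # ['dec', 'deleted']
```

The "deleted" SSTable survives pruning even though its only column is far outside the slice, which is the concern raised in the comment: any SSTable holding a row tombstone has to be read regardless, so the benefit depends on how rare those are in practice.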