[ https://issues.apache.org/jira/browse/CASSANDRA-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163978#comment-13163978 ]
Rick Branson edited comment on CASSANDRA-3581 at 12/6/11 11:47 PM:
-------------------------------------------------------------------

{quote}
I don't see any drawback to putting this [the minimum/maximum column names] in the metadata/statistics component, which would keep backwards compatibility headaches down.
{quote}

This is just my naiveté showing through, as I wasn't aware of that component. From a conceptual perspective, any metadata storage for the SSTable would work, and since this is purely an optional optimization, that makes sense.

{quote}
Right, and the problem with that is you can't know if the row has a tombstone without looking up the row and reading its header, which is a large part of the overhead of reading the entire row. So unless we also add a "sstable contains row tombstones" flag to our metadata we're screwed. Tracking that flag is not a problem per se, but it would narrow the usefulness of the optimization significantly if it can only be applied when there have been no row deletes in the entire sstable.
{quote}

Nullifying the minimum and maximum column name fields has the effect of flagging the SSTable as containing row tombstones.

> Optimize RangeSlice operations for append-mostly use cases
> ----------------------------------------------------------
>
>          Key: CASSANDRA-3581
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-3581
>      Project: Cassandra
>   Issue Type: Improvement
>     Reporter: Rick Branson
>     Assignee: Rick Branson
>     Priority: Minor
>      Fix For: 1.1
>
>
> Currently, to perform a slice or count with a SliceRange, all of the SSTables containing the requested row must be interrogated to determine whether they contain matching column names. SliceRange operations on wide rows whose columns are distributed across many SSTable files can turn into a relatively expensive operation involving many disk seeks. In time-series use cases such as the one highlighted below, most of this I/O ends up merely ruling SSTables out.
>
> This optimization would require two values to be added to the SSTable header: the minimum and maximum column names (according to the CF comparator) across all rows (including tombstones) within the SSTable. For SliceRange operations, SSTables containing rows with column names entirely outside of the SliceRange would be eliminated without a single disk operation.
>
> Rationale: a very common use case for Cassandra is to use a column family to store time-series data, with a row for each metric and a column for each data point whose column name is a TimeUUID. Data is typically read with a bounded time range using a SliceRange.
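A minimal sketch of the two ideas above (illustrative only, not Cassandra's actual code; the class and method names are invented, real Cassandra orders column names via the CF comparator rather than raw byte comparison, and the bounds would live in the metadata/statistics component):

```python
class SSTableStats:
    """Hypothetical per-SSTable metadata holding column-name bounds."""

    def __init__(self, min_column=None, max_column=None):
        # Smallest/largest column names (per the comparator) across all
        # rows in the SSTable, or None when the bounds are unknown.
        self.min_column = min_column
        self.max_column = max_column

    def nullify_bounds(self):
        # Written when a row tombstone lands in the SSTable: losing the
        # bounds doubles as the "contains row tombstones" flag, so no
        # separate flag field is needed.
        self.min_column = None
        self.max_column = None

    def can_skip_for_slice(self, slice_start, slice_finish):
        # Nullified bounds mean the SSTable may hold row tombstones and
        # must always be read.
        if self.min_column is None or self.max_column is None:
            return False
        # Skip only when the SSTable's column-name range lies entirely
        # before or entirely after the requested slice.
        return self.max_column < slice_start or self.min_column > slice_finish


# Time-series style column names (ISO dates sort correctly as bytes).
october = SSTableStats(b"2011-10-01", b"2011-10-31")
november = SSTableStats(b"2011-11-03", b"2011-11-28")

# Slicing November: the all-October SSTable is eliminated without I/O.
assert october.can_skip_for_slice(b"2011-11-01", b"2011-11-30")
assert not november.can_skip_for_slice(b"2011-11-01", b"2011-11-30")

# A row tombstone nullifies the bounds, disabling the optimization.
october.nullify_bounds()
assert not october.can_skip_for_slice(b"2011-11-01", b"2011-11-30")
```

This mirrors the trade-off discussed above: one row tombstone anywhere in the SSTable forces every future slice to read it, which is why the optimization mainly pays off for append-mostly, delete-free workloads.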
> For the described use case, any given SSTable within this ColumnFamily will have a tightly bounded range of minimum and maximum column names across all rows, and there will be little overlap between these column name ranges across different SSTable files. Append-mostly column families with serial column names (as ordered by the comparator) on which SliceRange operations are used can benefit from this optimization, and the cost to use cases that do not fall within this group ranges from negligible to non-existent.
>
> Caveat: even a single row tombstone would throw this off completely. From what I can tell, there's no way to skip an SSTable that contains a row tombstone, and there is also no current way to segregate tombstones. Stu had some interesting ideas in CASSANDRA-2498 about segregating tombstones into separate SSTables, but that's for a later time. The light at the end of the tunnel is that the users who benefit from this optimization either do not perform deletes or do them in large batches. These same users would also be able to use slice tombstones instead of row tombstones to preserve the optimized behavior. A full row tombstone would nullify the minimum/maximum values, indicating that the optimization can't be used.
>
> Question for the audience: should there be some kind of cap on the size of the min/max column names kept in the header, to keep the internal bearings greased and everyone honest? Something like 256 bytes seems reasonable to me, and we would just disable the optimization if the column name size exceeds this limit. Is there a way we could, say, store only the most significant 32 bytes for each end of the name range? I can't think of one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira