Promote row index
-----------------

                 Key: CASSANDRA-2319
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Stu Hood
            Assignee: Stu Hood
             Fix For: 0.8


The row index contains entries for configurably sized blocks of a wide row. For 
a row of appreciable size, the row index ends up directing the third seek (1. 
index, 2. row index, 3. content) to a position near the first column of a scan.

Since the row index is always used for wide rows, and since it contains 
information that tells us whether or not the 3rd seek is necessary (the column 
range or name we are trying to slice may not exist in a given sstable), 
promoting the row index into the sstable index would allow us to drop the 
maximum number of seeks for wide rows back to 2, and, more importantly, would 
allow sstables to be eliminated using only the index.
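
As a rough illustration of the elimination step (a sketch only, not Cassandra 
code), assume each promoted row index entry records the first and last column 
name of one block plus the block's offset. The names BlockEntry, 
first_overlapping_block, sstables_needing_data_seek and the sample data are 
hypothetical, and column names are compared as raw bytes for simplicity:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BlockEntry:
    """Hypothetical row index entry describing one block of a wide row."""
    first_name: bytes  # first column name stored in the block
    last_name: bytes   # last column name stored in the block
    offset: int        # position of the block within the row's data


def first_overlapping_block(entries: List[BlockEntry],
                            start: bytes,
                            finish: bytes) -> Optional[BlockEntry]:
    """Return the first block whose name range intersects [start, finish]."""
    for entry in entries:
        if entry.first_name <= finish and entry.last_name >= start:
            return entry
    return None  # no block can hold the slice: the data seek is unnecessary


def sstables_needing_data_seek(promoted: Dict[str, List[BlockEntry]],
                               start: bytes,
                               finish: bytes) -> List[str]:
    """Keep only the sstables whose promoted entries overlap the slice."""
    return [name for name, entries in promoted.items()
            if first_overlapping_block(entries, start, finish) is not None]


# Assumed sample data: one wide row spread over two sstables, with
# zero-padded timestamps as column names.
PROMOTED_INDEX = {
    "sstable-1": [BlockEntry(b"0000", b"0099", 0),
                  BlockEntry(b"0100", b"0199", 4096)],
    "sstable-2": [BlockEntry(b"0200", b"0299", 0),
                  BlockEntry(b"0300", b"0399", 4096)],
}

recent = sstables_needing_data_seek(PROMOTED_INDEX, b"0350", b"0399")
```

Here a slice over the most recent columns selects only "sstable-2", and the 
matching entry's offset gives the position for the one remaining seek into the 
data file.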

An example use case that benefits greatly from this change is time series data 
in wide rows, where data is appended to the beginning or end of the row. Our 
existing compaction strategy gets lucky and clusters the oldest data in the 
oldest sstables: for queries to recently appended data, we would be able to 
eliminate wide rows using only the sstable index, rather than needing to seek 
into the data file to determine that they contain nothing of interest. For 
narrow rows, this change would have no effect, as they will not reach the 
threshold for indexing anyway.

A first cut design for this change would look very similar to the file format 
design proposed on #674 (http://wiki.apache.org/cassandra/FileFormatDesignDoc): 
row keys clustered, column names clustered, and offsets clustered and delta 
encoded.
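
To make "clustered and delta encoded" concrete, here is a minimal sketch of one 
possible reading, not the encoding specified in the design doc: column names 
and offsets are split into separate arrays, and each offset is stored as its 
difference from the previous one. The function names and sample values are 
assumptions made for illustration:

```python
from typing import List, Tuple


def cluster(entries: List[Tuple[bytes, int]]) -> Tuple[List[bytes], List[int]]:
    """Split (column name, offset) pairs into one array per field."""
    names = [name for name, _ in entries]
    offsets = [offset for _, offset in entries]
    return names, offsets


def delta_encode(offsets: List[int]) -> List[int]:
    """Store each offset as its difference from the previous offset."""
    deltas = []
    previous = 0
    for offset in offsets:
        deltas.append(offset - previous)
        previous = offset
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild absolute offsets by accumulating the deltas."""
    offsets = []
    total = 0
    for delta in deltas:
        total += delta
        offsets.append(total)
    return offsets


# Assumed sample entries: block start names and their byte offsets.
names, offsets = cluster([(b"0000", 0), (b"0100", 4096),
                          (b"0200", 8200), (b"0300", 12288)])
deltas = delta_encode(offsets)  # [0, 4096, 4104, 4088]
```

Because blocks are of similar size, the deltas stay small even when the 
absolute offsets grow large, which is what makes them cheaper to store with a 
variable-length integer encoding.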

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
