[ https://issues.apache.org/jira/browse/CASSANDRA-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1092:
--------------------------------

    Attachment:     (was: 0004-Make-CompactionIterator-extend-SliceMergingIterator.patch)

> Add Slice API, and replace CF and SC for compaction reads
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-1092
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1092
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Critical
>             Fix For: 0.8
>
>         Attachments: 0001-Add-Slice-and-ColumnKey.patch
>
>
> Currently, we have two read paths for fetching Columns from disk: the 
> io.sstable.SSTableScanner interface, and the db.filter.SSTable*Iterator 
> interfaces. The latter is intended for iterating over the IColumns contained 
> in a single row, while the former iterates over entire rows at once (although 
> SSTableScanner supports returning a db.filter implementation per row).
>
> While this separation has allowed for highly optimized pushdown filtering in
> the db.filter classes, the lack of abstraction makes it impossible to reason
> about changes to the file format, and the current design depends on random
> access into the file.
> Additionally, the separation of 'row iteration' from 'icolumn iteration' 
> ignores the fact that super columns contain an additional level of columns 
> that could be iterated. Rather than introducing a third level of iterators 
> that deals with iterating over subcolumns, a unified interface for iterating 
> over arbitrarily nested columns would clarify the code, and open the door to 
> many interesting possibilities (see CASSANDRA-998).
>
> This ticket deals with implementing an initial cut of the unified interface,
> which reuses the "Scanner" name. The org.apache.cassandra.Scanner interface 
> is essentially an extended iterator, which is further enhanced by 
> org.apache.cassandra.SeekableScanner to add operations that reposition the 
> iterator. By the end of CASSANDRA-998, SeekableScanner will have 
> implementations for the Memtable and SSTables, allowing for uniform iteration 
> of all sources.
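
To make the Scanner/SeekableScanner relationship described above concrete,
here is a minimal sketch. The description only says that Scanner is an
extended iterator and that SeekableScanner adds repositioning, so the method
name seekTo, the generic parameters, and the ListScanner class are all
hypothetical and are not taken from the patch. A sorted in-memory list
stands in for a Memtable or SSTable.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ScannerSketch {
    /** Hypothetical: an "extended iterator" that can also be closed. */
    interface Scanner<T> extends Iterator<T>, Closeable {
        @Override
        void close(); // narrowed: no checked exception in this sketch
    }

    /** Hypothetical: adds an operation that repositions the iterator. */
    interface SeekableScanner<K, T> extends Scanner<T> {
        /** Moves to the first element >= key; returns false if none exists. */
        boolean seekTo(K key);
    }

    /** Stand-in source: a sorted list of strings (assumption, not the patch). */
    static final class ListScanner implements SeekableScanner<String, String> {
        private final List<String> sorted;
        private int pos = 0;

        ListScanner(List<String> sorted) {
            this.sorted = sorted;
        }

        @Override
        public boolean hasNext() {
            return pos < sorted.size();
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return sorted.get(pos++);
        }

        @Override
        public boolean seekTo(String key) {
            int i = Collections.binarySearch(sorted, key);
            // A negative result encodes the insertion point as -(point) - 1.
            pos = i >= 0 ? i : -(i + 1);
            return hasNext();
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory list.
        }
    }

    public static void main(String[] args) {
        ListScanner scanner = new ListScanner(Arrays.asList("a", "c", "e"));
        scanner.seekTo("b");
        System.out.println(scanner.next()); // prints "c"
        scanner.close();
    }
}
```

Because callers only see SeekableScanner, the same loop could in principle
run over any source that implements it, which is the uniformity the
description is aiming for.
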
>
> The object that a Scanner iterates over is org.apache.cassandra.Slice, which
> is immutable, and contains parent deletion Metadata 
> (markedForDeleteAt/localDeletionTime: like a ColumnFamily or SuperColumn). 
> Since only the highest markedForDeleteAt or localDeletionTime matters for 
> nested columns, Slices simplify storage of this data by storing a single 
> value for all parents. The Metadata in a Slice is bounded at each end by an
> org.apache.cassandra.db.ColumnKey, which is a compound key representing the 
> full path to a column, or a parent boundary.
>
> The ColumnKeys in a Slice make it possible to delete column name ranges. By
> convention (in this patch), the ColumnKeys in a Slice always share parents. 
> In the future, if we wanted to support range deletes for rows or 
> supercolumns, it would be trivial to remove that assumption.
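
The Slice, Metadata, and ColumnKey relationship described above could be
sketched roughly as follows. This is an illustration under several
assumptions, not the patch's code: names are modeled as Strings rather than
bytes, both bounds are treated as inclusive, and a column counts as deleted
when its timestamp is less than or equal to markedForDeleteAt. The deletes
method and the max helper are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class SliceSketch {
    /** Parent deletion info, using the two field names from the description. */
    static final class Metadata {
        final long markedForDeleteAt;
        final int localDeletionTime;

        Metadata(long markedForDeleteAt, int localDeletionTime) {
            this.markedForDeleteAt = markedForDeleteAt;
            this.localDeletionTime = localDeletionTime;
        }

        /** Only the highest values matter, so all parents collapse into one. */
        static Metadata max(Metadata a, Metadata b) {
            return new Metadata(Math.max(a.markedForDeleteAt, b.markedForDeleteAt),
                                Math.max(a.localDeletionTime, b.localDeletionTime));
        }
    }

    /** Compound key: the full path to a column (row, supercolumn, column). */
    static final class ColumnKey implements Comparable<ColumnKey> {
        final List<String> path;

        ColumnKey(String... path) {
            this.path = Arrays.asList(path.clone());
        }

        @Override
        public int compareTo(ColumnKey other) {
            int shared = Math.min(path.size(), other.path.size());
            for (int i = 0; i < shared; i++) {
                int c = path.get(i).compareTo(other.path.get(i));
                if (c != 0) return c;
            }
            // A shorter path (a parent boundary) sorts before its children.
            return Integer.compare(path.size(), other.path.size());
        }
    }

    /** Immutable: one Metadata value bounded by a ColumnKey at each end. */
    static final class Slice {
        final Metadata meta;
        final ColumnKey begin;
        final ColumnKey end;

        Slice(Metadata meta, ColumnKey begin, ColumnKey end) {
            this.meta = meta;
            this.begin = begin;
            this.end = end;
        }

        /** Hypothetical range-delete check (inclusive bounds assumed). */
        boolean deletes(ColumnKey key, long timestamp) {
            return begin.compareTo(key) <= 0
                && key.compareTo(end) <= 0
                && timestamp <= meta.markedForDeleteAt;
        }
    }

    public static void main(String[] args) {
        Metadata row = new Metadata(100L, 1000);
        Metadata superColumn = new Metadata(250L, 2000);
        // Both bounds share the parents "row1"/"sc1", per the convention above.
        Slice slice = new Slice(Metadata.max(row, superColumn),
                                new ColumnKey("row1", "sc1", "b"),
                                new ColumnKey("row1", "sc1", "f"));
        System.out.println(slice.deletes(new ColumnKey("row1", "sc1", "c"), 200L)); // true
        System.out.println(slice.deletes(new ColumnKey("row1", "sc1", "g"), 200L)); // false
    }
}
```
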
>
> SSTables and Memtables can be abstracted into "sorted lists of Slices" which
> are individually non-intersecting. Client reads and compactions can use 
> org.apache.cassandra.SliceMergingIterator to merge the Slices from multiple 
> Scanners into a new Scanner which is globally non-intersecting. This process 
> will be at the heart of any read from a ColumnFamilyStore by the end of
> CASSANDRA-998, but this issue only uses SliceMergingIterator at the core of
> compaction, by making CompactionIterator a subclass of SliceMergingIterator.
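
The merge described above, which turns several individually
non-intersecting lists of Slices into one globally non-intersecting list,
can be illustrated with a simplified sketch. It is not the
SliceMergingIterator from the patch: it is eager rather than streaming, it
uses a plain boundary sweep, it models keys as ints with half-open ranges,
and each Slice carries only a markedForDeleteAt value (no columns). All of
those simplifications are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SliceMergeSketch {
    /** Simplified slice: half-open range [begin, end) and one timestamp. */
    static final class Slice {
        final int begin;
        final int end;
        final long markedForDeleteAt;

        Slice(int begin, int end, long markedForDeleteAt) {
            this.begin = begin;
            this.end = end;
            this.markedForDeleteAt = markedForDeleteAt;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")@" + markedForDeleteAt;
        }
    }

    /** Merges the sources into sorted output in which no two slices overlap. */
    static List<Slice> merge(List<List<Slice>> sources) {
        // Every slice boundary is a point where the merged metadata may change.
        TreeSet<Integer> bounds = new TreeSet<Integer>();
        for (List<Slice> source : sources) {
            for (Slice s : source) {
                bounds.add(s.begin);
                bounds.add(s.end);
            }
        }

        List<Slice> out = new ArrayList<Slice>();
        Integer prev = null;
        for (Integer bound : bounds) {
            if (prev != null) {
                boolean covered = false;
                long max = Long.MIN_VALUE;
                // Keep the highest timestamp among slices covering [prev, bound).
                for (List<Slice> source : sources) {
                    for (Slice s : source) {
                        if (s.begin <= prev && bound <= s.end) {
                            covered = true;
                            max = Math.max(max, s.markedForDeleteAt);
                        }
                    }
                }
                if (covered) append(out, new Slice(prev, bound, max));
            }
            prev = bound;
        }
        return out;
    }

    /** Appends next, coalescing it with an adjacent slice of equal timestamp. */
    private static void append(List<Slice> out, Slice next) {
        if (!out.isEmpty()) {
            Slice last = out.get(out.size() - 1);
            if (last.end == next.begin && last.markedForDeleteAt == next.markedForDeleteAt) {
                out.set(out.size() - 1, new Slice(last.begin, next.end, next.markedForDeleteAt));
                return;
            }
        }
        out.add(next);
    }

    public static void main(String[] args) {
        List<List<Slice>> sources = new ArrayList<List<Slice>>();
        sources.add(Arrays.asList(new Slice(0, 5, 1L), new Slice(8, 10, 2L)));
        sources.add(Arrays.asList(new Slice(3, 9, 5L)));
        System.out.println(merge(sources)); // [[0,3)@1, [3,9)@5, [9,10)@2]
    }
}
```

A streaming version would consume each Scanner lazily instead of rescanning
every source per boundary; the sweep here trades efficiency for brevity.
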

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
