[jira] Created: (CASSANDRA-1601) Refactor index definitions

Stu Hood (JIRA) Sun, 10 Oct 2010 22:49:00 -0700

Refactor index definitions
--------------------------

                 Key: CASSANDRA-1601
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
             Project: Cassandra
          Issue Type: Improvement
          Components: API
            Reporter: Stu Hood
            Priority: Critical
             Fix For: 0.7.0



h3. Overview
There are a few considerations for defining secondary indexes and row 
validation that I don't think have been brought up yet. While the interface is 
still malleable before 0.7.0, we should attempt to make changes that allow for 
forward compatibility of index/validator schemas. This is an umbrella ticket 
for suggesting and debating the changes; separate tickets should be opened for 
quick improvements that can be made before 0.7.0.

----

h3. Index output types
The output (queryable) data from an indexing operation is what actually goes in 
the index. For a particular row, the output can be either _single-valued_, 
_multi-valued_ or _compound_:
* Single-valued
** Implemented in trunk (special case of multi-valued)
* Multi-valued
** Multiple index values _of the same type_ can match a single row
** Row probably contains a list/set (perhaps in a supercolumn)
* Compound
** Multiple base properties concatenated as one index entry 
** Different validators/comparators for each component
** (Given the simplicity of performing boolean operations on CASSANDRA-1472 
indexes, compound local indexes are unlikely to ever be worthwhile, but 
compound distributed indexes will be: see comments on CASSANDRA-1599)
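
To make the three output shapes concrete, here is a minimal Python sketch. The row model (a dict of column name to value) and all function names are hypothetical and are not Cassandra's API; the sketch only shows which index entries a single row might produce under each output type.

```python
# Hypothetical model: a row is a dict mapping column names to values.
# None of these names exist in Cassandra; they only illustrate the output shapes.

def single_valued(row, column):
    """One index entry per row: the value of a single column."""
    return [row[column]] if column in row else []

def multi_valued(row, columns):
    """Several entries of the same type, all pointing at the same row."""
    return [row[c] for c in columns if c in row]

def compound(row, columns):
    """One entry built from several base properties, kept as a tuple so each
    component can retain its own type (and, by assumption, its own comparator)."""
    if not all(c in row for c in columns):
        return []
    return [tuple(row[c] for c in columns)]

row = {"state": "UT", "birth_year": 1975, "tag1": "a", "tag2": "b"}
single = single_valued(row, "state")
multi = multi_valued(row, ["tag1", "tag2"])
comp = compound(row, ["state", "birth_year"])
```

In this sketch the single-valued case is just the multi-valued case with one column, which matches the "special case" remark in the list above.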

h3. Index input types
The other end of indexing is selection of values from a row to be indexed. 
Selection can correspond directly to our current {{db.filter.*}} 
implementations, and may be best implemented by specifying the validator/index 
using the same Thrift objects you would use for a similar query:
* Name selection
** Implemented in trunk, but should probably just be a special case of list 
selection below
** Corresponds to {{db.filter.NamesQueryFilter}} of size 1
* List selection
** Should specify a list of columns whose values must all be of the same 
type, as defined by the Validator
** Corresponds to {{db.filter.NamesQueryFilter}}
* Range (prefix?) selection
** Subsets of a row may be interesting for indexing
** Range corresponds to {{db.filter.SliceQueryFilter}}
*** (A Prefix might actually be more useful for indexing, but is better 
implemented by indexing an arbitrarily nested row)
** Open question: might the ability to index only the 'top N values' from a row 
be useful? If so, this selector should allow N to be specified as it would be 
for a slice.
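
The selection types above could be sketched roughly as follows. This is an illustration only: the row is modelled as a plain dict, the function names are invented, and "top N" is assumed to mean the first N columns in sorted name order (as a slice count would behave), which is one possible reading of the open question.

```python
# Hypothetical selectors over a row modelled as a dict of column name -> value.
# They loosely mirror names and slice queries; none of this is Cassandra's API.

def select_names(row, names):
    """List selection: pick the named columns that exist, in the order given."""
    return [(n, row[n]) for n in names if n in row]

def select_name(row, name):
    """Name selection as the special case of a list of size 1."""
    return select_names(row, [name])

def select_range(row, start, finish, count=None):
    """Range selection: columns with start <= name <= finish in sorted order,
    optionally limited to the first `count` columns (the 'top N' question)."""
    selected = [(n, row[n]) for n in sorted(row) if start <= n <= finish]
    return selected if count is None else selected[:count]

row = {"email": "x@example.org", "score:001": 9, "score:002": 7, "score:003": 4}
by_name = select_name(row, "email")
by_range = select_range(row, "score:", "score:~", count=2)
```

Here `by_name` holds the single `email` column and `by_range` holds the first two `score:` columns.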

h3. Supercolumns/arbitrary-nesting
Another consideration is that we should be able to support indexing and 
validation of supercolumns (and hence, arbitrarily nested rows). Since the 
selection of columns to index is essentially the same as the selection of 
columns to return for a query, this can probably mirror (and suggest 
improvements to) our query API.
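
One way to picture selection over arbitrarily nested rows is a path of names walked level by level. The sketch below assumes a nested row is a dict of dicts and uses an invented `select_path` helper; it is not how Cassandra stores or queries supercolumns.

```python
# Hypothetical: a nested row is a dict whose values are either leaf values or
# further dicts (a supercolumn is then one level of nesting).

def select_path(row, path):
    """Walk `path` (a sequence of names) into the nested row; return the value
    or subtree found there, or None if any step is missing."""
    node = row
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node

row = {"address": {"home": {"city": "Austin"}, "work": {"city": "Dallas"}}}
home_city = select_path(row, ["address", "home", "city"])
missing = select_path(row, ["address", "school", "city"])
```

The same path could serve both as a query selector and as an index or validator input, which is the symmetry the paragraph above points at.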

h3. UDFs
This is obviously still an open area, but user-defined indexing functions are 
essentially a transform between the _input_ and _output_ (as defined above), 
which would normally have the same structure. Leaving room for UDFs in our 
index definitions makes sense, and will likely lead to a much more general and 
elegant design.
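
As a rough illustration of a UDF sitting between input and output, the sketch below treats an index definition as a selector plus an optional transform. The names `build_entries`, `identity` and `lowercase_words` are hypothetical, and the default transform leaves the selected values unchanged, which corresponds to the case with no UDF.

```python
# Hypothetical index definition: selector -> optional transform -> index entries.

def identity(values):
    """Default transform: input and output have the same structure."""
    return list(values)

def build_entries(row, selector, transform=identity):
    """Select values from the row, then let the transform shape the entries."""
    return transform(selector(row))

def lowercase_words(values):
    """Example UDF: one selected value becomes several lowercase word entries."""
    out = []
    for v in values:
        out.extend(v.lower().split())
    return out

row = {"title": "Refactor Index Definitions"}
selector = lambda r: [r["title"]] if "title" in r else []
plain = build_entries(row, selector)
tokens = build_entries(row, selector, lowercase_words)
```

With the example UDF, a name selection (single input) yields a multi-valued output, so the transform is what decouples the input type from the output type.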
