[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522136#comment-14522136 ]

Matthias Broecheler commented on CASSANDRA-6477:
------------------------------------------------

I think the discussion around materialized views (which I would love to see in 
C* at some point) is distracting from what this ticket is really about: closing 
a hole in the indexing story for C*.

In RDBMSs (and pretty much all other database systems), indexes are used to 
efficiently retrieve a set of rows identified by their column values in a 
particular order, at the expense of write performance. By design, C* builds a 
100% selectivity index on the primary key. In addition, one can install 
secondary indexes. Those secondary indexes are useful up to a certain 
selectivity %. Beyond that threshold, it becomes more efficient to maintain the 
index as a global distributed hash map rather than as a local index on each 
node. And that's the hole in the indexing story, because indexes of that type 
must currently be maintained by the application.
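To make the local-versus-global distinction concrete, here is a toy, single-process model of a cluster answering "which rows have this value?" for a high-cardinality column. It is a sketch under stated assumptions, not how C* implements anything: the cluster size, the crc32-based `owner` placement (a stand-in for the token ring), and the class names `LocalIndexCluster` and `GlobalIndexCluster` are all hypothetical, and replication and consistency are ignored entirely.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 8  # hypothetical cluster size


def owner(key: str) -> int:
    """Pick the node that owns a key by hashing it (stand-in for a token ring)."""
    return crc32(key.encode()) % NUM_NODES


class LocalIndexCluster:
    """Each node indexes only the rows it stores, like a C* secondary index."""

    def __init__(self) -> None:
        self.nodes = [defaultdict(set) for _ in range(NUM_NODES)]

    def insert(self, pk: str, value: str) -> None:
        # The index entry lives on the same node as the base row.
        self.nodes[owner(pk)][value].add(pk)

    def lookup(self, value: str) -> tuple[set[str], set[int]]:
        # The coordinator cannot tell which nodes hold matches, so it asks all.
        contacted = set(range(NUM_NODES))
        hits: set[str] = set()
        for node in self.nodes:
            hits |= node.get(value, set())
        return hits, contacted


class GlobalIndexCluster:
    """The index itself is a distributed hash map keyed by the indexed value."""

    def __init__(self) -> None:
        self.nodes = [defaultdict(set) for _ in range(NUM_NODES)]

    def insert(self, pk: str, value: str) -> None:
        # The index entry lives on the node owning the *indexed value*,
        # which is generally not the node owning the base row.
        self.nodes[owner(value)][value].add(pk)

    def lookup(self, value: str) -> tuple[set[str], set[int]]:
        index_node = owner(value)
        hits = set(self.nodes[index_node].get(value, set()))
        # One index node, plus the owners of the matching base rows.
        contacted = {index_node} | {owner(pk) for pk in hits}
        return hits, contacted


# Usage: an email column, where each value identifies exactly one row.
local, glob = LocalIndexCluster(), GlobalIndexCluster()
for i in range(1000):
    pk, email = f"user{i}", f"user{i}@example.com"
    local.insert(pk, email)
    glob.insert(pk, email)

hits_l, contacted_l = local.lookup("user42@example.com")
hits_g, contacted_g = glob.lookup("user42@example.com")
# The local index contacts all 8 nodes; the global index contacts at most 2.
print(len(contacted_l), len(contacted_g))
```

The flip side shows up on writes: in the global model a row's index entry usually lands on a different node than the row itself, so most writes span two nodes, and keeping the two in (eventual) agreement is exactly the infrastructure problem this comment is about.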

I am stating the obvious here to point out that the first problem is to provide 
the infrastructure to create that second class of indexes while ensuring some 
form of (eventual) consistency. Much like with 2i, once that is in place one 
can utilize the infrastructure to build other things on top of it, including 
materialized views, which will need this in the first place (if the primary key 
of your materialized view has high selectivity).

As for nomenclature, I agree that "global vs local" index is a technical 
distinction that has little to no meaning to the user. After all, they want to 
build an index to get to their data quickly. How that happens is highly 
secondary. Initially, it might make sense to ask the user to specify a 
selectivity estimate for the index (defaulting to low) and have C* pick the 
best indexing approach based on that. In the future, one could utilize sampled 
histograms to help the user with that decision.
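The proposal above (the user supplies a selectivity estimate, defaulting to low, and C* picks the indexing approach) could look roughly like the function below. Everything in it is an assumption made for illustration, not anything C* provides: the `choose_index` name, the `IndexKind` enum, defining selectivity as distinct values divided by rows (so a primary key is 1.0, matching the "100% selectivity" phrasing earlier), and the cut-over rule of going global once a value is expected to match fewer rows than there are nodes.

```python
from enum import Enum
from typing import Optional


class IndexKind(Enum):
    LOCAL = "local"    # per-node index; queries fan out to every node
    GLOBAL = "global"  # distributed hash map keyed by the indexed value


def choose_index(num_nodes: int, selectivity: Optional[float] = None) -> IndexKind:
    """Pick an index implementation from a user-supplied selectivity estimate.

    selectivity = distinct values / rows, in (0, 1]; a primary key is 1.0.
    None means the user gave no estimate, so default to low (local index).
    """
    if selectivity is None:
        return IndexKind.LOCAL
    if not 0.0 < selectivity <= 1.0:
        raise ValueError("selectivity must be in (0, 1]")
    # Expected number of rows sharing one indexed value.
    rows_per_value = 1.0 / selectivity
    # Assumed heuristic: if a typical value matches fewer rows than there are
    # nodes, a local-index query would mostly hit nodes with no matches.
    if rows_per_value < num_nodes:
        return IndexKind.GLOBAL
    return IndexKind.LOCAL


print(choose_index(12))                     # no estimate given: local
print(choose_index(12, selectivity=0.5))    # ~2 rows per value: global
print(choose_index(12, selectivity=0.001))  # ~1000 rows per value: local
```

The cut-over rule follows the fan-out argument: if a typical value matches at least as many rows as there are nodes, and those rows are spread roughly evenly, a local-index query has real work to do on most nodes anyway; if a value matches only a couple of rows, contacting every node is mostly wasted. Sampled histograms, as suggested above, would be one way to fill in `selectivity` automatically instead of asking the user.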

> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.x
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)