[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504086#comment-14504086
 ] 

Jack Krupansky commented on CASSANDRA-6477:
-------------------------------------------

It would be helpful if someone were to update the description and primary use 
case(s) for this feature.

My understanding of the original use case was to avoid the fan out from the 
coordinator node on an indexed query - the global index would contain the 
partition keys for matched rows so that only the node(s) containing those 
partition key(s) would be needed. So, my question at this stage is whether the 
initial cut of materialized views (MV) is intended to focus on that 
performance optimization use case, or merely on the increased general 
flexibility of MV. Would the initial implementation of MV even necessarily 
use a global index (GI)? Would local vs. global index be an option to be specified?
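
To make the fan-out concern above concrete, here is a toy sketch of the two 
lookup paths. It is not how Cassandra implements either kind of index: the node 
names, the byte-sum placement function, the sample rows, and the helper names 
(owner, query_local, query_global) are all assumptions made up for illustration.

```python
# Toy model only: placement, index layout, and names are illustrative
# assumptions, not Cassandra internals.
from collections import defaultdict

NODES = [f"node{i}" for i in range(6)]


def owner(partition_key):
    """Hypothetical placement: map a partition key to one node."""
    return NODES[sum(partition_key.encode()) % len(NODES)]


# partition key -> value of the indexed (high-cardinality) column
ROWS = {
    "user:1": "alice@example.com",
    "user:2": "bob@example.com",
    "user:3": "carol@example.com",
    "user:4": "dave@example.com",
}

# Local index: each node indexes only the rows it owns.
local_index = {node: defaultdict(set) for node in NODES}
# Global index: one logical map from indexed value to partition keys.
global_index = defaultdict(set)

for pk, email in ROWS.items():
    local_index[owner(pk)][email].add(pk)
    global_index[email].add(pk)


def query_local(value):
    """The coordinator cannot know which node holds a match, so it asks all."""
    contacted = set(NODES)
    matches = set()
    for node in NODES:
        matches |= local_index[node].get(value, set())
    return matches, contacted


def query_global(value):
    """The index yields partition keys, so only their owners are read."""
    matches = set(global_index.get(value, set()))
    contacted = {owner(pk) for pk in matches}
    return matches, contacted


local_matches, local_nodes = query_local("bob@example.com")
global_matches, global_nodes = query_global("bob@example.com")
print(len(local_nodes), len(global_nodes))
```

Both paths return the same row, but the local path touches every node while 
the global path touches only the owner of the matching partition key. The 
sketch ignores the extra hop needed to read the global index entry itself, as 
well as replication and consistency.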

Also, whether it is GI or MV, what guidance will the spec, doc, and training 
give users as to its performance and scalability? My concern with GI was that 
it works well for small to medium-sized clusters, but not with very large 
clusters. So, what would be the largest cluster that a user could use a GI for? 
And how many GIs make sense? For example, with 1 billion rows per node, 
50 nodes, and a GI on 10 columns, that would be... 1B * 50 * 10 = 500 
billion index entries on each node, right? Seems like a bit much for a JVM 
heap or even off-heap memory. Maybe 500M * 20 * 4 = 40 billion index entries 
per node would be a wiser upper limit, and even that may be a bit extreme.
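
The two products above can be checked with a few lines of Python. The helper 
name index_entries is hypothetical, and the sketch only verifies the 
multiplication: whether that total really lands on each node, as asked above, 
depends on how the index is distributed, which is not modeled here.

```python
def index_entries(rows_per_node, nodes, indexed_columns):
    """Rows per node times node count times number of indexed columns."""
    return rows_per_node * nodes * indexed_columns


# 1 billion rows per node, 50 nodes, GI on 10 columns
first = index_entries(1_000_000_000, 50, 10)
# 500 million rows per node, 20 nodes, GI on 4 columns
second = index_entries(500_000_000, 20, 4)

print(f"{first:,}")   # 500,000,000,000
print(f"{second:,}")  # 40,000,000,000
```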




> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
