[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504086#comment-14504086 ]
Jack Krupansky commented on CASSANDRA-6477:
-------------------------------------------

It would be helpful if someone were to update the description and primary use case(s) for this feature.

My understanding of the original use case was to avoid the fan out from the coordinator node on an indexed query - the global index would contain the partition keys for matched rows so that only the node(s) containing those partition key(s) would need to be contacted.

So, my question at this stage is whether the intention is that the initial cut of materialized views (MV) would include a focus on that performance optimization use case, or merely focus on the increased general flexibility of MV instead. Would the initial implementation of MV even necessarily use a global index (GI)? Would local vs. global index be an option to be specified?

Also, whether it is GI or MV, what guidance will the spec, doc, and training give users as to its performance and scalability? My concern with GI was that it works well for small to medium-sized clusters, but not for very large clusters. So, what would be the largest cluster that a user could use a GI for? And how many GIs make sense?

For example, with 1 billion rows per node, and 50 nodes, and a GI on 10 columns, that would be... 1B * 50 * 10 = 500 billion index entries on each node, right? Seems like a bit much for a JVM heap or even off-heap memory. Maybe 500M * 20 * 4 = 40 billion index entries per node would be a wiser upper limit, and even that may be a bit extreme.

> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0
>
>
> Local indexes are suitable for low-cardinality data, where spreading the
> index across the cluster is a Good Thing. However, for high-cardinality
> data, local indexes require querying most nodes in the cluster even if only a
> handful of rows is returned.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
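
For anyone who wants to check the sizing estimates in the comment above, here is a minimal sketch of the arithmetic. It assumes one index entry per row per indexed column, which is what the products in the comment imply. The function names are made up for illustration, and the even-partitioning helper is a separate assumption that the comment does not make: how entries would actually be spread across nodes depends on how the index ends up being distributed.

```python
def index_entries(rows_per_node: int, nodes: int, indexed_columns: int) -> int:
    """Product used in the comment: one entry per row per indexed column."""
    return rows_per_node * nodes * indexed_columns


def evenly_partitioned_share(total_entries: int, nodes: int,
                             replication_factor: int = 1) -> int:
    """ASSUMPTION (not from the comment): entries are spread evenly over the
    nodes and each entry is stored replication_factor times."""
    return total_entries * replication_factor // nodes


if __name__ == "__main__":
    # The two scenarios from the comment: (rows per node, nodes, indexed columns).
    for rows, nodes, cols in [(1_000_000_000, 50, 10), (500_000_000, 20, 4)]:
        total = index_entries(rows, nodes, cols)
        share = evenly_partitioned_share(total, nodes)
        print(f"{rows:,} rows/node x {nodes} nodes x {cols} columns "
              f"= {total:,} entries; {share:,} per node if evenly partitioned")
```

The first function reproduces the 500 billion and 40 billion figures. The second only shows how much those figures would shrink per node if the entries were partitioned evenly, which may or may not match the eventual design.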