[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631983#comment-14631983 ]
Jonathan Shook edited comment on CASSANDRA-6477 at 7/17/15 9:58 PM:
--------------------------------------------------------------------

If we look at this from the perspective of a typical developer who simply wants query tables to be easier to manage, then the basic requirements are pretty simple: Emulate current practice. That isn't to say that we shouldn't dig deeper into what could make sense in different contexts, but the usage pattern that it is meant to simplify is pretty basic:

* Logged batches are not commonly used to wrap a primary table with its query tables during writes. The failure modes of these are usually well understood, meaning that it is clear what the implications are for a failed write in nearly every case.
* The same CL is generally used for all related tables.
* Savvy users will do this asynchronously, with the same CL for all of these operations.

So effectively, I would expect the very basic form of this feature to look much like it would in practice already, except that it requires much less effort on the end user to maintain. I would like us to consider that wherever the implementation varies from this, there may be a lot of potential for surprise.

I really think we need to be following the principle of least surprise here as a start. It is almost certain that MV will be adopted quickly in places that have a need for it, because they are essentially doing this manually at present. If we require them to micro-manage the settings in order to even get close to the current result (performance, availability assumptions, ...) then we should change the defaults.

It doesn't really seem necessary that we force the coordinator node to be a replica. This is orthogonal to the base problem, and topology-aware clients already have controls for it. It also potentially adds another hop, which I do have concerns about with respect to the above.
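For readers less familiar with the manual pattern described above, here is a minimal sketch of it: one logical write fanned out asynchronously to a primary table and its query tables, all at the same CL, with no logged batch. Everything in it is an assumption made for illustration. `FakeSession` is a hypothetical in-memory stand-in rather than a real driver API, and the table names `users`, `users_by_email` and `users_by_name` are invented.

```python
import asyncio


class FakeSession:
    """Hypothetical in-memory stand-in for a cluster session (not a real driver API)."""

    def __init__(self):
        self.tables = {}  # table name -> {key: row}
        self.log = []     # (table, consistency_level) for each write

    async def write(self, table, key, row, consistency_level):
        await asyncio.sleep(0)  # yield control, as a network call would
        self.tables.setdefault(table, {})[key] = row
        self.log.append((table, consistency_level))


async def write_user(session, user, consistency_level="QUORUM"):
    """Write the primary table and its query tables concurrently,
    all at the same consistency level, without a logged batch."""
    targets = [
        ("users", user["id"]),              # primary table (hypothetical name)
        ("users_by_email", user["email"]),  # query table (hypothetical name)
        ("users_by_name", user["name"]),    # query table (hypothetical name)
    ]
    results = await asyncio.gather(
        *(session.write(table, key, user, consistency_level)
          for table, key in targets),
        return_exceptions=True,
    )
    # Each write can fail independently; return the tables whose write failed.
    return [table for (table, _), result in zip(targets, results)
            if isinstance(result, Exception)]


session = FakeSession()
user = {"id": 1, "email": "a@example.com", "name": "alice"}
failed = asyncio.run(write_user(session, user))
```

Because there is no batch, each table's write succeeds or fails on its own, so the caller sees exactly which tables need a retry. That is the failure behaviour the first bullet above refers to.
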
> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
> Key: CASSANDRA-6477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Carl Yeksigian
> Labels: cql
> Fix For: 3.0 beta 1
>
> Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the
> index across the cluster is a Good Thing. However, for high-cardinality
> data, local indexes require querying most nodes in the cluster even if only a
> handful of rows is returned.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)