[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-9415.
-----------------------------------------
    Resolution: Later

While I understand the interest this would have for analytics, I, like others 
who have commented, am not keen on adding an implicit way to do something when 
we already have an explicit one that seems simple enough to me (and we've 
consistently tried to favor explicit over implicit in C* as a design choice).

But the more important problem is that doing this would break the consistency 
level guarantees. And for that reason I'm -1 on doing this completely 
implicitly (unless/until we somehow fix the CL guarantees made on MVs but 
that's probably not around the corner either). In theory I suppose we could add 
some form of flag for the user to say "I'm happy to give up any CL guarantee 
but please use MVs if you can" but that's pretty messy imo.

I'll also note that while it would undeniably be convenient for analytics, it's 
totally possible for analytic drivers to check the defined MVs and do the exact 
same work on their side.
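
As a rough illustration of that client-side approach, here is a minimal Python 
sketch. It assumes the driver already knows the partition key of the base table 
and of each MV (in practice this would come from schema metadata; here it is 
hard-coded). The names Relation, choose_relation and build_select, and the 
users / users_by_email schema, are hypothetical and only loosely follow the 
example in the ticket. This is not an existing driver API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Relation:
    """A base table or materialized view, described by its partition key."""
    name: str
    partition_key: Sequence[str]


def choose_relation(base: Relation, views: Sequence[Relation],
                    equality_columns: Sequence[str]) -> Optional[Relation]:
    """Pick the first relation whose whole partition key is restricted by
    equality in the query. The base table is tried before any view.
    Returns None when no relation can serve the predicate."""
    restricted = set(equality_columns)
    for candidate in (base, *views):
        if set(candidate.partition_key) <= restricted:
            return candidate
    return None


def build_select(relation: Relation, predicates: Dict[str, str]) -> str:
    """Build a parameterized SELECT against the chosen relation.
    Values are left as %s placeholders to be bound by the caller."""
    where = " AND ".join(f"{column} = %s" for column in predicates)
    return f"SELECT * FROM {relation.name} WHERE {where}"


# Hypothetical schema: base table keyed by userid, one MV keyed by email.
users = Relation("users", ("userid",))
users_by_email = Relation("users_by_email", ("email",))

predicates = {"email": "bhess@company.whatever"}
target = choose_relation(users, [users_by_email], list(predicates))
if target is not None:
    query = build_select(target, predicates)
```

Because the routing happens in the client, the caller sees which relation is 
being read and knowingly accepts whatever CL guarantees an MV read provides, 
which keeps the choice explicit.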

Anyway, given what's above I'm going to close as "Later", by which I mean 
"We're not planning to implement this anytime soon so let's be upfront about 
it, but maybe later, when we have more mileage with MVs and if we've improved 
the CL guarantees on them, we might look again at this with fresh eyes".


> Implicit use of Materialized Views on SELECT
> --------------------------------------------
>
>                 Key: CASSANDRA-9415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter:  Brian Hess
>              Labels: ponies
>
> CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
> write path for the best-practice of "query tables".  But it does not simplify 
> the read path as much as our users want/need.
> We advise folks to create multiple copies of their base table, each optimized 
> for certain queries - hence "query table".  For example, we may have a USERS 
> table with two types of queries: lookup by userid and lookup by email address. 
> We would recommend creating 2 tables, USERS_BY_USERID and USERS_BY_EMAIL.  Both 
> would have the exact same schema, with the same PRIMARY KEY columns, but 
> different PARTITION KEY - the first would be USERID and the second would be 
> EMAIL.
> One complicating thing with this approach is that the application now needs 
> to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
> INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
> covers this nicely.
> However, the other side of the coin is that the application needs to know 
> which query table to leverage based on the selection criteria.  Using the 
> example above, if the query has a predicate such as "WHERE userid = 'bhess'", 
> then USERS_BY_USERID is the better table to use.  Similarly, when the 
> predicate is "WHERE email = 'bhess@company.whatever'", USERS_BY_EMAIL is 
> appropriate.
> On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" 
> to the collection of tables.  You do operations just on the base table.  The 
> same would be very attractive on the SELECT side as well.  It would be very 
> good to allow an application to simply do "SELECT * FROM users WHERE userid = 
> 'bhess'" and have that query implicitly leverage the USERS_BY_USERID 
> materialized view.
> For additional use cases, especially analytics use cases like in Spark, this 
> allows the Spark code to simply push down the query without having to know 
> about all of the MVs that have been set up.  The system will route the query 
> appropriately.  And if additional MVs are necessary to make a query run 
> better/faster, then those MVs can be set up and Spark will implicitly 
> leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
