[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-07-07 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616747#comment-14616747
 ] 

 Brian Hess commented on CASSANDRA-9415:


We already silently rewrite queries for users when they use a secondary index.  
The user doesn't specify that C* should consult the secondary index when they 
query; it is implicit.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess
  Labels: ponies

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-07-07 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616743#comment-14616743
 ] 

 Brian Hess commented on CASSANDRA-9415:


Wouldn't the preparing of the statement handle that?

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess
  Labels: ponies

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-07-07 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616746#comment-14616746
 ] 

 Brian Hess commented on CASSANDRA-9415:


For #1, that would be part of choosing the MV.  If it didn't contain all the 
columns in the projection list then you couldn't use that MV.
For #2, that would also be understandable by C*.  That is, you can see if the 
predicate of the query matches the predicate of the MV.  For example, if the 
MV had a predicate like WHERE x  100, then if the query had a predicate like 
WHERE x=200 then you know you could use the MV.

As to the driver being able to route the query - in order for the driver to 
route the query you need to prepare the statement.  In preparing the statement, 
you would choose the MV, and the driver gets that anyway.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess
  Labels: ponies

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548462#comment-14548462
 ] 

Aleksey Yeschenko commented on CASSANDRA-9415:
--

This sounds interesting. May or may not be doable to provide this by default - 
depending on consistency provided by CASSANDRA-6477 - but certainly something 
to look into.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread Carl Yeksigian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548493#comment-14548493
 ] 

Carl Yeksigian commented on CASSANDRA-9415:
---

There are a couple of problems with substituting materialized views in place of 
base tables transparently:
# Unless the MV includes all of the columns exactly as the underlying table has 
them, select statements which are valid on the base table will not be valid on 
the MV
# When where clauses are allowed for the MV, the MV can't be used for the 
select statement since it isn't a full copy of the base table

Also, using a different table was a benefit because the driver's will be able 
to easily route the queries (CASSANDRA-8517).

Overall, I'm weary of transforming user's queries for them; I'd rather users 
who use MV use the views directly. It is less likely that we would change the 
rules surrounding queries to a MV directly versus base table query 
transformations.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548497#comment-14548497
 ] 

Aleksey Yeschenko commented on CASSANDRA-9415:
--

For the record - if we did this at all, it wouldn't be a part of 3.0.0. Maybe 
not even 3.X at all.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548487#comment-14548487
 ] 

T Jake Luciani commented on CASSANDRA-9415:
---

This might be a better fit for the drivers to implement. since they could route 
the queries to the appropriate nodes directly.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548435#comment-14548435
 ] 

Ryan Svihla commented on CASSANDRA-9415:


This would be a big win for a lot of analytics tools and would bring us ever 
closer to RDBMS for ease of use. I can see this greatly smoothing the learning 
curve for new users as well.

 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

2015-05-18 Thread Brian Hess (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548608#comment-14548608
 ] 

 Brian Hess commented on CASSANDRA-9415:


[~jbellis] Oracle, DB2, and SQL Server do this (at least - maybe others).

In Oracle it is Materialized Views (see 
http://docs.oracle.com/cd/B28359_01/server.111/b28313/basicmv.htm):
bq. The end user queries the tables and views at the detail data level. The 
query rewrite mechanism in the Oracle server automatically rewrites the SQL 
query to use the summary tables. This mechanism reduces response time for 
returning results from the query. Materialized views within the data warehouse 
are transparent to the end user or to the database application.

In DB2 it is Materialized Query Tables (see 
http://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/)
bq. Materialized query tables can significantly improve the performance of 
queries, especially complex queries. If the optimizer determines that a query 
or part of a query could be resolved using an MQT, the query might be rewritten 
to take advantage of the MQT.

In SQL Server it is Indexed Views (see 
https://msdn.microsoft.com/en-us/library/dd171921(SQL.100).aspx):
bq. The indexed view can be used in a query execution in two ways. The query 
can reference the indexed view directly, or, more importantly, the query 
optimizer can select the view if it determines that the view can be substituted 
for some or all of the query in the lowest-cost query plan. In the second case, 
the indexed view is used instead of the underlying tables and their ordinary 
indexes. The view does not need to be referenced in the query for the query 
optimizer to use it during query execution. This allows existing applications 
to benefit from the newly created indexed views without changing those 
applications.


 Implicit use of Materialized Views on SELECT
 

 Key: CASSANDRA-9415
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
 Project: Cassandra
  Issue Type: Improvement
Reporter:  Brian Hess
  Labels: ponies

 CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
 write path for the best-practice of query tables.  But it does not simplify 
 the read path as much as our users want/need.
 We suggest to folks to create multiple copies of their base table optimized 
 for certain queries - hence query table.  For example, we may have a USER 
 table with two type of queries: lookup by userid and lookup by email address. 
  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
 would have the exact same schema, with the same PRIMARY KEY columns, but 
 different PARTITION KEY - the first would be USERID and the second would be 
 EMAIL.
 One complicating thing with this approach is that the application now needs 
 to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
 INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
 covers this nicely.
 However, the other side of the coin is that the application needs to know 
 which query table to leverage based on the selection criteria.  Using the 
 example above, if the query has a predicate such as WHERE userid = 'bhess', 
 then USERS_BY_USERID is the better table to use.  Similarly, when the 
 predicate is WHERE email = 'bhess@company.whatever', USERS_BY_EMAIL is 
 appropriate.
 On INSERT/UPDATE/DELETE, Materialized Views essentially give a single name 
 to the collection of tables.  You do operations just on the base table.  It 
 is very attractive for the SELECT side as well.  It would be very good to 
 allow an application to simply do SELECT * FROM users WHERE userid = 
 'bhess' and have that query implicitly leverage the USERS_BY_USERID 
 materialized view.
 For additional use cases, especially analytics use cases like in Spark, this 
 allows the Spark code to simply push down the query without having to know 
 about all of the MVs that have been set up.  The system will route the query 
 appropriately.  And if additional MVs are necessary to make a query run 
 better/faster, then those MVs can be set up and Spark will implicitly 
 leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)