[jira] [Comment Edited] (SPARK-29038) SPIP: Support Spark Materialized View

Lantao Jin (Jira) Tue, 10 Sep 2019 20:25:29 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927214#comment-16927214
 ]


Lantao Jin edited comment on SPARK-29038 at 9/11/19 3:24 AM:
-------------------------------------------------------------

[~mgaido] IIUC, there is no "query caching" in Spark, and no result cache 
either. Spark does natively support RDD-level caching: multiple jobs can share 
a cached RDD, and because a cached RDD is closer to the final result, it 
requires less computation. In addition, a file-system-level cache such as the 
HDFS cache or Alluxio can load data into memory in advance, improving data 
processing efficiency. A materialized view, however, is a technique for 
*precalculating* summaries. Summaries are special types of aggregate views 
that improve query execution times by precalculating expensive joins and 
aggregation operations prior to execution and storing the results in a table 
in the database. The query optimizer transparently rewrites the request to use 
the materialized view. Queries then go directly to the materialized view, 
which has been persisted in storage (e.g. HDFS), and not to the underlying 
detail tables. 
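
To make the distinction concrete, here is a minimal sketch in plain Python (not Spark code, and not taken from the design doc) of what "transparent rewrite" means. The table contents and the names `DETAIL_ROWS`, `MATERIALIZED_VIEWS`, `sum_by_region` and `run_query` are all hypothetical. A real optimizer would match query plans rather than query names.

```python
from collections import defaultdict

# Hypothetical detail table: (region, amount) rows.
DETAIL_ROWS = [
    ("east", 10),
    ("west", 5),
    ("east", 7),
    ("west", 3),
]


def sum_by_region(rows):
    """The 'expensive' aggregation: scans every detail row."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


# Summaries computed ahead of time, keyed by the query they can answer.
# Matching by name is an assumption made to keep the sketch short.
MATERIALIZED_VIEWS = {"sum_by_region": sum_by_region(DETAIL_ROWS)}


def run_query(name, rows, views):
    """Answer from a matching materialized view if one exists;
    otherwise fall back to scanning the detail rows."""
    if name in views:
        return views[name], "materialized view"
    if name == "sum_by_region":
        return sum_by_region(rows), "detail table"
    raise ValueError(f"unknown query: {name}")


result, source = run_query("sum_by_region", DETAIL_ROWS, MATERIALIZED_VIEWS)
print(result, source)
```

The caller issues the same query either way. Only the source of the answer changes, which is the sense in which the rewrite is transparent.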


was (Author: cltlfcjin):
[~mgaido] IIUC, there is no "query caching" in Spark, even no result cache. But 
Spark natively supports RDD-level cache. Multiple jobs can share cached RDD. 
The cached RDD is closer to the calculation result and requires less 
computation. In addition, the file system level cache such as HDFS cache or 
Alluxio can also load data into memory in advance, improving data processing 
efficiency. But materialized view actually is a technology about summaries 
*precalculating*. Summaries are special types of aggregate views that improve 
query execution times by precalculating expensive joins and aggregation 
operations prior to execution and storing the results in a table in the 
database. The query optimizer transparently rewrites the request to use the 
materialized view. Queries go directly to the materialized view and not to the 
underlying detail tables which had been materialized to storage like HDFS. 

> SPIP: Support Spark Materialized View
> -------------------------------------
>
>                 Key: SPARK-29038
>                 URL: https://issues.apache.org/jira/browse/SPARK-29038
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> A materialized view is an important technique in DBMSs for caching data to 
> accelerate queries. Because a materialized view is created through SQL, the 
> data to be cached can be chosen flexibly and configured to suit specific 
> usage scenarios. The Materialization Manager automatically updates the 
> cached data when the detail source tables change, simplifying the user's 
> work. When a user submits a query, the Spark optimizer rewrites the 
> execution plan based on the available materialized views to determine the 
> optimal execution plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
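
As a rough illustration of the "automatically updates the cache data" step described above, the following plain-Python sketch keeps a summary in sync with its detail table on every insert. It is a toy model under stated assumptions: the classes `DetailTable` and `MaterializedSum` are hypothetical, and the eager, per-row incremental refresh shown here is only one possible strategy. The design doc may specify a different one.

```python
class MaterializedSum:
    """Toy materialized view: per-key totals over (key, amount) rows."""

    def __init__(self, rows):
        self.totals = {}
        for key, amount in rows:
            self.apply_insert(key, amount)

    def apply_insert(self, key, amount):
        # Incremental maintenance: fold one new detail row into the totals.
        self.totals[key] = self.totals.get(key, 0) + amount


class DetailTable:
    """Toy detail table that notifies dependent views on every insert."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.views = []

    def create_view(self):
        # Build the summary from the current rows and register it for updates.
        view = MaterializedSum(self.rows)
        self.views.append(view)
        return view

    def insert(self, key, amount):
        self.rows.append((key, amount))
        for view in self.views:
            view.apply_insert(key, amount)


table = DetailTable([("east", 10), ("west", 5)])
view = table.create_view()
table.insert("east", 7)
print(view.totals)
```

The user only writes to the detail table. The registered view is refreshed as a side effect, so later queries against the summary see the new row.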



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
