GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21594

    [SPARK-24596][SQL] Non-cascading Cache Invalidation

    ## What changes were proposed in this pull request?
    
    1. Add a 'cascade' parameter to CacheManager.uncacheQuery(). In 
'cascade=false' mode, only the current cache is invalidated; for each 
dependent cache, the execution plan is rebuilt and the cached buffer is reused.
    2. Pass true/false from callers according to the uncache scenario:
    - Drop tables and regular (persistent) views: regular (cascading) mode
    - Drop temporary views: non-cascading mode
    - Modify table contents (INSERT/UPDATE/MERGE/DELETE): regular (cascading) mode
    - Call Dataset.unpersist(): non-cascading mode
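
To make the difference between the two modes concrete, here is a minimal toy model in Python. It is not Spark's CacheManager: the class `ToyCacheManager`, its fields, and its methods are all hypothetical names invented for illustration, and "rebuilding the execution plan" is reduced to dropping the dependency edge and recording the name of the dependent cache.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCacheManager:
    """Toy model only; a hypothetical stand-in, not Spark's CacheManager."""

    # cache name -> names of the caches its plan reads from
    deps: dict = field(default_factory=dict)
    # cache name -> materialized rows (stands in for the cached buffer)
    buffers: dict = field(default_factory=dict)
    # dependents whose plan was rebuilt by a non-cascading uncache
    replanned: list = field(default_factory=list)

    def cache(self, name, rows, reads_from=()):
        self.deps[name] = set(reads_from)
        self.buffers[name] = list(rows)

    def dependents(self, name):
        """Return every cache that directly or transitively reads from `name`."""
        found, frontier = set(), [name]
        while frontier:
            current = frontier.pop()
            for other, parents in self.deps.items():
                if current in parents and other not in found:
                    found.add(other)
                    frontier.append(other)
        return found

    def uncache(self, name, cascade):
        dependents = self.dependents(name)
        # The targeted cache is invalidated in both modes.
        self.deps.pop(name, None)
        self.buffers.pop(name, None)
        for dep in sorted(dependents):
            if cascade:
                # Cascading mode: dependent caches are invalidated too.
                self.deps.pop(dep, None)
                self.buffers.pop(dep, None)
            else:
                # Non-cascading mode: keep the buffer, re-plan the dependent
                # so that it no longer refers to the dropped cache.
                self.deps[dep].discard(name)
                self.replanned.append(dep)


# Same setup twice, uncached once in each mode.
non_cascading = ToyCacheManager()
non_cascading.cache("base", [1, 2, 3])
non_cascading.cache("derived", [2, 4, 6], reads_from=["base"])
non_cascading.uncache("base", cascade=False)

cascading = ToyCacheManager()
cascading.cache("base", [1, 2, 3])
cascading.cache("derived", [2, 4, 6], reads_from=["base"])
cascading.uncache("base", cascade=True)
```

In this sketch the non-cascading call leaves the buffer of `derived` in place, while the cascading call removes both entries.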
    
    Note that a regular (persistent) view is a database object just like a 
table, so after a regular view is dropped (whether cached or not), any 
query referring to that view should no longer be valid. Hence, if a cached 
persistent view is dropped, we need to invalidate all dependent caches so 
that exceptions are thrown for any later reference. A temporary view, on the 
other hand, is in fact equivalent to an unnamed Dataset, and dropping a 
temporary view should have no impact on queries referencing that view. Thus we 
should do non-cascading uncaching for temporary views, which also guarantees 
consistent uncaching behavior between temporary views and unnamed Datasets.
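
The rule described above can be summarized as a small lookup from the triggering operation to the value passed for 'cascade'. This is only an illustrative sketch: the enum `UncacheTrigger`, the table `CASCADE_BY_TRIGGER`, and the function `should_cascade` are hypothetical names and do not appear in the patch.

```python
from enum import Enum, auto


class UncacheTrigger(Enum):
    """Hypothetical labels for the scenarios listed in the description."""

    DROP_TABLE = auto()
    DROP_PERSISTENT_VIEW = auto()
    DROP_TEMPORARY_VIEW = auto()
    MODIFY_TABLE_CONTENTS = auto()
    DATASET_UNPERSIST = auto()


# True means regular (cascading) mode, False means non-cascading mode.
CASCADE_BY_TRIGGER = {
    # Catalog objects: queries on them must fail once they are gone.
    UncacheTrigger.DROP_TABLE: True,
    UncacheTrigger.DROP_PERSISTENT_VIEW: True,
    # A temporary view is treated like an unnamed Dataset.
    UncacheTrigger.DROP_TEMPORARY_VIEW: False,
    # Modified table contents fall under regular mode in the description.
    UncacheTrigger.MODIFY_TABLE_CONTENTS: True,
    UncacheTrigger.DATASET_UNPERSIST: False,
}


def should_cascade(trigger: UncacheTrigger) -> bool:
    """Return the 'cascade' value this sketch assigns to a trigger."""
    return CASCADE_BY_TRIGGER[trigger]
```

Keeping temporary views and Dataset unpersist on the same side of the table is what gives the two the consistent behavior the description asks for.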
    
    ## How was this patch tested?
    
    New tests in CachedTableSuite and DatasetCacheSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark noncascading-cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21594
    
----
commit 27e484b97ec5f9fdbfdaa5c8c1d9f45233cbbdbe
Author: Maryann Xue <maryannxue@...>
Date:   2018-06-19T04:32:11Z

    noncascading cache

commit 483008c577c0ec7335b0a9a1c567f60311bb83a6
Author: Maryann Xue <maryannxue@...>
Date:   2018-06-19T18:18:06Z

    code refine

commit a782aacd5d4943b8bbfadde27a9c9e9d30c223fe
Author: Maryann Xue <maryannxue@...>
Date:   2018-06-19T18:24:57Z

    Merge remote-tracking branch 'origin/master' into noncascading-cache

commit 0cd8dc10eb85b6df1704e13084f53f9cefe410b3
Author: Maryann Xue <maryannxue@...>
Date:   2018-06-19T21:36:29Z

    refine test cases

commit 71b93ed598833d760955e972894685c089af297b
Author: Maryann Xue <maryannxue@...>
Date:   2018-06-19T22:19:05Z

    refine test cases

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org