Shea Parkes created SPARK-20683:
-----------------------------------

             Summary: Make table uncache chaining optional
                 Key: SPARK-20683
                 URL: https://issues.apache.org/jira/browse/SPARK-20683
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1
         Environment: Not particularly environment sensitive.  
Encountered/tested on Linux and Windows.
            Reporter: Shea Parkes


A recent change in SPARK-19765 causes table uncaching to chain.  That is, if 
table B is a child of table A and both are cached, uncaching table A now 
automatically uncaches table B as well.

At first I did not understand the need for this, but after reading the unit 
tests, I see that many people likely do not keep named references to the child 
table (e.g. B).  Perhaps B is just created and cached as part of data 
exploration.  In that situation, it makes sense for B to be uncached 
automatically when you are finished with A.

However, we commonly use a different design pattern that this automatic 
uncaching now breaks.  We often cache table A in order to build two independent 
child tables (e.g. B and C).  Once those two child tables are realized and 
cached, we uncache table A (as it is no longer needed and can be quite large).  
Since this change, uncaching table A also drops the cached status of both B and 
C (which is quite frustrating).  All of these tables are often quite large, and 
we view what we're doing as mindful memory management.  We maintain named 
references to B and C at all times, so we can always uncache them ourselves 
when it makes sense.

Would it be acceptable/feasible to make this table uncache chaining optional?  
I would be fine if the default is for the chaining to happen, as long as we can 
turn it off via parameters.
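
To make the request concrete, here is a toy model in plain Python.  It is not 
Spark code and does not reflect how Spark implements caching; the class 
ToyCacheManager, the helper build_example, and the `cascade` parameter are all 
hypothetical names chosen for illustration.  It only sketches the two 
behaviors: chained uncaching as the default, and an opt-out that leaves cached 
children alone.

```python
class ToyCacheManager:
    """Toy stand-in for a cache registry. NOT Spark's implementation."""

    def __init__(self):
        self._parents = {}    # table name -> set of direct parent names
        self._cached = set()  # names of tables currently marked as cached

    def register(self, name, parents=()):
        # Record which tables `name` was derived from.
        self._parents[name] = set(parents)

    def cache(self, name):
        self._cached.add(name)

    def is_cached(self, name):
        return name in self._cached

    def _depends_on(self, name, ancestor):
        # Walk the lineage upward from `name`, looking for `ancestor`.
        stack = list(self._parents.get(name, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, ()))
        return False

    def uncache(self, name, cascade=True):
        # `cascade` is the hypothetical opt-out requested in this issue.
        self._cached.discard(name)
        if cascade:
            # Chained behavior: also drop every cached descendant of `name`.
            for other in list(self._cached):
                if self._depends_on(other, name):
                    self._cached.discard(other)


def build_example():
    # A is the large parent; B and C are independent children of A.
    manager = ToyCacheManager()
    manager.register("A")
    manager.register("B", parents=["A"])
    manager.register("C", parents=["A"])
    for table in ("A", "B", "C"):
        manager.cache(table)
    return manager


# Default (chained): uncaching A also uncaches B and C.
chained = build_example()
chained.uncache("A")

# Opt-out: uncaching A leaves B and C cached, which is the pattern we rely on.
independent = build_example()
independent.uncache("A", cascade=False)
```

In this sketch the default keeps the chained behavior, so only callers who 
pass the opt-out would see a difference.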

If acceptable, I can try to work towards making the required changes.  I am 
most comfortable in Python (and would want the optional parameter surfaced in 
Python), but I have found the places in Scala where this change is required 
(since I have already reverted the functionality in a private fork).  That 
said, any help would be greatly appreciated.




