[ https://issues.apache.org/jira/browse/SPARK-20683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003864#comment-16003864 ]
Shea Parkes commented on SPARK-20683:
-------------------------------------

For anyone that found this issue and just wants to revert to the old behavior in their own fork, the following change in {{sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala}} worked for us:

{code}
- if (cd.plan.find(_.sameResult(plan)).isDefined) {
+ if (cd.plan.sameResult(plan)) {
{code}

> Make table uncache chaining optional
> ------------------------------------
>
>                 Key: SPARK-20683
>                 URL: https://issues.apache.org/jira/browse/SPARK-20683
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>         Environment: Not particularly environment sensitive.
> Encountered/tested on Linux and Windows.
>            Reporter: Shea Parkes
>
> A recent change was made in SPARK-19765 that causes table uncaching to chain.
> That is, if table B is a child of table A, and they are both cached, now
> uncaching table A will automatically uncache table B.
> At first I did not understand the need for this, but when reading the unit
> tests, I see that it is likely that many people do not keep named references
> to the child table (e.g. B). Perhaps B is just made and cached as some part
> of data exploration. In that situation, it makes sense for B to
> automatically be uncached when you are finished with A.
> However, we commonly utilize a different design pattern that is now harmed by
> this automatic uncaching. It is common for us to cache table A to then make
> two, independent children tables (e.g. B and C). Once those two child tables
> are realized and cached, we'd then uncache table A (as it was no longer
> needed and could be quite large). After this change now, when we uncache
> table A, we suddenly lose our cached status on both table B and C (which is
> quite frustrating). All of these tables are often quite large, and we view
> what we're doing as mindful memory management. We are maintaining named
> references to B and C at all times, so we can always uncache them ourselves
> when it makes sense.
> Would it be acceptable/feasible to make this table uncache chaining optional?
> I would be fine if the default is for the chaining to happen, as long as we
> can turn it off via parameters.
> If acceptable, I can try to work towards making the required changes. I am
> most comfortable in Python (and would want the optional parameter surfaced in
> Python), but have found the places required to make this change in Scala
> (since I reverted the functionality in a private fork already). Any help
> would be greatly appreciated however.
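
To illustrate why the two conditions in the {code} diff behave differently for the A/B/C pattern described in the issue, here is a small toy model in plain Python. It is only a sketch under stated assumptions and is not Spark code: {{Plan}}, {{ToyCacheManager}}, {{names_after_uncache}} and the {{cascade}} parameter are hypothetical names invented for this example, and "same result" is reduced to comparing plan names. The chained branch mimics {{cd.plan.find(_.sameResult(plan)).isDefined}} (match anywhere inside the cached plan), while the other branch mimics {{cd.plan.sameResult(plan)}} (match the cached plan itself only).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for a logical plan: a name plus the plans it reads from."""
    name: str
    inputs: Tuple["Plan", ...] = ()

    def same_result(self, other: "Plan") -> bool:
        # Assumption: in this toy, equal names mean equal results.
        return self.name == other.name

    def find(self, predicate: Callable[["Plan"], bool]) -> Optional["Plan"]:
        # Pre-order search over this node and everything it was built from.
        if predicate(self):
            return self
        for source in self.inputs:
            hit = source.find(predicate)
            if hit is not None:
                return hit
        return None


class ToyCacheManager:
    """Hypothetical cache registry; holds the plans that are 'cached'."""

    def __init__(self) -> None:
        self.cached: List[Plan] = []

    def cache(self, plan: Plan) -> None:
        if not any(cd.same_result(plan) for cd in self.cached):
            self.cached.append(plan)

    def uncache(self, plan: Plan, cascade: bool = True) -> None:
        def should_drop(cd: Plan) -> bool:
            if cascade:
                # Chained: drop any entry that contains `plan` anywhere.
                return cd.find(lambda p: p.same_result(plan)) is not None
            # Not chained: drop only the entry that is `plan` itself.
            return cd.same_result(plan)

        self.cached = [cd for cd in self.cached if not should_drop(cd)]


def names_after_uncache(cascade: bool) -> List[str]:
    """Cache A, B and C (B and C derived from A), uncache A, report what is left."""
    a = Plan("A")
    b = Plan("B", (a,))
    c = Plan("C", (a,))
    manager = ToyCacheManager()
    for plan in (a, b, c):
        manager.cache(plan)
    manager.uncache(a, cascade=cascade)
    return [cd.name for cd in manager.cached]


if __name__ == "__main__":
    print(names_after_uncache(cascade=True))   # prints []
    print(names_after_uncache(cascade=False))  # prints ['B', 'C']
```

With chaining, uncaching A also drops B and C because A appears inside both of their plans; without it, B and C stay cached until they are uncached explicitly, which is the behavior the issue asks to keep available.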