[ https://issues.apache.org/jira/browse/SPARK-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327696#comment-14327696 ]
Yin Huai commented on SPARK-5881:
---------------------------------

As mentioned by [~lian cheng], we should also track the table names in the Cache Manager to correctly handle the following case.
{code}
val df1 = sql("SELECT * FROM testData LIMIT 10")
df1.registerTempTable("t1")
// Cache t1 explicitly
sql("CACHE TABLE t1")
// t1 and t2 share the same query plan
sql("CACHE TABLE t2 AS SELECT * FROM testData LIMIT 10")
// Replace t2 with a different query plan
sql("CACHE TABLE t2 AS SELECT * FROM testData LIMIT 5")
{code}

> RDD remains cached after the table gets overridden by "CACHE TABLE"
> -------------------------------------------------------------------
>
>                 Key: SPARK-5881
>                 URL: https://issues.apache.org/jira/browse/SPARK-5881
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>            Priority: Blocker
>
> {code}
> val rdd = sc.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
> sqlContext.jsonRDD(rdd).registerTempTable("jt")
> sqlContext.sql("CACHE TABLE foo AS SELECT * FROM jt")
> sqlContext.sql("CACHE TABLE foo AS SELECT a FROM jt")
> {code}
> After the second CACHE TABLE command, the RDD for the first table still
> remains in the cache.
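
To make the name-tracking idea from the comment concrete, here is a minimal sketch. It is a toy model and not Spark's actual CacheManager: the class `NameTrackingCache` and its methods are hypothetical, and a query plan is represented by its SQL string purely for illustration. The assumption is that a cached plan should be evicted only once no table name refers to it any more.

```scala
import scala.collection.mutable

// Hypothetical toy model, NOT Spark's CacheManager.
// A "plan" is stood in for by its SQL string.
final class NameTrackingCache {
  // plan -> table names that currently refer to it
  private val namesByPlan = mutable.Map.empty[String, mutable.Set[String]]
  // table name -> plan currently registered under that name
  private val planByName = mutable.Map.empty[String, String]

  /** Caches `plan` under `name`, first releasing whatever `name` pointed to. */
  def cacheTable(name: String, plan: String): Unit = {
    uncacheTable(name)
    planByName(name) = plan
    namesByPlan.getOrElseUpdate(plan, mutable.Set.empty[String]) += name
  }

  /** Drops `name`; its plan is evicted only when no other name still uses it. */
  def uncacheTable(name: String): Unit =
    planByName.remove(name).foreach { plan =>
      namesByPlan.get(plan).foreach { names =>
        names -= name
        if (names.isEmpty) namesByPlan.remove(plan)
      }
    }

  /** True while at least one table name keeps `plan` cached. */
  def isCached(plan: String): Boolean = namesByPlan.contains(plan)
}
```

Under this model, re-caching `foo` with a new query evicts the old plan (the bug reported in the issue), while replacing `t2` leaves the `LIMIT 10` plan cached because `t1` still refers to it (the case from the comment).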