[jira] [Commented] (SPARK-5881) RDD remains cached after the table gets overridden by "CACHE TABLE"
[ https://issues.apache.org/jira/browse/SPARK-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15556469#comment-15556469 ]

Xiao Li commented on SPARK-5881:

I think this has been resolved. Let me close it. Thanks!

> RDD remains cached after the table gets overridden by "CACHE TABLE"
> -------------------------------------------------------------------
>
>          Key: SPARK-5881
>          URL: https://issues.apache.org/jira/browse/SPARK-5881
>      Project: Spark
>   Issue Type: Bug
>   Components: SQL
>     Reporter: Yin Huai
>     Priority: Critical
>
> {code}
> val rdd = sc.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
> sqlContext.jsonRDD(rdd).registerTempTable("jt")
> sqlContext.sql("CACHE TABLE foo AS SELECT * FROM jt")
> sqlContext.sql("CACHE TABLE foo AS SELECT a FROM jt")
> {code}
> After the second CACHE TABLE command, the RDD for the first table still
> remains in the cache.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
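The reported leak can be modeled outside Spark with a toy cache keyed only by the query plan, which mirrors the failure mode described above: caching a second plan under the same table name adds a new entry without evicting the old one. This is a minimal sketch; `PlanKeyedCache`, `cacheTable`, and the plan-as-string representation are hypothetical illustrations, not Spark's actual CacheManager API.

```scala
// Toy cache manager keyed only by the query plan (a String stands in
// for a logical plan). The table name is not tracked at all.
object PlanKeyedCache {
  val cache = scala.collection.mutable.Map.empty[String, String] // plan -> cached data

  def cacheTable(name: String, plan: String): Unit =
    cache(plan) = s"RDD for $name"

  def main(args: Array[String]): Unit = {
    cacheTable("foo", "SELECT * FROM jt")
    cacheTable("foo", "SELECT a FROM jt") // reuses the name, but the plan differs
    // Both plans remain cached: the RDD for the first "foo" leaks.
    assert(cache.size == 2)
  }
}
```

Because eviction never consults the table name, overriding `foo` strands the first plan's entry, which is exactly the behavior the issue reports.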
[jira] [Commented] (SPARK-5881) RDD remains cached after the table gets overridden by CACHE TABLE
[ https://issues.apache.org/jira/browse/SPARK-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495621#comment-14495621 ]

Yin Huai commented on SPARK-5881:

[~lian cheng] [~marmbrus] I think we need some major changes with our cache manager to fix it. I am inclined to bump the version.
[jira] [Commented] (SPARK-5881) RDD remains cached after the table gets overridden by CACHE TABLE
[ https://issues.apache.org/jira/browse/SPARK-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327696#comment-14327696 ]

Yin Huai commented on SPARK-5881:

As mentioned by [~lian cheng], we should also track the table names in the Cache Manager to correctly handle the following case.

{code}
val df1 = sql("SELECT * FROM testData LIMIT 10")
df1.registerTempTable("t1")
// Cache t1 explicitly
sql("CACHE TABLE t1")
// t1 and t2 share the same query plan
sql("CACHE TABLE t2 AS SELECT * FROM testData LIMIT 10")
// Replace t2 with a different query plan
sql("CACHE TABLE t2 AS SELECT * FROM testData LIMIT 5")
{code}
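The case above motivates tracking which table names reference each cached plan, so that replacing t2 evicts its old entry only if no other name (here t1) still points at the same plan. This is a minimal sketch under that assumption; `NamedCache`, `cacheTable`, and `refs` are hypothetical names, and a plain string stands in for a logical plan.

```scala
// Toy cache manager that tracks, per cached plan, the set of table
// names referencing it. An entry is evicted only when its last
// referencing name is gone.
object NamedCache {
  // plan -> names that reference this cached plan
  var refs: Map[String, Set[String]] = Map.empty

  def cacheTable(name: String, plan: String): Unit = {
    // Drop this name's previous reference and evict plans nobody references.
    refs = refs.map { case (p, ns) => p -> (ns - name) }.filter(_._2.nonEmpty)
    refs += plan -> (refs.getOrElse(plan, Set.empty) + name)
  }

  def main(args: Array[String]): Unit = {
    cacheTable("t1", "SELECT * FROM testData LIMIT 10")
    cacheTable("t2", "SELECT * FROM testData LIMIT 10") // same plan as t1
    cacheTable("t2", "SELECT * FROM testData LIMIT 5")  // replace t2
    // t1's cached data survives; only t2 moved to the new plan.
    assert(refs("SELECT * FROM testData LIMIT 10") == Set("t1"))
    assert(refs("SELECT * FROM testData LIMIT 5") == Set("t2"))
  }
}
```

With plan-only keying, replacing t2 would either leak the old entry or evict data t1 still depends on; reference counting by name avoids both.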
[jira] [Commented] (SPARK-5881) RDD remains cached after the table gets overridden by CACHE TABLE
[ https://issues.apache.org/jira/browse/SPARK-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327056#comment-14327056 ]

Apache Spark commented on SPARK-5881:

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/4689