Michael Armbrust created SPARK-3212:
---------------------------------------

Summary: Improve the clarity of caching semantics
Key: SPARK-3212
URL: https://issues.apache.org/jira/browse/SPARK-3212
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Michael Armbrust
Priority: Blocker


Right now there are a bunch of different ways to cache tables in Spark SQL. For example:
 - tweets.cache()
 - sql("SELECT * FROM tweets").cache()
 - table("tweets").cache()
 - tweets.cache().registerTempTable("tweets")
 - sql("CACHE TABLE tweets")
 - cacheTable("tweets")

Each of the above commands has subtly different semantics, leading to a very confusing user experience. Ideally, we would stop caching based on simple table names and instead have an optimization phase that intelligently matches query plans against the available cached data.
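
To make the proposal concrete, here is a minimal sketch of plan-based cache matching. It is a toy model, not Spark's implementation: the plan node classes (Scan, Project, Filter, CachedScan), the normalize helper, and the CacheManager class are all hypothetical names made up for illustration. The idea is that the cache is keyed by a normalized query plan rather than by a table name, so tweets, table("tweets"), and sql("SELECT * FROM tweets") would all resolve to the same cache entry.

```python
import dataclasses
from dataclasses import dataclass
from typing import Set, Tuple, Union


# Hypothetical, simplified logical plan nodes (not Spark classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"


@dataclass(frozen=True)
class Filter:
    predicate: str
    child: "Plan"


@dataclass(frozen=True)
class CachedScan:
    """Leaf that stands for reading the materialized result of `original`."""
    original: "Plan"


Plan = Union[Scan, Project, Filter, CachedScan]


def normalize(plan: Plan) -> Plan:
    """Drop no-op `SELECT *` projections so equivalent plans compare equal."""
    if isinstance(plan, (Project, Filter)):
        child = normalize(plan.child)
        if isinstance(plan, Project) and plan.columns == ("*",):
            return child
        return dataclasses.replace(plan, child=child)
    return plan


class CacheManager:
    """Tracks cached plans and rewrites queries to reuse them (assumed design)."""

    def __init__(self) -> None:
        self._cached: Set[Plan] = set()

    def cache(self, plan: Plan) -> None:
        # Key the cache on the normalized plan, not on a table name.
        self._cached.add(normalize(plan))

    def use_cached_data(self, plan: Plan) -> Plan:
        return self._substitute(normalize(plan))

    def _substitute(self, plan: Plan) -> Plan:
        # Replace the largest matching subtree with a scan of cached data.
        if plan in self._cached:
            return CachedScan(plan)
        if isinstance(plan, (Project, Filter)):
            return dataclasses.replace(plan, child=self._substitute(plan.child))
        return plan


manager = CacheManager()
# Roughly analogous to sql("SELECT * FROM tweets").cache()
manager.cache(Project(("*",), Scan("tweets")))
# A later query over the plain table still picks up the cached data.
optimized = manager.use_cached_data(Filter("lang = 'en'", Scan("tweets")))
```

In this sketch the rewritten query is a Filter over a CachedScan, regardless of which of the equivalent forms was used to populate the cache. A real optimizer would need a much richer notion of plan equivalence than stripping SELECT *.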