Takuya Ueshin created SPARK-15915:
-------------------------------------

             Summary: CacheManager should use canonicalized plan for planToCache.
                 Key: SPARK-15915
                 URL: https://issues.apache.org/jira/browse/SPARK-15915
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Takuya Ueshin

A {{DataFrame}} whose plan overrides {{sameResult}} without comparing canonicalized plans cannot be cached with {{cacheTable}}.

For example:

{code}
val localRelation = Seq(1, 2, 3).toDF()
localRelation.createOrReplaceTempView("localRelation")

spark.catalog.cacheTable("localRelation")
assert(
  localRelation.queryExecution.withCachedData.collect {
    case i: InMemoryRelation => i
  }.size == 1)
{code}

This fails with:

{noformat}
ArrayBuffer() had size 0 instead of expected size 1
{noformat}

The reason is that when {{spark.catalog.cacheTable("localRelation")}} is called, {{CacheManager}} caches the plan wrapped in {{SubqueryAlias}}, but when planning the DataFrame {{localRelation}}, {{CacheManager}} looks up the cached table using the unwrapped plan, because the plan of the DataFrame {{localRelation}} itself is not wrapped.

Some plans, such as {{LocalRelation}} and {{LogicalRDD}}, override the {{sameResult}} method but do not compare canonicalized plans, so {{CacheManager}} cannot detect that the two plans are the same.
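
The mismatch described above can be illustrated with a small, self-contained toy model. This is only a sketch and does not use Spark: {{Plan}}, {{Local}}, {{Alias}}, {{lookupRaw}} and {{lookupCanonical}} are hypothetical stand-ins for a logical plan, {{LocalRelation}}, {{SubqueryAlias}} and the cache lookup. It assumes that canonicalization strips the alias wrapper, and it shows one way a lookup could succeed once both sides are canonicalized; it is not the actual patch.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait Plan {
  // Canonical form: by default a plan is its own canonical form.
  def canonicalized: Plan = this
  // Default comparison goes through the canonical form of both plans.
  def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized
}

final case class Local(rows: Seq[Int]) extends Plan {
  // Mimics the reported problem: sameResult is overridden and matches
  // on the raw plan, so a wrapped plan is never recognized.
  override def sameResult(other: Plan): Boolean = other match {
    case Local(otherRows) => otherRows == rows
    case _                => false
  }
}

final case class Alias(name: String, child: Plan) extends Plan {
  // Assumption for this sketch: the alias does not affect the result,
  // so canonicalization removes it.
  override def canonicalized: Plan = child.canonicalized
}

object CacheLookupSketch {
  // Lookup that compares the query against the cached plan as stored.
  def lookupRaw(cached: Seq[Plan], query: Plan): Option[Plan] =
    cached.find(c => query.sameResult(c))

  // Lookup that canonicalizes both sides before comparing.
  def lookupCanonical(cached: Seq[Plan], query: Plan): Option[Plan] =
    cached.find(c => query.canonicalized.sameResult(c.canonicalized))

  def main(args: Array[String]): Unit = {
    val local  = Local(Seq(1, 2, 3))
    val cached = Seq(Alias("localRelation", local)) // cached via the view

    println(lookupRaw(cached, local).isDefined)       // false
    println(lookupCanonical(cached, local).isDefined) // true
  }
}
```

In this toy, the raw lookup misses because {{Local.sameResult}} only matches another {{Local}}, while the cached entry is an {{Alias}}; canonicalizing the cached entry removes the wrapper and the comparison succeeds.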