[ https://issues.apache.org/jira/browse/SPARK-27739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-27739. --------------------------------- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24623 [https://github.com/apache/spark/pull/24623] > df.persist should save stats from optimized plan > ------------------------------------------------ > > Key: SPARK-27739 > URL: https://issues.apache.org/jira/browse/SPARK-27739 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: John Zhuge > Assignee: John Zhuge > Priority: Minor > Fix For: 3.0.0 > > > CacheManager.cacheQuery passes the stats for `planToCache` to > InMemoryRelation. Since the plan has not been optimized, the stats is > inaccurate because project and filter have not been applied. I'd suggest > passing the stats from the optimized plan. > {code:java} > class CacheManager extends Logging { > ... > def cacheQuery( > query: Dataset[_], > tableName: Option[String] = None, > storageLevel: StorageLevel = MEMORY_AND_DISK): Unit = { > val planToCache = query.logicalPlan > if (lookupCachedData(planToCache).nonEmpty) { > logWarning("Asked to cache already cached data.") > } else { > val sparkSession = query.sparkSession > val inMemoryRelation = InMemoryRelation( > sparkSession.sessionState.conf.useCompression, > sparkSession.sessionState.conf.columnBatchSize, storageLevel, > sparkSession.sessionState.executePlan(planToCache).executedPlan, > tableName, > planToCache) <<<<<== > ... > } > object InMemoryRelation { > def apply( > useCompression: Boolean, > batchSize: Int, > storageLevel: StorageLevel, > child: SparkPlan, > tableName: Option[String], > logicalPlan: LogicalPlan): InMemoryRelation = { > val cacheBuilder = CachedRDDBuilder(useCompression, batchSize, > storageLevel, child, tableName) > val relation = new InMemoryRelation(child.output, cacheBuilder, > logicalPlan.outputOrdering) > relation.statsOfPlanToCache = logicalPlan.stats <<<<<== > relation > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org