Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r155159490

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -94,14 +94,16 @@ class CacheManager extends Logging {
           logWarning("Asked to cache already cached data.")
         } else {
           val sparkSession = query.sparkSession
    -      cachedData.add(CachedData(
    -        planToCache,
    -        InMemoryRelation(
    -          sparkSession.sessionState.conf.useCompression,
    -          sparkSession.sessionState.conf.columnBatchSize,
    -          storageLevel,
    -          sparkSession.sessionState.executePlan(planToCache).executedPlan,
    -          tableName)))
    +      val inMemoryRelation = InMemoryRelation(
    +        sparkSession.sessionState.conf.useCompression,
    +        sparkSession.sessionState.conf.columnBatchSize,
    +        storageLevel,
    +        sparkSession.sessionState.executePlan(planToCache).executedPlan,
    +        tableName)
    +      if (planToCache.conf.cboEnabled && planToCache.stats.rowCount.isDefined) {
    --- End diff --

    The relation's statistics are based on file size. Would that really cause OOM issues so easily? I think that in cases other than a cached query, we still use this relation's statistics. If this is an issue, doesn't it also affect those other cases?
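To make the concern concrete, here is a minimal, Spark-free sketch of why a size estimate taken from on-disk file size could lead to an OOM: a planner that broadcasts any relation whose estimated size is under a threshold may pick a table that is much larger once decompressed in memory. Everything in it is an assumption for illustration: the object and method names are hypothetical, the sizes and the expansion factor are made-up numbers, and only the 10 MB threshold reflects Spark's documented default for `spark.sql.autoBroadcastJoinThreshold`. It is not the code path in `CacheManager` or `InMemoryRelation`.

```scala
// Hypothetical, Spark-free model of a size-based broadcast decision.
// None of these names exist in Spark; they only illustrate the concern.
object BroadcastEstimateSketch {
  // Assumed threshold: 10 MB, the documented default of
  // spark.sql.autoBroadcastJoinThreshold.
  val thresholdBytes: Long = 10L * 1024 * 1024

  // Broadcast when the estimated size is non-negative and within the threshold.
  def wouldBroadcast(estimatedSizeInBytes: Long): Boolean =
    estimatedSizeInBytes >= 0 && estimatedSizeInBytes <= thresholdBytes

  def main(args: Array[String]): Unit = {
    val onDiskBytes = 8L * 1024 * 1024  // compressed file size (made-up number)
    val expansionFactor = 5L            // made-up in-memory expansion ratio
    val inMemoryBytes = onDiskBytes * expansionFactor

    // The file-size estimate fits under the threshold, so a broadcast is chosen.
    println(wouldBroadcast(onDiskBytes))    // prints "true"
    // The in-memory size would not have qualified.
    println(wouldBroadcast(inMemoryBytes))  // prints "false"
  }
}
```

The same reasoning applies whether or not the plan is cached, which is the point of the question: if file-size statistics are risky here, they would be equally risky wherever the relation's statistics are consulted.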