GitHub user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r155159490
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -94,14 +94,16 @@ class CacheManager extends Logging {
           logWarning("Asked to cache already cached data.")
         } else {
           val sparkSession = query.sparkSession
    -      cachedData.add(CachedData(
    -        planToCache,
    -        InMemoryRelation(
    -          sparkSession.sessionState.conf.useCompression,
    -          sparkSession.sessionState.conf.columnBatchSize,
    -          storageLevel,
    -          sparkSession.sessionState.executePlan(planToCache).executedPlan,
    -          tableName)))
    +      val inMemoryRelation = InMemoryRelation(
    +        sparkSession.sessionState.conf.useCompression,
    +        sparkSession.sessionState.conf.columnBatchSize,
    +        storageLevel,
    +        sparkSession.sessionState.executePlan(planToCache).executedPlan,
    +        tableName)
    +      if (planToCache.conf.cboEnabled && planToCache.stats.rowCount.isDefined) {
    --- End diff --
    
    The relation's statistics are based on file size. Is that likely to cause
    OOM issues? I think that in cases other than a cached query, we still use
    this relation's statistics. If this is an issue, doesn't it also affect
    those other cases?
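
A minimal sketch of the concern raised above, not Spark's actual code. It assumes the risk is that an on-disk (possibly compressed) file size understates the in-memory size, so a relation can look small enough to broadcast when it is not. `StatsSketch`, its methods, and the numbers in the usage note are hypothetical; the 10 MB threshold is assumed to mirror the default of `spark.sql.autoBroadcastJoinThreshold`.

```scala
// Hypothetical model for illustration only; none of these names are Spark APIs.
object StatsSketch {
  // Assumed to mirror the default of spark.sql.autoBroadcastJoinThreshold (10 MB).
  val broadcastThresholdBytes: Long = 10L * 1024 * 1024

  // File-based estimate: simply the on-disk size, which may be compressed.
  def fileBasedSize(onDiskBytes: Long): Long = onDiskBytes

  // Row-count-based estimate: number of rows times an assumed average row width.
  def rowBasedSize(rowCount: Long, avgRowBytes: Long): Long = rowCount * avgRowBytes

  // A relation is considered for broadcast when its estimate fits the threshold.
  def wouldBroadcast(estimatedBytes: Long): Boolean =
    estimatedBytes >= 0 && estimatedBytes <= broadcastThresholdBytes
}
```

With an assumed 8 MB file holding 1,000,000 rows of roughly 100 bytes each, `wouldBroadcast(fileBasedSize(8L * 1024 * 1024))` is `true`, while `wouldBroadcast(rowBasedSize(1000000L, 100L))` is `false`. The same gap would apply to any plan that relies on the file-based estimate, cached or not, which is the point of the question.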


---

