Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19864#discussion_r155020000
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -94,14 +94,16 @@ class CacheManager extends Logging {
           logWarning("Asked to cache already cached data.")
         } else {
           val sparkSession = query.sparkSession
    -      cachedData.add(CachedData(
    -        planToCache,
    -        InMemoryRelation(
    -          sparkSession.sessionState.conf.useCompression,
    -          sparkSession.sessionState.conf.columnBatchSize,
    -          storageLevel,
    -          sparkSession.sessionState.executePlan(planToCache).executedPlan,
    -          tableName)))
    +      val inMemoryRelation = InMemoryRelation(
    +        sparkSession.sessionState.conf.useCompression,
    +        sparkSession.sessionState.conf.columnBatchSize,
    +        storageLevel,
    +        sparkSession.sessionState.executePlan(planToCache).executedPlan,
    +        tableName)
    +      if (planToCache.conf.cboEnabled && 
planToCache.stats.rowCount.isDefined) {
    --- End diff --
    
    the reason I put it here is that when we did not enable CBO, the stats in 
the underlying plan might be much smaller than the actual size in memory 
leading to the potential risk of OOM error. 
    
    The underlying cause is that without CBO enabled, the size of the plan is 
calculated with BaseRelation's sizeInBytes, but with CBO, we can have a more 
accurate estimation, 
    
    
    
https://github.com/apache/spark/blob/03fdc92e42d260a2b7c0090115f162ba5c091aae/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala#L42-L46
    
    
https://github.com/apache/spark/blob/03fdc92e42d260a2b7c0090115f162ba5c091aae/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L370-L381


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to