GitHub user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19560#discussion_r146319070
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -187,6 +187,15 @@ object SQLConf {
         .booleanConf
         .createWithDefault(false)
     
    +  val VERIFY_STATS_FROM_HDFS_WHEN_BROADCASTJOIN =
    +    buildConf("spark.sql.statistics.verifyStatsFromHdfsWhenBroadcastJoin")
    +      .doc("If the table size in the metastore is below " +
    +        "spark.sql.autoBroadcastJoinThreshold, check the size on HDFS and use the " +
    +        "bigger of the two as the table size. This is a defensive check that helps " +
    +        "avoid OOM caused by broadcast join. It is useful when the metastore fails " +
    +        "to update the stats of" +
    --- End diff --
    
    Hi, @jinxing64.
    Is this helpful for highly compressed Parquet tables, too?
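
    For context, here is a minimal sketch of the kind of verification the config describes: take the larger of the metastore-reported size and the on-disk size from HDFS. The object and helper names below are illustrative assumptions, not the PR's actual implementation; only the Hadoop FileSystem calls are real APIs.

        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object BroadcastStatsCheck {
          // Hypothetical helper: when metastore stats may be stale, also look at
          // the actual on-disk size and keep the bigger of the two before the
          // planner decides whether the table is small enough to broadcast.
          def verifiedTableSizeInBytes(
              metastoreSizeInBytes: Long,
              tableLocation: Path,
              hadoopConf: Configuration): Long = {
            val fs = FileSystem.get(tableLocation.toUri, hadoopConf)
            // getContentSummary recursively sums the lengths of all files under the path.
            val hdfsSizeInBytes = fs.getContentSummary(tableLocation).getLength
            math.max(metastoreSizeInBytes, hdfsSizeInBytes)
          }
        }

    On the question above: for highly compressed formats such as Parquet, even the on-disk HDFS size can be far smaller than the deserialized in-memory size, so a check like this raises the estimate but may still understate the real broadcast cost.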

