Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19831#discussion_r153677300

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -418,7 +418,7 @@ private[hive] class HiveClientImpl(
         // Note that this statistics could be overridden by Spark's statistics if that's available.
         val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
         val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
    -    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ >= 0)
    +    val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ > 0)
    --- End diff --

    Hive has a flag called `StatsSetupConst.COLUMN_STATS_ACCURATE`. If I remember correctly, this flag becomes **false** when a user changes table properties or table data. Can you check whether the flag exists in your case? If so, we can use it to decide whether to read statistics from Hive.
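    A minimal sketch of the idea, assuming the table properties are available as a plain `Map[String, String]`: consult `StatsSetupConst.COLUMN_STATS_ACCURATE` before trusting the size and row-count entries. The helper names (`hiveStatsAreAccurate`, `readHiveStats`) are hypothetical, not the actual HiveClientImpl code, and the flag-parsing is deliberately loose since newer Hive versions store a JSON value such as `{"BASIC_STATS":"true"}` under that key while older versions store a plain `"true"`.

    ```scala
    import org.apache.hadoop.hive.common.StatsSetupConst

    // Hypothetical helper: decide whether the statistics Hive recorded can be trusted.
    // Newer Hive stores a JSON blob under COLUMN_STATS_ACCURATE (e.g. {"BASIC_STATS":"true"}),
    // older Hive stores a plain "true"/"false"; this check loosely covers both forms.
    def hiveStatsAreAccurate(properties: Map[String, String]): Boolean =
      properties.get(StatsSetupConst.COLUMN_STATS_ACCURATE).exists { v =>
        v.equalsIgnoreCase("true") || v.contains("\"BASIC_STATS\":\"true\"")
      }

    // Hypothetical reader: only parse totalSize/rawDataSize/numRows when the flag says the
    // statistics are accurate; otherwise drop them so stale values never reach Spark's planner.
    def readHiveStats(
        properties: Map[String, String]): (Option[BigInt], Option[BigInt], Option[BigInt]) = {
      if (hiveStatsAreAccurate(properties)) {
        val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
        val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
        val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ > 0)
        (totalSize, rawDataSize, rowCount)
      } else {
        (None, None, None)
      }
    }
    ```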