Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21556#discussion_r202093508
  
    --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
    @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     
     Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     
    ------------------------------------------------------------------------------------------------
    -Parquet Vectorized                            3785 / 3867          4.2         240.6       1.0X
    -Parquet Vectorized (Pushdown)                 3820 / 3928          4.1         242.9       1.0X
    -Native ORC Vectorized                         3981 / 4049          4.0         253.1       1.0X
    -Native ORC Vectorized (Pushdown)               702 /  735         22.4          44.6       5.4X
    +Parquet Vectorized                            4407 / 4852          3.6         280.2       1.0X
    +Parquet Vectorized (Pushdown)                 1602 / 1634          9.8         101.8       2.8X
    --- End diff --
    
    I'm not sure I understand. That value is less than 2^24, so it should fit
    in an int. It also has only 7 base-ten digits, which is exactly the
    integer-digit capacity of decimal(9,2) (9 digits of precision minus a
    scale of 2), so decimal(9,2) should work. Finally, if the values didn't
    fit in an int, I'm not sure how we would be able to store them in the
    first place, regardless of how stats are handled.
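As a quick check of the arithmetic above, here is a minimal Python sketch (plain Python, not Spark code; the helpers `fits_decimal` and `unscaled` are hypothetical). It assumes the Parquet convention that a DECIMAL with precision up to 9 can be stored as an INT32 holding the unscaled value, i.e. value × 10^scale.

```python
from decimal import Decimal

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def fits_decimal(value, precision, scale):
    """Hypothetical helper: True if `value` fits decimal(precision, scale)."""
    # Round to the column's scale, e.g. 7864320 -> 7864320.00 for scale 2.
    quantized = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # The total number of digits must not exceed the declared precision.
    return len(quantized.as_tuple().digits) <= precision


def unscaled(value, scale):
    """Hypothetical helper: the unscaled integer, value * 10**scale."""
    return int(Decimal(value).scaleb(scale))


value = 7864320
print(value < 2**24)                          # True
print(fits_decimal(value, 9, 2))              # True
print(unscaled(value, 2))                     # 786432000
print(unscaled(value, 2) <= INT32_MAX)        # True
```

An 8-digit value such as 12345678 would need 10 digits at scale 2, so `fits_decimal(12345678, 9, 2)` returns False; 7 integer digits is the real limit for decimal(9,2).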
    
    Did you verify that there are no stats for the file produced here? If so,
    that would explain these numbers, and I think we just need to look for a
    different reason why the stats are missing.

