Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202093508 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------ -Parquet Vectorized 3785 / 3867 4.2 240.6 1.0X -Parquet Vectorized (Pushdown) 3820 / 3928 4.1 242.9 1.0X -Native ORC Vectorized 3981 / 4049 4.0 253.1 1.0X -Native ORC Vectorized (Pushdown) 702 / 735 22.4 44.6 5.4X +Parquet Vectorized 4407 / 4852 3.6 280.2 1.0X +Parquet Vectorized (Pushdown) 1602 / 1634 9.8 101.8 2.8X --- End diff -- I'm not sure I understand. That's less than 2^24, so it should fit in an int. It should also fit in 8 base-ten digits so decimal(9,2) should work. And last, if the values don't fit in an int, I'm not sure how we would be able to store them in the first place, regardless of how stats are handled. Did you verify that there are no stats for the file produced here? If that's the case, it would make sense with these numbers. I think we just need to look for a different reason why stats are missing.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org