Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21556#discussion_r202214356

--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

 Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------
-Parquet Vectorized                             3785 / 3867          4.2         240.6       1.0X
-Parquet Vectorized (Pushdown)                  3820 / 3928          4.1         242.9       1.0X
-Native ORC Vectorized                          3981 / 4049          4.0         253.1       1.0X
-Native ORC Vectorized (Pushdown)                702 /  735         22.4          44.6       5.4X
+Parquet Vectorized                             4407 / 4852          3.6         280.2       1.0X
+Parquet Vectorized (Pushdown)                  1602 / 1634          9.8         101.8       2.8X
--- End diff --

Here is a test:
```scala
// The maximum value of decimal(9, 2) is 9999999.99
// 1024 * 1024 * 15 = 15728640
val path = "/tmp/spark/parquet"
spark.range(1024 * 1024 * 15).selectExpr("cast((id) as decimal(9, 2)) as id").orderBy("id").write.mode("overwrite").parquet(path)
```
The generated parquet metadata:
```shell
$ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta /tmp/spark/parquet
file:        file:/tmp/spark/parquet/part-00000-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema
--------------------------------------------------------------------------------
id:          OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:5728640 TS:36 OFFSET:4
--------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:38/36/0.95 VC:5728640 ENC:PLAIN,BIT_PACKED,RLE ST:[no stats for this column]

file:        file:/tmp/spark/parquet/part-00001-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema
--------------------------------------------------------------------------------
id:          OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:651016 TS:2604209 OFFSET:4
--------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:2604325/2604209/1.00 VC:651016 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 0.00, max: 651015.00, num_nulls: 0]

file:        file:/tmp/spark/parquet/part-00002-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema
--------------------------------------------------------------------------------
id:          OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:3231146 TS:12925219 OFFSET:4
--------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:12925864/12925219/1.00 VC:3231146 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 651016.00, max: 3882161.00, num_nulls: 0]

file:        file:/tmp/spark/parquet/part-00003-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema
--------------------------------------------------------------------------------
id:          OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:2887956 TS:11552408 OFFSET:4
--------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:11552986/11552408/1.00 VC:2887956 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 3882162.00, max: 6770117.00, num_nulls: 0]

file:        file:/tmp/spark/parquet/part-00004-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema
--------------------------------------------------------------------------------
id:          OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:3229882 TS:12920163 OFFSET:4
--------------------------------------------------------------------------------
id:           INT32 SNAPPY DO:0 FPO:4 SZ:12920808/12920163/1.00 VC:3229882 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 6770118.00, max: 9999999.00, num_nulls: 0]
```

As you can see, no stats were generated for that column in `file:/tmp/spark/parquet/part-00000-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet`.
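
One possible reading of the metadata, which the comment itself does not state: `spark.range` produces 15728640 ids, but `decimal(9, 2)` only holds values up to 9999999.99, so ids from 10000000 upward cannot be represented. Assuming Spark's default (non-ANSI) cast behavior turns such overflowing values into `null`, and that the ascending `orderBy` places nulls first, `part-00000` would consist entirely of nulls, which would be consistent with its tiny size (`TS:36`) and the missing min/max. The sketch below only checks the arithmetic in plain Scala; the object name `DecimalOverflowCheck` is made up for illustration.

```scala
object DecimalOverflowCheck {
  // Rows written by spark.range(1024 * 1024 * 15) in the test.
  val totalRows: Long = 1024L * 1024L * 15L

  // decimal(9, 2): 9 digits of precision, 2 after the point, so 7 integer digits.
  val integerDigits: Int = 9 - 2

  // Smallest id that no longer fits: 10^7 = 10000000.
  val firstOverflowingId: Long = BigInt(10).pow(integerDigits).toLong

  // Ids 0 .. 9999999 fit; everything from 10000000 up does not.
  val representableRows: Long = math.min(totalRows, firstOverflowingId)
  val overflowingRows: Long = totalRows - representableRows

  def main(args: Array[String]): Unit = {
    println(s"total = $totalRows")
    println(s"representable = $representableRows")
    println(s"overflowing = $overflowingRows")
  }
}
```

The overflowing count, 5728640, equals the `RC:5728640` reported for `part-00000`, and the representable count, 10000000, equals the sum of the row counts of the other four files (651016 + 3231146 + 2887956 + 3229882).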