Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21556#discussion_r202214356
  
    --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
    @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
     
     Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     ------------------------------------------------------------------------------------------------
    -Parquet Vectorized                            3785 / 3867          4.2         240.6       1.0X
    -Parquet Vectorized (Pushdown)                 3820 / 3928          4.1         242.9       1.0X
    -Native ORC Vectorized                         3981 / 4049          4.0         253.1       1.0X
    -Native ORC Vectorized (Pushdown)               702 /  735         22.4          44.6       5.4X
    +Parquet Vectorized                            4407 / 4852          3.6         280.2       1.0X
    +Parquet Vectorized (Pushdown)                 1602 / 1634          9.8         101.8       2.8X
    --- End diff --
    
    Here is a test:
    ```scala
    // decimal(9, 2) max value is 9999999.99
    // 1024 * 1024 * 15 =        15728640
    val path = "/tmp/spark/parquet"
    spark.range(1024 * 1024 * 15)
      .selectExpr("cast((id) as decimal(9, 2)) as id")
      .orderBy("id")
      .write.mode("overwrite").parquet(path)
    ```
    The generated parquet metadata:
    ```shell
    $ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta /tmp/spark/parquet
    file:        file:/tmp/spark/parquet/part-00000-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
    
    file schema: spark_schema
    --------------------------------------------------------------------------------
    id:          OPTIONAL INT32 O:DECIMAL R:0 D:1
    
    row group 1: RC:5728640 TS:36 OFFSET:4
    --------------------------------------------------------------------------------
    id:           INT32 SNAPPY DO:0 FPO:4 SZ:38/36/0.95 VC:5728640 ENC:PLAIN,BIT_PACKED,RLE ST:[no stats for this column]
    file:        file:/tmp/spark/parquet/part-00001-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
    
    file schema: spark_schema
    --------------------------------------------------------------------------------
    id:          OPTIONAL INT32 O:DECIMAL R:0 D:1
    
    row group 1: RC:651016 TS:2604209 OFFSET:4
    --------------------------------------------------------------------------------
    id:           INT32 SNAPPY DO:0 FPO:4 SZ:2604325/2604209/1.00 VC:651016 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 0.00, max: 651015.00, num_nulls: 0]
    file:        file:/tmp/spark/parquet/part-00002-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
    
    file schema: spark_schema
    --------------------------------------------------------------------------------
    id:          OPTIONAL INT32 O:DECIMAL R:0 D:1
    
    row group 1: RC:3231146 TS:12925219 OFFSET:4
    --------------------------------------------------------------------------------
    id:           INT32 SNAPPY DO:0 FPO:4 SZ:12925864/12925219/1.00 VC:3231146 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 651016.00, max: 3882161.00, num_nulls: 0]
    file:        file:/tmp/spark/parquet/part-00003-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
    
    file schema: spark_schema
    --------------------------------------------------------------------------------
    id:          OPTIONAL INT32 O:DECIMAL R:0 D:1
    
    row group 1: RC:2887956 TS:11552408 OFFSET:4
    --------------------------------------------------------------------------------
    id:           INT32 SNAPPY DO:0 FPO:4 SZ:11552986/11552408/1.00 VC:2887956 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 3882162.00, max: 6770117.00, num_nulls: 0]
    file:        file:/tmp/spark/parquet/part-00004-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
    
    file schema: spark_schema
    --------------------------------------------------------------------------------
    id:          OPTIONAL INT32 O:DECIMAL R:0 D:1
    
    row group 1: RC:3229882 TS:12920163 OFFSET:4
    --------------------------------------------------------------------------------
    id:           INT32 SNAPPY DO:0 FPO:4 SZ:12920808/12920163/1.00 VC:3229882 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 6770118.00, max: 9999999.00, num_nulls: 0]
    ```
    As you can see, no stats were generated for that column in `file:/tmp/spark/parquet/part-00000-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet`.
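    
    The row counts in the metadata can be cross-checked with a little arithmetic. The sketch below is plain Scala (no Spark needed) and assumes that an id too large for decimal(9, 2) is written as NULL by the cast, which would be consistent with the tiny TS:36 row group that has no stats. The object name `DecimalOverflowCheck` is made up for illustration.
    
    ```scala
    object DecimalOverflowCheck {
      // Row count written by the test: 1024 * 1024 * 15
      val totalRows: Long = 1024L * 1024L * 15L

      // decimal(9, 2) keeps 9 digits in total, 2 of them after the point,
      // so the largest whole number it can hold is 9999999.
      val precision = 9
      val scale = 2
      val maxWholeValue: Long = BigInt(10).pow(precision - scale).toLong - 1

      // ids 0..maxWholeValue fit; every larger id is ASSUMED to become NULL.
      val fittingRows: Long = math.min(totalRows, maxWholeValue + 1)
      val overflowRows: Long = totalRows - fittingRows

      // Row counts (RC) copied from the parquet-tools output, part-00000 to part-00004.
      val rowCounts: Seq[Long] = Seq(5728640L, 651016L, 3231146L, 2887956L, 3229882L)

      def main(args: Array[String]): Unit = {
        println(s"total rows:    $totalRows")
        println(s"fitting rows:  $fittingRows")
        println(s"overflow rows: $overflowRows")
        println(s"sum of RC:     ${rowCounts.sum}")
      }
    }
    ```
    
    Under that assumption the overflow count equals the RC of part-00000 (5728640), and the other four files together hold the 10000000 values that fit, with stats running from 0.00 to 9999999.00.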

