Hi all,

I am using CDH 5.7, which ships with Spark 1.6.0. I save my data set as
Parquet and then query it. The query runs fine, but when I inspected the
files generated by Spark, I found that the min/max statistics are missing
for all columns. As a result, filters are not pushed down and Spark scans
the entire file.

(1 to 30000).map(i => (i, i.toString)).toDF("a",

parquet-tools meta

creator:     parquet-mr version 1.5.0-cdh5.7.1 (build ${buildNumber})

extra:       org.apache.spark.sql.parquet.row.metadata =

file schema: spark_schema


a:           OPTIONAL INT32 R:0 D:1

b:           OPTIONAL BINARY O:UTF8 R:0 D:1

row group 1: RC:148 TS:2012


a:            INT32 GZIP DO:0 FPO:4 SZ:297/635/2.14 VC:148

b:            BINARY GZIP DO:0 FPO:301 SZ:301/1377/4.57 VC:148

As you can see from the parquet-tools output, the statistics field is
missing from every column chunk, and Spark scans all the data in every file.
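
To make clear what I expected to happen, here is a minimal sketch in plain
Scala (no Spark or Parquet dependency; it can be pasted into spark-shell or
the Scala REPL). The names RowGroupStats and canSkipForEquals and the
row-group ranges are made up for illustration and are not part of any
Parquet API. It only models the idea that a reader can skip a row group when
the recorded min/max range excludes the filter value, and has to scan it
when no statistics are present.

```scala
// Hypothetical model of per-row-group column statistics (not a Parquet API).
final case class RowGroupStats(min: Int, max: Int)

// A row group can be skipped for the predicate `a == value` only if the
// value lies outside the recorded [min, max] range.
def canSkipForEquals(stats: Option[RowGroupStats], value: Int): Boolean =
  stats match {
    case Some(s) => value < s.min || value > s.max
    case None    => false // no statistics: the reader has to scan the group
  }

// Two illustrative row groups for column `a`, with and without statistics.
val withStats: Seq[Option[RowGroupStats]] =
  Seq(Some(RowGroupStats(1, 148)), Some(RowGroupStats(149, 296)))
val withoutStats: Seq[Option[RowGroupStats]] = Seq(None, None)

// Count how many row groups could be skipped for the filter `a == 200`.
val skippedWith    = withStats.count(canSkipForEquals(_, 200))
val skippedWithout = withoutStats.count(canSkipForEquals(_, 200))
```

With statistics, the first row group is skipped for `a == 200`; without
them, nothing can be skipped, which matches the full scan I am seeing.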

Any suggestions?

Thanks


