After setting `parquet.strings.signed-min-max.enabled` to `true` in `ShowMetaCommand.java`, `parquet-tools meta` shows the min/max statistics.
@@ -57,8 +57,9 @@ public class ShowMetaCommand extends ArgsOnlyCommand {
     String[] args = options.getArgs();
     String input = args[0];

     Configuration conf = new Configuration();
+    conf.set("parquet.strings.signed-min-max.enabled", "true");
     Path inputPath = new Path(input);
     FileStatus inputFileStatus = inputPath.getFileSystem(conf).getFileStatus(inputPath);
     List<Footer> footers = ParquetFileReader.readFooters(conf, inputFileStatus, false);

Result:

row group 1: RC:3 TS:56 OFFSET:4
--------------------------------------------------------------------------------
field1: BINARY SNAPPY DO:0 FPO:4 SZ:56/56/1.00 VC:3 ENC:DELTA_BYTE_ARRAY ST:[min: a, max: c, num_nulls: 0]

For reference, this missing-statistics symptom was intentional, introduced by PARQUET-686 [1].

[1] https://www.mail-archive.com/commits@parquet.apache.org/msg00491.html

2018-01-24 10:31 GMT+09:00 Stephen Joung <step...@vcnc.co.kr>:

> How can I write a parquet file with min/max statistics?
>
> 2018-01-24 10:30 GMT+09:00 Stephen Joung <step...@vcnc.co.kr>:
>
>> Hi, I am trying to use Spark SQL filter push-down, and specifically want
>> to use row group skipping with Parquet files.
>>
>> And I guessed that I need a Parquet file with min/max statistics.
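For background on why that property exists at all: PARQUET-686 suppressed min/max statistics for binary columns because older parquet-mr versions compared byte arrays as signed bytes, which orders multi-byte UTF-8 strings incorrectly (the property re-enables reading those old signed stats). The sketch below is not from parquet-mr itself; the class and method names are hypothetical, and it only illustrates how signed byte comparison diverges from the correct unsigned order.

```java
import java.nio.charset.StandardCharsets;

public class SignedMinMax {
    // Compare byte arrays byte-by-byte as SIGNED bytes, the kind of
    // comparison that made old binary min/max stats unreliable.
    static int signedCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return Byte.compare(a[i], b[i]); // signed
        }
        return Integer.compare(a.length, b.length);
    }

    // Correct lexicographic order treats each byte as UNSIGNED (0..255).
    static int unsignedCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] ascii = "a".getBytes(StandardCharsets.UTF_8);    // [0x61]
        byte[] accented = "é".getBytes(StandardCharsets.UTF_8); // [0xC3, 0xA9]

        // Unsigned (correct): "a" sorts before "é".
        System.out.println(unsignedCompare(ascii, accented) < 0); // true
        // Signed (buggy): 0xC3 reads as -61, so "é" sorts before "a".
        System.out.println(signedCompare(accented, ascii) < 0);   // true
    }
}
```

For pure-ASCII data like "a", "b", "c" the two orders agree, which is why enabling `parquet.strings.signed-min-max.enabled` is safe for the file in this thread.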
>>
>> ----
>>
>> On the Spark master branch I tried to write a single column with "a",
>> "b", "c" to parquet file f1:
>>
>> scala> List("a", "b", "c").toDF("field1").coalesce(1).write.parquet("f1")
>>
>> But the saved file does not have statistics (min, max):
>>
>> $ ls f1/*.parquet
>> f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet
>> $ parquet-tools meta f1/*.parquet
>> file:    file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet
>> creator: parquet-mr version 1.8.2 (build c6522788629e590a53eb79874b95f6c3ff11f16c)
>> extra:   org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"field1","type":"string","nullable":true,"metadata":{}}]}
>>
>> file schema: spark_schema
>> --------------------------------------------------------------------------------
>> field1: OPTIONAL BINARY O:UTF8 R:0 D:1
>>
>> row group 1: RC:3 TS:48 OFFSET:4
>> --------------------------------------------------------------------------------
>> field1: BINARY SNAPPY DO:0 FPO:4 SZ:50/48/0.96 VC:3 ENC:BIT_PACKED,RLE,PLAIN ST:[no stats for this column]
>>
>> ----
>>
>> Any pointer or comment would be appreciated.
>> Thank you.