Re: Spark/Parquet/Statistics question

2017-11-21 Thread Rabin Banerjee
> >> So the question is why does Spark, particularly 2.1.0, only generate min/max
> >> for numeric columns, but not for string (BINARY) fields, even if the string
> >> field is included in the sort? Maybe I missed a configuration?
> >>
> >> The second issue,

Re: Spark/Parquet/Statistics question

2017-01-17 Thread Michael Segel
/secret/spark21-sortById` where id=4").show
>> I got many lines like this:
>> 17/01/17 09:23:35 INFO FilterCompat: Filtering using predicate: and(noteq(id, null), eq(id, 4))
>> 17/01/17 09:23:35 INFO FileScanRDD: Reading File path: file:///secret

Re: Spark/Parquet/Statistics question

2017-01-17 Thread Dong Jiang
7ac12-6038-46ee-b5c3-d7a5a06e4425.snappy.parquet, range: 0-558, partition values: [empty row]
> ...
> 17/01/17 09:23:35 INFO FilterCompat: Filtering using predicate: and(noteq(id, null), eq(id, 4))
> 17/01/17 09:23:35 INFO FileScanRDD: Reading File path:

Re: Spark/Parquet/Statistics question

2017-01-17 Thread Jörn Franke
te: and(noteq(id, null), eq(id, 4))
> 17/01/17 09:23:35 INFO FileScanRDD: Reading File path: file:///secret/spark21-sortById/part-00193-39f7ac12-6038-46ee-b5c3-d7a5a06e4425.snappy.parquet, range: 0-574, partition values: [empty row]
> ...
>
>

Spark/Parquet/Statistics question

2017-01-17 Thread djiang
9:23:35 INFO FilterCompat: Filtering using predicate: and(noteq(id, null), eq(id, 4))
17/01/17 09:23:35 INFO FileScanRDD: Reading File path: file:///secret/spark21-sortById/part-00193-39f7ac12-6038-46ee-b5c3-d7a5a06e4425.snappy.parquet, range: 0-574, partition values: [empty row]
...
The q