Michael,
Thanks. Is this still turned off in the released 1.2? Is it possible to
turn it on just to get an idea of how much of a difference it makes?
-Jerry
On 05/12/14 12:40 am, Michael Armbrust wrote:
I'll add that some of our data formats will actually infer this sort of
useful information automatically. Both Parquet and cached in-memory tables
keep statistics on the min/max value for each column. When you have
predicates over these sorted columns, partitions will be eliminated if they
can't
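
To make the min/max idea above concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the Partition class, build_partitions, and scan_greater_than are hypothetical names invented for illustration, and the predicate is fixed to "value > threshold". It only shows why statistics on a sorted column let whole batches be skipped without reading their rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """Hypothetical batch of rows plus min/max statistics for one column."""
    rows: List[int]
    min_value: int
    max_value: int


def build_partitions(values: List[int], batch_size: int) -> List[Partition]:
    """Split values into fixed-size batches and record min/max per batch."""
    partitions = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        partitions.append(Partition(batch, min(batch), max(batch)))
    return partitions


def scan_greater_than(partitions: List[Partition],
                      threshold: int) -> Tuple[List[int], int]:
    """Return the rows with value > threshold and the number of batches skipped."""
    matches: List[int] = []
    skipped = 0
    for p in partitions:
        # If the largest value in the batch fails the predicate,
        # no row in it can match, so the batch is never scanned.
        if p.max_value <= threshold:
            skipped += 1
            continue
        matches.extend(v for v in p.rows if v > threshold)
    return matches, skipped


# A sorted column gives each batch a narrow, non-overlapping min/max range.
sorted_column = list(range(100))
partitions = build_partitions(sorted_column, batch_size=10)
matches, skipped = scan_greater_than(partitions, threshold=74)
print(len(matches), skipped)  # prints "25 7"
```

With an unsorted column the per-batch ranges would overlap heavily and far fewer batches could be skipped, which is why the statistics pay off mainly on sorted columns.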
You can try writing your own Relation with filter pushdown, or use
ParquetRelation2 as a workaround.
(https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala)
Cheng Hao
-Original Message-
From: Jerry Raj