But Xuelin already posted in the original message that the code was using SET spark.sql.parquet.filterPushdown=true
On Wed, Jan 7, 2015 at 12:42 AM, Daniel Haviv <danielru...@gmail.com> wrote:
> Quoting Michael:
> Predicate push down into the input format is turned off by default because
> there is a bug in the current parquet library that throws null pointer
> exceptions when there are full row groups that are null.
>
> https://issues.apache.org/jira/browse/SPARK-4258
>
> You can turn it on if you want:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration
>
> Daniel
>
> On Jan 7, 2015, at 08:18, Xuelin Cao <xuelin...@yahoo.com.INVALID> wrote:
>
> Hi,
>
> I'm testing the parquet file format, and predicate pushdown is a
> very useful feature for us.
>
> However, it looks like predicate pushdown doesn't work after I set
> sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
>
> Here is my SQL:
> *sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect*
>
> Then I checked the amount of input data on the web UI. The amount of
> input data is ALWAYS 80.2M, regardless of whether I turn the
> spark.sql.parquet.filterPushdown flag on or off.
>
> I'm not sure if there is anything I must do when *generating* the
> parquet file to make predicate pushdown available. (As with ORC files,
> where I need to explicitly sort the field that will be used for
> predicate pushdown when creating the file.)
>
> Does anyone have any idea?
>
> And does anyone know the internal mechanism of parquet predicate
> pushdown?
>
> Thanks
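
For reference, here is a minimal spark-shell session sketch of the setup being discussed. It assumes the 1.2-era sqlContext API; the table name `ad`, the column names, and the query are taken from Xuelin's mail, while the parquet path is hypothetical. This is a sketch of the configuration under discussion, not a claim that it avoids the SPARK-4258 bug.

```scala
// spark-shell session sketch -- requires a running Spark SQL context
// (sqlContext is provided by the shell).

// Enable predicate pushdown into the Parquet input format; it is off
// by default in this release because of SPARK-4258:
sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")

// Register the parquet file as a temp table (path is hypothetical):
sqlContext.parquetFile("/path/to/ad.parquet").registerTempTable("ad")

// With pushdown enabled, the groupId filter can be handed to the
// Parquet reader, which may then skip row groups using their
// min/max statistics:
sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect()
```

Note that Parquet pushdown works by skipping whole row groups whose column statistics rule out a match, so if the filtered column's values are spread across every row group, the input-size metric on the web UI may not drop even when the flag is set.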