Re: SQL predicate pushdown on parquet or other columnar formats

2016-08-01 Thread Mich Talebzadeh
Hi, You mentioned: "In general, is this optimization done for all columnar databases or file formats?" Have you tried it with an ORC file? That is another columnar table/file format. Spark follows a rule-based optimizer; it does not have a cost-based optimizer yet. That is planned for the future, I believe.
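
[Editor's note: a minimal sketch, not from the original thread, of how one might try the ORC suggestion and check whether the predicate is pushed down to the scan. It assumes Spark 2.x with ORC support available and uses a hypothetical path /tmp/people_orc; spark.sql.orc.filterPushdown is a real config key, but its default has varied across releases, so treat this as an illustration rather than the poster's exact test.]

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object OrcPushdownCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-pushdown-check")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // ORC filter pushdown is governed by this flag (disabled by default in older releases).
    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    // Write a tiny ORC file to experiment with (hypothetical path).
    Seq((1, "alice", 34), (2, "bob", 45), (3, "carol", 29))
      .toDF("id", "name", "age")
      .write.mode("overwrite").orc("/tmp/people_orc")

    // Read it back with a predicate and print the physical plan;
    // the scan node in the output shows whether the filter was pushed down.
    val filtered = spark.read.orc("/tmp/people_orc").filter(col("age") > 30)
    filtered.explain(true)

    spark.stop()
  }
}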

SQL predicate pushdown on parquet or other columnar formats

2016-08-01 Thread Sandeep Joshi
Hi, I just want to confirm my understanding of the physical plan generated by Spark SQL when reading from a Parquet file. When multiple predicates are pushed to the PrunedFilteredScan, does Spark ensure that the Parquet file is not read multiple times while evaluating each predicate? In general, is this optimization done for all columnar databases or file formats?
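
[Editor's note: a minimal sketch, not part of the original post, of the kind of check being asked about: read a Parquet file with two predicates and print the physical plan, so the pushed filters can be inspected on the single scan node. The path /tmp/events_parquet and the column names are hypothetical; spark.sql.parquet.filterPushdown is a real config key.]

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ParquetPushdownCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-pushdown-check")
      .master("local[*]")
      .getOrCreate()

    // Parquet filter pushdown is controlled by this flag (enabled by default).
    spark.conf.set("spark.sql.parquet.filterPushdown", "true")

    // Two predicates on the same source: both are attached to a single
    // Parquet scan in the physical plan rather than triggering separate reads.
    val df = spark.read.parquet("/tmp/events_parquet")   // hypothetical path
      .filter(col("event_type") === "click" && col("ts") > 1470000000L)

    // Print the logical and physical plans; inspect the scan node and its
    // pushed filters in the output.
    df.explain(true)

    spark.stop()
  }
}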