Hi,
You mentioned:
In general, is this optimization done for all columnar databases or file
formats?
Have you tried it with an ORC file? ORC is another columnar file format.
Spark currently uses a rule-based optimizer; it does not have a cost-based
optimizer yet. I believe one is planned for a future release.
Hi
I just want to confirm my understanding of the physical plan generated by
Spark SQL while reading from a Parquet file.
When multiple predicates are pushed down to the PrunedFilterScan, does Spark
ensure that the Parquet file is not read multiple times, i.e., once per
predicate?
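To illustrate the behavior being asked about: a scan can evaluate every
pushed-down filter per row in a single pass over the data, so the source is
read only once no matter how many predicates were pushed. Here is a minimal
sketch in plain Python (no Spark); the row data, column names, and predicate
functions are all hypothetical stand-ins for Parquet records and filters.

```python
# Hypothetical rows standing in for records read from a Parquet file.
rows = [
    {"id": 1, "age": 25, "country": "US"},
    {"id": 2, "age": 17, "country": "DE"},
    {"id": 3, "age": 40, "country": "US"},
]

# Two pushed-down predicates, analogous to filters handed to a single scan.
predicates = [
    lambda r: r["age"] >= 18,
    lambda r: r["country"] == "US",
]

# One pass over the data: every predicate is evaluated per row,
# so the underlying source is scanned only once.
result = [r for r in rows if all(p(r) for p in predicates)]
print([r["id"] for r in result])  # → [1, 3]
```

This mirrors the conjunction of filters in a single scan; whether a given
format can additionally skip data (e.g. via column statistics) is a separate,
format-specific optimization.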
In general,