Hi dev,

We would like to raise a discussion about supporting dynamic pushdown
of DataFilters. The idea is to push DataFilters down together with
their associated PartitionFilters, and prune the partition predicates
at runtime before the filters reach Parquet, so that each partition
receives only the filters that actually apply to it. More details can
be found in https://github.com/apache/spark/pull/35669.

Before this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedFilters:
[Or(LessThan(id,2),GreaterThan(id,3))]

After this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedDynamicalFilters:
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]

Please note that PushedFilters is replaced by PushedDynamicalFilters.
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]
means that the data filter id < 2 is dynamically pushed only to
partition 0, and the data filter id > 3 only to partition 1.
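To make the runtime pruning step concrete, here is a minimal,
self-contained sketch (not Spark's actual implementation; the filter
representation, the hardcoded partition column name "part", and the
function name prune are all assumptions for illustration). Given the
pushed filter above and a file's known partition value, the partition
predicates resolve to True/False and the tree simplifies to the data
filter relevant for that partition:

```python
def prune(expr, part_value):
    """Simplify a filter tree given the known partition value.

    Filters are modeled as nested tuples, e.g. ("LessThan", "id", 2).
    Partition predicates on the (assumed) column "part" are resolved
    to True/False, then And/Or nodes are folded accordingly.
    """
    op = expr[0]
    if op == "EqualTo" and expr[1] == "part":
        return expr[2] == part_value            # resolves to True/False
    if op == "And":
        l, r = prune(expr[1], part_value), prune(expr[2], part_value)
        if l is False or r is False:
            return False
        if l is True:
            return r
        if r is True:
            return l
        return ("And", l, r)
    if op == "Or":
        l, r = prune(expr[1], part_value), prune(expr[2], part_value)
        if l is True or r is True:
            return True
        if l is False:
            return r
        if r is False:
            return l
        return ("Or", l, r)
    return expr                                  # plain data filter, kept as-is

# Or(And(id < 2, part = 0), And(id > 3, part = 1)), as in the plan above.
pushed = ("Or",
          ("And", ("LessThan", "id", 2), ("EqualTo", "part", 0)),
          ("And", ("GreaterThan", "id", 3), ("EqualTo", "part", 1)))

print(prune(pushed, 0))   # ('LessThan', 'id', 2)
print(prune(pushed, 1))   # ('GreaterThan', 'id', 3)
```

So files in partition 0 only evaluate id < 2 against Parquet, and
files in partition 1 only evaluate id > 3, instead of the weaker
Or(LessThan(id,2),GreaterThan(id,3)) being applied everywhere.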

We would like to hear your thoughts on this PR, and on whether
PushedDynamicalFilters is a good way to represent dynamic filter
pushdown. Any feedback is welcome.

Thanks!

Jacky Lee 
