Hi dev,

We would like raise a discussion about supporting `DataFilter pushed down dynamically`. We are trying to push down dataFilter with PartitionFilter, and prune partitionFilter at runtime before push to parquet, which can push less filter to parquet. More details can be found in 35669.

before this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters: 
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters: 
[((part#15 = 0) OR (part#15 = 1))]

PushedFilters: 
[Or(LessThan(id,2),GreaterThan(id,3))]

after this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters: 
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters: 
[((part#15 = 0) OR (part#15 = 1))]

PushedDynamicalFilters: 
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]

Please note that PushedFilters is changed to PushedDynamicalFilters[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))] means data filter id < 2 is dynamically pushed to partition 0 and data filter id > 3 is dynamically pushed to partition 1.

We would like to start a discussion about this PR and whether we can use PushedDynamicalFilters to represent filter dynamic pushdown. Any feedback is welcome.

Thanks!

Jacky Lee

Reply via email to