Hi dev,

We would like to raise a discussion about supporting dynamic DataFilter
pushdown. The idea is to push data filters down together with partition
filters and prune the partition predicates at runtime, just before pushing
to Parquet, so that a more precise (and therefore smaller) filter reaches
Parquet. More details can be found in PR #35669
<https://github.com/apache/spark/pull/35669>.

Before this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedFilters:
[Or(LessThan(id,2),GreaterThan(id,3))]

After this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedDynamicalFilters:
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]

Please note that PushedFilters becomes PushedDynamicalFilters.
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]
means that the data filter id < 2 is dynamically pushed down when reading
partition 0, and the data filter id > 3 is dynamically pushed down when
reading partition 1.
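To make the runtime behavior concrete, here is a minimal sketch in plain
Python (an illustration of the idea only, not the PR's actual code; the
`Pred`/`And_`/`Or_` types and `simplify` helper are hypothetical names).
Before scanning each partition, the partition value is substituted into the
pushed filter and constants are folded, so Parquet receives only the data
filter relevant to that partition:

```python
from dataclasses import dataclass

# Minimal expression tree: leaves are (column, op, value) predicates.
@dataclass
class Pred:
    col: str
    op: str
    val: int

@dataclass
class And_:
    left: object
    right: object

@dataclass
class Or_:
    left: object
    right: object

TRUE, FALSE = "TRUE", "FALSE"

def simplify(expr, part_col, part_val):
    """Substitute the partition value for part_col and fold constants."""
    if isinstance(expr, Pred):
        if expr.col == part_col:
            # Partition predicates in this sketch are equality checks.
            return TRUE if expr.val == part_val else FALSE
        return expr
    if isinstance(expr, And_):
        l = simplify(expr.left, part_col, part_val)
        r = simplify(expr.right, part_col, part_val)
        if FALSE in (l, r):
            return FALSE
        if l == TRUE:
            return r
        if r == TRUE:
            return l
        return And_(l, r)
    if isinstance(expr, Or_):
        l = simplify(expr.left, part_col, part_val)
        r = simplify(expr.right, part_col, part_val)
        if TRUE in (l, r):
            return TRUE
        if l == FALSE:
            return r
        if r == FALSE:
            return l
        return Or_(l, r)

# The pushed dynamic filter from the plan above:
# Or(And(id < 2, part = 0), And(id > 3, part = 1))
pushed = Or_(And_(Pred("id", "<", 2), Pred("part", "=", 0)),
             And_(Pred("id", ">", 3), Pred("part", "=", 1)))

print(simplify(pushed, "part", 0))  # Pred(col='id', op='<', val=2)
print(simplify(pushed, "part", 1))  # Pred(col='id', op='>', val=3)
```

With a single static PushedFilters of Or(LessThan(id,2),GreaterThan(id,3)),
both partitions would have to apply both disjuncts; after per-partition
pruning, each partition evaluates only its own data filter.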

We would like to start a discussion about this PR, and in particular about
whether PushedDynamicalFilters is the right way to represent dynamic filter
pushdown. Any feedback is welcome.

Thanks!

Jacky Lee
