Hi dev,

We would like to raise a discussion about supporting `DataFilter pushed down
dynamically`. We are trying to push down the DataFilter together with the
PartitionFilter, and prune the partition predicates at runtime before pushing
down to Parquet, so that fewer filters are pushed to each Parquet scan. More
details can be found in PR 35669
<https://github.com/apache/spark/pull/35669>.

Before this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedFilters:
[Or(LessThan(id,2),GreaterThan(id,3))]

After this patch, the physical plan is:

Filter:
(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))

ParquetScan DataFilters:
[(((id#14 < 2) AND (part#15 = 0)) OR ((id#14 > 3) AND (part#15 = 1)))]

PartitionFilters:
[((part#15 = 0) OR (part#15 = 1))]

PushedDynamicalFilters:
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]

Please note that PushedFilters has been changed to PushedDynamicalFilters.
[Or(And(LessThan(id,2),EqualTo(part,0)),And(GreaterThan(id,3),EqualTo(part,1)))]
means that the data filter id < 2 is dynamically pushed to partition 0, and
the data filter id > 3 is dynamically pushed to partition 1.
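To make the idea concrete, here is a toy sketch (plain Python, not Spark
code; expression trees are modeled as nested tuples) of how the partition
predicates inside a PushedDynamicalFilters expression can be evaluated
against a concrete partition value at runtime, leaving only the residual
data filter to push down to that partition's Parquet files:

```python
# Toy model of runtime partition pruning for dynamic data-filter pushdown.
# This is an illustrative sketch, not Spark's implementation: expressions are
# nested tuples like ("and", a, b), ("or", a, b), ("eq", col, v), etc.

def prune_for_partition(expr, part_col, part_value):
    """Simplify `expr` given that `part_col` equals `part_value`.

    Returns the residual filter for that partition; True means always-true,
    False means the partition matches nothing.
    """
    op = expr[0]
    if op == "eq" and expr[1] == part_col:
        # Partition predicate: fold to a constant for this partition.
        return expr[2] == part_value
    if op == "and":
        l = prune_for_partition(expr[1], part_col, part_value)
        r = prune_for_partition(expr[2], part_col, part_value)
        if l is False or r is False:
            return False
        if l is True:
            return r
        if r is True:
            return l
        return ("and", l, r)
    if op == "or":
        l = prune_for_partition(expr[1], part_col, part_value)
        r = prune_for_partition(expr[2], part_col, part_value)
        if l is True or r is True:
            return True
        if l is False:
            return r
        if r is False:
            return l
        return ("or", l, r)
    # Plain data filter (e.g. id < 2): keep as-is for pushdown.
    return expr

# The combined filter from the plans above:
# (id < 2 AND part = 0) OR (id > 3 AND part = 1)
combined = ("or",
            ("and", ("lt", "id", 2), ("eq", "part", 0)),
            ("and", ("gt", "id", 3), ("eq", "part", 1)))

print(prune_for_partition(combined, "part", 0))  # ('lt', 'id', 2)
print(prune_for_partition(combined, "part", 1))  # ('gt', 'id', 3)
```

So partition 0 receives only id < 2 and partition 1 receives only id > 3,
which is exactly what PushedDynamicalFilters above is meant to express.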

We would like to start a discussion about this PR and about whether
PushedDynamicalFilters is a suitable way to represent dynamic filter
pushdown. Any feedback is welcome.

Thanks!

Jacky Lee
