[
https://issues.apache.org/jira/browse/SPARK-54593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-54593:
-----------------------------------
Labels: pull-request-available (was: )
> Expand DPP for LogicalRelation and LogicalRDD
> ---------------------------------------------
>
> Key: SPARK-54593
> URL: https://issues.apache.org/jira/browse/SPARK-54593
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer, SQL
> Affects Versions: 3.0.0, 4.0.0
> Reporter: Dustin Smith
> Priority: Major
> Labels: pull-request-available
>
> Enable Dynamic Partition Pruning (DPP) for queries involving LocalRelation
> and LogicalRDD as the filtering side of broadcast joins. These logical plan
> nodes represent small, materialized datasets that are ideal candidates for
> DPP optimization.
> # PartitionPruning.scala
> ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as
> selective predicates
> ** Added overhead calculation for LocalRelation and LogicalRDD in
> calculatePlanOverhead()
> ** LogicalRDD is only considered for DPP when it has originStats with
> rowCount, indicating materialized data
> # Test Coverage
> ** Added 5 comprehensive tests in DynamicPartitionPruningSuite.scala:
> *** DPP with LocalRelation in broadcast join
> *** DPP with LogicalRDD from cached DataFrame in broadcast join
> *** DPP with empty LocalRelation
> *** DPP should not trigger for LogicalRDD without originStats
> *** DPP with large LocalRelation
> Will supply a Github PR.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]