[ 
https://issues.apache.org/jira/browse/SPARK-54593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-54593:
-----------------------------------
    Labels: pull-request-available  (was: )

> Expand DPP for LogicalRelation and LogicalRDD
> ---------------------------------------------
>
>                 Key: SPARK-54593
>                 URL: https://issues.apache.org/jira/browse/SPARK-54593
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: Dustin Smith
>            Priority: Major
>              Labels: pull-request-available
>
> Enable Dynamic Partition Pruning (DPP) for queries involving LocalRelation 
> and LogicalRDD as the filtering side of broadcast joins. These logical plan 
> nodes represent small, materialized datasets that are ideal candidates for 
> DPP optimization.
>  #  PartitionPruning.scala
>  ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as 
> selective predicates
>  ** Added overhead calculation for LocalRelation and LogicalRDD in 
> calculatePlanOverhead()
>  ** LogicalRDD is only considered for DPP when it has originStats with 
> rowCount, indicating materialized data
>  # Test Coverage
>  ** Added 5 comprehensive tests in DynamicPartitionPruningSuite.scala:
>  *** DPP with LocalRelation in broadcast join
>  *** DPP with LogicalRDD from cached DataFrame in broadcast join
>  *** DPP with empty LocalRelation 
>  *** DPP should not trigger for LogicalRDD without originStats 
>  *** DPP with large LocalRelation 
> Will supply a Github PR.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to