[ 
https://issues.apache.org/jira/browse/SPARK-54593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Smith updated SPARK-54593:
---------------------------------
    Description: 
Enable Dynamic Partition Pruning (DPP) for queries involving LocalRelation and 
LogicalRDD as the filtering side of broadcast joins. These logical plan nodes 
represent small, materialized datasets that are ideal candidates for DPP 
optimization.
 #  PartitionPruning.scala
 ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as 
selective predicates
 ** Added overhead calculation for LocalRelation and LogicalRDD in 
calculatePlanOverhead()
 ** LogicalRDD is only considered for DPP when it has originStats with 
rowCount, indicating materialized data
 # Test Coverage
 ** Added 5 comprehensive tests in DynamicPartitionPruningSuite.scala:
 *** DPP with LocalRelation in broadcast join
 *** DPP with LogicalRDD from cached DataFrame in broadcast join
 *** DPP with empty LocalRelation 
 *** DPP should not trigger for LogicalRDD without originStats 
 *** DPP with large LocalRelation 

Will supply a Github PR.

https://github.com/apache/spark/pull/53324

  was:
Enable Dynamic Partition Pruning (DPP) for queries involving LocalRelation and 
LogicalRDD as the filtering side of broadcast joins. These logical plan nodes 
represent small, materialized datasets that are ideal candidates for DPP 
optimization.
 #  PartitionPruning.scala
 ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as 
selective predicates
 ** Added overhead calculation for LocalRelation and LogicalRDD in 
calculatePlanOverhead()
 ** LogicalRDD is only considered for DPP when it has originStats with 
rowCount, indicating materialized data
 # Test Coverage
 ** Added 5 comprehensive tests in DynamicPartitionPruningSuite.scala:
 *** DPP with LocalRelation in broadcast join
 *** DPP with LogicalRDD from cached DataFrame in broadcast join
 *** DPP with empty LocalRelation 
 *** DPP should not trigger for LogicalRDD without originStats 
 *** DPP with large LocalRelation 

Will supply a Github PR.


> Expand DPP for LogicalRelation and LogicalRDD
> ---------------------------------------------
>
>                 Key: SPARK-54593
>                 URL: https://issues.apache.org/jira/browse/SPARK-54593
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: Dustin Smith
>            Priority: Major
>              Labels: pull-request-available
>
> Enable Dynamic Partition Pruning (DPP) for queries involving LocalRelation 
> and LogicalRDD as the filtering side of broadcast joins. These logical plan 
> nodes represent small, materialized datasets that are ideal candidates for 
> DPP optimization.
>  #  PartitionPruning.scala
>  ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as 
> selective predicates
>  ** Added overhead calculation for LocalRelation and LogicalRDD in 
> calculatePlanOverhead()
>  ** LogicalRDD is only considered for DPP when it has originStats with 
> rowCount, indicating materialized data
>  # Test Coverage
>  ** Added 5 comprehensive tests in DynamicPartitionPruningSuite.scala:
>  *** DPP with LocalRelation in broadcast join
>  *** DPP with LogicalRDD from cached DataFrame in broadcast join
>  *** DPP with empty LocalRelation 
>  *** DPP should not trigger for LogicalRDD without originStats 
>  *** DPP with large LocalRelation 
> Will supply a Github PR.
> https://github.com/apache/spark/pull/53324



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to