[ 
https://issues.apache.org/jira/browse/SPARK-52160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Smith updated SPARK-52160:
-------------------------------
    Description: 
We have a Hive-partitioned dataset, partitioned on year, month, and day. When 
we run a broadcast hash join against that table with the partition columns as 
the join keys, dynamic partition pruning (DPP) is not applied and Spark reads 
all partitions. The query would be much faster with DPP. 
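
One way to confirm whether DPP was applied is to look for a dynamic pruning 
filter in the physical plan. The helper below is a minimal sketch, not part of 
the original report: it assumes the plan text has been captured as a string 
(for example from the output of df.explain()), and the function name is 
hypothetical. When DPP is applied, Spark normally lists a 
dynamicpruningexpression entry under PartitionFilters on the scan node.

```python
def plan_uses_dpp(plan_text: str) -> bool:
    """Return True if the plan text mentions a dynamic pruning filter.

    Assumption: plan_text is the physical plan rendered as a string,
    e.g. what df.explain() prints. The check is a plain substring
    match, so it is a heuristic and does not parse the plan.
    """
    return "dynamicpruning" in plan_text.lower()


# Illustrative plan fragments (hand-written, not taken from the report).
with_dpp = (
    "PartitionFilters: [isnotnull(year#10), "
    "dynamicpruningexpression(year#10 IN dynamicpruning#25)]"
)
without_dpp = "PartitionFilters: []"

print(plan_uses_dpp(with_dpp))
print(plan_uses_dpp(without_dpp))
```

If the check returns False for the join query, the settings 
spark.sql.optimizer.dynamicPartitionPruning.enabled and 
spark.sql.autoBroadcastJoinThreshold are worth inspecting first.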

 

When we filter directly on year, month, and day (no join), Spark reads only 
the partitions that match the filter. 
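
Because static filters on the partition columns do prune, one possible 
workaround is to collect the distinct join keys from the small table and turn 
them into an explicit filter on the large table before joining. The sketch 
below is an illustration under assumptions, not code from the report: it 
assumes integer partition columns named year, month, and day, and the names 
build_partition_predicate, small_df, and big_df are hypothetical.

```python
from typing import Iterable, Tuple


def build_partition_predicate(keys: Iterable[Tuple[int, int, int]]) -> str:
    """Build a SQL predicate matching exactly the given (year, month, day) keys.

    Assumption: the partition columns are integers named year, month, day.
    """
    unique = sorted(set(keys))  # de-duplicate and keep the output stable
    if not unique:
        return "1 = 0"  # no keys on the small side: match no partitions
    clauses = [
        f"(year = {int(y)} AND month = {int(m)} AND day = {int(d)})"
        for y, m, d in unique
    ]
    return " OR ".join(clauses)


# Possible use with PySpark (not executed here; small_df and big_df are
# hypothetical DataFrames):
#   from pyspark.sql.functions import broadcast
#   rows = small_df.select("year", "month", "day").distinct().collect()
#   keys = [(r.year, r.month, r.day) for r in rows]
#   pruned = big_df.where(build_partition_predicate(keys))
#   result = pruned.join(broadcast(small_df), ["year", "month", "day"])

print(build_partition_predicate([(2024, 1, 5), (2024, 1, 5), (2023, 12, 31)]))
```

This only makes sense when the small table has few distinct keys, since the 
keys are collected to the driver and inlined into the filter.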

 

Code: 
https://community.palantir.com/t/dynamic-partition-pruning-of-hive-partitioned-datasets-with-a-broadcast-join/3545

  was:
We have a hive partitioned dataset, partitioned on year, month, day. When doing 
a broadcast hash join with that table, with the partition keys as the join 
keys, dynamic partition pruning is not being used. It's reading all partitions. 
The query would be a lot faster with DPP. 

 

When filtering on year, month, and day (no join), Spark does read only the 
partitions that match the filter. 


> Dynamic pruning not being used with broadcast hash join
> -------------------------------------------------------
>
>                 Key: SPARK-52160
>                 URL: https://issues.apache.org/jira/browse/SPARK-52160
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.5
>            Reporter: John Smith
>            Priority: Major
>
> We have a Hive-partitioned dataset, partitioned on year, month, and day. 
> When we run a broadcast hash join against that table with the partition 
> columns as the join keys, dynamic partition pruning (DPP) is not applied and 
> Spark reads all partitions. The query would be much faster with DPP. 
>  
> When filtering on year, month, and day (no join), Spark does read only the 
> partitions that match the filter. 
>  
> Code: 
> https://community.palantir.com/t/dynamic-partition-pruning-of-hive-partitioned-datasets-with-a-broadcast-join/3545



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

