wangyum opened a new pull request, #38464:
URL: https://github.com/apache/spark/pull/38464

   ### What changes were proposed in this pull request?
   
This PR enhances DPP to use bloom filters when 
`spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled, 
the build side cannot be broadcast because of its size, and the existing shuffle 
exchanges can be reused.
   
   ### Why are the changes needed?
   
   Avoid job failures when 
`spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled. For example, the following query fails:
   ```sql
   select catalog_sales.* from catalog_sales join catalog_returns where 
cr_order_number = cs_sold_date_sk and cr_returned_time_sk < 40000;
   ```
   ```
   20/08/16 06:44:42 ERROR TaskSetManager: Total size of serialized results of 
494 tasks (1225.3 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)
   ```
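   A minimal sketch of the failing configuration (the `SET` command and the table/column names follow the query above; the behavior described is the pre-patch fallback, in which the pruning subquery's results are collected to the driver):
   ```sql
   -- Disable broadcast-only reuse for dynamic partition pruning, which
   -- allows DPP to fall back to a non-broadcast pruning subquery.
   SET spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=false;

   -- With the fallback in place, running the join can exceed
   -- spark.driver.maxResultSize, producing the error shown above.
   select catalog_sales.* from catalog_sales join catalog_returns where
   cr_order_number = cs_sold_date_sk and cr_returned_time_sk < 40000;
   ```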
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
