szehon-ho opened a new pull request, #47426:
URL: https://github.com/apache/spark/pull/47426

     ### What changes were proposed in this pull request?
   
   Introduce runtime partition filtering for SPJ.  In planning, we have the 
list of partition values on both sides to plan the tasks.  We can thus filter 
out partition values based on the join type.
   
   Currently LEFT OUTER, RIGHT OUTER, INNER join types are supported as they 
are more common, we can optimize other join types in subsequent PR.
   
     ### Why are the changes needed?
   
   In some common join types (INNER, LEFT, RIGHT), we have an opportunity to 
greatly reduce the data scanned in SPJ.  For example, a small table joining a 
larger table by partition key, can prune out most of the partitions of the 
large table.
   
   There is some similarity with the concept of DPP, but that uses heuristics 
and this is more exact as SPJ planning requires us anyway to list out both 
sides partitioning.
   
     ### Does this PR introduce _any_ user-facing change?
   
   No.
   
     ### How was this patch tested?
   
   New tests in KeyGroupedPartitioningSuite.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to