szehon-ho opened a new pull request, #47426: URL: https://github.com/apache/spark/pull/47426
### What changes were proposed in this pull request?

Introduce runtime partition filtering for storage-partitioned join (SPJ). During planning, the partition values on both sides of the join are already listed in order to plan the tasks, so partition values can be filtered out based on the join type. Currently INNER, LEFT OUTER, and RIGHT OUTER joins are supported because they are the most common; other join types can be optimized in a subsequent PR.

### Why are the changes needed?

For some common join types (INNER, LEFT OUTER, RIGHT OUTER), there is an opportunity to greatly reduce the data scanned in SPJ. For example, when a small table joins a larger table on the partition key, most partitions of the large table can be pruned. This is similar in concept to dynamic partition pruning (DPP), but DPP relies on heuristics, whereas this approach is exact because SPJ planning already requires listing the partitioning of both sides.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests in KeyGroupedPartitioningSuite.
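
To make the idea of filtering partition values by join type concrete, here is a minimal sketch in plain Python. It is not the code in this PR. The function name `prune_partition_values` and the per-join-type rules (intersection for INNER, the left side's values for LEFT OUTER, the right side's values for RIGHT OUTER) are assumptions that follow from join semantics, and unsupported join types fall back to keeping every value.

```python
from typing import Hashable, Iterable, List


def prune_partition_values(
    join_type: str,
    left_values: Iterable[Hashable],
    right_values: Iterable[Hashable],
) -> List[Hashable]:
    """Return the partition values that still need a task (hypothetical helper).

    A partition on either side whose value is missing from the result
    cannot contribute rows to the join output, so it need not be scanned.
    """
    # De-duplicate while preserving the order in which values were listed.
    left = list(dict.fromkeys(left_values))
    right = list(dict.fromkeys(right_values))
    kind = join_type.strip().upper()

    if kind == "INNER":
        # Only values present on both sides can produce matching rows.
        right_set = set(right)
        return [v for v in left if v in right_set]
    if kind == "LEFT OUTER":
        # Every left row is preserved; right-only partitions are pruned.
        return left
    if kind == "RIGHT OUTER":
        # Every right row is preserved; left-only partitions are pruned.
        return right

    # Assumed fallback for join types not covered here: keep all values.
    seen = set(left)
    return left + [v for v in right if v not in seen]
```

For instance, with a small table partitioned on a single value and a large table partitioned on a thousand values, an INNER join under this sketch keeps only the one shared value, so the remaining partitions of the large table are never read.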