[PR] Make benefits_from_input_partitioning Default in SHJ [arrow-datafusion]

via GitHub Tue, 09 Jan 2024 07:20:46 -0800


metesynnada opened a new pull request, #8801:
URL: https://github.com/apache/arrow-datafusion/pull/8801


   ## Which issue does this PR close?
   
   
   Improves the condition discussed in https://github.com/apache/arrow-datafusion/pull/8794#discussion_r1445868353.
   
   ## Rationale for this change
   
   The previous `SymmetricHashJoinExec` implementation did not require any input ordering, which made `SymmetricHashJoinExec` suboptimal when `target_partitions` is greater than one.
   
   ## What changes are included in this PR?
   
   If a child node (the left or right side of the join) already has a defined order and the columns used in the filter predicate are ordered, that side's order is kept. `SymmetricHashJoinExec` then uses this order to keep memory bounded during the join. If a child node has no inherent order, or if the filter columns are unordered, `SymmetricHashJoinExec` requires no specific order from that side. As a result, the symmetric hash join imposes ordering constraints only when they are needed, based on the properties of the child nodes and the filter condition.
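   The per-side rule described above can be sketched in Rust roughly as follows. This is a simplified illustration, not the code in this PR: `SortKey` and `required_side_ordering` are hypothetical names rather than DataFusion APIs, and "the filter columns are ordered" is assumed here to mean that the leading key of the child's output ordering is a column referenced by the filter predicate.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a sort key (NOT DataFusion's
/// `PhysicalSortExpr`): a column name plus a sort direction.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

/// Hypothetical helper: decide which ordering, if any, one join side requires.
///
/// Assumption for this sketch: the filter columns count as "ordered" when the
/// leading key of the child's output ordering is a column used by the filter.
pub fn required_side_ordering(
    child_ordering: Option<&[SortKey]>,
    filter_columns: &HashSet<String>,
) -> Option<Vec<SortKey>> {
    // No inherent order on this side: impose no requirement.
    let ordering = child_ordering?;
    // An empty ordering is treated the same as having no ordering.
    let leading = ordering.first()?;
    if filter_columns.contains(&leading.column) {
        // Keep the child's existing order so the join can prune buffered rows.
        Some(ordering.to_vec())
    } else {
        // The side is ordered, but not on a filter column: no requirement.
        None
    }
}
```

   Returning `None` in the unordered cases mirrors the idea that the join should not force a sort on inputs whose order cannot help bound memory.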
   
   This PR also changes the proto files, which inflates the changed line count.
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   