Dandandan opened a new issue, #348:
URL: https://github.com/apache/arrow-ballista/issues/348

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   When we support broadcast exchanges 
https://github.com/apache/arrow-ballista/issues/342 we can transform certain
   
   **Describe the solution you'd like**
   
   Currently all plans involving hash joins look like the following.
   
   
   ```
   HashJoin <- RemoteExchange (partitioned) <- build side input
            <- RemoteExchange (partitioned) <- probe side input
   ```
   
   When the build side is small, it can be broadcast instead (e.g. Spark uses 10MB * number of partitions 
for this by default, but in my experience a larger threshold can help as well).
   
   
   The new plan after optimization looks like this (note the missing exchange 
on the probe side; that side no longer requires shuffling):
   
   ```
   HashJoin <- BroadCastExchange (partitioned) <- build side input
            <- probe side input
   ```
   
   **Describe alternatives you've considered**
   
   Implement the (physical) optimization rule. The rule should run after the 
`HashBuildProbeOrder` rule from DataFusion.
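
To make the proposed rewrite concrete, here is a minimal sketch of such a rule over a toy plan tree. Everything in it is hypothetical and for illustration only: the `Plan` enum, `estimated_size`, `broadcast_small_build_side`, and the byte threshold are not DataFusion or Ballista APIs, and a real rule would operate on physical plan nodes and their statistics instead. It assumes the build side is already the first join input, as it would be after `HashBuildProbeOrder` has run.

```rust
/// Toy physical plan (hypothetical; not DataFusion's or Ballista's types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf input with an optional size estimate in bytes.
    Scan { name: String, est_bytes: Option<u64> },
    /// Partitioned shuffle of the input.
    RemoteExchange(Box<Plan>),
    /// Input is copied to every partition of the join.
    BroadcastExchange(Box<Plan>),
    /// Hash join; `build` is assumed to already be the smaller side.
    HashJoin { build: Box<Plan>, probe: Box<Plan> },
}

/// Best-effort size estimate; `None` means "unknown, do not broadcast".
fn estimated_size(plan: &Plan) -> Option<u64> {
    match plan {
        Plan::Scan { est_bytes, .. } => *est_bytes,
        Plan::RemoteExchange(input) | Plan::BroadcastExchange(input) => estimated_size(input),
        Plan::HashJoin { .. } => None,
    }
}

/// Rewrites `HashJoin <- RemoteExchange <- build / <- RemoteExchange <- probe`
/// into `HashJoin <- BroadcastExchange <- build / <- probe` when the build
/// side is estimated to be at most `threshold_bytes`.
fn broadcast_small_build_side(plan: Plan, threshold_bytes: u64) -> Plan {
    match plan {
        Plan::HashJoin { build, probe } => {
            // Optimize children first so nested joins are handled bottom-up.
            let build = broadcast_small_build_side(*build, threshold_bytes);
            let probe = broadcast_small_build_side(*probe, threshold_bytes);
            match (build, probe) {
                (Plan::RemoteExchange(b), Plan::RemoteExchange(p))
                    if estimated_size(&b).map_or(false, |n| n <= threshold_bytes) =>
                {
                    // Broadcast the build side and drop the probe-side shuffle.
                    Plan::HashJoin {
                        build: Box::new(Plan::BroadcastExchange(b)),
                        probe: p,
                    }
                }
                // Anything else is left unchanged.
                (b, p) => Plan::HashJoin { build: Box::new(b), probe: Box::new(p) },
            }
        }
        Plan::RemoteExchange(input) => {
            Plan::RemoteExchange(Box::new(broadcast_small_build_side(*input, threshold_bytes)))
        }
        Plan::BroadcastExchange(input) => {
            Plan::BroadcastExchange(Box::new(broadcast_small_build_side(*input, threshold_bytes)))
        }
        scan @ Plan::Scan { .. } => scan,
    }
}
```

The sketch declines to broadcast when no size estimate is available, which errs on the side of keeping the existing partitioned plan.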
   
   **Additional context**
   


