andygrove commented on PR #1900:
URL: 
https://github.com/apache/datafusion-ballista/pull/1900#issuecomment-4846393463

   @avantgardnerio found a real gap in the broadcast-safety guard via a stacked 
test PR (andygrove/datafusion-ballista#89), now fixed here in b40a6701.
   
   DataFusion's own `JoinSelection` runs during `create_physical_plan` and, 
with the non-zero `hash_join_single_partition_threshold` this PR sets, stamps 
`CollectLeft` without restricting by join type. The guard only covered the 
`Partitioned -> CollectLeft` promotion path, so a pre-promoted unsafe join 
(e.g. `small LEFT JOIN big` with the small side already on the left) bypassed 
it and reached the broadcast-lowering branch, replicating the outer side across 
every probe task.
   
   The fix moves the broadcast-safety check to the front of 
`maybe_promote_to_broadcast`: any `CollectLeft` hash join whose join type isn't 
`collect_left_broadcast_safe` is demoted back to a `Partitioned` (shuffle) 
join, re-hash-partitioning both inputs on the join keys. It runs ahead of the 
threshold check so the correctness guard still fires even if Ballista's own 
broadcast promotion is disabled while DF's threshold is non-zero. 
avantgardnerio's reproducer test is pulled in verbatim (with the explanatory 
comment swapped for a current-state one to match the repo convention) and now 
passes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to