andygrove opened a new pull request, #1900:
URL: https://github.com/apache/datafusion-ballista/pull/1900

   # Which issue does this PR close?
   
   Part of #1770.
   
   This PR makes the default-value change suggested in #1770 (restore non-zero
   thresholds). The other part of #1770, having the *static* job planner respect
   `datafusion.optimizer.hash_join_single_partition_threshold(_rows)` (it
   currently uses `ballista.optimizer.broadcast_join_threshold_bytes`), is not
   included here, so #1770 is left open.
   
   # Rationale for this change
   
   Ballista pins `datafusion.optimizer.hash_join_single_partition_threshold` and
   `..._threshold_rows` to `0`, which disables `CollectLeft` (broadcast) hash
   joins. That was a deliberate workaround for #1055 ("Left/full outer join
   incorrect for CollectLeft / broadcast"), which is now fixed (closed
   COMPLETED).
   
   While the threshold is `0`, the AQE join resolver
   (`DynamicJoinSelectionExec` / `to_actual_join`) never promotes a small build
   side to a broadcast `CollectLeft` join: every join is repartitioned through a
   shuffle, even single-row dimension tables. Restoring the thresholds (now
   safe) lets the resolver broadcast small build sides.
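
   To illustrate why a threshold of `0` can never select a broadcast join, here
   is a minimal, self-contained sketch of a threshold check. It is not the
   resolver's actual code: `BuildSideStats` and `fits_broadcast_thresholds` are
   hypothetical names, and the strict less-than comparison and the
   bytes-before-rows preference are assumptions made for illustration.

   ```rust
   /// Size estimates for the build side of a hash join; `None` means unknown.
   /// (Hypothetical type, for illustration only.)
   #[derive(Debug, Clone, Copy)]
   pub struct BuildSideStats {
       pub total_bytes: Option<usize>,
       pub num_rows: Option<usize>,
   }

   /// Returns true when the build side is small enough to broadcast.
   /// Assumed model: prefer the byte estimate, fall back to the row estimate,
   /// and never broadcast when nothing is known about the build side.
   pub fn fits_broadcast_thresholds(
       stats: BuildSideStats,
       threshold_bytes: usize,
       threshold_rows: usize,
   ) -> bool {
       match (stats.total_bytes, stats.num_rows) {
           // With a threshold of 0, `bytes < 0` is never true for an unsigned
           // value, so this arm always rejects the broadcast.
           (Some(bytes), _) => bytes != 0 && bytes < threshold_bytes,
           (None, Some(rows)) => rows != 0 && rows < threshold_rows,
           (None, None) => false,
       }
   }
   ```

   Under this model, a single-row dimension table is rejected when both
   thresholds are `0` and accepted once they are raised, while a build side
   larger than the byte threshold is still repartitioned.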
   
   Measured on TPC-H SF10 (1 scheduler + 1 executor, 8 task slots, 8 partitions,
   3 iterations, AQE enabled), comparing the resolver with the threshold at `0`
   vs. the values in this PR:
   
   | | threshold = 0 | threshold = 10 MB / 1M rows |
   |---|---:|---:|
   | full 22-query total | 32.2 s | 16.2 s |
   
   Per-query, broadcast fires for small dimension joins and is left off for
   big-vs-big joins. Examples: Q2 0.82 s → 0.12 s, Q8 3.13 s → 0.46 s,
   Q9 4.61 s → 1.09 s, Q11 0.59 s → 0.09 s, Q21 5.17 s → 1.49 s. Row counts are
   unchanged for all 22 queries.
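
   For readers who want the timings above as speedup factors, the arithmetic
   can be sketched as follows. The helper names (`speedup`, `TIMINGS`,
   `speedup_report`) are illustrative and not part of this PR; the numbers are
   copied from the figures quoted above.

   ```rust
   /// Speedup factor: baseline time divided by new time.
   pub fn speedup(before_secs: f64, after_secs: f64) -> f64 {
       before_secs / after_secs
   }

   /// (label, seconds at threshold = 0, seconds with this PR's thresholds).
   pub const TIMINGS: [(&str, f64, f64); 6] = [
       ("total", 32.2, 16.2),
       ("Q2", 0.82, 0.12),
       ("Q8", 3.13, 0.46),
       ("Q9", 4.61, 1.09),
       ("Q11", 0.59, 0.09),
       ("Q21", 5.17, 1.49),
   ];

   /// Builds one "label: N.NNx" line per entry in `TIMINGS`.
   pub fn speedup_report() -> Vec<String> {
       TIMINGS
           .iter()
           .map(|(label, before, after)| {
               format!("{}: {:.2}x", label, speedup(*before, *after))
           })
           .collect()
   }
   ```

   The 22-query total works out to roughly a 2x improvement, and the quoted
   per-query examples range from about 3.5x (Q21) to about 6.8x (Q2, Q8).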
   
   # What changes are included in this PR?
   
   - Set `datafusion.optimizer.hash_join_single_partition_threshold` to 10 MB
     and `datafusion.optimizer.hash_join_single_partition_threshold_rows` to
     1,000,000 in the Ballista session defaults (previously both `0`).
   - Update the client settings test to assert the new defaults (and rename it
     to reflect that it now checks the thresholds are set).
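
   As a rough picture of the two changes listed above, the sketch below models
   the new defaults as plain key/value pairs and checks that they are non-zero.
   It does not reproduce the code in this PR: the constant and function names
   are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an
   assumption, since the description does not say whether the unit is binary or
   decimal.

   ```rust
   /// Assumption: "10 MB" interpreted as 10 MiB (10 * 1024 * 1024 bytes).
   pub const SINGLE_PARTITION_THRESHOLD_BYTES: usize = 10 * 1024 * 1024;
   pub const SINGLE_PARTITION_THRESHOLD_ROWS: usize = 1_000_000;

   /// Hypothetical helper: the (key, value) overrides a session would apply.
   pub fn broadcast_threshold_defaults() -> [(&'static str, usize); 2] {
       [
           (
               "datafusion.optimizer.hash_join_single_partition_threshold",
               SINGLE_PARTITION_THRESHOLD_BYTES,
           ),
           (
               "datafusion.optimizer.hash_join_single_partition_threshold_rows",
               SINGLE_PARTITION_THRESHOLD_ROWS,
           ),
       ]
   }

   /// Hypothetical check in the spirit of the updated settings test:
   /// every threshold must be non-zero, otherwise broadcast stays disabled.
   pub fn thresholds_are_set(defaults: &[(&str, usize)]) -> bool {
       defaults.iter().all(|(_, value)| *value > 0)
   }
   ```

   A test built on these helpers would fail if either threshold were reset to
   `0`, which is the regression the renamed test is meant to catch.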
   
   # Are there any user-facing changes?
   
   Yes. With AQE enabled, a hash join whose build side fits under these
   thresholds is executed as a broadcast (`CollectLeft`) join instead of being
   repartitioned. No SQL semantics change. The static (non-AQE) planner uses a
   separate config (`ballista.optimizer.broadcast_join_threshold_bytes`) and is
   unaffected.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

