andygrove opened a new pull request, #1900:
URL: https://github.com/apache/datafusion-ballista/pull/1900
# Which issue does this PR close?
Part of #1770.
This PR makes the default-value change suggested in #1770 (restore non-zero
thresholds). The other part of #1770 is not included here: having the *static*
job planner respect
`datafusion.optimizer.hash_join_single_partition_threshold(_rows)` instead of
`ballista.optimizer.broadcast_join_threshold_bytes`, which it currently uses.
#1770 is therefore left open.
# Rationale for this change
Ballista pins `datafusion.optimizer.hash_join_single_partition_threshold` and
`datafusion.optimizer.hash_join_single_partition_threshold_rows` to `0`, which
disables `CollectLeft` (broadcast) hash joins. That was a deliberate workaround
for #1055 ("Left/full outer join incorrect for CollectLeft / broadcast"), which
has since been fixed and closed as completed.
While the threshold is `0`, the AQE join resolver (`DynamicJoinSelectionExec` /
`to_actual_join`) never promotes a small build side to a broadcast
`CollectLeft` join. Every join is repartitioned through a shuffle, even joins
against single-row dimension tables. Restoring the thresholds (now safe) lets
the resolver broadcast small build sides.
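To illustrate why a threshold of `0` rules out broadcast entirely, here is a minimal, self-contained sketch of a threshold check. It is not the resolver's actual code: `BuildSideStats`, `JoinMode`, and `choose_join_mode` are hypothetical names, and two details are assumptions, namely the strict less-than comparison with the byte estimate preferred over the row estimate, and reading "10 MB" as 10 * 1024 * 1024 bytes.

```rust
/// Hypothetical, simplified build-side statistics (not a DataFusion type).
#[derive(Debug, Clone, Copy)]
struct BuildSideStats {
    total_bytes: Option<usize>,
    num_rows: Option<usize>,
}

/// The two outcomes discussed in this PR.
#[derive(Debug, PartialEq)]
enum JoinMode {
    CollectLeft, // broadcast the build side
    Partitioned, // repartition both sides through a shuffle
}

// Assumption: "10 MB" is taken as 10 MiB here.
const NEW_THRESHOLD_BYTES: usize = 10 * 1024 * 1024;
const NEW_THRESHOLD_ROWS: usize = 1_000_000;

/// Assumed rule: broadcast only if a known, non-zero size estimate is
/// strictly below its threshold. Bytes are preferred over rows when known.
fn choose_join_mode(
    stats: BuildSideStats,
    threshold_bytes: usize,
    threshold_rows: usize,
) -> JoinMode {
    let fits = match (stats.total_bytes, stats.num_rows) {
        (Some(bytes), _) => bytes != 0 && bytes < threshold_bytes,
        (None, Some(rows)) => rows != 0 && rows < threshold_rows,
        (None, None) => false,
    };
    if fits {
        JoinMode::CollectLeft
    } else {
        JoinMode::Partitioned
    }
}

fn main() {
    let single_row = BuildSideStats { total_bytes: Some(64), num_rows: Some(1) };
    let large = BuildSideStats {
        total_bytes: Some(512 * 1024 * 1024),
        num_rows: Some(50_000_000),
    };
    for (label, stats) in [("single-row", single_row), ("large", large)] {
        println!(
            "{}: thresholds at 0 = {:?}, restored thresholds = {:?}",
            label,
            choose_join_mode(stats, 0, 0),
            choose_join_mode(stats, NEW_THRESHOLD_BYTES, NEW_THRESHOLD_ROWS)
        );
    }
}
```

Under this rule no unsigned size is below `0`, so with both thresholds at `0` even the single-row build side falls through to `Partitioned`.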
Measured on TPC-H SF10 (1 scheduler + 1 executor, 8 task slots, 8 partitions,
3 iterations, AQE enabled), comparing the resolver with the threshold at `0`
vs. the values in this PR:
| | threshold = 0 | threshold = 10 MB / 1M rows |
|---|---:|---:|
| full 22-query total | 32.2 s | 16.2 s |
Per-query, broadcast fires for small dimension joins and is left off for
big-vs-big joins. Examples: Q2 0.82 s → 0.12 s, Q8 3.13 s → 0.46 s,
Q9 4.61 s → 1.09 s, Q11 0.59 s → 0.09 s, Q21 5.17 s → 1.49 s. Row counts are
unchanged for all 22 queries.
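For readers who prefer ratios, the timings quoted above can be turned into speedup factors with a few lines of Rust. This is only arithmetic on the numbers already reported; `speedup` is a hypothetical helper, not part of the benchmark harness.

```rust
/// Speedup factor: time before the change divided by time after it.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

fn main() {
    // (label, seconds with threshold = 0, seconds with the new thresholds),
    // copied from the figures in this PR description.
    let timings = [
        ("22-query total", 32.2, 16.2),
        ("Q2", 0.82, 0.12),
        ("Q8", 3.13, 0.46),
        ("Q9", 4.61, 1.09),
        ("Q11", 0.59, 0.09),
        ("Q21", 5.17, 1.49),
    ];
    for (label, before, after) in timings {
        println!("{}: {:.1}x faster", label, speedup(before, after));
    }
}
```

The full-suite total comes out at roughly 2x, while the individual queries listed see larger gains because the other queries contribute time that is unaffected or less affected by the change.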
# What changes are included in this PR?
- Set `datafusion.optimizer.hash_join_single_partition_threshold` to 10 MB and
  `datafusion.optimizer.hash_join_single_partition_threshold_rows` to 1,000,000
  in the Ballista session defaults (previously both `0`).
- Update the client settings test to assert the new defaults (and rename it to
  reflect that it now checks the thresholds are set).
# Are there any user-facing changes?
Yes. With AQE enabled, a hash join whose build side fits under these thresholds
is executed as a broadcast (`CollectLeft`) join instead of being repartitioned.
No SQL semantics change. The static (non-AQE) planner uses a separate config
(`ballista.optimizer.broadcast_join_threshold_bytes`) and is unaffected.