andygrove opened a new pull request, #1902:
URL: https://github.com/apache/datafusion-ballista/pull/1902
# Which issue does this PR close?
Closes #1901.
# Rationale for this change
Client-set `datafusion.*` session config (e.g.
`SET datafusion.optimizer.prefer_hash_join = true`, or `-c` overrides in the
tpch benchmark) had no effect on scheduler-side planning, while `ballista.*`
settings in the same session worked.
The cause is `SessionConfigExt::upgrade_for_ballista`, which calls
`ballista_restricted_configuration()` to apply Ballista's opinionated
DataFusion defaults (`prefer_hash_join = false`,
`hash_join_single_partition_threshold = 0`, the Utf8View flags).
`SessionConfig::new_with_ballista()` already applies these once at
construction. When `remote_with_state` later calls `upgrade_for_ballista`
again, the defaults are re-applied *after* the user has set their own values,
silently reverting them. `ballista.*` settings survive only because the
restricted config does not touch them.
Notably, the restricted-config comments state that users can opt back into
`prefer_hash_join` and override the view-type flags via `SET`, but the
re-application defeated exactly that.
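The ordering problem described above can be modeled without any Ballista code. The following is a minimal sketch, assuming a plain `HashMap` as a stand-in for the session config. `Options`, `apply_restricted_defaults`, and `old_flow` are hypothetical names for illustration, the fully qualified option keys are assumed, and this is not the actual Ballista implementation.

```rust
use std::collections::HashMap;

/// Toy stand-in for a session config: option key -> string value.
/// (Hypothetical; the real config is DataFusion's `SessionConfig`.)
type Options = HashMap<String, String>;

/// Hypothetical model of the restricted defaults named in the description.
fn apply_restricted_defaults(opts: &mut Options) {
    opts.insert("datafusion.optimizer.prefer_hash_join".into(), "false".into());
    opts.insert(
        "datafusion.optimizer.hash_join_single_partition_threshold".into(),
        "0".into(),
    );
}

/// Models the pre-fix order of operations:
/// defaults at construction, user overrides, then defaults applied again.
fn old_flow() -> Options {
    let mut opts = Options::new();
    apply_restricted_defaults(&mut opts); // applied once at construction

    // The user sets their own values in the session.
    opts.insert("datafusion.optimizer.prefer_hash_join".into(), "true".into());
    opts.insert("ballista.job.name".into(), "my-job".into());

    apply_restricted_defaults(&mut opts); // second, unconditional upgrade
    opts
}

fn main() {
    let opts = old_flow();
    // The datafusion.* override has been reverted; the ballista.* key is untouched.
    println!("{:?}", opts.get("datafusion.optimizer.prefer_hash_join"));
    println!("{:?}", opts.get("ballista.job.name"));
}
```

In this model the second call overwrites only the keys it owns, which is why the `ballista.*` entry survives while the `datafusion.*` override does not.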
# What changes are included in this PR?
- `upgrade_for_ballista` now applies `ballista_restricted_configuration()`
  only when the config has not already been through Ballista setup (detected
  by the absence of the `BallistaConfig` extension). A config that already
  carries the extension keeps the user's values.
- Tests: a user override of `prefer_hash_join` /
  `hash_join_single_partition_threshold` survives `upgrade_for_ballista`; a
  plain config still receives Ballista's defaults on upgrade.
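
The guard in the first bullet can be sketched on a toy type. This is a simplified model under stated assumptions, not the code in this PR: `ToyConfig` and its `has_ballista_extension` flag are hypothetical stand-ins for `SessionConfig` and the presence of the `BallistaConfig` extension, and only one restricted default is modeled.

```rust
use std::collections::HashMap;

const PREFER_HASH_JOIN: &str = "datafusion.optimizer.prefer_hash_join";

/// Hypothetical stand-in for a session config.
#[derive(Default)]
struct ToyConfig {
    options: HashMap<String, String>,
    /// Models "the BallistaConfig extension is present".
    has_ballista_extension: bool,
}

impl ToyConfig {
    fn set(&mut self, key: &str, value: &str) {
        self.options.insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.options.get(key).map(String::as_str)
    }

    /// Applies the restricted defaults only on the first upgrade.
    /// A config that already carries the extension is left untouched.
    fn upgrade_for_ballista(mut self) -> Self {
        if !self.has_ballista_extension {
            self.set(PREFER_HASH_JOIN, "false");
            self.has_ballista_extension = true;
        }
        self
    }
}

fn main() {
    // A plain config receives the default on its first upgrade.
    let mut cfg = ToyConfig::default().upgrade_for_ballista();
    println!("after first upgrade: {:?}", cfg.get(PREFER_HASH_JOIN));

    // A user override made afterwards survives a second upgrade.
    cfg.set(PREFER_HASH_JOIN, "true");
    let cfg = cfg.upgrade_for_ballista();
    println!("after second upgrade: {:?}", cfg.get(PREFER_HASH_JOIN));
}
```

The two cases in `main` mirror the two tests listed above: a plain config still picks up the default, and an already-upgraded config keeps the user's value.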
# Are there any user-facing changes?
Yes. `datafusion.*` session settings set on the client (via `SET` or config
overrides) are now honored by the scheduler. `round_robin_repartition` remains
effectively enforced because the scheduler forces it off when building the
execution context. No SQL semantics change.