Copilot commented on code in PR #19932:
URL: https://github.com/apache/datafusion/pull/19932#discussion_r2714134861
##########
datafusion/common/src/config.rs:
##########
@@ -1115,6 +1115,18 @@ config_namespace! {
/// See:
<https://trino.io/docs/current/admin/dynamic-filtering.html#dynamic-filter-collection-thresholds>
pub hash_join_inlist_pushdown_max_distinct_values: usize, default = 150
+ /// When true, pushes down hash table references for membership checks
in hash joins
+ /// when the build side is too large for InList pushdown.
+ /// When false, no membership filter is created when InList thresholds
are exceeded.
+ /// Default: true
+ pub hash_join_map_pushdown: bool, default = true
+
+ /// When true, pushes down min/max bounds for join key columns.
+ /// This enables statistics-based pruning (e.g., Parquet row group
skipping).
+ /// When false, only membership filters (InList or Map) are pushed
down.
+ /// Default: true
+ pub hash_join_bounds_pushdown: bool, default = true
Review Comment:
The new configuration options `hash_join_map_pushdown` and
`hash_join_bounds_pushdown` lack test coverage. While the PR description
mentions existing hash join tests pass, there are no specific tests that
validate the behavior when these options are set to false. Consider adding
tests similar to the existing `test_hashjoin_hash_table_pushdown_partitioned`
that verify:
1. When `hash_join_map_pushdown` is false and build side exceeds InList
thresholds, no membership filter is created (only bounds if enabled)
2. When `hash_join_bounds_pushdown` is false, bounds predicates are not
included in the filter
3. When both are false but dynamic filter pushdown is enabled, appropriate
behavior occurs
4. Various combinations of these flags work correctly in both Partitioned
and CollectLeft modes
##########
datafusion/common/src/config.rs:
##########
@@ -1115,6 +1115,18 @@ config_namespace! {
/// See:
<https://trino.io/docs/current/admin/dynamic-filtering.html#dynamic-filter-collection-thresholds>
pub hash_join_inlist_pushdown_max_distinct_values: usize, default = 150
+ /// When true, pushes down hash table references for membership checks
in hash joins
+ /// when the build side is too large for InList pushdown.
+ /// When false, no membership filter is created when InList thresholds
are exceeded.
Review Comment:
The documentation states "When false, no membership filter is created when
InList thresholds are exceeded." This could be clearer. Consider rephrasing to:
"When false, no hash table reference (Map) is used for membership checks. If
the build side exceeds InList thresholds, no membership filter is created (only
bounds, if enabled)." This makes it clearer that InList filters can still be
created when the build side is small enough, and that this config specifically
controls the Map fallback behavior.
```suggestion
/// When true, pushes down hash table references (Map) for
membership checks in hash joins
/// when the build side is too large for InList pushdown.
/// When false, no hash table reference (Map) is used for membership
checks. If the build
/// side exceeds InList thresholds, no membership filter is created
(only bounds, if enabled).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]