LiaCastaneda commented on code in PR #20142:
URL: https://github.com/apache/datafusion/pull/20142#discussion_r2771436549
##########
datafusion/physical-plan/src/joins/hash_join/exec.rs:
##########
@@ -5091,17 +5176,17 @@ mod tests {
false,
)?;
join.dynamic_filter = Some(HashJoinExecDynamicFilter {
- filter: dynamic_filter,
+ partition_filters,
build_accumulator: OnceLock::new(),
});
// Execute the join
let stream = join.execute(0, task_ctx)?;
let _batches = common::collect(stream).await?;
- // After the join completes, the dynamic filter should be marked as
complete
+ // After the join completes, the partition filter should be marked as
complete
// wait_complete() should return immediately
- dynamic_filter_clone.wait_complete().await;
+ filter_to_wait.wait_complete().await;
Review Comment:
Yeah, we have some custom `ExecutionPlan` leaf nodes that don't evaluate
filters in the leaf node itself but instead push query execution (along with
the filters) to a remote data source. The key requirement is that we need to
wait for all build partitions to report their filters before querying the
remote source because we execute this leaf node in a single partition (even if
we have a Partitioned join), so we need the complete filter information from
all hash join build partitions to generate a correct predicate.
I was raising it in case other custom scans had similar patterns, but if
it's too specific to our use case, we can handle it internally.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]