LiaCastaneda commented on code in PR #20142:
URL: https://github.com/apache/datafusion/pull/20142#discussion_r2771449199
##########
datafusion/physical-plan/src/joins/hash_join/exec.rs:
##########
@@ -5091,17 +5176,17 @@ mod tests {
false,
)?;
join.dynamic_filter = Some(HashJoinExecDynamicFilter {
- filter: dynamic_filter,
+ partition_filters,
build_accumulator: OnceLock::new(),
});
// Execute the join
let stream = join.execute(0, task_ctx)?;
let _batches = common::collect(stream).await?;
- // After the join completes, the dynamic filter should be marked as complete
+ // After the join completes, the partition filter should be marked as complete
// wait_complete() should return immediately
- dynamic_filter_clone.wait_complete().await;
+ filter_to_wait.wait_complete().await;
Review Comment:
Yeah, apart from Parquet, we have some custom `ExecutionPlan` leaf nodes
that don't evaluate filters themselves but instead push query execution
(along with the filters) to a remote data source. The key requirement is that
we wait for all build partitions to report their filters before querying the
remote source. We execute this leaf node in a single partition (even for a
Partitioned join), so we need the complete filter information from every hash
join build partition to generate a correct predicate.
I was raising it in case other custom scans have similar patterns, but if
it's too specific to our use case, we can handle it internally.
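
To make the requirement concrete, here is a rough std-only sketch of the pattern: a single consumer blocks until every build partition has reported, then combines the per-partition filters into one predicate for the remote source. All names here (`Bounds`, `FilterAccumulator`, `report`, `remote_predicate`, the column `k`) are hypothetical and only illustrate the idea. None of them are DataFusion types or APIs, and the real code would be async rather than thread-based.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical per-partition filter: min/max of the build-side join key.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: i64,
    max: i64,
}

/// Hypothetical accumulator with one slot per build partition (not a DataFusion type).
struct FilterAccumulator {
    slots: Mutex<Vec<Option<Bounds>>>,
    all_reported: Condvar,
}

impl FilterAccumulator {
    fn new(num_partitions: usize) -> Self {
        Self {
            slots: Mutex::new(vec![None; num_partitions]),
            all_reported: Condvar::new(),
        }
    }

    /// Called once by each build partition when its hash table is complete.
    fn report(&self, partition: usize, bounds: Bounds) {
        let mut slots = self.slots.lock().unwrap();
        slots[partition] = Some(bounds);
        if slots.iter().all(Option::is_some) {
            self.all_reported.notify_all();
        }
    }

    /// Blocks until every partition has reported, then returns all bounds
    /// in partition order.
    fn wait_complete(&self) -> Vec<Bounds> {
        let guard = self.slots.lock().unwrap();
        let guard = self
            .all_reported
            .wait_while(guard, |s| s.iter().any(Option::is_none))
            .unwrap();
        guard.iter().flatten().copied().collect()
    }
}

/// Combines the per-partition bounds into a single predicate string that a
/// single-partition leaf node could send to the remote source.
fn remote_predicate(column: &str, bounds: &[Bounds]) -> String {
    bounds
        .iter()
        .map(|b| format!("({column} BETWEEN {} AND {})", b.min, b.max))
        .collect::<Vec<_>>()
        .join(" OR ")
}

/// Simulates the build partitions with threads and a single waiting consumer.
fn build_predicate_from_partitions(parts: Vec<Bounds>) -> String {
    let acc = Arc::new(FilterAccumulator::new(parts.len()));
    let handles: Vec<_> = parts
        .into_iter()
        .enumerate()
        .map(|(i, b)| {
            let acc = Arc::clone(&acc);
            thread::spawn(move || acc.report(i, b))
        })
        .collect();
    // The remote query must not be issued before this returns.
    let all = acc.wait_complete();
    for h in handles {
        h.join().unwrap();
    }
    remote_predicate("k", &all)
}

fn main() {
    let predicate = build_predicate_from_partitions(vec![
        Bounds { min: 1, max: 10 },
        Bounds { min: 50, max: 60 },
    ]);
    println!("{predicate}");
}
```

If the consumer built its predicate from only the partitions that had reported so far, the remote query would drop rows matching the missing partitions, which is why the wait has to cover all of them.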
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]