adriangb commented on code in PR #20142:
URL: https://github.com/apache/datafusion/pull/20142#discussion_r2770882485
##########
datafusion/physical-plan/src/joins/hash_join/exec.rs:
##########
@@ -5091,17 +5176,17 @@ mod tests {
false,
)?;
join.dynamic_filter = Some(HashJoinExecDynamicFilter {
- filter: dynamic_filter,
+ partition_filters,
build_accumulator: OnceLock::new(),
});
// Execute the join
let stream = join.execute(0, task_ctx)?;
let _batches = common::collect(stream).await?;
- // After the join completes, the dynamic filter should be marked as complete
+ // After the join completes, the partition filter should be marked as complete
// wait_complete() should return immediately
- dynamic_filter_clone.wait_complete().await;
+ filter_to_wait.wait_complete().await;
Review Comment:
> concern is how custom leaf nodes (ExecutionPlan) that have Partitioned
> HashJoins to fetch the build in parallel but need to know when all partitions
> have reported their data would adapt to the new structure
This is something you have internally, right? Could you help explain how this
works and what the requirement is? At some point I wonder whether it would be
better to have an optimizer rule that links the join node to the custom scan
node when the custom scan node needs to behave in a specific way as the
leaf of a join (as opposed to, e.g., the current parquet scan node, which doesn't
care at all).
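
To make the quoted requirement concrete, here is a rough sketch of what "know when all partitions have reported their data" could look like in isolation. Everything in it is hypothetical: `BuildCompletion`, `report_partition_done`, and `wait_all` are illustrative names, not DataFusion APIs, and it uses blocking std primitives rather than the async `wait_complete()` from the test above.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical tracker (not a DataFusion type): counts how many build
/// partitions still have to report before the build side is complete.
pub struct BuildCompletion {
    remaining: Mutex<usize>,
    all_done: Condvar,
}

impl BuildCompletion {
    /// Create a tracker expecting `partitions` reports.
    pub fn new(partitions: usize) -> Arc<Self> {
        Arc::new(Self {
            remaining: Mutex::new(partitions),
            all_done: Condvar::new(),
        })
    }

    /// Called once by each build partition after it has reported its data.
    pub fn report_partition_done(&self) {
        let mut remaining = self.remaining.lock().unwrap();
        *remaining = remaining.saturating_sub(1);
        if *remaining == 0 {
            // Wake every waiter: the last partition has reported.
            self.all_done.notify_all();
        }
    }

    /// Block until every partition has reported.
    pub fn wait_all(&self) {
        let mut remaining = self.remaining.lock().unwrap();
        while *remaining > 0 {
            remaining = self.all_done.wait(remaining).unwrap();
        }
    }

    /// Non-blocking check, e.g. for a scan that only wants to peek.
    pub fn is_complete(&self) -> bool {
        *self.remaining.lock().unwrap() == 0
    }
}
```

If a custom scan needs this "all partitions" signal while the join only exposes per-partition filters, something has to aggregate them, which is where an optimizer rule wiring the two nodes together might fit.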
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]