adriangb commented on code in PR #17197:
URL: https://github.com/apache/datafusion/pull/17197#discussion_r2288369790
##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -98,6 +98,151 @@ use parking_lot::Mutex;
const HASH_JOIN_SEED: RandomState =
RandomState::with_seeds('J' as u64, 'O' as u64, 'I' as u64, 'N' as u64);
+/// Coordinates dynamic filter bounds collection across multiple partitions
+///
+/// This structure ensures that dynamic filters are built with complete
information from all
+/// relevant partitions before being applied to probe-side scans. Incomplete
filters would
+/// incorrectly eliminate valid join results.
+///
+/// ## Synchronization Strategy
+///
+/// 1. Each partition computes bounds from its build-side data
+/// 2. Bounds are stored in the shared HashMap (indexed by partition_id)
+/// 3. A counter tracks how many partitions have reported their bounds
+/// 4. When the last partition reports (completed == total), bounds are merged
and filter is updated
+///
+/// ## Partition Counting
+///
+/// The `total_partitions` count represents how many times
`collect_build_side` will be called:
+/// - **CollectLeft**: Number of output partitions (each accesses shared build
data)
+/// - **Partitioned**: Number of input partitions (each builds independently)
+///
+/// ## Thread Safety
+///
+/// All fields use atomic operations or mutexes to ensure correct coordination
between concurrent
Review Comment:
Good call, 360253e2d
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]