adriangb commented on code in PR #17632:
URL: https://github.com/apache/datafusion/pull/17632#discussion_r2361118488

##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########
@@ -25,14 +25,24 @@ use crate::joins::PartitionMode;
 use crate::ExecutionPlan;
 use crate::ExecutionPlanProperties;
+use arrow::datatypes::{DataType, Field};
+use datafusion_common::config::ConfigOptions;
 use datafusion_common::{Result, ScalarValue};
 use datafusion_expr::Operator;
+use datafusion_expr::ScalarUDF;
+use datafusion_functions::hash::Hash;
 use datafusion_physical_expr::expressions::{lit, BinaryExpr, DynamicFilterPhysicalExpr};
-use datafusion_physical_expr::{PhysicalExpr, PhysicalExprRef};
+use datafusion_physical_expr::{PhysicalExpr, PhysicalExprRef, ScalarFunctionExpr};
+use ahash::RandomState;
 use itertools::Itertools;
 use parking_lot::Mutex;
-use tokio::sync::Barrier;
+use std::collections::HashSet;
+
+/// RandomState used by RepartitionExec for consistent hash partitioning
+/// This must match the seeds used in RepartitionExec to ensure our hash-based
+/// filter expressions compute the same partition assignments as the actual partitioning
+const REPARTITION_RANDOM_STATE: RandomState = RandomState::with_seeds(0, 0, 0, 0);

Review Comment:
   I've done this, in part because testing this change required deterministic partitioning (i.e., given the input values, I know that some will end up in each partition). That change will also allow an optimizer rule to swap out hash functions without us having to quibble about which one to choose.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org