fmonjalet commented on code in PR #18919:
URL: https://github.com/apache/datafusion/pull/18919#discussion_r2581356433


##########
datafusion/physical-expr/src/partitioning.rs:
##########
@@ -160,38 +175,43 @@ impl Partitioning {
             Distribution::SinglePartition if self.partition_count() == 1 => 
true,
             // When partition count is 1, hash requirement is satisfied.
             Distribution::HashPartitioned(_) if self.partition_count() == 1 => 
true,
-            Distribution::HashPartitioned(required_exprs) => {
-                match self {
-                    // Here we do not check the partition count for hash 
partitioning and assumes the partition count
-                    // and hash functions in the system are the same. In 
future if we plan to support storage partition-wise joins,
-                    // then we need to have the partition count and hash 
functions validation.
-                    Partitioning::Hash(partition_exprs, _) => {
-                        let fast_match =
-                            physical_exprs_equal(required_exprs, 
partition_exprs);
-                        // If the required exprs do not match, need to 
leverage the eq_properties provided by the child
-                        // and normalize both exprs based on the equivalent 
groups.
-                        if !fast_match {
-                            let eq_groups = eq_properties.eq_group();
-                            if !eq_groups.is_empty() {
-                                let normalized_required_exprs = required_exprs
-                                    .iter()
-                                    .map(|e| 
eq_groups.normalize_expr(Arc::clone(e)))
-                                    .collect::<Vec<_>>();
-                                let normalized_partition_exprs = 
partition_exprs
-                                    .iter()
-                                    .map(|e| 
eq_groups.normalize_expr(Arc::clone(e)))
-                                    .collect::<Vec<_>>();
-                                return physical_exprs_equal(
-                                    &normalized_required_exprs,
-                                    &normalized_partition_exprs,
-                                );
-                            }
+            Distribution::HashPartitioned(required_exprs) => match self {
+                // Here we do not check the partition count for hash 
partitioning and assumes the partition count
+                // and hash functions in the system are the same. In future if 
we plan to support storage partition-wise joins,
+                // then we need to have the partition count and hash functions 
validation.

Review Comment:
   Looking at this comment I am also thinking `KeyPartitioned` addresses this 
concern, in the sense that `KeyPartitioned(my_hash_func(key))` expresses that 
the set is partitioned according to a precise hash function (same for range 
partitioning). It removes the "unknown hash" from the equation.
   
   So for a join, if the two sides are partitioned by `KeyPartitioned(f(key))` 
with the same `f`, then we can use a partitioned HashJoin without repartition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to