Re: [PR] Add expression partitioning enum variant [datafusion]

via GitHub Fri, 15 May 2026 14:00:57 -0700


stuhood commented on code in PR #22207:
URL: https://github.com/apache/datafusion/pull/22207#discussion_r3250968025



##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -600,6 +600,11 @@ impl BatchPartitioner {
                     num_input_partitions,
                 ))
             }
+            Partitioning::Expr(_) => {
+                not_impl_err!(
+                    "Expression partitioning is not supported by 
RepartitionExec"
+                )
+            }

Review Comment:
   > My intent wasn't for `ExprPartitioning` to be efficient execution format 
for physically repartitioning rows. I was thinking of this as partitioning for 
sources/plans that already have known partitioning and declare it to preserve 
in the plan to unlock optimizations.
   
   That works for the first join, but not for followup joins. For example:
   
   If you have a 3 table join, the first join will be able to use an equality 
match on range partitioning to say: no re-partitioning needed at all because 
the two tables are partitioned the same way! Great.
   
   But its very likely that the second join _does_ need to re-partition _one_ 
of its inputs (assuming different join keys between the two joins): the output 
of join one needs to be re-partitioned to match the third table. Now, 
technically you can just repartition both sides (i.e. switch to hash or 
something). But if you instead re-partition to _match_ the third table, then 
you might be able to significantly cut down on data movement.
   
   ----
   
   So, yes: I think that it is important to be able to efficiently re-partition 
by this strategy. If we don't have concrete use-cases for generic expression 
partitioning, then it would not be my first choice here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add expression partitioning enum variant [datafusion]

Reply via email to