mingmwang commented on code in PR #4043:
URL: https://github.com/apache/arrow-datafusion/pull/4043#discussion_r1012581673
##########
datafusion/core/src/physical_plan/joins/hash_join.rs:
##########
@@ -270,6 +271,75 @@ impl ExecutionPlan for HashJoinExec {
         self.schema.clone()
     }
+    fn required_input_distribution(&self) -> Vec<Distribution> {
+        match self.mode {
+            PartitionMode::CollectLeft => vec![
+                Distribution::SinglePartition,
+                Distribution::UnspecifiedDistribution,
+            ],
+            PartitionMode::Partitioned => {
+                let (left_expr, right_expr) = self
+                    .on
+                    .iter()
+                    .map(|(l, r)| {
+                        (
+                            Arc::new(l.clone()) as Arc<dyn PhysicalExpr>,
+                            Arc::new(r.clone()) as Arc<dyn PhysicalExpr>,
+                        )
+                    })
+                    .unzip();
+                vec![
Review Comment:
It is possible, but this PR will not include it. I originally planned to
implement such optimizations in Phase 2 with more dynamic enforcement rules,
but that risks introducing skewed joins, and we currently have no good way
to handle skewed joins.
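
As a rough illustration of the skew concern (a standalone sketch, not DataFusion code; `partition_of` and `partition_sizes` are hypothetical helpers introduced only for this example): hash partitioning sends every row with the same key to the same partition, so a single hot key can leave one partition holding most of the data.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a join key to one of `n` partitions by hashing it.
fn partition_of(key: &str, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Hypothetical helper: count how many rows land in each of `n` partitions.
fn partition_sizes(keys: &[&str], n: usize) -> Vec<usize> {
    let mut sizes = vec![0usize; n];
    for key in keys {
        sizes[partition_of(key, n)] += 1;
    }
    sizes
}

fn main() {
    // Assumed workload: 90 rows share one hot key, 10 rows have distinct keys.
    let distinct: Vec<String> = (0..10).map(|i| format!("k{i}")).collect();
    let mut keys: Vec<&str> = vec!["hot"; 90];
    keys.extend(distinct.iter().map(|s| s.as_str()));

    // Whichever partition receives "hot" holds at least 90 of the 100 rows.
    let sizes = partition_sizes(&keys, 4);
    println!("{sizes:?}");
}
```

Which partition receives the hot key depends on the hasher, but one partition always ends up with at least 90 of the 100 rows, and the join task for that partition dominates the runtime.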
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.