2010YOUY01 commented on PR #17467:
URL: https://github.com/apache/datafusion/pull/17467#issuecomment-3288530256

   > Given the increasing interest in improving Joins in DataFusion, I wonder 
if now is the time to create some space / a structure for more sophisticated 
join planners instead of making the existing one more complicated. In 
particular, I think the join algorithm is just one part of a more sophisticated 
strategy for joins (that may also reorder joins, for example)
   > 
   > Maybe we could make `JoinPlanner` a trait that can be registered with the 
SessionContext or the Optimizer the same way as ExtensionPlanners?
   > 
   > Then we can provide a default JoinPlanner (what currently exists) that has 
its own config namespace, etc
   > 
   > ```rust
   > trait JoinPlanner {
   >   // plan the initial join when converting from Logical --> Physical join
   >   fn plan_initial_join(
   >     session_state: &SessionState,
   >     physical_left: Arc<dyn ExecutionPlan>,
   >     physical_right: Arc<dyn ExecutionPlan>,
   >     join_on: join_utils::JoinOn,
   >     join_filter: Option<join_utils::JoinFilter>,
   >     join_type: &JoinType,
   >     null_equality: &datafusion_common::NullEquality,
   >   ) -> Arc<dyn ExecutionPlan>;
   > }
   > ```
   
   I think this trait should include two major steps from the current 
implementation: 
   1. `plan_initial_join` method: it converts the logical join plan to an 
initial physical plan, deciding which join method to use (hash join or NLJ) 
for each join node
   2. `JoinSelection` optimizer rule: it further refines the initial physical 
plan (e.g. changes the repartition strategy inside Hash Join's initial physical 
plan), and also swaps join inputs according to stats
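
As a rough illustration of how those two steps could sit behind one trait, here is a minimal sketch. It is not DataFusion code: `PhysicalJoin`, `refine_join`, `DefaultJoinPlanner`, and the `has_equi_keys` flag are hypothetical stand-ins, the row estimate is a crude placeholder for real statistics, and the sketch assumes the left input is the hash join's build side.

```rust
use std::sync::Arc;

// Hypothetical stand-in for `Arc<dyn ExecutionPlan>`, so the sketch
// compiles without the datafusion crate.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalJoin {
    Scan { name: String, rows: usize },
    HashJoin { left: Arc<PhysicalJoin>, right: Arc<PhysicalJoin> },
    NestedLoopJoin { left: Arc<PhysicalJoin>, right: Arc<PhysicalJoin> },
}

impl PhysicalJoin {
    // Crude placeholder for statistics: a join is estimated at the size
    // of its larger input.
    fn rows(&self) -> usize {
        match self {
            PhysicalJoin::Scan { rows, .. } => *rows,
            PhysicalJoin::HashJoin { left, right }
            | PhysicalJoin::NestedLoopJoin { left, right } => {
                left.rows().max(right.rows())
            }
        }
    }
}

// Hypothetical trait covering both steps from the list above.
trait JoinPlanner {
    // Step 1: pick the join method while converting logical -> physical.
    fn plan_initial_join(
        &self,
        left: Arc<PhysicalJoin>,
        right: Arc<PhysicalJoin>,
        has_equi_keys: bool,
    ) -> Arc<PhysicalJoin>;

    // Step 2: refine the initial plan, here by swapping inputs by size.
    fn refine_join(&self, plan: Arc<PhysicalJoin>) -> Arc<PhysicalJoin>;
}

struct DefaultJoinPlanner;

impl JoinPlanner for DefaultJoinPlanner {
    fn plan_initial_join(
        &self,
        left: Arc<PhysicalJoin>,
        right: Arc<PhysicalJoin>,
        has_equi_keys: bool,
    ) -> Arc<PhysicalJoin> {
        // Equi-join keys allow a hash join; otherwise fall back to NLJ.
        if has_equi_keys {
            Arc::new(PhysicalJoin::HashJoin { left, right })
        } else {
            Arc::new(PhysicalJoin::NestedLoopJoin { left, right })
        }
    }

    fn refine_join(&self, plan: Arc<PhysicalJoin>) -> Arc<PhysicalJoin> {
        // Put the smaller input on the left (assumed to be the build side).
        if let PhysicalJoin::HashJoin { left, right } = plan.as_ref() {
            if left.rows() > right.rows() {
                return Arc::new(PhysicalJoin::HashJoin {
                    left: Arc::clone(right),
                    right: Arc::clone(left),
                });
            }
        }
        plan
    }
}
```

In this sketch a planner with a different strategy would only need to implement the same two methods. The coupling with other optimizer rules is exactly what the sketch leaves out.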
   
   For now, I think it's a bit hard to extract them into a pluggable module, 
because they seem tightly coupled with other optimizer rules. Perhaps we can 
give it a try when there are multiple radically different join 
planning/reordering strategies available. We'll have a better understanding of 
how this interface should look by then.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

