viirya commented on code in PR #9236:
URL: https://github.com/apache/arrow-datafusion/pull/9236#discussion_r1515271348


##########
datafusion/core/src/physical_optimizer/projection_pushdown.rs:
##########
@@ -524,13 +529,100 @@ fn try_pushdown_through_union(
     Ok(Some(Arc::new(UnionExec::new(new_children))))
 }
 
+/// Some projections can't be pushed down to the left or right input of a hash
+/// join because the filter or the `on` conditions may need columns that are
+/// not used later.
+/// By embedding those projections into the hash join, we can reduce the cost of
+/// `build_batch_from_indices` in the hash join (`build_batch_from_indices` needs
+/// to call `compute::take()` for each column) and avoid unnecessary output creation.
+fn try_embed_to_hash_join(
+    projection: &ProjectionExec,
+    hash_join: &HashJoinExec,
+) -> Result<Option<Arc<dyn ExecutionPlan>>> {
+    // Collect all column indices from the given projection expressions.
+    let projection_index = collect_column_indices(projection.expr());
+
+    if projection_index.is_empty() {
+        return Ok(None);
+    };
+

Review Comment:
   If the projection indices are the same as the output columns of the 
HashJoin, I think we don't need to embed the projection either?
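
   A minimal sketch of such a check, assuming the collected indices can be compared position by position against the join's output schema. The helper name `is_identity_projection` and its parameters are hypothetical and not part of DataFusion; it only looks at column indices, so a projection that renames columns or computes expressions would need additional handling.

```rust
/// Hypothetical helper (not a DataFusion API): returns `true` when the
/// projection selects every output column of the join exactly once and in
/// the original order, i.e. embedding it would not drop or reorder anything.
fn is_identity_projection(
    projection_index: &[usize],
    num_join_output_columns: usize,
) -> bool {
    // Same number of columns, and the column at position `pos` is index `pos`.
    projection_index.len() == num_join_output_columns
        && projection_index
            .iter()
            .enumerate()
            .all(|(pos, &idx)| pos == idx)
}
```

   With something like this, `try_embed_to_hash_join` could return `Ok(None)` early when the check holds, in the same way it already does for an empty index list.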


