viirya commented on code in PR #9236:
URL: https://github.com/apache/arrow-datafusion/pull/9236#discussion_r1515271348
##########
datafusion/core/src/physical_optimizer/projection_pushdown.rs:
##########
@@ -524,13 +529,100 @@ fn try_pushdown_through_union(
Ok(Some(Arc::new(UnionExec::new(new_children))))
}
+/// Some projections can't be pushed down to the left or right input of a hash
+/// join because the join filter or the `on` keys may need columns that are not
+/// used later.
+/// By embedding those projections into the hash join, we can reduce the cost of
+/// `build_batch_from_indices` in the hash join (`build_batch_from_indices` needs to
+/// call `compute::take()` for each column) and avoid unnecessary output creation.
+fn try_embed_to_hash_join(
+ projection: &ProjectionExec,
+ hash_join: &HashJoinExec,
+) -> Result<Option<Arc<dyn ExecutionPlan>>> {
+ // Collect all column indices from the given projection expressions.
+ let projection_index = collect_column_indices(projection.expr());
+
+ if projection_index.is_empty() {
+ return Ok(None);
+ };
+
Review Comment:
If the projection indices are the same as the output columns of the HashJoin, I
think we don't need to embed it either?
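A minimal sketch of that check in plain Rust, without DataFusion types. `projection_covers_all_columns` is a hypothetical helper (not part of DataFusion), and `join_output_len` is an assumed stand-in for the number of fields in the hash join's output schema. The sketch sorts and deduplicates the indices itself rather than relying on what `collect_column_indices` returns.

```rust
/// Hypothetical helper (not part of DataFusion): returns true when the
/// projection references every output column of the join, i.e. embedding
/// it into the join would not prune any column.
fn projection_covers_all_columns(projection_index: &[usize], join_output_len: usize) -> bool {
    // Normalize: sort and deduplicate the referenced column indices.
    let mut indices = projection_index.to_vec();
    indices.sort_unstable();
    indices.dedup();
    // After normalization, the set must be exactly 0..join_output_len.
    indices.len() == join_output_len
        && indices.iter().enumerate().all(|(pos, &idx)| pos == idx)
}

fn main() {
    // Join output has 3 columns; projecting columns 0 and 2 drops column 1,
    // so embedding the projection would save a `take` per dropped column.
    let worth_embedding = !projection_covers_all_columns(&[0, 2], 3);
    // A projection touching all 3 columns (here reordered) prunes nothing.
    let skip_embedding = projection_covers_all_columns(&[2, 0, 1], 3);
    println!("embed [0, 2]: {worth_embedding}, skip [2, 0, 1]: {skip_embedding}");
}
```

A projection that references every column may still reorder columns or compute new expressions, so a check like this only answers whether embedding would prune anything; the remaining expressions would still need to be evaluated above the join.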