Dandandan commented on code in PR #16716:
URL: https://github.com/apache/datafusion/pull/16716#discussion_r2196037862
##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -928,6 +929,55 @@ pub(crate) fn build_batch_from_indices(
Ok(RecordBatch::try_new(Arc::new(schema.clone()), columns)?)
}
+/// Returns a new [RecordBatch] resulting from a join where the build/left side is empty.
+/// The resulting batch has [Schema] `schema`.
+pub(crate) fn build_batch_empty_build_side(
+ schema: &Schema,
+ build_batch: &RecordBatch,
+ probe_batch: &RecordBatch,
+ column_indices: &[ColumnIndex],
+ join_type: JoinType,
+) -> Result<RecordBatch> {
+ match join_type {
+ // these join types only return data if the left side is not empty, so we return an
+ // empty RecordBatch
+ JoinType::Inner
Review Comment:
I think it *should* be relatively fast to do a cross join / NLJ instead of a
hash join for those cases, but of course that depends on how the nested loop
join is implemented. There is probably more room for optimization in the
nested loop join.
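
For reference, here is a minimal sketch of what each join type would emit for a probe batch when the build (left) side has no rows. The enum and function names are hypothetical stand-ins, not DataFusion's actual `JoinType` or the helper in this PR, and the mapping follows standard SQL join semantics rather than anything confirmed from the truncated diff above.

```rust
/// Hypothetical stand-in for a join-type enum; NOT DataFusion's `JoinType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchJoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// What a join emits for one probe batch when the build (left) side is empty.
#[derive(Debug, PartialEq, Eq)]
pub enum EmptyBuildOutput {
    /// No rows at all.
    Empty,
    /// Every probe row, with the left-side columns filled with NULLs.
    ProbeRowsWithNullLeft,
    /// Every probe row, right-side columns only.
    ProbeRowsOnly,
}

/// Assumed mapping, based on standard SQL join semantics.
pub fn output_for_empty_build_side(join_type: SketchJoinType) -> EmptyBuildOutput {
    use SketchJoinType::*;
    match join_type {
        // These need at least one left row (or a match against one) to emit anything.
        Inner | Left | LeftSemi | LeftAnti | RightSemi => EmptyBuildOutput::Empty,
        // Unmatched probe rows are kept and padded with NULLs on the left.
        Right | Full => EmptyBuildOutput::ProbeRowsWithNullLeft,
        // No probe row can have a match, so every probe row passes the anti filter.
        RightAnti => EmptyBuildOutput::ProbeRowsOnly,
    }
}
```

In none of these cases does a hash table need to be probed, which is why a specialized path (or a plain cross join / NLJ) could plausibly be cheap here.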
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]