jonathanc-n commented on code in PR #16716:
URL: https://github.com/apache/datafusion/pull/16716#discussion_r2194904410
##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -928,6 +929,55 @@ pub(crate) fn build_batch_from_indices(
     Ok(RecordBatch::try_new(Arc::new(schema.clone()), columns)?)
 }
+/// Returns a new [RecordBatch] resulting of a join where the build/left side is empty.
+/// The resulting batch has [Schema] `schema`.
+pub(crate) fn build_batch_empty_build_side(
+    schema: &Schema,
+    build_batch: &RecordBatch,
+    probe_batch: &RecordBatch,
+    column_indices: &[ColumnIndex],
+    join_type: JoinType,
+) -> Result<RecordBatch> {
+    match join_type {
+        // these join types only return data if the left side is not empty, so we return an
+        // empty RecordBatch
+        JoinType::Inner

Review Comment:
   > Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔

   Would this also apply to joins with equijoin conditions? I think performance was slow when doing this with nested loop join (which follows a similar algorithm) against a larger right table. That is probably a memory issue caused by the Cartesian product.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
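The review comment raises a memory concern about using a cross join for a small left side when the join has an equijoin condition. The sketch below makes that concern concrete by comparing two strategies on the same inputs: a cross join followed by an equality filter, and a hash join built on the small side. It is a minimal illustration only. It uses plain `Vec`s of hypothetical `(key, payload)` rows instead of Arrow `RecordBatch`es, and the function names `cross_join_then_filter` and `hash_join` are made up for this example. It is not DataFusion's `CrossJoinExec`, `HashJoinExec`, or `NestedLoopJoinExec`.

```rust
use std::collections::HashMap;

/// Hypothetical row type for this sketch: (join key, payload). Not a DataFusion type.
type Row = (i64, &'static str);

/// Cross join followed by an equality filter. The full Cartesian product
/// (left.len() * right.len() pairs) is materialized before filtering, so
/// intermediate memory grows with the size of the right side.
/// Returns the joined rows and the number of intermediate pairs produced.
fn cross_join_then_filter(left: &[Row], right: &[Row]) -> (Vec<(Row, Row)>, usize) {
    let mut product = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            product.push((*l, *r));
        }
    }
    let intermediate = product.len();
    let joined = product.into_iter().filter(|(l, r)| l.0 == r.0).collect();
    (joined, intermediate)
}

/// Hash join: build a hash table on the (small) left side, then probe it
/// with each right row, so only matching pairs are ever materialized.
fn hash_join(left: &[Row], right: &[Row]) -> Vec<(Row, Row)> {
    let mut table: HashMap<i64, Vec<Row>> = HashMap::new();
    for l in left {
        table.entry(l.0).or_default().push(*l);
    }
    let mut out = Vec::new();
    for r in right {
        if let Some(matches) = table.get(&r.0) {
            for l in matches {
                out.push((*l, *r));
            }
        }
    }
    out
}

fn main() {
    // A "small" left side (well under the < 10 rows threshold in the quoted comment).
    let left: Vec<Row> = vec![(1, "a"), (2, "b"), (3, "c")];
    // A larger right side: 100_000 rows whose keys cycle through 0..10.
    let right: Vec<Row> = (0..100_000).map(|i| (i % 10, "x")).collect();

    let (cross, intermediate) = cross_join_then_filter(&left, &right);
    let hashed = hash_join(&left, &right);

    // Prints "intermediate pairs (cross join): 300000"
    println!("intermediate pairs (cross join): {}", intermediate);
    // Prints "output rows: cross=30000 hash=30000"
    println!("output rows: cross={} hash={}", cross.len(), hashed.len());
}
```

Both strategies return the same 30,000 matching rows. The cross-join path, however, first materializes 300,000 pairs. With fewer than 10 left rows, that intermediate product is at most 9× the right side, which can still be large when the right table is big. An operator that streams the probe side batch by batch would hold only one batch's worth of pairs at a time, so this eager sketch overstates peak memory. The per-batch row blowup is the same, though.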