Dandandan commented on a change in pull request #55: URL: https://github.com/apache/arrow-datafusion/pull/55#discussion_r619873424
##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -891,6 +898,36 @@ impl Stream for HashJoinStream {
                 }
                 Some(result)
             }
+            // If maybe_batch is None and num_output_rows is 0, that means right side batch was
+            // empty and has been coalesced to None. Fill right side with Null if preserve_left
+            // is true.
+            None if self.preserve_left && self.num_output_rows == 0 => {

Review comment:
I think this partially resolves a more general issue with the left join: it doesn't keep track of unmatched left rows across batches.
https://issues.apache.org/jira/browse/ARROW-10971

Maybe we can add a TODO here, or open an issue, noting that we should generalize this to produce all left rows that were not matched. This looks like a great start for that 👍
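
To make the suggested generalization concrete, here is a minimal sketch of tracking unmatched left rows across right-side batches. It is not DataFusion code: `LeftJoinState`, `probe`, and `finish` are hypothetical names, and each side is reduced to a single `i64` key column, with `None` standing in for a null-filled right side. The idea is to keep one "matched" flag per left row, set it whenever a right batch matches that row, and emit the still-unmatched left rows only once the right side is exhausted.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified left-join state (one key column per side).
struct LeftJoinState {
    left_keys: Vec<i64>,
    /// Join key -> indices of the left rows holding that key.
    index: HashMap<i64, Vec<usize>>,
    /// matched[i] becomes true once left row i has joined with any right row.
    matched: Vec<bool>,
}

impl LeftJoinState {
    fn new(left_keys: Vec<i64>) -> Self {
        let mut index: HashMap<i64, Vec<usize>> = HashMap::new();
        for (i, k) in left_keys.iter().enumerate() {
            index.entry(*k).or_default().push(i);
        }
        let matched = vec![false; left_keys.len()];
        Self { left_keys, index, matched }
    }

    /// Probe one right-side batch: emit the matched pairs and remember
    /// which left rows matched. An empty batch simply yields no rows.
    fn probe(&mut self, right_keys: &[i64]) -> Vec<(i64, Option<i64>)> {
        let mut out = Vec::new();
        for &rk in right_keys {
            if let Some(rows) = self.index.get(&rk) {
                for &i in rows {
                    self.matched[i] = true;
                    out.push((self.left_keys[i], Some(rk)));
                }
            }
        }
        out
    }

    /// Call once after the last right batch: every left row that never
    /// matched is emitted with a null (None) right side.
    fn finish(&self) -> Vec<(i64, Option<i64>)> {
        self.left_keys
            .iter()
            .zip(&self.matched)
            .filter(|(_, m)| !**m)
            .map(|(k, _)| (*k, None))
            .collect()
    }
}
```

With this shape, the empty-right-batch case handled in the diff stops being special: no flags are set during probing, so `finish` returns every left row with a null right side.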