UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202340283


##########
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##########
@@ -828,13 +833,127 @@ impl<T: BatchTransformer> NestedLoopJoinStream<T> {
                     handle_state!(self.process_probe_batch())
                 }
                 NestedLoopJoinStreamState::ExhaustedProbeSide => {
-                    handle_state!(self.process_unmatched_build_batch())
+                    handle_state!(self.prepare_unmatched_output_indices())
+                }
+                NestedLoopJoinStreamState::OutputUnmatchedBuildRows(_) => {
+                    handle_state!(self.build_unmatched_output())
                 }
                 NestedLoopJoinStreamState::Completed => Poll::Ready(None),
             };
         }
     }
 
+    fn get_next_join_result(&mut self) -> Result<Option<RecordBatch>> {
+        let (left_indices, right_indices, start) =
+            self.join_result_status.as_mut().ok_or_else(|| {
+                datafusion_common::_internal_datafusion_err!(
+                    "should have join_result_status"
+                )
+            })?;
+
+        let left_batch = self
+            .left_data
+            .as_ref()
+            .ok_or_else(|| {
+                datafusion_common::_internal_datafusion_err!("should have left_batch")
+            })?
+            .batch();
+
+        let right_batch = match &self.state {
+            NestedLoopJoinStreamState::ProcessProbeBatch(record_batch) => record_batch,
+            NestedLoopJoinStreamState::OutputUnmatchedBuildRows(record_batch) => {
+                record_batch
+            }
+            _ => {
+                return internal_err!(
+                    "state should be ProcessProbeBatch or OutputUnmatchBatch"
+                )
+            }
+        };
+
+        let current_start = *start;
+
+        if left_indices.is_empty() && right_indices.is_empty() && current_start == 0 {

Review Comment:
   > That was my initial approach. However, it resulted in an output with 0 rows and 0 columns, which seems to be incorrect and caused the test to fail.
   You can see the failed CI run here:
   
https://github.com/apache/datafusion/actions/runs/15734253347/job/44343070926?pr=16443#step:5:1208
   
   Now I understand why [this test](https://github.com/apache/datafusion/blob/acf0bbe1cedf3fe0155de5f41a5b66046e262e04/datafusion/core/src/dataframe/mod.rs#L1250-L1263) passed after I changed the return value from `None` to `RecordBatch::new_empty`.
   
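   In this unit test, the join result is converted to a string and compared with the expected output. Because the `schema_opt` passed to the formatter is `None`, the schema is taken from the first `RecordBatch` in the result. If the join emits no batches, there is no schema to take, so the table is rendered without any columns; that is the 0-row, 0-column output seen in CI.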
   
https://github.com/apache/arrow-rs/blob/7b219f98c25fcd318a0c207f51a41398d1b23724/arrow-cast/src/pretty.rs#L183-L187
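   
   To make the failure mode concrete, here is a simplified, std-only model of the schema fallback linked above. `Schema`, `Batch`, and `render` are hypothetical stand-ins, not the arrow-rs types or the actual `pretty.rs` code; they only mirror the idea that, when `schema_opt` is `None`, the schema comes from the first batch, so an empty list of batches renders with no columns while a single empty batch keeps the header.
   
   ```rust
   // Hypothetical, std-only model of the schema fallback in arrow-rs
   // `pretty.rs`. `Schema`, `Batch`, and `render` are NOT arrow-rs APIs.

   /// Stand-in for an Arrow schema: only the column names.
   #[derive(Clone, Debug, PartialEq)]
   struct Schema(Vec<String>);

   /// Stand-in for a `RecordBatch`: a schema plus a row count.
   #[derive(Clone, Debug)]
   struct Batch {
       schema: Schema,
       num_rows: usize,
   }

   impl Batch {
       /// Mirrors the idea of `RecordBatch::new_empty`: zero rows, schema kept.
       fn new_empty(schema: Schema) -> Self {
           Batch { schema, num_rows: 0 }
       }
   }

   /// Renders a one-line table. If `schema_opt` is `None`, the schema is taken
   /// from the first batch; with no batches at all there is no schema, so an
   /// empty string (no columns, no rows) is returned.
   fn render(schema_opt: Option<Schema>, results: &[Batch]) -> String {
       let schema = match schema_opt
           .or_else(|| results.first().map(|b| b.schema.clone()))
       {
           Some(schema) => schema,
           None => return String::new(),
       };
       let total_rows: usize = results.iter().map(|b| b.num_rows).sum();
       format!("| {} | ({} rows)", schema.0.join(" | "), total_rows)
   }

   fn main() {
       let schema = Schema(vec!["value".to_string()]);

       // The join emits no batches: the header is lost.
       println!("{:?}", render(None, &[])); // prints ""

       // The join emits one empty batch: the header survives.
       println!("{:?}", render(None, &[Batch::new_empty(schema)])); // prints "| value | (0 rows)"
   }
   ```
   
   Passing the schema explicitly (`Some(schema)`) would also keep the header even with no batches, but the unit test passes `None`, which is why emitting an empty batch instead of no batch made it pass.
   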
   
   When executed in the CLI, there's no issue even if the Nested Loop Join (NLJ) returns 0 record batches.
   
   ```
   > select t1.value from range(1) t1 join range(1) t2 on t1.value + t2.value >100;
   
   +-------+
   | value |
   +-------+
   +-------+
   0 row(s) fetched. 
   ```
   
   From a compatibility standpoint, I think it's better to keep emitting an empty `RecordBatch` in this case, consistent with the previous behavior.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

