tobixdev commented on issue #18870:
URL: https://github.com/apache/datafusion/issues/18870#issuecomment-3563111499

   I think I found the root cause. `FixedSizeBinaryArray::new_null` [does not 
correctly set the length of the values 
buffer](https://github.com/apache/arrow-rs/issues/8900).
   
   The NL-Join implementation creates a scalar from the left side of the join 
and then calls `to_array_of_size` 
(`datafusion/physical-plan/src/joins/nested_loop_join.rs:1953`), which in turn 
calls `FixedSizeBinaryArray::new_null`:
   
   ```rust
     // ScalarValue::to_array_of_size for FixedSizeBinaryArray
     ScalarValue::FixedSizeBinary(s, e) => match e {
         Some(value) => Arc::new(
             FixedSizeBinaryArray::try_from_sparse_iter_with_size(
                 repeat_n(Some(value.as_slice()), size),
                 *s,
             )
             .unwrap(),
         ),
         None => Arc::new(FixedSizeBinaryArray::new_null(*s, size)), // <---- The call
     },
   ```
   
   Because the values buffer is shorter than expected, the slice operations 
panic during the execution of the join. This happens when the `ScalarValue` 
that gets extracted from the left join side is `None`.
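   
   To illustrate the layout expectation behind this, here is a minimal, 
std-only sketch. It is a simplified model and not the arrow-rs code: 
`FixedSizeColumn`, `new_null_short`, `value`, and `has_valid_layout` are 
hypothetical names, the null bitmap is omitted, and I'm assuming the buggy 
buffer is simply empty.

```rust
// Simplified, hypothetical model of a fixed-size binary column.
// This is NOT the arrow-rs implementation; the null bitmap is left out.
struct FixedSizeColumn {
    byte_width: usize, // bytes per element
    len: usize,        // number of logical elements
    values: Vec<u8>,   // expected to hold byte_width * len bytes, even for nulls
}

impl FixedSizeColumn {
    /// All-null column whose values buffer is zero-filled to the full length.
    fn new_null(byte_width: usize, len: usize) -> Self {
        Self { byte_width, len, values: vec![0u8; byte_width * len] }
    }

    /// All-null column with an undersized values buffer.
    /// Assumption: this models the suspected bug as an empty buffer.
    fn new_null_short(byte_width: usize, len: usize) -> Self {
        Self { byte_width, len, values: Vec::new() }
    }

    /// Bytes of element `i`, or `None` if `i` is out of range or the buffer
    /// is too short. An unchecked `&self.values[start..end]` would panic here.
    fn value(&self, i: usize) -> Option<&[u8]> {
        if i >= self.len {
            return None;
        }
        let start = i * self.byte_width;
        self.values.get(start..start + self.byte_width)
    }

    /// The invariant that slicing relies on.
    fn has_valid_layout(&self) -> bool {
        self.values.len() == self.byte_width * self.len
    }
}
```

   With `new_null(16, 4)` every element can be sliced, while with 
`new_null_short(16, 4)` the checked lookup already fails for element 0, which 
is where an unchecked slice would panic.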
   
   I've compiled the [51.1 
patch](https://github.com/apache/datafusion/pull/18820) against my fix in 
https://github.com/apache/arrow-rs/pull/8901 and it seems like this resolves 
our problems.
   
   @2010YOUY01 maybe you could quickly check if this explanation makes sense, 
as you're familiar with the NL Join. 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

