tustvold opened a new issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072


   **Describe the bug**
   
   The string comparison kernels panic if they encounter non-monotonic offsets, 
even if the values for which they are encountered are null. Currently 
`ArrayDataBuilder`'s validation catches such arrays, however, I think this is 
actually a bug (#1071) as I don't think the arrow specification forbids nulls 
from having arbitrary offsets. I encountered this whilst working on a parquet 
StringArray decoder where I was hoping to leave the offsets for null values 
default zero-initialized and save some cycles.
   
   **To Reproduce**
   
   ```
   // Test handling of nulls with arbitrary offsets
   let offsets = vec![2, 0, 2, 2];
   let validity = vec![false, true, false];
   let values = "ab";
   
   let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
   offsets_buffer.extend_from_slice(&offsets);
   
   let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());
   
   let mut values_buffer = MutableBuffer::new(values.len());
   values_buffer.extend_from_slice(values.as_bytes());
   
   let builder = ArrayDataBuilder::new(DataType::Utf8)
       .len(validity.len())
       .add_buffer(offsets_buffer.into())
       .add_buffer(values_buffer.into())
       .null_bit_buffer(validity_buffer.into());
   
   // Validation prevents non-monotonic offsets (#1071)
   let data = unsafe { builder.build_unchecked() };
   
   let string = StringArray::from(data);
   let eq = eq_utf8(&string, &string).unwrap();
   assert!(min_boolean(&eq).unwrap())
   ```
   
   **Expected behavior**
   
   I would expect the above to pass, all non-null values have valid offsets. 
Instead it panics with
   
   ```
   called `Option::unwrap()` on a `None` value
   thread 'compute::kernels::comparison::tests::test_utf8_array_nulls' panicked 
at 'called `Option::unwrap()` on a `None` value', 
arrow/src/array/array_string.rs:101:40
   stack backtrace:
      0: rust_begin_unwind
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
      1: core::panicking::panic_fmt
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
      2: core::panicking::panic
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
      3: core::option::Option<T>::unwrap
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:746:21
      4: 
arrow::array::array_string::GenericStringArray<OffsetSize>::value_unchecked
                at ./src/array/array_string.rs:101:13
      5: arrow::compute::kernels::comparison::eq_utf8::{{closure}}
                at ./src/compute/kernels/comparison.rs:56:35
      6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for 
&mut F>::call_once
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:280:13
      7: core::option::Option<T>::map
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:848:29
      8: <core::iter::adapters::map::Map<I,F> as 
core::iter::traits::iterator::Iterator>::next
                at 
/rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/iter/adapters/map.rs:103:9
      9: arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
                at ./src/buffer/mutable.rs:489:38
     10: arrow::compute::kernels::comparison::eq_utf8
                at ./src/compute/kernels/comparison.rs:685:5
     11: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls
                at ./src/compute/kernels/comparison.rs:2305:18
     12: 
arrow::compute::kernels::comparison::tests::test_utf8_array_nulls::{{closure}}
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to