alamb opened a new issue, #4788:
URL: https://github.com/apache/arrow-rs/issues/4788
**Describe the bug**
The cmp kernels do not ignore null key values of DictionaryArrays as they
should
**To Reproduce**
Run this program:
```rust
use std::sync::Arc;
use arrow::array::{Int32Array, StringArray, BooleanBufferBuilder};
use arrow::buffer::{ScalarBuffer, NullBuffer};
use arrow::{array::DictionaryArray};
use arrow::util::pretty::pretty_format_columns;
fn main() {
// Logically like this:
// keys: PrimitiveArray<Int32>
//
[
// null,
// 1,
// 0,
// ]
// values: StringArray
// [
// "us-west",
// "us-east",
// ]}
let values = StringArray::from(vec![Some("us-west"), Some("us-east")]);
let mut nulls = BooleanBufferBuilder::new(3);
nulls.append(false); // null
nulls.append(true);
nulls.append(true);
let nulls: NullBuffer = nulls.finish().into();
// key values
//
// since element 0 is NULL, the index value of 100 should be
// ignored
let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);
let keys = Int32Array::new(key_values, Some(nulls));
let col = DictionaryArray::try_new(keys, Arc::new(values)).unwrap();
println!("Input col: {col:?}");
let comparison = arrow::compute::kernels::cmp::neq(
&col.slice(0, col.len() - 1),
&col.slice(1, col.len() - 1),
)
.expect("cmp");
println!("comparison: {}", pretty_format_columns("neq",
&[Arc::new(comparison) as _]).unwrap());
}
```
Results in
```
Input col: DictionaryArray {keys: PrimitiveArray<Int32>
[
null,
1,
0,
] values: StringArray
[
"us-west",
"us-east",
]}
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value',
/Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:38
stack backtrace:
0: rust_begin_unwind
at
/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
1: core::panicking::panic_fmt
at
/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
2: core::panicking::panic
at
/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
3: core::option::Option<T>::unwrap
at
/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/option.rs:935:21
4: arrow_array::array::byte_array::GenericByteArray<T>::value_unchecked
at
/Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:13
5: <&arrow_array::array::byte_array::GenericByteArray<T> as
arrow_ord::cmp::ArrayOrd>::value_unchecked
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:522:9
6: arrow_ord::cmp::apply_op_vectored::{{closure}}
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:450:12
7: arrow_ord::cmp::collect_bool
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:388:24
8: arrow_ord::cmp::apply_op_vectored
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:447:5
9: arrow_ord::cmp::apply
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:326:17
10: arrow_ord::cmp::compare_op::{{closure}}
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:219:29
11: arrow_ord::cmp::compare_op
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:286:44
12: arrow_ord::cmp::neq
at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:86:5
13: rust_arrow_playground::main
at ./src/main.rs:40:23
14: core::ops::function::FnOnce::call_once
at
/rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
```
**Expected behavior**
The code should not panic
Specifically, if you change the key values from
```rust
let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);
```
To
```rust
let key_values = ScalarBuffer::from(vec![0i32, 1i32, 0i32]);
```
Then the reproducer completes without error:
```
comparison: +------+
| neq |
+------+
| |
| true |
+------+
```
**Additional context**
Found while updating https://github.com/influxdata/influxdb_iox/pull/8577
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]