tustvold opened a new issue #1071:
URL: https://github.com/apache/arrow-rs/issues/1071
**Describe the bug**
ArrayDataBuilder performs various validations of the offsets buffer. In
particular, it validates that offsets are monotonically increasing and within
the bounds of the values buffer.
However, it is my understanding that null slots can have arbitrary offsets, so
I think this validation might be overly strict?
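For reference, the invariants being enforced can be sketched as a standalone check. This is a hypothetical illustration of the rule, not arrow-rs's actual validation code: for an array of length `n`, the offsets buffer holds `n + 1` values that must start at or after 0, never decrease, and end within the values buffer.

```rust
// Hypothetical sketch of the offset invariants described above; not the
// actual arrow-rs implementation.
fn offsets_are_valid(offsets: &[i32], values_len: usize) -> bool {
    offsets
        .windows(2)
        .all(|w| w[0] <= w[1]) // monotonically non-decreasing
        && offsets.first().map_or(true, |&o| o >= 0) // no negative start
        && offsets
            .last()
            .map_or(true, |&o| (o as usize) <= values_len) // end within bounds
}

fn main() {
    // The offsets from the reproducer fail because 2 -> 0 decreases,
    // even though the only decreasing pair belongs to a null slot.
    assert!(!offsets_are_valid(&[2, 0, 2, 2], 2));
    // Monotonic offsets over the same values pass.
    assert!(offsets_are_valid(&[0, 2, 2, 2], 2));
}
```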
**To Reproduce**
```rust
use arrow::array::ArrayDataBuilder;
use arrow::buffer::MutableBuffer;
use arrow::datatypes::DataType;

// Three slots: [null, "ab", null]. Only the valid slot (index 1) has
// meaningful offsets; the null slots' offsets are arbitrary.
let offsets = vec![2_i32, 0, 2, 2]; // Utf8 uses i32 offsets, n + 1 entries
let validity = vec![false, true, false];
let values = "ab";

let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
offsets_buffer.extend_from_slice(&offsets);
let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());
let mut values_buffer = MutableBuffer::new(values.len());
values_buffer.extend_from_slice(values.as_bytes());

let array_data = ArrayDataBuilder::new(DataType::Utf8)
    .len(validity.len())
    .add_buffer(offsets_buffer.into())
    .add_buffer(values_buffer.into())
    .null_bit_buffer(validity_buffer.into())
    .build()
    .unwrap(); // currently errors: offsets are not monotonically increasing
```
**Expected behavior**
I would expect this not to error, as the non-null elements have valid offsets.
**Additional context**
I encountered this whilst trying to produce a reproducer for a related bug,
where the string comparison kernels panic in the presence of non-monotonically
increasing offsets. This in turn came up whilst working on a parquet string
array decoder, where I was hoping to leave the offsets for null slots
zero-initialized.
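As a note on the decoder pattern above: one way to satisfy the monotonicity check without tracking anything for nulls is to repeat the previous offset for each null slot rather than leaving it zero. A minimal sketch in plain Rust, independent of the parquet decoder in question (`build_offsets` is a hypothetical helper, not an arrow-rs API):

```rust
// Hypothetical sketch: build a monotonic offsets buffer from optional
// strings, repeating the previous offset for null slots instead of
// leaving them zero-initialized.
fn build_offsets(items: &[Option<&str>]) -> (Vec<i32>, String) {
    let mut offsets = vec![0_i32];
    let mut values = String::new();
    for item in items {
        if let Some(s) = item {
            values.push_str(s);
        }
        // Null slots contribute no bytes, so the offset simply repeats,
        // keeping the buffer monotonically non-decreasing.
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}

fn main() {
    let (offsets, values) = build_offsets(&[None, Some("ab"), None]);
    assert_eq!(offsets, vec![0, 0, 2, 2]); // null slots repeat neighbours
    assert_eq!(values, "ab");
}
```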