viirya opened a new issue, #5896: URL: https://github.com/apache/arrow-rs/issues/5896
**Describe the bug**

When slicing a string array, we move the pointer of the offset buffer but keep the data buffer pointer unchanged. The offset values themselves are not rebased, so they remain absolute positions into the original, unsliced data buffer. When exporting the array through FFI, we simply export the moved pointer of the offset buffer.

When importing the array, we calculate the length of the data buffer as the difference between the last and first offsets in the (sliced) offset buffer. This calculated length is incorrect. For example, the original string array's data buffer is 346536 bytes and its last offset is 346536. We take a slice of 8192 strings from it, and the sliced offsets are `[147456, ..., 294912]`. The calculated length is `294912 - 147456 = 147456`, but the actual length of the data buffer is `346536`. Because the data pointer was not moved, the imported buffer has to reach at least the last offset (`294912`), so a length of `147456` leaves part of the range the offsets refer to outside the buffer.

As a result, the data buffer of the imported array has an incorrect length. This hasn't caused problems so far because we mostly access the imported data buffer through pointers, without checking the range. But where we access the data as a slice (i.e., via `[]` indexing), it causes a runtime panic like:

```
---- ffi::tests_from_ffi::test_extend_imported_string_slice stdout ----
thread 'ffi::tests_from_ffi::test_extend_imported_string_slice' panicked at arrow-data/src/transform/variable_size.rs:38:29:
range end index 10890 out of range for slice of length 5500
```

Note: `test_extend_imported_string_slice` is a new test I added in #5895.
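The length arithmetic described in this issue can be illustrated with a small, self-contained Rust sketch. It does not use arrow-rs: `len_from_offset_diff` and `min_len_for_unrebased_offsets` are hypothetical helpers, and plain `Vec`s and slices stand in for the offset and data buffers. It assumes, as the issue describes, that slicing moves only the offsets view and leaves both the offset values and the data buffer untouched.

```rust
/// Hypothetical helper (not an arrow-rs API): derives the data buffer length
/// the way the issue says FFI import does, i.e. last offset minus first offset.
fn len_from_offset_diff(offsets: &[i32]) -> usize {
    match (offsets.first(), offsets.last()) {
        (Some(&first), Some(&last)) => (last - first) as usize,
        _ => 0,
    }
}

/// Hypothetical helper: the sliced offsets are still absolute positions in the
/// original data buffer, so that buffer must be at least `last` bytes long.
fn min_len_for_unrebased_offsets(offsets: &[i32]) -> usize {
    offsets.last().map_or(0, |&last| last as usize)
}

fn main() {
    // Toy string array ["aa", "bbb", "c", "dddd"].
    let data: Vec<u8> = b"aabbbcdddd".to_vec();
    let offsets: Vec<i32> = vec![0, 2, 5, 6, 10];

    // Slice of 2 strings starting at index 1: only the offsets view moves;
    // the offset values and the data buffer are left as they are.
    let sliced_offsets = &offsets[1..4]; // [2, 5, 6]

    let imported_len = len_from_offset_diff(sliced_offsets); // 6 - 2 = 4
    let needed_len = min_len_for_unrebased_offsets(sliced_offsets); // 6
    let imported: &[u8] = &data[..imported_len];

    // The first sliced string spans data[2..5]. With only 4 bytes imported,
    // `&imported[2..5]` would panic, so `get` is used to observe that safely.
    let (start, end) = (sliced_offsets[0] as usize, sliced_offsets[1] as usize);
    assert!(imported.get(start..end).is_none());
    // Against the full data buffer the same range is valid.
    assert_eq!(&data[start..end], b"bbb");

    // Numbers from the issue; only the first and last offsets matter here.
    let issue_offsets = [147456i32, 294912];
    println!(
        "toy: imported {imported_len}, needed >= {needed_len}; issue: imported {}, needed >= {}",
        len_from_offset_diff(&issue_offsets),
        min_len_for_unrebased_offsets(&issue_offsets)
    );
}
```

Running it prints `toy: imported 4, needed >= 6; issue: imported 147456, needed >= 294912`. Treating the last offset as a lower bound on the data buffer length is just one way to read the problem in this sketch, not necessarily the fix arrow-rs adopts.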
