damacek opened a new issue, #9235:
URL: https://github.com/apache/arrow-rs/issues/9235
**Describe the bug**
Recently, I encountered an assertion failure when passing a sliced
RecordBatch to the Python FFI. The problem seems to appear only with slices
of RecordBatches that contain complex data (nested lists and/or structs).
**To Reproduce**
Add the following test case to
`arrow-pyarrow-integration-testing/tests/test_sql.py`:
```python
def test_nested_struct_with_list_slice():
    """
    Test round-tripping sliced record batches with deeply nested struct types.

    This tests struct<struct<list<struct>>> with variable-length lists,
    ensuring that slicing at different row offsets works correctly.
    """
    # Build the nested type: struct<struct<list<struct>>>
    item_type = pa.struct([("x", pa.int64())])
    inner_struct_type = pa.struct([("items", pa.list_(item_type))])
    outer_struct_type = pa.struct([("inner", inner_struct_type)])

    # Key: variable-length inner lists (1, 2, 1 items)
    batch = pa.record_batch(
        [
            pa.array([1, 2, 3], type=pa.int64()),
            pa.array([
                {"inner": {"items": [{"x": 1}]}},
                {"inner": {"items": [{"x": 2}, {"x": 3}]}},
                {"inner": {"items": [{"x": 4}]}},
            ], type=outer_struct_type),
        ],
        names=["id", "outer"]
    )

    # Test round-trip of each sliced row
    for i in range(batch.num_rows):
        print(i)
        sliced = batch.slice(i, 1)
        result = rust.round_trip_record_batch(sliced)
        result.validate(full=True)
        assert result.to_pydict() == sliced.to_pydict()
        assert result.schema == sliced.schema
```
When I run `pytest -v .`:
```
        # Test round-trip of each sliced row
        for i in range(batch.num_rows):
            print(i)
            sliced = batch.slice(i, 1)
>           result = rust.round_trip_record_batch(sliced)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E           pyo3_runtime.PanicException: assertion failed: (offset + length) <= self.len()

tests\test_sql.py:757: PanicException
------------------------------------------- Captured stdout call --------------------------------------------
0
1
------------------------------------------- Captured stderr call --------------------------------------------
thread '<unnamed>' (12544) panicked at C:\Code\arrow-rs\arrow-data\src\data.rs:581:9:
assertion failed: (offset + length) <= self.len()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
**Expected behavior**
The provided test should pass.