AlenkaF commented on issue #48182:
URL: https://github.com/apache/arrow/issues/48182#issuecomment-3557209465
This is a bit unexpected. It is _maybe_ not a common use case, which would
explain why it hasn't surfaced until now.
Looking at the offsets and lengths (as mentioned) in the `ListArray`, I would
think they should be passed to C++ correctly:
```python
>>> list_of_structs.offsets
<pyarrow.lib.Int32Array object at 0x105025780>
[
0,
2,
3,
5
]
>>> list_of_structs[0].values.offset
0
>>> list_of_structs[1].values.offset
2
>>> list_of_structs[2].values.offset
3
```
but it seems that when creating a `RecordBatch` the offset and/or length
are ignored? Even stranger, it only happens for the first element of
the list! If we look at the second, the result is correct:
```python
>>> elem1 = list_of_structs[1]
... batch_for_elem1 = pyarrow.RecordBatch.from_struct_array(elem1.values)
...
>>> elem1
<pyarrow.ListScalar: [{'x': 5, 'y': 6}]>
>>> batch_for_elem1
pyarrow.RecordBatch
x: int64
y: int64
----
x: [5]
y: [6]
```
and the same goes for the third. So I am guessing the issue is with offset 0,
where the "slicing" is ignored.