raulcd commented on issue #48182:
URL: https://github.com/apache/arrow/issues/48182#issuecomment-3553728056
That's interesting. I don't think "corrupted" is the right term, though. Both
`elem0` and `list_of_structs` reuse the same data in memory (see the buffer
addresses below), yet inspecting the values of each shows only the expected
elements:
```
>>> elem0.values.buffers()
[None, None, <pyarrow.Buffer address=0x7f8c280201c0 size=40 is_cpu=True is_mutable=True>, None, <pyarrow.Buffer address=0x7f8c28020200 size=40 is_cpu=True is_mutable=True>]
>>> list_of_structs.values.buffers()
[None, None, <pyarrow.Buffer address=0x7f8c280201c0 size=40 is_cpu=True is_mutable=True>, None, <pyarrow.Buffer address=0x7f8c28020200 size=40 is_cpu=True is_mutable=True>]
>>> list_of_structs.values
<pyarrow.lib.StructArray object at 0x7f8c73fd7d00>
-- is_valid: all not null
-- child 0 type: int64
[
1,
3,
5,
7,
9
]
-- child 1 type: int64
[
2,
4,
6,
8,
10
]
>>> elem0.values
<pyarrow.lib.StructArray object at 0x7f8c73fd7fa0>
-- is_valid: all not null
-- child 0 type: int64
[
1,
3
]
-- child 1 type: int64
[
2,
4
]
```
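To illustrate why identical buffer addresses are not a sign of corruption by themselves, here is a minimal plain-Python analogy. It is not PyArrow code: `Int64View`, `storage`, `whole`, and `first` are hypothetical names, and the sketch only assumes that an array can be modeled as a shared buffer plus an offset and a length. The values mirror child 0 in the output above.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Int64View:
    """Hypothetical stand-in for an array: a window onto shared storage."""
    buffer: memoryview  # shared storage, never copied
    offset: int         # index of the first logical element
    length: int         # number of logical elements

    def to_list(self):
        # Apply offset and length so only the logical window is exposed.
        return self.buffer[self.offset:self.offset + self.length].tolist()


# One buffer of five int64 values (5 * 8 = 40 bytes, like size=40 above).
storage = memoryview(array("q", [1, 3, 5, 7, 9]))

# Two views over the same storage with different logical lengths.
whole = Int64View(storage, offset=0, length=5)  # role of list_of_structs.values
first = Int64View(storage, offset=0, length=2)  # role of elem0.values

print(whole.buffer.obj is first.buffer.obj)  # both views share one allocation
print(whole.to_list())
print(first.to_list())
```

In this model the shorter view is entirely correct even though its buffer is 40 bytes long. What matters is whether every reader applies the offset and length; whether that is what goes wrong in `from_struct_array` is not established here.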
I agree that the current behavior of
`pyarrow.RecordBatch.from_struct_array(elem0.values)` is unexpected.
Do you know whether this is new in PyArrow 22.0.0, or was it already happening
in earlier versions of PyArrow?
cc @AlenkaF