[ https://issues.apache.org/jira/browse/ARROW-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515308#comment-17515308 ]
David Li commented on ARROW-16081:
----------------------------------

IMO, using buffers directly is a lower-level API and in doing so the user is promising they have ensured the memory layouts match. (And I'm not sure, but types like strings, decimals, etc. may also fail.) And there's no validation possible (Arrow doesn't know what type you intend to interpret it as precisely because a buffer is being used instead of an array). It might help to have some docs about compatibility with various popular libraries (numpy, cudf, etc.) though.

CC [~amol-] for opinions.

> Incorrect results when reading a buffer of boolean values
> ---------------------------------------------------------
>
>                 Key: ARROW-16081
>                 URL: https://issues.apache.org/jira/browse/ARROW-16081
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu 20.04, Python 3.8.10, pyarrow==7.0.0
>            Reporter: Jonathan Kenyon
>            Priority: Major
>
> The following reproducer demonstrates that a buffer of boolean values is not
> correctly recovered when using pyarrow.
> {code:python}
> import pyarrow.parquet as pq
> import pyarrow as pa
> import numpy as np
> if __name__ == "__main__":
>     data = np.array([True, False, True, False], dtype=bool)
>     length = len(data)
>     buf = pa.py_buffer(data)
>     array = pa.Array.from_buffers(pa.bool_(), length, [None, buf])
>     np.testing.assert_array_equal(data, array.to_numpy(zero_copy_only=False))
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)