[ 
https://issues.apache.org/jira/browse/ARROW-16081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515308#comment-17515308
 ] 

David Li commented on ARROW-16081:
----------------------------------

IMO, using buffers directly is a lower-level API, and in doing so the user is 
promising that they have ensured the memory layouts match. (I'm not sure, but 
types like strings, decimals, etc. may fail in the same way.) No validation is 
possible either: because a buffer is being passed instead of an array, Arrow 
doesn't know what type you intend to interpret it as. It might help, though, to 
have some docs about compatibility with various popular libraries (numpy, cudf, 
etc.). CC [~amol-] for opinions.

> Incorrect results when reading a buffer of boolean values
> ---------------------------------------------------------
>
>                 Key: ARROW-16081
>                 URL: https://issues.apache.org/jira/browse/ARROW-16081
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>         Environment: Ubuntu 20.04, Python 3.8.10, pyarrow==7.0.0
>            Reporter: Jonathan Kenyon
>            Priority: Major
>
> The following reproducer demonstrates that a buffer of boolean values is not 
> correctly recovered when using pyarrow.
> {code:python}
> import pyarrow as pa
> import numpy as np
> if __name__ == "__main__":
>     data = np.array([True, False, True, False], dtype=bool)
>     length = len(data)
>     buf = pa.py_buffer(data)
>     array = pa.Array.from_buffers(pa.bool_(), length, [None, buf])
>     np.testing.assert_array_equal(data, array.to_numpy(zero_copy_only=False))
> {code}



