[ 
https://issues.apache.org/jira/browse/ARROW-15837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500952#comment-17500952
 ] 

quentin lhoest edited comment on ARROW-15837 at 3/3/22, 6:02 PM:
-----------------------------------------------------------------

Ok, created at https://issues.apache.org/jira/browse/ARROW-15839 :)


was (Author: lhoestq):
Ok, created at https://issues.apache.org/jira/browse/ARROW-14448 :)

> [C++][Python][Doc] ListArray.offsets is wrong when it contains both lists and 
> null values
> -----------------------------------------------------------------------------------------
>
>                 Key: ARROW-15837
>                 URL: https://issues.apache.org/jira/browse/ARROW-15837
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Documentation, Python
>    Affects Versions: 7.0.0
>            Reporter: quentin lhoest
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi ! I noticed this bug by running this code:
> {code:python}
> import pyarrow as pa
> arr = pa.array([None, [0]])
> reconstructed_arr = pa.ListArray.from_arrays(arr.offsets, arr.values)
> print(reconstructed_arr.to_pylist())
> # [[], [0]] {code}
> The resulting array, reconstructed from the offsets and values of the 
> original array, {*}is not the same as the original array{*}.
> This seems to happen because `arr.offsets` is wrong: it returns 
> `[0, 0, 1]` instead of `[None, 0, 1]`:
> {code:python}
> print(arr.offsets.to_pylist())
> # [0, 0, 1]
> fixed_reconstructed_arr = pa.ListArray.from_arrays(pa.array([None, 0, 1]), 
> arr.values)
> print(fixed_reconstructed_arr.to_pylist())
> # [None, [0]]{code}
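
To make the two outputs above easier to compare, here is a minimal pure-Python model of how a list array can be rebuilt from offsets and values. It is only a sketch, not pyarrow's implementation: `lists_from_offsets` and its `validity` parameter are hypothetical names, and representing nulls as a separate per-slot mask is an assumption of the sketch.

```python
from typing import List, Optional, Sequence


def lists_from_offsets(
    offsets: Sequence[int],
    values: Sequence[int],
    validity: Optional[Sequence[bool]] = None,
) -> List[Optional[List[int]]]:
    """Rebuild list slots from offsets, values and an optional validity mask.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    n_slots = len(offsets) - 1  # N slots are described by N + 1 offsets
    result: List[Optional[List[int]]] = []
    for i in range(n_slots):
        if validity is not None and not validity[i]:
            # Slot is marked null, whatever its offsets say
            result.append(None)
        else:
            # Slot i covers values[offsets[i]:offsets[i + 1]]
            result.append(list(values[offsets[i]:offsets[i + 1]]))
    return result


# Offsets alone cannot tell an empty list from a null slot
without_validity = lists_from_offsets([0, 0, 1], [0])
# With a per-slot mask the null is recovered
with_validity = lists_from_offsets([0, 0, 1], [0], validity=[False, True])
print(without_validity)  # [[], [0]]
print(with_validity)     # [None, [0]]
```

In this model the offsets `[0, 0, 1]` are identical in both calls; only the mask decides whether the first slot is `[]` or `None`.
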
> If it can help, here is my investigation:
> The offsets seem to be wrong because they don't include the validity bitmap 
> from `arr.buffers()[0]`, which indicates which values are null and which 
> are non-null. Therefore the `None` is replaced by `0`.
> Even setting aside that the validity bitmap is not taken into account, I 
> checked its value and it was not what I expected: the validity bitmap at 
> `arr.buffers()[0]` is supposed to be `110` (in order to mask the None 
> in `[None, 0, 1]`) but it is `10` for some reason:
> {code:python}
> bin(int(arr.buffers()[0].hex(), 16))
> # '0b10'
> # I think it should be 0b110 - 1 corresponds to non-null and 0 corresponds to 
> null, if you take the bits in reverse order {code}
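
The bit order mentioned above can be checked with a small standard-library sketch. It assumes the bitmap is the single byte `0x02` suggested by the `0b10` output, and that bits are read least-significant first, one bit per list slot, as in the Arrow columnar format. `decode_validity` is a hypothetical helper name, not a pyarrow function.

```python
from typing import List


def decode_validity(bitmap: bytes, length: int) -> List[bool]:
    """Return one flag per slot, reading bits least-significant first.

    True means the slot is valid (non-null), False means it is null.
    Hypothetical helper for illustration only.
    """
    return [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(length)]


# Assumed bitmap byte 0x02, decoded for an array of two list slots
flags = decode_validity(bytes.fromhex("02"), 2)
print(flags)  # [False, True]
```

Read this way, `0b10` describes the two list slots of `[None, [0]]` (first null, second valid). A three-bit pattern would only be needed if there were one bit per offset rather than one per slot.
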



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
