[ 
https://issues.apache.org/jira/browse/ARROW-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadym Zhernovyi updated ARROW-13632:
------------------------------------
    Description: 
When calling FixedSizeListArray.filter on a slice, the mask is always applied to the 
first len(slice) elements at the beginning of the array from which the slice was 
created, instead of to the elements of the slice itself.
* The particular mask doesn't matter.
* The slice length and position don't matter.
* The number of elements filtered at the wrong position is always equal to the 
length of the slice.
* The issue does not reproduce with 
[ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html].
* The data type (int32, float, ...) doesn't matter.
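
The symptom described above can be pictured with a small pure-Python model. This is only a sketch of the observed behaviour, not a diagnosis of the Arrow code: the report does not say where the defect lives, and the names SliceView, filter_respecting_offset and filter_ignoring_offset are hypothetical. It assumes a slice behaves like a view (offset, length) onto the rows of its parent array.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SliceView:
    """Toy stand-in for an array slice: a window onto shared parent rows."""
    parent: List[List[int]]  # rows of the array the slice was created from
    offset: int              # index in `parent` of the first row of the slice
    length: int              # number of rows in the slice


def filter_respecting_offset(view: SliceView, mask: Sequence[bool]) -> List[List[int]]:
    """Expected behaviour: the mask selects rows of the slice itself."""
    return [view.parent[view.offset + i] for i in range(view.length) if mask[i]]


def filter_ignoring_offset(view: SliceView, mask: Sequence[bool]) -> List[List[int]]:
    """Reported behaviour: the mask lands on the first len(slice) parent rows."""
    return [view.parent[i] for i in range(view.length) if mask[i]]


parent = [[n, n, n] for n in range(8)]         # same values as in the report
view = SliceView(parent, offset=3, length=4)   # stands in for a[3:7]
mask = [True, False, True, False]

print(filter_respecting_offset(view, mask))    # [[3, 3, 3], [5, 5, 5]]
print(filter_ignoring_offset(view, mask))      # [[0, 0, 0], [2, 2, 2]]
```

The second result is what the session below shows for a[3:7] with the same mask. That is consistent with the slice position being ignored, but it remains an assumption about the cause.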

{code:python}
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
>>> import numpy as np
>>> import pyarrow as pa
>>> np.__version__
'1.21.1'
>>> pa.__version__
'5.0.0'
>>> data = [
...     np.zeros(3, dtype='int32'),
...     np.ones(3, dtype='int32'),
...     np.ones(3, dtype='int32') + 1,
...     np.ones(3, dtype='int32') + 2,
...     np.ones(3, dtype='int32') + 3,
...     np.ones(3, dtype='int32') + 4,
...     np.ones(3, dtype='int32') + 5,
...     np.ones(3, dtype='int32') + 6
... ]
>>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
>>> a.filter(pa.array(len(a) * [True]))  # everything is ok 
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6],
  [7, 7, 7]
]
>>> a[3:7].filter(pa.array(4 * [True]))  # outputs elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[3:7].filter(pa.array([True, False, True, False]))  # outputs filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
[
  [0, 0, 0],
  [2, 2, 2]
]
>>> a[4:].filter(pa.array([True, True, True, True]))  # outputs elements of a[0:4] instead of a[4:]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[4:6].filter(pa.array([True, True]))  # outputs elements of a[0:2] instead of a[4:6]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
[
  [0, 0, 0],
  [1, 1, 1]
]
>>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works correctly
<pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
[
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6]
]
{code}
 

 


> [Python] Filter mask is always applied to elements at the beginning of 
> FixedSizeListArray when filtering a slice
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13632
>                 URL: https://issues.apache.org/jira/browse/ARROW-13632
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>         Environment: Windows 10, Python 3.9
>            Reporter: Vadym Zhernovyi
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
