[ https://issues.apache.org/jira/browse/ARROW-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-13632.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 10944
[https://github.com/apache/arrow/pull/10944]

> [Python] Filter mask is always applied to elements at the start of FixedSizeListArray when filtering a slice
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13632
>                 URL: https://issues.apache.org/jira/browse/ARROW-13632
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>         Environment: Windows 10, Python 3.9
>            Reporter: Vadym Zhernovyi
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.1, 6.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When FixedSizeListArray.filter is called on a slice, the mask is always applied to
> the first len(slice) elements at the beginning of the array from which the slice was
> created, instead of to the elements of the slice itself.
> * The issue does not reproduce for [ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html].
> * The particular mask doesn't matter.
> * The slice length and position don't matter.
> * The number of elements filtered at the wrong position is always equal to the length of the slice.
> * The data type (int32, float, ...) doesn't matter.
> {code:python}
> Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
> >>> import numpy as np
> >>> import pyarrow as pa
> >>> np.__version__
> '1.21.1'
> >>> pa.__version__
> '5.0.0'
> >>> data = [
> ...     np.zeros(3, dtype='int32'),
> ...     np.ones(3, dtype='int32'),
> ...     np.ones(3, dtype='int32') + 1,
> ...     np.ones(3, dtype='int32') + 2,
> ...     np.ones(3, dtype='int32') + 3,
> ...     np.ones(3, dtype='int32') + 4,
> ...     np.ones(3, dtype='int32') + 5,
> ...     np.ones(3, dtype='int32') + 6
> ... ]
> >>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
> >>> a.filter(pa.array(len(a) * [True]))  # everything is ok
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3],
>   [4, 4, 4],
>   [5, 5, 5],
>   [6, 6, 6],
>   [7, 7, 7]
> ]
> >>> a[3:7].filter(pa.array(4 * [True]))  # output is filtered elements of a[0:4] instead of a[3:7]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3]
> ]
> >>> a[3:7].filter(pa.array([True, False, True, False]))  # output is filtered elements of a[0:4] instead of a[3:7]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
> [
>   [0, 0, 0],
>   [2, 2, 2]
> ]
> >>> a[4:].filter(pa.array([True, True, True, True]))  # output is filtered elements of a[0:4] instead of a[4:]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
> [
>   [0, 0, 0],
>   [1, 1, 1],
>   [2, 2, 2],
>   [3, 3, 3]
> ]
> >>> a[4:6].filter(pa.array([True, True]))  # output is filtered elements of a[0:2] instead of a[4:6]
> <pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
> [
>   [0, 0, 0],
>   [1, 1, 1]
> ]
> >>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works ok
> <pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
> [
>   [3, 3, 3],
>   [4, 4, 4],
>   [5, 5, 5],
>   [6, 6, 6]
> ]
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
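
The reported behavior (results always come from the first len(slice) elements of the parent array) is consistent with the slice offset being dropped somewhere during filtering. The report does not state the root cause, so the following is only a pure-Python sketch of that assumption, not Arrow's implementation. The names FixedSizeListSlice, filter_ignoring_offset and filter_respecting_offset are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FixedSizeListSlice:
    """Toy model of a sliced fixed-size list array (hypothetical, not pyarrow code)."""
    values: List[int]  # flat child values, shared with the parent array
    list_size: int     # number of child values per list element
    offset: int        # index of the slice's first list element in the parent
    length: int        # number of list elements in the slice

    def element(self, i: int) -> List[int]:
        # Correct lookup: shift by the slice offset before indexing child values.
        start = (self.offset + i) * self.list_size
        return self.values[start:start + self.list_size]


def filter_ignoring_offset(arr: FixedSizeListSlice, mask: Sequence[bool]) -> List[List[int]]:
    # Assumed faulty behavior: indexes child values from the parent's start,
    # which reproduces the symptom described in the report.
    return [arr.values[i * arr.list_size:(i + 1) * arr.list_size]
            for i, keep in enumerate(mask) if keep]


def filter_respecting_offset(arr: FixedSizeListSlice, mask: Sequence[bool]) -> List[List[int]]:
    # Expected behavior: the mask selects elements of the slice itself.
    return [arr.element(i) for i, keep in enumerate(mask) if keep]


# Same data as in the report: eight lists of three equal values, 0 through 7.
flat = [v for v in range(8) for _ in range(3)]
sliced = FixedSizeListSlice(flat, list_size=3, offset=3, length=4)  # models a[3:7]
mask = [True, False, True, False]

buggy = filter_ignoring_offset(sliced, mask)
expected = filter_respecting_offset(sliced, mask)
print(buggy)
print(expected)
```

In this model the offset-ignoring variant yields the same lists that the report shows for a[3:7] with the mask [True, False, True, False], while the offset-aware variant yields the elements a user would expect from the slice.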