[ https://issues.apache.org/jira/browse/ARROW-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vadym Zhernovyi updated ARROW-13632:
------------------------------------
Description:
When calling FixedSizeListArray.filter on a slice, the mask is always applied to the first len(slice) elements at the beginning of the array the slice was created from, rather than to the elements the slice actually covers.
* The issue does not reproduce with [ListArray|https://arrow.apache.org/docs/python/generated/pyarrow.ListArray.html].
* The particular mask doesn't matter.
* The slice length and position don't matter.
* The number of elements filtered at the wrong position is always equal to the length of the slice.
* The data type (int32, float, ...) doesn't matter.
{code:python}
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:37:25) [MSC v.1916 64 bit (AMD64)] on win32
>>> import numpy as np
>>> import pyarrow as pa
>>> np.__version__
'1.21.1'
>>> pa.__version__
'5.0.0'
>>> data = [
...     np.zeros(3, dtype='int32'),
...     np.ones(3, dtype='int32'),
...     np.ones(3, dtype='int32') + 1,
...     np.ones(3, dtype='int32') + 2,
...     np.ones(3, dtype='int32') + 3,
...     np.ones(3, dtype='int32') + 4,
...     np.ones(3, dtype='int32') + 5,
...     np.ones(3, dtype='int32') + 6
... ]
>>> a = pa.array(data, type=pa.list_(pa.int32(), list_size=3))  # FixedSizeListArray
>>> a.filter(pa.array(len(a) * [True]))  # everything is ok
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA7C0>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6],
  [7, 7, 7]
]
>>> a[3:7].filter(pa.array(4 * [True]))  # outputs filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DAD60>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[3:7].filter(pa.array([True, False, True, False]))  # outputs filtered elements of a[0:4] instead of a[3:7]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5DA460>
[
  [0, 0, 0],
  [2, 2, 2]
]
>>> a[4:].filter(pa.array([True, True, True, True]))  # outputs filtered elements of a[0:4] instead of a[4:]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5EED00>
[
  [0, 0, 0],
  [1, 1, 1],
  [2, 2, 2],
  [3, 3, 3]
]
>>> a[4:6].filter(pa.array([True, True]))  # outputs filtered elements of a[0:2] instead of a[4:6]
<pyarrow.lib.FixedSizeListArray object at 0x000001E25E5F5040>
[
  [0, 0, 0],
  [1, 1, 1]
]
>>> pa.array(data, type=pa.list_(pa.int32()))[3:7].filter(pa.array(4 * [True]))  # ListArray slice filtering works ok
<pyarrow.lib.ListArray object at 0x000001E25E5F50A0>
[
  [3, 3, 3],
  [4, 4, 4],
  [5, 5, 5],
  [6, 6, 6]
]
{code}

> [Python] Filter mask is always applied to elements at the beginning of FixedSizeListArray when filtering a slice
> --------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-13632
> URL: https://issues.apache.org/jira/browse/ARROW-13632
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 5.0.0
> Environment: Windows 10, Python 3.9
> Reporter: Vadym Zhernovyi
> Priority: Major
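Every wrong result in the transcript equals the first len(slice) elements of the parent array with the mask applied, which is what you would get if the filter read child values as though the slice offset were zero. The plain-Python sketch below models that hypothesis only; it is not Arrow's kernel code, and `FixedSizeListModel`, `sliced` and `use_offset` are hypothetical names chosen for illustration, not pyarrow API. It keeps one flat child buffer shared by the parent and its slices, and shows how dropping the offset yields exactly the reported outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixedSizeListModel:
    """Toy model of a fixed-size list array (hypothetical; not the pyarrow API).

    values:    flat child buffer shared by the parent array and all of its slices
    list_size: number of child values per list element
    offset:    index (in list elements) where this view starts in `values`
    length:    number of list elements visible through this view
    """

    values: List[int]
    list_size: int
    offset: int
    length: int

    def sliced(self, start: int, stop: int) -> "FixedSizeListModel":
        # Zero-copy slice, like a[start:stop]: only offset and length change,
        # the child buffer is shared with the parent.
        return FixedSizeListModel(self.values, self.list_size,
                                  self.offset + start, stop - start)

    def element(self, i: int, use_offset: bool = True) -> List[int]:
        # Correct behaviour maps view index i to parent index offset + i.
        # use_offset=False mimics the suspected bug: the offset is ignored.
        parent_index = self.offset + i if use_offset else i
        begin = parent_index * self.list_size
        return self.values[begin:begin + self.list_size]

    def filter(self, mask: List[bool], use_offset: bool = True) -> List[List[int]]:
        if len(mask) != self.length:
            raise ValueError("mask length must equal the number of elements")
        return [self.element(i, use_offset)
                for i, keep in enumerate(mask) if keep]


# Same data as in the report: element k is [k, k, k] for k = 0..7.
a = FixedSizeListModel([k for k in range(8) for _ in range(3)],
                       list_size=3, offset=0, length=8)
s = a.sliced(3, 7)

correct = s.filter([True, False, True, False])
buggy = s.filter([True, False, True, False], use_offset=False)
print(correct)  # [[3, 3, 3], [5, 5, 5]]
print(buggy)    # [[0, 0, 0], [2, 2, 2]]  (matches the FixedSizeListArray output in the report)
```

With `use_offset=False` the model also reproduces the `a[4:6]` result, [[0, 0, 0], [1, 1, 1]]. Against a real pyarrow build, a simple check is to compare `sliced.filter(mask).to_pylist()` with `[v for v, keep in zip(sliced.to_pylist(), mask) if keep]`; per the transcript, the two differ on 5.0.0.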