[ https://issues.apache.org/jira/browse/ARROW-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Molina updated ARROW-16518:
--------------------------------------
    Parent:     (was: ARROW-17212)
    Issue Type: Bug  (was: Sub-task)

> [Python] Ensure _exec_plan.execplan preserves order of inputs
> -------------------------------------------------------------
>
>                 Key: ARROW-16518
>                 URL: https://issues.apache.org/jira/browse/ARROW-16518
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Alessandro Molina
>            Assignee: Alessandro Molina
>            Priority: Major
>             Fix For: 11.0.0
>
>
> At the moment execplan doesn't guarantee ordered output: the batches are
> consumed in an arbitrary order. This can lead to unordered rows in the output
> when {{use_threads=True}}.
> For example, providing a column with {{b=[a, a, a, a, b, b, b, b]}} and
> filtering it will sometimes give back {{b=[a, b]}} and sometimes {{b=[b, a]}}.
> See:
> {code:java}
> In [18]: table1 = pa.table({'a': [1, 2, 3, 4], 'b': ['a'] * 4})
> In [19]: table2 = pa.table({'a': [1, 2, 3, 4], 'b': ['b'] * 4})
> In [20]: table = pa.concat_tables([table1, table2])
> In [21]: ep._filter_table(table, pc.field('a') == 1)
> Out[21]:
> pyarrow.Table
> a: int64
> b: string
> ----
> a: [[1],[1]]
> b: [["b"],["a"]]
> In [22]: ep._filter_table(table, pc.field('a') == 1)
> Out[22]:
> pyarrow.Table
> a: int64
> b: string
> ----
> a: [[1],[1]]
> b: [["a"],["b"]]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
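
For illustration only, the ordering problem described in the issue can be sketched with the Python standard library. This is not how Arrow's exec plan is implemented and it does not use pyarrow. It is a minimal sketch that assumes batches are plain lists of row dicts and that each batch is filtered on a worker thread. The names {{filter_batch}} and {{run_ordered}} are hypothetical. The idea shown is to record each batch's input position before handing it to a thread, then sort the finished results by that position, so the output no longer depends on which thread finishes first.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def filter_batch(batch, wanted):
    """Keep only the rows of one batch whose 'a' value equals `wanted`."""
    return [row for row in batch if row["a"] == wanted]


def run_ordered(batches, wanted, max_workers=4):
    """Filter batches on worker threads, then emit rows in input order.

    Hypothetical helper, not part of pyarrow.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember the input position of every batch alongside its future.
        futures = {
            pool.submit(filter_batch, batch, wanted): index
            for index, batch in enumerate(batches)
        }
        # as_completed yields futures in completion order, which can vary
        # from run to run; this mirrors the symptom reported in the issue.
        finished = [(futures[f], f.result()) for f in as_completed(futures)]

    # Sorting by the recorded index restores the order of the inputs.
    finished.sort(key=lambda pair: pair[0])
    return [row for _, rows in finished for row in rows]


# Same shape as the tables in the report: two batches, 'b' is "a" then "b".
batches = [
    [{"a": a, "b": "a"} for a in (1, 2, 3, 4)],
    [{"a": a, "b": "b"} for a in (1, 2, 3, 4)],
]
result = run_ordered(batches, wanted=1)
print([row["b"] for row in result])
```

Without the final sort, the rows would come back in completion order, so the {{b}} column could be either {{[a, b]}} or {{[b, a]}}. With the sort, the row from the first batch always precedes the row from the second.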