Matthew Roeschke created ARROW-17360:
----------------------------------------
Summary: [Python] pyarrow.orc.ORCFile.read does not preserve ordering of columns
Key: ARROW-17360
URL: https://issues.apache.org/jira/browse/ARROW-17360
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 8.0.1
Reporter: Matthew Roeschke

xref [https://github.com/pandas-dev/pandas/issues/47944]

{code:java}
In [1]: df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

# pandas main branch / 1.5
In [2]: df.to_orc("abc")

In [3]: pd.read_orc("abc", columns=['b', 'a'])
Out[3]:
   a  b
0  1  a
1  2  b
2  3  c

In [4]: import pyarrow.orc as orc

In [5]: orc_file = orc.ORCFile("abc")

# reordered to a, b
In [6]: orc_file.read(columns=['b', 'a']).to_pandas()
Out[6]:
   a  b
0  1  a
1  2  b
2  3  c

# reordered to a, b
In [7]: orc_file.read(columns=['b', 'a'])
Out[7]:
pyarrow.Table
a: int64
b: string
----
a: [[1,2,3]]
b: [["a","b","c"]]
{code}
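
A possible caller-side workaround, until the ordering is preserved by {{ORCFile.read}} itself, is to reorder the returned table explicitly. The sketch below is only an illustration: {{read_orc_columns_in_order}} is a hypothetical helper name, not a pyarrow API, and it assumes the object passed in behaves like {{pyarrow.orc.ORCFile}} and that the result supports {{Table.select}}, which returns columns in the order given.

```python
def read_orc_columns_in_order(orc_file, columns=None):
    """Read `columns` and return them in the order the caller requested.

    Hypothetical helper (not part of pyarrow). `orc_file` is assumed to be a
    pyarrow.orc.ORCFile, whose read() may return columns in file order.
    """
    table = orc_file.read(columns=columns)
    if columns is None:
        # Nothing was requested explicitly, so keep the file's own order.
        return table
    # Table.select returns a new table with columns in the given order.
    return table.select(list(columns))
```

With the file from the report, this would be called as {{read_orc_columns_in_order(orc.ORCFile("abc"), ['b', 'a'])}}. It adds a cheap column reselection on top of the read and does not change what is read from disk.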