Matthew Roeschke created ARROW-17360:
----------------------------------------

             Summary: [Python] pyarrow.orc.ORCFile.read does not preserve ordering of columns
                 Key: ARROW-17360
                 URL: https://issues.apache.org/jira/browse/ARROW-17360
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 8.0.1
            Reporter: Matthew Roeschke


xref [https://github.com/pandas-dev/pandas/issues/47944]

 
{code:python}
# assumes pandas has already been imported (import pandas as pd)
In [1]: df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

# pandas main branch / 1.5
In [2]: df.to_orc("abc")

In [3]: pd.read_orc("abc", columns=['b', 'a'])
Out[3]:
   a  b
0  1  a
1  2  b
2  3  c

In [4]: import pyarrow.orc as orc

In [5]: orc_file = orc.ORCFile("abc")

# requested ['b', 'a'], but the result is reordered to a, b
In [6]: orc_file.read(columns=['b', 'a']).to_pandas()
Out[6]:
   a  b
0  1  a
1  2  b
2  3  c

# same reordering to a, b in the pyarrow.Table itself, before any pandas conversion
In [7]: orc_file.read(columns=['b', 'a'])
Out[7]:
pyarrow.Table
a: int64
b: string
----
a: [[1,2,3]]
b: [["a","b","c"]]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
