[ https://issues.apache.org/jira/browse/ARROW-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375719#comment-17375719 ]
Wes McKinney commented on ARROW-12970:
--------------------------------------

It would probably be worth the effort to implement the "tuplization" of RecordBatch in the libarrow_python C++ library so that it is reasonably efficient. This would also be a good opportunity to move the implementation of the {{*Scalar.as_py}} methods into libarrow_python, since you would only want one canonical implementation of boxing Arrow array values as Python objects. This also relates to ARROW-12976. I can't find the Jira issue about moving the as_py implementations into C++, but I recall there was one in the past that [~kszucs] may have been working on at some point.

> [Python] Efficient "row accessor" for a pyarrow RecordBatch / Table
> -------------------------------------------------------------------
>
>                 Key: ARROW-12970
>                 URL: https://issues.apache.org/jira/browse/ARROW-12970
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Luke Higgins
>            Priority: Minor
>             Fix For: 6.0.0
>
>
> It would be nice to have a row accessor for a Table akin to
> pandas.DataFrame.itertuples.
> I have a lot of code where I convert a parquet file to pandas just to
> iterate over the rows with itertuples. Having this ability natively in
> pyarrow would be a nice feature and would avoid the memory copy incurred
> in the pandas conversion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
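For context, the itertuples-style access being requested can be approximated today in pure Python by transposing columns of already-unboxed values. The helper name `iter_rows` below is hypothetical, not an existing pyarrow API; in real usage the per-column Python lists would come from calling `to_pylist()` on each column, which performs exactly the per-value boxing this issue hopes to centralize in C++:

```python
from collections import namedtuple
from typing import Any, Dict, Iterator, List


def iter_rows(columns: Dict[str, List[Any]]) -> Iterator[tuple]:
    """Yield one namedtuple per row from column-oriented data.

    ``columns`` maps column names to equal-length lists of Python
    values (e.g. the output of ``to_pylist()`` for each column).
    This is a pure-Python sketch of the requested row accessor; a
    native implementation would do the value boxing in C++ instead
    of going through intermediate Python lists.
    """
    Row = namedtuple("Row", columns.keys())  # one class per schema
    # zip(*lists) transposes the columnar layout into rows
    for values in zip(*columns.values()):
        yield Row(*values)


# Example: a tiny two-column "table" in columnar form
table = {"name": ["a", "b"], "value": [1, 2]}
rows = list(iter_rows(table))
```

A C++ implementation in libarrow_python could avoid both the intermediate lists and the per-call attribute lookups, boxing each value exactly once as the row tuples are materialized.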