[ https://issues.apache.org/jira/browse/ARROW-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331665#comment-17331665 ]
Ying Zhou commented on ARROW-9299:
----------------------------------

See https://issues.apache.org/jira/browse/ARROW-12535

> [Python] Expose ORC metadata() in Python ORCFile
> ------------------------------------------------
>
>                 Key: ARROW-9299
>                 URL: https://issues.apache.org/jira/browse/ARROW-9299
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.17.1
>            Reporter: Jeremy Dyer
>            Assignee: Ying Zhou
>            Priority: Major
>              Labels: orc
>
> There is currently no way for a user to directly access the underlying ORC
> metadata of a given file. The C++ functions and objects appear to exist
> already; only the Cython/Python plumbing, and potentially a few C++ shims,
> is missing. Giving users the ability to retrieve the metadata without first
> reading the entire file could help numerous applications increase their
> query performance by allowing them to intelligently determine which ORC
> stripes should be read.
> This would allow for something like
> {code:python}
> # pyarrow.orc is a submodule and has to be imported explicitly
> from pyarrow import orc
> orc_metadata = orc.ORCFile(filename).metadata()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)