[ 
https://issues.apache.org/jira/browse/ARROW-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331665#comment-17331665
 ] 

Ying Zhou commented on ARROW-9299:
----------------------------------

See https://issues.apache.org/jira/browse/ARROW-12535

> [Python] Expose ORC metadata() in Python ORCFile
> ------------------------------------------------
>
>                 Key: ARROW-9299
>                 URL: https://issues.apache.org/jira/browse/ARROW-9299
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.17.1
>            Reporter: Jeremy Dyer
>            Assignee: Ying Zhou
>            Priority: Major
>              Labels: orc
>
> There is currently no way for a user to directly access the underlying ORC
> metadata of a given file. It seems the C++ functions and objects already
> exist, and only the plumbing is missing: the Cython/Python bindings and
> potentially a few C++ shims. Giving users the ability to retrieve the
> metadata without first reading the entire file could help numerous
> applications improve their query performance by letting them determine
> which ORC stripes need to be read.
> This would allow for something like:
> {code:python}
> from pyarrow import orc  # pyarrow.orc must be imported explicitly
> # Proposed API; filename is the path to an ORC file
> orc_metadata = orc.ORCFile(filename).metadata()
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
