[ 
https://issues.apache.org/jira/browse/ARROW-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039278#comment-17039278
 ] 

Wes McKinney commented on ARROW-6469:
-------------------------------------

It seems that short-circuit reads are a configuration matter for the Java 
client library. Presumably, if they are configured properly, libhdfs will 
perform them under the hood, since it uses JNI to make Java client API calls.
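
As a rough illustration of what "configured properly" might mean in practice, 
the sketch below checks an hdfs-site.xml for the two Hadoop client properties 
usually involved in short-circuit reads (dfs.client.read.shortcircuit and 
dfs.domain.socket.path), then reads selected Parquet columns through PyArrow. 
The helper names (short_circuit_settings, short_circuit_enabled, read_columns) 
are hypothetical, and it is an assumption here that the file being checked is 
the one the Java client actually picks up from its classpath. Nothing in this 
sketch confirms that a given read was short-circuited; it only inspects the 
configuration.

```python
import xml.etree.ElementTree as ET

# Hadoop client property names; their values are site-specific.
SHORT_CIRCUIT_KEYS = ("dfs.client.read.shortcircuit", "dfs.domain.socket.path")


def short_circuit_settings(hdfs_site_xml):
    """Return the short-circuit-related properties found in hdfs-site.xml text."""
    root = ET.fromstring(hdfs_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in SHORT_CIRCUIT_KEYS:
            found[name] = (prop.findtext("value") or "").strip()
    return found


def short_circuit_enabled(hdfs_site_xml):
    """True only if the flag is 'true' and a domain socket path is set."""
    settings = short_circuit_settings(hdfs_site_xml)
    flag = settings.get("dfs.client.read.shortcircuit", "false").lower() == "true"
    return flag and bool(settings.get("dfs.domain.socket.path"))


def read_columns(path, columns, host="default", port=0):
    """Read selected Parquet columns from HDFS into a pandas DataFrame.

    Hypothetical helper: whether the read is short-circuited depends entirely
    on the Hadoop client configuration, not on anything passed here.
    """
    # Imported lazily so the config check above works without pyarrow installed.
    from pyarrow import fs
    import pyarrow.parquet as pq

    hdfs = fs.HadoopFileSystem(host, port)
    return pq.read_table(path, columns=columns, filesystem=hdfs).to_pandas()
```

If short_circuit_enabled returns True for the client's hdfs-site.xml, a call 
such as read_columns("/data/example.parquet", ["a", "b"]) (path and column 
names are placeholders) would go through libhdfs and, presumably, benefit from 
short-circuit reads when the client runs on the same host as the DataNode.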

> [Python] HDFS documentation does not mention HDFS short circuit readings
> ------------------------------------------------------------------------
>
>                 Key: ARROW-6469
>                 URL: https://issues.apache.org/jira/browse/ARROW-6469
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Paulo Roberto Cerioni
>            Priority: Minor
>              Labels: documentation
>
> Because PyArrow uses libhdfs underneath, files read from HDFS are expected 
> to make use of short-circuit reads.
> However, the PyArrow documentation does not explain whether this feature is 
> supported (and in what situations) or whether it works without any 
> configuration.
> For instance, I'm interested in the use case where short-circuit reads are 
> used to load some of the columns of a Parquet file located in HDFS into a 
> dataframe.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
