Paulo Roberto Cerioni created ARROW-6469:
--------------------------------------------

             Summary: PyArrow HDFS documentation does not mention HDFS short 
circuit readings
                 Key: ARROW-6469
                 URL: https://issues.apache.org/jira/browse/ARROW-6469
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Paulo Roberto Cerioni


Because PyArrow uses libhdfs underneath, reads of files from HDFS are expected 
to make use of short-circuit reads.

However, the PyArrow documentation does not explain whether this feature is 
supported (and in which situations) or whether it works without any 
configuration.

For instance, I'm interested in the use case in which we use the short-circuit 
feature to read only some of the columns of a Parquet file located in HDFS 
into a dataframe.
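
A minimal sketch of that use case, not a confirmed recipe. It assumes the 
pyarrow.fs.HadoopFileSystem API found in more recent PyArrow releases, and it 
assumes that the standard Hadoop client properties dfs.client.read.shortcircuit 
and dfs.domain.socket.path, passed through extra_conf, are honoured by libhdfs. 
Whether that is sufficient is exactly what the documentation does not say. The 
helper names, the socket path, and the defaults are hypothetical, and the 
DataNodes would also need to be configured for short-circuit reads.

```python
def short_circuit_conf(socket_path):
    """Build client-side Hadoop settings that request short-circuit reads.

    Both keys are standard HDFS properties; that libhdfs applies them when
    they arrive via PyArrow's extra_conf is an assumption of this sketch.
    """
    return {
        "dfs.client.read.shortcircuit": "true",
        "dfs.domain.socket.path": socket_path,
    }


def read_columns(path, columns, host="default", port=0,
                 socket_path="/var/lib/hadoop-hdfs/dn_socket"):
    """Read only the selected columns of a Parquet file in HDFS into pandas.

    host="default" with port=0 asks libhdfs to use fs.defaultFS from
    core-site.xml. The socket_path default is a hypothetical example and
    must match the value configured on the DataNodes.
    """
    # Imported lazily so short_circuit_conf() is usable without pyarrow.
    import pyarrow.parquet as pq
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(
        host, port, extra_conf=short_circuit_conf(socket_path)
    )
    # Column projection: only the requested columns are read from the file.
    table = pq.read_table(path, columns=columns, filesystem=hdfs)
    return table.to_pandas()
```

With a reachable cluster, a call such as 
read_columns("/data/events.parquet", ["user_id", "ts"]) (hypothetical path and 
column names) would return a dataframe holding just those two columns. The 
sketch cannot show whether the read actually took the short-circuit path.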

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
