[jira] [Commented] (ARROW-1319) [Python] Add additional HDFS filesystem methods

Martin Durant (Jira) Wed, 04 Aug 2021 08:31:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393269#comment-17393269
 ]


Martin Durant commented on ARROW-1319:
--------------------------------------

> how it can fit into the current API

I mean, it's not really up to me. It's something that HDFS allows you to do that 
some people may find useful for file-system operations on HDFS. Yes, hdfs3 had 
this functionality. 

> the filesystem API is mostly meant for Arrow purposes of reading and writing 
> datasets

!! I thought this was meant to be a general-purpose, cross-platform file-system 
interface for the supported backends. pyarrow is the *only* way for Python 
users to interact with HDFS. If they can't create delegation tokens with this 
interface, they won't be able to do so anywhere else. Other functionality falls 
into this bucket too, such as setting the replication factor of a file.

> [Python] Add additional HDFS filesystem methods
> -----------------------------------------------
>
>                 Key: ARROW-1319
>                 URL: https://issues.apache.org/jira/browse/ARROW-1319
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Martin Durant
>            Priority: Major
>              Labels: HDFS, filesystem
>
> The python library hdfs3 http://hdfs3.readthedocs.io/en/latest/api.html 
> contains a wider set of file-system methods than arrow's python bindings. 
> These are probably simple to implement for arrow-hdfs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
