Hello,

I created this ticket to discuss possible improvements to the new PyArrow
FileSystem API:
https://issues.apache.org/jira/browse/ARROW-7584
 
As of today, only two popular projects seem to offer a storage-agnostic
FileSystem API that can handle S3 and HDFS from Python:
- PyArrow via https://arrow.apache.org/docs/python/filesystems.html
- TensorFlow via https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile
 
On my side, I would like to reuse a clean FileSystem API in my project and
turned to Arrow for this purpose (I think TensorFlow already handles too
many use cases and should not provide yet another feature).
 
For me, a "clean FileSystem API" also has to cover the interactive use case,
where one uses the API like file system shell commands. We actually used
https://github.com/dask/hdfs3 before and it worked really well.
 
The new FileSystem API is currently a work in progress (see
https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185), and I
would like to take this opportunity to improve it and fix some issues with the
existing API.
 
Could you have a look at the comments on
https://issues.apache.org/jira/browse/ARROW-7584 and give feedback?
 
I can implement the changes I suggest myself, but would like to make sure
they will be accepted.

Best regards,
Fabian Höring
