Hello,

I created this ticket to discuss possible improvements to the new PyArrow FileSystem API: https://issues.apache.org/jira/browse/ARROW-7584

As of today there seem to be only two popular projects that provide an agnostic FileSystem API able to handle S3 & HDFS from Python:
- PyArrow via https://arrow.apache.org/docs/python/filesystems.html
- TensorFlow via https://www.tensorflow.org/api_docs/python/tf/io/gfile/GFile

On my side I would like to reuse a clean FileSystem API in my project and turned to Arrow for this purpose (I think TensorFlow already handles too many use cases and should not provide yet another feature). A "clean FileSystem API" for me also means covering the interactive use case, where one uses the API like file system shell commands. We actually used https://github.com/dask/hdfs3 before and it worked really well.

Currently the FileSystem API is a work in progress (see https://github.com/apache/arrow/blob/master/python/pyarrow/_fs.pyx#L185), and I would like to take the occasion to improve it and fix some issues with the existing API. Can you have a look at the comments on https://issues.apache.org/jira/browse/ARROW-7584 and give feedback? I can do the implementations I suggest on my side, but would like to make sure they will be accepted.
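To make the "interactive, shell-like" use case concrete, here is a minimal sketch of the kind of interface I have in mind. This is purely illustrative and uses only the standard library; the class and method names (FileSystem, ls, cat) are my own shorthand, not the actual PyArrow API:

```python
import abc
import os


class FileSystem(abc.ABC):
    """Illustrative agnostic filesystem interface (hypothetical, not PyArrow's)."""

    @abc.abstractmethod
    def ls(self, path):
        """List entry names under a directory, like the `ls` shell command."""

    @abc.abstractmethod
    def cat(self, path):
        """Return a file's contents as bytes, like the `cat` shell command."""


class LocalFileSystem(FileSystem):
    """Local-disk backend; an S3 or HDFS backend would implement the same methods."""

    def ls(self, path):
        return sorted(os.listdir(path))

    def cat(self, path):
        with open(path, "rb") as f:
            return f.read()
```

The point is that interactive users should be able to switch between local disk, HDFS, and S3 while keeping the same one-liner calls.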
Best regards, Fabian Höring