Carl Boettiger created ARROW-14998:
--------------------------------------

             Summary: Support for HTTPS Filesystem access (from R client)
                 Key: ARROW-14998
                 URL: https://issues.apache.org/jira/browse/ARROW-14998
             Project: Apache Arrow
          Issue Type: Wish
          Components: R
            Reporter: Carl Boettiger


Thanks for such an amazing project. I've been entirely blown away by the S3 
Filesystem access in the latest release, and am excited to see other backends 
like Azure being discussed in the issues.  As you know, many HTTPS servers also 
support range requests, meaning (I think) that it should be possible to access 
public data (Parquet, CSV files) over generic HTTPS connections too.
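
To make the range-request idea concrete, here is a minimal Python sketch of how 
a client could read just the tail of a remote Parquet file. It is not arrow or 
duckdb code. The function names are hypothetical, and it assumes the server 
answers Range requests with status 206. The only format fact relied on is that 
a Parquet file ends with a 4-byte little-endian footer length followed by the 
magic bytes "PAR1".

```python
import urllib.request


def byte_range_header(start, length):
    """Build a Range header for `length` bytes starting at offset `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def suffix_range_header(n_bytes):
    """Build a Range header asking for the last `n_bytes` of a resource."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be > 0")
    return {"Range": f"bytes=-{n_bytes}"}


def fetch_range(url, headers):
    """Fetch part of a resource (hypothetical helper; needs network access)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the Range header.
        if response.status != 206:
            raise RuntimeError("server did not honour the Range request")
        return response.read()


def parquet_footer_length(tail):
    """Decode the footer length from the last 8 bytes of a Parquet file."""
    if len(tail) != 8 or tail[4:] != b"PAR1":
        raise ValueError("not the tail of a Parquet file")
    return int.from_bytes(tail[:4], "little")
```

A reader would call fetch_range(url, suffix_range_header(8)) first, decode the 
footer length, and then request only the footer and the column chunks it needs, 
rather than downloading the whole file.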

As you probably know, DuckDB already supports HTTPS-based remote file access, 
e.g. 
https://github.com/duckdb/duckdb/blob/master/test/sql/copy/parquet/test_parquet_remote.test
(though it is not available out-of-the-box in the R client there either).

It would be wonderful to have similar remote filesystem access over HTTPS in 
arrow.  (I gather that on the Python side, fsspec already gives access to a 
wide range of such abstractions, but we're more limited in R so far, except 
for geospatial data, where bindings to GDAL mean we can access GDAL's rather 
amazing virtual file systems over HTTPS, S3, FTP, etc.; see 
https://gdal.org/user/virtual_file_systems.html.  These are a nice array-data 
complement to the more database-oriented workflow of arrow.)

Thanks for considering!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
