[ https://issues.apache.org/jira/browse/ARROW-18089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dewey Dunnington updated ARROW-18089:
-------------------------------------
    Fix Version/s: 12.0.0
                       (was: 11.0.0)

> [R] Cannot read_parquet on http URL
> -----------------------------------
>
>                 Key: ARROW-18089
>                 URL: https://issues.apache.org/jira/browse/ARROW-18089
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Critical
>              Labels: triaged
>             Fix For: 12.0.0
>
>
> {code}
> u <- "https://raw.githubusercontent.com/apache/arrow/master/r/inst/v0.7.1.parquet"
> read_parquet(u)
> # Error: file must be a "RandomAccessFile"
> read_parquet(url(u))
> # Error: file must be a "RandomAccessFile"
> {code}
> The issue is that URLs get turned into InputStream by {{make_readable_file}},
> and parquet requires RandomAccessFile.
> {code}
> arrow:::make_readable_file(u)
> # InputStream
> {code}
> There are two relevant codepaths in {{make_readable_file}}: if given a string
> URL, it tries {{FileSystem$from_uri()}} and falls back to
> {{MakeRConnectionInputStream}}, which returns InputStream, not
> RandomAccessFile. If provided a connection object (i.e. {{url(u)}}), it tries
> {{MakeRConnectionRandomAccessFile}} first and falls back to
> {{MakeRConnectionInputStream}}. If you provide a {{url()}}, it does fall back to
> InputStream:
> {code}
> arrow:::MakeRConnectionRandomAccessFile(url(u))
> # Error: Tell() returned an error
> {code}
> If we truly can't work with an HTTP URL in read_parquet, we should at least
> document that. We could also do the workaround of downloading to a tempfile
> first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
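
The tempfile workaround mentioned at the end of the issue description could look roughly like the sketch below. It is only an illustration, not the fix adopted for ARROW-18089: {{read_via_tempfile}} is a hypothetical helper name that does not exist in the arrow package, and the sketch assumes that arrow is installed and the URL is reachable. The idea is to download the file to a local path first, so that the reader gets a regular file it can seek in rather than an InputStream.

```r
# Hypothetical helper (not part of the arrow package): download the remote
# file to a temporary local path, then hand that path to a reader function.
read_via_tempfile <- function(u, reader = arrow::read_parquet, ...) {
  tf <- tempfile(fileext = ".parquet")
  # Remove the temporary copy once the reader has returned.
  on.exit(unlink(tf), add = TRUE)
  # mode = "wb" keeps binary content intact on Windows.
  status <- utils::download.file(u, tf, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed for ", u)
  reader(tf, ...)
}

# Example call (requires network access and the arrow package):
# u <- "https://raw.githubusercontent.com/apache/arrow/master/r/inst/v0.7.1.parquet"
# df <- read_via_tempfile(u)
```

The whole file is downloaded before anything is read, so this gives up partial reads over the network. The temporary copy is deleted when the helper returns, which is fine here because the reader is expected to return data already loaded into memory.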