Neal Richardson created ARROW-18089:
Summary: [R] Cannot read_parquet on http URL
Key: ARROW-18089
URL: https://issues.apache.org/jira/browse/ARROW-18089
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Neal Richardson
Fix For: 11.0.0

{code}
u <- "https://raw.githubusercontent.com/apache/arrow/master/r/inst/v0.7.1.parquet"
read_parquet(u)
# Error: file must be a "RandomAccessFile"
read_parquet(url(u))
# Error: file must be a "RandomAccessFile"
{code}

The issue is that URLs get turned into an InputStream by {{make_readable_file}}, while Parquet requires a RandomAccessFile.

{code}
arrow:::make_readable_file(u)
# InputStream
{code}

There are two relevant codepaths in make_readable_file:
* If given a string URL, it tries {{FileSystem$from_uri()}} and falls back to {{MakeRConnectionInputStream}}, which returns an InputStream, not a RandomAccessFile.
* If given a connection object (e.g. {{url(u)}}), it tries {{MakeRConnectionRandomAccessFile}} first and falls back to {{MakeRConnectionInputStream}}.

With a {{url()}} connection it does end up falling back to InputStream, because constructing the RandomAccessFile fails:

{code}
arrow:::MakeRConnectionRandomAccessFile(url(u))
# Error: Tell() returned an error
{code}

If we truly can't work with an HTTP URL in read_parquet, we should at least document that. Alternatively, we could work around it by downloading the file to a tempfile first.
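A minimal sketch of the tempfile workaround, for illustration only. The helper names {{download_to_tempfile()}} and {{read_parquet_url()}} are hypothetical and not part of arrow; the sketch assumes only base R's {{utils::download.file()}} and that {{arrow::read_parquet()}} on a local path opens a seekable (RandomAccessFile) source.

{code}
# Hypothetical helper (not an arrow API): copy a remote file to a local temp path.
download_to_tempfile <- function(url, fileext = ".parquet") {
  tf <- tempfile(fileext = fileext)
  # mode = "wb" prevents newline translation of binary data on Windows
  utils::download.file(url, tf, mode = "wb", quiet = TRUE)
  tf
}

# Hypothetical wrapper (not an arrow API): read Parquet from an HTTP(S) URL via
# a local copy, since a local file can be opened as a RandomAccessFile whereas
# the HTTP connection only yields an InputStream.
read_parquet_url <- function(url, ...) {
  tf <- download_to_tempfile(url)
  # Remove the temporary copy once reading has finished
  on.exit(unlink(tf), add = TRUE)
  arrow::read_parquet(tf, ...)
}
{code}

With the URL from the reproducer, usage would be {{read_parquet_url(u)}}. The trade-off is that the whole file is downloaded up front, so there is no way to fetch only the column chunks that are actually needed.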