amoeba opened a new issue, #33904:
URL: https://github.com/apache/arrow/issues/33904

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   If I want to use arrow with non-AWS, S3-compatible storage, I might find the 
`s3_bucket` convenience function from our documentation 
[[1](https://arrow.apache.org/docs/dev/r/articles/arrow.html#connecting-to-cloud-storage),
 [2](https://arrow.apache.org/docs/dev/r/articles/fs.html)].  The documentation 
doesn't always mention non-AWS storage but I might see in the help page for 
`s3_bucket` that I can pass options to `S3FileSystem$create` through `...` and 
those options include `endpoint_override`. I might then try the following:
   
   ```r
   s3_bucket(
       "my-bucket",
       endpoint_override = "example.org", 
       anonymous = TRUE
   )
   ```
   
   However, this results in the following error:
   
   ```
   # Error: IOError: Bucket 'my-bucket' not found
   # /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/s3fs.cc:829  ResolveRegionUncached(bucket)
   # /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/s3fs.cc:357  ResolveS3BucketRegion(bucket)
   # /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/filesystem.cc:719  S3Options::FromUri(uri, out_path)
   ```
   
   You can see that we're trying to resolve the bucket's AWS region, which 
fails as expected because the bucket doesn't exist on AWS. Interestingly, the 
user who reported this issue to me found they could avoid the error by 
creating a bucket with the same name on AWS. Region resolution would then 
succeed (pointlessly), the `S3FileSystem` that was created would respect their 
`endpoint_override`, and later calls like `$ls()` would target the correct 
endpoint. This could cause significant frustration for non-AWS users.
   
   If I go back to the docs and read further down on 
https://arrow.apache.org/docs/dev/r/articles/fs.html#file-systems-that-emulate-s3,
 I find that the recommended way to do what I want is directly through 
`S3FileSystem$create`:
   
   ```r
   S3FileSystem$create(
       endpoint_override = "example.org",
       anonymous = TRUE
   )
   ```
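
   To get something closer to what `s3_bucket` returns, the file system 
created this way can be rooted at a single bucket. The following is a minimal 
sketch, not a documented recipe: `open_s3_compatible_bucket` is a hypothetical 
helper name, and it assumes `$cd()` returns a file system rooted at the given 
path, as described in the `FileSystem` help page.

   ```r
   # Hypothetical helper (not part of arrow): create a file system for an
   # S3-compatible endpoint and root it at one bucket.
   open_s3_compatible_bucket <- function(bucket, endpoint, ...) {
       # No AWS region lookup happens here; the endpoint is used as given.
       fs <- arrow::S3FileSystem$create(endpoint_override = endpoint, ...)
       # $cd() gives a SubTreeFileSystem rooted at the bucket.
       fs$cd(bucket)
   }

   # Usage (needs arrow built with S3 support and a reachable endpoint):
   # b <- open_s3_compatible_bucket("my-bucket", "example.org", anonymous = TRUE)
   # b$ls()
   ```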
   
   I was made aware through other channels while researching this issue that 
this syntax would also have worked (though it's not documented sufficiently):
   
   ```r
   s3_bucket("s3://anonymous@my-bucket?endpoint_override=example.org")
   ```
   
   With the increasing number of non-AWS providers of S3-compatible storage 
(e.g., [Ceph](https://docs.ceph.com/en/latest/radosgw/s3/), 
[DigitalOcean](https://www.digitalocean.com/products/spaces)), I think 
we'll only see more users of non-AWS, S3-compatible storage, and those users 
will run into confusion over (1) how things work and (2) how things are 
documented.
   
   I'd like to:
   
   1. Do a pass over the documentation (online and R help pages) to:
     a. Make it clear that we support AWS, GCS, and 
S3-compatible storage providers
     b. Add more examples to `s3_bucket` to make it clearer how a user can 
connect to each of them
   2. Change the behavior of `s3_bucket` to special-case 
`endpoint_override`: when it is passed, skip region resolution and jump 
directly to creating an `S3FileSystem`.
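
   The behavior change proposed in the last item could look roughly like the 
following. This is only a sketch under stated assumptions: 
`s3_bucket_sketch` and `skips_region_lookup` are hypothetical names, not the 
actual arrow implementation, and it assumes `FileSystem$from_uri`, 
`S3FileSystem$create`, `SubTreeFileSystem$create`, and the `$region` field 
behave as described in the R package's help pages.

   ```r
   # Hypothetical helper: TRUE when the caller supplied endpoint_override,
   # in which case resolving an AWS region for the bucket is pointless.
   skips_region_lookup <- function(args) {
       !is.null(args[["endpoint_override"]])
   }

   # Hypothetical variant of s3_bucket(); a sketch, not arrow's code.
   s3_bucket_sketch <- function(bucket, ...) {
       args <- list(...)
       if (skips_region_lookup(args)) {
           # Non-AWS endpoint: create the file system directly.
           fs <- do.call(arrow::S3FileSystem$create, args)
       } else {
           # AWS: let the URI machinery resolve the bucket's region.
           fs <- arrow::FileSystem$from_uri(paste0("s3://", bucket))$fs
           if (length(args) > 0) {
               # Re-create the file system with the caller's options
               # plus the region that was just resolved.
               args$region <- fs$region
               fs <- do.call(arrow::S3FileSystem$create, args)
           }
       }
       # Return a file system rooted at the bucket.
       arrow::SubTreeFileSystem$create(bucket, fs)
   }
   ```

   With this shape, `s3_bucket_sketch("my-bucket", endpoint_override = 
"example.org", anonymous = TRUE)` would never attempt to contact AWS, while 
calls without `endpoint_override` would keep resolving the region first.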
   
   ### Component(s)
   
   R


