amoeba opened a new issue, #33904: URL: https://github.com/apache/arrow/issues/33904
### Describe the bug, including details regarding any error messages, version, and platform.

If I want to use arrow with non-AWS, S3-compatible storage, I might find the `s3_bucket` convenience function from our documentation [[1](https://arrow.apache.org/docs/dev/r/articles/arrow.html#connecting-to-cloud-storage), [2](https://arrow.apache.org/docs/dev/r/articles/fs.html)]. The documentation doesn't always mention non-AWS storage, but I might see in the help page for `s3_bucket` that I can pass options to `S3FileSystem$create` using dots, and that those options include `endpoint_override`. I might then try the following:

```r
s3_bucket(
  "my-bucket",
  endpoint_override = "example.org",
  anonymous = TRUE
)
```

However, this results in the following error:

```
# Error: IOError: Bucket 'my-bucket' not found
# /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/s3fs.cc:829  ResolveRegionUncached(bucket)
# /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/s3fs.cc:357  ResolveS3BucketRegion(bucket)
# /Users/bryce/src/apache/arrow/cpp/src/arrow/filesystem/filesystem.cc:719  S3Options::FromUri(uri, out_path)
```

You can see that we're trying to resolve the bucket's AWS region, which fails as expected because the bucket doesn't exist on AWS. Interestingly, the user who reported this issue to me found they could prevent this error by creating a bucket on AWS with the same name. At that point the region resolution would succeed (pointlessly), the `S3FileSystem` that was created would respect their `endpoint_override`, and future calls like `$ls()` would target the correct endpoint. This could cause significant frustration for non-AWS users.
If I go back to the docs and read further down on https://arrow.apache.org/docs/dev/r/articles/fs.html#file-systems-that-emulate-s3, I find that the recommended way to do what I want is directly through `S3FileSystem$create`:

```r
S3FileSystem$create(
  endpoint_override = "example.org",
  anonymous = TRUE
)
```

While researching this issue, I was made aware through other channels that this syntax would also have worked (though it's not documented sufficiently):

```r
s3_bucket("s3://anonymous@my-bucket?endpoint_override=example.org")
```

With the increasing number of non-AWS providers of S3-compatible storage (e.g., [Ceph](https://docs.ceph.com/en/latest/radosgw/s3/), [DigitalOcean](https://www.digitalocean.com/products/spaces)), I think we'll only see more users of non-AWS, S3-compatible storage, and those users will run into confusion over (1) how things work and (2) how things are documented.

I'd like to:

1. Do a pass over the documentation (online, and R help pages)
    a. Make the messaging clear that we support AWS, GCS, and S3-compatible storage providers
    b. Add more examples to `s3_bucket` to make it clearer how a user can go about doing that
2. Change the behavior of `s3_bucket` to special-case passing `endpoint_override` so that it skips region resolution and jumps directly to creating an `S3FileSystem`.

### Component(s)

R
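
As a rough illustration of the last proposal above (special-casing `endpoint_override` in `s3_bucket`), the decision could look something like the sketch below. It is plain R and does not touch arrow; `wants_region_resolution` is a hypothetical helper name, not an existing arrow function, and the real change would live inside `s3_bucket` itself.

```r
# Hypothetical helper (not part of arrow): decide whether s3_bucket() should
# try to resolve the bucket's AWS region before building the filesystem.
wants_region_resolution <- function(...) {
  args <- list(...)
  # An explicit endpoint_override signals non-AWS, S3-compatible storage,
  # so an AWS region lookup cannot succeed and should be skipped.
  override <- args[["endpoint_override"]]
  is.null(override) || !nzchar(override)
}

# No override: resolve the region as s3_bucket() does today.
wants_region_resolution(anonymous = TRUE)

# Override present: skip region resolution entirely.
wants_region_resolution(endpoint_override = "example.org", anonymous = TRUE)
```

With a check like this, the failing call from the bug description would go straight to creating the `S3FileSystem` with the supplied options instead of erroring on the region lookup.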
