[ 
https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566822#comment-17566822
 ] 

Carlos O'Ryan commented on ARROW-17069:
---------------------------------------

At a guess {{gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3}} 
is using the default credentials.  After trying a number of things the client 
library will try to GCE credentials, which try to resolve 
{{metadata.google.internal}}.  So, not the most illuminating error message, but 
expected if (a) you are using default credentials, and (b) you have not 
configured your default credentials, and (c) you are not running on GCE where 
the fallback would work.


> [Python][R] GCSFIleSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0
>
>
> GCSFileSystem will returns {{Couldn't resolve host name}} if you don't supply 
> {{anonymous}} as the user:
> {code:python}
> import pyarrow.dataset as ds
> # Fails:
> dataset = 
> ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in 
> GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> # This works fine:
> >>> dataset = 
> >>> ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
> I would expect that we could connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to