[ 
https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566997#comment-17566997
 ] 

Carlos O'Ryan commented on ARROW-17069:
---------------------------------------

{quote}Well, "access denied" is a more helpful hint at the potential resolution 
than "Couldn't resolve host name"...
{quote}
I agree with you that the error message is terrible, and I will fix it as soon 
as I have a minute. But that is not the right fix. No access has been denied, 
not yet. Authorization has not taken place, i.e., nothing has verified whether 
the principal (in this case {{w...@xxx.com}}) has permissions to access 
some GCS resource. In fact, not even authentication has taken place, i.e., the 
client library was trying to create the tokens that prove the access is coming 
from {{w...@xxx.com}} and *that* failed. I think a better message could be:
{quote}Could not create an OAuth2 access token to authenticate your request with 
Google Cloud. Your request was not sent. Such an access token is required to 
complete the request successfully (unless you meant to use anonymous access, 
but the library is not configured to do so). The underlying error was: 
EasyPerform() - CURL error [6]=Couldn't resolve host name.
{quote}
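
To make the distinction concrete, here is a minimal sketch of how a client could report a token-creation failure separately from an access-denied response. It is only an illustration under stated assumptions: {{TokenCreationError}}, {{describe_token_failure}}, {{send_request}}, and the two callables are hypothetical names, not part of google-cloud-cpp or Arrow.

```python
# Hypothetical sketch: these names are illustrative and are NOT taken from
# google-cloud-cpp, Arrow, or pyarrow.


class TokenCreationError(Exception):
    """No OAuth2 access token could be created, so the request was never sent."""


def describe_token_failure(underlying: str) -> str:
    """Build the proposed message around the underlying transport error."""
    return (
        "Could not create an OAuth2 access token to authenticate your request "
        "with Google Cloud. Your request was not sent. Such an access token is "
        "required to complete the request successfully (unless you meant to use "
        "anonymous access, but the library is not configured to do so). "
        f"The underlying error was: {underlying}."
    )


def send_request(fetch_token, do_request):
    """Create the token first; only then send the request.

    fetch_token: callable returning a token string (assumed to raise OSError
                 on transport failures such as DNS resolution errors).
    do_request:  callable taking the token and performing the request.
    """
    try:
        token = fetch_token()
    except OSError as exc:
        # Authentication never completed, so authorization was never attempted:
        # this is not "access denied".
        raise TokenCreationError(describe_token_failure(str(exc))) from exc
    return do_request(token)
```

With this shape, a failure inside {{fetch_token}} surfaces as a token-creation error that carries the original CURL message, and {{do_request}} is never called.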

> [Python][R] GCSFileSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0
>
>
> GCSFileSystem returns {{Couldn't resolve host name}} if you don't supply 
> {{anonymous}} as the user:
> {code:python}
> >>> import pyarrow.dataset as ds
> # Fails:
> >>> dataset = ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> # This works fine:
> >>> dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
> I would expect that we could connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
