[ 
https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566710#comment-17566710
 ] 

Micah Kornfield commented on ARROW-17069:
-----------------------------------------

Also does it help if you increase retry_limit_seconds to something higher?

> [Python][R] GCSFIleSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0
>
>
> GCSFileSystem will returns {{Couldn't resolve host name}} if you don't supply 
> {{anonymous}} as the user:
> {code:python}
> import pyarrow.dataset as ds
> # Fails:
> dataset = 
> ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", 
> line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in 
> pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in 
> GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> # This works fine:
> >>> dataset = 
> >>> ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
> I would expect that we could connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to