[ https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566710#comment-17566710 ]
Micah Kornfield commented on ARROW-17069:
-----------------------------------------

Also, does it help if you increase retry_limit_seconds to something higher?

> [Python][R] GCSFileSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0
>
>
> GCSFileSystem returns {{Couldn't resolve host name}} if you don't supply
> {{anonymous}} as the user:
> {code:python}
> import pyarrow.dataset as ds
> # Fails:
> >>> dataset = ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> # This works fine:
> >>> dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
> I would expect that we could connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
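
For anyone trying the suggestion to raise retry_limit_seconds, here is a minimal sketch that builds both URI variants from the report with a larger limit. The helper name `gcs_uri` and the value 30 are assumptions made for illustration; they are not part of pyarrow, and the sketch only constructs the strings, so it says nothing about whether a higher limit changes the outcome.

```python
from typing import Optional
from urllib.parse import urlencode


def gcs_uri(bucket: str, path: str = "", *, anonymous: bool = False,
            retry_limit_seconds: Optional[int] = None) -> str:
    """Build a gs:// URI string (hypothetical helper, not a pyarrow API)."""
    # The report shows that "anonymous@" in the user position avoids the error.
    user = "anonymous@" if anonymous else ""
    uri = f"gs://{user}{bucket}/{path.lstrip('/')}"
    if retry_limit_seconds is not None:
        # Same query parameter as in the report, with a caller-chosen value.
        uri += "?" + urlencode({"retry_limit_seconds": retry_limit_seconds})
    return uri


# 30 is an arbitrary example value, chosen only to be larger than the 3 in the report.
default_user = gcs_uri("voltrondata-labs-datasets", "nyc-taxi/",
                       retry_limit_seconds=30)
anonymous_user = gcs_uri("voltrondata-labs-datasets", "nyc-taxi/",
                         anonymous=True, retry_limit_seconds=30)
print(default_user)
print(anonymous_user)

# With pyarrow installed, either string can be passed to the call from the report:
#   import pyarrow.dataset as ds
#   dataset = ds.dataset(anonymous_user)
```

Comparing the two URIs under the same, larger limit would help separate a genuine timeout from the missing {{anonymous}} user.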