pitrou commented on issue #43497:
URL: https://github.com/apache/arrow/issues/43497#issuecomment-2358649730
Ok, it seems the request simply fails to authenticate and then retries a
number of times. You can see this by putting a limit on the retry duration (5
seconds in the example below):
```python
>>> import pyarrow.dataset as ds
>>> uri = "gs://datachain-demo/laion-aesthetics-csv/laion_aesthetics_1024_33M_1.csv?retry_limit_seconds=5"
>>> dataset = ds.dataset(uri, format="csv")
Traceback (most recent call last):
...
OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted
GetObjectMetadata: Could not create a OAuth2 access token to authenticate the
request. The request was not sent, as such an access token is required to
complete the request successfully. Learn more about Google Cloud authentication
at https://cloud.google.com/docs/authentication. The underlying error message
was: PerformWork() - CURL error [6]=Couldn't resolve host name)
```
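If you want to apply the same retry cap to other `gs://` URIs, one option is to add the `retry_limit_seconds` query option programmatically rather than editing the string by hand. The following is a minimal sketch that uses only the standard library. The helper name `with_retry_limit` is hypothetical and is not part of pyarrow. The sketch also assumes the limit is given as a whole number of seconds, as in the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_retry_limit(uri: str, seconds: int) -> str:
    """Return `uri` with its `retry_limit_seconds` query option set.

    Hypothetical helper for illustration only; it is not a pyarrow API.
    """
    parts = urlsplit(uri)
    # Keep any query options already on the URI, but drop an existing
    # retry_limit_seconds so that the new value replaces it.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "retry_limit_seconds"]
    query.append(("retry_limit_seconds", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))


uri = with_retry_limit(
    "gs://datachain-demo/laion-aesthetics-csv/laion_aesthetics_1024_33M_1.csv", 5
)
print(uri)
```

You would then pass the resulting `uri` to `ds.dataset(uri, format="csv")` exactly as in the session above. The helper only rewrites the URI string. It does not address the authentication failure itself.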
It's a pity that by default this retries for so long on an
authentication failure, though. Perhaps there's a way to avoid that?
cc @coryan