Ma77Ball opened a new pull request, #5667:
URL: https://github.com/apache/texera/pull/5667
### What changes were proposed in this PR?
- Route `DatasetFileDocument`'s presigned-URL fetch and file download
through a `requests.Session` with a `(10s connect, 60s read)` timeout, so a
hung or unreachable file service fails in bounded time instead of blocking the
worker thread indefinitely.
- Mount a `urllib3` `Retry` policy on the session (3 retries, exponential
backoff, retrying on connection errors and on 500/502/503/504 responses); both
calls are idempotent GETs, so retrying is safe.
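
For reviewers, the two changes above can be sketched roughly as follows. This is a minimal illustration and not the code in the PR: `build_session`, `fetch_bytes`, and `DEFAULT_TIMEOUT` are hypothetical names, and only the numeric values (10s connect, 60s read, 3 retries, backoff 0.5, the four status codes, GET-only) come from this description. It assumes urllib3 1.26 or later, where the `Retry` parameter is named `allowed_methods`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# (connect timeout, read timeout) in seconds, as stated in the PR description.
DEFAULT_TIMEOUT = (10, 60)  # hypothetical constant name


def build_session() -> requests.Session:
    """Return a Session whose HTTP(S) adapters retry idempotent GETs."""
    retry = Retry(
        total=3,                                # at most 3 retries per request
        backoff_factor=0.5,                     # exponential backoff between attempts
        status_forcelist=(500, 502, 503, 504),  # retry on these responses
        allowed_methods=frozenset({"GET"}),     # urllib3 >= 1.26 parameter name
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def fetch_bytes(session: requests.Session, url: str, params=None) -> bytes:
    """GET url with the bounded timeout and return the response body."""
    # requests applies no timeout unless one is passed, so pass it on every call.
    response = session.get(url, params=params, timeout=DEFAULT_TIMEOUT)
    response.raise_for_status()
    return response.content
```

The timeout is passed per call because a `requests.Session` has no session-wide timeout setting, while the retry policy lives on the mounted adapter and so applies to every request the session sends.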
### Any related issues, documentation, discussions?
Closes: #5666
### How was this PR tested?
- Hardening change with no existing spec for this path; verified manually by
loading the module and asserting the retry policy is wired (`total=3`,
`backoff=0.5`, `status_forcelist={500,502,503,504}`, GET-only).
- Confirmed a request to a non-routable host now fails in bounded time
(`ConnectTimeout`) rather than hanging, where the old no-timeout call would
block indefinitely.
- `ruff check` and `ruff format --check` pass on the modified file.
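
The manual checks above could be reproduced along these lines. This is a sketch under assumptions, not the script that was actually run: `describe_retry` and `seconds_until_failure` are hypothetical helpers, and the `session` built here is a stand-in for the one the module creates, configured with the values listed above.

```python
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def describe_retry(session: requests.Session, url: str) -> dict:
    """Return the Retry settings of the adapter that would serve url."""
    retry = session.get_adapter(url).max_retries
    return {
        "total": retry.total,
        "backoff_factor": retry.backoff_factor,
        "status_forcelist": set(retry.status_forcelist),
        "allowed_methods": set(retry.allowed_methods),
    }


def seconds_until_failure(session: requests.Session, url: str, timeout):
    """Time a GET and report the exception type raised; needs network access."""
    start = time.monotonic()
    try:
        session.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        return time.monotonic() - start, type(exc).__name__
    return time.monotonic() - start, None


# Stand-in session using the values from the PR description.
session = requests.Session()
session.mount(
    "https://",
    HTTPAdapter(
        max_retries=Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=(500, 502, 503, 504),
            allowed_methods=frozenset({"GET"}),
        )
    ),
)

# Inspecting the adapter does not send any request.
print(describe_retry(session, "https://example.org/"))
```

Calling `seconds_until_failure` against a non-routable address with the `(10, 60)` timeout should return after a bounded delay. Because connect failures are themselves retried, that bound is on the order of the connect timeout multiplied by the number of attempts, plus backoff sleeps, rather than a single 10 seconds.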
### Was this PR authored or co-authored using generative AI tooling?
Co-authored with Claude Opus 4.8 in compliance with ASF