Ma77Ball commented on code in PR #5667: URL: https://github.com/apache/texera/pull/5667#discussion_r3424323473
########## amber/src/main/python/pytexera/storage/dataset_file_document.py: ########## @@ -19,9 +19,37 @@ import os import requests import urllib.parse +from requests.adapters import HTTPAdapter +from urllib3.util.retry import Retry Review Comment: Done. Added `urllib3==2.7.0` and `requests==2.34.0` to `requirements.txt`, pinned to the lock versions. `requests` was also imported directly but only transitive, so made it explicit too. ########## amber/src/main/python/pytexera/storage/dataset_file_document.py: ########## @@ -19,9 +19,37 @@ import os import requests import urllib.parse +from requests.adapters import HTTPAdapter +from urllib3.util.retry import Retry class DatasetFileDocument: + # (connect, read) timeout and retry settings for the file-service GETs below. + _CONNECT_TIMEOUT_SECONDS = 10 + _READ_TIMEOUT_SECONDS = 60 + _REQUEST_TIMEOUT = (_CONNECT_TIMEOUT_SECONDS, _READ_TIMEOUT_SECONDS) + _MAX_RETRIES = 3 + _RETRY_BACKOFF_FACTOR = 0.5 + _RETRY_STATUS_FORCELIST = (500, 502, 503, 504) + + @classmethod + def _build_session(cls) -> requests.Session: Review Comment: Done, renamed to `_retry_session()`. Both call sites already use `with self._retry_session() as session:`. ########## amber/src/main/python/pytexera/storage/dataset_file_document.py: ########## @@ -19,9 +19,37 @@ import os import requests import urllib.parse +from requests.adapters import HTTPAdapter +from urllib3.util.retry import Retry class DatasetFileDocument: + # (connect, read) timeout and retry settings for the file-service GETs below. + _CONNECT_TIMEOUT_SECONDS = 10 + _READ_TIMEOUT_SECONDS = 60 Review Comment: Done. Now connect 5s / read 10s (was 10s/60s); worst case ~48s vs ~3min. Read timeout bounds inactivity between bytes, not total download, so still safe for large files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
