Ma77Ball commented on code in PR #5667:
URL: https://github.com/apache/texera/pull/5667#discussion_r3424323473


##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
 import os
 import requests
 import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry

Review Comment:
   Done. Added `urllib3==2.7.0` and `requests==2.34.0` to `requirements.txt`, 
pinned to the lock versions. `requests` was also imported directly but only 
transitive, so made it explicit too.



##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
 import os
 import requests
 import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
 
 
 class DatasetFileDocument:
+    # (connect, read) timeout and retry settings for the file-service GETs 
below.
+    _CONNECT_TIMEOUT_SECONDS = 10
+    _READ_TIMEOUT_SECONDS = 60
+    _REQUEST_TIMEOUT = (_CONNECT_TIMEOUT_SECONDS, _READ_TIMEOUT_SECONDS)
+    _MAX_RETRIES = 3
+    _RETRY_BACKOFF_FACTOR = 0.5
+    _RETRY_STATUS_FORCELIST = (500, 502, 503, 504)
+
+    @classmethod
+    def _build_session(cls) -> requests.Session:

Review Comment:
   Done, renamed to `_retry_session()`. Both call sites already use `with 
self._retry_session() as session:`.



##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
 import os
 import requests
 import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
 
 
 class DatasetFileDocument:
+    # (connect, read) timeout and retry settings for the file-service GETs 
below.
+    _CONNECT_TIMEOUT_SECONDS = 10
+    _READ_TIMEOUT_SECONDS = 60

Review Comment:
   Done. Now connect 5s / read 10s (was 10s/60s); worst case ~48s vs ~3min. 
Read timeout bounds inactivity between bytes, not total download, so still safe 
for large files.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to