Yicong-Huang commented on code in PR #5667:
URL: https://github.com/apache/texera/pull/5667#discussion_r3418901133
##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
import os
import requests
import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
Review Comment:
I checked we did not require `urllib3` as a dependency, it was available due
to a transitive dependency from `requests`.
we should not import a transitive dependency directly: if we import
something then it is our dependency.
I think there are two ways to go:
1. we can add `urllib3` to dependency, officially. This actually might be a
good idea for the long run.
2. the retry logic can be quite simple to implement with vanilla python try
catch, we could do it without `urllib3`.
I would recommend 1 in this case.
##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
import os
import requests
import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
class DatasetFileDocument:
+ # (connect, read) timeout and retry settings for the file-service GETs
below.
+ _CONNECT_TIMEOUT_SECONDS = 10
+ _READ_TIMEOUT_SECONDS = 60
+ _REQUEST_TIMEOUT = (_CONNECT_TIMEOUT_SECONDS, _READ_TIMEOUT_SECONDS)
+ _MAX_RETRIES = 3
+ _RETRY_BACKOFF_FACTOR = 0.5
+ _RETRY_STATUS_FORCELIST = (500, 502, 503, 504)
+
+ @classmethod
+ def _build_session(cls) -> requests.Session:
Review Comment:
if you are using this session only for retry, as you mentioned in the doc
string, it might be better to rename it `_retry_session()`
and use with
```
with _retry_session() as session:
// your logic
```
##########
amber/src/main/python/pytexera/storage/dataset_file_document.py:
##########
@@ -19,9 +19,37 @@
import os
import requests
import urllib.parse
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
class DatasetFileDocument:
+ # (connect, read) timeout and retry settings for the file-service GETs
below.
+ _CONNECT_TIMEOUT_SECONDS = 10
+ _READ_TIMEOUT_SECONDS = 60
Review Comment:
I feel generally we can use a shorter timeout. if retry 3 times with 60s
each that's already 3 minutes. let's make it a few seconds or shorter?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]