3dbrows commented on PR #13633:
URL: https://github.com/apache/arrow/pull/13633#issuecomment-1205350024
@pitrou @lidavidm
Thank you for your patience and meticulous review; I very much appreciate
the friendliness and constructiveness shown here, especially towards a
first-time contributor, and a C++/Cython noob at that.
Progress since last comments:
* Introduced `GetAwsDefaultRetryStrategy()` and
`GetAwsStandardRetryStrategy()` (a simplified sketch follows this list).
  * These still contain `static_cast<long>(max_attempts)` because the AWS SDK
constructors take `long` - avoidable? (Note that `int64_t` and `long` only
have the same width on LP64 platforms; on Windows `long` is 32 bits, so the
cast can narrow.)
  * These live on `S3RetryStrategy` as static methods.
* Possibly controversial: changed the default behaviour of `s3fs` to use the
AWS Standard retry strategy.
  * This seems like a more sensible default and should not degrade the
current user experience.
  * What prompted this PR in the first place was the current default of
retrying 3x in quick succession, which mitigates `503 SlowDown` poorly; I
think defaulting to exponential backoff will improve reliability.
* Still to do:
  * Testing.
  * The JIRA ticket ARROW-17057 should be set to in-progress, but I do not
seem to have the privileges to do that.
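
For concreteness, here is a simplified sketch of the two new factories (the
real code is in the diff; error handling is elided, and `AwsRetryStrategy` is
the thin adapter in this PR that wraps an `Aws::Client::RetryStrategy` behind
the `S3RetryStrategy` interface):

```cpp
// Simplified sketch of the new factories in s3fs.cc.
// AwsRetryStrategy (defined elsewhere in s3fs.cc) adapts a wrapped
// Aws::Client::RetryStrategy to the S3RetryStrategy interface.
#include <memory>

#include <aws/core/client/DefaultRetryStrategy.h>
#include <aws/core/client/RetryStrategy.h>  // StandardRetryStrategy (SDK >= 1.8)

std::shared_ptr<S3RetryStrategy> S3RetryStrategy::GetAwsDefaultRetryStrategy(
    int64_t max_attempts) {
  // The AWS SDK constructor takes `long`, hence the cast discussed above.
  return std::make_shared<AwsRetryStrategy>(
      std::make_shared<Aws::Client::DefaultRetryStrategy>(
          static_cast<long>(max_attempts)));
}

std::shared_ptr<S3RetryStrategy> S3RetryStrategy::GetAwsStandardRetryStrategy(
    int64_t max_attempts) {
  // Standard retry mode backs off exponentially with jitter, which should
  // mitigate 503 SlowDown much better than retrying 3x in quick succession.
  return std::make_shared<AwsRetryStrategy>(
      std::make_shared<Aws::Client::StandardRetryStrategy>(
          static_cast<long>(max_attempts)));
}
```

Assuming `S3Options::retry_strategy` remains the override point, users who
prefer the old behaviour could opt back in explicitly with
`options.retry_strategy = S3RetryStrategy::GetAwsDefaultRetryStrategy(3);`.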
Questions:
* Should I be defining `S3RetryStrategy::GetAwsDefaultRetryStrategy` or just
`GetAwsDefaultRetryStrategy` in `s3fs.cc`? (Both alternatives are sketched
below.)
* I need to figure out how to encapsulate `retry_strategy` and
`retry_max_attempts` nicely, i.e. avoid passing two separate parameters (see
the hypothetical sketch below).
* How should `S3FileSystem._reconstruct()` be handled - does it need the
retry parameters passed in?
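
To make the first question concrete, the two alternatives in `s3fs.cc` would
be (bodies elided):

```cpp
// Alternative A: qualified definition of the static member declared on
// S3RetryStrategy in filesystem/s3fs.h.
std::shared_ptr<S3RetryStrategy> S3RetryStrategy::GetAwsDefaultRetryStrategy(
    int64_t max_attempts) { /* ... */ }

// Alternative B: an unqualified name at namespace scope. Note this defines a
// *free function* in arrow::fs, not the static member from the header.
std::shared_ptr<S3RetryStrategy> GetAwsDefaultRetryStrategy(int64_t max_attempts) {
  /* ... */
}
```

For the encapsulation question, one hypothetical direction (not in the diff
yet; shown purely for illustration) is to bake the attempt count into the
strategy at construction time, so only a single object travels through
`S3Options`:

```cpp
// Hypothetical usage: max_attempts is captured by the strategy itself, so
// S3Options would only need its retry_strategy member and no separate
// retry_max_attempts parameter.
S3Options options = S3Options::Defaults();
options.retry_strategy =
    S3RetryStrategy::GetAwsStandardRetryStrategy(/*max_attempts=*/5);
```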