gaborgsomogyi opened a new pull request, #28236:
URL: https://github.com/apache/flink/pull/28236
## What is the purpose of the change
The `flink-s3-fs-native` connector used the deprecated `RetryPolicy`
API from AWS SDK v2 in legacy retry mode. The legacy mode does not distinguish
throttling errors (HTTP 429) from transient server errors, applies no
token-bucket circuit breaking, and has been superseded by the `RetryStrategy`
API since SDK v2.25. S3-compatible storage systems commonly signal throttling
via HTTP 429, and in legacy mode those responses were not retried.
This pull request replaces the deprecated `RetryPolicy` with
`StandardRetryStrategy` from the non-deprecated AWS SDK v2 retries API. The
standard strategy natively handles both HTTP 429 and 503 as throttling events
with a dedicated backoff path, applies full-jitter exponential backoff
separately for throttle and non-throttle errors, and uses a token-bucket
circuit breaker to prevent retry storms under sustained load. The key backoff
parameters are exposed as Flink configuration options so operators can tune
retry behavior for their storage backend.
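
To illustrate how the base delay and the backoff cap interact, here is a minimal sketch of the general full-jitter scheme. It is not the SDK's code: the class and method names are hypothetical, and the formula (a uniformly random delay between zero and `min(cap, base * 2^(retry-1))`) is the textbook definition of full jitter, assumed here for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: hypothetical names, not the AWS SDK implementation. */
final class FullJitterBackoffSketch {

    /** Upper bound of the delay before the given retry (1 = first retry). */
    static Duration ceiling(Duration baseDelay, Duration maxBackoff, int retry) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        // Clamp the exponent so the multiplication below stays within a long
        // for any realistic base delay.
        int shift = Math.min(retry - 1, 30);
        long uncappedMillis = baseDelay.toMillis() * (1L << shift);
        return Duration.ofMillis(Math.min(uncappedMillis, maxBackoff.toMillis()));
    }

    /** Full jitter: a uniformly random delay in [0, ceiling]. */
    static Duration nextDelay(Duration baseDelay, Duration maxBackoff, int retry) {
        long ceilingMillis = ceiling(baseDelay, maxBackoff, retry).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceilingMillis + 1));
    }
}
```

Under this assumed formula, the non-throttle defaults (100ms base, 20s cap) give ceilings of 100ms, 200ms, and 400ms for the first three retries, while the throttle path (1s base) reaches the 20s cap at the sixth retry.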
## Brief change log
- Replace deprecated `RetryPolicy.builder().numRetries(n)` with
`StandardRetryStrategy` via `ClientOverrideConfiguration.retryStrategy()` in
`S3ClientProvider`
- Add three new Flink config options (defaults match the SDK standard
  strategy defaults):
  - `s3.retry.base-delay` (default: 100ms): backoff base delay for
    non-throttle retries
  - `s3.retry.throttle.base-delay` (default: 1s): backoff base delay for
    throttle retries (HTTP 429/503)
  - `s3.retry.max-backoff` (default: 20s): shared exponential backoff cap
    for both retry paths
- Builder defaults for the new options reference the `ConfigOption` default
values directly, keeping a single source of truth
- Existing `s3.retry.max-num-retries` (default: 3) is preserved and maps to
`maxAttempts = maxRetries + 1`
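
The `+ 1` in the mapping above accounts for the initial request, which counts as an attempt but not as a retry. A minimal sketch of that relationship follows; the class and method names are hypothetical and the loop omits backoff entirely, so it only demonstrates the counting, not the connector's actual retry handling.

```java
import java.util.concurrent.Callable;

/** Illustrative only: hypothetical names, not code from S3ClientProvider. */
final class MaxAttemptsSketch {

    /** Converts a retry count into a total attempt count. */
    static int toMaxAttempts(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        // The first call is an attempt but not a retry.
        return maxRetries + 1;
    }

    /** Invokes the call up to maxAttempts times and rethrows the last failure. */
    static <T> T callWithAttempts(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With the default `s3.retry.max-num-retries` of 3, a request that always fails would be sent four times in total under this counting.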
## Verifying this change
This change added tests and can be verified as follows:
- Added 6 tests to `NativeS3FileSystemFactoryTest` covering default values
and explicit configuration for each of the three new `Duration` config options,
following the same pattern as the existing `testMaxRetriesDefault` /
`testMaxRetriesExplicitlyConfigured` tests
- Added 1 test to `S3ClientProviderTest`
(`testRetryBuilderDefaultsMatchConfigOptions`) verifying that `Builder` field
defaults and `ConfigOption` defaults stay in sync
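
The single-source-of-truth idea behind that sync test can be sketched roughly as follows. All names here are hypothetical stand-ins: plain constants take the place of the Flink `ConfigOption` defaults, and the builder is a simplified placeholder rather than the real `S3ClientProvider` builder.

```java
import java.time.Duration;

/** Illustrative only: stand-ins for the ConfigOption defaults and the builder. */
final class RetryDefaultsSketch {

    // Stand-ins for the defaults declared on the config options.
    static final Duration BASE_DELAY_DEFAULT = Duration.ofMillis(100);
    static final Duration THROTTLE_BASE_DELAY_DEFAULT = Duration.ofSeconds(1);
    static final Duration MAX_BACKOFF_DEFAULT = Duration.ofSeconds(20);

    static final class Builder {
        // Field defaults reference the constants instead of repeating literals,
        // so changing a default in one place changes it everywhere.
        Duration baseDelay = BASE_DELAY_DEFAULT;
        Duration throttleBaseDelay = THROTTLE_BASE_DELAY_DEFAULT;
        Duration maxBackoff = MAX_BACKOFF_DEFAULT;

        Builder baseDelay(Duration value) {
            this.baseDelay = value;
            return this;
        }
    }
}
```

A sync test in this style would construct a fresh builder and compare each field with the corresponding default, which fails as soon as someone hard-codes a diverging literal.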
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
(`software.amazon.awssdk:retries` was already a transitive dependency via
`sdk-core`)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: yes
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? JavaDocs on the new
`ConfigOption` fields in `NativeS3FileSystemFactory`
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes (Claude Sonnet 4.6)
<!--
Generated-by: Claude Sonnet 4.6
-->