gaborgsomogyi commented on code in PR #28136:
URL: https://github.com/apache/flink/pull/28136#discussion_r3272532603


##########
flink-filesystems/flink-s3-fs-native/README.md:
##########
@@ -327,6 +327,61 @@ When enabled, file uploads automatically use 
TransferManager for:
 - Better utilization of available bandwidth
 - Lower heap requirements for write operations
 
+## AWS Common Runtime (CRT) Support
+
+The filesystem optionally supports the [AWS Common Runtime 
(CRT)](https://github.com/awslabs/aws-crt-java) HTTP transport
+for higher throughput on large S3 workloads.
+
+When enabled, the CRT transport replaces:
+- **Sync client**: Apache HTTP Client → `AwsCrtHttpClient`
+- **Async client**: Netty NIO → `S3AsyncClient.crtBuilder()` (with built-in 
multipart acceleration)
+
+### Prerequisites
+
+The `aws-crt` artifact contains JNI-linked native libraries whose C-side 
`FindClass` paths are
+hardcoded, making Maven shade relocation incompatible. Therefore **CRT JARs 
are not bundled** in
+the fat JAR and must be placed manually.
+
+### Setup (step-by-step)
+
+1. Download `aws-crt-client-<version>.jar` (`groupId: software.amazon.awssdk`) 
from Maven Central,
+   matching the AWS SDK version used by this module (check 
`fs.s3.aws.sdk.version` in `pom.xml` —
+   e.g. `2.44.4`).
+
+2. Resolve the matching `aws-crt-<version>.jar` (`groupId: 
software.amazon.awssdk.crt`). **Note:**
+   `aws-crt` uses an independent versioning scheme (e.g. `0.33.x`) that does 
**not** track the AWS
+   SDK version. The compatible `aws-crt` version is declared as a dependency in
+   `aws-crt-client-<version>.pom` — open that POM on Maven Central and read the
+   `<dependency>` entry for `software.amazon.awssdk.crt:aws-crt` to find the 
exact version to
+   download.
+
+3. Place both JARs in the Flink plugin directory alongside 
`flink-s3-fs-native.jar`:
+
+   ```bash
+   cp aws-crt-client-<version>.jar $FLINK_HOME/plugins/s3-fs-native/
+   cp aws-crt-<version>.jar        $FLINK_HOME/plugins/s3-fs-native/

Review Comment:
   The version guidance is now correct, but the manual two-step download is 
still error-prone. Consider adding a helper script at 
`flink-filesystems/flink-s3-fs-native/tools/download-crt-jars.sh` that resolves 
and downloads the exact compatible JARs automatically:
   
   ```bash
   #!/usr/bin/env bash
   # Downloads aws-crt-client and aws-crt JARs (excluded from the fat JAR due 
to JNI shading)
   # to ./crt-jars/. Copy the output to $FLINK_HOME/plugins/s3-fs-native/ 
alongside
   # flink-s3-fs-native.jar to enable CRT transport (s3.crt.enabled=true).
   #
   # Usage: download-crt-jars.sh [AWS_CRT_VERSION]
   #   AWS_CRT_VERSION  aws-crt version to download (default: pinned value 
below).
   #                    Update the default when fs.s3.aws.sdk.version is bumped;
   #                    the compatible version is in aws-sdk-java-pom-<sdk>.pom 
-> awscrt.version.
   set -euo pipefail
   
   SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
   SDK_VERSION=$(grep -oP '(?<=<fs.s3.aws.sdk.version>)[^<]+' 
"${SCRIPT_DIR}/../pom.xml")
   AWS_CRT_VERSION=${1:-0.45.1}  # pinned for sdk ${SDK_VERSION}
   
   OUT_DIR="${SCRIPT_DIR}/../crt-jars"
   mkdir -p "${OUT_DIR}"
   
   mvn dependency:copy \
       -Dartifact="software.amazon.awssdk:aws-crt-client:${SDK_VERSION}:jar" \
       -DoutputDirectory="${OUT_DIR}" -Dmdep.useBaseVersion=true
   mvn dependency:copy \
       -Dartifact="software.amazon.awssdk.crt:aws-crt:${AWS_CRT_VERSION}:jar" \
       -DoutputDirectory="${OUT_DIR}" -Dmdep.useBaseVersion=true
   
   echo "Done. Copy ${OUT_DIR}/*.jar to \$FLINK_HOME/plugins/s3-fs-native/."
   ```
   
   The script reads `fs.s3.aws.sdk.version` directly from the module POM so it 
stays in sync automatically. `AWS_CRT_VERSION` can be overridden as a 
positional argument when needed; otherwise the pinned default is used. The 
README setup section should then reference this script instead of the manual 
download steps.



##########
flink-filesystems/flink-s3-fs-native/src/main/java/org/apache/flink/fs/s3native/S3ClientProvider.java:
##########
@@ -326,6 +359,12 @@ public static class Builder {
         // Custom credentials provider class names (comma-separated)
         @Nullable private String credentialsProviderClasses;
 
+        // CRT configuration
+        private boolean useCrt = false;
+        @Nullable private Double crtTargetThroughputGbps = null;
+        private long crtMinPartSizeInBytes =
+                NativeS3FileSystemFactory.PART_UPLOAD_MIN_SIZE.defaultValue();

Review Comment:
   `crtMinPartSizeInBytes` correctly derives its default from 
`NativeS3FileSystemFactory.PART_UPLOAD_MIN_SIZE.defaultValue()`, but the 
pattern is applied inconsistently. The Builder also hardcodes defaults for 
`chunkedEncoding`, `checksumValidation`, `maxConnections`, `connectionTimeout`, 
`socketTimeout`, `connectionMaxIdleTime`, `maxRetries`, `clientCloseTimeout`, 
and others — all of which have a canonical `ConfigOption` definition in 
`NativeS3FileSystemFactory`. If any of those config defaults change, the 
Builder silently drifts.
   
   All Builder field defaults should derive from their corresponding 
`ConfigOption.defaultValue()`, not be re-declared as magic literals. For 
example:
   ```java
   private boolean chunkedEncoding = 
NativeS3FileSystemFactory.CHUNKED_ENCODING_ENABLED.defaultValue();
   private boolean checksumValidation = 
NativeS3FileSystemFactory.CHECKSUM_VALIDATION_ENABLED.defaultValue();
   private int maxConnections = 
NativeS3FileSystemFactory.MAX_CONNECTIONS.defaultValue();
   // etc.
   ```
   This makes `NativeS3FileSystemFactory` the single source of truth for all 
defaults.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to