>From Ritik Raj <[email protected]>:
Ritik Raj has uploaded this change for review. (
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21198?usp=email )
Change subject: [ASTERIXDB-3768][CLOUD] Add configurable S3 checksum behavior
for S3-compatible storage
......................................................................
[ASTERIXDB-3768][CLOUD] Add configurable S3 checksum behavior for S3-compatible
storage
- user model changes: no
- storage format changes: no
- interface changes: no
AWS SDK Java v2 >= 2.30.0 introduced new cross-SDK checksum defaults
(WHEN_SUPPORTED) that break S3-compatible storage solutions (e.g. OCI)
which do not support the newer checksum APIs.
Add a new CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR configuration option
(values: when_required | when_supported | auto) whose default depends on
whether a custom endpoint is configured:
- when_required: when a custom endpoint is configured (S3-compatible)
- auto (no override, defer to the SDK default): when using native AWS S3
ext-ref: MB-71732
Change-Id: If6618d3a336e9bf134efb1f219660421edc27c43
---
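The endpoint-dependent default and the case-insensitive parsing described in the
commit message can be sketched roughly as follows. This is a minimal standalone
illustration, not the code in this change: the class name ChecksumDefaultSketch
and the helper defaultFor are hypothetical, and the nested enum only mirrors the
three S3ChecksumBehavior values added to ICloudProperties.

```java
// Standalone sketch; ChecksumDefaultSketch and defaultFor are hypothetical names.
class ChecksumDefaultSketch {

    // Mirrors the three values of ICloudProperties.S3ChecksumBehavior in this change.
    enum Behavior {
        WHEN_REQUIRED,
        WHEN_SUPPORTED,
        AUTO
    }

    // Default applied when the option is not set explicitly: a non-empty
    // custom endpoint is taken to mean an S3-compatible store.
    static Behavior defaultFor(String endpoint) {
        return (endpoint == null || endpoint.isEmpty()) ? Behavior.AUTO : Behavior.WHEN_REQUIRED;
    }

    // Case-insensitive parse; returns null for unrecognized input, as fromString does.
    static Behavior parse(String value) {
        if (value == null) {
            return null;
        }
        for (Behavior b : Behavior.values()) {
            if (b.name().equalsIgnoreCase(value)) {
                return b;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(""));                                  // AUTO
        System.out.println(defaultFor("https://objectstorage.example.com")); // WHEN_REQUIRED
        System.out.println(parse("When_Supported"));                         // WHEN_SUPPORTED
    }
}
```

An explicitly configured value always wins over this default; the default only
decides what happens when the option is left unset.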
M
asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1/cluster_state_1.1.regexadm
M
asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_full/cluster_state_1_full.1.regexadm
M
asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_less/cluster_state_1_less.1.regexadm
M
asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
M
asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java
M
asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java
M
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
A docs/superpowers/plans/2026-05-05-s3-checksum-behavior.md
A docs/superpowers/specs/2026-05-05-s3-checksum-behavior-design.md
M
hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java
10 files changed, 590 insertions(+), 19 deletions(-)
git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb
refs/changes/98/21198/1
diff --git
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1/cluster_state_1.1.regexadm
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1/cluster_state_1.1.regexadm
index 10c1856..cffa1fa 100644
---
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1/cluster_state_1.1.regexadm
+++
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1/cluster_state_1.1.regexadm
@@ -40,6 +40,7 @@
"cloud.storage.prefix" : "",
"cloud.storage.region" : "",
"cloud.storage.s3.access.key.id" : null,
+ "cloud.storage.s3.checksum.behavior" : "auto",
"cloud.storage.s3.client.read.timeout" : -1,
"cloud.storage.s3.parallel.downloader.client.type" : "crt",
"cloud.storage.s3.secret.access.key" : null,
diff --git
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_full/cluster_state_1_full.1.regexadm
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_full/cluster_state_1_full.1.regexadm
index 22c4fc8..6f93933 100644
---
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_full/cluster_state_1_full.1.regexadm
+++
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_full/cluster_state_1_full.1.regexadm
@@ -40,6 +40,7 @@
"cloud.storage.prefix" : "",
"cloud.storage.region" : "",
"cloud.storage.s3.access.key.id" : null,
+ "cloud.storage.s3.checksum.behavior" : "auto",
"cloud.storage.s3.client.read.timeout" : -1,
"cloud.storage.s3.parallel.downloader.client.type" : "crt",
"cloud.storage.s3.secret.access.key" : null,
diff --git
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_less/cluster_state_1_less.1.regexadm
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_less/cluster_state_1_less.1.regexadm
index a36d3b5..b01ff20 100644
---
a/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_less/cluster_state_1_less.1.regexadm
+++
b/asterixdb/asterix-app/src/test/resources/runtimets/results/api/cluster_state_1_less/cluster_state_1_less.1.regexadm
@@ -40,6 +40,7 @@
"cloud.storage.prefix" : "",
"cloud.storage.region" : "",
"cloud.storage.s3.access.key.id" : null,
+ "cloud.storage.s3.checksum.behavior" : "auto",
"cloud.storage.s3.client.read.timeout" : -1,
"cloud.storage.s3.parallel.downloader.client.type" : "crt",
"cloud.storage.s3.secret.access.key" : null,
diff --git
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
index dd8485c..afca2e81 100644
---
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
+++
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java
@@ -56,12 +56,14 @@
private final boolean roundRobinDnsResolver;
private final String accessKeyId;
private final String secretAccessKey;
+ private final ICloudProperties.S3ChecksumBehavior checksumBehavior;
public S3ClientConfig(String region, String endpoint, String prefix,
boolean anonymousAuth,
Collection<String> certificates, long profilerLogInterval, int
writeBufferSize,
S3ParallelDownloaderClientType parallelDownloaderClientType,
boolean roundRobinDnsResolver) {
this(region, endpoint, prefix, anonymousAuth, certificates,
profilerLogInterval, writeBufferSize, 1, 0, 0, 0,
- false, false, 0, 0, -1, parallelDownloaderClientType,
roundRobinDnsResolver, "", "");
+ false, false, 0, 0, -1, parallelDownloaderClientType,
roundRobinDnsResolver, "", "",
+ ICloudProperties.S3ChecksumBehavior.WHEN_REQUIRED);
}
private S3ClientConfig(String region, String endpoint, String prefix,
boolean anonymousAuth,
@@ -70,7 +72,7 @@
boolean forcePathStyle, boolean disableSslVerify, int
requestsMaxPendingHttpConnections,
int requestsHttpConnectionAcquireTimeout, int
s3ReadTimeoutInSeconds,
S3ParallelDownloaderClientType parallelDownloaderClientType,
boolean roundRobinDnsResolver,
- String accessKeyId, String secretAccessKey) {
+ String accessKeyId, String secretAccessKey,
ICloudProperties.S3ChecksumBehavior checksumBehavior) {
this.region = Objects.requireNonNull(region, "region");
this.endpoint = endpoint;
this.prefix = Objects.requireNonNull(prefix, "prefix");
@@ -91,6 +93,7 @@
this.roundRobinDnsResolver = roundRobinDnsResolver;
this.accessKeyId = accessKeyId;
this.secretAccessKey = secretAccessKey;
+ this.checksumBehavior = Objects.requireNonNull(checksumBehavior,
"checksumBehavior");
}
public static S3ClientConfig of(ICloudProperties cloudProperties) {
@@ -104,7 +107,7 @@
cloudProperties.getRequestsHttpConnectionAcquireTimeout(),
cloudProperties.getS3ReadTimeoutInSeconds(),
S3ParallelDownloaderClientType.valueOf(cloudProperties.getS3ParallelDownloaderClientType()),
cloudProperties.useRoundRobinDnsResolver(),
cloudProperties.getS3AccessKeyId(),
- cloudProperties.getS3SecretAccessKey());
+ cloudProperties.getS3SecretAccessKey(),
cloudProperties.getS3ChecksumBehavior());
}
public enum S3ParallelDownloaderClientType {
@@ -126,7 +129,9 @@
}
public static S3ClientConfig of(Map<String, String> configuration, int
writeBufferSize) {
- // Used to determine local vs. actual S3
+ // Used to determine local vs. actual S3.
+ // checksumBehavior defaults to "when_required" via the convenience
constructor —
+ // appropriate here since a custom endpoint is always present.
String endPoint =
configuration.getOrDefault(AwsConstants.SERVICE_END_POINT_FIELD_NAME, "");
// Disabled
long profilerLogInterval = 0;
@@ -227,4 +232,8 @@
public boolean useRoundRobinDnsResolver() {
return roundRobinDnsResolver;
}
+
+ public ICloudProperties.S3ChecksumBehavior getChecksumBehavior() {
+ return checksumBehavior;
+ }
}
diff --git
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java
index de69617..e1fa4e9 100644
---
a/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java
+++
b/asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java
@@ -52,6 +52,7 @@
import org.apache.asterix.common.exceptions.RuntimeDataException;
import org.apache.asterix.external.util.aws.AwsUtils;
import org.apache.asterix.external.util.aws.AwsUtils.CloseableAwsClients;
+import org.apache.asterix.external.util.aws.s3.S3Utils;
import org.apache.hyracks.api.exceptions.HyracksDataException;
import org.apache.hyracks.api.io.FileReference;
import org.apache.hyracks.api.util.IoUtil;
@@ -382,6 +383,7 @@
builder.credentialsProvider(credentialsProvider);
builder.region(Region.of(config.getRegion()));
builder.forcePathStyle(config.isForcePathStyle());
+ S3Utils.applyChecksumBehavior(builder, config.getChecksumBehavior());
AttributeMap.Builder customHttpConfigBuilder = AttributeMap.builder();
if (config.getRequestsMaxHttpConnections() > 0) {
diff --git
a/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java
b/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java
index 98c54f3..c4d040f 100644
---
a/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java
+++
b/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java
@@ -100,7 +100,12 @@
CLOUD_STORAGE_S3_USE_ROUND_ROBIN_DNS_RESOLVER(BOOLEAN, false),
CLOUD_STORAGE_S3_ACCESS_KEY_ID(STRING, (String) null),
CLOUD_STORAGE_S3_SECRET_ACCESS_KEY(STRING, (String) null),
- CLOUD_STORAGE_AZURE_CLIENT_ID(STRING, (String) null),;
+ CLOUD_STORAGE_AZURE_CLIENT_ID(STRING, (String) null),
+ CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR(STRING,
(Function<IApplicationConfig, String>) app -> {
+ String endpoint = app.getString(CLOUD_STORAGE_ENDPOINT);
+ return (endpoint == null || endpoint.isEmpty()) ?
ICloudProperties.S3ChecksumBehavior.AUTO.name()
+ : ICloudProperties.S3ChecksumBehavior.WHEN_REQUIRED.name();
+ });
private final IOptionType interpreter;
private final Object defaultValue;
@@ -151,6 +156,7 @@
case CLOUD_STORAGE_S3_ACCESS_KEY_ID:
case CLOUD_STORAGE_S3_SECRET_ACCESS_KEY:
case CLOUD_STORAGE_AZURE_CLIENT_ID:
+ case CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR:
return Section.COMMON;
default:
throw new IllegalStateException("NYI: " + this);
@@ -240,6 +246,13 @@
return "The S3 secret access key for static credential
authentication (defaults to null, which indicates to use default credential
chain)";
case CLOUD_STORAGE_AZURE_CLIENT_ID:
return "The Azure user managed identity client ID
(defaults to null, which takes the system managed identity client ID)";
+ case CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR:
+ return "The checksum behavior for S3 requests and
responses. Accepted values: "
+ + "'when_required' (only checksums mandated by the
operation), "
+ + "'when_supported' (checksums on all eligible
operations, SDK >= 2.30 default), "
+ + "'auto' (no explicit override, defer to SDK
default). "
+ + "Defaults to 'when_required' when a custom
endpoint is configured "
+ + "(S3-compatible stores), 'auto' for native AWS
S3.";
default:
throw new IllegalStateException("NYI: " + this);
}
@@ -260,6 +273,9 @@
if (this == CLOUD_STORAGE_S3_PARALLEL_DOWNLOADER_CLIENT_TYPE) {
return "crt if no custom endpoint is set; async otherwise";
}
+ if (this == CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR) {
+ return "when_required if a custom endpoint is set; auto
otherwise";
+ }
return IOption.super.usageDefaultOverride(accessor, optionPrinter);
}
@@ -406,4 +422,10 @@
public String getAzureClientId() {
return accessor.getString(Option.CLOUD_STORAGE_AZURE_CLIENT_ID);
}
+
+ // Parses the stored string value to the S3ChecksumBehavior enum
+ public ICloudProperties.S3ChecksumBehavior getS3ChecksumBehavior() {
+ return ICloudProperties.S3ChecksumBehavior
+
.fromString(accessor.getString(Option.CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR));
+ }
}
diff --git
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
index 07424b1..262f9bf 100644
---
a/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
+++
b/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java
@@ -106,7 +106,7 @@
import org.apache.hyracks.api.exceptions.IWarningCollector;
import org.apache.hyracks.api.exceptions.SourceLocation;
import org.apache.hyracks.api.exceptions.Warning;
-import org.apache.hyracks.util.annotations.AiProvenance;
+import org.apache.hyracks.cloud.io.ICloudProperties;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@@ -185,24 +185,29 @@
} else if (certificates != null && !certificates.isBlank()) {
builder.httpClient(createHttpClient(certificates));
}
- if (serviceEndpoint != null) {
- configureS3CompatibleSettings(serviceEndpoint, builder);
- }
+ applyChecksumBehavior(builder,
appCtx.getCloudProperties().getS3ChecksumBehavior());
awsClients.setConsumingClient(builder.build());
return awsClients;
}
- @AiProvenance(agent = AiProvenance.Agent.CLAUDE_SONNET_4_6, tool =
AiProvenance.Tool.GITHUB_COPILOT)
- private static void configureS3CompatibleSettings(String serviceEndpoint,
S3ClientBuilder builder) {
- // AWS SDK 2.43+ sends CRC64NVME request checksums by default for all
eligible operations.
- // S3-compatible endpoints (non-AWS) and older mock servers do not
understand this header and
- // may reject or mishandle requests, returning empty or error
responses. When a custom endpoint
- // is configured (i.e. not talking to real AWS S3), disable automatic
checksum calculation so
- // only operations that explicitly require a checksum will include one.
- if (serviceEndpoint != null) {
-
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
-
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
+ public static void applyChecksumBehavior(S3ClientBuilder builder,
ICloudProperties.S3ChecksumBehavior behavior) {
+ if (behavior == null) {
+ LOGGER.warn("checksumBehavior is null; falling back to SDK
defaults.");
+ return;
+ }
+ switch (behavior) {
+ case WHEN_REQUIRED:
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
+ break;
+ case WHEN_SUPPORTED:
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_SUPPORTED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_SUPPORTED);
+ break;
+ case AUTO:
+ // leave SDK defaults untouched
+ break;
}
}
diff --git a/docs/superpowers/plans/2026-05-05-s3-checksum-behavior.md
b/docs/superpowers/plans/2026-05-05-s3-checksum-behavior.md
new file mode 100644
index 0000000..dab1945
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-05-s3-checksum-behavior.md
@@ -0,0 +1,384 @@
+# S3 Checksum Behavior Configuration Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use
superpowers:subagent-driven-development (recommended) or
superpowers:executing-plans to implement this plan task-by-task. Steps use
checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a configurable `CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR` option
(values: `when_required` | `when_supported` | `auto`) to `CloudProperties` that
controls AWS SDK checksum behavior for both the blob storage and external-links
S3 client, defaulting to `when_required` when a custom endpoint is set and
`auto` otherwise.
+
+**Architecture:** The option lives in `CloudProperties` and is surfaced
through the `ICloudProperties` interface. For blob storage it flows through
`S3ClientConfig` → `S3CloudClient.buildClient`. For external links it is read
via `appCtx.getCloudProperties()` inside `S3Utils.buildClient`, replacing the
current hardcoded `configureS3CompatibleSettings` method.
+
+**Tech Stack:** Java 17, AWS SDK v2
(`software.amazon.awssdk.core.checksums.RequestChecksumCalculation`,
`ResponseChecksumValidation`), JUnit 4
+
+---
+
+## Files to Change
+
+| File | Action |
+|---|---|
+|
`hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java`
| Add `getS3ChecksumBehavior()` method |
+|
`asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java`
| Add `CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR` option + accessor |
+|
`asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java`
| Add `checksumBehavior` field + getter; wire into constructors +
`of(ICloudProperties)` |
+|
`asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java`
| Add `applyChecksumBehavior` helper; call it in `buildClient` |
+|
`asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java`
| Replace `configureS3CompatibleSettings` with `applyChecksumBehavior` helper;
read setting from `appCtx.getCloudProperties()` |
+
+---
+
+## Task 1: Add `getS3ChecksumBehavior()` to `ICloudProperties`
+
+**Files:**
+- Modify:
`hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java`
+
+- [ ] **Step 1: Add the method to the interface**
+
+Open `ICloudProperties.java`. After the existing `getAzureClientId()` method,
add:
+
+```java
+ String getS3ChecksumBehavior();
+```
+
+The interface tail should look like:
+
+```java
+ String getS3AccessKeyId();
+
+ String getS3SecretAccessKey();
+
+ String getAzureClientId();
+
+ String getS3ChecksumBehavior();
+}
+```
+
+- [ ] **Step 2: Build to confirm no compilation errors**
+
+```bash
+cd asterixdb
+mvn compile -pl hyracks-fullstack/hyracks/hyracks-cloud -am -q
+```
+
+Expected: `BUILD SUCCESS`. A compilation failure in an `ICloudProperties` implementor that has not been updated yet is acceptable at this step; the goal is only to check that the interface change itself compiles.
+
+---
+
+## Task 2: Add `CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR` to `CloudProperties`
+
+**Files:**
+- Modify:
`asterixdb/asterix-common/src/main/java/org/apache/asterix/common/config/CloudProperties.java`
+
+- [ ] **Step 1: Add the enum constant**
+
+In the `Option` enum, change the last three constants from:
+
+```java
+ CLOUD_STORAGE_S3_ACCESS_KEY_ID(STRING, (String) null),
+ CLOUD_STORAGE_S3_SECRET_ACCESS_KEY(STRING, (String) null),
+ CLOUD_STORAGE_AZURE_CLIENT_ID(STRING, (String) null);
+```
+
+To:
+
+```java
+ CLOUD_STORAGE_S3_ACCESS_KEY_ID(STRING, (String) null),
+ CLOUD_STORAGE_S3_SECRET_ACCESS_KEY(STRING, (String) null),
+ CLOUD_STORAGE_AZURE_CLIENT_ID(STRING, (String) null),
+ CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR(STRING,
(Function<IApplicationConfig, String>) app -> {
+ String endpoint = app.getString(CLOUD_STORAGE_ENDPOINT);
+ return (endpoint == null || endpoint.isEmpty()) ? "auto" :
"when_required";
+ });
+```
+
+- [ ] **Step 2: Add the case to `section()`**
+
+In the `section()` switch statement, add before `default`:
+
+```java
+ case CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR:
+ return Section.COMMON;
+```
+
+- [ ] **Step 3: Add the case to `description()`**
+
+In the `description()` switch statement, add before `default`:
+
+```java
+ case CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR:
+ return "The checksum behavior for S3 requests and
responses. Accepted values: "
+ + "'when_required' (only checksums mandated by the
operation), "
+ + "'when_supported' (checksums on all eligible
operations, SDK >= 2.30 default), "
+ + "'auto' (no explicit override, defer to SDK
default). "
+ + "Defaults to 'when_required' when a custom
endpoint is configured "
+ + "(S3-compatible stores), 'auto' for native AWS
S3.";
+```
+
+- [ ] **Step 4: Add `usageDefaultOverride`**
+
+In the `usageDefaultOverride` method, add alongside the existing check:
+
+```java
+ @Override
+ public String usageDefaultOverride(IApplicationConfig accessor,
Function<IOption, String> optionPrinter) {
+ if (this == CLOUD_STORAGE_S3_PARALLEL_DOWNLOADER_CLIENT_TYPE) {
+ return "crt if no custom endpoint is set; async otherwise";
+ }
+ if (this == CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR) {
+ return "when_required if a custom endpoint is set; auto
otherwise";
+ }
+ return IOption.super.usageDefaultOverride(accessor, optionPrinter);
+ }
+```
+
+- [ ] **Step 5: Add the accessor method**
+
+At the end of the class body (after `getAzureClientId()`), add:
+
+```java
+ public String getS3ChecksumBehavior() {
+ return accessor.getString(Option.CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR);
+ }
+```
+
+- [ ] **Step 6: Build to confirm no compilation errors**
+
+```bash
+cd asterixdb
+mvn compile -pl asterixdb/asterix-common -am -q
+```
+
+Expected: `BUILD SUCCESS`
+
+---
+
+## Task 3: Add `checksumBehavior` to `S3ClientConfig`
+
+**Files:**
+- Modify:
`asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3ClientConfig.java`
+
+- [ ] **Step 1: Add the field**
+
+After the `secretAccessKey` field, add:
+
+```java
+ private final String checksumBehavior;
+```
+
+- [ ] **Step 2: Update the private all-args constructor**
+
+Change the constructor signature to add `String checksumBehavior` as the last
parameter, and assign it in the body:
+
+```java
+ private S3ClientConfig(String region, String endpoint, String prefix,
boolean anonymousAuth,
+ Collection<String> certificates, long profilerLogInterval, int
writeBufferSize, long tokenAcquireTimeout,
+ int writeMaxRequestsPerSeconds, int readMaxRequestsPerSeconds, int
requestsMaxHttpConnections,
+ boolean forcePathStyle, boolean disableSslVerify, int
requestsMaxPendingHttpConnections,
+ int requestsHttpConnectionAcquireTimeout, int
s3ReadTimeoutInSeconds,
+ S3ParallelDownloaderClientType parallelDownloaderClientType,
boolean roundRobinDnsResolver,
+ String accessKeyId, String secretAccessKey, String
checksumBehavior) {
+ // ... existing assignments ...
+ this.accessKeyId = accessKeyId;
+ this.secretAccessKey = secretAccessKey;
+ this.checksumBehavior = Objects.requireNonNull(checksumBehavior,
"checksumBehavior");
+ }
+```
+
+- [ ] **Step 3: Update the public convenience constructor**
+
+The public constructor delegates to the private one, passing `"", ""` as the last two arguments. Add `"when_required"` after them as the `checksumBehavior` default (this constructor is only used where a custom endpoint is always passed):
+
+```java
+ public S3ClientConfig(String region, String endpoint, String prefix,
boolean anonymousAuth,
+ Collection<String> certificates, long profilerLogInterval, int
writeBufferSize,
+ S3ParallelDownloaderClientType parallelDownloaderClientType,
boolean roundRobinDnsResolver) {
+ this(region, endpoint, prefix, anonymousAuth, certificates,
profilerLogInterval, writeBufferSize, 1, 0, 0, 0,
+ false, false, 0, 0, -1, parallelDownloaderClientType,
roundRobinDnsResolver, "", "",
+ "when_required");
+ }
+```
+
+- [ ] **Step 4: Update `of(ICloudProperties)`**
+
+Pass `cloudProperties.getS3ChecksumBehavior()` as the last argument:
+
+```java
+ public static S3ClientConfig of(ICloudProperties cloudProperties) {
+ return new S3ClientConfig(cloudProperties.getStorageRegion(),
cloudProperties.getStorageEndpoint(),
+ cloudProperties.getStoragePrefix(),
cloudProperties.isStorageAnonymousAuth(),
+ cloudProperties.getStorageCertificates(),
cloudProperties.getProfilerLogInterval(),
+ cloudProperties.getWriteBufferSize(),
cloudProperties.getTokenAcquireTimeout(),
+ cloudProperties.getWriteMaxRequestsPerSecond(),
cloudProperties.getReadMaxRequestsPerSecond(),
+ cloudProperties.getRequestsMaxHttpConnections(),
cloudProperties.isStorageForcePathStyle(),
+ cloudProperties.isStorageDisableSSLVerify(),
cloudProperties.getRequestsMaxPendingHttpConnections(),
+ cloudProperties.getRequestsHttpConnectionAcquireTimeout(),
cloudProperties.getS3ReadTimeoutInSeconds(),
+
S3ParallelDownloaderClientType.valueOf(cloudProperties.getS3ParallelDownloaderClientType()),
+ cloudProperties.useRoundRobinDnsResolver(),
cloudProperties.getS3AccessKeyId(),
+ cloudProperties.getS3SecretAccessKey(),
cloudProperties.getS3ChecksumBehavior());
+ }
+```
+
+- [ ] **Step 5: Add the getter**
+
+After `useRoundRobinDnsResolver()`, add:
+
+```java
+ public String getChecksumBehavior() {
+ return checksumBehavior;
+ }
+```
+
+- [ ] **Step 6: Build to confirm no compilation errors**
+
+```bash
+cd asterixdb
+mvn compile -pl asterixdb/asterix-cloud -am -q
+```
+
+Expected: `BUILD SUCCESS`
+
+---
+
+## Task 4: Apply checksum behavior in `S3CloudClient.buildClient`
+
+**Files:**
+- Modify:
`asterixdb/asterix-cloud/src/main/java/org/apache/asterix/cloud/clients/aws/s3/S3CloudClient.java`
+
+- [ ] **Step 1: Add SDK checksum imports**
+
+Add alongside existing AWS SDK imports:
+
+```java
+import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
+import software.amazon.awssdk.core.checksums.ResponseChecksumValidation;
+```
+
+- [ ] **Step 2: Call `applyChecksumBehavior` in `buildClient`**
+
+In `buildClient`, immediately after
`builder.forcePathStyle(config.isForcePathStyle());`, add:
+
+```java
+ applyChecksumBehavior(builder, config.getChecksumBehavior());
+```
+
+- [ ] **Step 3: Add the helper method**
+
+Add as a private static method near `buildClient`:
+
+```java
+ private static void applyChecksumBehavior(S3ClientBuilder builder, String
behavior) {
+ switch (behavior.toLowerCase()) {
+ case "when_required":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
+ break;
+ case "when_supported":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_SUPPORTED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_SUPPORTED);
+ break;
+ case "auto":
+ default:
+ // leave SDK defaults untouched
+ break;
+ }
+ }
+```
+
+- [ ] **Step 4: Build and run the existing cloud S3 tests**
+
+```bash
+cd asterixdb
+mvn test -pl asterixdb/asterix-cloud -Dtest=LSMS3Test -q
+```
+
+Expected: `BUILD SUCCESS` and all tests pass.
+
+---
+
+## Task 5: Replace hardcoded checksum logic in `S3Utils`
+
+**Files:**
+- Modify:
`asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/aws/s3/S3Utils.java`
+
+- [ ] **Step 1: Replace the call site in `buildClient`**
+
+Remove this block:
+
+```java
+ if (serviceEndpoint != null) {
+ configureS3CompatibleSettings(serviceEndpoint, builder);
+ }
+```
+
+Add in its place:
+
+```java
+ applyChecksumBehavior(builder,
appCtx.getCloudProperties().getS3ChecksumBehavior());
+```
+
+- [ ] **Step 2: Remove `configureS3CompatibleSettings`**
+
+Delete the entire `configureS3CompatibleSettings` private static method
(including its `@AiProvenance` annotation).
+
+- [ ] **Step 3: Add `applyChecksumBehavior` helper**
+
+Add as a new private static method:
+
+```java
+ private static void applyChecksumBehavior(S3ClientBuilder builder, String
behavior) {
+ switch (behavior.toLowerCase()) {
+ case "when_required":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
+ break;
+ case "when_supported":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_SUPPORTED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_SUPPORTED);
+ break;
+ case "auto":
+ default:
+ // leave SDK defaults untouched
+ break;
+ }
+ }
+```
+
+- [ ] **Step 4: Verify the checksum imports**
+
+Confirm these imports exist at the top of `S3Utils.java` (they should already
be present):
+
+```java
+import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
+import software.amazon.awssdk.core.checksums.ResponseChecksumValidation;
+```
+
+- [ ] **Step 5: Build to confirm no compilation errors**
+
+```bash
+cd asterixdb
+mvn compile -pl asterixdb/asterix-external-data -am -q
+```
+
+Expected: `BUILD SUCCESS`
+
+---
+
+## Task 6: Final verification
+
+- [ ] **Step 1: Full build of all affected modules**
+
+```bash
+cd asterixdb
+mvn compile \
+ -pl hyracks-fullstack/hyracks/hyracks-cloud \
+ -pl asterixdb/asterix-common \
+ -pl asterixdb/asterix-cloud \
+ -pl asterixdb/asterix-external-data \
+ -am -q
+```
+
+Expected: `BUILD SUCCESS`
+
+- [ ] **Step 2: Run tests across all affected modules**
+
+```bash
+cd asterixdb
+mvn test -pl asterixdb/asterix-cloud,asterixdb/asterix-external-data -q
+```
+
+Expected: `BUILD SUCCESS` and all tests pass.
diff --git a/docs/superpowers/specs/2026-05-05-s3-checksum-behavior-design.md
b/docs/superpowers/specs/2026-05-05-s3-checksum-behavior-design.md
new file mode 100644
index 0000000..7aa2567
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-05-s3-checksum-behavior-design.md
@@ -0,0 +1,122 @@
+# S3 Checksum Behavior Configuration
+
+**Date:** 2026-05-05
+**Status:** Approved
+
+## Problem
+
+AWS SDK Java v2 ≥ 2.30.0 changed its default checksum behavior:
+
+- `requestChecksumCalculation = WHEN_SUPPORTED` — SDK sends checksums on all
eligible S3 operations
+- `responseChecksumValidation = WHEN_SUPPORTED` — SDK validates response
checksums when present
+
+This is the correct behavior for native AWS S3 but breaks S3-compatible
storage solutions (e.g. OCI) that do not understand or support these checksum
headers, causing request failures.
+
+AsterixDB uses S3 in two code paths:
+1. **Blob storage** — `S3CloudClient` / `S3ClientConfig` (configured via
`CloudProperties`)
+2. **S3 external links** — `S3Utils.buildClient` (configured per-query via
`Map<String, String>`, but `appCtx.getCloudProperties()` is available at build
time)
+
+The external-data path already has a hardcoded workaround
(`configureS3CompatibleSettings`) that sets `WHEN_REQUIRED` when a custom
endpoint is present. The blob storage path has no checksum configuration at
all. Neither path is user-configurable.
+
+## Goal
+
+Add a single, configurable option `CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR` to
`CloudProperties` that controls checksum behavior for both code paths, with a
smart default that is safe for S3-compatible stores out of the box.
+
+## Design
+
+### Config Option (`CloudProperties`)
+
+```java
+CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR(STRING, (Function<IApplicationConfig,
String>) app -> {
+ String endpoint = app.getString(CLOUD_STORAGE_ENDPOINT);
+ return (endpoint == null || endpoint.isEmpty()) ? "auto" : "when_required";
+})
+```
+
+**Accepted values:**
+
+| Value | Request checksum | Response validation | Use case |
+|---|---|---|---|
+| `when_required` | `WHEN_REQUIRED` | `WHEN_REQUIRED` | S3-compatible stores
(OCI, MinIO, etc.) |
+| `when_supported` | `WHEN_SUPPORTED` | `WHEN_SUPPORTED` | Explicitly enable
SDK ≥ 2.30 default |
+| `auto` | *(SDK default)* | *(SDK default)* | Native AWS S3 without explicit
override |
+
+**Default logic:** `when_required` if `CLOUD_STORAGE_ENDPOINT` is non-empty
(S3-compatible), `auto` otherwise (native AWS). This matches the existing
hardcoded behavior in `S3Utils` and makes it configurable.
+
+**Usage override string:** `when_required if a custom endpoint is set; auto
otherwise`
+
+### Interface (`ICloudProperties`)
+
+Add one method:
+```java
+String getS3ChecksumBehavior();
+```
+
+### Config accessor (`CloudProperties`)
+
+```java
+public String getS3ChecksumBehavior() {
+ return accessor.getString(Option.CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR);
+}
+```
+
+### Config carrier (`S3ClientConfig`)
+
+- Add `String checksumBehavior` field
+- Add to the private all-args constructor
+- Add getter `getChecksumBehavior()`
+- `of(ICloudProperties)` factory reads
`cloudProperties.getS3ChecksumBehavior()`
+- `of(Map<String, String>, int)` overload (external writer) does **not** set
this field — the external-data path reads it directly from
`appCtx.getCloudProperties()` at build time
+
+### Client build — blob storage (`S3CloudClient.buildClient`)
+
+After `builder.forcePathStyle(...)`, add:
+```java
+applyChecksumBehavior(builder, config.getChecksumBehavior());
+```
+
+Add private static helper:
+```java
+private static void applyChecksumBehavior(S3ClientBuilder builder, String
behavior) {
+ switch (behavior.toLowerCase()) {
+ case "when_required":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED);
+ break;
+ case "when_supported":
+
builder.requestChecksumCalculation(RequestChecksumCalculation.WHEN_SUPPORTED);
+
builder.responseChecksumValidation(ResponseChecksumValidation.WHEN_SUPPORTED);
+ break;
+ case "auto":
+ default:
+ // leave SDK defaults untouched
+ break;
+ }
+}
+```
+
+### Client build — external links (`S3Utils.buildClient`)
+
+Remove `configureS3CompatibleSettings(serviceEndpoint, builder)` and replace
with:
+```java
+String checksumBehavior = appCtx.getCloudProperties().getS3ChecksumBehavior();
+applyChecksumBehavior(builder, checksumBehavior);
+```
+
+Add the same `applyChecksumBehavior` helper to `S3Utils`.
+Remove the now-dead `configureS3CompatibleSettings` method.
+
+## Files Changed
+
+| File | Change |
+|---|---|
+| `asterix-common/.../config/CloudProperties.java` | Add
`CLOUD_STORAGE_S3_CHECKSUM_BEHAVIOR` option + accessor |
+| `hyracks-cloud/.../io/ICloudProperties.java` | Add `getS3ChecksumBehavior()`
|
+| `asterix-cloud/.../aws/s3/S3ClientConfig.java` | Add `checksumBehavior`
field + getter + wire into `of(ICloudProperties)` |
+| `asterix-cloud/.../aws/s3/S3CloudClient.java` | Call `applyChecksumBehavior`
in `buildClient` |
+| `asterix-external-data/.../aws/s3/S3Utils.java` | Replace
`configureS3CompatibleSettings` with `applyChecksumBehavior` |
+
+## Non-Goals
+
+- Per-dataset checksum override (each external dataset specifying its own
behavior)
+- Configuring request and response checksums independently
diff --git
a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java
b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java
index 103c1b3..33a860d 100644
---
a/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java
+++
b/hyracks-fullstack/hyracks/hyracks-cloud/src/main/java/org/apache/hyracks/cloud/io/ICloudProperties.java
@@ -94,5 +94,29 @@
String getS3SecretAccessKey();
+ /**
+ * Valid values for {@link #getS3ChecksumBehavior()}.
+ */
+ enum S3ChecksumBehavior {
+ WHEN_REQUIRED,
+ WHEN_SUPPORTED,
+ AUTO;
+
+ /** Parses the config string (case-insensitive). Returns {@code null}
if unrecognized. */
+ public static S3ChecksumBehavior fromString(String s) {
+ if (s == null) {
+ return null;
+ }
+ for (S3ChecksumBehavior b : values()) {
+ if (b.name().equalsIgnoreCase(s)) {
+ return b;
+ }
+ }
+ return null;
+ }
+ }
+
+ S3ChecksumBehavior getS3ChecksumBehavior();
+
String getAzureClientId();
}
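In the hunks above, fromString returns null for an unrecognized string,
S3ClientConfig rejects a null value through Objects.requireNonNull, and
S3Utils.applyChecksumBehavior logs a warning and keeps the SDK defaults. A
stricter parse that fails with the accepted values listed might look like the
sketch below. This is an assumption-laden illustration only: StrictChecksumParse
and parseOrThrow are hypothetical names that do not appear in this change.

```java
import java.util.Arrays;
import java.util.Locale;

// Standalone sketch; StrictChecksumParse and parseOrThrow are hypothetical names.
class StrictChecksumParse {

    // Mirrors the three values of ICloudProperties.S3ChecksumBehavior in this change.
    enum Behavior {
        WHEN_REQUIRED,
        WHEN_SUPPORTED,
        AUTO
    }

    // Case-insensitive parse that throws instead of returning null, so a
    // misconfigured value is reported together with the accepted values.
    static Behavior parseOrThrow(String value) {
        for (Behavior b : Behavior.values()) {
            // equalsIgnoreCase(null) is false, so a null input falls through to the throw.
            if (b.name().equalsIgnoreCase(value)) {
                return b;
            }
        }
        String accepted = Arrays.toString(Behavior.values()).toLowerCase(Locale.ROOT);
        throw new IllegalArgumentException(
                "Unrecognized S3 checksum behavior '" + value + "'; accepted values: " + accepted);
    }

    public static void main(String[] args) {
        System.out.println(parseOrThrow("when_required"));
        try {
            parseOrThrow("crc32");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether to fail fast like this or to fall back to the SDK defaults is a design
choice for the change owner; the sketch only shows the fail-fast variant.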
--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21198?usp=email
Gerrit-MessageType: newchange
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: If6618d3a336e9bf134efb1f219660421edc27c43
Gerrit-Change-Number: 21198
Gerrit-PatchSet: 1
Gerrit-Owner: Ritik Raj <[email protected]>