[PR] HDDS-15299. Add managed local S3 access keys [ozone]

via GitHub Sun, 17 May 2026 07:37:12 -0700


paf91 opened a new pull request, #10296:
URL: https://github.com/apache/ozone/pull/10296


   ## Summary
   
   This PR adds managed local S3 access keys on top of the `HDDS-13323-sts` 
feature branch. It intentionally targets `apache/ozone:HDDS-13323-sts`, not 
`master`, because this work is stacked on the STS feature branch and depends on 
its S3/STS auth-path changes. After review, the intended flow is to merge this 
work into `HDDS-13323-sts` first, then merge the STS feature branch into 
`master` through the existing STS integration flow.
   
   Managed local S3 access keys let admins issue OM-managed `accessKey` / 
`secretKey` pairs for S3 clients that cannot use STS session tokens. 
Credentials are validated in OM after STS-token handling and before the legacy 
`S3SecretManager` fallback. S3G request serialization is unchanged; 
authentication and identity resolution happen OM-side.
   
   Lifecycle support for managed keys is included: create, list/info, disable, 
rotate, and delete. Per-key JSON policy evaluation is deliberately not 
implemented in this PR. Managed keys authorize as their configured 
`effectiveUser` through the existing Ozone authorization path, and any stored 
`policyDocument` fails closed until the policy evaluator lands in a follow-up.
   
   The feature is gated and disabled by default, so existing deployments remain 
unchanged unless the managed S3 access-key feature is explicitly enabled.
   
   ## Credential resolution order
   
   Credential resolution order at OM:
   
   ```text
   1. STS session token, when a non-empty session token is visible to OM
   2. Managed local S3 access key
   3. Non-secure short-circuit
   4. Legacy S3SecretManager
   ```
   
   STS remains authoritative. If a request has a non-empty session token 
visible to OM, STS validation is selected first. If STS validation fails, the 
request fails and does not fall through to managed-key authentication or legacy 
`S3SecretManager`.
   
   Blank or whitespace-only session tokens visible to OM are rejected whenever 
OM performs S3 credential validation, for example in secure clusters or when 
managed keys are enabled. Empty tokens that are dropped before reaching OM keep 
their existing behavior.
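
   As a rough illustration of the ordering only, the selection could be
   sketched as below. The class, enum, and parameter names are hypothetical
   and are not taken from the PR; the sketch deliberately leaves out feature
   gating and the non-secure fail-closed rule for managed keys.

   ```java
   /** Hypothetical sketch of the resolver order; not the PR's actual classes. */
   final class ResolverOrderSketch {
     enum Path { STS, MANAGED_KEY, NON_SECURE_SHORT_CIRCUIT, LEGACY_S3_SECRET_MANAGER }

     /**
      * Picks the first applicable path.
      * sessionToken is null when no session token is visible to OM.
      */
     static Path selectPath(String sessionToken, boolean managedKeyRowFound,
         boolean secureCluster) {
       if (sessionToken != null && !sessionToken.isEmpty()) {
         // STS is authoritative: a validation failure on this path is final.
         return Path.STS;
       }
       if (managedKeyRowFound) {
         return Path.MANAGED_KEY;
       }
       if (!secureCluster) {
         return Path.NON_SECURE_SHORT_CIRCUIT;
       }
       return Path.LEGACY_S3_SECRET_MANAGER;
     }
   }
   ```

   Under these assumptions, a request that carries a session token never
   reaches the managed-key or legacy branches, even when a managed-key row
   exists for its access key ID.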
   
   ## Feature gating
   
   | Key | Default | Behavior in this PR |
   |---|---:|---|
   | `ozone.s3.accesskey.enabled` | `false` | Master switch for managed local 
S3 access keys. |
   | `ozone.s3.accesskey.local.policy.enabled` | `false` | Enabling fails 
closed; the local policy evaluator lands in a follow-up. |
   | `ozone.s3.accesskey.insecure.cluster.admin.allowed` | `false` | Required 
for non-secure admin lifecycle operations. |
   
   With all three keys left at their defaults, this PR is a no-op for existing 
deployments.
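
   A minimal sketch of how the three flags in the table could combine,
   assuming a plain `Map<String, String>` stands in for the Ozone
   configuration object. The class and method names are hypothetical; only
   the key names and defaults come from the table above.

   ```java
   import java.util.Map;

   /** Hypothetical gating sketch; the real code reads Ozone configuration. */
   final class FeatureGateSketch {
     static final String ENABLED = "ozone.s3.accesskey.enabled";
     static final String LOCAL_POLICY = "ozone.s3.accesskey.local.policy.enabled";
     static final String INSECURE_ADMIN =
         "ozone.s3.accesskey.insecure.cluster.admin.allowed";

     /** Every key defaults to false when it is absent. */
     static boolean flag(Map<String, String> conf, String key) {
       return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
     }

     /** Master switch: with this off, the feature does nothing. */
     static boolean managedKeysActive(Map<String, String> conf) {
       return flag(conf, ENABLED);
     }

     /** Turning on local policy fails closed until the evaluator exists. */
     static boolean localPolicyFailsClosed(Map<String, String> conf) {
       return flag(conf, LOCAL_POLICY);
     }

     /** Non-secure clusters need the explicit opt-in for admin lifecycle ops. */
     static boolean adminLifecycleAllowed(Map<String, String> conf,
         boolean secureCluster) {
       return flag(conf, ENABLED)
           && (secureCluster || flag(conf, INSECURE_ADMIN));
     }
   }
   ```

   The sketch models only the flag checks; admin privilege checks and
   layout finalization are outside its scope.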
   
   ## Managed-key fallback behavior
   
   Managed-key lookup and failure behavior is intentionally split:
   
   - in secure clusters, a managed-key table miss falls back to the legacy 
`S3SecretManager` path;
   - a managed-key table hit with a terminal failure fails closed and does not 
fall back to legacy auth;
   - in non-secure managed-key mode, managed-key auth fails closed and does not 
fall back to arbitrary non-secure credentials.
   
   Terminal managed-key failures include:
   
   - disabled or expired keys;
   - inconsistent stored metadata, such as an `accessKeyId` mismatch in the 
stored row;
   - KMS provider absence or decrypt failure;
   - SigV4 signature mismatch;
   - `ozone.s3.accesskey.local.policy.enabled=true` or a stored non-empty 
`policyDocument`, until the evaluator lands;
   - table-lookup failures or unexpected runtime errors.
   
   Most terminal managed-key failures return `PERMISSION_DENIED`. The one 
exception is the pre-finalization layout case, which returns 
`NOT_SUPPORTED_OPERATION_PRIOR_FINALIZATION` because it is a 
server-state/layout-finalization failure rather than a data-path authorization 
failure.
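
   The miss/hit split above can be sketched as a small decision function.
   This is an illustrative model only: the enum and method names are
   hypothetical, and the pre-finalization case is omitted because it returns
   a different status code.

   ```java
   /** Hypothetical model of the managed-key fallback rules. */
   final class ManagedKeyFallbackSketch {
     enum Lookup { MISS, HIT_VALID, HIT_TERMINAL_FAILURE }
     enum Decision { AUTHENTICATED, FALL_BACK_TO_LEGACY, PERMISSION_DENIED }

     static Decision decide(Lookup lookup, boolean secureCluster) {
       if (!secureCluster) {
         // Non-secure managed-key mode fails closed, whatever the lookup says.
         return Decision.PERMISSION_DENIED;
       }
       switch (lookup) {
         case MISS:
           // Only a clean table miss may continue to the legacy path.
           return Decision.FALL_BACK_TO_LEGACY;
         case HIT_VALID:
           return Decision.AUTHENTICATED;
         default:
           // Disabled/expired key, bad metadata, KMS or signature failure, etc.
           return Decision.PERMISSION_DENIED;
       }
     }
   }
   ```

   The point of the split is that a key that exists but cannot be used
   never degrades into a weaker authentication path.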
   
   ## Secure-cluster requirement
   
   Managed local S3 access-key authentication on the data path requires a 
secure cluster. When the managed-key feature is enabled but the cluster is 
non-secure, managed-key authentication fails closed before arbitrary 
credentials are accepted.
   
   Admin lifecycle operations, such as create, disable, rotate, and delete, in 
a non-secure cluster require explicit opt-in via 
`ozone.s3.accesskey.insecure.cluster.admin.allowed=true`. The data-path 
fail-closed posture is independent of this knob.
   
   When the managed-key feature is disabled, the existing non-secure behavior 
is preserved.
   
   ## Local policy evaluation
   
   This PR does not implement local JSON policy evaluation.
   
   Until the evaluator is implemented in a follow-up:
   
   - `ozone.s3.accesskey.local.policy.enabled=true` fails closed;
   - stored non-empty `policyDocument` fails closed;
   - policy documents are not silently ignored.
   
   ## Identity model
   
   Managed S3 access-key authentication separates three identities:
   
   ```text
   credentialAccessKeyId
   s3NamespaceAccessId
   effectiveUser
   ```
   
   Where:
   
   - `credentialAccessKeyId` is the AWS access key ID from SigV4 and is used 
for audit fields;
   - `s3NamespaceAccessId` is the namespace-routing identity;
   - `effectiveUser` is the authorization principal.
   
   `effectiveUser` is set by the admin at managed-key creation time and stored 
as a field on `S3ManagedAccessKeyInfo`. It is immutable for the lifetime of the 
key.
   
   The split exists because authorization and namespace routing are separate 
concerns even in this initial implementation: `OmMetadataReader.checkAcls`, 
`OMClientRequest.getUserInfo`, and bucket-link resolution need the 
authorization principal, while the `OzoneManager.getS3VolumeContext` tenant 
lookup needs the access-ID-style routing identity. For now, 
`s3NamespaceAccessId = credentialAccessKeyId`, while `effectiveUser` remains 
separate and may differ. The call sites that consume each identity are already 
distinct.
   
   This also positions per-key policy and tenant-bound behavior to land without 
re-plumbing the auth path.
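
   To make the three roles concrete, here is a small value-class sketch.
   The three field names come from this description; the class name and
   accessor names are hypothetical and do not mirror
   `S3ManagedAccessKeyInfo`.

   ```java
   /** Hypothetical holder for the three identities of a managed-key request. */
   final class ManagedKeyIdentitySketch {
     private final String credentialAccessKeyId;
     private final String s3NamespaceAccessId;
     private final String effectiveUser;

     ManagedKeyIdentitySketch(String credentialAccessKeyId, String effectiveUser) {
       this.credentialAccessKeyId = credentialAccessKeyId;
       // Initial implementation: the routing identity equals the access key ID.
       this.s3NamespaceAccessId = credentialAccessKeyId;
       this.effectiveUser = effectiveUser;
     }

     /** Used for audit fields. */
     String auditAccessKeyId() {
       return credentialAccessKeyId;
     }

     /** Used for S3 volume / tenant lookup. */
     String namespaceRoutingId() {
       return s3NamespaceAccessId;
     }

     /** Used for ACL checks, user-info propagation, bucket-link resolution. */
     String authorizationPrincipal() {
       return effectiveUser;
     }
   }
   ```

   Keeping a separate accessor per role means a later change to how
   `s3NamespaceAccessId` is derived would not touch the authorization call
   sites.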
   
   ## Authorization model
   
   This PR does not add per-key policy enforcement.
   
   The intended authorization model is:
   
   ```text
   managed S3 access key
     -> effectiveUser
     -> existing Ozone authorization / ACL checks
   ```
   
   Therefore, if a managed key maps to `effectiveUser=alice`, the request is 
authorized as `alice` using the existing Ozone authorization path.
   
   The managed key is an indirection: many managed-key credentials can map to 
the same `effectiveUser`, can be issued and rotated independently, and can be 
expired or disabled without touching the underlying Ozone identity. Per-key 
JSON policy will narrow individual credentials further in a follow-up.
   
   ## HA freshness
   
   Managed-key authentication calls 
`OzoneManagerRatisUtils.checkLeaderStatus(...)` before any table lookup, so 
followers reject managed-key requests immediately and clients are routed to the 
leader. This avoids stale-read inconsistency on disabled, expired, or rotated 
keys.
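
   The ordering matters more than the mechanism: the leader check comes
   first, and the table is only read afterwards. A minimal sketch, assuming
   a boolean leader flag and an in-memory map in place of the real Ratis
   check and OM table (all names here are hypothetical):

   ```java
   import java.util.Map;

   /** Hypothetical leader-first lookup; not the PR's actual code path. */
   final class LeaderFirstLookupSketch {
     static String lookupEffectiveUser(boolean isLeader,
         Map<String, String> managedKeyTable, String accessKeyId) {
       if (!isLeader) {
         // Reject before touching the table so a follower never serves a
         // possibly stale row for a disabled, expired, or rotated key.
         throw new IllegalStateException("not the leader");
       }
       return managedKeyTable.get(accessKeyId);
     }
   }
   ```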
   
   ## Secret handling
   
   Decrypted managed secrets go through a byte-array SigV4 validation path, 
which avoids converting plaintext managed secrets to `String`.
   
   The byte-array validation path clears the caller's secret bytes, derived 
SigV4 signing byte arrays, and the expected-signature byte buffer in `finally` 
blocks. Perfect zeroization is bounded by Java/JCA defensive copies and 
immutable temporary strings, but mutable secret-bearing buffers under this code 
path are cleared.
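
   For reviewers less familiar with the pattern, the sketch below shows
   one way a byte-array SigV4 check with best-effort zeroization can look.
   It follows the standard AWS SigV4 signing-key derivation (an HMAC-SHA256
   chain over date, region, service, and `aws4_request`), but the class and
   method names are hypothetical and this is not the validator from the PR.
   It assumes the presented signature has already been decoded from hex to
   raw bytes.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.security.GeneralSecurityException;
   import java.security.MessageDigest;
   import java.util.Arrays;
   import javax.crypto.Mac;
   import javax.crypto.spec.SecretKeySpec;

   /** Hypothetical byte-array SigV4 check with best-effort zeroization. */
   final class ByteArraySigV4Sketch {
     private static byte[] utf8(String s) {
       return s.getBytes(StandardCharsets.UTF_8);
     }

     private static byte[] hmac(byte[] key, byte[] data) {
       try {
         Mac mac = Mac.getInstance("HmacSHA256");
         // SecretKeySpec keeps its own copy of the key; that copy is one of
         // the JCA defensive copies this code cannot clear.
         mac.init(new SecretKeySpec(key, "HmacSHA256"));
         return mac.doFinal(data);
       } catch (GeneralSecurityException e) {
         throw new IllegalStateException(e);
       }
     }

     private static void wipe(byte[]... buffers) {
       for (byte[] b : buffers) {
         if (b != null) {
           Arrays.fill(b, (byte) 0);
         }
       }
     }

     /** Computes the signature bytes; clears derived keys, not the secret. */
     static byte[] signature(byte[] secret, String date, String region,
         String service, String stringToSign) {
       byte[] prefix = utf8("AWS4");
       byte[] kSecret = new byte[prefix.length + secret.length];
       byte[] kDate = null;
       byte[] kRegion = null;
       byte[] kService = null;
       byte[] kSigning = null;
       try {
         System.arraycopy(prefix, 0, kSecret, 0, prefix.length);
         System.arraycopy(secret, 0, kSecret, prefix.length, secret.length);
         kDate = hmac(kSecret, utf8(date));
         kRegion = hmac(kDate, utf8(region));
         kService = hmac(kRegion, utf8(service));
         kSigning = hmac(kService, utf8("aws4_request"));
         return hmac(kSigning, utf8(stringToSign));
       } finally {
         wipe(kSecret, kDate, kRegion, kService, kSigning);
       }
     }

     /** Validates a presented signature and clears the caller's secret. */
     static boolean matches(byte[] secret, String date, String region,
         String service, String stringToSign, byte[] presented) {
       byte[] expected = null;
       try {
         expected = signature(secret, date, region, service, stringToSign);
         // Constant-time comparison of the two signatures.
         return MessageDigest.isEqual(expected, presented);
       } finally {
         wipe(secret, expected);
       }
     }
   }
   ```

   After `matches` returns, the caller's `secret` array is all zeros
   whether or not the signature matched, so the caller must not reuse it.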
   
   ## Context cleanup
   
   Managed S3 auth context is cleared after request processing, alongside the 
existing S3 auth and STS token thread-local cleanup.
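
   The cleanup pattern is the usual set / try / finally-remove sequence
   for thread-locals. A minimal sketch, with a hypothetical `ThreadLocal`
   holding only the effective user rather than the PR's real context object:

   ```java
   import java.util.function.Supplier;

   /** Hypothetical thread-local auth context with guaranteed cleanup. */
   final class AuthContextCleanupSketch {
     private static final ThreadLocal<String> MANAGED_AUTH = new ThreadLocal<>();

     static String current() {
       return MANAGED_AUTH.get();
     }

     static <T> T withContext(String effectiveUser, Supplier<T> handler) {
       MANAGED_AUTH.set(effectiveUser);
       try {
         return handler.get();
       } finally {
         // Runs on success and on exception, so a pooled handler thread
         // never carries one request's identity into the next request.
         MANAGED_AUTH.remove();
       }
     }
   }
   ```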
   
   ## Out of scope
   
   Out of scope for this PR, planned as follow-ups:
   
   - local JSON policy evaluation;
   - CLI command surface;
   - tenant-bound managed-key semantics;
   - broader E2E coverage;
   - additional operator UX around managed-key issuance;
   - explicit guard against tenant entries using managed-key access IDs.
   
   Out of scope and not planned in follow-ups under this JIRA:
   
   - S3G runtime serialization changes;
   - WebIdentity / HDDS-15273 changes;
   - changes to admin lifecycle/retrieval-handle semantics beyond this 
managed-key stack.
   
   ## Review focus
   
   Specific things I would appreciate review on:
   
   1. Target branch: should this go to `HDDS-13323-sts`, as a stacked 
feature-branch PR, rather than directly to `master`?
   2. Credential resolver order: `STS -> managed -> non-secure short-circuit -> 
legacy`. Is the precedence and fail-closed behavior at each step correct?
   3. The three-identity split: `credentialAccessKeyId` / `s3NamespaceAccessId` 
/ `effectiveUser`. Is each consumed correctly at the call sites that should use 
it — ACL checks, S3 volume/tenant lookup, bucket-link resolution, and request 
user-info propagation?
   4. Secret handling: is the byte-array validation path and best-effort 
zeroization sufficient given Java/JCA defensive-copy limitations?
   5. Local-policy fail-closed behavior pending the policy evaluator: is this 
acceptable as the initial posture?
   
   ## Tests
   
   The following checks passed:
   
   - 45 focused unit tests across `TestAWSV4AuthValidator`, 
`TestS3SecurityUtil`, `TestOzoneManagerS3Auth`, and 
`TestOMClientRequestWithUserInfo`.
   - 35 regression tests across `TestS3GetSecretRequest`, 
`TestS3AssumeRoleRequest`, and `TestS3RevokeSTSTokenRequest`.
   - OM checkstyle: clean.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

