Simon Theodosius created HDDS-15273:
---------------------------------------

             Summary: [STS] Support AssumeRoleWithWebIdentity (OIDC) for 
temporary S3 credentials
                 Key: HDDS-15273
                 URL: https://issues.apache.org/jira/browse/HDDS-15273
             Project: Apache Ozone
          Issue Type: New Feature
          Components: documentation, OM, S3, Security
         Environment: Apache Ozone STS feature work based on the HDDS-13323 STS 
runtime branch.

Target runtime:
- Apache Ozone STS
- Ozone Manager
- S3 Gateway
- S3 clients using AWS SigV4 temporary credentials
- OIDC provider such as Keycloak
- Ranger or another configured Ozone authorizer as the policy decision point

The feature is disabled by default and is not environment-specific.
            Reporter: Simon Theodosius


Apache Ozone STS currently supports temporary S3 credentials through the STS 
AssumeRole flow. This issue adds OIDC/WebIdentity support to Ozone STS, 
allowing clients to exchange a Keycloak/OIDC JWT for temporary S3 credentials 
via an AssumeRoleWithWebIdentity flow.

The feature is gated behind ozone.sts.web.identity.enabled and is disabled by 
default. There is no behavior change on upgrade unless the feature is 
explicitly enabled.

This work is related to the STS umbrella work in HDDS-13323.

High-level architecture:

- Keycloak/OIDC authenticates the caller.
- Ozone STS validates the OIDC JWT on the OM side.
- Ozone STS issues temporary S3 credentials:
  - AccessKeyId
  - SecretAccessKey
  - SessionToken
  - Expiration
- Subsequent S3 requests use normal AWS SigV4 with x-amz-security-token.
- Ranger or the configured Ozone authorizer remains the authorization / policy 
decision point.
- Keycloak groups and roles are used only as identity attributes, not as final 
bucket/object authorization decisions.
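
As a rough illustration of the client side of this flow, the sketch below builds
an AWS-style AssumeRoleWithWebIdentity form body using only the JDK. It assumes
the Ozone endpoint accepts the same parameter names as the AWS STS Query API
(Action, Version, RoleArn, RoleSessionName, WebIdentityToken). The class name,
the method name, and any role ARN passed in are hypothetical and not taken from
the Ozone code base.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative only: builds an AWS-style STS form body. Not Ozone code. */
final class WebIdentityRequestSketch {

  /** Encodes the request parameters as application/x-www-form-urlencoded. */
  static String formBody(String roleArn, String sessionName, String jwt) {
    // Parameter names follow the AWS STS Query API; Ozone may differ (assumption).
    Map<String, String> params = new LinkedHashMap<>();
    params.put("Action", "AssumeRoleWithWebIdentity");
    params.put("Version", "2011-06-15");
    params.put("RoleArn", roleArn);
    params.put("RoleSessionName", sessionName);
    params.put("WebIdentityToken", jwt);

    StringJoiner body = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      body.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return body.toString();
  }
}
```

The returned AccessKeyId, SecretAccessKey, and SessionToken would then be handed
to a regular S3 client, which signs with SigV4 and sends the session token in
x-amz-security-token.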

Authorization model:

- OIDC/WebIdentity is authentication only.
- Ranger / configured Ozone authorizer remains responsible for authorization.
- The default IAccessAuthorizer implementation fails closed for WebIdentity by 
returning NOT_SUPPORTED_OPERATION.
- A production deployment requires a WebIdentity-capable Ranger/Ozone 
authorizer implementation of 
generateAssumeRoleWithWebIdentitySessionPolicy(...).
- Without such an authorizer, AssumeRoleWithWebIdentity does not issue 
credentials.
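
The fail-closed default described above could look roughly like the following.
This is a sketch with stand-in types: the parameter list of
generateAssumeRoleWithWebIdentitySessionPolicy is elided in this issue, so the
two String parameters, the exception class, and the issue() helper are invented
here purely for illustration.

```java
/** Hypothetical stand-ins; the real Ozone types and signatures may differ. */
final class FailClosedAuthorizerSketch {

  /** Stand-in for an error carrying the NOT_SUPPORTED_OPERATION result code. */
  static final class NotSupportedException extends RuntimeException {
    NotSupportedException(String msg) {
      super(msg);
    }
  }

  /** Stand-in for the authorizer interface (parameter list is an assumption). */
  interface Authorizer {
    /** Default behaviour: refuse, so no credentials can be issued. */
    default String generateAssumeRoleWithWebIdentitySessionPolicy(
        String roleArn, String user) {
      throw new NotSupportedException("NOT_SUPPORTED_OPERATION");
    }
  }

  /** Credentials are produced only if the authorizer returned a policy. */
  static String issue(Authorizer authorizer, String roleArn, String user) {
    String policy =
        authorizer.generateAssumeRoleWithWebIdentitySessionPolicy(roleArn, user);
    return "credentials-bound-to:" + policy;
  }
}
```

An authorizer that does not override the method causes issue() to throw before
any credential material is created. Only an implementation that deliberately
overrides it can let the request proceed.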

Scope:

- Add an OIDC/JWKS validation module.
- Add STS-focused config keys under ozone.sts.web.identity.*.
- Add AssumeRoleWithWebIdentity request/response path.
- Validate raw WebIdentityToken in OM preExecute().
- Strip raw JWT before Ratis replication.
- Persist only sanitized identity/session data and a token fingerprint.
- Extend the STS token identity model for WEB_IDENTITY while preserving legacy 
AssumeRole compatibility.
- Reuse the existing STS temporary credential validation path for S3 requests.
- Add mini-cluster E2E test:
  generated JWT + JWKS -> STS temporary credentials -> AWS SDK v2 S3 request.
- Add Keycloak Testcontainers integration test.
- Add documentation for Keycloak + Ranger/Ozone authorizer usage.
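
To make the "strip raw JWT, persist only a fingerprint" items concrete, here is
a minimal JDK-only sketch. SanitizedRequest, sanitize(), and sha256Hex() are
hypothetical names. The real OM request types are protobuf-based, and the
fingerprint encoding (hex here) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: what could survive into the replicated request. */
final class TokenSanitizerSketch {

  /** Carries validated identity data and a fingerprint, never the raw JWT. */
  static final class SanitizedRequest {
    final String roleArn;
    final String subject;
    final String tokenFingerprint;

    SanitizedRequest(String roleArn, String subject, String tokenFingerprint) {
      this.roleArn = roleArn;
      this.subject = subject;
      this.tokenFingerprint = tokenFingerprint;
    }

    @Override
    public String toString() {
      return "SanitizedRequest{roleArn=" + roleArn + ", subject=" + subject
          + ", tokenFingerprint=" + tokenFingerprint + "}";
    }
  }

  /** Lower-case hex SHA-256 of the raw token bytes. */
  static String sha256Hex(String rawJwt) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(rawJwt.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is unexpected.
      throw new IllegalStateException(e);
    }
  }

  /** Called after validation; the raw JWT is dropped once it has been hashed. */
  static SanitizedRequest sanitize(String roleArn, String subject, String rawJwt) {
    return new SanitizedRequest(roleArn, subject, sha256Hex(rawJwt));
  }
}
```

Because the sanitized object never holds the raw token, neither replication nor
toString() can leak it.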

Backward compatibility:

- Existing AssumeRole flow is unchanged.
- Existing permanent S3 secret flow is unchanged.
- Existing S3 SigV4 behavior is unchanged.
- STSTokenIdentifier adds optional WebIdentity-specific fields and an AuthType.
- Previously-issued AssumeRole tokens remain valid.
- Tokens serialized without AuthType deserialize as ASSUME_ROLE, preserving 
compatibility with previously-issued tokens.
- ozone.sts.web.identity.enabled defaults to false.
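
The legacy-token rule above amounts to treating a missing AuthType as
ASSUME_ROLE. A minimal sketch, assuming for illustration that the field arrives
as a nullable string (the real identifier is presumably a binary or protobuf
encoding), with hypothetical class and method names:

```java
/** Illustrative only: defaulting a missing AuthType for legacy tokens. */
final class AuthTypeCompatSketch {

  enum AuthType { ASSUME_ROLE, WEB_IDENTITY }

  /**
   * Maps the serialized value to an AuthType. Tokens written before the
   * field existed carry no value and are treated as ASSUME_ROLE.
   */
  static AuthType fromSerialized(String value) {
    if (value == null || value.isEmpty()) {
      return AuthType.ASSUME_ROLE;
    }
    // Unknown values throw IllegalArgumentException, i.e. they are rejected.
    return AuthType.valueOf(value);
  }
}
```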

Security properties:

- Raw JWT is validated only on the OM leader in preExecute().
- Raw JWT is stripped before Ratis replication.
- Raw JWT is not stored in STSTokenIdentifier.
- Raw JWT is not logged or exposed in toString()/exception messages.
- Only a SHA-256 token fingerprint and sanitized identity claims are persisted.
- alg=none is rejected.
- JWT signatures are validated through JWKS.
- Only RSA-family signature algorithms are allowlisted.
- issuer, audience, exp, nbf, and iat are validated with bounded clock skew.
- JWKS fetch is bounded with connect timeout, read timeout, and response size 
limit.
- Unknown-kid refresh is debounced and fails closed.
- Insecure HTTP issuer/JWKS URI is disabled by default.
- Enabling insecure HTTP for tests requires explicit 
ozone.sts.web.identity.allow.insecure.http.for.tests and emits an OM startup 
warning.
- Temporary credentials require x-amz-security-token.
- Non-canonical STS session-token encodings fail closed.
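
One way to read "unknown-kid refresh is debounced and fails closed" is sketched
below: an unknown key id may trigger at most one JWKS refetch per interval, and
the token is rejected if the key is still unknown afterwards. The class, its
fields, and the Supplier-based fetcher are hypothetical. The actual
implementation is built on Nimbus JOSE + JWT and may work differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Illustrative only: debounced refresh of known JWKS key ids. */
final class JwksKidCacheSketch {
  private final Supplier<Set<String>> fetcher; // stands in for a bounded JWKS fetch
  private final Duration minRefreshInterval;
  private Set<String> knownKids = new HashSet<>();
  private Instant lastRefresh; // null until the first fetch attempt
  int fetchCount;              // exposed only to make the behaviour observable

  JwksKidCacheSketch(Supplier<Set<String>> fetcher, Duration minRefreshInterval) {
    this.fetcher = fetcher;
    this.minRefreshInterval = minRefreshInterval;
  }

  /** Returns true only if the kid is known, refreshing at most once per interval. */
  synchronized boolean isKnown(String kid, Instant now) {
    if (knownKids.contains(kid)) {
      return true;
    }
    boolean mayRefresh = lastRefresh == null
        || Duration.between(lastRefresh, now).compareTo(minRefreshInterval) >= 0;
    if (mayRefresh) {
      lastRefresh = now; // recorded even if the fetch fails, to bound retries
      fetchCount++;
      try {
        knownKids = new HashSet<>(fetcher.get());
      } catch (RuntimeException e) {
        // Fetch failed: keep the previous key set and fall through.
      }
    }
    return knownKids.contains(kid); // still unknown means reject (fail closed)
  }
}
```

With this shape, a burst of tokens carrying bogus kid values cannot force more
than one outbound JWKS request per interval.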

Configuration keys:

- ozone.sts.web.identity.enabled
- ozone.sts.web.identity.issuer.uri
- ozone.sts.web.identity.jwks.uri
- ozone.sts.web.identity.audience
- ozone.sts.web.identity.username.claim
- ozone.sts.web.identity.subject.claim
- ozone.sts.web.identity.groups.claim
- ozone.sts.web.identity.roles.claim
- ozone.sts.web.identity.clock.skew
- ozone.sts.web.identity.jwks.refresh.interval
- ozone.sts.web.identity.jwks.connect.timeout
- ozone.sts.web.identity.jwks.read.timeout
- ozone.sts.web.identity.jwks.size.limit
- ozone.sts.web.identity.require.https
- ozone.sts.web.identity.allow.insecure.http.for.tests
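
For orientation, a minimal ozone-site.xml fragment enabling the feature might
look like the following. Only the key names come from this issue. The host
name, realm, and audience are placeholders, and the JWKS path follows
Keycloak's usual realm layout, which should be checked against the actual
deployment.

```xml
<!-- Sketch only: every value below is a placeholder. -->
<property>
  <name>ozone.sts.web.identity.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ozone.sts.web.identity.issuer.uri</name>
  <value>https://keycloak.example.com/realms/ozone</value>
</property>
<property>
  <name>ozone.sts.web.identity.jwks.uri</name>
  <value>https://keycloak.example.com/realms/ozone/protocol/openid-connect/certs</value>
</property>
<property>
  <name>ozone.sts.web.identity.audience</name>
  <value>ozone-s3</value>
</property>
```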

New dependency:

- Nimbus JOSE + JWT is used for OIDC/JWT/JWKS validation.
- Dependency and transitive licensing must be verified as part of the normal 
ASF dependency review.

Non-goals:

- This does not replace Kerberos daemon authentication.
- This does not add OFS OIDC login.
- This does not add CLI device-code login.
- This does not make Keycloak the Ozone policy engine.
- This does not replace Ranger or Native ACL authorization.
- This does not implement Keycloak Authorization Services as the object-store 
PDP.
- This does not provide a fully Kerberos-free secure Ozone cluster.

Test coverage:

- Unit tests in hadoop-ozone/common for OIDC/JWT/JWKS validation and request 
models.
- Unit tests in hadoop-ozone/s3gateway for /sts routing, auth bypass gating, 
and parser hardening.
- Unit tests in hadoop-ozone/ozone-manager for OM runtime, STS token model, 
legacy token compatibility, and redaction.
- Mini-cluster E2E test in hadoop-ozone/integration-test-s3:
  generated JWT + local JWKS -> AssumeRoleWithWebIdentity -> temporary S3 
credentials -> AWS SDK v2 S3 SigV4 request.
- Keycloak Testcontainers integration test:
  real Keycloak JWT -> Ozone STS -> temporary S3 credentials -> allowed S3 
operation succeeds / denied operation fails.
- The Keycloak Testcontainers IT requires Docker. It is an integration test and 
is not intended to gate unit-test runs in environments without Docker.

Design and operator documentation:

- Design and operator guide added as part of this work:
  hadoop-hdds/docs/content/security/OzoneSTSWebIdentityKeycloakRanger.md

Acceptance criteria:

- Feature is disabled by default.
- Existing AssumeRole tests continue to pass.
- Existing permanent S3 secret path continues to pass.
- Existing S3 SigV4 path continues to pass.
- OIDC JWT validation rejects:
  - alg=none
  - expired token
  - wrong issuer
  - wrong audience
  - wrong signature
  - tampered claims
  - missing required claims
- Raw WebIdentityToken is not serialized into Ratis-bound requests.
- Ratis replay does not call OIDC/JWKS/Keycloak.
- Temporary credentials issued through WebIdentity can be used with AWS SigV4 
and x-amz-security-token.
- Missing/wrong/expired session token fails closed.
- A denied authorizer/Ranger decision results in no credentials being issued.
- Deployments without a WebIdentity-capable authorizer fail closed.
- Keycloak Testcontainers IT passes in an environment where 
Docker/Testcontainers are available.
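
The claim-level rejections listed above (expired token, wrong issuer, wrong
audience, missing required claims) can be pictured with the JDK-only sketch
below. It covers the claim checks only, not signature verification, which the
issue delegates to Nimbus JOSE + JWT. The Claims holder, the method name, the
rejection strings, and the choice of iss/aud/exp as the required claims are
assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

/** Illustrative only: claim checks with a bounded clock skew. */
final class ClaimCheckSketch {

  /** Already-parsed claims; null means the claim was absent. */
  static final class Claims {
    final String issuer;
    final List<String> audience;
    final Instant expiresAt;
    final Instant notBefore;
    final Instant issuedAt;

    Claims(String issuer, List<String> audience, Instant expiresAt,
        Instant notBefore, Instant issuedAt) {
      this.issuer = issuer;
      this.audience = audience;
      this.expiresAt = expiresAt;
      this.notBefore = notBefore;
      this.issuedAt = issuedAt;
    }
  }

  /** Returns a rejection reason, or empty if every check passes. */
  static Optional<String> rejectReason(Claims c, String expectedIssuer,
      String expectedAudience, Instant now, Duration skew) {
    if (c.issuer == null || c.audience == null || c.expiresAt == null) {
      return Optional.of("missing required claim");
    }
    if (!c.issuer.equals(expectedIssuer)) {
      return Optional.of("wrong issuer");
    }
    if (!c.audience.contains(expectedAudience)) {
      return Optional.of("wrong audience");
    }
    if (now.isAfter(c.expiresAt.plus(skew))) {
      return Optional.of("expired");
    }
    if (c.notBefore != null && now.isBefore(c.notBefore.minus(skew))) {
      return Optional.of("not yet valid");
    }
    if (c.issuedAt != null && c.issuedAt.isAfter(now.plus(skew))) {
      return Optional.of("issued in the future");
    }
    return Optional.empty();
  }
}
```

The skew is applied in the token's favour on each time check, so a token that
expired a few seconds ago is still accepted while one outside the bound is not.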



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
