[ 
https://issues.apache.org/jira/browse/CASSSIDECAR-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075279#comment-18075279
 ] 

Jon Haddad commented on CASSSIDECAR-415:
----------------------------------------

Previously, the restore job creation endpoint required static AWS credential 
fields to be present in the request payload. This worked well for environments 
using short-lived STS credentials but made the sidecar unusable in AWS 
environments where credentials are provided automatically via IAM instance 
profiles, ECS task roles, or IRSA — the standard AWS-recommended approach for 
workloads running on EC2, ECS, and EKS.

This patch adds a credentialType field to the CreateRestoreJobRequest payload. 
When set to IAM, the endpoint accepts region-only credentials (no key fields) 
and the sidecar delegates credential resolution to the AWS default credential 
chain via DefaultCredentialsProvider. When set to STATIC, or omitted entirely, 
the existing behavior is preserved: all three key fields are required and a 
StaticCredentialsProvider is used. This makes null equivalent to STATIC, 
ensuring full backwards compatibility with existing clients on trunk.
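
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: CredentialTypeSketch, Credentials, normalize() and validate() are hypothetical names, and the real checks live in the sidecar's request payload classes. It assumes that region is required in both modes, and that a request with no credentialType is treated as STATIC.

```java
import java.util.Objects;

/** Hypothetical sketch; these names are illustrative, not the sidecar's actual classes. */
final class CredentialTypeSketch {

    enum CredentialType { STATIC, IAM }

    /** Stand-in for one set of storage credentials in the request payload. */
    static final class Credentials {
        final String accessKeyId;
        final String secretAccessKey;
        final String sessionToken;
        final String region;

        Credentials(String accessKeyId, String secretAccessKey, String sessionToken, String region) {
            this.accessKeyId = accessKeyId;
            this.secretAccessKey = secretAccessKey;
            this.sessionToken = sessionToken;
            this.region = region;
        }

        boolean hasAnyKeyField() {
            return accessKeyId != null || secretAccessKey != null || sessionToken != null;
        }

        boolean hasAllKeyFields() {
            return accessKeyId != null && secretAccessKey != null && sessionToken != null;
        }
    }

    /** An omitted credentialType is treated as STATIC so existing clients keep working. */
    static CredentialType normalize(CredentialType requested) {
        return requested == null ? CredentialType.STATIC : requested;
    }

    /** Returns the effective type, or throws before anything would be persisted. */
    static CredentialType validate(CredentialType requested, Credentials credentials) {
        Objects.requireNonNull(credentials, "credentials must be provided");
        if (credentials.region == null) {
            throw new IllegalArgumentException("region is required for every credential type");
        }
        CredentialType type = normalize(requested);
        if (type == CredentialType.IAM && credentials.hasAnyKeyField()) {
            throw new IllegalArgumentException("credentialType IAM must not carry key fields");
        }
        if (type == CredentialType.STATIC && !credentials.hasAllKeyFields()) {
            throw new IllegalArgumentException("credentialType STATIC requires all three key fields");
        }
        return type;
    }

    public static void main(String[] args) {
        Credentials regionOnly = new Credentials(null, null, null, "us-west-2");
        Credentials full = new Credentials("id", "secret", "token", "us-west-2");
        System.out.println(validate(CredentialType.IAM, regionOnly)); // IAM
        System.out.println(validate(null, full));                     // STATIC
    }
}
```

In this sketch, a request that mixes the modes (key fields with IAM, or missing key fields with STATIC) fails with an IllegalArgumentException instead of being silently accepted.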

To handle the new field, the restore_job table is versioned to restore_job_v6, 
adding a credential_type text column. A null credential type is normalized to 
STATIC at the write boundary, so the column is never null. Secrets are 
serialized to blob_secrets exactly as before — for IAM jobs the key fields are 
null and only region is present, so StorageClientPool can derive the region 
from the stored credentials without any schema changes to that field. 
StorageClient continues to branch on hasStaticCredentials() at authentication 
time: static jobs get a per-job StaticCredentialsProvider as before, while IAM 
jobs share a singleton DefaultCredentialsProvider to avoid spawning a new 
thread pool per job. Validation is enforced at the API boundary — passing 
static credentials with credentialType: IAM, or omitting key fields with 
credentialType: STATIC, is rejected at deserialization time with a clear error 
message before the job is ever persisted.
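
As a rough illustration of the provider selection described above, the sketch below uses stand-in types so that it runs without the AWS SDK. CredentialsProvider, StaticProvider and DefaultChainProvider are hypothetical placeholders for the SDK's provider types, and providerFor() is an invented helper, not a method in StorageClient. It only shows the shape of the branch: a new provider per static job, one shared instance for all IAM jobs.

```java
/** Hypothetical sketch with stand-in types; not the sidecar's StorageClient. */
final class ProviderSelectionSketch {

    /** Stand-in for the AWS SDK credentials provider abstraction. */
    interface CredentialsProvider {
        String kind();
    }

    /** Stand-in for a provider wrapping one job's explicit session credentials. */
    static final class StaticProvider implements CredentialsProvider {
        private final String accessKeyId;
        private final String secretAccessKey;
        private final String sessionToken;

        StaticProvider(String accessKeyId, String secretAccessKey, String sessionToken) {
            this.accessKeyId = accessKeyId;
            this.secretAccessKey = secretAccessKey;
            this.sessionToken = sessionToken;
        }

        @Override
        public String kind() {
            return "static";
        }
    }

    /** Stand-in for a provider that resolves credentials from the environment. */
    static final class DefaultChainProvider implements CredentialsProvider {
        @Override
        public String kind() {
            return "default-chain";
        }
    }

    /** Holder idiom: the shared provider is created once, on first use. */
    private static final class SharedHolder {
        static final CredentialsProvider SHARED = new DefaultChainProvider();
    }

    /** Static jobs get their own provider; IAM jobs all reuse the shared one. */
    static CredentialsProvider providerFor(String accessKeyId, String secretAccessKey, String sessionToken) {
        boolean hasStaticCredentials =
            accessKeyId != null && secretAccessKey != null && sessionToken != null;
        if (hasStaticCredentials) {
            return new StaticProvider(accessKeyId, secretAccessKey, sessionToken);
        }
        return SharedHolder.SHARED;
    }

    public static void main(String[] args) {
        CredentialsProvider iamJobA = providerFor(null, null, null);
        CredentialsProvider iamJobB = providerFor(null, null, null);
        System.out.println(iamJobA == iamJobB); // true: IAM jobs share one instance
        System.out.println(providerFor("id", "secret", "token").kind()); // static
    }
}
```

Sharing one instance keeps the cost of the default chain constant regardless of how many IAM jobs are active. Static jobs cannot share a provider, because each one carries its own keys.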

A new docs/src/spark.adoc is included that explains both credential modes with 
flow diagrams and code examples.

> Support IAM instance profile credentials for S3 restore job downloads
> ---------------------------------------------------------------------
>
>                 Key: CASSSIDECAR-415
>                 URL: https://issues.apache.org/jira/browse/CASSSIDECAR-415
>             Project: Sidecar for Apache Cassandra
>          Issue Type: Improvement
>          Components: Bulk Analytics
>            Reporter: Jon Haddad
>            Priority: Major
>
> The restore job feature downloads SSTables from S3 using static AWS 
> credentials that the caller must supply via POST 
> /api/v1/{keyspace}/{table}/restore-jobs. The request body must include a 
> secrets object (RestoreJobSecrets) containing separate read and write 
> StorageCredentials, each requiring accessKeyId, secretAccessKey, 
> sessionToken, and region — all enforced as non-null in 
> StorageCredentials.java (lines 52–55) and RestoreJobSecrets.java (lines 
> 41–42).
> On job creation, RestoreJobDatabaseAccessor.create() (line 90) serializes the 
> secrets to JSON via Jackson and writes them as a raw blob to the blob_secrets 
> column of the restore_jobs table, defined in RestoreJobsSchema.java (line 
> 91). There is no encryption applied at the column, table, or application 
> level — the credentials are stored as plaintext JSON bytes. This leaks the 
> credentials to anyone with access to this table.
> Because multiple sidecar nodes process different slices of the same restore 
> job in parallel, each node reads the job back from Cassandra — including the 
> secrets — via
> RestoreJobDatabaseAccessor.find() (line 191), which deserializes them from 
> row.getBytes("blob_secrets") in {{RestoreJob.java}} (line 94). Each node then 
> passes the job to StorageClientPool.storageClient() (line 86), which extracts 
> the region from {{restoreJob.secrets.readCredentials().region()}} (line 88) 
> and calls StorageClient.authenticate(). Inside 
> {{StorageClient.Credentials.init()}} (lines 341–344), the credentials are 
> unconditionally converted to AwsSessionCredentials and wrapped in a 
> StaticCredentialsProvider, which is then injected into each S3 request via 
> overrideConfiguration(b -> b.credentialsProvider(...)) in both objectExists() 
> (line 145) and rangeGetObject() (line 237).
> This design contradicts AWS best practices. AWS explicitly recommends using 
> IAM roles over static credentials wherever possible. IAM roles — via EC2 
> instance profiles, ECS task roles, or EKS IRSA — eliminate the need to 
> create, distribute, store, rotate, or revoke long-lived credentials. The 
> current design forces users running in AWS to work against this guidance: 
> even if their nodes already have IAM-granted S3 access, they must still 
> obtain and manage static credentials to satisfy the mandatory 
> {{Objects.requireNonNull(secrets, ...)}} check in 
> {{CreateRestoreJobRequestPayload.java}} (line 101).
> Passing static credentials over the request and storing them in Cassandra 
> creates risk that IAM roles entirely avoid. 
> RestoreJobDatabaseAccessor.create() (line 90) writes the secrets as a plain 
> JSON blob into blob_secrets. The restore_jobs table schema 
> (RestoreJobsSchema.java line 91) has no encryption configuration — no 
> column-level encryption, no transparent data encryption, no application-level 
> crypto. The credentials sit as plaintext, replicated across every Cassandra 
> node holding that partition, and included in any Cassandra backups taken 
> during the job's lifetime.
> Credentials are also visible in logs on failure. StorageClient logs 
> credentials.readCredentials on S3 request failures in both 
> logCredentialOnRequestFailure() (line 298) and the failure mapper in
> rangeGetObject() (line 256). Although StorageCredentials.toString() redacts 
> the secret key and session token (line 94 of StorageCredentials.java), the 
> access key ID is logged in plaintext. This gives an adversary an identifier 
> to search for and potentially match to a leaked secret.
>
> *Proposed Solution*
> Make secrets optional throughout the restore job pipeline. When secrets are 
> absent, StorageClient should fall back to DefaultCredentialsProvider, which 
> implements the standard AWS credential chain: environment variables → system 
> properties → IAM instance profile → ECS task role → etc. This aligns the 
> sidecar with AWS best practices and allows operators running in
> AWS to use the credential model AWS recommends.
> StorageCredentials, RestoreJobSecrets, and CreateRestoreJobRequestPayload 
> need to permit null/absent credentials. The region must still be provided — 
> either inside the secrets object or as
> a new top-level field on the request — since it is required by 
> StorageClientPool.storageClient() (line 88) to construct the regional S3 
> endpoint.
> StorageClient.Credentials.init() (lines 339–345) should branch: use 
> StaticCredentialsProvider with AwsSessionCredentials when credentials are 
> present, use
> DefaultCredentialsProvider.create() when they are not. The 
> RestoreJobFatalException thrown when secrets are null (lines 331–334) should 
> be removed.
> RestoreJobDatabaseAccessor.create() (line 90) should skip writing 
> blob_secrets when secrets are null. RestoreJob.from() (line 94 of 
> RestoreJob.java) already handles a null blob_secrets
> column gracefully.
> API backward compatibility: Fully backward-compatible. Callers that currently 
> pass credentials continue to work unchanged.
>
> Acceptance Criteria
>  * secrets is optional in POST /api/v1/{keyspace}/{table}/restore-jobs; 
> existing clients with credentials continue to work unchanged
>  * When secrets is absent, StorageClient uses DefaultCredentialsProvider
>  * region is still required whether or not secrets are provided
>  * When using IAM mode, nothing is written to the blob_secrets column
>  * Integration test covering a restore job completing successfully without 
> explicit credentials
>  * Unit tests for StorageClient.Credentials covering both the static and IAM 
> credential paths
>
> Key Files to Modify
>  * client-common/.../common/data/StorageCredentials.java — make credential 
> fields optional
>  * client-common/.../common/data/RestoreJobSecrets.java — allow null 
> read/write credentials
>  * client-common/.../common/request/data/CreateRestoreJobRequestPayload.java 
> — remove null check on secrets; handle region when secrets are absent
>  * server/.../restore/StorageClient.java — branch on null credentials in 
> Credentials.init(); use DefaultCredentialsProvider as fallback; remove fatal 
> exception on null secrets
>  * server/.../restore/StorageClientPool.java — handle null secrets when 
> extracting region in storageClient()
>  * server/.../db/RestoreJob.java — handle null secrets throughout
>  * server/.../db/RestoreJobDatabaseAccessor.java — skip blob_secrets write 
> when secrets are null
>  * server/.../handlers/restore/CreateRestoreJobHandler.java — relax secrets 
> validation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
