Parth Chandra created SPARK-57252:
-------------------------------------
Summary: SPIP: Cloud Credential Refresh and Distribution Without Kerberos
Key: SPARK-57252
URL: https://issues.apache.org/jira/browse/SPARK-57252
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.3.0
Reporter: Parth Chandra
Fix For: 4.3.0
*Problem*
Spark's delegation token infrastructure periodically refreshes credentials
and distributes them to executors, but its activation is unconditionally
gated on Hadoop Kerberos security. Cloud credential providers (AWS STS, GCP
IAM, internal identity systems) cannot participate in this mechanism without
a KDC or keytab, even though the distribution channel itself (the Credentials
container plus the UpdateDelegationTokens RPC) is already protocol-agnostic,
as the Kafka provider demonstrates.
In deployments with hundreds of jobs and thousands of executors, having each
executor independently perform multi-step authentication (SSO ->
authorization -> token service) places unnecessary load on backend identity
services. Credentials expire (often within 1 hour), and without centralized
refresh, long-running jobs either fail or trigger thundering-herd renewal
storms.
Three activation gates prevent the existing mechanism from running without
Kerberos:
1. SupportsDelegationToken.setupTokenManager() is wrapped in
   if (UserGroupInformation.isSecurityEnabled).
2. HadoopDelegationTokenManager.renewalEnabled checks for a keytab or a
   Kerberos TGT.
3. obtainDelegationTokens() calls all providers inside doLogin()/doAs(),
   which fails without Kerberos credentials.
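
The combined effect of the three gates can be illustrated with a small
standalone model. This is only a sketch of the conditions as described above,
not the actual Spark source: SecurityEnv and every function name in it are
hypothetical stand-ins, and each real check is reduced to a boolean.

```scala
// Simplified model of the three gates; NOT the actual Spark implementation.
object ActivationGateSketch {
  // Hypothetical snapshot of the security environment of a Spark application.
  final case class SecurityEnv(
      kerberosEnabled: Boolean, // stands in for UserGroupInformation.isSecurityEnabled
      hasKeytab: Boolean,
      hasTgt: Boolean)

  // Gate 1: the token manager is only set up when Hadoop security is enabled.
  def tokenManagerCreated(env: SecurityEnv): Boolean = env.kerberosEnabled

  // Gate 2: renewal is only enabled with a keytab or a Kerberos TGT.
  def renewalEnabled(env: SecurityEnv): Boolean = env.hasKeytab || env.hasTgt

  // Gate 3: providers run inside a login context, which needs Kerberos credentials.
  def canObtainTokens(env: SecurityEnv): Boolean = env.hasKeytab || env.hasTgt

  // Periodic refresh and distribution happen only if all three gates pass.
  def refreshRuns(env: SecurityEnv): Boolean =
    tokenManagerCreated(env) && renewalEnabled(env) && canObtainTokens(env)

  // A cloud-only deployment: no KDC, no keytab, no TGT.
  val cloudOnly: SecurityEnv = SecurityEnv(kerberosEnabled = false, hasKeytab = false, hasTgt = false)

  // A kerberized deployment with a keytab.
  val kerberized: SecurityEnv = SecurityEnv(kerberosEnabled = true, hasKeytab = true, hasTgt = false)
}
```

In this model, refreshRuns(cloudOnly) is false regardless of which cloud
providers are configured, which is the limitation the SPIP targets.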
*Proposed Solution*
Introduce a DirectTokenProvider sub-trait of HadoopDelegationTokenProvider
with a requiresKerberos capability flag (default false). The existing
HadoopDelegationTokenManager partitions providers by type: Kerberos-dependent
providers are called inside doAs() (unchanged behavior), while direct
providers are called without a login context. The activation gates are
relaxed so that they also trigger when direct providers are present.
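
One possible shape of this partitioning is sketched below. It is a
self-contained toy model under stated assumptions, not the proposed patch:
TokenProvider stands in for HadoopDelegationTokenProvider, and
runInLoginContext, obtainAll, shouldActivate and the two example providers
are hypothetical names. Tokens are plain strings, and Kerberos-dependent
providers are simply skipped when no login context is available.

```scala
// Toy model of the proposed provider partitioning; names are illustrative only.
object DirectProviderSketch {
  // Stand-in for HadoopDelegationTokenProvider, reduced to what the sketch needs.
  trait TokenProvider {
    def serviceName: String
    def obtainToken(): String
  }

  // Assumed shape of the proposed sub-trait: the flag defaults to false.
  trait DirectTokenProvider extends TokenProvider {
    def requiresKerberos: Boolean = false
  }

  // Stand-in for running a block inside doAs(); fails without Kerberos credentials.
  def runInLoginContext[T](kerberosAvailable: Boolean)(body: => T): T = {
    require(kerberosAvailable, "no Kerberos credentials for a login context")
    body
  }

  // Partition by type: Kerberos-dependent providers run inside the login
  // context, direct providers run without one.
  def obtainAll(providers: Seq[TokenProvider], kerberosAvailable: Boolean): Map[String, String] = {
    val (kerberos, direct) = providers.partition {
      case d: DirectTokenProvider => d.requiresKerberos
      case _ => true
    }
    val kerberosTokens: Map[String, String] =
      if (kerberosAvailable) {
        runInLoginContext(kerberosAvailable) {
          kerberos.map(p => p.serviceName -> p.obtainToken()).toMap
        }
      } else {
        Map.empty // simplification: skip Kerberos providers when there is no login context
      }
    val directTokens = direct.map(p => p.serviceName -> p.obtainToken()).toMap
    kerberosTokens ++ directTokens
  }

  // Relaxed gate: activate when Kerberos is enabled OR a direct provider is present.
  def shouldActivate(kerberosEnabled: Boolean, providers: Seq[TokenProvider]): Boolean =
    kerberosEnabled || providers.exists {
      case d: DirectTokenProvider => !d.requiresKerberos
      case _ => false
    }

  // Hypothetical example providers.
  val stsProvider: TokenProvider = new DirectTokenProvider {
    def serviceName: String = "sts"
    def obtainToken(): String = "sts-token"
  }
  val hdfsProvider: TokenProvider = new TokenProvider {
    def serviceName: String = "hdfs"
    def obtainToken(): String = "hdfs-token"
  }
}
```

With these definitions, a non-Kerberos deployment that configures only
stsProvider activates the manager and obtains the "sts" token, while a
deployment with only hdfsProvider and no Kerberos behaves as it does today.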
This builds on the feedback from
[SPARK-38954|https://issues.apache.org/jira/browse/SPARK-38954] and
[PR #37558|https://github.com/apache/spark/pull/37558], which requested a
unified single-manager approach with a SPIP.
[SPIP Doc|https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.fyv6xh6uxv71]