Parth Chandra created SPARK-57252:
-------------------------------------

             Summary: SPIP: Cloud Credential Refresh and Distribution Without Kerberos
                 Key: SPARK-57252
                 URL: https://issues.apache.org/jira/browse/SPARK-57252
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.3.0
            Reporter: Parth Chandra
             Fix For: 4.3.0


*Problem* 

  Spark's delegation token infrastructure periodically refreshes credentials
and distributes them to executors, but its activation is unconditionally gated
on Hadoop Kerberos security. Cloud credential providers (AWS STS, GCP IAM,
internal identity systems) cannot participate in this mechanism without a KDC
or keytab, even though the distribution channel itself (the Credentials
container plus the UpdateDelegationTokens RPC) is already protocol-agnostic,
as the Kafka provider demonstrates.

  In deployments with hundreds of jobs and thousands of executors, each
executor independently performs multi-step authentication (SSO ->
authorization -> token service), which places unnecessary load on backend
identity services. Credentials expire (often within 1 hour), and without
centralized refresh, long-running jobs either fail or trigger thundering-herd
renewal storms.

  Three activation gates prevent the existing mechanism from running without
Kerberos:
  1. SupportsDelegationToken.setupTokenManager() is wrapped in
     if (UserGroupInformation.isSecurityEnabled)
  2. HadoopDelegationTokenManager.renewalEnabled checks for a keytab or a
     Kerberos TGT
  3. obtainDelegationTokens() calls all providers inside doLogin()/doAs(),
     which fails without Kerberos credentials
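
  The three gates can be modeled as a small sketch. This is an illustrative
Python model, not Spark code: the Env class, its fields, and blocking_gates()
are hypothetical names, and treating gates 2 and 3 as "no keytab and no TGT"
is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Env:
    """Hypothetical stand-in for the driver's security environment."""
    security_enabled: bool  # models UserGroupInformation.isSecurityEnabled
    has_keytab: bool
    has_tgt: bool


def blocking_gates(env: Env) -> List[int]:
    """Return the numbers of the gates (1-3) that stop credential refresh."""
    blocked = []
    # Gate 1: the token manager is only set up when Hadoop security is on.
    if not env.security_enabled:
        blocked.append(1)
    # Gate 2: renewal is only enabled with a keytab or a Kerberos TGT.
    if not (env.has_keytab or env.has_tgt):
        blocked.append(2)
    # Gate 3 (assumption): the login/doAs step fails without Kerberos
    # credentials, modeled here with the same condition as gate 2.
    if not (env.has_keytab or env.has_tgt):
        blocked.append(3)
    return blocked


# A cloud-only deployment with no KDC or keytab.
cloud_only = Env(security_enabled=False, has_keytab=False, has_tgt=False)
# A conventional Kerberized deployment with a keytab.
kerberized = Env(security_enabled=True, has_keytab=True, has_tgt=False)
```

  In this model a cloud-only deployment is stopped by every gate, while a
Kerberized deployment passes all three.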



*Proposed Solution*

  Introduce a DirectTokenProvider sub-trait of HadoopDelegationTokenProvider
with a requiresKerberos capability flag (default false). The existing
HadoopDelegationTokenManager partitions providers by type: Kerberos-dependent
providers are still called inside doAs() (unchanged behavior), while direct
providers are called without a login context. The activation gates are
relaxed so that they also trigger when direct providers are present.
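
  A minimal sketch of the partitioning idea follows, written in Python as a
language-neutral model rather than as Spark code. Provider, should_activate(),
obtain_all(), and run_as_login_user() are hypothetical names, and skipping
Kerberos-dependent providers when security is disabled is an assumption of
this sketch, not something the proposal text states.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Provider:
    """Hypothetical model of a token provider."""
    name: str
    requires_kerberos: bool
    obtain: Callable[[], Dict[str, str]]  # returns {alias: token}


def run_as_login_user(action: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Placeholder for a doAs()-style wrapper; here it just runs the action."""
    return action()


def should_activate(security_enabled: bool, providers: List[Provider]) -> bool:
    """Relaxed gate: Kerberos is on, or at least one direct provider exists."""
    return security_enabled or any(not p.requires_kerberos for p in providers)


def obtain_all(security_enabled: bool, providers: List[Provider]) -> Dict[str, str]:
    """Collect tokens, partitioning providers by their Kerberos requirement."""
    kerberos = [p for p in providers if p.requires_kerberos]
    direct = [p for p in providers if not p.requires_kerberos]
    creds: Dict[str, str] = {}
    if security_enabled:
        # Kerberos-dependent providers keep running inside the login context.
        for p in kerberos:
            creds.update(run_as_login_user(p.obtain))
    # Direct providers run without any login context.
    for p in direct:
        creds.update(p.obtain())
    return creds


# Example providers with made-up token values.
providers = [
    Provider("hdfs", True, lambda: {"hdfs": "kerberos-backed-token"}),
    Provider("sts", False, lambda: {"sts": "cloud-token"}),
]
```

  With these example providers and Kerberos disabled, should_activate()
returns True and obtain_all() returns only the "sts" entry. With Kerberos
enabled, both entries are collected.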

  This builds on the feedback from
[SPARK-38954|https://issues.apache.org/jira/browse/SPARK-38954] and
[PR #37558|https://github.com/apache/spark/pull/37558], which requested a
unified single-manager approach with a SPIP.

[SPIP Doc|https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.fyv6xh6uxv71]


