Re: [DISCUSS] SPIP: OIDC Credential Propagation

Kousuke Saruta Sat, 06 Jun 2026 09:36:22 -0700

Hi Parth,

Thank you for the thoughtful response. I think the incremental approach
(your path 1) might be feasible. Our proposals are complementary and
independent. They address different problems and can proceed in parallel
without blocking each other.
Your DirectTokenProvider unblocks non-Kerberos credential providers in the
existing HadoopDelegationTokenManager mechanism. This solves the
immediate Kerberos-gating problem for environments where pod-level identity is
unavailable or insufficient.
My SPIP introduces per-user/per-session identity propagation with a
separate manager and RPC, targeting the case where executors need
credentials derived from a user identity they cannot obtain themselves.
Neither depends on the other landing first. They share no code paths
(different manager, different RPC, and different SPI).
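
To make the separation concrete, here is a rough sketch of how I read the
DirectTokenProvider routing. TokenProvider, TokenManager, the kerberosEnabled
flag and the doAs function parameter are simplified, hypothetical stand-ins,
not Spark's actual HadoopDelegationTokenProvider / HadoopDelegationTokenManager
signatures (which take SparkConf, a Hadoop Configuration and Credentials):

```scala
// Simplified, hypothetical stand-ins for Spark's token provider SPI.
trait TokenProvider {
  def serviceName: String
  def obtainTokens(): Map[String, String]
}

// Marker sub-trait: providers that mint credentials on their own
// (e.g. from a cloud token service) and must not run inside doAs().
trait DirectTokenProvider extends TokenProvider

final class TokenManager(
    providers: Seq[TokenProvider],
    kerberosEnabled: Boolean,
    doAs: (() => Map[String, String]) => Map[String, String]) {

  def obtainAll(): Map[String, Map[String, String]] =
    providers.collect {
      // Direct providers are called as-is, with or without Kerberos.
      case d: DirectTokenProvider => d.serviceName -> d.obtainTokens()
      // Classic providers keep the doAs() path and the Kerberos gate.
      case p if kerberosEnabled => p.serviceName -> doAs(() => p.obtainTokens())
    }.toMap
}

// Usage: an S3A-style direct provider next to a classic HDFS-style one.
val s3a = new DirectTokenProvider {
  val serviceName = "s3a"
  def obtainTokens() = Map("fs.s3a.session.token" -> "example")
}
val hdfs = new TokenProvider {
  val serviceName = "hdfs"
  def obtainTokens() = Map("delegation.token" -> "example")
}
val noKerberos = new TokenManager(Seq(s3a, hdfs), kerberosEnabled = false, f => f())
println(noKerberos.obtainAll().keySet) // Set(s3a)
```

In this sketch, turning Kerberos off only drops the classic providers; the
direct provider still runs, and nothing about it touches a separate manager
or RPC.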


Regarding the 2022 review feedback (sorry, I didn't know about that), the
constraints have shifted since then. Per-user identity propagation and
Spark Connect multi-tenancy require expressing per-session identity, but
Spark's current use of UGI is process-wide, so per-session scoping would
require fundamental changes to how Spark interacts with UGI. The rejection
of UGI in Appendix C of my SPIP doc reflects these new requirements.
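
A toy illustration of the process-wide vs. per-session distinction.
ProcessWideHolder is only an analogy for a single JVM-wide identity slot, not
UGI's API; SessionCredential and SessionCredentialStore are hypothetical
names, and sessions are assumed to be keyed by a Spark Connect session id
string:

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical credential type for illustration only.
final case class SessionCredential(user: String, properties: Map[String, String])

// Analogy only (not UGI's API): a single slot shared by the whole JVM,
// so whichever session writes last determines everyone's identity.
object ProcessWideHolder {
  @volatile private var current: Option[SessionCredential] = None
  def set(c: SessionCredential): Unit = current = Some(c)
  def get: Option[SessionCredential] = current
}

// Per-session scoping: each session id maps to its own credential.
final class SessionCredentialStore {
  private val bySession = TrieMap.empty[String, SessionCredential]
  def put(sessionId: String, c: SessionCredential): Unit = bySession.update(sessionId, c)
  def get(sessionId: String): Option[SessionCredential] = bySession.get(sessionId)
  def remove(sessionId: String): Unit = bySession -= sessionId
}

val store = new SessionCredentialStore
store.put("session-a", SessionCredential("alice", Map.empty))
store.put("session-b", SessionCredential("bob", Map.empty))
println(store.get("session-a").map(_.user)) // Some(alice)
```

The point is only that per-session identity needs a lookup keyed by session,
which a single process-wide identity cannot express.
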
Regarding binary payloads in ServiceCredential, for the initial
implementation, Base64 encoding within Map[String, String] is sufficient
for the S3A use case. Since the SPI is annotated @DeveloperApi, we can add
a byte[] field or richer payload type in a future release if concrete
integrations require binary credentials. I'd prefer to keep the initial
surface small and evolve based on real demand.
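
For the binary case, a minimal sketch of carrying an opaque payload as Base64
inside the string map, using java.util.Base64. The ServiceCredential case
class here is a simplified stand-in for the SPIP's type, and the
"payload.base64" key is a hypothetical convention, not something the SPIP
defines:

```scala
import java.util.Base64

// Simplified stand-in for the SPIP's ServiceCredential (fields assumed).
final case class ServiceCredential(properties: Map[String, String])

object BinaryPayload {
  // Hypothetical key name; the SPIP does not define one.
  val Key = "payload.base64"

  // Encode opaque bytes (e.g. a serialized token) into the string map.
  def wrap(bytes: Array[Byte], props: Map[String, String] = Map.empty): ServiceCredential =
    ServiceCredential(props + (Key -> Base64.getEncoder.encodeToString(bytes)))

  // Decode the payload back on the executor side, if present.
  def unwrap(cred: ServiceCredential): Option[Array[Byte]] =
    cred.properties.get(Key).map(s => Base64.getDecoder.decode(s))
}

val cred = BinaryPayload.wrap(Array[Byte](0, -1, 42), Map("fs.s3a.access.key" -> "example"))
println(cred.properties(BinaryPayload.Key)) // AP8q
```

Base64 inflates the payload by about a third (4 output characters per 3 input
bytes), which should be acceptable for token-sized blobs; a dedicated byte[]
field would remove that overhead if it ever matters.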

Best,
Kousuke

On Sat, Jun 6, 2026 at 2:40 Parth Chandra <[email protected]> wrote:

>   Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation
>
>   Hi Kousuke,
>
>   Thanks for putting this together. As the author of the
> original SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem
> getting formal attention — it's been a real gap for cloud-native Spark
> deployments.
>
>   I think we're aligned on the problem but differ on scope. Your proposal
> addresses identity-aware credential propagation (per-user authorization,
> audit trails, Spark Connect multi-tenancy). That's a compelling long-term
> direction. The problem I was trying to solve in PR #37558 [4] is narrower:
> enable non-Kerberos credential providers to participate in the existing
> distribution mechanism, which is already provider-agnostic (as the Kafka
> provider demonstrates) but gated on Kerberos activation.
>
>   After the review feedback on PR #37558 [4] (specifically, the direction
> that we should use a single auth-agnostic manager and UGI as the
> container), I've been working on a minimal approach, SPARK-57252 [1][2]: a
> DirectTokenProvider sub-trait of the existing
> HadoopDelegationTokenProvider, with routing logic inside the existing
> HadoopDelegationTokenManager to call direct providers without doAs(). This
> requires ~80 lines of changes to existing code, with no new manager, no
> new RPC message, and no new credential store. It follows the review
> feedback from PR #37558 [4] exactly.
>
>   I see two paths forward and am happy with either:
>
>   1. *Incremental*: The minimal DirectTokenProvider change, SPARK-57252
> [1][2], lands first, unblocking the immediate use case (driver-mediated
> credential refresh without Kerberos). Your UserCredentialManager and
> identity-aware architecture can then build on top of it, or alongside it,
> when the broader scope (Spark Connect, per-user identity, multi-cloud) is
> ready. The two aren't mutually exclusive.
>   2. *Unified*: If the community prefers to solve the full identity
> propagation problem in one shot, I'd be glad to collaborate on your
> proposal. In that case I'd suggest we address the relationship to the 2022
> review feedback explicitly — specifically the preference for a single
> manager and UGI as a container. Your Appendix C rejects that direction; it
> would strengthen the proposal to explain why the constraints have changed
> (Spark Connect multi-tenancy, per-user identity requirements that UGI
> cannot express).
>
>   One technical observation: your proposal's CredentialProvider.resolve()
> returns a ServiceCredential with Map[String, String] properties. For the
> S3A case this works well (access key, secret key, session token are
> strings). But some credential systems return binary payloads (signed SAML
> assertions, serialized protobuf tokens). Worth considering whether
> Map[String, byte[]] or an opaque byte[] field alongside the properties map
> would future-proof the SPI.
>
>   Happy to discuss further.
>
>   Best,
>   Parth
>
>   [1] https://issues.apache.org/jira/browse/SPARK-57252
>   [2] https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
>   [3] https://issues.apache.org/jira/browse/SPARK-38954
>   [4] https://github.com/apache/spark/pull/37558
>