[ 
https://issues.apache.org/jira/browse/SPARK-38954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38954:
----------------------------------
    Affects Version/s: 3.4.0
                           (was: 3.2.1)

> Implement sharing of cloud credentials among driver and executors
> -----------------------------------------------------------------
>
>                 Key: SPARK-38954
>                 URL: https://issues.apache.org/jira/browse/SPARK-38954
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Parth Chandra
>            Priority: Major
>
> Currently Spark uses external implementations (e.g. hadoop-aws) to access 
> cloud services like S3. In order to access the actual service, these 
> implementations use credentials provider implementations that obtain 
> credentials to allow access to the cloud service.
> These credentials are typically session credentials, which means that they 
> expire after a fixed time. Sometimes the expiry can be as short as an hour, and 
> for a Spark job that runs for many hours (or a Spark Streaming job that runs 
> continuously), the credentials have to be renewed periodically.
> In many organizations, the process of getting credentials may be multi-step. The 
> organization has an identity provider service that provides authentication 
> for the user, while the cloud service provider provides authorization for the 
> roles the user has access to. Once the user is authenticated and her role 
> verified, the credentials are generated for a new session.
> In a large setup with hundreds of Spark jobs and thousands of executors, each 
> executor is then spending a lot of time getting credentials and this may put 
> unnecessary load on the backend authentication services.
> To alleviate this, we can use Spark's architecture to obtain the credentials 
> once in the driver and push the credentials to the executors. In addition, 
> the driver can check the expiry of the credentials and push updated 
> credentials to the executors. This is relatively easy to do since the RPC 
> mechanism to implement this is already in place and is used similarly for 
> Kerberos delegation tokens.
>   
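
The proposal above (fetch once in the driver, push to executors, renew before
expiry) can be illustrated with a minimal sketch. This is not Spark code and
does not use Spark's RPC layer. Every name here (SessionCredentials, Executor,
DriverCredentialManager, renew_margin, fake_fetch) is hypothetical, and the
"push" is a plain method call standing in for an RPC message.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SessionCredentials:
    """Hypothetical session credentials with an absolute expiry time in seconds."""
    access_key: str
    secret_key: str
    session_token: str
    expires_at: float


class Executor:
    """Stand-in for an executor that caches whatever the driver pushes."""

    def __init__(self) -> None:
        self.credentials: Optional[SessionCredentials] = None

    def receive(self, creds: SessionCredentials) -> None:
        # In a real system this would be the handler for an RPC message.
        self.credentials = creds


class DriverCredentialManager:
    """Fetches credentials once on the driver and pushes them to all executors."""

    def __init__(
        self,
        fetch: Callable[[float], SessionCredentials],
        executors: List[Executor],
        renew_margin: float = 300.0,  # assumed: renew 5 minutes before expiry
    ) -> None:
        self.fetch = fetch
        self.executors = executors
        self.renew_margin = renew_margin
        self.current: Optional[SessionCredentials] = None

    def refresh_if_needed(self, now: float) -> bool:
        """Renew and push when credentials are missing or close to expiry."""
        if (
            self.current is not None
            and now < self.current.expires_at - self.renew_margin
        ):
            return False  # still valid, no call to the backend
        self.current = self.fetch(now)  # single backend call, driver only
        for executor in self.executors:
            executor.receive(self.current)
        return True


# Usage: a fake provider that records how often the backend is contacted.
calls: List[float] = []


def fake_fetch(now: float) -> SessionCredentials:
    calls.append(now)
    return SessionCredentials("AK", "SK", f"token-{len(calls)}", now + 3600.0)


executors = [Executor() for _ in range(3)]
manager = DriverCredentialManager(fake_fetch, executors)
manager.refresh_if_needed(now=0.0)     # first fetch, pushed to all executors
manager.refresh_if_needed(now=1800.0)  # still valid, backend not contacted
manager.refresh_if_needed(now=3400.0)  # inside the renewal margin, renewed
```

In this sketch the backend is contacted once per renewal regardless of how
many executors exist, which is the load reduction the description argues for.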



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
