[ https://issues.apache.org/jira/browse/SPARK-25689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723523#comment-16723523 ]

ASF GitHub Bot commented on SPARK-25689:
----------------------------------------

vanzin opened a new pull request #23338: [SPARK-25689][yarn] Make driver, not AM, manage delegation tokens.
URL: https://github.com/apache/spark/pull/23338
 
 
   This change modifies the behavior of the delegation token code when running
   on YARN, so that the driver controls token renewal in both client and
   cluster mode. To achieve that, a few different things were changed:
   
   * The AM now runs code that needs delegation tokens (DTs) only when tokens are available.
   
   In a way, this restores the AM behavior to what it was pre-SPARK-23361,
   while keeping the fix added for that bug. All the AM code still runs in a
   "UGI.doAs()" block; but code that needs to talk to HDFS (mainly the
   distributed cache handling code) is delayed until the driver is up and
   running, and thus until valid delegation tokens are available.
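   The "defer until tokens exist" idea described above can be sketched generically. This is a hypothetical illustration of the pattern, not Spark's actual AM code; the class and method names are made up:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: work that needs delegation tokens (e.g. distributed
// cache setup) is queued until the driver is up and tokens are valid.
public class CredentialGate {
    private final Queue<Runnable> deferred = new ArrayDeque<>();
    private boolean tokensAvailable = false;

    // Run immediately if tokens exist, otherwise defer the task.
    public synchronized void runWhenTokensAvailable(Runnable task) {
        if (tokensAvailable) {
            task.run();
        } else {
            deferred.add(task);
        }
    }

    // Called once the driver reports that delegation tokens are valid;
    // drains everything that was deferred.
    public synchronized void tokensReady() {
        tokensAvailable = true;
        while (!deferred.isEmpty()) {
            deferred.poll().run();
        }
    }
}
```

   Under this pattern, the rest of the AM can still run inside a single doAs() block, because nothing token-dependent executes before tokensReady() fires.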
   
   * SparkSubmit / ApplicationMaster now handle user login, not the token manager.
   
   The previous AM code relied on the token manager to keep the user
   logged in when keytabs are used. This required some odd APIs in the token
   manager and the AM so that the right UGI was exposed and used in the right
   places.
   
   After this change, the logged-in user is handled separately from the token
   manager, so the API was cleaned up; and, as explained above, the whole AM
   runs under the logged-in user, which also helps simplify some more code.
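   The cleaned-up separation can be sketched as follows. This is a hedged illustration of the design, not Spark's API: LoggedInUser stands in for Hadoop's UserGroupInformation, and all names are hypothetical:

```java
import java.util.function.Supplier;

// Illustrative sketch: the caller (SparkSubmit or the ApplicationMaster)
// owns the logged-in user; the token manager just runs as that user.
interface LoggedInUser {
    // Stand-in for running an action as this user (like UGI.doAs()).
    <T> T doAs(Supplier<T> action);
}

class TokenManager {
    // The manager no longer logs users in or exposes a UGI; it performs
    // token acquisition as whatever user the caller established.
    byte[] obtainTokens(LoggedInUser user) {
        return user.doAs(() -> "fresh-tokens".getBytes());
    }
}
```

   The design point is directional: login state flows into the token manager as a parameter, instead of the manager owning login and leaking a UGI back out.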
   
   * Distributed cache configs are sent separately to the AM.
   
   Because of the delayed initialization of the cached resources in the AM, it
   became easier to write the cache config to a separate properties file instead
   of bundling it with the rest of the Spark config. This also avoids having
   to modify the SparkConf to hide things from the UI.
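   Writing the cache config as its own properties file is straightforward with java.util.Properties; this sketch only illustrates the approach, and the file name and key here are made up, not the ones the PR uses:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

// Illustrative sketch: keep distributed-cache entries in a separate
// properties file instead of folding them into the main Spark config.
public class CacheConfigWriter {
    public static File writeCacheConfig(Properties cacheConf, File dir) throws IOException {
        // Hypothetical file name, for illustration only.
        File out = new File(dir, "dist_cache.properties");
        try (OutputStream os = new FileOutputStream(out)) {
            cacheConf.store(os, "Distributed cache configuration");
        }
        return out;
    }

    public static Properties readCacheConfig(File file) throws IOException {
        Properties p = new Properties();
        try (InputStream is = new FileInputStream(file)) {
            p.load(is);
        }
        return p;
    }
}
```

   Because the cache entries never enter the SparkConf, nothing has to be scrubbed from it before it is shown in the UI.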
   
   * Finally, the AM doesn't manage the token manager anymore.
   
   The above changes allow the token manager to be handled entirely by the
   driver's scheduler backend code in YARN mode as well (whether client or
   cluster), making it similar to other resource managers. To preserve the
   fix added in SPARK-23361 in client mode too, the AM now sends an extra
   message to the driver on initialization to fetch delegation tokens; and
   although it may not strictly be needed, the driver also keeps the running
   AM updated when new tokens are created.
   
   Tested in a kerberized cluster with the same tests used to validate
   SPARK-23361, in both client and cluster mode. Also tested with a
   non-kerberized cluster.
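   The two-way message flow described above (AM fetches tokens at startup, driver pushes renewed tokens back) can be sketched without any RPC machinery. All classes and method names here are illustrative stand-ins, not Spark's endpoints:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the token message flow: on startup the AM asks
// the driver for current delegation tokens, and the driver pushes renewed
// tokens to every registered AM.
class Driver {
    private byte[] currentTokens = "tokens-v1".getBytes();
    private final List<ApplicationMaster> registeredAms = new ArrayList<>();

    // Handles the AM's initial "fetch tokens" message.
    byte[] fetchTokens(ApplicationMaster am) {
        registeredAms.add(am);
        return currentTokens;
    }

    // The driver-side token manager created new tokens: update the AMs.
    void onTokensRenewed(byte[] newTokens) {
        currentTokens = newTokens;
        for (ApplicationMaster am : registeredAms) {
            am.updateTokens(newTokens);
        }
    }
}

class ApplicationMaster {
    byte[] tokens;

    // Extra message sent to the driver on initialization.
    void start(Driver driver) {
        tokens = driver.fetchTokens(this);
    }

    void updateTokens(byte[] newTokens) {
        tokens = newTokens;
    }
}
```

   The fetch on startup is what keeps the SPARK-23361 fix working in client mode: the AM never has to create tokens itself, it only asks the driver for them.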


> Move token renewal logic to driver in yarn-client mode
> ------------------------------------------------------
>
>                 Key: SPARK-25689
>                 URL: https://issues.apache.org/jira/browse/SPARK-25689
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.4.0
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> Currently, both in yarn-cluster and yarn-client mode, the YARN AM is 
> responsible for renewing delegation tokens. That differs from other RMs 
> (Mesos and later k8s when it supports this functionality), and is one of the 
> roadblocks towards fully sharing the same delegation token-related code.
> We should look at keeping the renewal logic within the driver in yarn-client 
> mode. That would also remove the need to distribute the user's keytab to the 
> AM when running in that particular mode.


