[ https://issues.apache.org/jira/browse/SPARK-22218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-22218:
------------------------------------

    Assignee: (was: Apache Spark)

> spark shuffle service fails to update secret on application re-attempts
> ------------------------------------------------------------------------
>
>                 Key: SPARK-22218
>                 URL: https://issues.apache.org/jira/browse/SPARK-22218
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 2.2.0
>            Reporter: Thomas Graves
>            Priority: Blocker
>
> Running on YARN, if an application has any re-attempts while using the Spark
> 2.2 external shuffle service, the shuffle service does not update the
> application's secret properly, and the re-attempts fail with
> javax.security.sasl.SaslException.
>
> SPARK-21494 (fixed in 2.2) changed ShuffleSecretManager to use containsKey
> (https://git.corp.yahoo.com/hadoop/spark/blob/yspark_2_2_0/common/network-shuffle/src/main/java/org/apache/spark/network/sasl/ShuffleSecretManager.java#L50),
> which is the proper behavior. The problem is that the key is never removed
> between application re-attempts, so when the second attempt starts, the code
> finds the key already present (the application id is the same) and does not
> update the secret.
>
> To reproduce, run something like a word count with the output directory
> already existing. The first attempt fails because the output directory
> exists; the subsequent attempts fail with max number of executor failures.
> Note that this assumes the second and third attempts run on the same node as
> the first attempt.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
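The stale-secret behavior described in the issue can be sketched in miniature. This is an illustration, not the actual Spark source: the class and method names mirror ShuffleSecretManager, but the code below is a simplified assumption of how a containsKey guard leaves the first attempt's secret in place, and how overwriting unconditionally would let a re-attempt's secret take effect.

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the bug: a containsKey guard (introduced by SPARK-21494)
// means a second attempt with the same appId keeps the stale secret from
// attempt one, so SASL authentication of the new attempt fails.
public class SecretManagerSketch {
    private final ConcurrentHashMap<String, String> shuffleSecretMap =
        new ConcurrentHashMap<>();

    // Buggy behavior: the secret is only stored the first time an appId is
    // seen. Re-attempts reuse the same appId, so their new secret is ignored.
    public void registerAppBuggy(String appId, String secret) {
        if (!shuffleSecretMap.containsKey(appId)) {
            shuffleSecretMap.put(appId, secret);
        }
    }

    // One possible fix: always overwrite, so each re-attempt's secret
    // replaces the previous one.
    public void registerAppFixed(String appId, String secret) {
        shuffleSecretMap.put(appId, secret);
    }

    public String getSecretKey(String appId) {
        return shuffleSecretMap.get(appId);
    }
}
```

With the buggy registration, registering `("app_1", "secret-attempt-2")` after `("app_1", "secret-attempt-1")` still returns the first secret; the fixed version returns the second.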