[ https://issues.apache.org/jira/browse/SPARK-12800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216243#comment-15216243 ]
Thomas Graves commented on SPARK-12800:
---------------------------------------

You are talking about launching a job using org.apache.spark.deploy.yarn.Client directly, correct? If so, we don't officially support that. I realize it isn't currently private and some people are using it, but in 2.0 it will be made private.

Subtle bug on Spark Yarn Client under Kerberos Security Mode
------------------------------------------------------------

                 Key: SPARK-12800
                 URL: https://issues.apache.org/jira/browse/SPARK-12800
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.5.1, 1.5.2
            Reporter: Chester

Version used: Spark 1.5.1 (1.5.2-SNAPSHOT)
Deployment mode: yarn-cluster

Problem observed:
When running a Spark job directly from the YARN Client (without using spark-submit; I did not verify whether spark-submit has the same issue), with Kerberos security enabled, the first run of a Spark job always fails. It fails because Hadoop considers the job to be in SIMPLE mode rather than Kerberos mode. But without shutting down the JVM, running the same job again succeeds. If one restarts the JVM, the Spark job fails again.

The cause:
Tracking down the source of the issue, I found that the problem seems to lie in Spark's YARN Client.scala. In the prepareLocalResources() method (around line 266 of Client.scala), the following line is called:

{code}
YarnSparkHadoopUtil.get.obtainTokensForNamenodes(nns, hadoopConf, credentials)
{code}

YarnSparkHadoopUtil.get is in turn initialized via reflection:

{code}
object SparkHadoopUtil {

  private val hadoop = {
    val yarnMode = java.lang.Boolean.valueOf(
      System.getProperty("SPARK_YARN_MODE", System.getenv("SPARK_YARN_MODE")))
    if (yarnMode) {
      try {
        Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
          .newInstance()
          .asInstanceOf[SparkHadoopUtil]
      } catch {
        case e: Exception => throw new SparkException("Unable to load YARN support", e)
      }
    } else {
      new SparkHadoopUtil
    }
  }

  def get: SparkHadoopUtil = {
    hadoop
  }
}

class SparkHadoopUtil extends Logging {
  private val sparkConf = new SparkConf()
  val conf: Configuration = newConfiguration(sparkConf)
  UserGroupInformation.setConfiguration(conf)
  // ... rest of class
}
{code}

Here SparkHadoopUtil creates an empty SparkConf, builds a Hadoop Configuration from it, and sets that on UserGroupInformation:

{code}
UserGroupInformation.setConfiguration(conf)
{code}

Because UserGroupInformation.authenticationMethod is static, the above wipes out the security settings: UserGroupInformation.isSecurityEnabled() changes from true to false, so the subsequent token call fails. And since SparkHadoopUtil.hadoop is a static, immutable value, it is not created again on the next run; UserGroupInformation.setConfiguration(conf) is not called again, so subsequent Spark jobs in the same JVM succeed.
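A minimal sketch of the static reset in isolation (illustrative only; UgiResetDemo is a made-up name, not part of Spark). It assumes the kerberized settings are applied programmatically rather than picked up from a core-site.xml on the classpath, as in my embedded setup, and then applies a fresh, default Configuration the way the SparkHadoopUtil constructor effectively does:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object UgiResetDemo {
  def main(args: Array[String]): Unit = {
    // Simulate a properly kerberized setup applied in code.
    val secureConf = new Configuration(false)
    secureConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(secureConf)
    println(UserGroupInformation.isSecurityEnabled()) // true

    // Simulate what the SparkHadoopUtil constructor does: apply a fresh
    // Configuration. The authentication method lives in static state, so
    // this silently flips the whole JVM back to SIMPLE authentication.
    UserGroupInformation.setConfiguration(new Configuration(false))
    println(UserGroupInformation.isSecurityEnabled()) // false
  }
}
{code}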
The work around:

{code}
// First initialize SparkHadoopUtil, which creates the static instance
// whose constructor sets UserGroupInformation to an empty Hadoop
// Configuration. We then need to reset UserGroupInformation afterwards.
val util = SparkHadoopUtil.get
UserGroupInformation.setConfiguration(hadoopConf)
{code}

Then call client.run().
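Putting the pieces together, a sketch of a direct yarn-cluster launch with the workaround applied. This is illustrative only: DirectYarnLauncher is a made-up name, and the ClientArguments/Client constructor shapes are assumptions based on the 1.5.x branch; this is exactly the API that will become private in 2.0.

{code}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

object DirectYarnLauncher {
  def main(args: Array[String]): Unit = {
    // Needed so SparkHadoopUtil.get resolves to YarnSparkHadoopUtil
    // (see the reflection code quoted above).
    System.setProperty("SPARK_YARN_MODE", "true")

    val sparkConf = new SparkConf()

    // Touching SparkHadoopUtil.get forces the static initialization whose
    // constructor resets UserGroupInformation to an empty Configuration.
    val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)

    // Restore the kerberized settings before Client acquires any tokens.
    UserGroupInformation.setConfiguration(hadoopConf)

    val client = new Client(new ClientArguments(args, sparkConf), hadoopConf, sparkConf)
    client.run()
  }
}
{code}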