Hi all, I'm having trouble getting a job to use the Spark history server. We have a cluster configured with Ambari; if I run the job from one of the nodes within the Ambari-configured cluster, everything works fine and the job appears in the Spark history server.
If I configure a client external to the cluster and run the same job, the history server is not used. When the job runs from a cluster node and completes fine, I see these lines appear in the log:

16/02/05 11:57:22 INFO history.YarnHistoryService: Starting YarnHistoryService for application application_1453893909110_0108 attempt Some(appattempt_1453893909110_0108_000001); state=1; endpoint=http://somehost:8188/ws/v1/timeline/; bonded to ATS=false; listening=false; batchSize=10; flush count=0; total number queued=0, processed=0; attempted entity posts=0 successful entity posts=0 failed entity posts=0; events dropped=0; app start event received=false; app end event received=false
16/02/05 11:57:22 INFO history.YarnHistoryService: Spark events will be published to the Timeline service at http://somehost:8188/ws/v1/timeline/

On the client external to the cluster, these lines do not appear in the logs. I have printed out the Spark context and attempted to match what is configured on the working job with the failing job; everything seems fine.
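In case it helps anyone reproduce the comparison: this is the kind of helper I used to compare the two drivers. It's a sketch; the pairs are what conf.getAll() returns on a PySpark SparkConf, and the sample pairs below are just illustrative values from my settings.

```python
def dump_conf(pairs):
    # Render (key, value) pairs as sorted key=value lines so the output
    # from the in-cluster and external drivers can be diffed directly.
    return "\n".join("%s=%s" % (k, v) for k, v in sorted(pairs))

# On each driver, after building the SparkConf:
#     print(dump_conf(conf.getAll()))
# then diff the two captured outputs.
print(dump_conf([
    ('spark.yarn.services',
     'org.apache.spark.deploy.yarn.history.YarnHistoryService'),
    ('spark.driver.port', '9095'),
]))
```

Diffing the sorted dumps side by side is how I concluded the settings match.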
These are the job settings:

conf.set('spark.speculation','true')
conf.set('spark.dynamicAllocation.enabled','false')
conf.set('spark.shuffle.service.enabled','false')
conf.set('spark.executor.instances','4')
conf.set('spark.akka.threads','4')
conf.set('spark.dynamicAllocation.initialExecutors','4')
conf.set('spark.history.provider','org.apache.spark.deploy.yarn.history.YarnHistoryProvider')
conf.set('spark.yarn.services','org.apache.spark.deploy.yarn.history.YarnHistoryService')
conf.set('spark.history.ui.port','18080')
conf.set('spark.driver.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
conf.set('spark.yarn.containerLauncherMaxThreads','25')
conf.set('spark.yarn.driver.memoryOverhead','384')
conf.set('spark.yarn.executor.memoryOverhead','384')
conf.set('spark.yarn.historyServer.address','somehost:18080')
conf.set('spark.yarn.max.executor.failures','3')
conf.set('spark.yarn.preserve.staging.files','false')
conf.set('spark.yarn.queue','default')
conf.set('spark.yarn.scheduler.heartbeat.interval-ms','5000')
conf.set('spark.yarn.submit.file.replication','3')
conf.set('spark.yarn.am.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
conf.set('spark.blockManager.port','9096')
conf.set('spark.driver.port','9095')
conf.set('spark.fileserver.port','9097')

I am using the following tar.gz file to install Spark on the node external to the cluster: http://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz

Will this version of Spark have everything required to talk correctly to YARN and the Spark history service? So it comes down to this: the Spark context settings appear to be exactly the same, there are no errors in the logs pointing to the job not being able to connect to anything, and none of the ports are blocked, so why is this not working when run external to the cluster? There is no Kerberos security configured on the cluster. Thanks!
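One check I can run to help answer my own question about the tarball: since spark.yarn.services names YarnHistoryService by class name, the class has to actually be present in the assembly jar on the external client for it to start. Here's a quick sketch to look for it; the jar path is an assumption based on the extracted tarball's layout, so adjust it to the actual install directory.

```python
import glob
import zipfile

def has_class(jar_path, class_name):
    # A jar is a zip archive; the class, if present, is stored as a
    # path-like entry ending in .class.
    entry = class_name.replace('.', '/') + '.class'
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Assumed location inside the extracted Apache tarball; adjust as needed.
for jar in glob.glob('spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-*.jar'):
    print(jar, has_class(
        jar, 'org.apache.spark.deploy.yarn.history.YarnHistoryService'))
```

If the class turns out to be missing on the external node but present on the cluster nodes, that would explain why the YarnHistoryService log lines never appear there.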