Hi folks,

The fix for me was to copy this file from the nodes built with Ambari:

/usr/hdp/2.3.4.0-3485/spark/lib/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar

over this file on the client machine, external to the cluster:

/opt/spark/lib/spark-assembly-1.5.2-hadoop2.6.0.jar

I tried this after reading:

https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3ccaaonq7v7cq4hqr2p9ez5ojucmyc+mo2ggh068cwh+qwt6sx...@mail.gmail.com%3E

So I assume this is a custom-built jar which is not part of the official distribution. Just wanted to post in case this helps someone else.

Thanks!

On Fri, Feb 5, 2016 at 2:08 PM, cs user <acldstk...@gmail.com> wrote:
> Hi All,
>
> I'm having trouble getting a job to use the Spark history server. We have
> a cluster configured with Ambari. If I run the job from one of the nodes
> within the Ambari-configured cluster, everything works fine and the job
> appears in the Spark history server.
>
> If I configure a client external to the cluster and run the same job, the
> history server is not used.
>
> When the job completes successfully, I see these lines in the log:
>
> 16/02/05 11:57:22 INFO history.YarnHistoryService: Starting
> YarnHistoryService for application application_1453893909110_0108 attempt
> Some(appattempt_1453893909110_0108_000001); state=1; endpoint=
> http://somehost:8188/ws/v1/timeline/; bonded to ATS=false;
> listening=false; batchSize=10; flush count=0; total number queued=0,
> processed=0; attempted entity posts=0 successful entity posts=0 failed
> entity posts=0; events dropped=0; app start event received=false; app end
> event received=false;
> 16/02/05 11:57:22 INFO history.YarnHistoryService: Spark events will be
> published to the Timeline service at http://somehost:8188/ws/v1/timeline/
>
> On the client which is external to the cluster, these lines do not appear
> in the logs. I have printed out the Spark context and attempted to match
> what is configured on the working job with the failing job; everything
> seems to match.
> These are the job settings:
>
> conf.set('spark.speculation','true')
> conf.set('spark.dynamicAllocation.enabled','false')
> conf.set('spark.shuffle.service.enabled','false')
> conf.set('spark.executor.instances','4')
> conf.set('spark.akka.threads','4')
> conf.set('spark.dynamicAllocation.initialExecutors','4')
> conf.set('spark.history.provider','org.apache.spark.deploy.yarn.history.YarnHistoryProvider')
> conf.set('spark.yarn.services','org.apache.spark.deploy.yarn.history.YarnHistoryService')
> conf.set('spark.history.ui.port','18080')
> conf.set('spark.driver.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
> conf.set('spark.yarn.containerLauncherMaxThreads','25')
> conf.set('spark.yarn.driver.memoryOverhead','384')
> conf.set('spark.yarn.executor.memoryOverhead','384')
> conf.set('spark.yarn.historyServer.address','somehost:18080')
> conf.set('spark.yarn.max.executor.failures','3')
> conf.set('spark.yarn.preserve.staging.files','false')
> conf.set('spark.yarn.queue','default')
> conf.set('spark.yarn.scheduler.heartbeat.interval-ms','5000')
> conf.set('spark.yarn.submit.file.replication','3')
> conf.set('spark.yarn.am.extraJavaOptions','-Dhdp.version=2.3.4.0-3485')
> conf.set('spark.blockManager.port','9096')
> conf.set('spark.driver.port','9095')
> conf.set('spark.fileserver.port','9097')
>
> I am using the following tar.gz file to install Spark on the node external
> to the cluster:
>
> http://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
>
> Will this version of Spark have everything required to talk correctly to
> YARN and the Spark history service?
>
> So it comes down to this: the Spark context settings appear to be exactly
> the same, there are no errors in the logs pointing to the job not being
> able to connect to anything, and none of the ports are blocked. Why is
> this not working when run external to the cluster?
>
> There is no Kerberos security configured on the cluster.
>
> Thanks!
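For anyone else comparing contexts the same way, eyeballing two printed Spark contexts is easy to get wrong. A small sketch of a plain-Python diff helper (the `diff_confs` function and the two dicts below are hypothetical stand-ins for `conf.getAll()` output from the working and failing jobs, trimmed to a couple of keys):

```python
def diff_confs(working, failing):
    """Return {key: (working_value, failing_value)} for every mismatched key."""
    keys = set(working) | set(failing)
    return {
        k: (working.get(k, "<unset>"), failing.get(k, "<unset>"))
        for k in sorted(keys)
        if working.get(k) != failing.get(k)
    }

# Stand-ins for dict(sc.getConf().getAll()) from each environment.
working = {
    "spark.yarn.services": "org.apache.spark.deploy.yarn.history.YarnHistoryService",
    "spark.yarn.historyServer.address": "somehost:18080",
}
failing = {
    "spark.yarn.historyServer.address": "somehost:18080",
}

for key, (a, b) in diff_confs(working, failing).items():
    print(f"{key}: cluster={a!r} client={b!r}")
```

Any key the diff prints (here `spark.yarn.services`, which is what wires in YarnHistoryService) is a candidate for why one environment publishes to the timeline server and the other does not.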
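The jar swap described at the top of this thread can be sketched as below. The file names are the ones from this thread; the scratch directory and placeholder files are only there so the sequence can be run end to end. On a real client the first copy would be an scp/rsync from an Ambari-managed node, and the paths would be /usr/hdp/.../spark/lib/ and /opt/spark/lib/:

```shell
set -e
ROOT=$(mktemp -d)   # scratch directory standing in for the client filesystem

# Placeholder jars: SRC plays the HDP-built assembly pulled off a cluster
# node, DEST the stock Apache assembly shipped in the 1.5.2 tarball.
SRC="$ROOT/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar"
DEST="$ROOT/spark-assembly-1.5.2-hadoop2.6.0.jar"
echo "hdp build"   > "$SRC"
echo "stock build" > "$DEST"

cp -p "$DEST" "$DEST.stock"   # keep the stock jar so the change is reversible
cp -p "$SRC" "$DEST"          # overwrite with the HDP-built assembly
echo "replaced $(basename "$DEST")"
```

Keeping the `.stock` backup makes it easy to roll back if the substituted assembly causes problems elsewhere.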