Re: Submitting extra jars on spark applications on yarn with cluster mode

Artemis User Sat, 14 Nov 2020 07:13:06 -0800

Assuming you are using Hadoop for your YARN cluster, you can set the Spark parameters spark.yarn.archive or spark.yarn.jars to point to the jar directory or jar files, so that Hadoop can find them by default. See the Spark online doc for details (http://spark.apache.org/docs/latest/running-on-yarn.html#adding-other-jars). For instance:


spark.yarn.archive              hdfs:///spark-3/jars

Please note that you will have to use the hadoop copy command to copy your jars to HDFS before executing spark-submit (this part wasn't clear to a lot of non-Hadoop users). You may also want to load ALL Spark jars into that directory in advance to speed up the launch process. You may want to contact your Hadoop admin for help.
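The copy step could look roughly like the sketch below. It assumes the hdfs client is on your PATH and that you have write access to the target directory; the function name upload_jars and the example paths are only illustrative, so adjust them to your cluster.

```shell
#!/usr/bin/env bash
# Sketch: stage jars in HDFS so that spark.yarn.archive / spark.yarn.jars
# can reference them. upload_jars is a hypothetical helper name, and the
# paths in the usage line are assumptions, not values from this thread.

upload_jars() {
  local src_dir="$1"    # local directory holding the jars, e.g. "$SPARK_HOME/jars"
  local dest_dir="$2"   # HDFS directory, e.g. /spark-3/jars

  # Create the target directory (no error if it already exists).
  hdfs dfs -mkdir -p "$dest_dir" || return 1

  # Copy every jar; -f overwrites files that are already in HDFS.
  hdfs dfs -put -f "$src_dir"/*.jar "$dest_dir"/ || return 1

  # List the directory to confirm the upload.
  hdfs dfs -ls "$dest_dir"
}

# Usage (commented out so the sketch can be sourced safely):
# upload_jars "$SPARK_HOME/jars" /spark-3/jars
```

After that, the spark.yarn.archive line above can point at the HDFS directory you uploaded to.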


-- ND

On 11/14/20 7:25 AM, Pedro Cardoso wrote:

Hello,
I am submitting a Spark application on YARN using the cluster execution mode. The application itself depends on a couple of jars. I can successfully submit and run the application using the spark-submit --jars option, as seen below:

spark-submit \
  --name Yarn-App \
  --class <FQN.Class> \
  --properties-file conf/yarn.properties \
  --jars lib/<first.jar>,lib/<second.jar>,lib/<third.jar> \
  <application.jar> > log/yarn-app.txt 2>&1

With the yarn.properties being something like:

# Spark submit config which used in conjunction with yarn cluster mode of execution to not block spark-submit command
# for application completion.
spark.yarn.submit.waitAppCompletion=false
spark.submit.deployMode=cluster
spark.master=yarn

## General Spark Application properties
spark.driver.cores=2
spark.driver.memory=4G
spark.executor.memory=5G
spark.executor.cores=2
spark.driver.extraJavaOptions=-Xms2G
spark.driver.extraClassPath=<first.jar>:<second.jar>:<third.jar>
spark.executor.heartbeatInterval=30s
spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled: True
spark.dynamicAllocation.minExecutors: 1
spark.dynamicAllocation.maxExecutors: 100
spark.dynamicAllocation.initialExecutors: 10
spark.kryo.referenceTracking=false
spark.kryoserializer.buffer.max=1G
spark.ui.showConsoleProgress=true
spark.yarn.am.cores=4
spark.yarn.am.memory=10G
spark.yarn.archive=<HDFS path to spark-only jars>
spark.yarn.historyServer.address=<url to history server>

However, I would like to have everything specified in the properties file, to simplify the work of my team and not force them to specify the jars every time. So my question is: what is the Spark property that replaces the spark-submit --jars parameter, such that I can specify everything in the properties file?
I've tried creating a tar.gz with the contents of the archive specified in spark.yarn.archive plus the extra 3 jars that I need, uploading that to HDFS, and changing the archive property, but it did not work. I got class not defined exceptions on classes that come from the 3 extra jars.
If it helps, the jars are only required for the driver, not the executors. The executors will simply perform Spark-only operations.
Thank you and have a good weekend.

--

*Pedro Cardoso*

*Research Engineer*

pedro.card...@feedzai.com <mailto:pedro.card...@feedzai.com>
