Hi, I finally got it all working. Here are the steps (for reference, I am on HDP 2.6.5):
- copy the old hive-site.xml into the new Spark conf folder
- (optional?) download jersey-bundle-1.8.jar and put it into the jars folder
- build a tar.gz from all the jars and copy that archive to HDFS, chowned to hdfs:hadoop
- create a spark-defaults.conf file in the conf folder and add the lines below:

> spark.driver.extraLibraryPath              /usr/hdp/current/hadoop-client/lib/native
> spark.executor.extraLibraryPath            /usr/hdp/current/hadoop-client/lib/native
> spark.driver.extraJavaOptions              -Dhdp.version=2.6.5.0-292
> spark.yarn.am.extraJavaOptions             -Dhdp.version=2.6.5.0-292
> spark.eventLog.dir                         hdfs:///spark-history
> spark.eventLog.enabled                     false
> spark.hadoop.yarn.timeline-service.enabled false
> spark.history.provider                     org.apache.spark.deploy.history.FsHistoryProvider
> spark.yarn.containerLauncherMaxThreads     25
> spark.driver.memoryOverhead                200
> spark.executor.memoryOverhead              200
> spark.yarn.max.executor.failures           3
> spark.yarn.preserve.staging.files          false
> spark.yarn.queue                           default
> spark.yarn.scheduler.heartbeat.interval-ms 5000
> spark.yarn.submit.file.replication         3
> spark.yarn.archive                         hdfs:///hdp/apps/2.6.5.0-292/spark2/spark2.4.tar.gz
> spark.ui.port                              4041

Then the command below works (Hive, HDFS and YARN included):

> bin/spark-shell --master yarn

Thanks for your support.

On Mon, May 20, 2019 at 03:42:46PM -0400, Koert Kuipers wrote:
> most likely have to set something in spark-defaults.conf like
>
> spark.master yarn
> spark.submit.deployMode client
>
> On Mon, May 20, 2019 at 3:14 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
> > Finally that was easy to connect to both hive/hdfs. I just had to copy
> > the hive-site.xml from the old spark version and that worked instantly
> > after unzipping.
> >
> > Right now I am stuck on connecting to yarn.
> >
> > On Mon, May 20, 2019 at 02:50:44PM -0400, Koert Kuipers wrote:
> > > we had very few issues with hdfs or hive, but then we use hive only for
> > > basic reading and writing of tables.
> > > depending on your vendor you might have to add a few settings to your
> > > spark-defaults.conf. i remember on hdp you had to set the hdp.version somehow.
> > > we prefer to build spark with hadoop being provided, and then add the hadoop
> > > classpath to the spark classpath. this works well on cdh, hdp, and also for
> > > cloud providers.
> > >
> > > for example this is a typical build with hive for cdh 5 (which is based on
> > > hadoop 2.6; you change the hadoop version based on vendor):
> > > dev/make-distribution.sh --name <yourname> --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phadoop-provided -Phive
> > > then add the hadoop classpath to the spark classpath in spark-env.sh:
> > > export SPARK_DIST_CLASSPATH=$(hadoop classpath)
> > >
> > > i think certain vendors support multiple "vendor supported" installs, so you
> > > could also look into that if you are not comfortable with running your own
> > > spark build.
> > >
> > > On Mon, May 20, 2019 at 2:24 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > >
> > > > > correct. note that you only need to install spark on the node you
> > > > > launch it from. spark doesnt need to be installed on cluster itself.
> > > >
> > > > That sounds reasonably doable for me. My guess is I will have some
> > > > trouble making that spark version work with both hive & hdfs installed
> > > > on the cluster - or maybe that's finally plug-&-play, i don't know.
> > > >
> > > > thanks
> > > >
> > > > On Mon, May 20, 2019 at 02:16:43PM -0400, Koert Kuipers wrote:
> > > > > correct. note that you only need to install spark on the node you
> > > > > launch it from. spark doesnt need to be installed on cluster itself.
> > > > >
> > > > > the shared components between spark jobs on yarn are only really
> > > > > spark-shuffle-service in yarn and spark-history-server. i have found
> > > > > compatibility for these to be good. its best if these run latest version.
> > > > >
> > > > > On Mon, May 20, 2019 at 2:02 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > > > >
> > > > > > > you will need the spark version you intend to launch with on the
> > > > > > > machine you launch from and point to the correct spark-submit
> > > > > >
> > > > > > does this mean to install a second spark version (2.4) on the cluster?
> > > > > >
> > > > > > thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 01:58:11PM -0400, Koert Kuipers wrote:
> > > > > > > yarn can happily run multiple spark versions side-by-side
> > > > > > > you will need the spark version you intend to launch with on the
> > > > > > > machine you launch from and point to the correct spark-submit
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:50 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > I am wondering whether it is feasible to:
> > > > > > > > - build a spark application (with sbt/maven) based on spark 2.4
> > > > > > > > - deploy that jar on yarn on a spark 2.3 based installation
> > > > > > > >
> > > > > > > > thanks in advance,
> > > > > > > >
> > > > > > > > --
> > > > > > > > nicolas

--
nicolas

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
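The jar-archive step from the top post can be sketched as below. The HDFS target path, the 2.6.5.0-292 version string, and the hdfs:hadoop owner come from this thread; the /tmp staging directory and the dummy jar names are assumptions so the packaging step itself can be run without a real Spark install:

```shell
# Stage a stand-in jars directory (in a real install this is $SPARK_HOME/jars).
mkdir -p /tmp/spark24-demo/jars
touch /tmp/spark24-demo/jars/spark-core_2.11-2.4.0.jar \
      /tmp/spark24-demo/jars/spark-sql_2.11-2.4.0.jar

# Build the archive from inside the jars directory so the jars sit at the
# root of the tar.gz, which is what spark.yarn.archive expects.
(cd /tmp/spark24-demo/jars && tar -czf ../spark2.4.tar.gz *.jar)
tar -tzf /tmp/spark24-demo/spark2.4.tar.gz   # lists the packaged jars

# On the real cluster you would then publish it (not run here):
#   hdfs dfs -mkdir -p /hdp/apps/2.6.5.0-292/spark2
#   hdfs dfs -put /tmp/spark24-demo/spark2.4.tar.gz /hdp/apps/2.6.5.0-292/spark2/
#   hdfs dfs -chown -R hdfs:hadoop /hdp/apps/2.6.5.0-292/spark2
```

Packaging from inside the jars directory matters: spark.yarn.archive expects the jar files in the root of the archive, not under a subdirectory.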
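Koert's point in the thread - that YARN happily runs multiple Spark versions side by side, and the version used is simply whichever spark-submit you invoke from the launch node - can be illustrated with stub installs. The directories and the echo-only spark-submit scripts below are stand-ins, not real Spark:

```shell
# Two side-by-side "installs" on the launch node (paths are assumptions).
mkdir -p /tmp/spark-demo/spark-2.3/bin /tmp/spark-demo/spark-2.4/bin
printf '#!/bin/sh\necho "launching with Spark 2.3"\n' > /tmp/spark-demo/spark-2.3/bin/spark-submit
printf '#!/bin/sh\necho "launching with Spark 2.4"\n' > /tmp/spark-demo/spark-2.4/bin/spark-submit
chmod +x /tmp/spark-demo/spark-2.3/bin/spark-submit \
         /tmp/spark-demo/spark-2.4/bin/spark-submit

# Point SPARK_HOME at the install you want; YARN runs whichever version submits.
export SPARK_HOME=/tmp/spark-demo/spark-2.4
"$SPARK_HOME/bin/spark-submit"   # prints: launching with Spark 2.4
```

Nothing needs to change on the cluster itself; only the launch node carries both versions, and each job picks its version by the spark-submit it calls.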