Hi, I finally got it all working. Here are the steps (for reference, I am on HDP 2.6.5):
- copy the old hive-site.xml into the new Spark conf folder
- (optional?) download jersey-bundle-1.8.jar and put it into the jars folder
- build a tar.gz from all the jars and copy that archive to HDFS, chowned to hdfs:hadoop
- create a spark-defaults.conf file in the conf folder and add the lines below:

> spark.driver.extraLibraryPath              /usr/hdp/current/hadoop-client/lib/native
> spark.executor.extraLibraryPath            /usr/hdp/current/hadoop-client/lib/native
> spark.driver.extraJavaOptions              -Dhdp.version=2.6.5.0-292
> spark.yarn.am.extraJavaOptions             -Dhdp.version=2.6.5.0-292
> spark.eventLog.dir                         hdfs:///spark-history
> spark.eventLog.enabled                     false
> spark.hadoop.yarn.timeline-service.enabled false
> spark.history.provider                     org.apache.spark.deploy.history.FsHistoryProvider
> spark.yarn.containerLauncherMaxThreads     25
> spark.driver.memoryOverhead                200
> spark.executor.memoryOverhead              200
> spark.yarn.max.executor.failures           3
> spark.yarn.preserve.staging.files          false
> spark.yarn.queue                           default
> spark.yarn.scheduler.heartbeat.interval-ms 5000
> spark.yarn.submit.file.replication         3
> spark.yarn.archive                         hdfs:///hdp/apps/2.6.5.0-292/spark2/spark2.4.tar.gz
> spark.ui.port                              4041

Then the command below works (Hive, HDFS and YARN included):

> bin/spark-shell --master yarn

Thanks for your support.

On Mon, May 20, 2019 at 03:42:46PM -0400, Koert Kuipers wrote:
> most likely have to set something in spark-defaults.conf like
>
> spark.master yarn
> spark.submit.deployMode client
>
> On Mon, May 20, 2019 at 3:14 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
>
> > Finally that was easy to connect to both hive/hdfs. I just had to copy
> > the hive-site.xml from the old spark version and that worked instantly
> > after unzipping.
> >
> > Right now I am stuck on connecting to yarn.
> >
> > On Mon, May 20, 2019 at 02:50:44PM -0400, Koert Kuipers wrote:
> > > we had very few issues with hdfs or hive, but then we use hive only for
> > > basic reading and writing of tables.
> > > depending on your vendor you might have to add a few settings to your
> > > spark-defaults.conf. i remember on hdp you had to set the hdp.version somehow.
> > > we prefer to build spark with hadoop being provided, and then add the hadoop
> > > classpath to the spark classpath. this works well on cdh, hdp, and also for
> > > cloud providers.
> > >
> > > for example this is a typical build with hive for cdh 5 (which is based on
> > > hadoop 2.6; you change the hadoop version based on vendor):
> > > dev/make-distribution.sh --name <yourname> --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phadoop-provided -Phive
> > > then add the hadoop classpath to the spark classpath in spark-env.sh:
> > > export SPARK_DIST_CLASSPATH=$(hadoop classpath)
> > >
> > > i think certain vendors support multiple "vendor supported" installs, so you
> > > could also look into that if you are not comfortable with running your own
> > > spark build.
> > >
> > > On Mon, May 20, 2019 at 2:24 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > >
> > > > > correct. note that you only need to install spark on the node you
> > > > > launch it from. spark doesnt need to be installed on cluster itself.
> > > >
> > > > That sounds reasonably doable for me. My guess is I will have some
> > > > trouble making that spark version work with both hive & hdfs installed
> > > > on the cluster - or maybe that's finally plug-&-play, i don't know.
> > > >
> > > > thanks
> > > >
> > > > On Mon, May 20, 2019 at 02:16:43PM -0400, Koert Kuipers wrote:
> > > > > correct. note that you only need to install spark on the node you
> > > > > launch it from. spark doesnt need to be installed on cluster itself.
> > > > >
> > > > > the shared components between spark jobs on yarn are only really
> > > > > spark-shuffle-service in yarn and spark-history-server. i have found
> > > > > compatibility for these to be good. its best if these run latest version.
> > > > >
> > > > > On Mon, May 20, 2019 at 2:02 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > > > >
> > > > > > > you will need the spark version you intend to launch with on the
> > > > > > > machine you launch from and point to the correct spark-submit
> > > > > >
> > > > > > does this mean to install a second spark version (2.4) on the cluster?
> > > > > >
> > > > > > thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 01:58:11PM -0400, Koert Kuipers wrote:
> > > > > > > yarn can happily run multiple spark versions side-by-side
> > > > > > > you will need the spark version you intend to launch with on the
> > > > > > > machine you launch from and point to the correct spark-submit
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:50 PM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > > > > > >
> > > > > > > > Hi
> > > > > > > >
> > > > > > > > I am wondering whether it is feasible to:
> > > > > > > > - build a spark application (with sbt/maven) based on spark 2.4
> > > > > > > > - deploy that jar on yarn on a spark 2.3 based installation
> > > > > > > >
> > > > > > > > thanks in advance,
> > > > > > > >
> > > > > > > > --
> > > > > > > > nicolas

--
nicolas

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
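The jar-archive step from the top post can be sketched as below. The HDFS target path, the 2.6.5.0-292 version string, and the hdfs:hadoop owner come from this thread; the /tmp staging directory and the dummy jar names are assumptions so the packaging step itself can be run without a real Spark install:

```shell
# Stage a stand-in jars directory (in a real install this is $SPARK_HOME/jars).
mkdir -p /tmp/spark24-demo/jars
touch /tmp/spark24-demo/jars/spark-core_2.11-2.4.0.jar \
      /tmp/spark24-demo/jars/spark-sql_2.11-2.4.0.jar

# Build the archive from inside the jars directory so the jars sit at the
# root of the tar.gz, which is what spark.yarn.archive expects.
(cd /tmp/spark24-demo/jars && tar -czf ../spark2.4.tar.gz *.jar)
tar -tzf /tmp/spark24-demo/spark2.4.tar.gz   # lists the packaged jars

# On the real cluster you would then publish it (not run here):
#   hdfs dfs -mkdir -p /hdp/apps/2.6.5.0-292/spark2
#   hdfs dfs -put /tmp/spark24-demo/spark2.4.tar.gz /hdp/apps/2.6.5.0-292/spark2/
#   hdfs dfs -chown -R hdfs:hadoop /hdp/apps/2.6.5.0-292/spark2
```

Packaging from inside the jars directory matters: spark.yarn.archive expects the jar files in the root of the archive, not under a subdirectory.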
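Koert's point in the thread - that YARN happily runs multiple Spark versions side by side, and the version used is simply whichever spark-submit you invoke from the launch node - can be illustrated with stub installs. The directories and the echo-only spark-submit scripts below are stand-ins, not real Spark:

```shell
# Two side-by-side "installs" on the launch node (paths are assumptions).
mkdir -p /tmp/spark-demo/spark-2.3/bin /tmp/spark-demo/spark-2.4/bin
printf '#!/bin/sh\necho "launching with Spark 2.3"\n' > /tmp/spark-demo/spark-2.3/bin/spark-submit
printf '#!/bin/sh\necho "launching with Spark 2.4"\n' > /tmp/spark-demo/spark-2.4/bin/spark-submit
chmod +x /tmp/spark-demo/spark-2.3/bin/spark-submit \
         /tmp/spark-demo/spark-2.4/bin/spark-submit

# Point SPARK_HOME at the install you want; YARN runs whichever version submits.
export SPARK_HOME=/tmp/spark-demo/spark-2.4
"$SPARK_HOME/bin/spark-submit"   # prints: launching with Spark 2.4
```

Nothing needs to change on the cluster itself; only the launch node carries both versions, and each job picks its version by the spark-submit it calls.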