we have had very few issues with hdfs or hive, but then we only use hive
for basic reading and writing of tables.

depending on your vendor you might have to add a few settings to your
spark-defaults.conf. i remember that on hdp you had to set hdp.version,
i believe via the driver and yarn am java options.
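it was something roughly like this in spark-defaults.conf (the version
string is just a placeholder, use whatever your cluster actually runs):
spark.driver.extraJavaOptions -Dhdp.version=<your hdp version>
spark.yarn.am.extraJavaOptions -Dhdp.version=<your hdp version>
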
we prefer to build spark with the hadoop-provided profile, and then add the
hadoop classpath to the spark classpath. this works well on cdh, hdp, and
also with cloud providers.

for example this is a typical build with hive for cdh 5 (which is based on
hadoop 2.6; change the hadoop version to match your vendor):

dev/make-distribution.sh --name <yourname> --tgz \
  -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn -Phadoop-provided -Phive

then add the hadoop classpath to the spark classpath in spark-env.sh:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)
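
after unpacking the resulting tarball on the machine you launch from, you
just point at its spark-submit. a rough sketch of a launch on yarn (the
install path, class name, and app jar are placeholders for your own
application):

/path/to/spark-2.4.x-bin-<yourname>/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.YourApp \
  your-app-assembly.jar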

i think certain vendors support multiple "vendor supported" spark installs
side by side, so you could also look into that if you are not comfortable
with running your own spark build.

On Mon, May 20, 2019 at 2:24 PM Nicolas Paris <nicolas.pa...@riseup.net>
wrote:

> > correct. note that you only need to install spark on the node you launch it
> > from. spark doesn't need to be installed on the cluster itself.
>
> That sounds reasonably doable for me. My guess is I will have some
> trouble making that spark version work with both hive & hdfs installed
> on the cluster - or maybe it's actually plug-and-play, I don't know.
>
> thanks
>
> On Mon, May 20, 2019 at 02:16:43PM -0400, Koert Kuipers wrote:
> > correct. note that you only need to install spark on the node you launch it
> > from. spark doesn't need to be installed on the cluster itself.
> >
> > the shared components between spark jobs on yarn are only really the
> > spark shuffle service in yarn and the spark history server. i have found
> > compatibility for these to be good. it's best if these run the latest version.
> >
> > On Mon, May 20, 2019 at 2:02 PM Nicolas Paris <nicolas.pa...@riseup.net>
> > wrote:
> >
> >     > you will need the spark version you intend to launch with on the
> >     > machine you launch from and point to the correct spark-submit
> >
> >     does this mean to install a second spark version (2.4) on the
> >     cluster?
> >
> >     thanks
> >
> >     On Mon, May 20, 2019 at 01:58:11PM -0400, Koert Kuipers wrote:
> >     > yarn can happily run multiple spark versions side-by-side
> >     > you will need the spark version you intend to launch with on the
> >     > machine you launch from and point to the correct spark-submit
> >     >
> >     > On Mon, May 20, 2019 at 1:50 PM Nicolas Paris <nicolas.pa...@riseup.net>
> >     > wrote:
> >     >
> >     >     Hi
> >     >
> >     >     I am wondering whether it is feasible to:
> >     >     - build a spark application (with sbt/maven) based on spark2.4
> >     >     - deploy that jar on yarn on a spark2.3 based installation
> >     >
> >     >     thanks in advance,
> >     >
> >     >
> >     >     --
> >     >     nicolas
> >     >
> >     >
> >
> >     --
> >     nicolas
> >
> >
> >
>
> --
> nicolas
>
>
>
