We've used Mesos at a client to run both Hadoop and Spark jobs on the same
cluster.  It's been a good experience so far.

I haven't used YARN on any projects yet, but it looks like you currently
need to rebuild Spark to run on it:
https://spark.apache.org/docs/0.9.0/running-on-yarn.html
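
If I'm reading those docs right, it's a special assembly build, something
like (the version value here is just an example):

  SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly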

Why not officially support Hadoop v2 and recommend YARN for it, while also
supporting Mesos?

Another question is how long we will support Hadoop v1.


On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning <[email protected]> wrote:

> I am pretty sure that Mesos supports both MapReduce and Spark.
>
> In general, though, the biggest consideration in choosing a resource
> manager is complying with local standards and traditions.
>
> For playing around, standalone Spark is fine.
>
>
>
> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <[email protected]> wrote:
> >
> > > Hmm, that leaves Spark and Hadoop to manage tasks independently. Not
> > > ideal if you are running both Hadoop and Spark jobs simultaneously.
> > >
> >
> > I think the only resource manager that semi-officially supports both
> > MapReduce and Spark is YARN. This sounds neat in theory, but in practice
> > I think one discovers too many hoops to jump through. I am also
> > instinctively dubious about the quality and performance of YARN compared
> > to others.
> >
> >
> > >
> > > If you have a single-user cluster or are running jobs in a pipeline, I
> > > suppose you don't need Mesos.
> > >
> > >
> > > On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > >
> > > On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <[email protected]>
> > > wrote:
> > >
> > > > What is the recommended Spark setup?
> > > >
> > >
> > > Check out their docs. We don't have any special instructions for
> > > Mahout.
> > >
> > > The main point behind the 0.9.0 release is that it now supports master
> > > HA through ZooKeeper, so for that reason alone you probably don't need
> > > to use Mesos.
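> > >
> > > Roughly (untested), in conf/spark-env.sh on each master, with
> > > placeholder ZooKeeper hosts:
> > >
> > >   export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
> > >     -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"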
> > >
> > > You may want to use Mesos to have pre-allocated workers per Spark
> > > session (so-called "coarse-grained" mode). If you fire a lot of
> > > short-running queries (1 second or less), this is a significant win in
> > > QPS and response time. (Fine-grained mode adds about 3 seconds to
> > > pipeline time, to start all the workers lazily.)
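> > >
> > > Coarse-grained mode is just a conf flag; a rough sketch (the master
> > > URL is a placeholder):
> > >
> > >   import org.apache.spark.{SparkConf, SparkContext}
> > >   val conf = new SparkConf()
> > >     .setMaster("mesos://zk://zk1:2181/mesos") // or mesos://host:5050
> > >     .setAppName("short-queries")
> > >     .set("spark.mesos.coarse", "true")        // pre-allocate workers
> > >   val sc = new SparkContext(conf)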
> > >
> > > In our case we are dealing with stuff that runs over 3 seconds for the
> > > most part, so assuming 0.9.0 HA is stable enough (which I haven't tried
> > > yet), there's no reason for us to go with Mesos; multi-master
> > > standalone with ZooKeeper is good enough.
> > >
> > >
> > > >
> > > > I imagine most of us will have HDFS configured (with either local
> > > > files or an actual cluster).
> > > >
> > >
> > > The Hadoop DFS API is pretty much the only persistence API supported
> > > by the Mahout Spark bindings at this point. So yes, you would want an
> > > HDFS-only cluster; whether it runs 1.x or 2.x doesn't matter. I use
> > > CDH 4 distros.
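> > >
> > > E.g. in the Spark bindings shell, reading a DRM off HDFS looks roughly
> > > like this (untested from memory; the path is a placeholder):
> > >
> > >   val drmA = drmFromHDFS(path = "hdfs://namenode:8020/tmp/drmA")
> > >   val drmAtA = drmA.t %*% drmA // distributed A'A, evaluated lazily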
> > >
> > >
> > > > Since most of Mahout is recommended to run on Hadoop 1.x, should we
> > > > use Mesos? https://github.com/mesos/hadoop
> > > >
> > > > This would mean we'd need at least Hadoop 1.2.1 (in the Mesos and
> > > > current Mahout poms). We'd use Mesos to manage Hadoop and Spark jobs,
> > > > but HDFS would be controlled separately by Hadoop itself.
> > > >
> > >
> > > I think I addressed this. No, we are not bound by the MapReduce part
> > > of Mahout, since Spark runs on whatever. Like I said, with the 0.9.0 +
> > > Mahout combo I would forgo Mesos -- unless it turns out meaningfully
> > > faster or more stable.
> > >
> > >
> > >
> > > >
> > > > Is this about right? Is there a setup doc I missed?
> > >
> > >
> > > I don't think one is needed.
> > >
> > >
> >
>
