On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <[email protected]> wrote:
> Hmm, that leaves Spark and Hadoop to manage tasks independently. Not ideal
> if you are running both Hadoop and Spark jobs simultaneously.
>

I think the only resource manager that semi-officially supports both
MapReduce and Spark is YARN. This sounds neat in theory, but in practice I
think one discovers too many hoops to jump through. I am also inherently
dubious about the quality and performance of YARN compared to the others.

> If you have a single-user cluster or are running jobs in a pipeline I
> suppose you don't need Mesos.
>
>
> On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <[email protected]>
> wrote:
>
> > What is the recommended Spark setup?
>
> Check out their docs. We don't have any special instructions for Mahout.
> The main point behind the 0.9.0 release is that it now supports master HA
> through ZooKeeper, so for that reason alone you probably don't want to use
> Mesos.
>
> You may want to use Mesos to have pre-allocated workers per Spark session
> (the so-called "coarse-grained" mode). If you shoot a lot of short-running
> queries (1 sec or less), this is a significant win in QPS and response
> time (fine-grained mode will add about 3 seconds to pipeline time to start
> all the workers lazily).
>
> In our case we are dealing with stuff that runs over 3 seconds for the
> most part, so assuming 0.9.0 HA is stable enough (which I haven't tried
> yet), there's no reason for us to go Mesos; multi-master standalone with
> ZooKeeper is good enough.
>
> > I imagine most of us will have HDFS configured (with either local files
> > or an actual cluster).
>
> The Hadoop DFS API is pretty much the only persistence API supported by
> the Mahout Spark bindings at this point. So yes, you would want to have an
> HDFS-only cluster running; whether it is 1.x or 2.x doesn't matter. I use
> CDH 4 distros.
>
> > Since most of Mahout is recommended to be run on Hadoop 1.x, should we
> > use Mesos? https://github.com/mesos/hadoop
> >
> > This would mean we'd need to have at least Hadoop 1.2.1 (in the Mesos
> > and current Mahout poms). We'd use Mesos to manage Hadoop and Spark
> > jobs, but HDFS would be controlled separately by Hadoop itself.
>
> I think I addressed this: no, we are not bound by the MR part of Mahout,
> since Spark runs on whatever. Like I said, with the 0.9.0 + Mahout combo I
> would forego Mesos -- unless it turns out meaningfully faster or more
> stable.
>
> > Is this about right? Is there a setup doc I missed?
>
> I don't think one is needed.
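
[For reference, the multi-master standalone setup mentioned above comes down
to two pieces of configuration. This is only a minimal sketch, assuming Spark
0.9.x; the host names (master1, master2, zk1-zk3) and paths are placeholders.
Each standalone master is started with ZooKeeper recovery enabled in
conf/spark-env.sh:

    # conf/spark-env.sh on each master (host names are placeholders)
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

and the driver lists every master in its URL, so it registers with whichever
one ZooKeeper has elected leader and re-registers after a failover:

    import org.apache.spark.{SparkConf, SparkContext}

    // Application side of standalone HA; app name and hosts are
    // hypothetical. The comma-separated URL names all masters.
    val conf = new SparkConf()
      .setMaster("spark://master1:7077,master2:7077")
      .setAppName("mahout-on-spark")
    val sc = new SparkContext(conf)
]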

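[Similarly, the "coarse grained" Mesos mode discussed above is a single
property on the Spark side. Again a hedged sketch with placeholder addresses:
setting spark.mesos.coarse pins a set of pre-allocated workers to the session
for its whole lifetime, instead of launching them lazily, which is what buys
the QPS and response-time win for sub-second queries.

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained mode on Mesos (Spark 0.9.x-era property names);
    // the Mesos master address and executor tarball path are hypothetical.
    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181/mesos")
      .setAppName("short-queries")
      .set("spark.mesos.coarse", "true")
      .set("spark.executor.uri", "hdfs:///dist/spark-0.9.0.tgz")
    val sc = new SparkContext(conf)
]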