Re: Spark setup

Dmitriy Lyubimov Mon, 14 Apr 2014 11:47:55 -0700

thanks, yes that's it


On Mon, Apr 14, 2014 at 11:26 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

> @Pat,
> Regarding your question on JIRA, this is Dmitriy's email about running
> Mahout on Spark.
>
> Sent from my iPhone
>
> > On Apr 11, 2014, at 7:52 PM, "Andrew Musselman" <andrew.mussel...@gmail.com> wrote:
> >
> > We've used Mesos at a client to run both Hadoop and Spark jobs in the same
> > setup.  It's been a good experience so far.
> >
> > I haven't used YARN on any projects yet but it looks like you need to
> > rebuild Spark to run on it currently:
> > https://spark.apache.org/docs/0.9.0/running-on-yarn.html
> >
> > Why not officially support Hadoop v2 and recommend YARN for that, as well
> > as supporting Mesos?
> >
> > Another question is how long we will support Hadoop v1.
> >
> >
> >> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>
> >> I am pretty sure that Mesos supports both MapReduce and Spark.
> >>
> >> In general, though, the biggest design consideration in choosing which
> >> resource manager to use is to comply with local standards and traditions.
> >>
> >> For playing around, standalone Spark is fine.
> >>
> >>
> >>
> >> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>
> >>>> On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> Hmm, that leaves Spark and Hadoop to manage tasks independently. Not ideal
> >>>> if you are running both Hadoop and Spark jobs simultaneously.
> >>>
> >>> I think the only resource manager that semi-officially supports both
> >>> MapReduce and Spark is YARN. This sounds neat in theory, but in practice
> >>> I think one discovers too many hoops to jump through. I am also inherently
> >>> dubious about the quality and performance of YARN compared to others.
> >>>
> >>>
> >>>>
> >>>> If you have a single-user cluster or are running jobs in a pipeline I
> >>>> suppose you don't need Mesos.
> >>>>
> >>>>
> >>>> On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>>>
> >>>> On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>
> >>>>> What is the recommended Spark setup?
> >>>>
> >>>> Check out their docs. We don't have any special instructions for Mahout.
> >>>>
> >>>> The main point of the 0.9.0 release is that it now supports master HA
> >>>> through ZooKeeper, so for that reason alone you probably don't want to use
> >>>> Mesos.
> >>>>
> >>>> You may want to use Mesos to have pre-allocated workers per Spark session
> >>>> (the so-called "coarse-grained" mode). If you fire off a lot of short-running
> >>>> queries (1 sec or less), this is a significant win in QPS and response time.
> >>>> (Fine-grained mode starts the workers lazily, which adds about 3 seconds
> >>>> to pipeline time.)
> >>>>
> >>>> In our case we are dealing with stuff that runs over 3 seconds for the most
> >>>> part, so assuming 0.9.0 HA is stable enough (which I haven't tried yet),
> >>>> there's no reason for us to go to Mesos; multi-master standalone with
> >>>> ZooKeeper is good enough.
> >>>>
> >>>>
> >>>>>
> >>>>> I imagine most of us will have HDFS configured (with either local files or
> >>>>> an actual cluster).
> >>>>
> >>>> The Hadoop DFS API is pretty much the only persistence API supported by the
> >>>> Mahout Spark Bindings at this point. So yes, you would want an HDFS-only
> >>>> cluster; whether it runs 1.x or 2 doesn't matter. I use CDH 4 distros.
> >>>>
> >>>>
> >>>>> Since most of Mahout is recommended to be run on Hadoop 1.x we should use
> >>>>> Mesos? https://github.com/mesos/hadoop
> >>>>>
> >>>>> This would mean we'd need to have at least Hadoop 1.2.1 (in Mesos and
> >>>>> current Mahout pom). We'd use Mesos to manage Hadoop and Spark jobs but
> >>>>> HDFS would be controlled separately by Hadoop itself.
> >>>>
> >>>> I think I addressed this. No, we are not bound by the MR part of Mahout,
> >>>> since Spark runs on whatever. Like I said, with the 0.9.0 + Mahout combo I
> >>>> would forgo Mesos -- unless it turns out meaningfully faster or more
> >>>> stable.
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> Is this about right? Is there a setup doc I missed?
> >>>>
> >>>>
> >>>> I don't think one is needed.
> >>
>
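
A rough way to see why the startup cost quoted in the thread matters for short queries but not for longer jobs: the sketch below takes the "about 3 seconds" figure for lazy worker startup in fine-grained mode at face value and compares it against a few job durations. The function names and the 30 s and 600 s durations are hypothetical, chosen only for illustration; nothing here is measured or taken from the Spark or Mesos APIs.

```python
# Back-of-the-envelope cost of lazy worker startup in fine-grained mode.
# The ~3 s startup figure is the estimate quoted in the thread; the job
# durations below are made-up illustrations, not measurements.

STARTUP_OVERHEAD_S = 3.0  # "about 3 seconds", per the thread


def overhead_fraction(job_seconds, startup_seconds=STARTUP_OVERHEAD_S):
    """Share of total wall-clock time spent waiting for workers to start."""
    return startup_seconds / (job_seconds + startup_seconds)


def slowdown(job_seconds, startup_seconds=STARTUP_OVERHEAD_S):
    """How many times longer a job takes once startup time is added."""
    return (job_seconds + startup_seconds) / job_seconds


for job in (1.0, 30.0, 600.0):  # hypothetical job durations in seconds
    print(f"{job:6.0f} s job: {overhead_fraction(job):6.1%} of wall-clock "
          f"is startup, {slowdown(job):.2f}x total time")
```

Under these assumptions a 1-second query spends 75% of its wall-clock time on startup and takes 4 times as long, which is the case where pre-allocated (coarse-grained) workers pay off. For jobs that run well over 3 seconds the same fixed cost shrinks to a small fraction.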
