Thanks, yes, that's it.
On Mon, Apr 14, 2014 at 11:26 AM, Saikat Kanjilal <sxk1...@hotmail.com> wrote:

> @Pat,
> In regards to your question on JIRA, this is Dmitry's email about running
> mahout on spark.
>
> Sent from my iPhone
>
> > On Apr 11, 2014, at 7:52 PM, "Andrew Musselman" <andrew.mussel...@gmail.com> wrote:
> >
> > We've used Mesos at a client to run both Hadoop and Spark jobs in the same
> > setup. It's been a good experience so far.
> >
> > I haven't used YARN on any projects yet but it looks like you need to
> > rebuild Spark to run on it currently:
> > https://spark.apache.org/docs/0.9.0/running-on-yarn.html
> >
> > Why not officially support Hadoop v2 and recommend YARN for that, as well
> > as supporting Mesos?
> >
> > Another question is how long we will support Hadoop v1.
> >
> >> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>
> >> I am pretty sure that mesos supports both map reduce and spark.
> >>
> >> In general, though, the biggest design consideration in which resource
> >> manager to use is to comply with local standards and traditions.
> >>
> >> For playing around, stand-alone spark is fine.
> >>
> >> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>
> >>>> On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>
> >>>> Hmm, that leaves Spark and Hadoop to manage tasks independently. Not ideal
> >>>> if you are running both hadoop and spark jobs simultaneously.
> >>>
> >>> I think the only resource manager that semi-officially supports both
> >>> MapReduce and spark is Yarn. This sounds neat in theory, but in practice i
> >>> think one discovers too many hoops to jump thru. I am also inertly dubious
> >>> about quality and performance of Yarn compared to others.
> >>>
> >>>> If you have a single user cluster or are running jobs in a pipeline I
> >>>> suppose you don't need Mesos.
> >>>>
> >>>> On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >>>>
> >>>> On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>>
> >>>>> What is the recommended Spark setup?
> >>>>
> >>>> Check out their docs. We don't have any special instructions for mahout.
> >>>>
> >>>> The main point behind 0.9.0 release is that it now supports master HA thru
> >>>> zookeeper, so for that reason alone you probably don't want to use mesos.
> >>>>
> >>>> You may want to use mesos to have pre-allocated workers per spark session
> >>>> (so called "coarse grained" mode). if you shoot a lot of short-running
> >>>> queries (1sec or less), this is a significant win in QPS and response time.
> >>>> (fine grained mode will add about 3 seconds to start all the workers lazily
> >>>> to pipeline time).
> >>>>
> >>>> In our case we are dealing with stuff that runs over 3 seconds for most
> >>>> part, so assuming 0.9.0 HA is stable enough (which i haven't tried yet),
> >>>> there's no reason for us to go mesos, multi-master standalone with
> >>>> zookeeper is good enough.
> >>>>
> >>>>> I imagine most of us will have HDFS configured (with either local files or
> >>>>> an actual cluster).
> >>>>
> >>>> Hadoop DFS API is pretty much the only persistence api supported by Mahout
> >>>> Spark Bindings at this point. So yes, you would want to have hdfs-only
> >>>> cluster running 1.x or 2 doesn't matter. i use cdh 4 distros.
> >>>>
> >>>>> Since most of Mahout is recommended to be run on Hadoop 1.x we should use
> >>>>> Mesos? https://github.com/mesos/hadoop
> >>>>>
> >>>>> This would mean we'd need to have at least Hadoop 1.2.1 (in mesos and
> >>>>> current mahout pom). We'd use Mesos to manage hadoop and spark jobs but
> >>>>> HDFS would be controlled separately by hadoop itself.
> >>>>
> >>>> I think i addressed this. no we are not bound by the MR part of mahout
> >>>> since Spark runs on whatever. like i said, with 0.9.0 + Mahout combo i
> >>>> would forego mesos -- unless it turns out meaningfully faster or more
> >>>> stable.
> >>>>
> >>>>> Is this about right? Is there a setup doc I missed?
> >>>>
> >>>> i dont think one needed.