On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <[email protected]> wrote:
> Hmm, that leaves Spark and Hadoop to manage tasks independently. Not ideal
> if you are running both Hadoop and Spark jobs simultaneously.
>

I think the only resource manager that semi-officially supports both
MapReduce and Spark is YARN. This sounds neat in theory, but in practice I
think one discovers too many hoops to jump through. I am also inherently
dubious about the quality and performance of YARN compared to the others.

> If you have a single-user cluster or are running jobs in a pipeline I
> suppose you don't need Mesos.
>
>
> On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <[email protected]>
> wrote:
>
> > What is the recommended Spark setup?
>
> Check out their docs. We don't have any special instructions for Mahout.
> The main point behind the 0.9.0 release is that it now supports master HA
> through ZooKeeper, so for that reason alone you probably don't want to use
> Mesos.
>
> You may want to use Mesos to have pre-allocated workers per Spark session
> (the so-called "coarse-grained" mode). If you shoot a lot of short-running
> queries (1 sec or less), this is a significant win in QPS and response
> time (fine-grained mode will add about 3 seconds to pipeline time to start
> all the workers lazily).
>
> In our case we are dealing with stuff that runs over 3 seconds for the
> most part, so assuming 0.9.0 HA is stable enough (which I haven't tried
> yet), there's no reason for us to go Mesos; multi-master standalone with
> ZooKeeper is good enough.
>
> > I imagine most of us will have HDFS configured (with either local files
> > or an actual cluster).
>
> The Hadoop DFS API is pretty much the only persistence API supported by
> the Mahout Spark bindings at this point. So yes, you would want to have an
> HDFS-only cluster running; whether it is 1.x or 2.x doesn't matter. I use
> CDH 4 distros.
>
> > Since most of Mahout is recommended to be run on Hadoop 1.x, should we
> > use Mesos? https://github.com/mesos/hadoop
> >
> > This would mean we'd need to have at least Hadoop 1.2.1 (in the Mesos
> > and current Mahout poms). We'd use Mesos to manage Hadoop and Spark
> > jobs, but HDFS would be controlled separately by Hadoop itself.
>
> I think I addressed this: no, we are not bound by the MR part of Mahout,
> since Spark runs on whatever. Like I said, with the 0.9.0 + Mahout combo I
> would forego Mesos -- unless it turns out meaningfully faster or more
> stable.
>
> > Is this about right? Is there a setup doc I missed?
>
> I don't think one is needed.
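
[For reference, the multi-master standalone setup mentioned above comes down
to two pieces of configuration. This is only a minimal sketch, assuming Spark
0.9.x; the host names (master1, master2, zk1-zk3) and paths are placeholders.
Each standalone master is started with ZooKeeper recovery enabled in
conf/spark-env.sh:

    # conf/spark-env.sh on each master (host names are placeholders)
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

and the driver lists every master in its URL, so it registers with whichever
one ZooKeeper has elected leader and re-registers after a failover:

    import org.apache.spark.{SparkConf, SparkContext}

    // Application side of standalone HA; app name and hosts are
    // hypothetical. The comma-separated URL names all masters.
    val conf = new SparkConf()
      .setMaster("spark://master1:7077,master2:7077")
      .setAppName("mahout-on-spark")
    val sc = new SparkContext(conf)
]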

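[Similarly, the "coarse grained" Mesos mode discussed above is a single
property on the Spark side. Again a hedged sketch with placeholder addresses:
setting spark.mesos.coarse pins a set of pre-allocated workers to the session
for its whole lifetime, instead of launching them lazily, which is what buys
the QPS and response-time win for sub-second queries.

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained mode on Mesos (Spark 0.9.x-era property names);
    // the Mesos master address and executor tarball path are hypothetical.
    val conf = new SparkConf()
      .setMaster("mesos://zk://zk1:2181/mesos")
      .setAppName("short-queries")
      .set("spark.mesos.coarse", "true")
      .set("spark.executor.uri", "hdfs:///dist/spark-0.9.0.tgz")
    val sc = new SparkContext(conf)
]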