Hmm, that leaves Spark and Hadoop to manage tasks independently. Not ideal if
you are running both Hadoop and Spark jobs simultaneously.

If you have a single-user cluster or are running jobs in a pipeline, I suppose
you don’t need Mesos.


On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> What is the recommended Spark setup?
> 

Check out the Spark docs. We don't have any special instructions for Mahout.

The main point behind the Spark 0.9.0 release is that it now supports master
HA through ZooKeeper, so for that reason alone you probably don't want to use
Mesos.
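
As a rough sketch of what the standalone HA setup involves (the Spark docs
are the authoritative version): each master is started with ZooKeeper
recovery enabled through SPARK_DAEMON_JAVA_OPTS, and applications list all
masters in a single spark:// URL. The host names and the helper functions
below are made up for illustration; only the spark.deploy.* property names
come from Spark's standalone-mode documentation.

```python
# Hypothetical host names; substitute your own masters and ZooKeeper quorum.
MASTERS = ["master1:7077", "master2:7077"]
ZK_QUORUM = ["zk1:2181", "zk2:2181", "zk3:2181"]


def ha_master_url(masters):
    """Build the multi-master URL that applications and workers connect to."""
    return "spark://" + ",".join(masters)


def daemon_java_opts(zk_quorum):
    """Build the SPARK_DAEMON_JAVA_OPTS value for spark-env.sh on each master."""
    return " ".join([
        "-Dspark.deploy.recoveryMode=ZOOKEEPER",
        "-Dspark.deploy.zookeeper.url=" + ",".join(zk_quorum),
    ])


if __name__ == "__main__":
    print(ha_master_url(MASTERS))
    print('SPARK_DAEMON_JAVA_OPTS="%s"' % daemon_java_opts(ZK_QUORUM))
```

The printed line would go into conf/spark-env.sh on every master, and the
URL is what you would pass as the Spark master when creating a context.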

You may want to use Mesos to have pre-allocated workers per Spark session
(the so-called "coarse grained" mode). If you fire off a lot of short-running
queries (1 sec or less), this is a significant win in QPS and response time.
(Fine grained mode adds about 3 seconds to pipeline time, since it starts
all the workers lazily.)
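
To put rough numbers on that, here is a back-of-the-envelope sketch. It
assumes the "about 3 seconds" lazy start is a flat cost paid once per
pipeline in fine grained mode and not at all in coarse grained mode; that is
a simplification, and nothing here was measured.

```python
FINE_GRAINED_STARTUP_SEC = 3.0  # the "about 3 seconds" figure quoted above


def response_time(query_sec, startup_sec):
    """Total wall-clock time for one query: startup overhead plus the work."""
    return startup_sec + query_sec


def slowdown(query_sec, startup_sec=FINE_GRAINED_STARTUP_SEC):
    """How many times slower a query is with the startup cost than without."""
    return response_time(query_sec, startup_sec) / response_time(query_sec, 0.0)


if __name__ == "__main__":
    for query_sec in (1.0, 30.0, 600.0):
        print("%6.1fs query -> %.2fx slower in fine grained mode"
              % (query_sec, slowdown(query_sec)))
```

Under these assumptions a 1 second query comes out 4x slower, while a 30
second job is only about 10% slower, so the startup cost mostly matters for
short interactive queries.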

In our case we are dealing with stuff that runs over 3 seconds for the most
part, so assuming 0.9.0 HA is stable enough (which I haven't tried yet),
there's no reason for us to go with Mesos; multi-master standalone with
ZooKeeper is good enough.


> 
> I imagine most of us will have HDFS configured (with either local files or
> an actual cluster).
> 

The Hadoop DFS API is pretty much the only persistence API supported by the
Mahout Spark Bindings at this point. So yes, you would want to have an
HDFS-only cluster; whether it runs 1.x or 2 doesn't matter. I use CDH 4
distros.


> Since most of Mahout is recommended to be run on Hadoop 1.x we should use
> Mesos? https://github.com/mesos/hadoop
> 
> This would mean we'd need to have at least Hadoop 1.2.1 (in mesos and
> current mahout pom). We'd use Mesos to manage hadoop and spark jobs but
> HDFS would be controlled separately by hadoop itself.
> 

I think I addressed this. No, we are not bound by the MR part of Mahout,
since Spark runs on whatever. Like I said, with the Spark 0.9.0 + Mahout
combo I would forgo Mesos -- unless it turns out meaningfully faster or more
stable.



> 
> Is this about right? Is there a setup doc I missed?


I don't think one is needed.
