@Pat,
Regarding your question on JIRA, this is Dmitriy's email about running
Mahout on Spark.


> On Apr 11, 2014, at 7:52 PM, "Andrew Musselman" <[email protected]> 
> wrote:
> 
> We've used Mesos at a client to run both Hadoop and Spark jobs in the same
> setup.  It's been a good experience so far.
> 
> I haven't used YARN on any projects yet, but it looks like you currently
> need to rebuild Spark to run on it:
> https://spark.apache.org/docs/0.9.0/running-on-yarn.html
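> 
> For reference, the 0.9.0 docs boil the YARN build down to one sbt command;
> a sketch (the Hadoop version shown is an assumption, match your cluster):
> 
>   SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly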
> 
> Why not officially support Hadoop v2 and recommend YARN for that, as well
> as support Mesos?
> 
> Another question is how long we will support Hadoop v1.
> 
> 
>> On Fri, Apr 11, 2014 at 1:43 PM, Ted Dunning <[email protected]> wrote:
>> 
>> I am pretty sure that Mesos supports both MapReduce and Spark.
>> 
>> In general, though, the biggest consideration in choosing a resource
>> manager is complying with local standards and traditions.
>> 
>> For playing around, standalone Spark is fine.
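>> 
>> A minimal standalone bring-up, as a sketch (the master host name is a
>> placeholder; workers come from conf/slaves):
>> 
>>   ./sbin/start-master.sh
>>   ./sbin/start-slaves.sh
>>   MASTER=spark://master-host:7077 ./bin/spark-shell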
>> 
>> 
>> 
>> On Thu, Apr 10, 2014 at 4:29 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> 
>>> On Thu, Apr 10, 2014 at 4:20 PM, Pat Ferrel <[email protected]>
>>> wrote:
>>> 
>>>> Hmm, that leaves Spark and Hadoop to manage tasks independently. Not
>>>> ideal if you are running both Hadoop and Spark jobs simultaneously.
>>> 
>>> I think the only resource manager that semi-officially supports both
>>> MapReduce and Spark is YARN. This sounds neat in theory, but in practice
>>> I think one discovers too many hoops to jump through. I am also
>>> inherently dubious about the quality and performance of YARN compared to
>>> others.
>>> 
>>> 
>>>> 
>>>> If you have a single-user cluster or are running jobs in a pipeline, I
>>>> suppose you don't need Mesos.
>>>> 
>>>> 
>>>> On Apr 10, 2014, at 1:00 PM, Dmitriy Lyubimov <[email protected]>
>>>> wrote:
>>>> 
>>>> On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <[email protected]>
>>>> wrote:
>>>> 
>>>>> What is the recommended Spark setup?
>>>> 
>>>> Check out their docs. We don't have any special instructions for
>>>> Mahout.
>>>> 
>>>> The main point behind the 0.9.0 release is that it now supports master
>>>> HA through ZooKeeper, so for that reason alone you probably don't want
>>>> to use Mesos.
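>>>> 
>>>> For the record, standalone HA is just a few properties set in
>>>> conf/spark-env.sh on each master; a sketch, with placeholder ZooKeeper
>>>> hosts:
>>>> 
>>>>   SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
>>>>     -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
>>>>     -Dspark.deploy.zookeeper.dir=/spark"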
>>>> 
>>>> You may want to use Mesos to have pre-allocated workers per Spark
>>>> session (so-called "coarse-grained" mode). If you shoot a lot of
>>>> short-running queries (1 sec or less), this is a significant win in QPS
>>>> and response time. (Fine-grained mode will add about 3 seconds to
>>>> pipeline time while it starts all the workers lazily.)
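>>>> 
>>>> Coarse-grained mode is a single property on the SparkConf; a sketch,
>>>> with a placeholder Mesos master URL:
>>>> 
>>>>   val conf = new org.apache.spark.SparkConf()
>>>>     .setMaster("mesos://zk://zk1:2181/mesos")  // placeholder URL
>>>>     .set("spark.mesos.coarse", "true")         // pre-allocate workers
>>>>   val sc = new org.apache.spark.SparkContext(conf)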
>>>> 
>>>> In our case we are dealing with stuff that runs over 3 seconds for the
>>>> most part, so assuming 0.9.0 HA is stable enough (which I haven't tried
>>>> yet), there's no reason for us to go Mesos; multi-master standalone
>>>> with ZooKeeper is good enough.
>>>> 
>>>> 
>>>>> 
>>>>> I imagine most of us will have HDFS configured (with either local
>>>>> files or an actual cluster).
>>>> 
>>>> The Hadoop DFS API is pretty much the only persistence API supported
>>>> by Mahout Spark Bindings at this point. So yes, you would want to have
>>>> an HDFS-only cluster; whether it runs 1.x or 2 doesn't matter. I use
>>>> CDH 4 distros.
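>>>> 
>>>> As a sketch of what the bindings' persistence surface looks like
>>>> (method names vary between snapshots, and the paths are placeholders):
>>>> 
>>>>   import org.apache.mahout.math.drm._
>>>>   import org.apache.mahout.sparkbindings._
>>>> 
>>>>   // create a Mahout-decorated Spark context, read a DRM off DFS,
>>>>   // and write a computed result back
>>>>   implicit val ctx = mahoutSparkContext(masterUrl = "local",
>>>>     appName = "drm-io-sketch")
>>>>   val drmA = drmDfsRead("hdfs://namenode/path/A")   // placeholder
>>>>   (drmA.t %*% drmA).dfsWrite("hdfs://namenode/path/AtA")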
>>>> 
>>>> 
>>>>> Since most of Mahout is recommended to run on Hadoop 1.x, should we
>>>>> use Mesos? https://github.com/mesos/hadoop
>>>>> 
>>>>> This would mean we'd need at least Hadoop 1.2.1 (in Mesos and the
>>>>> current Mahout POM). We'd use Mesos to manage Hadoop and Spark jobs,
>>>>> but HDFS would be controlled separately by Hadoop itself.
>>>> 
>>>> I think I addressed this. No, we are not bound by the MR part of
>>>> Mahout, since Spark runs on whatever. Like I said, with the 0.9.0 +
>>>> Mahout combo I would forgo Mesos, unless it turns out meaningfully
>>>> faster or more stable.
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Is this about right? Is there a setup doc I missed?
>>>> 
>>>> 
>>>> I don't think one is needed.
>> 
