Just FYI, on (1): I have a fairly good start, but I am in the midst of figuring out and resolving some classpath issues around getting the unit tests to work in the context of the shell.
> Date: Thu, 10 Apr 2014 13:28:11 -0700
> Subject: Re: Spark setup
> From: [email protected]
> To: [email protected]
>
> PS.
>
> I see 3 ways to run (embed) this work.
>
> (1) The ideal way is to write dsl scripts and run them thru the Mahout spark
> shell. Once it is finished, we will just need to download Mahout, compile
> it, and point MAHOUT_HOME to it. That's it. Then one can launch things
> either interactively thru the shell, or just by passing a script to it,
> just like it happens with R.
>
> Unfortunately this is work-in-progress; move along, nothing to see here...
>
> (2) Write a quick test within the mahout project, recompile mahout, and
> launch your code. In this case mahout will take care of shipping the mahout
> jars to the backend automatically, and since your code is included in them,
> nothing else is required.
>
> (3) Create a standalone project that depends on the mahout-spark artifact.
> In this case it works pretty much like (2), except that if one writes
> closures to be used in any code (e.g. mapBlock or custom spark pipeline
> continuations), then the closure code must also be shipped to the backend.
> This becomes a bit more hairy -- you need to compile your application and
> add its jars to the call that creates the Mahout context, otherwise
> attempting to run one's code on the backend may generate ClassNotFounds.
>
> (4) What about a CLI?... -- So what about it? Option (1) should supersede
> the need for any CLI. As it stands, there is no CLI support, nor are there
> any plans to add it at this point.
>
> -d
>
> Basically, if you're writing a 3rd-party application to test, then you just
> need the mahout source compiled, with MAHOUT_HOME pointing to it. One's
> application should take care of its own classpath, which is done
> automatically if one uses maven. If you import the maven project into Idea,
> then you can use Idea's launcher to take care of the client classpath for
> you. The backend classpath is taken care of by mahout; but you still need
> to ship your application jars to the spark session, for which there's an
> "extra jars" parameter of the mahoutSparkContext call.
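[Editor's note: to make options (1) and (3) above concrete, here is a minimal sketch of a standalone driver in the Mahout Scala DSL. It creates a Mahout Spark context, ships the application jar via the "extra jars" parameter mentioned above, and runs a small pipeline including a mapBlock closure. The package names and mahoutSparkContext parameters follow the spark-bindings module; the master URL, app name, and jar path are illustrative assumptions, not prescribed values.]

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.sparkbindings._

    object StandaloneDriver extends App {

      // Create the Mahout context. customJars ships this application's jar
      // so that closures (like the mapBlock below) can be loaded on the
      // workers; the jar path and master URL are placeholders.
      implicit val ctx = mahoutSparkContext(
        masterUrl = "spark://master:7077",
        appName = "mahout-dsl-example",
        customJars = Seq("target/my-app-1.0.jar"))

      // Distribute a small in-core matrix as a DRM (distributed row matrix).
      val drmA = drmParallelize(dense((1, 2), (3, 4), (5, 6)), numPartitions = 2)

      // R-like DSL: compute A' %*% A on the cluster, collect in-core.
      val ata = (drmA.t %*% drmA).collect

      // A custom closure: add 1.0 to every element, block by block. This is
      // exactly the kind of code that must reach the backend via customJars,
      // or the workers will throw ClassNotFoundException.
      val drmB = drmA.mapBlock() { case (keys, block) =>
        keys -> (block += 1.0)
      }

      println(ata)
      ctx.close()
    }

[With option (2), the same pipeline body would live in a test inside the mahout project itself, and no customJars argument would be needed, since the code ships with the mahout jars.]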
> On Thu, Apr 10, 2014 at 1:00 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > On Thu, Apr 10, 2014 at 12:00 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> What is the recommended Spark setup?
> >
> > Check out their docs. We don't have any special instructions for mahout.
> >
> > The main point behind the 0.9.0 release is that it now supports master HA
> > thru zookeeper, so for that reason alone you probably don't want to use
> > mesos.
> >
> > You may want to use mesos to have pre-allocated workers per spark session
> > (so-called "coarse-grained" mode). If you fire a lot of short-running
> > queries (1 sec or less), this is a significant win in QPS and response
> > time (fine-grained mode adds about 3 seconds of pipeline time to start
> > all the workers lazily).
> >
> > In our case we are dealing with stuff that runs over 3 seconds for the
> > most part, so assuming 0.9.0 HA is stable enough (which I haven't tried
> > yet), there's no reason for us to go with mesos; multi-master standalone
> > with zookeeper is good enough.
> >
> >> I imagine most of us will have HDFS configured (with either local files
> >> or an actual cluster).
> >
> > The Hadoop DFS API is pretty much the only persistence API supported by
> > Mahout Spark Bindings at this point. So yes, you would want to have an
> > HDFS-only cluster; whether it runs 1.x or 2.x doesn't matter. I use
> > CDH 4 distros.
> >
> >> Since most of Mahout is recommended to run on Hadoop 1.x, should we use
> >> Mesos? https://github.com/mesos/hadoop
> >>
> >> This would mean we'd need at least Hadoop 1.2.1 (in mesos and the
> >> current mahout pom). We'd use Mesos to manage hadoop and spark jobs,
> >> but HDFS would be controlled separately by hadoop itself.
> >
> > I think I addressed this. No, we are not bound by the MR part of mahout,
> > since Spark runs on whatever. Like I said, with the 0.9.0 + Mahout combo
> > I would forgo mesos -- unless it turns out to be meaningfully faster or
> > more stable.
> >
> >> Is this about right? Is there a setup doc I missed?
> >
> > I don't think one is needed.
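[Editor's note: since the thread states that the Hadoop DFS API is the only persistence route the Spark bindings support, here is a short sketch of round-tripping a DRM through HDFS. The drmDfsRead and dfsWrite names follow the spark-bindings module as of this writing, but treat them, the paths, and the master URL as assumptions for illustration.]

    import org.apache.mahout.math.drm._
    import org.apache.mahout.sparkbindings._

    object DfsPersistenceExample extends App {

      // Local context just for illustration; any Spark master URL works.
      implicit val ctx = mahoutSparkContext(
        masterUrl = "local[2]",
        appName = "drm-persistence")

      // Read a DRM previously written to HDFS; the path is a placeholder
      // for your namenode and data layout.
      val drmA = drmDfsRead("hdfs://namenode:8020/mahout/A")

      // ... transform drmA with the DSL here ...

      // Persist the result back to HDFS via the same DFS API.
      drmA.dfsWrite("hdfs://namenode:8020/mahout/A-out")

      ctx.close()
    }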
