On Wed, Oct 26, 2011 at 7:59 PM, Drew Farris <[email protected]> wrote:
> (Also a separate topic here)
>
> On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <[email protected]> wrote:
> >
> > Also I've been thinking in very fuzzy terms about how to compose
> > larger tasks from smaller pieces, and wondering what might be a more
> > principled way of doing this than running each bin/mahout job by
> > hand. Obviously coding it up is one way, but also little shell
> > scripts or makefiles or (if forced at gunpoint) maybe Ant ...?
>
> Well, there certainly seem to be a number of options out there; don't
> forget the FlumeJava-style options like Ted's work on Plume or
> Cloudera Crunch. Is Oozie an option for this as well? When I was
> looking at the clustering code recently and saw the various methods
> starting with the run* prefix, I really wondered if there was a
> standard way we could package these chunks of code (steps) that would
> allow them to be easily decomposed and re-combined in different ways.
>

I am still very convinced that lazy evaluation with execution-plan
rewriting a la FlumeJava is a very important approach here. It allows
your library and my code to intermingle in the resulting map-reduce
program. A rough sketch of what I mean is at the end of this message.

> There's some talk about beanifying our workflow steps in
> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
> understand how this would allow us to reach the composable-workflow
> goal.
>

I don't think it does. It just passes data around in files, as we do now.
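
To make the lazy-evaluation point concrete, here is a rough sketch.
The names (LazyCollection, parallelDo, run) are made up for
illustration and are not the actual Plume or Crunch API, and I'm using
the proposed Java 8 lambda syntax for brevity; a real planner would
also keep the operator graph explicit and rewrite it into MapReduce
stages rather than composing functions directly:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Function;

  // Deferred collection: parallelDo() only extends the plan; nothing
  // runs until run() is called, so the planner is free to fuse steps.
  class LazyCollection<S, T> {
    private final List<S> source;       // stand-in for input files
    private final Function<S, T> plan;  // the deferred computation

    private LazyCollection(List<S> source, Function<S, T> plan) {
      this.source = source;
      this.plan = plan;
    }

    static <S> LazyCollection<S, S> of(List<S> data) {
      return new LazyCollection<>(data, Function.identity());
    }

    // Record the step; here fusion happens eagerly via andThen(),
    // where FlumeJava would record a graph node and fuse at plan time.
    <U> LazyCollection<S, U> parallelDo(Function<T, U> fn) {
      return new LazyCollection<>(source, plan.andThen(fn));
    }

    // "Run" the plan: one pass over the input no matter how many
    // parallelDo steps were chained (i.e. one map job, not N).
    List<T> run() {
      List<T> out = new ArrayList<>();
      for (S item : source) {
        out.add(plan.apply(item));
      }
      return out;
    }
  }

  class Demo {
    public static void main(String[] args) {
      // One step from "your library", one from "my code": they meet
      // in the same plan and execute as a single fused pass.
      List<Integer> squares = LazyCollection.of(List.of("1", "2", "3"))
          .parallelDo(Integer::parseInt)  // library step
          .parallelDo(n -> n * n)         // user step
          .run();
      System.out.println(squares);        // [1, 4, 9]
    }
  }

The point is that the two parallelDo steps come from different places
but end up in the same plan, so they execute as one fused map pass
instead of two jobs passing intermediate data through files.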
