On Wed, Oct 26, 2011 at 7:59 PM, Drew Farris <[email protected]> wrote:
> (Also a separate topic here)
>
> On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <[email protected]> wrote:
> >
> > Also I've been thinking in very fuzzy terms about how to compose
> > larger tasks from smaller pieces, and wondering what might be a more
> > principled way of doing this than running each bin/mahout job by
> > hand. Obviously coding it up is one way, but also little shell
> > scripts or makefiles or (if forced at gunpoint) maybe Ant ...?
>
> Well, there certainly seem to be a number of options out there; don't
> forget the FlumeJava-style options like Ted's work on Plume or
> Cloudera Crunch. Is Oozie an option for this as well? When I was
> looking at the clustering code recently and saw the various methods
> starting with the run* prefix, I really wondered if there was a
> standard way we could package these chunks of code (steps) that would
> allow them to be easily decomposed and re-combined in different ways.
>

I am still very convinced that lazy evaluation with execution-plan
rewriting a la FlumeJava is a very important approach here. It allows
your library and my code to intermingle in the resulting map-reduce
program. A rough sketch of what I mean is at the end of this message.

> There's some talk about beanifying our workflow steps in
> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
> understand how this would allow us to reach the composable-workflow
> goal.
>

I don't think it does. It just passes data around in files, as we do now.
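
To make the lazy-evaluation point concrete, here is a rough sketch.
The names (LazyCollection, parallelDo, run) are made up for
illustration and are not the actual Plume or Crunch API, and I'm using
the proposed Java 8 lambda syntax for brevity; a real planner would
also keep the operator graph explicit and rewrite it into MapReduce
stages rather than composing functions directly:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Function;

  // Deferred collection: parallelDo() only extends the plan; nothing
  // runs until run() is called, so the planner is free to fuse steps.
  class LazyCollection<S, T> {
    private final List<S> source;       // stand-in for input files
    private final Function<S, T> plan;  // the deferred computation

    private LazyCollection(List<S> source, Function<S, T> plan) {
      this.source = source;
      this.plan = plan;
    }

    static <S> LazyCollection<S, S> of(List<S> data) {
      return new LazyCollection<>(data, Function.identity());
    }

    // Record the step; here fusion happens eagerly via andThen(),
    // where FlumeJava would record a graph node and fuse at plan time.
    <U> LazyCollection<S, U> parallelDo(Function<T, U> fn) {
      return new LazyCollection<>(source, plan.andThen(fn));
    }

    // "Run" the plan: one pass over the input no matter how many
    // parallelDo steps were chained (i.e. one map job, not N).
    List<T> run() {
      List<T> out = new ArrayList<>();
      for (S item : source) {
        out.add(plan.apply(item));
      }
      return out;
    }
  }

  class Demo {
    public static void main(String[] args) {
      // One step from "your library", one from "my code": they meet
      // in the same plan and execute as a single fused pass.
      List<Integer> squares = LazyCollection.of(List.of("1", "2", "3"))
          .parallelDo(Integer::parseInt)  // library step
          .parallelDo(n -> n * n)         // user step
          .run();
      System.out.println(squares);        // [1, 4, 9]
    }
  }

The point is that the two parallelDo steps come from different places
but end up in the same plan, so they execute as one fused map pass
instead of two jobs passing intermediate data through files.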
