What about Groovy? Java does have scripting languages built in. Someone (sorry, can't remember who) has some patches to make Mahout Scala-friendly.
A use case for a "programmable workflow engine" is to run the same classification job 100 times with different tuning parameters, and save the confusion matrices for further optimization. Which of these tools allows this?

Lance

On Wed, Oct 26, 2011 at 7:59 PM, Drew Farris <[email protected]> wrote:
> (Also a separate topic here)
>
> On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <[email protected]> wrote:
> >
> > Also I've been thinking in very fuzzy terms about how to compose
> > larger tasks from smaller pieces, and wondering what might be a more
> > principled way of doing this than running each bin/mahout job by hand.
> > Obviously coding it up is one way, but also little shell scripts or
> > makefiles or (if forced at gunpoint) maybe Ant ...?
>
> Well, there certainly seem to be a number of options out there; don't
> forget to mention the FlumeJava items like Ted's work on Plume or
> Cloudera Crunch. Is Oozie an option for this as well? When I was
> looking at the clustering code recently and saw the various methods
> starting with the run* prefix, I really wondered if there was a
> standard way that we could package these chunks of code (steps) that
> would allow them to be easily decomposed and re-combined in different
> ways.
>
> There's some talk about beanifying our workflow steps in
> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
> understand how this would allow us to reach the composable workflow
> goal.

--
Lance Norskog
[email protected]
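
[Editor's note: a minimal sketch of the parameter-sweep use case described above, assuming a hypothetical runClassification() helper that wraps one bin/mahout job and returns its confusion matrix. None of this is actual Mahout API; it only shows the shape of the loop a programmable workflow engine would need to express.]

import java.util.ArrayList;
import java.util.List;

public class ParameterSweep {

  // Hypothetical stand-in for one tuning configuration.
  static class Params {
    final double learningRate;
    Params(double learningRate) { this.learningRate = learningRate; }
  }

  // Hypothetical stand-in for the confusion matrix produced by one run.
  static class ConfusionMatrix {
    final int[][] counts;
    ConfusionMatrix(int[][] counts) { this.counts = counts; }
  }

  // Placeholder: in practice this would invoke the classification job
  // (e.g. its driver's run() entry point) and read back its evaluation output.
  static ConfusionMatrix runClassification(Params p) {
    return new ConfusionMatrix(new int[2][2]);
  }

  public static void main(String[] args) {
    List<ConfusionMatrix> results = new ArrayList<ConfusionMatrix>();
    for (int i = 0; i < 100; i++) {
      Params p = new Params(0.001 * (i + 1));  // vary one tuning parameter per run
      results.add(runClassification(p));       // keep the matrix for later optimization
    }
    System.out.println("Collected " + results.size() + " confusion matrices");
  }
}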
