Trainer jobs should implement Hadoop's Tool
-------------------------------------------

                 Key: MAHOUT-348
                 URL: https://issues.apache.org/jira/browse/MAHOUT-348
             Project: Mahout
          Issue Type: Improvement
          Components: Classification
    Affects Versions: 0.3
            Reporter: Ferdy


It would be nice if the Trainer jobs (and any other Mahout jobs that do not 
already do so) implemented Tool. From the Hadoop javadocs:

"Tool, is the standard for any Map-Reduce tool/application. The 
tool/application should delegate the handling of standard command-line options 
to ToolRunner.run(Tool, String[]) and only handle its custom arguments."

The problem we are currently running into is that, as of Mahout 0.3, there is 
no way to submit a CBayesDriver job with a custom Configuration. As a result, 
the generic "-libjars" option has no effect when running the CBayesDriver, so 
the classpath for its Mappers and Reducers cannot be set correctly. Of course, 
this particular problem could be worked around by putting the required jars in 
the Hadoop lib dir, but that is not always possible. On a custom Hadoop 
deployment (shared among many users and different types of jobs), every job 
should be able to specify its own library dependencies.
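
To make the contract concrete, here is a rough, self-contained sketch of the 
division of labour that Tool and ToolRunner provide: a runner strips the 
generic options into a configuration and the job only sees its custom 
arguments. This is a simplified stand-in written against plain Java, not 
Hadoop code. MiniTool, parseGenericOptions and the "libjars" key are 
hypothetical names chosen for illustration, and only "-libjars" and "-D" are 
modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the Tool/ToolRunner contract; NOT Hadoop code.
class ToolSketch {

    // Hypothetical miniature of a Tool: it receives a ready-made
    // configuration and only its own (custom) arguments.
    interface MiniTool {
        int run(Map<String, String> conf, String[] customArgs) throws Exception;
    }

    // Moves the generic options into conf and returns the remaining custom
    // arguments. Only "-libjars <list>" and "-D key=value" are modelled.
    static String[] parseGenericOptions(Map<String, String> conf, String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-libjars".equals(args[i]) && i + 1 < args.length) {
                // The "libjars" key is an assumption made for this sketch.
                conf.put("libjars", args[++i]);
            } else if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    // Runner: handles the generic options, then delegates to the tool.
    static int run(Map<String, String> conf, MiniTool tool, String[] args)
            throws Exception {
        return tool.run(conf, parseGenericOptions(conf, args));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        int exit = run(conf, (c, custom) -> {
            System.out.println("libjars = " + c.get("libjars"));
            System.out.println("custom args = " + String.join(" ", custom));
            return 0;
        }, new String[] {"-libjars", "deps/a.jar,deps/b.jar", "--input", "train"});
        System.exit(exit);
    }
}
```

In Hadoop itself the same shape is usually obtained by having the driver 
extend Configured, implement Tool, build its job from getConf(), and be 
launched through ToolRunner.run. How the Mahout drivers would best be 
restructured to do this is exactly what is open for discussion here.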

Note: I'm aware of issue MAHOUT-167, which has limited overlap with this 
issue: MAHOUT-167 states that the new API should be used (particularly for 
Clustering jobs). This issue addresses the need to implement a Hadoop job 
interface at all, preferably Tool.

Also, there's issue MAHOUT-294, an effort to track all changes surrounding the 
Job API.

Let me hear your thoughts, and I'll whip up a patch when needed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
