Trainer jobs should implement Hadoop's Tool
-------------------------------------------
Key: MAHOUT-348
URL: https://issues.apache.org/jira/browse/MAHOUT-348
Project: Mahout
Issue Type: Improvement
Components: Classification
Affects Versions: 0.3
Reporter: Ferdy

It would be nice if the Trainer jobs (and Mahout jobs in general, those not already doing so) would implement Tool. From Hadoop's javadocs:

"Tool, is the standard for any Map-Reduce tool/application. The tool/application should delegate the handling of standard command-line options to ToolRunner.run(Tool, String[]) and only handle its custom arguments."

The problem we are currently running into is that, as of Mahout 0.3, there is no way to submit a CBayesDriver job with a custom Configuration. It is therefore not possible to set the classpath correctly for its Mappers and Reducers when running the CBayesDriver with the generic "-libjars" option.

Of course, this particular problem could be solved by just putting the required jars in the Hadoop lib dir; however, this is not always possible. For a custom Hadoop deployment (shared among many users and different types of jobs), every job should be able to specify its own library dependencies.

Note: I'm aware of issue MAHOUT-167, which has limited overlap with this issue: MAHOUT-167 states that the new API should be used (particularly for Clustering jobs). This issue addresses the need to implement a Hadoop Job interface at all, preferably Tool. There is also issue MAHOUT-294, an effort to track all changes surrounding the Job API.

Let me hear your thoughts, and I'll whip up a patch when needed.
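To make the contract from the javadoc quote concrete, here is a minimal, self-contained sketch of the delegation it describes: a runner consumes the generic options (such as "-libjars" and "-D") into a configuration, hands that configuration to the job, and passes on only the remaining custom arguments. Everything in it is an assumption for illustration. MiniTool and the static run method are simplified stand-ins so the example runs without Hadoop on the classpath, TrainerJob is a hypothetical driver and not Mahout code, and the "sketch.libjars" key is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolSketch {

  /** Simplified stand-in for Hadoop's Tool; NOT the real interface. */
  public interface MiniTool {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
    int run(String[] args);
  }

  /** Hypothetical trainer driver: it sees only its own arguments. */
  public static class TrainerJob implements MiniTool {
    private Map<String, String> conf;
    public final List<String> customArgs = new ArrayList<String>();

    @Override public void setConf(Map<String, String> conf) { this.conf = conf; }
    @Override public Map<String, String> getConf() { return conf; }

    @Override
    public int run(String[] args) {
      // Generic options were consumed by the runner before this call.
      customArgs.clear();
      customArgs.addAll(Arrays.asList(args));
      // A real driver would now build its job from getConf() rather than
      // from a freshly created configuration, so caller settings are kept.
      return 0;
    }
  }

  /** Simplified stand-in for ToolRunner.run(conf, tool, args). */
  public static int run(Map<String, String> conf, MiniTool tool, String[] args) {
    List<String> remaining = new ArrayList<String>();
    for (int i = 0; i < args.length; i++) {
      boolean hasValue = i + 1 < args.length;
      if ("-libjars".equals(args[i]) && hasValue) {
        // Hypothetical key; only records the jar list for this sketch.
        conf.put("sketch.libjars", args[++i]);
      } else if ("-D".equals(args[i]) && hasValue) {
        String[] kv = args[++i].split("=", 2);
        conf.put(kv[0], kv.length > 1 ? kv[1] : "");
      } else {
        remaining.add(args[i]);
      }
    }
    tool.setConf(conf);
    return tool.run(remaining.toArray(new String[0]));
  }

  public static void main(String[] args) {
    TrainerJob job = new TrainerJob();
    int exitCode = run(new HashMap<String, String>(), job, new String[] {
        "-libjars", "deps/a.jar,deps/b.jar", "-D", "sketch.alpha=1.0",
        "--input", "train-dir"});
    System.out.println("exit code:   " + exitCode);
    System.out.println("libjars:     " + job.getConf().get("sketch.libjars"));
    System.out.println("custom args: " + job.customArgs);
  }
}
```

With Hadoop itself, the same shape is usually obtained by having the driver extend Configured, implement Tool, and be launched through ToolRunner.run(new Configuration(), new MyDriver(), args), where MyDriver is a placeholder name. The important design point is that the driver builds its jobs from getConf(), which is what lets per-job settings such as "-libjars" reach the Mappers and Reducers.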