[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Mannix updated MAHOUT-301:
-------------------------------

    Attachment: MAHOUT-301.patch

Ok, now we're getting somewhere. This one (a) properly handles "mahout run -h" and "mahout run --help", helpfully printing the list of classes, with their short names, that MahoutDriver has been told about in driver.classes.props, and (b), more importantly, can do the following in both a release environment and a dev environment:

{code}
./bin/mahout run kmeans [options]
{code}

If $MAHOUT_CONF_DIR is set and points to a directory containing the right files, the default properties are loaded from there (and overridden by the [options] given above). If both $HADOOP_HOME and $HADOOP_CONF_DIR are set, the script also prepends $MAHOUT_CONF_DIR to $HADOOP_CLASSPATH, so that running

{code}
$HADOOP_HOME/bin/hadoop jar [path to examples.job] o.a.m.driver.MahoutDriver kmeans [options]
{code}

actually works: the default properties are loaded and overridden as necessary, and your job runs on the Hadoop cluster. If either of those variables is not set, the assumption is to run locally. (TODO: if $HADOOP_HOME is set but $HADOOP_CONF_DIR is not, guess a default of $HADOOP_HOME/conf, I suppose.)

Previous behavior still works, from what I can tell; you can still do:

{code}
$MAHOUT_HOME/bin/mahout kmeans --output kmeans/out --input input/vecs -k 13 --clusters tmp/foobar
{code}

so we're backwards compatible with the old way. Now the question is: do we want to be? Or do we want to trim the shell script down to always use MahoutDriver, getting rid of all the 'elif [ "$COMMAND" =' stuff, just setting $CLASS to MahoutDriver and passing it $COMMAND as the first argument? Then the command line would be exactly the same as before, except that you could also load up your $MAHOUT_CONF_DIR/<shortName>.props files with whatever defaults you wanted to use.
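For illustration, here is a minimal sketch of the launch decision described in this comment, written in Java rather than shell. It is not code from the patch: the class and method names (LaunchPlanner, runOnHadoop, hadoopClasspath, command) are hypothetical, and "run locally" is assumed to mean a plain `java -cp` run with $MAHOUT_CONF_DIR ahead of the job file on the classpath. It also follows the proposed simplification, where every command goes through MahoutDriver with the short name as its first argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not code from the patch: the bin/mahout launch
 * decision expressed as plain Java for illustration.
 */
final class LaunchPlanner {
  static final String DRIVER = "org.apache.mahout.driver.MahoutDriver";

  /** Run on the cluster only when both HADOOP_HOME and HADOOP_CONF_DIR are set. */
  static boolean runOnHadoop(Map<String, String> env) {
    return env.get("HADOOP_HOME") != null && env.get("HADOOP_CONF_DIR") != null;
  }

  /** HADOOP_CLASSPATH with MAHOUT_CONF_DIR prepended, so the job can find the .props files. */
  static String hadoopClasspath(Map<String, String> env) {
    String conf = env.get("MAHOUT_CONF_DIR");
    String existing = env.getOrDefault("HADOOP_CLASSPATH", "");
    if (conf == null) {
      return existing;
    }
    return existing.isEmpty() ? conf : conf + ":" + existing;
  }

  /** Builds the command line; every short name is dispatched through MahoutDriver. */
  static List<String> command(Map<String, String> env, String jobJar,
                              String shortName, List<String> options) {
    List<String> cmd = new ArrayList<>();
    if (runOnHadoop(env)) {
      // hadoop jar <job> <main class> <args...>; the conf dir arrives via hadoopClasspath(env)
      cmd.add(env.get("HADOOP_HOME") + "/bin/hadoop");
      cmd.add("jar");
      cmd.add(jobJar);
    } else {
      // Assumed local mode: a plain JVM with MAHOUT_CONF_DIR ahead of the job on the classpath
      String conf = env.get("MAHOUT_CONF_DIR");
      cmd.add("java");
      cmd.add("-cp");
      cmd.add(conf == null ? jobJar : conf + ":" + jobJar);
    }
    cmd.add(DRIVER);
    cmd.add(shortName); // e.g. "kmeans", resolved through driver.classes.props
    cmd.addAll(options);
    return cmd;
  }

  public static void main(String[] args) {
    Map<String, String> env = System.getenv();
    System.out.println("HADOOP_CLASSPATH=" + hadoopClasspath(env));
    System.out.println(String.join(" ", command(env, "examples.job", "kmeans", List.of(args))));
  }
}
```

In this sketch both branches end with the same `MahoutDriver <shortName> [options]` tail, which is what makes dropping the per-command `elif` branches attractive: only the launcher prefix depends on the environment.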
> Improve command-line shell script by allowing default properties files
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-301
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-301
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs. Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names. This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file). After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and override the defaults.
>  */
> {code}
> It could be cleaned up, as it's kinda ugly with the whole "file named in .props" business, but it gives the idea. It really helps cut down on repetitive long command lines, and also lets defaults be put in props files instead of being locked into the code.
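The javadoc's props-to-arguments step can be sketched as follows. This is a minimal illustration under stated assumptions, not MahoutDriver's implementation: `DefaultArgs` and `merge` are hypothetical names, options are emitted in alphabetical order for determinism, and overrides are assumed to be long options (as the javadoc requires), each followed by at most one value.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not MahoutDriver's actual code: turn the key=value pairs of a
 * default .props file into "--key value" arguments, then let command-line options override them.
 */
final class DefaultArgs {
  static String[] merge(Reader defaults, String[] overrides) throws IOException {
    Properties props = new Properties();
    props.load(defaults);

    // TreeMap keeps the output deterministic (alphabetical by option name);
    // a null value marks a bare flag such as --overwrite.
    Map<String, String> options = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      options.put("--" + key, props.getProperty(key));
    }

    // Overrides are assumed to be long options, each optionally followed by one value.
    for (int i = 0; i < overrides.length; i++) {
      String name = overrides[i];
      if (!name.startsWith("--")) {
        throw new IllegalArgumentException("expected a long option, got: " + name);
      }
      String value = null;
      if (i + 1 < overrides.length && !overrides[i + 1].startsWith("--")) {
        value = overrides[++i];
      }
      options.put(name, value); // replaces any default with the same name
    }

    // Flatten back into the String[] that the target class's main() receives.
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> e : options.entrySet()) {
      args.add(e.getKey());
      if (e.getValue() != null) {
        args.add(e.getValue());
      }
    }
    return args.toArray(new String[0]);
  }
}
```

Given the input/output file from the javadoc and the override `--output /tmp/out`, `merge` returns `{ "--input", "/path/to/my/input", "--output", "/tmp/out" }`, so the command-line value wins over the default while the untouched default is passed through.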