[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated MAHOUT-301:
-------------------------------

    Attachment: MAHOUT-301.patch

Ok, now we're getting somewhere.  This one properly handles "mahout run -h"
or "mahout run --help", helpfully printing the list of classes, with their
shortNames, that MahoutDriver has been told about in driver.classes.props.
More importantly, in both a release environment and a dev environment, it can
do:

{code}
./bin/mahout run kmeans [options]
{code}

If $MAHOUT_CONF_DIR is set, and points to a place with the right files, then 
the default properties are loaded from there (overridden by [options] given 
above). 

If both $HADOOP_HOME and $HADOOP_CONF_DIR are set, then the script prepends
$MAHOUT_CONF_DIR to $HADOOP_CLASSPATH and runs the following:

{code}
$HADOOP_HOME/bin/hadoop jar [path to examples.job] o.a.m.driver.MahoutDriver 
kmeans [options]
{code}

This works: the default properties are loaded and overridden as necessary,
and your job runs on the hadoop cluster.

If either of those variables is not set (TODO: if $HADOOP_HOME is set but
$HADOOP_CONF_DIR is not, guess a default of $HADOOP_HOME/conf, I suppose),
then the script assumes it should run locally.
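
A minimal sketch of that decision, with the TODO fallback filled in, might look like the following. The function name choose_run_mode is hypothetical and this is not the code in the patch; it only illustrates the rule described above, and the $HADOOP_HOME/conf default is the guess from the TODO, not confirmed behavior.

```shell
# Hypothetical helper, not taken from the patch: decide whether to run on
# the hadoop cluster or locally, based on the two Hadoop variables.
choose_run_mode() {
  local home="${HADOOP_HOME:-}"
  local conf="${HADOOP_CONF_DIR:-}"

  # Assumed fallback from the TODO: if only $HADOOP_HOME is set,
  # guess $HADOOP_HOME/conf as the configuration directory.
  if [ -n "$home" ] && [ -z "$conf" ]; then
    conf="$home/conf"
  fi

  # Run on the cluster only when both values are known; otherwise run locally.
  if [ -n "$home" ] && [ -n "$conf" ]; then
    echo "hadoop"
  else
    echo "local"
  fi
}
```

Without the fallback, a $HADOOP_HOME with no $HADOOP_CONF_DIR would fall through to the local branch, which is what the attached patch does today.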

Previous behavior still works, from what I can tell - you can still do:

{code}
$MAHOUT_HOME/bin/mahout kmeans --output kmeans/out --input input/vecs -k 13 
--clusters tmp/foobar
{code}

and we're backwards compatible with the old way.

Now the question is: do we want to be?  Or do we want to trim down the shell 
script to just always use MahoutDriver, and get rid of all of the 'elif [ 
"$COMMAND" =' stuff and just have $CLASS be MahoutDriver, passing it $COMMAND 
as the first argument?  

Then the command line would be exactly the same as before, except you could 
also load up your $MAHOUT_CONF_DIR/<shortName>.props files with whatever 
defaults you wanted to use.
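
As a rough sketch of what the trimmed-down script could reduce to (an illustration of the proposal, not code from the patch; the function name mahout_driver_args is hypothetical), the per-command branches collapse into a single fixed class that receives the command as its first argument:

```shell
# Hypothetical sketch: no per-command elif chain. $CLASS is always
# MahoutDriver, and the user's command becomes its first argument.
CLASS=org.apache.mahout.driver.MahoutDriver

# Print the class followed by the command and its options, i.e. the tail of
# whatever "java ..." or "hadoop jar ..." invocation the script ends up running.
mahout_driver_args() {
  printf '%s\n' "$CLASS $*"
}
```

For example, "mahout_driver_args kmeans --input input/vecs -k 13" prints the driver class followed by "kmeans --input input/vecs -k 13", so the user-facing command line stays the same as before.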

> Improve command-line shell script by allowing default properties files
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-301
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-301
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, 
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes 
> org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from 
> a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job 
> org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in 
> long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option 
> isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just 
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props 
> file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file 
> "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully 
> qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be 
> some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the 
> class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in 
> ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and 
> key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", 
> "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further 
> command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in 
> .props", but gives the idea.  Really helps cut down on repetitive long 
> command lines, and lets defaults be put in props files instead of being 
> locked into the code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
