[ 
https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated MAHOUT-301:
-------------------------------

    Attachment: MAHOUT-301.patch

Better version.  Javadocs updated in the patch to reflect the way it works:


{code}
/**
 * General-purpose driver class for Mahout programs.  Utilizes org.apache.hadoop.util.ProgramDriver to run
 * main methods of other classes, but first loads up default properties from a properties file.
 *
 * Usage: run on Hadoop like so:
 *
 * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver \
 *   [--classesFile|-cf <file>] [--defaultsFile|-df <file>] shortJobName [over-ride opts]
 *
 * or for local running:
 *
 * $MAHOUT_HOME/bin/mahout run [--classesFile|-cf <file>] [--defaultsFile|-df <file>] shortJobName [over-ride opts]
 *
 * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
 * defines a mapping between short names like "VectorDumper" and fully qualified class names.  This file may
 * instead be overridden on the command line by specifying --classesFile|-cf <classesFile>.
 *
 * The default properties to be applied to the program run are read, by default, from
 * "core/src/main/resources/<shortJobName>.props", unless --defaultsFile|-df <file> is specified on the command line.
 * The format of the default properties files is as follows:
 *
 * i|input = /path/to/my/input
 * o|output = /path/to/my/output
 * m|jarFile = /path/to/jarFile
 * # etc - each line is shortArg|longArg = value
 *
 * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
 * driver.classes.props file).
 *
 * Then the class which will be run will have its main called with
 *
 *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
 *
 * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
 * and over-ride the defaults.
 *
 * So if your core/src/main/resources/driver.classes.props looks like so:
 *
 * org.apache.mahout.utils.vectors.VectorDumper = "vecDump"
 *
 * and you have a file core/src/main/resources/vecDump.props which looks like
 *
 * o|output = /tmp/vectorOut
 * s|seqFile = /my/vector/sequenceFile
 *
 * And you execute the command-line:
 *
 * $MAHOUT_HOME/bin/mahout run vecDump -s /my/otherVector/sequenceFile
 *
 * Then org.apache.mahout.utils.vectors.VectorDumper.main() will be called with arguments:
 *   {"--output", "/tmp/vectorOut", "-s", "/my/otherVector/sequenceFile"}
 */
{code}
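
For readers who want to see the override rule in code, here is a minimal sketch of the merge step described in the javadoc above. It is not taken from MAHOUT-301.patch: the class name DefaultArgsSketch and the methods parseDefaults and mergeArgs are hypothetical. It assumes that a default is dropped whenever the command line carries either its short or its long flag, and that the remaining defaults are passed in long form ahead of the command-line arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the defaults/override merge; not the code from the patch. */
final class DefaultArgsSketch {

  /** One "shortArg|longArg = value" entry from a defaults file. */
  static final class DefaultArg {
    final String shortName;
    final String longName;
    final String value;

    DefaultArg(String shortName, String longName, String value) {
      this.shortName = shortName;
      this.longName = longName;
      this.value = value;
    }
  }

  /** Parses lines of the form "shortArg|longArg = value", skipping blanks and '#' comments. */
  static List<DefaultArg> parseDefaults(List<String> lines) {
    List<DefaultArg> defaults = new ArrayList<>();
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      int bar = line.indexOf('|');
      int eq = line.indexOf('=');
      if (bar < 0 || eq < 0 || bar > eq) {
        throw new IllegalArgumentException("Expected shortArg|longArg = value, got: " + raw);
      }
      defaults.add(new DefaultArg(
          line.substring(0, bar).trim(),
          line.substring(bar + 1, eq).trim(),
          line.substring(eq + 1).trim()));
    }
    return defaults;
  }

  /** Emits non-overridden defaults in long form, then the command-line arguments unchanged. */
  static String[] mergeArgs(List<DefaultArg> defaults, String[] overrides) {
    List<String> given = Arrays.asList(overrides);
    List<String> merged = new ArrayList<>();
    for (DefaultArg d : defaults) {
      boolean overridden =
          given.contains("-" + d.shortName) || given.contains("--" + d.longName);
      if (!overridden) {
        merged.add("--" + d.longName);
        merged.add(d.value);
      }
    }
    merged.addAll(given);
    return merged.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Contents of the vecDump.props example from the javadoc.
    List<DefaultArg> defaults = parseDefaults(Arrays.asList(
        "o|output = /tmp/vectorOut",
        "s|seqFile = /my/vector/sequenceFile"));
    String[] merged = mergeArgs(defaults, new String[] {"-s", "/my/otherVector/sequenceFile"});
    // prints [--output, /tmp/vectorOut, -s, /my/otherVector/sequenceFile]
    System.out.println(Arrays.toString(merged));
  }
}
```

Under those assumptions the sketch reproduces the vecDump example: the seqFile default is replaced by the -s value given on the command line, while the output default is kept.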

> Improve command-line shell script by allowing default properties files
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-301
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-301
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in 
> .props", but gives the idea.  Really helps cut down on repetitive long 
> command lines, and lets defaults be put in props files instead of being 
> locked into the code.
