[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838868#action_12838868 ]

Drew Farris commented on MAHOUT-301:

bq. Can you upload the patch for the maven configs. Maybe a separate issue? and mark it as 0.3.

See: MAHOUT-311

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs. Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.props" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names. This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file). After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  * main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in
> .props", but gives the idea. Really helps cut down on repetitive long
> command lines, lets defaults be put in props files instead of locked into the
> code also.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
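The default-then-override behaviour described in the javadoc above can be sketched roughly as follows. This is not the code from the attached patches: the class name DefaultArgsSketch and the method mergeArgs are made up for illustration, and the sketch assumes every override is given in long form ("--key value", or a bare "--flag").

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical sketch: turn a defaults file plus command-line overrides into a main() argument array. */
public class DefaultArgsSketch {

  /** Defaults become "--key value" pairs; a later command-line "--key value" replaces the default. */
  public static String[] mergeArgs(Properties defaults, String[] overrides) {
    Map<String, String> merged = new LinkedHashMap<>();
    // Properties has no defined iteration order, so sort the keys for a stable result.
    for (String key : new TreeSet<>(defaults.stringPropertyNames())) {
      merged.put("--" + key, defaults.getProperty(key));
    }
    for (int i = 0; i < overrides.length; i++) {
      String option = overrides[i];
      // Assumption: an option takes a value only if the next token does not start with "--".
      boolean hasValue = i + 1 < overrides.length && !overrides[i + 1].startsWith("--");
      merged.put(option, hasValue ? overrides[++i] : null);
    }
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> entry : merged.entrySet()) {
      args.add(entry.getKey());
      if (entry.getValue() != null) {
        args.add(entry.getValue());
      }
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] args) throws IOException {
    Properties defaults = new Properties();
    defaults.load(new StringReader("input=/path/to/my/input\noutput=/path/to/my/output\n"));
    String[] merged = mergeArgs(defaults, new String[] {"--output", "/tmp/other"});
    // prints: --input /path/to/my/input --output /tmp/other
    System.out.println(String.join(" ", merged));
  }
}
```

Keeping the merged options in a map keyed by option name is what lets a command-line value replace a default instead of appearing twice in the final array.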
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838834#action_12838834 ]

Grant Ingersoll commented on MAHOUT-301:

Just capturing something longer term here, no need to block anything.

One of the things I'd love to have is some basic "experiment management" capabilities. I can imagine in this mode that things like input parameters, etc. are all written into files and organized along with the output, etc. such that it is easy to keep track of all the different ways things get run over time. Seems like this script w/ default property files, etc. could be part of that solution.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838725#action_12838725 ]

Jake Mannix commented on MAHOUT-301:

Drew, do you have a patch with your last changes? If I can try them out too to verify that they work on more than one system, we can commit this I think.

{quote}
Should I commit those, open another issue or should I re-post as a part of this patch?
{quote}

I'd say that should be in a separate issue, that should be small enough to mark for 0.3 and commit separately.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838700#action_12838700 ]

Robin Anil commented on MAHOUT-301:

+1 for committing this.

Can you upload the patch for the maven configs. Maybe a separate issue? and mark it as 0.3.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838694#action_12838694 ]

Drew Farris commented on MAHOUT-301:

Had a chance to take this out for a spin tonight. It is working very well. I did some k-means using the script starting with the 20newsgroups collection as textfiles, both locally and on a cluster.

I think it is good to go, can we commit? I'd be happy to handle it if we have sufficient consensus.

There are a couple modifications I've made to the maven assemblies to include all of this in the binary and source releases properly (adding the conf directory, setting executable on the mahout script, etc). While I was at it, I cleaned up the bin assembly process so that the releases should build faster too. Should I commit those, open another issue or should I re-post as a part of this patch?

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.3
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838159#action_12838159 ]

Jake Mannix commented on MAHOUT-301:

Ok, new patch, with the modification that indeed you have the ability to just run "$MAHOUT_HOME/bin/mahout [args]" and it still works. And if .props exists on the classpath, it'll get used for defaults. w00t, as the kids say.

I've added to the patch the conf directory (you'd not kept it in your patch, Drew), and there are a bunch of empty files in there, except some of them have commented-out properties in the right format:

cleaneigen.props :
{code}
#ci|corpusInput =
#ei|eigenInput =
#o|output =
{code}

To help users see what they can store in here, and in what format.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch
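A rough sketch of how a per-program defaults file like cleaneigen.props might be picked up from the classpath and read. Everything here is illustrative rather than taken from the patch: the class PropsFormatSketch and its methods are hypothetical, the resource name "<program>.props" is inferred from the cleaneigen.props example, and reading a key such as ci|corpusInput as "short form | long form" is an assumption, since the comment above only shows the format and does not spell out how the driver interprets it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: find "<program>.props" on the classpath and read "short|long" keys. */
public class PropsFormatSketch {

  /** Loads "<program>.props" from the classpath; returns empty Properties when no such file exists. */
  public static Properties loadDefaults(String program) {
    Properties props = new Properties();
    ClassLoader loader = Thread.currentThread().getContextClassLoader();
    if (loader == null) {
      loader = PropsFormatSketch.class.getClassLoader();
    }
    try (InputStream in = loader.getResourceAsStream(program + ".props")) {
      if (in != null) {
        props.load(in); // lines starting with '#' are comments and are skipped
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Assumed convention: the part after '|' is the long option name. */
  public static String longOption(String key) {
    int bar = key.indexOf('|');
    return bar < 0 ? key : key.substring(bar + 1);
  }

  /** Assumed convention: the part before '|' is the short option name (null if there is none). */
  public static String shortOption(String key) {
    int bar = key.indexOf('|');
    return bar < 0 ? null : key.substring(0, bar);
  }

  public static void main(String[] args) throws IOException {
    // Same shape as the cleaneigen.props example, with one property uncommented.
    String sample = "#ci|corpusInput =\nei|eigenInput = /path/to/eigens\n#o|output =\n";
    Properties props = new Properties();
    props.load(new StringReader(sample));
    for (String key : props.stringPropertyNames()) {
      // prints: --eigenInput /path/to/eigens
      System.out.println("--" + longOption(key) + " " + props.getProperty(key));
    }
  }
}
```

Because the shipped files contain only commented-out keys, loading one of them yields no defaults at all, so the program behaves exactly as it would without a props file until a user uncomments a line.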
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837917#action_12837917 ]

Jake Mannix commented on MAHOUT-301:

Awesome Drew, I'll check it out.

{quote}
One potential TODO from this would be to potentially launch arbitrary classes if no matching program name is specified, but I need to dig into ProgramDriver to understand how it works before I can contribute something like that.
{quote}

Yeah, I was thinking about that over breakfast - an easy hack to do this is while the driver.classes.props file is being read, keep track of whether you've found an exact match on args[0], and once all of driver.classes.props has been read and you haven't found a match, just do a Class.forName(args[0]) and add it to the ProgramDriver with its full name as the "shortName" and the rest of the program will work (and would even still work with default properties files! If you put com.mycompany.MyClass.props in $MAHOUT_CONF_DIR, it'll read that for defaults).

I'll see if I can add that to your patch later today. I think if that's working, we should be looking good to commit and see who else wants to play with it and test it out.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301-drew.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch,
> MAHOUT-301.patch
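The fallback described in the comment above (try the short-name table first, then treat args[0] as a fully qualified class name) could look roughly like this. The sketch is self-contained and deliberately leaves out Hadoop's ProgramDriver; the class ProgramLookupSketch, the method resolve, and the "shortName=fully.qualified.ClassName" layout assumed for driver.classes.props are illustrative guesses, not the patch's code.

```java
import java.util.Properties;

/** Hypothetical sketch of the short-name lookup with a Class.forName fallback. */
public class ProgramLookupSketch {

  /**
   * Resolves a program name to a class. The mapping is assumed to hold
   * shortName -> fully qualified class name; when the name is not in the
   * mapping it is tried as a class name directly.
   */
  public static Class<?> resolve(Properties shortNames, String name) {
    String className = shortNames.getProperty(name);
    if (className == null) {
      className = name; // no alias found: fall back to treating the argument as a class name
    }
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException("Unknown program or class: " + name, e);
    }
  }

  public static void main(String[] args) {
    Properties shortNames = new Properties();
    // JDK classes stand in for Mahout job classes so the sketch runs without Mahout on the classpath.
    shortNames.setProperty("list", "java.util.ArrayList");

    System.out.println(resolve(shortNames, "list").getName());              // alias hit
    System.out.println(resolve(shortNames, "java.util.HashMap").getName()); // fallback path
  }
}
```

With this shape, the name the user typed (alias or full class name) stays available as the key for finding a matching defaults file, which is what would let com.mycompany.MyClass.props work as described in the comment.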
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837763#action_12837763 ]

Drew Farris commented on MAHOUT-301:

This sounds great. I will take it for a spin when I am in front of a computer. My take is that the old if/else branches in the script are now redundant. As long as one can use MahoutDriver to run both classes that have been aliased to short names and classes specified using the full name, I say let's get rid of them.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch,
> MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837616#action_12837616 ]

Jake Mannix commented on MAHOUT-301:

Our comments crossed in the ether! :)

{quote}
Any thoughts on whether it makes sense to attempt to work the latter form into the mahout script? It won't pull the necessary config files for MahoutDriver in from a path outside of the job file unless HADOOP_CLASSPATH is set to include those directories, but I haven't had a chance to verify that.
{quote}

You're right: I did indeed set my HADOOP_CLASSPATH to include $MAHOUT_CONF_DIR, which is what allowed this to work; without it, it would not. The script should do this itself. It's a bit ugly, but if $MAHOUT_HOME/bin/mahout sets $HADOOP_CLASSPATH to include $MAHOUT_CONF_DIR (or $MAHOUT_HOME/conf if that variable is not set) and then executes $HADOOP_HOME/bin/hadoop jar ..., it should work.

> Improve command-line shell script by allowing default properties files
> --
>
> Key: MAHOUT-301
> URL: https://issues.apache.org/jira/browse/MAHOUT-301
> Project: Mahout
> Issue Type: New Feature
> Components: Utils
> Affects Versions: 0.3
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Priority: Minor
> Fix For: 0.4
>
> Attachments: MAHOUT-301-drew.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch, MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs. Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  * [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.prop" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names. This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file). After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  * main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in .props", but gives the idea. Really helps cut down on repetitive long command lines, and also lets defaults be put in props files instead of being locked into the code.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
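The javadoc quoted in the issue description says that a defaults file containing input=... and output=... lines is turned into long-form arguments before the target class's main is called. A minimal shell sketch of that conversion follows. The helper name props_to_args is hypothetical and is not part of bin/mahout or MahoutDriver, which does this in Java; it only handles plain key=value lines, not the full java.util.Properties syntax.

```shell
# Sketch only: props_to_args is a hypothetical helper, not MahoutDriver itself.
# It reads simple key=value lines and prints "--key" and "value" on separate
# lines, mirroring the argument order the javadoc describes.
props_to_args() {
  local file line key value
  file="$1"
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comments
      *=*) ;;                # key=value line: handled below
      *) continue ;;         # anything else is ignored in this sketch
    esac
    key="${line%%=*}"        # text before the first '='
    value="${line#*=}"       # text after the first '='
    printf '%s\n%s\n' "--$key" "$value"
  done < "$file"
}
```

Arguments given explicitly on the command line would then be applied after these defaults, so that they take precedence as the javadoc states.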
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837607#action_12837607 ]

Drew Farris commented on MAHOUT-301:

It doesn't appear that the following command works as intended:

{code}
./bin/mahout org.apache.hadoop.util.RunJar /path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

The following seems to be the appropriate way to achieve what we're trying to do here:

{code}
hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

Any thoughts on whether it makes sense to attempt to work the latter form into the mahout script? It won't pull the necessary config files for MahoutDriver in from a path outside of the job file unless HADOOP_CLASSPATH is set to include those directories, but I haven't had a chance to verify that.
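One way the mahout script could take care of the HADOOP_CLASSPATH requirement discussed in this thread is sketched below. The function name mahout_hadoop_classpath and the MAHOUT_JOB variable are assumptions made for illustration; the fallback to $MAHOUT_HOME/conf follows the suggestion made in the discussion.

```shell
# Sketch only: mahout_hadoop_classpath is a hypothetical helper.
# It prints the value the script would export as HADOOP_CLASSPATH, with the
# Mahout config directory placed ahead of whatever was already set.
mahout_hadoop_classpath() {
  local conf_dir
  conf_dir="${MAHOUT_CONF_DIR:-$MAHOUT_HOME/conf}"
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    printf '%s\n' "$conf_dir:$HADOOP_CLASSPATH"
  else
    printf '%s\n' "$conf_dir"
  fi
}

# In the script this could be followed by something like the lines below
# (MAHOUT_JOB is an assumed variable holding the path to the .job file):
#   export HADOOP_CLASSPATH="$(mahout_hadoop_classpath)"
#   exec "$HADOOP_HOME/bin/hadoop" jar "$MAHOUT_JOB" org.apache.mahout.driver.MahoutDriver "$@"
```

This only affects the JVM that hadoop starts locally to run MahoutDriver; it does not ship the config files to the cluster.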
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837477#action_12837477 ]

Drew Farris commented on MAHOUT-301:

bq. Cool, so why not just check to see if $HADOOP_CONF_DIR is set - if it is, do "runjob" as described, if it's not, do "run" to do locally.

Yes, OK. That should work, because I believe you can use RunJar to launch anything, even if it isn't a MapReduce job. There is no need for classpath setup in this case either; all you need to do is point to the examples job. We might be able to take advantage of this elsewhere.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837472#action_12837472 ]

Jake Mannix commented on MAHOUT-301:

{quote}
Ahh, I see where you're coming from, so without core, you're suggesting that mahout pick up the jar files in the target directories if they exist? I think it is fine to modify the non-core classpath to include these, they won't be present in the release build anyway.
{quote}

Cool, yeah, that makes sense.

{quote}
Are any of the default properties files used beyond the MahoutDriver, which executes locally and sets up the job? Do these files need to be distributed to the rest of the cluster?

As noted above, I think the proper way to run MahoutDriver in the context of a distributed job is to do something like:

{code}
./bin/mahout org.apache.hadoop.util.RunJar /path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

I suspect we could easily modify the mahout script and shorten this to:

{code}
./bin/mahout runjob TestClassifier
{code}
{quote}

Cool, so why not just check whether $HADOOP_CONF_DIR is set: if it is, do "runjob" as described; if it's not, do "run" to run locally.

{quote}
FWIW, [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html|GenericOptionsParser] provides a way to do this with -files, -libjars and -archives
{quote}

Now of course, I guess I don't really need the files to get onto the job's classpath *on the cluster*; they just need to be on the classpath of the locally running JVM which is invoking MahoutDriver.main(). So I was doing more work than was necessary. This is easy to do: just add MAHOUT_CONF_DIR to the classpath and we're good to go.
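The proposal to choose between "runjob" and "run" based on whether $HADOOP_CONF_DIR is set could look roughly like this in the script. The function name mahout_launch_mode is hypothetical; the two mode names are the ones used in this thread.

```shell
# Sketch only: mahout_launch_mode is a hypothetical helper.
# It prints "runjob" when a Hadoop configuration directory is configured,
# and "run" (local execution) otherwise.
mahout_launch_mode() {
  if [ -n "${HADOOP_CONF_DIR:-}" ]; then
    echo runjob
  else
    echo run
  fi
}

# Possible use inside the script:
#   case "$(mahout_launch_mode)" in
#     runjob) ... hand the examples job to hadoop ... ;;
#     run)    ... start MahoutDriver in a local JVM ... ;;
#   esac
```

An empty HADOOP_CONF_DIR is treated the same as an unset one here, which is a design choice of this sketch rather than something settled in the discussion.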
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837448#action_12837448 ]

Drew Farris commented on MAHOUT-301:

{quote}
Hmm... ok. I'm a little reticent about running -core when testing, because I'm not really testing what the release run will be like - I like the idea of having a single set of dependencies (jars, not classes directories) which are used locally, and the .job when hitting a remote hadoop cluster. Maybe I'm just not familiar with the -core option and its use.
{quote}

Ahh, I see where you're coming from. So without -core, you're suggesting that mahout pick up the jar files in the target directories if they exist? I think it is fine to modify the non-core classpath to include these; they won't be present in the release build anyway.

{quote}
The last step, as you've noted, is because I'm not sure that the script actually properly lets HADOOP_CONF_DIR properly get passed through the mahout shell script to actually running on the hadoop cluster, but maybe that's just a config issue in my case? Also means that in fact the default properties idea still doesn't work on hadoop, unless the default properties files are pushed to the classpath.
{quote}

Are any of the default properties files used beyond the MahoutDriver, which executes locally and sets up the job? Do these files need to be distributed to the rest of the cluster?

As noted above, I think the proper way to run MahoutDriver in the context of a distributed job is to do something like:

{code}
./bin/mahout org.apache.hadoop.util.RunJar /path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

I suspect we could easily modify the mahout script and shorten this to:

{code}
./bin/mahout runjob TestClassifier
{code}

I can look at this a little closer tonight, so if you have an updated patch for me to work on/test in a few hours, definitely post it. I'd be happy to make any changes you're interested in.

{quote}
What is the right way to run a job with some additional (runtime) files added to the job's classpath? Is there some cmdline arg to "hadoop" that I'm forgetting?
{quote}

FWIW, [http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html|GenericOptionsParser] provides a way to do this with -files, -libjars and -archives
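If the props files did need to travel with the job, the -files generic option mentioned above takes a comma-separated list. The sketch below builds such a list from a config directory. The helper name props_files_list and the *.props naming convention are assumptions, and the option is only honored when the launched class parses generic options (for example through GenericOptionsParser), which this thread does not confirm for MahoutDriver.

```shell
# Sketch only: props_files_list is a hypothetical helper.
# It joins every *.props file in a directory into the comma-separated form
# that the generic -files option expects.
props_files_list() {
  local dir list f
  dir="$1"
  list=""
  for f in "$dir"/*.props; do
    [ -e "$f" ] || continue        # the glob matched nothing
    list="${list:+$list,}$f"       # add a comma only between entries
  done
  printf '%s\n' "$list"
}

# Possible use (SomeTool is a placeholder for a class that parses generic options):
#   hadoop jar path/to/job SomeTool -files "$(props_files_list "$MAHOUT_CONF_DIR")" ...
```

As the follow-up in this thread points out, this may be more than is needed, since MahoutDriver reads the defaults in the local JVM before the job is submitted.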
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837440#action_12837440 ]

Jake Mannix commented on MAHOUT-301:

{quote}
Jake, the basic idea is that you would always use -core when executing from within a build, but you would not use core when executing in the context of a binary release.
{quote}

Hmm... ok. I'm a little reticent about running -core when testing, because I'm not really testing what the release run will be like. I like the idea of having a single set of dependencies (jars, not classes directories) which are used locally, and the .job when hitting a remote hadoop cluster. Maybe I'm just not familiar with the -core option and its use.

So far, I've always run by the process of

* make code/config changes
* run mvn clean install (sometimes with -DskipTests if I'm doing rapid iterations)
* run "mahout args" OR
* hadoop jar examples/target/mahout-examples-{version}.job args

I use the last step, as you've noted, because I'm not sure that the mahout shell script properly passes HADOOP_CONF_DIR through so that the job actually runs on the hadoop cluster, but maybe that's just a config issue in my case? It also means that the default properties idea still doesn't work on hadoop, unless the default properties files are pushed to the classpath.

Maybe a kludgey way to do it would be for the script to grab the properties files from the MAHOUT_CONF_DIR, unzip the release job jar, push them into it, re-jar it, and then give it to hadoop, so that those files are available on the classpath of the running job on the remote cluster?

What is the right way to run a job with some additional (runtime) files added to the job's classpath? Is there some cmdline arg to "hadoop" that I'm forgetting?
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837434#action_12837434 ]

Drew Farris commented on MAHOUT-301:

Jake, the basic idea is that you would always use -core when executing from within a build, but you would not use -core when executing in the context of a binary release.

The binary release, built using mvn -Prelease, lands in target/mahout-0.3-SNAPSHOT.tar.gz. Untar that and try running bin/mahout from the directory that's created; that should work fine without -core.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837428#action_12837428 ]

Jake Mannix commented on MAHOUT-301:

{quote}
Something else I noticed is that the 'mahout' script doesn't add the classes in $MAHOUT_HOME/lib/*.jar to the classpath. This breaks the binary release in that it can't run anything, e.g:
{quote}

{quote}
Also wondering what the purpose of adding the job jars to the classpath is? (removed in patch)
{quote}

When I run locally now, not using -core, I get this failure:

{code}
/bin/mahout vectordump -s wiki-sparse-vectors-out/vectors/part-0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/utils/vectors/VectorDumper
{code}

This appears to be because your patch sets CLASSPATH to add things like $MAHOUT_HOME/mahout-*.jar, which don't exist after I've done "mvn install". Is there another maven target I need to use to generate the release jars in $MAHOUT_HOME?
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837376#action_12837376 ] Drew Farris commented on MAHOUT-301: {quote} This wasn't a problem with my patch, right? That was an issue of the mahout script in trunk itself? {quote} Yes it was a problem with the script in trunk. I believe this was due to the fact that the job files were on the classpath instead of all of the dependency jars. Adding the job files to the classpath does not add the dependency jars they contain to the classpath as well. So, no you didn't add this, but it should be fixed (and is in the patch) {quote} What is the -core option for? I've never used it, how does it work? {quote} when you're running bin/mahout in the context of a build the -core option is used to tell it to use the build classpath instead of the classpath used for a binary release. This just follows the pattern established (by Doug?) in the hadoop and nutch launch scripts. {quote} Also added a help message for the 'run' argument. {quote} near line 72 in bin/mahout: (this is different from the --help question I had) {code} echo " seq2sparsegenerate sparse vectors from a sequence file" echo " vectordumpdump vectors from a sequence file" echo " run run mahout tasks using the MahoutDriver, see: http://cwiki.apache.org/MAHOUT/mahoutdriver.html"; {code} {quote} So you already added the ability to load via classpath, right? If we merge that way of thinking with what I'm currently working on (having a configurable "MAHOUT_CONF_DIR" which is used for all these props files), we could just have the mahout shell script just add MAHOUT_CONF_DIR to the classpath (the way you already have it adding the hardwired core/src/main/resources directory) and then it would work that way. {quote} Yep, that should do it, as long as MAHOUT_CONF_DIR appears before src/main/resources, we should be good to go. 
It should be added outside of the section of the script that determines if -core has been specified on the command-line.
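To illustrate why the ordering of MAHOUT_CONF_DIR and src/main/resources matters, here is a minimal Java sketch, not the actual Mahout code. It assumes the props file is looked up as a classpath resource and uses two temporary directories as stand-ins for the two locations; the class name `ClasspathOrderSketch`, its helper methods, and the `example.*` values are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class ClasspathOrderSketch {

    /** Loads the named resource from the first directory, in the given order, that contains it. */
    static Properties loadFirst(String resourceName, Path... dirsInClasspathOrder) {
        try {
            URL[] urls = new URL[dirsInClasspathOrder.length];
            for (int i = 0; i < urls.length; i++) {
                // toUri() ends an existing directory with "/", which URLClassLoader needs for directories.
                urls[i] = dirsInClasspathOrder[i].toUri().toURL();
            }
            Properties props = new Properties();
            // A null parent skips the application classpath, so the listed directories decide the result.
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream(resourceName)) {
                if (in == null) {
                    throw new IOException(resourceName + " not found");
                }
                props.load(in);
            }
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary directory holding a driver.classes.props file with the given content. */
    static Path tempDirWith(String content) {
        try {
            Path dir = Files.createTempDirectory("mahout-sketch");
            Files.write(dir.resolve("driver.classes.props"), content.getBytes(StandardCharsets.ISO_8859_1));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for MAHOUT_CONF_DIR and src/main/resources.
        Path confDir = tempDirWith("TestClassifier=example.FromConfDir\n");
        Path resources = tempDirWith("TestClassifier=example.FromResources\n");
        Properties props = loadFirst("driver.classes.props", confDir, resources);
        System.out.println(props.getProperty("TestClassifier"));
    }
}
```

Because the directories are searched in the order listed, the copy in the first directory shadows the one in the second, which is the behaviour wanted from putting MAHOUT_CONF_DIR ahead of src/main/resources.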
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837351#action_12837351 ] Jake Mannix commented on MAHOUT-301: Ok, Drew, got your patch in diff mode against mine finally. So you already added the ability to load via classpath, right? If we merge that way of thinking with what I'm currently working on (having a configurable "MAHOUT_CONF_DIR" which is used for all these props files), we could just have the mahout shell script just add MAHOUT_CONF_DIR to the classpath (the way you already have it adding the hardwired core/src/main/resources directory) and then it would work that way. New patch merging yours with mine forthcoming.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837345#action_12837345 ]

Jake Mannix commented on MAHOUT-301:

Hey Drew, thanks for looking at this. Problems you saw are probably what are known as "bugs". :)

{quote}
Did some testing, here's a patch to clean some of these things up + a couple questions: Could we load the default driver.classes.props from the classpath? If it was loaded that way the default would work regardless of where the mahout script is run from (it currently only works if ./bin/mahout is run, not ./mahout for example) and regardless of whether we're running from a binary release or the dev environment. (included in patch)
{quote}

YES! We should indeed load from classpath. My most recent version of this patch (which isn't posted, because it conflicts with yours, I'm trying to resolve that now) changes it so that you just supply a single directory in which driver.classes.props and the shortNames.props files are located.

{quote}
Something else I noticed is that the 'mahout' script doesn't add the classes in $MAHOUT_HOME/lib/*.jar to the classpath. This breaks the binary release in that it can't run anything, e.g: ./mahout vectordump Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli2/OptionException Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli2.OptionException (fixed in patch)
{quote}

This wasn't a problem with my patch, right? That was an issue of the mahout script in trunk itself?

{quote}
Using -core in the context of a dev build should work properly, but leaving out -core will cause the script to error unless run in the context of a release - this is the way it should work, right?
{quote}

What is the -core option for? I've never used it, how does it work?

{quote}
Also added a help message for the 'run' argument.
{quote}

Where did you add that?
{quote}
Does executing './mahout run --help' hang for anyone else or is it something specific to my environment? (didn't track this one down)
{quote}

The --help option I didn't have in there, you added it, do you know where it's hanging?
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837243#action_12837243 ]

Drew Farris commented on MAHOUT-301:

bq. BTW. How is hadoop execution done using shell script ? i.e

It looks like something like the following would do the trick:

{code}
/bin/mahout -core org.apache.hadoop.util.RunJar /path/to/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver TestClassifier
{code}

We could probably provide a 'runjob' case that appends 'org.apache.hadoop.util.RunJar examples/target/mahout-examples-0.3-SNAPSHOT.job org.apache.mahout.driver.MahoutDriver', but perhaps this could be used in every case that 'run' is called?
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837234#action_12837234 ]

Drew Farris commented on MAHOUT-301:

bq. including the job jar is much cleaner than adding all deps. Plus there is nothing more to configure to execute it on top of hadoop..

The job files work fine with 'hadoop jar', but putting the job files in the classpath will not automatically include the dependencies they contain (e.g. commons-cli2) on the classpath: the dependencies need to be added separately (see the ClassNotFoundException case described above).

bq. BTW. How is hadoop execution done using shell script ?

If the HADOOP_CONF_DIR is set, it should be picked up by the jobs, but I don't think that means jar/jobfile execution works properly. I suspect this needs modifications to make that possible.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837120#action_12837120 ] Robin Anil commented on MAHOUT-301: --- including the job jar is much cleaner than adding all deps. Plus there is nothing more to configure to execute it on top of hadoop.. BTW. How is hadoop execution done using shell script ? i.e hadoop jar mahout-examples-0.3.job o.a.m...DictionaryVectorizer --input . args
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836962#action_12836962 ] Robin Anil commented on MAHOUT-301: --- The help comments are missing from the bin/mahout script. Scroll up that file and you will see a pretty-printed help string. Just add the Mahout driver description and possibly a wiki link there. Otherwise looks good to commit. I haven't checked the full functionality yet. If anyone else wants to take a look, please do so quickly.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836952#action_12836952 ]

Jake Mannix commented on MAHOUT-301:

Oh, I forgot to finish my sentence which began "run as follows..." Once you've got default property files in your $MAHOUT_CONF_DIR, you can run like so:

{code}
$MAHOUT_HOME/bin/mahout run wikToSeq
{code}

and that's it. If you want to override the options in your wikToSeq.props file, just pass them in on that same command line above, and they override as desired.

If this can be tested out and debugged, this patch is ready for committing, and significantly improves the command line experience.
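The override behaviour described in this comment can be pictured with a small Java sketch. This is only an illustration under simplifying assumptions, not the code in the patch: it assumes every option is given in long form as a "--name value" pair, and the class `OverrideSketch` and its `merge` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverrideSketch {

    /** Starts from the defaults, then lets "--name value" pairs from the command line replace them. */
    static List<String> merge(Map<String, String> defaults, String... commandLine) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            merged.put("--" + e.getKey(), e.getValue());
        }
        // Assumption: the command line is a flat list of "--name value" pairs.
        for (int i = 0; i + 1 < commandLine.length; i += 2) {
            merged.put(commandLine[i], commandLine[i + 1]);
        }
        List<String> argv = new ArrayList<>();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            argv.add(e.getKey());
            argv.add(e.getValue());
        }
        return argv;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("input", "/path/to/my/input");
        defaults.put("output", "/path/to/my/output");
        // Only --output is given on the command line, so --input keeps its default.
        System.out.println(merge(defaults, "--output", "/some/other/output"));
    }
}
```

Options that take no value, or that are given in short form, would need extra handling that this sketch leaves out.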
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836328#action_12836328 ]

Jake Mannix commented on MAHOUT-301:

This patch modifies the mahout shell script to add the "run" command, which invokes this driver class. It also more nicely takes shortName definitions from either core/src/main/resources/driver.classes.props or the "-cf configFile" location, and runs the class specified by shortName using props specified in core/src/main/resources/shortName.props or whatever is "-df defaultpropsFile". Also takes options in the file of the form "DsomeOpt = optionVal" and passes those into the program as "-DsomeOpt=optionVal" as well.

Not sure how well it works on hadoop yet. But the command line seems to work for the one class I've got a props file for (TestClassifier).
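As a rough picture of the translation described in this comment, the following Java sketch turns the text of a default props file into an argument list. It is an assumption-laden illustration rather than the patch itself: the class `PropsToArgsSketch` and its `toArgs` method are hypothetical, plain keys are assumed to become "--key value" (as in the javadoc example), and any key starting with "D" is assumed to become "-Dkey=value".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

final class PropsToArgsSketch {

    /** Converts props-file text into the argument list handed to the target class's main method. */
    static List<String> toArgs(String propsText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> argv = new ArrayList<>();
        // Properties has no defined order, so sort the keys to keep the result repeatable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String value = props.getProperty(key);
            if (key.startsWith("D")) {
                // Assumed convention: "DsomeOpt = optionVal" becomes "-DsomeOpt=optionVal".
                argv.add("-" + key + "=" + value);
            } else {
                argv.add("--" + key);
                argv.add(value);
            }
        }
        return argv;
    }

    public static void main(String[] args) {
        String propsText = "input=/path/to/my/input\n"
                + "output=/path/to/my/output\n"
                + "DsomeOpt = optionVal\n";
        System.out.println(toArgs(propsText));
    }
}
```

One consequence of keying on the leading "D" is that a regular long option whose name happens to start with a capital D would be misread, so a real implementation would probably want a less ambiguous marker.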
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836278#action_12836278 ]

Robin Anil commented on MAHOUT-301:
-----------------------------------

Looks great. In parallel, we need to convert all the main classes to extend AbstractJob and clean up the code there, at MAHOUT-294.

> Improve command-line shell script by allowing default properties files
> ----------------------------------------------------------------------
>
>                 Key: MAHOUT-301
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-301
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Utils
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.4
>
>         Attachments: MAHOUT-301.patch
>
>
> Snippet from javadoc gives the idea:
> {code}
> /**
>  * General-purpose driver class for Mahout programs.  Utilizes org.apache.hadoop.util.ProgramDriver to run
>  * main methods of other classes, but first loads up default properties from a properties file.
>  *
>  * Usage: run on Hadoop like so:
>  *
>  * $HADOOP_HOME/bin/hadoop -jar path/to/job org.apache.mahout.driver.MahoutDriver [classes.props file] shortJobName \
>  *   [default.props file for this class] [over-ride options, all specified in long form: --input, --jarFile, etc]
>  *
>  * TODO: set the Main-Class to just be MahoutDriver, so that this option isn't needed?
>  *
>  * (note: using the current shell script, this could be modified to be just
>  * $MAHOUT_HOME/bin/mahout [classes.props file] shortJobName [default.props file] [over-ride options]
>  * )
>  *
>  * Works like this: by default, the file "core/src/main/resources/driver.classes.prop" is loaded, which
>  * defines a mapping between short names like "VectorDumper" and fully qualified class names.  This file may
>  * instead be overridden on the command line by having the first argument be some string of the form *classes.props.
>  *
>  * The next argument to the Driver is supposed to be the short name of the class to be run (as defined in the
>  * driver.classes.props file).  After this, if the next argument ends in ".props" / ".properties", it is taken to
>  * be the file to use as the default properties file for this execution, and key-value pairs are built up from that:
>  * if the file contains
>  *
>  * input=/path/to/my/input
>  * output=/path/to/my/output
>  *
>  * Then the class which will be run will have its main called with
>  *
>  *   main(new String[] { "--input", "/path/to/my/input", "--output", "/path/to/my/output" });
>  *
>  * After all the "default" properties are loaded from the file, any further command-line arguments are taken in,
>  * and over-ride the defaults.
>  */
> {code}
> Could be cleaned up, as it's kinda ugly with the whole "file named in .props", but gives the idea.  Really helps cut down on repetitive long command lines, and lets defaults be put in props files instead of locked into the code.
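The merge that the javadoc describes (defaults from a props file turned into "--key value" pairs, then overridden by whatever is on the command line) can be sketched roughly as below. This is not the code from the attached patch: the class name DefaultArgsSketch and the method buildArgs are hypothetical, and the sketch assumes every override is given in long form as "--name value" or as a bare "--flag". Options are emitted in sorted order only to keep the result deterministic.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not the MahoutDriver from the patch. */
final class DefaultArgsSketch {

  private DefaultArgsSketch() {}

  /**
   * Turns each default "key=value" into "--key value", then lets any
   * "--key value" pair (or bare "--flag") from the command line replace it.
   */
  static String[] buildArgs(Properties defaults, String[] overrides) {
    // TreeMap keeps the options sorted by name so the output order is stable.
    Map<String, String> merged = new TreeMap<>();
    for (String key : defaults.stringPropertyNames()) {
      merged.put("--" + key, defaults.getProperty(key));
    }
    // Assumption: overrides are long-form options, each optionally followed by one value.
    for (int i = 0; i < overrides.length; i++) {
      String name = overrides[i];
      boolean hasValue = i + 1 < overrides.length && !overrides[i + 1].startsWith("--");
      merged.put(name, hasValue ? overrides[++i] : null);
    }
    List<String> args = new ArrayList<>();
    for (Map.Entry<String, String> e : merged.entrySet()) {
      args.add(e.getKey());
      if (e.getValue() != null) {
        args.add(e.getValue());
      }
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] unused) throws IOException {
    Properties defaults = new Properties();
    defaults.load(new StringReader("input=/path/to/my/input\noutput=/path/to/my/output\n"));
    String[] merged = buildArgs(defaults, new String[] {"--output", "/tmp/other"});
    // prints [--input, /path/to/my/input, --output, /tmp/other]
    System.out.println(Arrays.toString(merged));
  }
}
```

The resulting array is what would be handed to the target class's main method. Short-form options and -Dprop=value arguments are deliberately left out here, since how to express those in a props file is the open question in the comments on this issue.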
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836271#action_12836271 ]

Jake Mannix commented on MAHOUT-301:

So this current patch will take -conf / -Dprop=value type arguments and pass them directly on into the program in the usual way. The only difference is that these arguments could also be in a properties file, as long as they use the exact same form, which makes for ugly props files as is. If you wanted to not have to type:

  $MAHOUT_HOME/bin/mahout myClassShortName -DmyProp=value

you would currently need to have, in your props file:

  DmyProp = value

which looks kinda silly, but would work. Oh wait, no it wouldn't: it would end up with a command line containing "-DmyProp value", not "-DmyProp=value". To get the latter, we'd need an even uglier thing with the current patch:

  "DmyProp=value"=

which would get interpolated into -DmyProp=value on the internal command line. Super ugly.

I've got a modified version of this I can upload in a bit which takes care of the short-name/long-name arguments thing by a bit of a kludge, with props files which would look like this:

  i | input = foo/path

which is to be interpreted as: if on the command line the user says "-i bar/path" OR "--input baz/path", they override the "foo/path" in the props file. If the line in the props file has no "|" separating two options, it's assumed to be prepended with "-".

Still doesn't remove the ugliness of -Dprop=value though. Not sure how best to handle that one. What kind of props file syntax would tell it "take these key-value pairs and do '-key value', and do these other ones as '-Dkey=value'"? I guess just having the 'D' there would be a good signal? It could then just take

  i | input = foo/path
  DmyProp = propValue

and translate that into a command line like:

  progName -i foo/path -DmyProp=propValue

That would work and be not completely horribly ugly. Not great though.
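One way to read the props syntax proposed in the comment above is sketched here. It is only an interpretation of that proposal, not the modified patch itself: the class PropsLineSketch and the method toArgs are hypothetical names, and treating every key that starts with "D" as a -Dkey=value property is exactly the rough signal suggested in the comment (so an ordinary option whose name begins with "D" would be misread).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the "short | long = value" props syntax. */
final class PropsLineSketch {

  private PropsLineSketch() {}

  /**
   * Expands props-file lines into default arguments, skipping any default that
   * the user already supplied on the command line, then appends the command line.
   */
  static List<String> toArgs(List<String> propsLines, List<String> commandLine) {
    List<String> args = new ArrayList<>();
    for (String line : propsLines) {
      int eq = line.indexOf('=');
      if (eq < 0) {
        continue; // not a "key = value" line
      }
      String key = line.substring(0, eq).trim();
      String value = line.substring(eq + 1).trim();
      int bar = key.indexOf('|');
      if (bar >= 0) {
        // "i | input = foo/path": either -i or --input on the command line overrides it.
        String shortOpt = "-" + key.substring(0, bar).trim();
        String longOpt = "--" + key.substring(bar + 1).trim();
        if (!commandLine.contains(shortOpt) && !commandLine.contains(longOpt)) {
          args.add(shortOpt);
          args.add(value);
        }
      } else if (key.startsWith("D")) {
        // "DmyProp = propValue" becomes the fused form -DmyProp=propValue.
        String prefix = "-" + key + "=";
        boolean overridden = commandLine.stream().anyMatch(a -> a.startsWith(prefix));
        if (!overridden) {
          args.add(prefix + value);
        }
      } else {
        // No "|": the key is assumed to be prepended with a single "-".
        String opt = "-" + key;
        if (!commandLine.contains(opt)) {
          args.add(opt);
          args.add(value);
        }
      }
    }
    args.addAll(commandLine);
    return args;
  }
}
```

With the two example lines from the comment and an empty command line, toArgs yields -i foo/path -DmyProp=propValue; passing "--input baz/path" on the command line drops the foo/path default and keeps the -D property.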
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836268#action_12836268 ]

Drew Farris commented on MAHOUT-301:

{blockquote}
What does GenericOptionsParser do if you have a command line input like this:

  programName --input foo.txt -i bar.txt

where --input is the long argument name for -i as short name? Which one wins? Is it deterministic?
{blockquote}

In most cases it really depends on the implementation; sometimes GenericOptionsParser isn't even being used. In Mahout's case it's likely to be commons-cli2 that's actually doing the parsing, and I don't know how it would behave in this case. I'll take a look.

GenericOptionsParser simply handles things like -conf and -Dprop=value that control Hadoop configurations, job settings and the like, and then hands back the rest to the caller. In many cases in Mahout, GenericOptionsParser isn't used at all, which reduces the control one has over a job's behavior. IIRC, Sean and Robin have made some progress towards eliminating these cases with the AbstractJob class.
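The division of labour described in the comment above (generic options are consumed first, everything else is handed back to the caller) can be imitated in plain Java as follows. This sketch does not use Hadoop: GenericOptionSplitSketch is a hypothetical stand-in that only recognises the fused -Dkey=value form and ignores -conf and the other generic options. In real Hadoop code the equivalent step is constructing a GenericOptionsParser with a Configuration and the argument array, then calling getRemainingArgs().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation for illustration; not Hadoop's GenericOptionsParser. */
final class GenericOptionSplitSketch {

  /** Properties collected from -Dkey=value arguments, in command-line order. */
  final Map<String, String> properties = new LinkedHashMap<>();

  /** Arguments that were not recognised and are left for the job's own parser. */
  final List<String> remaining = new ArrayList<>();

  private GenericOptionSplitSketch() {}

  static GenericOptionSplitSketch split(String[] args) {
    GenericOptionSplitSketch result = new GenericOptionSplitSketch();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (arg.startsWith("-D") && eq > 2) {
        // -DmyProp=value: strip the "-D" and split at the first "=".
        result.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
      } else {
        result.remaining.add(arg);
      }
    }
    return result;
  }
}
```

Under this split, "-DmyProp=value --input foo.txt -i bar.txt" leaves both --input and -i in the remaining arguments, so the question of which one wins is decided entirely by whatever parses those remaining arguments.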
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836247#action_12836247 ]

Ted Dunning commented on MAHOUT-301:

This also helps non-command-line usage, actually. I can imagine a workflow solution where setting all parameters on every step gets onerous.
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836231#action_12836231 ]

Jake Mannix commented on MAHOUT-301:

The TODO refers to an issue that I think is there, but am not sure about: what does GenericOptionsParser do if you have a command line input like this:

  programName --input foo.txt -i bar.txt

where --input is the long argument name for -i as short name? Which one wins? Is it deterministic?
[jira] Commented: (MAHOUT-301) Improve command-line shell script by allowing default properties files
[ https://issues.apache.org/jira/browse/MAHOUT-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836209#action_12836209 ]

Drew Farris commented on MAHOUT-301:

This is pretty nice. It gets to the point where relying on shell history or ad-hoc mechanisms to manage command lines kills me, and this is a nice solution. I've quickly skimmed the patch but I haven't tried it out.

I see the TODO in there regarding short vs. long arguments. Do you have any thoughts on how to support single-dash arguments? Things like the arguments supported by the [GenericOptionsParser|http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html] could be set in the properties file too.