[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795991#action_12795991 ]

Sean Owen commented on MAHOUT-185:
----------------------------------

I like the idea. Is there any elaboration or movement on this?

I wonder to what extent we can make every "thing" in Mahout a command-line program. For example, the CF bits aren't quite like that. You could make about 15 different sets of args for the 15 different variations of a CF algorithm. Or you could make some general framework for taking a class with main() and args, but then we approach just reproducing "java".

And then there are the Hadoop-related versions of everything, which already provide a "Job" class or "Driver" class to run them from the command line. It might be undesirable to duplicate this.

> Add mahout shell script for easy launching of various algorithms
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-185
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-185
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.2
>         Environment: linux, bash
>            Reporter: Robin Anil
>             Fix For: 0.3
>
> Currently, each algorithm has a different point of entry. It is too complicated to understand and launch each one. A mahout shell script needs to be made in the bin directory which does something like the following:
> mahout classify -algorithm bayes [OPTIONS]
> mahout cluster -algorithm canopy [OPTIONS]
> mahout fpm -algorithm pfpgrowth [OPTIONS]
> mahout taste -algorithm slopeone [OPTIONS]
> mahout misc -algorithm createVectorsFromText [OPTIONS]
> mahout examples WikipediaExample

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797233#action_12797233 ]

Jake Mannix commented on MAHOUT-185:
------------------------------------

As a note on this: one of the things I've sometimes done (and we do for managing our Hadoop jobs at LinkedIn) to make dealing with messy CLI stuff more manageable is to also allow for Properties files with default arguments for various jobs. This makes for much more easily reproducible results, and it's self-documenting: just have "mahout classify" look first in classify.props to see if default args are defined, and go from there.

Using a base class like Hadoop's Tool, you can leverage ToolRunner and GenericOptionsParser as well, and hooking in a Properties-based way to run it makes it pretty flexible.

It would be really nice to consolidate all of our Driver/Job classes into this issue, so that it's a) not duplicated, and b) in one place. This issue should get some priority: it will seriously help with our usability if there's an easy way to launch all the various tasks from one simple place.

I'd love to have a little jruby script to run some of this stuff too, because when I was first writing decomposer, I found it invaluable to be able to just drop into jirb's REPL and start issuing java commands to run the various Hadoop jobs I was testing.
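The default-arguments idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name DefaultArgs, the "--name value" option style, and the property keys are hypothetical, and the sketch uses java.util.Properties directly rather than Hadoop's Tool or GenericOptionsParser. A string stands in for the contents of classify.props so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: merges per-command default options with command-line options. */
final class DefaultArgs {

    /** Loads defaults in java.util.Properties format, e.g. the contents of classify.props. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Returns the effective options: every default, overridden by any
     * "--name value" pair given on the command line.
     */
    static Map<String, String> merge(Properties defaults, String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Options must come in --name value pairs");
        }
        Map<String, String> merged = new TreeMap<>();
        for (String name : defaults.stringPropertyNames()) {
            merged.put(name, defaults.getProperty(name));
        }
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --name, got: " + args[i]);
            }
            // Command-line values win over the defaults from the properties file.
            merged.put(args[i].substring(2), args[i + 1]);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for reading classify.props from disk (assumed keys).
        Properties defaults = load(new StringReader("algorithm=bayes\ninput=/tmp/in\n"));
        System.out.println(merge(defaults, args));
    }
}
```

Keeping the merge separate from the loading means the same defaults file can be checked in next to an experiment, which is where the reproducibility benefit comes from.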
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797277#action_12797277 ]

Ted Dunning commented on MAHOUT-185:
------------------------------------

Regarding the properties file idea, I have had very good luck with a convention that I now use pretty ubiquitously.

Each application has a default properties file that is baked into the jar file. This allows slow changes subject to recompilation.

All of these default properties are subject to override in an external properties file found in the class path or the current working directory. These overrides are monitored for changes to allow on-the-fly reconfiguration of long-running processes.

For transaction systems (not Mahout-like stuff), I also allow requests to contain an additional override map of properties. This allows certain things to be changed on a request-by-request basis. This helps enormously because it allows almost anything to be the subject of A/B testing.
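The first two layers of this convention map fairly directly onto the defaults constructor of java.util.Properties. The following is a minimal sketch, not the implementation described above: the class name LayeredConfig and the keys are hypothetical, Readers stand in for the jar resource and the external file so the example is self-contained, and change monitoring and per-request overrides are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of baked-in defaults with an optional external override layer. */
final class LayeredConfig {

    /**
     * Builds a Properties object whose lookups fall back to the baked-in
     * defaults whenever the override layer does not define a key.
     *
     * @param bakedIn  defaults that would normally ship inside the jar
     * @param override external overrides, or null if no override file exists
     */
    static Properties layer(Reader bakedIn, Reader override) {
        try {
            Properties defaults = new Properties();
            defaults.load(bakedIn);
            // Properties(defaults) consults 'defaults' for keys it lacks itself.
            Properties effective = new Properties(defaults);
            if (override != null) {
                effective.load(override);
            }
            return effective;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties config = layer(
                new StringReader("k=10\nmaxIter=20\n"),  // assumed default keys
                new StringReader("k=25\n"));             // assumed external override
        System.out.println("k=" + config.getProperty("k"));
        System.out.println("maxIter=" + config.getProperty("maxIter"));
    }
}
```

In a real application the first Reader would typically wrap a classpath resource and the second a file in the working directory; reloading the second layer on change would give the on-the-fly reconfiguration mentioned above.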
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830077#action_12830077 ]

Robin Anil commented on MAHOUT-185:
-----------------------------------

I like the script, as I am running k-means these days :)

{code}
if [ "$COMMAND" = "vectordump" ] ; then
  CLASS=org.apache.mahout.utils.vectors.VectorDumper
elif [ "$COMMAND" = "clusterdump" ] ; then
  CLASS=org.apache.mahout.utils.clustering.ClusterDumper
elif [ "$COMMAND" = "seqdump" ] ; then
  CLASS=org.apache.mahout.utils.SequenceFileDumper
elif [ "$COMMAND" = "kmeans" ] ; then
  CLASS=org.apache.mahout.clustering.kmeans.KMeansDriver
elif [ "$COMMAND" = "canopy" ] ; then
  CLASS=org.apache.mahout.clustering.canopy.CanopyDriver
elif [ "$COMMAND" = "lucenevector" ] ; then
  CLASS=org.apache.mahout.utils.vectors.lucene.Driver
elif [ "$COMMAND" = "seqdirectory" ] ; then
  CLASS=org.apache.mahout.text.SequenceFilesFromDirectory
elif [ "$COMMAND" = "seqwiki" ] ; then
  CLASS=org.apache.mahout.text.WikipediaToSequenceFile
fi
{code}

If we go like this we might have too many options. Any way to streamline this?

One thought I have is to have package-level Main classes in Core, like org.apache.mahout.Clustering.java, which internally call the different main functions. Similarly, in examples and util we can keep one entry class each: Examples.java and Util.java.

So with this limited set we can keep a global conf object which implements Tool, and the fs object, which is the default filesystem as specified by the conf. This way each algorithm can request a conf object (which copies everything Tool has set).

How does that sound? I can whip up all the main classes tonight.

> Attachments: MAHOUT-185.patch
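One way to picture a single entry class that calls the different main functions is a small table-driven launcher. The sketch below is only an assumption about how such a class could look: Launcher and EchoMain are hypothetical names, the Hadoop Tool and conf handling discussed above is not modeled, and the Mahout class names are registered purely as strings taken from the script excerpt (they are not loaded here, since Mahout is not assumed to be on the classpath).

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in target so the sketch runs without Mahout on the classpath. */
final class EchoMain {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
        System.out.println("EchoMain received " + Arrays.toString(args));
    }
}

/** Hypothetical launcher: maps a short command name to a class with a main method. */
final class Launcher {
    private final Map<String, String> commands = new LinkedHashMap<>();

    Launcher register(String command, String className) {
        commands.put(command, className);
        return this;
    }

    /** Returns the class name for a command, or fails listing the known commands. */
    String resolve(String command) {
        String className = commands.get(command);
        if (className == null) {
            throw new IllegalArgumentException(
                    "Unknown command: " + command + ", known: " + commands.keySet());
        }
        return className;
    }

    /** Treats argv[0] as the command and passes the remaining args to its main(). */
    void run(String[] argv) {
        if (argv.length == 0) {
            throw new IllegalArgumentException("Usage: <command> [OPTIONS]");
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        try {
            Method main = Class.forName(resolve(argv[0])).getMethod("main", String[].class);
            // Cast to Object so the array is passed as one argument, not as varargs.
            main.invoke(null, (Object) rest);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not launch " + argv[0], e);
        }
    }

    public static void main(String[] args) {
        Launcher launcher = new Launcher()
                .register("kmeans", "org.apache.mahout.clustering.kmeans.KMeansDriver")
                .register("canopy", "org.apache.mahout.clustering.canopy.CanopyDriver")
                .register("echo", EchoMain.class.getName());
        launcher.run(new String[] {"echo", "--input", "/tmp/in"});
    }
}
```

With the table in Java, the shell script would only need to pass its arguments through to one class, at the cost of approaching the "just reproducing java" concern raised earlier in the thread.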
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832239#action_12832239 ]

Jake Mannix commented on MAHOUT-185:
------------------------------------

Why don't we just commit the shell script and close this for now? It's useful as is. We can open another ticket for 0.4 about doing something more along the lines that Robin mentions above (which I've got partially complete in my local git repo).

> Fix For: 0.4
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832626#action_12832626 ]

Grant Ingersoll commented on MAHOUT-185:
----------------------------------------

Looks like a good start. Longer term, we might want to integrate launching EC2, etc.: http://openbixo.org/documentation/running-bixo-in-ec2/
[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
[ https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832661#action_12832661 ]

Grant Ingersoll commented on MAHOUT-185:
----------------------------------------

Committed revision 909120.

> Assignee: Grant Ingersoll
> Fix For: 0.3
Re: [jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms
Surely there is a clever way to use annotations for this. Not that I know what it might be.

On Fri, Feb 5, 2010 at 4:05 AM, Robin Anil (JIRA) wrote:

> If we go like this we might have too many options. Any way to streamline
> this ?
>
> One thought i have is to have package level Main classes in Core like
> org.apache.mahout.Clustering.java which internally calls the different main
> functions ?

-- 
Ted Dunning, CTO
DeepDyve
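For what it is worth, one possible shape for an annotation-based approach is sketched below. Nothing here comes from the thread or from Mahout: the @Command annotation, the Demo* driver classes, and CommandIndex are all hypothetical, and the candidate classes are passed in explicitly because discovering annotated classes on the classpath would need extra tooling that this sketch does not attempt.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical marker: the short command name under which a driver is launched. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Command {
    String value();
}

/** Demo stand-ins for real driver classes. */
@Command("kmeans")
final class DemoKMeansDriver {
    public static void main(String[] args) {
        System.out.println("demo kmeans driver");
    }
}

@Command("canopy")
final class DemoCanopyDriver {
    public static void main(String[] args) {
        System.out.println("demo canopy driver");
    }
}

/** Builds the command table from annotations instead of a hand-written if/elif chain. */
final class CommandIndex {

    /** Indexes every candidate that carries @Command; unannotated classes are skipped. */
    static Map<String, Class<?>> index(Class<?>... candidates) {
        Map<String, Class<?>> byName = new TreeMap<>();
        for (Class<?> candidate : candidates) {
            Command command = candidate.getAnnotation(Command.class);
            if (command == null) {
                continue;
            }
            Class<?> previous = byName.put(command.value(), candidate);
            if (previous != null) {
                throw new IllegalStateException("Duplicate command name: " + command.value());
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> commands =
                index(DemoKMeansDriver.class, DemoCanopyDriver.class, String.class);
        commands.forEach((name, type) -> System.out.println(name + " -> " + type.getName()));
    }
}
```

The appeal of this design is that each driver declares its own command name, so adding an algorithm would not require touching the launcher or the shell script.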