[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832626#action_12832626]

Grant Ingersoll commented on MAHOUT-185:


Looks like a good start. Longer term, we might want to integrate support for launching on
EC2, etc.: http://openbixo.org/documentation/running-bixo-in-ec2/



 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.4

 Attachments: MAHOUT-185.patch


 Currently, each algorithm has a different point of entry, which makes it too 
 complicated to understand and launch each one. A mahout shell script needs 
 to be added to the bin directory that does something like the following:
 mahout classify -algorithm bayes [OPTIONS]
 mahout cluster -algorithm canopy [OPTIONS]
 mahout fpm -algorithm pfpgrowth [OPTIONS]
 mahout taste -algorithm slopeone [OPTIONS] 
 mahout misc -algorithm createVectorsFromText [OPTIONS]
 mahout examples WikipediaExample
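
For illustration only, here is a minimal bash sketch of how the proposed "mahout <group> -algorithm <name> [OPTIONS]" form could be split into its parts. The function name parse_mahout_args is hypothetical, and the sketch launches nothing; it just prints the group, the algorithm and the leftover options, one per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "mahout <group> -algorithm <name> [OPTIONS]"
# into group, algorithm and the remaining options. Nothing is launched.
parse_mahout_args() {
  local group="$1"
  shift
  local algorithm=""
  local -a rest=()
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -algorithm)
        # The algorithm name must follow the flag.
        if [ "$#" -lt 2 ]; then
          echo "error: -algorithm needs a value" >&2
          return 1
        fi
        algorithm="$2"
        shift 2
        ;;
      *)
        # Everything else is passed through untouched.
        rest+=("$1")
        shift
        ;;
    esac
  done
  # One field per line: group, algorithm (possibly empty), then options.
  printf '%s\n' "$group" "$algorithm" "${rest[@]}"
}
```

For example, parse_mahout_args cluster -algorithm canopy --input in prints cluster, canopy, --input and in on separate lines; "mahout examples WikipediaExample" yields an empty algorithm line.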

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-11 Thread Grant Ingersoll (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832661#action_12832661]

Grant Ingersoll commented on MAHOUT-185:


Committed revision 909120.

 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
 Environment: linux, bash
Reporter: Robin Anil
Assignee: Grant Ingersoll
 Fix For: 0.3

 Attachments: MAHOUT-185.patch





[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-10 Thread Jake Mannix (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832239#action_12832239]

Jake Mannix commented on MAHOUT-185:


Why don't we just commit the shell script and close this for now? It's useful 
as is.

We can open another ticket for 0.4 to do something more along the lines of 
what Robin proposed in his earlier comment (which I've got partially complete 
in my local git repo).

 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.4

 Attachments: MAHOUT-185.patch





[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-05 Thread Robin Anil (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12830077#action_12830077]

Robin Anil commented on MAHOUT-185:
---

I like the script, as I am running k-means these days :)
{code}
# Map the sub-command name to the Java driver class that implements it.
if [ "$COMMAND" = vectordump ] ; then
  CLASS=org.apache.mahout.utils.vectors.VectorDumper
elif [ "$COMMAND" = clusterdump ] ; then
  CLASS=org.apache.mahout.utils.clustering.ClusterDumper
elif [ "$COMMAND" = seqdump ] ; then
  CLASS=org.apache.mahout.utils.SequenceFileDumper
elif [ "$COMMAND" = kmeans ] ; then
  CLASS=org.apache.mahout.clustering.kmeans.KMeansDriver
elif [ "$COMMAND" = canopy ] ; then
  CLASS=org.apache.mahout.clustering.canopy.CanopyDriver
elif [ "$COMMAND" = lucenevector ] ; then
  CLASS=org.apache.mahout.utils.vectors.lucene.Driver
elif [ "$COMMAND" = seqdirectory ] ; then
  CLASS=org.apache.mahout.text.SequenceFilesFromDirectory
elif [ "$COMMAND" = seqwiki ] ; then
  CLASS=org.apache.mahout.text.WikipediaToSequenceFile
fi
{code}

If we go on like this, we might end up with too many options. Is there any way to streamline this?
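
One way to keep the list manageable, sketched here under the assumption that the script may require bash 4 or later, is to replace the if/elif chain with a single associative array. The class names are copied from the snippet above; resolve_class is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: one lookup table instead of a growing if/elif chain (bash 4+).
declare -A MAHOUT_CLASSES=(
  [vectordump]=org.apache.mahout.utils.vectors.VectorDumper
  [clusterdump]=org.apache.mahout.utils.clustering.ClusterDumper
  [seqdump]=org.apache.mahout.utils.SequenceFileDumper
  [kmeans]=org.apache.mahout.clustering.kmeans.KMeansDriver
  [canopy]=org.apache.mahout.clustering.canopy.CanopyDriver
  [lucenevector]=org.apache.mahout.utils.vectors.lucene.Driver
  [seqdirectory]=org.apache.mahout.text.SequenceFilesFromDirectory
  [seqwiki]=org.apache.mahout.text.WikipediaToSequenceFile
)

# Print the driver class for a command, or fail and list the known commands.
resolve_class() {
  local cmd="${1:-}"
  # Check for an empty name first: bash rejects an empty array subscript.
  if [ -n "$cmd" ] && [ -n "${MAHOUT_CLASSES[$cmd]:-}" ]; then
    echo "${MAHOUT_CLASSES[$cmd]}"
    return 0
  fi
  echo "unknown command: $cmd" >&2
  echo "valid commands: ${!MAHOUT_CLASSES[*]}" >&2
  return 1
}
```

Adding a command then becomes a one-line change, and an unknown command can report the valid ones (in no particular order, since bash does not keep associative array keys sorted).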

One thought I have is to add package-level main classes in core, such as 
org.apache.mahout.Clustering.java, which would internally call the different main 
functions.
Similarly, in examples and util we could keep one entry class each: Examples.java 
and Util.java.

So with this limited set of entry classes, we can keep a global conf object, held 
by a class that implements Tool, plus the fs object, which is the default filesystem 
as specified by the conf.
This way, each algorithm can request a conf object (which copies everything Tool 
has set).
How does that sound? I can whip up all the main classes tonight.
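
A rough bash sketch of the script side of this proposal: the script only knows the group names and hands everything else to that group's entry class, which would pick the algorithm itself. org.apache.mahout.Clustering is the name proposed above; the packages for Examples and Util, the MAHOUT_JOB jar path and the helper names are placeholders. It prints the "hadoop jar <jar> <mainClass> args" command line instead of running it.

```bash
#!/usr/bin/env bash
# Sketch of "one entry class per package". Only org.apache.mahout.Clustering
# is named in the proposal; the other package names are placeholders.
MAHOUT_JOB="${MAHOUT_JOB:-mahout-job.jar}"   # placeholder jar path

# Map a group name to its (hypothetical) entry class.
entry_class() {
  case "$1" in
    cluster)  echo org.apache.mahout.Clustering ;;
    examples) echo org.apache.mahout.Examples ;;  # placeholder package
    util)     echo org.apache.mahout.Util ;;      # placeholder package
    *)        return 1 ;;
  esac
}

# Print the command that would launch the group's entry class, one word per
# line; the entry class itself would choose the algorithm from its arguments.
build_command() {
  local group="$1"
  shift
  local cls
  cls="$(entry_class "$group")" || {
    echo "unknown group: $group" >&2
    return 1
  }
  local -a cmd=(hadoop jar "$MAHOUT_JOB" "$cls" "$@")
  printf '%s\n' "${cmd[@]}"
}
```

With this split, the shell script stays a fixed size no matter how many algorithms are added; the per-algorithm dispatch moves into Java, where the shared conf and fs objects can live.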











 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.3

 Attachments: MAHOUT-185.patch





Re: [jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-02-05 Thread Ted Dunning
Surely there is a clever way to use annotations for this.  Not that I know
what it might be.

On Fri, Feb 5, 2010 at 4:05 AM, Robin Anil (JIRA) j...@apache.org wrote:

 If we go like this we might have too many options. Any way to streamline
 this ?

 One thought i have is to have package level Main classes in Core like
 org.apache.mahout.Clustering.java which internally calls the different main
 functions ?




-- 
Ted Dunning, CTO
DeepDyve


[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-01-06 Thread Jake Mannix (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797233#action_12797233]

Jake Mannix commented on MAHOUT-185:


As a note on this: one of the things I've sometimes done to make messy CLI 
handling more manageable (and what we do for managing our Hadoop jobs at LinkedIn) 
is to also allow Properties files with default arguments for various jobs. This 
makes results much more easily reproducible, and it's self-documenting: just have 
mahout classify look first in classify.props to see if default args are defined, 
and go from there.
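
A minimal bash sketch of the classify.props idea, assuming plain key=value lines (no spaces around "=") and job flags of the form --key value; both are assumptions, not an existing Mahout convention. Defaults from the file come first, and any default whose --key the user passed explicitly is dropped, so the command line always wins. props_to_args is a hypothetical name.

```bash
#!/usr/bin/env bash
# Usage: props_to_args PROPS_FILE [USER_ARGS...]
# Prints the merged argument list, one word per line.
props_to_args() {
  local props="$1"
  shift
  local -a merged=()
  local key value user_arg seen
  if [ -f "$props" ]; then
    # "|| [ -n "$key" ]" also handles a last line with no trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
      # Skip blank lines and '#' comments.
      case "$key" in ''|'#'*) continue ;; esac
      # A flag given explicitly on the command line overrides the default.
      seen=0
      for user_arg in "$@"; do
        if [ "$user_arg" = "--$key" ]; then seen=1; fi
      done
      if [ "$seen" -eq 0 ]; then merged+=("--$key" "$value"); fi
    done < "$props"
  fi
  # User arguments always follow the defaults.
  merged+=("$@")
  printf '%s\n' "${merged[@]}"
}
```

The launcher could then build its argument list with something like props_to_args classify.props "$@" before starting the job; a missing props file simply leaves the user's arguments unchanged.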

Building on something like Hadoop's Tool interface, you can leverage ToolRunner and 
GenericOptionsParser as well, and hooking in a Properties-based way to run 
it on top of that makes it pretty flexible.
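
When jobs go through ToolRunner, GenericOptionsParser expects its generic options (-conf, -D, -fs, -jt, -files, -libjars, -archives) ahead of the job's own arguments, following the usual "command [genericOptions] [commandOptions]" syntax. Here is a hedged sketch of a helper a launcher could use to move them to the front. It assumes each generic option takes exactly one separate value, so the -Dkey=value spelling is not handled; reorder_generic_options is a hypothetical name.

```bash
#!/usr/bin/env bash
# Move Hadoop generic options (and their values) ahead of job-specific ones.
reorder_generic_options() {
  local -a generic=() other=()
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -conf|-D|-fs|-jt|-files|-libjars|-archives)
        # Each generic option is assumed to take one separate value.
        if [ "$#" -lt 2 ]; then
          echo "error: $1 needs a value" >&2
          return 1
        fi
        generic+=("$1" "$2")
        shift 2
        ;;
      *)
        # Job-specific arguments keep their relative order.
        other+=("$1")
        shift
        ;;
    esac
  done
  printf '%s\n' "${generic[@]}" "${other[@]}"
}
```

This lets users write "-D mapred.reduce.tasks=10" anywhere on the mahout command line while the job still sees it where GenericOptionsParser looks for it.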

It would be really nice to consolidate all of our Driver/Job classes as part of this 
issue, so that their launch logic is a) not duplicated and b) in one place.  

This issue should get some priority: it will seriously help our usability 
if there's an easy way to launch all the various tasks from one simple place.  
I'd love to have a little JRuby script to run some of this stuff too, because 
when I was first writing decomposer, I found it invaluable to be able to just 
drop into jirb's REPL and start issuing Java commands to run the various Hadoop 
jobs I was testing.

 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.3





[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-01-06 Thread Ted Dunning (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797277#action_12797277]

Ted Dunning commented on MAHOUT-185:


Regarding the properties file idea, I have had very good luck with a convention 
that I now use pretty ubiquitously. Each application has a default properties 
file that is baked into the jar file. This allows slow changes, subject to 
recompilation. All of these default properties can be overridden by an 
external properties file found in the classpath or the current working 
directory. These overrides are monitored for changes to allow on-the-fly 
reconfiguration of long-running processes.
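
A small bash sketch of just the layering part of this convention: read a defaults file, then let an override file replace individual keys. In the setup described above the defaults live inside the jar and the overrides are watched for changes; neither of those is shown here, and merge_props is a hypothetical helper that takes plain file paths of simple key=value lines.

```bash
#!/usr/bin/env bash
# Usage: merge_props DEFAULTS_FILE [OVERRIDE_FILE...]
# Later files win; '#' comments and blank lines are ignored (bash 4+).
merge_props() {
  local -A merged=()
  local file key value
  for file in "$@"; do
    # A missing override file is fine: the defaults simply stay in force.
    [ -f "$file" ] || continue
    while IFS='=' read -r key value || [ -n "$key" ]; do
      case "$key" in ''|'#'*) continue ;; esac
      merged["$key"]="$value"
    done < "$file"
  done
  # Print in byte-wise sorted key order so the output is deterministic.
  for key in "${!merged[@]}"; do
    printf '%s=%s\n' "$key" "${merged[$key]}"
  done | LC_ALL=C sort
}
```

Because a missing override file is skipped rather than treated as an error, the defaults alone are always a valid configuration, which matches the "baked-in defaults, optional overrides" idea.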

For transaction systems (not Mahout-like stuff), I also allow requests to 
contain an additional override map of properties. This allows certain things 
to be changed on a request-by-request basis. This helps enormously because it 
allows almost anything to be the subject of A/B testing.

 

 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.3





[jira] Commented: (MAHOUT-185) Add mahout shell script for easy launching of various algorithms

2010-01-03 Thread Sean Owen (JIRA)

[https://issues.apache.org/jira/browse/MAHOUT-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795991#action_12795991]

Sean Owen commented on MAHOUT-185:
--

I like the idea. Is there any elaboration or movement on this?

I wonder to what extent we can make everything in Mahout a command-line 
program. For example, the CF bits aren't quite like that. You could make 
about 15 different sets of args for the 15 different variations of a CF 
algorithm. Or you could make some general framework for taking a class with 
main() and args, but then we approach just reproducing Java. 

And then there are the Hadoop-related versions of everything, which already 
provide a Job class or Driver class to run it from the command line. It 
might be undesirable to duplicate this.
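
One possible middle ground, sketched in bash: keep short aliases only for existing Driver classes and pass any dotted argument straight through as a class name, much as the hadoop script accepts a CLASSNAME. This reuses the Driver/Job classes instead of duplicating them. The two aliases shown come from Robin's snippet; pick_main_class is an illustrative name.

```bash
#!/usr/bin/env bash
# Resolve a short alias to an existing driver class, or pass a fully
# qualified class name through unchanged.
pick_main_class() {
  case "$1" in
    kmeans) echo org.apache.mahout.clustering.kmeans.KMeansDriver ;;
    canopy) echo org.apache.mahout.clustering.canopy.CanopyDriver ;;
    *.*)
      # Dotted names are treated as class names; the JVM rejects bad ones.
      echo "$1" ;;
    *)
      echo "unknown command: $1" >&2
      return 1 ;;
  esac
}
```

The trade-off is that the pass-through path offers no help text of its own; it only saves typing for the aliased commands, which keeps the script from turning into a second copy of Java's own entry points.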

 Add mahout shell script for easy launching of various algorithms
 

 Key: MAHOUT-185
 URL: https://issues.apache.org/jira/browse/MAHOUT-185
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
 Environment: linux, bash
Reporter: Robin Anil
 Fix For: 0.3

