[jira] Updated: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pallavi Palleti updated MAHOUT-99: -- Attachment: Mahout-99.patch Patch is modified to be compatible with latest trunk. Thanks Pallav

Re: Hadoop upgrade

2009-03-18 Thread Sean Owen
(I upgraded to 0.19.1 last week.) On Tue, Mar 17, 2009 at 10:41 PM, Grant Ingersoll wrote: > OK, pending MAHOUT-110, I think we are good to go on the release.  Not sure > who volunteered to upgrade Hadoop, so go for it now, or it will wait until > after 0.1.

Re: Hadoop upgrade

2009-03-18 Thread Grant Ingersoll
D'oh! Thanks! On Mar 18, 2009, at 4:32 AM, Sean Owen wrote: (I upgraded to 0.19.1 last week.) On Tue, Mar 17, 2009 at 10:41 PM, Grant Ingersoll wrote: OK, pending MAHOUT-110, I think we are good to go on the release. Not sure who volunteered to upgrade Hadoop, so go for it now, or it wi

[jira] Commented: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682977#action_12682977 ] Grant Ingersoll commented on MAHOUT-110: We good to go on this one, Sean? I'd like

[jira] Commented: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682980#action_12682980 ] Sean Owen commented on MAHOUT-110: -- I say go for it. I will merge with my patch locally an

[jira] Created: (MAHOUT-111) Redirect Test output to file

2009-03-18 Thread Grant Ingersoll (JIRA)
Redirect Test output to file Key: MAHOUT-111 URL: https://issues.apache.org/jira/browse/MAHOUT-111 Project: Mahout Issue Type: Improvement Affects Versions: 0.2 Reporter: Grant Ingersoll

[jira] Commented: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682983#action_12682983 ] Grant Ingersoll commented on MAHOUT-110: Will do. Thanks for the kick in the pants

[jira] Resolved: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-99. --- Resolution: Fixed Fix Version/s: 0.1 Committed revision 755548. Thanks! > Improving s

[jira] Updated: (MAHOUT-111) Redirect Test output to file

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-111: --- Affects Version/s: (was: 0.2) 0.1 > Redirect Test output to file >

[jira] Resolved: (MAHOUT-111) Redirect Test output to file

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-111. Resolution: Fixed Fix Version/s: 0.1 Fixed > Redirect Test output to file > ---

Re: Concerns about Maven

2009-03-18 Thread Grant Ingersoll
On Mar 17, 2009, at 9:06 AM, Enis Soztutar wrote: -Grant Knowing nothing about the mahout build script(s), I think that having both ant and maven scripts might prove to be problematic. However keeping one module(taste) in ant will work. As a side note, we have discussed this same thing in

Thoughts on ...

2009-03-18 Thread Grant Ingersoll
http://lingpipe-blog.com/2009/03/12/speeding-up-k-means-clustering-algebra-sparse-vectors/ -Grant

[jira] Resolved: (MAHOUT-110) Ant script for building Taste web app

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-110. Resolution: Fixed Committed > Ant script for building Taste web app >

[jira] Reopened: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-99: --- Hi Pallavi, I'm getting: 09/03/18 11:13:56 WARN mapred.LocalJobRunner: job_local_0001 java.lang.

Re: Concerns about Maven

2009-03-18 Thread Enis Soztutar
Grant Ingersoll wrote: On Mar 17, 2009, at 9:06 AM, Enis Soztutar wrote: -Grant Knowing nothing about the mahout build script(s), I think that having both ant and maven scripts might prove to be problematic. However keeping one module(taste) in ant will work. As a side note, we have discuss

Dirchlet Job example

2009-03-18 Thread Grant Ingersoll
Hey Jeff, Is it appropriate to have a Job example like we do for k-means and some of the other clustering algorithms for dirichlet? I see you do have some type of UI in there, right? Are there directions somewhere for running the example? http://cwiki.apache.org/MAHOUT/dirichlet-process

Re: Thoughts on ...

2009-03-18 Thread Jeff Eastman
Interesting optimization. We can incorporate it by adding a centroid^2 argument to DistanceMeasure interface and adjusting the affected clustering algorithms. All would benefit from this optimization. I will build a test to assess its impact and report. Jeff Grant Ingersoll wrote: http://ling
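A minimal sketch of the kind of optimization discussed above, assuming it relies on the expansion ||x - c||^2 = ||x||^2 - 2(x . c) + ||c||^2, so that the centroid's squared norm is computed once and passed in, and only the non-zero entries of a sparse point are visited. The class and method names here (SquaredDistanceSketch, normSquared, squaredDistance) are hypothetical and are not Mahout's DistanceMeasure API; the sparse point is represented as a plain Map purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
class SquaredDistanceSketch {

    // Squared Euclidean norm of a dense centroid; computed once per centroid per iteration.
    static double normSquared(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    // Squared Euclidean norm of a sparse point stored as index -> value.
    static double normSquared(Map<Integer, Double> sparse) {
        double sum = 0.0;
        for (double x : sparse.values()) {
            sum += x * x;
        }
        return sum;
    }

    // ||x - c||^2 = ||x||^2 - 2 (x . c) + ||c||^2.
    // Only the non-zero entries of the point are visited; both squared norms are passed in precomputed.
    static double squaredDistance(Map<Integer, Double> point, double pointNormSq,
                                  double[] centroid, double centroidNormSq) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : point.entrySet()) {
            dot += e.getValue() * centroid[e.getKey()];
        }
        return pointNormSq - 2.0 * dot + centroidNormSq;
    }

    public static void main(String[] args) {
        Map<Integer, Double> point = new HashMap<>();
        point.put(0, 1.0);   // sparse point (1, 0, 0, 2)
        point.put(3, 2.0);
        double[] centroid = {0.5, 1.0, 0.0, 1.0};

        double centroidNormSq = normSquared(centroid); // reused for every point compared to this centroid
        double d = squaredDistance(point, normSquared(point), centroid, centroidNormSq);
        System.out.println(d);
    }
}
```

The saving comes from the loop in squaredDistance running over the point's non-zero entries rather than over every dimension of the centroid. Whether this pays off in practice depends on how sparse the points are, which is presumably what the proposed test would measure.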

Re: [jira] Reopened: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Did you reopen this issue because of this error? I just ran the example and it ran without error. Jeff Grant Ingersoll (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-99:

Re: Dirchlet Job example

2009-03-18 Thread Jeff Eastman
Not only appropriate but essential. I will add a README file in the code and instructions in the wiki today. Jeff Grant Ingersoll wrote: Hey Jeff, Is it appropriate to have a Job example like we do for k-means and some of the other clustering algorithms for dirichlet? I see you do have so

Re: Dirchlet Job example

2009-03-18 Thread Otis Gospodnetic
Yeah, I was wondering about that simple, but nice cluster-showing UI... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Grant Ingersoll > To: mahout-dev@lucene.apache.org > Sent: Wednesday, March 18, 2009 12:01:28 PM > Subject: Dirchlet J

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683077#action_12683077 ] Grant Ingersoll commented on MAHOUT-99: --- Yeah, what version of Hadoop are you running?

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
I'm running the example in Eclipse using the stand-alone mode in the hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in Eclipse. I cannot, however, get any hadoop stuff to work from the command line. Even though my JAVA_HOME environment is set to /Library/Java/Home and java -v

mvn package tar file issue

2009-03-18 Thread Otis Gospodnetic
Hi, Am I the only person getting the following after mvn package? [INFO] [ERROR] BUILD ERROR [INFO] [INFO] Failed to create assembly: Error creating a

Re: mvn package tar file issue

2009-03-18 Thread Grant Ingersoll
Where are you running it? The top? On Mar 18, 2009, at 2:15 PM, Otis Gospodnetic wrote: Hi, Am I the only person getting the following after mvn package? [INFO] [ERROR] BUILD ERROR [INFO] -

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll
On my Mac, I have: $ echo $JAVA_HOME /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home -Grant On Mar 18, 2009, at 2:10 PM, Jeff Eastman wrote: I'm running the example in Eclipse using the stand-alone mode in the hadoop-0.19.1 jar file. It works fine, as does the hadoop compile in

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683140#action_12683140 ] Grant Ingersoll commented on MAHOUT-99: --- I seem to recall hitting something similar be

Re: mvn package tar file issue

2009-03-18 Thread Otis Gospodnetic
Yes, at the top. Bad? Doing it from core worked. How come it doesn't work from root and should it, at least for 0.2? Would be more intuitive, no? Otis - Original Message > From: Grant Ingersoll > To: mahout-dev@lucene.apache.org > Sent: Wednesday, March 18, 2009 2:19:29 PM > Subje

Taste: user's neighbours and their similarity

2009-03-18 Thread Otis Gospodnetic
Hi, Is there a way to get a collection of neighbours for a given user? I'm referring to the same neighbour collection that recommendations are derived from. I didn't see a way, so I simply made NearestNUserNeighborhood.Estimator public (diff below), so I could do something like this: publ

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683170#action_12683170 ] Sean Owen commented on MAHOUT-103: -- 1. How do you feel about, therefore, changing to use m

Re: Taste: user's neighbours and their similarity

2009-03-18 Thread Sean Owen
How about the method UserBasedRecommender.mostSimilarUsers()? or a bit more directly, UserNeighborhood.getUserNeighborhood()? (They are arguably kind of redundant but it's 'for historical reasons' and low on my list of design sins.) These in turn largely use TopItems.getTopUsers() and you apparentl

Packaging step taking forever... is this right?

2009-03-18 Thread Jeff Eastman
[WARNING] Entry: mahout-0.2-SNAPSHOT/Users/jeff/Documents/workspace/Mahout/target/mahout-0.1-SNAPSHOT-project.tar.bz2 longer than 100 characters. No movement in the system transcript for many, many minutes. Jeff

Re: Packaging step taking forever... is this right?

2009-03-18 Thread Sean Owen
Took me ~15 minutes the first time, 5 minutes subsequent times. Yeah it still seems long, and does seem like something is amiss, but if it works it seems OK for now. On Wed, Mar 18, 2009 at 9:52 PM, Jeff Eastman wrote: > [WARNING] Entry: > mahout-0.2-SNAPSHOT/Users/jeff/Documents/workspace/Mahout

Re: Packaging step taking forever... is this right?

2009-03-18 Thread Otis Gospodnetic
Same experience here. But it finished eventually. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Jeff Eastman > To: mahout-dev@lucene.apache.org > Sent: Wednesday, March 18, 2009 5:52:32 PM > Subject: Packaging step taking forever... is

Re: Packaging step taking forever... is this right?

2009-03-18 Thread Jeff Eastman
Mine's been running in that one step for over an hour. Sean Owen wrote: Took me ~15 minutes the first time, 5 minutes subsequent times. Yeah it still seems long, and does seem like something is amiss, but if it works it seems OK for now. On Wed, Mar 18, 2009 at 9:52 PM, Jeff Eastman wrote:

Re: mvn package tar file issue

2009-03-18 Thread Grant Ingersoll
On Mar 18, 2009, at 3:49 PM, Otis Gospodnetic wrote: Yes, at the top. Bad? It works for me. Try doing a clean.

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683232#action_12683232 ] Ted Dunning commented on MAHOUT-103: > 1. How do you feel about, therefore, changing t

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683238#action_12683238 ] Sean Owen commented on MAHOUT-103: -- The comparison would be to Item. You could say that's

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683251#action_12683251 ] Sean Owen commented on MAHOUT-103: -- The comparison would be to Item. You could say that's

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Richard Tomsett (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683252#action_12683252 ] Richard Tomsett commented on MAHOUT-99: --- Yup, just downloaded the latest trunk and run

[jira] Commented: (MAHOUT-59) Create some examples of clustering well-known datasets

2009-03-18 Thread Richard Tomsett (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683254#action_12683254 ] Richard Tomsett commented on MAHOUT-59: --- Ugh, I had an example almost done but managed

Re: mvn package tar file issue

2009-03-18 Thread Jeff Eastman
That worked for me, but it still took forever. Grant Ingersoll wrote: On Mar 18, 2009, at 3:49 PM, Otis Gospodnetic wrote: Yes, at the top. Bad? It works for me. Try doing a clean.

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683297#action_12683297 ] Pallavi Palleti commented on MAHOUT-99: --- Yup. That must be the issue. But I am wonderi

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ] Pallavi Palleti commented on MAHOUT-99: --- I have used KeyValueLineRecordReader internal

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
The Synthetic Control kMeans job calls the Canopy job to build its initial clusters as is commonly done. If the kMeans record format was changed and the Canopy not changed accordingly, then everything would still compile but there would be a mismatch when the kMeans mapper tried to read in the

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Are the examples run automatically in the build? Pallavi Palleti (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683297#action_12683297 ] Pallavi Palleti commented on MAHOUT-99: ---

RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
Yeah. But, I am wondering how the testcases succeeded? I ran them using "mvn clean install" command. Thanks Pallavi -Original Message- From: Jeff Eastman [mailto:j...@windwardsolutions.com] Sent: Thursday, March 19, 2009 9:56 AM To: mahout-dev@lucene.apache.org Subject: Re: [jira] Comme

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Sure, why don't you go ahead and post a patch? Pallavi Palleti (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683312#action_12683312 ] Pallavi Palleti commented on MAHOUT-99:

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
The unit tests don't care which format is used as long as it is consistent. The compiler helps enforce that. kMeans will run and its tests will pass. So will Canopy. When somebody runs the kMeans example it encounters the file format differences. Are all the examples run by the install? I'd be s

[jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Ankur (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683320#action_12683320 ] Ankur commented on MAHOUT-103: -- Alright! then I'll incorporate 1 and 2 as suggested by Sean an

Re: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Jeff Eastman
Also why not consider just converting canopy? Which reader is better? Jeff Eastman wrote: * PGP Signed: 03/18/09 at 21:37:36 Sure, why don't you go ahead and post a patch? Pallavi Palleti (JIRA) wrote: [ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.syst

RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
There is a testcase in TestKMeansClustering.java which actually uses the output of Canopy as input. This testcase succeeded without any issue. But the thing here is, it doesn't use HDFS but uses the local file system. So, this might be the reason why it succeeded without any issue. Thanks Pa

RE: [jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Palleti, Pallavi
It depends on the kind of output. If we are outputting only numeric values, a SequenceFile is preferred because the data is written as binary. If not, it is preferred to write simple text. A text file is human-readable, whereas binary is not. As we consider the data as te

[jira] Commented: (MAHOUT-99) Improving speed of KMeans

2009-03-18 Thread Pallavi Palleti (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683335#action_12683335 ] Pallavi Palleti commented on MAHOUT-99: --- If we need to modify Canopy, we need to modif

Re: [jira] Commented: (MAHOUT-103) Co-occurence based nearest neighbourhood

2009-03-18 Thread Ted Dunning
Ankur, What form will the counts be in when you need this function? Four integers separately available? Values in a view of a matrix? I will be happy to adapt some code to compute the measure you need. On Wed, Mar 18, 2009 at 9:45 PM, Ankur (JIRA) wrote: > map-red implementation of the log-l