[jira] [Commented] (MAHOUT-1390) SVD hangs for certain inputs
[ https://issues.apache.org/jira/browse/MAHOUT-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881736#comment-13881736 ]

Hudson commented on MAHOUT-1390:

SUCCESS: Integrated in Mahout-Quality #2435 (See [https://builds.apache.org/job/Mahout-Quality/2435/])
MAHOUT-1390 - Fixed extraneous commit. (tdunning: rev 1560889)
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/TestSingularValueDecomposition.java

> SVD hangs for certain inputs
>
> Key: MAHOUT-1390
> URL: https://issues.apache.org/jira/browse/MAHOUT-1390
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Priority: Critical
> Fix For: 0.9
> Attachments: MAHOUT-1390.patch
>
> For certain inputs, the SingularValueDecomposition implementation that we have doesn't detect that it has effectively converged and runs into an infinite loop.
> Luckily, there is a fix that has been added to the Jama implementation that our SVD is ultimately based on and that fix works for our problem.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
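To illustrate the kind of failure described in this issue, here is a minimal sketch of a convergence ("negligibility") test for an off-diagonal element in an iterative SVD. It is not the code from MAHOUT-1390.patch: the class name, method names, and the value of the `TINY` floor are assumptions chosen for illustration. The idea is that a purely relative test can never succeed when the neighbouring values are zero and the element is a nonzero subnormal, so the loop keeps iterating; adding a small absolute floor lets the test pass.

```java
/** Illustrative sketch only; names and constants are assumptions, not Mahout's code. */
final class ConvergenceSketch {
    // Machine epsilon for double precision, 2^-52.
    static final double EPS = Math.pow(2.0, -52.0);
    // Small absolute floor (assumed value for this sketch).
    static final double TINY = Math.pow(2.0, -966.0);

    /** Purely relative test: fails forever if sk == sk1 == 0 and e != 0. */
    static boolean negligibleRelative(double e, double sk, double sk1) {
        return Math.abs(e) <= EPS * (Math.abs(sk) + Math.abs(sk1));
    }

    /** Relative test plus an absolute floor, so tiny leftovers count as converged. */
    static boolean negligibleWithFloor(double e, double sk, double sk1) {
        return Math.abs(e) <= TINY + EPS * (Math.abs(sk) + Math.abs(sk1));
    }

    public static void main(String[] args) {
        double e = 1e-320; // a nonzero subnormal off-diagonal value
        System.out.println("relative only: " + negligibleRelative(e, 0.0, 0.0));
        System.out.println("with floor:    " + negligibleWithFloor(e, 0.0, 0.0));
    }
}
```

With the floor in place, an element of ordinary size (for example 1.0 next to values of 1.0) is still treated as not negligible, so the change only affects values that are already far below working precision.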
[jira] [Commented] (MAHOUT-1409) MatrixVectorView has index check error
[ https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881735#comment-13881735 ]

Hudson commented on MAHOUT-1409:

SUCCESS: Integrated in Mahout-Quality #2435 (See [https://builds.apache.org/job/Mahout-Quality/2435/])
MAHOUT-1409 - bad index checking in viewColumn or viewRow. (tdunning: rev 1560890)
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/MatrixVectorView.java
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/MatrixVectorViewTest.java

> MatrixVectorView has index check error
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
> There is a > in the test for the correct index where there should be a >=
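The off-by-one described in this issue can be sketched as follows. This is a hypothetical stand-alone helper, not the actual MatrixVectorView code; the class and method names are made up for illustration. With `>` in the rejection test, an index equal to the size is wrongly accepted; with `>=` it is rejected.

```java
/** Illustrative sketch only; not the MatrixVectorView implementation. */
final class IndexCheckSketch {
    /** Buggy variant: rejects with '>', so index == size slips through. */
    static boolean inBoundsBuggy(int index, int size) {
        return !(index < 0 || index > size);
    }

    /** Corrected variant: rejects with '>=', so valid indexes are 0..size-1. */
    static boolean inBounds(int index, int size) {
        return !(index < 0 || index >= size);
    }

    public static void main(String[] args) {
        int size = 3;
        System.out.println("buggy accepts index 3: " + inBoundsBuggy(3, size));
        System.out.println("fixed accepts index 3: " + inBounds(3, size));
    }
}
```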
Re: MAHOUT 0.9 Release - New URL
My schedule has opened up a bit and I can review as well.
Re: MAHOUT 0.9 Release - New URL
I will try the next candidate again, so one vote is sure.
Re: MAHOUT 0.9 Release - New URL
I am open to having the conversation (and a part of me feels that the clusteringId fix should be in 0.9).

If we decide to incorporate that into 0.9, I need to roll back the 0.9 Release that's presently out there in staging (for the 5th time in a row now). I am fine with doing that.

What do you think we should do?

a) Go ahead with 0.9 release without the fix for M-1410.
b) Roll back 0.9 and include the fix for M-1410.
c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes M-1410 and any other issues/enhancements that are fixed.

I am leaning towards (b), my only concern being that, from my experience in the past few weeks, it has become really hard to muster the minimum 3 +1 PMC votes required for a release to pass.
Re: MAHOUT 0.9 Release - New URL
Can we hold a separate discussion about whether the clustering id issue has to be in 0.9 while extending the vote deadline if necessary?

If not, then all these votes are great and the release can go forward.

If it is the sense that that fix has to be in, we should leave time for people to reverse their votes to -1.
Re: MAHOUT 0.9 Release - New URL
Thanks to all those who volunteered. The voting for 0.9 Release closes tomorrow.

On Friday, January 24, 2014 4:05 AM, Gokhan Capan wrote:

Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan

On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarc...@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:
> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > > a) Verify that you can unpack the release (tar or zip)
> > > > b) Verify you are able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
> > >
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
Re: cluster-reuters.sh broken in trunk
Does Mahout still support Hadoop 0.20.2x? I know we had some discussions on this but I can't find them at the moment.
Re: cluster-reuters.sh broken in trunk
I assume you are running this in MR mode? Could you clear out your /tmp/mahout-work- folder and try again?

On Friday, January 24, 2014 1:56 PM, Andrew Musselman wrote:

Actually, getting the same error with a fresh svn checkout:

14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms (Minutes: 4.8558834)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/24 09:42:17 INFO common.HadoopUtil: Deleting /tmp/mahout-work-akm/reuters-kmeans-clusters
14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor
14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> Yeah, disregard, my repo was out of whack.
>
> On Fri, Jan 24, 2014 at 10:00 AM, ap.dev wrote:
>
>> I'm not getting any exceptions there.
>>
>> Original message
>> From: Andrew Musselman
>> Date: 01/24/2014 11:38 AM (GMT-05:00)
>> To: dev@mahout.apache.org
>> Subject: cluster-reuters.sh broken in trunk
>>
>> Last night I had this issue when testing out cluster-reuters.sh with no flags; anyone seen this recently?
>>
>> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms (Minutes: 4.7799833)
>> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR=
>> MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
>> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /t
[jira] [Updated] (MAHOUT-1409) MatrixVectorView has index check error
[ https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-1409:

Resolution: Fixed
Status: Resolved (was: Patch Available)

I committed this.

> MatrixVectorView has index check error
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
> There is a > in the test for the correct index where there should be a >=
RE: cluster-reuters.sh broken in trunk
Can't reproduce on CentOS with hadoop 2.2.0:

Running on hadoop, using /home/andy/apache_builds/hadoop_bin/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/andy/mahout_test/sandbox/mahout-trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/24 14:09:32 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-andy/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-andy/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-andy/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}

Attached the full log in case that helps any.
Re: cluster-reuters.sh broken in trunk
Actually, getting the same error with a fresh svn checkout: 14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms (Minutes: 4.8558834) Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR= MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar 14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]} 14/01/24 09:42:17 INFO common.HadoopUtil: Deleting /tmp/mahout-work-akm/reuters-kmeans-clusters 14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor 14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed 14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure 14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument. 
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212) at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143) at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman < andrew.mussel...@gmail.com> wrote: > Yeah, disregard, my repo was out of whack. > > > On Fri, Jan 24, 2014 at 10:00 AM, ap.dev wrote: > >> I'm not getting any exceptions there. >> >> Original message >> From: Andrew Musselman >> Date:01/24/2014 11:38 AM (GMT-05:00) >> To: dev@mahout.apache.org >> Subject: cluster-reuters.sh broken in trunk >> >> Last night I had this issue when testing out cluster-reuters.sh with no >> flags; anyone seen this recently? 
>> >> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms >> (Minutes: 4.7799833) >> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and >> HADOOP_CONF_DIR= >> MAHOUT-JOB: >> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar >> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: >> {--clustering=null, >> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], >> --convergenceDelta=[0.5], >> >> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], >> --endPhase=[2147483647], >> >> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], >> --maxIter=[10], --method=[mapreduce], --numClusters=[20], >> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, >> --startPhase=[0], --tempDir=[temp]} >> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop >> library for your platform... using builtin-java classes where applicable >> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor >> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to >> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed >> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: >> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-v
[jira] [Commented] (MAHOUT-1410) clusteredPoints do not contain a vector id
[ https://issues.apache.org/jira/browse/MAHOUT-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881294#comment-13881294 ] Pat Ferrel commented on MAHOUT-1410: This relates to https://issues.apache.org/jira/browse/MAHOUT-1030 There are more comments there, not sure if this needs to be separate from 1030. > clusteredPoints do not contain a vector id > -- > > Key: MAHOUT-1410 > URL: https://issues.apache.org/jira/browse/MAHOUT-1410 > Project: Mahout > Issue Type: Bug > Components: Clustering >Affects Versions: 0.9 > Environment: using 0.9 release candidate >Reporter: Pat Ferrel > > When clustering non-named vectors there are no vector ids in clusteredPoints > so the other values there, cluster id, vector values, distance-squared, pdf, > cannot be tied to any known vector. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881287#comment-13881287 ] Pat Ferrel commented on MAHOUT-1030: Added https://issues.apache.org/jira/browse/MAHOUT-1410, though I'm not sure it's needed. I can promise an instant test on some real-world data if you have a patch. Very sorry I didn't notice sooner! > Regression: Clustered Points Should be WeightedPropertyVectorWritable not > WeightedVectorWritable > > > Key: MAHOUT-1030 > URL: https://issues.apache.org/jira/browse/MAHOUT-1030 > Project: Mahout > Issue Type: Bug > Components: Clustering, Integration >Affects Versions: 0.7 >Reporter: Jeff Eastman >Assignee: Andrew Musselman > Fix For: 0.9 > > Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, > MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch > > > Looks like this won't make it into this build. Pretty widespread impact on > code and tests and I don't know which properties were implemented in the old > version. I will create a JIRA and post my interim results. > On 6/8/12 12:21 PM, Jeff Eastman wrote: > > That's a reversion that evidently got in when the new > > ClusterClassificationDriver was introduced. It should be a pretty easy fix > > and I will see if I can make the change before Paritosh cuts the release > > bits tonight. > > > > On 6/7/12 1:00 PM, Pat Ferrel wrote: > >> It appears that in kmeans the clusteredPoints are now written as > >> WeightedVectorWritable where in mahout 0.6 they were > >> WeightedPropertyVectorWritable? This means that the distance from the > >> centroid is no longer stored here? Why? I hope I'm wrong because that is > >> not a welcome change. How is one to order clustered docs by distance from > >> cluster centroid? 
> >> > >> I'm sure I could calculate the distance but that would mean looking up the > >> centroid for the cluster id given in the above WeightedVectorWritable, > >> which means iterating through all the clusters for each clustered doc. In > >> my case the number of clusters could be fairly large. > >> > >> Am I missing something? > >> > >> > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
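The question in the quoted message, how to order clustered docs by distance from the cluster centroid, can be sketched as below once each record carries a stored distance. This is only an illustration under stated assumptions: `ClusteredPoint` is a hypothetical stand-in and not a Mahout class, the vector id is assumed to be available (for example from a NamedVector name), and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ClusteredPointOrdering {

    /** Hypothetical stand-in for one clusteredPoints record; not a Mahout class. */
    public static final class ClusteredPoint {
        public final String vectorId;        // assumed to be available, e.g. a NamedVector name
        public final int clusterId;          // the key of the clusteredPoints record
        public final double distanceSquared; // the stored distance-squared property

        public ClusteredPoint(String vectorId, int clusterId, double distanceSquared) {
            this.vectorId = vectorId;
            this.clusterId = clusterId;
            this.distanceSquared = distanceSquared;
        }
    }

    /** Returns the members of one cluster, closest to the centroid first. */
    public static List<ClusteredPoint> closestFirst(List<ClusteredPoint> points, int clusterId) {
        List<ClusteredPoint> members = new ArrayList<>();
        for (ClusteredPoint p : points) {
            if (p.clusterId == clusterId) {
                members.add(p);
            }
        }
        // Sorting on the squared distance gives the same order as sorting on the distance.
        members.sort(Comparator.comparingDouble(p -> p.distanceSquared));
        return members;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<ClusteredPoint> points = Arrays.asList(
            new ClusteredPoint("doc-a", 39, 4.0),
            new ClusteredPoint("doc-b", 48, 22.0),
            new ClusteredPoint("doc-c", 39, 1.5));
        for (ClusteredPoint p : closestFirst(points, 39)) {
            System.out.println(p.vectorId + " " + p.distanceSquared);
        }
    }
}
```

With the distance stored per record, no centroid lookup is needed at ordering time, which is the cost the message is trying to avoid.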
[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881283#comment-13881283 ] Pat Ferrel commented on MAHOUT-1030:
Hmm, Suneel recommends creating a new Jira, so I will.
Comments from Suneel:
I concur that we are presently not capturing the vector ids (unless it's a NamedVector), and also concur that it's hard to infer which vector belongs to which cluster without them. For now it seems easiest to use NamedVector to determine which vectors belong to a cluster. The clustering algorithms only read the VectorWritable, and if the VectorWritable does not carry the vector key (i.e. it is not a NamedVector), the clustering algorithm simply doesn't have it. See the following code snippet from PartialVectorMergeReducer.java:
{Code}
if (namedVector) {
  vector = new NamedVector(vector, key.toString());
}
// drop empty vectors.
if (vector.getNumNondefaultElements() > 0) {
  VectorWritable vectorWritable = new VectorWritable(vector);
  context.write(key, vectorWritable);
}
{Code}
So, per the snippet above, if it's not a NamedVector the corresponding vector key is not captured in the VectorWritable. The RowIdJob reads the same TF-IDF vectors and creates a docIndex and matrix (I am sure you know their layout and what they are intended for, so I'll skip the details here). The following code snippet from ClusterIterator.iterateSeq() reads only the VectorWritable, not the key:
  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(inPath, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
It should have been reading a Pair to capture the key for the vector as well.
I presently have a 0.9 release sitting in staging waiting to be finalized. Please create a JIRA for this and we should have it fixed in the next major release (or release candidate).
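The difference Suneel describes, between reading only the values and reading key/value pairs, can be illustrated without any Hadoop or Mahout dependency. This is a minimal sketch under assumptions: the sequence file is modelled as a plain list of `Map.Entry<String, double[]>` records, and `NamedPoint` is a hypothetical stand-in for a vector that remembers its key, playing the role NamedVector plays in the snippet above. None of these names are Mahout API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyedVectorRead {

    /** Hypothetical stand-in for a vector tagged with its key; not a Mahout class. */
    public static final class NamedPoint {
        public final String name;
        public final double[] values;

        public NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Values-only read: the record keys are dropped, so ids cannot be recovered later. */
    public static List<double[]> readValuesOnly(List<Map.Entry<String, double[]>> records) {
        List<double[]> vectors = new ArrayList<>();
        for (Map.Entry<String, double[]> record : records) {
            vectors.add(record.getValue());
        }
        return vectors;
    }

    /** Pair read: each vector keeps the key it was stored under. */
    public static List<NamedPoint> readWithKeys(List<Map.Entry<String, double[]>> records) {
        List<NamedPoint> points = new ArrayList<>();
        for (Map.Entry<String, double[]> record : records) {
            points.add(new NamedPoint(record.getKey(), record.getValue()));
        }
        return points;
    }

    public static void main(String[] args) {
        // Made-up record, for illustration only.
        List<Map.Entry<String, double[]>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("doc-7", new double[] {1.0, 0.0}));

        System.out.println(readValuesOnly(records).size() + " vector(s), ids lost");
        System.out.println(readWithKeys(records).get(0).name + " keeps its id");
    }
}
```

Whatever consumes the output of `readWithKeys` can write the id next to the cluster id and distance, which is what MAHOUT-1410 asks for.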
[jira] [Created] (MAHOUT-1410) clusteredPoints do not contain a vector id
Pat Ferrel created MAHOUT-1410: -- Summary: clusteredPoints do not contain a vector id Key: MAHOUT-1410 URL: https://issues.apache.org/jira/browse/MAHOUT-1410 Project: Mahout Issue Type: Bug Components: Clustering Affects Versions: 0.9 Environment: using 0.9 release candidate Reporter: Pat Ferrel When clustering non-named vectors there are no vector ids in clusteredPoints so the other values there, cluster id, vector values, distance-squared, pdf, cannot be tied to any known vector. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881273#comment-13881273 ] Andrew Musselman commented on MAHOUT-1030: -- Yes, was looking at this last night with Suneel. I think he found where it's happening and we'll check it out. > Regression: Clustered Points Should be WeightedPropertyVectorWritable not > WeightedVectorWritable > > > Key: MAHOUT-1030 > URL: https://issues.apache.org/jira/browse/MAHOUT-1030 > Project: Mahout > Issue Type: Bug > Components: Clustering, Integration >Affects Versions: 0.7 >Reporter: Jeff Eastman >Assignee: Andrew Musselman > Fix For: 0.9 > > Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, > MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch > > > Looks like this won't make it into this build. Pretty widespread impact on > code and tests and I don't know which properties were implemented in the old > version. I will create a JIRA and post my interim results. > On 6/8/12 12:21 PM, Jeff Eastman wrote: > > That's a reversion that evidently got in when the new > > ClusterClassificationDriver was introduced. It should be a pretty easy fix > > and I will see if I can make the change before Paritosh cuts the release > > bits tonight. > > > > On 6/7/12 1:00 PM, Pat Ferrel wrote: > >> It appears that in kmeans the clusteredPoints are now written as > >> WeightedVectorWritable where in mahout 0.6 they were > >> WeightedPropertyVectorWritable? This means that the distance from the > >> centroid is no longer stored here? Why? I hope I'm wrong because that is > >> not a welcome change. How is one to order clustered docs by distance from > >> cluster centroid? > >> > >> I'm sure I could calculate the distance but that would mean looking up the > >> centroid for the cluster id given in the above WeightedVectorWritable, > >> which means iterating through all the clusters for each clustered doc. 
In > >> my case the number of clusters could be fairly large. > >> > >> Am I missing something? > >> > >> > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable
[ https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881226#comment-13881226 ] Pat Ferrel commented on MAHOUT-1030: This fixes a very literal reading of the bug. The distance-squared is indeed included in clusteredPoints BUT there are no vector ids so the distance can't actually be used. Without a vector id in clusteredPoints, Mahout doesn't really perform unsupervised categorization. I will now have to loop through all vectors, recalculate the distance and categorize them according to the cluster centroid they are closest to. The clusteredPoints and distance-squared can't actually be used without knowing the vector id. I think named vectors work here but many cases including mine do not have names only Mahout integer ids. Please correct me if I've missed something. When I cluster the user preference data used in the Mahout recommender I get clusteredPoints something like this. The data from the vector is given but not its id??? The Key here is a cluster id. 
pat$ mahout seqdumper -i /Users/pat/big-data/temp/clusters/clusteredPoints/ | more
Jan 24, 2014 10:02:05 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--endPhase=[2147483647], --input=[/Users/pat/big-data/temp/clusters/clusteredPoints/], --startPhase=[0], --tempDir=[temp]}
2014-01-24 10:02:05.707 java[29221:1003] Unable to load realm info from SCDynamicStore
Input Path: file:/Users/pat/big-data/temp/clusters/clusteredPoints/part-m-0
Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 39: Value: wt: 1.0 distance-squared: 9.656875 vec: [0:1.000, 2:1.000, 5:1.000, 9:1.000, 12:1.000, 13:1.000, 17:1.000, 18:1.000, 19:1.000, 20:1.000]
Key: 48: Value: wt: 1.0 distance-squared: 22.2291686 vec: [25:1.000, 26:1.000, 27:1.000, 28:1.000, 29:1.000, 30:1.000, 31:1.000, 36:1.000, 38:1.000, 39:1.000, 40:1.000, 41:1.000, 43:1.000, 44:1.000, 46:1.000, 48:1.000, 53:1.000, 54:1.000, 55:1.000, 56:1.000, 57:1.000, 58:1.000, 60:1.000, 63:1.000, 64:1.000, 66:1.000, 67:1.000, 68:1.000, 69:1.000, 70:1.000, 71:1.000, 72:1.000]
> Regression: Clustered Points Should be WeightedPropertyVectorWritable not > WeightedVectorWritable > > > Key: MAHOUT-1030 > URL: https://issues.apache.org/jira/browse/MAHOUT-1030 > Project: Mahout > Issue Type: Bug > Components: Clustering, Integration >Affects Versions: 0.7 >Reporter: Jeff Eastman >Assignee: Andrew Musselman > Fix For: 0.9 > > Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, > MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch > > > Looks like this won't make it into this build. Pretty widespread impact on > code and tests and I don't know which properties were implemented in the old > version. I will create a JIRA and post my interim results. > On 6/8/12 12:21 PM, Jeff Eastman wrote: > > That's a reversion that evidently got in when the new > > ClusterClassificationDriver was introduced. 
It should be a pretty easy fix > > and I will see if I can make the change before Paritosh cuts the release > > bits tonight. > > > > On 6/7/12 1:00 PM, Pat Ferrel wrote: > >> It appears that in kmeans the clusteredPoints are now written as > >> WeightedVectorWritable where in mahout 0.6 they were > >> WeightedPropertyVectorWritable? This means that the distance from the > >> centroid is no longer stored here? Why? I hope I'm wrong because that is > >> not a welcome change. How is one to order clustered docs by distance from > >> cluster centroid? > >> > >> I'm sure I could calculate the distance but that would mean looking up the > >> centroid for the cluster id given in the above WeightedVectorWritable, > >> which means iterating through all the clusters for each clustered doc. In > >> my case the number of clusters could be fairly large. > >> > >> Am I missing something? > >> > >> > > -- This message was sent by Atlassian JIRA (v6.1.5#6160)
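The workaround described in the comment above (loop through all vectors, recalculate the distance, and assign each vector to the closest centroid) might look roughly like this. It is a sketch under assumptions, not Mahout code: vectors and centroids are dense `double[]` arrays of equal length keyed by integer id, the distance is squared Euclidean (the measure used in the actual job may differ), and the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NearestCentroid {

    /** Squared Euclidean distance between two dense vectors of equal length. */
    static double distanceSquared(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the id of the centroid closest to the given vector. */
    public static int closestCluster(double[] vector, Map<Integer, double[]> centroids) {
        if (centroids.isEmpty()) {
            throw new IllegalArgumentException("no centroids given");
        }
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, double[]> centroid : centroids.entrySet()) {
            double d = distanceSquared(vector, centroid.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = centroid.getKey();
            }
        }
        return best;
    }

    /** Maps every vector id to the id of its closest centroid. */
    public static Map<Integer, Integer> assign(Map<Integer, double[]> vectorsById,
                                               Map<Integer, double[]> centroids) {
        Map<Integer, Integer> clusterByVectorId = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> vector : vectorsById.entrySet()) {
            clusterByVectorId.put(vector.getKey(), closestCluster(vector.getValue(), centroids));
        }
        return clusterByVectorId;
    }

    public static void main(String[] args) {
        // Made-up centroids and vectors, for illustration only.
        Map<Integer, double[]> centroids = new LinkedHashMap<>();
        centroids.put(39, new double[] {0.0, 0.0});
        centroids.put(48, new double[] {10.0, 10.0});

        Map<Integer, double[]> vectors = new LinkedHashMap<>();
        vectors.put(7, new double[] {1.0, 1.0});
        vectors.put(8, new double[] {9.0, 8.0});

        System.out.println(assign(vectors, centroids));
    }
}
```

This costs one distance computation per vector per cluster, which is the overhead the original report objects to when the number of clusters is large; carrying the vector id through to clusteredPoints would make the pass unnecessary.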
Re: cluster-reuters.sh broken in trunk
Yeah, disregard, my repo was out of whack. On Fri, Jan 24, 2014 at 10:00 AM, ap.dev wrote: > I'm not getting any exceptions there. > > Original message > From: Andrew Musselman > Date:01/24/2014 11:38 AM (GMT-05:00) > To: dev@mahout.apache.org > Subject: cluster-reuters.sh broken in trunk > > Last night I had this issue when testing out cluster-reuters.sh with no > flags; anyone seen this recently? > > 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms > (Minutes: 4.7799833) > Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and > HADOOP_CONF_DIR= > MAHOUT-JOB: > /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar > 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: > {--clustering=null, > --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], > --convergenceDelta=[0.5], > > --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], > --endPhase=[2147483647], > > --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], > --maxIter=[10], --method=[mapreduce], --numClusters=[20], > --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, > --startPhase=[0], --tempDir=[temp]} > 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor > 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to > /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed > 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: > /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors > Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed > Out: /tmp/mahout-work-akm/reuters-kmeans Distance: > org.apache.mahout.common.distance.CosineDistanceMeasure > 14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max > Iterations: 10 > 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor > Exception in thread "main" java.lang.IllegalStateException: No input > clusters found in > /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c > argument. > at > > org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212) > at > org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143) > at > org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at > org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > 
at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > $ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed > Found 1 items > -rw-r--r-- 1 akm supergroup 149 2014-01-23 22:03 > /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed > > > $ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed > SEQ > > org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable > *org.apache.hadoop.io.compress.DefaultCodec�M5�0ü��� $ >
RE: cluster-reuters.sh broken in trunk
I'm not getting any exceptions there. Original message From: Andrew Musselman Date:01/24/2014 11:38 AM (GMT-05:00) To: dev@mahout.apache.org Subject: cluster-reuters.sh broken in trunk Last night I had this issue when testing out cluster-reuters.sh with no flags; anyone seen this recently? 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms (Minutes: 4.7799833) Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR= MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]} 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure 14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument. at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212) at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143) at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r-- 1 akm supergroup 149 2014-01-23 22:03 /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
Re: cluster-reuters.sh broken in trunk
`./cluster-reuters.sh` from examples/bin which worked the other day just fine, selecting option (1). It runs fine with `./cluster-reuters.sh -nv` though. On Fri, Jan 24, 2014 at 8:50 AM, Suneel Marthi wrote: > 'No Flags' ??? Could u post the command u were trying? > > > > > > > On Friday, January 24, 2014 11:38 AM, Andrew Musselman < > andrew.mussel...@gmail.com> wrote: > > Last night I had this issue when testing out cluster-reuters.sh with no > flags; anyone seen this recently? > > 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms > (Minutes: 4.7799833) > Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and > HADOOP_CONF_DIR= > MAHOUT-JOB: > /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar > 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: > {--clustering=null, > --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], > --convergenceDelta=[0.5], > > --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], > --endPhase=[2147483647], > > --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], > --maxIter=[10], --method=[mapreduce], --numClusters=[20], > --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, > --startPhase=[0], --tempDir=[temp]} > 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> $ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Found 1 items
> -rw-r--r-- 1 akm supergroup 149 2014-01-23 22:03 /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
>
> $ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
Re: cluster-reuters.sh broken in trunk
"No flags"? Could you post the command you were trying?

On Friday, January 24, 2014 11:38 AM, Andrew Musselman wrote:

Last night I had this issue when testing out cluster-reuters.sh with no flags; anyone seen this recently?

14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms (Minutes: 4.7799833)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r-- 1 akm supergroup 149 2014-01-23 22:03 /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed

$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
cluster-reuters.sh broken in trunk
Last night I had this issue when testing out cluster-reuters.sh with no flags; anyone seen this recently?

14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms (Minutes: 4.7799833)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r-- 1 akm supergroup 149 2014-01-23 22:03 /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed

$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
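One detail in the listing above may be worth a quick check: the seed file is only 149 bytes. The sketch below adds up the size of a SequenceFile that contains a header and no records, assuming the version 6 header layout as I understand it ("SEQ", a version byte, the key and value class names as length-prefixed strings, two compression flags, the codec class name, a 4-byte metadata count, and a 16-byte sync marker). The function names are made up for illustration. The visible "5" and "*" in the hcat output are consistent with this layout, since they are ASCII 53 and 42, the lengths of the value class name and the codec class name. If the layout assumption holds, a 149-byte file would mean no clusters were actually written, which would match the "No input clusters found" error; the thread does not confirm this.

```python
# Sketch: expected size of a Hadoop SequenceFile that holds a header and nothing else.
# The header layout below is an assumption (SequenceFile version 6), not taken from the thread.

def vint_len(n: int) -> int:
    """Bytes used by Hadoop's variable-length int; values in [-112, 127] fit in one byte."""
    if -112 <= n <= 127:
        return 1
    raise ValueError("multi-byte vints are not needed for this sketch")


def text_string_len(s: str) -> int:
    """Size of a length-prefixed UTF-8 string: vint length followed by the bytes."""
    raw = s.encode("utf-8")
    return vint_len(len(raw)) + len(raw)


def empty_sequence_file_size(key_class: str, value_class: str, codec_class=None) -> int:
    size = 3 + 1                          # "SEQ" magic + version byte
    size += text_string_len(key_class)    # key class name
    size += text_string_len(value_class)  # value class name
    size += 2                             # compressed flag + block-compressed flag
    if codec_class is not None:
        size += text_string_len(codec_class)  # codec class name, present only when compressed
    size += 4                             # metadata entry count (zero entries)
    size += 16                            # sync marker
    return size


HEADER_ONLY = empty_sequence_file_size(
    "org.apache.hadoop.io.Text",
    "org.apache.mahout.clustering.iterator.ClusterWritable",
    "org.apache.hadoop.io.compress.DefaultCodec",
)
print(HEADER_ONLY)  # 149, the same size hls reports for part-randomSeed
```
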
Re: MAHOUT 0.9 Release - New URL
Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan

On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarc...@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > > a) Verify that u can unpack the release (tar or zip)
> > > > b) Verify u r able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
> > >
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com