[jira] [Commented] (MAHOUT-1390) SVD hangs for certain inputs

2014-01-24 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881736#comment-13881736 ]

Hudson commented on MAHOUT-1390:


SUCCESS: Integrated in Mahout-Quality #2435 (See [https://builds.apache.org/job/Mahout-Quality/2435/])
MAHOUT-1390 - Fixed extraneous commit. (tdunning: rev 1560889)
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/TestSingularValueDecomposition.java


> SVD hangs for certain inputs
> 
>
> Key: MAHOUT-1390
> URL: https://issues.apache.org/jira/browse/MAHOUT-1390
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Priority: Critical
> Fix For: 0.9
>
> Attachments: MAHOUT-1390.patch
>
>
> For certain inputs, the SingularValueDecomposition implementation that we 
> have doesn't detect that it has effectively converged and runs into an 
> infinite loop.
> Luckily, there is a fix that has been added to the Jama implementation that 
> our SVD is ultimately based on and that fix works for our problem.
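
The issue text does not show the fix itself. As a minimal sketch of the failure mode it describes, assume the iteration decides convergence with a purely relative test on an off-diagonal element: if both neighbouring values are zero, that test can never pass for a nonzero element, so the loop can spin forever. Adding a small absolute floor is one possible remedy. The class name, method names, and the TINY value below are assumptions for illustration, not the Mahout or Jama source.

```java
// Hypothetical sketch; not the Mahout or Jama source.
final class SvdConvergenceSketch {
    // Double-precision machine epsilon, 2^-52.
    static final double EPS = Math.pow(2.0, -52.0);
    // Small absolute floor; the exact value is an assumption for illustration.
    static final double TINY = Math.pow(2.0, -966.0);

    // Relative-only test: with sK == sK1 == 0 the right-hand side is 0,
    // so any nonzero e is never considered negligible.
    static boolean negligibleRelativeOnly(double e, double sK, double sK1) {
        return Math.abs(e) <= EPS * (Math.abs(sK) + Math.abs(sK1));
    }

    // Same test with an absolute floor, so a vanishingly small e is
    // accepted as converged even when both neighbours are zero.
    static boolean negligibleWithFloor(double e, double sK, double sK1) {
        return Math.abs(e) <= TINY + EPS * (Math.abs(sK) + Math.abs(sK1));
    }

    public static void main(String[] args) {
        double e = 1e-300;
        System.out.println("relative only: " + negligibleRelativeOnly(e, 0.0, 0.0));
        System.out.println("with floor:    " + negligibleWithFloor(e, 0.0, 0.0));
    }
}
```

For ordinary magnitudes the two tests agree; they differ only in the degenerate case, which is why such a hang would show up only "for certain inputs".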



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-24 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881735#comment-13881735 ]

Hudson commented on MAHOUT-1409:


SUCCESS: Integrated in Mahout-Quality #2435 (See [https://builds.apache.org/job/Mahout-Quality/2435/])
MAHOUT-1409 - bad index checking in viewColumn or viewRow. (tdunning: rev 1560890)
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/MatrixVectorView.java
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/MatrixVectorViewTest.java


> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=
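
To make the off-by-one concrete, here is a minimal sketch of a bounds check written both ways. It is not the MatrixVectorView code; the class and method names are hypothetical, and it assumes a zero-based index into a structure of the given size.

```java
// Hypothetical sketch; not the MatrixVectorView source.
final class IndexCheckSketch {
    // Buggy form: '>' lets index == size through, one past the last valid slot.
    static boolean isValidBuggy(int index, int size) {
        return !(index < 0 || index > size);
    }

    // Corrected form: '>=' rejects index == size.
    static boolean isValidFixed(int index, int size) {
        return !(index < 0 || index >= size);
    }

    public static void main(String[] args) {
        int size = 3; // valid zero-based indices are 0, 1, 2
        System.out.println("buggy accepts 3: " + isValidBuggy(3, size));
        System.out.println("fixed accepts 3: " + isValidFixed(3, size));
    }
}
```

The two checks agree everywhere except at index == size, so a test that never probes that boundary would not catch the difference.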





Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Ted Dunning
My schedule has opened up a bit and I can review as well.




On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter  wrote:

> I will try the next candidate again, so one vote is sure.
> On 24.01.2014 at 23:54, "Suneel Marthi" wrote:
>
> > I am open to having the conversation (and a part of me feels that the
> > clusteringId fix should be in 0.9).
> >
> > If we decide to incorporate that into 0.9, I need to roll back the 0.9
> > Release that's presently out there in staging (for the 5th time in a row
> > now).
> > I am fine with doing that.
> >
> > What do you think we should do?
> >
> > a) Go ahead with the 0.9 release without the fix for M-1410.
> > b) Roll back 0.9 and include the fix for M-1410.
> > c) Go ahead with 0.9, then have an interim 1.0 Release Candidate that
> > includes M-1410 and any other issues/enhancements that are fixed.
> >
> >
> > I am leaning towards (b); my only concern is that, from my experience in
> > the past few weeks, it's become really hard to muster the minimum 3 +1 PMC
> > votes required for a release to pass.
> >
> >
> >
> >
> >
> >
> >
> >
> > On Friday, January 24, 2014 5:45 PM, Ted Dunning 
> > wrote:
> >
> >
> >
> > Can we hold a separate discussion about whether the clustering id issue
> > has to be in 0.9 while extending the vote deadline if necessary?
> >
> > If not, then all these votes are great and the release can go forward.
> >
> > If the sense is that the fix has to be in, we should leave time for
> > people to reverse their votes to -1.
> >
> >
> >
> >
> > On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi 
> > wrote:
> >
> > Thanks to all those who volunteered.  The voting for the 0.9 Release closes
> > tomorrow.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >On Friday, January 24, 2014 4:05 AM, Gokhan Capan 
> > wrote:
> > >
> > >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> > >
> > >+1 from me
> > >
> > >Gokhan
> > >
> > >
> > >
> > >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo 
> > wrote:
> > >
> > >> a),b),c),d) all passed on CentOS for me
> > >>
> > >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > >> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >> > From: ssvinarc...@hortonworks.com
> > >> > To: dev@mahout.apache.org
> > >> >
> > >> > I did a), b), c), d) and all steps pass.
> > >> > +1
> > >> >
> > >> >
> > >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <
> gsing...@apache.org
> > >> >wrote:
> > >> >
> > >> > > +1 from me.
> > >> > >
> > >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <
> suneel_mar...@yahoo.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Fixed the issues that were reported this week and restored FP
> > >> > > > mining into the codebase.
> > >> > > >
> > >> > > > Here's the URL for the final release in staging:
> > >> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >> > > >
> > >> > > > The artifacts have been signed with the following key:
> > >> > > > https://people.apache.org/keys/committer/smarthi.asc
> > >> > > >
> > >> > > >
> > >> > > > a) Verify that you can unpack the release (tar or zip)
> > >> > > > b) Verify you are able to compile the distro
> > >> > > > c) Run through the unit tests: mvn clean test
> > >> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please
> > >> > > > run through all the different options in each script.
> > >> > > >
> > >> > > > Committers and PMC, we need a minimum of 3 '+1' votes for the
> > >> > > > release to be finalized.
> > >> > >
> > >> > > 
> > >> > > Grant Ingersoll | @gsingers
> > >> > > http://www.lucidworks.com
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >> > --
> > >>
> > >> > CONFIDENTIALITY NOTICE
> > >> > NOTICE: This message is intended for the use of the individual or
> > >> > entity to which it is addressed and may contain information that is
> > >> > confidential, privileged and exempt from disclosure under applicable
> > >> > law. If the reader of this message is not the intended recipient, you
> > >> > are hereby notified that any printing, copying, dissemination,
> > >> > distribution, disclosure or forwarding of this communication is
> > >> > strictly prohibited. If you have received this communication in
> > >> > error, please contact the sender immediately and delete it from your
> > >> > system. Thank You.
> > >>
> > >>
>


Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Sebastian Schelter
I will try the next candidate again, so one vote is sure.
On 24.01.2014 at 23:54, "Suneel Marthi" wrote:

> I am open to having the conversation (and a part of me feels that the
> clusteringId fix should be in 0.9).
>
> If we decide to incorporate that into 0.9, I need to roll back the 0.9
> Release that's presently out there in staging (for the 5th time in a row
> now).
> I am fine with doing that.
>
> What do you think we should do?
>
> a) Go ahead with the 0.9 release without the fix for M-1410.
> b) Roll back 0.9 and include the fix for M-1410.
> c) Go ahead with 0.9, then have an interim 1.0 Release Candidate that
> includes M-1410 and any other issues/enhancements that are fixed.
>
>
> I am leaning towards (b); my only concern is that, from my experience in
> the past few weeks, it's become really hard to muster the minimum 3 +1 PMC
> votes required for a release to pass.
>
>
>
>
>
>
>
>
> On Friday, January 24, 2014 5:45 PM, Ted Dunning 
> wrote:
>
>
>
> Can we hold a separate discussion about whether the clustering id issue
> has to be in 0.9 while extending the vote deadline if necessary?
>
> If not, then all these votes are great and the release can go forward.
>
> If the sense is that the fix has to be in, we should leave time for
> people to reverse their votes to -1.
>
>
>
>
> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi 
> wrote:
>
> Thanks to all those who volunteered.  The voting for the 0.9 Release closes
> tomorrow.
> >
> >
> >
> >
> >
> >
> >
> >
> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan 
> wrote:
> >
> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> >
> >+1 from me
> >
> >Gokhan
> >
> >
> >
> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo 
> wrote:
> >
> >> a),b),c),d) all passed on CentOS for me
> >>
> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> >> > Subject: Re: MAHOUT 0.9 Release - New URL
> >> > From: ssvinarc...@hortonworks.com
> >> > To: dev@mahout.apache.org
> >> >
> >> > I did a), b), c), d) and all steps pass.
> >> > +1
> >> >
> >> >
> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll  >> >wrote:
> >> >
> >> > > +1 from me.
> >> > >
> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi  >
> >> > > wrote:
> >> > >
> >> > > > Fixed the issues that were reported this week and restored FP
> mining
> >> > > into the codebase.
> >> > > >
> >> > > > Here's the URL for the final release in staging:-
> >> > > >
> >> > >
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >> > > >
> >> > > > The artifacts have been signed with the
> > following key:
> >> > > > https://people.apache.org/keys/committer/smarthi.asc
> >> > > >
> >> > > >
> >> > > > a) Verify that u can unpack the release (tar or zip)
> >> > > > b) Verify u r able to compile the distro
> >> > > > c)  Run through the unit tests: mvn clean test
> >> > > > d) Run the example scripts under
> > $MAHOUT_HOME/examples/bin. Please
> >> run
> >> > > through all the different options in each script.
> >> > > >
> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> >> to be
> >> > > finalized.
> >> > >
> >> > > 
> >> > > Grant Ingersoll | @gsingers
> >> > > http://www.lucidworks.com
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>


Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Suneel Marthi
I am open to having the conversation (and a part of me feels that the 
clusteringId fix should be in 0.9). 

If we decide to incorporate that into 0.9, I need to roll back the 0.9 Release 
that's presently out there in staging (for the 5th time in a row now). 
I am fine with doing that.  

What do you think we should do?

a) Go ahead with the 0.9 release without the fix for M-1410.
b) Roll back 0.9 and include the fix for M-1410.
c) Go ahead with 0.9, then have an interim 1.0 Release Candidate that includes 
M-1410 and any other issues/enhancements that are fixed.


I am leaning towards (b); my only concern is that, from my experience in the 
past few weeks, it's become really hard to muster the minimum 3 +1 PMC votes 
required for a release to pass. 








On Friday, January 24, 2014 5:45 PM, Ted Dunning  wrote:
 


Can we hold a separate discussion about whether the clustering id issue has to 
be in 0.9 while extending the vote deadline if necessary?

If not, then all these votes are great and the release can go forward.

If the sense is that the fix has to be in, we should leave time for people 
to reverse their votes to -1.




On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi  wrote:

Thanks to all those who volunteered.  The voting for the 0.9 Release closes 
tomorrow.
>
>
>
>
>
>
>
>
>On Friday, January 24, 2014 4:05 AM, Gokhan Capan  wrote:
>
>Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
>+1 from me
>
>Gokhan
>
>
>
>On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo  wrote:
>
>> a),b),c),d) all passed on CentOS for me
>>
>> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> > Subject: Re: MAHOUT 0.9 Release - New URL
>> > From: ssvinarc...@hortonworks.com
>> > To: dev@mahout.apache.org
>> >
>> > I did a), b), c), d) and all steps pass.
>> > +1
>> >
>> >
>> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll > >wrote:
>> >
>> > > +1 from me.
>> > >
>> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi 
>> > > wrote:
>> > >
>> > > > Fixed the issues that were reported this week and restored FP mining
>> > > into the codebase.
>> > > >
>> > > > Here's the URL for the final release in staging:-
>> > > >
>> > >
>> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> > > >
>> > > > The artifacts have been signed with the
> following key:
>> > > > https://people.apache.org/keys/committer/smarthi.asc
>> > > >
>> > > >
>> > > > a) Verify that u can unpack the release (tar or zip)
>> > > > b) Verify u r able to compile the distro
>> > > > c)  Run through the unit tests: mvn clean test
>> > > > d) Run the example scripts under
> $MAHOUT_HOME/examples/bin. Please
>> run
>> > > through all the different options in each script.
>> > > >
>> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
>> to be
>> > > finalized.
>> > >
>> > > 
>> > > Grant Ingersoll | @gsingers
>> > > http://www.lucidworks.com
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>

Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Ted Dunning
Can we hold a separate discussion about whether the clustering id issue has
to be in 0.9 while extending the vote deadline if necessary?

If not, then all these votes are great and the release can go forward.

If the sense is that the fix has to be in, we should leave time for
people to reverse their votes to -1.



On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi wrote:

> Thanks to all those who volunteered.  The voting for the 0.9 Release closes
> tomorrow.
>
>
>
>
>
>
>
> On Friday, January 24, 2014 4:05 AM, Gokhan Capan 
> wrote:
>
> Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
> +1 from me
>
> Gokhan
>
>
>
> On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo 
> wrote:
>
> > a),b),c),d) all passed on CentOS for me
> >
> > > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > From: ssvinarc...@hortonworks.com
> > > To: dev@mahout.apache.org
> > >
> > > I did a), b), c), d) and all steps pass.
> > > +1
> > >
> > >
> > > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll  > >wrote:
> > >
> > > > +1 from me.
> > > >
> > > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi 
> > > > wrote:
> > > >
> > > > > Fixed the issues that were reported this week and restored FP
> mining
> > > > into the codebase.
> > > > >
> > > > > Here's the URL for the final release in staging:-
> > > > >
> > > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > > >
> > > > > The artifacts have been signed with the
>  following key:
> > > > > https://people.apache.org/keys/committer/smarthi.asc
> > > > >
> > > > >
> > > > > a) Verify that u can unpack the release (tar or zip)
> > > > > b) Verify u r able to compile the distro
> > > > > c)  Run through the unit tests: mvn clean test
> > > > > d) Run the example scripts under
>  $MAHOUT_HOME/examples/bin. Please
> > run
> > > > through all the different options in each script.
> > > > >
> > > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> > to be
> > > > finalized.
> > > >
> > > > 
> > > > Grant Ingersoll | @gsingers
> > > > http://www.lucidworks.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>


Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Suneel Marthi
Thanks to all those who volunteered.  The voting for the 0.9 Release closes 
tomorrow.







On Friday, January 24, 2014 4:05 AM, Gokhan Capan  wrote:
 
Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan



On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo  wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarc...@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll  >wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi 
> > > wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:
> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that you can unpack the release (tar or zip)
> > > > b) Verify you are able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please
> > > > run through all the different options in each script.
> > > >
> > > > Committers and PMC, we need a minimum of 3 '+1' votes for the release
> > > > to be finalized.
> > >
> > > 
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>

Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Shannon Quinn
Does Mahout still support Hadoop 0.20.2x? I know we had some discussions on 
this but I can't find them at the moment. 

iPhone'd

> On Jan 24, 2014, at 16:43, Suneel Marthi  wrote:
> 
> I assume you are running this in MR mode? Could you clear out your 
> /tmp/mahout-work- folder and try again?
> 
> 
> 
> 
> On Friday, January 24, 2014 1:56 PM, Andrew Musselman 
>  wrote:
> 
> Actually, getting the same error with a fresh svn checkout:
> 
> 14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms
> (Minutes: 4.8558834)
> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments:
> {--clustering=null,
> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
> --convergenceDelta=[0.5],
> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
> --endPhase=[2147483647],
> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
> --startPhase=[0], --tempDir=[temp]}
> 14/01/24 09:42:17 INFO common.HadoopUtil: Deleting
> /tmp/mahout-work-akm/reuters-kmeans-clusters
> 14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor
> 14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> 14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input:
> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
> Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
> org.apache.mahout.common.distance.CosineDistanceMeasure
> 14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max
> Iterations: 10
> 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input
> clusters found in
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c
> argument.
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> 
> 
> 
> On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
> 
>> Yeah, disregard, my repo was out of whack.
>> 
>> 
>>> On Fri, Jan 24, 2014 at 10:00 AM, ap.dev  wrote:
>>> 
>>> I'm not getting any exceptions there.
>>> 
>>>  Original message 
>>> From: Andrew Musselman 
>>> Date:01/24/2014  11:38 AM  (GMT-05:00)
>>> To: dev@mahout.apache.org
>>> Subject: cluster-reuters.sh broken in trunk
>>> 
>>> Last night I had this issue when testing out cluster-reuters.sh with no
>>> flags; anyone seen this recently?
>>> 
>>> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
>>> (Minutes: 4.7799833)
>>> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
>>> HADOOP_CONF_DIR=
>>> MAHOUT-JOB:
>>> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>>> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
>>> {--clustering=null,
>>> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
>>> --convergenceDelta=[0.5],
>>> 
>>> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
>>> --endPhase=[2147483647],
>>> 
>>> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
>>> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
>>> --output=[/tmp/mahout-wor

Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Suneel Marthi
I assume you are running this in MR mode? Could you clear out your 
/tmp/mahout-work- folder and try again?




On Friday, January 24, 2014 1:56 PM, Andrew Musselman 
 wrote:
 
Actually, getting the same error with a fresh svn checkout:

14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms (Minutes: 4.8558834)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647],
--input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
--maxIter=[10], --method=[mapreduce], --numClusters=[20],
--output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
14/01/24 09:42:17 INFO common.HadoopUtil: Deleting /tmp/mahout-work-akm/reuters-kmeans-clusters
14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor
14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-akm/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
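
The exception in the trace above has the shape of a fail-fast guard: the seed file was reported as written, yet reading it back yielded no clusters. A minimal sketch of such a guard follows, assuming the clusters have already been loaded into a list. It is not the KMeansDriver code; the class name, method name, and the use of strings as stand-in clusters are hypothetical.

```java
import java.util.List;

// Hypothetical sketch; not the KMeansDriver source.
final class ClusterGuardSketch {
    // Fails fast when the loaded seed list is empty, echoing the message
    // seen in the log so the caller knows which path to inspect.
    static List<String> requireClusters(List<String> clusters, String path) {
        if (clusters.isEmpty()) {
            throw new IllegalStateException(
                "No input clusters found in " + path + ". Check your -c argument.");
        }
        return clusters;
    }

    public static void main(String[] args) {
        try {
            requireClusters(java.util.Collections.<String>emptyList(), "/tmp/example/part-randomSeed");
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

A guard like this only reports that nothing was read; it does not say why, so a stale working directory or a read/write mismatch would both surface the same way.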




On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Yeah, disregard, my repo was out of whack.
>
>
> On Fri, Jan 24, 2014 at 10:00 AM, ap.dev  wrote:
>
>> I'm not getting any exceptions there.
>>
>>  Original message 
>> From: Andrew Musselman 
>> Date:01/24/2014  11:38 AM  (GMT-05:00)
>> To: dev@mahout.apache.org
>> Subject: cluster-reuters.sh broken in trunk
>>
>> Last night I had this issue when testing out cluster-reuters.sh with no
>> flags; anyone seen this recently?
>>
>> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
>> (Minutes: 4.7799833)
>> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
>> HADOOP_CONF_DIR=
>> MAHOUT-JOB:
>> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
>> {--clustering=null,
>> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
>> --convergenceDelta=[0.5],
>>
>> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
>> --endPhase=[2147483647],
>>
>> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
>> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
>> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
>> --startPhase=[0], --tempDir=[temp]}
>> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
>> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
>> /t

[jira] [Updated] (MAHOUT-1409) MatrixVectorView has index check error

2014-01-24 Thread Ted Dunning (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-1409:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed this.

> MatrixVectorView has index check error
> --
>
> Key: MAHOUT-1409
> URL: https://issues.apache.org/jira/browse/MAHOUT-1409
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Ted Dunning
> Assignee: Ted Dunning
> Attachments: MAHOUT-1409.patch
>
>
> There is a > in the test for the correct index where there should be a >=





RE: cluster-reuters.sh broken in trunk

2014-01-24 Thread Andrew Palumbo
Can't reproduce on CentOS with hadoop 2.2.0

Running on hadoop, using /home/andy/apache_builds/hadoop_bin/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/andy/mahout_test/sandbox/mahout-trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/24 14:09:32 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
 --clusters=[/tmp/mahout-work-andy/reuters-kmeans-clusters],
 --convergenceDelta=[0.5],
 --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
 --endPhase=[2147483647],
 --input=[/tmp/mahout-work-andy/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
 --maxIter=[10], --method=[mapreduce], --numClusters=[20],
 --output=[/tmp/mahout-work-andy/reuters-kmeans], --overwrite=null,
 --startPhase=[0], --tempDir=[temp]}

Attached the full log, in case that helps any.

> Date: Fri, 24 Jan 2014 10:55:14 -0800
> Subject: Re: cluster-reuters.sh broken in trunk
> From: andrew.mussel...@gmail.com
> To: dev@mahout.apache.org
> 
> Actually, getting the same error with a fresh svn checkout:
> 
> 14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms
> (Minutes: 4.8558834)
> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments:
> {--clustering=null,
> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
> --convergenceDelta=[0.5],
> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
> --endPhase=[2147483647],
> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
> --startPhase=[0], --tempDir=[temp]}
> 14/01/24 09:42:17 INFO common.HadoopUtil: Deleting
> /tmp/mahout-work-akm/reuters-kmeans-clusters
> 14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor
> 14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> 14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input:
> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
> Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
> org.apache.mahout.common.distance.CosineDistanceMeasure
> 14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max
> Iterations: 10
> 14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input
> clusters found in
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c
> argument.
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> 
> 
> 
> On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
> 
> > Yeah, disregard, my repo was out of whack.
> >
> >
> > On Fri, Jan 24, 2014 at 10:00 AM, ap.dev  wrote:
> >
> >> I'm not getting any exceptions there.
> >>
> >>  Original message 
> >> From: Andrew Musselman 
> >> Date:01/24/2014  11:38 AM  (GMT-05:00)
> >> To: dev@mahout.apache.org
> >> Subject: cluster-reuters.sh broken in trunk
> >>
> >> Last night I had this issue when testing out cluster-reuters.sh with no
> >> flags; anyone seen this recently?
> >>
> >> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
> >> (Minutes: 4.7799833)
> >> Running

Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Andrew Musselman
Actually, getting the same error with a fresh svn checkout:

14/01/24 09:42:13 INFO driver.MahoutDriver: Program took 291353 ms
(Minutes: 4.8558834)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/24 09:42:16 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647],
--input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
--maxIter=[10], --method=[mapreduce], --numClusters=[20],
--output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
14/01/24 09:42:17 INFO common.HadoopUtil: Deleting
/tmp/mahout-work-akm/reuters-kmeans-clusters
14/01/24 09:42:17 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new compressor
14/01/24 09:42:17 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/24 09:42:17 INFO kmeans.KMeansDriver: Input:
/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/24 09:42:17 INFO kmeans.KMeansDriver: convergence: 0.5 max
Iterations: 10
14/01/24 09:42:17 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input
clusters found in
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c
argument.
at
org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



On Fri, Jan 24, 2014 at 10:07 AM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> Yeah, disregard, my repo was out of whack.
>
>
> On Fri, Jan 24, 2014 at 10:00 AM, ap.dev  wrote:
>
>> I'm not getting any exceptions there.
>>
>>  Original message 
>> From: Andrew Musselman 
>> Date:01/24/2014  11:38 AM  (GMT-05:00)
>> To: dev@mahout.apache.org
>> Subject: cluster-reuters.sh broken in trunk
>>
>> Last night I had this issue when testing out cluster-reuters.sh with no
>> flags; anyone seen this recently?
>>
>> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
>> (Minutes: 4.7799833)
>> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
>> HADOOP_CONF_DIR=
>> MAHOUT-JOB:
>> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
>> {--clustering=null,
>> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
>> --convergenceDelta=[0.5],
>>
>> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
>> --endPhase=[2147483647],
>>
>> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
>> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
>> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
>> --startPhase=[0], --tempDir=[temp]}
>> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
>> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
>> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
>> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
>> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-v

[jira] [Commented] (MAHOUT-1410) clusteredPoints do not contain a vector id

2014-01-24 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881294#comment-13881294
 ] 

Pat Ferrel commented on MAHOUT-1410:


This relates to https://issues.apache.org/jira/browse/MAHOUT-1030

There are more comments there, not sure if this needs to be separate from 1030.




> clusteredPoints do not contain a vector id
> --
>
> Key: MAHOUT-1410
> URL: https://issues.apache.org/jira/browse/MAHOUT-1410
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.9
> Environment: using 0.9 release candidate
>Reporter: Pat Ferrel
>
> When clustering non-named vectors there are no vector ids in clusteredPoints 
> so the other values there, cluster id, vector values, distance-squared, pdf, 
> cannot be tied to any known vector.





[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2014-01-24 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881287#comment-13881287
 ] 

Pat Ferrel commented on MAHOUT-1030:


Added https://issues.apache.org/jira/browse/MAHOUT-1410;
not sure if it's needed.

I can promise an instant test on some real world data if you have a patch, very 
sorry I didn't notice sooner!


> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
> MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >





[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2014-01-24 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881283#comment-13881283
 ] 

Pat Ferrel commented on MAHOUT-1030:


Hmm, Suneel recommends creating a new JIRA, so I will.

Comments from Suneel:

I concur that we are presently not capturing the vector ids (unless it's a 
NamedVector) and also concur that it's hard to infer which vector belongs to 
which cluster without them. It seems easy to use NamedVector for now to be able 
to determine the vectors that belong to a cluster.

The clustering algorithms only read the VectorWritable, and if the 
VectorWritable does not carry the vector key (i.e. is not a NamedVector), the 
clustering algorithm simply doesn't have it.

See the following code snippet from PartialVectorMergeReducer.java:

{Code}

if (namedVector) {
  vector = new NamedVector(vector, key.toString());
}

// drop empty vectors.
if (vector.getNumNondefaultElements() > 0) {
  VectorWritable vectorWritable = new VectorWritable(vector);
  context.write(key, vectorWritable);
}

{Code}

So from the above code snippet, if it's not a NamedVector then the corresponding 
vector key is not captured (in the VectorWritable).

The RowIdJob reads the same TF-IDF vectors and creates a docIndex and matrix (I 
am sure you know their layout and what they are intended for, so I'll skip the 
details here).

The following code snippet from ClusterIterator.iterateSeq() only reads the 
VectorWritable but not the Key:

  for (VectorWritable vw : new SequenceFileDirValueIterable<VectorWritable>(inPath, PathType.LIST,
      PathFilters.logsCRCFilter(), conf)) {

It should have been reading a Pair to capture the Key for 
the vector as well.
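
To make the difference concrete, here is a minimal, self-contained sketch of value-only iteration versus key/value iteration. It deliberately avoids the Mahout and Hadoop classes: java.util.Map.Entry stands in for the key/value pair, double[] stands in for the vector, and the ids "doc-17" and "doc-42" are made up for illustration. It is not the actual fix.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyedIterationSketch {
    // Formats a clustered point; the id is only known if the key was read.
    public static String describe(String vectorId, double[] vector) {
        String id = (vectorId == null) ? "<unknown>" : vectorId;
        return id + " -> " + vector.length + " dims";
    }

    public static void main(String[] args) {
        // Stand-in for a sequence file of (key, vector) records.
        List<Map.Entry<String, double[]>> records = new ArrayList<>();
        records.add(new SimpleEntry<>("doc-17", new double[] {1.0, 0.0}));
        records.add(new SimpleEntry<>("doc-42", new double[] {0.0, 1.0}));

        // Value-only iteration: the key is never looked at, so the id is lost.
        for (Map.Entry<String, double[]> record : records) {
            double[] valueOnly = record.getValue();
            System.out.println(describe(null, valueOnly));
        }

        // Pair iteration: key and value travel together, so the id survives.
        for (Map.Entry<String, double[]> record : records) {
            System.out.println(describe(record.getKey(), record.getValue()));
        }
    }
}
```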

I presently have a 0.9 Release sitting out there in staging waiting to be 
finalized.  Please create a JIRA for this and we should have it fixed in the 
next major release (or Release Candidate).



> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
> MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >





[jira] [Created] (MAHOUT-1410) clusteredPoints do not contain a vector id

2014-01-24 Thread Pat Ferrel (JIRA)
Pat Ferrel created MAHOUT-1410:
--

 Summary: clusteredPoints do not contain a vector id
 Key: MAHOUT-1410
 URL: https://issues.apache.org/jira/browse/MAHOUT-1410
 Project: Mahout
  Issue Type: Bug
  Components: Clustering
Affects Versions: 0.9
 Environment: using 0.9 release candidate
Reporter: Pat Ferrel


When clustering non-named vectors there are no vector ids in clusteredPoints so 
the other values there, cluster id, vector values, distance-squared, pdf, 
cannot be tied to any known vector.





[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2014-01-24 Thread Andrew Musselman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881273#comment-13881273
 ] 

Andrew Musselman commented on MAHOUT-1030:
--

Yes, was looking at this last night with Suneel.  I think he found where it's 
happening and we'll check it out.

> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
> MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >





[jira] [Commented] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2014-01-24 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881226#comment-13881226
 ] 

Pat Ferrel commented on MAHOUT-1030:


This fixes a very literal reading of the bug. The distance-squared is indeed 
included in clusteredPoints BUT there are no vector ids so the distance can't 
actually be used. Without a vector id in clusteredPoints, Mahout doesn't really 
perform unsupervised categorization. I will now have to loop through all 
vectors, recalculate the distance and categorize them according to the cluster 
centroid they are closest to. 
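
The workaround described above (loop over all vectors, recompute the distance, assign each to its closest centroid) can be sketched as follows. This is an illustration under assumptions, not Mahout code: vectors and centroids are plain dense double[] arrays of equal length, and squared Euclidean distance is used, whereas a real job would use whichever DistanceMeasure the clustering run was configured with.

```java
public class NearestCentroidSketch {
    // Squared Euclidean distance between two dense vectors of equal length.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the closest centroid, or -1 if there are none.
    public static int nearestCentroid(double[] vector, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = squaredDistance(vector, centroids[c]);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] vectors = {{1.0, 1.0}, {9.0, 8.0}};
        // The array index plays the role of the vector id that is missing
        // from clusteredPoints.
        for (int id = 0; id < vectors.length; id++) {
            System.out.println("vector " + id + " -> cluster "
                + nearestCentroid(vectors[id], centroids));
        }
    }
}
```

The cost is one distance computation per vector per centroid, which is the overhead the comment is objecting to when the number of clusters is large.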

The clusteredPoints and distance-squared can't actually be used without knowing 
the vector id. I think named vectors work here, but many cases, including mine, 
do not have names, only Mahout integer ids.

Please correct me if I've missed something.

When I cluster the user preference data used in the Mahout recommender I get 
clusteredPoints something like this. The data from the vector is given but not 
its id??? The Key here is a cluster id.

pat$ mahout seqdumper -i /Users/pat/big-data/temp/clusters/clusteredPoints/ | 
more
Jan 24, 2014 10:02:05 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--endPhase=[2147483647], 
--input=[/Users/pat/big-data/temp/clusters/clusteredPoints/], --startPhase=[0], 
--tempDir=[temp]}
2014-01-24 10:02:05.707 java[29221:1003] Unable to load realm info from 
SCDynamicStore
Input Path: file:/Users/pat/big-data/temp/clusters/clusteredPoints/part-m-0
Key class: class org.apache.hadoop.io.IntWritable Value Class: class 
org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable
Key: 39: Value: wt: 1.0 distance-squared: 9.656875  vec: [0:1.000, 2:1.000, 
5:1.000, 9:1.000, 12:1.000, 13:1.000, 17:1.000, 18:1.000, 19:1.000, 20:1.000]
Key: 48: Value: wt: 1.0 distance-squared: 22.2291686  vec: [25:1.000, 
26:1.000, 27:1.000, 28:1.000, 29:1.000, 30:1.000, 31:1.000, 36:1.000, 38:1.000, 
39:1.000, 40:1.000, 41:1.000, 43:1.000, 44:1.000, 46:1.000, 48:1.000, 53:1.000, 
54:1.000, 55:1.000, 56:1.000, 57:1.000, 58:1.000, 60:1.000, 63:1.000, 64:1.000, 
66:1.000, 67:1.000, 68:1.000, 69:1.000, 70:1.000, 71:1.000, 72:1.000]


> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
> MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >





Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Andrew Musselman
Yeah, disregard, my repo was out of whack.


On Fri, Jan 24, 2014 at 10:00 AM, ap.dev  wrote:

> I'm not getting any exceptions there.
>
>  Original message 
> From: Andrew Musselman 
> Date:01/24/2014  11:38 AM  (GMT-05:00)
> To: dev@mahout.apache.org
> Subject: cluster-reuters.sh broken in trunk
>
> Last night I had this issue when testing out cluster-reuters.sh with no
> flags; anyone seen this recently?
>
> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
> (Minutes: 4.7799833)
> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
> {--clustering=null,
> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
> --convergenceDelta=[0.5],
>
> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
> --endPhase=[2147483647],
>
> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
> --startPhase=[0], --tempDir=[temp]}
> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
> Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
> org.apache.mahout.common.distance.CosineDistanceMeasure
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max
> Iterations: 10
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input
> clusters found in
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c
> argument.
> at
>
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
>
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> $ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Found 1 items
> -rw-r--r--   1 akm supergroup149 2014-01-23 22:03
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
>
>
> $ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> SEQ
>  
> org.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable
> *org.apache.hadoop.io.compress.DefaultCodec�M5�0ü��� $
>


RE: cluster-reuters.sh broken in trunk

2014-01-24 Thread ap . dev
I'm not getting any exceptions there.

 Original message 
From: Andrew Musselman 
Date:01/24/2014  11:38 AM  (GMT-05:00)
To: dev@mahout.apache.org
Subject: cluster-reuters.sh broken in trunk

Last night I had this issue when testing out cluster-reuters.sh with no
flags; anyone seen this recently?

14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
(Minutes: 4.7799833)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647],
--input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
--maxIter=[10], --method=[mapreduce], --numClusters=[20],
--output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max
Iterations: 10
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input
clusters found in
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c
argument.
at
org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r--   1 akm supergroup        149 2014-01-23 22:03
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed


$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$


Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Andrew Musselman
`./cluster-reuters.sh` from examples/bin which worked the other day just
fine, selecting option (1).

It runs fine with `./cluster-reuters.sh -nv` though.


On Fri, Jan 24, 2014 at 8:50 AM, Suneel Marthi wrote:

> 'No Flags' ???  Could u post the command u were trying?
>
>
>
>
>
>
> On Friday, January 24, 2014 11:38 AM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
>
> Last night I had this issue when testing out cluster-reuters.sh with no
> flags; anyone seen this recently?
>
> 14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
> (Minutes: 4.7799833)
> Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
> {--clustering=null,
> --clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
> --convergenceDelta=[0.5],
>
> --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
> --endPhase=[2147483647],
>
> --input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
> --output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
> --startPhase=[0], --tempDir=[temp]}
> 14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
> 14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
> /tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
> Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
> org.apache.mahout.common.distance.CosineDistanceMeasure
> 14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max
> Iterations: 10
> 14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
> Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
> at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> $ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
> Found 1 items
> -rw-r--r--   1 akm supergroup        149 2014-01-23 22:03
> /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
>
>
> $ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
>
> SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
>


Re: cluster-reuters.sh broken in trunk

2014-01-24 Thread Suneel Marthi
'No Flags' ???  Could you post the command you were trying?

On Friday, January 24, 2014 11:38 AM, Andrew Musselman wrote:

Last night I had this issue when testing out cluster-reuters.sh with no
flags; anyone seen this recently?

14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
(Minutes: 4.7799833)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647],
--input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
--maxIter=[10], --method=[mapreduce], --numClusters=[20],
--output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max
Iterations: 10
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r--   1 akm supergroup        149 2014-01-23 22:03
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed


$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$

cluster-reuters.sh broken in trunk

2014-01-24 Thread Andrew Musselman
Last night I had this issue when testing out cluster-reuters.sh with no
flags; anyone seen this recently?

14/01/23 22:03:54 INFO driver.MahoutDriver: Program took 286799 ms
(Minutes: 4.7799833)
Running on hadoop, using /home/akm/hadoop-0.20.205.0/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/akm/mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
14/01/23 22:03:57 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[/tmp/mahout-work-akm/reuters-kmeans-clusters],
--convergenceDelta=[0.5],
--distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure],
--endPhase=[2147483647],
--input=[/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/],
--maxIter=[10], --method=[mapreduce], --numClusters=[20],
--output=[/tmp/mahout-work-akm/reuters-kmeans], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
14/01/23 22:03:57 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new compressor
14/01/23 22:03:57 INFO kmeans.RandomSeedGenerator: Wrote 20 Klusters to
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
14/01/23 22:03:57 INFO kmeans.KMeansDriver: Input:
/tmp/mahout-work-akm/reuters-out-seqdir-sparse-kmeans/tfidf-vectors
Clusters In: /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Out: /tmp/mahout-work-akm/reuters-kmeans Distance:
org.apache.mahout.common.distance.CosineDistanceMeasure
14/01/23 22:03:57 INFO kmeans.KMeansDriver: convergence: 0.5 max
Iterations: 10
14/01/23 22:03:57 INFO compress.CodecPool: Got brand-new decompressor
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed. Check your -c argument.
at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
$ hls /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
Found 1 items
-rw-r--r--   1 akm supergroup        149 2014-01-23 22:03
/tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed


$ hcat /tmp/mahout-work-akm/reuters-kmeans-clusters/part-randomSeed
SEQorg.apache.hadoop.io.Text5org.apache.mahout.clustering.iterator.ClusterWritable*org.apache.hadoop.io.compress.DefaultCodec�M5�0ü���$
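
A rough size check on the listing above, offered as a sketch rather than a
confirmed diagnosis: assuming the usual version-6 SequenceFile header layout
("SEQ" plus a version byte, the key and value class names each written with a
one-byte length prefix, two compression flags, the codec class name, a 4-byte
metadata count, and a 16-byte sync marker), an empty file with these three
class names would come to exactly the 149 bytes that hls reports. The function
names below are hypothetical helpers, and the header layout is my
understanding of the format, not something stated in this thread.

```python
def text_string_size(s):
    """Bytes taken by a short string written Hadoop-style:
    a one-byte length prefix followed by the UTF-8 bytes."""
    encoded = s.encode("utf-8")
    # Assumption: lengths below 128 fit in a single length byte.
    assert len(encoded) < 128, "sketch only handles one-byte length prefixes"
    return 1 + len(encoded)


def empty_sequence_file_size(key_class, value_class, codec_class=None):
    """Estimated size of a SequenceFile that holds a header and no records."""
    size = 3 + 1                      # "SEQ" magic + version byte
    size += text_string_size(key_class)
    size += text_string_size(value_class)
    size += 1 + 1                     # compression flag + block-compression flag
    if codec_class is not None:
        size += text_string_size(codec_class)
    size += 4                         # metadata entry count (assumed to be zero)
    size += 16                        # sync marker
    return size


header_size = empty_sequence_file_size(
    "org.apache.hadoop.io.Text",
    "org.apache.mahout.clustering.iterator.ClusterWritable",
    "org.apache.hadoop.io.compress.DefaultCodec",
)
print(header_size)  # 149
```

If that layout is right, part-randomSeed holds a header and no cluster
records, which would be consistent with the "No input clusters found"
exception even though RandomSeedGenerator logged "Wrote 20 Klusters".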


Re: MAHOUT 0.9 Release - New URL

2014-01-24 Thread Gokhan Capan
Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan


On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarc...@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > >
> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that you can unpack the release (tar or zip)
> > > > b) Verify you are able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > > finalized.
> > >
> > > 
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>