The reason you are seeing the error is that there were no sequence files in HDFS in MR mode to begin with. Hence no term vectors were generated, and hence there were no vectors to cluster.
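Since the outcome depends on which mode the script actually runs in, it may help to check the environment first. The following is only a sketch: `mahout_mode` is a hypothetical helper, not part of Mahout, and it assumes that a non-empty MAHOUT_LOCAL forces local execution and that MR mode otherwise needs HADOOP_HOME to be set.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): report which mode the current
# environment would appear to select.
# Assumption: a non-empty MAHOUT_LOCAL forces local execution; otherwise
# MR mode requires HADOOP_HOME to be set.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    echo "mapreduce"
  else
    echo "undetermined"
  fi
}

# Print the result for the current shell environment.
echo "Environment selects: $(mahout_mode)"
```

If this prints "local" when MR mode was intended, MAHOUT_LOCAL is probably still set in the shell.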
MR mode:
------------
1. Set HADOOP_HOME
2. Unset MAHOUT_LOCAL
3. Clean up your local /tmp/mahout-work-xxxxx directory
4. Run ./examples/bin/cluster-reuters.sh => option 4

Sequential mode:
---------------------
1. Set MAHOUT_LOCAL=true
2. Add the "-xm sequential" flag to the cluster-reuters.sh script
3. Run ./examples/bin/cluster-reuters.sh => option 4

On Sunday, January 19, 2014 12:22 PM, Frank Scholten <[email protected]> wrote:

When I run in MR mode I get the same problem. See http://pastebin.com/TXJ5mQmt

On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <[email protected]> wrote:

OK, running in MR mode now.

>On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <[email protected]> wrote:
>
>It's presently set up to run in MR mode (the way it's been coded in
>cluster-reuters.sh). So setting MAHOUT_LOCAL=true is going to fail for this.
>>I am able to see this fail locally when MAHOUT_LOCAL=true.
>>
>>On Sunday, January 19, 2014 11:17 AM, Frank Scholten <[email protected]> wrote:
>>
>>Exported MAHOUT_LOCAL=true and still get the same results.
>>
>>On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <[email protected]> wrote:
>>
>>> Frank,
>>>
>>> Were you running this with MAHOUT_LOCAL=true?
>>>
>>> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <[email protected]> wrote:
>>>
>>> -1
>>>
>>> The cluster reuters example results in zero clusters when choosing
>>> streaming k-means. The other steps, unpacking and building, do work.
>>>
>>> I see this stacktrace:
>>>
>>> INFO: Number of Centroids: 0
>>> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
>>> WARNING: job_local797072544_0001
>>> java.lang.IllegalArgumentException: Must have nonzero number of training
>>> and test vectors.
>>> Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>
>>> Num clusters: 0; maxDistance: 0.000000
>>> [Dunn Index] First: Infinity
>>> [Davies-Bouldin Index] First: NaN
>>> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
>>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>
>>> Here is the full log: http://pastebin.com/TxLV0rDr
>>>
>>> As of yet I am unfamiliar with the streaming k-means code and the
>>> algorithms behind it. If anyone has suggestions on what goes wrong in the
>>> code, I am happy to help where I can.
>>>
>>> Frank
>>>
>>> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <[email protected]> wrote:
>>>
>>> Thanks Grant.
>>> >
>>> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister for 0.9.
>>> >Here's my +1 FWIW.
>>> >
>>> >a) Attached is the draft of the Release notes for 0.9, would definitely appreciate feedback on that.
>>> >
>>> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a majority of at least 3 +1 PMC votes are cast.
>>> >
>>> >The release files, including signatures, digests, etc. can be found at:
>>> >https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >
>>> >The staging repository for this release can be found at:
>>> >https://repository.apache.org/content/repositories/orgapachemahout-1002
>>> >
>>> >Release artifacts have been signed with the following key:
>>> >https://people.apache.org/keys/committer/smarthi.asc
>>> >
>>> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <[email protected]> wrote:
>>> >
>>> >Ran the tests, verified sigs, tried out a few of the examples.
>>> >
>>> >+1 (binding)
>>> >
>>> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <[email protected]> wrote:
>>> >
>>> >> Third time's a Charm!!!
>>> >>
>>> >> Here's the new URL for Mahout 0.9 Release:
>>> >> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >>
>>> >> For those volunteering to test this, some of the things to be verified:
>>> >>
>>> >> a) Verify that you can unpack the release (tar or zip)
>>> >> b) Verify you are able to compile the distro
>>> >> c) Run through the unit tests: mvn clean test
>>> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>>> >>
>>> >> Committers and PMC members:
>>> >> ---------------------------------------
>>> >>
>>> >> Need 'at least 3 +1 votes' for the Release to pass.
>>> >>
>>> >> Thanks and Regards.
