The reason you are seeing the error is that there were no sequence files in 
HDFS in MR mode to begin with. Hence no term vectors were generated, and hence 
there were no vectors to cluster.

MR mode:
------------
1. Set HADOOP_HOME
2. Unset MAHOUT_LOCAL
3. Clean up your local /tmp/mahout-work-xxxxx directory
4. Run ./examples/bin/cluster-reuters.sh => option 4
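
Before running the script, it may help to confirm which mode the environment is actually set up for. The following is a minimal sketch, not part of Mahout: mahout_mode is a hypothetical helper, and it assumes that any non-empty MAHOUT_LOCAL forces local mode and that a set HADOOP_HOME is otherwise needed for MR mode.

```shell
#!/bin/sh
# Sketch only: mahout_mode is a hypothetical helper, not part of Mahout.
# Assumption: a non-empty MAHOUT_LOCAL selects local mode; otherwise a
# non-empty HADOOP_HOME is required to run in MapReduce mode.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    echo "mapreduce"
  else
    echo "unconfigured"
  fi
}

# Example MR-mode setup (the Hadoop path is a placeholder):
#   export HADOOP_HOME=/path/to/hadoop
#   unset MAHOUT_LOCAL
#   mahout_mode    # prints "mapreduce"
```

If the helper prints anything other than "mapreduce", steps 1 and 2 above have not taken effect in the current shell.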

Sequential Mode:
---------------------

1. Set MAHOUT_LOCAL=true
2. Add the "-xm sequential" flag to the cluster-reuters.sh script
3. Run ./examples/bin/cluster-reuters.sh => option 4
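
A quick way to check that step 2 was done before running the script is sketched below. has_sequential_flag is a hypothetical helper, not part of Mahout, and it only checks that the text "-xm sequential" appears somewhere in the file, not that it sits on the right command.

```shell
#!/bin/sh
# Sketch only: has_sequential_flag is a hypothetical helper, not part of Mahout.
# It succeeds if the given script mentions "-xm sequential" anywhere.
has_sequential_flag() {
  grep -q -e '-xm sequential' "$1"
}

# Example sequential-mode setup:
#   export MAHOUT_LOCAL=true
#   has_sequential_flag ./examples/bin/cluster-reuters.sh || echo "flag missing"
```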








On Sunday, January 19, 2014 12:22 PM, Frank Scholten <[email protected]> 
wrote:
 
When I run in MR mode I get the same problem.

See http://pastebin.com/TXJ5mQmt




On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <[email protected]> wrote:

OK, running in MR mode now.
>
>
>
>
>On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <[email protected]> wrote:
>
>It's presently set up to run in MR mode (the way it's been coded in 
>cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
>>I am able to see this fail locally when MAHOUT_LOCAL=true. 
>>
>>
>>
>>
>>
>>
>>On Sunday, January 19, 2014 11:17 AM, Frank Scholten <[email protected]> 
>>wrote:
>>
>>Exported MAHOUT_LOCAL=true and still get the same results.
>>
>>
>>
>>On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <[email protected]>wrote:
>>
>>> Frank,
>>>
>>> Were you running this with MAHOUT_LOCAL=true?
>>>
>>>
>>>
>>>
>>>
>>> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <
>>> [email protected]> wrote:
>>>
>>> -1
>>>
>>> The cluster-reuters example results in zero clusters when choosing
>>> streaming k-means. The other steps, unpacking and building, do work.
>>>
>>> I see this stack trace:
>>>
>>> INFO: Number of Centroids: 0
>>> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
>>> WARNING: job_local797072544_0001
>>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>
>>> Num clusters: 0; maxDistance: 0.000000
>>> [Dunn Index] First: Infinity
>>> [Davies-Bouldin Index] First: NaN
>>> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
>>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>
>>>
>>> Here is the full log: http://pastebin.com/TxLV0rDr
>>>
>>> As yet I am unfamiliar with the streaming k-means code and the
>>> algorithms behind it. If anyone has suggestions on what goes wrong in the
>>> code, I am happy to help where I can.
>>>
>>>
>>> Frank
>>>
>>>
>>>
>>> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <[email protected]>
>>> wrote:
>>>
>>> Thanks Grant.
>>> >
>>> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
>>> >for 0.9.
>>> >Here's my +1 FWIW.
>>> >
>>> >a) Attached is the draft of the Release notes for 0.9; I would definitely
>>> >appreciate feedback on that.
>>> >
>>> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if
>>> >a majority of at least 3 +1 PMC votes are cast.
>>> >
>>> >The release files, including signatures, digests, etc can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >
>>> >The staging repository for this release can be found at:
>>> >https://repository.apache.org/content/repositories/orgapachemahout-1002
>>> >
>>> >Release artifacts have been signed with the following key:
>>> >https://people.apache.org/keys/committer/smarthi.asc
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
>>> [email protected]> wrote:
>>> >
>>> >Ran the tests, verified sigs, tried out a few of the examples.
>>> >
>>> >+1 (binding)
>>> >
>>> >
>>> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <[email protected]>
>>> wrote:
>>> >
>>> >> Third time's a Charm!!!
>>> >>
>>> >>
>>> >> Here's the new URL for Mahout 0.9 Release:
>>> >>
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >>
>>> >> For those volunteering to test this, some of the things to be verified:
>>> >>
>>> >> a) Verify that you can unpack the release (tar or zip)
>>> >> b) Verify you are able to compile the distro
>>> >> c) Run through the unit tests: mvn clean test
>>> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>>> >> through all the different options in each script.
>>> >>
>>> >>
>>> >> Committers and PMC members:
>>> >> ---------------------------------------
>>> >>
>>> >> Need 'at least 3 +1 votes' for the Release to pass.
>>> >>
>>> >>
>>> >> Thanks and Regards.
>>> >
>>> >
>>> >
>>> >
>>>
>
