This cannot be running on the latest trunk. The job no longer has a -c
argument and the initial clusters are always computed by running Canopy
on the converted data. It is meant to be run with no arguments; default
values are provided (EuclideanDM, 80, 55) that work consistently. The
only variables are the distance measure and the t1 and t2 values for
Canopy. If these are changed, Canopy will generate somewhere between 1
and 600 clusters, and k-Means processes them fine.
Predictably, when I run with t1=800 and t2=550 I get a single cluster
out; with t1=8 and t2=5.5 I get 600. There is no way I can imagine to
ever get 0 clusters out of Canopy.
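To illustrate why, here is a minimal sketch of the textbook canopy procedure. It is not the Mahout implementation: the class name CanopySketch, the ten 1-D test points, and the scaled-down thresholds in the second call are all made up for illustration. The point is only that the loop picks a center before it removes anything, so any non-empty input yields at least one canopy, and a smaller t2 yields more of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Mahout.
public class CanopySketch {

    // Plain Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one member list per canopy; element 0 of each list is its center.
    // t1 is the loose (membership) threshold, t2 the tight (removal) threshold.
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // A center is always taken first, so non-empty input gives >= 1 canopy.
            double[] center = remaining.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(center);
            for (double[] p : remaining) {
                if (distance(center, p) < t1) {
                    members.add(p);
                }
            }
            // Points within t2 of the center can no longer seed a new canopy.
            remaining.removeIf(p -> distance(center, p) < t2);
            result.add(members);
        }
        return result;
    }

    public static void main(String[] args) {
        // Ten made-up points on a line, spaced 1.0 apart.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i});
        }
        System.out.println(canopies(points, 800, 550).size());   // prints 1
        System.out.println(canopies(points, 0.8, 0.55).size());  // prints 10
    }
}
```

With huge thresholds the first center swallows everything; with thresholds below the point spacing every point becomes its own canopy. Zero is only reachable from empty input.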
I think this has been fixed, but show me a command line that can
generate this error and I will have something to work with.
On 9/25/10 3:57 AM, Sean Owen (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen reopened MAHOUT-504:
------------------------------
Kmeans clustering error
-----------------------
Key: MAHOUT-504
URL: https://issues.apache.org/jira/browse/MAHOUT-504
Project: Mahout
Issue Type: Bug
Reporter: Zhen Guo
Assignee: Robin Anil
Fix For: 0.4
I tried the Kmeans algorithm on the Synthetic Control data. The following error
appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I
am using Trunk.
10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
    at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)