Re: [jira] Reopened: (MAHOUT-504) Kmeans clustering error

Jeff Eastman Sat, 25 Sep 2010 06:45:23 -0700

This cannot be running on the latest trunk. The job no longer has a -c argument and the initial clusters are always computed by running Canopy on the converted data. It is meant to be run with no arguments; default values are provided (EuclideanDM, 80, 55) that work consistently. The only variables are the distance measure and the t1 and t2 values for Canopy. If these are changed there will be somewhere between 1 and 600 clusters generated by Canopy and k-Means processes them fine.

Predictably, when I run with t1=800 and t2=550 I get a single cluster out; with t1=8 and t2=5.5 I get 600. There is no way I can imagine to ever get 0 clusters out of Canopy.
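To illustrate why the thresholds behave this way, here is a minimal sketch of a single-pass canopy assignment. It is not Mahout's actual code: the class name CanopySketch, the method canopies, and the 1-D toy data are all hypothetical, and it assumes a Euclidean distance with t1 > t2. A point that is not within t2 of any existing center always starts a new canopy, so any non-empty input yields at least one canopy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Mahout implementation.
final class CanopySketch {

    // Plain Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the canopies; element 0 of each canopy is its center.
    // t1 is the loose threshold (membership), t2 the tight one
    // (points within t2 of a center cannot become centers themselves).
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        List<List<double[]>> canopies = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (List<double[]> canopy : canopies) {
                double d = euclidean(p, canopy.get(0));
                if (d < t1) {
                    canopy.add(p);          // loosely covered by this canopy
                }
                if (d < t2) {
                    stronglyBound = true;   // too close to start a new canopy
                }
            }
            if (!stronglyBound) {
                // The very first point always lands here, so the result
                // is never empty for a non-empty input.
                List<double[]> fresh = new ArrayList<>();
                fresh.add(p);
                canopies.add(fresh);
            }
        }
        return canopies;
    }

    public static void main(String[] args) {
        // Toy data (an assumption): ten 1-D points spaced 1.0 apart.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i});
        }
        // Very large thresholds: every point is absorbed by the first center.
        System.out.println(canopies(points, 800, 550).size());   // prints 1
        // Thresholds below the spacing: every point becomes its own center.
        System.out.println(canopies(points, 0.8, 0.55).size());  // prints 10
    }
}
```

Large thresholds collapse everything into one canopy and small ones give one canopy per point, which mirrors the 1 and 600 extremes above. The only way this sketch returns zero canopies is an empty input.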

I think this has been fixed, but show me a command line that can generate this error and I will have something to work with.



On 9/25/10 3:57 AM, Sean Owen (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reopened MAHOUT-504:
------------------------------

Kmeans clustering error
-----------------------

                 Key: MAHOUT-504
                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
             Project: Mahout
          Issue Type: Bug
            Reporter: Zhen Guo
            Assignee: Robin Anil
             Fix For: 0.4


I tried the Kmeans algorithm on the Synthetic Control data and the following
error appears. The Canopy algorithm runs fine. The error comes from the Mapper.
I am using trunk.
10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
