Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Jeff Eastman Tue, 05 Oct 2010 11:58:24 -0700

 Hi Pragnesh,

I really don't know what to suggest to you. I just did a new Mahout checkout and build, followed by uploading the synthetic_control.data file to a local Hadoop instance. The k-means job ran without incident. On a hunch, I also uploaded the file as testdata (not in directory testdata) and that worked too. I'm baffled why I can't duplicate this and suspect it is a local system issue. What OS are you running?

If yours works from Eclipse but not from the command line, I wonder if you have done a clean Maven build (for example, mvn clean install) from the command line before you ran the CLI Mahout job? Eclipse compiles its bits into different directories and does not build the necessary job files. Other than that, I suggest checking your file system groups and permissions.
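
As a rough sketch of the job-file check: the helper names below are hypothetical (they are not part of Mahout), and the assumption that the Maven build leaves *.job archives under each module's target/ directory may not match every checkout. It only looks at the local file system and does not touch Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): list built job files under a checkout.
# Assumption: the command-line Maven build leaves *.job archives in <module>/target/.
list_job_files() {
  find "$1" -type f -path '*/target/*.job' 2>/dev/null
}

# Hypothetical helper: succeed only if at least one job file was found.
has_job_files() {
  [ -n "$(list_job_files "$1")" ]
}

# Usage: run the check against the checkout named by MAHOUT_HOME, if it is set.
if [ -n "${MAHOUT_HOME:-}" ]; then
  if has_job_files "$MAHOUT_HOME"; then
    echo "job files found:"
    list_job_files "$MAHOUT_HOME"
  else
    echo "no job files under $MAHOUT_HOME; rebuild from the command line first"
  fi
fi
```

If nothing is listed, the build was probably only done inside Eclipse, and rebuilding from the command line before rerunning the job is the first thing to try.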

If you find something that gets you running again, *please* post your solution so we can advise others who are experiencing the same error message.



On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ]

pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
----------------------------------------------------------

I am also getting the same exception with trunk code

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : 
attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at 
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)


This runs fine from Eclipse.

But when I try to run it from the command line with Hadoop, I see the following output,

while $MAHOUT_HOME/bin/mahout 
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine 
without any error.

pragnesh-laptop% $MAHOUT_HOME/bin/mahout 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
10/10/05 12:26:05 WARN driver.MahoutDriver: No 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on 
classpath, will use command-line arguments only
10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data 
Out: output Measure: 
org.apache.mahout.common.distance.euclideandistancemeas...@136a43c t1: 80.0 t2: 
55.0
10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: 
output/clusters-0 Out: output Distance: 
org.apache.mahout.common.distance.EuclideanDistanceMeasure
10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 
num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:08 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0007_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at 
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:14 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0007_m_000000_1, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at 
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:23 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0007_m_000000_2, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at 
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: 
output/clusters-1 Out: output/clusteredPoints Distance: 
org.apache.mahout.common.distance.euclideandistancemeas...@136a43c
10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: 
org.apache.mahout.math.VectorWritable
10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing 
the arguments. Applications should implement Tool for the same.
10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:47 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0008_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at 
org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:53 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0008_m_000000_1, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at 
org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:59 INFO mapred.JobClient: Task Id : 
attempt_201010051117_0008_m_000000_2, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at 
org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms

       was (Author: pgradadia):
     I am also getting the same exception with trunk code

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : 
attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at 
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Kmeans clustering error
-----------------------

                 Key: MAHOUT-504
                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
             Project: Mahout
          Issue Type: Bug
            Reporter: Zhen Guo
            Assignee: Robin Anil
             Fix For: 0.4


I tried the Kmeans algorithm on the Synthetic Control data. The following error 
appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I 
am using Trunk.
10/09/20 19:40:06 INFO mapred.JobClient: Task Id : 
attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at 
org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
