Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

pragnesh radadia Fri, 08 Oct 2010 02:02:37 -0700

Finally, I am able to run the k-means example for clustering the synthetic control data.


I think the problem is that Hadoop is running as the hadoop user (using Cloudera
CDH3) while I am trying to run the example as the pragnesh user. Because the
example uses relative paths to store the input and clustering data, Hadoop is
not able to find the data under "/user/hadoop".
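To make the relative-path issue above concrete, here is a minimal, hypothetical sketch in plain Java (no Hadoop or Mahout classes). It assumes only that a relative path such as "output/clusters-0" (the name that appears in the logs below) is resolved against a per-user home directory of the form /user/<name>, which is the default working directory HDFS uses for relative paths. The resolve() helper and the /tmp/mahout path are illustrative inventions, not Mahout's API or the example's real configuration.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical illustration (not Hadoop or Mahout code): models how a relative
 * path is resolved against a per-user home directory of the form /user/<name>.
 */
public class RelativePathDemo {

    // Absolute paths are returned unchanged; relative ones are prefixed with
    // the (assumed) per-user home directory /user/<user>.
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        // Pretend the clustering output was written while resolving as "hadoop".
        Set<String> written = new HashSet<>();
        written.add(resolve("output/clusters-0", "hadoop"));

        // Looking up the same relative path as "pragnesh" misses it...
        boolean foundRelative =
                written.contains(resolve("output/clusters-0", "pragnesh"));

        // ...while an absolute path resolves identically for every user.
        written.add(resolve("/tmp/mahout/output/clusters-0", "hadoop"));
        boolean foundAbsolute =
                written.contains(resolve("/tmp/mahout/output/clusters-0", "pragnesh"));

        System.out.println("relative path found: " + foundRelative);
        System.out.println("absolute path found: " + foundAbsolute);
    }
}
```

Running it prints "relative path found: false" followed by "absolute path found: true", which mirrors the symptom in this thread: a lookup that resolves under a different user's home directory finds nothing, whereas absolute paths (or running everything as the same user) sidestep the mismatch.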

-pragnesh


On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
<[email protected]> wrote:
>  Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout checkout
> and build, followed by uploading the synthetic_control.data file to a local
> Hadoop instance. The k-means job ran without incident. On a hunch, I also
> uploaded the file as testdata (not in directory testdata) and that worked
> too. I'm baffled why I can't duplicate this and suspect it is a local system
> issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder if you
> have done mvn clean build from the command line before you ran the CLI
> Mahout job? Eclipse compiles its bits into different directories and does
> not build the necessary job files. Other than that, I suggest checking your
> file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.
>
>
> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>
>>     [
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502
>> ]
>>
>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>> ----------------------------------------------------------
>>
>> I am also getting the same exception with trunk code:
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>
>> This runs fine from Eclipse, but when I try to run it from the command line
>> with Hadoop, I see the following output.
>>
>> By contrast, $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>> without any error.
>>
>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>> classpath, will use command-line arguments only
>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>> job_201010051117_0005
>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>> job_201010051117_0005
>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>> output/data Out: output Measure:
>> org.apache.mahout.common.distance.euclideandistancemeas...@136a43c t1: 80.0
>> t2: 55.0
>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>> job_201010051117_0006
>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>> job_201010051117_0006
>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-0 Out: output Distance:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>> job_201010051117_0007
>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>> job_201010051117_0007
>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>> org.apache.mahout.common.distance.euclideandistancemeas...@136a43c
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>> Vectors: org.apache.mahout.math.VectorWritable
>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>> job_201010051117_0008
>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>> job_201010051117_0008
>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>
>>> Kmeans clustering error
>>> -----------------------
>>>
>>>                 Key: MAHOUT-504
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>            Reporter: Zhen Guo
>>>            Assignee: Robin Anil
>>>             Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data and the following
>>> error appears. I tried the Canopy algorithm and it is fine. This error comes
>>> from the Mapper. I am using trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
