[jira] [Created] (MAHOUT-1452) Kmeans unexpected behaviour after removal of file scheme in output path for method mapreduce

Bikash Gupta (JIRA) Wed, 12 Mar 2014 13:05:27 -0700

Bikash Gupta created MAHOUT-1452:
------------------------------------

             Summary: Kmeans unexpected behaviour after removal of file scheme 
in output path for method mapreduce
                 Key: MAHOUT-1452
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1452
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.9
         Environment: CentOS, CDH 4.6 (3-node cluster)
            Reporter: Bikash Gupta
            Priority: Minor
             Fix For: 1.0



When the hdfs scheme is removed from the output path, k-means with method 
mapreduce creates clusters-0 on the local file system but clusters-1 in HDFS. 
The job then fails because the map task expects clusters-0 to be in HDFS. 
See the stack trace below.

2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: 
{--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], 
--distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], 
--endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], 
--method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], 
--tempDir=[temp]}
2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters 
In: /3/clusters-0-final Out: /5
2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max 
Iterations: 100
2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for 
parsing the arguments. Applications should implement Tool for the same.
2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to 
process : 3
2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
2014-03-11 14:52:20 o.a.h.m.JobClient [INFO]  map 0% reduce 0%
2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : 
attempt_201403111332_0011_m_000000_0, Status : FAILED
2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
        at 
org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
        at 
org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
        at 
org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.FileNotFoundException: File /5/clusters-0


If the output path is given as a full HDFS URI, the job runs without this 
error.
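
A minimal sketch of why the scheme may matter. This is not a confirmed 
diagnosis and does not use the Hadoop or Mahout APIs. It assumes that the 
client and the map tasks resolve a scheme-less path against different default 
file systems. The class QualifyPath, the method qualify and the address 
namenode:8020 are hypothetical and exist only for illustration.

```java
import java.net.URI;

public class QualifyPath {

    // Qualifies a path against a default file system URI.
    // A path that already carries a scheme is returned unchanged.
    static URI qualify(String path, URI defaultFs) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p;
        }
        return defaultFs.resolve(p);
    }

    public static void main(String[] args) {
        // Assumed defaults: local on the client, HDFS inside the map task.
        URI clientDefault = URI.create("file:///");
        URI taskDefault = URI.create("hdfs://namenode:8020/"); // hypothetical address

        // The same scheme-less string names two different locations.
        System.out.println(qualify("/5/clusters-0", clientDefault));
        System.out.println(qualify("/5/clusters-0", taskDefault));

        // With an explicit scheme, both sides agree on the location.
        String full = "hdfs://namenode:8020/5/clusters-0";
        System.out.println(qualify(full, clientDefault));
        System.out.println(qualify(full, taskDefault));
    }
}
```

Under that assumption, a fully qualified output URI resolves to the same 
location on both sides, which matches the behaviour reported above.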



--
This message was sent by Atlassian JIRA
(v6.2#6252)
