[ https://issues.apache.org/jira/browse/MAHOUT-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter resolved MAHOUT-1452.
----------------------------------------
    Resolution: Not a Problem

> Kmeans unexpected behaviour after removal of file scheme in output path for method mapreduce
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1452
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1452
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>         Environment: CentOS, CDH4.6 (3 Node Cluster)
>            Reporter: Bikash Gupta
>            Priority: Minor
>              Labels: patch
>             Fix For: 1.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> If the hdfs scheme is removed from the output path, KMeans creates clusters-0 on the local
> file system and clusters-1 in HDFS, and then fails with an error because it expects
> clusters-0 to be in HDFS. See the stack trace below:
>
> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
>       at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>       at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>       at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>       at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>
> If you provide an HDFS URI as the output path, it works like a charm.
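The symptom in the report (the same scheme-less path `/5/clusters-0` ending up on different file systems) is consistent with a path that carries no scheme being resolved against whatever default file system the resolving process is configured with. The following is only a sketch of that idea using `java.net.URI` from the JDK. It is not Mahout or Hadoop code, the `qualify` helper and the `hdfs://namenode:8020/` address are hypothetical, and it does not claim to reproduce the exact cause of this issue.

```java
import java.net.URI;

public class PathQualifyDemo {

    // Hypothetical helper: qualify a path against a default file system URI.
    // A path with an explicit scheme is returned unchanged; a scheme-less path
    // inherits scheme and authority from the supplied default.
    static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // explicit scheme wins, the default is ignored
        }
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        // Assumed defaults for illustration only.
        URI hdfsDefault = URI.create("hdfs://namenode:8020/");
        URI localDefault = URI.create("file:///");

        // The same scheme-less path lands on different file systems
        // depending on which default is in effect.
        System.out.println(qualify("/5/clusters-0", hdfsDefault));
        System.out.println(qualify("/5/clusters-0", localDefault));

        // A fully qualified path is unaffected by the default.
        System.out.println(qualify("hdfs://namenode:8020/5/clusters-0", localDefault));
    }
}
```

Under this reading, passing a fully qualified output URI removes the dependency on the default, which matches the reporter's observation that an HDFS URI in the output path works.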