[ https://issues.apache.org/jira/browse/MAHOUT-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550729#comment-13550729 ]
Shengchao Ding commented on MAHOUT-1137:
----------------------------------------

This is the same issue as the one reported in https://issues.apache.org/jira/browse/MAHOUT-1061. After modifying SplitInputJob.java, it works well on CDH4.

> Same to MAHOUT-1061: ClassNotFoundException in mahout split -xm mapreduce: org.apache.mahout.utils.SplitInputJob$SplitInputMapper
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1137
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1137
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.7
>         Environment: Cloudera virtual machine cdh4.1.2
> Linux localhost.localdomain 2.6.18-308.8.2.el5 #1 SMP Tue Jun 12 09:58:12 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Shengchao Ding
>              Labels: ClassNotFoundException, SplitInputJob
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm running the 20 newsgroups example on a CDH4.1.2 virtual machine.
> It runs smoothly, but fails when I change the split command to:
>
> mahout split \
>   -i newsgroup/vectors \
>   --trainingOutput newsgroup/train-vectors \
>   --testOutput newsgroup/test-vectors \
>   --randomSelectionPct 40 --overwrite --sequenceFiles -xm mapreduce \
>   -mro newsgroup/mro
>
> The only difference from the original command is that the execution method is changed to mapreduce, whereas the original example runs sequentially.
> I got the following exception:
>
> Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.mahout.utils.SplitInputJob$SplitInputMapper not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1571)
>     at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:685)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
> Caused by: java.lang.ClassNotFoundException: Class org.apache.mahout.utils.SplitInputJob$SplitInputMapper not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1477)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1569)
>     ... 8 more
>
> I checked the Mahout package in the distribution:
>
> [cloudera@localhost ~]$ jar tf /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar | grep SplitInput
> org/apache/mahout/utils/SplitInputJob$SplitInputReducer.class
> org/apache/mahout/utils/SplitInputJob$SplitInputMapper.class
> org/apache/mahout/utils/SplitInputJob$SplitInputComparator.class
> org/apache/mahout/utils/SplitInputJob.class
> org/apache/mahout/utils/SplitInput.class
> org/apache/mahout/utils/SplitInput$SplitCallback.class

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
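The `jar tf ... | grep SplitInput` output in the report shows that `SplitInputJob$SplitInputMapper` is packaged in the job jar, so the `ClassNotFoundException` most likely means the jar did not reach the map task's classpath on YARN, rather than that the class is missing. A frequent cause of this symptom in Hadoop drivers is a job that never calls `Job.setJarByClass(...)`; the comment does not say what was changed in SplitInputJob.java, so treat that as an assumption. The jar check itself can be scripted in Java with `java.util.jar.JarFile`, as in the minimal sketch below. `JarClassCheck` and its methods are hypothetical helpers written for illustration, not part of Mahout or Hadoop.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper (not part of Mahout or Hadoop): mimics
 * `jar tf <jar> | grep <pattern>` and checks whether a class, given by its
 * binary name (nested classes use '$'), is packaged in a jar.
 */
public class JarClassCheck {

    /** Converts a binary name like "a.b.Outer$Inner" to the entry path "a/b/Outer$Inner.class". */
    static String entryPathFor(String binaryClassName) {
        return binaryClassName.replace('.', '/') + ".class";
    }

    /** Returns every entry name that contains {@code pattern}, in the order stored in the jar. */
    static List<String> grepEntries(String jarPath, String pattern) throws IOException {
        List<String> matches = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.contains(pattern)) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }

    /** True if the jar has a .class entry for the given binary class name. */
    static boolean containsClass(String jarPath, String binaryClassName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryPathFor(binaryClassName)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the job jar; args[1] (optional): binary class name to look for
        String jarPath = args[0];
        String className = args.length > 1
                ? args[1]
                : "org.apache.mahout.utils.SplitInputJob$SplitInputMapper";
        for (String name : grepEntries(jarPath, "SplitInput")) {
            System.out.println(name);
        }
        System.out.println(className + " packaged: " + containsClass(jarPath, className));
    }
}
```

For example: java JarClassCheck /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar 'org.apache.mahout.utils.SplitInputJob$SplitInputMapper' (the single quotes stop the shell from expanding `$SplitInputMapper`). A positive result here only confirms the packaging; it says nothing about whether the jar is shipped to the task nodes, which is why the driver-side fix matters.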