Hello,

On Mon, Mar 7, 2011 at 12:22 AM, Clement Jebakumar <jeba.r...@gmail.com> wrote:
> I want to parse XML file in hadoop.
>
> I have my own mapper class called "MyXMLMapper"...
>
> $ ./bin/hadoop jar hadoop-streaming.jar -inputreader
> "StreamXmlRecordReader,begin='<Page',end='</Page>'" -file
> /home/hdfs/XML2HBase.jar -mapper MyXMLMapper -output /temp/sample -input
> /temp/example.xml
> Caused by: java.io.IOException: Cannot run program "DmozXMLMapper":
> java.io.IOException: error=2, No such file or directory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> Caused by: java.io.IOException: java.io.IOException: error=2, No such file
> or directory
From what I understand, your MyXMLMapper/DmozXMLMapper Java class is not
being found by the streaming runner, so it probably treats the mapper name
as a shell program instead of a Java class, and hence fails with "No such
file or directory".

The issue with your command is that "-file" merely ships the given files
out to the MR cluster; it does not add them to the runtime classpath of
your mappers/reducers. Using "-libjars" instead for your XML2HBase.jar
file, to specify it as a dependent jar, should solve this, if I am right.

You can see all the other available options of hadoop-streaming by
passing "-info".

Hope this helps.

-- 
Harsh J
www.harshj.com
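Concretely, your invocation could look something like the sketch below (just a
sketch, reusing the paths and class names from your original command and
untested on my end; note that generic options like "-libjars" must come before
the streaming-specific options):

```shell
# Sketch only: -libjars ships XML2HBase.jar AND puts it on the task
# classpath, so the MyXMLMapper class can be resolved at runtime.
$ ./bin/hadoop jar hadoop-streaming.jar \
    -libjars /home/hdfs/XML2HBase.jar \
    -inputreader "StreamXmlRecordReader,begin='<Page',end='</Page>'" \
    -mapper MyXMLMapper \
    -input /temp/example.xml \
    -output /temp/sample
```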