Hello,

On Mon, Mar 7, 2011 at 12:22 AM, Clement Jebakumar <jeba.r...@gmail.com> wrote:
> I want to parse XML file in hadoop.
>
> I have my own mapper class called "MyXMLMapper"...
>
> $ ./bin/hadoop jar hadoop-streaming.jar -inputreader
> "StreamXmlRecordReader,begin='<Page',end='</Page>'" -file
> /home/hdfs/XML2HBase.jar -mapper MyXMLMapper -output /temp/sample -input
> /temp/example.xml
> Caused by: java.io.IOException: Cannot run program "DmozXMLMapper":
> java.io.IOException: error=2, No such file or directory
>       at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> Caused by: java.io.IOException: java.io.IOException: error=2, No such file
> or directory

From what I understand, your MyXMLMapper/DmozXMLMapper Java class is
not being found by the streaming runner, so it probably treats the
name as a shell program instead of a Java class, and hence fails with
"No such file or directory".

The issue with your command is that "-file" merely ships the given
files out to the MR cluster but does not add them to the runtime
classpath of your mappers/reducers. Passing your XML2HBase.jar via
"-libjars" instead, so that it is treated as a dependent jar, should
solve this, if I am right.
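If so, something along these lines should do it (same paths and class
as in your example; note that generic options such as -libjars go
before the streaming-specific ones, and that you would need the fully
qualified class name if MyXMLMapper sits inside a package):

$ ./bin/hadoop jar hadoop-streaming.jar \
    -libjars /home/hdfs/XML2HBase.jar \
    -inputreader "StreamXmlRecordReader,begin='<Page',end='</Page>'" \
    -mapper MyXMLMapper \
    -input /temp/example.xml \
    -output /temp/sample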

You can see all the other options hadoop-streaming supports by
passing "-info", e.g.:

-- 
Harsh J
www.harshj.com
