Please read the source code of DataJoinJob.java. Then you would know that the last parameter should be the number of reducers.
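To illustrate why a misplaced argument produces exactly this exception, here is a minimal sketch. It is not the DataJoinJob code: `ReducerSlotSketch`, `reducerCount`, and the `slot` parameter are hypothetical names, and which position DataJoinJob really reads as the reducer count is an assumption you should confirm in DataJoinJob.java. The only point is that whichever slot is parsed as an integer must hold a number, not a class name.

```java
public class ReducerSlotSketch {

    // Hypothetical helper: reads the reducer count from one positional slot.
    // Which slot DataJoinJob actually uses is NOT shown in this thread;
    // confirm it in DataJoinJob.java.
    static int reducerCount(String[] args, int slot) {
        return Integer.parseInt(args[slot]);
    }

    // The arguments from the quoted command line, after the class name.
    static final String[] ARGS = {
        "datajoinIn",
        "datajoinOut",
        "org.apache.hadoop.io.Text",
        "1",
        "org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper",
        "org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer",
        "org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput",
        "org.apache.hadoop.io.Text"
    };

    public static void main(String[] unused) {
        // Slot 3 holds "1", so it parses cleanly.
        System.out.println("slot 3 -> " + reducerCount(ARGS, 3));

        // The last slot holds a class name, so Integer.parseInt rejects it.
        try {
            reducerCount(ARGS, ARGS.length - 1);
        } catch (NumberFormatException e) {
            System.out.println("last slot -> " + e.getMessage());
        }
    }
}
```

If the position that gets parsed holds org.apache.hadoop.io.Text, you get the same "For input string" message as in your trace, so it is worth comparing the order of your arguments with the order the source reads them in.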
On Wed, Jul 14, 2010 at 2:33 AM, Denim Live <denim.l...@yahoo.com> wrote:

> Hi,
>
> Thanks. I have located the datajoin jar. Now I execute the program the same
> way as specified in the readme file of the datajoin. I have two text files
> A and B with the same content as mentioned in the
> $Hadoop_Home/src/contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/readme.txt
> file. The command line I use is:
>
> bin/hadoop jar hadoop-0.19.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoinIn datajoinOut
> org.apache.hadoop.io.Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput
> org.apache.hadoop.io.Text
>
> But I get the following error:
>
> Using SequenceFileInputFormat: datajoinOut
> java.lang.NumberFormatException: For input string: "org.apache.hadoop.io.Text"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Integer.parseInt(Integer.java:449)
>         at java.lang.Integer.parseInt(Integer.java:499)
>         at org.apache.hadoop.contrib.utils.join.DataJoinJob.createDataJoinJob(DataJoinJob.java:70)
>         at org.apache.hadoop.contrib.utils.join.DataJoinJob.main(DataJoinJob.java:165)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> I don't understand why it is giving the number format exception, and why it
> mentions datajoinOut there. My input files contain records with
> tab-separated fields, as described in the readme file. Should I use sequence
> files for input? I have tried that as well, but I get the same error.
>
> Any help in this regard is highly appreciated. I have tried this for so long
> in vain.
>
> Thanks in advance
>
> ________________________________
> From: Hemanth Yamijala <yhema...@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Mon, July 12, 2010 9:21:31 AM
> Subject: Re: Hadoop's datajoin
>
> Hi,
>
> > I am trying to use Hadoop's datajoin for joining two relations. According
> > to the Readme file of datajoin, it gives the following syntax:
> >
> > $HADOOP_HOME/bin/hadoop jar hadoop-datajoin-examples.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > datajoin/output
> > Text 1 org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> >
> > But I do not find hadoop-datajoin-examples.jar anywhere in my Hadoop_home.
> > Can anyone tell me how to produce it or where to find it?
>
> Datajoin is a contrib module. So, you will typically find it under
> contrib/datajoin/. The name could be something slightly different - it
> could have a version number and other things.
>
> Thanks
> Hemanth
>
> > Thanks in advance.