Fantastic! Worked like a charm. Thanks so much, Bochun.

For those who are facing similar issues, here is the command and output:

$ hadoop jar ../MyHadoopProgram.jar com.ABC.MyHadoopProgram -libjars
~/CDH3/extJars/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output
11/10/08 17:51:45 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/08 17:51:46 INFO mapred.JobClient: Running job: job_201110072230_0005
11/10/08 17:51:47 INFO mapred.JobClient:  map 0% reduce 0%
11/10/08 17:51:58 INFO mapred.JobClient:  map 50% reduce 0%
11/10/08 17:51:59 INFO mapred.JobClient:  map 100% reduce 0%
11/10/08 17:52:08 INFO mapred.JobClient:  map 100% reduce 100%
11/10/08 17:52:10 INFO mapred.JobClient: Job complete: job_201110072230_0005
11/10/08 17:52:10 INFO mapred.JobClient: Counters: 23
11/10/08 17:52:10 INFO mapred.JobClient:   Job Counters
11/10/08 17:52:10 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/08 17:52:10 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17981
11/10/08 17:52:10 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient:     Launched map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient:     Data-local map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9421
11/10/08 17:52:10 INFO mapred.JobClient:   FileSystemCounters
11/10/08 17:52:10 INFO mapred.JobClient:     FILE_BYTES_READ=606
11/10/08 17:52:10 INFO mapred.JobClient:     HDFS_BYTES_READ=56375
11/10/08 17:52:10 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=157057
11/10/08 17:52:10 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=504
11/10/08 17:52:10 INFO mapred.JobClient:   Map-Reduce Framework
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce input groups=24
11/10/08 17:52:10 INFO mapred.JobClient:     Combine output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Map input records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce shuffle bytes=306
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Spilled Records=48
11/10/08 17:52:10 INFO mapred.JobClient:     Map output bytes=552
11/10/08 17:52:10 INFO mapred.JobClient:     Map input bytes=54923
11/10/08 17:52:10 INFO mapred.JobClient:     Combine input records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Map output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     SPLIT_RAW_BYTES=240
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce input records=24
$
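
For anyone wondering why the order matters: as I understand it, RunJar
treats the first argument after the jar as the main class when the jar's
manifest has no Main-Class entry, and only the driver's own
GenericOptionsParser (via ToolRunner) then consumes -libjars. So the
general form is:

$ hadoop jar <app.jar> <main-class> -libjars <extra1.jar,extra2.jar> <input> <output>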



Appreciate your help.
PD.

On Fri, Oct 7, 2011 at 11:31 PM, Bochun Bai <b...@bbcn.name> wrote:

> To build a single big jar file bundling dependencies using Maven, I suggest this plugin:
>    http://anydoby.com/fatjar/usage.html
> But I prefer not to do so, because the classpath order differs
> between environments.
>
> I guess your old myHadoopProgram.jar contains Main-Class meta info.
> So the following ***xxx*** part was omitted. It originally looks like:
>
> hadoop jar jar/myHadoopProgram.jar ***com.ABC.xxx*** -libjars
> ../lib/json-rpc-1.0.jar
> /usr/PD/input/sample22.json /usr/PD/output/
>
> I suggest you add the Main-Class meta info following this:
>
> http://maven.apache.org/plugins/maven-assembly-plugin/usage.html#Advanced_Configuration
> or
>    pay attention to the order of <class> and <-libjars ..> using:
>    hadoop jar <jar> <class> <-libjars ...> <input> <output>
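>
> For example, a minimal sketch of the Main-Class route via the
> maven-jar-plugin (assuming the main class is com.ABC.MyHadoopProgram;
> adapt the name to yours):
>
>   <plugin>
>     <groupId>org.apache.maven.plugins</groupId>
>     <artifactId>maven-jar-plugin</artifactId>
>     <configuration>
>       <archive>
>         <manifest>
>           <mainClass>com.ABC.MyHadoopProgram</mainClass>
>         </manifest>
>       </archive>
>     </configuration>
>   </plugin>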
>
> On Sat, Oct 8, 2011 at 12:05 PM, Periya.Data <periya.d...@gmail.com>
> wrote:
> > Hi all,
> >    I am migrating from ant builds to maven. So, brand new to Maven and do
> > not yet understand many parts of it.
> >
> > Problem: I have a perfectly working map-reduce program (working by ant
> > build). This program needs an external jar file (json-rpc-1.0.jar). So,
> > when I run the program, I do the following to get a nice output:
> >
> > $ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar
> > /usr/PD/input/sample22.json /usr/PD/output/
> >
> > (note that I include the external jar file by the "-libjars" option as
> > mentioned in the "Hadoop: The Definitive Guide 2nd Edition" - page 253).
> > Everything is fine with my ant build.
> >
> > So, now, I move on to Maven. I had some trouble getting my pom.xml right.
> > I am still unsure if it is right, but, it builds "successfully" (the
> > resulting jar file has the class files of my program). The essential part
> > of my pom.xml has the two following dependencies (a complete pom.xml is
> > at the end of this email).
> >
> > <!-- org.json.* -->
> >     <dependency>
> >       <groupId>com.metaparadigm</groupId>
> >       <artifactId>json-rpc</artifactId>
> >       <version>1.0</version>
> >     </dependency>
> >
> >  <!-- org.apache.hadoop.* -->
> >     <dependency>
> >       <groupId>org.apache.hadoop</groupId>
> >       <artifactId>hadoop-core</artifactId>
> >       <version>0.20.2</version>
> >       <scope>provided</scope>
> >     </dependency>
> >
> >
> > I try to run it like this:
> >
> > $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar
> > com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output
> > Exception in thread "main" java.lang.ClassNotFoundException: -libjars
> >    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >    at java.security.AccessController.doPrivileged(Native Method)
> >    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >    at java.lang.Class.forName0(Native Method)
> >    at java.lang.Class.forName(Class.java:247)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
> > $
> >
> > Then, I thought, maybe it is not necessary to include the classpath. So,
> > I ran with the following command:
> >
> > $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar
> > /usr/PD/input/sample22.json /usr/PD/output
> > Exception in thread "main" java.lang.ClassNotFoundException: -libjars
> >    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >    at java.security.AccessController.doPrivileged(Native Method)
> >    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >    at java.lang.Class.forName0(Native Method)
> >    at java.lang.Class.forName(Class.java:247)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
> > $
> >
> > Question: What am I doing wrong? I know, since I am new to Maven, I may
> > be missing some key pieces/concepts. What really happens when one builds the
> > classes, where my java program imports org.json.JSONArray and
> > org.json.JSONObject? This import is just for compilation I suppose and it
> > does not get "embedded" into the final jar. Am I right?
> >
> > I want to either bundle-up the external jar(s) into a single jar and
> > conveniently run hadoop using that, or, know how to include the external
> > jars in my command-line.
> >
> >
> > This is what I have:
> > - maven 3.0.3
> > - Mac OSX
> > - Java 1.6.0_26
> > - Hadoop - CDH 0.20.2-cdh3u0
> >
> > I have Googled, looked at Tom White's github repo (
> > https://github.com/cloudera/repository-example/blob/master/pom.xml). The
> > more I Google, the more confused I get.
> >
> > Any help is highly appreciated.
> >
> > Thanks,
> > PD.
> >
> >
> >
> >
> >
> > <project xmlns="http://maven.apache.org/POM/4.0.0"
> >   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
> >   http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >  <modelVersion>4.0.0</modelVersion>
> >
> >  <groupId>com.ABC</groupId>
> >  <artifactId>MyHadoopProgram</artifactId>
> >  <version>1.0</version>
> >  <packaging>jar</packaging>
> >
> >  <name>MyHadoopProgram</name>
> >  <url>http://maven.apache.org</url>
> >
> >  <properties>
> >    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >  </properties>
> >
> >  <dependencies>
> >
> >  <!-- org.json.* -->
> >     <dependency>
> >       <groupId>com.metaparadigm</groupId>
> >       <artifactId>json-rpc</artifactId>
> >       <version>1.0</version>
> >     </dependency>
> >
> >  <!-- org.apache.hadoop.* -->
> >     <dependency>
> >       <groupId>org.apache.hadoop</groupId>
> >       <artifactId>hadoop-core</artifactId>
> >       <version>0.20.2</version>
> >       <scope>provided</scope>
> >     </dependency>
> >
> >  </dependencies>
> > </project>
> >
>
