Fantastic! Worked like a charm. Thanks very much, Bochun. For those who are facing similar issues, here is the command and output:
$ hadoop jar ../MyHadoopProgram.jar com.ABC.MyHadoopProgram -libjars ~/CDH3/extJars/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output
11/10/08 17:51:45 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/08 17:51:46 INFO mapred.JobClient: Running job: job_201110072230_0005
11/10/08 17:51:47 INFO mapred.JobClient: map 0% reduce 0%
11/10/08 17:51:58 INFO mapred.JobClient: map 50% reduce 0%
11/10/08 17:51:59 INFO mapred.JobClient: map 100% reduce 0%
11/10/08 17:52:08 INFO mapred.JobClient: map 100% reduce 100%
11/10/08 17:52:10 INFO mapred.JobClient: Job complete: job_201110072230_0005
11/10/08 17:52:10 INFO mapred.JobClient: Counters: 23
11/10/08 17:52:10 INFO mapred.JobClient:   Job Counters
11/10/08 17:52:10 INFO mapred.JobClient:     Launched reduce tasks=1
11/10/08 17:52:10 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17981
11/10/08 17:52:10 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient:     Launched map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient:     Data-local map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9421
11/10/08 17:52:10 INFO mapred.JobClient:   FileSystemCounters
11/10/08 17:52:10 INFO mapred.JobClient:     FILE_BYTES_READ=606
11/10/08 17:52:10 INFO mapred.JobClient:     HDFS_BYTES_READ=56375
11/10/08 17:52:10 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=157057
11/10/08 17:52:10 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=504
11/10/08 17:52:10 INFO mapred.JobClient:   Map-Reduce Framework
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce input groups=24
11/10/08 17:52:10 INFO mapred.JobClient:     Combine output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Map input records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce shuffle bytes=306
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Spilled Records=48
11/10/08 17:52:10 INFO mapred.JobClient:     Map output bytes=552
11/10/08 17:52:10 INFO mapred.JobClient:     Map input bytes=54923
11/10/08 17:52:10 INFO mapred.JobClient:     Combine input records=24
11/10/08 17:52:10 INFO mapred.JobClient:     Map output records=24
11/10/08 17:52:10 INFO mapred.JobClient:     SPLIT_RAW_BYTES=240
11/10/08 17:52:10 INFO mapred.JobClient:     Reduce input records=24
$

Appreciate your help.
PD.

On Fri, Oct 7, 2011 at 11:31 PM, Bochun Bai <b...@bbcn.name> wrote:
> To make one big jar file with the dependencies bundled in using Maven, I suggest this plugin:
> http://anydoby.com/fatjar/usage.html
> But I prefer not doing so, because the classpath order is different in
> different environments.
>
> I guess your old myHadoopProgram.jar most likely contains the Main-Class meta info,
> so the ***xxx*** part below could be omitted. The command in full looks like:
>
> hadoop jar jar/myHadoopProgram.jar ***com.ABC.xxx*** -libjars ../lib/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output/
>
> I suggest you add the Main-Class meta entry following this:
> http://maven.apache.org/plugins/maven-assembly-plugin/usage.html#Advanced_Configuration
> or
> pay attention to the order of <class> and <-libjars ...> using:
> hadoop jar <jar> <class> <-libjars ...> <input> <output>
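As a side note on the Main-Class suggestion: the assembly-plugin page linked above sets that manifest entry while building a bundled jar, but the same <archive><manifest> block can also be applied to the plain project jar through the standard maven-jar-plugin (not discussed in this thread). A minimal sketch, using the driver class from this thread as the main class:

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <!-- Written into META-INF/MANIFEST.MF as Main-Class, so
                   "hadoop jar MyHadoopProgram-1.0.jar -libjars ... <input> <output>"
                   no longer needs the driver class named on the command line
                   (assuming the driver parses -libjars via ToolRunner /
                   GenericOptionsParser, as the old ant-built jar evidently did). -->
              <mainClass>com.ABC.MyHadoopProgram</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>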
>
> On Sat, Oct 8, 2011 at 12:05 PM, Periya.Data <periya.d...@gmail.com> wrote:
> > Hi all,
> > I am migrating from ant builds to Maven, so I am brand new to Maven and do
> > not yet understand many parts of it.
> >
> > Problem: I have a perfectly working map-reduce program (working via the ant
> > build). This program needs an external jar file (json-rpc-1.0.jar), so when
> > I run the program, I do the following to get a nice output:
> >
> > $ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output/
> >
> > (Note that I include the external jar file via the "-libjars" option, as
> > mentioned in "Hadoop: The Definitive Guide", 2nd edition, page 253.)
> > Everything is fine with my ant build.
> >
> > So, now, I move on to Maven. I had some trouble getting my pom.xml right. I
> > am still unsure if it is right, but it builds "successfully" (the resulting
> > jar file has the class files of my program). The essential part of my
> > pom.xml has the following two dependencies (the complete pom.xml is at the end
> > of this email):
> >
> > <!-- org.json.* -->
> > <dependency>
> >   <groupId>com.metaparadigm</groupId>
> >   <artifactId>json-rpc</artifactId>
> >   <version>1.0</version>
> > </dependency>
> >
> > <!-- org.apache.hadoop.* -->
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-core</artifactId>
> >   <version>0.20.2</version>
> >   <scope>provided</scope>
> > </dependency>
> >
> > I try to run it like this:
> >
> > $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output
> > Exception in thread "main" java.lang.ClassNotFoundException: -libjars
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
> > $
> >
> > Then I thought that maybe it is not necessary to include the class name, so I
> > ran the following command:
> >
> > $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output
> > Exception in thread "main" java.lang.ClassNotFoundException: -libjars
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:247)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
> > $
> >
> > Question: What am I doing wrong? Since I am new to Maven, I know I may be
> > missing some key pieces/concepts. What really happens at build time, given that
> > my Java program imports org.json.JSONArray and org.json.JSONObject? This
> > import is just for compilation, I suppose, and it does not get "embedded"
> > into the final jar. Am I right?
> >
> > I want to either bundle up the external jar(s) into a single jar and
> > conveniently run hadoop using that, or know how to include the external
> > jars on my command line.
> >
> > This is what I have:
> > - maven 3.0.3
> > - Mac OSX
> > - Java 1.6.0_26
> > - Hadoop - CDH 0.20.2-cdh3u0
> >
> > I have Googled, and looked at Tom White's github repo (
> > https://github.com/cloudera/repository-example/blob/master/pom.xml). The
> > more I Google, the more confused I get.
> >
> > Any help is highly appreciated.
> >
> > Thanks,
> > PD.
> >
> >
> > <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
> >   <modelVersion>4.0.0</modelVersion>
> >
> >   <groupId>com.ABC</groupId>
> >   <artifactId>MyHadoopProgram</artifactId>
> >   <version>1.0</version>
> >   <packaging>jar</packaging>
> >
> >   <name>MyHadoopProgram</name>
> >   <url>http://maven.apache.org</url>
> >
> >   <properties>
> >     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >   </properties>
> >
> >   <dependencies>
> >
> >     <!-- org.json.* -->
> >     <dependency>
> >       <groupId>com.metaparadigm</groupId>
> >       <artifactId>json-rpc</artifactId>
> >       <version>1.0</version>
> >     </dependency>
> >
> >     <!-- org.apache.hadoop.* -->
> >     <dependency>
> >       <groupId>org.apache.hadoop</groupId>
> >       <artifactId>hadoop-core</artifactId>
> >       <version>0.20.2</version>
> >       <scope>provided</scope>
> >     </dependency>
> >
> >   </dependencies>
> > </project>
> >
>
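For the other route PD mentions above (bundling the external jar into a single runnable jar), a <build> section roughly following the maven-assembly-plugin usage page Bochun linked could be added to a pom like the one above. This is only a sketch: the plugin version is illustrative, and the main class is the driver class used earlier in the thread.

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.2.1</version>
        <configuration>
          <!-- Built-in descriptor: packs the runtime dependencies
               (here only json-rpc-1.0, since hadoop-core is "provided")
               into a second artifact named
               MyHadoopProgram-1.0-jar-with-dependencies.jar -->
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass>com.ABC.MyHadoopProgram</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <!-- Bind the assembly to "mvn package" so the bundled jar
               is produced alongside the normal project jar. -->
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

Because hadoop-core is declared with provided scope, only json-rpc-1.0 should end up inside the bundled jar. Running "mvn package" would then also produce target/MyHadoopProgram-1.0-jar-with-dependencies.jar, which could be passed straight to "hadoop jar" without any -libjars option; the argument-order fix shown at the top of the thread remains the lighter-weight alternative.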