Simple Hadoop program build with Maven

2011-10-07 Thread Periya.Data
Hi all,
I am migrating from Ant builds to Maven, so I am brand new to Maven and do
not yet understand many parts of it.

Problem: I have a perfectly working map-reduce program (working with the Ant
build). This program needs an external jar file (json-rpc-1.0.jar). When
I run the program, I do the following and get a nice output:

$ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar
/usr/PD/input/sample22.json /usr/PD/output/

(Note that I include the external jar file via the "-libjars" option, as
described in "Hadoop: The Definitive Guide", 2nd Edition, page 253.)
Everything is fine with my Ant build.

So now I move on to Maven. I had some trouble getting my pom.xml right. I
am still unsure whether it is right, but it builds "successfully" (the resulting
jar file has the class files of my program). The essential part of my
pom.xml is the following two dependencies (the complete pom.xml is at the end
of this email).


 
<dependency>
  <groupId>com.metaparadigm</groupId>
  <artifactId>json-rpc</artifactId>
  <version>1.0</version>
</dependency>

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.2</version>
  <scope>provided</scope>
</dependency>
 


I try to run it like this:

$ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar
com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output
Exception in thread "main" java.lang.ClassNotFoundException: -libjars
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
$

Then I thought maybe it is not necessary to include the class name, so I
ran the following command:

$ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar
/usr/PD/input/sample22.json /usr/PD/output
Exception in thread "main" java.lang.ClassNotFoundException: -libjars
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:179)
$

Question: What am I doing wrong? Since I am new to Maven, I know I may be
missing some key pieces/concepts. What really happens when Maven builds the
classes, given that my Java program imports org.json.JSONArray and
org.json.JSONObject? I suppose these imports are resolved only at compile time
and the dependency does not get "embedded" into the final jar. Am I right?

I want to either bundle the external jar(s) into a single jar and
conveniently run hadoop using that, or know how to include the external
jars on my command line.


This is what I have:
- Maven 3.0.3
- Mac OS X
- Java 1.6.0_26
- Hadoop CDH3 (0.20.2-cdh3u0)

I have Googled and looked at Tom White's GitHub repo (
https://github.com/cloudera/repository-example/blob/master/pom.xml). The
more I Google, the more confused I get.

Any help is highly appreciated.

Thanks,
PD.





<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.ABC</groupId>
  <artifactId>MyHadoopProgram</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>MyHadoopProgram</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>com.metaparadigm</groupId>
      <artifactId>json-rpc</artifactId>
      <version>1.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>


Re: Simple Hadoop program build with Maven

2011-10-07 Thread Bochun Bai
To bundle everything into one big jar file with Maven, I suggest this plugin:
http://anydoby.com/fatjar/usage.html
But I prefer not to do so, because the classpath order can differ between
environments.
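
For a Maven-native take on the same idea, a rough sketch with the stock
maven-assembly-plugin and its jar-with-dependencies descriptor would look
something like the following (the mainClass value com.ABC.MyHadoopProgram is
assumed from your commands; adjust it to your real driver class):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- builds an extra <artifactId>-<version>-jar-with-dependencies.jar
               that also contains the classes of compile/runtime dependencies -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <!-- assumed driver class -->
            <mainClass>com.ABC.MyHadoopProgram</mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <!-- run the assembly automatically during "mvn package" -->
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

That should produce target/MyHadoopProgram-1.0-jar-with-dependencies.jar,
which you could run with plain "hadoop jar ... <input> <output>" and no
-libjars. The provided-scope hadoop-core is not bundled, only compile/runtime
dependencies such as json-rpc.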

I guess your old myHadoopProgram.jar contained Main-Class meta info in its
manifest, so the ***com.ABC.xxx*** part could be omitted. Written out in full,
it would look like:

hadoop jar jar/myHadoopProgram.jar ***com.ABC.xxx*** -libjars
../lib/json-rpc-1.0.jar
/usr/PD/input/sample22.json /usr/PD/output/

I suggest you add the Main-Class meta info by following this:

http://maven.apache.org/plugins/maven-assembly-plugin/usage.html#Advanced_Configuration

or pay attention to the order of <main-class> and <-libjars ...>, using:

hadoop jar <jar> <main-class> <-libjars ...> <args>

(When the jar's manifest has no Main-Class, RunJar treats the first argument
after the jar as the main class, which is why "-libjars" is being looked up as
a class name in your stack trace.)
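
If you would rather keep the thin jar plus -libjars, a minimal sketch of
adding the Main-Class to the manifest with the maven-jar-plugin (same assumed
driver class) is:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <!-- assumed driver class; with this manifest entry RunJar no longer
                 needs the class name on the command line -->
            <mainClass>com.ABC.MyHadoopProgram</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
  </plugins>
</build>

With the manifest entry in place, "hadoop jar myHadoopProgram.jar -libjars
../json-rpc-1.0.jar <input> <output>" should work again. Note that -libjars is
only picked up if the driver hands its arguments to GenericOptionsParser, for
example by running the job through ToolRunner.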



Re: Simple Hadoop program build with Maven

2011-10-08 Thread Periya.Data
Fantastic! Worked like a charm. Thanks much, Bochun.

For those who are facing similar issues, here is the command and output:

$ hadoop jar ../MyHadoopProgram.jar com.ABC.MyHadoopProgram -libjars
~/CDH3/extJars/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output
11/10/08 17:51:45 INFO mapred.FileInputFormat: Total input paths to process
: 1
11/10/08 17:51:46 INFO mapred.JobClient: Running job: job_201110072230_0005
11/10/08 17:51:47 INFO mapred.JobClient:  map 0% reduce 0%
11/10/08 17:51:58 INFO mapred.JobClient:  map 50% reduce 0%
11/10/08 17:51:59 INFO mapred.JobClient:  map 100% reduce 0%
11/10/08 17:52:08 INFO mapred.JobClient:  map 100% reduce 100%
11/10/08 17:52:10 INFO mapred.JobClient: Job complete: job_201110072230_0005
11/10/08 17:52:10 INFO mapred.JobClient: Counters: 23
11/10/08 17:52:10 INFO mapred.JobClient:   Job Counters
11/10/08 17:52:10 INFO mapred.JobClient: Launched reduce tasks=1
11/10/08 17:52:10 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17981
11/10/08 17:52:10 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
11/10/08 17:52:10 INFO mapred.JobClient: Launched map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient: Data-local map tasks=2
11/10/08 17:52:10 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9421
11/10/08 17:52:10 INFO mapred.JobClient:   FileSystemCounters
11/10/08 17:52:10 INFO mapred.JobClient: FILE_BYTES_READ=606
11/10/08 17:52:10 INFO mapred.JobClient: HDFS_BYTES_READ=56375
11/10/08 17:52:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=157057
11/10/08 17:52:10 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=504
11/10/08 17:52:10 INFO mapred.JobClient:   Map-Reduce Framework
11/10/08 17:52:10 INFO mapred.JobClient: Reduce input groups=24
11/10/08 17:52:10 INFO mapred.JobClient: Combine output records=24
11/10/08 17:52:10 INFO mapred.JobClient: Map input records=24
11/10/08 17:52:10 INFO mapred.JobClient: Reduce shuffle bytes=306
11/10/08 17:52:10 INFO mapred.JobClient: Reduce output records=24
11/10/08 17:52:10 INFO mapred.JobClient: Spilled Records=48
11/10/08 17:52:10 INFO mapred.JobClient: Map output bytes=552
11/10/08 17:52:10 INFO mapred.JobClient: Map input bytes=54923
11/10/08 17:52:10 INFO mapred.JobClient: Combine input records=24
11/10/08 17:52:10 INFO mapred.JobClient: Map output records=24
11/10/08 17:52:10 INFO mapred.JobClient: SPLIT_RAW_BYTES=240
11/10/08 17:52:10 INFO mapred.JobClient: Reduce input records=24
$



Appreciate your help.
PD.
