Yes, the job doesn't even start: there is no map phase, it fails almost instantly. I think I already tried setting the HADOOP_CLASSPATH variable, but I'll try it again.
Thanks for your help,
Marco

On 15 September 2011 13:44, Joey Echeverria <j...@cloudera.com> wrote:
> OK, but does the job even start the maps, or does it fail during initial
> setup?
>
> The reason I ask is that -libjars only adds the jar to the classpath of
> the mappers and reducers. If you need the class before the job is
> submitted to the cluster, you should do something like this:
>
> HADOOP_CLASSPATH=../umd-hadoop-core/cloud9.jar hadoop jar myjob.jar myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar home/my/pyworkspace/openAnc.xml index/ 10 1
>
> -Joey
>
> On Thu, Sep 15, 2011 at 4:24 AM, Marco Didonna <m.didonn...@gmail.com> wrote:
>> Right now I am still in standalone mode... I'd like to fix this issue
>> before starting a cluster on EC2. :)
>>
>> Thanks for your time
>>
>> Marco
>>
>> On 14 September 2011 14:04, Joey Echeverria <j...@cloudera.com> wrote:
>>> When are you getting the exception? Is it during the setup of your
>>> job, or after it's running on the cluster?
>>>
>>> -Joey
>>>
>>> On Wed, Sep 14, 2011 at 4:50 AM, Marco Didonna <m.didonn...@gmail.com> wrote:
>>>> Hello everyone,
>>>> sorry to bring this up again, but I need some clarification. I wrote a
>>>> MapReduce application that needs the Cloud9 library
>>>> (https://github.com/lintool/Cloud9). This library is packaged in a jar
>>>> file and I want to make it available to the whole cluster. So far I
>>>> have been working in standalone mode and I have unsuccessfully tried
>>>> to use the -libjars option. I always get a ClassNotDefException: the
>>>> only way I got everything to work is by copying cloud9.jar into the
>>>> hadoop/lib folder.
>>>> I suppose I cannot do that on a cluster of N machines, since I would
>>>> have to copy it to all N machines, and that approach isn't feasible.
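A minimal, self-contained sketch of the distinction Joey (and later Todd) draws between the driver and the tasks, under a deliberately simplified model rather than Hadoop's actual launcher code: each JVM resolves names only against its own list of classpath entries. The temporary jar "cloud9-stand-in" and the entry edu/umd/cloud9/Marker.txt are hypothetical stand-ins for cloud9.jar and one of its classes; a resource works as the stand-in because URLClassLoader looks up class files along the same entries it searches for resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Simplified model, not Hadoop code: each JVM resolves names only against its own
// list of classpath entries. A jar entry stands in for a class from cloud9.jar.
public class DriverVsTaskClasspath {
    // Hypothetical entry name standing in for a class packaged in cloud9.jar.
    public static final String ENTRY = "edu/umd/cloud9/Marker.txt";

    // Writes a throwaway jar containing ENTRY (a stand-in for cloud9.jar).
    public static Path buildLibraryJar() {
        try {
            Path jar = Files.createTempFile("cloud9-stand-in", ".jar");
            try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar))) {
                jos.putNextEntry(new JarEntry(ENTRY));
                jos.write("stand-in".getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if a loader whose classpath is exactly `entries` can see ENTRY.
    public static boolean visible(Path... entries) {
        try {
            URL[] urls = new URL[entries.length];
            for (int i = 0; i < entries.length; i++) {
                urls[i] = entries[i].toUri().toURL();
            }
            // Null parent: only bootstrap classes are inherited, so the classpath
            // of the JVM running this demo cannot leak into the lookup.
            try (URLClassLoader loader = new URLClassLoader(urls, null)) {
                return loader.getResource(ENTRY) != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path cloud9 = buildLibraryJar();
        // Driver JVM launched without HADOOP_CLASSPATH: the jar is not an entry.
        System.out.println("driver, no HADOOP_CLASSPATH:   " + visible());
        // Driver JVM launched with HADOOP_CLASSPATH pointing at the jar.
        System.out.println("driver, with HADOOP_CLASSPATH: " + visible(cloud9));
        // Task JVM: -libjars shipped the jar and added it to the task classpath.
        System.out.println("task, with -libjars:           " + visible(cloud9));
    }
}
```

In this model only the first lookup fails. The point carried over to the thread is that -libjars changes the task's list of entries, so any class the driver itself touches before submission also has to be reachable through HADOOP_CLASSPATH, as in Joey's command line.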
>>>>
>>>> Here's how I run the job: "hadoop jar myjob.jar myjob.driver.PreprocessANC -libjars ../umd-hadoop-core/cloud9.jar home/my/pyworkspace/openAnc.xml index/ 10 1"
>>>>
>>>> Is there some code that needs to be written in the driver in order to
>>>> have the darn library added to the "global" classpath? This -libjars
>>>> option is really poorly documented IMHO.
>>>>
>>>> Any help would be very much appreciated ;)
>>>>
>>>> Marco Didonna
>>>>
>>>> On 17 August 2011 03:57, Anty <anty....@gmail.com> wrote:
>>>>> Thanks very much, Todd. I get it.
>>>>>
>>>>> On Wed, Aug 17, 2011 at 6:23 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>> Putting files on the classpath doesn't make them accessible to the
>>>>>> JVM's resource loader. If you have dir/foo.properties, then "dir" needs
>>>>>> to be on the classpath, not "dir/foo.properties". Since the working dir
>>>>>> of the task is on the classpath, -files works because it gets the
>>>>>> properties file into a directory on the classpath.
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>> On Mon, Aug 15, 2011 at 8:09 PM, Anty <anty....@gmail.com> wrote:
>>>>>>> Thanks very much for your reply, Todd.
>>>>>>> I am at a complete loss. I want to ship a configuration file to the
>>>>>>> cluster to run my MapReduce job.
>>>>>>>
>>>>>>> If I use the -libjars option to ship the configuration file, the child
>>>>>>> JVM launched by the TaskTracker can't find the configuration file;
>>>>>>> curiously, the configuration file is already on the classpath of the
>>>>>>> child JVM.
>>>>>>>
>>>>>>> If I use the -files option to ship the configuration file, the child
>>>>>>> JVM can find the file.
>>>>>>> IMO, the difference between -libjars and -files is that -files
>>>>>>> creates a symbolic link to the configuration file in the current
>>>>>>> working directory of the child JVM.
>>>>>>>
>>>>>>> I dug into the source code, but it's so complicated that I can't
>>>>>>> figure out the root cause of this.
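Todd's point about the resource loader can be checked with plain JDK classes; nothing Hadoop-specific is assumed. This sketch writes a hypothetical dir/foo.properties (the names come from Todd's example) into a temporary directory, then asks a URLClassLoader for foo.properties twice: once with the file itself as the only classpath entry and once with its parent directory. URLClassLoader treats an entry URL ending in '/' as a directory and any other entry as a jar file, so the bare .properties entry is expected to fail (it cannot be opened as a jar and is skipped), while the directory entry succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceLookupDemo {
    public static final String NAME = "foo.properties";

    // Creates dir/foo.properties in a fresh temporary directory and returns dir.
    public static Path createConfDir() {
        try {
            Path dir = Files.createTempDirectory("conf");
            Files.write(dir.resolve(NAME), "key=value\n".getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if a loader whose only classpath entry is `entry` resolves NAME.
    public static boolean canFind(Path entry) {
        try {
            // File.toURI() appends '/' for an existing directory; URLClassLoader treats
            // entries ending in '/' as directories and everything else as jar files.
            URL url = entry.toFile().toURI().toURL();
            // Null parent keeps the demo's own classpath out of the lookup.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {url}, null)) {
                return loader.getResource(NAME) != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = createConfDir();
        System.out.println("dir/foo.properties as entry: " + canFind(dir.resolve(NAME)));
        System.out.println("dir as entry:                " + canFind(dir));
    }
}
```

This mirrors the -files behaviour discussed in the thread: the file ends up in the task's working directory, and it is that directory, not the file, that serves as the classpath entry.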
>>>>>>> So my question is: with the -libjars option the configuration file is
>>>>>>> already on the classpath, so why can't the class loader find the
>>>>>>> configuration file, when it CAN find the jar shipped with -libjars?
>>>>>>>
>>>>>>> Any help will be appreciated.
>>>>>>>
>>>>>>> On Tue, Aug 16, 2011 at 1:06 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>>> Your "driver" is the program that submits the job. The task is the
>>>>>>>> thing that runs on the cluster. They have separate classpaths.
>>>>>>>>
>>>>>>>> Better to ask on the public lists if you want a more in-depth
>>>>>>>> explanation.
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Mon, Aug 15, 2011 at 9:02 AM, Anty <anty....@gmail.com> wrote:
>>>>>>>>> Hi Todd,
>>>>>>>>> Would you please explain a little more?
>>>>>>>>>
>>>>>>>>> On Sat, Dec 11, 2010 at 2:08 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>> You need to put the library jar on your classpath (e.g. using
>>>>>>>>>> HADOOP_CLASSPATH) as well. -libjars will ship it to the cluster
>>>>>>>>>> and put it on the classpath of your task, but not the classpath of
>>>>>>>>>> your "driver" code.
>>>>>>>>>>
>>>>>>>>> I still can't understand what you mean by "but not the classpath of
>>>>>>>>> your "driver" code."
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Thu, Dec 9, 2010 at 10:29 PM, Vipul Pandey <vipan...@gmail.com> wrote:
>>>>>>>>>> > Disclaimer: a newbie!!!
>>>>>>>>>> > Howdy?
>>>>>>>>>> > Got a quick question. The -libjars option doesn't seem to work for me
>>>>>>>>>> > in pretty much my first (or maybe second) MapReduce job.
>>>>>>>>>> > Here's what I'm doing:
>>>>>>>>>> > $bin/hadoop jar sherlock.jar somepkg.FindSchoolsJob -libjars HStats-1A18.jar input output
>>>>>>>>>> >
>>>>>>>>>> > sherlock.jar has my main class (of course), FindSchoolsJob, which runs
>>>>>>>>>> > just fine by itself until I add a dependency on a class in
>>>>>>>>>> > HStats-1A18.jar. When I run the above command with -libjars specified,
>>>>>>>>>> > it fails to find my classes that 'are' inside the HStats jar file:
>>>>>>>>>> > Exception in thread "main" java.lang.NoClassDefFoundError: com/*****/HAgent
>>>>>>>>>> >   at com.*****.FindSchoolsJob.run(FindSchoolsJob.java:46)
>>>>>>>>>> >   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>>>>> >   at com.******.FindSchoolsJob.main(FindSchoolsJob.java:101)
>>>>>>>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>> >   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>> >   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>>>>> > Caused by: java.lang.ClassNotFoundException:com/*****/HAgent
>>>>>>>>>> >   at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>>>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>> >   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>>>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>>>> >   at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>>>>> >   ... 8 more
>>>>>>>>>> >
>>>>>>>>>> > My main class is defined as below:
>>>>>>>>>> > public class FindSchoolsJob extends Configured implements Tool {
>>>>>>>>>> >     :
>>>>>>>>>> >     public int run(String[] args) throws Exception {
>>>>>>>>>> >         :
>>>>>>>>>> >         :
>>>>>>>>>> >     }
>>>>>>>>>> >     :
>>>>>>>>>> >     public static void main(String[] args) throws Exception {
>>>>>>>>>> >         int res = ToolRunner.run(new Configuration(), new FindSchoolsJob(), args);
>>>>>>>>>> >         System.exit(res);
>>>>>>>>>> >     }
>>>>>>>>>> > }
>>>>>>>>>> > Any hint would be highly appreciated.
>>>>>>>>>> > Thank you!
>>>>>>>>>> > ~V
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards
>>>>>>>>> Anty Rao
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434
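On Marco's question about what the driver needs: a driver shaped like Vipul's (implementing Tool and launched through ToolRunner.run) lets ToolRunner parse the generic options, including -libjars, and pass only the remaining arguments to run(). The sketch below is a hypothetical, simplified model of that split, not Hadoop's GenericOptionsParser: the class GenericOptionsSketch and its option list are assumptions, it handles only a few value-taking options (the real parser supports more, such as -D), and it recognizes them only before the first application argument, following the documented convention that generic options precede command options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the split ToolRunner performs (through
// GenericOptionsParser) before calling run(). Not Hadoop's implementation.
public class GenericOptionsSketch {
    // Assumed subset of generic options, each taking exactly one value here.
    public static final Set<String> GENERIC = new HashSet<>(Arrays.asList(
            "-libjars", "-files", "-archives", "-conf", "-fs", "-jt"));

    public static final class Split {
        public final Map<String, String> generic = new LinkedHashMap<>();
        public final List<String> appArgs = new ArrayList<>();
    }

    public static Split split(String[] args) {
        Split s = new Split();
        int i = 0;
        // Generic options are only recognized at the front; the first argument that
        // is not one of them ends option parsing, and the rest is left for run().
        while (i + 1 < args.length && GENERIC.contains(args[i])) {
            s.generic.put(args[i], args[i + 1]);
            i += 2;
        }
        s.appArgs.addAll(Arrays.asList(args).subList(i, args.length));
        return s;
    }

    public static void main(String[] args) {
        // The arguments that follow the main class in Marco's command line.
        Split s = split(new String[] {
                "-libjars", "../umd-hadoop-core/cloud9.jar",
                "home/my/pyworkspace/openAnc.xml", "index/", "10", "1"});
        System.out.println("generic: " + s.generic);
        System.out.println("run() sees: " + s.appArgs);
    }
}
```

With Marco's arguments, -libjars is consumed and run() sees only the input file, the output directory and the two numbers. In this model a -libjars placed after the application arguments would simply be handed to run() as an ordinary argument.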