Yeah, unfortunately your suggestion does not work, and neither does the
order given on the Pig wiki. The Hadoop wiki, however, gives this order for -libjars:

hadoop jar hadoop-examples.jar wordcount -files cachefile.txt \
    -libjars mylib.jar input output
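To make that ordering explicit (job jar, program name, then generic options such as -libjars, then the program's own arguments), here is a minimal shell sketch. `build_hadoop_cmd` is a hypothetical helper that only prints the argument list and never invokes Hadoop. This ordering presumably only helps when the program hands its arguments to Hadoop's GenericOptionsParser (for example via ToolRunner); DataGenerator's "unknown option (-libjars)" error suggests it parses them itself.

```shell
#!/usr/bin/env bash
# Minimal sketch: build_hadoop_cmd is a hypothetical helper, not part of
# Hadoop or Pig. It only prints a "hadoop jar" argument list, one argument
# per line, in the order used by the Hadoop wiki example:
#   job jar, program name, generic option (-libjars), program arguments.
build_hadoop_cmd() {
  local jar=$1 program=$2 libjars=$3
  shift 3
  # Whatever remains in "$@" are the program's own arguments,
  # so they are placed after the generic option.
  printf '%s\n' hadoop jar "$jar" "$program" -libjars "$libjars" "$@"
}

# Usage, mirroring the wiki example (without -files):
build_hadoop_cmd hadoop-examples.jar wordcount mylib.jar input output
```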

So I tried this:
hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator \
    -conf $conf_file -rows 10000000 \
    -f /scratch/tmpHDFS_files/wordsx1_skewed.dat -libjars $zipfjar s:8:50:z:0

However, DataGenerator does not accept it as one of its options:
Couldn't parse the command line arguments, Found unknown option (-libjars) at position 5

I'd be happy (and surprised) to hear from anyone who can get the format given
on the Pig wiki to work for DataGenerator in cluster mode (i.e. with the -m parameter).

Any more suggestions, Dmitriy? Thanks for your help, it's much appreciated.
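Dmitriy's advice in the quoted replies is to list every jar the job needs, comma-separated, in -libjars, whereas the generateData script quoted at the bottom only puts them on HADOOP_CLASSPATH, which is colon-separated. A minimal shell sketch of building that comma-separated value; `join_jars`, `classpath_to_libjars` and `libjars` are hypothetical names, and this does nothing about DataGenerator rejecting -libjars as an unknown option.

```shell
#!/usr/bin/env bash
# Minimal sketch: join_jars and classpath_to_libjars are hypothetical helpers.

# Join jar paths with commas, the separator -libjars expects.
join_jars() {
  # "$*" joins all arguments using the first character of IFS;
  # IFS is local, so the caller's IFS is restored on return.
  local IFS=,
  printf '%s\n' "$*"
}

# Alternatively, convert a colon-separated classpath string (like
# HADOOP_CLASSPATH) into a comma-separated list.
classpath_to_libjars() {
  printf '%s\n' "$1" | tr ':' ','
}

# Usage with the jar variables from the generateData script:
pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
libjars=$(join_jars "$pigjar" "$zipfjar" "$datagenjar")
echo "$libjars"
```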


2010/1/14 Dmitriy Ryaboy <>

> Sorry if I am not reading carefully enough -- but the bug report you
> cite seems to indicate you want
> hadoop jar org.apache.pig.test.utils.datagen.DataGenerator -libjars \
>     $zipfjar $datagenjar -conf $conf_file -rows 10000000 \
>     -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
> (possibly separating zipfjar and datagenjar with commas, if that patch
> was applied to your version of 0.20)
> which I don't see in the list of things you tried?
> -D
> On Thu, Jan 14, 2010 at 10:13 AM, Rob Stewart
> <> wrote:
> > Hi Dmitriy,
> >
> > No, I do think that there was a change in 0.20.0
> >
> > See the error I get:
> > Exception in thread "main" Error opening job jar: -libjars
> >
> > This is what I am trying to run:
> > hadoop jar -libjars $zipfjar $datagenjar \
> >     org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
> >     -rows 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
> >
> > $zipfjar contains only one jar file. It seems that Hadoop 0.20.0
> > changed so that -libjars is no longer allowed immediately after
> > "hadoop jar".
> >
> > This is the extract from the Hive bug report I was talking about:
> > -------------
> >
> >
> > In hadoop-20, -libjars has to come after the jar file/class.
> >
> > Please try applying this patch to bin/ext/
> >
> > ---  (revision 789726)
> > +++  (working copy)
> > @@ -10,7 +10,7 @@
> >     exit 3;
> >   fi
> >
> > -  exec $HADOOP jar $AUX_JARS_CMD_LINE ${HIVE_LIB}/hive_cli.jar $CLASS $HIVE_OPTS "$@"
> > +  exec $HADOOP jar ${HIVE_LIB}/hive_cli.jar $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"
> >  }
> >
> > ----------------
> >
> > I have also tried:
> > hadoop jar -libjars [full_location_to_sdsuLibJKD14.jar] $datagenjar \
> >     org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
> >     -rows 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
> >
> > This gives the same error.
> >
> >
> >
> > Rob
> >
> > 2010/1/14 Dmitriy Ryaboy <>
> >
> >> I think the link you sent got mangled, but try separating the
> >> jars with a comma.
> >>
> >>
> >> On Thu, Jan 14, 2010 at 7:40 AM, Rob Stewart
> >> <> wrote:
> >> > Hi Dmitriy,
> >> >
> >> > OK, well it seems that since 0.20.0 the order specified on the Pig wiki
> >> > no longer works:
> >> > hadoop jar -libjars $zipfjar $datagenjar \
> >> >     org.apache.pig.test.utils.datagen.DataGenerator \
> >> >     -conf $conf_file [options] colspec...
> >> >
> >> > See this patch over at Hive for 0.20.0:
> >> >
> >> >>
> >> >
> >> > I have tried a few combinations, but I can't seem to fit
> >> > "-libjars $zipfjar" in anywhere now.
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks for your help.
> >> >
> >> > Rob
> >> >
> >> >
> >> >
> >> >
> >> > 2010/1/14 Dmitriy Ryaboy <>
> >> >
> >> >> Rob,
> >> >> You need to tell Hadoop which jars you need it to ship to the worker
> >> >> nodes. You include datagen.jar, etc., on the classpath, which makes
> >> >> them discoverable locally, but you aren't telling Hadoop to ship them.
> >> >> You want to list them, comma-separated, in the -libjars parameter.
> >> >>
> >> >> -D
> >> >>
> >> >> On Thu, Jan 14, 2010 at 6:49 AM, Rob Stewart
> >> >> <> wrote:
> >> >> > Hi there.
> >> >> >
> >> >> > I am well underway with comparing Pig, Hive, JAQL etc...
> >> >> >
> >> >> > The DataGenerator is proving a valuable tool for me. Thanks for that.
> >> >> >
> >> >> > I have one query. I am able to use it in local mode, no problem, and
> >> >> > some experiments are complete.
> >> >> >
> >> >> > However, I cannot seem to use it in MapReduce mode on the cluster.
> >> >> > These are the contents of my file "generateData":
> >> >> > ------------------
> >> >> > export pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
> >> >> > export zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
> >> >> > export datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
> >> >> > export conf_file=/usr/lib/hadoop/conf/hadoop-site.xml
> >> >> > export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar
> >> >> > /usr/lib/hadoop/bin/hadoop jar $datagenjar \
> >> >> >     org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
> >> >> >     -m 1 -rows 10000000 -f words.dat s:8:50:z:0
> >> >> > ------------------
> >> >> >
> >> >> > The error I receive when trying to run it with the "-m 1" option
> >> >> > (in cluster mode):
> >> >> > Caused by: java.lang.ClassNotFoundException:
> >> >> >
> >> >> > So in local mode it successfully picks up the jar file sdsuLibJKD14.jar,
> >> >> > but when running in cluster mode, the jar is not on the classpath?
> >> >> >
> >> >> >
> >> >> > thanks.
> >> >> >
> >> >> > Rob Stewart
> >> >> >
> >> >>
> >> >
> >>
> >
