Ah, thanks, that got it.  Now I'm at the same point you are:
part-00000.deflate is there but is not readable.  It seems like I should see
text output, right?
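
For anyone who lands here with the same file: a minimal sketch for inspecting it, assuming the .deflate output is a plain zlib stream (which is what I would expect if the job wrote through Hadoop's default compression codec; the ZlibFactory line in the log below points that way, but I have not confirmed it for this job). The function names and the path in the usage note are made up for illustration.

```python
import zlib


def inflate_stream(chunks):
    """Yield decompressed bytes from an iterable of chunks holding one zlib stream."""
    # Assumption: the data carries a standard zlib header, which is what
    # decompressobj() expects with its default window settings.
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


def read_deflate_file(path, chunk_size=64 * 1024):
    """Return the decompressed contents of a zlib-compressed file as bytes."""
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        chunks = iter(lambda: handle.read(chunk_size), b"")
        return b"".join(inflate_stream(chunks))
```

Calling read_deflate_file("output/part-00000.deflate").decode("utf-8") should then give the joined records as text. If zlib raises an error instead, the file is probably not a plain zlib stream and the assumption above does not hold.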

On Mon, Mar 29, 2010 at 2:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Under hadoop-0.20.2/src/contrib/data_join, run
> ant jar-examples
>
> You may need to rename the jars
> (hadoop-${version}-datajoin-examples.jar):
> [r...@tyu-linux datajoin]# ls
> classes  examples  hadoop-0.20.2-datajoin-examples.jar
> hadoop-0.20.2-datajoin.jar  input  output  test
>
> On Mon, Mar 29, 2010 at 1:59 PM, M B <machac...@gmail.com> wrote:
>
> > I don't see hadoop-0.20.2-datajoin-examples.jar in the
> > build/contrib/datajoin directory.  Is that a jar you created separately?
> > I tried creating one, but it still doesn't run (the mappers show the same
> > error of missing the classes).
> >
> > had...@hadoop01:/opt/hadoop-0.20.2/build/contrib/datajoin$ ls
> > classes  examples  test
> >
> >
> > On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > I can run the sample (I created the input files according to
> > > contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
> > >
> > > [r...@tyu-linux datajoin]# pwd
> > > /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> > > [r...@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> > > hadoop-0.20.2-datajoin-examples.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > Using TextInputFormat: Text
> > > Using TextOutputFormat: Text
> > > 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> > > processName=JobTracker, sessionId=
> > > 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for
> > > parsing the arguments. Applications should implement Tool for the same.
> > > 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to
> > > process : 2
> > > Job job_local_0001 is submitted
> > > Job job_local_0001 is still running.
> > > 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to
> > > process : 2
> > > 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> > > 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> > > 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> > > 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> > > 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> > > 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner:
> > > Task:attempt_local_0001_m_000000_0
> > > is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    6
> > > totalCount      6
> > >
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> > > 'attempt_local_0001_m_000000_0' done.
> > > 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> > > 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> > > 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> > > 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> > > 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> > > 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner:
> > > Task:attempt_local_0001_m_000001_0
> > > is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    5
> > > totalCount      5
> > >
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> > > 'attempt_local_0001_m_000001_0' done.
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> > > 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with
> > > 2 segments left of total size: 939 bytes
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop
> > > library
> > > 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded &
> > > initialized native-zlib library
> > > 10/03/29 09:01:32 INFO datajoin.job: key: A.a11
> > > this.largestNumOfValues: 3
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner:
> > > Task:attempt_local_0001_r_000000_0
> > > is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> > > attempt_local_0001_r_000000_0
> > > is allowed to commit now
> > > 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task
> > > 'attempt_local_0001_r_000000_0' to
> > > file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount  5
> > > collectedCount  7
> > > groupCount      6
> > >  > reduce
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> > > 'attempt_local_0001_r_000000_0' done.
> > > [r...@tyu-linux datajoin]# date
> > > Mon Mar 29 09:02:37 PDT 2010
> > >
> > > It took a minute between the last INFO log and exit of DataJoinJob.
> > >
> > > Cheers
> > >
> > > On Mon, Mar 29, 2010 at 8:26 AM, M B <machac...@gmail.com> wrote:
> > >
> > > > Sorry, I should have mentioned that I tried that as well and it also
> > > > gives an error:
> > > >
> > > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > > > datajoin/output Text 1
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > Exception in thread "main" java.io.IOException: Error opening job jar:
> > > > -libjars
> > > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> > > > Caused by: java.util.zip.ZipException: error in opening zip file
> > > >        at java.util.zip.ZipFile.open(Native Method)
> > > >        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
> > > >        at java.util.jar.JarFile.<init>(JarFile.java:133)
> > > >        at java.util.jar.JarFile.<init>(JarFile.java:70)
> > > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> > > > Has something changed or is my environment not set up correctly?
> > > > Appreciate any help.
> > > >
> > > >
> > > >
> > > > On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Then use the syntax given by
> > > > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html:
> > > > >
> > > > > $ bin/hadoop jar -libjars ./samplejoin.jar
> > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> > > > >
> > > > > On Fri, Mar 26, 2010 at 5:10 PM, M B <machac...@gmail.com> wrote:
> > > > >
> > > > > > Sorry, but where exactly do I include the libjars option?  I tried to
> > > > > > put it where you stated (after the DataJoinJob class), but it just comes
> > > > > > back with usage information (as if the option is not valid):
> > > > > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar
> > > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > > > datajoin/input datajoin/output Text 1
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > > > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > > > > > mapper_class reducer_class map_output_value_class output_value_class
> > > > > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > > > > >
> > > > > > It seems like it's not taking the option for some reason, like it's
> > > > > > failing an argument check in DataJoinJob - does that not use the standard
> > > > > > args or something?
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in
> > > > > > > your HADOOP_CLASSPATH
> > > > > > >
> > > > > > > I think you should specify samplejoin.jar using -libjars instead of
> > > > > > > putting it directly after jar command:
> > > > > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > > > > ... (same as your example)
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <machac...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I may be having a setup issue with classpaths, would appreciate
> > > > > > > > some help.
> > > > > > > >
> > > > > > > > I created a jar with all the Sample* classes in contrib/DataJoin.
> > > > > > > > Here is the listing of my samplejoin.jar file:
> > > > > > > > " zip.vim version v22
> > > > > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > > > > " Select a file with cursor and press ENTER
> > > > > > > > META-INF/
> > > > > > > > META-INF/MANIFEST.MF
> > > > > > > > org/
> > > > > > > > org/apache/
> > > > > > > > org/apache/hadoop/
> > > > > > > > org/apache/hadoop/contrib/
> > > > > > > > org/apache/hadoop/contrib/utils/
> > > > > > > > org/apache/hadoop/contrib/utils/join/
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > > > > >
> > > > > > > > When I go to run this, things start to run, but every Map try
> > > > > > > > errors out with:
> > > > > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > > > > >
> > > > > > > > Here is the command:
> > > > > > > > hadoop jar ./samplejoin.jar
> > > > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > > > > datajoin/input datajoin/output Text 1
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > > > > >
> > > > > > > > This is a new install of 0.20.2.
> > > > > > > >
> > > > > > > > HADOOP_CLASSPATH is set to:
> > > > > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > > > > Any help would be appreciated.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
