Ah, thanks, that got it. Now I'm at the same point you are: part-00000.deflate is there but not readable. Seems like I should see text output, right?
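For reference, .deflate is the default file extension of Hadoop's DefaultCodec, which writes a zlib-wrapped deflate stream. The part file is therefore most likely compressed text rather than corrupt output. The Java sketch below reads it outside Hadoop. It assumes the file was written by DefaultCodec and has already been copied to the local disk (for example with hadoop fs -get if it sits in HDFS). The class name InflatePart and its default path are hypothetical. The sketch relies only on java.util.zip, whose InflaterInputStream expects the zlib header by default.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: prints the text stored in a zlib-compressed ".deflate" part file. */
public class InflatePart {

    /** Decompresses the file and returns its text lines (assumes UTF-8, as Hadoop's Text uses). */
    public static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // The default Inflater behind InflaterInputStream expects a zlib header,
        // which is what DefaultCodec is assumed to have written.
        try (InputStream in = new InflaterInputStream(new FileInputStream(path));
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real part file as the first argument.
        String path = args.length > 0 ? args[0] : "output/part-00000.deflate";
        for (String line : readLines(path)) {
            System.out.println(line);
        }
    }
}
```

Run it as java InflatePart output/part-00000.deflate. If the bytes are not zlib data, InflaterInputStream throws a ZipException, which would suggest that a different codec produced the file.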
On Mon, Mar 29, 2010 at 2:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Under hadoop-0.20.2/src/contrib/data_join, run
> ant jar-examples
>
> You may need to rename the jars
> (hadoop-${version}-datajoin-examples.jar):
> [r...@tyu-linux datajoin]# ls
> classes examples hadoop-0.20.2-datajoin-examples.jar
> hadoop-0.20.2-datajoin.jar input output test
>
> On Mon, Mar 29, 2010 at 1:59 PM, M B <machac...@gmail.com> wrote:
>
> > I don't see hadoop-0.20.2-datajoin-examples.jar in the
> > build/contrib/datajoin directory. Is that a jar you created separately? I
> > tried creating one, but it still doesn't run (the mappers show the same
> > error of missing the classes).
> >
> > had...@hadoop01:/opt/hadoop-0.20.2/build/contrib/datajoin$ ls
> > classes examples test
> >
> >
> > On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > I can run the sample (I created the input files according to
> > > contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
> > >
> > > [r...@tyu-linux datajoin]# pwd
> > > /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> > > [r...@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> > > hadoop-0.20.2-datajoin-examples.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > Using TextInputFormat: Text
> > > Using TextOutputFormat: Text
> > > 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> > > 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > > 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> > > Job job_local_0001 is submitted
> > > Job job_local_0001 is still running.
> > > 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process : 2
> > > 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> > > 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> > > 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> > > 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> > > 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> > > 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount 6
> > > totalCount 6
> > >
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
> > > 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> > > 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> > > 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> > > 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> > > 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> > > 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount 5
> > > totalCount 5
> > >
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> > > 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 939 bytes
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > > 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> > > 10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
> > > 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> > > 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount 5
> > > collectedCount 7
> > > groupCount 6
> > > > reduce
> > > 10/03/29 09:01:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
> > > [r...@tyu-linux datajoin]# date
> > > Mon Mar 29 09:02:37 PDT 2010
> > >
> > > It took a minute between the last INFO log and exit of DataJoinJob.
> > >
> > > Cheers
> > >
> > > On Mon, Mar 29, 2010 at 8:26 AM, M B <machac...@gmail.com> wrote:
> > >
> > > > Sorry, I should have mentioned that I tried that as well and it also gives
> > > > an error:
> > > >
> > > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > > > datajoin/output Text 1
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
> > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> > > > Caused by: java.util.zip.ZipException: error in opening zip file
> > > >     at java.util.zip.ZipFile.open(Native Method)
> > > >     at java.util.zip.ZipFile.<init>(ZipFile.java:114)
> > > >     at java.util.jar.JarFile.<init>(JarFile.java:133)
> > > >     at java.util.jar.JarFile.<init>(JarFile.java:70)
> > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> > > > Has something changed or is my environment not set up correctly? Appreciate
> > > > any help.
> > > >
> > > > On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Then use the syntax given by
> > > > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> > > > > :
> > > > >
> > > > > $ bin/hadoop jar -libjars ./samplejoin.jar
> > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> > > > >
> > > > > On Fri, Mar 26, 2010 at 5:10 PM, M B <machac...@gmail.com> wrote:
> > > > >
> > > > > > Sorry, but where exactly do I include the libjars option? I tried to put
> > > > > > it where you stated (after the DataJoinJob class), but it just comes back
> > > > > > with usage information (as if the option is not valid):
> > > > > > $ <p...@hadoop01:~/hadoop_tests$> hadoop jar
> > > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > > > datajoin/input datajoin/output Text 1
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > > > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > > > > > mapper_class reducer_class map_output_value_class output_value_class
> > > > > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > > > > >
> > > > > > It seems like it's not taking the option for some reason, like it's failing
> > > > > > an argument check in DataJoinJob - does that not use the standard args or
> > > > > > something?
> > > > > >
> > > > > >
> > > > > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > >
> > > > > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in your
> > > > > > > HADOOP_CLASSPATH
> > > > > > >
> > > > > > > I think you should specify samplejoin.jar using -libjars instead of putting
> > > > > > > it directly after jar command:
> > > > > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > > > > ... (same as your example)
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <machac...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I may be having a setup issue with classpaths, would appreciate some help.
> > > > > > > >
> > > > > > > > I created a jar with all the Sample* classes in contrib/DataJoin. Here is
> > > > > > > > the listing of my samplejoin.jar file:
> > > > > > > > " zip.vim version v22
> > > > > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > > > > " Select a file with cursor and press ENTER
> > > > > > > > META-INF/
> > > > > > > > META-INF/MANIFEST.MF
> > > > > > > > org/
> > > > > > > > org/apache/
> > > > > > > > org/apache/hadoop/
> > > > > > > > org/apache/hadoop/contrib/
> > > > > > > > org/apache/hadoop/contrib/utils/
> > > > > > > > org/apache/hadoop/contrib/utils/join/
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > > > > >
> > > > > > > > When I go to run this, things start to run, but every Map try errors out
> > > > > > > > with:
> > > > > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > > > > >
> > > > > > > > Here is the command:
> > > > > > > > hadoop jar ./samplejoin.jar
> > > > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > > > > datajoin/input datajoin/output Text 1
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > > > > >
> > > > > > > > This is a new install of 0.20.2.
> > > > > > > >
> > > > > > > > HADOOP_CLASSPATH is set to:
> > > > > > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > > > > Any help would be appreciated.
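On the -libjars question, both failures quoted in this thread depend on where the option sits on the command line. First, hadoop jar opens its first argument as the job jar, which explains "Error opening job jar: -libjars". Second, -libjars is only removed from the arguments by GenericOptionsParser, which a main class gets when it implements Tool and is launched through ToolRunner. The "Use GenericOptionsParser for parsing the arguments" warning in the log suggests that DataJoinJob parses its own arguments, so it would see -libjars as an ordinary positional argument. The Java sketch below is a hypothetical toy model of that ordering, not Hadoop's code. HadoopJarArgs, jobJar and parseGeneric are made-up names, and only -libjars is modeled, although the real parser accepts several other generic options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy model (not Hadoop source) of how "hadoop jar" hands its
 * arguments around: the first argument is opened as the job jar, and only a
 * Tool-style main class has generic options such as -libjars stripped first.
 */
public class HadoopJarArgs {

    /** What a main class ends up with after (optional) generic-option parsing. */
    public static final class Parsed {
        public final List<String> libjars = new ArrayList<>();
        public final List<String> appArgs = new ArrayList<>();
    }

    /** RunJar-like step: the first token after "jar" is always treated as the jar file. */
    public static String jobJar(String... argsAfterJar) {
        return argsAfterJar[0];
    }

    /**
     * GenericOptionsParser-like step, modeled here for -libjars only: consume
     * leading "-libjars a.jar,b.jar" pairs, stop at the first non-option, and
     * hand everything else to the application unchanged.
     */
    public static Parsed parseGeneric(String... mainArgs) {
        Parsed p = new Parsed();
        int i = 0;
        while (i + 1 < mainArgs.length && mainArgs[i].equals("-libjars")) {
            p.libjars.addAll(Arrays.asList(mainArgs[i + 1].split(",")));
            i += 2;
        }
        p.appArgs.addAll(Arrays.asList(mainArgs).subList(i, mainArgs.length));
        return p;
    }

    public static void main(String[] args) {
        // First attempt in the thread: -libjars placed before the jar file.
        System.out.println("job jar = " + jobJar("-libjars", "./samplejoin.jar",
                "hadoop-0.20.2-datajoin.jar", "DataJoinJob", "datajoin/input"));

        // Second attempt: -libjars after the class name. A Tool-based main would
        // strip it; a main that parses its own args sees it as args[0].
        String[] mainArgs = {"-libjars", "./samplejoin.jar", "datajoin/input", "datajoin/output"};
        Parsed p = parseGeneric(mainArgs);
        System.out.println("Tool-style: libjars=" + p.libjars + " appArgs=" + p.appArgs);
        System.out.println("plain main: args[0]=" + mainArgs[0]);
    }
}
```

Under this model, a Tool-based driver would be invoked as hadoop jar <job jar> <main class> -libjars <jars> <application args>. The approach that worked in this thread avoided -libjars entirely: building hadoop-0.20.2-datajoin-examples.jar with ant jar-examples puts the Sample* classes inside the jar passed to hadoop jar.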