Hello Praveen,

I'm using 0.20.2. I can try it with 0.21 this morning when I get into the office.
Regards,
Dan

On Nov 2, 2011 11:47 PM, "Praveen Sripati" <praveensrip...@gmail.com> wrote:
> Dan,
>
> It is a known bug (https://issues.apache.org/jira/browse/MAPREDUCE-1888)
> which was identified in the 0.21.0 release. Which Hadoop release are you
> using?
>
> Thanks,
> Praveen
>
> On Thu, Nov 3, 2011 at 10:22 AM, Dan Young <danoyo...@gmail.com> wrote:
>
>> I'm a total newbie @ Hadoop and am trying to follow an example (A Useful
>> Partitioner Class) on the Hadoop Streaming wiki, but with my data. So I
>> have data like this:
>>
>> 520460379  1  14067   759015   1142  3  1  8.8
>> 520460380  1  120543  2759354  1142  0  0  0
>> 520460381  3  120543  2759352  1142  0  0  0
>> 520460382  3  12660   679569   1142  0  0  0
>> 520460383  1  120543  2759355  1142  0  0  0
>> 520460384  3  120543  2759353  1142  0  0  0
>> 520460385  1  120575  2759568  1142  0  0  0
>> 520460386  3  120575  2759570  1142  0  0  0
>> 520460387  1  120575  2759569  1142  0  0  0
>>
>> I'm trying to run a streaming job that partitions all the keys together
>> based on field 2 and field 3. So for example "1 120543 2759354" and
>> "1 120543 2759355" would go to the same partition, and the output key(s)
>> would be something like "1.120543". I'm trying the following command but
>> get an error:
>>
>> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
>>     -D stream.map.output.field.separator=. \
>>     -D stream.num.map.output.key.fields=2 \
>>     -D mapreduce.map.output.key.field.separator=. \
>>     -D mapreduce.partition.keypartitioner.options=-k1,2 \
>>     -D mapreduce.job.reduces=1 \
>>     -input $HOME/temp/foo \
>>     -output dank_phase0 \
>>     -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
>>     -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
>>     -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
>>
>> 11/11/02 22:45:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 11/11/02 22:45:05 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> 11/11/02 22:45:05 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 11/11/02 22:45:06 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-dyoung/mapred/local]
>> 11/11/02 22:45:06 INFO streaming.StreamJob: Running job: job_local_0001
>> 11/11/02 22:45:06 INFO streaming.StreamJob: Job running in-process (local Hadoop)
>> 11/11/02 22:45:06 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 11/11/02 22:45:07 INFO mapred.MapTask: numReduceTasks: 1
>> 11/11/02 22:45:07 INFO mapred.MapTask: io.sort.mb = 200
>> 11/11/02 22:45:07 INFO mapred.MapTask: data buffer = 159383552/199229440
>> 11/11/02 22:45:07 INFO mapred.MapTask: record buffer = 524288/655360
>> 11/11/02 22:45:07 WARN mapred.LocalJobRunner: job_local_0001
>> java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
>>     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
>>     at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> 11/11/02 22:45:07 INFO streaming.StreamJob: map 0% reduce 0%
>> 11/11/02 22:45:07 INFO streaming.StreamJob: Job running in-process (local Hadoop)
>> 11/11/02 22:45:07 ERROR streaming.StreamJob: Job not Successful!
>> 11/11/02 22:45:07 INFO streaming.StreamJob: killJob...
>> Streaming Job Failed!
>>
>> I've tried a number of permutations of what's on the Hadoop wiki, but I'm
>> still getting the error. Does anyone have any insight into what I'm doing
>> wrong?
>>
>> Regards,
>>
>> Dan
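For anyone who hits the same error: the type mismatch comes from pairing the Java IdentityMapper with a streaming job. TextInputFormat hands the mapper LongWritable byte offsets as keys, IdentityMapper passes them through unchanged, and the streaming output collector expects Text. A minimal sketch of a workaround on 0.20.2 is to use a Unix cat as the identity mapper and reducer (streaming wraps the command and emits Text keys) together with the 0.20-era property names; this assumes the input fields are tab-separated and reuses the input/output paths from the post:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
    -D stream.num.map.output.key.fields=3 \
    -D mapred.text.key.partitioner.options=-k2,3 \
    -D mapred.reduce.tasks=1 \
    -input $HOME/temp/foo \
    -output dank_phase0 \
    -mapper /bin/cat \
    -reducer /bin/cat \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner

With stream.num.map.output.key.fields=3 the first three tab-separated fields form the key, and -k2,3 tells KeyFieldBasedPartitioner to hash only key fields 2 and 3, so records that share those two fields land on the same reducer.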
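If the goal is literally to emit a dotted composite key such as "1.120543", the mapper can build it explicitly. A hypothetical map.sh (the script name is an assumption, and the input is assumed whitespace-separated):

#!/bin/sh
# Hypothetical mapper: emit "field2.field3" as the key, the whole record as the value
exec awk '{ print $2 "." $3 "\t" $0 }'

Shipped with -file map.sh -mapper map.sh, the dotted string becomes the entire map output key (stream.num.map.output.key.fields defaults to 1), so the default hash partitioner groups matching records without any -partitioner option at all.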