I'm a total newbie at Hadoop and am trying to follow an example (A Useful
Partitioner Class) on the Hadoop Streaming wiki, but with my own data. My
data looks like this:

520460379 1 14067 759015 1142 3 1 8.8
520460380 1 120543 2759354 1142 0 0 0
520460381 3 120543 2759352 1142 0 0 0
520460382 3 12660 679569 1142 0 0 0
520460383 1 120543 2759355 1142 0 0 0
520460384 3 120543 2759353 1142 0 0 0
520460385 1 120575 2759568 1142 0 0 0
520460386 3 120575 2759570 1142 0 0 0
520460387 1 120575 2759569 1142 0 0 0

and I'm trying to run a streaming job that partitions the records based on
field 2 and field 3, so that all records sharing those two fields end up
together. For example, 1 120543 2759354 and 1 120543 2759355 would go to
the same partition, and the output key would be something like 1.120543.
I'm trying the following command but get an error:

$HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
-D stream.map.output.field.separator=. \
-D stream.num.map.output.key.fields=2 \
-D mapreduce.map.output.key.field.separator=. \
-D mapreduce.partition.keypartitioner.options=-k1,2 \
-D mapreduce.job.reduces=1 \
-input $HOME/temp/foo \
-output dank_phase0 \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer org.apache.hadoop.mapred.lib.IdentityReducer \
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner


11/11/02 22:45:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/11/02 22:45:05 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/11/02 22:45:05 INFO mapred.FileInputFormat: Total input paths to process : 1
11/11/02 22:45:06 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-dyoung/mapred/local]
11/11/02 22:45:06 INFO streaming.StreamJob: Running job: job_local_0001
11/11/02 22:45:06 INFO streaming.StreamJob: Job running in-process (local Hadoop)
11/11/02 22:45:06 INFO mapred.FileInputFormat: Total input paths to process : 1
11/11/02 22:45:07 INFO mapred.MapTask: numReduceTasks: 1
11/11/02 22:45:07 INFO mapred.MapTask: io.sort.mb = 200
11/11/02 22:45:07 INFO mapred.MapTask: data buffer = 159383552/199229440
11/11/02 22:45:07 INFO mapred.MapTask: record buffer = 524288/655360
11/11/02 22:45:07 WARN mapred.LocalJobRunner: job_local_0001
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:845)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
    at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:40)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
11/11/02 22:45:07 INFO streaming.StreamJob:  map 0%  reduce 0%
11/11/02 22:45:07 INFO streaming.StreamJob: Job running in-process (local Hadoop)
11/11/02 22:45:07 ERROR streaming.StreamJob: Job not Successful!
11/11/02 22:45:07 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

I've tried a number of permutations of what's on the Hadoop wiki, but I'm
still getting the same error. Does anyone have any insight into what I'm
doing wrong?
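
In case it helps clarify what I'm after, here is a rough sketch in plain
Python (not Hadoop code) of the grouping I want. The names make_key and
group_by_key are made up for illustration, and I'm assuming the fields are
separated by whitespace, as in the sample above:

```python
from collections import defaultdict

# A few records copied from the sample data above.
SAMPLE = [
    "520460379 1 14067 759015 1142 3 1 8.8",
    "520460380 1 120543 2759354 1142 0 0 0",
    "520460381 3 120543 2759352 1142 0 0 0",
    "520460383 1 120543 2759355 1142 0 0 0",
]


def make_key(line, sep="."):
    """Join field 2 and field 3 (1-based) of a whitespace-separated record."""
    fields = line.split()
    return sep.join(fields[1:3])


def group_by_key(lines):
    """Collect records under their field2.field3 key, preserving input order."""
    groups = defaultdict(list)
    for line in lines:
        groups[make_key(line)].append(line)
    return dict(groups)


if __name__ == "__main__":
    # Print each key with the number of records that share it.
    for key, records in sorted(group_by_key(SAMPLE).items()):
        print(key, len(records))
```

So the records with IDs 520460380 and 520460383 should both land under the
key 1.120543, while 520460381 gets its own key, 3.120543.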

Regards,

Dan
