Hi,

I was reading through the Streaming documentation 
(http://hadoop.apache.org/core/docs/r0.15.3/streaming.html), and the 
KeyFieldBasedPartitioner example might need some fixing.
First, I got errors about Text vs. LongWritable because of the
IdentityMapper/IdentityReducer, so I replaced both with "cat".
Next, I believe the partitioner class splits the key with a regex, so
"map.output.key.field.separator" needs to be escaped as "\.":

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper cat \
    -reducer cat \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
    -jobconf stream.map.output.field.separator=. \
    -jobconf stream.num.map.output.key.fields=4 \
    -jobconf map.output.key.field.separator="\." \
    -jobconf num.key.fields.for.partition=2 \
    -jobconf mapred.reduce.tasks=12
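To illustrate why the escaping matters: assuming the partitioner splits the key with Java's String.split (my guess, not something I've confirmed in the source), an unescaped "." is read as the regex "any character". The standalone sketch below (the class and helper names are made up for illustration) shows the difference, plus a literal indexOf-based split for comparison:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorSplitDemo {

    // Hypothetical helper: split on a literal separator with indexOf, so
    // characters like '.' are never treated as regex metacharacters.
    static List<String> splitLiteral(String line, String sep) {
        if (sep.isEmpty()) {
            throw new IllegalArgumentException("separator must be non-empty");
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = line.indexOf(sep, start)) != -1) {
            fields.add(line.substring(start, idx));
            start = idx + sep.length();
        }
        fields.add(line.substring(start)); // last field after the final separator
        return fields;
    }

    public static void main(String[] args) {
        String key = "11.12.1.2";
        // "." matches every character, so every field is empty, and split()
        // discards trailing empty strings: the result has length 0.
        System.out.println(key.split(".").length);                // 0
        // Escaping the dot (what "\." in the jobconf does) splits as intended.
        System.out.println(key.split("\\.").length);              // 4
        System.out.println(key.split(Pattern.quote(".")).length); // 4
        System.out.println(splitLiteral(key, "."));               // [11, 12, 1, 2]
    }
}
```

On the shell side, bash leaves a backslash before "." untouched inside double quotes, so the job really receives the two characters \. as the separator.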

Ideally, though, I think this partitioner should be fixed to split on the
literal separator (using indexOf or similar) instead of a regex.
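Roughly what I have in mind is below. This is only a sketch of the idea, not the actual KeyFieldBasedPartitioner code: keyPrefix and partitionFor are hypothetical names, and hashing the prefix with String.hashCode (sign bit masked, modulo the reducer count) is my assumption about how the partition would be chosen. It takes the first num.key.fields.for.partition fields of the key by scanning for the literal separator:

```java
public class LiteralPrefixPartitionSketch {

    // Hypothetical helper: return the first `numFields` fields of `key`
    // (separators included between them), locating separators with indexOf.
    static String keyPrefix(String key, String sep, int numFields) {
        if (sep.isEmpty() || numFields < 1) {
            throw new IllegalArgumentException("need a non-empty separator and numFields >= 1");
        }
        int end = -1;
        int from = 0;
        for (int i = 0; i < numFields; i++) {
            int idx = key.indexOf(sep, from);
            if (idx == -1) {
                return key; // key has no more than numFields fields: use all of it
            }
            end = idx;
            from = idx + sep.length();
        }
        return key.substring(0, end);
    }

    // Hypothetical partition function: hash the prefix, clear the sign bit so
    // the result is non-negative, and take it modulo the number of reducers.
    static int partitionFor(String key, String sep, int numFields, int numReduceTasks) {
        String prefix = keyPrefix(key, sep, numFields);
        return (prefix.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(keyPrefix("11.12.1.2", ".", 2)); // 11.12
        // Keys that share their first two fields go to the same reducer.
        System.out.println(partitionFor("11.12.1.2", ".", 2, 12)
                == partitionFor("11.12.3.4", ".", 2, 12));  // true
    }
}
```

No regex is involved, so "." (or any other separator) works without escaping.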

Of course I'm relatively new to Hadoop (which is why I'm reading the 
documentation!), and might just be misunderstanding something here.

Richendra
