Hi,

According to the attached image from Yahoo's Hadoop tutorial <http://developer.yahoo.com/hadoop/tutorial/module4.html>, the order of operations is map > combine > partition, followed by reduce.

Here is an example key emitted by the map operation:

    LongValueSum:geo_US|1311722400|E 1

This should get combined with other identical keys into

    geo_US|1311722400|E 100

(assuming there are 100 keys of the same type). Then I'd like to partition the keys by the value before the first pipe (|):

    geo_US

Reference for the aggregate package:
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29

So here's my streaming command:

    hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
        -D mapred.reduce.tasks=8 \
        -D stream.num.map.output.key.fields=1 \
        -D mapred.text.key.partitioner.options=-k1,1 \
        -D stream.map.output.field.separator=\| \
        -file mapper.py \
        -mapper mapper.py \
        -file reducer.py \
        -reducer reducer.py \
        -combiner org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer \
        -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
        -input input_file \
        -output output_path

This is the error I get:

    java.lang.NumberFormatException: For input string: "1311722400|E 1"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Long.parseLong(Long.java:419)
        at java.lang.Long.parseLong(Long.java:468)
        at org.apache.hadoop.mapred.lib.aggregate.LongValueSum.addNextValue(LongValueSum.java:48)
        at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:59)
        at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:35)
        at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1349)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)

I think it's because the partitioner is running before the combiner. Any thoughts?

--
Regards,
Premal Shah.
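
P.S. To make the intended behaviour concrete, here is a rough sketch in plain Python of what I expect map > combine > partition to do with the example key. It is only an illustration, not Hadoop code: the function names `combine` and `partition_field` are hypothetical, and the `LongValueSum:` prefix is left out on the assumption that the aggregate package strips it.

```python
from collections import defaultdict


def combine(records):
    """Sum the counts of identical keys, the way a summing combiner would."""
    totals = defaultdict(int)
    for key, count in records:
        totals[key] += count
    return dict(totals)


def partition_field(key, separator="|"):
    """Return the part of the key before the first separator."""
    return key.split(separator, 1)[0]


# 100 identical map outputs for the example key (hypothetical sample data).
map_output = [("geo_US|1311722400|E", 1)] * 100

combined = combine(map_output)
partitions = {key: partition_field(key) for key in combined}

print(combined)    # one entry per distinct key, with the summed count
print(partitions)  # the field each key should be partitioned on
```

With this flow, every key starting with geo_US would end up at the same reducer, whatever the timestamp and type fields are.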