Does anyone have a suggestion for implementing a Job using the
o.a.avro.mapred classes where it is necessary to maintain a key and a
(logical) value? For example, consider WordCount with a combiner. If
two counts of the same word are seen, the combiner would emit an
Avro record carrying a count of two. That record would no longer equal
a record with a count of one, so if a separate map task saw the word
only once, the partitioner might send its record to a different
reduce task. The word would then appear twice in the reduce
outputs, with different counts. I'm considering subclassing
AvroKeyComparator to have it compare the datum of one field in the
record rather than the whole datum, although this approach is
necessarily job-specific. Any other thoughts?
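
To make the problem concrete, here is a minimal sketch in plain Java. It
does not use the o.a.avro.mapred API at all: WordCount, KeyFieldSketch,
partitionByDatum, partitionByKeyField and BY_KEY_FIELD are hypothetical
names made up for illustration, and the modulo-of-hash partitioning is
only an assumption about how a hash-based partitioner typically behaves.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical stand-in for an Avro record holding a key field (word)
// and a logical value (count). Not an Avro class.
final class WordCount {
    final String word;
    final long count;

    WordCount(String word, long count) {
        this.word = word;
        this.count = count;
    }

    // Equality and hash cover the whole datum, count included.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof WordCount)) return false;
        WordCount other = (WordCount) o;
        return word.equals(other.word) && count == other.count;
    }

    @Override
    public int hashCode() {
        return Objects.hash(word, count);
    }
}

final class KeyFieldSketch {
    // Assumed hash partitioning over the whole datum: the count leaks
    // into the hash, so the same word may land on different reducers.
    static int partitionByDatum(WordCount wc, int numReducers) {
        return (wc.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Partitioning on the key field only: the count is ignored, so the
    // same word always lands on the same reducer.
    static int partitionByKeyField(WordCount wc, int numReducers) {
        return (wc.word.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Comparator that orders (and groups) records by the key field only.
    static final Comparator<WordCount> BY_KEY_FIELD =
        (a, b) -> a.word.compareTo(b.word);

    public static void main(String[] args) {
        WordCount fromMapper = new WordCount("the", 1);   // seen once
        WordCount fromCombiner = new WordCount("the", 2); // combined count
        int reducers = 4;

        System.out.println("whole datum equal: " + fromMapper.equals(fromCombiner));
        System.out.println("by-datum partitions: "
            + partitionByDatum(fromMapper, reducers) + " vs "
            + partitionByDatum(fromCombiner, reducers));
        System.out.println("by-key partitions: "
            + partitionByKeyField(fromMapper, reducers) + " vs "
            + partitionByKeyField(fromCombiner, reducers));
        System.out.println("key comparator result: "
            + BY_KEY_FIELD.compare(fromMapper, fromCombiner));
    }
}
```

The sketch suggests that both partitioning and comparison would need to
look at the key field alone, which is why I suspect any such fix ends up
tied to the schema of a particular job.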

Thanks,

Jacob Rideout
