Shrinivas,

MapOutputBuffer.compare() is the primary method for all map-output
key comparisons (it drives the sorting done at the map end).

The comparator returned by your job.getOutputKeyComparator() is the
one used inside it to compare all KV pairs as the Mapper emits them
for the Reducer's consumption (and again later, in the merge sort).

I'd look at the compare method of the job's comparator to figure out
why it may be slow. Ideally, to be fast enough, it should not
deserialize the keys (i.e. it should be 'raw' and work directly on the
serialized bytes). Avro does that quite nicely, iirc.
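
To make the 'raw' idea concrete, here is a minimal sketch of a
comparison that works on the serialized bytes only. The method has the
same parameter shape as RawComparator.compare(byte[] b1, int s1, int l1,
byte[] b2, int s2, int l2); the class name RawKeyCompare is made up for
illustration, and I'm assuming keys whose sort order matches unsigned
lexicographic byte order (not true of every key type).

```java
// Sketch only: byte-level key comparison with no deserialization.
// RawKeyCompare is a hypothetical name; in a real job this logic would sit
// in a class implementing org.apache.hadoop.io.RawComparator.
final class RawKeyCompare {

    private RawKeyCompare() {}

    // Compares two serialized keys lexicographically, treating each byte as
    // unsigned. (s1, l1) and (s2, l2) are the offset and length of each key
    // inside its buffer, so no key object is ever created.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff; // mask to get the unsigned value
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // One key is a prefix of the other: the shorter one sorts first.
        return l1 - l2;
    }
}
```

Hadoop's WritableComparator.compareBytes() does this kind of comparison
already, so a comparator for such keys can usually just delegate to it
rather than hand-rolling the loop.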

On Wed, Sep 21, 2011 at 12:24 AM, Shrinivas Joshi <jshrini...@gmail.com> wrote:
> With JVM inlining enabled, profiles of Terasort run show more than 3% time
> spent in MapTask$MapOutputBuffer.compare( ) method in each of the Map JVMs.
> In this particular configuration there are 8 Map JVMs. So a big chunk of
> time is spent in this particular method. With JVM inlining disabled,
> java.nio.Bits.getIntB( ), java.nio.HeapByteBuffer.get( ) and
> MapTask$MapOutputBuffer.compare( ) methods become hot spots in that order.
> Looking at the code for compare( ) method above observations make sense.
>
> I would appreciate if someone could explain what exactly is the role of the
> compare( ) method. May be there is some config property which would reduce
> the number of times this method gets called?
>
> Thanks for your time.
>
> -Shrinivas
>



-- 
Harsh J
