[ 
https://issues.apache.org/jira/browse/MAPREDUCE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-712:
------------------------------------

    Attachment: MR712-0.patch

RandomTextWriter is probably spending most of its CPU time on avoidable work, 
mostly in generateSentence and Text::encode. For each word, generateSentence 
generates a random number and appends a String to a StringBuffer. The buffer is 
then converted to a full String, encoded as Text, and finally written out after 
looking up the counters in the Context for that particular record. This process 
generates a *lot* of garbage, so Owen and Arun's hypothesis that we're spending 
an inordinate amount of time in GC seems well founded.
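
To make the allocation pattern concrete, here is a minimal standalone sketch. It 
is not the contents of MR712-0.patch and does not use any Hadoop classes; the 
class name, the four-word list, and both method names are made up for 
illustration. The first method mimics the per-sentence StringBuffer, String, and 
byte[] allocations described above. The second encodes each word once up front 
and copies the bytes into a caller-owned buffer, so the steady state allocates 
nothing per sentence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class SentenceSketch {
    // Hypothetical word list; the real example job uses a much larger one.
    static final String[] WORDS = {"alpha", "beta", "gamma", "delta"};

    // Each word is encoded to UTF-8 exactly once, at class-load time.
    static final byte[][] ENCODED = new byte[WORDS.length][];
    static final int MAX_WORD_LEN;
    static {
        int max = 0;
        for (int i = 0; i < WORDS.length; i++) {
            ENCODED[i] = WORDS[i].getBytes(StandardCharsets.UTF_8);
            max = Math.max(max, ENCODED[i].length);
        }
        MAX_WORD_LEN = max;
    }

    // Garbage-heavy variant: a new StringBuffer, a new String, and a new
    // byte[] are allocated for every sentence.
    static byte[] naive(Random rnd, int words) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < words; i++) {
            sb.append(WORDS[rnd.nextInt(WORDS.length)]).append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Reuse variant: copies pre-encoded bytes into buf and returns the number
    // of bytes written. buf must hold at least words * (MAX_WORD_LEN + 1).
    static int reuse(Random rnd, int words, byte[] buf) {
        int len = 0;
        for (int i = 0; i < words; i++) {
            byte[] w = ENCODED[rnd.nextInt(ENCODED.length)];
            System.arraycopy(w, 0, buf, len, w.length);
            len += w.length;
            buf[len++] = ' ';
        }
        return len;
    }

    public static void main(String[] args) {
        int words = 10;
        byte[] buf = new byte[words * (MAX_WORD_LEN + 1)]; // allocated once
        Random rnd = new Random();
        for (int i = 0; i < 3; i++) {
            int len = reuse(rnd, words, buf);
            System.out.println(new String(buf, 0, len, StandardCharsets.UTF_8));
        }
    }
}
```

In a real mapper the filled buffer could presumably be handed to a reused Text 
via Text.set(byte[], int, int) rather than turned back into a String, but that 
wiring is an assumption and is left out here.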

The attached should be more sparing of the CPU. Would you mind confirming?

> TextWritter example is CPU bound!!
> ----------------------------------
>
>                 Key: MAPREDUCE-712
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-712
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.1, 0.21.0
>         Environment: ~200 nodes cluster
> Each node has the following configuration:
> Processors:     2 x Xeon L5420 2.50GHz (8 cores) - Harpertown C0, 64-bit, 
> quad-core (8 CPUs)
> 4 Disks
> 16 GB RAM
> Linux 2.6
> Hadoop version: trunk
>            Reporter: Khaled Elmeleegy
>         Attachments: MR712-0.patch
>
>
> Running the RandomTextWriter example job (from the examples jar) pegs the 
> machines' CPUs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
