[ 
https://issues.apache.org/jira/browse/HADOOP-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HADOOP-1485:
--------------------------------

    Attachment: 1485.1.patch

Thanks David for the review. Looks like I had made a couple of careless 
copy/paste errors in my previous patch. This patch fixes those and the 
other issues pointed out, and is also up to date with trunk.
Last time I forgot to mention the metrics that I added for the shuffle 
phase.
The shuffle metrics are reported by the TaskTracker and the ReduceTask. 
The TaskTracker side is handled by a class called ShuffleServerMetrics and it 
reports the following metrics:
   (a) shuffle_handler_busy_percent  [this tells us how busy the servlet 
handler is] 
   (b) shuffle_output_bytes [the number of map output bytes read from map 
output files]
   (c) shuffle_failed_outputs [the number of map output sends that failed] 
   (d) shuffle_success_outputs [the number of map output sends that succeeded 
from the server's point of view]
   These metrics are tagged with the "sessionId" (there is little to gain by 
tagging them with something like "user", since the tasktracker can potentially 
serve outputs for maps belonging to different jobs and different users 
concurrently).
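
For illustration, here is a minimal sketch of how the four TaskTracker-side 
values could be tracked. It is not the code from the patch: the class name 
ShuffleServerMetricsSketch, its method names, and the maxHandlers parameter 
are hypothetical, and it assumes that the busy percentage is the number of 
handlers currently serving a request divided by the size of the handler pool. 
It also leaves out the reporting step through the Hadoop metrics framework.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; names and structure are assumptions, not the patch.
class ShuffleServerMetricsSketch {
    private final int maxHandlers; // assumed: size of the servlet handler pool
    private final AtomicLong busyHandlers = new AtomicLong();
    private final AtomicLong outputBytes = new AtomicLong();
    private final AtomicLong failedOutputs = new AtomicLong();
    private final AtomicLong successOutputs = new AtomicLong();

    ShuffleServerMetricsSketch(int maxHandlers) {
        this.maxHandlers = maxHandlers;
    }

    // Called when a handler starts or finishes serving a map output.
    void handlerBusy() { busyHandlers.incrementAndGet(); }
    void handlerFree() { busyHandlers.decrementAndGet(); }

    // Called with the number of bytes read from a map output file.
    void outputBytes(long bytes) { outputBytes.addAndGet(bytes); }

    // Called once per map output send, depending on the outcome.
    void failedOutput() { failedOutputs.incrementAndGet(); }
    void successOutput() { successOutputs.incrementAndGet(); }

    // shuffle_handler_busy_percent under the assumption stated above.
    double handlerBusyPercent() {
        if (maxHandlers <= 0) {
            return 0.0;
        }
        return 100.0 * busyHandlers.get() / maxHandlers;
    }

    long getOutputBytes() { return outputBytes.get(); }       // shuffle_output_bytes
    long getFailedOutputs() { return failedOutputs.get(); }   // shuffle_failed_outputs
    long getSuccessOutputs() { return successOutputs.get(); } // shuffle_success_outputs
}
```

Atomic counters are used in the sketch because several handler threads would 
update the same values concurrently.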

The ReduceTask side is handled by a class called ShuffleClientMetrics and it 
reports the following metrics:
   (a) shuffle_fetchers_busy_percent [this tells us how busy the map output 
copier subsystem is]
   (b) shuffle_input_bytes [the number of map output bytes read off the wire]
   (c) shuffle_failed_fetches [the number of failed fetches]
   (d) shuffle_success_fetches [the number of successful fetches]
   These metrics are tagged with "user", "jobName", "jobId", "taskId", 
"sessionId".
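
The tagging on the ReduceTask side can be pictured as a set of name/value 
pairs attached to every reported record. The sketch below is only an 
illustration under that assumption: ShuffleClientMetricsSketch, its methods, 
the numCopiers parameter, and the snapshot() map are hypothetical and do not 
come from the patch, which presumably reports through the Hadoop metrics 
framework instead.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only; names and structure are assumptions, not the patch.
class ShuffleClientMetricsSketch {
    private final Map<String, String> tags = new LinkedHashMap<>();
    private final int numCopiers; // assumed: number of map output copier threads
    private int busyCopiers;
    private long inputBytes;
    private long failedFetches;
    private long successFetches;

    ShuffleClientMetricsSketch(String user, String jobName, String jobId,
                               String taskId, String sessionId, int numCopiers) {
        // The five tags listed in the message above.
        tags.put("user", user);
        tags.put("jobName", jobName);
        tags.put("jobId", jobId);
        tags.put("taskId", taskId);
        tags.put("sessionId", sessionId);
        this.numCopiers = numCopiers;
    }

    synchronized void copierBusy() { busyCopiers++; }
    synchronized void copierFree() { busyCopiers--; }
    synchronized void inputBytes(long bytes) { inputBytes += bytes; }
    synchronized void failedFetch() { failedFetches++; }
    synchronized void successFetch() { successFetches++; }

    Map<String, String> getTags() {
        return Collections.unmodifiableMap(tags);
    }

    // Returns the current values keyed by the metric names from the message.
    synchronized Map<String, Number> snapshot() {
        Map<String, Number> values = new LinkedHashMap<>();
        double busyPercent =
            numCopiers <= 0 ? 0.0 : 100.0 * busyCopiers / numCopiers;
        values.put("shuffle_fetchers_busy_percent", busyPercent);
        values.put("shuffle_input_bytes", inputBytes);
        values.put("shuffle_failed_fetches", failedFetches);
        values.put("shuffle_success_fetches", successFetches);
        return values;
    }
}
```

Because the tags identify a single reduce task, values from different tasks 
of the same job could later be grouped by "jobId" or "user".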

> Metrics should be there for reporting shuffle failures/successes
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1485
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1485
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.14.0
>
>         Attachments: 1485.1.patch, 1485.1.patch, shuffle-metrics.patch
>
>
> It would be nice to have metrics for the shuffle phase which reports the 
> failures/successes for the fetches. This would aid in performance tests and 
> in debugging (shuffle).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
