[ https://issues.apache.org/jira/browse/HADOOP-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-1485:
--------------------------------
Attachment: 1485.1.patch
Thanks, David, for the review. It looks like I made a couple of careless
copy/paste errors in my previous patch. This patch fixes those and the other
issues you pointed out, and it is also up to date with trunk.
Last time, I forgot to mention the metrics that I added for the shuffle
phase.
The shuffle metrics are reported by both the TaskTracker and the ReduceTask.
The TaskTracker side is handled by a class called ShuffleServerMetrics and it
reports the following metrics:
(a) shuffle_handler_busy_percent [this tells us how busy the servlet
handler is]
(b) shuffle_output_bytes [the number of map output bytes read from map
output files]
(c) shuffle_failed_outputs [the number of map output sends that failed]
(d) shuffle_success_outputs [the number of map output sends that succeeded
from the server's point of view]
These metrics are tagged only with "sessionId". There is little to gain by
tagging them with something like "user", since a tasktracker can serve
outputs for maps belonging to different jobs and different users
concurrently.
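To make the server-side bookkeeping concrete, here is a minimal sketch of how
the four TaskTracker metrics above could be accumulated and reported once per
interval. It is not the ShuffleServerMetrics class from the patch. The class
name ShuffleServerMetricsSketch, its methods, and the plain Map standing in for
a metrics record are all hypothetical. It also assumes that "busy percent" means
busy servlet handler threads divided by the total number of handler threads,
sampled at report time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT Hadoop's ShuffleServerMetrics: counters accumulate
// between reporting intervals and are reset after each snapshot.
class ShuffleServerMetricsSketch {
    private final int workerThreads;   // total servlet handler threads (assumed known)
    private final String sessionId;    // the only tag on server-side records
    private int busyHandlers = 0;      // handlers currently serving a map output
    private long outputBytes = 0;      // map output bytes read from map output files
    private int failedOutputs = 0;
    private int successOutputs = 0;

    ShuffleServerMetricsSketch(int workerThreads, String sessionId) {
        this.workerThreads = workerThreads;
        this.sessionId = sessionId;
    }

    synchronized void handlerBusy() { busyHandlers++; }
    synchronized void handlerFree() { busyHandlers--; }
    synchronized void addOutputBytes(long n) { outputBytes += n; }
    synchronized void failedOutput() { failedOutputs++; }
    synchronized void successOutput() { successOutputs++; }

    // Returns one interval's values and resets the counters. The busy
    // percentage is a point-in-time sample (assumption), so it is not reset.
    synchronized Map<String, Object> snapshotAndReset() {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("sessionId", sessionId);
        float busyPercent = workerThreads == 0 ? 0f : 100f * busyHandlers / workerThreads;
        record.put("shuffle_handler_busy_percent", busyPercent);
        record.put("shuffle_output_bytes", outputBytes);
        record.put("shuffle_failed_outputs", failedOutputs);
        record.put("shuffle_success_outputs", successOutputs);
        outputBytes = 0;
        failedOutputs = 0;
        successOutputs = 0;
        return record;
    }
}
```

Resetting the counters per snapshot keeps each report scoped to a single
interval, which is convenient when the values are graphed over time.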
The ReduceTask side is handled by a class called ShuffleClientMetrics and it
reports the following metrics:
(a) shuffle_fetchers_busy_percent [this tells us how busy the map output
copier subsystem is]
(b) shuffle_input_bytes [the number of map output bytes read off the wire]
(c) shuffle_failed_fetches [the number of failed fetches]
(d) shuffle_success_fetches [the number of successful fetches]
These metrics are tagged with "user", "jobName", "jobId", "taskId",
"sessionId".
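A comparable sketch for the reduce side follows. Unlike the server side, a
ReduceTask belongs to exactly one job, so its records can carry the full
user/jobName/jobId/taskId/sessionId tag set. Again, ShuffleClientMetricsSketch
and its methods are hypothetical stand-ins, not the ShuffleClientMetrics class
from the patch, and "fetchers busy percent" is assumed to be busy copier threads
divided by the size of the copier pool.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT Hadoop's ShuffleClientMetrics: per-task tags are
// fixed at construction, and counters are reported once per interval.
class ShuffleClientMetricsSketch {
    private final Map<String, String> tags = new LinkedHashMap<>();
    private final int numCopiers;      // size of the map output copier pool (assumed)
    private int busyCopiers = 0;
    private long inputBytes = 0;       // map output bytes read off the wire
    private int failedFetches = 0;
    private int successFetches = 0;

    ShuffleClientMetricsSketch(int numCopiers, String user, String jobName,
                               String jobId, String taskId, String sessionId) {
        this.numCopiers = numCopiers;
        tags.put("user", user);
        tags.put("jobName", jobName);
        tags.put("jobId", jobId);
        tags.put("taskId", taskId);
        tags.put("sessionId", sessionId);
    }

    synchronized void copierBusy() { busyCopiers++; }
    synchronized void copierFree() { busyCopiers--; }
    synchronized void addInputBytes(long n) { inputBytes += n; }
    synchronized void failedFetch() { failedFetches++; }
    synchronized void successFetch() { successFetches++; }

    Map<String, String> tags() { return Collections.unmodifiableMap(tags); }

    // Returns one interval's values and resets the cumulative counters.
    synchronized Map<String, Number> snapshotAndReset() {
        Map<String, Number> m = new LinkedHashMap<>();
        m.put("shuffle_fetchers_busy_percent",
              numCopiers == 0 ? 0f : 100f * busyCopiers / numCopiers);
        m.put("shuffle_input_bytes", inputBytes);
        m.put("shuffle_failed_fetches", failedFetches);
        m.put("shuffle_success_fetches", successFetches);
        inputBytes = 0;
        failedFetches = 0;
        successFetches = 0;
        return m;
    }
}
```

Comparing shuffle_failed_fetches on the client with shuffle_failed_outputs on
the server can help tell network-side failures apart from failures that the
serving tasktracker already noticed.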
> Metrics should be there for reporting shuffle failures/successes
> ----------------------------------------------------------------
>
> Key: HADOOP-1485
> URL: https://issues.apache.org/jira/browse/HADOOP-1485
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assignee: Devaraj Das
> Fix For: 0.14.0
>
> Attachments: 1485.1.patch, 1485.1.patch, shuffle-metrics.patch
>
>
> It would be nice to have metrics for the shuffle phase which reports the
> failures/successes for the fetches. This would aid in performance tests and
> in debugging (shuffle).
--
This message is automatically generated by JIRA.