[ https://issues.apache.org/jira/browse/HADOOP-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711384#action_12711384 ]

Eric Yang commented on HADOOP-5876:
-----------------------------------

Although shuffling information is available in Hadoop metrics form, it doesn't 
contain the full information.  For example:

Timestamp : 1242815580000
[jobId] :job_200905200055_0930
[jobName] :Chukwa-Demux_20090520_10_32
[recordName] :shuffleInput
[sessionId] :
[shuffle_failed_fetches] :0
[shuffle_fetchers_busy_percent] :0
[shuffle_input_bytes] :10
[shuffle_success_fetches] :5
[taskId] :attempt_200905200055_0930_r_000006_0

The task attempt id doesn't indicate the source of the shuffle, which makes it 
difficult to match the corresponding shuffle input and output.  It would also be 
useful to have the start time and end time of the shuffle.  What is the 
easiest way to get this information into Chukwa?
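To make the gap concrete, here is a minimal Python sketch (not part of Hadoop or Chukwa) that parses the metrics dump above and decomposes its taskId. It assumes the `[key] :value` text layout exactly as it appears in this comment, and the usual Hadoop attempt-id layout `attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>`; the helper names `parse_record` and `parse_attempt_id` are hypothetical.

```python
import re

# The metrics dump quoted in the comment, verbatim.
RECORD = """\
Timestamp : 1242815580000
[jobId] :job_200905200055_0930
[jobName] :Chukwa-Demux_20090520_10_32
[recordName] :shuffleInput
[sessionId] :
[shuffle_failed_fetches] :0
[shuffle_fetchers_busy_percent] :0
[shuffle_input_bytes] :10
[shuffle_success_fetches] :5
[taskId] :attempt_200905200055_0930_r_000006_0
"""

# Assumed attempt-id layout; only map ('m') and reduce ('r') attempts are handled.
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def parse_record(text):
    """Hypothetical helper: turn '[key] :value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        key, _, value = line.partition(":")
        fields[key.strip().strip("[]")] = value.strip()
    return fields


def parse_attempt_id(attempt_id):
    """Hypothetical helper: split an attempt id into job, task type, task, attempt."""
    m = ATTEMPT_RE.match(attempt_id)
    if m is None:
        raise ValueError(f"not an attempt id: {attempt_id}")
    jt, job, kind, task, attempt = m.groups()
    return {
        "job": f"job_{jt}_{job}",
        "type": "reduce" if kind == "r" else "map",
        "task": int(task),
        "attempt": int(attempt),
    }


record = parse_record(RECORD)
who = parse_attempt_id(record["taskId"])
print(who)
# {'job': 'job_200905200055_0930', 'type': 'reduce', 'task': 6, 'attempt': 0}
```

The only identity recovered is the receiving side, reduce task 6, attempt 0 of job_200905200055_0930. The record carries aggregate counters (5 successful fetches, 10 input bytes) but no field naming the map attempts or hosts those fetches came from, and only a single timestamp rather than a start and end time.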

> Shuffling information logged to userlogs/attempt_####_###_r_###_#/syslogs
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-5876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5876
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: Redhat EL 5.1, Java 6
>            Reporter: Eric Yang
>            Priority: Critical
>
> Shuffling information is currently logged to userlogs.  It would be ideal to 
> have this information consolidated in the task tracker log or the job history 
> log file for a downstream log collection and analysis program (Chukwa) to pick up.

-- 
This message is automatically generated by JIRA.