[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eirik Bakke updated MAPREDUCE-4435:
-----------------------------------

    Attachment: mapreduce-4435-branch-1.0_with_test.patch

Hi, Arun.

Sorry for the delayed response; I no longer work at Cloudera. It would still be 
nice to get this patch through.

Attached is a new patch with an updated TestJobTrackerInstrumentation unit 
test. Since SleepJob doesn't sleep during the sort phase, I added a 
SleepingComparator to trigger the occupiedReduceSlotsSortPhase metric during 
the test. I'd generally try to avoid timing-dependent tests, but since 
TestJobTrackerInstrumentation already uses Thread.sleep() through SleepJob, it 
seemed permissible to do so here.
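
As a rough illustration of the idea (not the code in the attached patch), a 
comparator that delays every comparison keeps a sort busy long enough for a 
sort-phase metric to be observed. This sketch is standalone and assumes plain 
java.util.Comparator rather than Hadoop's comparator interfaces; the 
constructor parameter sleepMillis is hypothetical.

```java
import java.util.Comparator;

/**
 * Hypothetical stand-in for a "sleeping" comparator: it orders keys by their
 * natural ordering but pauses on every comparison, so any sort that uses it
 * takes a measurable amount of time.
 */
final class SleepingComparator<T extends Comparable<T>> implements Comparator<T> {
    private final long sleepMillis; // assumed tuning knob, not from the patch

    SleepingComparator(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }

    @Override
    public int compare(T a, T b) {
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the caller can still react to it.
            Thread.currentThread().interrupt();
        }
        return a.compareTo(b);
    }
}
```

Passing an instance to java.util.Arrays.sort() on a small array is enough to 
see the delay; the total time grows with the number of comparisons, which is 
why such a test stays timing-dependent.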

About caching countRunningTasksByPhase(): I don't see any guarantees anywhere 
about taskReports or TaskStatus being immutable, so I'd be hesitant to do that. 
Neither of the analogous existing methods caches its result either, and the 
performance difference should be negligible (allocation-wise it's only an 
EnumMap with a constant number of entries). If you disagree, let me know, and 
I'll change it.
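
To make the allocation argument concrete, here is a minimal sketch of an 
uncached per-phase count. It is not the patch's implementation: the Phase enum 
and the Iterable parameter are assumptions standing in for whatever the 
JobTracker actually iterates over.

```java
import java.util.EnumMap;
import java.util.Map;

final class PhaseCounter {
    /** Hypothetical enum covering only the three reduce-side phases. */
    enum Phase { SHUFFLE, SORT, REDUCE }

    /**
     * Recomputes the counts on every call instead of caching them, so the
     * result cannot go stale if the underlying task state is mutated.
     * The only allocation is one EnumMap with a fixed number of entries.
     */
    static Map<Phase, Integer> countRunningTasksByPhase(Iterable<Phase> runningTaskPhases) {
        Map<Phase, Integer> counts = new EnumMap<>(Phase.class);
        for (Phase phase : Phase.values()) {
            counts.put(phase, 0); // every phase is present, even with no tasks
        }
        for (Phase phase : runningTaskPhases) {
            counts.merge(phase, 1, Integer::sum);
        }
        return counts;
    }
}
```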
                
> Expose JobTracker metrics for number of reducers in shuffle vs. sort vs. 
> reduce phase
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4435
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4435
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker, tasktracker
>            Reporter: Eirik Bakke
>         Attachments: mapreduce-4435-branch-1.0_with_test.patch, 
> mapreduce.patch
>
>
> We'd like to be able to show our Cloudera Manager users some more detailed 
> metrics about the number of reducers running at any given time--specifically, 
> how many reducers are running in each of the three possible phases (shuffle, 
> sort, and reduce). This would require the addition of some new overridable 
> methods to the JobTrackerInstrumentation API, plus a little bit of code to 
> actually call them from the JobTracker class. The necessary information seems 
> to already be available in the TaskStatus object. The attached patch (which 
> I've tested on hadoop-common/branch-1.0) shows one way to do it.
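
The "overridable methods" approach described above can be sketched as follows. 
All class and method names here are hypothetical and do not reproduce the real 
JobTrackerInstrumentation API; the point is only that the base class supplies 
no-op hooks, so existing subclasses keep working and interested ones override.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical enum for the three reduce-side phases. */
enum ReducePhase { SHUFFLE, SORT, REDUCE }

/** Hypothetical base class: the new hook defaults to a no-op. */
class InstrumentationSketch {
    public void setOccupiedReduceSlotsByPhase(ReducePhase phase, int slots) {
        // Intentionally empty; subclasses that care about the metric override this.
    }
}

/** A subclass that opts in by recording the latest value reported per phase. */
class RecordingInstrumentation extends InstrumentationSketch {
    final Map<ReducePhase, Integer> latest = new EnumMap<>(ReducePhase.class);

    @Override
    public void setOccupiedReduceSlotsByPhase(ReducePhase phase, int slots) {
        latest.put(phase, slots);
    }
}
```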

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
