[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402522#comment-13402522 ]
Alejandro Abdelnur commented on MAPREDUCE-4346:
-----------------------------------------------

@Arun, I'm working with Ahmed on this one.

The use case we have is large clusters running 1000+ concurrent jobs, where monitoring agents query the cluster for jobs in different statuses; most of the time these agents focus on running or just-finished jobs. With the current API we are forced to query ALL jobs, including retired jobs (which significantly increases the number of jobs returned), and do the filtering on the client side. This creates unnecessary load on the JT (serializing all jobs) and on the client (deserializing all jobs). Adding this new API, which does not break backwards compatibility, will therefore definitely help reduce this load.

Regarding support in MRv2: we currently have the getAllJobs() method there as well, and we can certainly address it on the client side (the fallback implementation Ahmed did in the client for MRv1). We could also add a PB call to support the filtering on the RM side.

While looking at the MRv2 code I noticed that we only query the RM, which means that completed jobs will never be returned by this call. If I'm correct here, a solution would be for the client to call the HS to ask for jobs younger than X; this would be the equivalent of 'retired' jobs, and the filtering would definitely be useful there as well, for the same reasons explained above.
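To make the load argument concrete, here is a minimal, self-contained sketch of filtering by status with retired jobs optional. Everything in it (the `JobFilterSketch` class, the `State` enum, the `Job` type and the `getJobs` signature) is a hypothetical stand-in for illustration; it is not the real Hadoop `JobStatus`/`JobTracker` code and not the API from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model for illustration only. These are NOT the
// real Hadoop classes and NOT the method signature proposed in the patch.
class JobFilterSketch {

    // Assumed set of job states; the real JobStatus uses its own constants.
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    static final class Job {
        final String id;
        final State state;
        final boolean retired;

        Job(String id, State state, boolean retired) {
            this.id = id;
            this.state = state;
            this.retired = retired;
        }
    }

    // Filter applied where the jobs live: only matching jobs are returned,
    // so only those would need to be serialized and sent to the caller.
    static List<Job> getJobs(List<Job> allJobs, Set<State> wanted, boolean includeRetired) {
        List<Job> result = new ArrayList<>();
        for (Job job : allJobs) {
            if (job.retired && !includeRetired) {
                continue; // retired jobs are skipped unless explicitly requested
            }
            if (wanted.contains(job.state)) {
                result.add(job);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> all = Arrays.asList(
            new Job("job_1", State.RUNNING, false),
            new Job("job_2", State.SUCCEEDED, false),
            new Job("job_3", State.SUCCEEDED, true),
            new Job("job_4", State.PREP, false));

        // A monitoring agent interested only in running, non-retired jobs.
        List<Job> running = getJobs(all, EnumSet.of(State.RUNNING), false);
        System.out.println(running.size() + " of " + all.size() + " jobs returned");
    }
}
```

The same predicate could run on the client as a fallback when the server only offers getAllJobs(); the difference is that the full list still crosses the wire in that case.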
> Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4346
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch
>
>
> The current implementation for JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents an unneeded overhead especially in the case of clients only interested in jobs in specific state(s).
> It is beneficial to include a refined version where only jobs having specific statuses are returned and retired jobs are optional to include.
> I'll be uploading an initial patch momentarily.