[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402522#comment-13402522
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4346:
-----------------------------------------------

@Arun, 

I'm working with Ahmed on this one. 

The use case we have is large clusters running 1000+ concurrent jobs, where 
monitoring agents query the cluster for jobs in different statuses; most 
of the time these agents focus on running or just-finished jobs. With the current 
API we are forced to query ALL jobs, including retired jobs (which significantly 
increases the number of jobs returned), and do the filtering on the 
client side. This creates unnecessary load on the JT (serializing all jobs) and 
on the client (deserializing all jobs). Thus adding this new API, which does 
not break backwards compatibility, will definitely help reduce this load. 
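
To make the filtering concrete, here is a minimal, self-contained sketch of the selection logic: keep only jobs in the requested states, and include retired jobs only when asked. The names JobFilterSketch, State, Job and filter are hypothetical and are not the API in the patch; the point is only that running this on the JT side means just the matching jobs get serialized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class JobFilterSketch {
    // Hypothetical job states for illustration; not the actual Hadoop constants.
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Hypothetical, minimal stand-in for a job status record.
    static final class Job {
        final String id;
        final State state;
        final boolean retired;

        Job(String id, State state, boolean retired) {
            this.id = id;
            this.state = state;
            this.retired = retired;
        }
    }

    // Returns the jobs whose state is in 'wanted'; retired jobs are
    // skipped unless includeRetired is true. Input order is preserved.
    static List<Job> filter(List<Job> all, Set<State> wanted, boolean includeRetired) {
        List<Job> out = new ArrayList<>();
        for (Job job : all) {
            if (job.retired && !includeRetired) {
                continue;
            }
            if (wanted.contains(job.state)) {
                out.add(job);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Job> all = Arrays.asList(
            new Job("job_1", State.RUNNING, false),
            new Job("job_2", State.SUCCEEDED, true),
            new Job("job_3", State.FAILED, false));
        // A monitoring agent interested only in running, non-retired jobs.
        for (Job job : filter(all, EnumSet.of(State.RUNNING), false)) {
            System.out.println(job.id + " " + job.state);
        }
    }
}
```

With the current API the equivalent of filter() can only run in the client, after every job has already crossed the wire.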

Regarding support in MRv2, we currently have the getAllJobs() method 
there as well; we can certainly address it on the client side (as in the fallback 
implementation Ahmed did in the client for MRv1). We could also add a PB call to 
support the filtering on the RM side. While looking at the MRv2 code I noticed 
that we only query the RM, which means that completed jobs will never be 
returned by this call. If I'm correct here, a solution would be for the client 
to call the HS to ask for jobs younger than X; this would be the equivalent of 
'retired' jobs, and the filtering would definitely be useful there as well for the 
same reasons explained above.
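
As a rough sketch of that client-side idea (not actual MRv2 code): take the jobs the RM reports, then add the jobs from the HS that finished within the last X. RecentJobsSketch, Job and recentJobs are hypothetical names, and measuring "younger than X" from the finish time is my assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecentJobsSketch {
    // Hypothetical job record; finishTimeMillis is 0 while the job is running.
    static final class Job {
        final String id;
        final long finishTimeMillis;

        Job(String id, long finishTimeMillis) {
            this.id = id;
            this.finishTimeMillis = finishTimeMillis;
        }
    }

    // Combines the RM's jobs with HS jobs that finished no more than
    // maxAgeMillis before nowMillis. If a job id shows up in both lists,
    // the HS record replaces the RM one (an assumption of this sketch).
    static List<Job> recentJobs(List<Job> fromRm, List<Job> fromHs,
                                long nowMillis, long maxAgeMillis) {
        Map<String, Job> byId = new LinkedHashMap<>();
        for (Job job : fromRm) {
            byId.put(job.id, job);
        }
        long cutoff = nowMillis - maxAgeMillis;
        for (Job job : fromHs) {
            if (job.finishTimeMillis >= cutoff) {
                byId.put(job.id, job);
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The same status filter would then apply to the combined list, or better, be pushed to the RM and HS so that neither has to return jobs the client will discard.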
 
                
> Adding a refined version of JobTracker.getAllJobs() and exposing through the 
> JobClient
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4346
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, 
> MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch
>
>
> The current implementation for JobTracker.getAllJobs() returns all submitted 
> jobs in any state, in addition to retired jobs. This list can be long and 
> represents an unneeded overhead especially in the case of clients only 
> interested in jobs in specific state(s). 
> It is beneficial to include a refined version where only jobs having specific 
> statuses are returned and retired jobs are optional to include. 
> I'll be uploading an initial patch momentarily.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
