[ 
https://issues.apache.org/jira/browse/OOZIE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905610#comment-15905610
 ] 

Satish Subhashrao Saley commented on OOZIE-2807:
------------------------------------------------

I like Robert's suggestions. The addRMDelegationToken method will be the only 
place where tokens are added, which makes duplicate additions easier to track. 
I have updated the patch.
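
As an illustration of why a single entry point helps, here is a minimal sketch. It is not the code from the patch: the class `SingleTokenEntryPoint`, its `addToken` method, and the string-based token map are hypothetical stand-ins chosen so the example runs without Hadoop on the classpath. It only shows that when every addition goes through one method, a duplicate can be detected and counted in one place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Oozie or Hadoop code.
public class SingleTokenEntryPoint {
    // Stand-in for a job's credentials: service name -> token.
    private final Map<String, String> tokens = new LinkedHashMap<>();
    private int duplicateAttempts = 0;

    /** The only method that adds a token; duplicates are counted, not stored. */
    boolean addToken(String service, String token) {
        if (tokens.containsKey(service)) {
            duplicateAttempts++;
            return false;
        }
        tokens.put(service, token);
        return true;
    }

    int size() {
        return tokens.size();
    }

    int duplicateAttempts() {
        return duplicateAttempts;
    }

    public static void main(String[] args) {
        SingleTokenEntryPoint creds = new SingleTokenEntryPoint();
        creds.addToken("rm", "token-1");
        creds.addToken("rm", "token-2"); // second add for the same service is rejected
        System.out.println(creds.size() + " token stored, "
                + creds.duplicateAttempts() + " duplicate attempt");
    }
}
```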

> Oozie gets RM delegation token even for checking job status
> -----------------------------------------------------------
>
>                 Key: OOZIE-2807
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2807
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Satish Subhashrao Saley
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2807-1.patch, OOZIE-2807-2.patch, 
> OOZIE-2807-3.patch, OOZIE-2807-4.patch
>
>
> We had one user submitting far too many workflows, each with a single Hive 
> query: about 3600 workflows running concurrently. Surprisingly, Oozie held up 
> well without issues.
> But [~daryn] from our Hadoop team saw that the number of delegation tokens 
> fetched by Oozie was very high compared to the actual number of jobs 
> submitted. This was stressing the RM with calls and pushing it close to its 
> memory limits. It happens because we fetch a delegation token every time we 
> create a JobClient instead of only during job submission.
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/HadoopAccessorService.java#L503-L519
> So for one job we fetch:
> 1) 1 token during submission
> 2) 1 token every 5 minutes when we check the status of the job
> 3) 1 token after the job ends to retrieve its status
> 4) 1 token if we are killing the job
> So for a job running for 11 minutes, we would have fetched a token 4 times. 
> Maybe more in other cases like MapReduce, where we check for the end of both 
> the launcher and the child job.
> Only one of these tokens (the one used for job submission) will be cancelled 
> after the job completes. The other tokens are effectively leaked and will only 
> be cleaned up by the RM after the expiry period (24 hours by default). This 
> can make the RM run out of memory.
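
The arithmetic in the description can be checked with a small sketch. The class and method names below are hypothetical, and the model is a simplification: it assumes status checks fire at fixed multiples of the interval while the job is still running, and that a killed job still triggers the final status retrieval.

```java
// Hypothetical model of the token counts described in the issue; not Oozie code.
public class TokenFetchCount {
    /**
     * Counts delegation tokens fetched for one job: one at submission, one per
     * periodic status check, one to retrieve the final status, and one more if
     * the job is killed.
     */
    static int tokensFetched(int runtimeMinutes, int checkIntervalMinutes, boolean killed) {
        int submission = 1;
        // Assumption: checks happen at 5, 10, 15, ... minutes of runtime.
        int statusChecks = runtimeMinutes / checkIntervalMinutes;
        int finalStatus = 1;
        int kill = killed ? 1 : 0;
        return submission + statusChecks + finalStatus + kill;
    }

    public static void main(String[] args) {
        int fetched = tokensFetched(11, 5, false);
        System.out.println("tokens fetched: " + fetched);            // prints "tokens fetched: 4"
        // Only the submission token is cancelled when the job completes.
        System.out.println("tokens left until expiry: " + (fetched - 1)); // prints "tokens left until expiry: 3"
    }
}
```

Under this model, an 11-minute job leaves 3 of its 4 tokens uncancelled until they expire, which is the leak the description refers to.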



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
