[ https://issues.apache.org/jira/browse/OOZIE-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905610#comment-15905610 ]
Satish Subhashrao Saley commented on OOZIE-2807:
------------------------------------------------

Liked Robert's suggestions. The addRMDelegationToken method will be the only place where tokens are added, which makes duplicate additions easier to track. Updated the patch.

> Oozie gets RM delegation token even for checking job status
> -----------------------------------------------------------
>
>                 Key: OOZIE-2807
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2807
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Satish Subhashrao Saley
>             Fix For: 5.0.0
>
>         Attachments: OOZIE-2807-1.patch, OOZIE-2807-2.patch,
> OOZIE-2807-3.patch, OOZIE-2807-4.patch
>
>
> We had one user submitting way too many workflows, each with a single Hive
> query: ~3600 workflows running concurrently. Surprisingly, Oozie held up
> well without issues.
> But [~daryn] from our Hadoop team saw that the number of delegation tokens
> fetched by Oozie was very high compared to the actual number of jobs
> submitted, and that it was stressing the RM with the calls and pushing it
> close to its memory limits. This is because we fetch the delegation token
> every time we create a JobClient instead of only during job submission.
> https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/service/HadoopAccessorService.java#L503-L519
> So for one job we fetch
> 1) 1 token during submission
> 2) 1 token every 5 minutes when we check the status of the job
> 3) 1 token after the job ends to retrieve status
> 4) 1 token if we are killing the job
> So for a job running for 11 minutes, we would have fetched the token 4 times.
> Maybe more in other cases like mapreduce, where we check for the end of both
> the launcher and the child job.
> Only one of the tokens (the one used in the job submission) will be cancelled
> after the job completes. The other tokens are effectively leaked and will
> only be cleaned up by the RM after the expiry period (24 hrs is the default).
> This can make the RM run out of memory.
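
The token arithmetic in the description above can be sketched as a small model. This is only an illustration of the counting described in the issue, not Oozie code: the class name TokenFetchEstimate and its methods are hypothetical, and the model assumes one status check per full 5-minute interval and exactly one token cancelled per job.

```java
// Hypothetical model of the token counts described in OOZIE-2807.
// Nothing here is part of Oozie or Hadoop; names are for illustration only.
public class TokenFetchEstimate {

    /**
     * Estimates RM delegation tokens fetched for one job before the fix.
     * Assumption: one status check happens per full status interval.
     */
    public static long tokensFetched(long runtimeMinutes, long statusIntervalMinutes, boolean killed) {
        long submission = 1;                                          // 1) token at submission
        long statusChecks = runtimeMinutes / statusIntervalMinutes;   // 2) one per status check
        long finalStatus = 1;                                         // 3) token after the job ends
        long kill = killed ? 1 : 0;                                   // 4) token if the job is killed
        return submission + statusChecks + finalStatus + kill;
    }

    /** Only the submission token is cancelled; the rest stay in the RM until expiry. */
    public static long tokensLeaked(long fetched) {
        return fetched - 1;
    }

    public static void main(String[] args) {
        long fetched = tokensFetched(11, 5, false);
        // For the 11-minute job in the description: fetched=4 leaked=3
        System.out.println("fetched=" + fetched + " leaked=" + tokensLeaked(fetched));
    }
}
```

Under these assumptions, the 11-minute job from the description yields 4 fetched tokens, 3 of which linger until the RM expires them. With ~3600 concurrent workflows, that leak scales accordingly.
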
-- This message was sent by Atlassian JIRA (v6.3.15#6346)