[ https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701351#comment-13701351 ]
Karthik Kambatla commented on MAPREDUCE-5364:
---------------------------------------------

Thanks Sid.

bq. The addendum patch can cause deadlocks on the call to {{setTimerForTokenRenewal}}

Looking at the code, I don't see a deadlock possibility. While a call to {{setTimerForTokenRenewal}} requires a lock on DelegationTokenRenewer.class, I don't see any method holding a lock on DelegationTokenRenewer.class requiring a lock on delegationTokens or cancelled flag. Am I missing something here?

bq. A cancelled flag could be used on the DelegationTokenToRenew structure itself. Set intent to cancel before attempting to cancel the timer task, and check this during renewal and before queuing another renewal.

I think I like this approach better - {{setTimerForTokenRenewal}} can be called conditionally based on the success of {{DelegationTokenToRenew#renew()}}. Let me take a stab.

> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5364
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>             Fix For: 1.2.1
>
>         Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch
>
>
> MAPREDUCE-4860 introduced a local variable {{cancelled}} in
> {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}}
> attempts to renew a token even after the job is removed. However, the patch
> also makes {{run()}} and {{cancel()}} synchronized methods leading to a
> potential deadlock against {{run()}}'s catch-block (error-path).
> The deadlock stacks below:
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}
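
For illustration only, here is a rough sketch of the cancelled-flag approach discussed in the comment: set the intent to cancel first, check it during renewal, and check it again before queuing another renewal. This is not the actual patch. {{CancelledFlagSketch}}, {{TokenToRenew}}, {{RenewalTask}}, {{schedule}} and the counter are hypothetical names standing in for the Hadoop classes, and the real renewal call is replaced by a placeholder. The point is that neither {{run()}} nor {{cancel()}} needs to be synchronized, so the two paths in the stacks cannot block each other on the task's monitor.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these types are the real Hadoop classes.
final class CancelledFlagSketch {

    /** Stand-in for DelegationTokenToRenew, carrying the cancelled flag itself. */
    static final class TokenToRenew {
        final AtomicBoolean cancelled = new AtomicBoolean(false);
        final AtomicInteger renewals = new AtomicInteger();
        volatile TimerTask timerTask;

        /** Returns true if a renewal was attempted, false if cancellation was already requested. */
        boolean renew() {
            if (cancelled.get()) {
                return false;
            }
            renewals.incrementAndGet(); // placeholder for the real token renewal
            return true;
        }
    }

    /** Timer task with an unsynchronized run(); it only reads the lock-free flag. */
    static final class RenewalTask extends TimerTask {
        private final TokenToRenew token;
        private final Timer timer;
        private final long delayMs;

        RenewalTask(TokenToRenew token, Timer timer, long delayMs) {
            this.token = token;
            this.timer = timer;
            this.delayMs = delayMs;
        }

        @Override
        public void run() {
            // Queue the next renewal only if this one actually went ahead.
            if (token.renew()) {
                schedule(token, timer, delayMs);
            }
        }
    }

    /** Queues a renewal unless cancellation has been requested. */
    static void schedule(TokenToRenew token, Timer timer, long delayMs) {
        if (token.cancelled.get()) {
            return;
        }
        TimerTask task = new RenewalTask(token, timer, delayMs);
        token.timerTask = task;
        try {
            timer.schedule(task, delayMs);
        } catch (IllegalStateException e) {
            // The timer was cancelled concurrently; nothing left to queue.
        }
    }

    /** Sets the intent to cancel first, then cancels the pending task without holding a lock. */
    static void cancel(TokenToRenew token) {
        token.cancelled.set(true);
        TimerTask task = token.timerTask;
        if (task != null) {
            task.cancel();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Timer timer = new Timer(true);
        TokenToRenew token = new TokenToRenew();
        schedule(token, timer, 20L);
        Thread.sleep(100L);
        cancel(token);
        System.out.println("renewals at cancel: " + token.renewals.get());
        Thread.sleep(100L);
        System.out.println("renewals later:     " + token.renewals.get());
        timer.cancel();
    }
}
```

If {{cancel}} races with {{schedule}}, a task that was already queued may still fire once, but {{renew()}} then sees the flag, does nothing, and no further renewal is queued. A renewal that has already passed the flag check can still complete, so the flag narrows the window rather than closing it entirely.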