[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701351#comment-13701351
 ] 

Karthik Kambatla commented on MAPREDUCE-5364:
---------------------------------------------

Thanks Sid.

bq. The addendum patch can cause deadlocks on the call to 
{{setTimerForTokenRenewal}}
Looking at the code, I don't see a deadlock possibility. A call to 
{{setTimerForTokenRenewal}} requires the lock on DelegationTokenRenewal.class, 
but I don't see any method that holds that lock and then also needs the lock 
on {{delegationTokens}} or the cancelled flag. Am I missing something here?

bq. A cancelled flag could be used on the DelegationTokenToRenew structure 
itself. Set intent to cancel before attempting to cancel the timer task, and 
check this during renewal and before queuing another renewal.
I like this approach better: {{setTimerForTokenRenewal}} can be called 
conditionally, based on the success of {{DelegationTokenToRenew#renew()}}. Let 
me take a stab at it.
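
A minimal sketch of the cancelled-flag idea, to make the ordering concrete. This is not the Hadoop code: {{TokenToRenewSketch}}, its fields, and the method signatures are hypothetical stand-ins, and the real renewal call is replaced by a counter. It assumes the flag can be an atomic boolean, so neither {{renew()}} nor {{cancel()}} needs a monitor.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for DelegationTokenToRenew; not the Hadoop class.
class TokenToRenewSketch {
    // Intent-to-cancel flag; atomic, so reading or setting it takes no monitor.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final AtomicInteger renewals = new AtomicInteger();
    private volatile TimerTask timerTask;

    /** Renews unless cancellation was requested; returns true on success. */
    boolean renew() {
        if (cancelled.get()) {
            return false;
        }
        renewals.incrementAndGet(); // placeholder for the real renewal call
        return true;
    }

    /** Queues the next renewal only if the token has not been cancelled. */
    boolean setTimerForTokenRenewal(final Timer timer, final long delayMs) {
        if (cancelled.get()) {
            return false;
        }
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                // Re-arm only when the renewal itself succeeded.
                if (renew()) {
                    setTimerForTokenRenewal(timer, delayMs);
                }
            }
        };
        timerTask = task;
        timer.schedule(task, delayMs);
        return true;
    }

    /** Records the intent to cancel first, then cancels the pending task. */
    void cancel() {
        cancelled.set(true);
        TimerTask t = timerTask;
        if (t != null) {
            t.cancel();
        }
    }

    int renewalCount() {
        return renewals.get();
    }
}
```

If {{cancel()}} lands between the flag check and {{timer.schedule}}, a task can still be queued, but its {{renew()}} sees the flag, returns false, and nothing is re-armed.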
                
> Deadlock between RenewalTimerTask methods cancel() and run()
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5364
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5364
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>             Fix For: 1.2.1
>
>         Attachments: mr-5364-1.patch, mr-5364-addendum-1.patch
>
>
> MAPREDUCE-4860 introduced a local variable {{cancelled}} in 
> {{RenewalTimerTask}} to fix the race where {{DelegationTokenRenewal}} 
> attempts to renew a token even after the job is removed. However, the patch 
> also makes {{run()}} and {{cancel()}} synchronized methods, which can 
> deadlock when {{cancel()}} is called while {{run()}} is in its catch block 
> (the error path).
> The stacks of the two deadlocked threads are below:
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.cancel() @bci=0, line=240 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeDelegationTokenRenewalForJob(org.apache.hadoop.mapreduce.JobID) @bci=109, line=319 (Interpreted frame)
> {noformat}
> {noformat}
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.removeFailedDelegationToken(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=62, line=297 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal.access$300(org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$DelegationTokenToRenew) @bci=1, line=47 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.security.token.DelegationTokenRenewal$RenewalTimerTask.run() @bci=148, line=234 (Interpreted frame)
> {noformat}
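
The two stacks in the description have the shape of a lock-order inversion: one thread enters {{cancel()}} from {{removeDelegationTokenRenewalForJob}}, the other enters {{removeFailedDelegationToken}} from inside {{run()}}. The sketch below reproduces that shape in isolation. It is not the Hadoop code. It assumes the two remove methods synchronize on a shared token collection (modelled here by a plain {{tokensLock}} object), and {{LockOrderDeadlockSketch}}, {{taskMonitor}} and the thread names are made up for illustration. The threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderDeadlockSketch {

    /** Returns true if the JVM reports the two threads as monitor-deadlocked. */
    static boolean reproduce() {
        final Object taskMonitor = new Object(); // stands in for the RenewalTimerTask monitor
        final Object tokensLock = new Object();  // assumed lock taken by the remove* methods
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread timerThread = new Thread(() -> {
            synchronized (taskMonitor) {         // like synchronized run()
                awaitBoth(bothHoldFirstLock);
                synchronized (tokensLock) {      // like removeFailedDelegationToken in the catch block
                    // never reached once the other thread holds tokensLock
                }
            }
        }, "timer-thread-sketch");

        Thread removerThread = new Thread(() -> {
            synchronized (tokensLock) {          // like removeDelegationTokenRenewalForJob
                awaitBoth(bothHoldFirstLock);
                synchronized (taskMonitor) {     // like synchronized cancel()
                    // never reached once the other thread holds taskMonitor
                }
            }
        }, "job-removal-sketch");

        timerThread.setDaemon(true);
        removerThread.setDaemon(true);
        timerThread.start();
        removerThread.start();

        // Poll the JVM's own deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 200; i++) {
                long[] ids = mx.findMonitorDeadlockedThreads();
                if (ids != null && ids.length >= 2) {
                    return true;
                }
                Thread.sleep(25);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    // Each thread signals that it holds its first lock, then waits for the other.
    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The latch only makes the interleaving deterministic for the demonstration; in the reported case the same interleaving would depend on timing between the timer thread and the job-removal path.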

