[ 
https://issues.apache.org/jira/browse/HDFS-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359271#comment-16359271
 ] 

Xiao Chen commented on HDFS-13112:
----------------------------------

Thanks for the fix, Daryn and Kihwal. I have not reviewed as carefully as Kihwal 
did, but from what I see, LGTM. :)

One question:
{code:title=FSNamesystem.java}
  public void logUpdateMasterKey(DelegationKey key) {
    ...
    assert hasReadLock();
    getEditLog().logUpdateMasterKey(key);
    getEditLog().logSync();
  }
{code}

I think {{logSync}} is usually done outside of the FSN lock. Why not do the 
same here?
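
For illustration, a minimal sketch of the usual pattern: buffer the edit while holding the lock, release the lock, then sync. {{ToyEditLog}} and {{ToyNamesystem}} are hypothetical stand-ins, not the real {{FSEditLog}} / {{FSNamesystem}}, and the read lock is used only to mirror the {{hasReadLock()}} assertion quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an edit log; not the real HDFS FSEditLog.
class ToyEditLog {
    private final List<String> buffered = new ArrayList<>();
    private final List<String> durable = new ArrayList<>();

    // Cheap: only appends to an in-memory buffer.
    synchronized void logEdit(String op) {
        buffered.add(op);
    }

    // In a real system this is the slow part (flush to stable storage).
    synchronized void logSync() {
        durable.addAll(buffered);
        buffered.clear();
    }

    synchronized List<String> durableEdits() {
        return new ArrayList<>(durable);
    }
}

// Hypothetical stand-in for the namesystem; not the real FSNamesystem.
class ToyNamesystem {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final ToyEditLog editLog = new ToyEditLog();

    // Set just before syncing: true if this thread still held the read lock.
    boolean syncedUnderLock;

    void updateMasterKey(String key) {
        fsLock.readLock().lock();
        try {
            // Only the cheap buffering step runs under the lock.
            editLog.logEdit("UPDATE_MASTER_KEY " + key);
        } finally {
            fsLock.readLock().unlock();
        }
        // The lock has been released, so other threads are not blocked by the sync.
        syncedUnderLock = fsLock.getReadHoldCount() > 0;
        editLog.logSync();
    }

    ToyEditLog editLog() {
        return editLog;
    }
}
```

The point of the split is that the slow flush no longer extends the time the lock is held. Whether that is safe for this particular call site in the patch is exactly the question.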

Also, just to confirm my understanding: the comment in 
{{logExpireDelegationToken}} says that expiration edits are batched, which is 
reasonable. In the code, no {{logSync}} is called at the end of 
{{removeExpiredToken}}, but we don't necessarily have to call it: the worst 
case is that we lose the edit on failover, and the new NN will still remove the 
token in the next interval.
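
A toy model of that reasoning, assuming a periodic scan that drops expired tokens and buffers one edit per token without syncing. {{ToyTokenStore}} and its members are hypothetical names, not the real secret manager code.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical toy model; not the real delegation token secret manager.
class ToyTokenStore {
    // token id -> expiry time in milliseconds
    final Map<String, Long> tokens = new HashMap<>();

    // Expiration edits buffered but never synced; lost if this process dies.
    int unsyncedExpirationEdits = 0;

    void addToken(String id, long expiryMillis) {
        tokens.put(id, expiryMillis);
    }

    // Periodic scan: remove expired tokens and buffer one edit each, no sync.
    int removeExpiredTokens(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = tokens.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < nowMillis) {
                it.remove();
                unsyncedExpirationEdits++;
                removed++;
            }
        }
        return removed;
    }
}
```

If the active store dies before its buffered edits are synced, a second store that still holds the expired token removes it on its own next scan, so the unsynced edit is redundant rather than required for correctness.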

> Token expiration edits may cause log corruption or deadlock
> -----------------------------------------------------------
>
>                 Key: HDFS-13112
>                 URL: https://issues.apache.org/jira/browse/HDFS-13112
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.0-beta, 0.23.8
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-13112.1.patch, HDFS-13112.patch
>
>
> HDFS-4477 specifically did not acquire the fsn lock during token cancellation 
> based on the belief that edit logs are thread-safe.  However, log rolling is 
> not thread-safe.  Failure to externally synchronize on the fsn lock during a 
> roll will cause problems.
> For sync edit logging, it may cause corruption by interspersing edits with 
> the end/start segment edits.  Async edit logging may encounter a deadlock if 
> the log queue overflows.  Luckily, losing the race is extremely rare.  In ~5 
> years, we've never encountered it.  However, HDFS-13051 lost the race with 
> async edits.



