[jira] [Commented] (HDFS-5064) Standby checkpoints should not block concurrent readers

Kihwal Lee (JIRA) Mon, 05 Aug 2013 07:21:22 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729534#comment-13729534
 ]


Kihwal Lee commented on HDFS-5064:
----------------------------------

A long read lock against FSNamesystem should be avoided. Even on the ANN, 
repeated getContentSummary() calls against big directory trees can degrade 
performance significantly. I complained about the fairness setting, but 
realized that things can get worse without it. 

I think most of the writers on the SBN are datanodes. If this is true, 
separating FSN and BlockManager locking will help. Last time I checked, we 
wanted a facility to enforce lock hierarchy before attempting to do this.

Or we could resort to an SBN-specific solution, since it probably only needs 
to block EditLogTailer and perhaps prevent concurrent checkpointing.
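
To make the SBN-specific idea concrete, here is a rough sketch only, not 
HDFS code. Every name in it (StandbyLockSketch, checkpointPermit, 
tryApplyEdit, readTxId) is hypothetical. It assumes the checkpointer can skip 
the namesystem read lock as long as it holds a separate permit that the edit 
tailer also needs, so readers never queue behind a writer that is waiting on 
the checkpoint. A binary Semaphore stands in for the checkpoint lock.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StandbyLockSketch {
    // Stand-in for the fair namesystem lock (hypothetical, not the HDFS class).
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    // One permit: held by either the checkpointer or the edit tailer.
    private final Semaphore checkpointPermit = new Semaphore(1);
    private long appliedTxId = 0;

    /** Checkpointer: excludes the tailer and other checkpoints, not readers. */
    public long checkpoint(Runnable whileSaving) {
        checkpointPermit.acquireUninterruptibly();
        try {
            whileSaving.run(); // stand-in for writing the image
            return appliedTxId;
        } finally {
            checkpointPermit.release();
        }
    }

    /** Edit tailer: returns false (try again later) while a checkpoint runs. */
    public boolean tryApplyEdit() {
        if (!checkpointPermit.tryAcquire()) {
            return false;
        }
        try {
            fsLock.writeLock().lock();
            try {
                appliedTxId++;
            } finally {
                fsLock.writeLock().unlock();
            }
            return true;
        } finally {
            checkpointPermit.release();
        }
    }

    /** Ordinary reader, e.g. a metrics fetch: takes only the read lock. */
    public long readTxId() {
        fsLock.readLock().lock();
        try {
            return appliedTxId;
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```

In this sketch a call to readTxId() made while checkpoint() is running 
proceeds, while tryApplyEdit() is refused until the checkpoint finishes. It 
deliberately ignores datanode-driven writers, which is the part that would 
need the FSN/BlockManager lock split.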
                
> Standby checkpoints should not block concurrent readers
> -------------------------------------------------------
>
>                 Key: HDFS-5064
>                 URL: https://issues.apache.org/jira/browse/HDFS-5064
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.1.1-beta
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> We've observed an issue which causes fetches of the {{/jmx}} page of the NN 
> to take a long time to load when the standby is in the process of creating a 
> checkpoint.
> Even though both creating the checkpoint and gathering the statistics for 
> {{/jmx}} take only the FSNS read lock, the issue is that since the FSNS uses 
> a _fair_ RW lock, a single writer attempting to get the lock will block all 
> threads attempting to get only the read lock for the duration of the 
> checkpoint. This will cause {{/jmx}}, and really any thread only attempting 
> to get the read lock, to block for the duration of the checkpoint, even 
> though they should be able to proceed concurrently with the checkpointing 
> thread.
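
The blocking described in the issue can be reproduced outside HDFS with a 
plain fair java.util.concurrent.locks.ReentrantReadWriteLock. The following is 
a minimal sketch, not NameNode code. The class and method names are made up 
for illustration, and the 200 ms timeout is arbitrary. The main thread plays 
the checkpointer holding the read lock, a second thread queues for the write 
lock, and a third thread then tries to take the read lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
    /**
     * Returns true if a new reader obtains the read lock within 200 ms while
     * the calling thread already holds the read lock and, optionally, a
     * writer is queued behind it.
     */
    public static boolean readerGetsLock(boolean writerQueued) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair
        lock.readLock().lock(); // long-running "checkpoint" reader
        Thread writer = null;
        try {
            if (writerQueued) {
                writer = new Thread(() -> {
                    try {
                        lock.writeLock().lockInterruptibly();
                        lock.writeLock().unlock();
                    } catch (InterruptedException e) {
                        // cancelled by the demo; nothing to clean up
                    }
                });
                writer.start();
                // Wait until the writer is really parked in the lock's queue.
                while (!lock.hasQueuedThread(writer)) {
                    Thread.sleep(1);
                }
            }
            final boolean[] acquired = new boolean[1];
            Thread reader = new Thread(() -> {
                try {
                    // The timed tryLock honors the fairness policy.
                    if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    // treated as "not acquired"
                }
            });
            reader.start();
            reader.join();
            return acquired[0];
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            if (writer != null) {
                writer.interrupt(); // pull the writer out of the queue
                try {
                    writer.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("no writer queued: " + readerGetsLock(false));
        System.out.println("writer queued:    " + readerGetsLock(true));
    }
}
```

With no writer waiting, the second reader shares the lock immediately. Once a 
writer is queued, the new reader has to line up behind it and times out, even 
though the lock is only read-held.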

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
