[ https://issues.apache.org/jira/browse/HDFS-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130325#comment-14130325 ]
Daryn Sharp commented on HDFS-6306:
-----------------------------------

I like both approaches. I think #1 might be easier. Rather than
re-locking for every edit, it would be preferable to batch edits within
the lock: either a fixed number of edits or, better yet, as many as
possible within a bounded amount of time.

> Standby NN can hold FSDirectory's writeLock for a long time under heavy load
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6306
>                 URL: https://issues.apache.org/jira/browse/HDFS-6306
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> Standby NN uses FSEditLogLoader to update its namespace. It can hold
> FSDirectory's writeLock for a long time when active NN generates lots of
> edits.
> {noformat}
> loadEditRecords
>     fsNamesys.writeLock();
>     fsDir.writeLock();
>     ...
>     try {
>       while (true) {
>         try {
>           FSEditLogOp op;
>           try {
>             op = in.readOp();
>             ...
>           }
>         }
>       }
>     } finally {
>       ...
>       fsDir.writeUnlock();
>       fsNamesys.writeUnlock();
>     }
> {noformat}
> With the fix in https://issues.apache.org/jira/browse/HDFS-5693, JMX response
> time is good for active NN as it no longer requires FSNamesystem's lock, even
> though it still needs to acquire FSDirectory's readLock when calling
> FSDirectory's totalInodes. That isn't an issue for active NN, as each client
> RPC request might only hold FSDirectory's lock for a short period of time. But
> Standby NN could hold the lock for a longer period of time.
> There are two ways to fix this:
> 1. Fix standby NN to acquire FSDirectory's writeLock for each edit record.
> 2. Fix FSDirectory's totalInodes to not take readLock so JMX can still go
> through.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
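
The batching suggested in the comment (hold the write lock for a fixed
number of edits or for a bounded amount of time, then release it so
readers can get in) might look roughly like the sketch below. This is
not the HDFS implementation: BatchedEditApplier, applyAll, maxOps and
maxHoldNanos are hypothetical names, and a single ReentrantReadWriteLock
stands in for the fsNamesys/fsDir lock pair taken by loadEditRecords.

```java
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical helper (not part of HDFS): applies edits in bounded batches. */
final class BatchedEditApplier {

  private BatchedEditApplier() {}

  /**
   * Applies every op from {@code ops}, releasing the write lock between
   * batches. A batch ends after {@code maxOps} edits or once
   * {@code maxHoldNanos} has elapsed, whichever comes first.
   *
   * @return the number of batches, i.e. write-lock acquisitions
   */
  static <T> int applyAll(Iterator<T> ops, Consumer<T> apply,
      ReentrantReadWriteLock lock, int maxOps, long maxHoldNanos) {
    int batches = 0;
    while (ops.hasNext()) {
      lock.writeLock().lock();
      try {
        long start = System.nanoTime();
        int applied = 0;
        // Apply at least one edit per acquisition so the loop always
        // makes progress, even with a zero time budget.
        do {
          apply.accept(ops.next());
          applied++;
        } while (ops.hasNext()
            && applied < maxOps
            && System.nanoTime() - start < maxHoldNanos);
      } finally {
        // Readers (e.g. a JMX query) can acquire the lock from here
        // until the next batch starts.
        lock.writeLock().unlock();
      }
      batches++;
    }
    return batches;
  }
}
```

A caller would pass the stream of edit ops plus the two bounds, e.g.
applyAll(ops, op -> applyEdit(op), lock, 1000, 10_000_000L), where
applyEdit and the bounds are placeholders. Whether waiting readers
actually get in between batches depends on the lock's fairness policy;
constructing it with new ReentrantReadWriteLock(true) is one way to
favor the longest-waiting thread.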