[
https://issues.apache.org/jira/browse/HDFS-6306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130325#comment-14130325
]
Daryn Sharp commented on HDFS-6306:
-----------------------------------
I like both approaches. I think #1 might be easier. Rather than re-locking
for every edit, it would be preferable to batch edits within the lock: either
a fixed number of edits or, better yet, as many as possible within a bounded
amount of time.
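
To make the time-bounded batching idea concrete, here is a minimal sketch. It is not the FSEditLogLoader code: the class BatchedEditApplier, the String stand-in for an edit op, and the maxHoldMillis budget are all hypothetical names chosen for illustration. It also assumes a fair lock so that waiting readers (such as JMX) get a turn between batches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not the actual HDFS loader.
final class BatchedEditApplier {
    // Fair mode (an assumption of this sketch) lets queued readers in between batches.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final long maxHoldNanos;
    private final List<String> applied = new ArrayList<>();
    private int lockAcquisitions = 0;

    BatchedEditApplier(long maxHoldMillis) {
        this.maxHoldNanos = TimeUnit.MILLISECONDS.toNanos(maxHoldMillis);
    }

    // Applies every edit, releasing the write lock whenever the time budget is used up.
    void applyAll(Iterator<String> edits) {
        while (edits.hasNext()) {
            lock.writeLock().lock();
            try {
                lockAcquisitions++;
                long deadline = System.nanoTime() + maxHoldNanos;
                do {
                    // Stand-in for applying one edit op to the namespace.
                    applied.add(edits.next());
                } while (edits.hasNext() && System.nanoTime() - deadline < 0);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    int appliedCount() {
        lock.readLock().lock();
        try {
            return applied.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    int lockAcquisitions() {
        lock.readLock().lock();
        try {
            return lockAcquisitions;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

At least one edit is applied per acquisition, so a budget of zero degenerates to locking per edit (approach #1 as written), while a larger budget amortizes the locking cost over many edits and still caps how long a reader can be blocked.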
> Standby NN can hold FSDirectory's writeLock for a long time under heavy load
> ----------------------------------------------------------------------------
>
> Key: HDFS-6306
> URL: https://issues.apache.org/jira/browse/HDFS-6306
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ming Ma
>
> Standby NN uses FSEditLogLoader to update its namespace. It can hold
> FSDirectory's writeLock for a long time when active NN generates lots of
> edits.
> {noformat}
> loadEditRecords
>   fsNamesys.writeLock();
>   fsDir.writeLock();
>   ...
>   try {
>     while (true) {
>       try {
>         FSEditLogOp op;
>         try {
>           op = in.readOp();
>           ...
>       }
>     }
>   } finally {
>     ...
>     fsDir.writeUnlock();
>     fsNamesys.writeUnlock();
>   }
> {noformat}
> With the fix in https://issues.apache.org/jira/browse/HDFS-5693, JMX response
> time is good for active NN as it no longer requires FSNamesystem's lock, even
> though it still needs to acquire FSDirectory's readLock in FSDirectory's
> totalInodes. That isn't an issue for active NN as each client RPC request
> might only acquire FSDirectory's lock for a short period of time. But Standby NN
> could hold the lock for a longer period of time.
> There are two ways to fix this:
> 1. Fix standby NN to acquire FSDirectory's writeLock for each edit record.
> 2. Fix FSDirectory's totalInodes to not take readLock so JMX can still go
> through.
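
For comparison, option 2 in the description could look roughly like the sketch below, which keeps the inode count in an atomic counter so that a reader never touches the directory lock. The class InodeCounterSketch and its methods are hypothetical and are not the real FSDirectory API; whether a slightly stale count is acceptable for JMX is an assumption of the sketch.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not the actual FSDirectory.
final class InodeCounterSketch {
    private final ReentrantReadWriteLock dirLock = new ReentrantReadWriteLock();
    // Updated only while the write lock is held, but readable without any lock.
    private final AtomicLong inodeCount = new AtomicLong();

    void addInode() {
        dirLock.writeLock().lock();
        try {
            // A real namespace mutation would happen here.
            inodeCount.incrementAndGet();
        } finally {
            dirLock.writeLock().unlock();
        }
    }

    void removeInode() {
        dirLock.writeLock().lock();
        try {
            inodeCount.decrementAndGet();
        } finally {
            dirLock.writeLock().unlock();
        }
    }

    // Lock-free read: may lag an in-progress mutation, but never blocks on dirLock.
    long totalInodes() {
        return inodeCount.get();
    }
}
```

The trade-off is that the reported count is only a point-in-time value, in exchange for metrics reads that cannot stall behind a long-held writeLock.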