[ https://issues.apache.org/jira/browse/HDFS-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HDFS-1093: ----------------------------------- Attachment: NNreadwriteLock_trunk_3.txt I have purposely kept the try-catch block un-indented so that it is easier for you to review. Once the review is complete, I will upload a final patch with proper indentation. FSNamesystem.java ------------------------- I removed the import of the ReadWriteLock. However, I kept the fsLock initializing inside the initialize method rather than moving it to the declaration. Most variables seem to be getting initialized inside the initialize() method, isn't it? getBlockLocations() - I now acquire the readLock and attempt to proceed ahead. If the access-time has to be set, then I release the readLock, acquire the writeLock and start all over again. Check for safemode should be inside the FSnamesystem lock, in fact more of it is coming in HDFS-988. setTimes() move isAccessTimeSupported() call out side the lock : done. CommitBlockSynchronization(): the key is to call the logSync() outside the FSnamesystem lock. And there is a LOG statement that prints out "src" after the call to logSync(). That is the reason why i had to declare "String src" right at the beginning of the method. I agree with you that there are code portions in processReport, createSymLinkInternal, startFileInternal() that can move outside the FSNamesystem lock. However, I would like to avoid doing this code reorganizatin as part of this JIRA, especially because it makes the code difficult to review. Also, this is not a regression because the original code has all these code inside the synchronized section anyway. Please let me know if you agree on this one. getDatanodeListForReport - move boolean and HashMap out of read lock: done. getSafeModeTip - readLock instead of writeLock : done BlockManager.java ------------------------ chooseUnderReplicatedBlocks - should we grab readLock instead of writeLock? I wold prefer to keep the writeLock because this method updates member variables of the class, e.g. missingBlocksInPrevIter removeStoredBlock - assert to be replaced by grab writeLock: I have the impression that all calls to removeStoredBlock already has the writeLock, that is the reason for the assert. Do you know of a code path via which this is not the case? DecommissionManager.java - readLock instead of writeLock; The DecommissionManager.check() calls FSNamesystem.checkDecommissionStateInterna() which, in turn, invokes node.setDecommissioned(). This updates the state of the node, hence I was confortable keeping the writeLock in this code path. FSDirectory.java --------------------- BackupStorage.loadCheckPoint() is now fixed. Do you have a suggestion on how I can fix the code in FSPermissionChecker.checkPermission()? > Improve namenode scalability by splitting the FSNamesystem synchronized > section in a read/write lock > ---------------------------------------------------------------------------------------------------- > > Key: HDFS-1093 > URL: https://issues.apache.org/jira/browse/HDFS-1093 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node > Reporter: dhruba borthakur > Assignee: dhruba borthakur > Attachments: NNreadwriteLock.txt, NNreadwriteLock_trunk_1.txt, > NNreadwriteLock_trunk_2.txt, NNreadwriteLock_trunk_3.txt > > > Most critical data structures in the NameNode (NN) are protected by a > syncronized methods in the FSNamesystem class. This essentially makes > critical code paths in the NN single-threaded. However, a large percentage of > the NN calls are listStatus, getBlockLocations, etc which do not change > internal data structures at all, these are read-only calls. If we change the > FSNamesystem lock to a read/write lock, many of the above operations can > occur in parallel, thus improving the scalability of the NN. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.