[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903374#comment-16903374 ]
Konstantin Shvachko commented on HDFS-14703:
--------------------------------------------

Hi [~hexiaoqiao], thanks for reviewing the doc. Very good questions:
# "Cousins" means files like {{/a/b/c/d}} and {{/a/b/m/n}}. Their keys are {{<idb, idc, idd>}} and {{<idb, idm, idn>}}, respectively. They share the common prefix {{<idb>}} and are therefore likely to fall into the same RangeGSet. In your example, {{<ida, idb, idc>}} is the parent of {{<idb, idc, idd>}}, and this key definition does not guarantee that they are in the same range.
# Deleting a directory {{/a/b/c}} means deleting the entire sub-tree underneath this directory. We should lock all RangeGSets involved in such a deletion, particularly the one containing file {{f}}. So {{f}} cannot be modified concurrently with the delete.
# Just to clarify, RangeMap is the upper-level part of PartitionedGSet, which maps key ranges into RangeGSets. So there is only one RangeMap and many RangeGSets. Holding a lock on RangeMap is akin to holding a global lock. You make a good point that some operations, like failover, large deletes, renames, and quota changes, will still require a global lock. The lock on RangeMap could play the role of such a global lock. This should be defined in more detail within the design of LatchLock. Ideally we should retain FSNamesystemLock as a global lock for some operations. This will also help us gradually switch operations from FSNamesystemLock to LatchLock.
# I don't know what the next bottleneck will be, but you are absolutely correct that there will be one. For the edit log, I did see while running my benchmarks that the number of transactions batched together during journaling was increasing. This is expected and desirable behavior, since writing large batches to a disk is more efficient than lots of small writes.
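To make the key and range terminology above concrete, here is a minimal Java sketch. It is not code from the design doc or from any patch: the class {{PartitionedGSetSketch}}, the {{long[]}} key layout, the inode ids, and the range boundaries are all assumptions chosen for illustration. It only shows how a lexicographically ordered key of the form {{<grandparent id, parent id, own id>}} makes cousins land in the same range while a parent may land elsewhere, and where per-range locks and a RangeMap lock could sit. The actual locking protocol (LatchLock) is not modeled.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: names and layout are assumptions, not the HDFS-14703 code. */
final class PartitionedGSetSketch {

  /** Lexicographic order on keys, so keys sharing a prefix sort next to each other. */
  static final Comparator<long[]> KEY_ORDER = (x, y) -> {
    for (int i = 0; i < Math.min(x.length, y.length); i++) {
      int c = Long.compare(x[i], y[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(x.length, y.length);
  };

  /** Stand-in for a RangeGSet: a name plus its own lock. */
  static final class RangeGSet {
    final String name;
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    RangeGSet(String name) {
      this.name = name;
    }
  }

  /** Stand-in for the RangeMap: first key of each range mapped to its RangeGSet. */
  private final TreeMap<long[], RangeGSet> rangeMap = new TreeMap<>(KEY_ORDER);

  /** Lock on the RangeMap itself; holding it for write is akin to a global lock. */
  final ReentrantReadWriteLock rangeMapLock = new ReentrantReadWriteLock();

  /** Builds the key <grandparent id, parent id, own id>. */
  static long[] key(long grandParentId, long parentId, long id) {
    return new long[] {grandParentId, parentId, id};
  }

  void addRange(long[] firstKey, String name) {
    rangeMap.put(firstKey, new RangeGSet(name));
  }

  /** Returns the range whose first key is the greatest one not above the given key. */
  RangeGSet rangeOf(long[] key) {
    Map.Entry<long[], RangeGSet> e = rangeMap.floorEntry(key);
    if (e == null) {
      throw new IllegalArgumentException("no range covers this key");
    }
    return e.getValue();
  }

  public static void main(String[] args) {
    // Hypothetical inode ids: a=1, b=2, c=3, d=4, m=5, n=6.
    PartitionedGSetSketch gset = new PartitionedGSetSketch();
    gset.addRange(key(0, 0, 0), "range-0");
    gset.addRange(key(2, 0, 0), "range-1");
    gset.addRange(key(3, 0, 0), "range-2");

    long[] d = key(2, 3, 4); // /a/b/c/d -> <idb, idc, idd>
    long[] n = key(2, 5, 6); // /a/b/m/n -> <idb, idm, idn>
    long[] c = key(1, 2, 3); // /a/b/c   -> <ida, idb, idc>

    System.out.println(gset.rangeOf(d).name); // range-1
    System.out.println(gset.rangeOf(n).name); // range-1 (cousin of d, same prefix <idb>)
    System.out.println(gset.rangeOf(c).name); // range-0 (parent of d, different prefix)
  }
}
```

With these made-up boundaries the two cousins resolve to the same RangeGSet, while the parent directory {{/a/b/c}} resolves to a different one. This is why a sub-tree delete has to lock every RangeGSet it touches rather than only the one holding the directory itself.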

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We aim to enable fine-grained locking by splitting the in-memory namespace
> into multiple partitions, each with a separate lock. This is intended to
> improve the performance of NameNode write operations.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)