[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390818#comment-17390818 ]
Konstantin Shvachko commented on HDFS-14703:
--------------------------------------------

Some thoughts on [~daryn]'s comment:
* For small clusters/namespaces you don't need to do anything at all; performance should be great.
* Namespaces of 1 billion objects can be handled effectively with Observers (HDFS-12943), as described in our [Exabyte Club blog|https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr].
* This namespace partitioning idea should help if you want to grow the workloads and cluster size further. And sure, it's a big "if" there.
* There is plenty of benchmark data above. I built the POC precisely to obtain some preliminary synthetic numbers. For me, 30% is the threshold that separates worthwhile improvements.
* We won't know the real performance numbers until the feature is done. As with "Consistent Reads from Standby", our initial synthetic benchmarks showed ~50% improvement. The real numbers in production were 3x better in both average throughput and latency.
* You bring up good design concerns. But conceptually, multiple partitions cannot be worse than a single one. When an operation spans all partitions, it's like taking a global lock as we do today. So in this case the performance of multiple partitions degenerates to the current level, but in all other cases multiple namespace operations can proceed in parallel.
* Let us know if you have concrete suggestions: you don't want it to sound like FUD.
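
The "cannot be worse than a single partition" argument can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the POC code attached to this issue: the class {{PartitionedMap}}, its methods, and the modulo-based key-to-partition mapping are all hypothetical. A single-key operation locks only the partition that owns the key, while an operation spanning everything acquires every partition lock in a fixed order, which behaves like today's global lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; names and key mapping are assumptions, not HDFS code. */
class PartitionedMap<V> {
    private final List<Map<Long, V>> partitions = new ArrayList<>();
    private final List<ReentrantReadWriteLock> locks = new ArrayList<>();

    PartitionedMap(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
            locks.add(new ReentrantReadWriteLock());
        }
    }

    // Assumed mapping: key modulo partition count (floorMod handles negatives).
    private int indexOf(long key) {
        return (int) Math.floorMod(key, (long) partitions.size());
    }

    /** Single-key write: takes only the owning partition's write lock. */
    V put(long key, V value) {
        int i = indexOf(key);
        locks.get(i).writeLock().lock();
        try {
            return partitions.get(i).put(key, value);
        } finally {
            locks.get(i).writeLock().unlock();
        }
    }

    /** Single-key read: takes only the owning partition's read lock. */
    V get(long key) {
        int i = indexOf(key);
        locks.get(i).readLock().lock();
        try {
            return partitions.get(i).get(key);
        } finally {
            locks.get(i).readLock().unlock();
        }
    }

    /**
     * Spanning operation: takes every write lock in index order (a fixed
     * order avoids deadlock between two spanning operations). While it runs,
     * the structure behaves as if guarded by one global lock.
     */
    int clear() {
        for (ReentrantReadWriteLock lock : locks) {
            lock.writeLock().lock();
        }
        try {
            int removed = 0;
            for (Map<Long, V> partition : partitions) {
                removed += partition.size();
                partition.clear();
            }
            return removed;
        } finally {
            for (int i = locks.size() - 1; i >= 0; i--) {
                locks.get(i).writeLock().unlock();
            }
        }
    }
}
```

In this sketch, two {{put}} calls whose keys fall into different partitions never contend, whereas {{clear}} serializes against everything, which is the degenerate case described above.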

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.