[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390818#comment-17390818
 ] 

Konstantin Shvachko commented on HDFS-14703:
--------------------------------------------

Some thoughts on [~daryn]'s comment:
* For small clusters/namespaces you don't need to do anything at all; 
performance should already be great.
* Namespaces with 1 billion objects can be handled effectively with Observers 
(HDFS-12943), as described in our [Exabyte Club 
blog|https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr].
* This namespace partitioning idea should help if you want to grow the 
workloads and cluster size further. And sure, it's a big "if" there.
* There is plenty of benchmark data above. I built the POC precisely to 
obtain some preliminary synthetic numbers. For me, a 30% gain is the threshold 
that separates worthwhile improvements from the rest.
* We won't know the real performance numbers until the feature is done. With 
"Consistent Reads from Standby", our initial synthetic benchmarks showed ~50% 
improvement, while the real numbers in production were 3x better in both 
average throughput and latency.
* You bring up good design concerns. But conceptually, multiple partitions 
cannot be worse than a single one. When an operation spans all partitions, it's 
like taking the global lock as we do today. So in that case the performance of 
multiple partitions degenerates to the current level, but in all other cases 
namespace operations on different partitions can run in parallel.
* Let us know if you have concrete suggestions: you don't want it to sound like 
FUD.
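The argument that partitioning cannot be worse than a single lock can be illustrated with a small Java sketch. This is not the POC code attached to the issue: the class `PartitionedNamespace`, its methods, and the choice of partitioning by inode ID modulo the number of partitions are all hypothetical assumptions made for illustration. Single-partition writes take only their own partition's lock, a two-partition operation locks partitions in ascending index order (to avoid deadlock), and an operation that spans every partition takes all locks, which is equivalent to today's global lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: inode map split into partitions, each with its own lock. */
class PartitionedNamespace {
    private final int numPartitions;
    private final List<ReentrantReadWriteLock> locks = new ArrayList<>();
    private final List<Map<Long, String>> partitions = new ArrayList<>();

    PartitionedNamespace(int numPartitions) {
        this.numPartitions = numPartitions;
        for (int i = 0; i < numPartitions; i++) {
            locks.add(new ReentrantReadWriteLock());
            partitions.add(new HashMap<>());
        }
    }

    /** Assumed partitioning function: inode ID modulo the partition count. */
    int partitionOf(long inodeId) {
        return (int) Math.floorMod(inodeId, (long) numPartitions);
    }

    /** Single-partition write: only this partition's lock is held. */
    void put(long inodeId, String name) {
        int p = partitionOf(inodeId);
        locks.get(p).writeLock().lock();
        try {
            partitions.get(p).put(inodeId, name);
        } finally {
            locks.get(p).writeLock().unlock();
        }
    }

    String get(long inodeId) {
        int p = partitionOf(inodeId);
        locks.get(p).readLock().lock();
        try {
            return partitions.get(p).get(inodeId);
        } finally {
            locks.get(p).readLock().unlock();
        }
    }

    /** Multi-partition write: lock distinct partitions in ascending order to avoid deadlock. */
    void move(long fromId, long toId) {
        int[] ps = Arrays.stream(new long[] {fromId, toId})
                .mapToInt(this::partitionOf).distinct().sorted().toArray();
        for (int p : ps) {
            locks.get(p).writeLock().lock();
        }
        try {
            String name = partitions.get(partitionOf(fromId)).remove(fromId);
            if (name != null) {
                partitions.get(partitionOf(toId)).put(toId, name);
            }
        } finally {
            for (int i = ps.length - 1; i >= 0; i--) {
                locks.get(ps[i]).writeLock().unlock();
            }
        }
    }

    /** Spans every partition: equivalent to taking a single global lock. */
    int countAll() {
        for (int p = 0; p < numPartitions; p++) {
            locks.get(p).readLock().lock();
        }
        try {
            int n = 0;
            for (Map<Long, String> m : partitions) {
                n += m.size();
            }
            return n;
        } finally {
            for (int p = numPartitions - 1; p >= 0; p--) {
                locks.get(p).readLock().unlock();
            }
        }
    }
}
```

In this sketch, two `put` calls whose inode IDs map to different partitions never contend, while `countAll` serializes against every writer just as a global lock would, so the worst case matches the current single-lock behavior.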

> NameNode Fine-Grained Locking via Metadata Partitioning
> -------------------------------------------------------
>
>                 Key: HDFS-14703
>                 URL: https://issues.apache.org/jira/browse/HDFS-14703
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>            Reporter: Konstantin Shvachko
>            Priority: Major
>         Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
