[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-27 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387797#comment-17387797
 ] 

Xing Lin edited comment on HDFS-14703 at 7/27/21, 6:07 AM:
---

[~daryn] Thanks for your comments. I will address your last question and leave 
other questions to [~shv]. :)

 

Regarding the results, we used the standard NNThroughputBenchmark, with 
commands like the following. 
  
{code:java}
./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
file:/// -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512{code}
Here is a result from [~prasad-acit], since his QPS numbers are higher than 
the ones I got. 
{code:java}
BASE:
 common/hadoop-hdfs-32021-05-17 11:17:36,973 INFO 
namenode.NNThroughputBenchmark: — mkdirs inputs —
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirs = 100
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrThreads = 200
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: — mkdirs stats —
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: # operations: 
100
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
17718
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Ops per sec: 
56439.77875606727
 2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Average Time: 3
 2021-05-17 11:17:36,973 INFO namenode.FSEditLog: Ending log segment 1, 1031254
PATCH:
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: — mkdirs inputs —
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirs = 100
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrThreads = 200
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: — mkdirs stats —
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: # operations: 
100
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
15010
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Ops per sec: 
66622.25183211193
 2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Average Time: 2
 2021-05-17 11:11:09,331 INFO namenode.FSEditLog: Ending log segment 1, 1031254
{code}
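
One note for readers comparing these numbers: the "# operations" values above 
are line-wrapped in the email, but they can be recovered from the other two 
fields, since NNThroughputBenchmark reports "Ops per sec" as the operation 
count divided by the elapsed time in seconds. A quick sanity-check sketch 
(illustrative only, not part of the benchmark):
{code:java}
// Recover the wrapped "# operations" values from "Elapsed Time" (ms) and "Ops per sec".
public class ThroughputCheck {
  public static void main(String[] args) {
    check(17_718, 56_439.77875606727); // BASE run  -> ~1,000,000 operations
    check(15_010, 66_622.25183211193); // PATCH run -> ~1,000,000 operations
  }

  static void check(long elapsedMs, double opsPerSec) {
    System.out.printf("implied # operations = %.0f%n", opsPerSec * elapsedMs / 1000.0);
  }
}
{code}
Both runs work out to roughly 1,000,000 mkdirs operations, so the two 
throughput numbers are directly comparable.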
 


was (Author: xinglin):
[~daryn] Thanks for your comments. I will address your last question and leave 
other questions to [~shv]. :)

 

Regarding the results, we used the standard NNThroughputBenchmark, with 
commands like the following. 
 
./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark *-fs* 
[*file:///*|file:///*] -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512

Here are a result from [~prasad-acit], since his QPS numbers are higher than 
what I got. 
BASE:
common/hadoop-hdfs-32021-05-17 11:17:36,973 INFO 
namenode.NNThroughputBenchmark: --- mkdirs inputs ---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: # operations: 
100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Elapsed Time: 17718
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
56439.77875606727
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Average Time: 3
2021-05-17 11:17:36,973 INFO namenode.FSEditLog: Ending log segment 1, 1031254

PATCH:
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: # operations: 
100
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Elapsed Time: 15010
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
66622.25183211193
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Average Time: 2
2021-05-17 11:11:09,331 INFO namenode.FSEditLog: Ending log segment 1, 1031254

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-07-14 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345870#comment-17345870
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 7/14/21, 5:28 PM:
--

I did some performance benchmarks using a physical server (a d430 server in the 
[Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD as 
the storage for HDFS. Using RAMDISK removes the time the SSD takes to make each 
write persistent. For the RAMDISK case we observed a 45% improvement from 
fine-grained locking; for the SSD case, fine-grained locking gives us a 20% 
improvement. We used an Intel SSD (model: SSDSC2BX200G4R).

We noticed that for trunk, the mkdirs ops/sec is lower with RAMDISK than with 
SSD. We don't know the reason for this yet. We repeated the RAMDISK experiment 
on trunk twice to confirm the number.
h2. tmpfs, hadoop.tmp.dir = /run/hadoop-utos
h3. 45% improvement fgl vs. trunk
trunk 
{noformat:nowrap}
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362
2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5
2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14
2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 
10019540
{noformat}

fgl
{noformat:nowrap}
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
445980
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
22422.530
2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Average Time: 8
{noformat}

h2. SSD, hadoop.tmp.dir=/dev/sda4
h3. 23% improvement fgl vs. trunk

trunk:
{noformat:nowrap}
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
593839
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
16839.581
2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Average Time: 11
{noformat}

fgl
{noformat:nowrap}
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
481269
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
20778.400
2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Average Time: 9
{noformat}
 
{noformat:nowrap}
/dev/sda:
ATA device, with non-removable media
Model Number:   INTEL SSDSC2BX200G4R
Serial Number:  BTHC523202RD200TGN
Firmware Revision:  G201DL2D
{noformat}


was (Author: xinglin):
I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD, as 
the storage for HDFS. By using RAMDISK, we can remove the time used by the SSD 
to make each write persistent. For the RAM case, we observed an improvement of 
45% from fine-grained locking. For the SSD case, fine-grained locking gives us 
20% improvement.  We used an Intel SSD (model: SSDSC2BX200G4R).  

We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't 
know the reason for this yet. We repeated the experiment for RAMDISK for trunk 
twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO 

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-06-08 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359306#comment-17359306
 ] 

Renukaprasad C edited comment on HDFS-14703 at 6/8/21, 12:17 PM:
-

[~shv]/ [~xinglin]

We have implemented FGL for the create API, done basic testing, and captured the 
performance readings. With the create API we could see around a 25% improvement.

I have created PR [https://github.com/apache/hadoop/pull/3013] for this. Could 
you please review and share feedback when you get time?

Command:

/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
file:/// -op create -threads 200 -files 100 -filesPerDir 40

Result:

 
||Iteration||Base||Patch||
|Itr-1|27124|32712|
|Itr-2|26460|31312|
|Itr-3|24166|32276|
|Avg|25916.66|32100|
|Improvement| |23.86|
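
For reference, the Avg and Improvement rows follow directly from the three 
iterations. A small sketch of that arithmetic (illustrative only, not benchmark 
code):
{code:java}
// Recompute the Avg and Improvement rows of the table above.
public class CreateImprovement {
  public static void main(String[] args) {
    double[] base  = {27_124, 26_460, 24_166};
    double[] patch = {32_712, 31_312, 32_276};
    double baseAvg = avg(base);   // 25916.66...
    double patchAvg = avg(patch); // 32100.0
    double improvementPct = (patchAvg - baseAvg) / baseAvg * 100.0; // ~23.86
    System.out.printf("base=%.2f patch=%.2f improvement=%.2f%%%n",
        baseAvg, patchAvg, improvementPct);
  }

  static double avg(double[] v) {
    double s = 0;
    for (double x : v) {
      s += x;
    }
    return s / v.length;
  }
}
{code}
This reproduces the 23.86% figure in the last row; the Base and Patch columns 
are presumably the ops/sec values reported by NNThroughputBenchmark.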

 

 

 


was (Author: prasad-acit):
[~shv]/ [~xinglin]

We have implemented FGL for Create API and done basic testing & captured the 
performance reading. With the create API we could see around 25% of improvement.

I have created PR - [https://github.com/apache/hadoop/pull/3013] for the same. 
Can you please review & feedback when you get time?

Command:

/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
file:/// -op create -threads 200 -files 100 -filesPerDir 40

Result:

 
||Iteration||Heading 1||Heading 2||
|Itr-1|27124|32712|
|Itr-2|26460|31312|
|Itr-3|24166|32276|
|Avg|25916.66|32100|
|Improvement| |23.86|

 

 

 

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-17 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346002#comment-17346002
 ] 

Renukaprasad C edited comment on HDFS-14703 at 5/17/21, 10:56 AM:
--

Thanks [~shv] & [~xinglin].
 We have tested on a 48-core physical machine and could see a significant 
performance improvement with the patch.
 On average the improvement is +*around 30%*+ with the default storage policy.
||Itr||Base||Patch||
|ITR-1|56439|66622|
|ITR-2|58092|65074|
|ITR-3|60132|74354|
|ITR-4|52056|76522|
|ITR-5|55478|65526|
|ITR-6|60664|76881|
|AVG|56066|72976|

h3. +Improvement: 30.16%+

Attached a few results:
{code:java}
BASE:
common/hadoop-hdfs-32021-05-17 11:17:36,973 INFO 
namenode.NNThroughputBenchmark: --- mkdirs inputs ---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: # operations: 
100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Elapsed Time: 17718
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
56439.77875606727
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Average Time: 3
2021-05-17 11:17:36,973 INFO namenode.FSEditLog: Ending log segment 1, 1031254

PATCH:
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: # operations: 
100
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Elapsed Time: 15010
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
66622.25183211193
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: Average Time: 2
2021-05-17 11:11:09,331 INFO namenode.FSEditLog: Ending log segment 1, 1031254

{code}
Command: ./hadoop jar 
../share/hadoop/common/hadoop-hdfs-3.1.1-hw-ei-SNAPSHOT-tests.jar 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs file:/// -op 
mkdirs -threads 200 -dirs 100 -dirsPerDir 32

Hw configuration:
 Architecture: x86_64
 CPU op-mode(s): 32-bit, 64-bit
 CPU(s): 48
 On-line CPU(s) list: 0-47
 Thread(s) per core: 2
 Core(s) per socket: 12
 Socket(s): 2
 NUMA node(s): 2
 Vendor ID: GenuineIntel
 CPU family: 6
 Model: 63
 Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
 Stepping: 2
 CPU MHz: 2600.406
 CPU max MHz: 3500.
 CPU min MHz: 1200.
 BogoMIPS: 5189.51
 Virtualization: VT-x
 L1d cache: 32K
 L1i cache: 32K
 L2 cache: 256K
 L3 cache: 30720K
 NUMA node0 CPU(s): 0-11,24-35
 NUMA node1 CPU(s): 12-23,36-47


was (Author: prasad-acit):
Thanks [~shv] & [~xinglin]
I have tested on 48 Core physical machine & could see significant performance 
improvement with the patch.
On average improvement is around 30% with default storage policy.

BasePatch
ITR-1   56439   66622
ITR-2   58092   65074
ITR-3   60132   74354
ITR-4   52056   76522
ITR-5   55478   65526
ITR-6   60664   76881
AVG 56066   72976.3

Improvement 30.16147636 


Attached few results:

{code:java}
BASE:
common/hadoop-hdfs-32021-05-17 11:17:36,973 INFO 
namenode.NNThroughputBenchmark: --- mkdirs inputs ---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: # operations: 
100
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Elapsed Time: 17718
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
56439.77875606727
2021-05-17 11:17:36,973 INFO namenode.NNThroughputBenchmark: Average Time: 3
2021-05-17 11:17:36,973 INFO namenode.FSEditLog: Ending log segment 1, 1031254

PATCH:
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirs = 100
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 32
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-17 11:11:09,321 INFO namenode.NNThroughputBenchmark: # operations: 
100

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-16 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345870#comment-17345870
 ] 

Xing Lin edited comment on HDFS-14703 at 5/17/21, 4:42 AM:
---

I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|http://www.emulab.net]). I used either RAMDISK or SSD, as 
the storage for HDFS. By using RAMDISK, we can remove the time used by the SSD 
to make each write persistent. For the RAM case, we observed an improvement of 
45% from fine-grained locking. For the SSD case, fine-grained locking gives us 
20% improvement.  We used an Intel SSD (model: SSDSC2BX200G4R).  

We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't 
know the reason for this yet. We repeated the experiment for RAMDISK for trunk 
twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14

2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 
10019540

fgl

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
445980

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
22422.530

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Average Time: 8
h1. SSD, hadoop.tmp.dir=/dev/sda4
h1. 23% improvement fgl vs. trunk

trunk:

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
593839

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
16839.581

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Average Time: 11

 

fgl

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
481269

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
20778.400

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Average Time: 9

 

/dev/sda:

ATA device, with non-removable media

Model Number:       INTEL SSDSC2BX200G4R

Serial Number:      BTHC523202RD200TGN

Firmware Revision:  G201DL2D


was (Author: xinglin):
I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|www.emulab.net]). I used either RAMDISK or SSD, as the 
storage for HDFS. By using RAMDISK, we can remove the time used by the SSD to 
make each write persistent. For the RAM case, we observed an improvement of 45% 
from fine-grained locking. For the SSD case, fine-grained locking gives us 20% 
improvement.  We used an Intel SSD (model: SSDSC2BX200G4R).  

We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't 
know the reason for this yet. We repeated the experiment for RAMDISK for trunk 
twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14

2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 
10019540

fgl


[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-16 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345870#comment-17345870
 ] 

Xing Lin edited comment on HDFS-14703 at 5/17/21, 4:41 AM:
---

I did some performance benchmarks using a physical server (a d430 server in 
[Utah Emulab testbed|www.emulab.net]). I used either RAMDISK or SSD, as the 
storage for HDFS. By using RAMDISK, we can remove the time used by the SSD to 
make each write persistent. For the RAM case, we observed an improvement of 45% 
from fine-grained locking. For the SSD case, fine-grained locking gives us 20% 
improvement.  We used an Intel SSD (model: SSDSC2BX200G4R).  

We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't 
know the reason for this yet. We repeated the experiment for RAMDISK for trunk 
twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14

2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 8345565, 
10019540

fgl

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
445980

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
22422.530

2021-05-16 21:06:46,476 INFO namenode.NNThroughputBenchmark: Average Time: 8
h1. SSD, hadoop.tmp.dir=/dev/sda4
h1. 23% improvement fgl vs. trunk

trunk:

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
593839

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
16839.581

2021-05-16 21:59:06,042 INFO namenode.NNThroughputBenchmark: Average Time: 11

 

fgl

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
481269

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
20778.400

2021-05-16 21:21:03,906 INFO namenode.NNThroughputBenchmark: Average Time: 9

 

/dev/sda:

ATA device, with non-removable media

Model Number:       INTEL SSDSC2BX200G4R

Serial Number:      BTHC523202RD200TGN

Firmware Revision:  G201DL2D


was (Author: xinglin):
I did some performance benchmarks using a physical server (a d430 server in 
[utah Emulab testbed|[www.emulab.net].) |http://www.emulab.net].%29/]I used 
either RAMDISK or SSD, as the storage for HDFS. By using RAMDISK, we can remove 
the time used by the SSD to make each write persistent. For the RAM case, we 
observed an improvement of 45% from fine-grained locking. For the SSD case, 
fine-grained locking gives us 20% improvement.  We used an Intel SSD (model: 
SSDSC2BX200G4R).  

We noticed for trunk, the mkdir OPS is lower for the RAMDISK than SSD. We don't 
know the reason for this yet. We repeated the experiment for RAMDISK for trunk 
twice to confirm the performance number.
h1. tmpfs, hadoop-tmp-dir = /run/hadoop-utos
h1. 45% improvements fgl vs. trunk
h2. trunk 

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
663510

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
15071.362

2021-05-16 20:37:20,280 INFO namenode.NNThroughputBenchmark: Average Time: 13

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: # operations: 
1000

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
710248

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
14079.5

2021-05-16 22:15:13,515 INFO namenode.NNThroughputBenchmark: Average Time: 14

2021-05-16 22:15:13,515 INFO namenode.FSEditLog: Ending log segment 

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-15 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345069#comment-17345069
 ] 

Xing Lin edited comment on HDFS-14703 at 5/15/21, 3:55 PM:
---

[~prasad-acit] try this command: use -fs file:/// instead of 
hdfs://server:port. "-fs file:///" will bypass the RPC layer and should give 
you higher numbers on your VM. I use the default partition size of 256. 

dir: /home/xinglin/projs/hadoop/hadoop-dist/target/hadoop-3.4.0-SNAPSHOT

$ ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark 
-fs file:/// -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512


was (Author: xinglin):
[~prasad-acit] try this command: use -fs [file:///], instead of 
hdfs://server:port. "-fs [file:///]" will bypass the RPC layer and should give 
you higher numbers at your VM. I use the default partition size of 256. 

dir: /home/xinglin/projs/hadoop/hadoop-dist/target/hadoop-3.4.0-SNAPSHOT

$ ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark 
*-fs [file:///*] -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-15 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345069#comment-17345069
 ] 

Xing Lin edited comment on HDFS-14703 at 5/15/21, 3:53 PM:
---

[~prasad-acit] try this command: use -fs [file:///], instead of 
hdfs://server:port. "-fs [file:///]" will bypass the RPC layer and should give 
you higher numbers at your VM. I use the default partition size of 256. 

dir: /home/xinglin/projs/hadoop/hadoop-dist/target/hadoop-3.4.0-SNAPSHOT

$ ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark 
*-fs [file:///*] -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512


was (Author: xinglin):
[~prasad-acit] try this command: use -fs file:///, instead of 
hdfs://server:port. "-fs file:///" will bypass the RPC layer and should give 
you higher numbers at your VM. 

dir: /home/xinglin/projs/hadoop/hadoop-dist/target/hadoop-3.4.0-SNAPSHOT

$ ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark 
*-fs file:///* -op mkdirs -threads 200 -dirs 1000 -dirsPerDir 512

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-15 Thread Xing Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344970#comment-17344970
 ] 

Xing Lin edited comment on HDFS-14703 at 5/15/21, 6:06 AM:
---

[~prasad-acit] how many CPU cores does your server have? The ops per sec seems 
rather low, lower even than what I got from my Mac laptop (with 8 cores). fgl 
gives us a 10% improvement running on my Mac. We will find some proper hardware 
to do more serious performance benchmarks. 

 
 *Trunk*
 021-05-11 09:52:35,666 INFO namenode.NNThroughputBenchmark:
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: — mkdirs inputs —
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrDirs = 1000
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrThreads = 200
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 512
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: # operations: 
1000
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
542905
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
18419.42881351249
 2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: Average Time: 10
 2021-05-11 09:52:35,667 INFO namenode.FSEditLog: Ending log segment 5488830, 
10019538
 2021-05-11 09:52:35,670 INFO namenode.FSEditLog: Number of transactions: 
4530710 Total time for transactions(ms): 14288 Number of transactions batched 
in Syncs: 4452444 Number of syncs: 78267 SyncTimes(ms): 200575
  
  
 *fgl*
 021-05-11 10:58:40,142 INFO namenode.NNThroughputBenchmark:
 2021-05-11 10:58:40,142 INFO namenode.NNThroughputBenchmark: — mkdirs inputs —
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrDirs = 1000
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrThreads = 200
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 512
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: — mkdirs stats  —
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: # operations: 
1000
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
505892
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
19767.06490713433
 2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: Average Time: 10
 2021-05-11 10:58:40,143 INFO namenode.FSEditLog: Ending log segment 5826307, 
10019538
 2021-05-11 10:58:40,146 INFO namenode.FSEditLog: Number of transactions: 
4193233 Total time f
 or transactions(ms): 13990 Number of transactions batched in Syncs: 4130972 
Number of syncs:
 62262 SyncTimes(ms): 168203


was (Author: xinglin):
[~prasad-acit] how many CPU cores does your server have? The OPS per sec seems 
rather low, than I got from my Mac laptop (with 8 cores). fgl gives us 10% 
improvement running on my Mac. We will find some proper hardware to do more 
serial performance benchmarks. 

 
*Trunk*
021-05-11 09:52:35,666 INFO namenode.NNThroughputBenchmark:
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrDirs = 1000
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 512
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
542905
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
18419.42881351249
2021-05-11 09:52:35,667 INFO namenode.NNThroughputBenchmark: Average Time: 10
2021-05-11 09:52:35,667 INFO namenode.FSEditLog: Ending log segment 5488830, 
10019538
2021-05-11 09:52:35,670 INFO namenode.FSEditLog: Number of transactions: 
4530710 Total time for transactions(ms): 14288 Number of transactions batched 
in Syncs: 4452444 Number of syncs: 78267 SyncTimes(ms): 200575
 
 
*fgl*
021-05-11 10:58:40,142 INFO namenode.NNThroughputBenchmark:
2021-05-11 10:58:40,142 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrDirs = 1000
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 512
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: # operations: 
1000
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
505892
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
19767.06490713433
2021-05-11 10:58:40,143 INFO namenode.NNThroughputBenchmark: 

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-13 Thread Renukaprasad C (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343725#comment-17343725
 ] 

Renukaprasad C edited comment on HDFS-14703 at 5/13/21, 6:32 AM:
-

[~shv] Thanks for sharing the patch.
I tried to test the patch applied on trunk; the results were similar with and 
without the patch. I have attached the results for both runs below. Did I miss 
something?

With Patch:
{code:java}
~/hadoop-3.4.0-SNAPSHOT/bin$ ./hdfs  
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
hdfs://localhost:9000 -op mkdirs -threads 200 -dirs 200 -dirsPerDir 128
2021-05-13 01:57:41,279 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2021-05-13 01:57:41,976 INFO namenode.NNThroughputBenchmark: Starting 
benchmark: mkdirs
2021-05-13 01:57:42,065 INFO namenode.NNThroughputBenchmark: Generate 200 
inputs for mkdirs
2021-05-13 01:57:43,385 INFO namenode.NNThroughputBenchmark: Log level = ERROR
2021-05-13 01:57:44,079 INFO namenode.NNThroughputBenchmark: Starting 200 
mkdirs(s).
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: 
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: nrDirs = 200
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 128
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: # operations: 
200
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
1095122
2021-05-13 02:15:59,958 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
1826.2805422592187
2021-05-13 02:15:59,959 INFO namenode.NNThroughputBenchmark: Average Time: 108
{code}

Without Patch:
{code:java}
/hadoop-3.4.0-SNAPSHOT/bin$ ./hdfs  
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
hdfs://localhost:9000 -op mkdirs -threads 200 -dirs 200 -dirsPerDir 128
2021-05-13 03:25:53,243 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2021-05-13 03:25:54,046 INFO namenode.NNThroughputBenchmark: Starting 
benchmark: mkdirs
2021-05-13 03:25:54,117 INFO namenode.NNThroughputBenchmark: Generate 200 
inputs for mkdirs
2021-05-13 03:25:55,076 INFO namenode.NNThroughputBenchmark: Log level = ERROR
2021-05-13 03:25:55,163 INFO namenode.NNThroughputBenchmark: Starting 200 
mkdirs(s).
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: 
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: nrDirs = 200
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 128
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: # operations: 
200
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
1064420
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
1878.9575543488472
2021-05-13 03:43:40,125 INFO namenode.NNThroughputBenchmark: Average Time: 105
{code}


Similar results were achieved when I tried with "file:///" as well, but in this 
case the partitions were empty.

{code:java}
./hdfs  org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
file:/// -op mkdirs -threads 200 -dirs 200 -dirsPerDir 128
...
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: 
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: --- mkdirs inputs 
---
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: nrDirs = 200
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: nrThreads = 200
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: nrDirsPerDir = 128
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: --- mkdirs stats  
---
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: # operations: 
200
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: Elapsed Time: 
845625
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark:  Ops per sec: 
2365.1145602365114
2021-05-13 09:20:36,921 INFO namenode.NNThroughputBenchmark: Average Time: 84
2021-05-13 09:20:36,922 INFO namenode.FSEditLog: Ending log segment 1465676, 
2015633
2021-05-13 09:20:36,987 INFO namenode.FSEditLog: Number of transactions: 549959 
Total time for transactions(ms): 2840 Number of transactions batched in Syncs: 
545346 Number of syncs: 4614 SyncTimes(ms): 240432 
2021-05-13 09:20:36,996 INFO namenode.FileJournalManager: Finalizing edits file 

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-10 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/10/21, 7:30 PM:
--

Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].

Also created a remote branch called {{fgl}} in the hadoop repo with both patches 
applied to current trunk. [~xinglin] is working on adding the {{create()}} call 
to FGL. Right now only {{mkdirs()}} is supported.


was (Author: shv):
Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:05 AM:
-

Updated the POC patches to current trunk. There were indeed some missing parts 
in the first patch.
 See [^003-partitioned-inodeMap-POC.tar.gz].


was (Author: shv):
Updated the POC patches. There were indeed some missing parts in the first 
patch.
See 
[003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2021-05-07 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341089#comment-17341089
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 5/8/21, 1:04 AM:
-

Updated the POC patches. There were indeed some missing parts in the first 
patch.
See 
[003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].


was (Author: shv):
Updated the POC patches. There were indeed some missing parts in the first 
patch. See 
[https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz|https://issues.apache.org/jira/secure/attachment/13025177/003-partitioned-inodeMap-POC.tar.gz].

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, 
> 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, 
> NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2020-08-31 Thread junbiao chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187409#comment-17187409
 ] 

junbiao chen edited comment on HDFS-14703 at 8/31/20, 6:19 AM:
---

[~hexiaoqiao] I want to do some work on this issue. Could you tell me which 
version the patch is based on? Thanks.


was (Author: dahaishuantuoba):
[~hexiaoqiao] I want to do some work on this issue ,could you which  version 
does the patch base on?thanks

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode 
> Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2020-08-30 Thread junbiao chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187409#comment-17187409
 ] 

junbiao chen edited comment on HDFS-14703 at 8/31/20, 4:01 AM:
---

[~hexiaoqiao] I want to do some work on this issue ,could you which  version 
does the patch base on?thanks


was (Author: dahaishuantuoba):
Which  version does the patch base on?thanks

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode 
> Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-30 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919948#comment-16919948
 ] 

Arpit Agarwal edited comment on HDFS-14703 at 8/30/19 11:04 PM:


Interesting proposal [~shv]. Thanks for sharing this and the PoC patch. I went 
through the doc and the idea seems interesting. I didn't understand how the 
partitioning scheme works. Do atomic rename and snapshots still work as before 
with these changes?

Did you measure write throughput improvement with 
{{dfs.namenode.edits.asynclogging}}?


was (Author: arpitagarwal):
Interesting proposal [~shv] . Thanks for sharing this and the PoC patch. I went 
through the doc and the idea seems interesting. I didn't understand how the 
partitioning scheme works. Do atomic rename and snapshots still as before with 
these changes?

Did you measure write throughput improvement with 
{{dfs.namenode.edits.asynclogging}}?

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: 001-partitioned-inodeMap-POC.tar.gz, NameNode 
> Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-17 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909681#comment-16909681
 ] 

He Xiaoqiao edited comment on HDFS-14703 at 8/17/19 12:41 PM:
--

Thanks [~shv] for your POC patches. I have to say this is a very clever design 
for fine-grained locking. There are still a couple of questions that I do not 
quite understand, and I look forward to your response.
1. Write concurrency control. Consider a case with two threads running 
mkdir(/a/b/c/d/e) and delete(/a/b/c). I tried to run this case following the 
design and POC patches, but I usually get an unstable result, since the keys of 
the two paths can be located in different RangeGSets by 
{{INodeMap#latchWriteLock}}; the two threads can then run concurrently and 
produce an unstable result even when issued from one client, one after another. 
As you explained earlier, `deleting a directory should lock all RangeGSets 
involved`. Is this a special case for delete ops? Sorry for asking this 
question again.
{quote}
Deleting a directory /a/b/c means deleting the entire sub-tree underneath this 
directory. We should lock all RangeGSets involved in such deletion, 
particularly the one containing file f. So f cannot be modified concurrently 
with the delete.
{quote}
2. {{INode}} gains a member variable {{long[] namespaceKey}} in patch 0004 of 
the POC package. I believe this attribute is very useful for partitioning 
INodes, but does it bring some other potential issues?
* Heap footprint overhead. Once the NameNode process has been running for a 
long while, the namespaceKey of most INodes in the directory tree (those 
visited at least once) will be non-null. With 500M INodes and {{level}} = 2, 
that needs more than 8 GB of heap (500,000,000 x 2 longs x 8 bytes = 8 GB for 
the key payload alone, before array-object overhead).
* When an INode is renamed, its {{namespaceKey}} has to be updated, right? 
Since its parent INode has changed. The POC seems not to update it once 
{{namespaceKey}} is non-null.
Is it possible to calculate the namespaceKey for an INode at the point of use, 
outside of the lock? Of course, that would add CPU overhead. Please correct me 
if I am wrong. Thanks.

3. There is no LatchLock unlock in the POC for the #mkdir operation, which 
looks like a bit of an oversight. In my opinion, it has to release the 
childLock after use, right?
[~shv] Thanks for your POC patches again, and I look forward to the next 
milestone. I would also like to get involved in pushing this feature forward if 
needed.
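
To make the heap estimate in question 2 concrete, here is a rough, hypothetical 
sketch of a two-level {{namespaceKey}} derived from ancestor inode ids. The 
class and method names are illustrative assumptions, not the actual POC code; 
it only shows why such a key gives "cousin" paths a common prefix, why caching 
it costs memory per INode, and why a rename that changes an ancestor 
invalidates it.
{code:java}
// Illustrative only: a two-level namespace key built from ancestor inode ids.
// This is NOT the HDFS-14703 POC code; names and layout are assumptions.
final class NamespaceKeys {
  static final int LEVEL = 2; // number of ancestor ids kept in the key

  /**
   * idsLeafToRoot = {id(self), id(parent), id(grandparent), ..., id(root)}.
   * For LEVEL = 2 this returns {id(grandparent), id(parent)}, so sibling and
   * "cousin" paths share a key prefix, while a path and its own parent may not.
   */
  static long[] namespaceKey(long[] idsLeafToRoot) {
    long[] key = new long[LEVEL];
    for (int i = 0; i < LEVEL; i++) {
      int idx = Math.min(i + 1, idsLeafToRoot.length - 1); // clamp for short paths
      key[LEVEL - 1 - i] = idsLeafToRoot[idx];
    }
    // Caching this per INode costs 2 longs (16 bytes) of payload plus the long[]
    // object overhead, which is where the multi-GB estimate above comes from.
    // A rename that changes the parent or grandparent makes a cached key stale.
    return key;
  }
}
{code}
Under this sketch, mkdir(/a/b/c/d/e) and delete(/a/b/c) would indeed resolve to 
different key ranges, which is the scenario in question 1.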


was (Author: hexiaoqiao):
Thanks [~shv] for your POC patches. I have to state that this is very clever 
design for fine-grained global locking. There are still couple of questions 
what I do not quite understand and look forward to your response.
1. Write concurrency control. Consider one case with two threads with mkdir 
(/a/b/c/d/e) and delete(/a/b/c) ops. I try to ran this case following design 
and POC patches, but I usually get unstable result since key with  
and  could be located at different RangeGSet using 
{{INodeMap#latchWriteLock}}, then the two threads could run concurrently and 
get unstable result even if from one client and one by one. As your last 
explains, `deleting a directory should lock all RangeGets involved`. Is it one 
special case about Delete Ops? Sorry for asking this question again.
{quote}
Deleting a directory /a/b/c means deleting the entire sub-tree underneath this 
directory. We should lock all RangeGSets involved in such deletion, 
particularly the one containing file f. So f cannot be modified concurrently 
with the delete.
{quote}
2. {{INode}} involves local variable {{long[] namespaceKey}} at 0004 in POC 
package. I believe this attributes is very useful to partition for INode. 
meanwhile does it bring some other potential issues
* heap footprint overhead. For a long while running of NameNode process, 
namespaceKey of most INode (visited once at least) in the directory tree may be 
not null. If we consider there are 500M INodes and {{level}} is both 2, it need 
over than 8GB heap size.
* when one INode is renamed, the {{namespaceKey}} have to update, right? Since 
its parent INode has changes. POC seems not update anymore if {{namespaceKey}} 
is not null.
Is it possible to calculate namespaceKey for INode when use it out of the Lock. 
Of course, it will bring CPU overhead. Please correct me if I am wrong. Thanks.
3. No LatchLock unlock in the POC for operation #mkdir, it seems like a bit of 
oversight. In my opinion, it has to release childLock after used, right?
[~shv] Thanks for your POC patches again and looks forward to the next 
milestone. And I would like to involve to push forward this feature if need.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin 

[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-08 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903374#comment-16903374
 ] 

Konstantin Shvachko edited comment on HDFS-14703 at 8/8/19 10:10 PM:
-

Hi [~hexiaoqiao], thanks for reviewing the doc. Very good questions:
# "Cousins" means files like {{/a/b/c/d}} and {{/a/b/m/n}}. They will have 
keys, respectively, {{}} and {{}}, which have 
common prefix {{}} and therefore are likely to fall into the same 
RangeGSet. In your example {{}} is the parent of {{}} and this key definition does not guarantee them to be in the same range.
# Deleting a directory {{/a/b/c}} means deleting the entire sub-tree underneath 
this directory. We should lock all RangeGSets involved in such deletion, 
particularly the one containing file {{f}}. So {{f}} cannot be modified 
concurrently with the delete.
# Just to clarify, RangeMap is the upper-level part of PartitionedGSet, which 
maps key ranges onto RangeGSets. So there is only one RangeMap and many 
RangeGSets. Holding a lock on the RangeMap is akin to holding a global lock. You 
make a good point that some operations like failover, large deletes, renames, 
and quota changes will still require a global lock. The lock on the RangeMap 
could play the role of such a global lock. This should be defined in more detail 
within the design of LatchLock (a rough sketch of the latch pattern follows 
after this list). Ideally we should retain FSNamesystemLock as a global lock 
for some operations. This will also help us gradually switch operations from 
FSNamesystemLock to LatchLock.
# I don't know what the next bottleneck will be, but you are absolutely correct 
that there will be something. For the edit log, I indeed saw while running my 
benchmarks that the number of transactions batched together while journaling 
was increasing. This is expected and desirable behavior, since writing large 
batches to disk is more efficient than lots of small writes.
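Regarding point 3 above, the following is only a rough, hypothetical sketch of 
the latch idea, not the POC's actual LatchLock API (all names here are 
invented): the RangeMap lock is held briefly to pin down which RangeGSets an 
operation touches, the per-range locks are then taken, and the RangeMap lock is 
released; taking the RangeMap lock exclusively degenerates into a global lock 
for operations like failover or large deletes.
{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical latch-style locking over a partitioned namespace (illustration only).
class LatchLockSketch {
  private final ReentrantReadWriteLock rangeMapLock = new ReentrantReadWriteLock();

  /** Stand-in for one RangeGSet partition with its own lock. */
  static class Partition {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  }

  /** Fine-grained write: latch on the RangeMap only while resolving partitions. */
  void writeWithLatch(List<Partition> touched, Runnable op) {
    rangeMapLock.readLock().lock();        // latch: shared and short-lived
    try {
      // Partitions must be locked in a consistent order to avoid deadlocks.
      for (Partition p : touched) {
        p.lock.writeLock().lock();         // lock only the ranges involved
      }
    } finally {
      rangeMapLock.readLock().unlock();    // release the latch before the operation
    }
    try {
      op.run();
    } finally {
      for (Partition p : touched) {
        p.lock.writeLock().unlock();       // must be released, cf. the mkdir question
      }
    }
  }

  /** Global operation (failover, large delete/rename, quota): exclusive RangeMap lock. */
  void writeGlobal(Runnable op) {
    // Simplified: a real implementation must also wait for in-flight
    // fine-grained operations that already released the latch to drain.
    rangeMapLock.writeLock().lock();
    try {
      op.run();
    } finally {
      rangeMapLock.writeLock().unlock();
    }
  }
}
{code}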


was (Author: shv):
Hi [~hexiaoqiao], thanks for reviewing the doc. Very good questions:
# "Cousins" means files like {{/a/b/c/d}} and {{/a/b/m/n}}. They will have 
keys, respectively, {{}} and {{}}, which have 
common prefix {{}} and therefore are likely to fall into the same 
RangeGSet. In your example {{}} is the parent of {{}} and this key definition does not guarantee them to be in the same range.
# Deleting a directory {{/a/b/c}} means deleting the entire sub-tree underneath 
this directory. We should lock all RangeGSets involved in such deletion, 
particularly the one containing containing file {{f}}. So {{f}} cannot be 
modified concurrently with the delete.
# Just to clarify RangeMap is the upper level part of PartitionedGSet, which 
maps key ranges into RangeGSets. So there is only one RangeMap and many 
RangeGSets. Holding a lock on RangeMap is akin to holding a global lock. You 
make a good point that some operations like failover, large deletes, renames, 
quota changes will still require a global lock. The lock on RangeMap could play 
the role of such global lock. This should be defined in more details within the 
design of LatchLock. Ideally we should retain FSNamesystemLock as a global lock 
for some operations. This will also help us gradually switch operations from 
FSNamesystemLock to LatchLock.
# I don't know what the next bottleneck we will see, but you are absolutely 
correct there will be something. For edits log, I indeed saw while running my 
benchmarks that the number of transactions batched together while journaling 
was increasing. This is expected and desirable behavior, since writing large 
batches to a disk is more efficient than lots of small writes.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.






[jira] [Comment Edited] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-07 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901887#comment-16901887
 ] 

He Xiaoqiao edited comment on HDFS-14703 at 8/7/19 1:04 PM:


Thanks [~shv] for filing this JIRA and planning to push this feature forward; 
it is great work. I really appreciate it.
There are some details I am confused about after reading the design document.
As the design document says, each inode maps (through its inode key) to one 
RangeMap, which has a separate lock, so operations can be carried out 
concurrently.
{quote}The inode key is a fixed length sequence of parent inodeids ending with 
the file inode id itself:
  key(f) = <ppId, pId, selfId>
Where selfId is the inodeId of file f, pId is the id of its parent, and ppId is 
the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d, with corresponding inode ids [ida, idb, 
idc, idd].
1. How can we guarantee that 'cousins' map into the same range? At first glance, 
they could map to different ranges, since for idc the inode key = 
<ida, idb, idc> while for idd the inode key = <idb, idc, idd> (see the sketch 
at the end of this comment). Furthermore, if we rename one inode from one range 
to another, do we need to move all of its children and the whole sub-tree of 
inodes to the other range as well?
2. Is there any consideration about operating on a node and its ancestor node 
concurrently? For instance, with /a/b/c/d/e/f, we could delete inode c and 
modify inode f at the same time if they map to different ranges, since we do 
not guarantee that they map to the same one. Maybe that is a problem in this 
case.
3. Which lock will be held for global requests like HA failover, safemode, 
etc.? Do we need to obtain the locks of all ranges?
4. What bottleneck will we meet after write throughput is improved? I believe 
the EditLog OPS will keep increasing; will it become the new bottleneck?
Please correct me if I do not understand correctly. Thanks.
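To put question 1 in concrete terms, here is a small, hypothetical example (the 
inode id values are made up) using the three-level key <ppId, pId, selfId> from 
the quoted design text; it shows why a parent and its child can land in 
different ranges while siblings and cousins share a key prefix:
{code:java}
import java.util.Arrays;

// Hypothetical key computation for /a/b/c/d with inode ids ida..idd.
public class InodeKeyExample {
  /** key(f) = <ppId, pId, selfId>. */
  static long[] key(long ppId, long pId, long selfId) {
    return new long[] { ppId, pId, selfId };
  }

  public static void main(String[] args) {
    long ida = 1001, idb = 1002, idc = 1003, idd = 1004;
    long[] keyC = key(ida, idb, idc);   // key of /a/b/c   = <ida, idb, idc>
    long[] keyD = key(idb, idc, idd);   // key of /a/b/c/d = <idb, idc, idd>
    // keyC and keyD share no common prefix, so the parent c and its child d may
    // fall into different key ranges. Siblings of d share the prefix <idb, idc>
    // and cousins of d share <idb>, so they tend to stay in the same RangeGSet.
    System.out.println(Arrays.toString(keyC) + " vs " + Arrays.toString(keyD));
  }
}
{code}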


was (Author: hexiaoqiao):
Thanks [~shv] for file this JIRA and plan to push this feature forward, it is 
very great work. Really appreciate doing this.
 There are some details I am confused after reading the design document.
 As design document said, each inode maps (through inode key) to one RangeMap 
who has a separate lock and carry out concurrently.
{quote}The inode key is a fixed length sequence of parent inodeids ending with 
the file inode id itself:
    key(f) = 
 Where selfId is the inodeId of file f, pId is the id of its parent, and ppId 
is the id of the parent of the parent. Such definition of a key guarantees that 
not only siblings but also cousins (objects having the same grandparent) are 
partitioned into the same range most of the time
{quote}
Consider the following path: /a/b/c/d/e, corresponding inode id is [ida, idb, 
idc, idd].
 1. How we could guarantee to map 'cousins' into the same range? In my first 
opinion, it could map to different RangeMaps, since for idc, its inode key = 
 and for idd its inode key = .
 2. Any consideration about operating one nodes and its ancestor node 
concurrently? for instance, /a/b/c/d/e/f, we could delete inode c and modify 
inode f at the same time if they map to different range since we do not 
guarantee map them to the same one. maybe it is problem in the case.
 3. Which lock will be hold if request some global request like ha failover, 
safemode etc.? do we need to obtain all RangeMap lock?
 4. Any bottleneck meet after improve write throughput, I believe that EditLog 
OPS will keep increase, and will it to be the new bottleneck?
Please correct me if I do not understand correctly. Thanks.

> NameNode Fine-Grained Locking via Metadata Partitioning
> ---
>
> Key: HDFS-14703
> URL: https://issues.apache.org/jira/browse/HDFS-14703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: NameNode Fine-Grained Locking.pdf
>
>
> We target to enable fine-grained locking by splitting the in-memory namespace 
> into multiple partitions each having a separate lock. Intended to improve 
> performance of NameNode write operations.


