[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990758#comment-14990758 ]
Staffan Friberg commented on HDFS-9260:
---------------------------------------

Hi Daryn,

Thanks for the comments and the additional data points. It is interesting to learn more about the scale of HDFS instances. I wonder if the NN was running on older and slower hardware in my case compared to your setup; the cluster I was able to get my hands on for these runs has fairly old machines.

Adds of new blocks are relatively fast: since they land at the far right of the tree, the number of lookups is minimal. However, the current implementation only needs around two writes to insert something at the head/end of the list, and nothing with a more complicated data structure will be able to match that. It will be a question of trade-offs.

Also, to clarify, the microbenchmarks only measure the actual remove and insert of random values, not the whole process of copying files etc. I would expect the other parts to far outweigh the time it takes to update the data structures, so while the 4x sounds scary it should be a minor part of the whole transaction.

I think the patch you are referring to is HDFS-6658. I applied it to the 3.0.0 branch as of March 11, 2015, which is when the patch was created, and ran it on the same microbenchmarks I built to test my patch. I will attach the source code for the benchmarks so you can check that I used the right APIs for it to be comparable. From what I can tell, the benchmarks should do the same thing at a high level. The performance overhead for adding and removing is similar between our two implementations.

{noformat}
fbrAllExisting  - Do a Full Block Report with the same 2M entries that are already registered for the Storage in the NN.
addRemoveBulk   - Remove 32k random blocks from a StorageInfo that has 64k entries, then re-add them all.
addRemoveRandom - Remove and directly re-add a block from a Storage entry, repeat for 32k blocks from a StorageInfo with 64k blocks
iterate         - Iterate and get blockID for 64k blocks associated with a particular StorageInfo

==> benchmarks_trunkMarch11_intMapping.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  379.659 ± 5.463  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   16.426 ± 0.380  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   15.401 ± 0.196  ms/op
StorageInfoAccess.iterate          avgt   25    1.496 ± 0.004  ms/op

==> benchmarks_trunk_baseline.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  288.974 ± 3.970  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25    3.157 ± 0.046  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25    2.815 ± 0.012  ms/op
StorageInfoAccess.iterate          avgt   25    0.788 ± 0.006  ms/op

==> benchmarks_trunk_treeset.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  231.270 ± 3.450  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   11.596 ± 0.521  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   11.249 ± 0.101  ms/op
StorageInfoAccess.iterate          avgt   25    0.385 ± 0.010  ms/op
{noformat}

Do you have a good suggestion for some other perf test or stress test that would be worth trying? Is there any stress load on your end that I could try it out on?
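
To make the numbers above easier to compare, here is a minimal sketch that divides each score by the matching baseline score. The class name `BenchmarkRatios` and the `ratio` helper are made up for illustration and are not part of the attached benchmarks; the scores are copied from the three outputs above, and the ± error margins are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of HDFS or of the attached benchmark source.
class BenchmarkRatios {

    /** Candidate time divided by baseline time; a value above 1.0 means slower than baseline. */
    static double ratio(double candidateMs, double baselineMs) {
        return candidateMs / baselineMs;
    }

    public static void main(String[] args) {
        // ms/op scores from the outputs above, in the order: baseline, treeset, intMapping.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("fbrAllExisting", new double[] {288.974, 231.270, 379.659});
        scores.put("addRemoveBulk", new double[] {3.157, 11.596, 16.426});
        scores.put("addRemoveRandom", new double[] {2.815, 11.249, 15.401});
        scores.put("iterate", new double[] {0.788, 0.385, 1.496});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf(Locale.ROOT, "%-16s treeset %.2fx  intMapping %.2fx%n",
                    e.getKey(), ratio(s[1], s[0]), ratio(s[2], s[0]));
        }
    }
}
```

On these numbers the treeset build is roughly 3.7x to 4.0x the baseline for the two add/remove benchmarks, about 0.8x for the full block report, and about half the baseline time for iteration.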
> Improve performance and GC friendliness of startup and FBRs
> -----------------------------------------------------------
>
>                 Key: HDFS-9260
>                 URL: https://issues.apache.org/jira/browse/HDFS-9260
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>         Attachments: FBR processing.png, HDFS Block and Replica Management
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch,
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch,
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch
>
>
> This patch changes the data structures used for BlockInfos and Replicas to
> keep them sorted. This allows faster and more GC-friendly handling of full
> block reports.
> I would like to hear people's feedback on this change, and would also
> appreciate some help investigating/understanding a few outstanding issues if
> we are interested in moving forward with this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)