[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990758#comment-14990758
 ] 

Staffan Friberg commented on HDFS-9260:
---------------------------------------

Hi Daryn,

Thanks for the comments and the additional data points. It is interesting to 
learn more about the scale of HDFS instances. I wonder whether the NN in my 
case was running on older and slower hardware than in your setup; the cluster 
I was able to get my hands on for these runs has fairly old machines.

Adds of new blocks are relatively fast: since they land at the far right of 
the tree, the number of lookups is minimal. However, the current 
implementation only needs around two writes to insert something at the 
head/end of the list, and nothing with a more complicated data structure will 
be able to match that. It will be a question of trade-offs.
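
To make the trade-off concrete, here is a rough sketch using java.util 
collections as stand-ins. These are not the NameNode's actual classes (neither 
the current list nor the structure in the patch); LinkedList and TreeSet are 
assumptions chosen only to show the shape of the cost. The class and method 
names are hypothetical.

```java
import java.util.Comparator;
import java.util.LinkedList;
import java.util.TreeSet;

// Illustration only: java.util collections stand in for the NameNode's
// block structures, which are custom classes not shown here.
class InsertCostSketch {

    // Counts the comparisons a sorted tree performs to place one new,
    // largest ID after `existing` ascending IDs are already present.
    static int comparisonsForTailInsert(int existing) {
        int[] count = {0};
        Comparator<Long> counting = (a, b) -> {
            count[0]++;
            return Long.compare(a, b);
        };
        TreeSet<Long> tree = new TreeSet<>(counting);
        for (long id = 0; id < existing; id++) {
            tree.add(id);
        }
        count[0] = 0;                 // only count the final insert
        tree.add((long) existing);    // lands at the far right of the tree
        return count[0];
    }

    // Head insert into a linked list: no search, just a few reference writes.
    static int listSizeAfterHeadInsert(int existing) {
        LinkedList<Long> list = new LinkedList<>();
        for (long id = 0; id < existing; id++) {
            list.addFirst(id);
        }
        list.addFirst((long) existing);
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("tree comparisons for one tail insert: "
                + comparisonsForTailInsert(65536));
        System.out.println("list size after head insert: "
                + listSizeAfterHeadInsert(65536));
    }
}
```

The tree insert walks a path whose length grows with the logarithm of the 
number of entries (and may rebalance), while the list insert does a constant 
amount of work regardless of size.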

Also, to clarify, the microbenchmarks only measure the actual remove and 
insert of random values, not the whole process of copying files etc. I would 
expect the other parts to far outweigh the time it takes to update the data 
structures, so while the 4x sounds scary, it should be a minor part of the 
whole transaction.
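
As a rough idea of what "remove and insert of random values" means in 
isolation, the following is a minimal sketch. It is not the attached benchmark 
source: it uses a plain TreeSet of block IDs as a stand-in for a storage's 
block collection, a fixed seed, and hypothetical class and method names. The 
64k/32k sizes are taken from the benchmark descriptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

// Sketch only: a TreeSet<Long> stands in for the per-storage block
// collection; the real benchmarks use the HDFS classes directly.
class AddRemoveRandomSketch {

    // Builds a stand-in storage holding block IDs 0..blocks-1.
    static TreeSet<Long> storageWith(int blocks) {
        TreeSet<Long> storage = new TreeSet<>();
        for (long id = 0; id < blocks; id++) {
            storage.add(id);
        }
        return storage;
    }

    // Picks `count` distinct random IDs out of 0..total-1.
    static List<Long> pickRandom(int total, int count, long seed) {
        List<Long> ids = new ArrayList<>(total);
        for (long id = 0; id < total; id++) {
            ids.add(id);
        }
        Collections.shuffle(ids, new Random(seed));
        return new ArrayList<>(ids.subList(0, count));
    }

    // The measured operation: remove a block and directly re-add it.
    static void addRemoveRandom(TreeSet<Long> storage, List<Long> victims) {
        for (Long id : victims) {
            if (!storage.remove(id)) {
                throw new IllegalStateException("missing block " + id);
            }
            storage.add(id);
        }
    }

    public static void main(String[] args) {
        TreeSet<Long> storage = storageWith(64 * 1024);
        List<Long> victims = pickRandom(64 * 1024, 32 * 1024, 42L);
        long start = System.nanoTime();
        addRemoveRandom(storage, victims);
        long elapsed = System.nanoTime() - start;
        // A single un-warmed run like this is only indicative.
        System.out.printf("addRemoveRandom: %.3f ms%n", elapsed / 1e6);
    }
}
```

Nothing here touches file copies, RPCs or locking, which is the point: only 
the data structure update is timed.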

I think the patch you are referring to is HDFS-6658. I applied it to the 3.0.0 
branch as of March 11, 2015, which is when the patch was created, and ran it 
on the same microbenchmarks I built to test my patch. I will attach the source 
code for the benchmarks so you can check that I used the right APIs for it to 
be comparable. From what I can tell, the benchmarks should do the same thing 
at a high level. The performance overhead for adding and removing is similar 
between our two implementations. 

{noformat}
fbrAllExisting  - Do a Full Block Report with the same 2M entries that are 
already registered for the Storage in the NN.
addRemoveBulk   - Remove 32k random blocks from a StorageInfo that has 64k 
entries, then re-add them all.
addRemoveRandom - Remove and directly re-add a block from a Storage entry, 
repeat for 32k blocks from a StorageInfo with 64k blocks
iterate         - Iterate and get blockID for 64k blocks associated with a 
particular StorageInfo

==> benchmarks_trunkMarch11_intMapping.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  379.659 ± 5.463  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   16.426 ± 0.380  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   15.401 ± 0.196  ms/op
StorageInfoAccess.iterate          avgt   25    1.496 ± 0.004  ms/op

==> benchmarks_trunk_baseline.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  288.974 ± 3.970  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25    3.157 ± 0.046  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25    2.815 ± 0.012  ms/op
StorageInfoAccess.iterate          avgt   25    0.788 ± 0.006  ms/op

==> benchmarks_trunk_treeset.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  231.270 ± 3.450  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   11.596 ± 0.521  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   11.249 ± 0.101  ms/op
StorageInfoAccess.iterate          avgt   25    0.385 ± 0.010  ms/op
{noformat}
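
For reading the tables, a small sketch that divides each variant's score by 
the baseline score may help. The numbers are copied from the output above; 
the class, field and method names are made up for this illustration.

```java
// Scores (ms/op) copied from the benchmark output above; ratios are
// variant / baseline, so values above 1.0 mean slower than baseline.
class BenchmarkRatios {
    static final String[] NAMES =
        {"fbrAllExisting", "addRemoveBulk", "addRemoveRandom", "iterate"};
    static final double[] BASELINE    = {288.974,  3.157,  2.815, 0.788};
    static final double[] TREESET     = {231.270, 11.596, 11.249, 0.385};
    static final double[] INT_MAPPING = {379.659, 16.426, 15.401, 1.496};

    static double ratio(double[] variant, int i) {
        return variant[i] / BASELINE[i];
    }

    public static void main(String[] args) {
        System.out.printf("%-16s %10s %12s%n", "benchmark", "treeset", "intMapping");
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-16s %9.2fx %11.2fx%n",
                NAMES[i], ratio(TREESET, i), ratio(INT_MAPPING, i));
        }
    }
}
```

The treeset ratio for addRemoveRandom comes out close to 4, which is where 
the 4x figure comes from, while its fbrAllExisting and iterate ratios are 
below 1.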

Do you have a suggestion for another perf test/stress test that would be 
worth trying? Is there a stress load on your end that it would be possible to 
run this against?

> Improve performance and GC friendliness of startup and FBRs
> -----------------------------------------------------------
>
>                 Key: HDFS-9260
>                 URL: https://issues.apache.org/jira/browse/HDFS-9260
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>         Attachments: FBR processing.png, HDFS Block and Replica Management 
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change and also some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
