[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988315#comment-14988315
 ] 

Daryn Sharp edited comment on HDFS-9260 at 11/3/15 10:36 PM:
-------------------------------------------------------------

I've read the doc now.  Sorry I commented before doing so.  The results are 
interesting until the final details about a 4x reduction in block updates.  
Here are some basic specs to consider:
* 10-80k adds/min
* job submissions increasing replication factor to 10
* at least 1 node/day decommissioning or going dead with 100k-400k blocks
* every few weeks entire racks (40 nodes) are decommissioned for refresh or 
reallocation
* balancer is constantly churning to populate recommissioned dead nodes

That's a lot of IBRs, which is why a 4x degradation is quite concerning.  The 
block report processing times seem a bit high in the tests. I'll attach an 
image of the BR processing times for some of our busiest clusters.  They span 
the gamut from 100M-300M blocks with roughly the same number of files.  We got 
a huge improvement from my BR encoding change + per-storage reports.

BTW, I had/have a working patch that replaced the triplets with sparse yet 
densely packed 2-dimensional primitive arrays.  Everything is linked via 
indices to greatly reduce the dirty cards to scan.  Need to dig up the jira 
when my head is above water.
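
To make the triplet-replacement idea concrete, here is a minimal sketch of the general technique (the names and layout are illustrative, not the actual patch): replica links are kept in parallel primitive int arrays and chained by index rather than by object reference.  Since the GC write barrier only dirties cards for reference stores, updating int slots skips card marking entirely, so old-gen scans touch far fewer dirty cards.

```java
import java.util.Arrays;

// Hypothetical sketch: index-linked replica storage using primitive arrays
// instead of per-block Object[] triplets of references.
final class ReplicaTable {
    static final int NIL = -1;

    // One slot per (block, storage) pair, in densely packed parallel arrays.
    private int[] blockId;   // which block this replica entry belongs to
    private int[] storageId; // which datanode storage holds the replica
    private int[] next;      // index of the next replica of the same block
    private int freeHead = NIL;
    private int size = 0;

    ReplicaTable(int capacity) {
        blockId = new int[capacity];
        storageId = new int[capacity];
        next = new int[capacity];
    }

    /** Link a replica onto the block's chain; returns the new chain head. */
    int add(int block, int storage, int headOfBlockChain) {
        int slot;
        if (freeHead != NIL) {           // reuse a freed slot if available
            slot = freeHead;
            freeHead = next[slot];
        } else {
            slot = size++;
            ensureCapacity(size);
        }
        blockId[slot] = block;
        storageId[slot] = storage;
        next[slot] = headOfBlockChain;   // an index link, not a reference,
        return slot;                     // so no card is dirtied by this store
    }

    int storageAt(int slot) { return storageId[slot]; }
    int nextOf(int slot)    { return next[slot]; }

    private void ensureCapacity(int needed) {
        if (needed > blockId.length) {
            int cap = Math.max(needed, blockId.length * 2);
            blockId = Arrays.copyOf(blockId, cap);
            storageId = Arrays.copyOf(storageId, cap);
            next = Arrays.copyOf(next, cap);
        }
    }
}
```

The payoff is the same as the comment describes: the arrays stay dense, and traversals only chase ints, so a full-heap mark or card scan never has to follow millions of per-block reference triplets.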





> Improve performance and GC friendliness of startup and FBRs
> -----------------------------------------------------------
>
>                 Key: HDFS-9260
>                 URL: https://issues.apache.org/jira/browse/HDFS-9260
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>         Attachments: FBR processing.png, HDFS Block and Replica Management 
> 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change, and also get some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
