[ https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793785#comment-16793785 ]

Chao Sun edited comment on HDFS-14366 at 3/15/19 5:16 PM:
----------------------------------------------------------

[~elgoiri], [~jojochuang]: can you help to get this committed? Thanks.


was (Author: csun):
[~elgoiri], [~jojochuang]: can you help to get committed? Thanks.

> Improve HDFS append performance
> -------------------------------
>
>                 Key: HDFS-14366
>                 URL: https://issues.apache.org/jira/browse/HDFS-14366
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch, append-flamegraph.png
>
>
> In our HDFS cluster, we observed that the {{append}} operation can hold the
> write lock up to 10X longer than other write operations. A flamegraph
> collected on the namenode (see attachment: append-flamegraph.png) shows that
> most of the time in the append call is spent in {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
>     int numLive = 0;
>     synchronized (this) {
>       for (DatanodeDescriptor dn : datanodeMap.values()) {
>         if (!isDatanodeDead(dn)) {
>           numLive++;
>         }
>       }
>     }
>     return numLive;
>   }
> {code}
> This method iterates over every entry of {{datanodeMap}} while synchronized
> on the {{DatanodeManager}}, which is particularly expensive in large clusters
> because {{datanodeMap}} is modified in many places, such as when processing
> DN heartbeats.
> For the {{append}} operation, {{getNumLiveDataNodes()}} is invoked in
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>    * Check if a block is replicated to at least the minimum replication.
>    */
>   public boolean isSufficientlyReplicated(BlockInfo b) {
>     // Compare against the lesser of the minReplication and number of live DNs.
>     final int replication =
>         Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
>     return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way {{replication}} is calculated is suboptimal: it calls
> {{getNumLiveDataNodes()}} _every time_, even though {{minReplication}} is
> usually much smaller than the number of live datanodes.
>  
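Since x >= min(a, b) holds exactly when x >= a or x >= b, the check in {{isSufficientlyReplicated}} can compare the live replica count against {{minReplication}} first and fall back to {{getNumLiveDataNodes()}} only when that comparison fails. The Java sketch below illustrates the idea under stated assumptions: the class {{SufficientReplicationSketch}}, its method names, and the {{IntSupplier}} standing in for the expensive live-datanode count are hypothetical, not part of the Hadoop API, and not taken from the attached patches.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical, self-contained sketch (not the actual HDFS code). It models
 * isSufficientlyReplicated with the expensive live-datanode count replaced by
 * an IntSupplier, so the number of times it is evaluated can be observed.
 */
public class SufficientReplicationSketch {
  private final int minReplication;
  // Hypothetical stand-in for getDatanodeManager().getNumLiveDataNodes(),
  // which is assumed to be expensive (it iterates datanodeMap under a lock).
  private final IntSupplier numLiveDataNodes;

  public SufficientReplicationSketch(int minReplication, IntSupplier numLiveDataNodes) {
    this.minReplication = minReplication;
    this.numLiveDataNodes = numLiveDataNodes;
  }

  /** Mirrors the quoted logic: always evaluates the expensive live-node count. */
  public boolean isSufficientlyReplicatedOriginal(int liveReplicas) {
    final int replication = Math.min(minReplication, numLiveDataNodes.getAsInt());
    return liveReplicas >= replication;
  }

  /**
   * Short-circuit form: liveReplicas >= min(a, b) iff liveReplicas >= a or
   * liveReplicas >= b, so the expensive count is only evaluated when
   * liveReplicas < minReplication.
   */
  public boolean isSufficientlyReplicatedShortCircuit(int liveReplicas) {
    if (liveReplicas >= minReplication) {
      return true;
    }
    return liveReplicas >= numLiveDataNodes.getAsInt();
  }

  public static void main(String[] args) {
    // Counts how often the (hypothetical) expensive supplier is invoked.
    final int[] calls = {0};
    IntSupplier liveNodes = () -> {
      calls[0]++;
      return 1000;
    };
    SufficientReplicationSketch s = new SufficientReplicationSketch(1, liveNodes);

    s.isSufficientlyReplicatedOriginal(3);
    int originalCalls = calls[0];
    calls[0] = 0;
    s.isSufficientlyReplicatedShortCircuit(3);
    System.out.println("original calls=" + originalCalls + ", short-circuit calls=" + calls[0]);
  }
}
```

Running {{main}} prints {{original calls=1, short-circuit calls=0}}: with {{minReplication}} = 1 and 3 live replicas, the short-circuit form never evaluates the expensive count. Both forms return the same result for every input, so only the cost of the check changes, not its semantics.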



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
