[ https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793785#comment-16793785 ]
Chao Sun edited comment on HDFS-14366 at 3/15/19 5:16 PM:
----------------------------------------------------------

[~elgoiri], [~jojochuang]: can you help to get this committed? Thanks.

was (Author: csun):
[~elgoiri], [~jojochuang]: can you help to get committed? Thanks.

> Improve HDFS append performance
> -------------------------------
>
>                 Key: HDFS-14366
>                 URL: https://issues.apache.org/jira/browse/HDFS-14366
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-14366.000.patch, HDFS-14366.001.patch,
>                      append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can hold the
> write lock up to 10X longer than other write operations. By collecting a
> flamegraph on the namenode (see attachment: append-flamegraph.png), we found
> that most of the time in the append call is spent in {{getNumLiveDataNodes()}}:
> {code}
>   /** @return the number of live datanodes. */
>   public int getNumLiveDataNodes() {
>     int numLive = 0;
>     synchronized (this) {
>       for(DatanodeDescriptor dn : datanodeMap.values()) {
>         if (!isDatanodeDead(dn) ) {
>           numLive++;
>         }
>       }
>     }
>     return numLive;
>   }
> {code}
> This method synchronizes on the {{DatanodeManager}}, which is particularly
> expensive in large clusters since {{datanodeMap}} is modified in many
> places, such as when processing DN heartbeats.
> For the {{append}} operation, {{getNumLiveDataNodes()}} is invoked in
> {{isSufficientlyReplicated}}:
> {code}
>   /**
>    * Check if a block is replicated to at least the minimum replication.
>    */
>   public boolean isSufficientlyReplicated(BlockInfo b) {
>     // Compare against the lesser of the minReplication and number of live DNs.
>     final int replication =
>         Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
>     return countNodes(b).liveReplicas() >= replication;
>   }
> {code}
> The way {{replication}} is calculated is suboptimal: it calls
> {{getNumLiveDataNodes()}} _every time_, even though {{minReplication}} is
> usually much smaller than the number of live datanodes.
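
One way to avoid the unconditional call is to compare against {{minReplication}} first and consult the live-datanode count only when that cheaper check fails. The following is a minimal, self-contained sketch of that idea and is not taken from the attached patches. The class name, the method names, and the use of an IntSupplier to stand in for getDatanodeManager().getNumLiveDataNodes() are all assumptions made for illustration.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-alone sketch; not the actual BlockManager code. */
public class ReplicationCheckSketch {

    /** Shape of the check quoted in the issue: the supplier is always invoked. */
    public static boolean isSufficientlyReplicatedEager(
            int liveReplicas, int minReplication, IntSupplier numLiveDataNodes) {
        final int replication =
                Math.min(minReplication, numLiveDataNodes.getAsInt());
        return liveReplicas >= replication;
    }

    /**
     * Same result, but the (assumed expensive) supplier is only invoked
     * when liveReplicas is below minReplication.
     */
    public static boolean isSufficientlyReplicatedLazy(
            int liveReplicas, int minReplication, IntSupplier numLiveDataNodes) {
        if (liveReplicas >= minReplication) {
            // min(minReplication, numLive) <= minReplication, so this is enough.
            return true;
        }
        // liveReplicas < minReplication: only the live-node count can satisfy it.
        return liveReplicas >= numLiveDataNodes.getAsInt();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for the synchronized scan over datanodeMap.
        IntSupplier numLive = () -> {
            calls[0]++;
            return 100;
        };

        boolean common = isSufficientlyReplicatedLazy(3, 1, numLive);
        System.out.println("common case: " + common
                + ", supplier calls: " + calls[0]);

        boolean rare = isSufficientlyReplicatedLazy(0, 1, numLive);
        System.out.println("under-replicated case: " + rare
                + ", supplier calls: " + calls[0]);
    }
}
```

The two variants return the same value for every input: when liveReplicas is below minReplication, the minimum can only be satisfied through the live-datanode count, so nothing is lost by deferring that call.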