Goodness Ayinmode created HDFS-17639:
----------------------------------------
Summary: Lock contention for hasStorageType when the number of
storage nodes is large
Key: HDFS-17639
URL: https://issues.apache.org/jira/browse/HDFS-17639
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode, server
Affects Versions: 3.4.0
Reporter: Goodness Ayinmode
Lock contention for hasStorageType when the number of storage nodes is large
Hi,
I was looking into methods associated with storages and storageTypes. I found
[DatanodeDescriptor.hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138]
could be a source of bottlenecks. To check whether a specific storage type
exists among the storage locations associated with a DatanodeDescriptor,
[hasStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1138]
iterates over an array of DatanodeStorageInfos returned by
[getStorageInfos()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L305].
getStorageInfos() reads the storage information from storageMap and copies it
into an array while holding a lock. As the system scales and the size of
storageMap grows with more datanodes, the time spent in the synchronized block
will increase. This issue could become more significant when hasStorageType is
called in methods like
[DatanodeDescriptor.pruneStorageMap|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L568]
that may themselves iterate over a large data structure, resulting in a form
of nested iteration. The combination of a repeated linear search (within
hasStorageType) and a copy made under a lock can lead to high (potentially
quadratic) complexity and significant synchronization bottlenecks.
[DFSNetworkTopology.chooseRandomWithStorageType|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L180]
and
[DFSNetworkTopology.chooseRandomWithStorageTypeTwoTrial|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java#L107]
are affected because they both invoke hasStorageType. Additionally,
[INodeFile.assertAllBlocksComplete|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java#L345]
and
[BlockManager.checkRedundancy()|https://github.com/apache/hadoop/blob/6be04633b55bbd67c2875e39977cd9d2308dc1d1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5018]
face a similar issue
([FSNamesystem.finalizeINodeFileUnderConstruction|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L3908]
invokes both methods under a writeLock).
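
To make the pattern concrete, here is a minimal sketch of what I mean. It uses simplified stand-in classes, not the actual Hadoop code: NodeSketch, Kind, and countPresent are hypothetical names, and the map stores only a type per storage ID instead of a full DatanodeStorageInfo.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for StorageType; not the Hadoop enum. */
enum Kind { DISK, SSD, ARCHIVE }

/** Simplified model of a datanode descriptor; illustrative only. */
class NodeSketch {
  private final Map<String, Kind> storageMap = new HashMap<>();

  void addStorage(String id, Kind kind) {
    synchronized (storageMap) {
      storageMap.put(id, kind);
    }
  }

  /** Copies every value into a fresh array while holding the lock: O(n). */
  Kind[] getStorageInfos() {
    synchronized (storageMap) {
      return storageMap.values().toArray(new Kind[0]);
    }
  }

  /** Linear scan over the copy, so each call costs a copy plus a scan. */
  boolean hasStorageType(Kind kind) {
    for (Kind k : getStorageInfos()) {
      if (k == kind) {
        return true;
      }
    }
    return false;
  }

  /**
   * A caller that checks several types in a loop. Every iteration takes
   * the lock and copies the whole map again, which is the nested
   * iteration described above.
   */
  static int countPresent(NodeSketch node, Kind[] wanted) {
    int present = 0;
    for (Kind k : wanted) {
      if (node.hasStorageType(k)) {
        present++;
      }
    }
    return present;
  }
}
```

With n storages and m checks from the caller, this model does m copies of size n under the lock, which is where the potentially quadratic cost would come from.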
This appears to be similar to
https://issues.apache.org/jira/browse/HDFS-17638 . I’m curious to know whether
my analysis is wrong and whether anything can be done to reduce the
impact of these issues.
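
As one possible direction, and only as a sketch under my own assumptions, the descriptor could keep a per-type count that is updated whenever a storage is added or removed, so that the type check no longer copies or scans the map. CountingNodeSketch and MediaType below are hypothetical stand-ins, not Hadoop classes, and I have not checked how this would interact with the existing callers.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for StorageType; not the Hadoop enum. */
enum MediaType { DISK, SSD, ARCHIVE }

/** Illustrative only: tracks how many storages of each type exist. */
class CountingNodeSketch {
  private final Map<String, MediaType> storageMap = new HashMap<>();
  private final Map<MediaType, Integer> typeCounts =
      new EnumMap<>(MediaType.class);

  void addStorage(String id, MediaType type) {
    synchronized (storageMap) {
      MediaType previous = storageMap.put(id, type);
      if (previous != null) {
        decrement(previous); // the ID was re-registered with a new type
      }
      typeCounts.merge(type, 1, Integer::sum);
    }
  }

  void removeStorage(String id) {
    synchronized (storageMap) {
      MediaType removed = storageMap.remove(id);
      if (removed != null) {
        decrement(removed);
      }
    }
  }

  /** Constant-time lookup: no array copy and no scan under the lock. */
  boolean hasStorageType(MediaType type) {
    synchronized (storageMap) {
      return typeCounts.containsKey(type);
    }
  }

  /** Caller must hold the storageMap lock. Drops the entry at zero. */
  private void decrement(MediaType type) {
    typeCounts.computeIfPresent(type, (k, c) -> c == 1 ? null : c - 1);
  }
}
```

The lock is still taken, but it is held for a single map lookup instead of a copy whose cost grows with the number of storages.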