[
https://issues.apache.org/jira/browse/HDDS-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang updated HDDS-4970:
----------------------------------
Summary: Significant overhead when DataNode is over-subscribed (was:
Significant overhead when DataNode is over-scribed)
> Significant overhead when DataNode is over-subscribed
> -----------------------------------------------------
>
> Key: HDDS-4970
> URL: https://issues.apache.org/jira/browse/HDDS-4970
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 1.0.0
> Reporter: Wei-Chiu Chuang
> Priority: Critical
> Attachments: Screen Shot 2021-03-11 at 11.58.23 PM.png
>
>
> Ran a microbenchmark with concurrent clients reading chunks from a
> DataNode.
> As the number of clients grows, a significant amount of overhead comes
> from accessing a concurrent hash map, and the overhead grows super-linearly
> with the number of clients.
> {code:java|title=ChunkUtils#processFileExclusively}
> @VisibleForTesting
> static <T> T processFileExclusively(Path path, Supplier<T> op) {
>   for (;;) {
>     if (LOCKS.add(path)) {
>       break;
>     }
>   }
>   try {
>     return op.get();
>   } finally {
>     LOCKS.remove(path);
>   }
> }
> {code}
> In my test, 64 concurrent clients reading chunks from a single-disk
> DataNode caused the DN to spend nearly half of its time in LOCKS.add()
> (LOCKS is a set view backed by a ConcurrentHashMap).
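The contention pattern can be reproduced outside Ozone with a small, self-contained sketch (the class name SpinSetDemo, the path, and the client counts below are made up for illustration, not taken from the benchmark): every failed add() on the shared set is a wasted CAS round-trip, and when all clients target the same hot path the threads spend their time retrying instead of reading.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

public class SpinSetDemo {
  // Same shape as the LOCKS object in the snippet above:
  // a set view backed by a ConcurrentHashMap.
  private static final Set<String> LOCKS = ConcurrentHashMap.newKeySet();

  // Spin until this thread wins the add(); count the failed attempts.
  static long processExclusively(String path, Runnable op) {
    long failedAttempts = 0;
    while (!LOCKS.add(path)) {  // failed CAS: busy-wait and retry
      failedAttempts++;
    }
    try {
      op.run();
    } finally {
      LOCKS.remove(path);
    }
    return failedAttempts;
  }

  public static void main(String[] args) throws Exception {
    int clients = 8;  // hypothetical; the issue reports 64
    ExecutorService pool = Executors.newFixedThreadPool(clients);
    CountDownLatch done = new CountDownLatch(clients);
    LongAdder retries = new LongAdder();
    for (int i = 0; i < clients; i++) {
      pool.submit(() -> {
        for (int j = 0; j < 1000; j++) {
          // Every client hammers the same path, as with a hot chunk file.
          retries.add(processExclusively("/data/chunk1", () -> { }));
        }
        done.countDown();
      });
    }
    done.await();
    pool.shutdown();
    // With all clients targeting one path, wasted retries grow with concurrency.
    System.out.println("wasted add() retries: " + retries.sum());
  }
}
```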
>
> !Screen Shot 2021-03-11 at 11.58.23 PM.png|width=640!
>
> Given that it is not uncommon to find HDFS DataNodes with tens of thousands
> of incoming client connections, I expect to see similar traffic to an Ozone
> DataNode at scale.
> We should fix this code.
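One possible direction, sketched as a minimal example (the class name StripedFileLocks and the stripe count are hypothetical, not the actual HDDS fix): replace the spin on a shared concurrent set with a fixed array of per-stripe ReentrantLocks. A contended path then blocks its thread instead of burning CPU in a retry loop, and uncontended paths only touch their own stripe.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class StripedFileLocks {
  // Fixed pool of locks; a path is mapped to a stripe by its hash.
  // 128 is an arbitrary illustrative stripe count.
  private static final int STRIPES = 128;
  private static final ReentrantLock[] LOCKS = new ReentrantLock[STRIPES];
  static {
    for (int i = 0; i < STRIPES; i++) {
      LOCKS[i] = new ReentrantLock();
    }
  }

  static <T> T processFileExclusively(Path path, Supplier<T> op) {
    ReentrantLock lock = LOCKS[Math.floorMod(path.hashCode(), STRIPES)];
    lock.lock();  // parks the thread under contention instead of spinning
    try {
      return op.get();
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    Path p = Paths.get("/data/chunk1");
    int result = processFileExclusively(p, () -> 42);
    if (result != 42) {
      throw new AssertionError("expected 42");
    }
    System.out.println("ok");
  }
}
```

A caveat with striping: two distinct paths that hash to the same stripe serialize against each other, so the stripe count trades memory for false contention.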
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]