[jira] [Work logged] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

ASF GitHub Bot (Jira) Mon, 28 Jun 2021 01:48:06 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-16090?focusedWorklogId=615504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-615504
 ]


ASF GitHub Bot logged work on HDFS-16090:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jun/21 08:47
            Start Date: 28/Jun/21 08:47
    Worklog Time Spent: 10m 
      Work Description: virajjasani commented on a change in pull request #3148:
URL: https://github.com/apache/hadoop/pull/3148#discussion_r659596959



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
##########
@@ -2272,19 +2274,11 @@ public int getActiveTransferThreadCount() {
   void incrDatanodeNetworkErrors(String host) {
     metrics.incrDatanodeNetworkErrors();
 
-    /*
-     * Synchronizing on the whole cache is a big hammer, but since it's only
-     * accumulating errors, it should be ok. If this is ever expanded to include
-     * non-error stats, then finer-grained concurrency should be applied.
-     */
-    synchronized (datanodeNetworkCounts) {
-      try {
-        final Map<String, Long> curCount = datanodeNetworkCounts.get(host);
-        curCount.put("networkErrors", curCount.get("networkErrors") + 1L);
-        datanodeNetworkCounts.put(host, curCount);
-      } catch (ExecutionException e) {
         LOG.warn("failed to increment network error counts for host: {}", host);
-      }
+    try {
+      datanodeNetworkCounts.get(host).compute(NETWORK_ERRORS,
+          (key, errors) -> errors == null ? null : errors + 1L);

Review comment:
       So every time we have a network error, instead of locking the entire 
LoadingCache, CHM.compute() takes a lock only on the bucket of the Map where 
the key resides, and then the error count is incremented. This is fine-grained 
locking and performs much better than taking a lock on the entire `LoadingCache`.
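
A minimal sketch of the per-key update described above, using only JDK classes. The class `NetworkErrorCounts`, its method names, and the use of an outer `ConcurrentHashMap` in place of the Guava `LoadingCache` are assumptions made for illustration; this is not the actual `DataNode` code. It also assumes each new host entry starts with a `networkErrors` value of 0.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the DataNode counters; not the Hadoop class.
class NetworkErrorCounts {
  static final String NETWORK_ERRORS = "networkErrors";

  // Outer map plays the role of the LoadingCache: host -> per-host counters.
  private final Map<String, Map<String, Long>> countsByHost =
      new ConcurrentHashMap<>();

  // Assumed loader behaviour: a new host starts with networkErrors = 0.
  private static Map<String, Long> newHostEntry(String host) {
    Map<String, Long> counters = new ConcurrentHashMap<>();
    counters.put(NETWORK_ERRORS, 0L);
    return counters;
  }

  void incrNetworkErrors(String host) {
    // No lock on the outer structure: compute() updates the one key atomically.
    countsByHost.computeIfAbsent(host, NetworkErrorCounts::newHostEntry)
        .compute(NETWORK_ERRORS,
            (key, errors) -> errors == null ? null : errors + 1L);
  }

  long getNetworkErrors(String host) {
    Map<String, Long> counters = countsByHost.get(host);
    return counters == null ? 0L : counters.getOrDefault(NETWORK_ERRORS, 0L);
  }

  public static void main(String[] args) throws InterruptedException {
    NetworkErrorCounts counts = new NetworkErrorCounts();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      // Two threads per host, so updates to the same key do contend.
      final String host = (i % 2 == 0) ? "host-a" : "host-b";
      workers[i] = new Thread(() -> {
        for (int n = 0; n < 1000; n++) {
          counts.incrNetworkErrors(host);
        }
      });
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    // Each host receives 2 threads x 1000 increments.
    System.out.println("host-a: " + counts.getNetworkErrors("host-a"));
    System.out.println("host-b: " + counts.getNetworkErrors("host-b"));
  }
}
```

Because `compute()` applies the remapping function atomically for the given key, increments for different hosts never wait on a shared lock, and concurrent increments for the same host are not lost.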






Issue Time Tracking
-------------------

    Worklog Id:     (was: 615504)
    Time Spent: 1h 10m  (was: 1h)

> Fine grained locking for datanodeNetworkCounts
> ----------------------------------------------
>
>                 Key: HDFS-16090
>                 URL: https://issues.apache.org/jira/browse/HDFS-16090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While incrementing the DataNode network error count, we lock the entire 
> LoadingCache in order to increment the network count of a specific host. We 
> should provide fine-grained concurrency for this update because locking the 
> entire cache is unnecessary and could hurt performance when incrementing 
> network counts for multiple hosts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

