[GitHub] [hadoop] virajjasani commented on a change in pull request #3148: HDFS-16090. Fine grained lock for datanodeNetworkCounts
virajjasani commented on a change in pull request #3148:
URL: https://github.com/apache/hadoop/pull/3148#discussion_r659601136

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java

## @@ -2272,19 +2274,11 @@ public int getActiveTransferThreadCount()

```diff
   void incrDatanodeNetworkErrors(String host) {
     metrics.incrDatanodeNetworkErrors();
-    /*
-     * Synchronizing on the whole cache is a big hammer, but since it's only
-     * accumulating errors, it should be ok. If this is ever expanded to include
-     * non-error stats, then finer-grained concurrency should be applied.
-     */
-    synchronized (datanodeNetworkCounts) {
-      try {
-        final Map<String, Long> curCount = datanodeNetworkCounts.get(host);
-        curCount.put("networkErrors", curCount.get("networkErrors") + 1L);
-        datanodeNetworkCounts.put(host, curCount);
-      } catch (ExecutionException e) {
-        LOG.warn("failed to increment network error counts for host: {}", host);
-      }
+    try {
+      datanodeNetworkCounts.get(host).compute(NETWORK_ERRORS,
+          (key, errors) -> errors == null ? null : errors + 1L);
```

Review comment:
> i mean, shouldn't it be made 1 when errors is null (meaning the key didn't exist before)?

I see. Based on the LoadingCache creation, we will always find the value `0L` at the beginning; the only reason I handled the `errors == null` case was so that findbugs doesn't complain about it. But I think your suggestion is better: we should return `1L` when it is null. Let me change this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
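The suggested change can be sketched as a stand-alone counter. The class, field, and method names below are illustrative assumptions mirroring the discussion, not the actual DataNode code; the point is that a null (absent) value becomes `1L` instead of staying null:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-alone version of the discussed fix: when the per-host
// map has no "networkErrors" entry yet, compute() sees null and returns 1L
// (the first error) instead of null. Names here are hypothetical.
public class NetworkErrorCounter {
  static final String NETWORK_ERRORS = "networkErrors";

  // the outer map stands in for the LoadingCache keyed by host
  private final Map<String, Map<String, Long>> countsByHost =
      new ConcurrentHashMap<>();

  void incrDatanodeNetworkErrors(String host) {
    countsByHost
        .computeIfAbsent(host, h -> new ConcurrentHashMap<>())
        // suggested fix: null (absent) becomes 1L rather than staying null
        .compute(NETWORK_ERRORS,
            (key, errors) -> errors == null ? 1L : errors + 1L);
  }

  long errorCount(String host) {
    Map<String, Long> counts = countsByHost.get(host);
    return counts == null ? 0L : counts.getOrDefault(NETWORK_ERRORS, 0L);
  }
}
```

With this remapping the counter is correct even for a host the loader never pre-seeded with `0L`.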
[GitHub] [hadoop] virajjasani commented on a change in pull request #3148: HDFS-16090. Fine grained lock for datanodeNetworkCounts
virajjasani commented on a change in pull request #3148:
URL: https://github.com/apache/hadoop/pull/3148#discussion_r659596959

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java

## @@ -2272,19 +2274,11 @@ public int getActiveTransferThreadCount()

```diff
   void incrDatanodeNetworkErrors(String host) {
     metrics.incrDatanodeNetworkErrors();
-    /*
-     * Synchronizing on the whole cache is a big hammer, but since it's only
-     * accumulating errors, it should be ok. If this is ever expanded to include
-     * non-error stats, then finer-grained concurrency should be applied.
-     */
-    synchronized (datanodeNetworkCounts) {
-      try {
-        final Map<String, Long> curCount = datanodeNetworkCounts.get(host);
-        curCount.put("networkErrors", curCount.get("networkErrors") + 1L);
-        datanodeNetworkCounts.put(host, curCount);
-      } catch (ExecutionException e) {
-        LOG.warn("failed to increment network error counts for host: {}", host);
-      }
+    try {
+      datanodeNetworkCounts.get(host).compute(NETWORK_ERRORS,
+          (key, errors) -> errors == null ? null : errors + 1L);
```

Review comment:
So every time we have a network error, instead of locking the entire LoadingCache, with `ConcurrentHashMap.compute()` we only take a lock on the bucket of the map where the key resides, and then the error count is incremented. This is fine-grained locking, and it performs much better than taking a lock on the entire `LoadingCache`.
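The locking difference described above can be sketched side by side. This is a minimal illustrative class (names are assumptions, not DataNode code): the coarse variant serializes every update on one monitor, while the `compute()` variant lets updates to different keys proceed in parallel because only the bucket holding the key is locked:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of coarse vs fine-grained locking for counters.
// Names are illustrative, not the actual DataNode code.
public class FineGrainedCounters {
  private final Map<String, Long> counts = new ConcurrentHashMap<>();

  // old style: one coarse lock guards every update, regardless of key
  synchronized void incrCoarse(String key) {
    counts.merge(key, 1L, Long::sum);
  }

  // new style: compute() locks only the bucket containing the key,
  // so increments for different hosts do not contend on one monitor
  void incrFine(String key) {
    counts.compute(key, (k, v) -> v == null ? 1L : v + 1L);
  }

  long get(String key) {
    return counts.getOrDefault(key, 0L);
  }
}
```

Both variants are atomic per update; the difference is only in how much concurrency they allow across keys.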
[GitHub] [hadoop] virajjasani commented on a change in pull request #3148: HDFS-16090. Fine grained lock for datanodeNetworkCounts
virajjasani commented on a change in pull request #3148:
URL: https://github.com/apache/hadoop/pull/3148#discussion_r659595214

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java

## @@ -2272,19 +2274,11 @@ public int getActiveTransferThreadCount()

```diff
   void incrDatanodeNetworkErrors(String host) {
     metrics.incrDatanodeNetworkErrors();
-    /*
-     * Synchronizing on the whole cache is a big hammer, but since it's only
-     * accumulating errors, it should be ok. If this is ever expanded to include
-     * non-error stats, then finer-grained concurrency should be applied.
-     */
-    synchronized (datanodeNetworkCounts) {
-      try {
-        final Map<String, Long> curCount = datanodeNetworkCounts.get(host);
-        curCount.put("networkErrors", curCount.get("networkErrors") + 1L);
-        datanodeNetworkCounts.put(host, curCount);
-      } catch (ExecutionException e) {
-        LOG.warn("failed to increment network error counts for host: {}", host);
-      }
+    try {
+      datanodeNetworkCounts.get(host).compute(NETWORK_ERRORS,
+          (key, errors) -> errors == null ? null : errors + 1L);
```

Review comment:
`Map.compute()` is just a replacement for the code below (and `ConcurrentHashMap` does it atomically):

```java
V oldValue = map.get(key);
V newValue = remappingFunction.apply(key, oldValue);
if (oldValue != null) {
  if (newValue != null)
    map.put(key, newValue);
  else
    map.remove(key);
} else {
  if (newValue != null)
    map.put(key, newValue);
  else
    return null;
}
```

`errors` will ideally never be null, because it is initialized to `0L` here:

```java
datanodeNetworkCounts =
    CacheBuilder.newBuilder()
        .maximumSize(dncCacheMaxSize)
        .build(new CacheLoader<String, Map<String, Long>>() {
          @Override
          public Map<String, Long> load(String key) throws Exception {
            final Map<String, Long> ret = new HashMap<String, Long>();
            ret.put("networkErrors", 0L);
            return ret;
          }
        });
```
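The `compute()` contract quoted above can be demonstrated with the exact remapping function from the patch under review. This is a small illustrative helper (a plain `HashMap` is used for brevity; `ConcurrentHashMap` applies the same contract atomically): with `errors == null ? null : errors + 1L`, a pre-seeded `0L` entry is incremented, while a missing key falls into the "null old value, null new value" branch and no entry is ever created:

```java
import java.util.Map;

// Demonstrates the Map.compute() contract with the remapping function
// from the patch: null old + null new means "return null, add nothing".
public class ComputeSemanticsDemo {
  public static Long incr(Map<String, Long> map, String key) {
    // same remapping function as in the diff above
    return map.compute(key, (k, errors) -> errors == null ? null : errors + 1L);
  }
}
```

This is exactly why the reviewer's suggestion matters: if the `CacheLoader` ever stopped pre-seeding `0L`, this remapping would silently drop increments for absent keys.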