[ 
https://issues.apache.org/jira/browse/HDFS-16671?focusedWorklogId=794399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794399
 ]

ASF GitHub Bot logged work on HDFS-16671:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Jul/22 22:17
            Start Date: 22/Jul/22 22:17
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on code in PR #4597:
URL: https://github.com/apache/hadoop/pull/4597#discussion_r928012699


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java:
##########
@@ -354,6 +354,10 @@ public class RBFConfigKeys extends 
CommonConfigurationKeysPublic {
       NoRouterRpcFairnessPolicyController.class;
   public static final String DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX =
       FEDERATION_ROUTER_FAIRNESS_PREFIX + "handler.count.";
+  public static final String DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT =
+      FEDERATION_ROUTER_FAIRNESS_PREFIX + "acquire.timeout";
+  public static final long   DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS_DEFAULT =

Review Comment:
   The variable name should be similar to the original config name + 
``_DEFAULT``, remove the  ``_MS_`` in middle.
   Should be ``DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_DEFAULT``
   



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/fairness/AbstractRouterRpcFairnessPolicyController.java:
##########
@@ -42,15 +45,22 @@
   /** Hash table to hold semaphore for each configured name service. */
   private Map<String, Semaphore> permits;
 
+  private long acquireTimeoutMs = 
DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS_DEFAULT;
+
   public void init(Configuration conf) {
     this.permits = new HashMap<>();
+    long timeoutMs = conf.getTimeDuration(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT,
+        DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT_MS_DEFAULT, TimeUnit.MILLISECONDS);
+    if (timeoutMs >= 0) {
+      acquireTimeoutMs = timeoutMs;

Review Comment:
   if there is an invalid entry configured and we are moving to using the 
default value. We should atleast have a warn log. Kind of ``Invalid value -1 
configured for dfs.... should be greater than or equal to 0, Using default 
value of : 1s instead``
   something like this



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/fairness/TestRouterRpcFairnessPolicyController.java:
##########
@@ -83,6 +87,29 @@ public void testHandlerAllocationPreconfigured() {
     
assertFalse(routerRpcFairnessPolicyController.acquirePermit(CONCURRENT_NS));
   }
 
+  @Test
+  public void testAcquireTimeout() {
+    Configuration conf = createConf(40);
+    conf.setInt(DFS_ROUTER_FAIR_HANDLER_COUNT_KEY_PREFIX + "ns1", 30);
+    conf.setTimeDuration(DFS_ROUTER_FAIRNESS_ACQUIRE_TIMEOUT, 100, 
TimeUnit.MILLISECONDS);
+    RouterRpcFairnessPolicyController routerRpcFairnessPolicyController =
+        FederationUtil.newFairnessPolicyController(conf);
+
+    // ns1 should have 30 permits allocated
+    for (int i = 0; i < 30; i++) {
+      assertTrue(routerRpcFairnessPolicyController.acquirePermit("ns1"));
+    }
+    long acquireBeginTimeMs = Time.monotonicNow();
+    assertFalse(routerRpcFairnessPolicyController.acquirePermit("ns1"));
+    long acquireEndTimeMs = Time.monotonicNow();
+
+    long acquireTimeMs = acquireEndTimeMs - acquireBeginTimeMs;
+
+    // There are some other operations, so acquireTimeMs >= 100ms.
+    assertTrue(acquireTimeMs >= 100);
+    assertTrue(acquireTimeMs < 100 + 50);

Review Comment:
   Is this 50 added as safe margin, kind of that other operations won't take 
more than 50ms? In that case I doubt this test is gonna go flaky in future, 
Double check to confirm it isn't gonna cross the safe limit of 50 ever.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 794399)
    Time Spent: 1.5h  (was: 1h 20m)

> RBF: RouterRpcFairnessPolicyController supports configurable permit acquire 
> timeout
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-16671
>                 URL: https://issues.apache.org/jira/browse/HDFS-16671
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> RouterRpcFairnessPolicyController supports configurable permit acquire 
> timeout. Hardcode 1s is very long, and it has causes an incident in our prod 
> environment when one nameserivce is busy.
> And the optimal timeout maybe should be less than p50(avgTime).
> And all handlers in RBF is waiting to acquire the permit of the busy ns. 
> {code:java}
> "IPC Server handler 12 on default port 8888" #2370 daemon prio=5 os_prio=0 
> tid=? nid=?  waiting on condition [?]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <?> (a 
> java.util.concurrent.Semaphore$NonfairSync)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.Semaphore.tryAcquire(Semaphore.java:409)
>       at 
> org.apache.hadoop.hdfs.server.federation.fairness.AbstractRouterRpcFairnessPolicyController.acquirePermit(AbstractRouterRpcFairnessPolicyController.java:56)
>       at 
> org.apache.hadoop.hdfs.server.federation.fairness.DynamicRouterRpcFairnessPolicyController.acquirePermit(DynamicRouterRpcFairnessPolicyController.java:123)
>       at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.acquirePermit(RouterRpcClient.java:1500)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to