apurtell commented on code in PR #7864:
URL: https://github.com/apache/hbase/pull/7864#discussion_r3268078733
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java:
##########
@@ -466,7 +457,15 @@ public void resizeQueues(Configuration conf) {
}
}
final int queueLimit = currentQueueLimit;
- currentQueueLimit = conf.getInt(configKey, queueLimit);
+ int newQueueLimit = conf.getInt(configKey, queueLimit);
+ if (newQueueLimit > DEFAULT_CALL_QUEUE_SIZE_HARD_LIMIT) {
+ LOG.warn(
+ "Requested soft limit {} exceeds dynamic-resize ceiling {}. "
Review Comment:
This WARN will be misleading.
The actual limit is `queueHardLimit`, initialized in the constructor as
Math.max(maxQueueLength, 250).
##########
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java:
##########
@@ -808,4 +837,161 @@ public void drop() {
}
};
}
+
+ /**
+ * Test LIFO switching behavior through actual RPC calls. This test verifies
that when the queue
+ * fills beyond the LIFO threshold, newer calls are processed before older
calls (LIFO mode).
+ */
+ @Test
+ public void testCoDelLifoWithRpcCalls() throws Exception {
+ Configuration testConf = HBaseConfiguration.create();
+ testConf.set(RpcExecutor.CALL_QUEUE_TYPE_CONF_KEY,
+ RpcExecutor.CALL_QUEUE_TYPE_CODEL_CONF_VALUE);
+ int maxCallQueueLength = 50;
+ double coDelLifoThreshold = 0.8;
+ testConf.setInt(RpcScheduler.IPC_SERVER_MAX_CALLQUEUE_LENGTH,
maxCallQueueLength);
+ testConf.setDouble(RpcExecutor.CALL_QUEUE_CODEL_LIFO_THRESHOLD,
coDelLifoThreshold);
+ testConf.setInt(RpcExecutor.CALL_QUEUE_CODEL_TARGET_DELAY, 100);
+ testConf.setInt(RpcExecutor.CALL_QUEUE_CODEL_INTERVAL, 100);
+ testConf.setInt(HConstants.REGION_SERVER_HANDLER_COUNT, 1); // Single
handler to control
+ // processing
+
+ PriorityFunction priority = mock(PriorityFunction.class);
+ when(priority.getPriority(any(), any(),
any())).thenReturn(HConstants.NORMAL_QOS);
+ SimpleRpcScheduler scheduler =
+ new SimpleRpcScheduler(testConf, 1, 0, 0, priority,
HConstants.QOS_THRESHOLD);
+
+ try {
+ scheduler.init(CONTEXT);
+ scheduler.start();
+
+ // Track completion order
+ final List<Integer> completedCalls = Collections.synchronizedList(new
ArrayList<>());
+
+ // Dispatch many slow calls rapidly to fill the queue beyond 80%
threshold
+ // With queue limit of 50, we need > 40 calls to cross 80%
+ int numCalls = 48;
+ for (int i = 0; i < numCalls; i++) {
+ final int callId = i;
+ CallRunner call = createMockTask(HConstants.NORMAL_QOS);
+ call.setStatus(new MonitoredRPCHandlerImpl("test"));
+ doAnswer(invocation -> {
+ completedCalls.add(callId);
+ Thread.sleep(100); // Slow processing to allow queue to build up
+ return null;
+ }).when(call).run();
+ scheduler.dispatch(call);
+ // No delay between dispatches - rapidly fill the queue
+ }
+
+ // Wait for some calls to complete
+ await().atMost(2, TimeUnit.SECONDS).until(() -> completedCalls.size() >=
3);
Review Comment:
This will probably flake the test.
The assertion below for `maxCallIdCompleted` inspect up to 5 completions.
With only 3 guaranteed by the `await()`, we might miss a completion that would
otherwise land in the 4th/5th slot.
##########
hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java:
##########
@@ -814,4 +817,166 @@ public void drop() {
}
};
}
+
+ /**
+ * Test LIFO switching behavior through actual RPC calls. This test verifies
that when the queue
+ * fills beyond the LIFO threshold, newer calls are processed before older
calls (LIFO mode).
+ */
+ @Test
+ public void testCoDelLifoWithRpcCalls() throws Exception {
+ Configuration testConf = HBaseConfiguration.create();
+ testConf.set(RpcExecutor.CALL_QUEUE_TYPE_CONF_KEY,
+ RpcExecutor.CALL_QUEUE_TYPE_CODEL_CONF_VALUE);
+ int maxCallQueueLength = 50;
+ double codelLifoThreshold = 0.8;
+ testConf.setInt(RpcScheduler.IPC_SERVER_MAX_CALLQUEUE_LENGTH,
maxCallQueueLength);
+ testConf.setDouble(RpcExecutor.CALL_QUEUE_CODEL_LIFO_THRESHOLD,
codelLifoThreshold);
+ testConf.setInt(RpcExecutor.CALL_QUEUE_CODEL_TARGET_DELAY, 100);
+ testConf.setInt(RpcExecutor.CALL_QUEUE_CODEL_INTERVAL, 100);
+ testConf.setInt(HConstants.REGION_SERVER_HANDLER_COUNT, 1); // Single
handler to control
+ // processing
+
+ PriorityFunction priority = mock(PriorityFunction.class);
+ when(priority.getPriority(any(), any(),
any())).thenReturn(HConstants.NORMAL_QOS);
+ SimpleRpcScheduler scheduler =
+ new SimpleRpcScheduler(testConf, 1, 0, 0, priority,
HConstants.QOS_THRESHOLD);
+
+ try {
+ scheduler.init(CONTEXT);
+ scheduler.start();
+
+ // Track completion order
+ final List<Integer> completedCalls = Collections.synchronizedList(new
ArrayList<>());
+
+ // Dispatch many slow calls rapidly to fill the queue beyond 80%
threshold
+ // With queue limit of 50, we need > 40 calls to cross 80%
+ int numCalls = 48;
+ for (int i = 0; i < numCalls; i++) {
+ final int callId = i;
+ CallRunner call = createMockTask(HConstants.NORMAL_QOS);
Review Comment:
Thank you, resolving this.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java:
##########
@@ -231,20 +234,7 @@ public Map<String, Long> getCallQueueSizeSummary() {
.collect(Collectors.groupingBy(Pair::getFirst,
Collectors.summingLong(Pair::getSecond)));
}
- // This method can only be called ONCE per executor instance.
- // Before calling: queueInitArgs[0] contains the soft limit (desired queue
capacity)
- // After calling: queueInitArgs[0] is set to hard limit and
currentQueueLimit stores the original
- // soft limit.
- // Multiple calls would incorrectly use the hard limit as the soft limit.
- // As all the queues has same initArgs and queueClass, there should be no
need to call this again.
protected void initializeQueues(final int numQueues) {
- if (!queues.isEmpty()) {
Review Comment:
Thank you, resolving this.
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java:
##########
@@ -90,7 +90,7 @@ public abstract class RpcExecutor {
public static final String CALL_QUEUE_CODEL_LIFO_THRESHOLD =
"hbase.ipc.server.callqueue.codel.lifo.threshold";
- public static final int CALL_QUEUE_CODEL_DEFAULT_TARGET_DELAY = 100;
+ public static final int CALL_QUEUE_CODEL_DEFAULT_TARGET_DELAY = 5;
Review Comment:
This change is fine for `master` branch, and even `branch-2` but will need
to be left to the current default in the releasing branches per our
compatibility guidelines, which disallow changes that affect operational
behaviors. Existing clusters relying on the looser 100ms target will drop calls
maybe more aggressively than expected or benchmarked under the same load.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]