Jaehui Lee created HBASE-29781:
----------------------------------
Summary: Dynamic configurations for call queue length doesn't work
correctly when increasing the limit
Key: HBASE-29781
URL: https://issues.apache.org/jira/browse/HBASE-29781
Project: HBase
Issue Type: Bug
Components: rpc
Reporter: Jaehui Lee
Assignee: Jaehui Lee
h2. Problem
The dynamic configurations for call queue length (such as
{{ipc.server.max.callqueue.length}},
{{ipc.server.priority.max.callqueue.length}},
{{ipc.server.replication.max.callqueue.length}}, and
{{ipc.server.bulkload.max.callqueue.length}}) only work when *decreasing*
the limit. When *increasing* the limit, the configuration change has no effect:
tasks are still rejected at the original hard limit.
*Example:*
* Initial configuration: {{ipc.server.max.callqueue.length = 100}}
* Change configuration to: {{ipc.server.max.callqueue.length = 200}}
* Expected: Queue accepts up to 200 tasks
* Actual: Queue still rejects tasks at 100
h2. Root Cause
{{RpcExecutor}} uses two limit mechanisms (introduced by HBASE-15306):
# *Soft limit* ({{currentQueueLimit}} variable): Updated by
{{resizeQueues()}} when configuration changes
# *Hard limit* ({{BlockingQueue}} capacity): Set once during queue
initialization and *cannot be changed*
During initialization, queues are created with a fixed capacity based on the
initial configuration value. When configuration changes, {{resizeQueues()}}
only updates the {{currentQueueLimit}} variable but cannot modify the
underlying {{BlockingQueue}} capacity, which is immutable.
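The mismatch can be reproduced outside HBase with a minimal sketch. The class {{QueueLimitDemo}} and its {{dispatch()}} and {{acceptedAfterResize()}} methods are hypothetical stand-ins written for illustration, not the actual {{RpcExecutor}} code; only the names {{currentQueueLimit}} and {{resizeQueues()}} are borrowed from the description above, and {{LinkedBlockingQueue}} is assumed as the bounded queue type.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of the two limits; NOT the real RpcExecutor. */
public class QueueLimitDemo {
    // Hard limit: the capacity is fixed when the queue is constructed.
    private final BlockingQueue<Integer> queue;
    // Soft limit: the only value a configuration change can update.
    private volatile int currentQueueLimit;

    public QueueLimitDemo(int initialLimit) {
        this.queue = new LinkedBlockingQueue<>(initialLimit);
        this.currentQueueLimit = initialLimit;
    }

    /** Mirrors the described behavior: only the soft limit changes. */
    public void resizeQueues(int newLimit) {
        this.currentQueueLimit = newLimit;
    }

    /** Rejects a task if either the soft or the hard limit is reached. */
    public boolean dispatch(int task) {
        if (queue.size() >= currentQueueLimit) {
            return false; // rejected by the soft limit
        }
        return queue.offer(task); // returns false once the fixed capacity is full
    }

    /** Counts how many of {@code attempts} tasks are accepted after a resize. */
    public static int acceptedAfterResize(int initialLimit, int newLimit, int attempts) {
        QueueLimitDemo demo = new QueueLimitDemo(initialLimit);
        demo.resizeQueues(newLimit);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (demo.dispatch(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Increasing 100 -> 200: still capped by the original capacity.
        System.out.println(acceptedAfterResize(100, 200, 250)); // prints 100
        // Decreasing 100 -> 50: the soft limit takes effect.
        System.out.println(acceptedAfterResize(100, 50, 250));  // prints 50
    }
}
```

In this model, decreasing works because the soft limit is checked first, while increasing fails because {{offer()}} still hits the capacity chosen at construction.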
h2. Proposed Solutions
*Option 1: Set hard limit to Integer.MAX_VALUE (or sufficiently large value)*
Modify {{initializeQueues()}} to set the queue capacity to
{{Integer.MAX_VALUE}} instead of the configured value, and rely solely on
{{currentQueueLimit}} for enforcement. This is simple and enables dynamic
resizing in both directions. Note that this may allow slight overshooting of
the soft limit due to race conditions in concurrent dispatch.
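A rough sketch of Option 1, under the same assumptions as the description above: the class {{SoftLimitOnlyQueue}} and its methods are hypothetical and only model the idea. The no-argument {{LinkedBlockingQueue}} constructor is used here because its capacity defaults to {{Integer.MAX_VALUE}}.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of Option 1; NOT the real RpcExecutor. */
public class SoftLimitOnlyQueue {
    // The no-arg constructor gives a capacity of Integer.MAX_VALUE,
    // so the hard limit is effectively never reached.
    private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    // The soft limit is now the only enforced bound.
    private volatile int currentQueueLimit;

    public SoftLimitOnlyQueue(int initialLimit) {
        this.currentQueueLimit = initialLimit;
    }

    public void resizeQueues(int newLimit) {
        this.currentQueueLimit = newLimit;
    }

    public boolean dispatch(int task) {
        // Check-then-act is not atomic: concurrent callers can both pass
        // this check and push the queue slightly over the soft limit.
        if (queue.size() >= currentQueueLimit) {
            return false;
        }
        return queue.offer(task);
    }

    public int size() {
        return queue.size();
    }
}
```

With a single dispatching thread the limit is exact in both directions; the overshoot mentioned above only appears when several threads call {{dispatch()}} at the same time.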
*Option 2: Recreate queues when increasing capacity*
When {{resizeQueues()}} detects an increase in limit, drain existing queues and
create new ones with the larger capacity. This is more complex but preserves
hard limit safety.
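The core of Option 2 might look like the sketch below. {{QueueRegrowSketch}} and {{grow()}} are hypothetical names, and the sketch deliberately ignores the hard part: in a live server, handlers holding a reference to the old queue would have to be switched over, and tasks offered to the old queue during the swap could be stranded unless dispatch is paused or synchronized.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical helper illustrating Option 2; NOT HBase code. */
public final class QueueRegrowSketch {
    private QueueRegrowSketch() {}

    /**
     * Creates a queue with the larger capacity and moves all pending
     * elements into it, preserving their order. The caller is assumed to
     * have stopped producers and consumers of {@code old} beforehand.
     */
    public static <T> BlockingQueue<T> grow(BlockingQueue<T> old, int newCapacity) {
        BlockingQueue<T> replacement = new LinkedBlockingQueue<>(newCapacity);
        // drainTo removes every available element from the old queue
        // and appends it to the replacement.
        old.drainTo(replacement);
        return replacement;
    }
}
```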
*Option 3: Use Semaphore for limit enforcement*
Maintain a separate {{Semaphore}} per queue for atomic limit control. This
eliminates race conditions but adds overhead.
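One possible shape for Option 3 is sketched below. {{SemaphoreBoundedQueue}} and {{ResizableSemaphore}} are hypothetical names; the sketch relies on {{Semaphore.release(int)}} to add permits and on the protected {{Semaphore.reducePermits(int)}} to remove them, which is why a small subclass is needed. It uses a non-blocking {{poll()}} for brevity, whereas real handlers would more likely block waiting for work.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/** Hypothetical model of Option 3; NOT the real RpcExecutor. */
public class SemaphoreBoundedQueue<T> {
    /** Exposes the protected reducePermits(int) so the limit can shrink. */
    private static final class ResizableSemaphore extends Semaphore {
        ResizableSemaphore(int permits) {
            super(permits);
        }

        void shrink(int reduction) {
            super.reducePermits(reduction);
        }
    }

    // Unbounded storage; the semaphore alone enforces the limit.
    private final Queue<T> queue = new ConcurrentLinkedQueue<>();
    private final ResizableSemaphore permits;
    private int limit; // guarded by the monitor in resize()

    public SemaphoreBoundedQueue(int limit) {
        this.permits = new ResizableSemaphore(limit);
        this.limit = limit;
    }

    /** Accepts the task only if a permit is available (atomic check). */
    public boolean offer(T task) {
        if (!permits.tryAcquire()) {
            return false;
        }
        queue.add(task);
        return true;
    }

    /** Removes a task, if any, and returns its permit. */
    public T poll() {
        T task = queue.poll();
        if (task != null) {
            permits.release();
        }
        return task;
    }

    /** Adjusts the limit in either direction without touching the queue. */
    public synchronized void resize(int newLimit) {
        int delta = newLimit - limit;
        if (delta > 0) {
            permits.release(delta);
        } else if (delta < 0) {
            // Available permits may go negative; new tasks are then
            // rejected until enough queued tasks have been polled.
            permits.shrink(-delta);
        }
        limit = newLimit;
    }
}
```

Because {{tryAcquire()}} is atomic, concurrent dispatchers cannot overshoot the limit, at the cost of one extra synchronization point per enqueue and dequeue.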
I'm uncertain whether this behavior is intentional or needs fixing. Is this
something that should be addressed? If so, which approach would be most
appropriate for HBase's architecture? Any feedback would be greatly appreciated.