Shangshu Qian created HBASE-29006:
-------------------------------------
Summary: The region assignment retry logic is unconstrained and
may cause workload amplification
Key: HBASE-29006
URL: https://issues.apache.org/jira/browse/HBASE-29006
Project: HBase
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Shangshu Qian
We found a potential feedback loop in the region assignment process that may
overload the RegionServer (RS).
`AssignmentManager.processAssignmentPlans()` retries the assignment whenever
an HBaseIOException (HIOE) is thrown. For example,
`FavoredNodeAssignmentHelper.canPlaceFavoredNodes` may throw an HIOE when
fewer than three nodes are available. The HIOE is caught by the catch
block here:
{code:java}
private void processAssignmentPlans(final HashMap<RegionInfo, RegionStateNode> regions,
  final HashMap<RegionInfo, ServerName> retainMap, final List<RegionInfo> hris,
  final List<ServerName> servers) {
  boolean isTraceEnabled = LOG.isTraceEnabled();
  if (isTraceEnabled) {
    LOG.trace("Available servers count=" + servers.size() + ": " + servers);
  }

  final LoadBalancer balancer = getBalancer();
  // ask the balancer where to place regions
  if (retainMap != null && !retainMap.isEmpty()) {
    if (isTraceEnabled) {
      LOG.trace("retain assign regions=" + retainMap);
    }
    try {
      acceptPlan(regions, balancer.retainAssignment(retainMap, servers));
    } catch (HBaseIOException e) {
      LOG.warn("unable to retain assignment", e);
      addToPendingAssignment(regions, retainMap.keySet());
    }
  }
{code}
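The failed regions are simply put back into the pending assignment set via
addToPendingAssignment() and retried, with no bound on the number of attempts.
This can cause problems when the assignment fails because the RS is overloaded:
each additional retry adds more work for the RS and can make the overload worse.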
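
One possible way to bound the retries is sketched below. This is only an illustration under stated assumptions, not existing HBase code and not a proposed patch: the class `AssignmentRetryBudget`, its methods `recordFailure` and `recordSuccess`, and the `GIVE_UP` constant are hypothetical names. The idea is to count failed attempts per region, return a capped exponential backoff delay before the next retry, and stop retrying once the attempt limit is reached.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of HBase): tracks failed assignment attempts
 * per region and decides whether, and after what delay, a retry is allowed.
 */
final class AssignmentRetryBudget {
  /** Returned by recordFailure() when no further retries are allowed. */
  static final long GIVE_UP = -1L;

  private final int maxAttempts;   // total attempts allowed per region
  private final long baseDelayMs;  // delay after the first failure
  private final long maxDelayMs;   // upper bound on any single delay
  private final Map<String, Integer> failures = new HashMap<>();

  AssignmentRetryBudget(int maxAttempts, long baseDelayMs, long maxDelayMs) {
    if (maxAttempts < 1 || baseDelayMs < 0 || maxDelayMs < baseDelayMs) {
      throw new IllegalArgumentException("invalid retry budget parameters");
    }
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  /**
   * Records one failed attempt for the region.
   * Returns the delay in milliseconds to wait before the next retry,
   * or GIVE_UP if the region has used up all of its attempts.
   */
  long recordFailure(String regionName) {
    int attempts = failures.merge(regionName, 1, Integer::sum);
    if (attempts >= maxAttempts) {
      return GIVE_UP;
    }
    // Exponential backoff: double the delay once per previous failure,
    // clamping at maxDelayMs without risking long overflow.
    long delay = baseDelayMs;
    for (int i = 1; i < attempts; i++) {
      delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
    }
    return delay;
  }

  /** Clears the failure count once the region has been assigned. */
  void recordSuccess(String regionName) {
    failures.remove(regionName);
  }
}
```

Under this assumption, the catch block above could consult such a budget before calling addToPendingAssignment(), re-queueing a region only after the returned delay and reporting a failure when GIVE_UP is returned. How the delay would actually be scheduled, and what should happen to a region that gives up, are left open here.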