Haoze Wu created HBASE-26256:
--------------------------------
Summary: The potential delay of HDFS RPC in HRegion may cause data
inconsistency and some HBase shell commands hanging
Key: HBASE-26256
URL: https://issues.apache.org/jira/browse/HBASE-26256
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 2.4.2
Reporter: Haoze Wu
When a RegionServer initializes a new region, it writes internal metadata
(e.g., WAL) to the HDFS cluster. We find that this write can block because of
network issues or overload on the HDFS side. The delay exposes inconsistent
state to HBase clients and causes multiple HBase APIs to hang.
*Reproduction*
Steps to reproduce the symptom from scratch:
# Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default
configuration.
# Start a ZooKeeper cluster (3 nodes) with the default configuration.
# Start an HBase cluster (1 Master + 2 RegionServers) with the default
configuration.
# In one of the RegionServers, introduce a delay by invoking `Thread.sleep`
when it is creating its third region (alternatively, use a network packet loss
injection tool like `tc`).
# Right after the HBase cluster starts, the fault has not yet been triggered.
Start the default HBase shell by running `bin/hbase shell` in the terminal. In
the HBase shell, repeatedly run the `create` command to create new tables
until the fault is triggered.
When the fault occurs, we observe several symptoms as follows:
# The HBase shell running the `create` command hangs, without any log or
warning.
# If we start another HBase shell and run the `list` command to see all the
tables, we can see the table in the result. However, this table has actually
not been created yet. Ideally the client should not see this pending table
before `create` succeeds.
# If we start another HBase shell and run the `disable` command to disable
this table, the HBase shell will hang, without any log or warning. Ideally, we
should see some error or warning within a short duration of time, because this
table has not been created yet.
The stack trace:
{code:java}
"RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)
    at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
Relevant code snippet:
{code:java}
// file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
// class: org.apache.hadoop.hbase.regionserver.HRegion
public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
  // ...
  private long initializeRegionInternals(final CancelableProgressable reporter,
      final MonitoredTask status) throws IOException {
    // ...
    if (!isRestoredRegion) {
      // ...
      if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
        // ...
        // At and only at the third time of invocation,
        // invoke Thread.sleep, to simulate a delay of HDFS RPC
        WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(),
          getWALRegionDir(),
          nextSeqId - 1);
        // ...
      }
    }
    // ...
  }
  // ...
}
{code}
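The fault injection described in the comment above (sleep at, and only at, the
third invocation) could be implemented roughly as follows. This is a minimal
sketch, not the code used in the report: the class `NthCallDelay` and its
method `maybeDelay` are hypothetical names, and the delay length is an
arbitrary value chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fault-injection helper (not part of HBase): delays only the N-th call. */
final class NthCallDelay {
    private final AtomicInteger calls = new AtomicInteger();
    private final int targetCall;
    private final long delayMillis;

    NthCallDelay(int targetCall, long delayMillis) {
        this.targetCall = targetCall;
        this.delayMillis = delayMillis;
    }

    /** Sleeps on the target invocation only; returns true if this call was the delayed one. */
    boolean maybeDelay() {
        if (calls.incrementAndGet() != targetCall) {
            return false;
        }
        try {
            Thread.sleep(delayMillis); // stands in for a slow HDFS RPC
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed values: delay the third call by 50 ms.
        NthCallDelay delay = new NthCallDelay(3, 50L);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " delayed=" + delay.maybeDelay());
        }
    }
}
```

In the reproduction, a call like `maybeDelay()` would sit immediately before
`WALSplitUtil.writeRegionSequenceIdFile`, with a delay long enough to make the
hang observable from the HBase shell.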
*Fix*
We are not sure about the root cause of the inconsistency or of the blocking
of the other APIs. One potentially simple fix is to protect the
`WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside it)
with a timeout. We verified that throwing a timeout exception when the
operation takes too long resolves the symptoms described above.
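
One way to bound a blocking call with a timeout is to run it on a worker
thread and wait on its `Future`. The sketch below uses only
`java.util.concurrent` and is not the actual patch: `TimedCall` and
`callWithTimeout` are hypothetical names, and it assumes that surfacing the
timeout as an `IOException` suits the caller. It also assumes that abandoning
the worker is acceptable, since interrupting a thread does not necessarily
unblock an in-flight HDFS RPC.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not an existing HBase class. */
final class TimedCall {
    private TimedCall() {}

    /**
     * Runs op on a worker thread and waits at most timeoutMillis for its result.
     * On timeout the worker is interrupted and an IOException is thrown.
     */
    static <T> T callWithTimeout(Callable<T> op, long timeoutMillis) throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-call");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(op);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: blocked I/O may ignore the interrupt
            throw new IOException("operation timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause; // rethrow the original I/O failure unchanged
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for operation", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws IOException {
        // A fast operation returns its result normally.
        String ok = callWithTimeout(() -> "done", 1000L);
        System.out.println(ok);
        // A slow operation (standing in for a delayed HDFS write) hits the timeout.
        try {
            callWithTimeout(() -> { Thread.sleep(5000L); return "late"; }, 100L);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the `WALSplitUtil.writeRegionSequenceIdFile` call
would be passed in as the `Callable`, so a stalled write fails the region open
with an exception instead of leaving the handler thread blocked indefinitely.
The timeout value would need to be configurable, because a value that is too
short could fail region opens on a healthy but slow cluster.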
--
This message was sent by Atlassian Jira
(v8.3.4#803005)