Michael Stack created HBASE-23600:
-------------------------------------
Summary: Improve chances of edits landing into hbase:meta even
when high load
Key: HBASE-23600
URL: https://issues.apache.org/jira/browse/HBASE-23600
Project: HBase
Issue Type: Improvement
Components: rpc
Reporter: Michael Stack

Of late I've been testing clusters under high load to study failures and to
figure out how to effect recovery if the cluster is unable to recover on its own.
One interesting case is an RS that is struggling, mostly because writes to HDFS
are backed up and sync calls are taking a long time to complete. The RPC layer
backs up with waiting requests and eventually goes over one or more bounds. The
RS then starts throwing CallQueueTooBigExceptions (CQTBEs). This struggling
state can last a good while. We throw CQTBEs whatever the priority of the
incoming request.

We throw CQTBE in two places. The first is on the original parse of the
request, before we dispatch it to a handler: here we check the total size of
all queues and, if it is over the threshold (default 1G), throw the exception.
The second is later, when we dispatch the request to the internal queues: we
count the items in the queue and, if any one queue is over its limit (default
is 10 * handler count), we fail the dispatch and again throw CQTBE.
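
The two rejection points described above can be sketched roughly as follows.
This is a minimal illustration, not HBase's actual code: the class, field, and
method names are hypothetical, a single queue stands in for the several
internal queues, and counting the incoming request's bytes toward the size
check is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of the two CQTBE checks; not the real HBase classes. */
class CallQueueAdmissionSketch {
    /** Stand-in for the exception the RS throws when a bound is exceeded. */
    static class CallQueueTooBigException extends RuntimeException {
        CallQueueTooBigException(String message) { super(message); }
    }

    private final long maxQueuedBytes;   // size bound across queues (1G by default per the text)
    private final int maxQueueLength;    // per-queue item bound: 10 * handler count
    private long queuedBytes = 0;
    private final Queue<byte[]> queue = new ArrayDeque<>(); // one queue, for simplicity

    CallQueueAdmissionSketch(long maxQueuedBytes, int handlerCount) {
        this.maxQueuedBytes = maxQueuedBytes;
        this.maxQueueLength = 10 * handlerCount;
    }

    /** Check 1: on parse, before dispatch, compare queued bytes to the size bound. */
    void checkOnParse(long requestBytes) {
        // Assumption: the incoming request's size is included in the comparison.
        if (queuedBytes + requestBytes > maxQueuedBytes) {
            throw new CallQueueTooBigException("queued bytes over threshold");
        }
    }

    /** Check 2: on dispatch, compare the item count of the target queue to its bound. */
    void dispatch(byte[] request) {
        if (queue.size() >= maxQueueLength) {
            throw new CallQueueTooBigException("queue length over threshold");
        }
        queue.add(request);
        queuedBytes += request.length;
    }
}
```

With the defaults mentioned above, `new CallQueueAdmissionSketch(1024L * 1024L * 1024L, 30)`
would give a 1G size bound and a per-queue bound of 300 items for 30 handlers.
Note that neither check looks at the priority of the request.
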

We shouldn't be running w/ big queues. We should be rejecting requests we know
we'll never process before the client loses interest (see the CoDel thesis
and the implementations added a good while back). TODO.

In the meantime, I was looking at what would happen if, having read a
high-priority request, I let it through even when above thresholds rather than
dropping it on the floor. My main concern is edits to hbase:meta. Under
sustained, saturated load on the RS carrying hbase:meta, edits may not land.
The result is incomplete Procedures and a disoriented Master. I was
experimenting w/ putting off the corruption for as long as possible (CoDel
doesn't do priority at first blush; we probably want to add this).
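
As a rough illustration of the experiment described above, the over-threshold
check could consult the request priority before rejecting. This is only a
sketch under stated assumptions: the class name, the method, and the priority
cutoff value are all hypothetical and are not taken from the HBase code base
or from any patch on this issue.

```java
/** Hypothetical sketch: let high-priority requests through even when over the bound. */
class PriorityBypassSketch {
    // Assumption: requests at or above this priority count as "high priority".
    static final int HIGH_PRIORITY_CUTOFF = 200;

    /**
     * Returns true if the request should be rejected with a CQTBE.
     * Normal-priority requests are rejected once the queue is at its limit;
     * high-priority requests (for example, edits to hbase:meta) are admitted anyway.
     */
    static boolean shouldReject(int priority, int queuedItems, int queueLimit) {
        boolean overLimit = queuedItems >= queueLimit;
        boolean highPriority = priority >= HIGH_PRIORITY_CUTOFF;
        return overLimit && !highPriority;
    }
}
```

For example, with a limit of 300, `shouldReject(0, 300, 300)` is true while
`shouldReject(200, 300, 300)` is false. The trade-off is that the queue bound
stops being a hard bound for high-priority traffic, so this only puts off
trouble under sustained overload rather than removing it.
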