[
https://issues.apache.org/jira/browse/HBASE-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani resolved HBASE-26012.
--
Fix Version/s: 2.2.8
Hadoop Flags: Reviewed
Resolution: Fixed
Thanks for your review [~zhangduo]
> Improve logging and dequeue logic in DelayQueue
> ---
>
> Key: HBASE-26012
> URL: https://issues.apache.org/jira/browse/HBASE-26012
> Project: HBase
> Issue Type: Improvement
>Affects Versions: 2.3.5, 2.4.4, 3.0.0-alpha-2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.2.8, 2.4.5
>
>
> In Remote Procedure dispatcher, before submitting (sub)Procedure to Thread
> pool, we enqueue it as DelayedWithTimeout object on DelayQueue.
> TimeoutExecutorThread keeps dequeuing elements from this DelayQueue and
> submit the Procedure to the threadpool. The expiration of DelayedWithTimeout
> is determined by getDelay(TimeUnit):
> {code:java}
> @Override
> public long getDelay(final TimeUnit unit) {
> return DelayedUtil.getRemainingTime(unit, getTimeout());
> }
> {code}
> {code:java}
> /**
>* @return Time remaining as milliseconds.
>*/
> public static long getRemainingTime(final TimeUnit resultUnit, final long
> timeout) {
> final long currentTime = EnvironmentEdgeManager.currentTime();
> if (currentTime >= timeout) {
> return 0;
> }
> return resultUnit.convert(timeout - currentTime, TimeUnit.MILLISECONDS);
> }
> {code}
> Hence, in order for the elements to get dequeued on time, it is necessary
> that EnvironmentEdgeManager.currentTime() returns the current time in millis.
> As part of unit test, if we use our own custom EnvironmentEdge and inject it
> using EnvironmentEdgeManager.injectEdge before creating any tables, it is
> possible that we continue returning same value (based on custom impl) with
> EnvironmentEdgeManager.currentTime(). If that is the case, getRemainingTime
> as mentioned above, will never return 0 and hence, the procedure wrapped in
> DelayedWithTimeout might never be dequeued from DelayQueue because it's delay
> will not expire.
> As of today, our system goes in hanging state while waiting for table regions
> to be available (as mentioned above, DelayedWithTimeout object never gets
> dequeued from DelayQueue).
> Thread dump might show something like this consistently:
> {code:java}
> "ProcedureDispatcherTimeoutThread" #319 daemon prio=5 os_prio=31
> tid=0x7fcaf0cae800 nid=0x21d03 waiting on condition [0x700019293000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x0007225a0090> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at java.util.concurrent.DelayQueue.take(DelayQueue.java:223)
> at
> org.apache.hadoop.hbase.procedure2.util.DelayedUtil.takeWithoutInterrupt(DelayedUtil.java:82)
> at
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$TimeoutExecutorThread.run(RemoteProcedureDispatcher.java:314)
>Locked ownable synchronizers:
> - None
> {code}
> Although running into situation like this is not likely possible unless
> custom EnvironmentEdge is used as mentioned above, we should improve our
> dequeue logic as well as log important message to show where we are stuck.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)