[ 
https://issues.apache.org/jira/browse/HBASE-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HBASE-26012:
---------------------------------
    Fix Version/s: 2.4.5
                   3.0.0-alpha-2
                   2.3.6
                   2.5.0
                   3.0.0-alpha-1

> Improve logging and dequeue logic in DelayQueue
> -----------------------------------------------
>
>                 Key: HBASE-26012
>                 URL: https://issues.apache.org/jira/browse/HBASE-26012
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 3.0.0-alpha-2, 2.4.5
>
>
> In the remote procedure dispatcher, before submitting a (sub)Procedure to 
> the thread pool, we enqueue it as a DelayedWithTimeout object on a 
> DelayQueue. TimeoutExecutorThread keeps dequeuing elements from this 
> DelayQueue and submits each Procedure to the thread pool. The expiration of 
> a DelayedWithTimeout is determined by getDelay(TimeUnit):
>  
> {code:java}
> @Override
> public long getDelay(final TimeUnit unit) {
>   return DelayedUtil.getRemainingTime(unit, getTimeout());
> }
> {code}
> {code:java}
>   /**
>    * @return Time remaining as milliseconds.
>    */
>   public static long getRemainingTime(final TimeUnit resultUnit, final long timeout) {
>     final long currentTime = EnvironmentEdgeManager.currentTime();
>     if (currentTime >= timeout) {
>       return 0;
>     }
>     return resultUnit.convert(timeout - currentTime, TimeUnit.MILLISECONDS);
>   }
> {code}
> Hence, for elements to be dequeued on time, 
> EnvironmentEdgeManager.currentTime() must return the actual current time in 
> milliseconds.
> In a unit test, if we inject our own custom EnvironmentEdge using 
> EnvironmentEdgeManager.injectEdge before creating any tables, 
> EnvironmentEdgeManager.currentTime() may keep returning the same value 
> (depending on the custom implementation). In that case, getRemainingTime as 
> shown above never returns 0, so the procedure wrapped in DelayedWithTimeout 
> might never be dequeued from the DelayQueue because its delay never expires.
> As of today, the system hangs while waiting for table regions to become 
> available (as mentioned above, the DelayedWithTimeout object never gets 
> dequeued from the DelayQueue).
> A thread dump might consistently show something like this:
> {code:java}
> "ProcedureDispatcherTimeoutThread" #319 daemon prio=5 os_prio=31 tid=0x00007fcaf0cae800 nid=0x21d03 waiting on condition [0x0000700019293000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00000007225a0090> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
>   at java.util.concurrent.DelayQueue.take(DelayQueue.java:223)
>   at org.apache.hadoop.hbase.procedure2.util.DelayedUtil.takeWithoutInterrupt(DelayedUtil.java:82)
>   at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$TimeoutExecutorThread.run(RemoteProcedureDispatcher.java:314)
>    Locked ownable synchronizers:
>   - None
> {code}
> Although a situation like this is unlikely unless a custom EnvironmentEdge 
> is used as described above, we should improve our dequeue logic and log an 
> informative message that shows where we are stuck.
>  
>  
>  
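
The frozen-clock hang described in the issue can be reproduced outside HBase
with a small sketch. Nothing here is HBase code: Item is a hypothetical
stand-in for DelayedWithTimeout, and a LongSupplier plays the role of
EnvironmentEdgeManager.currentTime(). The getDelay body mirrors the
getRemainingTime snippet quoted above. A bounded poll is used instead of
take() only so that the demo terminates.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class FrozenClockDemo {
  /** Hypothetical stand-in for DelayedWithTimeout with an injectable clock. */
  static final class Item implements Delayed {
    private final long timeoutMs;
    private final LongSupplier clock;

    Item(long timeoutMs, LongSupplier clock) {
      this.timeoutMs = timeoutMs;
      this.clock = clock;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      // Same shape as the quoted getRemainingTime: 0 once the clock passes the timeout.
      long now = clock.getAsLong();
      if (now >= timeoutMs) {
        return 0;
      }
      return unit.convert(timeoutMs - now, TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  /** Returns true if an item due 50 ms after "now" (per the given clock) is dequeued within waitMs. */
  static boolean dequeuedWithin(LongSupplier clock, long waitMs) {
    DelayQueue<Item> queue = new DelayQueue<>();
    queue.add(new Item(clock.getAsLong() + 50, clock));
    try {
      return queue.poll(waitMs, TimeUnit.MILLISECONDS) != null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    long frozen = System.currentTimeMillis();
    // A clock that advances lets the delay expire; a clock that never advances does not.
    System.out.println("real clock:   " + dequeuedWithin(System::currentTimeMillis, 500));
    System.out.println("frozen clock: " + dequeuedWithin(() -> frozen, 500));
  }
}
```

With an advancing clock the item should come out after roughly 50 ms; with
the frozen clock the remaining delay stays at 50 ms forever, which is the
state the ProcedureDispatcherTimeoutThread in the thread dump is parked in.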

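One possible shape for the "improve dequeue logic and log" idea, offered as a
sketch under assumptions and not as the actual patch: replace the unbounded
take() with a bounded poll, and log when nothing was dequeued. The names
pollOrNull and FixedDelay, the 100 ms bound, and the log wording are all
hypothetical; System.out stands in for a real logger.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class BoundedDequeueSketch {
  /** Hypothetical element whose remaining delay never changes (simulates a stuck clock). */
  static final class FixedDelay implements Delayed {
    private final long delayMs;

    FixedDelay(long delayMs) {
      this.delayMs = delayMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(delayMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  /** Waits at most the given time for an expired element; returns null if none expired. */
  static <E extends Delayed> E pollOrNull(DelayQueue<E> queue, long timeout, TimeUnit unit) {
    try {
      return queue.poll(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    DelayQueue<FixedDelay> queue = new DelayQueue<>();
    queue.add(new FixedDelay(60_000)); // delay that will not expire during the demo

    for (int attempt = 1; attempt <= 3; attempt++) {
      FixedDelay item = pollOrNull(queue, 100, TimeUnit.MILLISECONDS);
      if (item == null) {
        // peek() returns the head even when it has not expired yet.
        FixedDelay head = queue.peek();
        long headDelay = head == null ? -1 : head.getDelay(TimeUnit.MILLISECONDS);
        System.out.println("attempt " + attempt + ": nothing dequeued in 100 ms, queue size="
            + queue.size() + ", head delay=" + headDelay + " ms");
      }
    }
  }
}
```

The bounded poll does not make a stuck delay expire, but it returns control to
the caller periodically, so the loop can report the queue size and the head
element's remaining delay instead of parking silently.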


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
