[ 
https://issues.apache.org/jira/browse/HBASE-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441535#comment-16441535
 ] 

Andrew Purtell commented on HBASE-20445:
----------------------------------------

I'm not 100% familiar with the state of things in trunk. This is written from a 
branch-1 perspective and is meant for brainstorming and discussion, not as a 
complete proposal. Ideally the results can be ported back to a branch-1 minor. 
The description is subject to change as this idea is thought through.



> Defer work when a row lock is busy
> ----------------------------------
>
>                 Key: HBASE-20445
>                 URL: https://issues.apache.org/jira/browse/HBASE-20445
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Instead of blocking on row locks, defer the call and make the call runner 
> available so it can service other activity. Have runners pick up deferred 
> calls in the background after servicing the other request. 
>
> Spin briefly on tryLock() wherever we are now using lock() to acquire a row 
> lock. Introduce two new configuration parameters: one for the amount of time 
> to wait between lock acquisition attempts, and another for the total number 
> of times we wait before deferring the work. If the lock cannot be acquired, 
> put the call back into the call queue. Call queues therefore should be 
> priority queues sorted by deadline. Currently they are implemented with 
> LinkedBlockingQueue (which isn't), or AdaptiveLifoCoDelCallQueue (which is) 
> if the CoDel scheduler is enabled. Perhaps we could just require use of 
> AdaptiveLifoCoDelCallQueue. Runners will be picking up work from the head of 
> the queues as long as they are not empty, so deferred calls will be serviced 
> again, or dropped if the deadline has passed.
> Implementing continuations for simple operations should be straightforward. 
> Batch mutations try to acquire as many row locks as they can, then do the 
> partial batch over the successfully locked rows, then loop back to attempt 
> the remaining work. This is a partial implementation of what we need so we 
> can build on it. Rather than loop around, save the partial batch completion 
> state and put a pointer to it along with the call back into the RPC queue.
>
> For scans where allowPartialResults has been set to true we can simply 
> complete the call at the point we fail to acquire a row lock. The client will 
> handle the rest. For scans where allowPartialResults is false we have to save 
> the scanner state and partial results, and put a pointer to this state along 
> with the call back into the queue. 
>
> We could approach this in phases:
> Phase 0 - Sort out the call queuing details. Do we require 
> AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have 
> RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of 
> LinkedBlockingQueue? There must be a reason why this is not done already.
> Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans 
> will still block on row locks.)
> Phase 2 - Implement deferral of batch mutations. (Scans will still block on 
> row locks.)
> Phase 3 - Implement deferral of scans where allowPartialResults is false.

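For discussion, the bounded spin on tryLock() described above might look 
roughly like the sketch below. RowLockSpinner and its two constructor 
parameters are hypothetical names standing in for the two proposed 
configuration settings, and a plain java.util.concurrent.locks.Lock stands in 
for the row lock. None of this is existing HBase code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

/** Hypothetical helper for illustration only; not an HBase class. */
final class RowLockSpinner {
    // Assumed stand-ins for the two proposed configuration parameters.
    private final long waitBetweenAttemptsMicros;
    private final int maxAttempts;

    RowLockSpinner(long waitBetweenAttemptsMicros, int maxAttempts) {
        this.waitBetweenAttemptsMicros = waitBetweenAttemptsMicros;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Tries the lock up to maxAttempts times, pausing between attempts.
     * Returns true if the lock was acquired (the caller must unlock it),
     * or false if the caller should defer the call instead of blocking.
     */
    boolean tryAcquire(Lock rowLock) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (rowLock.tryLock()) {
                return true;
            }
            // Skip the pause after the final failed attempt.
            if (attempt + 1 < maxAttempts) {
                TimeUnit.MICROSECONDS.sleep(waitBetweenAttemptsMicros);
            }
        }
        return false;
    }
}
```

A false return is the point at which the call would be put back into the call 
queue rather than holding the runner.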

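The idea of a call queue sorted by deadline, with expired calls dropped when a 
runner picks up work, can be sketched with a PriorityBlockingQueue. The 
DeferredCallQueue and Call types here are hypothetical and carry only the 
fields the example needs. The real call queue and call classes have more state 
and may need different handling.

```java
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical deadline-ordered call queue; not an HBase class. */
final class DeferredCallQueue {

    /** Minimal stand-in for an RPC call, ordered by deadline. */
    static final class Call implements Comparable<Call> {
        final String name;
        final long deadlineMillis;

        Call(String name, long deadlineMillis) {
            this.name = name;
            this.deadlineMillis = deadlineMillis;
        }

        @Override
        public int compareTo(Call other) {
            // Earliest deadline sorts first, so it sits at the queue head.
            return Long.compare(deadlineMillis, other.deadlineMillis);
        }
    }

    private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();

    /** Used both for new calls and for calls deferred on a busy row lock. */
    void offer(Call call) {
        queue.offer(call);
    }

    /**
     * Returns the queued call with the earliest deadline that has not yet
     * passed, or null if none remains. Expired calls are dropped on the way.
     */
    Call pollLive(long nowMillis) {
        Call call;
        while ((call = queue.poll()) != null) {
            if (call.deadlineMillis > nowMillis) {
                return call;
            }
            // Deadline has passed: drop the call and keep looking.
        }
        return null;
    }
}
```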

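Saving partial batch completion state instead of looping could be sketched as 
follows. BatchContinuation is a hypothetical name, rows are plain strings, and 
the tryLockAndApply callback is an assumed stand-in for "acquire the row lock 
if it is free, apply the mutation, release the lock". This simplifies the real 
mini-batch logic considerably.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical continuation state for a partially applied batch. */
final class BatchContinuation {
    // Rows whose mutations have not been applied yet.
    private final List<String> pendingRows;

    BatchContinuation(List<String> rows) {
        this.pendingRows = new ArrayList<>(rows);
    }

    /**
     * Makes one pass over the pending rows. The callback returns true if it
     * locked the row and applied the mutation, false if the lock was busy.
     * Returns true when the whole batch is complete; on false the caller
     * would requeue the call together with this object.
     */
    boolean runOnce(Predicate<String> tryLockAndApply) {
        Iterator<String> it = pendingRows.iterator();
        while (it.hasNext()) {
            if (tryLockAndApply.test(it.next())) {
                it.remove();
            }
        }
        return pendingRows.isEmpty();
    }

    /** Snapshot of the rows still waiting on a lock. */
    List<String> pending() {
        return new ArrayList<>(pendingRows);
    }
}
```

A runner that later picks the call up again would invoke runOnce() on the same 
object, so work already applied is not repeated.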
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
