[ https://issues.apache.org/jira/browse/HBASE-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441535#comment-16441535 ]
Andrew Purtell commented on HBASE-20445:
----------------------------------------

I'm not 100% familiar with the state of things in trunk. This is written from a branch-1 perspective and is meant for brainstorming and discussion, not as a complete proposal. Ideally the results can be ported back to a branch-1 minor release. The description is subject to change as this idea is thought through.

> Defer work when a row lock is busy
> ----------------------------------
>
>                 Key: HBASE-20445
>                 URL: https://issues.apache.org/jira/browse/HBASE-20445
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Instead of blocking on row locks, defer the call and make the call runner
> available so it can service other activity. Have runners pick up deferred
> calls in the background after servicing the other request.
>
> Spin briefly on tryLock() wherever we now use lock() to acquire a row lock.
> Introduce two new configuration parameters: one for the amount of time to
> wait between lock acquisition attempts, and another for the total number of
> times we wait before deferring the work. If the lock still cannot be
> acquired, put the call back into the call queue. Call queues should
> therefore be priority queues sorted by deadline. Currently they are
> implemented with LinkedBlockingQueue (which is not), or with
> AdaptiveLifoCoDelCallQueue (which is) if the CoDel scheduler is enabled.
> Perhaps we could simply require AdaptiveLifoCoDelCallQueue. Runners pick up
> work from the head of the queues whenever they are not empty, so deferred
> calls will be serviced again, or dropped if their deadline has passed.
>
> Implementing continuations for simple operations should be straightforward.
> Batch mutations already try to acquire as many row locks as they can, apply
> the partial batch to the successfully locked rows, then loop back to attempt
> the remaining work. This is a partial implementation of what we need, so we
> can build on it.
> Rather than looping around, save the partial batch completion state and put
> a pointer to it, along with the call, back into the RPC queue.
>
> For scans where allowPartialResults has been set to true, we can simply
> complete the call at the point where we fail to acquire a row lock; the
> client will handle the rest. For scans where allowPartialResults is false,
> we have to save the scanner state and partial results, and put a pointer to
> this state, along with the call, back into the queue.
>
> We could approach this in phases:
>
> Phase 0 - Sort out the call queuing details. Do we require
> AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also
> have RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of
> LinkedBlockingQueue? There must be a reason why it doesn't already.
>
> Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans
> will still block on row locks.)
>
> Phase 2 - Implement deferral of batch mutations. (Scans will still block on
> row locks.)
>
> Phase 3 - Implement deferral of scans where allowPartialResults is false.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
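For discussion, here is a minimal Java sketch of the spin-then-defer idea from the description above: spin briefly on a try-acquire with a bounded number of pauses, put the call back into a deadline-ordered priority queue if the row stays busy, and drop calls whose deadline has passed. Everything in it is hypothetical and not HBase code: the class `DeferOnBusyRowLock`, the `Call` type, `tryAcquireBriefly`, `runOne`, and the constants `LOCK_WAIT_INTERVAL_MS` / `LOCK_MAX_WAITS` standing in for the two proposed configuration parameters. A one-permit `java.util.concurrent.Semaphore` stands in for a row lock because it is not reentrant, which lets contention be shown on a single thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.Semaphore;

public class DeferOnBusyRowLock {
    // Hypothetical stand-ins for the two proposed configuration parameters.
    static final long LOCK_WAIT_INTERVAL_MS = 1; // pause between acquisition attempts
    static final int LOCK_MAX_WAITS = 3;          // pauses allowed before deferring

    // A queued call; the one-permit Semaphore is a stand-in for an HBase row lock.
    static final class Call {
        final String name;
        final Semaphore rowLock;
        final long deadlineMs;

        Call(String name, Semaphore rowLock, long deadlineMs) {
            this.name = name;
            this.rowLock = rowLock;
            this.deadlineMs = deadlineMs;
        }
    }

    // Spin briefly: try, pause, try again, with at most LOCK_MAX_WAITS pauses.
    static boolean tryAcquireBriefly(Semaphore rowLock) {
        for (int waits = 0; ; waits++) {
            if (rowLock.tryAcquire()) {
                return true;
            }
            if (waits >= LOCK_MAX_WAITS) {
                return false; // still busy: the caller defers the call
            }
            try {
                Thread.sleep(LOCK_WAIT_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status
                return false;
            }
        }
    }

    // One runner step: take the earliest-deadline call, drop it if expired,
    // requeue it if its row lock stays busy, otherwise execute it.
    static String runOne(PriorityBlockingQueue<Call> queue) {
        Call call = queue.poll();
        if (call == null) {
            return null;
        }
        if (System.currentTimeMillis() > call.deadlineMs) {
            return call.name + " dropped";
        }
        if (!tryAcquireBriefly(call.rowLock)) {
            queue.put(call); // deferral: back into the deadline-ordered queue
            return call.name + " deferred";
        }
        try {
            return call.name + " done"; // the real operation would run here
        } finally {
            call.rowLock.release();
        }
    }

    static List<String> demo() {
        PriorityBlockingQueue<Call> queue =
            new PriorityBlockingQueue<>(11, Comparator.comparingLong((Call c) -> c.deadlineMs));
        Semaphore row1 = new Semaphore(0); // zero permits: another handler holds row1
        Semaphore row2 = new Semaphore(1); // row2 is free
        long now = System.currentTimeMillis();
        List<String> log = new ArrayList<>();

        queue.put(new Call("put(row1)", row1, now + 5_000));
        log.add(runOne(queue));                              // row1 busy: deferred
        queue.put(new Call("get(row2)", row2, now + 1_000)); // tighter deadline sorts first
        log.add(runOne(queue));                              // runner serves other work
        row1.release();                                      // the other handler lets go of row1
        log.add(runOne(queue));                              // deferred put is picked up again
        queue.put(new Call("scan(row1)", row1, now - 1));    // deadline already passed
        log.add(runOne(queue));                              // dropped without running
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running `main` prints `put(row1) deferred`, `get(row2) done`, `put(row1) done` and `scan(row1) dropped`, one per line. One design point the sketch makes visible: with a queue ordered purely by deadline, a deferred call that still has the earliest deadline goes straight back to the head, so a runner may pick it up again immediately; in the demo, other work is served first only because `get(row2)` carries a tighter deadline.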