[ 
https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895552#comment-15895552
 ] 

Lars Hofhansl commented on HBASE-12790:
---------------------------------------

bq. RpcScheduler is pluggable. You need more than that?

Is that all we need [~jamestaylor]?

Let me summarize how the Phoenix folks got here:
# The HBase scan contract is serial per scan, i.e. a scan will always return 
all keys in order whether the client needs it that way or not. Hence no 
parallel execution on behalf of a single scan (both [~stack] and I made 
attempts to improve that but did not finish).
# Scans cannot easily be broken down to units smaller than a region (it's 
certainly possible to do that, but there's no information about internal data 
skew inside a region)
# For this Phoenix adds "guideposts". These are equidistant markers, so that 
Phoenix can know about the key distribution inside a region.
# Phoenix uses guideposts to schedule many small scans. The units are fairly 
small (100MB-1GB worth of cells) to allow for fairness between queries.
# If many query-chunks - a.k.a. scans - of a large query can hog the RPC queues 
then much of the advantage is lost.
# Hence the desire for this type of "group based" scheduling so that small 
queries can finish before all large queries in the queue need to finish. The 
group is a Phoenix query. So it is simply the desire to extend the fair queuing 
that HBase already has (HBASE-10993) to a query in Phoenix which may issue 
1000's of scans to as many region servers.
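
To illustrate the guidepost steps above, here is a minimal sketch of how a 
region's key range could be cut into small scan ranges. It is not Phoenix 
code: the class GuidepostChunks, the method split, and the use of String row 
keys (instead of byte[]) are all assumptions made for illustration, and the 
guideposts are assumed to arrive sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Phoenix or HBase.
class GuidepostChunks {
    /**
     * Splits the half-open key range [regionStart, regionStop) at each
     * guidepost that falls strictly inside it. Each returned String[] is
     * {startKey, stopKey} for one small scan.
     * Assumes guideposts are sorted in ascending order.
     */
    static List<String[]> split(String regionStart, String regionStop,
                                List<String> guideposts) {
        List<String[]> ranges = new ArrayList<>();
        String lower = regionStart;
        for (String gp : guideposts) {
            // Ignore guideposts at or outside the remaining range.
            if (gp.compareTo(lower) <= 0 || gp.compareTo(regionStop) >= 0) {
                continue;
            }
            ranges.add(new String[] {lower, gp});
            lower = gp;
        }
        // The last chunk runs from the final guidepost to the region end.
        ranges.add(new String[] {lower, regionStop});
        return ranges;
    }
}
```

Each resulting range would then be issued as its own scan, which is what 
makes a single large query show up as many RPC calls.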

That's just for the history... I do agree that the patch proposed here is too 
complex and perhaps wants to do too much.

Now perhaps hbase.ipc.server.callqueue.scan.ratio from HBASE-11355 and 
HBASE-11724 gives us what we need _if_ we can use this for small scans, so that 
small scans can land on the "Get" queue. That way we can reserve that queue for 
small scans and Gets, and other queues for large scans.

It's not ideal, though. The best would be to somehow allow round-robin between 
the queries on behalf of which the scans are operating. That abstraction is not 
available in HBase.
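
For what it's worth, the round-robin-by-query idea can be sketched roughly as 
below. This is neither the attached AbstractRoundRobinQueue nor any existing 
HBase class: GroupRoundRobinQueue and the groupId parameter (standing in for 
a Phoenix query identifier) are hypothetical names, and a real RPC queue 
would also need blocking and capacity handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one FIFO per group, groups served in rotation.
class GroupRoundRobinQueue<T> {
    // LinkedHashMap keeps groups in insertion order; the group at the
    // head of the map is the next one to be served.
    private final Map<String, Deque<T>> byGroup = new LinkedHashMap<>();

    /** Queues a call under its group (e.g. the query it belongs to). */
    synchronized void offer(String groupId, T call) {
        byGroup.computeIfAbsent(groupId, k -> new ArrayDeque<>()).addLast(call);
    }

    /** Returns the next call in round-robin order, or null if empty. */
    synchronized T poll() {
        Iterator<Map.Entry<String, Deque<T>>> it = byGroup.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<String, Deque<T>> head = it.next();
        it.remove();
        T call = head.getValue().pollFirst();
        // Re-insert the group at the tail if it still has pending calls.
        if (!head.getValue().isEmpty()) {
            byGroup.put(head.getKey(), head.getValue());
        }
        return call;
    }
}
```

With calls offered as A1, A2, A3, B1, B2 (groups "A" and "B"), repeated 
poll() calls hand them out as A1, B1, A2, B2, A3, so the second query makes 
progress without waiting for the first to drain.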


> Support fairness across parallelized scans
> ------------------------------------------
>
>                 Key: HBASE-12790
>                 URL: https://issues.apache.org/jira/browse/HBASE-12790
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>              Labels: Phoenix
>         Attachments: AbstractRoundRobinQueue.java, HBASE-12790_1.patch, 
> HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, HBASE-12790.patch, 
> HBASE-12790_trunk_1.patch, PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
>
>
> Some HBase clients parallelize the execution of a scan to reduce latency in 
> getting back results. This can lead to starvation with a loaded cluster and 
> interleaved scans, since the RPC queue will be ordered and processed on a 
> FIFO basis. For example, suppose two clients, A & B, submit largish 
> scans at the same time. Say each scan is broken down into 100 scans by the 
> client (broken down into equal depth chunks along the row key), and the 100 
> scans of client A are queued first, followed immediately by the 100 scans of 
> client B. In this case, client B will be starved out of getting any results 
> back until the scans for client A complete.
> One solution to this is to use the attached AbstractRoundRobinQueue instead 
> of the standard FIFO queue. The queue to be used could be (maybe it already 
> is) configurable based on a new config parameter. Using this queue would 
> require the client to have the same identifier for all of the 100 parallel 
> scans that represent a single logical scan from the client's point of view. 
> With this information, the round robin queue would pick off a task from the 
> queue in a round robin fashion (instead of a strictly FIFO manner) to prevent 
> starvation over interleaved parallelized scans.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
