[ 
https://issues.apache.org/jira/browse/KUDU-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443686#comment-17443686
 ] 

Redriver edited comment on KUDU-2955 at 11/16/21, 5:10 AM:
-----------------------------------------------------------

I also observed the master RPC queue filling up when ksck checks run 
periodically in my local environment. I have checked the server-side source code.

Service register flow:
{code:java}
master.cc::StartAsync() -> RegisterService(impl) ==> 
RpcServer::RegisterService() ==> Messenger::RegisterService(ServiceName, 
ServicePool) // Create a queue for every service
{code}
Service lookup flow:
{code:java}
Connection::HandleIncomingCall() ==> messenger()->QueueInboundCall()
// (1) find the service object through the service name
// (2) find the function through the method name from the service object
{code}
Since the service name and method name are already used on the client side, 
we should not change the interface; otherwise the client side would have to 
change too, which breaks backward compatibility.

But if we want to build two RPC queues for the admin and client master service, 
we have to dispatch each call to a different queue by checking the method name 
(for example, TSHeartbeat vs. ConnectToMaster).

A proposal is to build another map<service_name_method, 
priority_service_queue_name> methodToPriorityQueueName.

When registering the service, the caller provides an additional tag: high or 
normal priority. For normal priority, service registration is the same as now:

methodToPriorityQueueName.put(service_name + method_name, service_name)

For high priority, it adds a prefix (for example, "admin") to the service name 
and saves:

methodToPriorityQueueName.put(service_name + method_name, "admin" + service_name)

When dispatching the inbound call, it first finds the service name through 
methodToPriorityQueueName.find(service_name + method_name), then finds the RPC 
object through that service name, and applies the same logic as now.

This solution adds an intermediate map and some overhead during RPC dispatching, 
but it is the smallest change to the current flow.
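
The registration and dispatch steps above could be sketched roughly as follows. 
This is only an illustration of the proposed map, not Kudu code: the class 
MethodQueueRouter, the enum RpcPriority, and the "." separators in the key and 
in the "admin." prefix are all assumptions made for the example (the separator 
just keeps concatenated names unambiguous).

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical priority tag supplied by the caller at registration time.
enum class RpcPriority { kNormal, kHigh };

// Hypothetical router playing the role of methodToPriorityQueueName:
// maps "service.method" to the name of the queue that should get the call.
class MethodQueueRouter {
 public:
  // Normal-priority methods keep the plain service name as the queue name;
  // high-priority methods get an "admin." prefix (assumed naming).
  void Register(const std::string& service_name,
                const std::string& method_name,
                RpcPriority priority) {
    std::string queue_name = (priority == RpcPriority::kHigh)
                                 ? "admin." + service_name
                                 : service_name;
    method_to_queue_[Key(service_name, method_name)] = std::move(queue_name);
  }

  // Resolves the queue name for an inbound call.
  // Returns nullptr if the method was never registered.
  const std::string* Lookup(const std::string& service_name,
                            const std::string& method_name) const {
    auto it = method_to_queue_.find(Key(service_name, method_name));
    return it == method_to_queue_.end() ? nullptr : &it->second;
  }

 private:
  static std::string Key(const std::string& service,
                         const std::string& method) {
    return service + "." + method;
  }

  std::unordered_map<std::string, std::string> method_to_queue_;
};
```

With this sketch, registering TSHeartbeat as high priority and ConnectToMaster 
as normal priority under the same service name would make Lookup() return two 
different queue names, while the service and method names seen by clients stay 
unchanged. The extra cost per call is one hash map lookup.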


was (Author: redriver):
I also observed the master RPC queue filling up when ksck checks run 
periodically in my local environment. I have checked the server-side source code.

Service register flow:
{code:java}
master.cc::StartAsync() -> RegisterService(impl) ==> 
RpcServer::RegisterService() ==> Messenger::RegisterService(ServiceName, 
ServicePool) // Create a queue for every service
{code}
Service lookup flow:
{code:java}
Connection::HandleIncomingCall() ==> messenger()->QueueInboundCall()
// (1) find the service object through the service name
// (2) find the function through the method name from the service object
{code}
Since the service name and method name are already used on the client side, 
we should not change the interface; otherwise the client side would have to 
change too, which breaks backward compatibility.

But if we want to build two RPC queues for the admin and client master service, 
we have to dispatch each call to a different queue by checking the method name 
(for example, TSHeartbeat vs. ConnectToMaster).

A proposal is to build another map<service_name_method, 
priority_service_queue_name> methodToPriorityQueueName.

When registering the service, the caller provides an additional tag: high or 
normal priority. For normal priority, service registration is the same as now. 
For high priority, it adds a prefix (for example, "admin") to the service name.

When dispatching the inbound call, it first finds the service name through 
methodToPriorityQueueName, then applies the same logic as now.

This solution adds an intermediate map and some overhead during RPC dispatching, 
but it is the smallest change to the current flow.

> kudu-master: separate RPC service queues for TSHeartbeat from client-facing 
> RPCs
> --------------------------------------------------------------------------------
>
>                 Key: KUDU-2955
>                 URL: https://issues.apache.org/jira/browse/KUDU-2955
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master, rpc
>            Reporter: Alexey Serbin
>            Priority: Major
>
> As of now, all client-related RPCs like {{ConnectToMaster}}, 
> {{GetTabletLocations}}, {{GetTableLocations}}, {{GetTableSchema}}, etc., 
> tserver-related RPC {{TSHeartbeat}}, and other administrative RPCs are all 
> put into the same RPC service queue upon arrival.  In some cases of 
> congestion (e.g., full tablet reports from all tablet servers in a cluster 
> upon the change in the master leadership) and aggravating factors such as 
> slow master's WAL, that might lead to dropping requests sent from Kudu 
> clients to master, even if the inflow of client requests isn't high and the 
> client request might be served in parallel with processing {{TSHeartbeat}} 
> sent from tablet servers.
> It would be nice to put all {{TSHeartbeat}} requests and other administrative 
> requests into one service queue, and all the client-originated requests into 
> another.  That way spikes of RPC inflow from clients would not affect 
> internal cluster bookkeeping and vice versa.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
