[
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928656#comment-17928656
]
ASF GitHub Bot commented on HDFS-17531:
---------------------------------------
KeeProMise opened a new pull request, #7408:
URL: https://github.com/apache/hadoop/pull/7408
### Description of PR
### How was this patch tested?
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> RBF: Asynchronous router RPC
> ----------------------------
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: Async router single ns performance test.pdf, Aynchronous
> router.pdf, Comparison of Async router & sync router performance.pdf,
> HDFS-17531.001.patch, Router asynchronous rpc implementation .pdf,
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client
> requests, forward them to the corresponding downstream ns, and then return
> the results from the downstream ns to the client. The request path is as
> follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> *Reader:* Gets the client request and puts it into the call queue *(1)*
> *Handler:*
> Takes a call from the call queue *(2)*, processes it, generates a new
> call, adds it to the connection thread's calls (connection.calls), and
> waits for that call to complete *(3)*
> After being woken by the connection thread, processes the response and puts
> it into the response queue *(5)*
> *Connection:*
> Holds the link to the downstream ns, sends the calls in connection.calls to
> the downstream ns (via {*}rpcRequestThread{*}), and receives the responses
> from the ns. Using the call identified in each response, notifies that call
> that processing is complete *(4)*
> *Responder:*
> Takes the response from the response queue *(6)* and returns it to the client
>
> *Shortcoming*
> Even if the *connection* thread could send more requests to the downstream
> nameservices, steps *(3)* and *(4)* are synchronous: after the *handler*
> thread adds a call to connection.calls, it must wait until the *connection*
> thread notifies it that the call is complete. Only after the response is put
> into the response queue can the handler take a new call from the call queue
> and process it. The concurrency of the router is therefore limited by the
> number of handlers. A simple example: if the number of handlers is 1 and the
> maximum number of calls in the connection thread is 10, then even though the
> connection thread could send 10 requests to the downstream ns, the single
> handler means the router can only process one request at a time.
>
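To make the single-handler example concrete, here is a minimal Java sketch. It is not Hadoop code: `FakeNameservice`, `maxInFlight`, and the `async` flag are hypothetical names invented for illustration, and `CompletableFuture` merely stands in for whatever mechanism the attached design actually uses. The sketch counts how many downstream calls one handler thread keeps outstanding when it blocks on each reply versus when it registers a callback and returns. In the callback case the fake ns starts replying only after all calls are dispatched, which models a downstream that is slower than dispatch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class HandlerBottleneckSketch {

    /** Hypothetical stand-in for a downstream ns; not a Hadoop class. */
    static final class FakeNameservice {
        private final BlockingQueue<CompletableFuture<String>> calls = new LinkedBlockingQueue<>();
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger peak = new AtomicInteger();

        /** Registers a call and returns a future that completes when the ns replies. */
        CompletableFuture<String> send() {
            CompletableFuture<String> reply = new CompletableFuture<>();
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            calls.add(reply);
            return reply;
        }

        /** Starts a daemon thread that answers pending calls one by one. */
        void startReplying() {
            Thread replier = new Thread(() -> {
                try {
                    while (true) {
                        CompletableFuture<String> reply = calls.take();
                        inFlight.decrementAndGet();
                        reply.complete("ok");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            replier.setDaemon(true);
            replier.start();
        }

        int peak() {
            return peak.get();
        }
    }

    /** Returns the peak number of outstanding downstream calls with ONE handler thread. */
    public static int maxInFlight(int requests, boolean async) throws Exception {
        FakeNameservice ns = new FakeNameservice();
        ExecutorService handler = Executors.newSingleThreadExecutor();
        CountDownLatch responded = new CountDownLatch(requests);
        if (!async) {
            ns.startReplying(); // blocking mode needs replies to make progress
        }
        List<Future<?>> dispatched = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            dispatched.add(handler.submit(() -> {
                CompletableFuture<String> reply = ns.send();
                if (async) {
                    // Register a callback; the handler is free for the next call.
                    reply.thenAccept(r -> responded.countDown());
                } else {
                    // Block until the reply arrives, like steps (3) and (4).
                    reply.join();
                    responded.countDown();
                }
            }));
        }
        for (Future<?> f : dispatched) {
            f.get(); // wait until the handler has dispatched every call
        }
        if (async) {
            ns.startReplying(); // assumption: the ns is slower than dispatch
        }
        responded.await();
        handler.shutdown();
        return ns.peak();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking handler, peak in-flight: " + maxInFlight(10, false));
        System.out.println("callback handler, peak in-flight: " + maxInFlight(10, true));
    }
}
```

Under these assumptions the blocking run never has more than one call outstanding, while the callback run reaches all 10, which is the limit the description attributes to the handler count.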
> Since the performance of router rpc is mainly limited by the number of
> handlers, the most effective way to improve rpc performance currently is to
> increase the number of handlers. However, having the router create a large
> number of handler threads also increases the number of thread context
> switches and does not make full use of the machine's capacity.
>
> There are usually multiple ns downstream of the router. If a handler
> forwards a request to an ns with poor performance, the handler waits for a
> long time. With fewer handlers available, the router's capacity to serve
> requests for ns with normal performance is also reduced. From the client's
> perspective, the downstream ns of the router appear to have degraded. We
> often find that the call queue of the downstream ns is short while the call
> queue of the router is very long.
>
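The effect of one slow ns on blocked handlers can be estimated with a back-of-the-envelope bound: if every handler is occupied for the full downstream latency, throughput cannot exceed handlers divided by mean latency (Little's law). The Java sketch below only does that arithmetic. The class name, method name, and every number in it (100 handlers, 5 ms and 500 ms latencies, 10% of traffic to the slow ns) are illustrative assumptions, not measurements from this issue.

```java
public class HandlerThroughputBound {

    /**
     * Upper bound on requests per second when each handler blocks for the
     * whole downstream call: handlers / mean latency (Little's law).
     */
    public static double maxRequestsPerSecond(int handlers, double slowFraction,
                                              double slowLatencyMs, double fastLatencyMs) {
        double meanLatencyMs = slowFraction * slowLatencyMs
                + (1.0 - slowFraction) * fastLatencyMs;
        return handlers * 1000.0 / meanLatencyMs;
    }

    public static void main(String[] args) {
        // All inputs are hypothetical values chosen only to show the trend.
        double healthy = maxRequestsPerSecond(100, 0.0, 500.0, 5.0);
        double oneSlowNs = maxRequestsPerSecond(100, 0.1, 500.0, 5.0);
        System.out.printf("all ns healthy:        %.0f req/s%n", healthy);
        System.out.printf("10%% of calls to slow ns: %.0f req/s%n", oneSlowNs);
    }
}
```

With these made-up inputs, sending a tenth of the traffic to the slow ns cuts the bound by roughly a factor of ten, even though the other ns are unchanged; that is consistent with a long router call queue alongside short downstream call queues.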
> Therefore, although the main function of the router is to federate and handle
> requests from multiple NSs, the current synchronous RPC cannot satisfy the
> scenario where there are many NSs downstream of the router. Increasing the
> number of handlers improves the router's concurrency, but performance still
> remains relatively poor: more threads increase the CPU context-switching
> time, and many of the handler threads are in a blocked state, which wastes
> thread resources. When a request enters the router, there is no guarantee
> that a handler will be available to process it.
>
> Therefore, I propose asynchronous router rpc. Please see the attached *pdf*
> for the complete solution.
>
> Everyone is welcome to join the discussion!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)