[jira] [Commented] (HDFS-17281) Added support of reporting RPC round-trip time at NN.

ASF GitHub Bot (Jira) Fri, 08 Dec 2023 23:03:32 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794924#comment-17794924
 ]


ASF GitHub Bot commented on HDFS-17281:
---------------------------------------

xinglin opened a new pull request, #6337:
URL: https://github.com/apache/hadoop/pull/6337

   <!--
     Thanks for sending a pull request!
       1. If this is your first time, please read our contributor guidelines: 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
       2. Make sure your PR title starts with JIRA issue id, e.g., 
'HADOOP-17799. Your PR title ...'.
   -->
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Added support of reporting RPC round-trip time at NN.
> -----------------------------------------------------
>
>                 Key: HDFS-17281
>                 URL: https://issues.apache.org/jira/browse/HDFS-17281
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Major
>         Attachments: Screenshot 2023-10-28 at 10.26.41 PM.png
>
>
> We have come across a few cases where HDFS clients report very high 
> latencies while the NN-side latency metrics look normal. I attached a 
> screenshot taken during an internal investigation at LinkedIn: a token 
> management service was reporting an average latency of 1 sec when fetching 
> delegation tokens from our NN, but we did not see anything abnormal at the 
> NN side. The OverallRpcProcessingTime metric we recently added in 
> HDFS-17042 was not sufficient to identify such cases. 
> We propose to extend the IPC header in Hadoop to communicate the call 
> creation time at the client side to IPC servers, so that the server can 
> compute the round-trip time of each RPC call.
>  
> *Why is OverallRpcProcessingTime not sufficient?*
> OverallRpcProcessingTime captures the time from when the reader thread 
> reads the call from the socket to when the response is sent back to the 
> client. As a result, it does not capture the time it takes to transmit the 
> call from the client to the server. In addition, only a couple of reader 
> threads monitor a large number of open connections. Many connections can 
> become ready to read at the same time; the reader thread then has to read 
> each call sequentially, so many RPC calls wait before being read. We have 
> also hit the case where the callQueue becomes full (with a total of 25600 
> requests), so reader threads block when adding new calls to the callQueue. 
> This increases latency for all connections/calls that are ready and waiting 
> to be read by reader threads. 
> Ideally, we want to measure the time between when a socket/call becomes 
> ready to read and when the reader thread actually reads it. This would give 
> us the time a call waits to be read. However, after some searching, we 
> could not find a way to get this. 
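
The proposal quoted above can be illustrated with a minimal sketch. It is not the code from the pull request: the class `RoundTripSketch`, the `CallHeader` type, and the field `clientCreateTimeMillis` are hypothetical names chosen for illustration, and the sketch assumes the client stamps a wall-clock creation time into the header and that client and server clocks are roughly synchronized. It shows how a server could derive a per-call round-trip time from that timestamp, and how much of it a server-side processing-time metric would leave unexplained.

```java
/** Hypothetical sketch; none of these names are taken from Hadoop's IPC code. */
public class RoundTripSketch {

    /** Stand-in for an IPC request header that carries the client-side call creation time. */
    static final class CallHeader {
        final int callId;
        final long clientCreateTimeMillis; // assumed new field: wall-clock time at the client

        CallHeader(int callId, long clientCreateTimeMillis) {
            this.callId = callId;
            this.clientCreateTimeMillis = clientCreateTimeMillis;
        }
    }

    /**
     * Time from call creation at the client to the moment the server sends the response.
     * Clamped at zero because client and server clocks may be skewed.
     */
    static long roundTripMillis(CallHeader header, long responseSentMillis) {
        return Math.max(0L, responseSentMillis - header.clientCreateTimeMillis);
    }

    /**
     * Part of the round trip that a server-side processing-time metric does not cover,
     * e.g. network transfer and waiting for a reader thread to pick up the call.
     */
    static long unaccountedMillis(long roundTripMillis, long overallProcessingMillis) {
        return Math.max(0L, roundTripMillis - overallProcessingMillis);
    }

    public static void main(String[] args) {
        // Client created the call at t=10000 ms; the server sent the response at t=11000 ms
        // and measured 50 ms of processing time on its side.
        CallHeader header = new CallHeader(1, 10_000L);
        long rtt = roundTripMillis(header, 11_000L);
        long gap = unaccountedMillis(rtt, 50L);
        System.out.println("roundTrip=" + rtt + "ms unaccounted=" + gap + "ms");
    }
}
```

With these example numbers the server sees a 1000 ms round trip of which only 50 ms is explained by its own processing, which is the kind of gap described in the issue. Because the timestamp comes from a different host, any real implementation would need to account for clock skew; the clamp to zero here is only a placeholder for that.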



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
