Thanks for replying, Chen! There is a lot of context around this, so it is probably better to set up a meeting about it. Do you have time this week? I am interested to know under what circumstances you ran into the queue latency issue.
Some more context from my side. I did the following debugging to try to figure it out:

Double-checked the RPC processing time, but didn't find an obvious increase.
Did some flamegraph profiling, but didn't catch any obvious stacks in the cost-based related areas.
Replayed with Dynamometer, but was not able to find a clear increase in that environment.

Thanks,
Fengnan

From: Chen Liang <vagaryc...@gmail.com>
Date: Wednesday, November 4, 2020 at 12:08 PM
To: Fengnan Li <loyal...@gmail.com>
Cc: Hdfs-dev <hdfs-dev@hadoop.apache.org>
Subject: Re: Cost Based FairCallQueue latency issue

Hi Fengnan,

We have been testing the cost-based fair call queue internally. We also saw a latency increase, and we are trying to debug this issue as well. Our current suspicion is that the way the metrics are generated might be introducing too much overhead. We are in the process of trying to reproduce this using Dynamometer. If this is something you would be interested in, we can follow up on working together on this issue.

Best,
Chen

Fengnan Li <loyal...@gmail.com> wrote on Friday, October 30, 2020 at 1:51 PM:

Hi all,

Has anyone deployed Cost Based Fair Call Queue in their production cluster? We ran into some RPC queue latency degradation at ~30k-40k rps. I tried to debug it but didn't find anything suspicious. It is worth mentioning that there is no memory issue from the extra heap usage for storing the call cost.

Thanks,
Fengnan