Re: about issues-2425

yuhang xiu Mon, 24 Dec 2018 04:24:41 -0800

Hello, yeling,

I checked the code of the old version.
Since the current retry process will execute failed.remove(invocation);
anyway, this will cause our retry to be done only once. I think this is a
bug in the current failback.


I have given a few reviews about your pr. I need you to improve your ut
now, I don't think it is rigorous at the moment. The other parts I think
are correct. I will merge it after you perfect your ut.

This is your first pr, expecting you more PR. :)

你好，yeling，

我查看了老版本的代码。
由于目前重试过程之后，无论如何都会执行failed.remove(invocation);，这会导致我们的重试无论如何只会进行一次。我认为这是目前failback的一个bug。

关于你的pr，我已经给出了几次review。目前需要你完善一下你的ut，我认为目前并不严谨。其他的部分我认为是正确的。在你完善你的ut之后我会merge它。

这是你的第一个pr，期待你更多的PR。:)

yeling <xcwang...@gmail.com> 于2018年12月21日周五 下午10:22写道：

> FailbackClusterInvoker should limit the size of failed invocations
> issues:https://github.com/apache/incubator-dubbo/issues/2425
>
>
> In the FailbackClusterInvoker class, the failed member variable does not
> control the size, which may cause memory leaks. When reading the source
> code, I found some problems:
>
> 1. When a timed task is executed, just one invoke request fails and is
> added directly to the queue. This will cause the newly failed task to
> execute immediately, which does not meet the previous expectations. In
> theory, the invoke request is executed once in five seconds.
>
> 2. The addFailed (Invocation invocation, AbstractCluster Invoker<?>
> invoker) method passes in the Cluster itself.
>
> Look at some of the code for retry:
>        try {
>                 invoker.invoke(invocation);
>                 failed.remove(invocation);
>             } catch (Throwable e) {
>                 logger.error("Failed retry to invoke method " +
> invocation.getMethodName() + ", waiting again.", e);
>             }
> Since invoker is Cluster here, the specific implementation here is Failback
> Cluster Invoker. Look at the doInvoke part of the code:
>
>     protected Result doInvoke(Invocation invocation, List<Invoker<T>>
> invokers, LoadBalance loadbalance) throws RpcException {
>         try {
>             checkInvokers(invokers, invocation);
>             Invoker<T> invoker = select(loadbalance, invocation, invokers,
> null);
>             return invoker.invoke(invocation);
>         } catch (Throwable e) {
>             logger.error("Failback to invoke method " +
> invocation.getMethodName() + ", wait for retry in background. Ignored
> exception: "+ e.getMessage() + ", ", e);
>             addFailed(invocation, this);
>             return new RpcResult(); // ignore
>         }
> Obviously normally, there is a catch Throwable that is caught once and
> never goes into retry code. I understand that we don't want such a design.
>
> I have made some optimization and improvement on this.
> 1. Use the time wheel to solve the problem of task execution and ensure
> that the failed task is not executed immediately, but after 5 seconds.
> 2. Set the configurable number of failbackTasks tasks (default 100) and the
> number of retries (default 3)
> 3. The retry logic does not start with FailbackCluster Invoker, but follows
> the following strategy
>
> Invoker < T > retryInvoker = select (load balance, invocation, invokers,
> lastInvokers);
> RetryInvoker. invoke (invocation);
>
> test case:
> FailbackCluster InvokerTest. testRetryFailed
> I have submitted PR
> https://github.com/apache/incubator-dubbo/pull/2822
> 在FailbackClusterInvoker类中，成员变量failed没有控制容量，可能会引起内存泄露。在阅读源码时，发现有一些问题：
> 1.当定时任务在执行时，正好一个invoke请求失败，会直接加入到队列中，这会导致刚失败的任务会立即执行，不符合之前的预期，理论上的5秒执行一次。
> 2.addFailed(Invocation invocation, AbstractClusterInvoker<?>
> invoker)方法传入的是Cluster本身。
> 看下重试的部分代码：
>       try {
>                 invoker.invoke(invocation);
>                 failed.remove(invocation);
>             } catch (Throwable e) {
>                 logger.error("Failed retry to invoke method " +
> invocation.getMethodName() + ", waiting again.", e);
>             }
>
> 由于invoker这里是Cluster，这里具体实现是FailbackClusterInvoker，看下doInvoke部分代码：
>  protected Result doInvoke(Invocation invocation, List<Invoker<T>>
> invokers, LoadBalance loadbalance) throws RpcException {
>         try {
>             checkInvokers(invokers, invocation);
>             Invoker<T> invoker = select(loadbalance, invocation, invokers,
> null);
>             return invoker.invoke(invocation);
>         } catch (Throwable e) {
>             logger.error("Failback to invoke method " +
> invocation.getMethodName() + ", wait for retry in background. Ignored
> exception: "+ e.getMessage() + ", ", e);
>             addFailed(invocation, this);
>             return new RpcResult(); // ignore
>         }
> 很明显正常情况下，是又一次被捕获异常，永远走不进重试代码中的catch Throwable，我的理解任务，这并不是想要这样的设计。
>
> 我对此做了一些优化和改造：
> 1.使用时间轮来解决任务执行的问题，确保失败的任务不是马上执行，而是5秒后；
> 2.设置可配置的failbackTasks任务数(默认100)和复用retries重试次数(默认3)
> 3.重试逻辑不从FailbackClusterInvoker开始执行，而是如下策略
> Invoker<T> retryInvoker = select(loadbalance, invocation, invokers,
> lastInvokers);
> retryInvoker.invoke(invocation);
>
> 测试用例
> FailbackClusterInvokerTest.testRetryFailed
>
> 我已经提交了PR,你们能阅读一下吗？
> https://github.com/apache/incubator-dubbo/pull/2822
>

Re: about issues-2425

Reply via email to