Re: [VOTE][RIP-36] Optimize topic routing mechanism
+1

On Mon, Mar 14, 2022 at 16:47, git_yang wrote:
> +1
>
> git_yang
> git_y...@163.com
Re: [VOTE][RIP-36] Optimize topic routing mechanism
+1

git_yang
git_y...@163.com

On 03/14/2022 14:02, jinrongtong wrote:
> +1
>
> At 2022-03-12 14:19:35, "yuzhou" wrote:
> +1
>
> On 2022/03/02 11:42:14 xijiu wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize
> topic routing mechanism. Now the shepherds @dongeforever and @yukon are
> willing to support the RIP, so I think it is time to start an email thread
> to enter the voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/
Re: [VOTE][RIP-36] Optimize topic routing mechanism
It is OK to add a broker-role-status checking mechanism to avoid ineffective sends.

The push mechanism only notifies the client to refresh its metadata. Once MQClientInstance has refreshed, the broker role status is updated as well.

On Mar 12, 2022, at 7:04 PM, WJL wrote:
> Yes, I agree with you; both approaches do the same thing.
> If we design it in a pull-for-exception way, more work has to be done.
>
> I think DLedger-based RocketMQ can expose more broker status about the
> current broker role, which would be a good indicator for the client to
> refresh metadata. If a producer receives a not-leader response code, a
> metadata refresh is needed, and the producer should not send messages to
> that broker until the refresh completes. The nameserver-pushed way is
> asynchronous to the producer's send code path. I think this can be avoided.
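The notify-then-refresh flow described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `RouteTable`, `on_push`, `refresh`, and `can_send` are hypothetical names, not RocketMQ APIs, and the push is assumed to carry topic names only.

```python
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    """Hypothetical client-side cache: which broker leads each topic."""
    leaders: dict = field(default_factory=dict)  # topic -> leader broker name
    stale: set = field(default_factory=set)      # topics flagged by a push

    def on_push(self, topic):
        # The push carries no route data; it only flags a known topic as stale.
        if topic in self.leaders:
            self.stale.add(topic)

    def refresh(self, fetch_route):
        # Pull fresh routes for the flagged topics only, then clear the flags.
        for topic in sorted(self.stale):
            self.leaders[topic] = fetch_route(topic)
        self.stale.clear()

    def can_send(self, topic, broker):
        # Role check: send only to the known leader, never on a stale route.
        return topic not in self.stale and self.leaders.get(topic) == broker
```

With this shape, a send that would go to a demoted broker is rejected locally between the push and the refresh, which is the "ineffective sending" the reply wants to avoid.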
RE: Re: [VOTE][RIP-36] Optimize topic routing mechanism
Yes, I agree with you; both approaches do the same thing.
If we design it in a pull-for-exception way, more work has to be done.

I think DLedger-based RocketMQ can expose more broker status about the current broker role, which would be a good indicator for the client to refresh metadata. If a producer receives a not-leader response code, a metadata refresh is needed, and the producer should not send messages to that broker until the refresh completes. The nameserver-pushed way is asynchronous to the producer's send code path. I think this can be avoided.

On 2022/03/12 09:37:45 刘振东 wrote:
> In fact, client-pull-for-exception is an alternative.
>
> The push mechanism is simple and effective.
> And if the nameserver pushes only the topic names, and only one nameserver
> node does so, the load is much reduced.
>
> Currently, I believe it is not easy to handle all the exceptions in the
> client for metadata refresh.
>
> For example, if you want to refresh by topic, you need to handle every API
> that takes a topic, which introduces too much code work. If you refresh
> everything, it causes too much network load, especially in an unstable
> network environment.
>
> Listing all the exceptions is also not easy. It needs to be designed more
> carefully.
>
> Anyway, a performance test is needed. If the push-based mechanism incurs too
> much load, a carefully designed client-pull-for-exception approach will be
> considered again.
Re: [VOTE][RIP-36] Optimize topic routing mechanism
In fact, client-pull-for-exception is an alternative.

The push mechanism is simple and effective.
And if the nameserver pushes only the topic names, and only one nameserver node does so, the load is much reduced.

Currently, I believe it is not easy to handle all the exceptions in the client for metadata refresh.

For example, if you want to refresh by topic, you need to handle every API that takes a topic, which introduces too much code work. If you refresh everything, it causes too much network load, especially in an unstable network environment.

Listing all the exceptions is also not easy. It needs to be designed more carefully.

Anyway, a performance test is needed. If the push-based mechanism incurs too much load, a carefully designed client-pull-for-exception approach will be considered again.

On Mar 12, 2022, at 4:17 PM, 王金龙 wrote:
> I think the design is a good way to shorten the unavailable time of
> RocketMQ when the leader changes.
>
> I agree with most of the design points in the doc.
>
> But, like others, I think name server push-based topic routing will
> introduce a lot of load on the nameserver.
>
> How about changing to the client pulling route info when it receives an
> exception or a leader-change-like response code?
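The load trade-off between refreshing by topic and refreshing everything can be made concrete with a small sketch. The function name and its arguments are hypothetical and only illustrate the counting argument above; it assumes the notification carries topic names only.

```python
def routes_to_pull(subscribed, changed, by_topic=True):
    """Return the topics a client would re-pull after a change notification.

    subscribed: topics the client currently holds routes for
    changed:    topic names carried by the (name-only) notification
    """
    if by_topic:
        # Refresh by topic: only changed topics the client actually uses.
        return sorted(set(subscribed) & set(changed))
    # Refresh by all: every cached topic is pulled again.
    return sorted(subscribed)
```

For a client holding four topics of which one changed, the by-topic variant issues one pull while the refresh-all variant issues four, which is the extra network load the message warns about.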
RE: [VOTE][RIP-36] Optimize topic routing mechanism
I think the design is a good way to shorten the unavailable time of RocketMQ when the leader changes.

I agree with most of the design points in the doc.

But, like others, I think name server push-based topic routing will introduce a lot of load on the nameserver.

How about changing to the client pulling route info when it receives an exception or a leader-change-like response code?

On 2022/03/02 11:42:14 xijiu wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize topic
> routing mechanism. Now the shepherds @dongeforever and @yukon are willing to
> support the RIP, so I think it is time to start an email thread to enter the
> voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/
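The pull-on-exception alternative proposed here could look roughly like the sketch below. It is an assumption-laden illustration: `NOT_LEADER`, `Producer`, and the two callables are hypothetical stand-ins, not RocketMQ response codes or client classes.

```python
NOT_LEADER = "NOT_LEADER"  # hypothetical "leader changed" response code
OK = "OK"


class Producer:
    def __init__(self, fetch_route, send_to_broker):
        self._fetch_route = fetch_route        # topic -> broker (one nameserver pull)
        self._send_to_broker = send_to_broker  # (broker, topic, msg) -> response code
        self._routes = {}
        self.pulls = 0                         # number of nameserver pulls so far

    def _refresh(self, topic):
        self._routes[topic] = self._fetch_route(topic)
        self.pulls += 1

    def send(self, topic, msg, max_attempts=2):
        if topic not in self._routes:
            self._refresh(topic)
        for _ in range(max_attempts):
            code = self._send_to_broker(self._routes[topic], topic, msg)
            if code != NOT_LEADER:
                return code
            # The broker is no longer leader: pull the route before retrying.
            self._refresh(topic)
        return NOT_LEADER
```

The nameserver is contacted only when a send actually fails with the leader-change code, so no push channel is required; the cost is that every send path has to recognize that code.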
Re: [VOTE][RIP-36] Optimize topic routing mechanism
+1

On 2022/03/02 11:42:14 xijiu wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize topic
> routing mechanism. Now the shepherds @dongeforever and @yukon are willing to
> support the RIP, so I think it is time to start an email thread to enter the
> voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/
Re: [VOTE][RIP-36] Optimize topic routing mechanism
The core problem is up to 30 seconds of unavailable time during broker startup/shutdown or logic queue remapping, because metadata discovery by scheduled pull is too slow.

For non-ordered topics, messages fail over to another broker. But for ordered topics, or more precisely topics with a fixed queue number, the unavailable time can be up to 30 seconds. This is not tolerable.

Adding the push mechanism will decrease the unavailable time from 30 seconds to 1~2 seconds.

BTW, we should also pay attention to the complexity. To minimize it, the push mechanism will be a bypass flow and will not harm the main pull flow.

As for the "topic or broker does not exist" and "resource overhead" problems, they are just polished in passing.

The original issue is https://github.com/apache/rocketmq/issues/3843, which aims to reduce the unavailable time during a broker (with DLedger) role change and to reduce the impact on sequential message producers.

On Thu, Mar 3, 2022 at 22:57, Xiaorui Wang wrote:
> Thank you for your prompt reply.
>
> I have read your email carefully and understand that it is mainly about the
> following two problems.
>
> Problem one: When a client accesses a topic that does not exist, the path of
> each access is twice as long.
>
> Problem two: As the number of topics grows, scheduled polling increases
> network overhead and memory usage.
>
> For the above, I hope you can provide more quantitative data.
>
> IMO, I have the following suggestions for these problems, for reference only.
>
> For problem one: if the push mechanism is added, will the pull mechanism be
> removed? Otherwise the problem will still exist.
>
> For problem two: do the network and memory overheads have a significant
> impact on the application if nothing is changed?
>
> I hope my advice is helpful rather than disturbing. Our common goal is to
> fully discuss an architectural change and make it better.
>
> Best regards,
>
> Xiaorui Wang 王小瑞
> Apache RocketMQ PMC Chair
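The "30 seconds down to 1~2 seconds" argument can be checked with a little arithmetic. The sketch below assumes a fixed 30-second scheduled pull (the figure quoted in this message) and treats the push latency as a free parameter; the function name is hypothetical.

```python
def discovery_delay(change_time, poll_interval, push_latency=None):
    """Seconds until a client learns of a route change made at `change_time`.

    Scheduled pulls are assumed to run at multiples of `poll_interval`.
    `push_latency` is None when no push arrives (push disabled or lost).
    """
    next_poll = (change_time // poll_interval + 1) * poll_interval
    pull_delay = next_poll - change_time
    if push_latency is None:
        return pull_delay
    # Push is a bypass: the scheduled pull still runs, so the earlier one wins.
    return min(pull_delay, push_latency)
```

A change made right after a pull waits the full 30 seconds without push, but only the push latency with it. Because the scheduled pull keeps running, a lost push degrades to the old behavior instead of breaking route discovery.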
Re: [VOTE][RIP-36] Optimize topic routing mechanism
Thank you for your prompt reply.

I have read your email carefully and understand that it is mainly about the following two problems.

Problem one: When a client accesses a topic that does not exist, the path of each access is twice as long.

Problem two: As the number of topics grows, scheduled polling increases network overhead and memory usage.

For the above, I hope you can provide more quantitative data.

IMO, I have the following suggestions for these problems, for reference only.

For problem one: if the push mechanism is added, will the pull mechanism be removed? Otherwise the problem will still exist.

For problem two: do the network and memory overheads have a significant impact on the application if nothing is changed?

I hope my advice is helpful rather than disturbing. Our common goal is to fully discuss an architectural change and make it better.

Best regards,

Xiaorui Wang 王小瑞
Apache RocketMQ PMC Chair

On Thu, Mar 3, 2022 at 19:47, xijiu <422766...@qq.com.invalid> wrote:
> Thanks for your reply~
>
> There are two points I will briefly explain:
>
> 1. Besides the client not being able to obtain the latest routing data of a
> topic in time, the current version also faces the following problems:
>
> If the client frequently accesses a topic that does not exist, and automatic
> topic creation is not allowed, the path of each access becomes lengthy and
> requires two network requests:
>
>   send request to nameServer to get topic route data
>   send default topic TBW102 request to nameServer
>
> In some applications, the client accesses many topics. After being accessed
> once, these topics may never be accessed again, or only very infrequently,
> but the client still pulls their routing information from the NameServer on
> every polling round. This increases network overhead, and zombie topics also
> occupy memory.
>
> Related issues:
>
> https://github.com/apache/rocketmq/issues/3207
> https://github.com/apache/rocketmq/issues/3858
> https://github.com/apache/rocketmq/issues/3870
>
> 2. This change is a lightweight supplement to the current polling strategy.
> The notification strategy is adjusted according to how busy the machine is,
> and it is switched off automatically when the machine load reaches a certain
> threshold. Therefore, this complexity will not bring stability pressure.
>
> ---------- Original message ----------
> From: "dev" <vintagew...@apache.org>
> Sent: Wednesday, March 2, 2022, 9:00 PM
> To: "dev"
> Subject: Re: [VOTE][RIP-36] Optimize topic routing mechanism
>
> I read the whole plan, it is beneficial for the nameserver to actively push
> changes to the client, but this benefit also brings complexity. I
> personally think this benefit is not very big. Unless there is a better
> explanation, I will reject this proposal.
>
> Best regards,
>
> Xiaorui Wang 王小瑞
> Apache RocketMQ PMC Chair
Re: [VOTE][RIP-36] Optimize topic routing mechanism
Hi,

I think it's very beneficial to quickly discover that a broker is down in a DLedger deployment.

Also, to reduce complexity, would it be better to change the existing request code to long-polling? Just add a switch "wait=true" to indicate that the request is a long poll.

On 2022/03/02 11:42:14 xijiu wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize topic
> routing mechanism. Now the shepherds @dongeforever and @yukon are willing to
> support the RIP, so I think it is time to start an email thread to enter the
> voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/
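A long-poll variant of the existing route request might behave like the following in-process sketch. Everything here is an assumption for illustration: `RouteStore`, the version counter, and the `wait` flag mirror the "wait=true" idea above but are not part of any RocketMQ protocol.

```python
import threading


class RouteStore:
    """Hypothetical nameserver-side holder with plain and long-poll reads."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._route = None

    def update(self, route):
        # A broker change bumps the version and wakes all held requests.
        with self._cond:
            self._version += 1
            self._route = route
            self._cond.notify_all()

    def get_route(self, known_version=0, wait=False, timeout=5.0):
        with self._cond:
            if wait:
                # Long poll: hold the request until the version moves past
                # what the client already has, or until the timeout expires.
                self._cond.wait_for(lambda: self._version > known_version, timeout)
            return self._version, self._route
```

With `wait=False` the call behaves like the current pull. With `wait=True` the same request returns as soon as the route changes, so the client needs no separate push channel, at the cost of the server holding one pending request per waiting client.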
Re: [VOTE][RIP-36] Optimize topic routing mechanism
I read the whole plan. It is beneficial for the nameserver to actively push changes to the client, but this benefit also brings complexity. I personally think the benefit is not very big. Unless there is a better explanation, I will reject this proposal.

Best regards,

Xiaorui Wang 王小瑞
Apache RocketMQ PMC Chair

On Wed, Mar 2, 2022 at 19:42, xijiu <422766...@qq.com.invalid> wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize
> topic routing mechanism. Now the shepherds @dongeforever and @yukon are
> willing to support the RIP, so I think it is time to start an email thread
> to enter the voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/
Re: [VOTE][RIP-36] Optimize topic routing mechanism
+1 for moving forward.

On Wed, Mar 2, 2022 at 7:42 PM xijiu <422766...@qq.com.invalid> wrote:
> Hi, RocketMQ Community,
>
> As discussed in the previous email, we launched a new RIP to optimize
> topic routing mechanism. Now the shepherds @dongeforever and @yukon are
> willing to support the RIP, so I think it is time to start an email thread
> to enter the voting process.
>
> The vote will be open for at least 72 hours or until a necessary number of
> votes are reached.
>
> Please vote accordingly:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Best Regards!
> xijiu
>
> links:
> https://shimo.im/docs/vVAXVrDNnoSrMBqm/