massakam opened a new pull request #7096: URL: https://github.com/apache/pulsar/pull/7096
Master Issue: #7041

### Motivation

When a leader broker is restarted, some producers for topics owned by that broker may not be reopened on the new broker. When this happens, message publishing continues to fail until the client application is restarted.

On investigation, I found that the lookup requests sent by the affected producers are redirected more than 10,000 times between multiple brokers. Each time a lookup request is redirected, `BinaryProtoLookupService#findBroker()` is called recursively. Tens of thousands of redirects therefore cause a `StackOverflowError`, and `BinaryProtoLookupService#findBroker()` never completes.

### Modifications

Limit the number of times a lookup can be redirected to 100. This maximum is user configurable. If the number of redirects exceeds the limit, the lookup fails. However, `ConnectionHandler` retries the lookup, so the producer can eventually reconnect to the new broker.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
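
The redirect cap described under Modifications can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request: the class `RedirectLimitSketch`, the exception `TooManyRedirectsException`, the broker names, and the use of a plain map to stand in for broker responses are all hypothetical, chosen only to show how counting redirects turns an endless redirect cycle into a prompt, retryable failure.

```java
import java.util.Map;

// Hypothetical illustration only; none of these names come from the Pulsar code base.
class RedirectLimitSketch {

    /** Thrown when a lookup is redirected more often than the configured maximum. */
    static class TooManyRedirectsException extends RuntimeException {
        TooManyRedirectsException(String message) {
            super(message);
        }
    }

    /**
     * Follows redirects starting at startBroker.
     * The map simulates broker responses: a key maps to the broker it redirects to;
     * a broker with no entry is assumed to own the topic and ends the lookup.
     */
    static String findBroker(Map<String, String> redirects, String startBroker, int maxRedirects) {
        String current = startBroker;
        int redirectCount = 0;
        while (redirects.containsKey(current)) {
            redirectCount++;
            if (redirectCount > maxRedirects) {
                // Fail fast instead of following redirects without bound.
                throw new TooManyRedirectsException(
                        "Lookup exceeded " + maxRedirects + " redirects");
            }
            current = redirects.get(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Normal case: one redirect, then the owning broker answers.
        Map<String, String> healthy = Map.of("broker-a", "broker-b");
        System.out.println(findBroker(healthy, "broker-a", 100));

        // Failure case: two brokers redirect to each other indefinitely.
        Map<String, String> cycle = Map.of("broker-a", "broker-b", "broker-b", "broker-a");
        try {
            findBroker(cycle, "broker-a", 100);
        } catch (TooManyRedirectsException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

In this sketch the failure surfaces as an exception that a caller can catch and retry later, which mirrors the role the description assigns to `ConnectionHandler`. The loop is iterative purely for brevity; the same counter could equally be passed through a recursive call.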