michaeljmarshall opened a new pull request, #19327: URL: https://github.com/apache/pulsar/pull/19327
Fixes https://github.com/apache/pulsar/issues/13923 ### Motivation When the Java client's `ClientCnx` is initialized, there is a race to set the target broker and the remote hostname. The target brorker is relevant when connecting through the proxy and the remote hostname is relevant for certain kinds of authentication, like SASL. In most cases, this race results in the preferred order where these values are set before the `ClientCnx#channelActive` method is called. However, when that method is called before they are set, problems occur. This PR ensures that the fields are set before we call `channel.connect(...)`. Additional motivation for understanding the correctness of this solution can be found in the relevant netty code: https://github.com/netty/netty/blob/cbd324c178135a82f23749bc218c2c6ee3a9b140/transport-classes-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollChannel.java#L648-L659 In that block, notice that the `isActive` gets set to `true`, the promise gets completed (which likely triggers callbacks), and then the channel active event gets triggered. Interestingly, the easiest way to reproduce the problematic behavior in the proxy is with the following steps: 1. Update `createConnection(InetSocketAddress logicalAddress, InetSocketAddress physicalAddress, int connectionKey)` so that the callback for `createConnection` is `thenAcceptAsync` instead of `thenAccept`. 2. Run `ProxyTest#testProducer()`. 3. Observe failure in logs. ### Modifications * Update `ConnectionPool` to pass the logical broker address through to the initialization methods. * Update logic to compare the logical and physical addresses because one address is resolved. ### Verifying this change It is challenging to create a pure reproducer for this case. I was easily able to reproduce it by making the callback complete in another thread. Here is the `thenAccept` that I changed to `thenAcceptAsync`: https://github.com/apache/pulsar/blob/3c06a4a19617006ba6c9fce6f4409aaf075216a9/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionPool.java#L247 At the very least, the current test coverage will ensure this does not introduce a regression. ### Does this pull request potentially affect one of the following parts: This is not a breaking change. ### Documentation - [x] `doc-not-needed` ### Matching PR in forked repository PR in forked repository: https://github.com/michaeljmarshall/pulsar/pull/20 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
