Zhengcy05 opened a new pull request, #16329:
URL: https://github.com/apache/dubbo/pull/16329

   Fixes #16186 
   
   ## What is the purpose of the change?
   
   This PR fixes an issue in the multi-ZooKeeper registry scenario when one 
registry is unavailable and configured with `check=false`.
   
   Based on the issue report, in the `dubbo 3.3` branch, if multiple ZooKeeper 
registries are configured and one of them is not started with `check=false`, 
`Curator5ZookeeperClient` may keep an unconnected Curator client alive after 
`blockUntilConnected(...)` returns `false`. As a result, the following registry 
workflow still treats that client as usable and continues trying to create 
nodes on it. This can cause repeated failures in node creation, significantly 
slow down application startup, and eventually lead to startup failure.
   
   From the call path, the problem has two parts:
   
   1. `Curator5ZookeeperClient` does not fail fast enough on connection failure 
 
      If `blockUntilConnected(...)` returns `false`, the ZooKeeper client has 
not been successfully established. This should always be treated as a client 
creation failure, and the underlying Curator client should be closed 
immediately so it cannot be reused later.  
      The `check=false` semantic should only control whether the upper layer 
continues to throw when registry creation fails. It should not change whether 
the low-level ZooKeeper client itself is considered successfully created.
   
   2. `ListenerRegistryWrapper` is not null-safe when the upper layer returns a 
`null` registry  
      In `AbstractRegistryFactory#getRegistry()`, when registry creation fails 
and `check=false`, Dubbo logs a warning and returns `null`.  
      However, `RegistryFactoryWrapper` still wraps that result, and methods 
such as `ListenerRegistryWrapper#getUrl()` previously dereferenced the delegate 
directly. This could later trigger a `NullPointerException` in paths like 
`RegistryDirectory#subscribe()`.
   
   To address this, this PR includes two parts:
   
   - Fix the connection failure handling in `Curator5ZookeeperClient`
     - When `blockUntilConnected(...) == false`, always close the client first 
and then throw an exception.
     - Remove the dependency on `check` in this low-level failure path.
     - Keep the client creation semantic consistent: if the connection is not 
established, client creation fails.
   
   - Add null-safe handling to `ListenerRegistryWrapper`
     - Add a constructor that preserves the original `URL`.
     - Return the original URL from `getUrl()` when `registry == null`.
     - Return `false` from `isAvailable()` when `registry == null`.
     - Make `destroy()` a safe no-op when `registry == null`.
     - Add null protection to `unsubscribe()`, `isServiceDiscovery()`, and 
`lookup()`.
     - Keep the existing listener callback behavior unchanged.
   
   With these changes:
   
   - an unconnected ZooKeeper client will not be reused by the following 
registry workflow;
   - the existing `check=false` behavior at the registry factory layer remains 
unchanged;
   - a `null` registry can still be safely wrapped without introducing extra 
NPEs.
   
   ## Tests
   
   This PR adds focused regression coverage for both the low-level client 
behavior and the upper-layer `check=false` flow:
   
   - `Curator5ZookeeperClientTest`
     - verifies that when `blockUntilConnected(...)` fails, constructing 
`Curator5ZookeeperClient` throws `IllegalStateException` for both `check=true` 
and `check=false`;
     - verifies that `client.close()` is invoked on connection failure.
   
   - `AbstractRegistryFactoryTest`
     - verifies that when `createRegistry(url)` fails and `check=false`, 
`getRegistry(url)` returns `null`, preserving the existing upper-layer behavior.
   
   - `ListenerRegistryWrapperTest`
     - verifies that `getUrl()`, `isAvailable()`, `destroy()`, `lookup()`, 
`subscribe()`, and `unsubscribe()` are safe when `registry == null`;
     - verifies that listener callbacks are still triggered as expected in the 
`null registry` case.
   
   - `RegistryFactoryWrapperTest`
     - verifies that when the underlying `RegistryFactory#getRegistry(url)` 
returns `null`, the wrapper still returns the original URL safely and allows 
subscription-related calls without throwing exceptions.
   
   Verification command used:
   
   ```bash
   mvn -pl 
dubbo-remoting/dubbo-remoting-zookeeper-curator5,dubbo-registry/dubbo-registry-api
 \
     
-Dtest=Curator5ZookeeperClientTest,ListenerRegistryWrapperTest,RegistryFactoryWrapperTest,AbstractRegistryFactoryTest
 \
     -Dsurefire.failIfNoSpecifiedTests=false test -q
   ```
   
   ## Checklist
   - [x] Make sure there is a 
[GitHub_issue](https://github.com/apache/dubbo/issues) field for the change.
   - [x] Write a pull request description that is detailed enough to understand 
what the pull request does, how, and why.
   - [x] Write necessary unit-test to verify your logic correction. If the new 
feature or significant change is committed, please remember to add sample in 
[dubbo samples](https://github.com/apache/dubbo-samples) project.
   - [x] Make sure gitHub actions can pass. [Why the workflow is failing and 
how to fix it?](../CONTRIBUTING.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to