merlimat opened a new pull request, #25690:
URL: https://github.com/apache/pulsar/pull/25690

   ### Motivation
   
   `ShadowTopicRealBkTest.setup` flakes intermittently with `BindException: 
Address already in use`:
   
   ```
   java.io.IOException: java.net.BindException: Address already in use
       at 
org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runZookeeper(LocalBookkeeperEnsemble.java:220)
       at 
org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.start(LocalBookkeeperEnsemble.java:424)
       at 
org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.start(LocalBookkeeperEnsemble.java:434)
       at 
org.apache.pulsar.broker.service.persistent.ShadowTopicRealBkTest.setup(ShadowTopicRealBkTest.java:52)
   ```
   
   `PortManager.nextLockedFreePort()` only locks the port inside the JVM's 
static set — another process can grab the OS-level port between allocation and 
the ZK bind.
   
   When that happens, `cleanup()` also fails with NPE because 
`LocalBookkeeperEnsemble.stop()` dereferences `bookieComponents` (and other 
fields) that the aborted setup left null:
   
   ```
   java.lang.NullPointerException: Cannot read the array length because 
"<local1>" is null
       at 
org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.stop(LocalBookkeeperEnsemble.java:513)
   ```
   
   Failing CI: 
https://github.com/apache/pulsar/actions/runs/25387331732/job/74455050956
   
   ### Modifications
   
   - `LocalBookkeeperEnsemble.runZookeeper`: retry on `BindException` up to 5 
attempts; the last attempt falls back to a kernel-assigned port (port 0) so it 
always finds a free port. Callers should read the actual bound port via 
`getZookeeperPort()`.
   - `LocalBookkeeperEnsemble.stop`: null-guard `bookieComponents`, `zkc`, 
`zks`, `serverFactory`, and `streamStorage`, and wrap each shutdown in 
try/catch so cleanup never NPEs after a partial setup.
   - `ShadowTopicRealBkTest`: read the bound ZK port back from the ensemble 
after start so the metadata store URL stays correct when the kernel-assigned 
fallback fires; tolerate a partially-initialized `PulsarService` in cleanup.
   
   ### Verifying this change
   
   - `ShadowTopicRealBkTest` passes locally with the change applied.
   - The retry path is covered by the existing test's standard run; the 
explicit fallback to port 0 has been smoke-tested by simulating a busy port.
   
   ### Does this pull request potentially affect one of the following parts:
   
   - [ ] Dependencies (does it add or upgrade a dependency): no
   - [ ] The public API: no
   - [ ] The schema: no
   - [ ] The default values of configurations: no
   - [ ] The threading model: no
   - [ ] The binary protocol: no
   - [ ] The REST endpoints: no
   - [ ] The admin CLI options: no
   - [ ] The metrics: no
   - [ ] Anything that affects deployment: no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to