oneby-wang opened a new pull request, #25892:
URL: https://github.com/apache/pulsar/pull/25892
### Motivation
`SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover` can
fail intermittently when run repeatedly with `invocationCount = 100`.
A reproduced failure looked like this:
```text
Gradle suite > Gradle test > org.apache.pulsar.broker.SameAuthParamsLookupAutoClusterFailoverTest > testAutoClusterFailover[42](false) FAILED
    org.awaitility.core.ConditionTimeoutException: Assertion condition Arrays differ at element [1]: Healthy != PreFail expected [Healthy] but found [PreFail] within 3 minutes.
        at org.apache.pulsar.broker.SameAuthParamsLookupAutoClusterFailoverTest.awaitStatesAndIndex(SameAuthParamsLookupAutoClusterFailoverTest.java:154)
        at org.apache.pulsar.broker.SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover(SameAuthParamsLookupAutoClusterFailoverTest.java:134)
```
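The key signal in the logs is that the same failover provider was driven by two
scheduled check threads, `broker-service-url-check-2795-1` and
`broker-service-url-check-2796-1`, which log the same transitions at almost
identical timestamps: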
```text
2026-05-29T22:55:15,723 - INFO - [broker-service-url-check-2796-1:SameAuthParamsLookupAutoClusterFailover] - Failover to low priority pulsar service [0] pulsar://localhost:53673 --> [2] pulsar://localhost:53683. States: [Failed, Failed, Healthy], Counters: [0, 0, 0] {}
2026-05-29T22:55:15,723 - INFO - [broker-service-url-check-2795-1:SameAuthParamsLookupAutoClusterFailover] - Failover to low priority pulsar service [0] pulsar://localhost:53673 --> [2] pulsar://localhost:53683. States: [Failed, Failed, Healthy], Counters: [0, 0, 0] {}
2026-05-29T22:55:21,734 - INFO - [broker-service-url-check-2796-1:SameAuthParamsLookupAutoClusterFailover] - Recover to high priority pulsar service [2] pulsar://localhost:53683 --> [1] pulsar://localhost:53683. States: [Failed, Healthy, Healthy], Counters: [0, 0, 0] {}
2026-05-29T22:55:21,734 - INFO - [broker-service-url-check-2795-1:SameAuthParamsLookupAutoClusterFailover] - Recover to high priority pulsar service [2] pulsar://localhost:53683 --> [1] pulsar://localhost:53683. States: [Failed, Healthy, Healthy], Counters: [0, 0, 0] {}
2026-05-29T22:55:22,114 - INFO - [broker-service-url-check-2796-1:SameAuthParamsLookupAutoClusterFailover] - Recover to high priority pulsar service [1] pulsar://localhost:53683 --> [0] pulsar://localhost:53683. States: [Healthy, Healthy, Healthy], Counters: [0, 0, 0] {}
2026-05-29T22:55:22,115 - WARN - [broker-service-url-check-2795-1:SameAuthParamsLookupAutoClusterFailover] - Failed to probe service availability {brokerServiceIndex=1, counters=[0, 0, 0], states=[Healthy, Healthy, Healthy], url=pulsar://localhost:53683}
```
The test passed the provider into
`PulsarClient.builder().serviceUrlProvider(failover)`. `PulsarClientImpl`
already initializes the configured `ServiceUrlProvider` while building the
client. The test then called `failover.initialize(client)` again manually,
creating a second scheduled check task for the same provider instance.
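To make the lifecycle problem concrete, here is a minimal, JDK-only sketch. `FakeFailoverProvider` and `FakeClient` are hypothetical stand-ins, not Pulsar classes; they only assume what is described above: building the client calls `initialize(...)` on the provider, and every `initialize(...)` call starts its own scheduled check task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the provider lifecycle, not the real Pulsar code.
public class DoubleInitDemo {

    static class FakeFailoverProvider {
        // One executor per initialize() call; each runs an independent check loop.
        final List<ScheduledExecutorService> checkers = new ArrayList<>();

        void initialize(FakeClient client) {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleWithFixedDelay(this::check, 0, 100, TimeUnit.MILLISECONDS);
            checkers.add(executor);
        }

        void check() {
            // Would probe services and mutate shared failover state (omitted).
        }

        void close() {
            checkers.forEach(ScheduledExecutorService::shutdownNow);
        }
    }

    static class FakeClient {
        // Mirrors the described behavior: building the client initializes the provider.
        static FakeClient build(FakeFailoverProvider provider) {
            FakeClient client = new FakeClient();
            provider.initialize(client);
            return client;
        }
    }

    /** Returns how many check loops end up running for a single provider instance. */
    static int checkLoops(boolean alsoInitializeManually) {
        FakeFailoverProvider provider = new FakeFailoverProvider();
        FakeClient client = FakeClient.build(provider);
        if (alsoInitializeManually) {
            provider.initialize(client); // the redundant manual call
        }
        int loops = provider.checkers.size();
        provider.close();
        return loops;
    }

    public static void main(String[] args) {
        System.out.println(checkLoops(true));  // 2: builder + manual call
        System.out.println(checkLoops(false)); // 1: builder only
    }
}
```

With the manual call, two loops share one provider's state, which matches the two `broker-service-url-check-*` threads in the logs.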
Because both scheduled tasks mutate the same failover state, one task can
recover the current index back to `0` while the other task is still probing
index `1`. A transient failed probe can then move index `1` to `PreFail`. Once
the current index is `0`, the check loop no longer visits index `1`, so nothing
moves it back to `Healthy` and `awaitStatesAndIndex` times out with states
`[Healthy, PreFail, Healthy]`.
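The interleaving above can be replayed deterministically on one thread. This is a simplified, hypothetical model rather than the provider's real code. Its key assumption comes from the explanation above: each check round only probes indices `0..currentIndex`. The starting state matches the `Recover ... [2] --> [1]` log line.

```java
import java.util.Arrays;

// Hypothetical single-threaded replay of the race, not the real provider logic.
public class StuckPreFailDemo {

    enum State { Healthy, PreFail, Failed }

    // State after "Recover to high priority pulsar service [2] --> [1]".
    private final State[] states = {State.Failed, State.Healthy, State.Healthy};
    private int currentIndex = 1;

    String replay(int laterRounds) {
        // Task A: probe of [0] succeeds, so it recovers [1] --> [0].
        states[0] = State.Healthy;
        currentIndex = 0;

        // Task B: was still probing [1]; its transient failure lands after the recovery.
        states[1] = State.PreFail;

        // Later rounds only visit indices 0..currentIndex (assumption), i.e. just [0],
        // so [1] is never probed again and stays PreFail.
        for (int round = 0; round < laterRounds; round++) {
            for (int i = 0; i <= currentIndex; i++) {
                states[i] = State.Healthy; // the brokers are actually reachable
            }
        }
        return Arrays.toString(states);
    }

    public static void main(String[] args) {
        System.out.println(new StuckPreFailDemo().replay(1000)); // [Healthy, PreFail, Healthy]
    }
}
```

However many later rounds run, the result stays `[Healthy, PreFail, Healthy]`. This is the array the test's await condition reports.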
### Modifications
Removed the redundant manual `failover.initialize(client)` call from
`SameAuthParamsLookupAutoClusterFailoverTest`.
The provider lifecycle remains managed by `PulsarClient` through
`ClientBuilder.serviceUrlProvider(...)`.
### Verifying this change
- [x] Make sure that the change passes the CI checks.
This change is already covered by existing tests:
- `./gradlew :pulsar-broker:test --tests "org.apache.pulsar.broker.SameAuthParamsLookupAutoClusterFailoverTest.testAutoClusterFailover" -PtestRetryCount=0 --no-configuration-cache`
### Does this pull request potentially affect one of the following parts:
*If the box was checked, please highlight the changes*
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
--
This is an automated message from the Apache Git Service.