yyj8 opened a new pull request, #22133: URL: https://github.com/apache/pulsar/pull/22133
### Motivation

The current Java client implementation of automatic cluster failover has a flaw in how it decides whether a cluster is available:

```
// org.apache.pulsar.client.impl.AutoClusterFailover
boolean probeAvailable(String url) {
    try {
        resolver.updateServiceUrl(url);
        InetSocketAddress endpoint = resolver.resolveHost();
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(endpoint.getHostName(), endpoint.getPort()), TIMEOUT);
        socket.close();
        return true;
    } catch (Exception e) {
        log.warn("Failed to probe available, url: {}", url, e);
        return false;
    }
}
```

The client only opens a TCP connection to the cluster's exposed service address to decide whether the cluster is available. This cannot handle a cluster that is partially unavailable (half dead): the TCP connection can still succeed, so the client never fails over. For this scenario, we want the client to make its failover decision by sending a health-status request to the cluster, and we provide an admin command inside the cluster to update the cluster's health status.

Without this, every business that connects to such a cluster has to manually switch its cluster connection address and restart its applications. Because each team performs these steps differently, the link data across business teams ends up inconsistent.

### Modifications

1. Add a new cluster health-status request command and its response command:

```
case HEALTH_CHECK:
    checkArgument(cmd.hasHealthCheck());
    handleHealthCheck(cmd.getHealthCheck());
    break;
case HEALTH_CHECK_RESPONSE:
    checkArgument(cmd.hasHealthCheckResponse());
    handleHealthCheckResponse(cmd.getHealthCheckResponse());
    break;
```

2. Add a new admin command to manually update the cluster health status:

```
# Update the cluster health status: available or unavailable (default: available).
bin/pulsar-admin clusters update-health-status --status unavailable
```

For other details, please refer to the PR code.

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end deployment with large payloads (10MB)*
- *Extended integration test for recovery after broker failure*

### Does this pull request potentially affect one of the following parts:

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [x] The binary protocol
- [ ] The REST endpoints
- [x] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

- [ ] `doc`
- [ ] `doc-required`
- [x] `doc-not-needed`
- [ ] `doc-complete`

### Matching PR in forked repository

PR in forked repository:
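The Motivation argues that a TCP connect alone cannot detect a half-dead cluster. Below is a minimal sketch, not the code in this PR, of a probe that additionally requires a positive health-check answer before treating a cluster as available. The health-check round trip is modeled as a hypothetical `Function<InetSocketAddress, CompletableFuture<HealthStatus>>` standing in for the new `HEALTH_CHECK` / `HEALTH_CHECK_RESPONSE` commands; the class `HealthAwareProbe`, the enum `HealthStatus`, and the timeout handling are illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch, not the PR's implementation: a cluster counts as available only if
// (1) a TCP connection succeeds AND (2) a health-check request answers AVAILABLE.
final class HealthAwareProbe {

    // Hypothetical health states, mirroring `--status available|unavailable`.
    enum HealthStatus { AVAILABLE, UNAVAILABLE }

    private final int timeoutMs;
    // Assumed stand-in for sending HEALTH_CHECK and awaiting HEALTH_CHECK_RESPONSE.
    private final Function<InetSocketAddress, CompletableFuture<HealthStatus>> healthCheck;

    HealthAwareProbe(int timeoutMs,
                     Function<InetSocketAddress, CompletableFuture<HealthStatus>> healthCheck) {
        this.timeoutMs = timeoutMs;
        this.healthCheck = healthCheck;
    }

    // Step 1: the existing TCP-level check; try-with-resources closes the socket on every path.
    boolean tcpReachable(InetSocketAddress endpoint) {
        try (Socket socket = new Socket()) {
            socket.connect(endpoint, timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Step 2: only if TCP works, ask the cluster for its own health status.
    boolean probeAvailable(InetSocketAddress endpoint) {
        if (!tcpReachable(endpoint)) {
            return false;
        }
        try {
            HealthStatus status = healthCheck.apply(endpoint).get(timeoutMs, TimeUnit.MILLISECONDS);
            return status == HealthStatus.AVAILABLE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } catch (ExecutionException | TimeoutException | RuntimeException e) {
            // No answer in time, or a failed request: treat the cluster as "half dead".
            return false;
        }
    }
}
```

In this sketch, a cluster whose port accepts connections but which answers `UNAVAILABLE` (for example after an operator runs `update-health-status --status unavailable`) or does not answer within the timeout yields `false`, so a failover loop built on it would switch clusters in the half-dead case that the plain TCP check misses.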