yyj8 opened a new pull request, #22133:
URL: https://github.com/apache/pulsar/pull/22133

   ### Motivation
   
   The current Java client implementation has flaws in its automatic cluster
   failover.
   
   ```
   // org.apache.pulsar.client.impl.AutoClusterFailover
   boolean probeAvailable(String url) {
       try {
           resolver.updateServiceUrl(url);
           InetSocketAddress endpoint = resolver.resolveHost();
           Socket socket = new Socket();
           socket.connect(new InetSocketAddress(endpoint.getHostName(), endpoint.getPort()), TIMEOUT);
           socket.close();

           return true;
       } catch (Exception e) {
           log.warn("Failed to probe available, url: {}", url, e);
           return false;
       }
   }
   ```
   The client only opens a TCP connection to the cluster's exposed service
   address to decide whether the cluster is available. This cannot detect a
   partially unavailable (half-dead) cluster, where the address still accepts
   connections but the cluster cannot serve traffic properly. For this
   scenario, we want the client to base its failover decision on a cluster
   health status request sent to the cluster, and we provide an admin command
   within the cluster to update that health status. Without this, every
   business application that connects to the cluster has to switch its
   connection address manually and restart, and differing operation steps
   across business teams lead to inconsistent link data.
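
The idea of a health-aware probe can be sketched as two stages: the existing TCP connect, followed by an application-level health status check. This is only an illustration under assumptions, not the code in this PR: `HealthAwareProbe`, `Stage`, and `tcpConnect` are hypothetical names, and the health stage stands in for whatever request the client would actually send to the cluster.

```java
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical two-stage probe; names are illustrative and not taken from the PR. */
final class HealthAwareProbe {

    /** One probe stage. Returns true when the endpoint passes the check. */
    interface Stage {
        boolean check(InetSocketAddress endpoint) throws Exception;
    }

    private final Stage tcpStage;
    private final Stage healthStage;

    HealthAwareProbe(Stage tcpStage, Stage healthStage) {
        this.tcpStage = tcpStage;
        this.healthStage = healthStage;
    }

    /** TCP reachability stage, equivalent in spirit to the existing probeAvailable(). */
    static Stage tcpConnect(int timeoutMs) {
        return endpoint -> {
            // try-with-resources closes the socket even if connect() throws.
            try (Socket socket = new Socket()) {
                socket.connect(endpoint, timeoutMs);
                return true;
            }
        };
    }

    /**
     * The cluster counts as available only if it accepts a TCP connection
     * AND reports a healthy status. Any exception is treated as unavailable.
     */
    boolean probeAvailable(InetSocketAddress endpoint) {
        try {
            // && short-circuits: the health stage is skipped when TCP fails.
            return tcpStage.check(endpoint) && healthStage.check(endpoint);
        } catch (Exception e) {
            return false;
        }
    }
}
```

With this shape, a half-dead cluster (TCP stage passes, health stage returns false) is reported as unavailable, which a connect-only probe cannot express. A caller would pass `HealthAwareProbe.tcpConnect(timeoutMs)` as the first stage and its own health status check as the second.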
   
   
   ### Modifications
   
   <!-- Describe the modifications you've done. -->
   1. Add a new cluster health check request command and its response command;
   ```
   case HEALTH_CHECK:
        checkArgument(cmd.hasHealthCheck());
        handleHealthCheck(cmd.getHealthCheck());
        break;
   
   case HEALTH_CHECK_RESPONSE:
        checkArgument(cmd.hasHealthCheckResponse());
        handleHealthCheckResponse(cmd.getHealthCheckResponse());
        break;            
   ```
   
   2. Add a new admin command to manually update the cluster health status;
   ```
   # Update the cluster health status: available or unavailable (default: available)
   bin/pulsar-admin clusters update-health-status --status unavailable
   ```
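
As a rough illustration of how the admin command and the health check could fit together, the broker side might hold a single status flag that the admin command updates and the health check handler reads. The sketch below assumes that design; `ClusterHealthState` and its methods are hypothetical names and do not come from the PR code.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for the cluster health status; not taken from the PR. */
final class ClusterHealthState {

    enum Status { AVAILABLE, UNAVAILABLE }

    // Defaults to AVAILABLE, matching the default described for the admin command.
    private final AtomicReference<Status> status =
            new AtomicReference<>(Status.AVAILABLE);

    /**
     * Parses a value such as "available" or "unavailable" (case-insensitive)
     * and stores it. Throws IllegalArgumentException for any other value,
     * leaving the current status unchanged.
     */
    void update(String value) {
        Status parsed = Status.valueOf(value.trim().toUpperCase(Locale.ROOT));
        status.set(parsed);
    }

    /** Value a health check handler would report back to the client. */
    Status current() {
        return status.get();
    }
}
```

An `AtomicReference` is used here so that an update from an admin thread is immediately visible to the threads answering health checks, without additional locking.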
   
   For further details, please refer to the PR code.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads 
(10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [x] The binary protocol
   - [ ] The REST endpoints
   - [x] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: <!-- ENTER URL HERE -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
