RongtongJin opened a new issue, #10494:
URL: https://github.com/apache/rocketmq/issues/10494

   ### Problem
   `HATest.testSemiSyncReplica` can be flaky with:
   
   ```text
   expected:<PUT_OK> but was:<FLUSH_SLAVE_TIMEOUT>
   ```
   
   The test setup waits until the slave-side HA client enters `TRANSFER`, then 
immediately starts semi-sync writes. Entering `TRANSFER` only proves the slave 
has connected locally. The master-side `HAConnection` may not have received the 
slave's initial offset report yet, leaving `slaveAckOffset` at `-1` during the 
first synchronous replication request.
   
   ### Impact
   On slower or busy CI machines, the first `asyncPutMessage` can race the 
initial slave ack report and time out even though the HA connection is 
otherwise healthy.
   
   ### Proposed fix
   Make the test wait for the actual readiness condition needed by semi-sync 
replication: the master-side HA connection is in `TRANSFER` and its 
`slaveAckOffset` has caught up to the slave's current max physical offset 
before sending messages.
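
As a rough illustration of the wait described above, here is a minimal, self-contained sketch. It does not use RocketMQ's classes: `SemiSyncReadiness`, `awaitTrue`, and `masterReady` are hypothetical names, and the state string and offsets are stand-ins for whatever the test would actually read from the master-side HA connection and the slave's store.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class SemiSyncReadiness {

    /** Polls the condition until it holds or the timeout elapses. */
    static boolean awaitTrue(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison against the nanoTime deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }

    /**
     * Hypothetical readiness predicate: the master-side connection is in
     * TRANSFER and the acked offset has reached the slave's max physical offset.
     */
    static boolean masterReady(String masterState, long slaveAckOffset, long slaveMaxPhyOffset) {
        return "TRANSFER".equals(masterState) && slaveAckOffset >= slaveMaxPhyOffset;
    }

    public static void main(String[] args) throws Exception {
        // Simulated master-side view: no offset report received yet.
        AtomicLong slaveAckOffset = new AtomicLong(-1L);
        long slaveMaxPhyOffset = 0L;

        // Simulated slave: its initial offset report arrives a little later.
        Thread reporter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            slaveAckOffset.set(0L);
        });
        reporter.start();

        boolean ready = awaitTrue(
            () -> masterReady("TRANSFER", slaveAckOffset.get(), slaveMaxPhyOffset),
            5_000, 10);
        reporter.join();
        System.out.println("ready = " + ready);
    }
}
```

In the real test, the lambda would presumably read the state and ack offset from the master's connection object rather than from local variables. The point of the sketch is only that the wait is bounded and checks both conditions together before the first semi-sync write.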
   
   ### Validation
   Ran locally with Maven 3.9.9:
   
   ```bash
   mvn -pl store -am -Dtest=HATest#testSemiSyncReplica -DskipITs -DfailIfNoTests=false test
   mvn -pl store -am -Dtest=HATest -DskipITs -DfailIfNoTests=false test
   ```
   
   The full `HATest` run reported `Tests run: 4, Failures: 0, Errors: 0, 
Skipped: 1`.
   

