chungen0126 opened a new pull request, #10397: URL: https://github.com/apache/ozone/pull/10397
## What changes were proposed in this pull request?

### Summary

Fix intermittent failures in `TestContainerStateMachine#testApplyTransactionFailure`, `TestContainerStateMachine#testContainerStateMachineRestartWithDNChangePipeline`, `testWriteStateMachineDataIdempotencyWithClosedContainer`, and `testApplyTransactionIdempotencyWithClosedContainer`.

### Changes

#### For `testWriteStateMachineDataIdempotencyWithClosedContainer`

The failure stemmed from a race between a retried write operation and a close container request. The test expects idempotency for identical data, but intermittent failures occurred because the initial write and the retry write contained different data.

- Case A (Success): If the close container request executes first, no error occurs.
- Case B (Failure): If the retry write executes before the close container request, a mismatch occurs between the written data "hello" and the committed metadata. Although the container closes successfully, the scanner later marks it as "unhealthy" because of a checksum mismatch.

Fix: Updated the test to ensure data consistency during retries or adjusted the timing expectations to handle the race condition correctly.

#### For `testContainerStateMachineRestartWithDNChangePipeline` and `testApplyTransactionFailure`

These tests failed because of `testContainerStateMachineFailures`, which triggers a Ratis storage reset that breaks existing pipelines. Because these pipelines are closed passively via client-side retries instead of by the ScrubbingService, they remain in the PipelineManager, and subsequent tests that inadvertently select them fail.

Fix: Move `testContainerStateMachineFailures` to the end of the class.

#### For `testApplyTransactionIdempotencyWithClosedContainer`

When the `close container` command finishes sending, the last applied index is not guaranteed to have been updated yet. If `take snapshot` is triggered immediately afterward, the resulting snapshot may not reflect the latest state.
Fix: Added a waiting step after the `close container` command is sent to ensure that the `last applied index` has been fully updated before proceeding to `take snapshot`. This guarantees that the generated snapshot is always up to date with the latest index.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13482
https://issues.apache.org/jira/browse/HDDS-12215
https://issues.apache.org/jira/browse/HDDS-14962
https://issues.apache.org/jira/browse/HDDS-6115

## How was this patch tested?

Before changes: `TestContainerStateMachine` failed 22 times in 20 * 10 iterations.
https://github.com/chungen0126/ozone/actions/runs/26375145366

After changes: `TestContainerStateMachine` passed all 20 * 10 iterations.
https://github.com/chungen0126/ozone/actions/runs/26706385476
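For readers unfamiliar with the waiting step described for `testApplyTransactionIdempotencyWithClosedContainer`, the idea can be sketched as a bounded poll on the last applied index before the snapshot is taken. This is only an illustration and not the code in this patch: the class `LastAppliedIndexWait`, the helper `waitForIndex`, and the `AtomicLong` standing in for the state machine's last applied index are all hypothetical names.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names come from the Ozone code base.
class LastAppliedIndexWait {

  /**
   * Polls until the supplied index is at least targetIndex and returns the
   * observed value, or throws TimeoutException once timeoutMillis has elapsed.
   */
  public static long waitForIndex(LongSupplier lastAppliedIndex, long targetIndex,
      long pollMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      long current = lastAppliedIndex.getAsLong();
      if (current >= targetIndex) {
        return current;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("last applied index is " + current
            + ", expected at least " + targetIndex);
      }
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the state machine's last applied index (assumption).
    AtomicLong applied = new AtomicLong(5);

    // Simulates the close container transaction being applied slightly
    // after the command has been sent.
    Thread applier = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      applied.set(6);
    });
    applier.start();

    // Wait for the index to advance before doing the "take snapshot" step.
    long seen = waitForIndex(applied::get, 6, 10, 5_000);
    applier.join();
    System.out.println("safe to take snapshot at index " + seen);
  }
}
```

The timeout keeps a test from hanging if the index never advances, so a regression shows up as a clear failure rather than a stalled run.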
