kssumin opened a new pull request, #7674:
URL: https://github.com/apache/incubator-seata/pull/7674

   
   
   - [x] I have read the [CONTRIBUTING.md](https://github.com/apache/incubator-seata/blob/2.x/CONTRIBUTING.md) guidelines.
   - [x] I have registered the PR [changes](https://github.com/apache/incubator-seata/tree/2.x/changes) in `changes/en-us/2.x.md` and `changes/zh-cn/2.x.md`.
   
   ### Ⅰ. Describe what this PR did
   
   This PR implements early rollback of global transactions when the Transaction Manager (TM) disconnects, addressing issue #4422.
   
   **Key Features:**
   - Add `TMDisconnectHandler` interface for handling TM disconnect events
   - Implement `DefaultTMDisconnectHandler` with VGroup-based matching logic
   - Integrate TM disconnect detection in `AbstractNettyRemotingServer`
   - Add configuration option `server.enableRollbackWhenDisconnect` (default: `false`)
   - Performance improvement: reduces the rollback delay after a TM disconnect from the 60-second default timeout to under 1 second
   
   **Implementation Details:**
   - **Primary matching**: TransactionServiceGroup (VGroup) - community 
consensus approach
   - **Secondary safety check**: ApplicationId matching when available
   - **Status transition**: BEGIN → TimeoutRollbacking → Rollback
   - **Configuration**: `server.enableRollbackWhenDisconnect=false` (disabled 
by default for safety)
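
The matching rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Session` and `TmInfo` types and the `shouldRollback` method are hypothetical stand-ins, and the sketch assumes that "when available" means the ApplicationId check is skipped if either side does not carry one.

```java
/** Illustrative sketch only: every type here is a hypothetical stand-in, not Seata code. */
public class DisconnectMatchSketch {

    /** Only the two statuses this check needs to distinguish. */
    enum Status { BEGIN, TIMEOUT_ROLLBACKING }

    /** Hypothetical view of a global session held by the server. */
    static final class Session {
        final String vgroup;
        final String applicationId; // may be null when unknown
        final Status status;

        Session(String vgroup, String applicationId, Status status) {
            this.vgroup = vgroup;
            this.applicationId = applicationId;
            this.status = status;
        }
    }

    /** Hypothetical identity of the TM whose connection just closed. */
    static final class TmInfo {
        final String vgroup;
        final String applicationId; // may be null when unknown

        TmInfo(String vgroup, String applicationId) {
            this.vgroup = vgroup;
            this.applicationId = applicationId;
        }
    }

    /** Returns true when the session should be rolled back early for this TM. */
    static boolean shouldRollback(Session session, TmInfo tm) {
        // Only transactions still in BEGIN status are candidates.
        if (session.status != Status.BEGIN) {
            return false;
        }
        // Primary match: the transaction service group (VGroup) must be equal.
        if (tm.vgroup == null || !tm.vgroup.equals(session.vgroup)) {
            return false;
        }
        // Secondary safety check (assumption): compare ApplicationIds only
        // when both sides provide one.
        if (tm.applicationId != null && session.applicationId != null) {
            return tm.applicationId.equals(session.applicationId);
        }
        return true;
    }

    public static void main(String[] args) {
        TmInfo tm = new TmInfo("order_tx_group", "order-service");
        // Same VGroup and ApplicationId, still in BEGIN: prints true
        System.out.println(shouldRollback(
                new Session("order_tx_group", "order-service", Status.BEGIN), tm));
        // Same VGroup but a different ApplicationId: prints false
        System.out.println(shouldRollback(
                new Session("order_tx_group", "stock-service", Status.BEGIN), tm));
        // Already past BEGIN: prints false
        System.out.println(shouldRollback(
                new Session("order_tx_group", "order-service", Status.TIMEOUT_ROLLBACKING), tm));
    }
}
```

Checking the status first keeps the rule "only affects transactions in BEGIN status" independent of how the identity matching evolves.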
   
   ### Ⅱ. Does this pull request fix one issue?
   
   Yes, this PR fixes issue #4422: "rollback of global transactions ahead of 
time"
   
   **Problem**: When the TM crashes or disconnects, its global transactions remain in BEGIN status until the 60-second default timeout expires, holding resource locks and degrading system performance.
   
   **Solution**: This feature detects the TM disconnect immediately and starts rollback right away, reducing resource lock time from 60 seconds to <1 second.
   
   ### Ⅲ. Why don't you add test cases (unit test/integration test)?
   
   I have added comprehensive test coverage:
   
   **Unit Tests:**
   - `DefaultTMDisconnectHandlerTest`: Tests core rollback logic and edge cases
   - `AbstractNettyRemotingServerTMDisconnectTest`: Tests server-side 
integration
   
   **Integration Tests:**
   - `TMDisconnectIntegrationTest`: Tests end-to-end TM disconnect scenarios
   - `DefaultCoordinatorInitTest`: Tests handler initialization and wiring
   
   **Test Coverage:**
   - Configuration enabled/disabled scenarios
   - VGroup and ApplicationId matching logic
   - Error handling and transaction state transitions
   - Mock-based testing for cross-module dependencies
   
   ### Ⅳ. Describe how to verify it
   
   **Testing:**
   ```bash
   # Run related tests
   mvn test -Dtest="*TMDisconnect*,*DefaultCoordinator*" -pl server,core
   
   # All tests pass with expected behavior:
   # - TM disconnect detection works correctly
   # - Rollback logic matches VGroup properly
   # - Configuration controls feature activation
   # - Error scenarios are handled gracefully
   ```
   
   **Manual Verification:**
   1. Enable the feature: `server.enableRollbackWhenDisconnect=true`
   2. Start a global transaction, then simulate a TM disconnect
   3. Verify that rollback starts within 1 second instead of after the 60-second timeout (check the server logs)
   4. Confirm that resource locks are released promptly
   
   **Performance Test:**
   - Before: resource locks held for 60 seconds (default timeout)
   - After: rollback starts in <1 second
   - Resource recovery is at least ~60x faster (60s vs <1s)
   
   ### Ⅴ. Special notes for reviews
   
   **Safety Considerations:**
   - Feature is **disabled by default** 
(`server.enableRollbackWhenDisconnect=false`)
   - Uses conservative VGroup + ApplicationId matching to prevent false 
positives
   - Only affects transactions in BEGIN status
   - Maintains backward compatibility
   
   **Key Code Areas:**
   - `DefaultTMDisconnectHandler.shouldRollbackSession()`: Core matching logic
   - `AbstractNettyRemotingServer.handleDisconnect()`: TM disconnect detection
   - Configuration integration for feature toggle
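
As a rough picture of how these areas fit together, the sketch below gates a disconnect callback behind the feature toggle and forwards only TM disconnects to a handler. It is a simplified illustration under stated assumptions: `Role`, `DisconnectHandler`, `onTmDisconnect`, and `Server` are hypothetical names, and the real `TMDisconnectHandler` interface and `handleDisconnect()` signature in this PR may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: names and signatures are hypothetical, not Seata's API. */
public class DisconnectFlowSketch {

    /** Hypothetical role of the client behind a closed connection. */
    enum Role { TM, RM }

    /** Hypothetical callback invoked when a TM connection is lost. */
    interface DisconnectHandler {
        void onTmDisconnect(String vgroup, String applicationId);
    }

    /** Hypothetical server-side hook that reacts to closed connections. */
    static final class Server {
        private final boolean enableRollbackWhenDisconnect;
        private final DisconnectHandler handler;

        Server(boolean enableRollbackWhenDisconnect, DisconnectHandler handler) {
            this.enableRollbackWhenDisconnect = enableRollbackWhenDisconnect;
            this.handler = handler;
        }

        void handleDisconnect(Role role, String vgroup, String applicationId) {
            // Feature toggle: do nothing when disabled (the default).
            if (!enableRollbackWhenDisconnect) {
                return;
            }
            // Only TM disconnects trigger early rollback.
            if (role != Role.TM) {
                return;
            }
            handler.onTmDisconnect(vgroup, applicationId);
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        DisconnectHandler recorder = (vgroup, appId) -> events.add(vgroup + "/" + appId);

        Server enabled = new Server(true, recorder);
        enabled.handleDisconnect(Role.TM, "order_tx_group", "order-service"); // forwarded
        enabled.handleDisconnect(Role.RM, "order_tx_group", "order-service"); // ignored

        Server disabled = new Server(false, recorder);
        disabled.handleDisconnect(Role.TM, "order_tx_group", "order-service"); // ignored

        System.out.println(events); // prints [order_tx_group/order-service]
    }
}
```

Placing the toggle check first means a disabled server returns before doing any role or session work, which is consistent with the "no overhead when disabled" claim.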
   
   **Performance Impact:**
   - No performance overhead when disabled (default)
   - Minimal overhead when enabled (event-driven processing)
   - Significant improvement in failure scenarios (60s → <1s)
   
   **Breaking Changes:**
   - None. Feature is disabled by default and fully backward compatible
   - New configuration option: `server.enableRollbackWhenDisconnect=false`
   
   **Additional Notes:**
   - This addresses a long-standing performance issue in production environments
   - VGroup-based matching follows community consensus approach from issue 
discussions
   - Implementation is conservative to prevent false positive rollbacks
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


