rty813 commented on issue #3757: URL: https://github.com/apache/hertzbeat/issues/3757#issuecomment-3297423188
I used AI to read the code of group convergence and found a problem.

> After the cache is cleared, even if the alert is still in the FIRING state, the repeatInterval notification will not be triggered!
>
> Because:
> - The alertFingerprints cache is cleared after each send.
> - The cache is empty the next time it's checked, so sendGroupAlert returns directly.
> - The repeatInterval logic is inside sendGroupAlert and cannot be executed.
>
> ### Design Comparison
> #### Prometheus Alertmanager's Approach
> The Prometheus Alertmanager does not clear the alert cache. Instead, it:
>
> - Persists the alert until a RESOLVED signal is received
> - Continues to send unresolved alerts based on the repeat_interval
> - Removes the alert only when a clear RESOLVED signal is received
>
> #### Current Implementation Issues
> ```
> // Problematic code location
> cache.getAlertFingerprints().clear(); // Should not clear unconditionally!
> ```
> Should:
> - Only clear resolved (RESOLVED) alerts
> - Retain FIRING alerts to support repeated notifications

<img width="3600" height="5305" alt="Image" src="https://github.com/user-attachments/assets/cdcc2c0d-444a-407e-b21d-5305735bf936" />
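To make the suggested change concrete, here is a minimal sketch of "clear only RESOLVED, keep FIRING". It is not HertzBeat's actual code: `GroupCacheSketch`, `Status`, `put`, and the signature of `sendGroupAlert` are hypothetical stand-ins, and it ignores group wait/interval handling so that only the repeat-interval behaviour is visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the group cache; names are assumptions, not HertzBeat's API.
class GroupCacheSketch {
    enum Status { FIRING, RESOLVED }

    // fingerprint -> latest known status of that alert
    private final Map<String, Status> alertFingerprints = new LinkedHashMap<>();
    private long lastSendMillis = -1; // -1 means nothing has been sent yet

    void put(String fingerprint, Status status) {
        alertFingerprints.put(fingerprint, status);
    }

    // Returns the fingerprints included in this notification (empty if nothing was sent).
    List<String> sendGroupAlert(long nowMillis, long repeatIntervalMillis) {
        if (alertFingerprints.isEmpty()) {
            return List.of(); // nothing cached, nothing to send
        }
        boolean firstSend = lastSendMillis < 0;
        if (!firstSend && nowMillis - lastSendMillis < repeatIntervalMillis) {
            return List.of(); // still inside the repeat interval
        }
        List<String> sent = new ArrayList<>(alertFingerprints.keySet());
        lastSendMillis = nowMillis;
        // Drop only RESOLVED entries; FIRING entries stay so the next repeat can fire.
        alertFingerprints.values().removeIf(s -> s == Status.RESOLVED);
        return sent;
    }
}
```

With an unconditional `clear()` in place of the `removeIf` line, the cache would be empty on the next check and the method would return early, which is the behaviour described above.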
