wu-sheng commented on issue #13492: URL: https://github.com/apache/skywalking/issues/13492#issuecomment-3290687773
> In our use case, the recovery detection mechanism relies on the existing silence-period (configured in minutes) and SkyWalking's inherent metric aggregation window (which typically operates on a minute-based cycle). After the silencePeriod ends, we wait for an additional metric collection cycle (typically one minute) to confirm no new alarms are triggered, before considering the alarm recovered.

This is a possible way, but it should not be the official way. Sometimes the silence period lasts for a while, and there is no point in waiting for it to end.

> If this approach is acceptable, I'll implement the corresponding code to enhance the alarm kernel with recovered status notification capability for alarm rules.

There are two things about this.

1. We need to keep the alarm kernel aware of the triggered status. Then, when the alarm recovers, this status should be reset to normal and a new message should be sent out. A new recovery API should be created.
2. AlarmRecord is an immutable row. How will you keep this recovery status and mark the triggered alarm? We were hesitant about how to implement this.
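To make the two points above concrete, here is a minimal Java sketch of a kernel-side state tracker. It is not SkyWalking code: `AlarmRecoverySketch`, `Event`, `Kind`, `evaluate`, and `history` are hypothetical names introduced only for illustration. It assumes the kernel evaluates each rule once per metric window and keeps a per-alarm "firing" flag, so recovery is detected on the first window where the condition no longer holds, independently of the silence period. Writing recovery as a new append-only event, rather than updating an existing row, is one possible way to live with an immutable AlarmRecord; the thread does not settle on this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not part of the SkyWalking alarm kernel.
class AlarmRecoverySketch {
    enum Kind { TRIGGERED, RECOVERED }

    // Immutable event; a recovery is a new event, not an update of the trigger.
    static final class Event {
        final String alarmKey;
        final Kind kind;
        final long timeBucket;

        Event(String alarmKey, Kind kind, long timeBucket) {
            this.alarmKey = alarmKey;
            this.kind = kind;
            this.timeBucket = timeBucket;
        }
    }

    // Per-alarm firing flag held in kernel memory (assumed key: rule name + entity).
    private final Map<String, Boolean> firing = new HashMap<>();
    // Append-only history standing in for immutable storage rows.
    private final List<Event> history = new ArrayList<>();

    // Called once per metric window with the result of the rule check.
    // Returns an event only when the state changes.
    Optional<Event> evaluate(String alarmKey, boolean conditionMatched, long timeBucket) {
        boolean wasFiring = firing.getOrDefault(alarmKey, false);
        if (conditionMatched == wasFiring) {
            return Optional.empty(); // no transition, nothing to send
        }
        firing.put(alarmKey, conditionMatched);
        Kind kind = conditionMatched ? Kind.TRIGGERED : Kind.RECOVERED;
        Event event = new Event(alarmKey, kind, timeBucket);
        history.add(event);
        return Optional.of(event);
    }

    List<Event> history() {
        return history;
    }
}
```

In this sketch a caller would forward a `RECOVERED` event to whatever recovery hook or API is eventually defined, while silence-period handling for repeated `TRIGGERED` notifications stays a separate concern.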
