Duansg commented on issue #3757: URL: https://github.com/apache/hertzbeat/issues/3757#issuecomment-3271839596
Hi @rty813, thank you for your question.

To answer it directly: I believe this design is appropriate. It prevents an "alert storm" in which you receive `Alert A triggered`, then `Alert A resolved`, then `Alert A triggered` again within the same timeframe. Messages from a group are sent at regular intervals instead of fluctuating with system jitter. By design, therefore, a group sends at most one notification within the [group_interval] window, even if its status changes.

Here is my understanding as well. At first glance this can seem odd: why isn't a notification sent immediately when an alert recovers or a new alert fires within the [group_interval] window? I believe the first step is to understand how these configuration items relate to each other. In simpler terms:

> Wait time [groupWait]: When the first alert in a group is triggered, the notification is not sent immediately; a short delay is observed first.

- "Hold on, don't send it just yet," because more alerts from the same group might trigger soon. Waiting lets them be consolidated into a single message and avoids an alert storm.
- If more alerts arrive in the same group during the wait, they are sent together; if not, only the one alert is sent.

> Interval [groupInterval]: After a notification has been sent for a group, how long must elapse before that group can send another one.

- Alerts within the same group are only re-notified according to this interval window.
- Even if some of the alert statuses change during the window, this prevents excessive notifications from the same group.

> Repeat Interval [repeatInterval]: How often the notification for the same unresolved alert is repeated.

- If the alert does not resolve, notifications are resent periodically so that you do not miss critical alert messages.

Additionally:

- You can refer to the [official documentation](https://hertzbeat.apache.org/zh-cn/docs/help/alarm_group).
- Alternatively, refer to Prometheus Alertmanager's [configuration options](https://prometheus.io/docs/alerting/latest/configuration/#route), which are consistent with HertzBeat's.

@rty813 @tomsun28 @all If I've misunderstood something, please point it out.
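
To make the timing concrete, here is a minimal Python sketch of how the three settings could interact for a single group. It is a toy model based only on my description above, not HertzBeat's actual implementation: the `GroupNotifier` class, its `on_event` and `tick` methods, and the values 10 s / 60 s / 300 s are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GroupNotifier:
    """Toy model of one alert group (hypothetical, not HertzBeat code)."""
    group_wait: int        # seconds to wait before the first notification
    group_interval: int    # minimum seconds between notifications of a changed group
    repeat_interval: int   # seconds before an unchanged, still-firing group is re-sent
    first_seen: Optional[int] = None
    last_sent: Optional[int] = None
    state: dict = field(default_factory=dict)          # alert name -> status
    last_snapshot: dict = field(default_factory=dict)  # state at the last send
    sent: list = field(default_factory=list)           # (time, snapshot) per send

    def on_event(self, now, name, status):
        """Record a status change; nothing is sent from here."""
        if self.first_seen is None:
            self.first_seen = now
        self.state[name] = status

    def tick(self, now):
        """Called once per second; decides whether a notification is due."""
        if self.first_seen is None:
            return
        if self.last_sent is None:
            # First notification: hold it back for group_wait.
            due = now - self.first_seen >= self.group_wait
        elif self.state != self.last_snapshot:
            # Group changed: still wait until group_interval has elapsed.
            due = now - self.last_sent >= self.group_interval
        else:
            # Nothing changed: repeat only while something is still firing.
            still_firing = "firing" in self.state.values()
            due = still_firing and now - self.last_sent >= self.repeat_interval
        if due:
            self.last_snapshot = dict(self.state)
            self.sent.append((now, self.last_snapshot))
            self.last_sent = now


# Assumed example values, chosen only to keep the arithmetic easy to follow.
n = GroupNotifier(group_wait=10, group_interval=60, repeat_interval=300)
events = {
    0: ("A", "firing"),
    5: ("B", "firing"),
    20: ("A", "resolved"),
    30: ("A", "firing"),
    40: ("A", "resolved"),
}
for t in range(0, 401):
    if t in events:
        n.on_event(t, *events[t])
    n.tick(t)

times = [t for t, _ in n.sent]
print(times)  # [10, 70, 370]
```

In this model, A and B are merged into one message at t=10 (group wait), the three status flips of A between t=20 and t=40 collapse into a single message at t=70 (group interval), and the unchanged group with B still firing is re-sent at t=370 (repeat interval).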
