Duansg commented on issue #3757:
URL: https://github.com/apache/hertzbeat/issues/3757#issuecomment-3271839596

   Hi @rty813, thank you for your question. First, to address your inquiry:
   
   I believe this design is appropriate. It prevents an "alert storm" where you 
might receive `Alert A triggered`, followed by `Alert A resolved`, and then 
`Alert A triggered` again within the same timeframe. This ensures messages from 
a group are sent at regular intervals rather than fluctuating with system 
jitter. Therefore, by design, a group will send at most one notification within 
the [group_interval] window, even if its status changes.
   
   I'll share my understanding as well:
   At first glance, this might seem odd: why isn't a notification sent 
immediately when an alert recovers or a new alert fires within the 
[group_interval] window?
   
   I believe the first step is to understand the relationship between these 
configuration items. To put it in simpler terms:
   
   > Wait time [groupWait]: When the first alert is triggered, notifications 
will not be sent immediately. Instead, a short delay period will be observed 
first.
   - "Hold on, don't send it just yet," because more alerts from the same group 
might trigger soon. This way, you can consolidate them into a single message 
and avoid an alert storm.
   - If there are more alerts in the same group, they will be sent together; if 
not, only one alert will be sent.
   
   > Interval [groupInterval]: After a notification has been sent for a group, 
how long must elapse before another notification can be sent for that same group.
   - Alerts within the same group will only be re-notified according to this 
interval window.
   - Even if some of these alert statuses change, this prevents excessive 
notifications from the same group.
   
   > Repeat interval [repeatInterval]: How often a notification for the same 
unresolved alert is resent.
   - If the alert does not resolve, notifications will be periodically resent 
to prevent you from missing critical alert messages.
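   
   To make the interaction of the three settings concrete, here is a minimal 
Python sketch of the behaviour described above. It is a simplified model I 
wrote for illustration, not HertzBeat's actual implementation: the function 
`simulate`, its parameters, and the event tuples are all hypothetical, and it 
assumes a group is only evaluated at the end of each window.

```python
def simulate(events, group_wait, group_interval, repeat_interval, until):
    """Return [(send_time, firing_alerts)] for ONE alert group.

    events: iterable of (time_in_seconds, alert_name, is_firing) tuples.
    Simplified, hypothetical model; not HertzBeat's real code.
    """
    events = sorted(events)
    if not events:
        return []

    sent = []                      # notifications actually delivered
    firing = set()                 # alerts currently firing in the group
    last_sent_state = frozenset()  # group state at the last notification
    last_sent_at = None
    idx = 0

    # groupWait: hold the first notification so nearby alerts can join it.
    check_at = events[0][0] + group_wait
    while check_at <= until:
        # Apply every status change that happened up to this check.
        while idx < len(events) and events[idx][0] <= check_at:
            _, name, is_firing = events[idx]
            if is_firing:
                firing.add(name)
            else:
                firing.discard(name)
            idx += 1

        state = frozenset(firing)
        changed = state != last_sent_state
        # repeatInterval: resend unresolved alerts even if nothing changed.
        repeat_due = (
            bool(state)
            and last_sent_at is not None
            and check_at - last_sent_at >= repeat_interval
        )
        if changed or repeat_due:
            sent.append((check_at, sorted(state)))
            last_sent_state = state
            last_sent_at = check_at

        # groupInterval: the group is looked at again only after this window.
        check_at += group_interval
    return sent


events = [
    (0, "A", True),     # Alert A triggered
    (10, "B", True),    # Alert B triggered during groupWait
    (60, "A", False),   # Alert A resolved ...
    (120, "A", True),   # ... and triggered again inside the same window
    (400, "B", False),  # Alert B resolved
]
print(simulate(events, group_wait=30, group_interval=300,
               repeat_interval=3600, until=5000))
# [(30, ['A', 'B']), (630, ['A']), (4230, ['A'])]
```

   In this run, A and B are merged into one message at t=30 (groupWait), the 
resolve/re-trigger of A between t=60 and t=120 produces no message because the 
group looks the same at the next check, the recovery of B at t=400 is reported 
at t=630 rather than immediately (groupInterval), and the still-firing A is 
resent at t=4230 (repeatInterval).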
   
   
   Additionally:
   - You can refer to the [official 
documentation](https://hertzbeat.apache.org/zh-cn/docs/help/alarm_group).
   - Alternatively, refer to Prometheus's [configuration 
options](https://prometheus.io/docs/alerting/latest/configuration/#route), 
which are consistent with HertzBeat's.
   
   @rty813 @tomsun28 @all If I've misunderstood something, please point it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


