youjie23 commented on code in PR #13581:
URL: https://github.com/apache/skywalking/pull/13581#discussion_r2539003357
##########
docs/en/setup/backend/backend-alarm.md:
##########
@@ -40,7 +40,8 @@ The metrics names in the expression could be found in the
[list of all potential
- **Silence period**. After the alarm is triggered at Time-N (TN), there will
be silence during the **TN -> TN + period**.
By default, it works in the same manner as **period**. The same Alarm (having
the same ID in the same metrics name) may only be triggered once within a
period.
- **Recovery observation period**. Defines the number of consecutive periods
that the alarm condition must remain false before the alarm is considered
recovered. When the alarm condition becomes false, the system enters an
observation period. If the condition remains false for the specified number of
periods, a recovery notification is sent. If the condition becomes true again
during the observation period, the alarm returns to the FIRING state.
-The default value is 0, which means immediate recovery notification when the
condition becomes false.
+The default value is 0, which means immediate recovery notification when the
condition becomes false.
+**Notice:** because the alarm will not be triggered again during the silence
period, recovery won't be triggered during the silence period after an alarm is
fired. It will be in the OBSERVING_RECOVERY state, the recovery will be
triggered only after the silence period is over and the condition remains false
for the specified observation periods.
Review Comment:
Sorry for the delay. @wu-sheng
> Because during the silence period, the alarm will not trigger again.
Thank you for your review and patience. @wankai123
Our point is that the alarm should already have fired and been sent to the
webhooks before it transitions to the silenced-firing state.
In the previous code, the transition from silenced-firing to recovered would
only occur when `recovery-observation-period` is set to `0`. This means an
immediate recovery notification is sent when the condition becomes false, as
shown in the following code snippet:
```
public void onMismatch() {
    // other code
    recoveryObservationCountdown--;
    silenceCountdown--;
    switch (currentState) {
        case FIRING:
        case SILENCED:
            // This condition would only be met if the
            // recovery-observation-period is set to 0.
            if (this.recoveryObservationCountdown < 0) {
                transitionTo(State.RECOVERED);
            } else {
                transitionTo(State.OBSERVING_RECOVERY);
            }
            break;
        // other code
    }
}
```
If we remove the condition, the recovery notification will be delayed by one
minute longer than expected.
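To make the first-mismatch behavior concrete, here is a minimal standalone sketch of that branch. It is not the actual SkyWalking code: the class name `RecoveryTransitionSketch` and the helper `afterFirstMismatch` are hypothetical, and it assumes the countdown starts at the configured `recovery-observation-period` when the alarm fires.

```java
// Hypothetical, simplified model of the FIRING/SILENCED branch quoted above.
// Not the real SkyWalking implementation.
class RecoveryTransitionSketch {
    enum State { FIRING, SILENCED, OBSERVING_RECOVERY, RECOVERED }

    // Returns the state reached after the first period in which the alarm
    // condition is false.
    static State afterFirstMismatch(State current, int recoveryObservationPeriod) {
        // Assumption: the countdown is initialized to the configured period.
        int recoveryObservationCountdown = recoveryObservationPeriod;
        recoveryObservationCountdown--;
        switch (current) {
            case FIRING:
            case SILENCED:
                // Only a period of 0 makes the countdown negative here.
                return recoveryObservationCountdown < 0
                    ? State.RECOVERED
                    : State.OBSERVING_RECOVERY;
            default:
                return current;
        }
    }

    public static void main(String[] args) {
        System.out.println(afterFirstMismatch(State.SILENCED, 0)); // RECOVERED
        System.out.println(afterFirstMismatch(State.SILENCED, 1)); // OBSERVING_RECOVERY
    }
}
```

Under that assumption, only a period of `0` lets a silenced alarm recover on the very first mismatch; any positive period goes through `OBSERVING_RECOVERY` first.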
**I want to confirm that we are aligned on this point**: Should the silence
period indeed have an effect on the timing of the recovery notification?
I am not entirely sure that we share the same understanding here.