On Saturday, 13 May 2023 at 03:26:18 UTC+1 Christoph Anton Mitterer wrote:
>> (If there is jitter in the sampling time, then occasionally it might look at 4 or 6 samples)
>
> Jitter in the sense that the samples are taken at slightly different times?

Yes. Each sample is timestamped with the time the scrape took place.

Consider a 5 minute window which generally contains 5 samples at 1 minute intervals:

|...*......*......*......*......*....|...*....

Now consider what happens when one of those samples is right on the boundary of the window:

|*......*......*......*......*.......|*.......

Depending on the exact timings that the scrape takes place, it's possible that the first sample could fall outside:

*|......*......*......*......*.......|*.......

Or the next sample could fall inside:

|*......*......*......*......*......*|.......

> Do you think that could affect the desired behaviour? In my experience, the scraping regularity of Prometheus is very good (just try putting "up[5m]" into the PromQL browser and looking at the timestamps of the samples, they seem to increment in exact intervals).

So it's unlikely to happen much, though it might when the system is under high load, I guess. Or it might never happen, if Prometheus writes the timestamps of the times it *wanted* to make the scrape, not when it actually occurred. Determining that would require looking in the source code.

> Another point I basically don't understand... how does all that relate to the scrape intervals?
> The plain up == 0 simply looks at the most recent sample (going back up to 5m as you've said in the other thread).
> The series up[Ns] looks back N seconds, giving whichever samples are between then and now. AFAIU, there it doesn't go "automatically" back any further (like the 5m above), right?

That's correct.

So if you're trying to make mutually exclusive expressions which fire in case A but not B, and in case B but not A, then you'd probably be better off writing them both to use up[5m].

    min_over_time(up[5m]) == 0    # use this instead of "up == 0 // for: 5m" for the main alert.
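
To make the window diagrams above concrete, here is a small Python sketch. It is not Prometheus code, only a toy model: the names range_select and instant_select are made up for this example, and the boundary handling (left-open, right-closed) is an assumption that may not match every Prometheus version. It shows the same 5 minute window returning 5, 4 or 6 samples depending on jitter, and a plain "up" looking back up to 5 minutes while "up[1m]" does not.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (scrape timestamp in seconds, value of "up")

LOOKBACK = 300.0  # 5m lookback for a plain "up", as described in the thread


def range_select(samples: List[Sample], now: float, window: float) -> List[Sample]:
    """Toy model of up[window]: samples with now - window < t <= now.

    Assumption: left-open, right-closed boundaries. Real Prometheus
    versions may treat a sample exactly on the boundary differently.
    """
    return [(t, v) for (t, v) in samples if now - window < t <= now]


def instant_select(samples: List[Sample], now: float) -> Optional[float]:
    """Toy model of plain "up": newest sample that is at most LOOKBACK old."""
    recent = range_select(samples, now, LOOKBACK)
    if not recent:
        return None
    return max(recent, key=lambda s: s[0])[1]


def scrapes(timestamps: List[float], value: float = 1.0) -> List[Sample]:
    """Build samples that all carry the same value."""
    return [(t, value) for t in timestamps]


# Evaluate at t=300 with a 5m window and a nominal 60s scrape interval.
typical = scrapes([30, 90, 150, 210, 270, 330])
first_outside = scrapes([-0.5, 60, 120, 180, 240, 300.5])
both_inside = scrapes([0.5, 60, 120, 180, 240, 299.5])

counts = [len(range_select(s, 300, 300)) for s in (typical, first_outside, both_inside)]
print(counts)  # [5, 4, 6]

# A single old sample: plain "up" still sees it, "up[1m]" does not.
gap = scrapes([100])
print(instant_select(gap, 300))    # 1.0
print(range_select(gap, 300, 60))  # []
```

Whether real scrape timestamps ever drift enough to produce the 4 or 6 sample cases is exactly the open question above; the sketch only shows what the query would return if they did.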
> In order for the for: to work I need at least two samples

No, you just need two rule evaluations. The rule evaluation interval doesn't have to be the same as the scrape interval, and even if they are the same, they are not synchronized.

>> If what I've written above is correct (and it may well not be!), then
>>
>>     expr: up == 0
>>     for: 5m
>>
>> will fire if "up" is zero for 6 cycles, whereas

(*rule evaluation* cycles, if your rule evaluation interval is 1m)

> As far as I understand you... 6 cycles of rule evaluation interval... with at least two samples within that interval, right?

No. The expression "up" is evaluated at each rule evaluation time, and it gives the most recent value of "up", looking back up to 5 minutes.

So if you had a scrape interval of 2 minutes and a rule evaluation interval of 1 minute, it could be that two rule evaluations of "up" see the same scraped value. (This can also happen in real life with a 1 minute scrape interval, if you have a failed scrape.)

> Once an alert fires (in Prometheus), even if just for one evaluation interval cycle... and there is no inhibition rule or so in Alertmanager... is it expected that a notification is sent out for sure, regardless of Alertmanager's grouping settings?

There is group_wait. If the alert were to trigger and clear within the group_wait interval, I'd expect no alert to be sent. But I've not tested that.

> Like when the alert fires for one short 15s evaluation interval and clears again afterwards... but group_wait: is set to some 7d... is it expected to send that single firing event after 7d, even if it has resolved already once the 7d are over and there was e.g. no further firing in between?

You'll need to test it, but my expectation would be that it wouldn't send *anything* for 7 days (while it waits for other similar alerts to appear), and if all alerts have disappeared within that period, that nothing would be sent. However, I don't know if the 7 day clock resets as soon as all alerts go away, or if it continues to tick.
If this matters to you, then test it.

Nobody in their right mind would use 7d for group_wait of course. Typically you might set it to around a minute, so that if a bunch of similar alerts fire within that 1 minute period, they are gathered together into a single notification rather than a slew of separate notifications.

HTH,

Brian.