Hello! This is somewhere between a feature request and a number of questions to help me understand some of the design decisions made in Alertmanager.
When Alertmanager cannot expand a template, for example because the operator has made a mistake in the template:

    receivers:
      - name: test
        email_configs:
          - to: exam...@example.com
            from: nore...@example.com
            smarthost: 127.0.0.1:8585
            require_tls: false
            text: "{{ $labels.foo }}"
    route:
      receiver: test
      group_wait: 30s
      group_interval: 1m
      repeat_interval: 1m

it logs an error similar to the following:

    ts=2023-02-07T13:28:04.815Z caller=dispatch.go:352 level=error component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="test/email[0]: notify retry canceled due to unrecoverable error after 1 attempts: execute text template: template: :1: undefined variable \"$labels\""

I understand that following this error Alertmanager will begin the retry stage of the notification until the next group_interval or repeat_interval. To resolve the issue, the user must fix their template and reload Alertmanager.

However, it seems to me that it's not uncommon to have quite complex templates, with loops, if statements, and sub-templates. It can be quite difficult to verify the correctness of these templates at "compile-time", and when using amtool you need to test all possible branches in the template.

While I appreciate that the responsibility for writing correct templates lies with the user, I have also been considering whether Alertmanager should be more tolerant of template errors and attempt to send some kind of notification when this happens. For example, it could fall back to the default template, which we have high confidence is correct.

However, before discussing the issue further, I would first like to understand whether there is a conscious design choice behind how Alertmanager operates under such failures, or whether it came about perhaps due to ease of implementation.

Thank you, and I'm very interested to hear your opinions.

Kind regards,
George
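P.S. To make the fallback idea a little more concrete, here is a rough sketch in Go using only the standard text/template package. It is not Alertmanager's code: render and renderWithFallback are hypothetical names, the data map is a stand-in for whatever Alertmanager actually passes to its templates, and the one-line fallback string merely stands in for the real default template. The only thing it is meant to show is the shape of "try the user's template, and if parsing or executing it fails, render a known-good one instead".

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses and executes one template string against data.
// Both parse errors and execution errors are returned to the caller.
func render(text string, data any) (string, error) {
	t, err := template.New("").Option("missingkey=error").Parse(text)
	if err != nil {
		return "", fmt.Errorf("parse: %w", err)
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("execute: %w", err)
	}
	return buf.String(), nil
}

// renderWithFallback is a hypothetical helper, not an Alertmanager API.
// It tries the user's template first and falls back to fallbackTmpl if
// the user's template cannot be parsed or executed. usedFallback tells
// the caller that the notification does not match what was configured.
func renderWithFallback(userTmpl, fallbackTmpl string, data any) (out string, usedFallback bool, err error) {
	out, err = render(userTmpl, data)
	if err == nil {
		return out, false, nil
	}
	userErr := err
	out, err = render(fallbackTmpl, data)
	if err != nil {
		return "", true, fmt.Errorf("user template failed (%v); fallback also failed: %w", userErr, err)
	}
	return out, true, nil
}

func main() {
	// Assumed, simplified data shape for illustration only.
	data := map[string]any{
		"CommonLabels": map[string]string{"alertname": "HighLatency"},
	}

	// The broken template from the config above, plus a trivial fallback.
	out, usedFallback, err := renderWithFallback(
		`{{ $labels.foo }}`,
		`Alert: {{ .CommonLabels.alertname }}`,
		data,
	)
	fmt.Println(out, usedFallback, err)
}
```

A real implementation would presumably also want to log the original template error and perhaps mention it in the notification itself, so that the operator notices the fallback was used.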