mysqld_exporter contains some useful example rules for checking heartbeat lag:

    groups:
    - name: example.rules
      rules:
      - record: mysql_heartbeat_lag_seconds
        expr: mysql_heartbeat_now_timestamp_seconds - mysql_heartbeat_stored_timestamp_seconds
      ...
      - alert: MySQLReplicationLag
        expr: (mysql_heartbeat_lag_seconds > 30) and ON(instance) (predict_linear(mysql_heartbeat_lag_seconds[5m], 60 * 2) > 0)

Now, in my case the master server_id may change due to the way we operate our MySQL cluster, and hence we may get the following metrics:

    {instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="2001500"} 0.5187849998474121
    {instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",server_id="3212"} 1594051555.519615

As you can see, for one instance there are multiple series, and only one of them is the right one, namely the one that refers to the correct server_id. In principle it is easy to determine the correct one, as there is also a metric mysql_slave_status_master_server_id which returns the correct server_id as its value:

    mysql_slave_status_master_server_id{instance="batchdb001.mo-staging99-nonprod.dus1.cloud",job="prometheus-mysqld-exporter",master_host="dbmaster001",master_uuid="005e9c3d-baea-11ea-ab06-027e6d15fde3"} 2001500

So for the alert definition I would have to take the server_id into account:

    - alert: MySQLReplicationLag
      expr: (mysql_heartbeat_lag_seconds{server_id="2001500"} > 30) and ON(instance) ...

But how can I do this in my case, where the server_id label has to be compared with the value of another metric (mysql_slave_status_master_server_id)?
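
One possible direction, offered as an untested sketch and not as a confirmed answer: PromQL's count_values aggregation copies each sample value into a label, so it could turn the value of mysql_slave_status_master_server_id into a server_id label that the lag series can be matched against. This assumes that the recorded mysql_heartbeat_lag_seconds series keep their server_id label, and that the label string produced by count_values (for example "2001500") is formatted exactly like the exporter's server_id label. The alert name and the 30 second threshold are taken from the example rules above.

```yaml
groups:
- name: example.rules
  rules:
  - alert: MySQLReplicationLag
    # count_values builds one series per instance with
    # server_id="<value of mysql_slave_status_master_server_id>".
    # "and on (instance, server_id)" then keeps only the lag series whose
    # server_id label matches the current master (assumption: both sides
    # render the id as the same string).
    expr: |
      (
        mysql_heartbeat_lag_seconds
        and on (instance, server_id)
        count_values by (instance) ("server_id", mysql_slave_status_master_server_id)
      ) > 30
```

If the selection works as assumed, the predict_linear clause from the original alert could be combined with it using "and ON(instance)" in the same way as before.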