We see this in the Stork events every few days:
2023-06-20 20:01:23 daemon [2] dhcp4 is unreachable
2023-06-20 20:01:07 Communication with daemon [2] dhcp4 of app kea@10.0.0.231 failed
After a restart of both dhcp4 and stork-agent on that adc1 server, things
work again.
I will have to check the logs in more detail, of course.
Two things:
1) We collect the Prometheus metrics from Stork and visualize them in
Grafana.
storkserver_auth_unreachable_machine_total{instance=~"$instance"}
is always 0, even while the events above are being logged. I would expect
one of the two machines to be marked unreachable at that point. Right?
2) It does not solve the problem at the root, but I am considering setting
up some external monitoring to detect this outage and have the monitoring
restart the daemons.
I use monit (https://mmonit.com/wiki/Monit/ConfigurationExamples) for
such things, and I am thinking of letting it make HTTP API calls to Kea to
check things.
Is that the right approach?
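
To make the idea in 2) concrete, here is a rough sketch of the kind of check
I have in mind. It assumes the Kea Control Agent is reachable over plain HTTP
on port 8000 of 10.0.0.231 (the port and the missing authentication are
assumptions, not our verified setup) and that a "status-get" command forwarded
to dhcp4 comes back as a JSON list whose entries carry "result": 0 on success.
The function names are made up for illustration.

```python
import http.client
import json
import urllib.request

# Assumption: Control Agent URL; the port is a guess, adjust to the real config.
KEA_CA_URL = "http://10.0.0.231:8000/"


def build_command(command="status-get", service="dhcp4"):
    """Return the JSON request body for a Control Agent command."""
    return json.dumps({"command": command, "service": [service]}).encode("utf-8")


def is_healthy(reply):
    """True only if the reply is a non-empty list and every entry has result 0."""
    if not isinstance(reply, list) or not reply:
        return False
    return all(isinstance(r, dict) and r.get("result") == 0 for r in reply)


def check(url=KEA_CA_URL, timeout=5.0):
    """Send status-get via the Control Agent; False on any failure."""
    req = urllib.request.Request(
        url,
        data=build_command(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return is_healthy(json.loads(resp.read().decode("utf-8")))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a reply that is not JSON.
        return False


def main():
    """Exit code for a monitoring wrapper: 0 if dhcp4 answered, 1 otherwise."""
    return 0 if check() else 1
```

A small wrapper script could end with sys.exit(main()), so that monit only
has to look at the exit status and restart the daemons when it is non-zero.
This checks the path Control Agent -> dhcp4 only; whether it also catches the
stork-agent side of the problem is something I would still have to test.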
thanks, Stefan
--
Kea-users mailing list
Kea-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/kea-users