We see this in the stork events every few days:

2023-06-20 20:01:23 daemon [2] dhcp4 is unreachable

2023-06-20 20:01:07 Communication with daemon [2] dhcp4 of app kea@10.0.0.231 failed

After restarting both dhcp4 and stork-agent on that adc1-server, things work again.

I will have to check the logs in more detail, sure.

Two things:

1) We collect the Prometheus metrics from Stork and visualize them in Grafana.

storkserver_auth_unreachable_machine_total{instance=~"$instance"}

is always 0, even while the events above are being logged. I would expect one of the two machines to be counted as unreachable at that point. Is that right?

2) It does not fix the root cause, but I am considering setting up external monitoring that detects this outage and restarts the daemons automatically.

I use monit (https://mmonit.com/wiki/Monit/ConfigurationExamples) for such things, and am thinking of having it make HTTP API calls to Kea to check whether the daemon still responds.

Right approach?
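
A minimal sketch of the check described in 2), to make the idea concrete. It assumes the Kea Control Agent is reachable at http://10.0.0.231:8000/ (port 8000 and the lack of HTTP authentication are assumptions, not taken from our config) and that a healthy dhcp4 daemon answers the forwarded version-get command with result 0. The function names are made up for illustration.

```python
#!/usr/bin/env python3
"""Health check for a Kea daemon via the Kea Control Agent (sketch)."""
import http.client
import json
import urllib.request

# Assumption: address and port of the Control Agent; adjust to the real setup.
CA_URL = "http://10.0.0.231:8000/"


def build_payload(service="dhcp4"):
    """Return the JSON body that asks the agent to forward version-get."""
    return json.dumps({"command": "version-get", "service": [service]}).encode()


def daemon_ok(answer):
    """True if the reply is a non-empty list whose entries all have result 0."""
    return (
        isinstance(answer, list)
        and len(answer) > 0
        and all(isinstance(a, dict) and a.get("result") == 0 for a in answer)
    )


def check(url=CA_URL, service="dhcp4", timeout=5):
    """Query the agent; any network or parse error counts as unhealthy."""
    req = urllib.request.Request(
        url,
        data=build_payload(service),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return daemon_ok(json.load(resp))
    except (OSError, ValueError, http.client.HTTPException):
        return False


def main():
    """Return 0 when the daemon answers, 1 otherwise (usable as exit status)."""
    return 0 if check() else 1

# To run as a script, end the file with: import sys; sys.exit(main())
```

monit could run such a script through a "check program" entry and trigger the restart of dhcp4 and stork-agent when the exit status is non-zero. Note that this only tests the path Control Agent -> dhcp4, not the path Stork server -> stork-agent, so it may not catch every case behind the events above.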

thanks, Stefan
