Hi experts, I have defined a service as follows in cluster.conf -
<service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
    <script ref="myHaAgent"/>
    <ip ref="192.168.51.51"/>
</service>

I set max_restarts=5 expecting that the cluster would relocate the service to another node in the failover domain only after failing to start it 5 times. To check this, I took down the NIC hosting the service's floating IP and got the following logs -

Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
*Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
*Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped

But from the log it appears that the cluster tried to restart the service only ONCE before relocating it. I was expecting the cluster to retry starting the service five times on the same node before relocating.

Can anybody correct my understanding?

Thanks,
Parvez
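
P.S. In case it helps anyone reproduce the check, this is roughly how the recovery attempts can be counted from the log. It is only a minimal sketch: the function name and the sample text are mine, and it assumes the rgmanager messages are worded exactly like the ones in the log above ("Recovering failed service ..." for each local recovery attempt, "Relocating failed service ..." for the relocation).

```python
import re


def count_recoveries_before_relocation(log_text: str, service: str) -> int:
    """Count local recovery attempts logged before the first relocation.

    Hypothetical helper: it relies only on the message wording seen in the
    log excerpt from this post, not on any documented rgmanager log format.
    """
    name = re.escape(service) + r"(?!\w)"  # avoid matching e.g. service:mgmt2
    relocation = re.search(r"Relocating failed service " + name, log_text)
    # Only look at what was logged before the relocation (or at everything
    # if the service was never relocated).
    head = log_text if relocation is None else log_text[: relocation.start()]
    return len(re.findall(r"Recovering failed service " + name, head))


# Lines copied from the log excerpt in this post.
SAMPLE_LOG = """\
Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt
"""

if __name__ == "__main__":
    print(count_recoveries_before_relocation(SAMPLE_LOG, "service:mgmt"))  # prints 1
```

On my log this reports a single recovery attempt before the relocation, which is what prompted the question.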
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster