Hello,

Maybe you are missing recovery="restart" in your services.
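For reference, here is one way to model how recovery="restart" and max_restarts could interact. The model is consistent with the log in the quoted message, where "start on ip ... returned 1" is followed directly by "#68: Failed to start" and "#71: Relocating failed service". It is a hypothetical sketch, not rgmanager source. Three things are assumptions made for illustration: that max_restarts only caps how many in-place restarts are tolerated, that a restart whose start step fails goes straight to relocation, and that the threshold is ">= max_restarts". The function recover() and its parameters are also hypothetical.

```python
from typing import Callable, Tuple


def recover(restart_count: int, max_restarts: int,
            try_local_start: Callable[[], bool]) -> Tuple[str, int]:
    """Hypothetical model of recovery="restart" (NOT rgmanager source).

    Returns (action, new_restart_count).
    """
    if restart_count >= max_restarts:
        # Assumed: the restart budget is used up, so relocate without
        # attempting another local restart.
        return "relocate", restart_count
    restart_count += 1
    if try_local_start():
        # The in-place restart worked; the service stays on this node.
        return "restarted", restart_count
    # Assumed: a failed local start is not retried on the same node,
    # matching "#68: Failed to start" followed by "#71: Relocating".
    return "relocate", restart_count


# NIC down: the ip resource cannot start, so one attempt, then relocation.
print(recover(0, 5, lambda: False))  # ('relocate', 1)

# Transient failures whose restarts succeed: five restarts, then relocation.
count = 0
for failure in range(1, 7):
    action, count = recover(count, 5, lambda: True)
    print(failure, action)  # 1-5 print "restarted", 6 prints "relocate"
```

Under this reading, max_restarts only matters when the local restart itself succeeds. With no link on eth1 the ip resource can never start on that node, so one attempt followed by relocation is the expected outcome.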

2012/10/31 Parvez Shaikh <parvez.h.sha...@gmail.com>

> Hi Digimer,
>
> cman_tool version gives the following:
>
> 6.2.0 config 22
>
> cluster.conf:
>
> <?xml version="1.0"?>
> <cluster alias="PARVEZ" config_version="22" name="PARVEZ">
>   <clusternodes>
>     <clusternode name="myblade2" nodeid="2" votes="1">
>       <fence>
>         <method name="1">
>           <device blade="2" missing_as_off="1" name="BladeCenterFencing-1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="myblade1" nodeid="1" votes="1">
>       <fence>
>         <method name="1">
>           <device blade="1" missing_as_off="1" name="BladeCenterFencing-1"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com" login="XXXX" name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
>   </fencedevices>
>   <rm>
>     <resources>
>       <script file="/localhome/my/my_ha" name="myHaAgent"/>
>       <ip address="192.168.51.51" monitor_link="1"/>
>     </resources>
>     <failoverdomains>
>       <failoverdomain name="mydomain" nofailback="1" ordered="1" restricted="1">
>         <failoverdomainnode name="myblade2" priority="2"/>
>         <failoverdomainnode name="myblade1" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>       <script ref="myHaAgent"/>
>       <ip ref="192.168.51.51"/>
>     </service>
>   </rm>
>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
> </cluster>
>
> Thanks,
> Parvez
>
> On Tue, Oct 30, 2012 at 9:25 PM, Digimer <li...@alteeve.ca> wrote:
>
>> On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
>> > Hi experts,
>> >
>> > I have defined a service as follows in cluster.conf:
>> >
>> > <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>> >   <script ref="myHaAgent"/>
>> >   <ip ref="192.168.51.51"/>
>> > </service>
>> >
>> > I set max_restarts=5 hoping that if the cluster fails to start the service
>> > 5 times, it will then relocate it to another cluster node in the failover
>> > domain.
>> >
>> > To check this, I brought down the NIC hosting the service's floating IP and
>> > got the following logs:
>> >
>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
>> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
>> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
>> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
>> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
>> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
>> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
>> >
>> > But from the log it appears that the cluster tried to restart the service
>> > only ONCE before relocating.
>> >
>> > I was expecting the cluster to retry starting this service five times on
>> > the same node before relocating.
>> >
>> > Can anybody correct my understanding?
>> >
>> > Thanks,
>> > Parvez
>>
>> What version? Please paste your full cluster.conf.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>
> --
> Linux-cluster mailing list
> Linux-cluster@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
This is my life, and I'll live it for as long as God wills.