Hi, I am already using recovery="restart", as is evident from the cluster.conf I attached earlier.
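For reference, this is the service block in question. One thing I have not yet ruled out (an assumption on my part, not something anyone in this thread has confirmed): rgmanager's service resource also documents a restart_expire_time attribute. As I understand it, max_restarts is a threshold on the number of restarts tolerated within that time window, not a number of retries for a start that fails. That would match the logs in my original message, where the failed start during recovery (#68) led straight to relocation (#71). A variant I could test, with an arbitrary 600-second window:

```xml
<!-- Sketch only: assumes this rgmanager version supports restart_expire_time.
     600 seconds is an arbitrary example window, not a recommendation. -->
<service autostart="0" domain="mydomain" exclusive="0"
         max_restarts="5" restart_expire_time="600"
         name="mgmt" recovery="restart">
        <script ref="myHaAgent"/>
        <ip ref="192.168.51.51"/>
</service>
```

Any edit to cluster.conf also needs config_version raised (22 to 23 here) before it is propagated to the other node.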
Thanks,
Parvez

On Wed, Oct 31, 2012 at 2:53 PM, emmanuel segura <emi2f...@gmail.com> wrote:
> Hello,
>
> Maybe you are missing recovery="restart" in your services.
>
> 2012/10/31 Parvez Shaikh <parvez.h.sha...@gmail.com>
>
>> Hi Digimer,
>>
>> cman_tool version gives the following:
>>
>> 6.2.0 config 22
>>
>> cluster.conf:
>>
>> <?xml version="1.0"?>
>> <cluster alias="PARVEZ" config_version="22" name="PARVEZ">
>>   <clusternodes>
>>     <clusternode name="myblade2" nodeid="2" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device blade="2" missing_as_off="1" name="BladeCenterFencing-1"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>     <clusternode name="myblade1" nodeid="1" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device blade="1" missing_as_off="1" name="BladeCenterFencing-1"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>   </clusternodes>
>>   <cman expected_votes="1" two_node="1"/>
>>   <fencedevices>
>>     <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com" login="XXXX" name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
>>   </fencedevices>
>>   <rm>
>>     <resources>
>>       <script file="/localhome/my/my_ha" name="myHaAgent"/>
>>       <ip address="192.168.51.51" monitor_link="1"/>
>>     </resources>
>>     <failoverdomains>
>>       <failoverdomain name="mydomain" nofailback="1" ordered="1" restricted="1">
>>         <failoverdomainnode name="myblade2" priority="2"/>
>>         <failoverdomainnode name="myblade1" priority="1"/>
>>       </failoverdomain>
>>     </failoverdomains>
>>     <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>>       <script ref="myHaAgent"/>
>>       <ip ref="192.168.51.51"/>
>>     </service>
>>   </rm>
>>   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
>> </cluster>
>>
>> Thanks,
>> Parvez
>>
>> On Tue, Oct 30, 2012 at 9:25 PM, Digimer <li...@alteeve.ca> wrote:
>>
>>> On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
>>> > Hi experts,
>>> >
>>> > I have defined a service in cluster.conf as follows:
>>> >
>>> > <service autostart="0" domain="mydomain" exclusive="0" max_restarts="5" name="mgmt" recovery="restart">
>>> >   <script ref="myHaAgent"/>
>>> >   <ip ref="192.168.51.51"/>
>>> > </service>
>>> >
>>> > I set max_restarts="5" expecting that if the cluster fails to start the service 5 times, it will relocate the service to another node in the failover domain.
>>> >
>>> > To test this, I brought down the NIC hosting the service's floating IP and got the following logs:
>>> >
>>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
>>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
>>> > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
>>> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
>>> > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
>>> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
>>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
>>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
>>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
>>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
>>> > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
>>> > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
>>> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
>>> > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
>>> >
>>> > But from the log, it appears that the cluster tried to restart the service only ONCE before relocating it.
>>> >
>>> > I was expecting the cluster to retry starting this service five times on the same node before relocating it.
>>> >
>>> > Can anybody correct my understanding?
>>> >
>>> > Thanks,
>>> > Parvez
>>>
>>> What version? Please paste your full cluster.conf.
>>>
>>> --
>>> Digimer
>>> Papers and Projects: https://alteeve.ca/w/
>>> What if the cure for cancer is trapped in the mind of a person without
>>> access to education?
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> This is my life, and I will live it for as long as God wills.