Digimer,

Output of 'rpm -q cman' -
cman-2.0.115-34.el5

There is no http mentioned in the fencedevice entry; I think the email client
is inserting it.

Thanks,
Parvez

On Wed, Oct 31, 2012 at 10:14 AM, Digimer <[email protected]> wrote:
> What does 'rpm -q cman' return?
>
> This looks very odd;
>
> <fencedevice agent="fence_bladecenter"
>     ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>"
>
> Please remove this for now;
>
> <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="0"/>
>
> In general, you don't want to assume a clean start. It's asking for
> trouble. The default delays are also sane. You can always come back to
> this later after this issue is resolved, if you wish.
>
> On 10/30/2012 09:20 PM, Parvez Shaikh wrote:
> > Hi Digimer,
> >
> > cman_tool version gives the following -
> >
> > 6.2.0 config 22
> >
> > Cluster.conf -
> >
> > <?xml version="1.0"?>
> > <cluster alias="PARVEZ" config_version="22" name="PARVEZ">
> >     <clusternodes>
> >         <clusternode name="myblade2" nodeid="2" votes="1">
> >             <fence>
> >                 <method name="1">
> >                     <device blade="2" missing_as_off="1"
> >                         name="BladeCenterFencing-1"/>
> >                 </method>
> >             </fence>
> >         </clusternode>
> >         <clusternode name="myblade1" nodeid="1" votes="1">
> >             <fence>
> >                 <method name="1">
> >                     <device blade="1" missing_as_off="1"
> >                         name="BladeCenterFencing-1"/>
> >                 </method>
> >             </fence>
> >         </clusternode>
> >     </clusternodes>
> >     <cman expected_votes="1" two_node="1"/>
> >     <fencedevices>
> >         <fencedevice agent="fence_bladecenter"
> >             ipaddr="mm-1.mydomain.com <http://mm-1.mydomain.com>" login="XXXX"
> >             name="BladeCenterFencing-1" passwd="XXXXX" shell_timeout="10"/>
> >     </fencedevices>
> >     <rm>
> >         <resources>
> >             <script file="/localhome/my/my_ha" name="myHaAgent"/>
> >             <ip address="192.168.51.51" monitor_link="1"/>
> >         </resources>
> >         <failoverdomains>
> >             <failoverdomain name="mydomain" nofailback="1"
> >                 ordered="1" restricted="1">
> >                 <failoverdomainnode name="myblade2" priority="2"/>
> >                 <failoverdomainnode name="myblade1" priority="1"/>
> >             </failoverdomain>
> >         </failoverdomains>
> >         <service autostart="0" domain="mydomain" exclusive="0"
> >             max_restarts="5" name="mgmt" recovery="restart">
> >             <script ref="myHaAgent"/>
> >             <ip ref="192.168.51.51"/>
> >         </service>
> >     </rm>
> >     <fence_daemon clean_start="1" post_fail_delay="0"
> >         post_join_delay="0"/>
> > </cluster>
> >
> > Thanks,
> > Parvez
> >
> > On Tue, Oct 30, 2012 at 9:25 PM, Digimer <[email protected]> wrote:
> >
> > On 10/30/2012 01:54 AM, Parvez Shaikh wrote:
> > > Hi experts,
> > >
> > > I have defined a service as follows in cluster.conf -
> > >
> > > <service autostart="0" domain="mydomain" exclusive="0"
> > >     max_restarts="5" name="mgmt" recovery="restart">
> > >     <script ref="myHaAgent"/>
> > >     <ip ref="192.168.51.51"/>
> > > </service>
> > >
> > > I mentioned max_restarts=5 hoping that if the cluster fails to start the
> > > service 5 times, it will then relocate it to another cluster node in the
> > > failover domain.
> > >
> > > To check this, I turned down the NIC hosting the service's floating IP
> > > and got the following logs -
> > >
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> Link for eth1: Not detected
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > > Oct 30 14:11:49 XXXX clurgmgrd: [10753]: <warning> No link on eth1...
> > > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> status on ip "192.168.51.51" returned 1 (generic error)
> > > Oct 30 14:11:49 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
> > > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering*
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Recovering failed service service:mgmt
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> start on ip "192.168.51.51" returned 1 (generic error)
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #68: Failed to start service:mgmt; return value: 1
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Stopping service service:mgmt
> > > *Oct 30 14:12:00 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is recovering
> > > Oct 30 14:12:00 XXXX clurgmgrd[10753]: <warning> #71: Relocating failed service service:mgmt*
> > > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
> > > Oct 30 14:12:01 XXXX clurgmgrd[10753]: <notice> Service service:mgmt is stopped
> > >
> > > But from the log it appears that the cluster tried to restart the service
> > > only ONCE before relocating.
> > >
> > > I was expecting the cluster to retry starting this service five times on
> > > the same node before relocating.
> > >
> > > Can anybody correct my understanding?
> > >
> > > Thanks,
> > > Parvez
> >
> > What version? Please paste your full cluster.conf.
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.ca/w/
> > What if the cure for cancer is trapped in the mind of a person without
> > access to education?
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
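For reference, a minimal sketch of the cluster.conf fragments discussed above,
with Digimer's two suggestions applied: the mail-client link is dropped so
ipaddr holds only the hostname, and the <fence_daemon> override is removed so
the defaults apply. The restart_expire_time attribute on <service> is an
assumption added here for illustration only; it is the rgmanager attribute
usually paired with max_restarts, but nothing in this thread confirms it as
the fix, so verify it against your rgmanager version before relying on it.

    <fencedevices>
        <!-- ipaddr without the "<http://...>" text inserted by the mail client -->
        <fencedevice agent="fence_bladecenter" ipaddr="mm-1.mydomain.com"
            login="XXXX" name="BladeCenterFencing-1" passwd="XXXXX"
            shell_timeout="10"/>
    </fencedevices>

    <!-- restart_expire_time="300" is illustrative only, not confirmed in this thread -->
    <service autostart="0" domain="mydomain" exclusive="0"
        max_restarts="5" restart_expire_time="300" name="mgmt"
        recovery="restart">
        <script ref="myHaAgent"/>
        <ip ref="192.168.51.51"/>
    </service>

    <!-- <fence_daemon clean_start="1" .../> removed; rely on the defaults -->

After editing, remember to increment config_version and propagate the change
(on RHEL 5 this is typically done with 'ccs_tool update /etc/cluster/cluster.conf').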
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
