OK, I see you are using fence_ilo, but I don't see a snip of your messages 
file.  The messages snip will tell us exactly what is going on; there isn't 
much I can do without it.  You may want to try running:

From node 2:
# fence_node rhel-cluster-node1.mgmt.local

From node 1:
# fence_node rhel-cluster-node2.mgmt.local

This should cause your nodes to get rebooted; if it doesn't, then there is a 
problem with your fencing config.  A snip of your messages file from one of 
these events would help out here.
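
If fence_node doesn't reboot the peer, you can also try the fence agent by 
hand against each iLO.  This is just a sketch using the hostnames and 
credentials from your cluster.conf -- check fence_ilo -h on your release for 
the exact options:

# fence_ilo -a ilo-node1 -l Administrator -p password -o status
# fence_ilo -a ilo-node2 -l Administrator -p password -o status

If the status call itself fails, the problem is between the agent and the 
iLO rather than in the cluster config.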

-Ben

----- "Wahyu Darmawan" <[email protected]> wrote:

> Hi Ben,
> Here is my cluster.conf. Need your help please.
> 
> 
> <?xml version="1.0"?>
> <cluster alias="PORTAL_WORLD" config_version="32" name="PORTAL_WORLD">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="rhel-cluster-node1.mgmt.local" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="NODE1-ILO"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="rhel-cluster-node2.mgmt.local" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="NODE2-ILO"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <quorumd device="/dev/sdf1" interval="3" label="quorum_disk1" tko="23" votes="2">
>                 <heuristic interval="2" program="ping 10.4.0.1 -c1 -t1" score="1"/>
>         </quorumd>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_ilo" hostname="ilo-node2" login="Administrator" name="NODE2-ILO" passwd="password"/>
>                 <fencedevice agent="fence_ilo" hostname="ilo-node1" login="Administrator" name="NODE1-ILO" passwd="password"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="Failover" nofailback="1" ordered="0" restricted="0">
>                                 <failoverdomainnode name="rhel-cluster-node2.mgmt.local" priority="1"/>
>                                 <failoverdomainnode name="rhel-cluster-node1.mgmt.local" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="10.4.1.103" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="Failover" exclusive="0" name="IP_Virtual" recovery="relocate">
>                         <ip ref="10.4.1.103"/>
>                 </service>
>         </rm>
> </cluster>
> 
> Many thanks,
> Wahyu
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ben Turner
> Sent: Thursday, October 28, 2010 12:18 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] Fence Issue on BL 460C G6
> 
> My guess is there is a problem with fencing.  Are you running
> fence_ilo with an HP blade?  IIRC the iLOs on the blades have a
> different CLI; I don't think fence_ilo will work with them.  What do
> you see in the messages files during these events?  If you see failed
> fence messages, you may want to look into using fence_ipmilan:
> 
> http://sources.redhat.com/cluster/wiki/IPMI_FencingConfig
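> 
> As a rough sketch (the values below are placeholders for your own iLO
> details, and the exact attributes are covered on the wiki page above),
> the fencedevice entries in cluster.conf would look something like:
> 
>         <fencedevice agent="fence_ipmilan" ipaddr="ILO_ADDRESS" login="LOGIN" passwd="PASSWORD" name="NODE1-IPMI"/>
> 
> Depending on the iLO firmware you may also need auth or lanplus settings,
> so test the agent from the command line first.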
> 
> If you post a snip of your messages file from this event and your
> cluster.conf I will have a better idea of what is going on.
> 
> -b
> 
> 
> 
> ----- "Wahyu Darmawan" <[email protected]> wrote:
> 
> > Hi all,
> >
> >
> >
> > For fencing, I'm using HP iLO and the server is a BL460c G6. The
> > problem is that the resource only starts moving to the passive node
> > once the failed node is powered on again, which seems very strange
> > to me. For example, I shut down node1, physically removed it from
> > the blade chassis, and watched the clustat output: clustat still
> > showed the resource on node1, even though node1 was powered down and
> > removed from the c7000 blade chassis. But when I plugged the failed
> > node1 back into the c7000 chassis and it powered on, clustat showed
> > the resource starting to move from the failed node to the passive
> > node.
> > I power down the blade server with the power button on the front and
> > then remove it from the chassis. If we hit a hardware problem on the
> > active node and the active node goes down, how will the resource
> > move to the passive node? In addition, when I reboot or shut down
> > the machine from the CLI, the resource moves to the passive node
> > successfully. Furthermore, when I shut down the active node with the
> > "shutdown -hy 0" command, the active node automatically restarts
> > after shutting down.
> >
> > Please help me.
> >
> >
> >
> > Many Thanks,
> 
> 
> 

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
