Hello Andreas, Thanks for the reply.
So can you please suggest which STONITH plugin I should use for the production release of my software? I have the following system requirements:

1. If a node in the cluster fails, it should be rebooted and the resources should restart on the node.
2. If the physical link between the nodes in a cluster fails, then that node should be isolated (a kind of power-down) and the resources should continue to run on the other nodes.

I have different types of resources, e.g. primitive, master-slave and clone, running on my system.

Thanks and regards
Neha Chatrath

Date: Mon, 17 Oct 2011 15:08:16 +0200
From: Andreas Kurz <andr...@hastexo.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem in Stonith configuration
Message-ID: <4e9c28c0.8070...@hastexo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

On 10/17/2011 12:34 PM, neha chatrath wrote:
> Hello,
> I am configuring a 2 node cluster with following configuration:
>
> *[root@MCG1 init.d]# crm configure show
>
> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
>     attributes standby="off"
>
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
>     attributes standby="off"
>
> primitive ClusterIP ocf:heartbeat:IPaddr \
>     params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>     op monitor interval="40s" timeout="20s" \
>     meta target-role="Started"
>
> primitive app1_fencing stonith:suicide \
>     op monitor interval="90" \
>     meta target-role="Started"
>
> primitive myapp1 ocf:heartbeat:Redundancy \
>     op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
>     op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
>     op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> primitive myapp3 ocf:mcg:red_app3 \
>     op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> ms ms_myapp1 myapp1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> ms ms_myapp2 myapp2 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> ms ms_myapp3 myapp3 \
>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="true" \
>     no-quorum-policy="ignore"
>
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100" \
>     migration-threshold="3"
> *
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none of the
> resources (myapp, myapp1 etc) get started even on this node.
> Following is the output of "*crm_mon -f *" command:
>
> *Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

The cluster is waiting for a successful fencing event before starting all resources ... that is the only way to be sure the second node runs no resources.

Since you are using the suicide plugin, this will never happen if Heartbeat is not started on that node. If this is only a _test_setup_, go with the ssh or even the null stonith plugin ... never use them on production systems!

Regards,
Andreas

On Mon, Oct 17, 2011 at 4:04 PM, neha chatrath <nehachatr...@gmail.com> wrote:

> Hello,
> I am configuring a 2 node cluster with following configuration:
>
> *[root@MCG1 init.d]# crm configure show
>
> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
>     attributes standby="off"
>
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
>     attributes standby="off"
>
> primitive ClusterIP ocf:heartbeat:IPaddr \
>     params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>     op monitor interval="40s" timeout="20s" \
>     meta target-role="Started"
>
> primitive app1_fencing stonith:suicide \
>     op monitor interval="90" \
>     meta target-role="Started"
>
> primitive myapp1 ocf:heartbeat:Redundancy \
>     op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
>     op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
>     op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> primitive myapp3 ocf:mcg:red_app3 \
>     op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> ms ms_myapp1 myapp1 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> ms ms_myapp2 myapp2 \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> ms ms_myapp3 myapp3 \
>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="true" \
>     no-quorum-policy="ignore"
>
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100" \
>     migration-threshold="3"
> *
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none of the
> resources (myapp, myapp1 etc) get started even on this node.
> Following is the output of "*crm_mon -f *" command:
>
> *Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
> Online: [ mcg1 ]
> app1_fencing (stonith:suicide):Started mcg1
>
> Migration summary:
> * Node mcg1:
> *
> When I set "stonith-enabled" to false, then all my resources come up.
>
> Can somebody help me with STONITH configuration?
>
> Cheers
> Neha Chatrath
> KEEP SMILING!!!!
>
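
For reference, the test-only setup Andreas mentions (the ssh stonith plugin in place of suicide) might look roughly like the crm configure fragment below. This is a sketch under assumptions, not a configuration taken from the thread: the resource names st-ssh and fencing are made up for illustration, the monitor interval is arbitrary, and it assumes the external/ssh plugin is installed on both nodes and that root can ssh between mcg1 and mcg2 without a password. As Andreas says, this is not suitable for production systems.

```
# TEST SETUPS ONLY: hypothetical names st-ssh / fencing; never use ssh fencing in production
primitive st-ssh stonith:external/ssh \
    params hostlist="mcg1 mcg2" \
    op monitor interval="60s"
# run the fencing resource as a clone so that either node can fence the other
clone fencing st-ssh
```

The existing app1_fencing (stonith:suicide) primitive would presumably be removed when trying this, so that the cluster does not keep waiting on a fencing device that cannot act on a node where Heartbeat is not running.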
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker