Hello,

On 10/28/2011 01:21 PM, neha chatrath wrote:
> Hello,
>
> 1. How about using the Integrated iLO device for fencing? I am using an
> HP ProLiant DL360 G7 server which supports iLO3.
> - Can RILOE Stonith be used for this?
works fine e.g. with the external/ipmi stonith module

> 2. Can the meatware Stonith plugin be used for production software?

yes

> 3. One more issue which I am facing is that when I try
> the "crm ra list stonith" command, there is no output, although
> the different RAs under the Heartbeat class are visible.

never saw this behavior when all packages were installed ...

> - Also, the Stonith class is visible in the output of the "crm ra
> classes" command
> - All the default Stonith RAs like meatware, suicide,
> ibmrsa, ipmi etc. are present in the /usr/lib/stonith/plugins directory.
> - Due to this I am not able to configure stonith in my system.

If the stonith agents are available when using the "stonith" command-line
tool, I expect this to work. You can also use this tool to get meta-data
info if tab completion in "crm configure" mode is not enough.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now
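As a quick cross-check from the shell -- a sketch, assuming the
cluster-glue packages are installed and using external/ipmi purely as an
example plugin name:

  # list all stonith plugin types the stonith tool can see
  stonith -L

  # show the parameters a given plugin expects
  stonith -t external/ipmi -n

  # full help text / meta-data for that plugin
  stonith -t external/ipmi -h

The crm shell equivalents are "crm ra list stonith" and
"crm ra info stonith:external/ipmi"; if the stonith tool lists the
plugins but the crm shell shows nothing, the plugins themselves are
evidently installed.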
>
> Thanks and regards
> Neha Chatrath
>
> On Tue, Oct 18, 2011 at 2:51 PM, neha chatrath <nehachatr...@gmail.com> wrote:
>
> > Hello,
> >
> > > > 1. If a resource fails, node should reboot (through fencing mechanism)
> > > > and resources should re-start on the node.
> > >
> > > Why would you want that? This would increase the service downtime
> > > considerably. Why is a local restart not possible ... and even if there
> > > is a good reason for a reboot, why not start the resource on the
> > > other node?
> >
> > - In our system, there are some primitive and clone resources along with
> > 3 different master-slave resources.
> > - All the masters and slaves of these resources are co-located, i.e.
> > all 3 masters are co-located on one node and the 3 slaves on the other
> > node.
> > - These 3 master-slave resources are tightly coupled. There is a
> > requirement that the failure of even one of these resources
> > restarts all the resources in the group.
> > - All these resources can be shifted to the other node, but
> > subsequently they should also be restarted, as a lot of data/control
> > plane syncing is done between the two nodes.
> > E.g. if one of the resources running on node1 as a Master fails,
> > then all these 3 resources are shifted to the other node, i.e. node2
> > (with the corresponding slave resources being promoted to master). On
> > node1, these resources should get restarted as slaves.
> >
> > We understand that a node restart will increase the downtime, but since
> > we could not find much on the option of a group restart of
> > master-slave resources, we are trying the node restart option.
> >
> > Thanks and regards
> > Neha Chatrath
> >
> > ---------- Forwarded message ----------
> > From: Andreas Kurz <andr...@hastexo.com>
> > Date: Tue, Oct 18, 2011 at 1:55 PM
> > Subject: Re: [Pacemaker] Problem in Stonith configuration
> > To: pacemaker@oss.clusterlabs.org
> >
> > Hello,
> >
> > On 10/18/2011 09:00 AM, neha chatrath wrote:
> > > Hello,
> > >
> > > Minor updates in the first requirement.
> > > 1. If a resource fails, node should reboot (through fencing mechanism)
> > > and resources should re-start on the node.
> >
> > Why would you want that? This would increase the service downtime
> > considerably. Why is a local restart not possible ... and even if there
> > is a good reason for a reboot, why not start the resource on the
> > other node?
> >
> > > 2. If the physical link between the nodes in a cluster fails then that
> > > node should be isolated (kind of a power down) and the resources should
> > > continue to run on the other nodes
> >
> > That is how stonith works, yes.
> >
> > crm ra list stonith ... gives you a list of all available stonith
> > plugins.
> > crm ra info stonith:xxxx ... details for a specific plugin.
> >
> > Using external/ipmi is often a good choice because a lot of servers
> > already have a BMC with IPMI on board, or they are shipped with a
> > management card supporting IPMI.
> >
> > Regards,
> > Andreas
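A minimal sketch of what such an external/ipmi setup could look like in
"crm configure" syntax -- the node names match the cluster quoted below,
but the iLO IP addresses and credentials are placeholders, and the exact
parameter names should be verified with "stonith -t external/ipmi -n":

  # one fencing device per peer; keep each device off the node
  # it is meant to kill
  primitive fence-mcg1 stonith:external/ipmi \
          params hostname="mcg1" ipaddr="192.168.1.210" \
                 userid="admin" passwd="secret" interface="lanplus" \
          op monitor interval="60s"
  primitive fence-mcg2 stonith:external/ipmi \
          params hostname="mcg2" ipaddr="192.168.1.211" \
                 userid="admin" passwd="secret" interface="lanplus" \
          op monitor interval="60s"
  location l-fence-mcg1 fence-mcg1 -inf: mcg1
  location l-fence-mcg2 fence-mcg2 -inf: mcg2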
> > On Tue, Oct 18, 2011 at 12:30 PM, neha chatrath <nehachatr...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > Minor updates in the first requirement.
> > > 1. If a resource fails, node should reboot (through fencing
> > > mechanism) and resources should re-start on the node.
> > >
> > > 2. If the physical link between the nodes in a cluster fails
> > > then that node should be isolated (kind of a power down) and the
> > > resources should continue to run on the other nodes
> > >
> > > Apologies for the inconvenience.
> > >
> > > Thanks and regards
> > > Neha Chatrath
> > >
> > > On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath <nehachatr...@gmail.com> wrote:
> > >
> > > > Hello Andreas,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > So can you please suggest which Stonith plugin I should use
> > > > for the production release of my software? I have the
> > > > following system requirements:
> > > > 1. If a node in the cluster fails, it should be rebooted and
> > > > resources should re-start on the node.
> > > > 2. If the physical link between the nodes in a cluster fails,
> > > > then that node should be isolated (kind of a power down) and
> > > > the resources should continue to run on the other nodes.
> > > >
> > > > I have different types of resources, e.g. primitive,
> > > > master-slave and clone, running on my system.
> > > >
> > > > Thanks and regards
> > > > Neha Chatrath
> > > >
> > > > Date: Mon, 17 Oct 2011 15:08:16 +0200
> > > > From: Andreas Kurz <andr...@hastexo.com>
> > > > To: pacemaker@oss.clusterlabs.org
> > > > Subject: Re: [Pacemaker] Problem in Stonith configuration
> > > > Message-ID: <4e9c28c0.8070...@hastexo.com>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > Hello,
> > > >
> > > > On 10/17/2011 12:34 PM, neha chatrath wrote:
> > > > > Hello,
> > > > > I am configuring a 2 node cluster with the following configuration:
> > > > >
> > > > > [root@MCG1 init.d]# crm configure show
> > > > >
> > > > > node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
> > > > >     attributes standby="off"
> > > > > node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> > > > >     attributes standby="off"
> > > > > primitive ClusterIP ocf:heartbeat:IPaddr \
> > > > >     params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
> > > > >     op monitor interval="40s" timeout="20s" \
> > > > >     meta target-role="Started"
> > > > > primitive app1_fencing stonith:suicide \
> > > > >     op monitor interval="90" \
> > > > >     meta target-role="Started"
> > > > > primitive myapp1 ocf:heartbeat:Redundancy \
> > > > >     op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> > > > >     op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
> > > > > primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> > > > >     op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> > > > >     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> > > > > primitive myapp3 ocf:mcg:red_app3 \
> > > > >     op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> > > > >     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> > > > > ms ms_myapp1 myapp1 \
> > > > >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > > > ms ms_myapp2 myapp2 \
> > > > >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > > > ms ms_myapp3 myapp3 \
> > > > >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > > > > colocation myapp1_col inf: ClusterIP ms_myapp1:Master
> > > > > colocation myapp2_col inf: ClusterIP ms_myapp2:Master
> > > > > colocation myapp3_col inf: ClusterIP ms_myapp3:Master
> > > > > order myapp1_order inf: ms_myapp1:promote ClusterIP:start
> > > > > order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
> > > > > order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
> > > > > property $id="cib-bootstrap-options" \
> > > > >     dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> > > > >     cluster-infrastructure="Heartbeat" \
> > > > >     stonith-enabled="true" \
> > > > >     no-quorum-policy="ignore"
> > > > > rsc_defaults $id="rsc-options" \
> > > > >     resource-stickiness="100" \
> > > > >     migration-threshold="3"
> > > > >
> > > > > I start the Heartbeat daemon on only one of the nodes, e.g. mcg1,
> > > > > but none of the resources (myapp1, myapp2 etc.) gets started even
> > > > > on this node. Following is the output of the "crm_mon -f" command:
> > > > >
> > > > > Last updated: Mon Oct 17 10:19:22 2011
> > > > > Stack: Heartbeat
> > > > > Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd) - partition with quorum
> > > > > Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> > > > > 2 Nodes configured, unknown expected votes
> > > > > 5 Resources configured.
> > > > > ============
> > > > > Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
> > > >
> > > > The cluster is waiting for a successful fencing event before
> > > > starting all resources ... the only way to be sure the second node
> > > > runs no resources.
> > > >
> > > > Since you are using the suicide plugin, this will never happen if
> > > > Heartbeat is not started on that node. If this is only a _test_
> > > > setup, go with the ssh or even the null stonith plugin ... never
> > > > use them on production systems!
> > > >
> > > > Regards,
> > > > Andreas
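A minimal sketch of such a test-only setup, reusing the two node names
from the configuration above (external/ssh logs into the peer via ssh,
so it cannot fence a node that is truly dead -- test use only):

  primitive test-fencing stonith:external/ssh \
          params hostlist="mcg1 mcg2" \
          op monitor interval="60s"
  # run a copy on each node so either node can fence the other
  clone fencing-clone test-fencing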
> > > > On Mon, Oct 17, 2011 at 4:04 PM, neha chatrath <nehachatr...@gmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > > I am configuring a 2 node cluster with the following configuration:
> > > > >
> > > > > [... same "crm configure show" output as quoted above ...]
> > > > >
> > > > > I start the Heartbeat daemon on only one of the nodes, e.g. mcg1,
> > > > > but none of the resources (myapp1, myapp2 etc.) gets started even
> > > > > on this node. Following is the output of the "crm_mon -f" command:
> > > > >
> > > > > Last updated: Mon Oct 17 10:19:22 2011
> > > > > Stack: Heartbeat
> > > > > Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd) - partition with quorum
> > > > > Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> > > > > 2 Nodes configured, unknown expected votes
> > > > > 5 Resources configured.
> > > > > ============
> > > > > Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
> > > > > Online: [ mcg1 ]
> > > > > app1_fencing (stonith:suicide): Started mcg1
> > > > >
> > > > > Migration summary:
> > > > > * Node mcg1:
> > > > >
> > > > > When I set "stonith_enabled" to false, then all my resources come up.
> > > > >
> > > > > Can somebody help me with the STONITH configuration?
> > > > >
> > > > > Cheers
> > > > > Neha Chatrath
> > > > > KEEP SMILING!!!!
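A fencing device can also be exercised by hand before it is enabled in
the cluster -- a sketch using the stonith tool with the same placeholder
iLO parameters as above (note that a real "reset" power-cycles the
target, so try this on a test machine only):

  # query the device status with the given parameters
  stonith -t external/ipmi hostname=mcg2 ipaddr=192.168.1.211 \
          userid=admin passwd=secret interface=lanplus -S

  # trigger an actual reset of node mcg2
  stonith -t external/ipmi hostname=mcg2 ipaddr=192.168.1.211 \
          userid=admin passwd=secret interface=lanplus -T reset mcg2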
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker