Re: [Pacemaker] Two node KVM cluster
On 01/05/13 06:12, Andrew Beekhof wrote:
> On 28/04/2013, at 9:19 PM, Oriol Mula-Valls wrote:
>> Hi,
>>
>> I have modified the previous configuration to use sbd fencing. I have also fixed several other issues with the configuration, and now when the node reboots it does not seem to be able to rejoin the cluster.
>>
>> I attach the debug log I have just generated. The node was rebooted around 11:51:41 and came back at 12:52:47.
>>
>> The boot order of the services is:
>> 1. sbd
>> 2. corosync
>> 3. pacemaker
>
> It doesn't look like pacemaker was restarted on node1, just corosync.

The node was forcibly rebooted with "echo b > /proc/sysrq-trigger". I am still testing what will happen in case of an unexpected reboot.

Could someone help me, please?

Thanks,
Oriol

[earlier quoted messages trimmed]
--
Oriol Mula Valls
Institut Català de Ciències del Clima (IC3)
Doctor Trueta 203 - 08005 Barcelona
Tel: +34 93 567 99 77

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Two node KVM cluster
On 28/04/2013, at 9:19 PM, Oriol Mula-Valls wrote:
> Hi,
>
> I have modified the previous configuration to use sbd fencing. I have also fixed several other issues with the configuration and now when the node reboots it seems not to be able to rejoin the cluster.
>
> I attach the debug log I have just generated. Node was rebooted around 11:51:41 and came back at 12:52:47.
>
> The boot order of the services is:
> 1. sbd
> 2. corosync
> 3. pacemaker

It doesn't look like pacemaker was restarted on node1, just corosync.

> Could someone help me, please?
>
> Thanks,
> Oriol

[earlier quoted messages trimmed]
> --
> Oriol Mula Valls
> Institut Català de Ciències del Clima (IC3)
> Doctor Trueta 203 - 08005 Barcelona
> Tel: +34 93 567 99 77

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
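[Editor's note: since the exchange above hinges on the service start order (sbd before corosync before pacemaker), one quick sanity check on a sysvinit system such as Wheezy is the two-digit sequence number on the /etc/rc2.d symlinks. The snippet below is a minimal sketch using made-up symlink names; the numbers on a real install will differ.]

```shell
# Simulated /etc/rc2.d listing with hypothetical S<NN> sequence numbers;
# on a real node you would inspect the directory itself (ls /etc/rc2.d).
links="S21pacemaker
S19sbd
S20corosync"

# Init runs S* scripts in lexical order, so sorting the names and stripping
# the S<NN> prefix reveals the effective boot order.
printf '%s\n' "$links" | sort | sed 's/^S[0-9]*//'
```

If the stripped, sorted list does not come out as sbd, corosync, pacemaker, the dependency headers or sequence numbers of the init scripts need adjusting.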
Re: [Pacemaker] Two node KVM cluster
On 17/04/2013, at 4:02 PM, Oriol Mula-Valls wrote:
> On 16/04/13 06:10, Andrew Beekhof wrote:
>> Oh, "fenced now" was meant to be "fenced node". That makes more sense now :)
>>
>> To answer your question, I would not expect the surviving node to be fenced when the previous node returns.
>> The network between the two is still functional?
>
> Sorry, I didn't realise the mistake even while writing the answer :)
>
> The IPMI network is still working between the nodes.

Ok, but what about the network corosync is using?

> Thanks,
> Oriol

[earlier quoted messages and configuration trimmed]
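[Editor's note: the network Andrew is asking about is the one corosync's totem interface is bound to, which is configured independently of the IPMI LAN used for fencing. For a corosync 1.x cluster like this one, the relevant corosync.conf fragment looks roughly like the sketch below; every address here is a placeholder, not a value taken from the thread.]

```
totem {
        version: 2
        interface {
                ringnumber: 0
                # assumption: a dedicated cluster-interconnect subnet,
                # distinct from the 10.0.0.x IPMI network used for fencing
                bindnetaddr: 192.168.100.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
```

If this interconnect is down while the IPMI network is up, each node can still fence the other, which is exactly the failure pattern being debugged above.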
Re: [Pacemaker] Two node KVM cluster
On 16/04/13 06:10, Andrew Beekhof wrote:
> Oh, "fenced now" was meant to be "fenced node". That makes more sense now :)
>
> To answer your question, I would not expect the surviving node to be fenced when the previous node returns.
> The network between the two is still functional?

Sorry, I didn't realise the mistake even while writing the answer :)

The IPMI network is still working between the nodes.

Thanks,
Oriol

[earlier quoted messages and configuration trimmed]
Re: [Pacemaker] Two node KVM cluster
On 10/04/2013, at 3:20 PM, Oriol Mula-Valls wrote:
> On 10/04/13 02:10, Andrew Beekhof wrote:
>> On 09/04/2013, at 7:31 PM, Oriol Mula-Valls wrote:
>>> I have configured external/ipmi as fencing device and then I force a reboot doing an "echo b > /proc/sysrq-trigger". The fencing is working properly as the node is shut off and the VM migrated. However, as soon as I turn on the fenced now and the OS has started the surviving is shut down. Is it normal or am I doing something wrong?
>>
>> Can you clarify "turn on the fenced"?
>
> To restart the fenced node I do either a power on with ipmitool or I power it on using the iRMC web interface.

Oh, "fenced now" was meant to be "fenced node". That makes more sense now :)

To answer your question, I would not expect the surviving node to be fenced when the previous node returns.
The network between the two is still functional?

[earlier quoted messages and configuration trimmed]
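[Editor's note: a common culprit when a rejoining node fences the survivor in a two-node cluster is that the two corosync memberships never merge, so each side treats the other as failed and shoots it. Independent of that, a two-node corosync 1.x cluster normally needs quorum handling relaxed so a lone node can keep running; a typical crm fragment is sketched below (values illustrative, not quoted from the thread).]

```
property no-quorum-policy="ignore" \
         stonith-enabled="true"
```

With `no-quorum-policy="ignore"` the surviving node keeps its resources when the peer is lost; fencing (stonith) then becomes the only protection against split-brain, which is why it must stay enabled.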
Re: [Pacemaker] Two node KVM cluster
On 10/04/13 02:10, Andrew Beekhof wrote:
> On 09/04/2013, at 7:31 PM, Oriol Mula-Valls wrote:
>> I have configured external/ipmi as fencing device and then I force a reboot doing an "echo b > /proc/sysrq-trigger". The fencing is working properly as the node is shut off and the VM migrated. However, as soon as I turn on the fenced now and the OS has started the surviving is shut down. Is it normal or am I doing something wrong?
>
> Can you clarify "turn on the fenced"?

To restart the fenced node I do either a power on with ipmitool or I power it on using the iRMC web interface.

[earlier quoted messages and configuration trimmed]
Re: [Pacemaker] Two node KVM cluster
On 09/04/2013, at 7:31 PM, Oriol Mula-Valls wrote:
> Thanks Andrew, I've managed to set up the system and currently I have it working, but still in testing.
>
> I have configured external/ipmi as fencing device and then I force a reboot doing an "echo b > /proc/sysrq-trigger". The fencing is working properly as the node is shut off and the VM migrated. However, as soon as I turn on the fenced now and the OS has started the surviving is shut down. Is it normal or am I doing something wrong?

Can you clarify "turn on the fenced"?

> On the other hand, I've seen that in case I completely lose power, fencing obviously fails. Would SBD stonith solve this issue?
>
> Kind regards,
> Oriol

[earlier quoted messages and configuration trimmed]
Re: [Pacemaker] Two node KVM cluster
Thanks Andrew I've managed to set up the system and currently I have it working but still on testing. I have configure external/ipmi as fencing device and then I force a reboot doing a echo b > /proc/sysrq-trigger. The fencing is working properly as the node is shut off and the VM migrated. However, as soon as I turn on the fenced now and the OS has started the surviving is shut down. Is it normal or am I doing something wrong? On the other hand I've seen that in case I completely lose power fencing obviously fails. Would SBD stonith solve this issue? Kind regards, Oriol On 08/04/13 04:11, Andrew Beekhof wrote: On 03/04/2013, at 9:15 PM, Oriol Mula-Valls wrote: Hi, I've started with Linux HA about one year ago. Currently I'm facing a new project in which I have to set up two nodes with high available virtual machines. I have used as a starting point the Digimer's tutorial (https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial). To deploy this new infrastructure I have two Fujitsu Primergy Rx100S7. Both machines have 8GB of RAM and 2x500GB HD. I started creating a software RAID1 with the internal drives and installing Debian 7.0 (Wheezy). Apart from the O.S. partition I have created 3 more partitions, one for the shared storage between both machines with OCFS2 and the two other will be used as PVs to create LVs to support the VMs (one for the VMs that will be primary on node1 an the other for primary machines on node2). These 3 partitions are replicated using DRBD. The shared storage folder contains: * ISO images needed when provisioning VMs * scripts used to call virt-install which handles the creation of our VMs. * XML definition files which define the emulated hardware backing the VMs * old copies of the XML definition files. I have more or less done the configuration for the OCFS2 fs and I was about to start the configuration of cLVM for one of the VGs but I have some doubts. I have one dlm for the OCFS2 filesystem, should I create another for cLVM RA? 
No, there should only ever be one dlm resource (cloned like you have it)

This is the current configuration:

node node1
node node2
primitive p_dlm_controld ocf:pacemaker:controld \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="10"
primitive p_drbd_shared ocf:linbit:drbd \
    params drbd_resource="shared" \
    op monitor interval="10" role="Master" timeout="20" \
    op monitor interval="20" role="Slave" timeout="20" \
    op start interval="0" timeout="240s" \
    op stop interval="0" timeout="120s"
primitive p_drbd_vm_1 ocf:linbit:drbd \
    params drbd_resource="vm_1" \
    op monitor interval="10" role="Master" timeout="20" \
    op monitor interval="20" role="Slave" timeout="20" \
    op start interval="0" timeout="240s" \
    op stop interval="0" timeout="120s"
primitive p_fs_shared ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/shared" directory="/shared" fstype="ocfs2" \
    meta target-role="Started" \
    op monitor interval="10"
primitive p_ipmi_node1 stonith:external/ipmi \
    params hostname="node1" userid="admin" passwd="xxx" ipaddr="10.0.0.2" interface="lanplus"
primitive p_ipmi_node2 stonith:external/ipmi \
    params hostname="node2" userid="admin" passwd="xxx" ipaddr="10.0.0.3" interface="lanplus"
primitive p_libvirtd lsb:libvirt-bin \
    op monitor interval="120s" \
    op start interval="0" \
    op stop interval="0"
primitive p_o2cb ocf:pacemaker:o2cb \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="10" \
    meta target-role="Started"
group g_shared p_dlm_controld p_o2cb p_fs_shared
ms ms_drbd_shared p_drbd_shared \
    meta master-max="2" clone-max="2" notify="true"
ms ms_drbd_vm_1 p_drbd_vm_1 \
    meta master-max="2" clone-max="2" notify="true"
clone cl_libvirtd p_libvirtd \
    meta globally-unique="false" interleave="true"
clone cl_shared g_shared \
    meta interleave="true"
location l_ipmi_node1 p_ipmi_node1 -inf: node1
location l_ipmi_node2 p_ipmi_node2 -inf: node2
order o_drbd_before_shared inf: ms_drbd_shared:promote cl_shared:start

Packages' versions:
clvm                   2.02.95-7
corosync               1.4.2-3
dlm-pcmk               3.0.12-3.2+deb7u2
drbd8-utils            2:8.3.13-2
libdlm3                3.0.12-3.2+deb7u2
libdlmcontrol3         3.0.12-3.2+deb7u2
ocfs2-tools            1.6.4-1+deb7u1
ocfs2-tools-pacemaker  1.6.4-1+deb7u1
openais                1.1.4-4.1
pacemaker              1.1.7-1

As this is my first serious set up, suggestions are more than welcome.

Thanks for your help.

Oriol
--
Oriol Mula Valls
Institut Català de Ciències del Clima (IC3)
Doctor Trueta 203 - 08005 Barcelona
Tel:+34
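[Editor's aside on the SBD question raised above: disk-based SBD can help with a total power loss, because the fencing decision then depends on writing a poison pill to a shared partition and waiting out a timeout, not on reaching the failed node's IPMI controller. A rough sketch of the setup, assuming a small shared partition at /dev/sdc1 visible to both nodes -- the device path and resource name are assumptions, not taken from this thread:]

```shell
# Initialise the sbd metadata on the shared partition (run once, from one node).
# /dev/sdc1 is an assumed placeholder for a small shared disk/partition:
sbd -d /dev/sdc1 create

# Inspect the on-disk header and timeouts to verify it worked:
sbd -d /dev/sdc1 dump

# On both nodes, point the sbd daemon at the same device and make sure it
# starts before corosync (on Debian this is typically /etc/default/sbd):
#   SBD_DEVICE="/dev/sdc1"

# Finally, add a stonith resource using the external/sbd plugin:
crm configure primitive p_sbd stonith:external/sbd \
    params sbd_device="/dev/sdc1"
```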
Re: [Pacemaker] Two node KVM cluster
On 03/04/2013, at 9:15 PM, Oriol Mula-Valls wrote:

> Hi,
>
> I started with Linux HA about one year ago. Currently I'm facing a new
> project in which I have to set up two nodes with highly available virtual
> machines. I have used Digimer's tutorial as a starting point
> (https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial).
>
> To deploy this new infrastructure I have two Fujitsu Primergy Rx100S7
> servers. Both machines have 8GB of RAM and 2x500GB HDs. I started by
> creating a software RAID1 with the internal drives and installing Debian
> 7.0 (Wheezy). Apart from the O.S. partition I have created three more
> partitions: one for the shared storage between both machines with OCFS2,
> and the other two will be used as PVs to create LVs backing the VMs (one
> for the VMs that will be primary on node1, and the other for the VMs
> primary on node2). These three partitions are replicated using DRBD.
>
> The shared storage folder contains:
> * ISO images needed when provisioning VMs
> * scripts used to call virt-install, which handles the creation of our VMs
> * XML definition files which define the emulated hardware backing the VMs
> * old copies of the XML definition files
>
> I have more or less done the configuration for the OCFS2 fs and I was about
> to start the configuration of cLVM for one of the VGs, but I have some
> doubts. I have one dlm for the OCFS2 filesystem; should I create another
> one for the cLVM RA?
No, there should only ever be one dlm resource (cloned like you have it)

>
> This is the current configuration:
> node node1
> node node2
> primitive p_dlm_controld ocf:pacemaker:controld \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10"
> primitive p_drbd_shared ocf:linbit:drbd \
>     params drbd_resource="shared" \
>     op monitor interval="10" role="Master" timeout="20" \
>     op monitor interval="20" role="Slave" timeout="20" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="120s"
> primitive p_drbd_vm_1 ocf:linbit:drbd \
>     params drbd_resource="vm_1" \
>     op monitor interval="10" role="Master" timeout="20" \
>     op monitor interval="20" role="Slave" timeout="20" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="120s"
> primitive p_fs_shared ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/shared" directory="/shared" fstype="ocfs2" \
>     meta target-role="Started" \
>     op monitor interval="10"
> primitive p_ipmi_node1 stonith:external/ipmi \
>     params hostname="node1" userid="admin" passwd="xxx" ipaddr="10.0.0.2" interface="lanplus"
> primitive p_ipmi_node2 stonith:external/ipmi \
>     params hostname="node2" userid="admin" passwd="xxx" ipaddr="10.0.0.3" interface="lanplus"
> primitive p_libvirtd lsb:libvirt-bin \
>     op monitor interval="120s" \
>     op start interval="0" \
>     op stop interval="0"
> primitive p_o2cb ocf:pacemaker:o2cb \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" \
>     meta target-role="Started"
> group g_shared p_dlm_controld p_o2cb p_fs_shared
> ms ms_drbd_shared p_drbd_shared \
>     meta master-max="2" clone-max="2" notify="true"
> ms ms_drbd_vm_1 p_drbd_vm_1 \
>     meta master-max="2" clone-max="2" notify="true"
> clone cl_libvirtd p_libvirtd \
>     meta globally-unique="false" interleave="true"
> clone cl_shared g_shared \
>     meta interleave="true"
> location l_ipmi_node1 p_ipmi_node1 -inf: node1
> location l_ipmi_node2 p_ipmi_node2 -inf: node2
> order o_drbd_before_shared inf: ms_drbd_shared:promote cl_shared:start
>
> Packages' versions:
> clvm                   2.02.95-7
> corosync               1.4.2-3
> dlm-pcmk               3.0.12-3.2+deb7u2
> drbd8-utils            2:8.3.13-2
> libdlm3                3.0.12-3.2+deb7u2
> libdlmcontrol3         3.0.12-3.2+deb7u2
> ocfs2-tools            1.6.4-1+deb7u1
> ocfs2-tools-pacemaker  1.6.4-1+deb7u1
> openais                1.1.4-4.1
> pacemaker              1.1.7-1
>
> As this is my first serious set up, suggestions are more than welcome.
>
> Thanks for your help.
>
> Oriol
> --
> Oriol Mula Valls
> Institut Català de Ciències del Clima (IC3)
> Doctor Trueta 203 - 08005 Barcelona
> Tel:+34 93 567 99 77
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
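[Editor's aside on the configuration above: since only one cloned dlm resource should exist, the usual pattern is to add clvmd to the existing cloned group rather than creating a second controld resource. The configuration also has an order constraint for DRBD but no matching colocation, and a two-node cluster needs its quorum policy set explicitly. A hedged sketch in crm shell syntax -- the p_clvmd resource name is an assumption, and the group line replaces the existing g_shared definition:]

```
primitive p_clvmd ocf:lvm2:clvmd \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="100" \
    op monitor interval="10"

# dlm must start before clvmd, which must start before o2cb and the filesystem;
# a group gives both the ordering and the colocation of its members:
group g_shared p_dlm_controld p_clvmd p_o2cb p_fs_shared

# Keep the cloned group on nodes where DRBD is Master, not merely ordered after
# the promote:
colocation c_shared_on_drbd inf: cl_shared ms_drbd_shared:Master

# A two-node cluster can never regain quorum after losing a node, so it must
# ignore quorum loss and rely on stonith instead:
property no-quorum-policy="ignore" stonith-enabled="true"
```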