Thanks Andrew, I've managed to set up the system and I currently have it working, but it is still under testing.

I have configured external/ipmi as the fencing device and then I force a reboot by doing echo b > /proc/sysrq-trigger. The fencing is working properly, as the node is shut off and the VM is migrated. However, as soon as I power the fenced node back on and its OS has started, the surviving node is shut down. Is this normal or am I doing something wrong?
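
For reference, this is roughly the test sequence (crm_mon here is just one way to follow the status, nothing special):

  # on the node to be fenced, force an immediate crash/reboot
  echo b > /proc/sysrq-trigger

  # on the surviving node, watch the fencing and the VM migration
  crm_mon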

On the other hand, I've seen that if a node completely loses power, IPMI fencing obviously fails. Would SBD stonith solve this issue?
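
In case it helps, this is roughly what I had in mind for SBD (untested, taken from the docs; the device path is only a placeholder for a small partition on shared storage):

  # initialise the SBD slot area on the shared device (once, from one node)
  sbd -d /dev/disk/by-id/SHARED-PARTITION create

  # stonith resource in the crm configuration
  primitive p_sbd stonith:external/sbd \
          params sbd_device="/dev/disk/by-id/SHARED-PARTITION"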

Kind regards,
Oriol

On 08/04/13 04:11, Andrew Beekhof wrote:

On 03/04/2013, at 9:15 PM, Oriol Mula-Valls <oriol.mula-va...@ic3.cat> wrote:

Hi,

I started with Linux HA about one year ago. Currently I'm facing a new 
project in which I have to set up two nodes with highly available virtual 
machines. I have used Digimer's tutorial as a starting point 
(https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial).

To deploy this new infrastructure I have two Fujitsu Primergy RX100 S7 servers. Both 
machines have 8GB of RAM and 2x500GB HDs. I started by creating a software RAID1 
with the internal drives and installing Debian 7.0 (Wheezy). Apart from the 
O.S. partition I have created 3 more partitions: one for the shared storage 
between both machines with OCFS2, and the other two will be used as PVs to 
create LVs backing the VMs (one for the VMs that will be primary on node1 and 
the other for the VMs that will be primary on node2). These 3 partitions are replicated 
using DRBD.
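
For completeness, the DRBD resource definition for the shared partition looks roughly like this (device names and addresses below are only examples, not the real ones):

  resource shared {
    device    /dev/drbd0;
    disk      /dev/md2;
    meta-disk internal;
    net {
      allow-two-primaries;   # needed for OCFS2 / dual-primary
    }
    on node1 {
      address 192.168.100.1:7788;
    }
    on node2 {
      address 192.168.100.2:7788;
    }
  }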

The shared storage folder contains:
* ISO images needed when provisioning VMs
* scripts used to call virt-install, which handles the creation of our VMs (an example call is shown after this list).
* XML definition files which define the emulated hardware backing the VMs
* old copies of the XML definition files.
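
For example, the provisioning script for a VM ends up calling something like this (names, sizes and paths below are only illustrative):

  virt-install --connect qemu:///system \
        --name vm_1 --ram 2048 --vcpus 2 \
        --disk path=/dev/vg_node1/lv_vm_1 \
        --cdrom /shared/iso/debian-7.0.0-amd64-netinst.iso \
        --network bridge=br0 --graphics vnc \
        --os-variant debianwheezy

and afterwards the definition is saved with virsh dumpxml vm_1 > /shared/definitions/vm_1.xml so it can be defined on the other node as well.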

I have more or less done the configuration for the OCFS2 filesystem and I was about to 
start the configuration of cLVM for one of the VGs, but I have some doubts. I 
have one dlm for the OCFS2 filesystem; should I create another one for the cLVM RA?
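
To make the question concrete, this is roughly what I was planning to add for cLVM (untested; the agent names and the VG name are only what I expect to use and may differ on Debian):

  primitive p_clvmd ocf:lvm2:clvmd \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100" \
          op monitor interval="10"
  primitive p_vg_vm_1 ocf:heartbeat:LVM \
          params volgrpname="vg_vm_1" \
          op monitor interval="60" timeout="60"

cloned and ordered after the DRBD promotion, in the same way as the OCFS2 group.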

No, there should only ever be one dlm resource (cloned like you have it)


This is the current configuration:
node node1
node node2
primitive p_dlm_controld ocf:pacemaker:controld \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10"
primitive p_drbd_shared ocf:linbit:drbd \
        params drbd_resource="shared" \
        op monitor interval="10" role="Master" timeout="20" \
        op monitor interval="20" role="Slave" timeout="20" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="120s"
primitive p_drbd_vm_1 ocf:linbit:drbd \
        params drbd_resource="vm_1" \
        op monitor interval="10" role="Master" timeout="20" \
        op monitor interval="20" role="Slave" timeout="20" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="120s"
primitive p_fs_shared ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/shared" directory="/shared" fstype="ocfs2" \
        meta target-role="Started" \
        op monitor interval="10"
primitive p_ipmi_node1 stonith:external/ipmi \
        params hostname="node1" userid="admin" passwd="xxx" ipaddr="10.0.0.2" interface="lanplus"
primitive p_ipmi_node2 stonith:external/ipmi \
        params hostname="node2" userid="admin" passwd="xxx" ipaddr="10.0.0.3" interface="lanplus"
primitive p_libvirtd lsb:libvirt-bin \
        op monitor interval="120s" \
        op start interval="0" \
        op stop interval="0"
primitive p_o2cb ocf:pacemaker:o2cb \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" \
        meta target-role="Started"
group g_shared p_dlm_controld p_o2cb p_fs_shared
ms ms_drbd_shared p_drbd_shared \
        meta master-max="2" clone-max="2" notify="true"
ms ms_drbd_vm_1 p_drbd_vm_1 \
        meta master-max="2" clone-max="2" notify="true"
clone cl_libvirtd p_libvirtd \
        meta globally-unique="false" interleave="true"
clone cl_shared g_shared \
        meta interleave="true"
location l_ipmi_node1 p_ipmi_node1 -inf: node1
location l_ipmi_node2 p_ipmi_node2 -inf: node2
order o_drbd_before_shared inf: ms_drbd_shared:promote cl_shared:start
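
One thing I'm not sure about is whether I also need a colocation constraint to keep the OCFS2 group on the DRBD masters; what I had in mind is roughly (untested):

colocation c_shared_on_drbd inf: cl_shared ms_drbd_shared:Master

and a similar order/colocation pair for the VM volume group once cLVM is in place.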

Packages' versions:
clvm                               2.02.95-7
corosync                           1.4.2-3
dlm-pcmk                           3.0.12-3.2+deb7u2
drbd8-utils                        2:8.3.13-2
libdlm3                            3.0.12-3.2+deb7u2
libdlmcontrol3                     3.0.12-3.2+deb7u2
ocfs2-tools                        1.6.4-1+deb7u1
ocfs2-tools-pacemaker              1.6.4-1+deb7u1
openais                            1.1.4-4.1
pacemaker                          1.1.7-1

As this is my first serious setup, suggestions are more than welcome.

Thanks for your help.

Oriol

--
Oriol Mula Valls
Institut Català de Ciències del Clima (IC3)
Doctor Trueta 203 - 08005 Barcelona
Tel:+34 93 567 99 77

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
