[Pacemaker] SBD fencing with stonith disabled
Hello,

I have a two-node cluster based on cman + pacemaker running on ubuntu 12.04. Last weekend my active node was shut down, even though I had stonith disabled. In my config I have:

...
primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/wwn-0x60002ac000356f6d-part1" \
        meta target-role="Started"
...
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        last-lrm-refresh="1384411940" \
        maintenance-mode="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

But my node was halted after the message:

Nov 16 12:20:47 myotis51 sbd: [1377]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)

Should I stop the sbd daemon even if I have stonith-enabled=false?

-- 
Angel L. Mateo Martínez
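For reference, the SBD device itself can be inspected with the dump and list commands shown in the sbd help output later in this thread, using the device path from the config above:

sbd -d /dev/disk/by-id/wwn-0x60002ac000356f6d-part1 dump   # show the on-disk metadata header and timeouts
sbd -d /dev/disk/by-id/wwn-0x60002ac000356f6d-part1 list   # show allocated slots and pending messages

If dump hangs or errors out, the "No liveness" warning above is almost certainly the device path (here, the fibre channel latency), not sbd itself.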
Re: [Pacemaker] SBD fencing with stonith disabled
On 19/11/13 09:44, Lars Marowsky-Bree wrote:
> On 2013-11-19T09:27:10, Angel L. Mateo <ama...@um.es> wrote:
>
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>
> Wow, that's quite old.

It's the pacemaker provided by ubuntu 12.04.

>> Nov 16 12:20:47 myotis51 sbd: [1377]: WARN: Latency: No liveness for 4 s
>> exceeds threshold of 3 s (healthy servants: 0)
>>
>> Should I stop the sbd daemon even if I have stonith-enabled=false?
>
> stonith-enabled=false does not disable sbd's self-fencing in case of
> lost devices. (I think I'd be willing to take a patch if it isn't too
> convoluted.)

My cluster's nodes are vmware vsphere virtual machines. Could I use another stonith device, like external/vcenter? Is there any recommendation about it?

> You may want to consider using the -P option to enable pacemaker
> integration though; that could also make things better.

Is this an sbd option? I can't see that option (or it is undocumented):

amateo_adm@myotis51:/var/log$ sbd --help
sbd: invalid option -- '-'
Shared storage fencing tool.
Syntax:
        sbd <options> <command> <cmdarguments>
Options:
-d <devname>    Block device to use (mandatory; can be specified up to 3 times)
-h              Display this help.
-n <node>       Set local node name; defaults to uname -n (optional)
-R              Do NOT enable realtime priority (debugging only)
-W              Use watchdog (recommended) (watch only)
-w <dev>        Specify watchdog device (optional) (watch only)
-T              Do NOT initialize the watchdog timeout (watch only)
-v              Enable some verbose debug logging (optional)
-1 <N>          Set watchdog timeout to N seconds (optional, create only)
-2 <N>          Set slot allocation timeout to N seconds (optional, create only)
-3 <N>          Set daemon loop timeout to N seconds (optional, create only)
-4 <N>          Set msgwait timeout to N seconds (optional, create only)
-5 <N>          Warn if loop latency exceeds threshold (optional, watch only)
                (default is 3, set to 0 to disable)
-t <N>          Interval in seconds for automatic child restarts (optional)
                (default is 3600, set to 0 to disable)
Commands:
create          initialize N slots on <dev> - OVERWRITES DEVICE!
list            List all allocated slots on device, and messages.
dump            Dump meta-data header from device.
watch           Loop forever, monitoring own slot
allocate <node> Allocate a slot for node (optional)
message <node> (test|reset|off|clear|exit)
                Writes the specified message to node's slot.

> Note that not running the sbd daemon and setting stonith-enabled=true
> again will yield false STONITH successes. You're really not encouraged
> to do that, or only very carefully.

Yes, I know. I had stonith disabled (but sbd was running) because I'm having latency problems with my fibre channel disks, so I wanted to debug them without unnecessary reboots.

-- 
Angel L. Mateo Martínez
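In more recent sbd builds the pacemaker integration Lars mentions is switched on via the daemon's startup options rather than the CIB. A minimal sketch, assuming an sbd version that understands -P (the 12.04 build above clearly does not) and the /etc/sysconfig/sbd file used later in this archive (Debian/Ubuntu packages may read /etc/default/sbd instead):

# /etc/sysconfig/sbd
SBD_DEVICE="/dev/disk/by-id/wwn-0x60002ac000356f6d-part1"
# -W: use the hardware watchdog; -P: only self-fence on device loss
# when pacemaker also considers the node unhealthy (needs -P support)
SBD_OPTS="-W -P"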
Re: [Pacemaker] SBD fencing with stonith disabled
On 19/11/13 11:34, Lars Marowsky-Bree wrote:
> On 2013-11-19T11:25:36, Angel L. Mateo <ama...@um.es> wrote:
>
>>>> property $id="cib-bootstrap-options" \
>>>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>> Wow, that's quite old.
>> It's the pacemaker provided by ubuntu 12.04.
>
> Yeah, well. Still old. Probably something to complain about to the
> distribution maintainers.
>
>>> My cluster's nodes are vmware vsphere virtual machines. Could I use
>>> another stonith device, like external/vcenter? Is there any
>>> recommendation about it?
>
> Yes, you should also be able to use that.

But is it recommended for a two-node cluster? I remember reading somewhere that in such a scenario an sbd stonith device is better, because it provides a mechanism against split brain (but I could be wrong).

-- 
Angel L. Mateo Martínez
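For what it's worth, a stonith resource for external/vcenter would look something like the sketch below. The parameter names are the ones used by the cluster-glue external/vcenter plugin (verify with crm ra info stonith:external/vcenter for the installed version); the vCenter address, credential store path and guest names are placeholders:

primitive stonith_vcenter stonith:external/vcenter \
        params VI_SERVER="vcenter.example.com" \
               VI_CREDSTORE="/etc/pacemaker/vicredentials.xml" \
               HOSTLIST="myotis51=myotis51;myotis52=myotis52" \
               RESETPOWERON="0" \
        op monitor interval="60s"

Unlike sbd, this fences through the vCenter API, so it depends on the management network being reachable from the surviving node.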
Re: [Pacemaker] Problems with SBD fencing
On 06/08/13 13:49, Jan Christian Kaldestad wrote:
> In my case this does not work - read my original post. So I wonder if
> there is a pacemaker bug (version 1.1.9-2db99f1). Killing pengine and
> stonithd on the node which is supposed to shoot seems to resolve the
> problem, though this is not a solution of course.
> I also tested two separate stonith resources, one on each node. This
> stonith'ing works fine with this configuration. Is there something
> wrong about doing it this way?

Are you sure you have the property stonith-enabled=true?

-- 
Angel L. Mateo Martínez
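That property can be checked and set from the crm shell, e.g. (standard crmsh commands):

crm configure show | grep stonith-enabled    # inspect the current value
crm configure property stonith-enabled=true  # enable fencing cluster-wide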
Re: [Pacemaker] Problems with SBD fencing
On 06/08/13 13:49, Jan Christian Kaldestad wrote:
> In my case this does not work - read my original post. So I wonder if
> there is a pacemaker bug (version 1.1.9-2db99f1). Killing pengine and
> stonithd on the node which is supposed to shoot seems to resolve the
> problem, though this is not a solution of course.
> I also tested two separate stonith resources, one on each node. This
> stonith'ing works fine with this configuration. Is there something
> wrong about doing it this way?

To make it work (on ubuntu 12.04) I had to create the /etc/sysconfig/sbd file with:

SBD_DEVICE="/dev/disk/by-id/wwn-0x6006016009702500a4227a04c6b0e211-part1"
SBD_OPTS="-W"

and the resource configuration is:

primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/wwn-0x6006016009702500a4227a04c6b0e211-part1" \
        meta target-role="Started"

where /dev/disk/by-id/wwn-0x6006016009702500a4227a04c6b0e211-part1 is my disk device.

-- 
Angel L. Mateo Martínez
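Before the daemon can watch that device, it has to be initialized once with the create command from sbd's command list (destructive, so only on the dedicated SBD partition):

# Writes the SBD metadata header and message slots - OVERWRITES DEVICE!
sbd -d /dev/disk/by-id/wwn-0x6006016009702500a4227a04c6b0e211-part1 create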
Re: [Pacemaker] Disabling failover for a resource
On 27/05/13 09:28, Michael Schwartzkopff wrote:
> On Monday, 27 May 2013, 09:19:41, Angel L. Mateo wrote:
>> Hello,
>>
>> I have configured an active/passive cluster for my dovecot server. Now I
>> want to add a resource to it for running the backup service. I want this
>> resource to run on the same node as the dovecot resource, but I don't
>> want it to cause any failover. I mean, if the dovecot resource is moved
>> to another node, then it should be moved too; but if the backup resource
>> fails, then nothing has to be done with the other resources.
>>
>> Is it enough for this just to disable monitoring on the resource?
>
> No. Do it properly.
>
> 1) Make a colocation for the backup resource with dovecot:
>
> col col_backup_dovecot inf: res_Backup res_Dovecot
>
> 2) Prevent the backup resource from running on node2:
>
> loc loc_Backup res_Backup -inf: node2

I tried this. This way I can't even move res_dovecot to node2, I guess because res_backup can't run on node2. What I want is not to forbid res_backup from running on node2, but just to keep it from causing a failover. If the failover is because of res_dovecot, then res_backup should run on node2.

-- 
Angel L. Mateo Martínez
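One pattern that matches what is being asked here (a sketch, not verified on pacemaker 1.1.6): colocate the backup with dovecot without any location ban, and tell the cluster to ignore failures of the backup resource so they can never trigger recovery or migration. The agent name lsb:backup is a placeholder:

primitive res_Backup lsb:backup \
        op monitor interval="60" on-fail="ignore"
colocation col_backup_dovecot inf: res_Backup res_Dovecot

With on-fail="ignore", a failed monitor is logged but treated as success, so res_Backup follows res_Dovecot to whichever node it runs on, while its own failures leave the rest of the cluster untouched.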
Re: [Pacemaker] Disabling failover for a resource
On 27/05/13 09:28, Michael Schwartzkopff wrote:
> On Monday, 27 May 2013, 09:19:41, Angel L. Mateo wrote:
>> [original question quoted in full; see the previous message]
>
> No. Do it properly.
>
> 1) Make a colocation for the backup resource with dovecot:
>
> col col_backup_dovecot inf: res_Backup res_Dovecot
>
> 2) Prevent the backup resource from running on node2:
>
> loc loc_Backup res_Backup -inf: node2

But I want the backup resource to run on node2 when the dovecot resource is running on node2. Is that possible with this configuration?

-- 
Angel L. Mateo Martínez
Re: [Pacemaker] cman + corosync + pacemaker + fence_scsi
On 26/04/13 02:01, Andrew Beekhof wrote:
> On 24/04/2013, at 10:48 PM, Angel L. Mateo <ama...@um.es> wrote:
>
>> Hello,
>>
>> I'm trying to configure a 2-node cluster in ubuntu with cman + corosync +
>> pacemaker (the use of cman is because it is recommended in the pacemaker
>> quickstart). In order to solve split brain in the 2-node cluster I'm
>> using qdisk.
>
> If you want to use qdisk, then you need something newer than 1.1.8
> (which did not know how to filter qdisk from the membership).

Oops. I have cman 3.1.7, corosync 1.4.2 and pacemaker 1.1.6 (the ones provided with ubuntu 12.04). My purpose for using qdisk is to solve the split brain problem in my two-node cluster. Any other suggestion for this?

>> For fencing, I'm trying to use fence_scsi, and at this point I'm having
>> the problem. I have attached my cluster.conf.
>>
>> xml <node id="/dev/block/8:33" type="normal" uname="/dev/block/8:33"/>
>> node myotis51
>> node myotis52
>> primitive cluster_ip ocf:heartbeat:IPaddr2 \
>>         params ip="155.54.211.167" \
>>         op monitor interval="30s"
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>>         cluster-infrastructure="cman" \
>>         stonith-enabled="false" \
>>         last-lrm-refresh="1366803979"
>>
>> At this moment I'm trying just with an IP resource, but in the end I'll
>> have LVM resources and a dovecot server running on top of them.
>>
>> The problem I have is that whenever I interrupt network traffic between
>> my nodes (to check if quorum and fencing are working) the IP resource is
>> started on both nodes of the cluster.
>
> Do both sides claim to have quorum?
> Also, had you enabled fencing, the cluster would have shot its peer
> before trying to start the IP.

I think I did (this configuration with stonith disabled was modified for later tests), but I will check it again.

>> So it seems that the node fencing configured in cluster.conf is not
>> working for me.
>
> Because pacemaker cannot use it from there. You need to follow
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configuring_cman_fencing.html
> and then teach pacemaker about fence_scsi:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch09.html
>
>> Then I have tried to configure it as a stonith resource (since it is
>> listed by sudo crm ra list stonith), so I have tried to include
>>
>> primitive stonith_fence_scsi stonith:redhat/fence_scsi
>>
>> The problem I'm having with this is that I don't know how to indicate
>> params for the resource (I have tried params devices=..., params -d ...,
>> but they are not accepted) and with this (default) configuration I get:
>
> See the above link to chapter 9.

I have tried this. The problem I'm having is that I don't know how to create the resource using fence_scsi. I have tried different syntaxes:

crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \
        params name=scsi_fence devices=/dev/sdc
ERROR: stonith_fence_scsi: parameter name does not exist
ERROR: stonith_fence_scsi: parameter devices does not exist
crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \
        params n=scsi_fence d=/dev/sdc
ERROR: stonith_fence_scsi: parameter d does not exist
ERROR: stonith_fence_scsi: parameter n does not exist
crm(live)configure# primitive stonith_fence_scsi stonith:redhat/fence_scsi \
        params -n=scsi_fence -d=/dev/sdc
ERROR: stonith_fence_scsi: parameter -d does not exist
ERROR: stonith_fence_scsi: parameter -n does not exist

Does anyone have an example of this? What I would like is that, in case of problems, the node with use of the scsi channel (the one using my LVM volumes) shoots the other one.

Could I get the same behaviour with an external/sbd stonith resource?

-- 
Angel L. Mateo Martínez
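The set of parameter names a stonith agent really accepts can be printed from the crm shell before defining the primitive, which avoids the trial-and-error above (standard crmsh command; the agent name must match what crm ra list stonith reports):

crm ra info stonith:redhat/fence_scsi    # show the agent's parameters and descriptions
crm ra info stonith:external/sbd         # same, for the sbd plugin mentioned above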
[Pacemaker] best setup for corosync + pacemaker in ubuntu 12.04
Hello everybody,

As suggested by Andreas Mock in a previous thread... what is the best setup for corosync and pacemaker in a VM running ubuntu 12.04? Pacemaker's quickstart (http://clusterlabs.org/quickstart-ubuntu.html) suggests adding cman to the configuration, but I'm not sure about this.

What do you think?

-- 
Angel L. Mateo Martínez
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
On 25/03/13 20:50, Jacek Konieczny wrote:
> On Mon, 25 Mar 2013 20:01:28 +0100
> Angel L. Mateo <ama...@um.es> wrote:
>>> quorum {
>>>         provider: corosync_votequorum
>>>         expected_votes: 2
>>>         two_node: 1
>>> }
>>>
>>> Corosync will then manage quorum for the two-node cluster and Pacemaker
>> I'm using corosync 1.1, which is the one provided with my distribution
>> (ubuntu 12.04). I could also use cman.
>
> I don't think corosync 1.1 can do that, but I guess in this case cman
> should be able to provide this functionality.

Sorry, it's corosync 1.4, not 1.1.

>>> can use that. You still need proper fencing to enforce the quorum
>>> (both for pacemaker and the storage layer – dlm in case you use
>>> clvmd), but no extra quorum node is needed.
>> I have configured a dlm resource used with clvm. One doubt... With
>> this configuration, how is the split brain problem handled?
>
> The first node to notice that the other is unreachable will fence
> (kill) the other, making sure it is the only one operating on the
> shared data. Even though it is only half of the nodes, the cluster is
> considered quorate, as the other node is known not to be running any
> cluster resources.
>
> When the fenced node reboots, its cluster stack starts, but with no
> quorum until it communicates with the surviving node again. So no
> cluster services are started there until both nodes communicate
> properly and proper quorum is recovered.

But will this work with corosync 1.4? Although with corosync 1.4 I may not be able to use the quorum configuration you suggested (I'll try), I have configured no-quorum-policy=ignore so the cluster can still run in the case of one node failing. Could this be a problem?

-- 
Angel L. Mateo Martínez
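Since cman is the fallback mentioned here for a corosync 1.4 stack, the equivalent two-node quorum arrangement under cman is the classic cluster.conf fragment below (standard cman syntax; the cluster name and config version are placeholders, node names taken from this thread):

<cluster name="myotis" config_version="1">
  <!-- two_node="1" with expected_votes="1" lets the surviving node keep quorum alone -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="myotis51" nodeid="1"/>
    <clusternode name="myotis52" nodeid="2"/>
  </clusternodes>
</cluster>

As Jacek notes, this only avoids the extra quorum node; it does not remove the need for working fencing.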
[Pacemaker] stonith and avoiding split brain in two nodes cluster
Hello,

I am a newbie with pacemaker (and, generally, with HA clusters). I have configured a two-node cluster. Both nodes are virtual machines (vmware esx) and use shared storage (provided by a SAN, although access to the SAN is from the esx infrastructure, and the VMs see it as a scsi disk). I have configured clvm so logical volumes are only active on one of the nodes.

Now I need some help with the stonith configuration to avoid data corruption. Since I'm using ESX virtual machines, I think I won't have any problem using the external/vcenter stonith plugin to shut down virtual machines. My problem is how to avoid a split brain situation with this configuration, without configuring a 3rd node. I have read about quorum disks, the external/sbd stonith plugin and other references, but I'm quite confused by all this. For example, [1] mentions techniques to improve quorum with scsi reservations or a quorum daemon, but it doesn't explain how to do this with pacemaker. And [2] talks about external/sbd.

Any help?

PS: I have attached my corosync.conf and crm configure show outputs

[1] http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html
[2] http://www.gossamer-threads.com/lists/linuxha/pacemaker/78887

-- 
Angel L. Mateo Martínez

# Please read the openais.conf.5 manual page
totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 3000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 10

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting
        # a new round of membership configuration (ms)
        consensus: 3600

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        # This specifies the mode of redundant ring, which may be none, active, or passive.
        rrp_mode: none

        interface {
                # The following values need to be set based on your environment
                ringnumber: 0
                bindnetaddr: 155.54.211.160
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

amf {
        mode: disabled
}

service {
        # Load the Pacemaker Cluster Resource Manager
        ver: 1
        name: pacemaker
}

aisexec {
        user: root
        group: root
}

logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

node myotis51
node myotis52
primitive clvm ocf:lvm2:clvmd \
        params daemon_timeout="30" \
        meta target-role="Started"
primitive dlm ocf:pacemaker:controld \
        meta target-role="Started"
primitive vg_users1 ocf:heartbeat:LVM \
        params volgrpname="UsersDisk" exclusive="yes" \
        op monitor interval="60" timeout="60"
group dlm-clvm dlm clvm
clone dlm-clvm-clone dlm-clvm \
        meta interleave="true" ordered="true" target-role="Started"
location cli-prefer-vg_users1 vg_users1 \
        rule $id="cli-prefer-rule-vg_users1" inf: #uname eq myotis52
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1364212376"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
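Given the shared SAN disk already in place here, an sbd-based stonith setup would follow the pattern used elsewhere in this archive: initialize a small dedicated partition once, run the sbd daemon on both nodes, then declare the resource and re-enable fencing. A sketch; the partition path is a placeholder:

# once, from one node - OVERWRITES the partition!
sbd -d /dev/disk/by-id/<shared-sbd-partition> create

primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/<shared-sbd-partition>" \
        meta target-role="Started"
property stonith-enabled="true"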
Re: [Pacemaker] stonith and avoiding split brain in two nodes cluster
Jacek Konieczny <jaj...@jajcus.net> wrote:
> On Mon, 25 Mar 2013 13:54:22 +0100 Angel L. Mateo wrote:
>> My problem is how to avoid a split brain situation with this
>> configuration, without configuring a 3rd node. I have read about quorum
>> disks, the external/sbd stonith plugin and other references, but I'm
>> quite confused by all this. For example, [1] mentions techniques to
>> improve quorum with scsi reservations or a quorum daemon, but it doesn't
>> explain how to do this with pacemaker. And [2] talks about external/sbd.
>>
>> Any help?
>
> With corosync 2.2 (2.1 too, I guess) you can use, in corosync.conf:
>
> quorum {
>         provider: corosync_votequorum
>         expected_votes: 2
>         two_node: 1
> }
>
> Corosync will then manage quorum for the two-node cluster and Pacemaker

I'm using corosync 1.1, which is the one provided with my distribution (ubuntu 12.04). I could also use cman.

> can use that. You still need proper fencing to enforce the quorum
> (both for pacemaker and the storage layer – dlm in case you use
> clvmd), but no extra quorum node is needed.

I have configured a dlm resource used with clvm. One doubt... With this configuration, how is the split brain problem handled?

> There is one more thing, though: you need two nodes active to boot the
> cluster, but then when one fails (and is fenced) the other may
> continue, keeping quorum.
>
> Greets,
>         Jacek

-- 
Sent from my Android phone with K-9 Mail.
Re: [Pacemaker] pacemaker + corosync + clvm in ubuntu
I can't find this package in the ubuntu repos. I'll try to use debian's one.

emmanuel segura <emi2f...@gmail.com> wrote:
> Hello Angel
>
> I'm using debian, i don't know if the result on ubuntu is the same, try
>
> apt-file search dlm_controld.pcmk
>
> Result should be:
>
> dlm-pcmk: /usr/sbin/dlm_controld.pcmk
>
> 2013/3/22 Angel L. Mateo <ama...@um.es>
>
>> Hello,
>>
>> I'm trying to configure a cluster based on pacemaker and corosync on two
>> ubuntu precise servers. The cluster is for an active/standby pop/imap
>> server with shared storage accessed through fibre channel.
>> [original post quoted in full; see the message below]

-- 
Angel L. Mateo
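On Debian/Ubuntu, the quickest way to see which package (if any) ships a given dlm_controld variant is apt-file, or dpkg -L for packages already installed (assuming cman is the package carrying dlm_controld on 12.04):

apt-file search dlm_controld        # search all archive packages (needs apt-file)
dpkg -L cman | grep dlm_controld    # list what the installed cman package ships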
[Pacemaker] pacemaker + corosync + clvm in ubuntu
Hello,

I'm trying to configure a cluster based on pacemaker and corosync on two ubuntu precise servers. The cluster is for an active/standby pop/imap server with shared storage accessed through fibre channel.

In order to avoid concurrent access to this shared storage, I need clvm (maybe I'm wrong), so I'm trying to configure it. According to different guides and howtos I have found, I have configured a DLM and a clvm resource:

root@myotis51:/etc/cluster# crm configure show
node myotis51
node myotis52
primitive clvm ocf:lvm2:clvmd \
        params daemon_timeout="30" \
        meta target-role="Started"
primitive dlm ocf:pacemaker:controld \
        meta target-role="Started"
group dlm-clvm dlm clvm
clone dlm-clvm-clone dlm-clvm \
        meta interleave="true" ordered="true"
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1363957949"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

With this configuration, resources are not launched. I think DLM is failing because it's trying to launch dlm_controld.pcmk, which is not installed on my system:

Mar 22 13:52:57 myotis51 pengine: [2989]: notice: LogActions: Start dlm:0 (myotis51)
Mar 22 13:52:57 myotis51 pengine: [2989]: notice: LogActions: Leave dlm:1 (Stopped)
Mar 22 13:52:57 myotis51 crmd: [2990]: info: te_rsc_command: Initiating action 4: monitor dlm:0_monitor_0 on myotis51 (local)
Mar 22 13:52:57 myotis51 crmd: [2990]: info: do_lrm_rsc_op: Performing key=4:0:7:71fa2334-a3f3-4c01-a000-7e702a32d0e2 op=dlm:0_monitor_0 )
Mar 22 13:52:57 myotis51 lrmd: [2987]: info: rsc:dlm:0 probe[2] (pid 3050)
Mar 22 13:52:57 myotis51 controld[3050]: ERROR: Setup problem: couldn't find command: dlm_controld.pcmk
Mar 22 13:52:57 myotis51 lrmd: [2987]: info: operation monitor[2] on dlm:0 for client 2990: pid 3050 exited with return code 5
Mar 22 13:52:57 myotis51 crmd: [2990]: info: process_lrm_event: LRM operation dlm:0_monitor_0 (call=2, rc=5, cib-update=27, confirmed=true) not installed
Mar 22 13:52:57 myotis51 crmd: [2990]: WARN: status_from_rc: Action 4 (dlm:0_monitor_0) on myotis51 failed (target: 7 vs. rc: 5): Error
Mar 22 13:52:57 myotis51 crmd: [2990]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=dlm:0_last_failure_0, magic=0:5;4:0:7:71fa2334-a3f3-4c01-a000-7e702a32d0e2, cib=0.32.14) : Event failed
Mar 22 13:52:57 myotis51 crmd: [2990]: info: match_graph_event: Action dlm:0_monitor_0 (4) confirmed on myotis51 (rc=4)
Mar 22 13:52:57 myotis51 pengine: [2989]: notice: unpack_rsc_op: Hard error - dlm:0_last_failure_0 failed with rc=5: Preventing dlm-clvm-clone from re-starting on myotis51
Mar 22 13:52:57 myotis51 pengine: [2989]: notice: LogActions: Leave dlm:0 (Stopped)
Mar 22 13:52:57 myotis51 pengine: [2989]: notice: LogActions: Leave dlm:1 (Stopped)

The problem with this is that I can't find any dlm_controld.pcmk binary for ubuntu. Any idea on how to fix this? The closest command I have found is the dlm_controld provided with the cman packages, but then I have to replace corosync with cman. Doing this is not a big problem for me. The fact is that I'm a newbie in HA, and the use of corosync instead of cman is because it is the one documented with pacemaker (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/index.html). Is corosync supposed to be better (or more open, or more standards-based) than cman?

In case corosync is more recommended, what is the solution to the dlm problem?

Thanks in advance.

-- 
Angel L. Mateo Martínez
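If switching stacks is undesirable, one thing worth checking is whether the installed ocf:pacemaker:controld agent exposes a parameter for the daemon it launches — agents of that era had a daemon parameter defaulting to dlm_controld.pcmk. Whether pointing it at another dlm_controld actually works on a plain-corosync stack is doubtful, so treat this purely as a diagnostic sketch:

crm ra info ocf:pacemaker:controld   # list the agent's parameters

# hypothetical override, only if a compatible dlm_controld exists on the node:
primitive dlm ocf:pacemaker:controld \
        params daemon="dlm_controld" \
        meta target-role="Started"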