[ClusterLabs] Re: After reboot each node thinks the other is offline.
>>> "Stephen Carville (HA List)" <62d2a...@opayq.com> wrote on 31.07.2017 at 20:17 in message:
> I am experimenting with pacemaker for high availability for some load
> balancers. I was able to successfully get two CentOS (6.9) machines
> (scahadev01da and scahadev01db) to form a cluster, and the shared IP was
> assigned to scahadev01da. I simulated a failure by halting the primary,
> and the secondary eventually noticed, bringing up the shared IP on its
> eth0. So far, so good.
>
> A problem arises when the primary comes back up and, for some reason,
> each node thinks the other is offline. This leads to both nodes adding

If a node thinks the other is unexpectedly offline, it will fence it, and
then it will be offline! Thus the IP can't run there. I guess you have no
fencing configured, right?

Regards,
Ulrich

> the duplicate IP to its own eth0. I probably do not need to tell you
> the mischief that can cause if these were production servers.
>
> I tried restarting cman, pcsd and pacemaker on both machines with no
> effect on the situation.
>
> I've found several mentions of it in the search engines, but I've been
> unable to find out how to fix it. Any help is appreciated.
>
> Both nodes have quorum disabled in /etc/sysconfig/cman:
>
> CMAN_QUORUM_TIMEOUT=0
>
> # Node 1
>
> scahadev01da# sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
> Last updated: Mon Jul 31 10:43:54 2017
> Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da
>
> 2 nodes and 1 resource configured
>
> Online: [ scahadev01da ]
> OFFLINE: [ scahadev01db ]
>
> Full list of resources:
>
>  VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01da
>
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> # Node 2
>
> scahadev01db ~]$ sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
> Last updated: Mon Jul 31 10:43:47 2017
> Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da
>
> 2 nodes and 1 resource configured
>
> Online: [ scahadev01db ]
> OFFLINE: [ scahadev01da ]
>
> Full list of resources:
>
>  VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01db
>
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> --
> Stephen Carville

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
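A minimal fencing setup of the kind Ulrich suggests could look like this. This is only a sketch: the choice of fence_ipmilan, the IPMI addresses, and the credentials are assumptions, not taken from this thread.

```shell
# Sketch: one IPMI fence device per node (hypothetical addresses and
# credentials), then make sure fencing is actually enabled.
pcs stonith create fence-da fence_ipmilan \
    ipaddr=10.0.0.11 login=admin passwd=secret \
    pcmk_host_list=scahadev01da
pcs stonith create fence-db fence_ipmilan \
    ipaddr=10.0.0.12 login=admin passwd=secret \
    pcmk_host_list=scahadev01db
pcs property set stonith-enabled=true

# Keep each device off the node it is meant to kill:
pcs constraint location fence-da avoids scahadev01da
pcs constraint location fence-db avoids scahadev01db
```

With working fencing, a rejoining node that cannot be reached would be fenced instead of both nodes bringing up the same IP.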
[ClusterLabs] Re: fence_vmware_soap: reads VM status but fails to reboot/on/off
>>> Octavian Ciobanu wrote on 31.07.2017 at 20:16 in message:
> Hello,
>
> Before I implement the cluster I'm testing the fence agents, and I got
> stuck at rebooting the VMware-based VMs.
>
> I have installed VMware ESXi 6.5 Hypervisor with 5 VMs.
>
> If I call:
> # fence_vmware_soap --ssl --ip esxi_ip --username root --password pass --action list
> I get the list with the names and UUIDs of the VMs.
>
> If I call:
> # fence_vmware_soap --ssl --ip esxi_ip --username root --password pass --action status --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d"
> I get the status of the VM.
>
> But when I call any of the power actions (on, off, reboot), I get
> "Failed: Timed out waiting to power OFF".
>
> I've tried all the combinations of --power-timeout and --power-wait, and
> I get the same error without any change in the response time.
>
> Any ideas where or how to fix this issue?

I suspect "power off" is actually a virtual press of the ACPI power button
(reboot likewise), so your VM tries to shut down cleanly. That could take
time, and it could hang (I guess). I don't use VMware, but maybe there's a
"reset" action that presses the virtual reset button of the virtual
hardware... ;-)

Regards,
Ulrich

> Thank you in advance.
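Two things worth trying from the command line, in the spirit of Ulrich's "virtual reset button" suggestion. These are sketches; whether --method is accepted depends on the fence-agents build, so check fence_vmware_soap --help first.

```shell
# Try a hard power cycle instead of the default off-then-on sequence;
# on agents that support it, "cycle" does not wait for a clean guest
# shutdown. --verbose shows what the agent is doing against the SOAP API.
fence_vmware_soap --ssl --ip esxi_ip --username root --password pass \
    --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d" \
    --action reboot --method cycle --verbose

# A longer power timeout only helps if the guest eventually completes the
# ACPI shutdown; a hung guest will still time out:
fence_vmware_soap --ssl --ip esxi_ip --username root --password pass \
    --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d" \
    --action off --power-timeout 120
```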
[ClusterLabs] Re: DRBD AND cLVM ???
>>> "Lentes, Bernd" wrote on 31.07.2017 at 18:51 in message
<641329685.12981098.1501519915026.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
>
> i'm currently a bit confused. I have several resources running as
> VirtualDomains; the VMs reside on plain logical volumes without a
> filesystem, and these LVs themselves reside on a FC SAN.
> In that scenario i need cLVM to distribute the LVM metadata between the
> nodes.
> To play around a bit and get used to it, i created a DRBD partition.
> It resides on a logical volume (one on each node), which should be
> possible following the documentation from Linbit.
> The LVs each reside on a node's local storage, not on the SAN (which
> would be a very strange configuration).

So you use cLVM to create local VGs, and you use DRBD to sync the local
LVs? Why don't you use the shared SAN?

> But nevertheless it's a cLVM configuration. I don't think it's possible
> to have a cLVM and a non-cLVM configuration at the same time on the same
> node.

You can definitely have clustered and non-clustered VGs on one node.

> Is what i am trying to do possible?

I'm still wondering what you really want to achieve.

Regards,
Ulrich

> Bernd
[ClusterLabs] After reboot each node thinks the other is offline.
I am experimenting with pacemaker for high availability for some load
balancers. I was able to successfully get two CentOS (6.9) machines
(scahadev01da and scahadev01db) to form a cluster, and the shared IP was
assigned to scahadev01da. I simulated a failure by halting the primary, and
the secondary eventually noticed, bringing up the shared IP on its eth0. So
far, so good.

A problem arises when the primary comes back up and, for some reason, each
node thinks the other is offline. This leads to both nodes adding the
duplicate IP to their own eth0. I probably do not need to tell you the
mischief that can cause if these were production servers.

I tried restarting cman, pcsd and pacemaker on both machines with no effect
on the situation.

I've found several mentions of it in the search engines, but I've been
unable to find out how to fix it. Any help is appreciated.

Both nodes have quorum disabled in /etc/sysconfig/cman:

CMAN_QUORUM_TIMEOUT=0

# Node 1

scahadev01da# sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
Last updated: Mon Jul 31 10:43:54 2017
Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]

Full list of resources:

 VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01da

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

# Node 2

scahadev01db ~]$ sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
Last updated: Mon Jul 31 10:43:47 2017
Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]

Full list of resources:

 VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01db

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

--
Stephen Carville
[ClusterLabs] fence_vmware_soap: reads VM status but fails to reboot/on/off
Hello,

Before I implement the cluster I'm testing the fence agents, and I got
stuck at rebooting the VMware-based VMs.

I have installed VMware ESXi 6.5 Hypervisor with 5 VMs.

If I call:

# fence_vmware_soap --ssl --ip esxi_ip --username root --password pass --action list

I get the list with the names and UUIDs of the VMs.

If I call:

# fence_vmware_soap --ssl --ip esxi_ip --username root --password pass --action status --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d"

I get the status of the VM.

But when I call any of the power actions (on, off, reboot), I get "Failed:
Timed out waiting to power OFF".

I've tried all the combinations of --power-timeout and --power-wait, and I
get the same error without any change in the response time.

Any ideas where or how to fix this issue?

Thank you in advance.
Re: [ClusterLabs] DRBD AND cLVM ???
On 2017-07-31 12:51 PM, Lentes, Bernd wrote:
> Hi,
>
> i'm currently a bit confused. I have several resources running as
> VirtualDomains; the VMs reside on plain logical volumes without a
> filesystem, and these LVs themselves reside on a FC SAN.
> In that scenario i need cLVM to distribute the LVM metadata between the
> nodes.
> To play around a bit and get used to it, i created a DRBD partition.
> It resides on a logical volume (one on each node), which should be
> possible following the documentation from Linbit.
> The LVs each reside on a node's local storage, not on the SAN (which
> would be a very strange configuration).
> But nevertheless it's a cLVM configuration. I don't think it's possible
> to have a cLVM and a non-cLVM configuration at the same time on the same
> node.
> Is what i am trying to do possible?
>
> Bernd

It's not recommended, but it is possible with creative use of filter = []
in lvm.conf. I've not done it myself, mind you.

As far as clvmd on DRBD: to LVM, it's no different whether the block device
is a SAN LUN or DRBD... It only cares that a changed block/inode on one
side is the same on the other.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
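The "creative use of filter" Digimer mentions would look roughly like this. A sketch only: the device paths are assumptions and must be adjusted to the actual DRBD and SAN device names on each node.

```text
# /etc/lvm/lvm.conf (fragment)
devices {
    # Accept the DRBD devices and the SAN multipath devices as PVs and
    # reject everything else, so LVM does not also scan the local disks
    # that hold the DRBD backing LVs (which would make it see the same
    # PV signature twice, once directly and once through /dev/drbd*):
    filter = [ "a|^/dev/drbd|", "a|^/dev/mapper/mpath|", "r|.*|" ]
}
```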
[ClusterLabs] DRBD AND cLVM ???
Hi,

i'm currently a bit confused. I have several resources running as
VirtualDomains; the VMs reside on plain logical volumes without a
filesystem, and these LVs themselves reside on a FC SAN. In that scenario i
need cLVM to distribute the LVM metadata between the nodes.

To play around a bit and get used to it, i created a DRBD partition. It
resides on a logical volume (one on each node), which should be possible
following the documentation from Linbit. The LVs each reside on a node's
local storage, not on the SAN (which would be a very strange
configuration). But nevertheless it's a cLVM configuration. I don't think
it's possible to have a cLVM and a non-cLVM configuration at the same time
on the same node.

Is what i'm trying to do possible?

Bernd

-- 
Bernd Lentes

Systemadministration
institute of developmental genetics
Gebäude 35.34 - Raum 208
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 (0)89 3187 1241
fax: +49 (0)89 3187 2294

no backup - no mercy

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
Re: [ClusterLabs] Two nodes cluster issue
Please ignore my re-reply to the original message; I'm in the middle of a
move and am getting by on little sleep at the moment :-)

On Mon, 2017-07-31 at 09:26 -0500, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> > Hello,
> >
> > We built a pacemaker cluster with 2 physical servers.
> > We configured DRBD in Master\Slave setup, a floating IP and file
> > system mount in Active\Passive mode.
> > We configured two STONITH devices (fence_ipmilan), one for each
> > server.
> >
> > We are trying to simulate a situation when the Master server crashes
> > with no power.
> > We pulled both of the PSU cables and the server becomes offline
> > (UNCLEAN).
> > The resources that the Master used to hold are now in Started
> > (UNCLEAN) state.
> > The state is unclean since the STONITH failed (the STONITH device is
> > located on the server (Intel RMM4 - IPMI) – which uses the same power
> > supply).
> >
> > The problem is that now the cluster does not release the resources
> > that the Master holds, and the service goes down.
> >
> > Is there any way to overcome this situation?
> > We tried to add a qdevice but got the same results.
> >
> > We are using pacemaker 1.1.15 on CentOS 7.3
> >
> > Thanks,
> > Tomer.
>
> This is a limitation of using IPMI as the only fence device, when the
> IPMI shares power with the main system. The way around it is to use a
> fallback fence device, for example a switched power unit or sbd
> (watchdog). Pacemaker lets you specify a fencing "topology" with
> multiple devices -- level 1 would be the IPMI, and level 2 would be the
> fallback device.
>
> qdevice helps with quorum, which would let one side attempt to fence the
> other, but it doesn't affect whether the fencing succeeds. With a
> two-node cluster, you can use qdevice to get quorum, or you can use
> corosync's two_node option.
> --
> Ken Gaillot
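The two_node option Ken mentions is set in the quorum section of corosync.conf (corosync 2.x). A fragment, not a complete config:

```text
# /etc/corosync/corosync.conf (fragment)
quorum {
    provider: corosync_votequorum
    # two_node: 1 gives a two-node cluster quorum with a single node.
    # It implicitly enables wait_for_all, so after a cold start both
    # nodes must be seen once before the cluster will operate with
    # only one of them.
    two_node: 1
}
```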
Re: [ClusterLabs] Two nodes cluster issue
On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> Hello,
>
> We built a pacemaker cluster with 2 physical servers.
> We configured DRBD in Master\Slave setup, a floating IP and file
> system mount in Active\Passive mode.
> We configured two STONITH devices (fence_ipmilan), one for each server.
>
> We are trying to simulate a situation when the Master server crashes
> with no power.
> We pulled both of the PSU cables and the server becomes offline
> (UNCLEAN).
> The resources that the Master used to hold are now in Started (UNCLEAN)
> state.
> The state is unclean since the STONITH failed (the STONITH device is
> located on the server (Intel RMM4 - IPMI) – which uses the same power
> supply).
>
> The problem is that now the cluster does not release the resources
> that the Master holds, and the service goes down.
>
> Is there any way to overcome this situation?
> We tried to add a qdevice but got the same results.
>
> We are using pacemaker 1.1.15 on CentOS 7.3
>
> Thanks,
> Tomer.

This is a limitation of using IPMI as the only fence device, when the IPMI
shares power with the main system. The way around it is to use a fallback
fence device, for example a switched power unit or sbd (watchdog).
Pacemaker lets you specify a fencing "topology" with multiple devices --
level 1 would be the IPMI, and level 2 would be the fallback device.

qdevice helps with quorum, which would let one side attempt to fence the
other, but it doesn't affect whether the fencing succeeds. With a two-node
cluster, you can use qdevice to get quorum, or you can use corosync's
two_node option.

-- 
Ken Gaillot
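A fencing topology along those lines could be registered like this. A sketch only: the device names, the choice of a switched PDU (fence_apc_snmp) as the fallback, and the addresses are assumptions.

```shell
# Level 1: IPMI. Level 2: switched PDU, tried only if level 1 fails.
pcs stonith create fence-node1-ipmi fence_ipmilan \
    ipaddr=10.0.0.1 login=admin passwd=secret pcmk_host_list=node1
pcs stonith create fence-node1-pdu fence_apc_snmp \
    ipaddr=10.0.0.100 port=1 pcmk_host_list=node1

pcs stonith level add 1 node1 fence-node1-ipmi
pcs stonith level add 2 node1 fence-node1-pdu
# ...and an equivalent pair of devices and levels for node2.
```

Pulling both PSU cables then kills the IPMI too, level 1 fails, and the PDU at level 2 still lets the survivor confirm the peer is off and release its resources.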
Re: [ClusterLabs] Stonith hostname vs port vs plug
> The "plug" should match the name used by the hypervisor, not the actual
> host name (if they differ).

I understand the difference between plug and hostname. What I don't
clearly understand is which fence config is correct (I refer to pcs
stonith describe fence_...):

the same entry on every node:

pcmk_host_map="node1:Centos1;node2:Centos2"

or a different entry on every node, like this:

port=Centos1 (on node1)
port=Centos2 (on node2)

?

Thanks,
Regards

2017-07-31 14:28 GMT+02:00 Digimer:
> On 2017-07-31 03:18 AM, ArekW wrote:
> > Hi, I'm confused about how to properly set stonith when a hostname is
> > different from the port/plug name. I have 2 VMs on vbox/vmware with
> > hostnames node1 and node2. The ports' names are Centos1 and Centos2.
> > As I understand it, the stonith device must know which VM to control
> > (each other), so I set pcmk_host_map="node1:Centos1;node2:Centos2"
> > and it seems to work well; however, the documentation describes port
> > as a decimal "port number" (?).
> > Would it be correct to use something like pcmk_host_list="node1
> > node2"? But how would the fence device combine the hostname with the
> > port (or plug)? I presume that node1 must somehow know that node2's
> > plug is Centos2, otherwise it could reboot itself (?)
> > Thank you.
>
> The "plug" should match the name used by the hypervisor, not the actual
> host name (if they differ).
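For what it's worth, a stonith resource lives in the cluster-wide CIB, so pcmk_host_map is defined once for the device, not per node. A sketch of the usual single-device pattern (the agent choice, address and credentials are assumptions; only the host map line comes from this thread):

```shell
# One fence device covering both VMs; pcmk_host_map translates cluster
# node names (node1, node2) to the hypervisor's plug names
# (Centos1, Centos2), so no per-node port= setting is needed.
pcs stonith create fence-vms fence_vmware_soap \
    ipaddr=esxi_ip login=root passwd=pass ssl=1 \
    pcmk_host_map="node1:Centos1;node2:Centos2"
```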
Re: [ClusterLabs] Stonith hostname vs port vs plug
On 2017-07-31 03:18 AM, ArekW wrote:
> Hi, I'm confused about how to properly set stonith when a hostname is
> different from the port/plug name. I have 2 VMs on vbox/vmware with
> hostnames node1 and node2. The ports' names are Centos1 and Centos2.
> As I understand it, the stonith device must know which VM to control
> (each other), so I set pcmk_host_map="node1:Centos1;node2:Centos2"
> and it seems to work well; however, the documentation describes port
> as a decimal "port number" (?).
> Would it be correct to use something like pcmk_host_list="node1 node2"?
> But how would the fence device combine the hostname with the port (or
> plug)? I presume that node1 must somehow know that node2's plug is
> Centos2, otherwise it could reboot itself (?)
> Thank you.

The "plug" should match the name used by the hypervisor, not the actual
host name (if they differ).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
[ClusterLabs] Stonith hostname vs port vs plug
Hi,

I'm confused about how to properly set stonith when a hostname is
different from the port/plug name. I have 2 VMs on vbox/vmware with
hostnames node1 and node2. The ports' names are Centos1 and Centos2. As I
understand it, the stonith device must know which VM to control (each
other), so I set pcmk_host_map="node1:Centos1;node2:Centos2" and it seems
to work well; however, the documentation describes port as a decimal "port
number" (?). Would it be correct to use something like
pcmk_host_list="node1 node2"? But how would the fence device combine the
hostname with the port (or plug)? I presume that node1 must somehow know
that node2's plug is Centos2, otherwise it could reboot itself (?)

Thank you.