[ClusterLabs] Antw: After reboot each node thinks the other is offline.

2017-07-31 Thread Ulrich Windl
>>> "Stephen Carville (HA List)" <62d2a...@opayq.com> schrieb am 31.07.2017 um
20:17 in Nachricht :
> I am experimenting with pacemaker for high availability for some load
> balancers.  I was able to successfully get two CentOS (6.9) machines
> (scahadev01da and scahadev01db) to form a cluster, and the shared IP was
> assigned to scahadev01da.  I simulated a failure by halting the primary,
> and the secondary eventually noticed and brought up the shared IP on its
> eth0.  So far, so good.
> 
> A problem arises when the primary comes back up and, for some reason,
> each node thinks the other is offline.  This leads to both nodes adding

If a node thinks the other is unexpectedly offline, it will fence it, and then 
it will be offline! Thus the IP can't run there. I guess you have no fencing 
configured, right?
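
A minimal sketch of adding fencing with pcs (the agent choice and all
addresses/credentials below are placeholders, not from this thread; use
whatever fence agent matches your hardware, e.g. IPMI):

# one fence device per node; parameter values are illustrative placeholders
pcs stonith create fence_scahadev01da fence_ipmilan \
    ipaddr=192.0.2.11 login=admin passwd=secret pcmk_host_list=scahadev01da
pcs stonith create fence_scahadev01db fence_ipmilan \
    ipaddr=192.0.2.12 login=admin passwd=secret pcmk_host_list=scahadev01db
pcs property set stonith-enabled=true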

Regards,
Ulrich

> the duplicate IP to their own eth0.  I probably do not need to tell you
> the mischief that can cause if these were production servers.
> 
> I tried restarting cman, pcsd and pacemaker on both machines with no
> effect on the situation.
> 
> I've found several mentions of it in the search engines, but I've been
> unable to find how to fix it.  Any help is appreciated.
> 
> Both nodes have quorum disabled in /etc/sysconfig/cman
> 
> CMAN_QUORUM_TIMEOUT=0
> 
> #
> Node 1
> 
> scahadev01da# sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition
> WITHOUT quorum
> Last updated: Mon Jul 31 10:43:54 2017  Last change: Mon Jul 31 10:30:46
> 2017 by root via cibadmin on scahadev01da
> 
> 2 nodes and 1 resource configured
> 
> Online: [ scahadev01da ]
> OFFLINE: [ scahadev01db ]
> 
> Full list of resources:
> 
>  VirtualIP(ocf::heartbeat:IPaddr2):   Started scahadev01da
> 
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> #
> Node 2
> 
> scahadev01db ~]$ sudo pcs status
> Cluster name: scahadev01d
> Stack: cman
> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition
> WITHOUT quorum
> Last updated: Mon Jul 31 10:43:47 2017  Last change: Sat Jul 29 13:45:15
> 2017 by root via cibadmin on scahadev01da
> 
> 2 nodes and 1 resource configured
> 
> Online: [ scahadev01db ]
> OFFLINE: [ scahadev01da ]
> 
> Full list of resources:
> 
>  VirtualIP(ocf::heartbeat:IPaddr2):   Started scahadev01db
> 
> Daemon Status:
>   cman: active/enabled
>   corosync: active/disabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
> 
> --
> Stephen Carville
> 


[ClusterLabs] Antw: fence_vmware_soap: reads VM status but fails to reboot/on/off

2017-07-31 Thread Ulrich Windl
>>> Octavian Ciobanu  wrote on 31.07.2017 at 20:16 in
message
:
> Hello,
> 
> Before I implement the cluster I'm testing the fence agents, and I got stuck
> at rebooting the VMware-based VMs.
> 
> I have installed VMware ESXi 6.5 Hypervisor with 5 VMs.
> 
> If I call :
> # fence_vmware_soap --ssl --ip esxi_ip --username root --password pass
> --action list
> I get the list with the names and UUIDs of the VMs.
> 
> If I call :
> # fence_vmware_soap --ssl --ip esxi_ip --username root --password pass
> --action status --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d"
> I get the status of the VM.
> 
> But when I call any of the power actions (on, off, reboot) I get "Failed:
> Timed out waiting to power OFF".
> 
> I've tried all the combinations of --power-timeout and --power-wait
> and get the same error, without any change in the response time.
> 
> Any ideas where or how to fix this issue?

I suspect "power off" is actually a virtual press of the ACPI power button 
(reboot likewise), so your VM tries to shut down cleanly. That could take time, 
and it could hang (I guess). I don't use VMware, but maybe there's a "reset" 
action that presses the virtual reset button of the virtual hardware... ;-)
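
One way to narrow this down (a sketch, not something suggested in this
thread): run the agent verbosely with a longer power timeout and watch where
it stalls; the flags below are standard fence-agent options, but check
fence_vmware_soap --help on your version.

# credentials/UUID are the placeholders from the post above; --verbose shows
# the individual SOAP calls and where the wait for power-off times out
fence_vmware_soap --ssl --ip esxi_ip --username root --password pass \
    --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d" \
    --action off --power-timeout 120 --verbose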

Regards,
Ulrich

> 
> Thank you in advance.







[ClusterLabs] Antw: DRBD AND cLVM ???

2017-07-31 Thread Ulrich Windl
>>> "Lentes, Bernd"  schrieb am 31.07.2017
um
18:51 in Nachricht
<641329685.12981098.1501519915026.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
> 
> I'm currently a bit confused. I have several resources running as
> VirtualDomains; the VMs reside on plain logical volumes without a
> filesystem, and these LVs themselves reside on an FC SAN.
> In that scenario I need cLVM to distribute the LVM metadata between the
> nodes.
> For playing around a bit and getting used to it, I created a DRBD partition.
> It resides on a logical volume (one on each node), which should be possible
> according to the Linbit documentation.
> The LVs each reside on a node's local storage, not on the SAN (which
> would be a very strange configuration).

So you use cLVM to create local VGs, and you use DRBD to sync the local LVs?
Why don't you use the shared SAN?

> But nevertheless it's a cLVM configuration. I don't think it's possible to 
> have a cLVM and non-cLVM configuration at the same time on the same node.

You can definitely have clustered and non-clustered VGs on one node.
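
For instance (a minimal sketch with assumed VG and device names, not taken
from this thread), the clustered flag is set per volume group:

# clustered VG on the shared storage (needs clvmd/dlm running)
vgcreate -cy vg_cluster /dev/mapper/san_lun1
# ordinary local VG on a local disk
vgcreate -cn vg_local /dev/sdb1
# the 'c' bit in the VG attributes shows which VGs are clustered
vgs -o vg_name,vg_attr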

> Is what I'm trying to do possible?

I'm still wondering what you really want to achieve.

Regards,
Ulrich

> 
> 
> Bernd
> 
> 
> -- 
> Bernd Lentes 
> 
> Systemadministration 
> institute of developmental genetics 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum München 
> bernd.len...@helmholtz-muenchen.de 
> phone: +49 (0)89 3187 1241 
> fax: +49 (0)89 3187 2294 
> 
> no backup - no mercy
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de 
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 
> 


[ClusterLabs] After reboot each node thinks the other is offline.

2017-07-31 Thread Stephen Carville (HA List)
I am experimenting with pacemaker for high availability for some load
balancers.  I was able to successfully get two CentOS (6.9) machines
(scahadev01da and scahadev01db) to form a cluster, and the shared IP was
assigned to scahadev01da.  I simulated a failure by halting the primary,
and the secondary eventually noticed and brought up the shared IP on its
eth0.  So far, so good.

A problem arises when the primary comes back up and, for some reason,
each node thinks the other is offline.  This leads to both nodes adding
the duplicate IP to their own eth0.  I probably do not need to tell you
the mischief that can cause if these were production servers.

I tried restarting cman, pcsd and pacemaker on both machines with no
effect on the situation.

I've found several mentions of it in the search engines, but I've been
unable to find how to fix it.  Any help is appreciated.

Both nodes have quorum disabled in /etc/sysconfig/cman

CMAN_QUORUM_TIMEOUT=0

#
Node 1

scahadev01da# sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition
WITHOUT quorum
Last updated: Mon Jul 31 10:43:54 2017  Last change: Mon Jul 31 10:30:46
2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]

Full list of resources:

 VirtualIP  (ocf::heartbeat:IPaddr2):   Started scahadev01da

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

#
Node 2

scahadev01db ~]$ sudo pcs status
Cluster name: scahadev01d
Stack: cman
Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition
WITHOUT quorum
Last updated: Mon Jul 31 10:43:47 2017  Last change: Sat Jul 29 13:45:15
2017 by root via cibadmin on scahadev01da

2 nodes and 1 resource configured

Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]

Full list of resources:

 VirtualIP  (ocf::heartbeat:IPaddr2):   Started scahadev01db

Daemon Status:
  cman: active/enabled
  corosync: active/disabled
  pacemaker: active/enabled
  pcsd: active/enabled

--
Stephen Carville



[ClusterLabs] fence_vmware_soap: reads VM status but fails to reboot/on/off

2017-07-31 Thread Octavian Ciobanu
Hello,

Before I implement the cluster I'm testing the fence agents, and I got stuck
at rebooting the VMware-based VMs.

I have installed VMware ESXi 6.5 Hypervisor with 5 VMs.

If I call :
# fence_vmware_soap --ssl --ip esxi_ip --username root --password pass
--action list
I get the list with the names and UUIDs of the VMs.

If I call :
# fence_vmware_soap --ssl --ip esxi_ip --username root --password pass
--action status --plug "564d5bce-3c55-2b02-1a8b-052c1fd24d6d"
I get the status of the VM.

But when I call any of the power actions (on, off, reboot) I get "Failed:
Timed out waiting to power OFF".

I've tried all the combinations of --power-timeout and --power-wait
and get the same error, without any change in the response time.

Any ideas where or how to fix this issue?

Thank you in advance.


Re: [ClusterLabs] DRBD AND cLVM ???

2017-07-31 Thread Digimer
On 2017-07-31 12:51 PM, Lentes, Bernd wrote:
> Hi,
> 
> I'm currently a bit confused. I have several resources running as
> VirtualDomains; the VMs reside on plain logical volumes without a
> filesystem, and these LVs themselves reside on an FC SAN.
> In that scenario I need cLVM to distribute the LVM metadata between the nodes.
> For playing around a bit and getting used to it, I created a DRBD partition.
> It resides on a logical volume (one on each node), which should be possible
> according to the Linbit documentation.
> The LVs each reside on a node's local storage, not on the SAN (which
> would be a very strange configuration).
> But nevertheless it's a cLVM configuration. I don't think it's possible to
> have a cLVM and non-cLVM configuration at the same time on the same node.
> Is what I'm trying to do possible?
> 
> Bernd

It's not recommended, but it is possible with creative use of filter =
[] in lvm.conf. I've not done it myself, mind you. As for clvmd on
DRBD: to LVM, it's no different whether the block device is a SAN LUN or
DRBD... It only cares that a changed block/inode on one side is the same
on the other.
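
A hedged illustration of that kind of lvm.conf filter (the device patterns
are assumptions; adjust them to your actual SAN and DRBD device names):

# /etc/lvm/lvm.conf: accept the SAN multipath LUNs and the DRBD devices,
# reject everything else so LVM never scans the DRBD backing LVs directly
filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/drbd.*|", "r|.*|" ]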


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould



[ClusterLabs] DRBD AND cLVM ???

2017-07-31 Thread Lentes, Bernd
Hi,

I'm currently a bit confused. I have several resources running as
VirtualDomains; the VMs reside on plain logical volumes without a
filesystem, and these LVs themselves reside on an FC SAN.
In that scenario I need cLVM to distribute the LVM metadata between the nodes.
For playing around a bit and getting used to it, I created a DRBD partition. It
resides on a logical volume (one on each node), which should be possible
according to the Linbit documentation.
The LVs each reside on a node's local storage, not on the SAN (which
would be a very strange configuration).
But nevertheless it's a cLVM configuration. I don't think it's possible to have
a cLVM and non-cLVM configuration at the same time on the same node.
Is what I'm trying to do possible?


Bernd


-- 
Bernd Lentes 

Systemadministration 
institute of developmental genetics 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum München 
bernd.len...@helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 

no backup - no mercy
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671




Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
Please ignore my re-reply to the original message, I'm in the middle of
a move and am getting by on little sleep at the moment :-)

On Mon, 2017-07-31 at 09:26 -0500, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> > Hello,
> > 
> >  
> > 
> > We built a pacemaker cluster with 2 physical servers.
> > 
> > We configured DRBD in Master\Slave setup, a floating IP and file
> > system mount in Active\Passive mode.
> > 
> > We configured two STONITH devices (fence_ipmilan), one for each
> > server.
> > 
> >  
> > 
> > We are trying to simulate a situation where the Master server crashes
> > with no power.
> > 
> > We pulled both of the PSU cables and the server becomes offline
> > (UNCLEAN).
> > 
> > The resources that the Master used to hold are now in the Started (UNCLEAN)
> > state.
> > 
> > The state is unclean since the STONITH failed (the STONITH device is
> > located on the server (Intel RMM4 - IPMI) – which uses the same power
> > supply). 
> > 
> >  
> > 
> > The problem is that now the cluster does not release the resources
> > that the Master held, and the service goes down.
> > 
> >  
> > 
> > Is there any way to overcome this situation? 
> > 
> > We tried to add a qdevice but got the same results.
> > 
> >  
> > 
> > We are using pacemaker 1.1.15 on CentOS 7.3
> > 
> >  
> > 
> > Thanks,
> > 
> > Tomer.
> 
> This is a limitation of using IPMI as the only fence device, when the
> IPMI shares power with the main system. The way around it is to use a
> fallback fence device, for example a switched power unit or sbd
> (watchdog). Pacemaker lets you specify a fencing "topology" with
> multiple devices -- level 1 would be the IPMI, and level 2 would be the
> fallback device.
> 
> qdevice helps with quorum, which would let one side attempt to fence the
> other, but it doesn't affect whether the fencing succeeds. With a
> two-node cluster, you can use qdevice to get quorum, or you can use
> corosync's two_node option.
> 

-- 
Ken Gaillot 







Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> Hello,
> 
>  
> 
> We built a pacemaker cluster with 2 physical servers.
> 
> We configured DRBD in Master\Slave setup, a floating IP and file
> system mount in Active\Passive mode.
> 
> We configured two STONITH devices (fence_ipmilan), one for each
> server.
> 
>  
> 
> We are trying to simulate a situation where the Master server crashes
> with no power.
> 
> We pulled both of the PSU cables and the server becomes offline
> (UNCLEAN).
> 
> The resources that the Master used to hold are now in the Started (UNCLEAN)
> state.
> 
> The state is unclean since the STONITH failed (the STONITH device is
> located on the server (Intel RMM4 - IPMI) – which uses the same power
> supply). 
> 
>  
> 
> The problem is that now the cluster does not release the resources
> that the Master held, and the service goes down.
> 
>  
> 
> Is there any way to overcome this situation? 
> 
> We tried to add a qdevice but got the same results.
> 
>  
> 
> We are using pacemaker 1.1.15 on CentOS 7.3
> 
>  
> 
> Thanks,
> 
> Tomer.

This is a limitation of using IPMI as the only fence device, when the
IPMI shares power with the main system. The way around it is to use a
fallback fence device, for example a switched power unit or sbd
(watchdog). Pacemaker lets you specify a fencing "topology" with
multiple devices -- level 1 would be the IPMI, and level 2 would be the
fallback device.

qdevice helps with quorum, which would let one side attempt to fence the
other, but it doesn't affect whether the fencing succeeds. With a
two-node cluster, you can use qdevice to get quorum, or you can use
corosync's two_node option.
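
A minimal sketch of such a topology with pcs (all device names, addresses,
and credentials below are placeholders, not from this thread; the fallback
here is assumed to be an APC-style switched PDU):

# level 1: IPMI, level 2: switched PDU as fallback for node1
pcs stonith create fence_node1_ipmi fence_ipmilan \
    ipaddr=192.0.2.21 login=admin passwd=secret pcmk_host_list=node1
pcs stonith create fence_node1_pdu fence_apc_snmp \
    ipaddr=192.0.2.31 port=3 pcmk_host_list=node1
pcs stonith level add 1 node1 fence_node1_ipmi
pcs stonith level add 2 node1 fence_node1_pdu
# repeat for node2 with its own IPMI address and PDU outlet

And the corosync two_node option, if you go that route instead of qdevice:

quorum {
    provider: corosync_votequorum
    two_node: 1
}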

-- 
Ken Gaillot 







Re: [ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread ArekW
> The "plug" should match the name used by the hypervisor, not the actual
host name (if they differ).
I understand the difference between plug and hostname. I don't clearly
understand which fence config is correct (I reffer to pcs stonith describe
fence_...):

the same entry on every node:
pmck_host_map="node1:Centos1;node2:Centos2"

or different entry at every node like this:

port=Centos1 (on node1)
port=Centos2 (on node2)
?
Thanks,
Regards

2017-07-31 14:28 GMT+02:00 Digimer :

> On 2017-07-31 03:18 AM, ArekW wrote:
> > Hi, I'm confused about how to properly set up stonith when a hostname is
> > different from the port/plug name. I have 2 VMs on vbox/vmware with
> > hostnames node1 and node2. The port names are Centos1 and Centos2.
> > As I understand it, the stonith device must know which VM to
> > control (each other), so I set
> > pcmk_host_map="node1:Centos1;node2:Centos2" and it seems to work well;
> > however, the documentation describes port as a decimal "port number"(?).
> > Would it be correct to use something like pcmk_host_list="node1
> > node2"? But how will the fence device combine the hostname with the port
> > (or plug)? I presume that node1 must somehow know that node2's plug is
> > Centos2, otherwise it could reboot itself(?)
> > Thank you.
>
> The "plug" should match the name used by the hypervisor, not the actual
> host name (if they differ).
>
>
> --
> Digimer
> Papers and Projects: https://alteeve.com/w/
> "I am, somehow, less interested in the weight and convolutions of
> Einstein’s brain than in the near certainty that people of equal talent
> have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
>


Re: [ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread Digimer
On 2017-07-31 03:18 AM, ArekW wrote:
> Hi, I'm confused about how to properly set up stonith when a hostname is
> different from the port/plug name. I have 2 VMs on vbox/vmware with
> hostnames node1 and node2. The port names are Centos1 and Centos2.
> As I understand it, the stonith device must know which VM to
> control (each other), so I set
> pcmk_host_map="node1:Centos1;node2:Centos2" and it seems to work well;
> however, the documentation describes port as a decimal "port number"(?).
> Would it be correct to use something like pcmk_host_list="node1
> node2"? But how will the fence device combine the hostname with the port
> (or plug)? I presume that node1 must somehow know that node2's plug is
> Centos2, otherwise it could reboot itself(?)
> Thank you.

The "plug" should match the name used by the hypervisor, not the actual
host name (if they differ).
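
A minimal sketch of the single-device form this implies, using pcmk_host_map
so one stonith resource can fence either node (the agent, credentials, and
address below are assumptions, not from this thread):

# one fence device shared by both nodes; the map translates cluster node
# names to the hypervisor's plug names, so no per-node port= entry is needed
pcs stonith create vmfence fence_vmware_soap \
    ipaddr=esxi_ip login=root passwd=pass ssl=1 \
    pcmk_host_map="node1:Centos1;node2:Centos2"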


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould



[ClusterLabs] Stonith hostname vs port vs plug

2017-07-31 Thread ArekW
Hi, I'm confused about how to properly set up stonith when a hostname is
different from the port/plug name. I have 2 VMs on vbox/vmware with
hostnames node1 and node2. The port names are Centos1 and Centos2.
As I understand it, the stonith device must know which VM to
control (each other), so I set
pcmk_host_map="node1:Centos1;node2:Centos2" and it seems to work well;
however, the documentation describes port as a decimal "port number"(?).
Would it be correct to use something like pcmk_host_list="node1
node2"? But how will the fence device combine the hostname with the port
(or plug)? I presume that node1 must somehow know that node2's plug is
Centos2, otherwise it could reboot itself(?)
Thank you.
