Re: [ClusterLabs] cluster problems after let's encrypt

2020-07-06 Thread Andrei Borzenkov
On 06.07.2020 19:13, fatcha...@gmx.de wrote:
> Hi,
> 
> I'm running a two-node corosync httpd cluster on CentOS 7.
> corosync-2.4.5-4.el7.x86_64
> pcs-0.9.168-4.el7.centos.x86_64
> Today I used Let's Encrypt to install HTTPS for two domains on that system.
> After that, the node with the new HTTPS domains is no longer able to hold the
> apache resource:
> 
> The resource is configured like this:
> Resource: apache (class=ocf provider=heartbeat type=apache)
> Attributes: configfile=/etc/httpd/conf/httpd.conf 
> statusurl=http://127.0.0.1:8089/server-status
> Operations: start interval=0s timeout=40s (apache-start-interval-0s)
> stop interval=0s timeout=60s (apache-stop-interval-0s)
> monitor interval=1min (apache-monitor-interval-1min)
> 
> The status-page is configured like this:
> Listen 127.0.0.1:8089
> <Location /server-status>
> SetHandler server-status
> Order deny,allow
> Deny from all
> Allow from 127.0.0.1
> </Location>
> 
> And the log shows this:
> Jul 6 16:55:18 bachi2 apache(apache)[7182]: INFO: waiting for apache 
> /etc/httpd/conf/httpd.conf to come up
> Jul 6 16:55:19 bachi2 apache(apache)[7182]: INFO: apache not running


Your apache server does not start; this is outside of Pacemaker's scope.
Examine the apache logs and try to run the server manually with the same options.
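
As a concrete sketch of that manual check (assuming the stock CentOS paths from
your resource definition; adjust if your layout differs):

# httpd -t -f /etc/httpd/conf/httpd.conf
(syntax check of the exact config file the resource agent passes to httpd)
# httpd -DFOREGROUND -f /etc/httpd/conf/httpd.conf
(run httpd in the foreground so startup errors are printed directly)
# tail -n 50 /var/log/httpd/error_log

One common culprit after adding TLS vhosts is a certificate, key, or include
path in the newly generated SSL configuration that httpd cannot open.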

> Jul 6 16:55:19 bachi2 apache(apache)[7182]: INFO: waiting for apache 
> /etc/httpd/conf/httpd.conf to come up
> Jul 6 16:55:20 bachi2 apache(apache)[7182]: INFO: apache not running
> Jul 6 16:55:20 bachi2 apache(apache)[7182]: INFO: waiting for apache 
> /etc/httpd/conf/httpd.conf to come up
> Jul 6 16:55:21 bachi2 lrmd[8756]: warning: apache_start_0 process (PID 7182) 
> timed out
> Jul 6 16:55:21 bachi2 lrmd[8756]: warning: apache_start_0:7182 - timed out 
> after 4ms
> Jul 6 16:55:21 bachi2 crmd[8759]: error: Result of start operation for apache 
> on bachi2: Timed Out
> Jul 6 16:55:21 bachi2 apache(apache)[8416]: INFO: apache is not running.
> 
> Any suggestions are welcome
> 
> Best regards
> 
> fatcharly


[ClusterLabs] cluster problems after let's encrypt

2020-07-06 Thread fatcharly
Hi,

I'm running a two-node corosync httpd cluster on CentOS 7.
corosync-2.4.5-4.el7.x86_64
pcs-0.9.168-4.el7.centos.x86_64
Today I used Let's Encrypt to install HTTPS for two domains on that system.
After that, the node with the new HTTPS domains is no longer able to hold the
apache resource:

The resource is configured like this:
Resource: apache (class=ocf provider=heartbeat type=apache)
Attributes: configfile=/etc/httpd/conf/httpd.conf 
statusurl=http://127.0.0.1:8089/server-status
Operations: start interval=0s timeout=40s (apache-start-interval-0s)
stop interval=0s timeout=60s (apache-stop-interval-0s)
monitor interval=1min (apache-monitor-interval-1min)
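
(For reference, the definition above corresponds roughly to a pcs command along
these lines; a sketch only, shown for completeness:)

# pcs resource create apache ocf:heartbeat:apache \
    configfile=/etc/httpd/conf/httpd.conf \
    statusurl="http://127.0.0.1:8089/server-status" \
    op start timeout=40s op stop timeout=60s op monitor interval=1min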

The status-page is configured like this:
Listen 127.0.0.1:8089
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
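
The resource agent's start and monitor actions poll that statusurl, so once
httpd is up it can be checked the same way (a quick sanity check, assuming curl
is installed):

# curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8089/server-status
(should print 200 when the status handler is reachable)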

And the log shows this:
Jul 6 16:55:18 bachi2 apache(apache)[7182]: INFO: waiting for apache 
/etc/httpd/conf/httpd.conf to come up
Jul 6 16:55:19 bachi2 apache(apache)[7182]: INFO: apache not running
Jul 6 16:55:19 bachi2 apache(apache)[7182]: INFO: waiting for apache 
/etc/httpd/conf/httpd.conf to come up
Jul 6 16:55:20 bachi2 apache(apache)[7182]: INFO: apache not running
Jul 6 16:55:20 bachi2 apache(apache)[7182]: INFO: waiting for apache 
/etc/httpd/conf/httpd.conf to come up
Jul 6 16:55:21 bachi2 lrmd[8756]: warning: apache_start_0 process (PID 7182) 
timed out
Jul 6 16:55:21 bachi2 lrmd[8756]: warning: apache_start_0:7182 - timed out 
after 4ms
Jul 6 16:55:21 bachi2 crmd[8759]: error: Result of start operation for apache 
on bachi2: Timed Out
Jul 6 16:55:21 bachi2 apache(apache)[8416]: INFO: apache is not running.

Any suggestions are welcome

Best regards

fatcharly


Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-06 Thread Strahil Nikolov
As far as I know, fence_xvm supports multiple hosts, but you need to open
the port on both the hypervisor (UDP) and the guest (TCP). 'fence_xvm -o list'
should provide a list of VMs from all hosts that responded (and have the key).
Usually, the biggest problem is the multicast traffic, as in many
environments it can be dropped by firewalls.
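
With the default fence_virt port (1229) that would look roughly like this,
assuming firewalld on the CentOS hypervisors and ufw on the Ubuntu guests:

On each hypervisor:
# firewall-cmd --permanent --add-port=1229/udp
# firewall-cmd --reload

On each guest:
# ufw allow 1229/tcp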

Best Regards,
Strahil Nikolov

On 6 July 2020 12:24:08 GMT+03:00, Klaus Wenninger wrote:
>On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
>> Hello,
>>
>> >> # fence_xvm -o list
>> >> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>> >> on
>>
>> >This should show both VMs, so getting to that point will likely solve
>> >your problem. fence_xvm relies on multicast, there could be some
>> >obscure network configuration to get that working on the VMs.
>You said you tried on both hosts. What does 'virsh list'
>give you on the 2 hosts? Hopefully different names for
>the VMs ...
>Did you try 'fence_xvm -a {mcast-ip} -o list' on the
>guests as well?
>Did you try pinging via the physical network that is
>connected to the bridge configured to be used for
>fencing?
>If I got it right fence_xvm should support collecting
>answers from multiple hosts but I found a suggestion
>to do a setup with 2 multicast-addresses & keys for
>each host.
>Which route did you go?
>
>Klaus
>>
>> Thank you for pointing me in that direction. We have tried to solve
>> that but with no success. We were using a howto provided here
>> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>>
>> Problem is, it specifically states that the tutorial does not yet
>> support the case where guests are running on multiple hosts. There are
>> some short hints about what might be necessary to do, but working through
>> those sadly just did not work, nor were there any clues which would
>> help us find a solution ourselves. So now we are completely stuck
>> here.
>>
>> Does someone have the same configuration with guest VMs on multiple hosts?
>> And how did you manage to get that to work? What do we need to do to
>> resolve this? Is there maybe even someone who would be willing to take
>> a closer look at our server? Any help would be greatly appreciated!
>>
>> Kind regards
>> Stefan Schmitz
>>
>>
>>
>> On 03.07.2020 02:39, Ken Gaillot wrote:
>>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com
>>> wrote:
 Hello,

 I hope someone can help with this problem. We are (still) trying to
 get
 Stonith to achieve a running active/active HA Cluster, but sadly to
 no
 avail.

 There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM.
 The
 Ubuntu VMs are the ones which should form the HA Cluster.

 The current status is this:

 # pcs status
 Cluster name: pacemaker_cluster
 WARNING: corosync and pacemaker node names do not match (IPs used in
 setup?)
 Stack: corosync
 Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
 with
 quorum
 Last updated: Thu Jul  2 17:03:53 2020
 Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
 server4ubuntu1

 2 nodes configured
 13 resources configured

 Online: [ server2ubuntu1 server4ubuntu1 ]

 Full list of resources:

    stonith_id_1   (stonith:external/libvirt): Stopped
    Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
    Masters: [ server4ubuntu1 ]
    Slaves: [ server2ubuntu1 ]
    Master/Slave Set: WebDataClone [WebData]
    Masters: [ server2ubuntu1 server4ubuntu1 ]
    Clone Set: dlm-clone [dlm]
    Started: [ server2ubuntu1 server4ubuntu1 ]
    Clone Set: ClusterIP-clone [ClusterIP] (unique)
    ClusterIP:0    (ocf::heartbeat:IPaddr2):   Started
 server2ubuntu1
    ClusterIP:1    (ocf::heartbeat:IPaddr2):   Started
 server4ubuntu1
    Clone Set: WebFS-clone [WebFS]
    Started: [ server4ubuntu1 ]
    Stopped: [ server2ubuntu1 ]
    Clone Set: WebSite-clone [WebSite]
    Started: [ server4ubuntu1 ]
    Stopped: [ server2ubuntu1 ]

 Failed Actions:
 * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
 call=201,
 status=Error, exitreason='',
   last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
 exec=3403ms
 * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8):
 call=203,
 status=complete, exitreason='',
   last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
 * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
 call=202,
 status=Error, exitreason='',
   last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
 exec=3411ms


 The stonith resource is stopped and does not seem to work.
 On both hosts the command
 # fence_xvm -o list
 kvm102  
>bab3749c-15fc-40b

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-06 Thread Klaus Wenninger
On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote:
> Hello,
>
> >> # fence_xvm -o list
> >> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
> >> on
>
> >This should show both VMs, so getting to that point will likely solve
> >your problem. fence_xvm relies on multicast, there could be some
> >obscure network configuration to get that working on the VMs.
You said you tried on both hosts. What does 'virsh list'
give you on the 2 hosts? Hopefully different names for
the VMs ...
Did you try 'fence_xvm -a {mcast-ip} -o list' on the
guests as well?
Did you try pinging via the physical network that is
connected to the bridge configured to be used for
fencing?
If I got it right fence_xvm should support collecting
answers from multiple hosts but I found a suggestion
to do a setup with 2 multicast-addresses & keys for
each host.
Which route did you go?
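
One way to see whether the multicast requests actually reach the hypervisors is
to capture on the bridge while running the list command from a guest (a sketch,
assuming the fence_virt defaults 225.0.0.12/1229 and a bridge named br0):

On the host:
# tcpdump -i br0 -n udp and host 225.0.0.12 and port 1229
On the guest:
# fence_xvm -a 225.0.0.12 -o list

If nothing shows up on the host side, the requests are being filtered or sent
out on the wrong interface.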

Klaus
>
> Thank you for pointing me in that direction. We have tried to solve
> that but with no success. We were using a howto provided here
> https://wiki.clusterlabs.org/wiki/Guest_Fencing
>
> Problem is, it specifically states that the tutorial does not yet
> support the case where guests are running on multiple hosts. There are
> some short hints about what might be necessary to do, but working through
> those sadly just did not work, nor were there any clues which would
> help us find a solution ourselves. So now we are completely stuck
> here.
>
> Does someone have the same configuration with guest VMs on multiple hosts?
> And how did you manage to get that to work? What do we need to do to
> resolve this? Is there maybe even someone who would be willing to take
> a closer look at our server? Any help would be greatly appreciated!
>
> Kind regards
> Stefan Schmitz
>
>
>
> On 03.07.2020 02:39, Ken Gaillot wrote:
>> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com
>> wrote:
>>> Hello,
>>>
>>> I hope someone can help with this problem. We are (still) trying to
>>> get
>>> Stonith to achieve a running active/active HA Cluster, but sadly to
>>> no
>>> avail.
>>>
>>> There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM.
>>> The
>>> Ubuntu VMs are the ones which should form the HA Cluster.
>>>
>>> The current status is this:
>>>
>>> # pcs status
>>> Cluster name: pacemaker_cluster
>>> WARNING: corosync and pacemaker node names do not match (IPs used in
>>> setup?)
>>> Stack: corosync
>>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
>>> with
>>> quorum
>>> Last updated: Thu Jul  2 17:03:53 2020
>>> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
>>> server4ubuntu1
>>>
>>> 2 nodes configured
>>> 13 resources configured
>>>
>>> Online: [ server2ubuntu1 server4ubuntu1 ]
>>>
>>> Full list of resources:
>>>
>>>    stonith_id_1   (stonith:external/libvirt): Stopped
>>>    Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>>>    Masters: [ server4ubuntu1 ]
>>>    Slaves: [ server2ubuntu1 ]
>>>    Master/Slave Set: WebDataClone [WebData]
>>>    Masters: [ server2ubuntu1 server4ubuntu1 ]
>>>    Clone Set: dlm-clone [dlm]
>>>    Started: [ server2ubuntu1 server4ubuntu1 ]
>>>    Clone Set: ClusterIP-clone [ClusterIP] (unique)
>>>    ClusterIP:0    (ocf::heartbeat:IPaddr2):   Started
>>> server2ubuntu1
>>>    ClusterIP:1    (ocf::heartbeat:IPaddr2):   Started
>>> server4ubuntu1
>>>    Clone Set: WebFS-clone [WebFS]
>>>    Started: [ server4ubuntu1 ]
>>>    Stopped: [ server2ubuntu1 ]
>>>    Clone Set: WebSite-clone [WebSite]
>>>    Started: [ server4ubuntu1 ]
>>>    Stopped: [ server2ubuntu1 ]
>>>
>>> Failed Actions:
>>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
>>> call=201,
>>> status=Error, exitreason='',
>>>   last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
>>> exec=3403ms
>>> * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8):
>>> call=203,
>>> status=complete, exitreason='',
>>>   last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
>>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
>>> call=202,
>>> status=Error, exitreason='',
>>>   last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
>>> exec=3411ms
>>>
>>>
>>> The stonith resource is stopped and does not seem to work.
>>> On both hosts the command
>>> # fence_xvm -o list
>>> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>>> on
>>
>> This should show both VMs, so getting to that point will likely solve
>> your problem. fence_xvm relies on multicast, there could be some
>> obscure network configuration to get that working on the VMs.
>>
>>> returns the local VM. Apparently it connects through the
>>> Virtualization
>>> interface because it returns the VM name not the Hostname of the
>>> client
>>> VM. I do not know if this is how it is supposed to work?
>>
>> Yes, fence_xvm knows only about the VM names.
>>
>> To get pacemaker to be able to use it for fencing the clus

Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-06 Thread stefan.schm...@farmpartner-tec.com

Hello,

>> # fence_xvm -o list
>> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
>> on

>This should show both VMs, so getting to that point will likely solve
>your problem. fence_xvm relies on multicast, there could be some
>obscure network configuration to get that working on the VMs.

Thank you for pointing me in that direction. We have tried to solve that
but with no success. We were using a howto provided here
https://wiki.clusterlabs.org/wiki/Guest_Fencing


Problem is, it specifically states that the tutorial does not yet
support the case where guests are running on multiple hosts. There are
some short hints about what might be necessary to do, but working through
those sadly just did not work, nor were there any clues which would help
us find a solution ourselves. So now we are completely stuck here.


Does someone have the same configuration with guest VMs on multiple hosts? And
how did you manage to get that to work? What do we need to do to resolve
this? Is there maybe even someone who would be willing to take a closer
look at our server? Any help would be greatly appreciated!


Kind regards
Stefan Schmitz



On 03.07.2020 02:39, Ken Gaillot wrote:

On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com
wrote:

Hello,

I hope someone can help with this problem. We are (still) trying to
get
Stonith to achieve a running active/active HA Cluster, but sadly to
no
avail.

There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM.
The
Ubuntu VMs are the ones which should form the HA Cluster.

The current status is this:

# pcs status
Cluster name: pacemaker_cluster
WARNING: corosync and pacemaker node names do not match (IPs used in
setup?)
Stack: corosync
Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
with
quorum
Last updated: Thu Jul  2 17:03:53 2020
Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
server4ubuntu1

2 nodes configured
13 resources configured

Online: [ server2ubuntu1 server4ubuntu1 ]

Full list of resources:

   stonith_id_1   (stonith:external/libvirt): Stopped
   Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
   Masters: [ server4ubuntu1 ]
   Slaves: [ server2ubuntu1 ]
   Master/Slave Set: WebDataClone [WebData]
   Masters: [ server2ubuntu1 server4ubuntu1 ]
   Clone Set: dlm-clone [dlm]
   Started: [ server2ubuntu1 server4ubuntu1 ]
   Clone Set: ClusterIP-clone [ClusterIP] (unique)
   ClusterIP:0    (ocf::heartbeat:IPaddr2):   Started
server2ubuntu1
   ClusterIP:1    (ocf::heartbeat:IPaddr2):   Started
server4ubuntu1
   Clone Set: WebFS-clone [WebFS]
   Started: [ server4ubuntu1 ]
   Stopped: [ server2ubuntu1 ]
   Clone Set: WebSite-clone [WebSite]
   Started: [ server4ubuntu1 ]
   Stopped: [ server2ubuntu1 ]

Failed Actions:
* stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
call=201,
status=Error, exitreason='',
  last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
exec=3403ms
* r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8):
call=203,
status=complete, exitreason='',
  last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
* stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
call=202,
status=Error, exitreason='',
  last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
exec=3411ms


The stonith resource is stopped and does not seem to work.
On both hosts the command
# fence_xvm -o list
kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9
on


This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, there could be some
obscure network configuration to get that working on the VMs.


returns the local VM. Apparently it connects through the
Virtualization
interface because it returns the VM name not the Hostname of the
client
VM. I do not know if this is how it is supposed to work?


Yes, fence_xvm knows only about the VM names.

To get pacemaker to be able to use it for fencing the cluster nodes,
you have to add a pcmk_host_map parameter to the fencing resource. It
looks like pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
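
For example, a fence_xvm stonith resource with such a mapping could be created
roughly like this (a sketch only; "kvm103" is a placeholder for the second VM's
libvirt name, which isn't shown in this thread):

# pcs stonith create fence_virt_xvm fence_xvm \
    pcmk_host_map="server2ubuntu1:kvm102;server4ubuntu1:kvm103" \
    op monitor interval=60s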


In the local network, all traffic is allowed. No firewall is active
locally; only connections leaving the local network are firewalled.
Hence there are no connection problems between the hosts and clients.
For example, we can successfully connect from the clients to the hosts:

# nc -z -v -u 192.168.1.21 1229
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.1.21:1229.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.

# nc -z -v -u 192.168.1.13 1229
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 192.168.1.13:1229.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
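
Note that nc only exercises unicast UDP, while the fence_xvm request itself is
sent as multicast; a multicast-aware check such as omping (if installed on the
hosts and guests) is a closer match for what fence_virt actually does, e.g.:

# omping -m 225.0.0.12 -p 1229 <host-ip> <guest-ip>

run simultaneously on a host and a guest, with 225.0.0.12 and port 1229 being
the fence_virt defaults.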


On the Ubuntu VMs we created and configured the stonith resource
according to the howto pr