Re: [ClusterLabs] Redundant Ring Network failure

2020-06-09 Thread ROHWEDER-NEUBECK, MICHAEL (EXTERN)
Hi,

We are using unicast ("knet").

Greetings

Michael




Sitz der Gesellschaft / Corporate Headquarters: Deutsche Lufthansa 
Aktiengesellschaft, Koeln, Registereintragung / Registration: Amtsgericht Koeln 
HR B 2168
Vorsitzender des Aufsichtsrats / Chairman of the Supervisory Board: Dr. 
Karl-Ludwig Kley
Vorstand / Executive Board: Carsten Spohr (Vorsitzender / Chairman), Thorsten 
Dirks, Christina Foerster, Harry Hohmeister, Dr. Detlef Kayser, Dr. Michael 
Niggemann


-----Original Message-----
From: Strahil Nikolov
Sent: Tuesday, 9 June 2020 19:30
To: Cluster Labs - All topics related to open-source clustering welcomed; ROHWEDER-NEUBECK, MICHAEL (EXTERN)
Subject: Re: [ClusterLabs] Redundant Ring Network failure

Are you using multicast ?

Best Regards,
Strahil Nikolov

On 9 June 2020 at 10:28:25 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)" wrote:
>Hello,
>We have massive problems with the redundant ring operation of our 
>Corosync / pacemaker 3 Node NFS clusters.
>
>Most of the nodes either have an entire ring offline or only 1 node in 
>a ring.
>Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 | Node3
>Ring0 333 Ring 1 33n)
>
>Running corosync-cfgtool -R does not help.
>All nodes are VMs that build the ring together using 2 VLANs.
>Which logs do you need in order to help me?
>
>Corosync Cluster Engine, version '3.0.1'
>Copyright (c) 2006-2018 Red Hat, Inc.
>Debian Buster
>
>
>--
>Kind regards,
>  Michael Rohweder-Neubeck
>
>NSB GmbH – Nguyen Softwareentwicklung & Beratung GmbH Röntgenstraße 27
>D-64291 Darmstadt
>E-Mail: m...@nsb-software.de
>Manager: Van-Hien Nguyen, Jörg Jaspert
>USt-ID: DE 195 703 354; HRB 7131 Amtsgericht Darmstadt
>
>
>
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to reload pacemaker_remote service?

2020-06-09 Thread Gilbert, Mike
Ken,

Thank you very much for the response. That method does allow us to reload the 
configs, but it's a bit heavy-handed for our use case. All we're wanting to do 
is rotate the log files. Is there any other mechanism we could use to achieve 
that goal?

Thanks!
Mike

On 6/9/20, 7:10 AM, "Users on behalf of Ken Gaillot" 
 wrote:

Currently it's not possible. However you should be able to put the
cluster into maintenance mode, restart pacemaker_remote, then take the
cluster out of maintenance mode.

Test it to be sure. I believe the connection resource might be marked
as failed, but the cluster should be able to reconnect and reprobe the
state of all resources without fencing.

On Tue, 2020-06-09 at 03:28 +, Gilbert, Mike wrote:
> Hello all,
>  
> We are running Pacemaker 1.1.21-4 and are trying to figure out how we
> can do the equivalent of “systemctl reload pacemaker_remote”. Does
> anyone know what signal needs to get sent to the pacemaker_remoted
> service to reload its config? Sending a SIGHUP appears to kill the
> process.
>  
> Thanks for any help!
> Mike
-- 
Ken Gaillot 



Re: [ClusterLabs] Redundant Ring Network failure

2020-06-09 Thread Strahil Nikolov
It will be hard to guess whether you are using sctp or udp/udpu.
If possible, share your corosync.conf (you can remove sensitive data, but
keep it meaningful).

Are you using a firewall? If so, check the following:
1. The node firewall is not blocking the communication on the specific interfaces.
2. Verify with tcpdump that the heartbeats are received from the remote side.
3. Check for retransmissions or packet loss.

Usually you can find more details in the log specified in corosync.conf or in
/var/log/messages (and also the journal).
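
The checks above could be run roughly as follows. This is only a sketch: the interface name eth1 and port 5405 are assumptions and must be replaced with the values of the failing link in your corosync.conf, and the "udp" filter only makes sense if the knet transport is udp rather than sctp. The script prints the commands by default and only executes them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed values: take the real ones from corosync.conf.
IFACE=${IFACE:-eth1}
PORT=${PORT:-5405}

check_link() {
    # Link status as seen by the local corosync instance.
    run corosync-cfgtool -s || return 1
    # Capture 20 packets on the link's interface to see whether traffic
    # from the other nodes arrives at all.
    run tcpdump -c 20 -n -i "$IFACE" udp port "$PORT"
}

check_link
```

If tcpdump shows outgoing packets but nothing from a peer, the problem is more likely in the firewall or the VLAN than in corosync itself.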

Best Regards,
Strahil Nikolov

On 9 June 2020 at 21:11:02 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)" wrote:
>Hi,
>
>we are using unicast ("knet")
>
>Greetings
>
>Michael
>
>
>
>


Re: [ClusterLabs] How to reload pacemaker_remote service?

2020-06-09 Thread Ken Gaillot
On Tue, 2020-06-09 at 19:38 +, Gilbert, Mike wrote:
> Ken,
> 
> Thank you very much for the response. That method does allow us to
> reload the configs, but it's a bit heavy-handed for our use case. All
> we're wanting to do is rotate the log files. Is there any other
> mechanism we could use to achieve that goal?

Pacemaker ships with a logrotate snippet. If you installed a stock
package, your distribution should have installed that automatically.

If you're building yourself, just install extra/logrotate/pacemaker
from the pacemaker source to /etc/logrotate.d (assuming you're using
logrotate).

If you've changed the log file location, just edit the logrotate.d file
appropriately.

Pacemaker uses the "copytruncate" method of log rotation, so there's no
need for pacemaker to reopen the log after rotation.
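
To illustrate why no reopen is needed, here is a small shell sketch of the copytruncate idea. It is not Pacemaker's actual logrotate file; the temporary file and the log messages are made up for the demo. A writer keeps the log open in append mode while the file is copied and then truncated in place, and its next line lands in the same, now empty, file.

```shell
#!/bin/sh
# Demo of copytruncate semantics using a throwaway file (hypothetical names).
log=$(mktemp)

# The "daemon": keep the log open in append mode on file descriptor 3.
exec 3>>"$log"
echo "before rotation" >&3

# What a copytruncate rotation does: copy the file, then truncate the
# original in place (same inode, so open descriptors stay valid).
cp "$log" "$log.1"
: > "$log"

# The daemon keeps writing to the same descriptor, with no reopen or signal.
echo "after rotation" >&3
exec 3>&-

cat "$log.1"   # rotated copy: before rotation
cat "$log"     # live log:     after rotation
```

The trade-off of this method is a short window between the copy and the truncate in which freshly written lines can be lost.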

> 
> Thanks!
> Mike
> 
-- 
Ken Gaillot 



Re: [ClusterLabs] Redundant Ring Network failure

2020-06-09 Thread Strahil Nikolov
Are you using multicast ?

Best Regards,
Strahil Nikolov

On 9 June 2020 at 10:28:25 GMT+03:00, "ROHWEDER-NEUBECK, MICHAEL (EXTERN)" wrote:
>Hello,
>We have massive problems with the redundant ring operation of our
>Corosync / pacemaker 3 Node NFS clusters.
>
>Most of the nodes either have an entire ring offline or only 1 node in
>a ring.
>Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 | Node3
>Ring0 333 Ring 1 33n)
>
>Running corosync-cfgtool -R does not help.
>All nodes are VMs that build the ring together using 2 VLANs.
>Which logs do you need in order to help me?
>
>Corosync Cluster Engine, version '3.0.1'
>Copyright (c) 2006-2018 Red Hat, Inc.
>Debian Buster
>
>


[ClusterLabs] Redundant Ring Network failure

2020-06-09 Thread ROHWEDER-NEUBECK, MICHAEL (EXTERN)
Hello,
We have massive problems with the redundant ring operation of our Corosync / 
pacemaker 3 Node NFS clusters.

Most of the nodes either have an entire ring offline or only 1 node in a ring.
Example: (Node1 Ring0 333 Ring1 n33 | Node2 Ring0 033 Ring1 3n3 | Node3 Ring0 
333 Ring 1 33n)

Running corosync-cfgtool -R does not help.
All nodes are VMs that build the ring together using 2 VLANs.
Which logs do you need in order to help me?

Corosync Cluster Engine, version '3.0.1'
Copyright (c) 2006-2018 Red Hat, Inc.
Debian Buster


--
Kind regards,
  Michael Rohweder-Neubeck

NSB GmbH – Nguyen Softwareentwicklung & Beratung GmbH Röntgenstraße 27
D-64291 Darmstadt
E-Mail: m...@nsb-software.de
Manager: Van-Hien Nguyen, Jörg Jaspert
USt-ID: DE 195 703 354; HRB 7131 Amtsgericht Darmstadt






Re: [ClusterLabs] ethmonitor resource - for iface which does not exist yet - how?

2020-06-09 Thread Ken Gaillot
On Wed, 2020-06-03 at 12:33 +0100, lejeczek wrote:
> hi guys
> 
> I wonder about the idea of 'ethmonitor' watching a net iface
> which is of USB type. Such an iface would physically
> roam between nodes.
> Would 'ethmonitor' work in such a case?
> I've tried and the resource actually got started (small
> two-node cluster) on the node where the USB iface is physically
> absent?! That must be wrong, right?
> But if it's not, and that's how 'ethmonitor' works (then I got it
> wrong), what else would be the best resource agent for such
> a case of a USB net iface?
> 
> many thanks, L.

ethmonitor sets a node attribute for the node it's running on, marking
whether the interface is active on that node.

It's typically configured as a clone running on all nodes, monitoring
the same interface name.

These days, depending on distribution, interface names for removable
devices might change from boot to boot by default. But there's
generally a way to assign a permanent interface name via udev.
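
As a sketch of the udev approach: one common way is a rule that matches the device's MAC address and sets NAME. The helper below only prints such a rule. The MAC address and the name usbnet0 are placeholders, and make_net_rule is a made-up helper, not part of any tool.

```shell
#!/bin/sh
# Print a udev rule that pins a network interface name to a MAC address.
# The MAC address (lowercase) and the interface name are placeholders.
make_net_rule() {
    mac=$1
    name=$2
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$mac" "$name"
}

make_net_rule "aa:bb:cc:dd:ee:ff" "usbnet0"
```

The output would go into a rules file under /etc/udev/rules.d/ on every node (for example a file named 70-usbnet.rules, a hypothetical name), followed by "udevadm control --reload" and replugging the device. Since the same adapter carries the same MAC address wherever it is plugged in, the same rule should fit all nodes.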

As long as your interface name stays the same, you can use it for what
you want -- configure it as a clone and you will get node attributes on
every node (named ethmonitor-<interface> by default). You can then
use location constraints with an attribute-based rule to keep resources
where the interface is. See the man page for examples:

https://www.mankier.com/7/ocf_heartbeat_ethmonitor
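
A rough sketch of that setup, assuming pcs is the management tool (with crmsh the syntax differs). The interface name usbnet0, the resource ID mon-usbnet0 and the resource my-service are placeholders. To my understanding the agent sets the attribute to 1 when the interface is up and 0 when it is down. The script prints the commands by default and only executes them with DRY_RUN=0.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholders: the monitored interface and the resource that must follow it.
IFACE=${IFACE:-usbnet0}
RSC=${RSC:-my-service}

configure_ethmonitor() {
    # Cloned ethmonitor: every node publishes the attribute ethmonitor-$IFACE.
    run pcs resource create "mon-$IFACE" ocf:heartbeat:ethmonitor \
        interface="$IFACE" clone || return 1
    # Keep $RSC away from nodes where the attribute is missing or not 1.
    run pcs constraint location "$RSC" rule score=-INFINITY \
        "ethmonitor-$IFACE" ne 1 or not_defined "ethmonitor-$IFACE"
}

configure_ethmonitor
```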

-- 
Ken Gaillot 



Re: [ClusterLabs] How to reload pacemaker_remote service?

2020-06-09 Thread Ken Gaillot
Currently it's not possible. However you should be able to put the
cluster into maintenance mode, restart pacemaker_remote, then take the
cluster out of maintenance mode.

Test it to be sure. I believe the connection resource might be marked
as failed, but the cluster should be able to reconnect and reprobe the
state of all resources without fencing.
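
The procedure above might look like this as a script. It is a sketch only and, as said, should be tested first. It uses crm_attribute to toggle the maintenance-mode cluster property; the crm_attribute calls need to run somewhere the cluster tools work, and the restart has to happen on the remote node itself, so in practice the three steps may be issued from different hosts. The script prints the commands by default and only executes them with DRY_RUN=0.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_remote_in_maintenance() {
    # 1. Stop the cluster from managing resources.
    run crm_attribute --type crm_config --name maintenance-mode --update true || return 1
    # 2. Restart the daemon (on the remote node).
    run systemctl restart pacemaker_remote || return 1
    # 3. Hand control back to the cluster.
    run crm_attribute --type crm_config --name maintenance-mode --update false
}

restart_remote_in_maintenance
```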

On Tue, 2020-06-09 at 03:28 +, Gilbert, Mike wrote:
> Hello all,
>  
> We are running Pacemaker 1.1.21-4 and are trying to figure out how we
> can do the equivalent of “systemctl reload pacemaker_remote”. Does
> anyone know what signal needs to get sent to the pacemaker_remoted
> service to reload its config? Sending a SIGHUP appears to kill the
> process.
>  
> Thanks for any help!
> Mike
-- 
Ken Gaillot 



[ClusterLabs] How to reload pacemaker_remote service?

2020-06-09 Thread Gilbert, Mike
Hello all,

We are running Pacemaker 1.1.21-4 and are trying to figure out how we can do 
the equivalent of “systemctl reload pacemaker_remote”. Does anyone know what 
signal needs to get sent to the pacemaker_remoted service to reload its config? 
Sending a SIGHUP appears to kill the process.

Thanks for any help!
Mike