Re: [ClusterLabs] Fast-failover on 2 nodes + qnetd: qdevice connenction disrupted.

2024-05-03 Thread Jan Friesse
Hi, some of your findings are really interesting. On 02/05/2024 01:56, ale...@pavlyuts.ru wrote: Hi All, I am trying to build application-specific 2-node failover cluster using ubuntu 22, pacemaker 2.1.2 + corosync 3.1.6 and DRBD 9.2.9, knet transport. ... Also, I've done

Re: [ClusterLabs] corosync service stopping

2024-04-29 Thread Jan Friesse
Hi, I will reply just to "sysadmin" question: On 26/04/2024 14:43, Alexander Eastwood via Users wrote: Dear Reid, ... Why does the corosync log say ’shutdown by sysadmin’ when the shutdown was triggered by pacemaker? Isn’t this misleading? This basically means shutdown was triggered by

Re: [ClusterLabs] Pacemaker 2.1.7-rc2 now available

2023-11-27 Thread Jan Friesse
On 24/11/2023 09:18, Klaus Wenninger wrote: Hi all, Source code for the 2nd release candidate for Pacemaker version 2.1.7 is available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.1.7-rc2 This is primarily a bug fix release. See the ChangeLog or the link above for

[ClusterLabs] Corosync 3.1.8 is available at corosync.org!

2023-11-15 Thread Jan Friesse
obsolete bindgen flag parser: Allow a non-breaking space as 'whitespace' rust: Remove some pointless casts config: Fail to start if ping timers are invalid man: Update the corosync_overview manpage rust: Improve vector initialisation Jan

[ClusterLabs] Booth 1.1 is available at GitHub!

2023-10-18 Thread Jan Friesse
sary macro configure: drop dead code configure: move exec_prefix sanitizer closer to prefix configure: drop unnecessary check and define rpm: use new package name for pacemaker devel on opensuse Jan Friesse (64): test: Allow test running as a root tes

Re: [ClusterLabs] Centreon HA Cluster - VIP issue

2023-09-04 Thread Jan Friesse
Hi, On 02/09/2023 17:16, Adil Bouazzaoui wrote: Hello, My name is Adil,i worked for Tman company, we are testing the Centreon HA cluster to monitor our infrastructure for 13 companies, for now we are using the 100 IT licence to test the platform, once everything is working fine then we can

Re: [ClusterLabs] updu transport support with corosync

2023-07-27 Thread Jan Friesse
Hi, On 24/07/2023 18:13, Abhijeet Singh wrote: Hello, We have a 2-node corosync/pacemaker cluster setup. We recently updated corosync from v2.3.4 to v.3.0.3. I have couple of questions related to corosync transport mechanism - 1. Found below article which indicates updu support might be

Re: [ClusterLabs] Corosync 3.1.5 Fails to Autostart

2023-04-25 Thread Jan Friesse
On 24/04/2023 22:16, Tyler Phillippe via Users wrote: Hello all, We are currently using RHEL9 and have set up a PCS cluster. When restarting the servers, we noticed Corosync 3.1.5 doesn't start properly with the below error message: Parse error in config: No valid name found for local host

Re: [ClusterLabs] Could not initialize corosync configuration API error 2

2023-04-03 Thread Jan Friesse
Hi, On 31/03/2023 11:36, S Sathish S wrote: Hi Team, Please find the corosync version. [root@node2 ~]# rpm -qa corosync corosync-2.4.4-2.el7.x86_64. RHEL 7 never got 2.4.4 - there was 2.4.3 in RHEL 7.7 and 2.4.5 in RHEL 7.8/7.9. Is this self compiled version? If so, please consider

Re: [ClusterLabs] Could not initialize corosync configuration API error 2

2023-03-31 Thread Jan Friesse
Hi, more information would be needed to really find out real reason, so: - double check corosync.conf (ip addresses) - check firewall (mainly local one) - what is the version of corosync - try to set debug:on (or trace) - paste config file - paste full log - since corosync was started Also keep

Re: [ClusterLabs] Totem decrypt with Wireshark

2023-03-31 Thread Jan Friesse
Hi, On 29/03/2023 08:51, Justino, Fabiana wrote: Hi, I have corosync version 3.1.7-1, encrypted totem messages and would like to know how to decrypt them. Tried to disable encryption with crypto_cipher set to No and crypto_hash set to No but it keeps encrypted. it's definitively not

Re: [ClusterLabs] corosync 2.4.4 version provide secure the communication by default

2023-03-27 Thread Jan Friesse
On 26/03/2023 12:42, S Sathish S wrote: Hi Jan, Hi, In Corosync which all scenario it send cpg message and what is impact if we are not secure communication. It really depends of what services are used, but generally speaking corosync without cpg is not super useful so I guess cpg is

[ClusterLabs] Corosync-qdevice 3.0.3 is available at GitHub!

2023-03-22 Thread Jan Friesse
loopback exists. This bug was introduced in version 3.0.1. Version 3.0.0 and previous versions shipped within corosync package are not affected. Complete changelog for 3.0.3: Jan Friesse (1): qdevice: Destroy non blocking client on failure Upgrade is highly recommended. Thanks

Re: [ClusterLabs] 2-Node cluster - both nodes unclean - can't start cluster

2023-03-13 Thread Jan Friesse
On 10/03/2023 22:29, Reid Wahl wrote: On Fri, Mar 10, 2023 at 10:49 AM Lentes, Bernd wrote: Hi, I don’t get my cluster running. I had problems with an OCFS2 Volume, both nodes have been fenced. When I do now a “systemctl start pacemaker.service”, crm_mon shows for a few seconds both nodes as

Re: [ClusterLabs] Migrated to corosync 3.x knet become default protocol

2023-01-30 Thread Jan Friesse
On 30/01/2023 10:16, Jan Friesse wrote: Hi, On 30/01/2023 07:14, S Sathish S via Users wrote: Hi Team, In our application we are currently using UDPU as transport protocol with single ring, while migrated to corosync 3.x knet become default protocol. We need to understand any maintenance

Re: [ClusterLabs] Migrated to corosync 3.x knet become default protocol

2023-01-30 Thread Jan Friesse
Hi, On 30/01/2023 07:14, S Sathish S via Users wrote: Hi Team, In our application we are currently using UDPU as transport protocol with single ring, while migrated to corosync 3.x knet become default protocol. We need to understand any maintenance overhead that any required certificate/key

Re: [ClusterLabs] corosync 2.4.4 version provide secure the communication by default

2023-01-23 Thread Jan Friesse
Honza Thanks and Regards, S Sathish S -Original Message- From: Jan Friesse Sent: 23 January 2023 14:50 To: Cluster Labs - All topics related to open-source clustering welcomed Cc: S Sathish S Subject: Re: [ClusterLabs] corosync 2.4.4 version provide secure the communication by default

Re: [ClusterLabs] Antw: [EXT] Re: corosync 2.4.4 version provide secure the communication by default

2023-01-23 Thread Jan Friesse
On 23/01/2023 12:51, Ulrich Windl wrote: Jan Friesse schrieb am 23.01.2023 um 10:20 in Nachricht : Hi, On 23/01/2023 01:37, S Sathish S via Users wrote: Hi Team, corosync 2.4.4 version provide mechanism to secure the communication path between nodes of a cluster by default? bcoz in our

Re: [ClusterLabs] corosync 2.4.4 version provide secure the communication by default

2023-01-23 Thread Jan Friesse
Hi, On 23/01/2023 01:37, S Sathish S via Users wrote: Hi Team, corosync 2.4.4 version provide mechanism to secure the communication path between nodes of a cluster by default? bcoz in our configuration secauth is turned off but still communication occur is encrypted. Note : Capture tcpdump

[ClusterLabs] Corosync 3.1.7 is available at corosync.org!

2022-11-15 Thread Jan Friesse
(1): Remove bashism from configure script Jan Friesse (6): totemudpu: Don't block local socketpair pkgconfig: Export corosysconfdir totempg: Fix alignment handling logrotate: Use copytruncate method by default configure: Modernize

[ClusterLabs] Corosync 2.4.6 is available at corosync.org!

2022-11-09 Thread Jan Friesse
level Jan Friesse (52): totem: Increase ring_id seq after load totempg: Check sanity (length) of received message totemsrp: Reduce MTU to left room second mcast qnetd: Rename qnetd-log.c to log.c qnetd: Fix double -d description qnetd

[ClusterLabs] Corosync-qdevice 3.0.2 is available at GitHub!

2022-11-03 Thread Jan Friesse
I am pleased to announce the latest maintenance release of Corosync-Qdevice 3.0.2 available immediately from GitHub at https://github.com/corosync/corosync-qdevice/releases as corosync-qdevice-3.0.2. This release contains important bug fixes. Complete changelog for 3.0.2: Jan Friesse (6

Re: [ClusterLabs] Antw: [EXT] Re: QDevice not found after reboot but appears after cluster restart

2022-08-01 Thread Jan Friesse
Hi, On 01/08/2022 16:18, john tillman wrote: "john tillman" schrieb am 29.07.2022 um 22:51 in Nachricht : On Thursday 28 July 2022 at 22:17:01, john tillman wrote: I have a two cluster setup with a qdevice. 'pcs quorum status' from a cluster node shows the qdevice casting a vote. On the

Re: [ClusterLabs] Question regarding the security of corosync

2022-06-22 Thread Jan Friesse
/crypto.c and other crypto*.c files). Honza For the fourth, I agree with Jan Friesse - a dedicated physical network is best; a dedicated VLAN is second best. Antony.

Re: [ClusterLabs] Question regarding the security of corosync

2022-06-21 Thread Jan Friesse
Hi Mario, On 17/06/2022 11:39, Mario Freytag wrote: Dear sirs, or madams, I’d like to ask about the security of corosync. We’re using a Proxmox HA setup in our testing environment and need to confirm it’s compliance with PCI guidelines. We have a few questions: Is the communication

Re: [ClusterLabs] No node name in corosync-cmapctl output

2022-06-01 Thread Jan Friesse
On 31/05/2022 16:28, Andreas Hasenack wrote: Hi, On Tue, May 31, 2022 at 1:35 PM Jan Friesse wrote: Hi, On 31/05/2022 15:16, Andreas Hasenack wrote: Hi, corosync 3.1.6 pacemaker 2.1.2 crmsh 4.3.1 TL;DR I only seem to get a "name" attribute in the "corosync-cmapctl | grep n

Re: [ClusterLabs] No node name in corosync-cmapctl output

2022-05-31 Thread Jan Friesse
On 31/05/2022 16:11, Ken Gaillot wrote: On Tue, 2022-05-31 at 13:16 +, Andreas Hasenack wrote: Hi, corosync 3.1.6 pacemaker 2.1.2 crmsh 4.3.1 TL;DR I only seem to get a "name" attribute in the "corosync-cmapctl | grep nodelist" output if I set an explicit name in corosync.conf's nodelist.

Re: [ClusterLabs] No node name in corosync-cmapctl output

2022-05-31 Thread Jan Friesse
Hi, On 31/05/2022 15:16, Andreas Hasenack wrote: Hi, corosync 3.1.6 pacemaker 2.1.2 crmsh 4.3.1 TL;DR I only seem to get a "name" attribute in the "corosync-cmapctl | grep nodelist" output if I set an explicit name in corosync.conf's nodelist. If I rely on the default of "name will be uname

Re: [ClusterLabs] corosync-cfgtool -s shows all links not connected for one particular node

2022-05-24 Thread Jan Friesse
Dirk, On 23/05/2022 19:02, Dirk Gassen wrote: Greetings, I have a four-node cluster on Ubuntu Focal with the following versions: libknet1: 1.15-1ubuntu1 corosync: 3.0.3-2ubuntu2.1 pacemaker: 2.0.3-3ubuntu4.3 3.0.3 corosync-cfgtool was buggy - basically first version with correctly working

Re: [ClusterLabs] Cluster unable to find back together

2022-05-19 Thread Jan Friesse
Hi, On 19/05/2022 10:16, Leditzky, Fabian via Users wrote: Hello We have been dealing with our pacemaker/corosync clusters becoming unstable. The OS is Debian 10 and we use Debian packages for pacemaker and corosync, version 3.0.1-5+deb10u1 and 3.0.1-2+deb10u1 respectively. Seems like pcmk

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-24 Thread Jan Friesse
2 at 12:17, Jan Friesse wrote: On 24/02/2022 10:28, Viet Nguyen wrote: Hi, Thank you so so much for your help. May i ask a following up question: For the option of having one big cluster with 4 nodes without booth, then, if one site (having 2 nodes) is down, then the other site doe

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-24 Thread Jan Friesse
basically yet again site C. Regards, Honza Regards, Viet On Wed, 23 Feb 2022 at 17:08, Jan Friesse wrote: Viet, On 22/02/2022 22:37, Viet Nguyen wrote: Hi, Could you please help me out with this question? I have 4 nodes cluster running in the same network but in 2 different sites (building

Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-23 Thread Jan Friesse
Viet, On 22/02/2022 22:37, Viet Nguyen wrote: Hi, Could you please help me out with this question? I have 4 nodes cluster running in the same network but in 2 different sites (building A - 2 nodes and building B - 2 nodes). My objective is to setup HA for this cluster with pacemaker. The

[ClusterLabs] Corosync 3.1.6 is available at corosync.org!

2021-11-15 Thread Jan Friesse
Caulfield (1): cpghum: Allow to continue if corosync is restarted Jan Friesse (4): totem: Add cancel_hold_on_retransmit config option logsys: Unlock config mutex on error totemsrp: Switch totempg buffers at the right time build: Add explicit

Re: [ClusterLabs] Corosync 2 vs Corosync 3

2021-10-27 Thread Jan Friesse
On 25/10/2021 16:42, Ken Gaillot wrote: On Mon, 2021-10-25 at 13:44 +, Toby Haynes wrote: Looking at Pacemaker 2.1, I see that both corosync 2 and corosync 3 are supported. The last corosync 2 release (2.4.5) came out in 30 July 2019. Will there come a point when a future Pacemaker release

Re: [ClusterLabs] Antw: [EXT] Re: Sub‑clusters / super‑clusters?

2021-08-05 Thread Jan Friesse
On 05/08/2021 00:11, Frank D. Engel, Jr. wrote: In theory if you could have an independent voting infrastructure among the three clusters which serves to effectively create a second cluster infrastructure interconnecting them to support resource D, you could Yes. It's called booth. have D

[ClusterLabs] Corosync 3.1.5 is available at corosync.org!

2021-08-04 Thread Jan Friesse
. Complete changelog for 3.1.5: Christine Caulfield (1): knet: Fix node status display Jan Friesse (9): main: Add support for cgroup v2 and auto mode totemconfig: Do not process totem.nodeid cfgtool: Check existence of at least one of nodeid

Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-04 Thread Jan Friesse
On 03/08/2021 10:40, Antony Stone wrote: On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote: Here is the example I had promised: pcs node attribute server1 city=LA pcs node attribute server2 city=NY # Don't run on any node that is not in LA pcs constraint location DummyRes1 rule

Re: [ClusterLabs] QDevice vs 3rd host for majority node quorum

2021-07-15 Thread Jan Friesse
On 15/07/2021 10:09, Jehan-Guillaume de Rorthais wrote: Hi all, On Tue, 13 Jul 2021 19:55:30 + (UTC) Strahil Nikolov wrote: In some cases the third location has a single IP and it makes sense to use it as QDevice. If it has multiple network connections to that location - use a full blown

Re: [ClusterLabs] heartbeat rings questions

2021-07-14 Thread Jan Friesse
On 13/07/2021 19:07, Kiril Pashin wrote: Hi , thanks for the quick reply. I have a couple of follow up questions below in blue below On 12/07/2021 23:27, Kiril Pashin wrote: > Hi , > is it valid to use the same network adapter interface on the same host to be > part of

Re: [ClusterLabs] heartbeat rings questions

2021-07-13 Thread Jan Friesse
On 12/07/2021 23:27, Kiril Pashin wrote: Hi , is it valid to use the same network adapter interface on the same host to be part of multiple heart beat rings ? There should be no problem from technical side, but I wouldn't call this use case "valid". Idea of multiple rings is to have multiple

Re: [ClusterLabs] Updating quorum configuration without restarting cluster

2021-06-21 Thread Jan Friesse
Gerry, Dear community, I would like to ask few questions regarding Corosync/Pacemaker quorum configuration. When updating the Corosync's quorum configuration I added last_man_standing, and auto_tie_breaker in corosync.conf on all hosts and refreshed with 'corosync-cfgtool -R'. Note that that

[ClusterLabs] Corosync 3.1.4 is available at corosync.org!

2021-06-03 Thread Jan Friesse
map where iterate operation may result in corosync crash. Complete changelog for 3.1.4: Christine Caulfield (1): stats: fix crash when iterating over deleted keys Jan Friesse (1): man: Add note about single node configuration Upgrade is highly recommended. Thanks

[ClusterLabs] Corosync 3.1.3 is available at corosync.org!

2021-05-21 Thread Jan Friesse
smaller feature. It's now possible to run `corosync -v` to get list of supported crypto and compression models which can be used in `corosync.conf`. Complete changelog for 3.1.3: Ferenc Wágner (1): man: corosync-cfgtool.8: use proper single quotes Jan Friesse (8

Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Jan Friesse
S Sathish S: Hi Klaus, pacemaker/corosync we generated our own build from clusterlab source code. [root@node1 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.4 (Maipo) [root@node1 ~]# uname -r 3.10.0-693.82.1.el7.x86_64 [root@node1 ~]# rpm -qa | grep -iE

Re: [ClusterLabs] Antw: [EXT] Corosync 3.1.1 is available at corosync.org!

2021-04-06 Thread Jan Friesse
Ulrich, Jan Friesse schrieb am 31.03.2021 um 15:16 in Nachricht <8f611847-e341-b51b-49c9-fd9ef29fb...@redhat.com>: I am pleased to announce the latest maintenance release of Corosync 3.1.1 available immediately from GitHub release section at https://github.com/corosync/corosync/re

[ClusterLabs] Corosync 3.1.2 is available at corosync.org!

2021-04-06 Thread Jan Friesse
I am pleased to announce the latest maintenance release of Corosync 3.1.2 available immediately from GitHub release section at https://github.com/corosync/corosync/releases or our website at http://build.clusterlabs.org/corosync/releases/. This release contains only one (but very important)

[ClusterLabs] Corosync 3.1.1 is available at corosync.org!

2021-03-31 Thread Jan Friesse
configure: drop unnecessary check and define Ferenc Wágner (1): The ring id file needn't be executable Jan Friesse (4): spec: Add isa version of corosync-devel provides totemknet: Check both cipher and hash for crypto cfg: Improve

Re: [ClusterLabs] Corosync - qdevice not voting

2021-03-19 Thread Jan Friesse
Marcelo, Hello. I have configured corosync with 2 nodes and added a qdevice to help with the quorum. On node1 I added firewall rules to block connections from node2 and the qdevice, trying to simulate a network issue. Just please make sure to block both incoming and also outgoing packets.

Re: [ClusterLabs] maximum token value (knet)

2021-03-15 Thread Jan Friesse
-blackbox command. Trace is enabled by setting logging.debug to "trace" value (so where you have a debug:on, you just set debug: trace). Regards, Honza Best Regards,Strahil Nikolov On Fri, Mar 12, 2021 at 17:01, Jan Friesse wrote: Strahil, Interesting... Yet, th

Re: [ClusterLabs] maximum token value (knet)

2021-03-12 Thread Jan Friesse
четвъртък, 11 март 2021 г., 19:12:58 ч. Гринуич+2, Jan Friesse написа: Strahil, Hello all, I'm building a test cluster on RHEL8.2 and I have noticed that the cluster fails to assemble ( nodes stay inquorate as if the network is not working) if I set the token at 3 or more (30s

Re: [ClusterLabs] Antw: [EXT] Re: maximum token value (knet)

2021-03-12 Thread Jan Friesse
Ulrich, Jan Friesse schrieb am 11.03.2021 um 18:12 in Nachricht : Strahil, Hello all, I'm building a test cluster on RHEL8.2 and I have noticed that the cluster fails to assemble ( nodes stay inquorate as if the network is not working) if I set the token at 3 or more (30s+). Hi! I

Re: [ClusterLabs] maximum token value (knet)

2021-03-11 Thread Jan Friesse
Strahil, Hello all, I'm building a test cluster on RHEL8.2 and I have noticed that the cluster fails to assemble ( nodes stay inquorate as if the network is not working) if I set the token at 3 or more (30s+). Knet waits for enough pong replies for other nodes before it marks them as

Re: [ClusterLabs] Antw: [EXT] Feedback wanted: OCF Resource Agent API 1.1 proposed for adoption

2021-03-10 Thread Jan Friesse
Ulrich Windl napsal(a): Ken Gaillot schrieb am 10.03.2021 um 00:07 in Nachricht : Hi all, After many false starts over the years, we finally have a proposed 1.1 version of the resource agent standard. Discussion is invited here and/or on the pull request:

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-03 Thread Jan Friesse
Eric, -Original Message- From: Users On Behalf Of Jan Friesse Sent: Monday, March 1, 2021 3:27 AM To: Cluster Labs - All topics related to open-source clustering welcomed ... ha1 lost connection to qnetd so it gives up all hope immediately. ha2 retains connection to qnetd so

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Jan Friesse
Andrei, On 01.03.2021 15:45, Jan Friesse wrote: Andrei, On 01.03.2021 12:26, Jan Friesse wrote: Thanks for digging into logs. I believe Eric is hitting https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, but may take some time to get into distributions) - it also

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Jan Friesse
Andrei, On 01.03.2021 12:26, Jan Friesse wrote: Thanks for digging into logs. I believe Eric is hitting https://github.com/corosync/corosync-qdevice/issues/10 (already fixed, but may take some time to get into distributions) - it also contains workaround. I tested corosync-qnetd

Re: [ClusterLabs] Our 2-Node Cluster with a Separate Qdevice Went Down Anyway?

2021-03-01 Thread Jan Friesse
On 27.02.2021 22:12, Andrei Borzenkov wrote: On 27.02.2021 17:08, Eric Robinson wrote: I agree, one node is expected to go out of quorum. Still the question is, why didn't 001db01b take over the services? I just remembered that 001db01b has services running on it, and those services did

Re: [ClusterLabs] cluster loses state (randomly) every few minutes.

2021-01-18 Thread Jan Friesse
lejeczek, hi guys, I have a very basic two-node cluster, not even a single resource on it, but very troublesome - it keeps braking. Journal for 'pacemaker' shows constantly (on both nodes): ... warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped notice: State

Re: [ClusterLabs] Corosync permanently desyncs in face of packet loss

2021-01-18 Thread Jan Friesse
Mariusz, Hi, We've had a hardware problem causing asynchronous packet drop on one of our nodes that caused unrecoverable (required restarting corosync on both nodes) state, that then repeated next day. Log of the events in attachment. It did recover few times after the problem, but when it

Re: [ClusterLabs] corosync[3520]: [CPG ] *** 0x55ff99d211c0 can't mcast to group dlm:ls:lvm_testVG state:1, error:12

2021-01-05 Thread Jan Friesse
Ulrich, Hi! In my test cluster using UDPU(!) I saw this syslog message when I shut down a VG: Nov 30 13:38:28 h16 pacemaker-execd[3681]: notice: executing - rsc:prm_testVG_activate action:stop call_id:71 Nov 30 13:38:28 h16 LVM-activate(prm_testVG_activate)[7265]: INFO: Deactivating testVG

[ClusterLabs] Corosync-qdevice 3.0.1 is available at GitHub!

2020-11-23 Thread Jan Friesse
properly supported. Fixes GH issue #16. Complete changelog for 3.0.1: Fabio M. Di Nitto (1): devel: add corosync-qdevice.pc file for pcs to use Jan Friesse (79): qnetd: Check existence of NSS DB dir before fork spec: Use install -p and add license man

Re: [ClusterLabs] Minor bug in SLES 15 corosync-2.4.5-6.3.2.x86_64 (unicast, ttl)

2020-11-20 Thread Jan Friesse
Ulrich, Hi! A short notification: I had set up a new cluster using udpu, finding that ringnumber 0 has a ttl statement ("ttl:1"), but ringnumber 1 had not. So I added one for ringnumber 1, and then I reloaded corosync via corosync-cfgtool -R. probably ttl with value different from 1

Re: [ClusterLabs] egards, Q: what does " corosync-cfgtool -s" check actually?

2020-11-20 Thread Jan Friesse
Ulrich, Hi! having a problem, I wonder what " corosync-cfgtool -s" does check actually: I see on all nodes and all rings "status = ring 0 active with no faults", but the nodes seem unable to comminicate somehow. For UDPU/UDP without RRP it will always display "ring 0 active with no

Re: [ClusterLabs] Corosync 3.1.0 token timeout

2020-10-22 Thread Jan Friesse
Ulrich, Jan Friesse schrieb am 20.10.2020 um 18:05 in Nachricht <9e9edd13-847c-a81f-9b28-0ecf8f17f...@redhat.com>: I've forgot to mention one very important change (in text, release notes at github release is already fixed): ... - Default token timeout was changed from 1 second

Re: [ClusterLabs] Corosync 3.1.0 is available at corosync.org!

2020-10-20 Thread Jan Friesse
man: reload during rolling upgrade Ferenc Wágner (2): man: fix typo: avaialable man: votequorum.5: use proper single quotes Jan Friesse (9): spec: Require at least knet 1.18 for crypto reload build: Update git-version-gen build: Use

[ClusterLabs] Corosync 3.1.0 is available at corosync.org!

2020-10-20 Thread Jan Friesse
man: reload during rolling upgrade Ferenc Wágner (2): man: fix typo: avaialable man: votequorum.5: use proper single quotes Jan Friesse (9): spec: Require at least knet 1.18 for crypto reload build: Update git-version-gen build: Use

Re: [ClusterLabs] Two ethernet adapter within same subnet causing issue on Qdevice

2020-10-06 Thread Jan Friesse
Richard , To clarify my problem, this is more on Qdevice issue I want to fix. The question is, how much it is really qdevice problem and if so, if there is really something we can do about the problem. Qdevice itself is just using standard connect(2) call and standard TCP socket. So from

Re: [ClusterLabs] Alerts for qdevice/qnetd/booth

2020-08-17 Thread Jan Friesse
, Rohit On Thu, Aug 13, 2020 at 1:03 PM Jan Friesse wrote: Hi Rohit, Hi Honza, Thanks for your reply. Please find the attached image below: [image: image.png] Yes, I am talking about pacemaker alerts only. Please find my suggestions/requirements below: *Booth:* 1. Node5 booth-arbitrator should

Re: [ClusterLabs] Alerts for qdevice/qnetd/booth

2020-08-13 Thread Jan Friesse
ase use booth (https://github.com/ClusterLabs/booth) and qdevice (https://github.com/corosync/corosync-qdevice) upstream rather than pacemaker, because these requests has really nothing to do with pcmk. Regards, honza Thanks, Rohit On Wed, Aug 12, 2020 at 8:58 PM Jan Friesse wrote: Hi Rohit,

Re: [ClusterLabs] Alerts for qdevice/qnetd/booth

2020-08-12 Thread Jan Friesse
Hi Rohit, Rohit Saini napsal(a): Hi Team, Question-1: Similar to pcs alerts, do we have something similar for qdevice/qnetd? This You mean pacemaker alerts right? is to detect asynchronously if any of the member is unreachable/joined/left and if that member is qdevice or qnetd. Nope but

Re: [ClusterLabs] qnetd and booth arbitrator running together in a 3rd geo site

2020-07-14 Thread Jan Friesse
On Tue, Jul 14, 2020 at 4:42 PM Jan Friesse wrote: Rohit, Thanks Honja. That's helpful. Let's say I don't use qnetd, can I achieve same with booth arbitrator? That means to have two two-node clusters. Two-node cluster without fencing is strictly no. Booth arbitrator works for geo-clu

Re: [ClusterLabs] qnetd and booth arbitrator running together in a 3rd geo site

2020-07-14 Thread Jan Friesse
Thanks Honja. That's helpful. Let's say I don't use qnetd, can I achieve same with booth arbitrator? Booth arbitrator works for geo-clusters, can the same arbitrator be reused for local clusters as well? Is it even possible technically? Regards, Rohit On Tue, Jul 14, 2020 at 3:32 PM Jan Friesse wr

Re: [ClusterLabs] qnetd and booth arbitrator running together in a 3rd geo site

2020-07-14 Thread Jan Friesse
is more like a stretch cluster, then qnetd + stonith is enough. And of course your idea (original one) should work too. Honza Regards, Rohit On Tue, Jul 14, 2020 at 3:32 PM Jan Friesse wrote: Rohit, Hi Team, Can I execute corosync-qnetd and booth-arbitrator on the same VM in a different geo

Re: [ClusterLabs] qnetd and booth arbitrator running together in a 3rd geo site

2020-07-14 Thread Jan Friesse
Rohit, Hi Team, Can I execute corosync-qnetd and booth-arbitrator on the same VM in a different geo site? What's the recommendation? Will it have any limitations in a production deployment? There is no technical limitation. Both qnetd and booth are very lightweight and work just fine with

Re: [ClusterLabs] Linux 8.2 - high totem token requires manual setting of ping_interval and ping_timeout

2020-06-26 Thread Jan Friesse
Robert, thank you for the info/report. More comments inside. All, Hello. Hope all is well. I have been researching Oracle Linux 8.2 and ran across a situation that is not well documented. I decided to provide some details to the community in case I am missing something. Basically, if

Re: [ClusterLabs] Rolling upgrade from Corosync 2.3+ to Corosync 2.99+ or Corosync 3.0+?

2020-06-11 Thread Jan Friesse
upgrade for product upgrade to the new OS and product version Thanks again for your help! _Vitaly On June 11, 2020 3:30 AM Jan Friesse wrote: Vitaly, Hello everybody. We are trying to do a rolling upgrade from Corosync 2.3.5-1 to Corosync 2.99+. It looks like they are not compatible and we

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-11 Thread Jan Friesse
Howard, Good morning. Thanks for reading. We have a requirement to provide high availability for PostgreSQL 10. I have built a two node cluster with a quorum device as the third vote, all running on RHEL 8. Here are the versions installed: [postgres@srv2 cluster]$ rpm -qa|grep

Re: [ClusterLabs] Redudant Ring Network failure

2020-06-10 Thread Jan Friesse
Michael, what version of knet you are using? We had quite a few problems with older versions of knet, so current stable is recommended (1.16). Same applies for corosync because 3.0.4 has vastly improved display of links status. Hello, We have massive problems with the redundant ring

Re: [ClusterLabs] Merging partitioned two_node cluster?

2020-05-06 Thread Jan Friesse
Richard, So I tried an experiment. I had tried switch over to 'udpu' unicast transport, but corosync threw an error starting up (which I did not drill down on yet.) I went over to my test environment and did the same thing and it worked fine, the cluster worked and everything. One thing that

Re: [ClusterLabs] Merging partitioned two_node cluster?

2020-05-05 Thread Jan Friesse
On May 5, 2020 6:39:54 AM GMT+03:00, "Nickle, Richard" wrote: I have a two node cluster managing a VIP. The service is an SMTP service. This could be active/active, it doesn't matter which node accepts the SMTP connection, but I wanted to make sure that a VIP was in place so that there was a

[ClusterLabs] Corosync 3.0.4 is available at corosync.org!

2020-04-23 Thread Jan Friesse
cfgtool: Fix error code as described in MP Jan Friesse (32): totemconfig: Free leaks found by coverity votequorum: Ignore the icmap_get_* return value logconfig: Remove double free of value totemconfig: Reuse already fetched pointer cmap: Assert

Re: [ClusterLabs] qdevice up and running -- but questions

2020-04-14 Thread Jan Friesse
On 4/11/20 6:52 PM, Eric Robinson wrote: 1. What command can I execute on the qdevice node which tells me which client nodes are connected and alive? i use corosync-qnetd-tool -v -l 2. In the output of the pcs qdevice status command, what is the meaning of… Vote:

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-09 Thread Jan Friesse
Sherrard Burton napsal(a): On 4/8/20 1:09 PM, Andrei Borzenkov wrote: 08.04.2020 10:12, Jan Friesse пишет: Sherrard, i could not determine which of these sub-threads to include this in, so i am going to (reluctantly) top-post it. i switched the transport to udp, and in limited testing i

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-09 Thread Jan Friesse
Sherrard Burton napsal(a): On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton пишет: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-09 Thread Jan Friesse
Andrei Borzenkov wrote: 08.04.2020 10:12, Jan Friesse wrote: Sherrard, i could not determine which of these sub-threads to include this in, so i am going to (reluctantly) top-post it. i switched the transport to udp, and in limited testing i seem to not be hitting the race condition

Re: [ClusterLabs] how to properly add/delete qdevice for an existing cluster

2020-04-08 Thread Jan Friesse
please forgive me if i have overlooked the answer somewhere. i have an existing cluster that is already configured with a qdevice. i now wish to update that configuration to point at a different qdevice. background: for the sake of working through the initial configuration details, tuning,

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Jan Friesse
you can't chase micro-second improvements that may lessen the chance of triggering it. you have to solve the underlying problem. thanks again folks, for your help, and the great work you are doing. On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzen

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Jan Friesse
Sherrard, On 4/7/20 4:09 AM, Jan Friesse wrote: Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-08 Thread Jan Friesse
On Tue, 7 Apr 2020 14:13:35 -0400 Sherrard Burton wrote: On 4/7/20 1:16 PM, Andrei Borzenkov wrote: 07.04.2020 00:21, Sherrard Burton wrote: It looks like some timing issue or race condition. After reboot node manages to contact qnetd first, before connection to other node is established.

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Jan Friesse
Sherrard, On 4/7/20 12:53 AM, Strahil Nikolov wrote: Hi Sherrard, Have you tried to increase the qnet timers in the corosync.conf ? Strahil, i have actually reduced the qnet timers in order to improve failover response time, per Jan's suggestion on the thread '[ClusterLabs] >

Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

2020-04-07 Thread Jan Friesse
Sherrard and Andrei On 4/6/20 4:10 PM, Andrei Borzenkov wrote: 06.04.2020 20:57, Sherrard Burton wrote: On 4/6/20 1:20 PM, Sherrard Burton wrote: On 4/6/20 12:35 PM, Andrei Borzenkov wrote: 06.04.2020 17:05, Sherrard Burton wrote: from the quorum node: ... Apr 05 23:10:17 debug

Re: [ClusterLabs] Ugrading Ubuntu 14.04 to 16.04 with corosync/pacemaker failed

2020-02-20 Thread Jan Friesse
Rasca Gmelch wrote: On 19.02.20 at 19:20, Strahil Nikolov wrote: On February 19, 2020 6:31:19 PM GMT+02:00, Rasca wrote: Hi, we run a 2-system cluster for Samba with Ubuntu 14.04 and Samba, Corosync and Pacemaker from the Ubuntu repos. We wanted to update to Ubuntu 16.04 but it failed:

Re: [ClusterLabs] corosync-cfgtool -sb. What is the "n" indicating?

2020-02-10 Thread Jan Friesse
Just for archival purpose, this issue is now worked on at gh https://github.com/corosync/corosync/issues/527 Hi Corosync Specialists! I have a production cluster with two nodes (node0/1). And I have setup for debugging this issue a completely virtual cluster also. Both are showing the same

Re: [ClusterLabs] Corosync-Qdevice SSL Ciphers

2020-01-22 Thread Jan Friesse
-Original Message- From: Jan Friesse Sent: Wednesday, January 22, 2020 13:45 To: Cluster Labs - All topics related to open-source clustering welcomed ; Somanath Jeeva Subject: Re: [ClusterLabs] Corosync-Qdevice SSL Ciphers Somanath, Hi, is there a way to find/restrict the list of ciphers

Re: [ClusterLabs] Corosync-Qdevice SSL Ciphers

2020-01-22 Thread Jan Friesse
Somanath, Hi, is there a way to find/restrict the list of ciphers used by corosync-qnetd, similar to the PCSD_SSL_CIPHERS variable in the /etc/sysconfig/pcsd configuration file? Nope. But qnetd is using NSS so it is possible to change the system policy in

Re: [ClusterLabs] SSL Certificates in Corosync-Qdevice.

2020-01-21 Thread Jan Friesse
Somanath, Hi, we are using corosync-qdevice version 3.0.0 in a two-node cluster setup. During qnetd configuration, SSL certificates with 100-year validity are generated. I want to know if it is possible to use custom-generated certificates with a different validity, similar to the option available

Re: [ClusterLabs] Node replies with 401 ssl connect error

2020-01-15 Thread Jan Friesse
Klaus Wenninger wrote: On 1/15/20 12:31 PM, Raffaele Pantaleoni wrote: Hello, I'm trying to set up a cluster made up of three servers. Two of them run on Debian 10 and they are already part of the cluster and marked online. I can't join the third machine running on Debian 9. I can see

Re: [ClusterLabs] Prevent Corosync Qdevice Failback in split brain scenario.

2020-01-02 Thread Jan Friesse
Somanath, Hi, I am planning to use Corosync Qdevice version 3.0.0 with corosync version 2.4.4 and pacemaker 1.1.16 in a two-node cluster. I want to know if failback can be avoided in the situation below. 1. The pcs cluster is in a split-brain scenario after a network break between two

[ClusterLabs] Corosync 3.0.3 is available at corosync.org!

2019-11-25 Thread Jan Friesse
): cpg: notify_lib_joinlist: drop conn parameter cpg: send single confchg event per group on joinlist Fabio M. Di Nitto (2): build: add option for enabling sanitizer builds pkgconfig: Add libqb dependency Jan Friesse (17): cpg: Add more comments to notify_lib_joinlist
