Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up

2015-10-26 Thread Jan Friesse
ping echo reply ;) On 10/14/2015 02:10 PM, Thomas Lamprecht wrote: Hi, On 10/08/2015 10:57 AM, Jan Friesse wrote: Hi, Thomas Lamprecht wrote: [snip] Hello, we are using corosync version needle (2.3.5) for our cluster filesystem (pmxcfs). The situation is the following. First we

Re: [ClusterLabs] two node cluster not behaving right

2015-11-05 Thread Jan Friesse
user.clusterlabs@siimnet.dk wrote: Being new to pacemaker, I’m trying to create my first cluster of two nodes, but it seems to behave a little strangely. Following this guide: http://clusterlabs.org/quickstart-redhat-6.html but am unable

Re: [ClusterLabs] two node cluster not behaving right

2015-11-06 Thread Jan Friesse
Steffen, user.clusterlabs@siimnet.dk wrote: On 6 Nov 2015, at 08:42, Jan Friesse <jfrie...@redhat.com> wrote: This means something is blocking successful delivery of packets. Make sure to: - Properly configure firewall (for testing you can disable it completely) - Make su

Re: [ClusterLabs] continous QUORUM messages in a 3-node cluster

2015-10-12 Thread Jan Friesse
Illia, Hi, We are using a 3-node pacemaker/corosync cluster on CentOS 7. We have several identical setups in our QA/DEV orgs, and a couple of them continuously spew the following messages on all 3 nodes: Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership

Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up

2015-10-06 Thread Jan Friesse
Thomas, Thomas Lamprecht wrote: Hi, thanks for the response! I added some information and clarification below. On 10/01/2015 09:23 AM, Jan Friesse wrote: Hi, Thomas Lamprecht wrote: Hello, we are using corosync version needle (2.3.5) for our cluster filesystem (pmxcfs

Re: [ClusterLabs] "0 Nodes configured" in crm_mon

2015-08-31 Thread Jan Friesse
Stanislav, > Hi Ken, thanks for the info, I will try 2.3.4 or maybe even 2.3.3 like in original compilation guide. also maybe you are hitting same problem as was discussed on list in thread (Corosync: 100% cpu (corosync 2.3.5, libqb 0.17.1, pacemaker 1.1.13) Solution is either apply

Re: [ClusterLabs] EL6, cman, rrp, unicast and iptables

2015-09-14 Thread Jan Friesse
Digimer wrote: Hi all, Starting a new thread from the "Clustered LVM with iptables issue" thread... I've decided to review how I do networking entirely in my cluster. I make zero claims to being great at networks, so I would love some feedback. I've got three active/passive

Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up

2015-10-01 Thread Jan Friesse
Hi, Thomas Lamprecht wrote: Hello, we are using corosync version needle (2.3.5) for our cluster filesystem (pmxcfs). The situation is the following. First we start up the pmxcfs, which is a FUSE fs. And if there is a cluster configuration, we also start corosync. This allows the

Re: [ClusterLabs] Anyone successfully install Pacemaker/Corosync on Freebsd?

2016-01-04 Thread Jan Friesse
Christine Caulfield wrote: On 21/12/15 16:12, Ken Gaillot wrote: On 12/19/2015 04:56 PM, mike wrote: Hi All, just curious if anyone has had any luck at one point installing Pacemaker and Corosync on FreeBSD. I have to install from source of course and I've run into an issue when running

Re: [ClusterLabs] Few questions regarding corosync authkey

2016-06-06 Thread Jan Friesse
Hi, Would like to understand how secure is the corosync authkey. As the authkey is a binary file, how is the private key saved inside the authkey? Corosync uses symmetric encryption, so there is no public certificate. authkey = private key What safeguard mechanisms are in place if the

Re: [ClusterLabs] [corosync] Virtual Synchrony Property guarantees in case of network partition

2016-06-06 Thread Jan Friesse
Hi, Hello, Virtual Synchrony Property - messages are delivered in agreed order and configuration changes are delivered in agreed order relative to messages. What happens to this property when the network partitions the cluster into two? Consider the following scenario (which I took from one of the

Re: [ClusterLabs] [corosync] Virtual Synchrony Property guarantees in case of network partition

2016-06-06 Thread Jan Friesse
But C1 is *guaranteed* to deliver *before* m(k)? Yes. No case where C1 is delivered after m(k)? Nope. Regards, Satish On Mon, Jun 6, 2016 at 8:10 PM, Jan Friesse <jfrie...@redhat.com> wrote: satish kumar wrote: Hello honza, thanks for the response! With state sync, I

Re: [ClusterLabs] [corosync][Problem] Very long "pause detect ... " was detected.

2016-06-13 Thread Jan Friesse
Hideo, Hi All, Our user built a cluster with corosync and Pacemaker in the following environment. The cluster was built among guests. * Host/Guest : RHEL6.6 - kernel : 2.6.32-504.el6.x86_64 * libqb 0.17.1 * corosync 2.3.4 * Pacemaker 1.1.12 The cluster worked well. When a user stopped

Re: [ClusterLabs] Corosync with passive rrp, udpu - Unable to reset after "Marking ringid 1 interface 127.0.0.1 FAULTY"

2016-06-17 Thread Jan Friesse
e-up /sbin/ethtool -s bond0 speed 1000 duplex full autoneg on post-up ifenslave bond0 eth0 eth2 pre-down ifenslave -d bond0 eth0 eth2 bond-slaves none bond-mode 4 bond-lacp-rate fast bond-miimon 100 bond-downdelay 0 bond-updelay 0 bond-xmit_hash_policy 1 address [...] Jan Friesse <jfrie...@redha

Re: [ClusterLabs] Corosync with passive rrp, udpu - Unable to reset after "Marking ringid 1 interface 127.0.0.1 FAULTY"

2016-06-16 Thread Jan Friesse
Martin Schlegel wrote: Hello everyone, we have run a 3 node Pacemaker (1.1.14) / Corosync (2.3.5) cluster for a couple of months successfully and we have started seeing a faulty ring with unexpected 127.0.0.1 binding that we cannot reset via "corosync-cfgtool -r". This is the problem. Bind to

[ClusterLabs] Temporarily corosync.org unavailability

2016-01-11 Thread Jan Friesse
I would like to inform all corosync.org users about the temporary unavailability of corosync.org. We are working on fixing this issue. For now just use http://corosync.github.io/corosync/ (and http://build.clusterlabs.org/corosync/releases/) for downloading releases. We are planning to restore

Re: [ClusterLabs] Too quick node reboot leads to failed corosync assert on other node(s)

2016-02-12 Thread Jan Friesse
Michal, Hello. The subject is just a hypothesis that I'd like to confirm/discuss here. TL;DR Token timeout shouldn't be greater than reboot cycle, is that correct? actually when I was reading your mail it was like "this sounds so familiar, we solved this problem looong time ago".

Re: [ClusterLabs] Connection to the CPG API failed: Library error (2)

2016-02-01 Thread Jan Friesse
, what patch can help solve problem you hit (we did rebase in RHEL 6.6). So no real hint. Just upgrade to something supported. Regards, Honza Thanks, Karthik. -Original Message- From: Jan Friesse [mailto:jfrie...@redhat.com] Sent: 01 February 2016 13:43 To: Cluster Labs - All topics

Re: [ClusterLabs] Corosync main process was not scheduled for 115935.2266 ms (threshold is 800.0000 ms). Consider token timeout increase.

2016-02-25 Thread Jan Friesse
Adam Spiers wrote: Hi all, Jan Friesse <jfrie...@redhat.com> wrote: There is really no help. It's best to make sure corosync is scheduled regularly. I may sound silly, but how can I do it? It's actually very hard to say. Pauses like 30 sec are really unusual and shouldn't

Re: [ClusterLabs] Security with Corosync

2016-03-11 Thread Jan Friesse
Nikhil, Nikhil Utane wrote: Hi, I changed some configuration and captured packets. I can see that the data is already garbled and not in the clear. So does corosync already have this built-in? Can somebody provide more details as to what all security features are incorporated? See man

Re: [ClusterLabs] Security with Corosync

2016-03-14 Thread Jan Friesse
t not over totem.keyfile) - you are using COROSYNC_TOTEM_AUTHKEY_FILE env with file existing on all nodes Regards, Honza On Fri, Mar 11, 2016 at 4:15 PM, Nikhil Utane <nikhil.subscri...@gmail.com> wrote: Perfect. Thanks for the quick response Honza. Cheers Nikhil On Fri, Mar 11, 2016 at 4:10

Re: [ClusterLabs] Totem is unable to form a cluster because of an operating system or network fault

2016-04-13 Thread Jan Friesse
Hi, I am trying to configure MySQL on Ubuntu according to this article: https://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-classic-mysql-cluster/ It is a two-node cluster. Looking at the corosync log: Apr 12 11:01:09 corosync [TOTEM ] Totem is unable to form a cluster

Re: [ClusterLabs] getting "Totem is unable to form a cluster" error

2016-04-08 Thread Jan Friesse
pacemaker 1.1.12-11.12 openais 1.1.4-5.24.5 corosync 1.4.7-0.23.5 It's a two node active/passive cluster and we just upgraded the SLES 11 SP 3 to SLES 11 SP 4 (nothing else) but when we try to start the cluster service we get the following error: "Totem is unable to form a cluster because of an

Re: [ClusterLabs] getting "Totem is unable to form a cluster" error

2016-04-11 Thread Jan Friesse
08.04.2016 17:51, Jan Friesse wrote: On 04/08/16 13:01, Jan Friesse wrote: >> pacemaker 1.1.12-11.12 >> openais 1.1.4-5.24.5 >> corosync 1.4.7-0.23.5 >> >> Its a two node active/passive cluster and we just upgraded the SLES 11 >> SP 3 to SLES 11

Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Jan Friesse
Christopher, If I ignore pacemaker's existence, and just run corosync, corosync disagrees about node membership in the situation presented in the first email. While it's true that stonith just happens to quickly correct the situation after it occurs it still smells like a bug in the case where

Re: [ClusterLabs] Security with Corosync

2016-03-19 Thread Jan Friesse
uses cman keyfile and if this is not provided, encryption key is simply cluster name. This is probably reason why everything worked when you haven't had authkey on one of nodes. Honza -Thanks Nikhil On Mon, Mar 14, 2016 at 1:19 PM, Jan Friesse <jfrie...@redhat.com> wrote: Nikhil Utane

Re: [ClusterLabs] service network restart and corosync

2016-03-03 Thread Jan Friesse
Hi, In our deployment, due to some requirement, we need to do a : service network restart What is exact reason for doing network restart? Due to this corosync crashes and the associated pacemaker processes crash as well. As per the last comment on this issue, --- Corosync reacts

Re: [ClusterLabs] Corosync do not send traffic

2016-04-05 Thread Jan Friesse
Roberto Munoz Gomez wrote: 2016-03-30 13:42 GMT+02:00 Jan Friesse <jfrie...@redhat.com>: Roberto Munoz Gomez wrote: 2016-03-30 11:27 GMT+02:00 Jan Friesse <jfrie...@redhat.com>: Hello, Due to a change in the switch in one of the datacenters now I have an o

Re: [ClusterLabs] service network restart and corosync

2016-03-30 Thread Jan Friesse
Hi Jan Friesse, Thank you for the update. I have responded inline. A few queries, as well. Regards, Debabrata Pani On 29/03/16 14:24, "Jan Friesse" <jfrie...@redhat.com> wrote: Hi (Jan Friesse) I studied the issue mentioned in the github url. It looks the crash that I

Re: [ClusterLabs] corosync-2.3.5 with ipv6 and multicast

2016-03-30 Thread Jan Friesse
Hi, Thanks for your confirmation, and I 've two questions: On 03/30/2016 05:11 PM, Jan Friesse wrote: Hi, Honza I want to use corosync-2.3.5 with ipv6 and multicast, and I see that I must specify nodeid in nodelist, but I also see nodeid in totem from totemconfig.c. Are they the same

Re: [ClusterLabs] corosync-2.3.5 with ipv6 and multicast

2016-03-30 Thread Jan Friesse
Hi, Honza I want to use corosync-2.3.5 with ipv6 and multicast, and I see that I must specify nodeid in nodelist, but I also see nodeid in totem from totemconfig.c. Are they the same? For given node it's same. Generally it's recommended to use nodelist nodeid. It's better simply

Re: [ClusterLabs] service network restart and corosync

2016-03-29 Thread Jan Friesse
Hi (Jan Friesse) I studied the issue mentioned in the github url. It looks the crash that I am talking about is slightly different from the one mentioned in the original issue. May be they are related, but I would like to Highlight my setup for ease. Three node cluster , one is in maintenance

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-16 Thread Jan Friesse
Hi, I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is it possible? Is there any examination about that? From my point of view (and yes, I'm biased), the biggest problem of Zookeeper is the need to have quorum

Re: [ClusterLabs] Pacemaker with Zookeeper??

2016-05-18 Thread Jan Friesse
Ken Gaillot wrote: On 05/17/2016 09:54 AM, Digimer wrote: On 16/05/16 04:35 AM, Bogdan Dobrelya wrote: On 05/16/2016 09:23 AM, Jan Friesse wrote: Hi, I have an idea: use Pacemaker with Zookeeper (instead of Corosync). Is it possible? Is there any examination about that? Indeed, would

Re: [ClusterLabs] [ClusterLab] : Corosync not initializing successfully

2016-05-02 Thread Jan Friesse
As your hardware is probably capable of running ppcle and if you have an environment at hand without too much effort it might pay off to try that. There are of course distributions out there that support corosync on big-endian architectures but I don't know if there is an automated regression for

Re: [ClusterLabs] [ClusterLab] : Corosync not initializing successfully

2016-05-05 Thread Jan Friesse
il.fm> wrote: Hi, On Mon, May 02, 2016 at 08:54:09AM +0200, Jan Friesse wrote: As your hardware is probably capable of running ppcle and if you have an environment at hand without too much effort it might pay off to try that. There are of course distributions out there support corosync on big

[ClusterLabs] Corosync 2.4.1 is available at corosync.org!

2016-08-04 Thread Jan Friesse
with pacemaker-1.13+ Christine Caulfield (2): qdevice: Fix 'tie_breaker' in man page qdevice: some more small man page fixes HideoYamauchi (1): Low: totemsrp: Addition of the log. Jan Friesse (2): Config: Flag config uidgid entries Spec: Qdevice require same version

Re: [ClusterLabs] changing pacemaker.log location

2016-08-15 Thread Jan Friesse
Ken Gaillot wrote: On 08/12/2016 10:19 AM, Christopher Harvey wrote: I'm surprised I'm having such a hard time figuring this out on my own. I'm running pacemaker 1.1.13 and corosync-2.3.4 and want to change the location of pacemaker.log. By default it is located in /var/log. I looked in

Re: [ClusterLabs] [corosync] [build] the configure shows UNKNOWN version and runtime exits 17

2016-06-27 Thread Jan Friesse
Hello. I use this guide [0] to build libqb, corosync, pacemaker and test them as pid-space linked docker containers [1]. Pacemaker builds OK and shows v1.1.15 at runtime, but at build time it complains about an unknown libqb version. I worked around it by running a Pacemaker build with these env

[ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-06-30 Thread Jan Friesse
eve Fix typo: alocated -> allocated Fix typo: Diabled -> disabled Jan Friesse (1): config: get_cluster_mcast_addr error is not fatal bliu (1): low:typo fix in sam.h Upgrade is (more than usual) highly recommended. Thanks/congratulations to all people that contribut

Re: [ClusterLabs] Corosync IPC unix socket path

2017-02-07 Thread Jan Friesse
Rytis, Hi, We are using corosync (currently v2.3.5) for group communication services, and we would like to package it to separate container (all the service-per-container stuff), thus requiring to forward IPC. Is it possible to set unix socket path for corosync IPC? Sadly nope. Corosync is

Re: [ClusterLabs] Corosync maximum nodes

2017-01-30 Thread Jan Friesse
Hello! I'm very sorry to disturb you with such question but I can't find information if there is maximum nodes' limit in corosync? I've found a bug report https://bugzilla.redhat.com/show_bug.cgi?id=905296#c5 with "Corosync has hardcoded maximum number of nodes to 64" but it was posted 4 years

Re: [ClusterLabs] Corosync ring shown faulty between healthy nodes & networks (rrp_mode: passive)

2016-10-07 Thread Jan Friesse
s a quorum in this case. Regards, Martin Schlegel Jan Friesse <jfrie...@redhat.com> wrote on 5 October 2016 at 09:01: Martin, Hello all, I am trying to understand why the following 2 Corosync heartbeat ring failure scenarios I have been testing and hope somebody can explain why this

Re: [ClusterLabs] corosync-quorum tool, output name key on Name column if set?

2016-09-21 Thread Jan Friesse
Thomas Lamprecht wrote: On 09/20/2016 12:36 PM, Christine Caulfield wrote: On 20/09/16 10:46, Thomas Lamprecht wrote: Hi, when I'm using corosync-quorumtool [-l] and have my ring0_addr set to an IP address, which does not resolve to a hostname, I get the nodes' IP addresses for the 'Name'

Re: [ClusterLabs] permissions under /etc/corosync/qnetd

2016-11-08 Thread Jan Friesse
Ferenc Wágner wrote: Jan Friesse <jfrie...@redhat.com> writes: Ferenc Wágner wrote: Have you got any plans/timeline for 2.4.2 yet? Yep, I'm going to release it in few minutes/hours. Man, that was quick. I've got a bunch of typo fixes queued..:) Please consider anno

[ClusterLabs] Corosync 2.4.2 is available at corosync.org!

2016-11-07 Thread Jan Friesse
): [build] Fix build on RHEL7.3 latest Jan Friesse (3): Man: Fix corosync-qdevice-net-certutil link Qnetd LMS: Fix two partition use case libvotequorum: Bump version Michael Jones (1): cfg: Prevents use of uninitialized buffer Upgrade is (as usual) highly recommended

Re: [ClusterLabs] Corosync 2.4.0 is available at corosync.org!

2016-11-07 Thread Jan Friesse
Ferenc Wágner wrote: Jan Friesse <jfrie...@redhat.com> writes: Jan Friesse <jfrie...@redhat.com> writes: Please note that because of required changes in votequorum, libvotequorum is no longer binary compatible. This is reason for version bump. Er, what version bump? Co

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-20 Thread Jan Friesse
the patch to my build and it seems to be working as expected. A few thoughts triggered by this approach: - we have to alert the corosync-people as in a chat with Jan Friesse he pointed me to the fact that for corosync 3.x the wd-service was planned to be removed Actually I didn't express

Re: [ClusterLabs] cross DC cluster using public ip?

2016-10-13 Thread Jan Friesse
neeraj ch wrote: Hello, We are testing out corosync and pacemaker for DB high availability on the cloud. I was able to set up a cluster within a DC using corosync 1.4 and pacemaker 1.12. It works great and I wanted to try a cross DC cluster. I was using unicast as multicast was disabled

Re: [ClusterLabs] large cluster with corosync

2017-01-04 Thread Jan Friesse
Arne Jansen wrote: On 04.01.2017 11:25, Kristoffer Grönlund wrote: Arne Jansen writes: Hi, I've built corosync for solaris and am trying to build a largish cluster. I started corosync with default configuration on an increasing number of nodes, one by one. At

Re: [ClusterLabs] Buffer overflow (re-transmission list)

2017-01-02 Thread Jan Friesse
Hello, I have a four node cluster. Each node connected with a centralized switch. MTU size is default, 1500. On each node, a program continuously tries to multi-cast as many messages as possible. With the default settings (corosync.conf), buffer overflow does *not* occur till program runs on

Re: [ClusterLabs] large cluster with corosync

2017-01-10 Thread Jan Friesse
Arne Jansen wrote: On 04.01.2017 13:52, Jan Friesse wrote: Variables you can try tweak. - Definitively start with increase totem.config (default 1000, you can try 1) what does that do? Haven't found it in corosync.conf(5) Sorry, typo. I meant totem.token. - If it doesn't help

Re: [ClusterLabs] wireshark cannot recognize corosync packets

2017-03-17 Thread Jan Friesse
200.201.162.54 exists and it shouldn't be 200.201.162.55 (or 200.201.162.55 shouldn't have ip 200.201.162.54)? Honza On 2017-03-16 at 15:54, "Jan Friesse" <jfrie...@redhat.com> wrote: corosync.conf and debug logs are in attachment. Thanks for them. They look really interesting. As can

Re: [ClusterLabs] wireshark cannot recognize corosync packets

2017-03-15 Thread Jan Friesse
Yesterday I found corosync took almost one hour to form a cluster(a failed node came back online). This for sure shouldn't happen (at least with default timeout settings). So I captured some corosync packets, and opened the pcap file in wireshark. But wireshark only displayed raw udp, no

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-31 Thread Jan Friesse
The original message has the logs from nodes 1 and 3. Node 2, the one that got fenced in this test, doesn't really show much. Here are the logs from it: Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0, 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0,

Re: [ClusterLabs] Ubuntu 16.04 - 2 node setup

2017-04-13 Thread Jan Friesse
James, Hey guys, Apologies for burdening you with my issue, but I'm at my wits' end! I'm trying to set up a 2-node cluster on two Ubuntu 16.04 VMs. I actually had this working earlier, but because I had tweaked a number of different settings (both corosync related and external settings),

Re: [ClusterLabs] corosync totem.token too long may cause pacemaker(cluster) unstable?

2017-03-08 Thread Jan Friesse
cys wrote: At 2017-03-08 17:10:36, "Jan Friesse" <jfrie...@redhat.com> wrote: Hi, We changed totem.token from 3s to 60s. Then some strange things were observed, such as unexpected node offline. I read corosync.conf manpage, but still don't understand the reason. Ca

Re: [ClusterLabs] corosync totem.token too long may cause pacemaker(cluster) unstable?

2017-03-08 Thread Jan Friesse
Hi, We changed totem.token from 3s to 60s. Then some strange things were observed, such as unexpected node offline. I read corosync.conf manpage, but still don't understand the reason. Can anyone explain this? or maybe our conf is broken? What corosync version are you using? Your config file is

Re: [ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-15 Thread Jan Friesse
Ken Gaillot wrote: On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote: On Wed, 2017-08-02 at 09:59 +, 井上 和徳 wrote: Hi, In Pacemaker-1.1.17, the attribute updated while starting pacemaker is not displayed in crm_mon. In Pacemaker-1.1.16, it is displayed and results are different.

Re: [ClusterLabs] Two nodes cluster issue

2017-07-25 Thread Jan Friesse
Tomer Azran wrote: I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all. I think I will try to use an iSCSI target on my qdevice and set SBD to use it. I still don't understand why qdevice can't take the

Re: [ClusterLabs] Two nodes cluster issue

2017-07-25 Thread Jan Friesse
Tomer Azran wrote: I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all. I think I will try to use an iSCSI target on my qdevice and set SBD to use it. I still don't understand why qdevice can't take the

Re: [ClusterLabs] Corosync CPU load slowly increasing if one node present

2017-04-27 Thread Jan Friesse
Stefan, Hello everyone! I am using Pacemaker (1.1.12), Corosync (2.3.0) and libqb (0.16.0) in 2-node clusters (virtualized in VMware infrastructure, OS: RHEL 6.7). I noticed that if only one node is present, the CPU usage of Corosync (as seen with top) is slowly but steadily increasing (over

Re: [ClusterLabs] Digest does not match

2017-04-24 Thread Jan Friesse
Among two cases where I have seen this error messages I solved one. On one cluster these dedicated interfaces were connected to a switch instead of being connected directly. Ok Though I still don't know what caused these errors on another system (the logs in the previous email). Actually

Re: [ClusterLabs] Two nodes cluster issue

2017-08-08 Thread Jan Friesse
--Original Message----- From: Jan Friesse [mailto:jfrie...@redhat.com] Sent: Monday, August 7, 2017 2:38 PM To: Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>; kwenn...@redhat.com; Prasad, Shashank <sspra...@vanu.com> Subject: Re: [ClusterL

Re: [ClusterLabs] Two nodes cluster issue

2017-08-07 Thread Jan Friesse
of algorithms so if you were not able to determine the difference them there is something wrong and man page needs improvement. What exactly you were unable to understand? Also for your use case with 2 nodes both algorithms behaves same way. Honza -Original Message- From: Jan Friesse

Re: [ClusterLabs] ClusterLabs.Org Documentation Problem?

2017-08-23 Thread Jan Friesse
Thanks for the reply. Yes, it's a bit confusing. I did end up using the documentation for Corosync 2.X since that seemed newer, but it also assumed CentOS/RHEL7 and systemd-based commands. It also incorporates cman, pcsd, psmisc, and policycoreutils-python, which are all new to me. If

Re: [ClusterLabs] vip is not removed after node lost connection with the other two nodes

2017-06-26 Thread Jan Friesse
Jan Pokorný wrote: [Hui, no need to address us individually along with the list, we are both subscribed to it since around the beginning] On 26/06/17 16:10 +0800, Hui Xiang wrote: Thanks guys!! @Ken I did "ifconfig ethx down" to make the cluster interface down. That's what I suspected

Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-26 Thread Jan Friesse
ok, but why the node status(left, join) can be sync to other node in the cluster by CMAP? thanks! It is not synced by cmap. Node status is essential property of totem protocol and it is just stored into local cmap mostly for diagnostics/monitoring. Also they are not so much in sync. You can

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-04 Thread Jan Friesse
Jean, Hi Jan, On Tue, 3 Oct 2017, Jan Friesse wrote: I hope this makes sense! :) I would still have some questions :) but that is really not related to the problem you have. Questions are welcome! I am new to this stack, so there is certainly room for learning and for improvement. My

Re: [ClusterLabs] corosync service not automatically started

2017-10-10 Thread Jan Friesse
Ken Gaillot wrote: On Tue, 2017-10-10 at 12:24 +0200, Václav Mach wrote: On 10/10/2017 11:40 AM, Valentin Vidic wrote: On Tue, Oct 10, 2017 at 11:26:24AM +0200, Václav Mach wrote: # The primary network interface allow-hotplug eth0 iface eth0 inet dhcp # This is an autoconfigured IPv6

Re: [ClusterLabs] corosync service not automatically started

2017-10-10 Thread Jan Friesse
Václav Mach wrote: On 10/10/2017 11:40 AM, Valentin Vidic wrote: On Tue, Oct 10, 2017 at 11:26:24AM +0200, Václav Mach wrote: # The primary network interface allow-hotplug eth0 iface eth0 inet dhcp # This is an autoconfigured IPv6 interface iface eth0 inet6 auto allow-hotplug or dhcp

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-16 Thread Jan Friesse
Jonathan, On 13/10/17 17:24, Jan Friesse wrote: I've done a bit of digging and am getting closer to the root cause of the race. We rely on having votequorum_sync_init called twice -- once when node 1 joins (with member_list_entries=2) and once when node 1 leaves (with member_list_entries=1

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-13 Thread Jan Friesse
Jonathan Davies wrote: On 12/10/17 11:54, Jan Friesse wrote: I'm on corosync-2.3.4 plus my patch Finally noticed ^^^ 2.3.4 is really old and as long as it is not some patched version, I wouldn't recommend to use it. Can you give a try to current needle? I was mistaken to think I

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-11 Thread Jan Friesse
Ferenc, wf...@niif.hu (Ferenc Wágner) writes: Jan Friesse <jfrie...@redhat.com> writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosyn

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-12 Thread Jan Friesse
Ferenc, Jan Friesse <jfrie...@redhat.com> writes: Back to problem you have. It's definitely a HW issue but I'm thinking how to solve it in software. Right now, I can see two ways: 1. Set dog FD to be non blocking right at the end of setup_watchdog - This is preferred but I'm no

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-02 Thread Jan Friesse
On Wed, 27 Sep 2017, Jan Friesse wrote: I don't think scheduling is the case. If scheduler would be the case other message (Corosync main process was not scheduled for ...) would kick in. This looks more like a something is blocked in totemsrp. Ah, interesting! Also, it looks like the side

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-03 Thread Jan Friesse
Jean, On Mon, 2 Oct 2017, Jan Friesse wrote: We had one problem on a real deployment of DLM+corosync (5 voters and 20 non-voters, with dlm on those 20, for a specific application that uses What you mean by voters and non-voters? There is 25 nodes in total and each of them is running

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-28 Thread Jan Friesse
Ferenc, Hi, In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration. vhbl03 corosync[3890]: [TOTEM ] A processor

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-09-01 Thread Jan Friesse
Ferenc, Jan Friesse <jfrie...@redhat.com> writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, formi

Re: [ClusterLabs] Strange Corosync (TOTEM) logs, Pacemaker OK but DLM stuck

2017-08-29 Thread Jan Friesse
Ferenc, Jan Friesse <jfrie...@redhat.com> writes: wf...@niif.hu writes: In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day (in August; in May, it happened 0-2 times a day only, it's slowly ramping up): vhbl08 corosync[3687]: [TOTEM ] A processor failed, formi

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-12 Thread Jan Friesse
Jonathan, I believe main "problem" is votequorum ability to work during sync phase (votequorum is only one service with this ability, see votequorum_overview.8 section VIRTUAL SYNCHRONY)... Hi ClusterLabs, I'm seeing a race condition in corosync where votequorum can have incorrect

Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-09-27 Thread Jan Friesse
Jean, Hello, As the subject line suggests, I am wondering why I see so many of these log lines (many means about 10 times per minute, usually several in the same second): Sep 26 19:56:24 [950] vm0 corosync notice [TOTEM ] Process pause detected for 2555 ms, flushing membership messages. Sep

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 16/10/17 15:58, Jan Friesse wrote: Jonathan, On 13/10/17 17:24, Jan Friesse wrote: I've done a bit of digging and am getting closer to the root cause of the race. We rely on having votequorum_sync_init called twice -- once when node 1 joins (with member_list_entries=2

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if problem persists? Wow! With that change, I'm p

Re: [ClusterLabs] pcs authentication fails on Centos 7.0 & 7.1

2017-11-13 Thread Jan Friesse
Digimer wrote: On 2017-11-12 04:20 AM, Aviran Jerbby wrote: Hi Clusterlabs mailing list, I'm having issues running pcs authentication on RH cent os 7.0/7.1 (Please see log below). *_It's important to mention that pcs authentication with RH cent os 7.2/7.4 and with the same setup and

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-11-13 Thread Jan Friesse
Jonathan, I've finished (I hope) proper fix for problem you've seen, so can you please try to test https://github.com/corosync/corosync/pull/280 Thanks, Honza On 31/10/17 10:41, Jan Friesse wrote: Did you get a chance to confirm whether the workaround to remove the final call

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-11-15 Thread Jan Friesse
On 13/11/17 17:06, Jan Friesse wrote: Jonathan, I've finished (I hope) proper fix for problem you've seen, so can you please try to test https://github.com/corosync/corosync/pull/280 Thanks, Honza Hi Honza, Hi Jonathan, Thanks very much for putting this fix together. I'm happy

[ClusterLabs] Corosync 1.4.9 is available at corosync.org!

2017-12-08 Thread Jan Friesse
I am pleased to announce the latest maintenance of old stable release of Corosync 1.4.9 available immediately from our website at http://build.clusterlabs.org/corosync/releases/. This release contains just a few bugfixes. Complete changelog for 1.4.9: Jan Friesse (7): Revert "ipcc

Re: [ClusterLabs] Corosync 1.4.9 is available at corosync.org!

2017-12-08 Thread Jan Friesse
Digimer, On 2017-12-08 11:46 AM, Jan Friesse wrote: I am pleased to announce the latest maintenance of old stable release of Corosync 1.4.9 available immediately from our website at http://build.clusterlabs.org/corosync/releases/. This release contains just a few bugfixes. Complete changelog

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-31 Thread Jan Friesse
Jonathan, Hi Honza, On 19/10/17 17:05, Jonathan Davies wrote: On 19/10/17 16:56, Jan Friesse wrote: Jonathan, On 18/10/17 16:18, Jan Friesse wrote: Jonathan, On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);"

[ClusterLabs] Corosync-qdevice 3.0 - Alpha2 is available at GitHub!

2018-04-27 Thread Jan Friesse
1): Bin Liu (1): qdevice: optarg should be str in init_from_cmap Ferenc Wágner (1): Fix typo: sucesfully -> successfully Jan Friesse (12): spec: Modernize spec file a bit init: Quote subshell result properly Quote certutils scripts properly Fix NULL poin

Re: [ClusterLabs] corosync 2.4 CPG config change callback

2018-07-02 Thread Jan Friesse
Hi Thomas, Hi, On 04/25/2018 at 09:57 AM, Jan Friesse wrote: Thomas Lamprecht wrote: On 4/24/18 6:38 PM, Jan Friesse wrote: On 4/6/18 10:59 AM, Jan Friesse wrote: Thomas Lamprecht wrote: On 03/09/2018 at 05:26 PM, Jan Friesse wrote: I've tested it too and yes, you are 100% right

[ClusterLabs] Corosync 3.0 - Alpha3 is available at corosync.org!

2018-04-30 Thread Jan Friesse
t: the parameter is reserved, must be NULL tools: don't distribute what we can easily make Jan Friesse (7): totemsrp: Implement sanity checks of received msgs totemsrp: Check join and leave msg length SECURITY: Remove SECURITY file totemsrp: Fix srp_addr_compare totemsr

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-19 Thread Jan Friesse
Jonathan, On 18/10/17 16:18, Jan Friesse wrote: Jonathan, On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if pro

[ClusterLabs] Corosync 2.4.3 is available at corosync.org!

2017-10-20 Thread Jan Friesse
tchdogs wd: remove extra capitalization typo corosync.conf.5: watchdog support is conditional Hideo Yamauchi (1): notifyd: Add the community name to an SNMP trap Jan Friesse (11): Logsys: Change logsys syslog_priority priority totemrrp: Fix situation when all rings are

Re: [ClusterLabs] [IMPORTANT] Fatal, yet rare issue verging on libqb's design flaw and/or it's use corosync around daemon-forking

2018-01-22 Thread Jan Friesse
It was discovered that corosync exposes itself for a self-crash under rare circumstance whereby corosync executable is run when there is already a daemon instance around (does not apply to corosync serving without any backgrounding, i.e. launched with "-f" switch). Such a circumstance can be

Re: [ClusterLabs] What does these logs mean in corosync.log

2018-02-12 Thread Jan Friesse
lkxjtu, I will just comment corosync log. These logs are both print when system is abnormal, I am very confused what they mean. Does anyone know what they mean? Thank you very much corosync version 2.4.0 pacemaker version 1.1.16 1) Feb 01 10:57:58 [18927] paas-controller-192-167-0-2

Re: [ClusterLabs] Does CMAN Still Not Support Multipe CoroSync Rings?

2018-02-12 Thread Jan Friesse
Eric, General question. I tried to set up a cman + corosync + pacemaker cluster using two corosync rings. When I start the cluster, everything works fine, except when I do a 'corosync-cfgtool -s' it only shows one ring. I tried manually editing the /etc/cluster/cluster.conf file adding two

Re: [ClusterLabs] Pacemaker 2.0.0-rc1 now available

2018-02-19 Thread Jan Friesse
Ken Gaillot wrote: On Fri, 2018-02-16 at 15:06 -0600, Ken Gaillot wrote: Source code for the first release candidate for Pacemaker version 2.0.0 is now available at: https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-2.0.0-rc1 The main goal of the change from Pacemaker 1

Re: [ClusterLabs] corosync.conf token configuration

2018-01-03 Thread Jan Friesse
Adrián, Hello, I was wondering if someone have some description of the parameters: token, token_retransmits, token_retransmits_before_loss_const and consensus. I have read about it in the man page of corosync.conf but trying some configuration of the cluster I realized that I did not
