Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer
Hello Dejan,

On Tue, 17 Oct 2017 13:13:11 +0200 Dejan Muhamedagic wrote:
> Hi Lars,
>
> On Mon, Oct 16, 2017 at 08:52:04PM +0200, Lars Ellenberg wrote:
> > On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote:
> > > Hi,
> > >
> > > On Thu, Oct 12, 2017 at 03:30:30PM +0900,

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 16:58 +0200, Gerard Garcia wrote:
> I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know the
> latest stable version in Centos 7.3
>
> Gerard

Interesting ... this was an undetected bug that was coincidentally fixed by the recent fail-count work released in 1.1.17.

Re: [ClusterLabs] monitor failed actions not cleared

2017-10-18 Thread Ken Gaillot
On Mon, 2017-10-02 at 13:29 +, LE COQUIL Pierre-Yves wrote:
> Hi,
>
> I finally found my mistake:
> I have set up the failure-timeout like the lifetime example in the
> RedHat Documentation with the value PT1M.
> If I set up the failure-timeout with 60, it works like it should.

This is a

Re: [ClusterLabs] VirtualDomain live migration error

2017-10-18 Thread Ken Gaillot
On Sat, 2017-09-02 at 01:21 +0200, Oscar Segarra wrote:
> Hi,
>
> I have updated the known_hosts:
>
> Now, I get the following error:
>
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: +
>  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resou

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 10:57 PM, kgaillot kgail...@redhat.com wrote:
>> from the Changelog:
>>
>> Changes since Pacemaker-1.1.15
>>   ...
>>   + pengine: do not fence a node in maintenance mode if it shuts down cleanly
>>   ...
>>
>> just saying ... may or may not be what you are seeing.

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 9:27 PM, Digimer li...@alteeve.ca wrote:
>
> I understood what you meant about it getting fenced after stopping
> corosync. What I am not clear on is if you are stopping corosync on the
> normal node, or the node that is in maintenance mode.
>
> In either case, as I

Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer
On Mon, 16 Oct 2017 20:52:04 +0200 Lars Ellenberg wrote:
> On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote:
> > >
> > > Hello,
> > >
> > > 2nd post in 10 years, let's see if this one gets an

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if problem persists? Wow! With that change, I'm pleased to say that

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know the latest stable version in Centos 7.3

Gerard

On Wed, Oct 18, 2017 at 4:42 PM, Ken Gaillot wrote:
> On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote:
> > So I think I found the problem. The two resources

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote:
> So I think I found the problem. The two resources are named forwarder
> and bgpforwarder. It doesn't matter if bgpforwarder exists. It is
> just that when I set the failcount to INFINITY to a resource named
> bgpforwarder (crm_failcount -r

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jonathan Davies
On 18/10/17 14:38, Jan Friesse wrote: Can you please try to remove "votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in the votequorum_exec_init_fn function (around line 2306) and let me know if problem persists? Wow! With that change, I'm pleased to say that I'm not able

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse
Jonathan, On 16/10/17 15:58, Jan Friesse wrote: Jonathan, On 13/10/17 17:24, Jan Friesse wrote: I've done a bit of digging and am getting closer to the root cause of the race. We rely on having votequorum_sync_init called twice -- once when node 1 joins (with member_list_entries=2) and

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
So I think I found the problem. The two resources are named forwarder and bgpforwarder. It doesn't matter if bgpforwarder exists. It is just that when I set the failcount to INFINITY to a resource named bgpforwarder (crm_failcount -r bgpforwarder -v INFINITY) it directly affects the forwarder
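
The excerpt above does not show the cause, so the following is only a minimal sketch of how this kind of name collision can arise. It assumes, purely for illustration, that fail counts are stored as attributes named `fail-count-<resource>` and that the lookup matches the resource name loosely instead of exactly. The function names are hypothetical and this is not Pacemaker's code.

```python
# Hypothetical illustration only; this is NOT Pacemaker source code.
# Assumption: fail counts live in attributes named "fail-count-<resource>".
PREFIX = "fail-count-"
INFINITY = 1_000_000  # Pacemaker treats scores of 1,000,000 as INFINITY


def failcount_loose(attrs, resource):
    """Loose lookup: accepts any fail-count attribute whose name
    merely contains the resource name somewhere after the prefix."""
    matches = [
        value
        for key, value in attrs.items()
        if key.startswith(PREFIX) and resource in key[len(PREFIX):]
    ]
    return max(matches, default=0)


def failcount_exact(attrs, resource):
    """Exact lookup: only the attribute named for this resource counts."""
    return attrs.get(PREFIX + resource, 0)


# Equivalent in spirit to: crm_failcount -r bgpforwarder -v INFINITY
attrs = {PREFIX + "bgpforwarder": INFINITY}

print(failcount_loose(attrs, "forwarder"))  # picks up bgpforwarder's count
print(failcount_exact(attrs, "forwarder"))  # unaffected
```

With a loose match, the name `forwarder` is found inside `bgpforwarder`, so a failure recorded for one resource is read back for the other. An exact match keeps the two apart.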

Re: [ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Vladislav Bogdanov
Hi,

ensure you have two monitor operations configured for your drbd resource: for 'Master' and 'Slave' roles ('Slave' == 'Started' == '' for ms resources).

http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_monitoring_multi_state_resources.html

18.10.2017 11:18, Антон