Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-11-15 Thread Jan Friesse




On 13/11/17 17:06, Jan Friesse wrote:

Jonathan,
I've finished (I hope) a proper fix for the problem you've seen, so can you
please try to test

https://github.com/corosync/corosync/pull/280
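
(For anyone else who wants to give it a spin: one way to fetch and build that
PR branch for testing, assuming the usual autotools toolchain and the corosync
build dependencies are installed, is roughly:

  git clone https://github.com/corosync/corosync.git
  cd corosync
  git fetch origin pull/280/head:pr-280   # fetch the PR head as a local branch
  git checkout pr-280
  ./autogen.sh && ./configure && make

This is just the generic GitHub pull-request workflow, not project-specific
build instructions.)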

Thanks,
   Honza


Hi Honza,


Hi Jonathan,



Thanks very much for putting this fix together.

I'm happy to confirm that I do not see the problem with this fix.


That is good news.



In my repro environment, which normally triggers the problem once in every
two attempts, I didn't see the problem at all after over 1000 attempts
with these patches.


Perfect. I'll wait for a proper peer review of the code and merge after that.

Thanks again for your patience and for your great help with reproducing and
testing this bug.


Regards,
  Honza



Thanks!
Jonathan



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker responsible of DRBD and a systemd resource

2017-11-15 Thread Digimer
I've driven for 22 years and never needed my seatbelt before, and yet I
still make sure I use it every time I am in a car. ;)

Why it happened now is perhaps an interesting question, but it is one I
would try to answer after fixing the core problem.

cheers,

digimer
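
To make the fencing side concrete: a minimal stonith configuration with pcs
might look roughly like the sketch below. The fence agent, its parameters and
the addresses are assumptions; use whatever matches the actual fence hardware
(IPMI, iLO, a PDU, ...). Only the node names come from this thread.

  # one fence device per node, IPMI used purely as an example
  pcs stonith create fence_pcf1 fence_ipmilan pcmk_host_list="pancakeFence1" \
      ipaddr="10.0.0.1" login="admin" passwd="secret" op monitor interval=60s
  pcs stonith create fence_pcf2 fence_ipmilan pcmk_host_list="pancakeFence2" \
      ipaddr="10.0.0.2" login="admin" passwd="secret" op monitor interval=60s

  # keep each fence device away from the node it is meant to fence
  pcs constraint location fence_pcf1 avoids pancakeFence1
  pcs constraint location fence_pcf2 avoids pancakeFence2

  pcs property set stonith-enabled=true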

On 2017-11-15 03:37 PM, Derek Wuelfrath wrote:
> And just to make sure, I’m not the kind of person who sticks to the “we
> always did it that way…” ;)
> Just trying to figure out why it suddenly breaks.
> 
> -derek
> 
> --
> Derek Wuelfrath
> dwuelfr...@inverse.ca  :: +1.514.447.4918
> (x110) :: +1.866.353.6153 (x110)
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu ),
> PacketFence (www.packetfence.org ) and
> Fingerbank (www.fingerbank.org )
> 
>> On Nov 15, 2017, at 15:30, Derek Wuelfrath wrote:
>>
>> I agree. The thing is, we have had this kind of setup widely deployed
>> for quite a while and never ran into any issue.
>> Not sure if something changed in the Corosync/Pacemaker code or in the
>> way of dealing with systemd resources.
>>
>> As said, without a systemd resource, everything just works as it
>> should… 100% of the time.
>> As soon as a systemd resource comes in, it breaks.
>>
>> -derek
>>
>> --
>> Derek Wuelfrath
>> dwuelfr...@inverse.ca  ::
>> +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)
>> Inverse inc. :: Leaders behind SOGo (www.sogo.nu
>> ), PacketFence (www.packetfence.org
>> ) and Fingerbank (www.fingerbank.org
>> )
>>
>>> On Nov 14, 2017, at 23:03, Digimer wrote:
>>>
>>> Quorum doesn't prevent split-brains, stonith (fencing) does. 
>>>
>>> https://www.alteeve.com/w/The_2-Node_Myth
>>>
>>> There is no way to use quorum-only to avoid a potential split-brain.
>>> You might be able to make it less likely with enough effort, but
>>> never prevent it.
>>>
>>> digimer
>>>
>>> On 2017-11-14 10:45 PM, Garima wrote:
 Hello All,
  
 A split-brain situation occurs when there is a drop in quorum: status
 information is no longer exchanged between the two nodes of the cluster.
 This can be avoided if quorum communication between the two nodes is
 maintained.
 I have checked the code. In my opinion these files need to be
 updated (quorum.py/stonith.py) to avoid the split-brain situation and
 maintain the Active-Passive configuration.
  
 Regards,
 Garima
  
 *From:* Derek Wuelfrath [mailto:dwuelfr...@inverse.ca] 
 *Sent:* 13 November 2017 20:55
 *To:* Cluster Labs - All topics related to open-source clustering
 welcomed 
 *Subject:* Re: [ClusterLabs] Pacemaker responsible of DRBD and a
 systemd resource
  
 Hello Ken !
  

 Make sure that the systemd service is not enabled. If pacemaker is
 managing a service, systemd can't also be trying to start and
 stop it.

  
 It is not. I made sure of this in the first place :)
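
For reference, a quick way to double-check that, with "myservice.service"
standing in as a placeholder for whatever unit Pacemaker is managing:

  systemctl is-enabled myservice.service   # should print "disabled"
  systemctl disable myservice.service      # only needed if it is still enabled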
  

 Beyond that, the question is what log messages are there from around
 the time of the issue (on both nodes).

  
 Well, that’s the thing. There are not many log messages telling what
 is actually happening. The ‘systemd’ resource is not even trying to
 start (nothing in either log for that resource). Here are the logs
 from my last attempt:
 Scenario:
 - Services were running on ‘pancakeFence2’. DRBD was synced and
 connected
 - I rebooted ‘pancakeFence2’. Services failed to ‘pancakeFence1’
 - After ‘pancakeFence2’ comes back, services are running just fine
 on ‘pancakeFence1’ but DRBD is in Standalone due to split-brain
  
 Logs for pancakeFence1: https://pastebin.com/dVSGPP78
 Logs for pancakeFence2: https://pastebin.com/at8qPkHE
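
 As an aside, manual recovery from the resulting DRBD split-brain (the
 StandAlone state mentioned above) usually goes roughly like this with DRBD
 8.4-style commands, with "r0" standing in for the actual resource name:

   # on the node whose changes are to be discarded
   drbdadm disconnect r0
   drbdadm secondary r0
   drbdadm connect --discard-my-data r0

   # on the surviving node, if it is also StandAlone
   drbdadm connect r0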
  
 It really looks like the status-check mechanism of
 Corosync/Pacemaker for a systemd resource forces the resource to
 “start” and therefore starts the resources above it in the
 group (DRBD in this instance).
 This does not happen for a regular OCF resource (IPaddr2, for example).

 Cheers!
 -dw
  
 --
 Derek Wuelfrath
 dwuelfr...@inverse.ca  ::
 +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)
 Inverse inc. :: Leaders behind SOGo (www.sogo.nu
 ), PacketFence (www.packetfence.org
 ) and Fingerbank (www.fingerbank.org
 )


 On Nov 10, 2017, at 11:39, Ken Gaillot wrote:
  
 On Thu, 2017-11-09 at 20:27 -0500, Derek Wuelfrath wrote:

 Hello there,


Re: [ClusterLabs] Pacemaker responsible of DRBD and a systemd resource

2017-11-15 Thread Derek Wuelfrath
And just to make sure, I’m not the kind of person who sticks to the “we always
did it that way…” ;)
Just trying to figure out why it suddenly breaks.

-derek

--
Derek Wuelfrath
dwuelfr...@inverse.ca :: +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)
Inverse inc. :: Leaders behind SOGo (www.sogo.nu), PacketFence
(www.packetfence.org) and Fingerbank (www.fingerbank.org)

> On Nov 15, 2017, at 15:30, Derek Wuelfrath  wrote:
> 
> I agree. The thing is, we have had this kind of setup widely deployed for
> quite a while and never ran into any issue.
> Not sure if something changed in the Corosync/Pacemaker code or in the way
> of dealing with systemd resources.
> 
> As said, without a systemd resource, everything just works as it should…
> 100% of the time.
> As soon as a systemd resource comes in, it breaks.
> 
> -derek
> 
> --
> Derek Wuelfrath
> dwuelfr...@inverse.ca  :: +1.514.447.4918 
> (x110) :: +1.866.353.6153 (x110)
> Inverse inc. :: Leaders behind SOGo (www.sogo.nu ), 
> PacketFence (www.packetfence.org ) and 
> Fingerbank (www.fingerbank.org )
> 
>> On Nov 14, 2017, at 23:03, Digimer wrote:
>> 
>> Quorum doesn't prevent split-brains, stonith (fencing) does. 
>> 
>> https://www.alteeve.com/w/The_2-Node_Myth 
>> 
>> 
>> There is no way to use quorum-only to avoid a potential split-brain. You 
>> might be able to make it less likely with enough effort, but never prevent 
>> it.
>> 
>> digimer
>> 
>> On 2017-11-14 10:45 PM, Garima wrote:
>>> Hello All,
>>>  
>>> A split-brain situation occurs when there is a drop in quorum: status
>>> information is no longer exchanged between the two nodes of the cluster.
>>> This can be avoided if quorum communication between the two nodes is
>>> maintained.
>>> I have checked the code. In my opinion these files need to be updated
>>> (quorum.py/stonith.py) to avoid the split-brain situation and maintain
>>> the Active-Passive configuration.
>>>  
>>> Regards,
>>> Garima
>>>  
>>> From: Derek Wuelfrath [mailto:dwuelfr...@inverse.ca]
>>> Sent: 13 November 2017 20:55
>>> To: Cluster Labs - All topics related to open-source clustering welcomed 
>>>  
>>> Subject: Re: [ClusterLabs] Pacemaker responsible of DRBD and a systemd 
>>> resource
>>>  
>>> Hello Ken !
>>>  
>>> Make sure that the systemd service is not enabled. If pacemaker is
>>> managing a service, systemd can't also be trying to start and stop it.
>>>  
>>> It is not. I made sure of this in the first place :)
>>>  
>>> Beyond that, the question is what log messages are there from around
>>> the time of the issue (on both nodes).
>>>  
>>> Well, that’s the thing. There are not many log messages telling what is
>>> actually happening. The ‘systemd’ resource is not even trying to start
>>> (nothing in either log for that resource). Here are the logs from my last 
>>> attempt:
>>> Scenario:
>>> - Services were running on ‘pancakeFence2’. DRBD was synced and connected
>>> - I rebooted ‘pancakeFence2’. Services failed to ‘pancakeFence1’
>>> - After ‘pancakeFence2’ comes back, services are running just fine on 
>>> ‘pancakeFence1’ but DRBD is in Standalone due to split-brain
>>>  
>>> Logs for pancakeFence1: https://pastebin.com/dVSGPP78 
>>> 
>>> Logs for pancakeFence2: https://pastebin.com/at8qPkHE 
>>> 
>>>  
>>> It really looks like the status-check mechanism of Corosync/Pacemaker for
>>> a systemd resource forces the resource to “start” and therefore starts the
>>> resources above it in the group (DRBD in this instance).
>>> This does not happen for a regular OCF resource (IPaddr2, for example).
>>> 
>>> Cheers!
>>> -dw
>>>  
>>> --
>>> Derek Wuelfrath
>>> dwuelfr...@inverse.ca  :: +1.514.447.4918 
>>> (x110) :: +1.866.353.6153 (x110)
>>> Inverse inc. :: Leaders behind SOGo (www.sogo.nu ), 
>>> PacketFence (www.packetfence.org ) and 
>>> Fingerbank (www.fingerbank.org )
>>> 
>>> 
>>> On Nov 10, 2017, at 11:39, Ken Gaillot wrote:
>>>  
>>> On Thu, 2017-11-09 at 20:27 -0500, Derek Wuelfrath wrote:
>>> 
>>> Hello there,
>>> 
>>> First post here, but I've been following for a while!
>>> 
>>> Welcome!
>>> 
>>> 
>>> 
>>> Here’s my issue:
>>> we have been putting in place and running this type of cluster for a
>>> while and never really encountered this kind of problem.
>>> 
>>> I recently set up a Corosync / Pacemaker / PCS cluster to manage DRBD
>>> along with different 

Re: [ClusterLabs] Pacemaker responsible of DRBD and a systemd resource

2017-11-15 Thread Derek Wuelfrath
I agree. The thing is, we have had this kind of setup widely deployed for
quite a while and never ran into any issue.
Not sure if something changed in the Corosync/Pacemaker code or in the way of
dealing with systemd resources.

As said, without a systemd resource, everything just works as it should… 100%
of the time.
As soon as a systemd resource comes in, it breaks.

-derek

--
Derek Wuelfrath
dwuelfr...@inverse.ca :: +1.514.447.4918 (x110) :: +1.866.353.6153 (x110)
Inverse inc. :: Leaders behind SOGo (www.sogo.nu), PacketFence
(www.packetfence.org) and Fingerbank (www.fingerbank.org)

> On Nov 14, 2017, at 23:03, Digimer  wrote:
> 
> Quorum doesn't prevent split-brains, stonith (fencing) does. 
> 
> https://www.alteeve.com/w/The_2-Node_Myth 
> 
> 
> There is no way to use quorum-only to avoid a potential split-brain. You 
> might be able to make it less likely with enough effort, but never prevent it.
> 
> digimer
> 
> On 2017-11-14 10:45 PM, Garima wrote:
>> Hello All,
>>  
>> A split-brain situation occurs when there is a drop in quorum: status
>> information is no longer exchanged between the two nodes of the cluster.
>> This can be avoided if quorum communication between the two nodes is
>> maintained.
>> I have checked the code. In my opinion these files need to be updated
>> (quorum.py/stonith.py) to avoid the split-brain situation and maintain
>> the Active-Passive configuration.
>>  
>> Regards,
>> Garima
>>  
>> From: Derek Wuelfrath [mailto:dwuelfr...@inverse.ca]
>> Sent: 13 November 2017 20:55
>> To: Cluster Labs - All topics related to open-source clustering welcomed 
>>  
>> Subject: Re: [ClusterLabs] Pacemaker responsible of DRBD and a systemd 
>> resource
>>  
>> Hello Ken !
>>  
>> Make sure that the systemd service is not enabled. If pacemaker is
>> managing a service, systemd can't also be trying to start and stop it.
>>  
>> It is not. I made sure of this in the first place :)
>>  
>> Beyond that, the question is what log messages are there from around
>> the time of the issue (on both nodes).
>>  
>> Well, that’s the thing. There are not many log messages telling what is
>> actually happening. The ‘systemd’ resource is not even trying to start
>> (nothing in either log for that resource). Here are the logs from my last 
>> attempt:
>> Scenario:
>> - Services were running on ‘pancakeFence2’. DRBD was synced and connected
>> - I rebooted ‘pancakeFence2’. Services failed to ‘pancakeFence1’
>> - After ‘pancakeFence2’ comes back, services are running just fine on 
>> ‘pancakeFence1’ but DRBD is in Standalone due to split-brain
>>  
>> Logs for pancakeFence1: https://pastebin.com/dVSGPP78 
>> 
>> Logs for pancakeFence2: https://pastebin.com/at8qPkHE 
>> 
>>  
>> It really looks like the status-check mechanism of Corosync/Pacemaker for
>> a systemd resource forces the resource to “start” and therefore starts the
>> resources above it in the group (DRBD in this instance).
>> This does not happen for a regular OCF resource (IPaddr2, for example).
>> 
>> Cheers!
>> -dw
>>  
>> --
>> Derek Wuelfrath
>> dwuelfr...@inverse.ca  :: +1.514.447.4918 
>> (x110) :: +1.866.353.6153 (x110)
>> Inverse inc. :: Leaders behind SOGo (www.sogo.nu ), 
>> PacketFence (www.packetfence.org ) and 
>> Fingerbank (www.fingerbank.org )
>> 
>> 
>> On Nov 10, 2017, at 11:39, Ken Gaillot wrote:
>>  
>> On Thu, 2017-11-09 at 20:27 -0500, Derek Wuelfrath wrote:
>> 
>> Hello there,
>> 
>> First post here, but I've been following for a while!
>> 
>> Welcome!
>> 
>> 
>> 
>> Here’s my issue:
>> we have been putting in place and running this type of cluster for a
>> while and never really encountered this kind of problem.
>> 
>> I recently set up a Corosync / Pacemaker / PCS cluster to manage DRBD
>> along with various other resources. Some of these resources are
>> systemd resources… this is the part where things are “breaking”.
>> 
>> Having a two-server cluster running only DRBD, or DRBD with an OCF
>> IPaddr2 resource (the cluster IP in this instance), works just fine. I can
>> easily move from one node to the other without any issue.
>> As soon as I add a systemd resource to the resource group, things start
>> breaking. Moving from one node to the other using standby mode works
>> just fine, but as soon as a Corosync / Pacemaker restart involves
>> polling of a systemd resource, it seems to try to start
>> the whole resource group and therefore creates a split-brain of the
>> DRBD resource.
>> 
>> My first two suggestions would be:
>> 
>> 
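
As a general illustration of the kind of stack discussed in this thread (DRBD
promoted on one node, with a filesystem, a cluster IP and a systemd service
grouped on top), a rough pcs sketch follows. Every identifier in it, from the
DRBD resource "data" to the systemd unit "myapp", is a placeholder rather than
the actual configuration from this thread:

  # DRBD as a master/slave set (pcs 0.9-era syntax)
  pcs resource create drbd_data ocf:linbit:drbd drbd_resource=data \
      op monitor interval=60s
  pcs resource master drbd_data_ms drbd_data \
      master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

  # the services that sit on top of DRBD, started in this order
  pcs resource create fs_data Filesystem \
      device=/dev/drbd0 directory=/srv/data fstype=ext4
  pcs resource create cluster_ip IPaddr2 ip=192.0.2.10 cidr_netmask=24
  pcs resource create app systemd:myapp
  pcs resource group add svc_group fs_data cluster_ip app

  # run the group only where DRBD is master, and only after promotion
  pcs constraint colocation add svc_group with master drbd_data_ms INFINITY
  pcs constraint order promote drbd_data_ms then start svc_group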

Re: [ClusterLabs] systemd's TasksMax and pacemaker

2017-11-15 Thread Jan Pokorný
On 14/11/17 15:07 -0600, Ken Gaillot wrote:
> It is conceivable in a large cluster that Pacemaker could exceed
> this limit

[of 512 or 4915 tasks allowed per service process tree, possibly
overridden with systemd-system.conf(5) configuration],

> so we are now recommending that users set TasksMax=infinity in the
> Pacemaker unit file if building from scratch, or in a local override
> if already deployed, to disable the limit.

Thanks for broadcasting this ;)

> We are not setting TasksMax=infinity in the shipped unit file in the
> soon-to-be-released version 1.1.18 because older versions of systemd
> will log a warning about an "Unknown lvalue". However, we will set it
> in the 2.0.0 release, when we'll be making a number of behavioral
> changes.
> 
> Particular OS distributions may have backported the TasksMax feature to
> an older version of systemd, and/or changed its default value. For
> example, in RHEL, TasksMax was backported as of RHEL 7.3, but the
> default was changed to infinity.
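
A quick way to see which limit is actually in effect for the pacemaker unit
on a given system (assuming a systemd new enough to know the property at all):

  systemctl show -p TasksMax pacemaker.service

"systemctl status pacemaker" also prints the current task count and limit in
its "Tasks:" line on such systems.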

Note also that Ansible playbooks and other means of automation and
provisioning may have the final say on the system-wide default limit.
These unattended changes should also factor into the decision of whether
you need a local customization of the pristine pacemaker service file, at
least for versions prior to 2.0.0, to avoid any unpleasant surprises from
systemd's process-proliferation ("anti-fork-bomb") limiting.
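
For instance, a local override along these lines (a minimal sketch, and the
drop-in file name is arbitrary) leaves the pristine unit file untouched:

  # /etc/systemd/system/pacemaker.service.d/tasksmax.conf
  # (the file can also be created interactively with "systemctl edit pacemaker")
  [Service]
  TasksMax=infinity

If the file is created by hand rather than via "systemctl edit", follow it up
with a "systemctl daemon-reload" so the override takes effect.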

-- 
Poki

