Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer

Hello Dejan,

On Tue, 17 Oct 2017 13:13:11 +0200 Dejan Muhamedagic wrote:

> Hi Lars,
> 
> On Mon, Oct 16, 2017 at 08:52:04PM +0200, Lars Ellenberg wrote:
> > On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote:  
> > > Hi,
> > > 
> > > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote:  
> > > > 
> > > > Hello,
> > > > 
> > > > 2nd post in 10 years, let's see if this one gets an answer unlike the
> > > > first one...  
> > 
> > Do you want to make me check for the old one? ;-)
> >   
> > > > One of the main use cases for pacemaker here are DRBD replicated
> > > > active/active mailbox servers (dovecot/exim) on Debian machines. 
> > > > We've been doing this for a loong time, as evidenced by the oldest pair
> > > > still running Wheezy with heartbeat and pacemaker 1.1.7.
> > > > 
> > > > The majority of cluster pairs is on Jessie with corosync and backported
> > > > pacemaker 1.1.16.
> > > > 
> > > > Yesterday we had a hiccup, resulting in half the machines losing
> > > > their upstream router for 50 seconds which in turn caused the pingd RA 
> > > > to
> > > > trigger a fail-over of the DRBD RA and associated resource group
> > > > (filesystem/IP) to the other node. 
> > > > 
> > > > The old cluster performed flawlessly, the newer clusters all wound up 
> > > > with
> > > > DRBD and FS resource being BLOCKED as the processes holding open the
> > > > filesystem didn't get killed fast enough.
> > > > 
> > > > Comparing the 2 RAs (no versioning T_T) reveals a large change in the
> > > > "signal_processes" routine.
> > > > 
> > > > So with the old Filesystem RA using fuser we get something like this and
> > > > thousands of processes killed per second:
> > > > ---
> > > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > > (res_Filesystem_mb07:stop:stdout)   3478  3593   ...
> > > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > > (res_Filesystem_mb07:stop:stderr) 
> > > > cmccmccmccmcmcmcmcmccmccmcmcmcmcmcmcmcmcmcmcmcmccmcm
> > > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > > (res_Filesystem_mb07:stop:stdout)   4032  4058   ...
> > > > ---
> > > > 
> > > > Whereas the new RA (newer isn't better) that goes around killing 
> > > > processes
> > > > individually with beautiful logging was a total fail at about 4 
> > > > processes
> > > > per second killed...
> > > > ---
> > > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > > sending signal TERM to: mail  4226  4909  0 09:43 ?  S
> > > >   0:00 dovecot/imap 
> > > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > > sending signal TERM to: mail  4229  4909  0 09:43 ?  S
> > > >   0:00 dovecot/imap [idling]
> > > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > > sending signal TERM to: mail  4238  4909  0 09:43 ?  S
> > > >   0:00 dovecot/imap 
> > > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > > sending signal TERM to: mail  4239  4909  0 09:43 ?  S
> > > >   0:00 dovecot/imap 
> > > > ---
> > > > 
> > > > So my questions are:
> > > > 
> > > > 1. Am I the only one with more than a handful of processes per FS who
> > > > can't afford to wait hours for the new routine to finish?  
> > > 
> > > The change was introduced about five years ago.  
> > 
Yeah, that was because Debian Jessie didn't ship pacemaker at all at first,
and when the backport arrived it was corosync-only, with no graceful
transition path from heartbeat, so quite a few machines stayed
on Wheezy (thanks to the LTS efforts). 

> > Also, usually there should be no process anymore,
> > because whatever is using the Filesystem should have its own RA,
> > which should have appropriate constraints,
> > which means it should have been called and "stop"ped first,
> > before the Filesystem stop and umount, and only the "accidental,
> > stray, abandoned, idle since three weeks, operator shell session,
> > that happened to cd into that file system" is supposed to be around
> > *unexpectedly* and in need of killing, and not "thousands of service
> > processes, expectedly".  
> 
> Indeed, but obviously one can never tell ;-)
> 
> > So arguably your setup is broken,  
> 
> Or the other RA didn't/couldn't stop the resource ...
> 
See my previous mail: there is no good/right way to solve this with an RA
for dovecot, since such an RA would essentially have to mimic what the FS RA
should be doing, and stopping dovecot entirely is not what is called for.
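
For anyone else comparing the two agents, the difference boils down to roughly
this (a rough shell sketch of the two approaches, not the actual
resource-agents code; $MOUNTPOINT is a placeholder):

# old behaviour: one fuser call signals every process using the mount at once
fuser -TERM -m -k "$MOUNTPOINT"
sleep 1
fuser -KILL -m -k "$MOUNTPOINT"

# newer behaviour (simplified): walk the PID list and signal processes one by
# one, so each victim can be logged individually -- far slower with many users
for pid in $(fuser -m "$MOUNTPOINT" 2>/dev/null); do
    ps -fp "$pid"
    kill -TERM "$pid"
done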

> > relying on a fall-back workaround
> > which used to "perform" better.
> > 
> > The bug is not that this fall-back workaround now
> > has pretty printing and is much slower (and eventually times out),
> > the bug is that you don't properly kill the service first.
> > [and that you don't have fencing].  
> 
> ... and didn't exit with an appropriate exit code (i.e. fail).
> 
Could somebody elaborate on this, 

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 16:58 +0200, Gerard Garcia wrote:
> I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know that's the
> latest stable version in CentOS 7.3.
> 
> Gerard

Interesting ... this was an undetected bug that was coincidentally
fixed by the recent fail-count work released in 1.1.17. The bug only
affected cloned resources where one clone's name ended with the
other's.

FYI, CentOS 7.4 has 1.1.16, but that won't help this issue.
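
If you want to watch the collision happen on an affected node, the fail counts
are just transient node attributes, so you can inspect them directly (a sketch
with the stock CLI tools; resource names taken from this thread, attribute
naming as in the 1.1.15 era):

# set the fail count on one clone ...
crm_failcount -r bgpforwarder -v INFINITY

# ... then look at the raw attributes; on an affected version the value set
# for bgpforwarder is also treated as belonging to "forwarder" (the reported
# symptom), because one resource name is a suffix of the other
attrd_updater -Q -n fail-count-bgpforwarder
attrd_updater -Q -n fail-count-forwarder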

> 
> On Wed, Oct 18, 2017 at 4:42 PM, Ken Gaillot 
> wrote:
> > On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote:
> > > So I think I found the problem. The two resources are named
> > forwarder
> > > and bgpforwarder. It doesn't matter if bgpforwarder exists. It is
> > > just that when I set the failcount to INFINITY to a resource
> > named
> > > bgpforwarder (crm_failcount -r bgpforwarder -v INFINITY) it
> > directly
> > > affects the forwarder resource. 
> > >
> > > If I change the name to forwarderbgp, the problem disappears. So
> > it
> > > seems that the problem is that Pacemaker mixes the bgpforwarder
> > and
> > > forwarder names. Is it a bug?
> > >
> > > Gerard
> > 
> > That's really surprising. What version of pacemaker are you using?
> > There were a lot of changes in fail count handling in the last few
> > releases.
> > 
> > >
> > > On Tue, Oct 17, 2017 at 6:27 PM, Gerard Garcia 
> > > wrote:
> > > > That makes sense. I've tried copying the anything resource and
> > > > changed its name and id (which I guess should be enough to make
> > > > pacemaker think they are different) but I still have the same
> > > > problem.
> > > >
> > > > After more debugging I have reduced the problem to this:
> > > > * First cloned resource running fine
> > > > * Second cloned resource running fine
> > > > * Manually set failcount to INFINITY to second cloned resource
> > > > * Pacemaker triggers a stop operation (without the monitor operation
> > > > failing) for the two resources on the node where the failcount has
> > > > been set to INFINITY.
> > > > * Reset failcount starts the two resources again
> > > >
> > > > Weirdly enough the second resource doesn't stop if I set the first
> > > > resource's failcount to INFINITY (not even the first
> > > > resource stops...). 
> > > >
> > > > But:
> > > > * If I set the first resource as globally-unique=true it does
> > not
> > > > stop so somehow this breaks the relation.
> > > > * If I manually set the failcount to 0 in the first resource
> > that
> > > > also breaks the relation so it does not stop either. It seems
> > like
> > > > the failcount value is being inherited from the second resource
> > > > when it does not have any value. 
> > > >
> > > > I must have something wrongly configured but I can't really see
> > > > why there is this relationship...
> > > >
> > > > Gerard
> > > >
> > > > On Tue, Oct 17, 2017 at 3:35 PM, Ken Gaillot  > om>
> > > > wrote:
> > > > > On Tue, 2017-10-17 at 11:47 +0200, Gerard Garcia wrote:
> > > > > > Thanks Ken. Yes, inspecting the logs seems that the
> > failcount
> > > > > of the
> > > > > > correctly running resource reaches the maximum number of
> > > > > allowed
> > > > > > failures and gets banned in all nodes.
> > > > > >
> > > > > > What is weird is that I just see how the failcount for the
> > > > > first
> > > > > > resource gets updated, is like the failcount are being
> > mixed.
> > > > > In
> > > > > > fact, when the two resources get banned the only way I have
> > to
> > > > > make
> > > > > > the first one start is to disable the failing one and clean
> > the
> > > > > > failcount of the two resources (it is not enough to only
> > clean
> > > > > the
> > > > > > failcount of the first resource) does it make sense?
> > > > > >
> > > > > > Gerard
> > > > >
> > > > > My suspicion is that you have two instances of the same
> > service,
> > > > > and
> > > > > the resource agent monitor is only checking the general
> > service,
> > > > > rather
> > > > > than a specific instance of it, so the monitors on both of
> > them
> > > > > return
> > > > > failure if either one is failing.
> > > > >
> > > > > That would make sense why you have to disable the failing
> > > > > resource, so
> > > > > its monitor stops running. I can't think of why you'd have to
> > > > > clean its
> > > > > failcount for the other one to start, though.
> > > > >
> > > > > The "anything" agent very often causes more problems than it
> > > > > solves ...
> > > > >  I'd recommend writing your own OCF agent tailored to your
> > > > > service.
> > > > > It's not much more complicated than an init script.
> > > > >
> > > > > > On Mon, Oct 16, 2017 at 6:57 PM, Ken Gaillot  > at.c
> > > > > om>
> > > > > > wrote:
> > > > > > > On Mon, 2017-10-16 at 18:30 +0200, Gerard Garcia wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I have a cluster with two ocf:heartbeat:anything
> > resources
> > > > > each
> > > > 

Re: [ClusterLabs] monitor failed actions not cleared

2017-10-18 Thread Ken Gaillot
On Mon, 2017-10-02 at 13:29 +, LE COQUIL Pierre-Yves wrote:
> Hi,
>  
> I finally found my mistake:
> I have set up the failure-timeout like the lifetime example in the
> RedHat Documentation with the value PT1M.
> If I set up the failure-timeout with 60, it works like it should.

This is a bug somewhere in pacemaker. I recently got a bug report
related to recurring monitors, so I'm taking a closer look at time
interval handling in general. I'll make sure to figure out where this
one is.

>  
> Just one last question…:
> Couldn’t there be something in the log telling that the value isn’t in the
> right format?

Definitely, it should ... though in this case, it should parse PT1M
correctly to begin with.
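
Until that's fixed, for anyone else hitting this, the workaround in plain
seconds looks roughly like this (a pcs sketch using the names from this
thread; the crm shell equivalent is analogous):

# plain seconds instead of an ISO 8601 duration until the parsing is fixed
pcs resource update ACTIVATION_KX meta migration-threshold=1 failure-timeout=60
pcs property set cluster-recheck-interval=120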

> Pierre-Yves
>  
>  
> De : LE COQUIL Pierre-Yves 
> Envoyé : mercredi 27 septembre 2017 19:37
> À : 'users@clusterlabs.org' 
> Objet : RE: monitor failed actions not cleared
>  
>  
>  
> De : LE COQUIL Pierre-Yves 
> Envoyé : lundi 25 septembre 2017 16:58
> À : 'users@clusterlabs.org' 
> Objet : monitor failed actions not cleared
>  
> Hi,
>  
> I’m using Pacemaker 1.1.15-11.el7_3.4 / Corosync 2.4.0-4.el7 under
> CentOS 7.3.1611
>  
> => Is this configuration too old? (yum indicates these versions are
> up to date)

No, those are recent versions. CentOS 7.4 has slightly newer versions,
but there's nothing wrong with staying on those for now.

> => Should I install more recent versions of Pacemaker and Corosync?
>  
> My subject is very close to the post “clearing failed actions”
> initiated by Attila Megyeri in May 2017.
> But the issue doesn’t fit my case.
>  
> What I want to do is:
> -  2 systemd resources running on 1 of the 2 nodes of my
> cluster,
> -  When  1 resource fails (by killing it or by moving the
> resource), I want it to be restarted on the other node, but I want
> the other resource still running on the same node.
>  
> => Is this possible with Pacemaker?
>  
> What I have done in addition to the default parameters:
> -  For my resources:
> o   migration-threshold=1,
> o   failure-timeout=PT1M
> -  For the cluster
> o   Cluster-recheck-interval=120
>  
> I have added for my resource operation monitor: on-fail=restart
> (which is the default)
>  
> I do not use Fencing (Stonith Enabled = false)
> => Is Fencing compatible with my goal?

Yes, fencing should be considered a requirement for a stable cluster.

Fencing handles node-level failures rather than resource-level
failures. If a node becomes unresponsive, the rest of the cluster can't
know whether it is inoperative (and thus unable to pose any conflict)
or just misbehaving (perhaps the CPU is overloaded, or a network card
went out, or ...), in which case it's not safe to recover resources
elsewhere. Fencing makes certain it's safe.

> What happens:
> -  When I kill or move 1 resource, it is restarted on the
> other node => OK
> -  The failcount is incremented to 1 for this resource => OK
> -  The failcount is never cleared => NOK
>  
> => I get a warning in the log:
> “pengine:  warning: unpack_rsc_op_failure:    Processing failed
> op monitor for ACTIVATION_KX on metro.cas-n1: not running (7)”
> when my resource  ACTIVATION_KX has been killed on node  metro.cas-n1
> but pcs status shows ACTIVATION_KX is started on the other node

It's a longstanding to-do to improve this message ... it doesn't
(necessarily) mean any new failure has occurred. It just means the
policy engine is processing the resource history, which includes a
failure (which could be recent, or old). The log message will show up
every time the policy engine runs, and continue to be displayed in the
status failure history, until you clean the resource.
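
Clearing that history by hand looks roughly like this (a sketch; resource and
node names taken from the post above):

# wipe the stored failure so the policy engine stops reporting it
pcs resource cleanup ACTIVATION_KX
# or, with the lower-level tool, limited to one node:
crm_resource --cleanup --resource ACTIVATION_KX --node metro.cas-n1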

> => Is it a bad monitor operation configuration for my resource? (I
> have added “requires= nothing”)

Your configuration is fine, although "requires" has no effect in a
monitor operation. It's only relevant for start and promote operations,
and even then, it's deprecated to set it in the operation configuration
... it belongs in the resource configuration now. "requires=nothing" is
highly unlikely to be what you want, though; the default is usually
sufficient.

> I know that my english and my pacemaker knowledge are not so high but
> could you please give me some explanations about that behavior that I
> misunderstand.

Not at all, this was a very clear and well-thought-out post :)

> => If something is wrong with my post, just tell me (this is my
> first)
>  
> Thank you
>  
> Thanks
>  
> Pierre-Yves Le Coquil
-- 
Ken Gaillot 



Re: [ClusterLabs] VirtualDomain live migration error

2017-10-18 Thread Ken Gaillot
On Sat, 2017-09-02 at 01:21 +0200, Oscar Segarra wrote:
> Hi, 
> 
> I have updated the known_hosts:
> 
> Now, I get the following error:
> 
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: +
>  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resou
> rce[@id='vm-vdicdb01']/lrm_rsc_op[@id='vm-vdicdb01_last_0']:
>  @operation_key=vm-vdicdb01_migrate_to_0, @operation=migrate_to,
> @crm-debug-origin=cib_action_update, @transition-key=6:27:0:a7fef266-
> 46c3-429e-ab00-c1a0aab24da5, @transition-magic=-
> 1:193;6:27:0:a7fef266-46c3-429e-ab00-c1a0aab24da5, @call-id=-1, @rc-
> code=193, @op-status=-1, @last-run=1504307021, @last-rc-c
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_process_request:    Completed cib_modify operation for section
> status: OK (rc=0, origin=vdicnode01/crmd/77, version=0.169.1)
> VirtualDomain(vm-vdicdb01)[13085]:      2017/09/02_01:03:41 INFO:
> vdicdb01: Starting live migration to vdicnode02 (using: virsh --
> connect=qemu:///system --quiet migrate --live  vdicdb01
> qemu+ssh://vdicnode02/system ).
> VirtualDomain(vm-vdicdb01)[13085]:      2017/09/02_01:03:41 ERROR:
> vdicdb01: live migration to vdicnode02 failed: 1
>  ]p 02 01:03:41 [1537] vdicnode01       lrmd:   notice:
> operation_finished:     vm-vdicdb01_migrate_to_0:13085:stderr [
> error: Cannot recv data: Permission denied, please try again.
>  ]p 02 01:03:41 [1537] vdicnode01       lrmd:   notice:
> operation_finished:     vm-vdicdb01_migrate_to_0:13085:stderr [
> Permission denied, please try again.
> Sep 02 01:03:41 [1537] vdicnode01       lrmd:   notice:
> operation_finished:     vm-vdicdb01_migrate_to_0:13085:stderr [
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).: 
> Connection reset by peer ]
> Sep 02 01:03:41 [1537] vdicnode01       lrmd:   notice:
> operation_finished:     vm-vdicdb01_migrate_to_0:13085:stderr [ ocf-
> exit-reason:vdicdb01: live migration to vdicnode02 failed: 1 ]
> Sep 02 01:03:41 [1537] vdicnode01       lrmd:     info: log_finished:
>   finished - rsc:vm-vdicdb01 action:migrate_to call_id:16 pid:13085
> exit-code:1 exec-time:119ms queue-time:0ms
> Sep 02 01:03:41 [1540] vdicnode01       crmd:   notice:
> process_lrm_event:      Result of migrate_to operation for vm-
> vdicdb01 on vdicnode01: 1 (unknown error) | call=16 key=vm-
> vdicdb01_migrate_to_0 confirmed=true cib-update=78
> Sep 02 01:03:41 [1540] vdicnode01       crmd:   notice:
> process_lrm_event:      vdicnode01-vm-vdicdb01_migrate_to_0:16 [
> error: Cannot recv data: Permission denied, please try
> again.\r\nPermission denied, please try again.\r\nPermission denied
> (publickey,gssapi-keyex,gssapi-with-mic,password).: Connection reset
> by peer\nocf-exit-reason:vdicdb01: live migration to vdicnode02
> failed: 1\n ]
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_process_request:    Forwarding cib_modify operation for section
> status to all (origin=local/crmd/78)
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: Diff: --- 0.169.1 2
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: Diff: +++ 0.169.2 (null)
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: +  /cib:  @num_updates=2
> Sep 02 01:03:41 [1535] vdicnode01        cib:     info:
> cib_perform_op: +
>  /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resou
> rce[@id='vm-vdicdb01']/lrm_rsc_op[@id='vm-vdicdb01_last_0']:  @crm-
> debug-origin=do_update_resource, @transition-
> magic=0:1;6:27:0:a7fef266-46c3-429e-ab00-c1a0aab24da5, @call-id=16,
> @rc-code=1, @op-status=0, @exec-time=119, @exit-reason=vdicdb01: live
> migration to vdicnode02 failed: 1
> Sep 02 01:03:4
> 
> as root <-- system prompts the password
> [root@vdicnode01 .ssh]# virsh --connect=qemu:///system --quiet
> migrate --live  vdicdb01 qemu+ssh://vdicnode02/system
> root@vdicnode02's password:
> 
> as oneadmin (the user that executes the qemu-kvm) <-- does not prompt
> the password
> virsh --connect=qemu:///system --quiet migrate --live  vdicdb01
> qemu+ssh://vdicnode02/system
> 
> Must I configure passwordless connection with root in order to make
> live migration work?
> 
> Or maybe is there any way to instruct pacemaker to use my oneadmin
> user for migrations inestad of root?

Pacemaker calls the VirtualDomain resource agent as root, but it's up
to the agent what to do from there. I don't see any user options in
VirtualDomain or virsh, so I don't think there is currently.

I see two options: configure passwordless ssh for root, or copy the
VirtualDomain resource and modify it to use "sudo -u oneadmin" when it
calls virsh.
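
A rough sketch of the second option (the "local" provider directory and the
oneadmin user are assumptions from this thread, not something VirtualDomain
supports out of the box):

# copy the stock agent into a local provider directory
mkdir -p /usr/lib/ocf/resource.d/local
cp /usr/lib/ocf/resource.d/heartbeat/VirtualDomain /usr/lib/ocf/resource.d/local/

# inside the copy, wrap virsh so every call runs as the qemu user, e.g. near
# the top of the script:
#   virsh() { sudo -u oneadmin /usr/bin/virsh "$@"; }

# then reference the copy as ocf:local:VirtualDomain in the resource definition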

We've discussed adding the capability to tell pacemaker to execute a
resource agent as a particular user. We've already put the plumbing in
for it, so that lrmd can execute alert agents as the hacluster user.
All that would be needed would be a new resource meta-attribute and the
IPC API to use it. It's low 

Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd
- On Oct 16, 2017, at 10:57 PM, kgaillot kgail...@redhat.com wrote:


>> from the Changelog:
>> 
>> Changes since Pacemaker-1.1.15
>>   ...
>>   + pengine: do not fence a node in maintenance mode if it shuts down
>> cleanly
>>   ...
>> 
>> just saying ... may or may not be what you are seeing.
>> 
>> Short term "workaround" may be to do things differently.
>> Maybe just set the cluster wide maintenance mode, not per node?
> 
> Sounds right.
> 
> Another thing to keep in mind is that even if pacemaker doesn't fence
> the node, if you use DLM, DLM might fence the node (it doesn't know
> about or respect any pacemaker maintenance/unmanaged settings).
> 
> I'd stop pacemaker before stopping corosync, in any case. In
> maintenance mode, that should be fine. I don't think a running
> pacemaker would be able to reconnect to corosync after corosync comes
> back.
> 

As Ulrich already mentioned, the SUSE openais init script is responsible for
both pacemaker and corosync.

I have DLM in combination with cLVM, maybe that's the culprit. I will test
stopping the DLM and cLVM resources before going into maintenance and stopping
corosync; maybe then the node won't be fenced.
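
The sequence I intend to test looks roughly like this (crm shell as on SLES 11;
the resource names are placeholders for my DLM/cLVM clones):

crm resource stop cl-dlm cl-clvm     # stop the DLM/cLVM clones first
crm node maintenance <nodename>      # then put the node in maintenance
/etc/init.d/openais stop             # stops pacemaker and corosync together on SLES 11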
I'm thinking of no longer using DLM in conjunction with cLVM and a SAN. I read
an article (http://www.admin-magazine.com/Articles/Live-Migration , see the
chapter "The Weakest Link") saying that DLM is tricky and not completely
stable. It mentioned that Bastian Blank, who seems to be a Debian maintainer,
deactivated cLVM in the Debian kernel. But the article is from 2013, so I'm
not sure it still applies.
Maybe DRBD and no SAN, and therefore no DLM, would be the better solution.


Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671




Re: [ClusterLabs] set node in maintenance - stop corosync - node is fenced - is that correct ?

2017-10-18 Thread Lentes, Bernd


- On Oct 16, 2017, at 9:27 PM, Digimer li...@alteeve.ca wrote:


> 
> I understood what you meant about it getting fenced after stopping
> corosync. What I am not clear on is if you are stopping corosync on the
> normal node, or the node that is in maintenance mode.
> 
> In either case, as I understand it, maintenance mode doesn't stop
> pacemaker, so it can still react to the sudden loss of membership.
> 
> I wonder; Why are you stopping corosync? If you want to stop the node,
> why not stop pacemaker entirely first?
> 

I did an "/etc/init.d/openais stop" on the node I put in maintenance via "crm 
node maintenance "

I think on my SLES 11 SP4 the init script from openais is responsible for both: 
cluster (pacemaker) and communication (openais/corosync).
I didn't find a dedicated init script for pacemaker.


Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671




Re: [ClusterLabs] Regression in Filesystem RA

2017-10-18 Thread Christian Balzer
On Mon, 16 Oct 2017 20:52:04 +0200 Lars Ellenberg wrote:

> On Mon, Oct 16, 2017 at 08:09:21PM +0200, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Thu, Oct 12, 2017 at 03:30:30PM +0900, Christian Balzer wrote:  
> > > 
> > > Hello,
> > > 
> > > 2nd post in 10 years, let's see if this one gets an answer unlike the first
> > > one...  
> 
> Do you want to make me check for the old one? ;-)
> 
Not really, no.

> > > One of the main use cases for pacemaker here are DRBD replicated
> > > active/active mailbox servers (dovecot/exim) on Debian machines. 
> > > We've been doing this for a loong time, as evidenced by the oldest pair
> > > still running Wheezy with heartbeat and pacemaker 1.1.7.
> > > 
> > > The majority of cluster pairs is on Jessie with corosync and backported
> > > pacemaker 1.1.16.
> > > 
> > > Yesterday we had a hiccup, resulting in half the machines losing
> > > their upstream router for 50 seconds which in turn caused the pingd RA to
> > > trigger a fail-over of the DRBD RA and associated resource group
> > > (filesystem/IP) to the other node. 
> > > 
> > > The old cluster performed flawlessly, the newer clusters all wound up with
> > > DRBD and FS resource being BLOCKED as the processes holding open the
> > > filesystem didn't get killed fast enough.
> > > 
> > > Comparing the 2 RAs (no versioning T_T) reveals a large change in the
> > > "signal_processes" routine.
> > > 
> > > So with the old Filesystem RA using fuser we get something like this and
> > > thousands of processes killed per second:
> > > ---
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > (res_Filesystem_mb07:stop:stdout)   3478  3593   ...
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > (res_Filesystem_mb07:stop:stderr) 
> > > cmccmccmccmcmcmcmcmccmccmcmcmcmcmcmcmcmcmcmcmcmccmcm
> > > Oct 11 15:06:35 mbx07 lrmd: [4731]: info: RA output: 
> > > (res_Filesystem_mb07:stop:stdout)   4032  4058   ...
> > > ---
> > > 
> > > Whereas the new RA (newer isn't better) that goes around killing processes
> > > individually with beautiful logging was a total fail at about 4 processes
> > > per second killed...
> > > ---
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > sending signal TERM to: mail  4226  4909  0 09:43 ?  S  
> > > 0:00 dovecot/imap 
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > sending signal TERM to: mail  4229  4909  0 09:43 ?  S  
> > > 0:00 dovecot/imap [idling]
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > sending signal TERM to: mail  4238  4909  0 09:43 ?  S  
> > > 0:00 dovecot/imap 
> > > Oct 11 15:06:46 mbx10 Filesystem(res_Filesystem_mb10)[288712]: INFO: 
> > > sending signal TERM to: mail  4239  4909  0 09:43 ?  S  
> > > 0:00 dovecot/imap 
> > > ---
> > > 
> > > So my questions are:
> > > 
> > > 1. Am I the only one with more than a handful of processes per FS who
> > > can't afford to wait hours for the new routine to finish?  
> > 
> > The change was introduced about five years ago.  
> 
> Also, usually there should be no process anymore,
> because whatever is using the Filesystem should have its own RA,
> which should have appropriate constraints,
> which means it should have been called and "stop"ped first,
> before the Filesystem stop and umount, and only the "accidental,
> stray, abandoned, idle since three weeks, operator shell session,
> that happened to cd into that file system" is supposed to be around
> *unexpectedly* and in need of killing, and not "thousands of service
> processes, expectedly".
> 
> So arguably your setup is broken,
> relying on a fall-back workaround
> which used to "perform" better.
>
I was expecting a snide remark like that. 

And while you can argue that, take a look at what I wrote: this is an
active-active cluster.
Making dovecot part of the HA setup would result in ALL processes being
killed on a node with a failed-over resource, making things far worse in
an already strained scenario. 
So no, doing it "right" is only an option if my budget is doubled. 
 
> The bug is not that this fall-back workaround now
> has pretty printing and is much slower (and eventually times out),
> the bug is that you don't properly kill the service first.
> [and that you don't have fencing].
> 
> > > 2. Can we have the old FUSER (kill) mode back?  
> > 
> > Yes. I'll make a pull request.  
> 
> Still, that's a sane thing to do,
> thanks, dejanm.
> 
> Maybe we can even come up with a way
> to both "pretty print" and kill fast?
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Rakuten Communications


Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse

Jonathan,



On 18/10/17 14:38, Jan Friesse wrote:

Can you please try to remove
"votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c
in the votequorum_exec_init_fn function (around line 2306) and let me
know if problem persists?


Wow! With that change, I'm pleased to say that I'm not able to reproduce
the problem at all!


Sounds good.



Is this a legitimate fix, or do we still need the call to
votequorum_exec_send_nodeinfo for other reasons?


That is good question. Calling of votequorum_exec_send_nodeinfo should 
not be needed because it's called by sync_process only slightly later.


But to mark this as a legitimate fix, I would like to find out why this 
is happening and whether it is legal or not. Because I'm not able 
to reproduce the bug at all (and I really tried, also with various 
usleeps/packet loss/...), I would like to have more information about 
notworking_cluster1.log. Because tracing doesn't work, we need to try the 
blackbox. Could you please add


icmap_set_string("runtime.blackbox.dump_flight_data", "yes");

line before api->shutdown_request(); in cmap.c ?

It should trigger dumping the blackbox in /var/lib/corosync. When you 
reproduce the nonworking_cluster1 case, could you please either:

- compress the file pointed to by the /var/lib/corosync/fdata symlink
- or execute corosync-blackbox
- or execute qb-blackbox "/var/lib/corosync/fdata"

and send it?
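
(For completeness, collecting it would look something like this; default paths
assumed:)

corosync-blackbox > blackbox.txt                                     # decode in place
# or decode / package the raw flight data file behind the symlink:
qb-blackbox "$(readlink -f /var/lib/corosync/fdata)" > blackbox.txt
gzip -c "$(readlink -f /var/lib/corosync/fdata)" > fdata.gz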

Thank you for your help,
  Honza



Thanks,
Jonathan





Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
I'm using version 1.1.15-11.el7_3.2-e174ec8. As far as I know that's the latest
stable version in CentOS 7.3.

Gerard

On Wed, Oct 18, 2017 at 4:42 PM, Ken Gaillot  wrote:

> On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote:
> > So I think I found the problem. The two resources are named forwarder
> > and bgpforwarder. It doesn't matter if bgpforwarder exists. It is
> > just that when I set the failcount to INFINITY to a resource named
> > bgpforwarder (crm_failcount -r bgpforwarder -v INFINITY) it directly
> > affects the forwarder resource.
> >
> > If I change the name to forwarderbgp, the problem disappears. So it
> > seems that the problem is that Pacemaker mixes the bgpforwarder and
> > forwarder names. Is it a bug?
> >
> > Gerard
>
> That's really surprising. What version of pacemaker are you using?
> There were a lot of changes in fail count handling in the last few
> releases.
>
> >
> > On Tue, Oct 17, 2017 at 6:27 PM, Gerard Garcia 
> > wrote:
> > > That makes sense. I've tried copying the anything resource and
> > > changed its name and id (which I guess should be enough to make
> > > pacemaker think they are different) but I still have the same
> > > problem.
> > >
> > > After more debugging I have reduced the problem to this:
> > > * First cloned resource running fine
> > > * Second cloned resource running fine
> > > * Manually set failcount to INFINITY to second cloned resource
> > > * Pacemaker triggers a stop operation (without the monitor operation
> > > failing) for the two resources in the node where the failcount has
> > > been set to INFINITY.
> > > * Reset failcount starts the two resources again
> > >
> > > Weirdly enough the second resource doesn't stop if I set the first
> > > resource's failcount to INFINITY (not even the first
> > > resource stops...).
> > >
> > > But:
> > > * If I set the first resource as globally-unique=true it does not
> > > stop so somehow this breaks the relation.
> > > * If I manually set the failcount to 0 in the first resource that
> > > also breaks the relation so it does not stop either. It seems like
> > > the failcount value is being inherited from the second resource
> > > when it does not have any value.
> > >
> > > I must have something wrongly configured but I can't really see
> > > why there is this relationship...
> > >
> > > Gerard
> > >
> > > On Tue, Oct 17, 2017 at 3:35 PM, Ken Gaillot 
> > > wrote:
> > > > On Tue, 2017-10-17 at 11:47 +0200, Gerard Garcia wrote:
> > > > > Thanks Ken. Yes, inspecting the logs seems that the failcount
> > > > of the
> > > > > correctly running resource reaches the maximum number of
> > > > allowed
> > > > > failures and gets banned in all nodes.
> > > > >
> > > > > What is weird is that I just see how the failcount for the
> > > > first
> > > > > resource gets updated, is like the failcount are being mixed.
> > > > In
> > > > > fact, when the two resources get banned the only way I have to
> > > > make
> > > > > the first one start is to disable the failing one and clean the
> > > > > failcount of the two resources (it is not enough to only clean
> > > > the
> > > > > failcount of the first resource) does it make sense?
> > > > >
> > > > > Gerard
> > > >
> > > > My suspicion is that you have two instances of the same service,
> > > > and
> > > > the resource agent monitor is only checking the general service,
> > > > rather
> > > > than a specific instance of it, so the monitors on both of them
> > > > return
> > > > failure if either one is failing.
> > > >
> > > > That would make sense why you have to disable the failing
> > > > resource, so
> > > > its monitor stops running. I can't think of why you'd have to
> > > > clean its
> > > > failcount for the other one to start, though.
> > > >
> > > > The "anything" agent very often causes more problems than it
> > > > solves ...
> > > >  I'd recommend writing your own OCF agent tailored to your
> > > > service.
> > > > It's not much more complicated than an init script.
> > > >
> > > > > On Mon, Oct 16, 2017 at 6:57 PM, Ken Gaillot  > > > om>
> > > > > wrote:
> > > > > > On Mon, 2017-10-16 at 18:30 +0200, Gerard Garcia wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have a cluster with two ocf:heartbeat:anything resources
> > > > each
> > > > > > one
> > > > > > > running as a clone in all nodes of the cluster. For some
> > > > reason
> > > > > > when
> > > > > > > one of them fails to start the other one stops. There is
> > > > not any
> > > > > > > constrain configured or any kind of relation between them.
> > > > > > >
> > > > > > > Is it possible that there is some kind of implicit relation
> > > > that
> > > > > > I'm
> > > > > > > not aware of (for example because they are the same type?)
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Gerard
> > > > > >
> > > > > > There is no implicit relation on the Pacemaker side. However
> > > > if the
> > > > > 

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Ken Gaillot
On Wed, 2017-10-18 at 14:25 +0200, Gerard Garcia wrote:
> So I think I found the problem. The two resources are named forwarder
> and bgpforwarder. It doesn't matter if bgpforwarder exists. It is
> just that when I set the failcount to INFINITY to a resource named
> bgpforwarder (crm_failcount -r bgpforwarder -v INFINITY) it directly
> affects the forwarder resource. 
> 
> If I change the name to forwarderbgp, the problem disappears. So it
> seems that the problem is that Pacemaker mixes the bgpforwarder and
> forwarder names. Is it a bug?
> 
> Gerard

That's really surprising. What version of pacemaker are you using?
There were a lot of changes in fail count handling in the last few
releases.

> 
> On Tue, Oct 17, 2017 at 6:27 PM, Gerard Garcia 
> wrote:
> > That makes sense. I've tried copying the anything resource and
> > changed its name and id (which I guess should be enough to make
> > pacemaker think they are different) but I still have the same
> > problem.
> > 
> > After more debugging I have reduced the problem to this:
> > * First cloned resource running fine
> > * Second cloned resource running fine
> > * Manually set failcount to INFINITY to second cloned resource
> > * Pacemaker triggers a stop operation (without the monitor operation
> > failing) for the two resources in the node where the failcount has
> > been set to INFINITY.
> > * Reset failcount starts the two resources again
> > 
> > Weirdly enough the second resource doesn't stop if I set the first
> > resource's failcount to INFINITY (not even the first
> > resource stops...). 
> > 
> > But:
> > * If I set the first resource as globally-unique=true it does not
> > stop so somehow this breaks the relation.
> > * If I manually set the failcount to 0 in the first resource that
> > also breaks the relation so it does not stop either. It seems like
> > the failcount value is being inherited from the second resource
> > when it does not have any value. 
> > 
> > I must have something wrongly configured but I can't really see
> > why there is this relationship...
> > 
> > Gerard
> > 
> > On Tue, Oct 17, 2017 at 3:35 PM, Ken Gaillot 
> > wrote:
> > > On Tue, 2017-10-17 at 11:47 +0200, Gerard Garcia wrote:
> > > > Thanks Ken. Yes, inspecting the logs seems that the failcount
> > > of the
> > > > correctly running resource reaches the maximum number of
> > > allowed
> > > > failures and gets banned in all nodes.
> > > >
> > > > What is weird is that I just see how the failcount for the
> > > first
> > > > resource gets updated, is like the failcount are being mixed.
> > > In
> > > > fact, when the two resources get banned the only way I have to
> > > make
> > > > the first one start is to disable the failing one and clean the
> > > > failcount of the two resources (it is not enough to only clean
> > > the
> > > > failcount of the first resource) does it make sense?
> > > >
> > > > Gerard
> > > 
> > > My suspicion is that you have two instances of the same service,
> > > and
> > > the resource agent monitor is only checking the general service,
> > > rather
> > > than a specific instance of it, so the monitors on both of them
> > > return
> > > failure if either one is failing.
> > > 
> > > That would make sense why you have to disable the failing
> > > resource, so
> > > its monitor stops running. I can't think of why you'd have to
> > > clean its
> > > failcount for the other one to start, though.
> > > 
> > > The "anything" agent very often causes more problems than it
> > > solves ...
> > >  I'd recommend writing your own OCF agent tailored to your
> > > service.
> > > It's not much more complicated than an init script.
> > > 
> > > > On Mon, Oct 16, 2017 at 6:57 PM, Ken Gaillot  > > om>
> > > > wrote:
> > > > > On Mon, 2017-10-16 at 18:30 +0200, Gerard Garcia wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I have a cluster with two ocf:heartbeat:anything resources
> > > each
> > > > > one
> > > > > > running as a clone in all nodes of the cluster. For some
> > > reason
> > > > > when
> > > > > > one of them fails to start the other one stops. There is
> > > not any
> > > > > > constrain configured or any kind of relation between them. 
> > > > > >
> > > > > > Is it possible that there is some kind of implicit relation
> > > that
> > > > > I'm
> > > > > > not aware of (for example because they are the same type?)
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Gerard
> > > > >
> > > > > There is no implicit relation on the Pacemaker side. However
> > > if the
> > > > > agent returns "failed" for both resources when either one
> > > fails,
> > > > > you
> > > > > could see something like that. I'd look at the logs on the DC
> > > and
> > > > > see
> > > > > why it decided to restart the second resource.
> > > > > --
> > > > > Ken Gaillot 
> > > > >
> > > > > ___
> > > > > Users mailing list: 

Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jonathan Davies


On 18/10/17 14:38, Jan Friesse wrote:
Can you please try to remove 
"votequorum_exec_send_nodeinfo(us->node_id);" line from votequorum.c in 
the votequorum_exec_init_fn function (around line 2306) and let me know 
if problem persists?


Wow! With that change, I'm pleased to say that I'm not able to reproduce 
the problem at all!


Is this a legitimate fix, or do we still need the call to 
votequorum_exec_send_nodeinfo for other reasons?


Thanks,
Jonathan



Re: [ClusterLabs] corosync race condition when node leaves immediately after joining

2017-10-18 Thread Jan Friesse

Jonathan,




On 16/10/17 15:58, Jan Friesse wrote:

Jonathan,




On 13/10/17 17:24, Jan Friesse wrote:

I've done a bit of digging and am getting closer to the root cause of
the race.

We rely on having votequorum_sync_init called twice -- once when
node 1
joins (with member_list_entries=2) and once when node 1 leaves (with
member_list_entries=1). This is important because votequorum_sync_init
marks nodes as NODESTATE_DEAD if they are not in quorum_members[]
-- so
it needs to have seen the node appear then disappear. This is
important
because get_total_votes only counts votes from nodes in state
NODESTATE_MEMBER.


So there are basically two problems.

Actually the first (main) problem is that votequorum_sync_init is ever
called when that node joins. It really shouldn't be. And the problem is
simply that calling api->shutdown_request() is not enough. Can you
try replacing it with exit(1) (for testing) and reproduce the problem?
I'm pretty sure the problem disappears.


No, the problem still happens :-(


Not good.



I am using the following patch:

diff --git a/exec/cmap.c b/exec/cmap.c
index de730d2..1125cef 100644
--- a/exec/cmap.c
+++ b/exec/cmap.c
@@ -406,7 +406,7 @@ static void cmap_sync_activate (void)
                 log_printf(LOGSYS_LEVEL_ERROR,
                         "Received config version (%"PRIu64") is different than my config version (%"PRIu64")! Exiting",
                         cmap_highest_config_version_received, cmap_my_config_version);
-                api->shutdown_request();
+                exit(1);
                 return ;
         }
 }
diff --git a/exec/main.c b/exec/main.c
index b0d5639..4fd3e68 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -627,6 +627,7 @@ static void deliver_fn (
                         ((void *)msg);
         }
 
+        log_printf(LOGSYS_LEVEL_NOTICE, "executing '%s' exec_handler_fn %p for node %d (fn %d)", corosync_service[service]->name, corosync_service[service]->exec_engine[fn_id].exec_handler_fn, nodeid, fn_id);
         corosync_service[service]->exec_engine[fn_id].exec_handler_fn
                 (msg, nodeid);
 }
diff --git a/exec/votequorum.c b/exec/votequorum.c
index 1a97c6d..7c0f34f 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -2099,6 +2100,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
                 node->flags = req_exec_quorum_nodeinfo->flags;
                 node->votes = req_exec_quorum_nodeinfo->votes;
                 node->state = NODESTATE_MEMBER;
+                log_printf(LOGSYS_LEVEL_NOTICE, "message_handler_req_exec_votequorum_nodeinfo (%p) marking node %d as MEMBER", message_handler_req_exec_votequorum_nodeinfo, nodeid);
 
                 if (node->flags & NODE_FLAGS_LEAVING) {
                         node->state = NODESTATE_LEAVING;

When it's working correctly I see this:

1508151960.072927 notice  [TOTEM ] A new membership (10.71.218.17:2304)
was formed. Members joined: 1
1508151960.073082 notice  [SYNC  ] calling sync_init on service
'corosync configuration map access' (0) with my_member_list_entries = 2
1508151960.073150 notice  [MAIN  ] executing 'corosync configuration map
access' exec_handler_fn 0x55b5eb504ca0 for node 1 (fn 0)
1508151960.073197 notice  [MAIN  ] executing 'corosync configuration map
access' exec_handler_fn 0x55b5eb504ca0 for node 2 (fn 0)
1508151960.073238 notice  [SYNC  ] calling sync_init on service
'corosync cluster closed process group service v1.01' (1) with
my_member_list_entries = 2
1508151961.073033 notice  [TOTEM ] A processor failed, forming new
configuration.

When it's not working correctly I see this:

1508151908.447584 notice  [TOTEM ] A new membership (10.71.218.17:2292)
was formed. Members joined: 1
1508151908.447757 notice  [MAIN  ] executing 'corosync vote quorum
service v1.0' exec_handler_fn 0x558b39fbbaa0 for node 1 (fn 0)
1508151908.447866 notice  [VOTEQ ]
message_handler_req_exec_votequorum_nodeinfo (0x558b39fbbaa0) marking
node 1 as MEMBER
1508151908.447972 notice  [VOTEQ ] get_total_votes: node 1 is a MEMBER
so counting vote
1508151908.448045 notice  [VOTEQ ] get_total_votes: node 2 is a MEMBER
so counting vote
1508151908.448091 notice  [QUORUM] This node is within the primary
component and will provide service.
1508151908.448134 notice  [QUORUM] Members[1]: 2
1508151908.448175 notice  [SYNC  ] calling sync_init on service
'corosync configuration map access' (0) with my_member_list_entries = 2
1508151908.448205 notice  [MAIN  ] executing 'corosync configuration map
access' exec_handler_fn 0x558b39fb3ca0 for node 1 (fn 0)
1508151908.448247 notice  [MAIN  ] executing 'corosync configuration map
access' exec_handler_fn 0x558b39fb3ca0 for node 2 (fn 0)
1508151908.448307 notice  [SYNC  ] calling sync_init on service
'corosync cluster closed process group service v1.01' (1) with
my_member_list_entries = 2
1508151909.447182 notice  [TOTEM ] A processor failed, forming new
configuration.

... and at that point I already see "Total votes: 2" in the
corosync-quorumtool output.

The key difference seems to be whether 

Re: [ClusterLabs] When resource fails to start it stops an apparently unrelated resource

2017-10-18 Thread Gerard Garcia
So I think I found the problem. The two resources are named forwarder and
bgpforwarder. It doesn't matter if bgpforwarder exists. It is just that
when I set the failcount to INFINITY to a resource named bgpforwarder
(crm_failcount -r bgpforwarder -v INFINITY) it directly affects the
forwarder resource.

If I change the name to forwarderbgp, the problem disappears. So it seems
that the problem is that Pacemaker mixes the bgpforwarder and forwarder
names. Is it a bug?

Gerard

On Tue, Oct 17, 2017 at 6:27 PM, Gerard Garcia  wrote:

> That makes sense. I've tried copying the anything resource and changed its
> name and id (which I guess should be enough to make pacemaker think they
> are different) but I still have the same problem.
>
> After more debugging I have reduced the problem to this:
> * First cloned resource running fine
> * Second cloned resource running fine
> * Manually set failcount to INFINITY to second cloned resource
> * Pacemaker triggers a stop operation (without the monitor operation failing)
> for the two resources in the node where the failcount has been set to
> INFINITY.
> * Reset failcount starts the two resources again
>
> Weirdly enough the second resource doesn't stop if I set the first
> resource's failcount to INFINITY (not even the first resource stops...).
>
> But:
> * If I set the first resource as globally-unique=true it does not stop so
> somehow this breaks the relation.
> * If I manually set the failcount to 0 in the first resource that also
> breaks the relation so it does not stop either. It seems like the failcount
> value is being inherited from the second resource when it does not have any
> value.
>
> I must have something wrongly configured but I can't really see why
> there is this relationship...
>
> Gerard
>
> On Tue, Oct 17, 2017 at 3:35 PM, Ken Gaillot  wrote:
>
>> On Tue, 2017-10-17 at 11:47 +0200, Gerard Garcia wrote:
>> > Thanks Ken. Yes, inspecting the logs seems that the failcount of the
>> > correctly running resource reaches the maximum number of allowed
>> > failures and gets banned in all nodes.
>> >
>> > What is weird is that I just see how the failcount for the first
>> > resource gets updated, is like the failcount are being mixed. In
>> > fact, when the two resources get banned the only way I have to make
>> > the first one start is to disable the failing one and clean the
>> > failcount of the two resources (it is not enough to only clean the
>> > failcount of the first resource) does it make sense?
>> >
>> > Gerard
>>
>> My suspicion is that you have two instances of the same service, and
>> the resource agent monitor is only checking the general service, rather
>> than a specific instance of it, so the monitors on both of them return
>> failure if either one is failing.
>>
>> That would make sense why you have to disable the failing resource, so
>> its monitor stops running. I can't think of why you'd have to clean its
>> failcount for the other one to start, though.
>>
>> The "anything" agent very often causes more problems than it solves ...
>>  I'd recommend writing your own OCF agent tailored to your service.
>> It's not much more complicated than an init script.
>>
>> > On Mon, Oct 16, 2017 at 6:57 PM, Ken Gaillot 
>> > wrote:
>> > > On Mon, 2017-10-16 at 18:30 +0200, Gerard Garcia wrote:
>> > > > Hi,
>> > > >
>> > > > I have a cluster with two ocf:heartbeat:anything resources each
>> > > one
>> > > > running as a clone in all nodes of the cluster. For some reason
>> > > when
>> > > > one of them fails to start the other one stops. There is not any
>> > > > constrain configured or any kind of relation between them.
>> > > >
>> > > > Is it possible that there is some kind of implicit relation that
>> > > I'm
>> > > > not aware of (for example because they are the same type?)
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Gerard
>> > >
>> > > There is no implicit relation on the Pacemaker side. However if the
>> > > agent returns "failed" for both resources when either one fails,
>> > > you
>> > > could see something like that. I'd look at the logs on the DC and
>> > > see
>> > > why it decided to restart the second resource.
>> > > --
>> > > Ken Gaillot 
>> > >
>> > > ___
>> > > Users mailing list: Users@clusterlabs.org
>> > > http://lists.clusterlabs.org/mailman/listinfo/users
>> > >
>> > > Project Home: http://www.clusterlabs.org
>> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratc
>> > > h.pdf
>> > > Bugs: http://bugs.clusterlabs.org
>> > >
>> >
>> > ___
>> > Users mailing list: Users@clusterlabs.org
>> > http://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.
>> > pdf
>> > Bugs: http://bugs.clusterlabs.org
>> --
>> Ken Gaillot 

Re: [ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Vladislav Bogdanov

Hi,

ensure you have two monitor operations configured for your drbd 
resource: for 'Master' and 'Slave' roles ('Slave' == 'Started' == '' for 
ms resources).


http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_monitoring_multi_state_resources.html
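
With pcs that would look roughly like this (a sketch using the resource name
from your status output; the intervals must differ between the two roles):

pcs resource update DrbdData op monitor interval=29s role=Master \
                                monitor interval=31s role=Slave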


18.10.2017 11:18, Антон Сацкий wrote:


Hi list, need your help


[root@voipserver ~]# pcs status
Cluster name: ClusterKrusher
Stack: corosync
Current DC: voipserver.backup (version 1.1.16-12.el7_4.2-94ff4df) -
partition with quorum
Last updated: Tue Oct 17 19:46:05 2017
Last change: Tue Oct 17 19:28:22 2017 by root via cibadmin on
voipserver.primary

2 nodes configured
3 resources configured

Node voipserver.backup: standby
Online: [ voipserver.primary ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started voipserver.primary
 Master/Slave Set: DrbdDataClone [DrbdData]
 Masters: [ voipserver.primary ]
 Stopped: [ voipserver.backup ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



BUT IN FACT
[root@voipserver ~]# drbd-overview
NOTE: drbd-overview will be deprecated soon.
Please consider using drbdtop.

 1:r0/0 Connected Primary/Secondary UpToDate/UpToDate


Is it normal behavior or a BUG?




--
Best regards
Antony







___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org