[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
Hi @niedbalski, Were you still interested in SRUing this? -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
https://williamgrant.id.au -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
Hello Lucas, I'll reformat the patch accordingly and re-submit. Thanks. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
Sorry for taking too long to get to this bug. I have some comments about the proposed debdiff: 1- The version needs to be updated to 1.1.18-0ubuntu1.4. The .3 version was already released to bionic-updates. 2- The patches need some DEP-3 headers. I see you are just backporting the upstream patches but it would be good to also add some headers after the original commit message, such as Origin, Bug-Ubuntu, Reviewed-By. 3- The patches 0001-Fix-libpe_status-don-t-order-implied-stops- relative-.patch and 0002-Fix-scheduler-remote-state-is-failed-if-node- is-shut.patch are in the debdiff but they are not mentioned in debian/patches/series nor debian/changelog. Should they be removed? Or added to d/p/series and d/changelog? The proposed debdiff as-is built fine for me locally. We need to address the comments above to be able to upload this package. In parallel, we can update the bug description to add the SRU template (impact, test plan, where problems could occur), are you willing to do that @Jorge? Thanks for the work you have done so far! -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
I think this has fallen through the cracks by Rafel being unavailable. @Lucas - is this something you could look into? @Jorge - did this resolve some other way as things just stayed silent? -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
** Changed in: pacemaker (Ubuntu Bionic) Importance: Undecided => Medium ** Changed in: pacemaker (Ubuntu Focal) Importance: Undecided => Medium ** Changed in: pacemaker (Ubuntu Groovy) Importance: Undecided => Medium -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
** Patch added: "lp1890491-bionic.debdiff" https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+attachment/5408794/+files/lp1890491-bionic.debdiff -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
Thanks for taking care of this Jorge. Let me know whenever you have a fix ready for this. Cheers! -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
** Changed in: pacemaker (Ubuntu Bionic) Status: New => In Progress ** Changed in: pacemaker (Ubuntu Bionic) Assignee: (unassigned) => Jorge Niedbalski (niedbalski) -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
Hello, I am testing a couple of patches (both imported from master), through this PPA: https://launchpad.net/~niedbalski/+archive/ubuntu/fix-1890491 c20f8920 - don't order implied stops relative to a remote connection 938e99f2 - remote state is failed if node is shutting down with connection failure I'll report back here if these patches fixes the behavior described in my previous comment. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
I am able to reproduce a similar issue with the following bundle: https://paste.ubuntu.com/p/VJ3m7nMN79/ Resource created with sudo pcs resource create test2 ocf:pacemaker:Dummy op_sleep=10 op monitor interval=30s timeout=30s op start timeout=30s op stop timeout=30s juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-10.cloud.sts" juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-11.cloud.sts" juju ssh nova-cloud-controller/2 "sudo pcs constraint location test2 prefers juju-acda3d-pacemaker-remote-12.cloud.sts" Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ] RemoteOnline: [ juju-acda3d-pacemaker-remote-10.cloud.sts juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ] Full list of resources: Resource Group: grp_nova_vips res_nova_bf9661e_vip (ocf::heartbeat:IPaddr2): Started juju-acda3d-pacemaker-remote-7 Clone Set: cl_nova_haproxy [res_nova_haproxy] Started: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ] juju-acda3d-pacemaker-remote-10.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-12.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-11.cloud.sts (ocf::pacemaker:remote): Started juju-acda3d-pacemaker-remote-7 test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker- remote-10.cloud.sts ## After running the following commands on juju-acda3d-pacemaker- remote-10.cloud.sts 1) sudo systemctl stop pacemaker_remote 2) forcedfully shutdown (openstack server stop ) in less than 10 seconds after the pacemaker_remote gets executed. Remote is shutdown RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ] The resource status remains as stopped across the 3 machines, and doesn't recovers. $ juju run --application nova-cloud-controller "sudo pcs resource show | grep -i test2" - Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n" UnitId: nova-cloud-controller/0 - Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n" UnitId: nova-cloud-controller/1 - Stdout: " test2\t(ocf::pacemaker:Dummy):\tStopped\n" UnitId: nova-cloud-controller/2 However, If I do a clean shutdown (without interrupting the pacemaker_remote fence), that ends up with the resource migrated correctly to another node. 6 nodes configured 9 resources configured Online: [ juju-acda3d-pacemaker-remote-7 juju-acda3d-pacemaker-remote-8 juju-acda3d-pacemaker-remote-9 ] RemoteOnline: [ juju-acda3d-pacemaker-remote-11.cloud.sts juju-acda3d-pacemaker-remote-12.cloud.sts ] RemoteOFFLINE: [ juju-acda3d-pacemaker-remote-10.cloud.sts ] Full list of resources: [...] test2 (ocf::pacemaker:Dummy): Started juju-acda3d-pacemaker-remote-12.cloud.sts I will keep investigating this behavior and determine is this is linked to the bug reported. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1890491] Re: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189
** Also affects: pacemaker (Ubuntu Groovy) Importance: Undecided Status: New ** Also affects: pacemaker (Ubuntu Bionic) Importance: Undecided Status: New ** Also affects: pacemaker (Ubuntu Focal) Importance: Undecided Status: New ** Changed in: pacemaker (Ubuntu Groovy) Status: New => Fix Released ** Changed in: pacemaker (Ubuntu Focal) Status: New => Fix Released -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1890491 Title: A pacemaker node fails monitor (probe) and stop /start operations on a resource because it returns "rc=189 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1890491/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs