[ClusterLabs] Antw: [EXT] Re: clusterlabs.org upgrade done
>>> Valentin Vidic wrote on 03.03.2020 at 16:52 in message
<20449_1583250783_5e5e7d5f_20449_2155_1_20200303155240.ga24...@valentin-vidic.from.hr>:
> On Sat, Feb 29, 2020 at 03:44:50PM -0600, Ken Gaillot wrote:
>> The clusterlabs.org server OS upgrade is (mostly) done.
>>
>> Services are back up, with the exception of some cosmetic issues and
>> the source code continuous integration testing for ClusterLabs github
>> projects (ci.kronosnet.org). Those will be dealt with at a more
>> reasonable time :)
>
> Regarding the upgrade, perhaps the mailman config for the list should
> be updated to work better with SPF and DKIM checks?

How do you define "work better"?

> --
> Valentin
[ClusterLabs] Antw: [EXT] Finding attributes of a past resource agent invocation
>>> wrote on 03.03.2020 at 15:22 in message
<21288_1583245382_5e5e6846_21288_1264_1_87zhcx1psd@lant.ki.iif.hu>:
> Hi,
>
> I suffered unexpected fencing under Pacemaker 2.0.1. I set a resource
> to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
> then played with ocf-tester, which left the resource stopped. Finally I

To me it looks as if the resource wasn't stopped at the time of deletion
(maybe that's pacemaker's confusion due to the management mode):

pacemaker-controld[11670]: notice: Initiating stop operation vm-invtest_stop_0 on inv1
pacemaker-controld[11670]: notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']:

If you intended to delete the resource, why didn't you stop it in normal
mode? Did you expect the stop to fail? Even in older versions pacemaker
did not like deletion of started resources.

Regards,
Ulrich

> deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
> which led to:
>
> [...]
>
> The OCF resource agent (on inv1) reported that it failed to validate one
> of the attributes passed to it for the stop operation, hence the "not
> configured" error, which caused the fencing. Is there a way to find out
> what attributes were passed to the OCF agent in that fateful invocation?
> I've got pe-input files, Pacemaker detail logs and a hard time wading
> through them. I failed to reproduce the issue till now (but I haven't
> rewound the CIB yet).
>
> --
> Thanks,
> Feri
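For illustration, the stop-then-delete sequence being suggested might look
roughly like this, reusing the resource name from this thread; this is a
sketch, not a procedure quoted from anyone in the thread:

    # bring the resource back under management, stop it cleanly, then delete it
    crm_resource -r vm-invtest -m -p is-managed -v true
    crm_resource -r vm-invtest -m -p target-role -v Stopped
    # wait until 'crm_mon -1' reports vm-invtest as Stopped, then:
    crm_resource -r vm-invtest --delete -t primitive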
Re: [ClusterLabs] Finding attributes of a past resource agent invocation
On 3/3/20 11:22 PM, wf...@niif.hu wrote:
> Hi,
>
> I suffered unexpected fencing under Pacemaker 2.0.1. I set a resource
> to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
> then played with ocf-tester, which left the resource stopped. Finally I
> deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
> which led to:
>
> [...]
>
> The OCF resource agent (on inv1) reported that it failed to validate one
> of the attributes passed to it for the stop operation, hence the "not
> configured" error, which caused the fencing. Is there a way to find out
> what attributes were passed to the OCF agent in that fateful invocation?
> I've got pe-input files, Pacemaker detail logs and a hard time wading
> through them. I failed to reproduce the issue till now (but I haven't
> rewound the CIB yet).

Hi Feri,

> Is there a way to find out what attributes were passed to the OCF agent
> in that fateful invocation?

Basically the same as with any other operation while the resource was
configured (the only difference being the action, which was 'stop' when
stopping the resource). Since you have the pe-input files, which contain
the attributes of the resource, you can get the attributes and their
values from there.
==

For example, when I tried to delete my own test resource with the same
name, the following can be found in the pe-input file:

    ...
    <primitive ... type="Dummy">
      <meta_attributes ...>
        <nvpair ... name="target-role" value="Stopped"/>
      </meta_attributes>
      <instance_attributes ...>
        <nvpair ... name="fake" value="some_value"/>
      </instance_attributes>
      <operations>
        <op ... name="migrate_from" timeout="20s"/>
        <op ... name="migrate_to" timeout="20s"/>
        <op ... name="monitor" timeout="20s"/>
        <op ... name="reload" timeout="20s"/>
        <op ... name="start" timeout="20s"/>
        <op ... name="stop" timeout="20s"/>
      </operations>
    </primitive>
    ...

From the above you can see that the cluster will be stopping the resource
because of 'name="target-role" value="Stopped"'. You can also see that the
resource has one instance attribute (nvpair) with a value:
'name="fake" value="some_value"'. Taking inspiration from
/usr/lib/ocf/resource.d/pacemaker/Dummy, I can see that the resource agent
will be called like "/usr/lib/ocf/resource.d/pacemaker/Dummy stop" and that
at minimum the $OCF_RESKEY_fake variable will be passed to it.

If you can reproduce the issue, you can try to dump all variables to a
file when validation fails (take inspiration from the 'dump_env()'
function of the Dummy resource agent).

So if you want to check what attributes were set around the time of
deletion, have a look at /var/lib/pacemaker/pengine/pe-input-87.bz2 or
maybe /var/lib/pacemaker/pengine/pe-input-86.bz2.

--
Ondrej Famera
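A minimal sketch of such a hook for a shell-based agent, modelled loosely
on dump_env() from the Dummy agent; the log path, the 'fake' parameter,
and the function names are illustrative, not part of any shipped agent:

    #!/bin/sh
    # Sketch: record the OCF environment whenever validation fails.
    : "${OCF_ROOT:=/usr/lib/ocf}"
    : "${OCF_FUNCTIONS_DIR:=${OCF_ROOT}/lib/heartbeat}"
    . "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"

    dump_ocf_env() {
        {
            echo "=== $(date) resource=${OCF_RESOURCE_INSTANCE} action=$1"
            env | grep '^OCF_' | sort
        } >>/var/log/ocf-env-dump.log   # illustrative log location
    }

    my_validate() {
        if [ -z "$OCF_RESKEY_fake" ]; then
            dump_ocf_env validate        # capture exactly what the agent saw
            return "$OCF_ERR_CONFIGURED" # rc 6, 'not configured'
        fi
        return "$OCF_SUCCESS"
    }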
Re: [ClusterLabs] DRBD not failing over
Quoting Jaap Winius:

> Very interesting. I'm already running DRBD 9, so that base has already
> been covered, but here's some extra information: My test system actually
> consists of a single 4-node DRBD cluster that spans two data centers,
> with each data center having a 2-node Pacemaker cluster to fail
> resources over between the two DRBD nodes in that data center. But, for
> the purpose of quorum arbitration I guess these extra DRBD nodes don't
> matter, perhaps because four is not an odd number?

No, four nodes are enough, and I eventually figured it out. There was an
extra firewall port that may have helped (2224), but I suspect that the
main problem was that I had forgotten to enable the SELinux boolean for
DRBD: daemons_enable_cluster_mode=1. Now everything is working perfectly.

Thanks,

Jaap
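For anyone hitting the same wall, the two fixes Jaap describes would look
roughly like this on a RHEL/CentOS-style node with firewalld; this is a
reconstruction, not his exact commands:

    # open the pcsd port and allow cluster daemons under SELinux
    firewall-cmd --permanent --add-port=2224/tcp
    firewall-cmd --reload
    setsebool -P daemons_enable_cluster_mode 1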
Re: [ClusterLabs] clusterlabs.org upgrade done
On Sat, Feb 29, 2020 at 03:44:50PM -0600, Ken Gaillot wrote:
> The clusterlabs.org server OS upgrade is (mostly) done.
>
> Services are back up, with the exception of some cosmetic issues and
> the source code continuous integration testing for ClusterLabs github
> projects (ci.kronosnet.org). Those will be dealt with at a more
> reasonable time :)

Regarding the upgrade, perhaps the mailman config for the list should
be updated to work better with SPF and DKIM checks?

--
Valentin
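As a hedged sketch of what "work better" could mean in practice, assuming
the list runs Mailman 2.1.18 or later: dmarc_moderation_action is a real
Mailman 2.1 setting (1 = "Munge From"), but whether it suits this list is
exactly the open question in this thread:

    # rewrite From: only for senders whose domain publishes a strict
    # DMARC policy, so DKIM/SPF checks at recipients stop failing
    cat > /tmp/dmarc-settings.py <<'EOF'
    dmarc_moderation_action = 1
    EOF
    /usr/lib/mailman/bin/config_list -i /tmp/dmarc-settings.py users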
[ClusterLabs] Finding attributes of a past resource agent invocation
Hi,

I suffered unexpected fencing under Pacemaker 2.0.1. I set a resource
to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
then played with ocf-tester, which left the resource stopped. Finally I
deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
which led to:

pacemaker-controld[11670]: notice: State transition S_IDLE -> S_POLICY_ENGINE
pacemaker-schedulerd[11669]: notice: Clearing failure of vm-invtest on inv1 because resource parameters have changed
pacemaker-schedulerd[11669]: warning: Processing failed monitor of vm-invtest on inv1: not running
pacemaker-schedulerd[11669]: warning: Detected active orphan vm-invtest running on inv1
pacemaker-schedulerd[11669]: notice: Clearing failure of vm-invtest on inv1 because it is orphaned
pacemaker-schedulerd[11669]: notice:  * Stop vm-invtest ( inv1 ) due to node availability
pacemaker-schedulerd[11669]: notice: Calculated transition 959, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
pacemaker-controld[11670]: notice: Initiating stop operation vm-invtest_stop_0 on inv1
pacemaker-controld[11670]: notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal
pacemaker-controld[11670]: warning: Action 6 (vm-invtest_stop_0) on inv1 failed (target: 0 vs. rc: 6): Error
pacemaker-controld[11670]: notice: Transition 959 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
pacemaker-schedulerd[11669]: warning: Processing failed stop of vm-invtest on inv1: not configured
pacemaker-schedulerd[11669]: error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]: warning: Processing failed stop of vm-invtest on inv1: not configured
pacemaker-schedulerd[11669]: error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]: warning: Cluster node inv1 will be fenced: vm-invtest failed there
pacemaker-schedulerd[11669]: warning: Detected active orphan vm-invtest running on inv1
pacemaker-schedulerd[11669]: warning: Scheduling Node inv1 for STONITH
pacemaker-schedulerd[11669]: notice: Stop of failed resource vm-invtest is implicit after inv1 is fenced
pacemaker-schedulerd[11669]: notice:  * Fence (reboot) inv1 'vm-invtest failed there'
pacemaker-schedulerd[11669]: notice:  * Move fencing-inv3 ( inv1 -> inv2 )
pacemaker-schedulerd[11669]: notice:  * Stop vm-invtest ( inv1 ) due to node availability

The OCF resource agent (on inv1) reported that it failed to validate one
of the attributes passed to it for the stop operation, hence the "not
configured" error, which caused the fencing. Is there a way to find out
what attributes were passed to the OCF agent in that fateful invocation?
I've got pe-input files, Pacemaker detail logs and a hard time wading
through them. I failed to reproduce the issue till now (but I haven't
rewound the CIB yet).

--
Thanks,
Feri
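One starting point for wading through the pe-input files, sketched under
the assumption that the files are readable on the node (the grep pattern
below is illustrative):

    # replay the transition that scheduled the fatal stop
    crm_simulate -x /var/lib/pacemaker/pengine/pe-input-87.bz2 -S
    # the instance attributes handed to the agent are embedded in the file
    bzcat /var/lib/pacemaker/pengine/pe-input-87.bz2 | grep -A 10 'primitive id="vm-invtest"'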