[ClusterLabs] Antw: [EXT] Re: clusterlabs.org upgrade done

2020-03-03 Thread Ulrich Windl
>>> Valentin Vidic  wrote on 03.03.2020 at 16:52 in message
<20449_1583250783_5e5e7d5f_20449_2155_1_20200303155240.ga24...@valentin-vidic.from.hr>:
> On Sat, Feb 29, 2020 at 03:44:50PM -0600, Ken Gaillot wrote:
>> The clusterlabs.org server OS upgrade is (mostly) done.
>> 
>> Services are back up, with the exception of some cosmetic issues and
>> the source code continuous integration testing for ClusterLabs github
>> projects (ci.kronosnet.org). Those will be dealt with at a more
>> reasonable time :)
> 
> Regarding the upgrade, perhaps the mailman config for the list should
> be updated to work better with SPF and DKIM checks?

How do you define "work better"?

> 
> -- 
> Valentin





[ClusterLabs] Antw: [EXT] Finding attributes of a past resource agent invocation

2020-03-03 Thread Ulrich Windl
>>> wf...@niif.hu wrote on 03.03.2020 at 15:22 in message
<21288_1583245382_5e5e6846_21288_1264_1_87zhcx1psd@lant.ki.iif.hu>:
> Hi,
> 
> I suffered unexpected fencing under Pacemaker 2.0.1.  I set a resource
> to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
> then played with ocf-tester, which left the resource stopped.  Finally I

To me it looks as if the resource wasn't stopped at the time of deletion
(maybe that's Pacemaker's confusion due to the unmanaged mode):
pacemaker-controld[11670]:  notice: Initiating stop operation vm-invtest_stop_0 on inv1
pacemaker-controld[11670]:  notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal

If you intended to delete the resource, why didn't you stop it in normal mode?
Did you expect stop to fail?

Even in older versions, Pacemaker did not like deletion of started resources.
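
For reference, a safer sequence would have been something like this
(just a sketch, untested): re-manage the resource, stop it cleanly,
wait for the cluster to settle, and only then delete it:

  crm_resource -r vm-invtest -m -p is-managed -v true      # manage it again
  crm_resource -r vm-invtest -m -p target-role -v Stopped  # request a clean stop
  crm_resource --wait                                      # wait until the cluster settles
  crm_resource -r vm-invtest --delete -t primitive         # delete the (now stopped) resource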

Regards,
Ulrich

> deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
> which led to:
> 
> pacemaker-controld[11670]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
> pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 because resource parameters have changed
> pacemaker-schedulerd[11669]:  warning: Processing failed monitor of vm-invtest on inv1: not running
> pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest running on inv1
> pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 because it is orphaned
> pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   ( inv1 )   due to node availability
> pacemaker-schedulerd[11669]:  notice: Calculated transition 959, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
> pacemaker-controld[11670]:  notice: Initiating stop operation vm-invtest_stop_0 on inv1
> pacemaker-controld[11670]:  notice: Transition 959 aborted by deletion of lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal
> pacemaker-controld[11670]:  warning: Action 6 (vm-invtest_stop_0) on inv1 failed (target: 0 vs. rc: 6): Error
> pacemaker-controld[11670]:  notice: Transition 959 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
> pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on inv1: not configured
> pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
> pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on inv1: not configured
> pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting anywhere: operation stop failed 'not configured' (6)
> pacemaker-schedulerd[11669]:  warning: Cluster node inv1 will be fenced: vm-invtest failed there
> pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest running on inv1
> pacemaker-schedulerd[11669]:  warning: Scheduling Node inv1 for STONITH
> pacemaker-schedulerd[11669]:  notice: Stop of failed resource vm-invtest is implicit after inv1 is fenced
> pacemaker-schedulerd[11669]:  notice:  * Fence (reboot) inv1 'vm-invtest failed there'
> pacemaker-schedulerd[11669]:  notice:  * Move   fencing-inv3 ( inv1 -> inv2 )
> pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   ( inv1 )   due to node availability
> 
> The OCF resource agent (on inv1) reported that it failed to validate one
> of the attributes passed to it for the stop operation, hence the "not
> configured" error, which caused the fencing.  Is there a way to find out
> what attributes were passed to the OCF agent in that fateful invocation?
> I've got pe-input files, Pacemaker detail logs and a hard time wading
> through them.  I failed to reproduce the issue till now (but I haven't
> rewound the CIB yet).
> -- 
> Thanks,
> Feri





Re: [ClusterLabs] Finding attributes of a past resource agent invocation

2020-03-03 Thread Ondrej

On 3/3/20 11:22 PM, wf...@niif.hu wrote:

Hi,

I suffered unexpected fencing under Pacemaker 2.0.1.  I set a resource
to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
then played with ocf-tester, which left the resource stopped.  Finally I
deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
which led to:

pacemaker-controld[11670]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 
because resource parameters have changed
pacemaker-schedulerd[11669]:  warning: Processing failed monitor of vm-invtest 
on inv1: not running
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest 
running on inv1
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 
because it is orphaned
pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   (  inv1 )  
 due to node availability
pacemaker-schedulerd[11669]:  notice: Calculated transition 959, saving inputs 
in /var/lib/pacemaker/pengine/pe-input-87.bz2
pacemaker-controld[11670]:  notice: Initiating stop operation vm-invtest_stop_0 
on inv1
pacemaker-controld[11670]:  notice: Transition 959 aborted by deletion of 
lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal
pacemaker-controld[11670]:  warning: Action 6 (vm-invtest_stop_0) on inv1 
failed (target: 0 vs. rc: 6): Error
pacemaker-controld[11670]:  notice: Transition 959 (Complete=5, Pending=0, 
Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on 
inv1: not configured
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting 
anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on 
inv1: not configured
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting 
anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Cluster node inv1 will be fenced: 
vm-invtest failed there
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest 
running on inv1
pacemaker-schedulerd[11669]:  warning: Scheduling Node inv1 for STONITH
pacemaker-schedulerd[11669]:  notice: Stop of failed resource vm-invtest is 
implicit after inv1 is fenced
pacemaker-schedulerd[11669]:  notice:  * Fence (reboot) inv1 'vm-invtest failed 
there'
pacemaker-schedulerd[11669]:  notice:  * Move   fencing-inv3 ( inv1 -> 
inv2 )
pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   ( 
inv1 )   due to node availability

The OCF resource agent (on inv1) reported that it failed to validate one
of the attributes passed to it for the stop operation, hence the "not
configured" error, which caused the fencing.  Is there a way to find out
what attributes were passed to the OCF agent in that fateful invocation?
I've got pe-input files, Pacemaker detail logs and a hard time wading
through them.  I failed to reproduce the issue till now (but I haven't
rewound the CIB yet).



Hi Feri,

> Is there a way to find out what attributes were passed to the OCF
> agent in that fateful invocation?


Basically the same as with any other operation while the resource was
configured (with the exception of the action, which was 'stop' in the
case of stopping the resource).
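
For example (just a sketch): with the 'fake' attribute from the
pe-input example below, the stop call on the node would look roughly
like

  OCF_RESOURCE_INSTANCE=vm-invtest OCF_RESKEY_fake=some_value \
      /usr/lib/ocf/resource.d/pacemaker/Dummy stop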


Since the pe-input files contain the attributes of the resource, you
can get the attribute names and their values from there.

==
For example, when I tried to delete my own test resource with the same
name, the following could be found in the pe-input file (id attributes
elided):

...
  <primitive id="vm-invtest" class="ocf" provider="pacemaker" type="Dummy">
    <meta_attributes id="...">
      <nvpair id="..." name="target-role" value="Stopped"/>
    </meta_attributes>
    <instance_attributes id="...">
      <nvpair id="..." name="fake" value="some_value"/>
    </instance_attributes>
    <operations>
      <op id="..." name="migrate_from" timeout="20s"/>
      <op id="..." name="migrate_to" timeout="20s"/>
      <op id="..." name="monitor" timeout="20s"/>
      <op id="..." name="reload" timeout="20s"/>
      <op id="..." name="start" timeout="20s"/>
      <op id="..." name="stop" timeout="20s"/>
    </operations>
  </primitive>
...
From the above you can see that the cluster will be stopping the
resource because of 'name="target-role" value="Stopped"'. You can also
see that this resource has one attribute (nvpair) with a value:
'name="fake" value="some_value"'. Taking inspiration from
/usr/lib/ocf/resource.d/pacemaker/Dummy, I can see that the resource
agent will be called like
"/usr/lib/ocf/resource.d/pacemaker/Dummy stop", with at least the
$OCF_RESKEY_fake variable passed to it. If you can reproduce the same
issue, you can try to dump all variables to a file when validation
fails (take inspiration from the 'dump_env()' function of the Dummy
resource agent).
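
A minimal sketch of that idea (the log path is an arbitrary choice
here, and your agent's validation step will look different):

  # append the full environment of the failing call to a file, so the
  # exact OCF_RESKEY_* values of the fateful invocation are preserved
  dump_env() {
      env | sort | sed "s/^/$(date '+%F %T'): /" >> /var/tmp/vm-invtest-ra.env
  }

  # e.g. inside the agent's validate step:
  #   validate_all || { dump_env; exit $OCF_ERR_CONFIGURED; }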


So if you want to check which attributes were set around the time of
the deletion, have a look at /var/lib/pacemaker/pengine/pe-input-87.bz2
or maybe /var/lib/pacemaker/pengine/pe-input-86.bz2.
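
For example (a sketch; adjust paths as needed), you could grep the
resource's definition straight out of the compressed file, or replay
the transition with crm_simulate:

  # show the primitive's attributes as the scheduler saw them
  bzcat /var/lib/pacemaker/pengine/pe-input-87.bz2 | grep -A 10 'primitive id="vm-invtest"'

  # re-run the scheduler on the saved input to see the planned actions
  crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-87.bz2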


--
Ondrej Famera

Re: [ClusterLabs] DRBD not failing over

2020-03-03 Thread Jaap Winius



Quoting Jaap Winius :

Very interesting. I'm already running DRBD 9, so that base has  
already been covered, but here's some extra information: My test  
system actually consists of a single 4-node DRBD cluster that spans  
two data centers, with each data center having a 2-node Pacemaker  
cluster to fail resources over between the two DRBD nodes in that  
data center. But for the purpose of quorum arbitration, I guess
these extra DRBD nodes don't matter, perhaps because four is not an
odd number?


No, four nodes are enough and I eventually figured it out. There was  
an extra firewall port that may have helped (2224), but I suspect that  
the main problem was that I forgot to enable the SELinux boolean for  
DRBD: daemons_enable_cluster_mode=1. Now everything is working  
perfectly.
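
For anyone hitting the same thing, the two fixes were roughly these (a
sketch for RHEL/CentOS-style hosts, which is an assumption on my part):

  # open the pcsd port
  firewall-cmd --permanent --add-port=2224/tcp && firewall-cmd --reload

  # let the cluster daemons manage DRBD under SELinux
  setsebool -P daemons_enable_cluster_mode 1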


Thanks,

Jaap



Re: [ClusterLabs] clusterlabs.org upgrade done

2020-03-03 Thread Valentin Vidić
On Sat, Feb 29, 2020 at 03:44:50PM -0600, Ken Gaillot wrote:
> The clusterlabs.org server OS upgrade is (mostly) done.
> 
> Services are back up, with the exception of some cosmetic issues and
> the source code continuous integration testing for ClusterLabs github
> projects (ci.kronosnet.org). Those will be dealt with at a more
> reasonable time :)

Regarding the upgrade, perhaps the mailman config for the list should
be updated to work better with SPF and DKIM checks?
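
For example, with Mailman 2.1 something like this (just a sketch; the
right knobs depend on the Mailman version) would munge the From: header
for posters whose domains publish a strict DMARC policy, so that list
copies keep passing SPF/DKIM alignment checks:

  # set dmarc_moderation_action to 1 ("Munge From") for the users list
  echo "dmarc_moderation_action = 1" > /tmp/dmarc.cfg
  config_list -i /tmp/dmarc.cfg users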

-- 
Valentin


[ClusterLabs] Finding attributes of a past resource agent invocation

2020-03-03 Thread wferi
Hi,

I suffered unexpected fencing under Pacemaker 2.0.1.  I set a resource
to unmanaged (crm_resource -r vm-invtest -m -p is-managed -v false),
then played with ocf-tester, which left the resource stopped.  Finally I
deleted the resource (crm_resource -r vm-invtest --delete -t primitive),
which led to:

pacemaker-controld[11670]:  notice: State transition S_IDLE -> S_POLICY_ENGINE 
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 
because resource parameters have changed 
pacemaker-schedulerd[11669]:  warning: Processing failed monitor of vm-invtest 
on inv1: not running 
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest 
running on inv1
pacemaker-schedulerd[11669]:  notice: Clearing failure of vm-invtest on inv1 
because it is orphaned 
pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   (  inv1 )  
 due to node availability
pacemaker-schedulerd[11669]:  notice: Calculated transition 959, saving inputs 
in /var/lib/pacemaker/pengine/pe-input-87.bz2
pacemaker-controld[11670]:  notice: Initiating stop operation vm-invtest_stop_0 
on inv1 
pacemaker-controld[11670]:  notice: Transition 959 aborted by deletion of 
lrm_rsc_op[@id='vm-invtest_last_failure_0']: Resource operation removal 
pacemaker-controld[11670]:  warning: Action 6 (vm-invtest_stop_0) on inv1 
failed (target: 0 vs. rc: 6): Error
pacemaker-controld[11670]:  notice: Transition 959 (Complete=5, Pending=0, 
Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on 
inv1: not configured 
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting 
anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Processing failed stop of vm-invtest on 
inv1: not configured 
pacemaker-schedulerd[11669]:  error: Preventing vm-invtest from re-starting 
anywhere: operation stop failed 'not configured' (6)
pacemaker-schedulerd[11669]:  warning: Cluster node inv1 will be fenced: 
vm-invtest failed there
pacemaker-schedulerd[11669]:  warning: Detected active orphan vm-invtest 
running on inv1
pacemaker-schedulerd[11669]:  warning: Scheduling Node inv1 for STONITH
pacemaker-schedulerd[11669]:  notice: Stop of failed resource vm-invtest is 
implicit after inv1 is fenced
pacemaker-schedulerd[11669]:  notice:  * Fence (reboot) inv1 'vm-invtest failed 
there'
pacemaker-schedulerd[11669]:  notice:  * Move   fencing-inv3 ( inv1 -> 
inv2 )  
pacemaker-schedulerd[11669]:  notice:  * Stop   vm-invtest   ( 
inv1 )   due to node availability

The OCF resource agent (on inv1) reported that it failed to validate one
of the attributes passed to it for the stop operation, hence the "not
configured" error, which caused the fencing.  Is there a way to find out
what attributes were passed to the OCF agent in that fateful invocation?
I've got pe-input files, Pacemaker detail logs and a hard time wading
through them.  I failed to reproduce the issue till now (but I haven't
rewound the CIB yet).
-- 
Thanks,
Feri