On Fri, 2023-12-08 at 17:44 +0300, Artem wrote:
> Hello experts.
> 
> I use Pacemaker for a Lustre cluster, but for simplicity and
> exploration I'm using a Dummy resource. I didn't like how the
> resource performed failover and failback. When I shut down a VM
> running the remote agent, Pacemaker tries to restart it. According
> to pcs status it marks the resource (not the RA) Online for some
> time while the VM stays down.
> 
> OK, I wanted to improve its behavior and set up a ping monitor. I
> tuned the scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd

The "gt 0" part is redundant, since "defined pingd" matches *any* value.
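
For example, the last two rules above could keep only the defined check
(a sketch, using the same resource and attribute names):

    # "defined pingd" already matches every node where the attribute
    # exists, so the "pingd gt 0 or" part can simply be dropped
    pcs constraint location FAKE3 rule score=125 defined pingd
    pcs constraint location FAKE4 rule score=125 defined pingd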

> 
> 
> Question #1) Why can't I see the accumulated score from pingd in the
> crm_simulate output? I only see the location score and stickiness.
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Whether everything is OK or the VM is down, the score from pingd is
> not added to the total score of the RA.

ping scores aren't added to resource scores; they're just set as node
attribute values. Location constraint rules map those values to
resource scores (in this case, any defined ping score gets mapped to
125).
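
If you want to see the raw value, you can query the attribute directly
(a sketch, using node names from your status output; note that the
scores crm_simulate prints are the mapped resource scores, not the
attribute itself):

    # transient attribute set by ocf:pacemaker:ping on each node
    attrd_updater --query --name pingd --node lustre3

    # shows the resulting allocation scores (e.g. the 125 from the rules)
    crm_simulate --live-check --show-scores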

> 
> 
> Question #2) I shut the lustre3 VM down and leave it like that. pcs
> status:

How did you shut it down? Outside cluster control, or with something
like pcs resource disable?
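
For comparison, taking it down under cluster control would look roughly
like one of these (a sketch, assuming lustre3 is the node in question),
as opposed to powering the VM off from outside the cluster:

    pcs resource disable FAKE3   # stop only the resource
    pcs node standby lustre3     # move all resources off the node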

>   * FAKE3       (ocf::pacemaker:Dummy):  Stopped
>   * FAKE4       (ocf::pacemaker:Dummy):  Started lustre4
>   * Clone Set: ping-clone [ping]:
>     * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]  << lustre3 missing
> OK for now.
> The VM boots up. pcs status:
>   * FAKE3       (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3 lustre4 ]  << what is this?
>   * Clone Set: ping-clone [ping]:
>     * ping      (ocf::pacemaker:ping):   FAILED lustre3 (blocked)  << why not started?
>     * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]
> I checked the server processes manually and found that lustre4 runs
> "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3
> doesn't.
> Everything is set up according to the documentation, but the results
> are strange.
> Then I tried adding meta target-role="started" to the pcs resource
> create ping command, and this time ping started after the node
> rebooted. Can I assume that this was just missing from the official
> setup documentation, and that everything will work fine now?
-- 
Ken Gaillot <kgail...@redhat.com>
