On Fri, Dec 8, 2023 at 5:44 PM Artem <tyom...@gmail.com> wrote:
>
> Hello experts.
>
> I use pacemaker for a Lustre cluster. But for simplicity and exploration I
> use a Dummy resource. I didn't like how resource performed failover and
> failback. When I shut down VM with remote agent, pacemaker tries to restart
> it. According to pcs status it marks the resource (not RA) Online for some
> time while VM stays down.
>
> OK, I wanted to improve its behavior and set up a ping monitor. I tuned the
> scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op
> monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd
>

These rules contradict each other. You set the score to 125 if pingd is
defined and, at the same time, set it to 0 if pingd is less than 1. To be
"less than 1" the attribute must be defined in the first place, so whenever
the "pingd lt 1" condition matches, the "defined pingd" rule matches as well.
I do not know how the rules are ordered. Either you get random behavior, or
one pair of these rules is effectively ignored.

>
> Question #1) Why I cannot see accumulated score from pingd in crm_simulate
> output? Only location score and stickiness.
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Either when all is OK or when VM is down - score from pingd not added to
> total score of RA
>
>
> Question #2) I shut lustre3 VM down and leave it like that. pcs status:
> * FAKE3 (ocf::pacemaker:Dummy): Stopped
> * FAKE4 (ocf::pacemaker:Dummy): Started lustre4
> * Clone Set: ping-clone [ping]:
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]
> << lustre3 missing
> OK for now
> VM boots up. pcs status:
> * FAKE3 (ocf::pacemaker:Dummy): FAILED (blocked) [ lustre3 lustre4 ]
> << what is it?
> * Clone Set: ping-clone [ping]:
> * ping (ocf::pacemaker:ping): FAILED lustre3 (blocked) << why
> not started?
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]

If this is the full pcs status output, the stonith resource is missing.

> I checked server processes manually and found that lustre4 runs
> "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't
> All is according to documentation but results are strange.
> Then I tried to add meta target-role="started" to pcs resource create ping
> and this time ping started after node rebooted. Can I expect that it was just
> missing from official setup documentation, and now everything will work fine?
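
The overlap between the score=0 and score=125 rules discussed above can be checked with a small plain-shell model of the two expressions. This is only a sketch, not Pacemaker's rule evaluator: the function names (rule_a, rule_b, matching_rules) are made up for illustration, an empty argument stands for "pingd is not defined", and it assumes that a comparison against an undefined attribute evaluates to false.

```shell
#!/bin/sh
# Plain-shell model of the two rule expressions from the quoted constraints.
# Assumption: comparing an undefined attribute is false.
# An empty argument stands for "pingd is not defined".

# Rule A (score=0): pingd lt 1 or not_defined pingd
rule_a() {
    [ -z "$1" ] || [ "$1" -lt 1 ]
}

# Rule B (score=125): pingd gt 0 or defined pingd
rule_b() {
    { [ -n "$1" ] && [ "$1" -gt 0 ]; } || [ -n "$1" ]
}

# Print which of the two rules match for a given pingd value.
matching_rules() {
    out=""
    if rule_a "$1"; then out="score=0"; fi
    if rule_b "$1"; then out="${out:+$out }score=125"; fi
    printf '%s\n' "${out:-none}"
}

# Walk through the three interesting cases: undefined, 0 and 1.
for v in "" 0 1; do
    printf 'pingd=%s -> %s\n' "${v:-<undefined>}" "$(matching_rules "$v")"
done
```

Under these assumptions the loop prints "score=0" for an undefined pingd, "score=0 score=125" for pingd=0, and "score=125" for pingd=1. The middle case is the one where both rules match at once.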
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/