Hello experts. I use Pacemaker for a Lustre cluster, but for simplicity and exploration I am using a Dummy resource. I didn't like how the resource performed failover and failback. When I shut down the VM with the remote agent, Pacemaker tries to restart it. According to pcs status, it marks the resource (not the RA) as Online for some time while the VM stays down.
OK, I wanted to improve its behavior and set up a ping monitor. I tuned the scores like this:

    pcs resource create FAKE3 ocf:pacemaker:Dummy
    pcs resource create FAKE4 ocf:pacemaker:Dummy
    pcs constraint location FAKE3 prefers lustre3=100
    pcs constraint location FAKE3 prefers lustre4=90
    pcs constraint location FAKE4 prefers lustre3=90
    pcs constraint location FAKE4 prefers lustre4=100
    pcs resource defaults update resource-stickiness=110
    pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"
    for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
    pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
    pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
    pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
    pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd

Question #1) Why can't I see the accumulated score from pingd in the crm_simulate output? Only the location score and stickiness show up:

    pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
    pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
    pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
    pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210

Whether everything is OK or the VM is down, the score from pingd is not added to the total score of the RA.

Question #2) I shut the lustre3 VM down and leave it like that. pcs status:

    * FAKE3 (ocf::pacemaker:Dummy): Stopped
    * FAKE4 (ocf::pacemaker:Dummy): Started lustre4
    * Clone Set: ping-clone [ping]:
      * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ] << lustre3 missing

OK for now. Then the VM boots up. pcs status:

    * FAKE3 (ocf::pacemaker:Dummy): FAILED (blocked) [ lustre3 lustre4 ] << what is it?
    * Clone Set: ping-clone [ping]:
      * ping (ocf::pacemaker:ping): FAILED lustre3 (blocked) << why not started?
      * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]

I checked the server processes manually and found that lustre4 runs "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't.

Everything is set up according to the documentation, but the results are strange. Then I tried adding meta target-role="started" to pcs resource create ping, and this time ping started after the node rebooted. Can I assume that this was simply missing from the official setup documentation, and that everything will now work fine?
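
To make Question #1 concrete, here is the arithmetic I expected, as a small sketch. It assumes that matching location scores and the stickiness of the current node are simply summed; the variable names are mine and are not anything Pacemaker reports.

```shell
#!/usr/bin/env bash
# Sketch of the score arithmetic for FAKE3 on lustre3 (its current node).
# Assumption: matching location scores and stickiness are simply summed.

location_score=100    # pcs constraint location FAKE3 prefers lustre3=100
stickiness=110        # resource-stickiness=110, counted on the current node
ping_rule_score=125   # rule score=125 pingd gt 0 or defined pingd

observed=210          # crm_simulate: FAKE3 allocation score on lustre3

# Total if only the plain location constraint and stickiness apply.
without_ping=$((location_score + stickiness))

# Total I expected if the ping rule were also counted.
with_ping=$((location_score + stickiness + ping_rule_score))

echo "without ping rule: ${without_ping}"
echo "with ping rule:    ${with_ping}"
echo "observed:          ${observed}"
```

The observed value matches the sum without the ping rule, which is why I think the 125 from the pingd rule is not being applied at all.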