On 01/19/2016 11:02 AM, Kostiantyn Ponomarenko wrote:
> Just in case, this is the monitor function from the resource agent:
> ra_monitor() {
> #    ocf_log info "$RA: [monitor]"
>     systemctl status ${service}
>     rc=$?
>     if [ "$rc" -eq "0" ]; then
>         return $OCF_SUCCESS
>     fi
>
>     ocf_log warn "$RA: [monitor] : got rc=$rc"
>     return $OCF_NOT_RUNNING
> }
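
As a side note on the quoted function: it reports every non-zero exit from
systemctl as "not running". A minimal sketch of a variant that separates
"not active" from other failures could look like the following. The helper
name map_status_rc is hypothetical, the OCF_* defaults are set inline only so
the snippet stands alone (a real agent would get them from ocf-shellfuncs),
and reading exit code 3 as "unit not active" assumes systemctl's LSB-style
status codes.

```shell
# Illustrative sketch only, not the poster's resource agent.
# Assumption: a real agent sources ocf-shellfuncs, which defines these codes.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical helper: translate a "systemctl status" exit code into an
# OCF return code.
map_status_rc() {
    case "$1" in
        0) echo "$OCF_SUCCESS" ;;       # unit is active
        3) echo "$OCF_NOT_RUNNING" ;;   # unit is not active (assumed LSB code)
        *) echo "$OCF_ERR_GENERIC" ;;   # anything else is reported as an error
    esac
}

ra_monitor() {
    # ${service} is expected to be set by the agent, as in the quoted function.
    systemctl status "${service}" >/dev/null 2>&1
    rc=$?
    return "$(map_status_rc "$rc")"
}
```

Whether an unexpected exit code should count as an error or as "not running"
is a judgment call; the point is only that the two cases can be logged and
handled separately.
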
Out of curiosity, why are you wrapping systemctl with OCF when pacemaker
supports systemd resources natively? The native support works around a
number of quirks in systemd behavior. (In fact a recent commit to the
master branch handles yet another one.)

> Thank you,
> Kostia
>
> On Tue, Jan 19, 2016 at 6:30 PM, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
>
>> The resource that wasn't running, but was reported as running, is
>> "adminServer".
>>
>> Here are a brief chronological description:
>>
>> [Jan 19 23:42:16] The first time Pacemaker triggers its monitor function
>> at line #1107. (those lines are from its Resource Agent)
>> [Jan 19 23:42:16] Then Pacemaker starts the resource - line #1191.
>> [Jan 19 11:42:53] The first failure is reported by monitor operation at
>> line #1543.
>> [Jan 19 11:42:53] The fail-count is set, but I don't see any attempt from
>> Pacemaker to "start" the resource - the start function is not called (from
>> the logs) - line #1553.
>> [Jan 19 12:27:56] Then adminServer's monitor operation keeps returning
>> $OCF_NOT_RUNNING - starts at line #1860.
>> [Jan 19 12:57:53] Then the expired failcount is cleared at line #1969.
>> [Jan 19 12:57:53] Another call of the monitor function happens at line
>> #2038.
>> [Jan 19 12:57:53] I assume that the line #2046 means "not running" (?).
>> [Jan 19 12:57:53] The "stop" function is called - line #2150
>> [Jan 19 12:57:53] The "start" function is called and the resource is
>> successfully started - line #2164
>>
>>
>> The time change occurred while cluster was starting, I see this from
>> "journalctl --since="2016-01-19" --until="2016-01-20"":
>>
>> Jan 19 23:10:39 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c61c 0c clock_step
>> -43193.793349 s
>> Jan 19 11:10:45 A2-2U12-302-LS ntpd[2210]: 0.0.0.0 c614 04 freq_mode
>> Jan 19 11:10:45 A2-2U12-302-LS systemd[1]: Time has been changed
>>
>> I am attaching corosync.log.
>>
>> Thank you,
>> Kostia
>>
>> On Tue, Jan 19, 2016 at 5:17 PM, Bogdan Dobrelya <bdobre...@mirantis.com>
>> wrote:
>>
>>> On 19.01.2016 16:13, Ken Gaillot wrote:
>>>> On 01/19/2016 06:49 AM, Kostiantyn Ponomarenko wrote:
>>>>> One of resources in my cluster is not actually running, but "crm_mon" shows
>>>>> it with the "Started" status.
>>>>> Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but
>>>>> Pacemaker doesn't react on this anyhow - crm_mon show the resource as
>>>>> Started.
>>>>> I couldn't find an explanation to this behavior, so I suppose it is a bug,
>>>>> is it?
>>>>
>>>> That is unexpected. Can you post the configuration and logs from around
>>>> the time of the issue?
>>>>
>>>
>>> Oh, sorry, I forgot to mention the related thread [0]. That is exactly
>>> the case I reported there. Looks same, so I thought you've just updated
>>> my thread :)
>>>
>>> These may be merged perhaps.
>>>
>>> [0] http://clusterlabs.org/pipermail/users/2016-January/002035.html

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org