Hi, Ulrich. I presume you're talking about the log_finished line, which looks like this in pacemaker.log:
Feb 05 01:48:47.192 fastvm-rhel-8-0-23 pacemaker-execd [15446] (log_finished@execd_commands.c:214) info: dummy start (call 23, PID 18743) exited with status 0 (execution time 11ms, queue time 0ms) Is that correct? I got curious and took a look. It looks like this comes down to the action_complete() function: - https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-2.0.5/daemons/execd/execd_commands.c#L896-L1050 This function calls cmd_finalize() at the end, which calls a chain of functions ending in log_finished(). log_finished() logs the "start ... exited with status" line shown above. However, systemd resources have to be handled a bit differently. We have to schedule a monitor operation to run after the start operation, in order to check whether the start was truly a success. ~~~ if (pcmk__str_eq(rclass, PCMK_RESOURCE_CLASS_SYSTEMD, pcmk__str_casei)) { if ((cmd->exec_rc == PCMK_OCF_OK) && pcmk__strcase_any_of(cmd->action, "start", "stop", NULL)) { /* systemd returns from start and stop actions after the action * begins, not after it completes. We have to jump through a few * hoops so that we don't report 'complete' to the rest of pacemaker * until it's actually done. */ goagain = true; cmd->real_action = cmd->action; cmd->action = strdup("monitor"); ... if (goagain) { ... schedule_lrmd_cmd(rsc, cmd); /* Don't finalize cmd, we're not done with it yet */ return; ~~~ So for the start operation, it never reaches the cmd_finalize() call at the end, until the follow-up monitor runs. The follow-up monitor operation does end up calling cmd_finalize() at the end of action_complete. But the "log_finished" message is logged at debug level for monitor operations. So you won't see it unless debugging is enabled. Does this make sense? Example: ~~~ Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354] (log_execute) info: executing - rsc:nfs-daemon action:start call_id:20 Feb 05 02:06:51.123 fastvm-rhel-8-0-23 pacemaker-execd [19354] (systemd_unit_exec) debug: Performing asynchronous start op on systemd unit nfs-server.service named 'nfs-daemon' Feb 05 02:06:51.124 fastvm-rhel-8-0-23 pacemaker-execd [19354] (systemd_unit_exec_with_unit) debug: Calling StartUnit for nfs-daemon: /org/freedesktop/systemd1/unit/nfs_2dserver_2eservice Feb 05 02:06:51.517 fastvm-rhel-8-0-23 pacemaker-execd [19354] (action_complete) debug: nfs-daemon start may still be in progress: re-scheduling (elapsed=394ms, remaining=99606ms, start_delay=2000ms) Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354] (log_execute) debug: executing - rsc:nfs-daemon action:monitor call_id:20 Feb 05 02:06:53.518 fastvm-rhel-8-0-23 pacemaker-execd [19354] (systemd_unit_exec) debug: Performing asynchronous status op on systemd unit nfs-server.service named 'nfs-daemon' Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354] (action_complete) debug: nfs-daemon systemd start is now complete (elapsed=2397ms, remaining=97603ms): ok (0) Feb 05 02:06:53.521 fastvm-rhel-8-0-23 pacemaker-execd [19354] (log_finished) debug: nfs-daemon monitor (call 20) exited with status 0 (execution time 2397ms, queue time 0ms) ~~~ On Thu, Feb 4, 2021 at 11:25 PM Ulrich Windl < ulrich.wi...@rz.uni-regensburg.de> wrote: > Hi! > > While analyzing cluster problems I noticed this: > Normal resources executed via OCF RAs create two log entries by > pacemaker-execd: One when starting the resource and another when the > resource completed starting. > However for systemd units I only get a start message. Is that intentional? > Does that mean systemd starts are asynchronous in general (i.e.: The > process returns before start in complete)? > (Still I get a completed message from pacemaker-controld) > > Example: > Feb 04 15:41:25 h19 pacemaker-execd[7793]: notice: executing - > rsc:prm_virtlockd action:start call_id:95 > Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing - > rsc:prm_libvirtd action:start call_id:97 > > So one could guess that lirtlockd and libvirtd are staring concurrently, > but the did not because of this sequence: > Feb 04 15:41:27 h19 pacemaker-controld[7796]: notice: Result of start > operation for prm_virtlockd on h19: ok > Feb 04 15:41:27 h19 pacemaker-execd[7793]: notice: executing - > rsc:prm_libvirtd action:start call_id:97 > > Regards, > Ulrich > > > _______________________________________________ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ > > -- Regards, Reid Wahl, RHCA Senior Software Maintenance Engineer, Red Hat CEE - Platform Support Delivery - ClusterHA
_______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/