Hi All,

I have sent a pull request for this patch.
* https://github.com/ClusterLabs/pacemaker-1.0/pull/13

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/4/10, [email protected] <[email protected]> wrote:

> Hi All,
>
> We confirmed that an error is generated when the stop of pingd is delayed.
>
> The cause seems to be that pingd does not receive SIGTERM until stand_alone_ping processing has completed.
>
> ------------------------------------------------------------------------------------------------------------------------
> Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/crmd process group 2427 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting shutdown
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] multiplier=[100] CRM_meta_interval=[10000] CRM_meta_timeout=[60000] cancelled
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] (pid 2570)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=5, status=1, cib-update=0, confirmed=true) Cancelled
> Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 192.168.40.1 is unreachable (read)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on prmPingd:0 for client 2427: pid 2570 exited with return code 0
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing /usr/lib64/heartbeat/ccm process group 2422 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to shut down
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC Channel to 2426 is not connected -------> ERROR
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not send update: default_ping_set=0 for localhost
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE process 2418 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD process 2419 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO process 2417 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 exited. 3 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 exited. 2 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 exited. 1 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 Heartbeat shutdown complete.
> Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining --------> pingd has not stopped yet
> Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
> Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
> Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 5 retries remaining
> Apr 11 00:49:03 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 4 retries remaining
> Apr 11 00:49:05 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 3 retries remaining
> Apr 11 00:49:07 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 2 retries remaining
> Apr 11 00:49:09 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: Connecting to cluster... 1 retries remaining
> ------------------------------------------------------------------------------------------------------------------------
>
> To solve this problem, I added a check that confirms the pingd process has actually terminated.
>
> I attached a patch.
> Please merge this patch into Pacemaker 1.0.
>
> Best Regards,
> Hideo Yamauchi.
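
For readers without the patch at hand, the "confirm that the process has ended" idea from the quoted message can be sketched roughly as follows. This is only an illustration in Python and not the patch itself, which is not reproduced in this thread. The names `process_exists`, `send_sigterm`, and `stop_and_confirm`, and the timeout and polling values, are all hypothetical. The point is that the stop step should report success only after the process is really gone, not immediately after sending SIGTERM.

```python
import os
import signal
import time


def process_exists(pid):
    """Return True if a process with this pid is still present (hypothetical helper)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def send_sigterm(pid):
    """Send SIGTERM, ignoring the case where the process is already gone."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass


def stop_and_confirm(pid, timeout=10.0, poll_interval=0.2,
                     exists=process_exists, send_term=send_sigterm):
    """Send SIGTERM, then wait until the process has exited or the timeout expires.

    Returns True only when the process is confirmed gone. The timeout and
    polling interval are illustrative values, not taken from the patch.
    """
    if not exists(pid):
        return True  # nothing to stop
    send_term(pid)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not exists(pid):
            return True
        time.sleep(poll_interval)
    return not exists(pid)  # one last check after the deadline
```

With a check of this kind, the stop operation in the log above would not have returned rc=0 at 00:48:50 while pingd [2505] was still running. Note that `os.kill(pid, 0)` also succeeds for a child that has exited but has not yet been reaped by its parent, so a caller that started the process itself would need to reap it as well.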

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
