Re: [Pacemaker] [Patch] An error may occur to be behind with a stop of pingd.

renayama19661014 Wed, 17 Apr 2013 18:59:00 -0700

Hi All,

I have sent a pull request for this patch.


 * https://github.com/ClusterLabs/pacemaker-1.0/pull/13
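
For reference, the idea of the end confirmation (the stop is not reported as finished until the pingd process has really gone away) can be sketched roughly as below. This is only an illustrative sketch in Python, not the code in the pull request: the function names stop_and_confirm and process_alive, the timeout, and the polling interval are all hypothetical.

```python
import os
import signal
import time


def process_alive(pid: int) -> bool:
    """Return True while a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_and_confirm(pid: int, timeout: float = 10.0, poll: float = 0.2) -> bool:
    """Send SIGTERM, then poll until the process has ended or the timeout passes.

    Hypothetical helper: returns True only once the process is confirmed gone.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already stopped, nothing to wait for

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not process_alive(pid):
            return True
        time.sleep(poll)
    # One last check after the deadline.
    return not process_alive(pid)
```

A caller would report the stop as successful only when stop_and_confirm returns True. Note that signal 0 still finds a child process that has exited but has not been waited for, so this check fits a daemon that is not a child of the caller.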

Best Regards,
Hideo Yamauchi.

--- On Wed, 2013/4/10, [email protected] <[email protected]> 
wrote:

> Hi All,
> 
> We confirmed that an error can occur when pingd is slow to stop.
> 
> The cause seems to be that pingd does not process SIGTERM until the 
> stand_alone_ping processing has completed.
> 
> ------------------------------------------------------------------------------------------------------------------------
> Apr 11 00:48:33 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:36 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:39 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:42 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:45 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:48 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
> /usr/lib64/heartbeat/crmd process group 2427 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_signal_dispatch: 
> Invoking handler for signal 15: Terminated
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: crm_shutdown: Requesting 
> shutdown
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: te_rsc_command: 
> Initiating action 9: stop prmPingd:0_stop_0 on rh64-heartbeat1 (local)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: cancel_op: operation 
> monitor[5] on prmPingd:0 for client 2427, its parameters: CRM_meta_clone=[0] 
> host_list=[192.168.40.1] name=[default_ping_set] attempts=[2] 
> CRM_meta_clone_node_max=[1] CRM_meta_clone_max=[1] CRM_meta_notify=[false] 
> CRM_meta_globally_unique=[false] crm_feature_set=[3.0.1] interval=[1] 
> timeout=[2] CRM_meta_on_fail=[restart] CRM_meta_name=[monitor] 
> multiplier=[100] CRM_meta_interval=[10000] CRM_meta_timeout=[60000]  cancelled
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: do_lrm_rsc_op: Performing 
> key=9:4:0:948901c2-4e97-4715-9f6b-1611810f8ef7 op=prmPingd:0_stop_0 )
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: rsc:prmPingd:0 stop[9] 
> (pid 2570)
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
> operation prmPingd:0_monitor_10000 (call=5, status=1, cib-update=0, 
> confirmed=true) Cancelled
> Apr 11 00:48:50 rh64-heartbeat1 pingd: [2505]: info: stand_alone_ping: Node 
> 192.168.40.1 is unreachable (read)
> Apr 11 00:48:50 rh64-heartbeat1 lrmd: [2424]: info: operation stop[9] on 
> prmPingd:0 for client 2427: pid 2570 exited with return code 0
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: process_lrm_event: LRM 
> operation prmPingd:0_stop_0 (call=9, rc=0, cib-update=59, confirmed=true) ok
> Apr 11 00:48:50 rh64-heartbeat1 crmd: [2427]: info: match_graph_event: Action 
> prmPingd:0_stop_0 (9) confirmed on rh64-heartbeat1 (rc=0)
> (snip)
> Apr 11 00:48:50 rh64-heartbeat1 heartbeat: [2413]: info: killing 
> /usr/lib64/heartbeat/ccm process group 2422 with signal 15
> Apr 11 00:48:50 rh64-heartbeat1 ccm: [2422]: info: received SIGTERM, going to 
> shut down
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: ERROR: send_ipc_message: IPC 
> Channel to 2426 is not connected                        -------> ERROR
> Apr 11 00:48:51 rh64-heartbeat1 pingd: [2505]: info: attrd_update: Could not 
> send update: default_ping_set=0 for localhost
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBWRITE 
> process 2418 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBREAD 
> process 2419 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: killing HBFIFO 
> process 2417 with signal 15
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2417 
> exited. 3 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2418 
> exited. 2 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: Core process 2419 
> exited. 1 remaining
> Apr 11 00:48:51 rh64-heartbeat1 heartbeat: [2413]: info: rh64-heartbeat1 
> Heartbeat shutdown complete.
> Apr 11 00:48:53 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 4 retries remaining                --------> pingd 
> has not stopped yet
> Apr 11 00:48:55 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 3 retries remaining
> Apr 11 00:48:57 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 2 retries remaining
> Apr 11 00:48:59 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 1 retries remaining
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: crm_signal_dispatch: 
> Invoking handler for signal 15: Terminated
> Apr 11 00:49:01 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 5 retries remaining
> Apr 11 00:49:03 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 4 retries remaining
> Apr 11 00:49:05 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 3 retries remaining
> Apr 11 00:49:07 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 2 retries remaining
> Apr 11 00:49:09 rh64-heartbeat1 pingd: [2505]: info: attrd_lazy_update: 
> Connecting to cluster... 1 retries remaining
> ------------------------------------------------------------------------------------------------------------------------
> 
> To solve this problem, I added a check that confirms the pingd process has 
> ended.
> 
> I have attached a patch.
> Please apply this patch to Pacemaker 1.0.
> 
> Best Regards,
> Hideo Yamauchi.
> 
> 
> 
> 

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
