Hi,

attrd sometimes fails to stop when we shut down the cluster.
Step 1. Start a cluster with 2 nodes.
Step 2. Stop the first node (/etc/init.d/heartbeat stop).
Step 3. Stop the second node a short time later (/etc/init.d/heartbeat stop).

attrd receives the TERM signal but does not stop.

(snip)
Oct 5 02:37:38 hpdb0201 crmd: [12238]: info: do_exit: [crmd] stopped (0)
Oct 5 02:37:38 hpdb0201 cib: [12234]: WARN: send_ipc_message: IPC Channel to 12238 is not connected
Oct 5 02:37:38 hpdb0201 cib: [12234]: WARN: send_via_callback_channel: Delivery of reply to client 12238/0dbc9e28-d90d-4335-b9c4-9dd3fcb38163 failed
Oct 5 02:37:38 hpdb0201 cib: [12234]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
Oct 5 02:37:38 hpdb0201 heartbeat: [12223]: info: killing /usr/lib64/heartbeat/attrd process group 12237 with signal 15
Oct 5 02:47:03 hpdb0201 cib: [12234]: info: cib_stats: Processed 97 operations (4123.00us average, 0% utilization) in the last 10min
Oct 5 07:15:25 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC channel took 1010 ms (> 100 ms)
Oct 5 07:15:26 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC channel took 1010 ms (> 100 ms)
Oct 5 07:15:37 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before being called (GSource: 0xd28010)
Oct 5 07:15:37 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch: started at 431583547 should have started at 431583444
Oct 5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before being called (GSource: 0xd27dd0)
Oct 5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch: started at 431584254 should have started at 431584151
Oct 5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before being called (GSource: 0xd28010)
Oct 5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch: started at 431584254 should have started at 431584151
Oct 5 07:16:59 hpdb0201 heartbeat: [12223]: WARN: G_CH_check_int: working on write child took 1010 ms (> 100 ms)
Oct 5 07:17:14 hpdb0201 stonithd: [12236]: WARN: G_CH_check_int: working on Heartbeat API channel took 1010 ms (> 100 ms)
Oct 5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before being called (GSource: 0xd27dd0)
Oct 5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch: started at 431607988 should have started at 431607885
Oct 5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch: Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before being called (GSource: 0xd28010)
Oct 5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch: started at 431607988 should have started at 431607885
(snip)

We have tried to reproduce the problem, but it rarely recurs.

The same problem was reported in the following email. However, that discussion stopped partway without a resolution.

 * http://www.gossamer-threads.com/lists/linuxha/pacemaker/62147

The problem occurred with the following combination:

 * pacemaker-1.0.11
 * resource-agents-3.9.2
 * cluster-glue-1.0.7
 * heartbeat-3.0.5

I have registered this in Bugzilla.

 * http://bugs.clusterlabs.org/show_bug.cgi?id=5004

Best Regards,
Hideo Yamauchi.
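
P.S. For anyone checking their own logs for the same symptom, here is a minimal sketch in Python. It only assumes syslog lines in the format shown in this report; the function and variable names are hypothetical and are not part of heartbeat or pacemaker. It finds the line where heartbeat sends signal 15 to attrd and measures how long the log keeps going afterwards. A long gap does not prove by itself that attrd is still running, but it does show that the shutdown had not completed.

```python
import re
from datetime import datetime

# Matches the heartbeat "killing ... with signal N" line as it appears in this report.
KILL_RE = re.compile(
    r"heartbeat: \[\d+\]: info: killing (?P<path>\S+) "
    r"process group (?P<pgid>\d+) with signal (?P<sig>\d+)"
)
# Syslog timestamp at the start of a line, e.g. "Oct 5 02:37:38".
TS_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s")


def parse_ts(line):
    """Return the line's timestamp, or None if the line has no timestamp."""
    m = TS_RE.match(line)
    if m is None:
        return None
    # Syslog omits the year, so a fixed year is assumed (2000, a leap year).
    return datetime.strptime("2000 " + " ".join(m.groups()), "%Y %b %d %H:%M:%S")


def seconds_logged_after_kill(lines, daemon="attrd"):
    """Seconds between the TERM sent to `daemon` and the last timestamped line.

    Returns None if no matching "killing" line is found.
    """
    kill_ts = None
    last_ts = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        last_ts = ts
        m = KILL_RE.search(line)
        if m and m.group("path").endswith("/" + daemon) and m.group("sig") == "15":
            kill_ts = ts
    if kill_ts is None:
        return None
    return (last_ts - kill_ts).total_seconds()
```

In the excerpt above, the TERM is sent at 02:37:38 and the last line is stamped 07:19:41, so heartbeat was still logging more than four hours after it asked attrd to stop.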