Hi,

attrd sometimes fails to stop.

Step1. Start a cluster with 2 nodes.
Step2. Stop the first node (/etc/init.d/heartbeat stop).
Step3. Stop the second node a short time later (/etc/init.d/heartbeat stop).

attrd receives the TERM signal but does not stop.

(snip)
Oct  5 02:37:38 hpdb0201 crmd: [12238]: info: do_exit: [crmd] stopped (0)
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_ipc_message: IPC Channel to
12238 is not connected
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_via_callback_channel:
Delivery of reply to client 12238/0dbc9e28-d90d-4335-b9c4-9dd3fcb38163 failed
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: do_local_notify: A-Sync reply to
crmd failed: reply failed
Oct  5 02:37:38 hpdb0201 heartbeat: [12223]: info: killing
/usr/lib64/heartbeat/attrd process group 12237 with signal 15
Oct  5 02:47:03 hpdb0201 cib: [12234]: info: cib_stats: Processed 97 operations
(4123.00us average, 0% utilization) in the last 10min
Oct  5 07:15:25 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
channel took 1010 ms (> 100 ms)
Oct  5 07:15:26 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
channel took 1010 ms (> 100 ms)
Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431583547 should have started at 431583444
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd27dd0)
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431584254 should have started at 431584151
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431584254 should have started at 431584151
Oct  5 07:16:59 hpdb0201 heartbeat: [12223]: WARN: G_CH_check_int: working on
write child took 1010 ms (> 100 ms)
Oct  5 07:17:14 hpdb0201 stonithd: [12236]: WARN: G_CH_check_int: working on
Heartbeat API channel took 1010 ms (> 100 ms)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd27dd0)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431607988 should have started at 431607885
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431607988 should have started at 431607885
(snip)
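
To measure how long the node keeps logging after attrd is sent signal 15, the excerpt above can be scanned with a small script. This is only a minimal sketch: the function names (join_wrapped, time_since_term) and the year argument are my own assumptions (the log lines carry no year), and it does not handle a log that crosses midnight. It first rejoins lines wrapped by the mail client, then finds the first "killing ... with signal 15" record and the timestamp of the last record after it.

```python
import re
from datetime import datetime

# Syslog-style prefix as it appears in the excerpt:
#   "Oct  5 02:37:38 hpdb0201 heartbeat: [12223]: info: ..."
PREFIX = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>\S+): \[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
# The heartbeat message announcing that a TERM signal (15) is being sent.
KILL = re.compile(r"killing (?P<target>\S+) process group \d+ with signal 15\b")


def join_wrapped(lines):
    """Rejoin records that were wrapped onto continuation lines."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if PREFIX.match(line):
            records.append(line)               # start of a new record
        elif records:
            records[-1] += " " + line.strip()  # continuation of the previous one
        # anything before the first record (e.g. "(snip)") is ignored
    return records


def time_since_term(lines, year=2000):
    """Return (target, elapsed) from the first signal-15 record to the
    last record after it. `year` is an assumption; the log has none."""
    kill_time = target = last_time = None
    for record in join_wrapped(lines):
        m = PREFIX.match(record)
        stamp = datetime.strptime(f"{year} {m.group('stamp')}",
                                  "%Y %b %d %H:%M:%S")
        if kill_time is None:
            k = KILL.search(m.group("msg"))
            if k:
                kill_time, target = stamp, k.group("target")
        else:
            last_time = stamp
    if kill_time is None or last_time is None:
        return target, None
    return target, last_time - kill_time
```

Applied to the excerpt above, the signal is sent at 02:37:38 and the last record is at 07:19:41, so the elapsed time is 4:42:03 with attrd's parent still running.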

We have tried to reproduce the problem, but it rarely recurs.
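
Because the problem is rare, Step2 and Step3 could be scripted and repeated with different delays. The following is only a rough sketch under several assumptions that are not part of this report: the nodes are reachable by passwordless ssh, the init script on the second node does not return while attrd is still alive, and the names ssh_run and try_reproduce are hypothetical helpers. Restarting the cluster between attempts (Step1) is left out.

```python
import subprocess
import time

# Stop command taken from Step2 and Step3.
STOP_CMD = "/etc/init.d/heartbeat stop"


def ssh_run(node, command, timeout_s):
    """Run `command` on `node` over ssh (assumes passwordless ssh).
    Returns True if it finished within timeout_s, False if it timed out."""
    try:
        subprocess.run(["ssh", node, command], timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


def try_reproduce(first, second, delay_s, stop_timeout_s=300,
                  run=ssh_run, sleep=time.sleep):
    """Stop heartbeat on `first`, wait delay_s seconds, then stop it on
    `second`. Returns True if the second stop did not finish in time,
    which is treated here (an assumption) as a possible reproduction."""
    run(first, STOP_CMD, stop_timeout_s)
    sleep(delay_s)
    finished = run(second, STOP_CMD, stop_timeout_s)
    return not finished
```

The run and sleep parameters exist only so that the sequence can be exercised without a real cluster; on real nodes the defaults would be used, for example try_reproduce("node1", "node2", 5) with placeholder host names.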

The same problem was reported in the following email.
However, that discussion ended partway without a conclusion.

 * http://www.gossamer-threads.com/lists/linuxha/pacemaker/62147

The problem occurred with the following combination of versions.
 * pacemaker-1.0.11
 * resource-agents-3.9.2
 * cluster-glue-1.0.7
 * heartbeat-3.0.5

I have registered these details in Bugzilla.
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5004

Best Regards,
Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
