On Mon, Jun 14, 2010 at 4:46 AM, <renayama19661...@ybb.ne.jp> wrote:
> We tested a 16-node configuration (15+1).
>
> We carried out the following procedure.
>
> Step1) Start 16 nodes.
> Step2) Send the cib after a DC node was decided.
>
> An error occurs when the pingd attribute is updated after probe
> processing has finished.
>
> ------------------------------------------------------------------------
> Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 337 for default_ping_set=1600 failed: Remote node did not respond
> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 340 for default_ping_set=1600 failed: Remote node did not respond
> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 343 for default_ping_set=1600 failed: Remote node did not respond
> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 346 for default_ping_set=1600 failed: Remote node did not respond
> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 349 for default_ping_set=1600 failed: Remote node did not respond
> ------------------------------------------------------------------------
>
> While this error was occurring, I ran the cibadmin command (-Q option),
> but it timed out.
> In addition, according to the top command, the cib process on the DC node
> appeared to be very busy.
>
> In addition, a communication error with cib occurs on the DC node, and
> crmd restarts.
>
> ------------------------------------------------------------------------
> Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message received in the required interval (120s)
> Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent update -41: default_ping_set=1600
> (snip)
> Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
> Jun 14 10:59:07 hb0101 corosync[2269]: [pcmk ] plugin.c:858 info: pcmk_ipc_exit: Client crmd (conn=0x106a2bf0, async-conn=0x106a2bf0) left
> Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:481 ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=2280, rc=2)
> Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:498 notice: pcmk_wait_dispatch: Respawning failed child process: crmd
> Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] utils.c:131 info: spawn_child: Forked child 2680 for process crmd
> Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked: /usr/lib64/heartbeat/crmd
> Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version: 9f04fa88cfd3da553e977cc79983d1c494c8b502
> Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
> Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> ------------------------------------------------------------------------
>
> There seems to be some problem with cib on the DC node.
> We would like attribute changes to complete reliably in a 16-node cluster.
> * Is this phenomenon a limit of the current cib process?

More likely a limit of the underlying messaging infrastructure, but I'll
take a look.
Perhaps the default cib operation timeouts are too low for larger clusters.

>
> I attached the log to the following Bugzilla entry.
> * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443

Ok, I'll follow up there.

>
> Best Regards,
> Hideo Yamauchi.
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
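
For anyone trying to reproduce this, it may help to tally the failed attrd updates per node before filing a report. The following is only a minimal sketch: the regular expression is inferred from the warning lines quoted in this thread and is not an official Pacemaker log format, and the function name summarize_failed_updates and the SAMPLE list are hypothetical names made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the attrd warnings quoted in this thread.
# This is an assumption about the log layout, not a documented format.
FAILED_UPDATE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+attrd: \[\d+\]: "
    r"WARN: attrd_cib_callback: Update (?P<update_id>\d+) for "
    r"(?P<name>[^=\s]+)=(?P<value>\S+) failed: (?P<reason>.+)$"
)


def summarize_failed_updates(lines):
    """Count failed attrd updates, grouped by (host, attribute, reason)."""
    counts = Counter()
    for line in lines:
        # Tolerate mail quoting ("> ") and surrounding whitespace.
        match = FAILED_UPDATE.match(line.lstrip("> ").strip())
        if match:
            key = (match["host"], match["name"], match["reason"])
            counts[key] += 1
    return counts


# Sample lines copied from the log excerpt in this thread.
SAMPLE = [
    "Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...",
    "Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: "
    "Update 337 for default_ping_set=1600 failed: Remote node did not respond",
    "> Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: "
    "Update 340 for default_ping_set=1600 failed: Remote node did not respond",
]

if __name__ == "__main__":
    for (host, name, reason), count in summarize_failed_updates(SAMPLE).items():
        print(f"{host}: {count} failed update(s) of {name}: {reason}")
```

Grouping by host and reason makes it easier to see whether the failures are confined to one node or spread across the cluster, which bears on whether the cib process or the messaging layer is the bottleneck.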