Hi Andrew,

I changed the cluster option to batch-limit=3 and re-tried, but a similar time-out still occurs.
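(For reference, batch-limit is a cluster property stored in the crm_config section of the CIB; the setting tried above looks roughly like the following fragment. The id attributes here are illustrative, not taken from our cluster.)

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <!-- limits how many actions the transition engine runs in parallel;
         value "3" is what was tried above (ids are illustrative) -->
    <nvpair id="cib-bootstrap-options-batch-limit"
            name="batch-limit" value="3"/>
  </cluster_property_set>
</crm_config>
```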
I measured the processing just before the time-out (120s) with systemtap. Only the following functions take a long time:

----- probe start! ---------------------------------
cib_process_request      [call-count:179] [117,540,173,155 nsec]
cib_process_command      [call:179]       [116,471,047,275 nsec]
 cib_process_command call function ---
  cib_config_changed     [call:179]       [101,169,909,572 nsec]
   cib_config_changed call function ---
    calculate_xml_digest [call:179]       [ 68,820,560,745 nsec]
    create_xml_node      [call:3012263]   [ 19,855,469,976 nsec] ※
    xpath_search         [call:179]       [    145,030,232 nsec]
    diff_xml_object      [call:179]       [ 32,677,359,476 nsec] ※
     calculate_xml_digest call function ---
      sorted_xml         [call:1505799]   [ 52,512,465,838 nsec] ※
    copy_xml             [call:179]       [  3,692,232,073 nsec]
    dump_xml             [call:536]       [  6,177,606,232 nsec]
-----

Is there a way to speed up this processing?

2010/6/14 <renayama19661...@ybb.ne.jp>
> Hi Andrew,
>
> Thank you for the comment.
>
> > More likely of the underlying messaging infrastructure, but I'll take a look.
> > Perhaps the default cib operation timeouts are too low for larger clusters.
>
> > > > The log is attached to the following Bugzilla entry:
> > > >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
> >
> > Ok, I'll follow up there.
>
> If any work on our side would help solve this problem, please let us know.
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <and...@beekhof.net> wrote:
>
> > On Mon, Jun 14, 2010 at 4:46 AM, <renayama19661...@ybb.ne.jp> wrote:
> > > We tested a 16-node configuration (15+1).
> > >
> > > We carried out the following procedure.
> > >
> > > Step1) Start all 16 nodes.
> > > Step2) Send the cib after a DC node was decided.
> > >
> > > An error occurs when the pingd attribute is updated after probe processing has finished.
> > >
> > > ----------------------------------------------------------------------------------------------------------------------------------------
> > > Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 337 for default_ping_set=1600 failed: Remote node did not respond
> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 340 for default_ping_set=1600 failed: Remote node did not respond
> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 343 for default_ping_set=1600 failed: Remote node did not respond
> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 346 for default_ping_set=1600 failed: Remote node did not respond
> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 349 for default_ping_set=1600 failed: Remote node did not respond
> > > ----------------------------------------------------------------------------------------------------------------------------------------
> > >
> > > While this error was occurring, I ran a cibadmin (-Q option) command, but it timed out.
> > > Also, according to the top command, the cib process on the DC node was very busy.
> > >
> > > In addition, a communication error with cib occurs on the DC node, and crmd restarts.
> > >
> > > ----------------------------------------------------------------------------------------------------------------------------------------
> > > Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message received in the required interval (120s)
> > > Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent update -41: default_ping_set=1600
> > > (snip)
> > > Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
> > > Jun 14 10:59:07 hb0101 corosync[2269]: [pcmk ] plugin.c:858 info: pcmk_ipc_exit: Client crmd (conn=0x106a2bf0, async-conn=0x106a2bf0) left
> > > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:481 ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=2280, rc=2)
> > > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:498 notice: pcmk_wait_dispatch: Respawning failed child process: crmd
> > > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] utils.c:131 info: spawn_child: Forked child 2680 for process crmd
> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked: /usr/lib64/heartbeat/crmd
> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version: 9f04fa88cfd3da553e977cc79983d1c494c8b502
> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > ----------------------------------------------------------------------------------------------------------------------------------------
> > >
> > > There seems to be some problem in the cib of the DC node.
> > > We need attribute changes to complete reliably across all 16 nodes.
> > >  * Is this phenomenon a limit of the current cib process?
> >
> > More likely of the underlying messaging infrastructure, but I'll take a look.
> > Perhaps the default cib operation timeouts are too low for larger clusters.
> >
> > > > The log is attached to the following Bugzilla entry:
> > > >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
> >
> > Ok, I'll follow up there.
> >
> > > Best Regards,
> > > Hideo Yamauchi.
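As a footnote on the systemtap numbers at the top of this mail: sorted_xml is called roughly 8,400 times per calculate_xml_digest call (1,505,799 / 179), which suggests the digest recursively sorts every node of the CIB copy on each update, so its cost grows with the size of the whole configuration. A toy model of that pattern (illustrative only, not Pacemaker's actual code):

```python
import hashlib

# Toy XML-ish node: a (tag, attrs, children) tuple.
def sorted_copy(node, counter):
    """Recursively copy a tree with attributes and children in sorted
    order. Called once per node -- this mirrors why sorted_xml's call
    count is (digest calls) x (nodes in the tree)."""
    tag, attrs, children = node
    counter[0] += 1
    kids = sorted((sorted_copy(c, counter) for c in children),
                  key=lambda n: n[0])
    return (tag, tuple(sorted(attrs.items())), tuple(kids))

def digest(node):
    """Hash the canonical (sorted) form of the whole tree, returning
    the digest and how many nodes were visited to compute it."""
    counter = [0]
    canon = sorted_copy(node, counter)
    return hashlib.md5(repr(canon).encode()).hexdigest(), counter[0]

# A small tree: 1 root, 3 sections, 2 leaves each -> 10 nodes total.
leaf = lambda i: ("leaf%d" % i, {"id": str(i)}, [])
tree = ("cib", {}, [("sect%d" % i, {}, [leaf(2 * i), leaf(2 * i + 1)])
                    for i in range(3)])

h, visited = digest(tree)
print(visited)  # 10: one sorted-copy call per node, per digest
```

The point of the sketch is only that every digest touches every node; with a CIB of thousands of elements and 179 updates, that multiplies out to the call counts seen in the probe.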
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker