Hi Andrew,

I made a patch that makes an internal timeout configurable as a cluster
option. When a cluster has a large number of nodes, it may be necessary
to increase this value. The default is 120s, the same as the current
behavior.
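The attached patch is binary, so its exact option name and parsing rules are not visible here. As a rough illustration only (not the patch itself), the Python sketch below shows the general idea of an optional timeout setting that falls back to the current 120s default. The option name `cib-callback-timeout` is hypothetical (borrowed from the patch's file name), and the duration parser handles only a simplified set of suffixes (ms, s, min, h) with bare numbers taken as seconds; both are assumptions.

```python
import re

# Hypothetical option name; the real name is whatever the attached patch defines.
TIMEOUT_OPTION = "cib-callback-timeout"
# Current built-in value (120s), used when the option is absent or invalid.
DEFAULT_TIMEOUT_MS = 120 * 1000

# Simplified unit table (assumption): a bare number is read as seconds.
_UNIT_MS = {"": 1000, "ms": 1, "s": 1000, "min": 60 * 1000, "h": 60 * 60 * 1000}
_DURATION_RE = re.compile(r"^\s*(\d+)\s*([a-z]*)\s*$")


def parse_duration_ms(text):
    """Convert strings such as '120s', '3min' or '500ms' to milliseconds.

    Returns None if the string is not a recognised duration.
    """
    match = _DURATION_RE.match(text.lower())
    if match is None:
        return None
    value, unit = match.groups()
    if unit not in _UNIT_MS:
        return None
    return int(value) * _UNIT_MS[unit]


def cib_timeout_ms(cluster_options):
    """Return the configured timeout in ms, or the 120s default."""
    raw = cluster_options.get(TIMEOUT_OPTION)
    if raw is None:
        return DEFAULT_TIMEOUT_MS
    parsed = parse_duration_ms(raw)
    # Fall back to the default on garbage or zero instead of disabling the timeout.
    if parsed is None or parsed <= 0:
        return DEFAULT_TIMEOUT_MS
    return parsed
```

For example, `cib_timeout_ms({})` returns the 120000 ms default, while `cib_timeout_ms({"cib-callback-timeout": "5min"})` returns 300000 ms. Treating unparsable or zero values as "use the default" seems the safer choice for a timeout whose job is to detect an unresponsive peer.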
Regards,
Tomo

On August 3, 2010 at 12:12, nozawat <noza...@gmail.com> wrote:
> Hi Andrew,
>
> I changed the cluster option to batch-limit=3 and retried, but a
> similar timeout still occurs.
>
> Using systemtap, I measured the processing just before the timeout
> (120s). Only the functions that took a long time are listed below.
> -----
> probe start! ---------------------------------
> cib_process_request [call-count:179][117,540,173,155 nsec]
> cib_process_command [call:179] [116,471,047,275 nsec]
> cib_process_command call function ---
> cib_config_changed [call:179] [101,169,909,572 nsec]
> cib_config_changed call function ---
> calculate_xml_digest [call:179] [ 68,820,560,745 nsec]
> create_xml_node [call:3012263] [ 19,855,469,976 nsec]※
> xpath_search [call:179] [ 145,030,232 nsec]
> diff_xml_object [call:179] [ 32,677,359,476 nsec]※
> calculate_xml_digest call function ---
> sorted_xml [call:1505799] [ 52,512,465,838 nsec]※
> copy_xml [call:179] [ 3,692,232,073 nsec]
> dump_xml [call:536] [ 6,177,606,232 nsec]
> -----
> Is there a way to make this processing faster?
>
>
> On June 14, 2010, <renayama19661...@ybb.ne.jp> wrote:
>
>> Hi Andrew,
>>
>> Thank you for your comment.
>>
>> > More likely of the underlying messaging infrastructure, but I'll take a look.
>> > Perhaps the default cib operation timeouts are too low for larger clusters.
>> >
>> > >
>> > > I attached the logs to the following Bugzilla entry:
>> > > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
>> >
>> > Ok, I'll follow up there.
>>
>> If you need us to do any work toward solving this problem, please let
>> us know.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>> --- Andrew Beekhof <and...@beekhof.net> wrote:
>>
>> > On Mon, Jun 14, 2010 at 4:46 AM, <renayama19661...@ybb.ne.jp> wrote:
>> > > We tested a 16-node configuration (15+1).
>> > >
>> > > We carried out the following procedure:
>> > >
>> > > Step1) Start 16 nodes.
>> > > Step2) Send the cib after the DC node has been elected.
>> > >
>> > > After probe processing finished, an error occurred when the pingd
>> > > attribute was updated.
>> > >
>> > > ------------------------------------------------------------------------
>> > > Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
>> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 337 for default_ping_set=1600 failed: Remote node did not respond
>> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 340 for default_ping_set=1600 failed: Remote node did not respond
>> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 343 for default_ping_set=1600 failed: Remote node did not respond
>> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 346 for default_ping_set=1600 failed: Remote node did not respond
>> > > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 349 for default_ping_set=1600 failed: Remote node did not respond
>> > > ------------------------------------------------------------------------
>> > >
>> > > While this error was occurring, I ran cibadmin with the -Q option,
>> > > but it timed out.
>> > > In addition, top showed the cib process on the DC node to be very busy.
>> > >
>> > > Furthermore, a communication error with cib occurs on the DC node,
>> > > and crmd restarts.
>> > >
>> > > ------------------------------------------------------------------------
>> > > Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message received in the required interval (120s)
>> > > Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent update -41: default_ping_set=1600
>> > > (snip)
>> > > Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
>> > > Jun 14 10:59:07 hb0101 corosync[2269]:   [pcmk  ] plugin.c:858 info: pcmk_ipc_exit: Client crmd (conn=0x106a2bf0, async-conn=0x106a2bf0) left
>> > > Jun 14 10:59:08 hb0101 corosync[2269]:   [pcmk  ] plugin.c:481 ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=2280, rc=2)
>> > > Jun 14 10:59:08 hb0101 corosync[2269]:   [pcmk  ] plugin.c:498 notice: pcmk_wait_dispatch: Respawning failed child process: crmd
>> > > Jun 14 10:59:08 hb0101 corosync[2269]:   [pcmk  ] utils.c:131 info: spawn_child: Forked child 2680 for process crmd
>> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked: /usr/lib64/heartbeat/crmd
>> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version: 9f04fa88cfd3da553e977cc79983d1c494c8b502
>> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
>> > > Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler: Added signal handler for signal 17
>> > > ------------------------------------------------------------------------
>> > >
>> > > There seems to be some problem with cib on the DC node.
>> > > We need attribute changes to complete reliably on all 16 nodes.
>> > > * Is this phenomenon a limitation of the current cib process?
>> >
>> > More likely of the underlying messaging infrastructure, but I'll take a look.
>> > Perhaps the default cib operation timeouts are too low for larger clusters.
>> >
>> > >
>> > > I attached the logs to the following Bugzilla entry:
>> > > * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
>> >
>> > Ok, I'll follow up there.
>> >
>> > >
>> > > Best Regards,
>> > > Hideo Yamauchi.
>> > >
>> > > _______________________________________________
>> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> > >
>> > > Project Home: http://www.clusterlabs.org
>> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
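The systemtap totals in nozawat's message are cumulative over 179 CIB requests, so turning them into per-call averages makes the cost easier to see. The short Python sketch below only redoes that arithmetic on the quoted figures; the one assumption is that the measurement window was roughly the 120s timeout mentioned in the message and logs. The times are inclusive of nested calls, so the rows overlap and should not be summed.

```python
# Cumulative figures copied from the systemtap output in nozawat's message:
# function name -> (call count, total time in nanoseconds).
PROFILE = {
    "cib_process_request": (179, 117_540_173_155),
    "cib_config_changed": (179, 101_169_909_572),
    "calculate_xml_digest": (179, 68_820_560_745),
    "diff_xml_object": (179, 32_677_359_476),
    "sorted_xml": (1_505_799, 52_512_465_838),
    "create_xml_node": (3_012_263, 19_855_469_976),
}

# Assumed measurement window: the 120s timeout reported in the logs.
WINDOW_NS = 120 * 10**9


def per_call_ms(name):
    """Average time per call, in milliseconds."""
    calls, total_ns = PROFILE[name]
    return total_ns / calls / 1e6


def share_of_window(name):
    """Fraction of the assumed 120s window spent inside this function."""
    return PROFILE[name][1] / WINDOW_NS


def calls_per_request(name):
    """Average number of calls per CIB request (179 requests were probed)."""
    return PROFILE[name][0] / PROFILE["cib_process_request"][0]


if __name__ == "__main__":
    for name in PROFILE:
        print(f"{name:22} {per_call_ms(name):12.4f} ms/call "
              f"{calls_per_request(name):10.1f} calls/request "
              f"{share_of_window(name):7.1%} of window")
```

On these figures, cib_process_request averages about 657 ms per call, and its total of about 117.5 s covers roughly 98% of a 120 s window, which is consistent with the CIB being busy almost the whole time before the timeout. The listing attributes sorted_xml to calculate_xml_digest; it runs about 8,400 times per request (and per digest, since calculate_xml_digest also ran 179 times) and accounts for roughly three quarters of the digest time.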
[Attachment: cib_callback_timeout.patch (binary data)]