subject:"Re\: \[Pacemaker\] \[Problem\]Cib cannot update an attribute by 16 node constitution."

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-26 Thread Andrew Beekhof

On Fri, Aug 6, 2010 at 4:36 AM, nozawat noza...@gmail.com wrote:
 Hi Andrew,

 Is this data all from the CIB process?
 What was the load from the other processes like? (Ie. using top)
 The cib process was approximately 100% CPU rates of use,
 but most of the other processes did not use a CPU when they confirmed it in
 top.
 The number of the cores of the CPU assigns 2.

 Is there the method to make these processing early?
 You mean faster?
 Uh...For example, I thought in a parameter like batch-limit..
 Because I feel that the handling of XML is not so early...

My brain might be a little bit fuzzy from the vacation, but I'm still
not 100% sure what you mean here...


 Could you attach the CIB your cluster was using so I can have a go?
 I'm sorry.cib.xml was not able to send a e-mail by the limit of the attached
 file.
 I compress a file and put it in Bugzilla.
 - cib.xml  :http://developerbugs.linux-foundation.org/attachment.cgi?id=1684
 - systemtap:http://developerbugs.linux-foundation.org/attachment.cgi?id=1685

Great. I'll follow up there once I finish making my way through my
email backlog.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-05 Thread Andrew Beekhof

2010/8/5 nozawat noza...@gmail.com:
 Hi Andrew,

  I took oprofile of 120 seconds before Time-out occurs.
  However, sorted_xml function seems not to be called it so much as far as I
 watch log of Oprofile.

I suspect you're misinterpreting the results.
The number of calls isn't nearly as important as how much time is
spent in that function.

Could you attach the CIB your cluster was using so I can have a go?


  I took Oprofile by a sample program.
  But, the contents of Oprofile did not agree with the contents of the sample
 program.
  Therefore I used systemtap because I was not able to do internal bottleneck
 analysis from information of this Oprofile.

Ah, I wondered where you go the data from.
Could you share how you used systemtap for this?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-05 Thread nozawat

Hi Andrew,

 Is this data all from the CIB process?
 What was the load from the other processes like? (Ie. using top)
The cib process was approximately 100% CPU rates of use,
but most of the other processes did not use a CPU when they confirmed it in
top.
The number of the cores of the CPU assigns 2.

 Is there the method to make these processing early?
 You mean faster?
Uh...For example, I thought in a parameter like batch-limit..
Because I feel that the handling of XML is not so early...

 Could you attach the CIB your cluster was using so I can have a go?
I'm sorry.cib.xml was not able to send a e-mail by the limit of the attached
file.
I compress a file and put it in Bugzilla.
- cib.xml  :http://developerbugs.linux-foundation.org/attachment.cgi?id=1684
- systemtap:http://developerbugs.linux-foundation.org/attachment.cgi?id=1685

 Ah, I wondered where you go the data from.
 Could you share how you used systemtap for this?
step 1) I divide a crm file at first to build a cluster.
 - The attached file is cib_14+2_functest01.crm and cib_14+2_functest02.crm.
 - Because I become the error by construction with one file.
step 2) After construction completion, I let a DC node break down.
step 3) I started systemtap from the 1,266th line of the log file in 120
seconds of the 1,267th line and acquired information.
 - The attached file is ha.log.This file is ha.log of the new DC node.

note ) A systemtap use procedure
1)stap -x PID functime.stp
2)stop is Ctl+C

-- 
Regards,
Tomo


2010/8/5 Andrew Beekhof and...@beekhof.net

 2010/8/5 nozawat noza...@gmail.com:
  Hi Andrew,
 
   I took oprofile of 120 seconds before Time-out occurs.
   However, sorted_xml function seems not to be called it so much as far as
 I
  watch log of Oprofile.

 I suspect you're misinterpreting the results.
 The number of calls isn't nearly as important as how much time is
 spent in that function.

 Could you attach the CIB your cluster was using so I can have a go?

 
   I took Oprofile by a sample program.
   But, the contents of Oprofile did not agree with the contents of the
 sample
  program.
   Therefore I used systemtap because I was not able to do internal
 bottleneck
  analysis from information of this Oprofile.

 Ah, I wondered where you go the data from.
 Could you share how you used systemtap for this?

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




-- 
Regards,
Tomo
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-04 Thread Andrew Beekhof

2010/8/3 nozawat noza...@gmail.com:
 Hi Andrew,

 I changed cluster option to batch-limit=3,I re-tried it.
 However, similar time-out occurs.

 I measured processing just before the time-out(120s) in systemtap.
 The following only the function long time.
 -
 probe start! -
   cib_process_request  [call-count:179][117,540,173,155 nsec]
   cib_process_command  [call:179]  [116,471,047,275 nsec]
 cib_process_command  call function ---
   cib_config_changed   [call:179]  [101,169,909,572 nsec]
 cib_config_changed   call function ---
   calculate_xml_digest [call:179]  [ 68,820,560,745 nsec]
   create_xml_node  [call:3012263]  [ 19,855,469,976 nsec]※

But, but, create_xml_node() barely does _anything_.

It's hard to imagine how this could take 19s:

xmlNode*
create_xml_node(xmlNode *parent, const char *name)
{
xmlDoc *doc = NULL;
xmlNode *node = NULL;   

if (name == NULL || name[0] == 0) {
return NULL;
}

if(parent == NULL) {
doc = xmlNewDoc((const xmlChar*)1.0);
node = xmlNewDocRawNode(doc, NULL, (const xmlChar*)name, NULL);
xmlDocSetRootElement(doc, node);

} else {
doc = getDocPtr(parent);
node = xmlNewDocRawNode(doc, NULL, (const xmlChar*)name, NULL);
xmlAddChild(parent, node);
}
return node;
}

Is this data all from the CIB process?
What was the load from the other processes like? (Ie. using top)

   xpath_search [call:179]  [    145,030,232 nsec]
   diff_xml_object  [call:179]  [ 32,677,359,476 nsec]※
 calculate_xml_digest call function ---
   sorted_xml   [call:1505799]  [ 52,512,465,838 nsec]※

Yikes, these are also really high.
Could you send me the CIB so that we can profile these operations on it?

   copy_xml [call:179]  [  3,692,232,073 nsec]
   dump_xml [call:536]  [  6,177,606,232 nsec]
 -
 Is there the method to make these processing early?

You mean faster?
Maybe oprofile can suggest something.



 2010/6/14 renayama19661...@ybb.ne.jp

 Hi Andrew,

 Thank you for comment.

  More likely of the underlying messaging infrastructure, but I'll take a
  look.
  Perhaps the default cib operation timeouts are too low for larger
  clusters.
 
  
   The log attached it to next Bugzilla.
   #65533;*
   http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
 
  Ok, I'll follow up there.

 If it is necessary for us to work for the solution of the problem, please
 order it.

 Best Regards,
 Hideo Yamauchi.

 --- Andrew Beekhof and...@beekhof.net wrote:

  On Mon, Jun 14, 2010 at 4:46 AM,  renayama19661...@ybb.ne.jp wrote:
   We tested 16 node constitution (15+1).
  
   We carried out the next procedure.
  
   Step1) Start 16 nodes.
   Step2) Send cib after a DC node was decided.
  
   An error occurs by the update of the attribute of pingd after Probe
   processing was over.
  
  
 

 
   Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
   337 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
   340 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
   343 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
   346 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
   349 for
  default_ping_set=1600
   failed: Remote node did not respond
  
 

 
  
   In the middle of this error, I carried out a cibadmin(-Q optin)
   command, but time-out
  occurred.
   In addition, cib of the DC node seemed to move by the top command very
   busily.
  
  
   In addition, a communication error with cib occurs in the DC node, and
   crmd reboots.
  
  
 

 
   Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message
   received in the required
  interval
   (120s)
   Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent
   update -41:
   default_ping_set=1600
   (snip)
   Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
   Jun 14 10:59:07 hb0101 corosync[2269]: #65533; [pcmk #65533;]
   plugin.c:858 info: pcmk_ipc_exit:
 Client
  crmd
   (conn=0x106a2bf0, async-conn=0x106a2bf0) left
   Jun 14 10:59:08 hb0101

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-02 Thread nozawat

Hi Andrew,

I changed cluster option to batch-limit=3,I re-tried it.
However, similar time-out occurs.

I measured processing just before the time-out(120s) in systemtap.
The following only the function long time.
-
probe start! -
  cib_process_request  [call-count:179][117,540,173,155 nsec]
  cib_process_command  [call:179]  [116,471,047,275 nsec]
cib_process_command  call function ---
  cib_config_changed   [call:179]  [101,169,909,572 nsec]
cib_config_changed   call function ---
  calculate_xml_digest [call:179]  [ 68,820,560,745 nsec]
  create_xml_node  [call:3012263]  [ 19,855,469,976 nsec]※
  xpath_search [call:179]  [145,030,232 nsec]
  diff_xml_object  [call:179]  [ 32,677,359,476 nsec]※
calculate_xml_digest call function ---
  sorted_xml   [call:1505799]  [ 52,512,465,838 nsec]※
  copy_xml [call:179]  [  3,692,232,073 nsec]
  dump_xml [call:536]  [  6,177,606,232 nsec]
-
Is there the method to make these processing early?


2010/6/14 renayama19661...@ybb.ne.jp

 Hi Andrew,

 Thank you for comment.

  More likely of the underlying messaging infrastructure, but I'll take a
 look.
  Perhaps the default cib operation timeouts are too low for larger
 clusters.
 
  
   The log attached it to next Bugzilla.
   #65533;*
 http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
 
  Ok, I'll follow up there.

 If it is necessary for us to work for the solution of the problem, please
 order it.

 Best Regards,
 Hideo Yamauchi.

 --- Andrew Beekhof and...@beekhof.net wrote:

  On Mon, Jun 14, 2010 at 4:46 AM,  renayama19661...@ybb.ne.jp wrote:
   We tested 16 node constitution (15+1).
  
   We carried out the next procedure.
  
   Step1) Start 16 nodes.
   Step2) Send cib after a DC node was decided.
  
   An error occurs by the update of the attribute of pingd after Probe
 processing was over.
  
  
 

 
   Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 337 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 340 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 343 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 346 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 349 for
  default_ping_set=1600
   failed: Remote node did not respond
  
 

 
  
   In the middle of this error, I carried out a cibadmin(-Q optin)
 command, but time-out
  occurred.
   In addition, cib of the DC node seemed to move by the top command very
 busily.
  
  
   In addition, a communication error with cib occurs in the DC node, and
 crmd reboots.
  
  
 

 
   Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message
 received in the required
  interval
   (120s)
   Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent
 update -41:
   default_ping_set=1600
   (snip)
   Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
   Jun 14 10:59:07 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:858 info: pcmk_ipc_exit:
 Client
  crmd
   (conn=0x106a2bf0, async-conn=0x106a2bf0) left
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:481 ERROR:
 pcmk_wait_dispatch:
  Child
   process crmd exited (pid=2280, rc=2)
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:498 notice:
 pcmk_wait_dispatch:
  Respawning
   failed child process: crmd
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 utils.c:131 info: spawn_child:
 Forked child
  2680 for
   process crmd
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked:
 /usr/lib64/heartbeat/crmd
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version:
   9f04fa88cfd3da553e977cc79983d1c494c8b502
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler:
 Added signal handler for
  signal
   17
  
 

 
  
   There seems to be a problem in cib of the DC node somehow or other.
   We hope

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-08-02 Thread nozawat

Hi Andrew,

I made the patch that extended an internal time-out as one of the optional
clusters.
There is a possibility that the necessity for extending this argument value
comes out when there are a lot of numbers of nodes.
Default is 120s according to the current.

Regards,
Tomo


2010年8月3日12:12 nozawat noza...@gmail.com:

 Hi Andrew,

 I changed cluster option to batch-limit=3,I re-tried it.
 However, similar time-out occurs.

 I measured processing just before the time-out(120s) in systemtap.
 The following only the function long time.
 -
 probe start! -
   cib_process_request  [call-count:179][117,540,173,155 nsec]
   cib_process_command  [call:179]  [116,471,047,275 nsec]
 cib_process_command  call function ---
   cib_config_changed   [call:179]  [101,169,909,572 nsec]
 cib_config_changed   call function ---
   calculate_xml_digest [call:179]  [ 68,820,560,745 nsec]
   create_xml_node  [call:3012263]  [ 19,855,469,976 nsec]※
   xpath_search [call:179]  [145,030,232 nsec]
   diff_xml_object  [call:179]  [ 32,677,359,476 nsec]※
 calculate_xml_digest call function ---
   sorted_xml   [call:1505799]  [ 52,512,465,838 nsec]※
   copy_xml [call:179]  [  3,692,232,073 nsec]
   dump_xml [call:536]  [  6,177,606,232 nsec]
 -
 Is there the method to make these processing early?


 2010/6/14 renayama19661...@ybb.ne.jp

 Hi Andrew,

 Thank you for comment.

  More likely of the underlying messaging infrastructure, but I'll take a
 look.
  Perhaps the default cib operation timeouts are too low for larger
 clusters.
 
  
   The log attached it to next Bugzilla.
   #65533;*
 http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
 
  Ok, I'll follow up there.

 If it is necessary for us to work for the solution of the problem, please
 order it.

 Best Regards,
 Hideo Yamauchi.

 --- Andrew Beekhof and...@beekhof.net wrote:

  On Mon, Jun 14, 2010 at 4:46 AM,  renayama19661...@ybb.ne.jp wrote:
   We tested 16 node constitution (15+1).
  
   We carried out the next procedure.
  
   Step1) Start 16 nodes.
   Step2) Send cib after a DC node was decided.
  
   An error occurs by the update of the attribute of pingd after Probe
 processing was over.
  
  
 

 
   Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 337 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 340 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 343 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 346 for
  default_ping_set=1600
   failed: Remote node did not respond
   Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update
 349 for
  default_ping_set=1600
   failed: Remote node did not respond
  
 

 
  
   In the middle of this error, I carried out a cibadmin(-Q optin)
 command, but time-out
  occurred.
   In addition, cib of the DC node seemed to move by the top command very
 busily.
  
  
   In addition, a communication error with cib occurs in the DC node, and
 crmd reboots.
  
  
 

 
   Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message
 received in the required
  interval
   (120s)
   Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent
 update -41:
   default_ping_set=1600
   (snip)
   Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
   Jun 14 10:59:07 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:858 info: pcmk_ipc_exit:
 Client
  crmd
   (conn=0x106a2bf0, async-conn=0x106a2bf0) left
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:481 ERROR:
 pcmk_wait_dispatch:
  Child
   process crmd exited (pid=2280, rc=2)
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 plugin.c:498 notice:
 pcmk_wait_dispatch:
  Respawning
   failed child process: crmd
   Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;]
 utils.c:131 info: spawn_child:
 Forked child
  2680 for
   process crmd
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked:
 /usr/lib64/heartbeat/crmd
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version:
   9f04fa88cfd3da553e977cc79983d1c494c8b502
   Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init:

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

2010-06-14 Thread renayama19661014

Hi Andrew,

Thank you for comment.

 More likely of the underlying messaging infrastructure, but I'll take a look.
 Perhaps the default cib operation timeouts are too low for larger clusters.
 
 
  The log attached it to next Bugzilla.
  #65533;* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
 
 Ok, I'll follow up there.

If it is necessary for us to work for the solution of the problem, please order 
it.

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

 On Mon, Jun 14, 2010 at 4:46 AM,  renayama19661...@ybb.ne.jp wrote:
  We tested 16 node constitution (15+1).
 
  We carried out the next procedure.
 
  Step1) Start 16 nodes.
  Step2) Send cib after a DC node was decided.
 
  An error occurs by the update of the attribute of pingd after Probe 
  processing was over.
 
 


  Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
  Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 337 
  for
 default_ping_set=1600
  failed: Remote node did not respond
  Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 340 
  for
 default_ping_set=1600
  failed: Remote node did not respond
  Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 343 
  for
 default_ping_set=1600
  failed: Remote node did not respond
  Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 346 
  for
 default_ping_set=1600
  failed: Remote node did not respond
  Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 349 
  for
 default_ping_set=1600
  failed: Remote node did not respond
 


 
  In the middle of this error, I carried out a cibadmin(-Q optin) command, 
  but time-out
 occurred.
  In addition, cib of the DC node seemed to move by the top command very 
  busily.
 
 
  In addition, a communication error with cib occurs in the DC node, and crmd 
  reboots.
 
 


  Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message received 
  in the required
 interval
  (120s)
  Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent 
  update -41:
  default_ping_set=1600
  (snip)
  Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
  Jun 14 10:59:07 hb0101 corosync[2269]: #65533; [pcmk #65533;] 
  plugin.c:858 info: pcmk_ipc_exit:
Client
 crmd
  (conn=0x106a2bf0, async-conn=0x106a2bf0) left
  Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;] 
  plugin.c:481 ERROR:
pcmk_wait_dispatch:
 Child
  process crmd exited (pid=2280, rc=2)
  Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;] 
  plugin.c:498 notice:
pcmk_wait_dispatch:
 Respawning
  failed child process: crmd
  Jun 14 10:59:08 hb0101 corosync[2269]: #65533; [pcmk #65533;] utils.c:131 
  info: spawn_child:
Forked child
 2680 for
  process crmd
  Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked: 
  /usr/lib64/heartbeat/crmd
  Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version:
  9f04fa88cfd3da553e977cc79983d1c494c8b502
  Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
  Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler: Added 
  signal handler for
 signal
  17
 


 
  There seems to be a problem in cib of the DC node somehow or other.
  We hope that an attribute change is completed in 16 nodes definitely.
  #65533;* Is this phenomenon a limit of the current cib process?
 
 More likely of the underlying messaging infrastructure, but I'll take a look.
 Perhaps the default cib operation timeouts are too low for larger clusters.
 
 
  The log attached it to next Bugzilla.
  #65533;* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
 
 Ok, I'll follow up there.
 
 
  Best Regards,
  Hideo Yamauchi.
 
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: 
  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

Re: [Pacemaker] [Problem]Cib cannot update an attribute by 16 node constitution.

7 matches

Site Navigation

Mail list logo

Footer information