Hi again,

That's strange, because I ran tests around the LRMD_MAX_CHILDREN parameter with 24 Dummy resources. These resources do almost nothing, so Pacemaker should start them all at nearly the same time, one after the other, and the monitor operations should then follow in the same way.

First I tested with no LRMD_MAX_CHILDREN in /etc/sysconfig/pacemaker, so with the default value, which is probably 4 as you told me. Then I set it to 2, restarted Pacemaker and ran the same test. Finally I set it to 24 (just as a textbook case) and ran the same test again.

The result is the same for all three tests: once all 24 Dummy resources are started, as you can see below, the monitor operations seem to be grouped by 4, whatever the value of LRMD_MAX_CHILDREN. My understanding was that the monitor operations should have run in parallel for almost all 24 resources, since each monitor takes very little time to complete ...
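The expectation above can be sketched as a simple slot model. This is only an assumption about how a limit like LRMD_MAX_CHILDREN behaves (a new operation starts as soon as a slot is free), not lrmd's actual code; the function names, the 12 operations per node and the 1-second duration are hypothetical values chosen for illustration.

```python
import heapq
from collections import Counter


def simulate(n_ops, max_children, duration):
    """Return the finish time of each operation under a simple slot model.

    Assumption: at most max_children operations run at once, and a queued
    operation starts the moment the earliest running one finishes.
    """
    running = []   # min-heap holding finish times of running operations
    finishes = []
    for _ in range(n_ops):
        if len(running) < max_children:
            start = 0.0                      # a slot is free immediately
        else:
            start = heapq.heappop(running)   # wait for the earliest finisher
        finish = start + duration
        heapq.heappush(running, finish)
        finishes.append(finish)
    return finishes


def per_instant(finishes):
    """Count how many operations finish at each point in time."""
    return sorted(Counter(finishes).items())


# Hypothetical figures: 12 operations per node, each lasting 1 second.
for limit in (2, 4, 24):
    print(limit, per_instant(simulate(12, limit, 1.0)))
```

Under this model a limit of 4 gives three groups of four, a limit of 2 gives six groups of two, and a limit of 24 lets all twelve finish together, so the grouping should change with the setting.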
Where am I wrong?

[root@cuzco4 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname9_monitor_20000 (call=241, rc=0, cib-update=441, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname11_monitor_20000 (call=242, rc=0, cib-update=442, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname13_monitor_20000 (call=243, rc=0, cib-update=443, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname17_monitor_20000 (call=244, rc=0, cib-update=444, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname23_monitor_20000 (call=245, rc=0, cib-update=445, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname21_monitor_20000 (call=246, rc=0, cib-update=446, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname19_monitor_20000 (call=247, rc=0, cib-update=447, confirmed=false) ok

[root@cuzco6 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname2_monitor_20000 (call=236, rc=0, cib-update=245, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname4_monitor_20000 (call=237, rc=0, cib-update=246, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname8_monitor_20000 (call=238, rc=0, cib-update=247, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname6_monitor_20000 (call=239, rc=0, cib-update=248, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname10_monitor_20000 (call=240, rc=0, cib-update=249, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname16_monitor_20000 (call=241, rc=0, cib-update=250, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname14_monitor_20000 (call=242, rc=0, cib-update=251, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname12_monitor_20000 (call=243, rc=0, cib-update=252, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname24_monitor_20000 (call=244, rc=0, cib-update=253, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname22_monitor_20000 (call=245, rc=0, cib-update=254, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname20_monitor_20000 (call=246, rc=0, cib-update=255, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname18_monitor_20000 (call=247, rc=0, cib-update=256, confirmed=false) ok

Alain

From: Dejan Muhamedagic <deja...@fastmail.fm>
To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
Date: 22/11/2011 13:18
Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
Sent by: linux-ha-boun...@lists.linux-ha.org

Hi,

On Tue, Nov 22, 2011 at 08:17:28AM +0100, alain.mou...@bull.net wrote:
> Hi
>
> By the way, is there a description somewhere of parameters from
> /etc/sysconfig/pacemaker ?

To the best of my knowledge, there is only LRMD_MAX_CHILDREN.

Thanks,

Dejan

> Thanks
> Alain
>
> From: Dejan Muhamedagic <deja...@fastmail.fm>
> To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> Date: 21/11/2011 15:48
> Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> Sent by: linux-ha-boun...@lists.linux-ha.org
>
> On Mon, Nov 21, 2011 at 03:07:43PM +0100, alain.mou...@bull.net wrote:
> > Thanks Dejan,
> > ok I understand, so we have to choose between a small value of
> > LRMD_MAX_CHILDREN, and on start, stop, or status of 64 resources it
> > will take a while ...
> > and a big value of LRMD_MAX_CHILDREN, and then either the start, stop
> > and, at best, status will be achieved very quickly as they are
> > parallelized, or at worst the system will be "on knees" ...
> > We'll give it a try ... as I have big computers ;-)
>
> Just note that you should try to think of every possible
> combination of resource operations.
> For instance, imagine 64 Xen
> VMs trying to start in parallel. Better be conservative than
> to push your nodes to their limit.
>
> > But my question is now : when you write :
> > "Let me just add that operations which were supposed to
> > start at the same time get spaced out."
> > So if LRMD_MAX_CHILDREN=4, that means that if I ask for start on 32
> > resources at the same time, Pacemaker will manage 4, delay the
> > remaining 28, manage 4 again, etc. so it will be completed in
> > 8 shots, right ?
>
> No.
>
> > But what is the delay value between each shot ?
>
> There is none. As soon as one operation finishes, another one
> gets started. Now, if you have say four big RDBMS instances
> starting and each of them takes five minutes or so, the other
> resources will obviously stay in the queue for five minutes.
>
> Anyway, you can see for yourself on cluster start, just grep
> your logs for lrmd:.*rsc:, it should show you all timestamps
> when certain operation was started (apart from recurring
> monitors).
>
> Thanks,
>
> Dejan
>
> > Thanks
> > Alain
> >
> > From: Dejan Muhamedagic <deja...@fastmail.fm>
> > To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> > Date: 21/11/2011 13:45
> > Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> > Sent by: linux-ha-boun...@lists.linux-ha.org
> >
> > Hi,
> >
> > On Mon, Nov 21, 2011 at 01:42:15PM +0100, alain.mou...@bull.net wrote:
> > > Hi Florian,
> > > ok I've checked the thread, so that means that on RHEL6, if I have
> > > let's say 32 resource groups of 2 primitives on each node, I can
> > > set the LRMD_MAX_CHILDREN environment variable in
> > > /etc/sysconfig/pacemaker to 64 ?
> >
> > The number of resources shouldn't be the main criteria for
> > setting this parameter, but what can your nodes handle without
> > being overloaded.
> > So, 64 sounds like you have some really
> > big computers :) It also depends on the nature of the cluster
> > resources. The default of 4 is rather conservative, perhaps
> > nowadays 8 would be better.
> >
> > > Is it acceptable for lrmd and Pacemaker ? Or will we face any
> > > side-effect ?
> >
> > LRMD_MAX_CHILDREN is the maximum number of resource operations
> > allowed to run in parallel. Hope that that answers your question.
> >
> > Thanks,
> >
> > Dejan
> >
> > > Thanks
> > > Alain
> > >
> > > From: Florian Haas <flor...@hastexo.com>
> > > To: General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> > > Date: 21/11/2011 12:58
> > > Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> > > Sent by: linux-ha-boun...@lists.linux-ha.org
> > >
> > > On 11/21/11 13:03, alain.mou...@bull.net wrote:
> > > > Hi,
> > > > yes that's exactly the purpose of my question (and exactly the same
> > > > problem of "big-monitoring-trains") :
> > > > if we can always use start-delay to randomize the first monitor
> > > > operation time on all the resources on a server,
> > > > but if it is really deprecated, that means that in the future this
> > > > option will no longer be managed by Pacemaker (perhaps it already
> > > > is the case ... ?), so in this case we must not use this option.
> > > >
> > > > Could someone give us a clear status on this option "start-delay" ?
> > >
> > > If your RA needs it, then the RA is most likely broken. :)
> > >
> > > For monitor operations allegedly piling up, please consider this:
> > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/76152#76152
> > >
> > > Hope this helps.
> > > Cheers,
> > > Florian
> > >
> > > --
> > > Need help with High Availability?
> > > http://www.hastexo.com/now
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA@lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
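The per-second grouping in the crmd log excerpts earlier in this message can be tallied with a short script instead of by eye. This is a minimal sketch, assuming the syslog layout shown in those excerpts (time, then host name, then the crmd text). The regular expression and the function name are illustrative, and the sample lines are copied from the cuzco4 output.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpts in this message:
# "... HH:MM:SS <host> ... LRM operation <rsc>_monitor_<interval> (...) ok"
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r".*LRM operation (?P<rsc>\S+)_monitor_\d+ "
)


def monitors_per_second(lines):
    """Group resource names by (host, HH:MM:SS) of the logged monitor result."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines that do not fit the assumed layout are skipped
            groups[(match["host"], match["time"])].append(match["rsc"])
    return dict(groups)


PREFIX = "daemon info crmd [24774]: info: process_lrm_event: LRM operation "
SAMPLE = [
    "1321975309 2011 Nov 22 16:21:49 cuzco4 " + PREFIX
    + "resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok",
    "1321975309 2011 Nov 22 16:21:49 cuzco4 " + PREFIX
    + "resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok",
    "1321975309 2011 Nov 22 16:21:49 cuzco4 " + PREFIX
    + "resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok",
    "1321975309 2011 Nov 22 16:21:49 cuzco4 " + PREFIX
    + "resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok",
    "1321975310 2011 Nov 22 16:21:50 cuzco4 " + PREFIX
    + "resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok",
]

# One output row per host and second: count, then the resources seen.
for (host, second), names in sorted(monitors_per_second(SAMPLE).items()):
    print(host, second, len(names), " ".join(names))
```

Passing the full grep output (for example the lines read from a file) in place of SAMPLE gives the same tally for a whole test run, which makes it easier to compare the three LRMD_MAX_CHILDREN settings.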