Re: [Linux-HA] Antw: What about "start-delay" attribute status ?

alain . moulle Tue, 22 Nov 2011 07:29:26 -0800

Hi again,

That's strange, because I ran tests around this LRMD_MAX_CHILDREN
parameter with 24 Dummy resources, i.e. resources that do almost nothing,
so Pacemaker should start all of them at almost the same time, one after
the other. The monitor operations should then also run at almost the same
time, one after the other.
First I tested with no LRMD_MAX_CHILDREN in /etc/sysconfig/pacemaker, so
with the default value, which is probably 4 as you told me. Then I set it
to 2, restarted Pacemaker and ran the same test. Finally I set it to 24
(just as a textbook case) and ran the same test again.
The result is the same for all three tests: once all 24 Dummy resources
are started, as you can see below, the monitor operations seem to be
grouped by 4, whatever the LRMD_MAX_CHILDREN value. My understanding was
that the monitor operations should have been parallelized across almost
all 24 resources, since a monitor takes a very short time to complete ...

Where am I wrong?

[root@cuzco4 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname9_monitor_20000 (call=241, rc=0, cib-update=441, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname11_monitor_20000 (call=242, rc=0, cib-update=442, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname13_monitor_20000 (call=243, rc=0, cib-update=443, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname17_monitor_20000 (call=244, rc=0, cib-update=444, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname23_monitor_20000 (call=245, rc=0, cib-update=445, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname21_monitor_20000 (call=246, rc=0, cib-update=446, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname19_monitor_20000 (call=247, rc=0, cib-update=447, confirmed=false) ok
[root@cuzco6 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname2_monitor_20000 (call=236, rc=0, cib-update=245, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname4_monitor_20000 (call=237, rc=0, cib-update=246, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname8_monitor_20000 (call=238, rc=0, cib-update=247, confirmed=false) ok
1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname6_monitor_20000 (call=239, rc=0, cib-update=248, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname10_monitor_20000 (call=240, rc=0, cib-update=249, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname16_monitor_20000 (call=241, rc=0, cib-update=250, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname14_monitor_20000 (call=242, rc=0, cib-update=251, confirmed=false) ok
1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname12_monitor_20000 (call=243, rc=0, cib-update=252, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname24_monitor_20000 (call=244, rc=0, cib-update=253, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname22_monitor_20000 (call=245, rc=0, cib-update=254, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname20_monitor_20000 (call=246, rc=0, cib-update=255, confirmed=false) ok
1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname18_monitor_20000 (call=247, rc=0, cib-update=256, confirmed=false) ok
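
One way to check the grouping described above is to count monitor completions per second. The following is a hypothetical Python sketch (the function name is illustrative), assuming the syslog format shown here with one entry per line, as syslog writes them, and a Unix timestamp as the first field. It counts `process_lrm_event` lines reporting a successful monitor. On the entries above it yields 4 completions in each of three consecutive seconds on cuzco4 (1321975309 to 1321975311) and the same pattern on cuzco6 (1321975347 to 1321975349). Note that these are the times at which crmd logged the results, not the times at which lrmd started the operations.

```python
import re
import sys
from collections import Counter

# Matches crmd result lines such as the ones quoted above, e.g.
# "<epoch> ... LRM operation resname1_monitor_20000 (call=236, rc=0, ...) ok"
MONITOR_OK = re.compile(
    r"^(?P<epoch>\d+) .*LRM operation (?P<op>\S+_monitor_\d+) "
    r"\(call=(?P<call>\d+), rc=0, .*\) ok$"
)


def completions_per_second(lines):
    """Count successful monitor completions per epoch second (first field)."""
    counts = Counter()
    for line in lines:
        m = MONITOR_OK.match(line.strip())
        if m:
            counts[int(m.group("epoch"))] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Usage: python this_script.py /var/log/syslog [more logs ...]
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, completions_per_second(f))
```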

Alain




From:    Dejan Muhamedagic <deja...@fastmail.fm>
To:      General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
Date:    22/11/2011 13:18
Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
Sent by: linux-ha-boun...@lists.linux-ha.org



Hi,

On Tue, Nov 22, 2011 at 08:17:28AM +0100, alain.mou...@bull.net wrote:
> Hi
> 
> By the way, is there a description somewhere of the parameters in
> /etc/sysconfig/pacemaker?

To the best of my knowledge, there is only LRMD_MAX_CHILDREN.

Thanks,

Dejan

> Thanks
> Alain
> 
> 
> 
> From:    Dejan Muhamedagic <deja...@fastmail.fm>
> To:      General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> Date:    21/11/2011 15:48
> Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> Sent by: linux-ha-boun...@lists.linux-ha.org
> 
> 
> 
> On Mon, Nov 21, 2011 at 03:07:43PM +0100, alain.mou...@bull.net wrote:
> > Thanks Dejan,
> > OK, I understand. So we have to choose between a small value of
> > LRMD_MAX_CHILDREN, in which case start, stop or status on 64 resources
> > will take a while ... and a big value of LRMD_MAX_CHILDREN, in which
> > case start, stop and status will at best complete very quickly because
> > they are parallelized, or at worst the system will be "on its knees" ...
> > We'll give it a try ... as I have big computers ;-)
> 
> Just note that you should try to think of every possible
> combination of resource operations. For instance, imagine 64 Xen
> VMs trying to start in parallel. It is better to be conservative than
> to push your nodes to their limit.
> 
> > But my question now is about what you wrote:
> > "Let me just add that operations which were supposed to
> > start at the same time get spaced out."
> > So if LRMD_MAX_CHILDREN=4 and I ask for a start on 32 resources at the
> > same time, Pacemaker will manage 4, delay the remaining 28, manage 4
> > again, etc., so it will be completed in 8 shots, right?
> 
> No.
> 
> > But what is the delay between shots?
> 
> There is none. As soon as one operation finishes, another one
> gets started. Now, if you have say four big RDBMS instances
> starting and each of them takes five minutes or so, the other
> resources will obviously stay in the queue for five minutes.
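
The difference between the "8 shots" model and the behaviour described in this reply can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not lrmd code: operations start in first-in, first-out order, their durations are known in advance (the values are hypothetical), and lrmd's own overhead is ignored. `batch_starts` models the question, where each batch of LRMD_MAX_CHILDREN operations waits for the whole previous batch; `sliding_window_starts` models a queued operation starting as soon as any slot frees up, with no fixed delay.

```python
import heapq


def batch_starts(durations, max_children):
    """Start times under the "shots" model: each batch of max_children
    operations waits until the whole previous batch is done."""
    starts, clock = [], 0.0
    for i in range(0, len(durations), max_children):
        batch = durations[i:i + max_children]
        starts.extend([clock] * len(batch))
        clock += max(batch)  # next batch waits for the slowest member
    return starts


def sliding_window_starts(durations, max_children):
    """Start times when a queued operation starts as soon as any running one
    finishes (no fixed delay), assuming first-in, first-out order."""
    running = []  # min-heap of finish times of operations holding a slot
    starts, clock = [], 0.0
    for d in durations:
        if len(running) == max_children:
            # All slots busy: wait for the earliest running operation to finish.
            clock = max(clock, heapq.heappop(running))
        starts.append(clock)
        heapq.heappush(running, clock + d)
    return starts


if __name__ == "__main__":
    # Hypothetical workload: one slow start (think of a big database) and
    # 31 quick ones, with LRMD_MAX_CHILDREN=4.
    durations = [300] + [1] * 31
    print("last start, batches:       ", max(batch_starts(durations, 4)))
    print("last start, sliding window:", max(sliding_window_starts(durations, 4)))
```

With one 300-second operation, 31 one-second operations and 4 slots, the last queued operation starts at t=10 s in the sliding-window model but only at t=306 s in the batch model, because batching keeps three slots idle while the slow operation runs. If all slots are held by slow operations, as in the four-RDBMS example above, both models make the queue wait equally long.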
> 
> Anyway, you can see for yourself on cluster start: just grep
> your logs for lrmd:.*rsc:, which should show you the timestamps
> at which each operation was started (apart from recurring
> monitors).
> 
> Thanks,
> 
> Dejan
> 
> > Thanks
> > Alain
> > 
> > 
> > 
> > 
> > From:    Dejan Muhamedagic <deja...@fastmail.fm>
> > To:      General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> > Date:    21/11/2011 13:45
> > Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> > Sent by: linux-ha-boun...@lists.linux-ha.org
> > 
> > 
> > 
> > Hi,
> > 
> > On Mon, Nov 21, 2011 at 01:42:15PM +0100, alain.mou...@bull.net wrote:
> > > Hi Florian,
> > > OK, I've checked the thread. So that means that on RHEL6, if I have,
> > > let's say, 32 resource groups of 2 primitives on each node, I can set
> > > the LRMD_MAX_CHILDREN environment variable in
> > > /etc/sysconfig/pacemaker to 64?
> > 
> > The number of resources shouldn't be the main criterion for
> > setting this parameter; rather, it is what your nodes can handle
> > without being overloaded. So 64 sounds like you have some really
> > big computers :) It also depends on the nature of the cluster
> > resources. The default of 4 is rather conservative; perhaps
> > nowadays 8 would be better.
> > 
> > > Is that acceptable for lrmd and Pacemaker? Or will we face any
> > > side effects?
> > 
> > LRMD_MAX_CHILDREN is the maximum number of resource operations
> > allowed to run in parallel. Hope that that answers your question.
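
As an analogy only (this is not how lrmd is implemented), a cap on parallel operations behaves like a counting semaphore with LRMD_MAX_CHILDREN permits: every operation must hold a permit while it runs, and the rest wait in line. The hypothetical Python sketch below starts `n_ops` dummy operations as threads, guards them with `threading.BoundedSemaphore(max_children)`, and reports the highest number observed running at once, which can never exceed `max_children`.

```python
import threading
import time


def run_operations(n_ops, max_children, op_seconds=0.01):
    """Run n_ops dummy "operations", at most max_children at a time.

    Returns the peak number of operations observed running concurrently.
    """
    slots = threading.BoundedSemaphore(max_children)
    lock = threading.Lock()  # protects the two counters below
    running = 0
    peak = 0

    def operation():
        nonlocal running, peak
        with slots:  # blocks until one of the max_children slots is free
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(op_seconds)  # stand-in for the resource agent's work
            with lock:
                running -= 1

    threads = [threading.Thread(target=operation) for _ in range(n_ops)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("peak concurrency with 24 operations and 4 slots:", run_operations(24, 4))
```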
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks
> > > Alain
> > > 
> > > 
> > > 
> > > From:    Florian Haas <flor...@hastexo.com>
> > > To:      General Linux-HA mailing list <linux-ha@lists.linux-ha.org>
> > > Date:    21/11/2011 12:58
> > > Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> > > Sent by: linux-ha-boun...@lists.linux-ha.org
> > > 
> > > 
> > > 
> > > On 11/21/11 13:03, alain.mou...@bull.net wrote:
> > > > Hi,
> > > > Yes, that's exactly the purpose of my question (and exactly the same
> > > > problem as the "big monitoring trains"): whether we can still use
> > > > start-delay to randomize the time of the first monitor operation on
> > > > all the resources of a server. But if it is really deprecated, that
> > > > means that in the future this option will no longer be handled by
> > > > Pacemaker (perhaps that is already the case ... ?), so in that case
> > > > we must not use this option.
> > > > 
> > > > Could someone give us a clear status on this "start-delay" option?
> > > 
> > > If your RA needs it, then the RA is most likely broken. :)
> > > 
> > > For monitor operations allegedly piling up, please consider this:
> > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/76152#76152
> > > 
> > > Hope this helps.
> > > Cheers,
> > > Florian
> > > 
> > > -- 
> > > Need help with High Availability?
> > > http://www.hastexo.com/now
> > > _______________________________________________
> > > Linux-HA mailing list
> > > Linux-HA@lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems