Sent: Wed Jan 12 2011 01:56:31 GMT-0700 (Mountain Standard Time)
From: Lars Ellenberg <lars.ellenb...@linbit.com>
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Speed up resource failover?
On Wed, Jan 12, 2011 at 09:30:41AM +0100, Robert van Leeuwen wrote:
-----Original message-----
To: pacemaker@oss.clusterlabs.org; From: Patrick H. <pacema...@feystorm.net>
Sent:   Wed 12-01-2011 00:06
Subject:        [Pacemaker] Speed up resource failover?
Attachment:     inline.txt
As it is right now, Pacemaker seems to take a long time (in computer terms) to fail over resources from one node to the other. I have 477 IPaddr2 resources evenly distributed among 2 nodes. When I put one node in standby, it takes approximately 5 minutes to move that node's half of them to the other node. And before you ask, they're there because of SSL HTTP virtual hosting. I have no order rules, colocations or anything else on those resources, so it should be able to migrate the entire list simultaneously, but it seems to do them sequentially. Is there any way to make it migrate the resources in parallel? Or at the very least speed it up?
Patrick,

It's probably not so much the cluster suite; it has to do with the specific resource script. For a proper takeover of an IP you have to do an ARP "deregister/register".
This will take a few seconds.
This is apparently not true :-/
I have attached a portion of the lrmd log showing an example of this. Notice that on the very first line it starts the vip_55.63 resource, and then immediately on the next line the start process exits successfully. Another point of note is that, somehow, after the script has already exited, lrmd logs the stderr output from it. I'm not sure if it's just delayed logging or what. However, even if the script is still running, notice that there is a huge time gap between 16:11:01 and 16:11:25 where it's just sitting there doing nothing. I even ran a series of `ps` commands to watch the processes: it starts up a bunch of them, they all exit, and then it sits there for a long period before starting up more. So it is definitely not the resource script slowing it down.

Also, in the log, notice that it's only starting up a few scripts every second. It should be able to fire off every single script at the exact same time.

As long as a resource script is busy the cluster suite will not start the next 
action.
Parallel execution is not possible in the cluster suite as far as I know.
(without being a programmer myself I would expect it is pretty tricky to implement 
parallelization "code-wise" and make 100% sure the cluster does not break)

You could consider editing the IPaddr2 resource script so it does not wait for 
the arp commands.
At your own risk of course ;-)

There is the cluster option "batch-limit" (in the CIB), see
"Configuration Explained".
And there is the lrmd "max-children" setting (can be set in some /etc/default/ or
/etc/sysconfig file, should be set by the init script).
You can set it manually with lrmadmin -p max-children $some_number
That should help you a bit.
But don't overdo it. Raise them slowly ;-)


The documentation says batch-limit defaults to 30, which seems like a sane value. I tried playing with max-children and upped it to 30 as well, but to no effect. It does seem to be launching 30 instances of the IPaddr2 script at a time (as can be seen from the attached log), but the problem is apparently that it's sitting there for long periods of time before starting up the next batch. I would think that when one of the 30 completes, it would launch another to take its place. But instead it launches 30, then sits there for a while, then launches another 30.
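
As a rough consistency check on the "launch 30, wait, launch 30" pattern, the numbers in this thread can be multiplied out. This is only a back-of-the-envelope sketch: the 35-second cycle is read off the attached log (first start at 16:10:50, next burst at 16:11:25) and assumes the excerpt begins at the start of a burst; the variable names are made up for illustration, and stop operations on the standby node are ignored.

```python
import math

total_resources = 477          # from the original post
moving = total_resources // 2  # roughly half live on the node put in standby
batch_size = 30                # starts per burst in the log (op ids 1390..1419)
cycle_s = 35                   # assumed: 16:10:50 -> 16:11:25, burst to burst

# Number of bursts needed if starts only ever happen in groups of batch_size.
batches = math.ceil(moving / batch_size)
estimate_s = batches * cycle_s

print(f"{moving} resources, {batches} bursts, ~{estimate_s} s "
      f"(~{estimate_s / 60:.1f} min)")
```

Under those assumptions the estimate lands close to the roughly 5 minutes reported, which suggests the idle time between bursts, not the IPaddr2 script itself, accounts for most of the failover time.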
Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.63:1390: start
Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.63:start process 8826 
exited with return code 0.
Jan 12 16:10:50 ha01 lrmd: [4707]: info: rsc:vip_55.65:1391: start
Jan 12 16:10:50 ha01 lrmd: [4707]: info: Managed vip_55.65:start process 8878 
exited with return code 0.
Jan 12 16:10:51 ha01 lrmd: [4707]: info: rsc:vip_55.67:1392: start
Jan 12 16:10:51 ha01 lrmd: [4707]: info: Managed vip_55.67:start process 8926 
exited with return code 0.
Jan 12 16:10:51 ha01 lrmd: [4707]: info: rsc:vip_55.69:1393: start
Jan 12 16:10:51 ha01 lrmd: [4707]: info: Managed vip_55.69:start process 8976 
exited with return code 0.
Jan 12 16:10:51 ha01 lrmd: [4707]: info: rsc:vip_55.71:1394: start
Jan 12 16:10:51 ha01 lrmd: [4707]: info: Managed vip_55.71:start process 9024 
exited with return code 0.
Jan 12 16:10:51 ha01 lrmd: [4707]: info: rsc:vip_55.73:1395: start
Jan 12 16:10:51 ha01 lrmd: [4707]: info: Managed vip_55.73:start process 9072 
exited with return code 0.
Jan 12 16:10:51 ha01 lrmd: [4707]: info: rsc:vip_55.75:1396: start
Jan 12 16:10:51 ha01 lrmd: [4707]: info: Managed vip_55.75:start process 9120 
exited with return code 0.
Jan 12 16:10:52 ha01 lrmd: [4707]: info: rsc:vip_55.77:1397: start
Jan 12 16:10:52 ha01 lrmd: [4707]: info: Managed vip_55.77:start process 9170 
exited with return code 0.
Jan 12 16:10:52 ha01 lrmd: [4707]: info: rsc:vip_55.79:1398: start
Jan 12 16:10:52 ha01 lrmd: [4707]: info: Managed vip_55.79:start process 9218 
exited with return code 0.
Jan 12 16:10:52 ha01 lrmd: [4707]: info: rsc:vip_55.81:1399: start
Jan 12 16:10:52 ha01 lrmd: [4707]: info: Managed vip_55.81:start process 9266 
exited with return code 0.
Jan 12 16:10:52 ha01 lrmd: [4707]: info: rsc:vip_55.83:1400: start
Jan 12 16:10:53 ha01 lrmd: [4707]: info: Managed vip_55.83:start process 9316 
exited with return code 0.
Jan 12 16:10:53 ha01 lrmd: [4707]: info: rsc:vip_55.85:1401: start
Jan 12 16:10:53 ha01 lrmd: [4707]: info: Managed vip_55.85:start process 9364 
exited with return code 0.
Jan 12 16:10:53 ha01 lrmd: [4707]: info: rsc:vip_55.87:1402: start
Jan 12 16:10:53 ha01 lrmd: [4707]: info: Managed vip_55.87:start process 9412 
exited with return code 0.
Jan 12 16:10:53 ha01 lrmd: [4707]: info: rsc:vip_55.89:1403: start
Jan 12 16:10:53 ha01 lrmd: [4707]: info: Managed vip_55.89:start process 9460 
exited with return code 0.
Jan 12 16:10:53 ha01 lrmd: [4707]: info: rsc:vip_55.91:1404: start
Jan 12 16:10:53 ha01 lrmd: [4707]: info: Managed vip_55.91:start process 9510 
exited with return code 0.
Jan 12 16:10:54 ha01 lrmd: [4707]: info: rsc:vip_55.93:1405: start
Jan 12 16:10:54 ha01 lrmd: [4707]: info: Managed vip_55.93:start process 9558 
exited with return code 0.
Jan 12 16:10:54 ha01 lrmd: [4707]: info: rsc:vip_55.95:1406: start
Jan 12 16:10:54 ha01 lrmd: [4707]: info: Managed vip_55.95:start process 9606 
exited with return code 0.
Jan 12 16:10:54 ha01 lrmd: [4707]: info: rsc:vip_55.97:1407: start
Jan 12 16:10:54 ha01 lrmd: [4707]: info: Managed vip_55.97:start process 9656 
exited with return code 0.
Jan 12 16:10:54 ha01 lrmd: [4707]: info: rsc:vip_55.99:1408: start
Jan 12 16:10:54 ha01 lrmd: [4707]: info: RA output: (vip_55.63:start:stderr) 
ARPING 165.212.55.63 from 165.212.55.63 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:54 ha01 lrmd: [4707]: info: Managed vip_55.99:start process 9704 
exited with return code 0.
Jan 12 16:10:54 ha01 lrmd: [4707]: info: rsc:vip_55.101:1409: start
Jan 12 16:10:54 ha01 lrmd: [4707]: info: RA output: (vip_55.65:start:stderr) 
ARPING 165.212.55.65 from 165.212.55.65 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:54 ha01 lrmd: [4707]: info: Managed vip_55.101:start process 9752 
exited with return code 0.
Jan 12 16:10:55 ha01 lrmd: [4707]: info: RA output: (vip_55.67:start:stderr) 
ARPING 165.212.55.67 from 165.212.55.67 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:55 ha01 lrmd: [4707]: info: rsc:vip_55.103:1410: start
Jan 12 16:10:55 ha01 lrmd: [4707]: info: Managed vip_55.103:start process 9802 
exited with return code 0.
Jan 12 16:10:55 ha01 lrmd: [4707]: info: rsc:vip_55.105:1411: start
Jan 12 16:10:55 ha01 lrmd: [4707]: info: RA output: (vip_55.69:start:stderr) 
ARPING 165.212.55.69 from 165.212.55.69 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:55 ha01 lrmd: [4707]: info: Managed vip_55.105:start process 9850 
exited with return code 0.
Jan 12 16:10:55 ha01 lrmd: [4707]: info: rsc:vip_55.107:1412: start
Jan 12 16:10:55 ha01 lrmd: [4707]: info: RA output: (vip_55.71:start:stderr) 
ARPING 165.212.55.71 from 165.212.55.71 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:55 ha01 lrmd: [4707]: info: Managed vip_55.107:start process 9898 
exited with return code 0.
Jan 12 16:10:55 ha01 lrmd: [4707]: info: rsc:vip_55.109:1413: start
Jan 12 16:10:55 ha01 lrmd: [4707]: info: RA output: (vip_55.73:start:stderr) 
ARPING 165.212.55.73 from 165.212.55.73 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:55 ha01 lrmd: [4707]: info: Managed vip_55.109:start process 9946 
exited with return code 0.
Jan 12 16:10:55 ha01 lrmd: [4707]: info: rsc:vip_55.111:1414: start
Jan 12 16:10:55 ha01 lrmd: [4707]: info: RA output: (vip_55.75:start:stderr) 
ARPING 165.212.55.75 from 165.212.55.75 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:55 ha01 lrmd: [4707]: info: Managed vip_55.111:start process 9997 
exited with return code 0.
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.113:1415: start
Jan 12 16:10:56 ha01 lrmd: [4707]: info: RA output: (vip_55.77:start:stderr) 
ARPING 165.212.55.77 from 165.212.55.77 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:56 ha01 lrmd: [4707]: info: Managed vip_55.113:start process 10045 
exited with return code 0.
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.115:1416: start
Jan 12 16:10:56 ha01 lrmd: [4707]: info: Managed vip_55.115:start process 10093 
exited with return code 0.
Jan 12 16:10:56 ha01 lrmd: [4707]: info: RA output: (vip_55.79:start:stderr) 
ARPING 165.212.55.79 from 165.212.55.79 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.117:1417: start
Jan 12 16:10:56 ha01 lrmd: [4707]: info: Managed vip_55.117:start process 10143 
exited with return code 0.
Jan 12 16:10:56 ha01 lrmd: [4707]: info: RA output: (vip_55.81:start:stderr) 
ARPING 165.212.55.81 from 165.212.55.81 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.119:1418: start
Jan 12 16:10:57 ha01 lrmd: [4707]: info: RA output: (vip_55.83:start:stderr) 
ARPING 165.212.55.83 from 165.212.55.83 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:57 ha01 lrmd: [4707]: info: Managed vip_55.119:start process 10191 
exited with return code 0.
Jan 12 16:10:57 ha01 lrmd: [4707]: info: rsc:vip_55.121:1419: start
Jan 12 16:10:57 ha01 lrmd: [4707]: info: RA output: (vip_55.85:start:stderr) 
ARPING 165.212.55.85 from 165.212.55.85 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:57 ha01 lrmd: [4707]: info: Managed vip_55.121:start process 10239 
exited with return code 0.
Jan 12 16:10:57 ha01 lrmd: [4707]: info: RA output: (vip_55.87:start:stderr) 
ARPING 165.212.55.87 from 165.212.55.87 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:57 ha01 lrmd: [4707]: info: RA output: (vip_55.89:start:stderr) 
ARPING 165.212.55.89 from 165.212.55.89 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:57 ha01 lrmd: [4707]: info: RA output: (vip_55.91:start:stderr) 
ARPING 165.212.55.91 from 165.212.55.91 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:58 ha01 lrmd: [4707]: info: RA output: (vip_55.93:start:stderr) 
ARPING 165.212.55.93 from 165.212.55.93 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:58 ha01 lrmd: [4707]: info: RA output: (vip_55.95:start:stderr) 
ARPING 165.212.55.95 from 165.212.55.95 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:58 ha01 lrmd: [4707]: info: RA output: (vip_55.97:start:stderr) 
ARPING 165.212.55.97 from 165.212.55.97 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:58 ha01 lrmd: [4707]: info: RA output: (vip_55.99:start:stderr) 
ARPING 165.212.55.99 from 165.212.55.99 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:58 ha01 lrmd: [4707]: info: RA output: (vip_55.101:start:stderr) 
ARPING 165.212.55.101 from 165.212.55.101 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:59 ha01 lrmd: [4707]: info: RA output: (vip_55.103:start:stderr) 
ARPING 165.212.55.103 from 165.212.55.103 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:59 ha01 lrmd: [4707]: info: RA output: (vip_55.105:start:stderr) 
ARPING 165.212.55.105 from 165.212.55.105 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:59 ha01 lrmd: [4707]: info: RA output: (vip_55.107:start:stderr) 
ARPING 165.212.55.107 from 165.212.55.107 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:10:59 ha01 lrmd: [4707]: info: RA output: (vip_55.109:start:stderr) 
ARPING 165.212.55.109 from 165.212.55.109 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:00 ha01 lrmd: [4707]: info: RA output: (vip_55.111:start:stderr) 
ARPING 165.212.55.111 from 165.212.55.111 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:00 ha01 lrmd: [4707]: info: RA output: (vip_55.113:start:stderr) 
ARPING 165.212.55.113 from 165.212.55.113 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:00 ha01 lrmd: [4707]: info: RA output: (vip_55.115:start:stderr) 
ARPING 165.212.55.115 from 165.212.55.115 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:00 ha01 lrmd: [4707]: info: RA output: (vip_55.117:start:stderr) 
ARPING 165.212.55.117 from 165.212.55.117 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:01 ha01 lrmd: [4707]: info: RA output: (vip_55.119:start:stderr) 
ARPING 165.212.55.119 from 165.212.55.119 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:01 ha01 lrmd: [4707]: info: RA output: (vip_55.121:start:stderr) 
ARPING 165.212.55.121 from 165.212.55.121 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:25 ha01 lrmd: [4707]: info: rsc:vip_55.123:1420: start
Jan 12 16:11:25 ha01 lrmd: [4707]: info: Managed vip_55.123:start process 10445 
exited with return code 0.
Jan 12 16:11:25 ha01 lrmd: [4707]: info: rsc:vip_55.125:1421: start
Jan 12 16:11:25 ha01 lrmd: [4707]: info: Managed vip_55.125:start process 10497 
exited with return code 0.
Jan 12 16:11:26 ha01 lrmd: [4707]: info: rsc:vip_55.127:1422: start
Jan 12 16:11:26 ha01 lrmd: [4707]: info: Managed vip_55.127:start process 10546 
exited with return code 0.
Jan 12 16:11:26 ha01 lrmd: [4707]: info: rsc:vip_55.129:1423: start
Jan 12 16:11:26 ha01 lrmd: [4707]: info: Managed vip_55.129:start process 10594 
exited with return code 0.
Jan 12 16:11:26 ha01 lrmd: [4707]: info: rsc:vip_55.131:1424: start
Jan 12 16:11:26 ha01 lrmd: [4707]: info: Managed vip_55.131:start process 10642 
exited with return code 0.
Jan 12 16:11:26 ha01 lrmd: [4707]: info: rsc:vip_55.133:1425: start
Jan 12 16:11:26 ha01 lrmd: [4707]: info: Managed vip_55.133:start process 10690 
exited with return code 0.
Jan 12 16:11:27 ha01 lrmd: [4707]: info: rsc:vip_55.135:1426: start
Jan 12 16:11:27 ha01 lrmd: [4707]: info: Managed vip_55.135:start process 10738 
exited with return code 0.
Jan 12 16:11:27 ha01 lrmd: [4707]: info: rsc:vip_55.137:1427: start
Jan 12 16:11:27 ha01 lrmd: [4707]: info: Managed vip_55.137:start process 10786 
exited with return code 0.
Jan 12 16:11:27 ha01 lrmd: [4707]: info: rsc:vip_55.139:1428: start
Jan 12 16:11:27 ha01 lrmd: [4707]: info: Managed vip_55.139:start process 10834 
exited with return code 0.
Jan 12 16:11:27 ha01 lrmd: [4707]: info: rsc:vip_55.141:1429: start
Jan 12 16:11:27 ha01 lrmd: [4707]: info: Managed vip_55.141:start process 10882 
exited with return code 0.
Jan 12 16:11:27 ha01 lrmd: [4707]: info: rsc:vip_55.143:1430: start
Jan 12 16:11:27 ha01 lrmd: [4707]: info: Managed vip_55.143:start process 10930 
exited with return code 0.
Jan 12 16:11:28 ha01 lrmd: [4707]: info: rsc:vip_55.145:1431: start
Jan 12 16:11:28 ha01 lrmd: [4707]: info: Managed vip_55.145:start process 10978 
exited with return code 0.
Jan 12 16:11:28 ha01 lrmd: [4707]: info: rsc:vip_55.147:1432: start
Jan 12 16:11:28 ha01 lrmd: [4707]: info: Managed vip_55.147:start process 11026 
exited with return code 0.
Jan 12 16:11:28 ha01 lrmd: [4707]: info: rsc:vip_55.149:1433: start
Jan 12 16:11:28 ha01 lrmd: [4707]: info: Managed vip_55.149:start process 11074 
exited with return code 0.
Jan 12 16:11:28 ha01 lrmd: [4707]: info: rsc:vip_55.151:1434: start
Jan 12 16:11:28 ha01 lrmd: [4707]: info: Managed vip_55.151:start process 11122 
exited with return code 0.
Jan 12 16:11:29 ha01 lrmd: [4707]: info: rsc:vip_55.153:1435: start
Jan 12 16:11:29 ha01 lrmd: [4707]: info: Managed vip_55.153:start process 11170 
exited with return code 0.
Jan 12 16:11:29 ha01 lrmd: [4707]: info: rsc:vip_55.155:1436: start
Jan 12 16:11:29 ha01 lrmd: [4707]: info: Managed vip_55.155:start process 11218 
exited with return code 0.
Jan 12 16:11:29 ha01 lrmd: [4707]: info: rsc:vip_55.157:1437: start
Jan 12 16:11:29 ha01 lrmd: [4707]: info: Managed vip_55.157:start process 11266 
exited with return code 0.
Jan 12 16:11:29 ha01 lrmd: [4707]: info: RA output: (vip_55.123:start:stderr) 
ARPING 165.212.55.123 from 165.212.55.123 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:29 ha01 lrmd: [4707]: info: rsc:vip_55.159:1438: start
Jan 12 16:11:29 ha01 lrmd: [4707]: info: Managed vip_55.159:start process 11314 
exited with return code 0.
Jan 12 16:11:29 ha01 lrmd: [4707]: info: RA output: (vip_55.125:start:stderr) 
ARPING 165.212.55.125 from 165.212.55.125 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
Jan 12 16:11:29 ha01 lrmd: [4707]: info: rsc:vip_55.161:1439: start
Jan 12 16:11:29 ha01 lrmd: [4707]: info: Managed vip_55.161:start process 11362 
exited with return code 0.
Jan 12 16:11:30 ha01 lrmd: [4707]: info: RA output: (vip_55.127:start:stderr) 
ARPING 165.212.55.127 from 165.212.55.127 eth0 Sent 5 probes (5 broadcast(s)) 
Received 0 response(s)
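
The stall described above can also be located mechanically rather than by eye. The following is a minimal sketch, not part of any Pacemaker tooling: `find_stalls`, `START_RE` and the 5-second threshold are hypothetical names and values chosen for illustration, and the sample lines are copied from the log above. It measures the time between consecutive "start" launches, so it reports the gap from the last start of one burst to the first start of the next (28 s here), which is slightly longer than the 16:11:01 to 16:11:25 silence mentioned earlier.

```python
import re
from datetime import datetime

# A few lines copied from the lrmd log in this thread, wrapping left as posted.
SAMPLE_LOG = """\
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.117:1417: start
Jan 12 16:10:56 ha01 lrmd: [4707]: info: rsc:vip_55.119:1418: start
Jan 12 16:10:57 ha01 lrmd: [4707]: info: rsc:vip_55.121:1419: start
Jan 12 16:11:01 ha01 lrmd: [4707]: info: RA output: (vip_55.121:start:stderr)
ARPING 165.212.55.121 from 165.212.55.121 eth0 Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Jan 12 16:11:25 ha01 lrmd: [4707]: info: rsc:vip_55.123:1420: start
Jan 12 16:11:25 ha01 lrmd: [4707]: info: rsc:vip_55.125:1421: start
"""

# Matches only "rsc:<name>:<op id>: start" lines, i.e. lrmd launching a start.
START_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ lrmd: \[\d+\]: info: rsc:(\S+):(\d+): start$"
)

def find_stalls(log_text, threshold_s=5):
    """Return (previous resource, next resource, gap in seconds) per long gap."""
    starts = []
    for line in log_text.splitlines():
        m = START_RE.match(line.strip())
        if m:
            # syslog timestamps carry no year; any fixed year works for
            # differences within the same day.
            ts = datetime.strptime("2011 " + m.group(1), "%Y %b %d %H:%M:%S")
            starts.append((ts, m.group(2)))
    stalls = []
    for (t0, r0), (t1, r1) in zip(starts, starts[1:]):
        gap = (t1 - t0).total_seconds()
        if gap >= threshold_s:
            stalls.append((r0, r1, gap))
    return stalls

stalls = find_stalls(SAMPLE_LOG)
for prev_rsc, next_rsc, gap in stalls:
    print(f"{gap:.0f}s between start of {prev_rsc} and start of {next_rsc}")
```

Run against the full log file instead of the sample, the same function would list every pause between bursts, which makes it easier to see whether changing batch-limit or max-children alters the pattern.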

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
