Hi,
On Thu, Aug 23, 2012 at 04:47:11PM -0400, David Parker wrote:
On 08/23/2012 04:19 PM, Jake Smith wrote:
Okay, I think I've almost got this. I updated my Pacemaker config and made a few changes. I put the MysqlIP and mysqld primitives into a resource group called mysqld-resources, ordered them such that mysqld will always wait for MysqlIP to be ready first, and added constraints to make ha1 the preferred host for the mysqld-resources group and ha2 the failover host. I also created STONITH devices for both ha1 and ha2, and added constraints to fix the STONITH location issues. My new constraints section looks like this:
<constraints>
  <rsc_location id="loc-1" rsc="stonith-ha1" node="ha2" score="INFINITY"/>
  <rsc_location id="loc-2" rsc="stonith-ha2" node="ha1" score="INFINITY"/>
Don't need the 2 above as long as you have the 2 negative locations below for the stonith resources. I prefer the negative ones below because if you ever expanded to more than 2 nodes, the stonith for any node could run on any node but itself.
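For what it's worth (a sketch only; ha3 and stonith-ha3 are hypothetical names for an added third node), the negative-location pattern extends naturally: each fencing resource gets a single -INFINITY constraint on its own node and is free to run anywhere else:

```xml
<rsc_location id="loc-stonith-ha3" rsc="stonith-ha3" node="ha3" score="-INFINITY"/>
```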
Good call. I'll take those out of the config.
  <rsc_location id="loc-3" rsc="stonith-ha1" node="ha1" score="-INFINITY"/>
  <rsc_location id="loc-4" rsc="stonith-ha2" node="ha2" score="-INFINITY"/>
  <rsc_location id="loc-5" rsc="mysql-resources" node="ha1" score="200"/>
Don't need the 0-score one below either; the 200 above will take care of it. Pretty sure having no location constraint is the same as a 0-score location.
That was based on the example found in the documentation. If I don't have the 0-score entry, will the service still fail over?
  <rsc_location id="loc-6" rsc="mysql-resources" node="ha2" score="0"/>
</constraints>
Everything seems to work. I had the virtual IP and mysqld running on ha1, and not on ha2. I shut down ha1 using poweroff -n, and both the virtual IP and mysqld came up on ha2 almost instantly. When I powered ha1 on again, ha2 shut down the virtual IP and mysqld. The virtual IP moved over instantly; a continuous ping of the IP produced one "Time to live exceeded" message and one lost packet, but that's to be expected. However, mysqld took over 30 seconds to start up on ha1 after being stopped on ha2, and I'm not exactly sure why.
Here's the relevant log output from ha2:
Aug 23 11:42:48 ha2 crmd: [1166]: info: te_rsc_command: Initiating action 16: stop mysqld_stop_0 on ha2 (local)
Aug 23 11:42:48 ha2 crmd: [1166]: info: do_lrm_rsc_op: Performing key=16:1:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_stop_0 )
Aug 23 11:42:48 ha2 lrmd: [1163]: info: rsc:mysqld:10: stop
Aug 23 11:42:50 ha2 lrmd: [1163]: info: RA output: (mysqld:stop:stdout) Stopping MySQL daemon: mysqld_safe.
Aug 23 11:42:50 ha2 crmd: [1166]: info: process_lrm_event: LRM operation mysqld_stop_0 (call=10, rc=0, cib-update=57, confirmed=true) ok
Aug 23 11:42:50 ha2 crmd: [1166]: info: match_graph_event: Action mysqld_stop_0 (16) confirmed on ha2 (rc=0)
And here's the relevant log output from ha1:
Aug 23 11:42:47 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=8:1:7:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_monitor_0 )
Aug 23 11:42:47 ha1 lrmd: [1240]: info: rsc:mysqld:5: probe
Aug 23 11:42:47 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_monitor_0 (call=5, rc=7, cib-update=10, confirmed=true) not running
Aug 23 11:43:36 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=11:3:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_start_0 )
Aug 23 11:43:36 ha1 lrmd: [1240]: info: rsc:mysqld:11: start
Aug 23 11:43:36 ha1 lrmd: [1240]: info: RA output: (mysqld:start:stdout) Starting MySQL daemon: mysqld_safe.#012(See /usr/local/mysql/data/mysql.messages for messages).
Aug 23 11:43:36 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_start_0 (call=11, rc=0, cib-update=18, confirmed=true) ok
So, ha2 stopped mysqld at 11:42:50, but ha1 didn't start mysqld until 11:43:36, a full 46 seconds after it was stopped on ha2. Any ideas why the delay for mysqld was so long, when the MysqlIP resource moved almost instantly?
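As a quick sanity check on that arithmetic (a sketch; GNU date assumed, which supports -d for parsing time strings):

```shell
# Gap between the mysqld stop on ha2 (11:42:50) and the start on ha1
# (11:43:36), via epoch-second subtraction.
stop=$(date -u -d '11:42:50' +%s)
start=$(date -u -d '11:43:36' +%s)
echo "$((start - stop)) seconds"   # prints "46 seconds"
```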
Couple thoughts.
Are you sure both servers have the same time (in sync)?
Yep. They're both using NTP.
On ha2, did you verify that mysqld was actually done stopping at the 11:42:50 mark? I don't use mysql so I can't say from experience.
Yes, I kept checking (with ps -ef | grep mysqld) every few seconds, and it stopped running around that time. As soon as it stopped running on ha2, I started checking on ha1, and it was quite a while before mysqld started. I knew it was at least 30 seconds, and I believe it was actually 46 seconds as the logs indicate.
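For what it's worth, a timestamped version of that polling (a sketch; the iteration count and 2-second interval are arbitrary) makes the gap easier to pin down than eyeballing ps by hand:

```shell
# Poll for mysqld a few times, printing a timestamp with each check.
# The bracket in '[m]ysqld' keeps grep from matching its own process.
for i in 1 2 3; do
    date '+%H:%M:%S'
    ps -ef | grep '[m]ysqld' || echo "mysqld not running"
    sleep 2
done
```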
Just curious, but do you really want it to fail back if it's actively running on ha2?
Interesting point. I had just assumed that it was good practice to have a preferred node for a service, but I guess it doesn't matter. If I don't care which node the services run on, do I just remove the location constraints for the mysql-resources group?
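If so, a sketch of one way to do it (assuming the crm shell and the constraint ids from the config above), deleting just the two location constraints by id:

```
crm configure delete loc-5 loc-6
```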