Hi,
On Thu, Aug 23, 2012 at 04:47:11PM -0400, David Parker wrote:
>
> On 08/23/2012 04:19 PM, Jake Smith wrote:
> >>Okay, I think I've almost got this. I updated my Pacemaker config
> >>and made a few changes. I put the MysqlIP and mysqld primitives
> >>into a resource group called "mysqld-resources", ordered them so
> >>that mysqld always waits for MysqlIP to be ready first, and added
> >>constraints to make ha1 the preferred host for the mysqld-resources
> >>group and ha2 the failover host. I also created STONITH devices for
> >>both ha1 and ha2, and added constraints to fix the STONITH location
> >>issues. My new constraints section looks like this:
> >>
> >>
> >>    <rsc_location ... score="INFINITY"/>
> >>    <rsc_location ... score="INFINITY"/>
> >Don't need the 2 above as long as you have the 2 negative locations
> >below for the stonith devices. I prefer the negative form because, if
> >you ever expand beyond 2 nodes, the stonith resource for any node can
> >run on any node but itself.
>
> Good call. I'll take those out of the config.
>
> >>    <rsc_location ... score="-INFINITY"/>
> >>    <rsc_location ... score="-INFINITY"/>
> >>    <rsc_location ... score="200"/>
> >Don't need the 0 score below either - the 200 above will take care of
> >it. Pretty sure having no location constraint is the same as a 0-score
> >location.
>
> That was based on the example found in the documentation. If I
> don't have the 0 score entry, will the service still fail over?
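
For reference, a constraints section along those lines might look
roughly like this (the ids, resource names, and attribute layout are my
guesses, since the mail archive stripped the original XML):

```xml
<constraints>
  <!-- keep each stonith device off the node it fences; with negative
       constraints, the device can run on any other node if the cluster
       later grows beyond 2 nodes -->
  <rsc_location id="loc-stonith-ha1" rsc="stonith-ha1" node="ha1" score="-INFINITY"/>
  <rsc_location id="loc-stonith-ha2" rsc="stonith-ha2" node="ha2" score="-INFINITY"/>
  <!-- prefer ha1 for the group; no entry is needed for ha2, since an
       unconstrained node effectively scores 0 -->
  <rsc_location id="loc-mysqld-ha1" rsc="mysqld-resources" node="ha1" score="200"/>
</constraints>
```

With only the 200-score entry, ha1 wins placement when it is up, and
ha2 (at the implicit 0) still takes the group when ha1 is down, so
failover works without the explicit 0-score constraint.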
>
> >>    <rsc_location ... score="0"/>
> >>
> >>Everything seems to work. I had the virtual IP and mysqld running
> >>on ha1, and not on ha2. I shut down ha1 using "poweroff -n", and
> >>both the virtual IP and mysqld came up on ha2 almost instantly.
> >>When I powered ha1 on again, ha2 shut down the virtual IP and
> >>mysqld. The virtual IP moved over instantly; a continuous ping of
> >>the IP produced one "Time to live exceeded" message and one lost
> >>packet, but that's to be expected. However, mysqld took about 46
> >>seconds to start up on ha1 after being stopped on ha2, and I'm not
> >>exactly sure why.
> >>
> >>Here's the relevant log output from ha2:
> >>
> >>Aug 23 11:42:48 ha2 crmd: [1166]: info: te_rsc_command: Initiating action 16: stop mysqld_stop_0 on ha2 (local)
> >>Aug 23 11:42:48 ha2 crmd: [1166]: info: do_lrm_rsc_op: Performing key=16:1:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_stop_0 )
> >>Aug 23 11:42:48 ha2 lrmd: [1163]: info: rsc:mysqld:10: stop
> >>Aug 23 11:42:50 ha2 lrmd: [1163]: info: RA output: (mysqld:stop:stdout) Stopping MySQL daemon: mysqld_safe.
> >>Aug 23 11:42:50 ha2 crmd: [1166]: info: process_lrm_event: LRM operation mysqld_stop_0 (call=10, rc=0, cib-update=57, confirmed=true) ok
> >>Aug 23 11:42:50 ha2 crmd: [1166]: info: match_graph_event: Action mysqld_stop_0 (16) confirmed on ha2 (rc=0)
> >>
> >>And here's the relevant log output from ha1:
> >>
> >>Aug 23 11:42:47 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=8:1:7:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_monitor_0 )
> >>Aug 23 11:42:47 ha1 lrmd: [1240]: info: rsc:mysqld:5: probe
> >>Aug 23 11:42:47 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_monitor_0 (call=5, rc=7, cib-update=10, confirmed=true) not running
> >>Aug 23 11:43:36 ha1 crmd: [1243]: info: do_lrm_rsc_op: Performing key=11:3:0:ec1989a8-ff84-4fc5-9f48-88e9b285797c op=mysqld_start_0 )
> >>Aug 23 11:43:36 ha1 lrmd: [1240]: info: rsc:mysqld:11: start
> >>Aug 23 11:43:36 ha1 lrmd: [1240]: info: RA output: (mysqld:start:stdout) Starting MySQL daemon: mysqld_safe.#012(See /usr/local/mysql/data/mysql.messages for messages).
> >>Aug 23 11:43:36 ha1 crmd: [1243]: info: process_lrm_event: LRM operation mysqld_start_0 (call=11, rc=0, cib-update=18, confirmed=true) ok
> >>
> >>So, ha2 stopped mysqld at 11:42:50, but ha1 didn't start mysqld
> >>until 11:43:36, a full 46 seconds after it was stopped on ha2. Any
> >>ideas why the delay for mysqld was so long, when the MysqlIP
> >>resource moved almost instantly?
> >Couple thoughts.
> >
> >Are you sure both servers have the same time (in sync)?
>
> Yep. They're both using NTP.
>
> >On ha2, did you verify mysqld was actually done stopping at the
> >11:42:50 mark? I don't use mysql, so I can't say from experience.
>
> Yes, I kept checking (with "ps -ef | grep mysqld") every few
> seconds, and it stopped running around that time. As soon as it
> stopped running on ha2, I started checking on ha1, and it was quite
> a while before mysqld started. I knew it was at least 30 seconds,
> and I believe it was actually 46 seconds, as the logs indicate.
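
The arithmetic from the logs does come out to 46 seconds (a quick
check with GNU date; this assumes the two clocks really are in sync,
as you say):

```shell
# gap between the ha2 stop and the ha1 start timestamps from the logs
stop=$(date -d "11:42:50" +%s)    # mysqld_stop_0 confirmed on ha2
start=$(date -d "11:43:36" +%s)   # mysqld_start_0 performed on ha1
echo $(( start - stop ))          # prints 46
```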
>
> >Just curious, but do you really want it to fail back if it's actively
> >running on ha2?
>
> Interesting point. I had just assumed that it was good practice to
> have a preferred node for a service, but I guess it doesn't matter.
> If I don't care which node the services run on, do I just remove the
> location constraints for the "mysqld-resources" group altogether?
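
Yes - if you drop the preference, a common pattern is to remove the
location constraint and rely on resource stickiness so the group stays
wherever it started instead of migrating back. A rough sketch in crm
shell syntax (the constraint id here is a placeholder; use whatever
"crm configure show" reports for yours):

```
# remove the score-200 preference for ha1
crm configure delete loc-mysqld-ha1

# make started resources stay put rather than failing back
crm configure rsc_defaults resource-stickiness=100
```

With stickiness set, a running resource earns a bonus for its current
node, so it only moves when that node fails.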
>