Re: [Linux-HA] failover questions

Florian Haas Tue, 22 Nov 2011 11:35:33 -0800

On 11/22/11 20:18, Willi Fehler wrote:
> Hi,
> 
> I'm trying to setup a database cluster with MySQL/Redis. My problem is, 
> the failover is working if I shutdown/reboot one node.


I take it that _that_ part isn't really a problem. :)

> If I shutdown the network on one node(ifdown eth0 or ifdown eth1), the 
> failover isn't working.

No failover would be expected there. So what's "not working" here?

> If I shutdown eth0 and eth1
> the failover is working

If you shut down both your cluster communications links and you failed
to configure fencing of any kind, then you don't get any "working"
failover. Instead, you'll have your service running on both nodes.

> but if I reboot the node without network access,
> I get a split-brain.

No, you get split brain straight away, it's just that it's not detected
until you reboot (and DRBD reconnects).

> I hope you can help me.

You ignored this part of the DRBD User's Guide, and you really shouldn't
have:

http://www.drbd.org/users-guide-8.3/s-pacemaker-fencing.html
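
For reference, a minimal sketch of what that section describes, assuming
DRBD 8.3.x. The resource name "mysql" is hypothetical, and the handler
paths are the ones the DRBD packages normally install; verify both against
your own system before using this:

```
# Sketch only: "mysql" is a placeholder resource name.
resource mysql {
  disk {
    # On loss of the replication link, call the fence-peer handler
    fencing resource-only;
  }
  handlers {
    # Adds a Pacemaker constraint keeping the outdated peer from promotion
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # Removes that constraint once the peer has resynchronized
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # ... your existing device/disk/address settings stay as they are ...
}
```

Note that this is resource-level fencing only. It keeps a node with
outdated data from being promoted, but it is not a substitute for node
fencing (STONITH) in Pacemaker.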

A few other issues:

> My current setup:
> 2 nodes with CentOS-6.0
> Pacemaker

I suggest moving to Pacemaker 1.1.5 instead of using the stock 1.1.2 that
ships with 6.0.

> OpenAIS
> Corosync

I strongly recommend going with at least Corosync 1.4.1 if you're using
RRP (which you are).

> DRBD

I'll assume that that's DRBD 8.3.x as opposed to 8.4.0.

> MySQL
> Redis
> 

> crm(live)configure#primitive mysqld lsb:mysql \
>                             op monitor interval="15s"

I strongly suggest using ocf:heartbeat:mysql here instead.
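
Something along these lines might do as a starting point. Every path and
timeout below is an assumption based on a default CentOS MySQL install,
not something taken from your configuration, so adjust them to match
your my.cnf:

```
# Sketch only: all paths and timeouts here are assumed, not from your setup.
primitive mysqld ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" \
               datadir="/var/lib/mysql" \
               pid="/var/run/mysqld/mysqld.pid" \
               socket="/var/lib/mysql/mysql.sock" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="15s" timeout="30s"
```

Unlike the LSB init script, the OCF agent takes its paths as parameters,
and its monitor operation can be extended to run a test query if you also
set the test_table, test_user and test_passwd parameters.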

> crm(live)configure#primitive redisd lsb:redis \
>                             op monitor interval="15s"
> crm(live)configure#group mysql_redis fs_mysql ip_mysql_redis mysqld 
> fs_redis redisd
> crm(live)configure#location cli-prefer-mysql_redis mysql_redis \
>                     rule $id="cli-prefer-rule-mysql_redis" inf: #uname 

That looks like a leftover constraint set by "crm resource move". As long
as it is in place, it pins the group to that node with an infinite score,
so consider doing "crm resource unmove mysql_redis".

> eq ESCPDB-HA-01v.escapio.local

.local is a really poor choice for a domain name, unless you're running
a DNS-free environment and everything resolves via mDNS.

> # Please read the corosync.conf.5 manual page
> compatibility: whitetank
> 
> totem {
>          version: 2
>          secauth: off
>          threads: 0
>          rrp_mode: passive
>          interface {
>                  ringnumber: 0
>                  bindnetaddr: 10.246.214.0
>                  mcastaddr: 225.94.1.1
>                  mcastport: 5404
>          }
>          interface {
>                  ringnumber: 1
>                  bindnetaddr: 10.10.10.0
>                  mcastaddr: 225.94.2.1
>                  mcastport: 5406
>          }
> }
> 
> logging {
>          fileline: off
>          to_stderr: no
>          to_logfile: yes
>          to_syslog: yes
>          logfile: /var/log/corosync.log
>          debug: off
>          timestamp: on
>          logger_subsys {
>                  subsys: AMF
>                  debug: off
>          }
> }
> 
> amf {
>          mode: disabled
> }
> 
> service {
>    ver:       0
>    name:      pacemaker
>    use_mgmtd: yes
> }

I strongly suggest using ver: 1, which means starting pacemakerd as a
separate service instead of having Corosync spawn the Pacemaker daemons,
and disabling mgmtd.
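
As a sketch, assuming your Pacemaker packages ship pacemakerd and its init
script (the stock 1.1.2 build may not), the service section would become:

```
service {
   # ver: 1 = Corosync only loads the plugin; Pacemaker itself must be
   # started separately, after Corosync, via its own init script
   ver:       1
   name:      pacemaker
   use_mgmtd: no
}
```

With that in place, the startup order is Corosync first, then Pacemaker,
and the reverse on shutdown.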

Hope this is useful.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
