Hi Igor, I've been following this and your previous thread a little bit and have some suggestions.
What version(s) of the following packages are installed on your system: heartbeat, drbd, drbdlinks, cluster-glue, cluster-agents, and drbd0.7-module-source or drbd8-source? (These package names are based on Ubuntu Lucid, which you indicated you were running.)

How is DRBD configured on your system? Can you post your configs please? I've run heartbeat and corosync both in production using DRBD and, apart from some very occasional odd behaviour, they all work perfectly.

Are you running any kind of iptables firewall on your systems?

The first thing to establish is that DRBD is working properly and that you can manually promote / demote your resource - because if that doesn't work, nothing will.

Secondly, the DRBD resource agents changed in version 8.x; the one supplied with Heartbeat is _not_ supported according to Linbit. Instead you should use Linbit's OCF resource agent (ocf:linbit:drbd).

drbdlinks is useful, but it relies upon DRBD being started by the OS, not by the cluster manager. This is why many people use heartbeat/pacemaker: drbdlinks can still be used in a controlled manner, once the cluster has been allowed to start DRBD first. Start simple - start off just getting DRBD to go primary, and worry about drbdlinks later.

What about startup order on boot? Is heartbeat started before or after DRBD? Probably in your case (if you're using drbdlinks) heartbeat should be started _after_ DRBD (on RHEL systems it's typically the reverse).

After a reboot, what does cat /proc/drbd say on each system? That will at least confirm that DRBD is in the correct state.

Yes, you are on the right track with heartbeat or corosync - but clusters are not simple creatures, and many things can cause intermittent or downright silly problems (such as port span or auto negotiation on switches). Don't give up.
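As a rough illustration of the /proc/drbd check, here is a minimal Python sketch that pulls the connection state, roles and disk states out of that file's text. The function names (parse_proc_drbd, is_healthy) are made up for this example, and the line format it expects (a "cs:" field, then "ro:" in DRBD 8.3 or "st:" in earlier 8.x releases, then "ds:") is an assumption based on typical DRBD 8.x output, so compare it with what your nodes actually print.

```python
import re

# Assumed shape of a per-device status line in /proc/drbd (DRBD 8.x), e.g.
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
# DRBD 8.3 labels the role field "ro:"; earlier 8.x releases used "st:".
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"(?:\s+(?:ro|st):(?P<local_role>[^/\s]+)/(?P<peer_role>\S+))?"
    r"(?:\s+ds:(?P<local_disk>[^/\s]+)/(?P<peer_disk>\S+))?"
)


def parse_proc_drbd(text):
    """Return one dict per DRBD minor device found in the given text."""
    devices = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            devices.append(match.groupdict())
    return devices


def is_healthy(device):
    """Strict check: connected, with up-to-date data on both nodes."""
    return (
        device["cs"] == "Connected"
        and device["local_disk"] == "UpToDate"
        and device["peer_disk"] == "UpToDate"
    )
```

To use it, read /proc/drbd on each node after a reboot, pass the text to parse_proc_drbd(), and look at any device for which is_healthy() is false. Which node reports Primary still has to be judged against what you expect the cluster to have done.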
Best Regards,
Brett

On Wed, 2010-08-11 at 16:23 -0500, Igor Chudov wrote:
> On Wed, Aug 11, 2010 at 3:24 PM, Dimitri Maziuk <dmaz...@bmrb.wisc.edu> wrote:
> > On Wednesday 11 August 2010 15:12, Igor Chudov wrote:
> > ...
> >> At this point, I am beginning to have my doubts about this whole
> >> heartbeat system and its ability to serve for years, in what looks to
> >> me like simple configuration.
> > ...
> >
> > Well, that's kinda why I stick to 2.1.4 (also b/c it's a stock rpm on
> > centos) and v1-style config. From back when things were simple stupid.
>
> Simple stupid is exactly what I want.
>
> > As I understand it, most heartbeat work since was done on v2 features: xml,
> > resource monitoring, corosync, pacemaker... which I'm either not missing
> > (mon works just fine for monitoring) or actively don't want (xml in
> > particular).
>
> I would not mind xml if either 1) it was documented or 2) the command
> line tool was documented beyond just mentioning every field or 3) the
> GUI was working instead of not working.
>
> > When I need a 3-node cluster I'll think about those. Until then, 2.1.4 is
> > not perfect but it works well enough.
>
> My heartbeat is 3.0.3.
>
> Do you think that, say, 2.1.4 is sufficiently bug free that I could
> install it from source and just let it run forever?
>
> I mean, I just want to get that simple two node cluster to run. I am
> not trying to back up Mars to Venus and Uranus by TCP over light rays.
> If 2.1.4 is easy and works, I will just install it. I assume that it
> can work with standard Ubuntu Lucid drbd.
>
> i

--
Best Regards,
Brett Delle Grazie

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems