On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]> wrote:
> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
> cluster without Corosync.
>
> They support a DRBD service, IP address, NFS and Samba services.
>
> Things mostly work, and if I reboot one server, the other takes over.
>
> What does NOT work is that if I reboot both, then *neither* takes
> over. When they are in this state -- both running and none active --
> if I reboot one of them, then the other begins to work.
>
> This is becoming a real embarrassment for me at work and I would love
> to get some help.
>
> haresources:
> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
> nfs-kernel-server smbd
> pfs-srv4
>
> ha.cf:
> use_logd on
> udpport 12694
> keepalive 1
> warntime 15
> deadtime 20
> debug 1
> initdead 60
> bcast eth1
> node pfs-srv3
> node pfs-srv4
> auto_failback on
> crm off
>
>
> Can you experiment with a really large initdead time like 2 or 5 minutes?
> Also see if it helps to do unicast messaging?
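
For reference, this is roughly what I take those two suggestions to mean
in ha.cf. It is only a sketch: 192.0.2.2 is a placeholder for the peer's
real eth1 address, and 300 is simply the 5-minute end of the suggested
range.

    # wait up to 5 minutes after boot before declaring the peer dead
    initdead 300
    # replaces "bcast eth1"; each node would list the *other* node's address
    ucast eth1 192.0.2.2
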
Larger initdead does not help. I will try unicast tomorrow, but I doubt it
will help.

Pushkar, could you or someone else suggest some tools to troubleshoot this
issue? Right now I am poking in the dark.

> pushkar

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
