ha-log should give you a detailed picture of what each box is thinking as it starts up. On my systems, that information has always been enough to track down the problem.
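
For reference, a minimal sketch of the logging setup that produces ha-log. The file paths are the conventional defaults and are an assumption, not something taken from this thread. Because the ha.cf quoted below sets "use_logd on", the log destinations are normally read from logd's own configuration file rather than from ha.cf:

```
# /etc/logd.cf  (assumed location of the logd configuration)
# Assumed path for normal log messages:
logfile /var/log/ha-log
# Assumed path for debug messages ("debug 1" is already set in ha.cf):
debugfile /var/log/ha-debug
```

Reading both nodes' logs over the same startup window, side by side, shows what each node believed about its peer at the time.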

David Lang

On Mon, 9 Aug 2010, Igor Chudov wrote:

> Pushkar, I will be at work tomorrow (took a couple of days off) and
> will try mcast.
>
> This issue is a huge problem for us, as our old installation of what I
> am trying to replace is having issues.
>
> I am at the end of my rope and will do everything possible to resolve it.
>
> What presently bothers me is that aside from some suggestions to try
> this and that, I have no mechanism to debug this problem.
>
> Igor
>
> On Mon, Aug 9, 2010 at 12:53 PM, Pushkar Pradhan <[email protected]>
> wrote:
>>
>> ________________________________
>>
>> From: [email protected] on behalf of Igor Chudov
>> Sent: Thu 8/5/2010 9:47 PM
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines
>> are booted at the same time
>>
>> On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]>
>> wrote:
>>> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
>>> cluster without Corosync.
>>>
>>> They support a DRBD service, IP address, NFS and Samba services.
>>>
>>> Things mostly work, and if I reboot one server, the other takes over.
>>>
>>> What does NOT work is that if I reboot both, then *neither* takes
>>> over. When they are in this state -- both running and none active --
>>> if I reboot one of them, then the other begins to work.
>>>
>>> This is becoming a real embarrassment for me at work and I would love
>>> to get some help.
>>>
>>> haresources:
>>> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
>>> nfs-kernel-server smbd
>>> pfs-srv4
>>>
>>> ha.cf:
>>> use_logd on
>>> udpport 12694
>>> keepalive 1
>>> warntime 15
>>> deadtime 20
>>> debug 1
>>> initdead 60
>>> bcast eth1
>>> node pfs-srv3
>>> node pfs-srv4
>>> auto_failback on
>>> crm off
>>>
>>> Can you experiment with a really large initdead time like 2 or 5 minutes?
>>> Also see if it helps to do unicast messaging?
>>
>> Larger initdead does not help. I will try unicast tomorrow but I doubt
>> it will help.
>>
>> Pushkar, could you or someone else suggest some tools to
>> troubleshoot this issue?
>>
>> Right now I am poking in the dark.
>>
>> Igor,
>>
>> Sorry to hear that. Any luck with unicast messaging? I am interested in
>> helping you, if you want we can take this discussion offline, i.e. off the
>> HA mailing list.
>>
>> pushkar
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
