How long do you wait for one machine to become active? On initial bootup there is an extra-long delay (put in based on experience with machines that booted so fast that they would come up and decide the other box was dead before the Cisco switches had enabled their network ports).
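To rule out the timing values themselves, here is a minimal sketch that reads the timing directives from an ha.cf and checks two rules of thumb: warntime shorter than deadtime, and initdead at least twice deadtime (the latter is what I recall the ha.cf documentation recommending; treat both as assumptions). The helper names parse_ha_cf, seconds and timing_warnings are made up for this example, and the parser only handles plain "directive value" lines with values in whole or fractional seconds.

```python
# Sketch only: a simplified ha.cf reader, not Heartbeat's own parser.
# Assumes "directive value" lines and timing values given in seconds.

SAMPLE_HA_CF = """\
use_logd on
udpport 12694
keepalive 1
warntime 15
deadtime 20
debug 1
initdead 60
bcast eth1
node pfs-srv3
node pfs-srv4
auto_failback on
crm off
"""


def parse_ha_cf(text):
    """Return a dict mapping each directive to the list of its values."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def seconds(directives, name):
    """Last value of a timing directive as float seconds, or None if absent."""
    values = directives.get(name)
    return float(values[-1]) if values else None


def timing_warnings(directives):
    """Return a list of human-readable warnings about the timing settings."""
    warntime = seconds(directives, "warntime")
    deadtime = seconds(directives, "deadtime")
    initdead = seconds(directives, "initdead")
    warnings = []
    if warntime is not None and deadtime is not None and warntime >= deadtime:
        warnings.append("warntime should be shorter than deadtime")
    if initdead is not None and deadtime is not None and initdead < 2 * deadtime:
        warnings.append("initdead should be at least twice deadtime")
    return warnings


if __name__ == "__main__":
    for warning in timing_warnings(parse_ha_cf(SAMPLE_HA_CF)):
        print(warning)
```

The sample text is the ha.cf quoted further down in this thread. It passes both checks, so the timing values alone do not look like the culprit, which is why the log is the next thing to look at.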
What does ha-log show on pfs-srv3?

David Lang

On Mon, 9 Aug 2010, Nick Calvert wrote:

> Date: Mon, 9 Aug 2010 21:10:13 +0100
> From: Nick Calvert <[email protected]>
> Reply-To: General Linux-HA mailing list <[email protected]>
> To: General Linux-HA mailing list <[email protected]>
> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines
>     are booted at the same time
>
> I have the same problem. It's not a huge issue for me, but I wouldn't
> mind fixing it.
>
> If this issue is resolved, I'd appreciate it if the 'solution' is posted
> on-list.
>
> Cheers,
>
> Nick
>
> On Mon, Aug 9, 2010 at 6:53 PM, Pushkar Pradhan <[email protected]>
> wrote:
>>
>> ________________________________
>>
>> From: [email protected] on behalf of Igor Chudov
>> Sent: Thu 8/5/2010 9:47 PM
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines
>>     are booted at the same time
>>
>> On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]>
>> wrote:
>>> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
>>> cluster without Corosync.
>>>
>>> They support a DRBD service, IP address, NFS and Samba services.
>>>
>>> Things mostly work, and if I reboot one server, the other takes over.
>>>
>>> What does NOT work is that if I reboot both, then *neither* takes
>>> over. When they are in this state -- both running and none active --
>>> if I reboot one of them, then the other begins to work.
>>>
>>> This is becoming a real embarrassment for me at work and I would love
>>> to get some help.
>>>
>>> haresources:
>>> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
>>> nfs-kernel-server smbd
>>> pfs-srv4
>>>
>>> ha.cf:
>>> use_logd on
>>> udpport 12694
>>> keepalive 1
>>> warntime 15
>>> deadtime 20
>>> debug 1
>>> initdead 60
>>> bcast eth1
>>> node pfs-srv3
>>> node pfs-srv4
>>> auto_failback on
>>> crm off
>>>
>>>
>>> Can you experiment with a really large initdead time like 2 or 5 minutes?
>>> Also see if it helps to do unicast messaging?
>>
>> Larger initdead does not help. I will try unicast tomorrow but I doubt
>> it will help.
>>
>> Pushkar, could someone or someone else suggest some tools to
>> troubleshoot this issue?
>>
>> Right now I am poking in the dark.
>>
>>
>> Igor,
>>
>> Sorry to hear that. Any luck with unicast messaging? I am interested in
>> helping you; if you want, we can take this discussion offline, i.e. off
>> the HA mailing list.
>>
>> pushkar
>>

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
