Re: [Linux-HA] Heartbeat does not take over if BOTH machines are booted at the same time

David Lang Mon, 09 Aug 2010 15:08:40 -0700

ha-log should give you a detailed picture of what each box is thinking as it 
starts up. I've always been able to track down the problem with that info on my 
systems.
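
One way to read the two logs side by side is to merge them into a single 
timeline. Below is a minimal Python sketch of that idea, not a definitive tool. 
It assumes ha-log lines look like "heartbeat[1234]: 2010/08/09_15:08:40 info: 
message"; adjust the pattern if your logs differ. The function names are 
hypothetical, made up for this illustration.

```python
import re
from datetime import datetime

# Assumed ha-log line shape (an assumption, check it against your own logs):
#   heartbeat[1234]: 2010/08/09_15:08:40 info: some message
LINE_RE = re.compile(
    r"^(?P<proc>\S+?)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+):\s+(?P<msg>.*)$"
)


def parse_ha_log(lines, node):
    """Return (timestamp, node, level, message) tuples for matching lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed format
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d_%H:%M:%S")
        events.append((ts, node, m.group("level"), m.group("msg")))
    return events


def merge_timelines(*event_lists):
    """Merge per-node event lists into one list ordered by timestamp."""
    merged = [event for events in event_lists for event in events]
    merged.sort(key=lambda event: event[0])  # stable: ties keep input order
    return merged


def format_event(event):
    """Render one event as a fixed-width line for side-by-side reading."""
    ts, node, level, msg = event
    return f"{ts:%H:%M:%S} {node:<10} {level:<6} {msg}"
```

To use it, copy ha-log from both nodes to one machine, pass each open file to 
parse_ha_log() with the node name (pfs-srv3, pfs-srv4), and print 
format_event() for every entry returned by merge_timelines(). The ordering is 
only meaningful if the clocks on both nodes are in sync.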


David Lang

On Mon, 9 Aug 2010, Igor Chudov wrote:

> Pushkar, I will be at work tomorrow (took a couple of days off) and
> will try mcast.
>
> This issue is a huge problem for us, as the old installation I am
> trying to replace is having issues.
>
> I am at the end of my rope and will do everything possible to resolve it.
>
> What presently bothers me is that, aside from some suggestions to try
> this and that, I have no mechanism to debug this problem.
>
> Igor
>
> On Mon, Aug 9, 2010 at 12:53 PM, Pushkar Pradhan <[email protected]> 
> wrote:
>>
>>
>> ________________________________
>>
>> From: [email protected] on behalf of Igor Chudov
>> Sent: Thu 8/5/2010 9:47 PM
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Heartbeat does not take over if BOTH machines 
>> are booted at the same time
>>
>>
>>
>> On Thu, Aug 5, 2010 at 6:32 PM, Pushkar Pradhan <[email protected]> 
>> wrote:
>>> I set up two Ubuntu Lucid machines to serve as a two-node Heartbeat
>>> cluster without Corosync.
>>>
>>> They support a DRBD service, IP address, NFS and Samba services.
>>>
>>> Things mostly work, and if I reboot one server, the other takes over.
>>>
>>> What does NOT work is that if I reboot both, then *neither* takes
>>> over. When they are in this state -- both running and neither active --
>>> if I reboot one of them, then the other begins to work.
>>>
>>> This is becoming a real embarrassment for me at work and I would love
>>> to get some help.
>>>
>>> haresources:
>>> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
>>> nfs-kernel-server smbd
>>> pfs-srv4
>>>
>>> ha.cf:
>>> use_logd on
>>> udpport 12694
>>> keepalive 1
>>> warntime 15
>>> deadtime 20
>>> debug 1
>>> initdead 60
>>> bcast eth1
>>> node pfs-srv3
>>> node pfs-srv4
>>> auto_failback on
>>> crm off
>>>
>>>
>>> Can you experiment with a really large initdead time like 2 or 5 minutes? 
>>> Also see if it helps to do unicast messaging?
>>
>> Larger initdead does not help. I will try unicast tomorrow but I doubt
>> it will help.
>>
>> Pushkar, could you or someone else suggest some tools to
>> troubleshoot this issue?
>>
>> Right now I am poking in the dark.
>>
>>
>> Igor,
>>
>> Sorry to hear that. Any luck with unicast messaging? I am interested in 
>> helping you, if you want we can take this discussion offline, i.e. off the 
>> HA mailing list.
>>
>> pushkar
>>
>>
>>
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
