On Tue 12/18/2007 12:52 AM, Andrew Beekhof said:

>On Dec 17, 2007, at 11:28 PM, Scott Mann wrote:
>
>> On Mon 12/17/2007 1:36 AM, Andrew Beekhof said:
>>
>>> On Dec 14, 2007, at 6:31 PM, Scott Mann wrote:
>>>
>>>>
>>>> On Fri 12/14/2007 1:04 AM, Andrew Beekhof said:
>>>>
>>>>> On Dec 14, 2007, at 12:12 AM, Scott Mann wrote:
>>>>>
>>>>>>
>>>>>> On Thu 12/13/2007 3:09 PM, Andrew Beekhof said:
>>>>>>
>>>>>>> On Dec 13, 2007, at 8:11 PM, Scott Mann wrote:
>>>>>>
>>>>>>>>>>> I'm seeing about a 2.5 minute delay between the time that
>>>>>>>>>>> heartbeat starts and the time that the IP address comes up
>>>>>>>>>>> on eth0:0 (if it were 5 minutes, I'd at least have a clue).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> it depends on your configured deadtime, IIRC.
>>>>>>>>> what does ha.cf look like?
>>>>>>>>
>>>>>>>> Here's my ha.cf:
>>>>>>>>
>>>>>>>> logfacility     local0
>>>>>>>> keepalive 2
>>>>>>>> deadtime 30
>>>>>>>> warntime 10
>>>>>>>> initdead 120
>>>>>>>
>>>>>>> 120 - that's 2 of your 2.5 minutes right there
>>>>>>
>>>>>> Ah, interesting. So, in v2 (due to autojoin, perhaps?), initdead
>>>>>> causes a delay in startup, whereas in v1 mode it doesn't. Very good
>>>>>> to know.
>>>>
>>>>> should do in both i'd have thought...
>>>>
>>>>> when are you measuring from?
>>>>
>>>> OK. More details.
>>>>
>>>> First, in both cases I am running 2.1.2-24.1.
>>>>
>>>> In the case of v1 mode, my ha.cf file looks identical to the one I
>>>> sent, except for the fact that I specify the two nodes (no autojoin)
>>>> AND crm is off. The haresources file has one line with the
>>>> "preferred" node and the IP address to manage.
>>>>
>>>> Heartbeat is started with the init script (/etc/init.d/heartbeat)
>>>> and then another init script is run that starts my API application.
>>>> In v1 mode, I can start my API application as soon as the init
>>>> script completes and everything works as expected.
>>>>
>>>> In v2 mode, I cannot start the API app as soon as the heartbeat init
>>>> script completes: I get a "Cannot signon" message because my app
>>>> cannot connect to heartbeat. Only after the election completes and
>>>> the resource is "started" am I able to connect to heartbeat via the
>>>> API, which as you pointed out is delayed by initdead.
>>>
>>> That's really strange.
>>> In order for the election to take place a number of components have
>>> to be signed into heartbeat... so I have no idea why your app can't.
>>> Especially since nothing the CRM does (having elections or starting
>>> resources) should influence your ability to sign in.
>>>
>>> Unless the resource is an IP and you're using it to connect to the
>>> cluster in some way?
>>
>> The resource is an IP, but I'm using signon (hb->llc_ops->signon).
>> The heartbeat API I wrote doesn't really depend on the IP resource,
>> it just wants to monitor it
>
>in v2 mode you can't monitor the resource using the HA API... only via 
>the CIB.

Yes, right. Figured that out when my ha api call for resources failed
the first time ;-)

The CIB API appears to be in /usr/include/heartbeat/crm/cib.h, correct?
It appears that I can get notified of resource status changes via a
cib signon. I'm beginning to work on that part now.

>
>> (and a few other things) and pass messages
>> back and forth. But it is the signon that fails until everything is up
>> and running. It's not just my api, by the way, no other client can
>> signon either (e.g., cl_status).
>>
>>>
>>>>
>>>>
>>>> I am concluding that in v1 mode, since the nodes are known, there's
>>>> no need to wait out the initdead time, whereas in v2 mode with
>>>> autojoin any, the initdead wait is consumed because there may be
>>>> another node joining. Is that right?
>>>
>>> It's possible.  I don't know how that code works.
>>> Have you tried v2 without autojoin?
>>>
>>
>> Yes. That's what I tried to explain above. In v1 mode with specific
>> nodes, I can signon right away. In v2 with autojoin, it takes 2.5
>> minutes.
>
>It's still not clear to me that you've tried the third option... "v2
>mode with specific nodes"

Ah, sorry. Obvious oversight on my part. In v2, with the two nodes
specified in ha.cf:

node hostA
node hostB

and autojoin commented out, things are a bit different. In that case, I
can connect via the api as quickly as with v1; however, the resource is
not available for ~2.5 minutes.
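
A rough sketch of the timing arithmetic discussed so far, written in
Python purely for illustration. It assumes the ha.cf directives quoted
earlier plus the two node lines above, and an observed delay of roughly
150 seconds (~2.5 minutes). The helper names (parse_timings,
unexplained_delay) are hypothetical and not part of heartbeat, and only
plain integer values in seconds are handled.

```python
# Hypothetical helper, not part of heartbeat: pull the timing directives
# out of ha.cf text and compare initdead with the observed startup delay.

# Directives as quoted in this thread (assumed complete for this sketch).
HA_CF = """\
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 120
node hostA
node hostB
"""

TIMING_KEYS = ("keepalive", "deadtime", "warntime", "initdead")


def parse_timings(text):
    """Return {directive: seconds} for plain-integer timing directives."""
    timings = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] in TIMING_KEYS and parts[1].isdigit():
            timings[parts[0]] = int(parts[1])
    return timings


def unexplained_delay(observed_seconds, timings):
    """Seconds of the observed delay that initdead does not account for."""
    return observed_seconds - timings.get("initdead", 0)


timings = parse_timings(HA_CF)
observed = 150  # ~2.5 minutes, the delay reported in this thread

print("initdead:", timings["initdead"], "s")
print("left to explain:", unexplained_delay(observed, timings), "s")
```

With these values, initdead accounts for 120 of the ~150 seconds, which
leaves about 30 seconds that nothing in this thread has explained yet.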

Any clues where I should look for that?

Thanks, again.


Scott Mann
Sr Software Engineer
Aztek Networks

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
