Serge, thanks for the quick response (and missing flame :)).  I've added
to the server primitive:

           <operations>
             <op id="1" name="stop"  timeout="20s"/>
             <op id="2" name="start" timeout="20s"/>
           </operations>

but I still get the timeout.  At the risk of exposing my stupidity, here
are more details:

I've added to the server script some ocf_log calls, as in the first two
lines below containing (lew).  My start_server function looks as
follows:

   start_server() {
           ocf_log info "(lew) in /usr/lib/ocf/resource.d/heartbeat server start_server function"
           instance=`echo $OCF_RESOURCE_INSTANCE`
           ocf_log info "(lew) instance = $instance"
           ocf_run /lew/server h
           ocf_log info "(lew) in /usr/lib/ocf/resource.d/heartbeat server start_server function, return from    ocf_run"
           return $OCF_SUCCESS
   }

Notice that the return from ocf_run is not logged below, so maybe I have
some Unix daemon coding issue.  But the app is pretty trivial, like I
said: it just forks a child and the parent calls exit(0).
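
One possible explanation, sketched here under assumptions rather than
confirmed: if ocf_run collects the command's output through a command
substitution (I believe it does, but have not checked this version), then
the capture only finishes once every process holding the inherited
stdout/stderr has closed it.  A forked child that keeps those descriptors
open would keep ocf_run waiting even though the parent has already called
exit(0).  The names leaky_daemon, clean_daemon and elapsed are made up for
illustration, and sleep stands in for /lew/server.

```shell
#!/bin/sh
# Hypothetical stand-ins for /lew/server: each starts a background child
# (sleep) and returns at once, like a parent that forks and exits.

# Child keeps the stdout/stderr it inherited from the caller.
leaky_daemon() {
    sleep 2 &
}

# Child detaches from the inherited descriptors before running.
clean_daemon() {
    sleep 2 </dev/null >/dev/null 2>&1 &
}

# Run a command the way a capturing wrapper might (output collected via
# command substitution) and print how many whole seconds the capture took.
elapsed() {
    start=$(date +%s)
    output=$("$@" 2>&1)
    end=$(date +%s)
    echo $((end - start))
}

echo "leaky: $(elapsed leaky_daemon)s"
echo "clean: $(elapsed clean_daemon)s"
```

If the assumption holds, the first capture should last about as long as
the child lives, while the second should return right away.  The matching
change in a real daemon would be for the child to reopen stdin, stdout and
stderr on /dev/null after the fork.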


server[23250]:  2008/01/02_21:22:50 INFO: (lew) in /usr/lib/ocf/resource.d/heartbeat server start_server function
server[23250]:  2008/01/02_21:22:50 INFO: (lew) instance = server_value2
cib[16393]: 2008/01/02_21:22:51 WARN: do_cib_notify: cib_modify of <nvpair > FAILED: The object/attribute does not exist
cib[16393]: 2008/01/02_21:22:51 ERROR: cib_process_request: cib_modify operation failed: The object/attribute does not exist
cib[16393]: 2008/01/02_21:22:51 info: crm_log_message_adv: #========= Input message message start ==========#
cib[16393]: 2008/01/02_21:22:51 info: MSG: Dumping message with 20 fields
cib[16393]: 2008/01/02_21:22:51 info: MSG[0] : [t=cib]
cib[16393]: 2008/01/02_21:22:51 info: MSG[1] : [cib_clientid=ab553d5e-f9b9-459d-88b2-0e9de0bf9e59]
cib[16393]: 2008/01/02_21:22:51 info: MSG[2] : [cib_callopt=1048576]
cib[16393]: 2008/01/02_21:22:51 info: MSG[3] : [cib_callid=153]
cib[16393]: 2008/01/02_21:22:51 info: MSG[4] : [cib_op=cib_modify]
cib[16393]: 2008/01/02_21:22:51 info: MSG[5] : [cib_section=status]
cib[16393]: 2008/01/02_21:22:51 info: MSG[6] : [cib_clientname=961]
cib[16393]: 2008/01/02_21:22:51 info: MSG[7] : [(5)cib_calldata=0x806a490(114 136)]
cib[16393]: 2008/01/02_21:22:51 info:  <nvpair id="status-41b0e7f1-55ca-472e-8ea0-f7acb9e99613-pingd" name="pingd" value="0"/>
cib[16393]: 2008/01/02_21:22:51 info: MSG[8] : [cib_delegated_from=c001n01]
cib[16393]: 2008/01/02_21:22:51 info: MSG[9] : [from_id=cib]
cib[16393]: 2008/01/02_21:22:51 info: MSG[10] : [to_id=cib]
cib[16393]: 2008/01/02_21:22:51 info: MSG[11] : [client_gen=5]
cib[16393]: 2008/01/02_21:22:51 info: MSG[12] : [src=c001n01]
cib[16393]: 2008/01/02_21:22:51 info: MSG[13] : [(1)srcuuid=0x807f408(36 27)]
cib[16393]: 2008/01/02_21:22:51 info: MSG[14] : [seq=2c9f1]
cib[16393]: 2008/01/02_21:22:51 info: MSG[15] : [hg=47684a1b]
cib[16393]: 2008/01/02_21:22:51 info: MSG[16] : [ts=477c00ab]
cib[16393]: 2008/01/02_21:22:51 info: MSG[17] : [ld=0.24 0.28 0.21 5/157 19634]
cib[16393]: 2008/01/02_21:22:51 info: MSG[18] : [ttl=4]
cib[16393]: 2008/01/02_21:22:51 info: MSG[19] : [_compression_algorithm=zlib]
tengine[18635]: 2008/01/02_21:23:10 WARN: action_timer_callback: Timer popped (abort_level=1000000, complete=false)
tengine[18635]: 2008/01/02_21:23:10 WARN: print_elem: Action missed its timeout[Action 5]: In-flight (id: server2_start_0, loc: c001n02, priority: 0)
lrmd[16394]: 2008/01/02_21:23:10 WARN: server2:start process (PID 23238) timed out (try 1).  Killing with signal SIGTERM (15).
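
When the log is read one entry per line (as heartbeat writes it, before
mail wrapping), the timed-out actions can be pulled out with a short
filter.  This is a minimal sketch: the function name timed_out_actions is
made up, and the pattern is based only on the tengine "Action missed its
timeout" line format shown above.

```shell
#!/bin/sh
# Read log entries on stdin and print "<action id> <node>" for every
# tengine "Action missed its timeout" entry; other lines are ignored.
timed_out_actions() {
    sed -n 's/.*Action missed its timeout.*(id: \([^,]*\), loc: \([^,]*\),.*/\1 \2/p'
}
```

For example, `timed_out_actions < ha-log`, where the location of ha-log
depends on the local logging setup.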

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Serge
Dubrouski
Sent: Wednesday, January 02, 2008 4:25 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] daemon timeout trying to use ocf to startup

Looks like your OCF "server" script wasn't able to start the server
within the given time.

On Jan 2, 2008 1:47 PM,  <[EMAIL PROTECTED]> wrote:
> Well, you all seem like a friendly enough bunch as I lurk about the
> list, so here goes...
> I've read some fine Linux-HA (V2) tutorials and have begun
> experimenting with Linux-HA on a 2 node setup.  Installation of
> heartbeat went well and I even glimpsed ip failover in action.  Now I
> am attempting to use ocf to launch a test daemon of mine, by mimicking
> the apache scripts d/l'd with V2.
>
> From the debug log output, I see ocf_run launching my daemon, but then
> see a timeout in ha-log:
>
>    tengine[23602]: 2008/01/02_17:56:46 WARN: print_elem: Action missed its timeout[Action 6]: In-flight (id: server1_start_0, loc: c001n02, priority: 0)
>
> I'll be the first to admit, this app is not production quality.  It
> simply forks a child that sits and waits on a listen.  I could not
> imagine there was more needed for this experiment but maybe there is.
>
> Output from crm_verify also reveals:
>
>    crm_verify[26701]: 2008/01/02_19:40:11 WARN: unpack_rsc_op: Processing failed op server2_start_0 on c001n02: Timed Out
>    crm_verify[26701]: 2008/01/02_19:40:11 WARN: unpack_rsc_op: Compatability handling for failed op server2_start_0 on c001n02
>    crm_verify[26701]: 2008/01/02_19:40:11 WARN: native_color: Resource server2 cannot run anywhere
>
> Here is the related portion of the cib.xml:
>
>          <primitive id="server2" class="heartbeat" type="server">
>            <instance_attributes id="ia2_s2">
>              <attributes>
>                <nvpair id="s2" name="1" value="value2"/>
>              </attributes>
>            </instance_attributes>
>          </primitive>
> .
> .
> .
>      <constraints>
>        <rsc_location id="run_ip_resource_2" rsc="server2">
>          <rule id="pref_run_ip_resource_2" score="100">
>            <expression id="e2" attribute="#hostname" operation="eq" value="c001n02"/>
>          </rule>
>        </rsc_location>
>      </constraints>
>
>
> The type "server", as I alluded, mimics the apache scripts provided in
> the heartbeat d/l.  I launch the app using ocf_run, via the server
> script deposited in /usr/lib/ocf/resource.d/heartbeat/server.
>
> Can someone give me a clue why the timeout occurs?
>
> Thanks a lot,
> lew
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>



-- 
Serge Dubrouski.
