Re: [Pacemaker] metadata (timeout) ignored?

2010-01-21 Thread Andrew Beekhof
On Thu, Jan 21, 2010 at 11:00 AM, Dejan Muhamedagic  wrote:

> Also, if you use the crm
> shell, it will print warnings if the timeouts are smaller
> than what's advised.

Oh! Neat :-)

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] metadata (timeout) ignored?

2010-01-21 Thread Dejan Muhamedagic
Hi,

On Thu, Jan 21, 2010 at 10:18:09AM +0100, Markus M. wrote:
> Hello,
> 
> Dejan Muhamedagic wrote:
> 
> >> return the value of 100 seconds for the stop action. Is there
> >> another place to set the timeout for the stop action of this RA?
> >Yes, in the cluster configuration. Like this:
> 
> Thank you, I see, and it works now!
> 
> This was really an RTFM question, sorry. But I wonder what the
> intention of the OCF resource agent "meta-data" action is if the
> returned output does not seem to be used anywhere.

These are minimum values advised by the author of the resource agent.
Obviously they can't fit all resources. Also, if you use the crm
shell, it will print warnings if the timeouts are smaller
than what's advised.
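
As an illustration of the kind of check described above, here is a minimal Python sketch that compares the effective operation timeouts against the advisory minimums from the resource agent's meta-data. It is not the crm shell's implementation: the function names, the accepted duration formats and the 20s fallback default (chosen only because it matches the 20-second timeout in Markus's log) are assumptions.

```python
import re

# Assumption: an operation without its own timeout falls back to a cluster
# default; 20s matches the timeout observed in the log in this thread.
DEFAULT_TIMEOUT = "20s"

UNITS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(value: str) -> int:
    """Convert "100", "100s", "2m" or "1h" to seconds; other formats are rejected."""
    match = re.fullmatch(r"(\d+)(s|m|h)?", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2) or "s"]


def check_timeouts(advised: dict[str, str], configured: dict[str, str],
                   default: str = DEFAULT_TIMEOUT) -> list[str]:
    """Return one warning per operation whose effective timeout is below the advised minimum."""
    warnings = []
    for op, minimum in advised.items():
        effective = configured.get(op, default)
        if to_seconds(effective) < to_seconds(minimum):
            source = "configured" if op in configured else "default"
            warnings.append(
                f"{op}: {source} timeout {effective} is smaller than the advised {minimum}")
    return warnings


if __name__ == "__main__":
    # Advised values as listed by "meta cluster_oracle ocf" in Markus's first message.
    advised = {"start": "240", "stop": "100", "monitor": "20"}
    # Hypothetical configuration with no stop timeout set.
    configured = {"start": "240s", "monitor": "20s"}
    for warning in check_timeouts(advised, configured):
        print(warning)
```

With these inputs only the stop operation is reported, because it silently falls back to the assumed 20s default.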

Thanks,

Dejan

> With kind regards
> Markus



Re: [Pacemaker] metadata (timeout) ignored?

2010-01-21 Thread Andrew Beekhof
On Thu, Jan 21, 2010 at 10:18 AM, Markus M.  wrote:
> Hello,
>
> Dejan Muhamedagic wrote:
>
>>> return the value of 100 seconds for the stop action. Is there
>>> another place to set the timeout for the stop action of this RA?
>>Yes, in the cluster configuration. Like this:
>
> Thank you, I see, and it works now!
>
> This was really an RTFM question, sorry. But I wonder what the intention
> of the OCF resource agent "meta-data" action is if the returned output does
> not seem to be used anywhere.


Hints to GUIs.
I.e. they could, in theory, preset things like timeouts when you create a
monitor operation.
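
A minimal Python sketch of how a tool could use such hints: read the <actions> section of an agent's OCF meta-data and turn each advised timeout into a crm "op" clause, in the same style as the stop example Dejan posted. The XML is a trimmed, hypothetical example modelled on the crm output in this thread, not the real cluster_oracle meta-data; the skip list and the handling of bare numbers as seconds are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed meta-data in OCF format, modelled on the crm output
# in this thread; it is not the real cluster_oracle agent's XML.
META_DATA = """<resource-agent name="cluster_oracle">
  <actions>
    <action name="start" timeout="240" />
    <action name="stop" timeout="100" />
    <action name="monitor" timeout="20" interval="20" depth="0" />
    <action name="meta-data" timeout="5" />
  </actions>
</resource-agent>"""

# Assumed skip list: actions that describe the agent rather than cluster operations.
SKIP = {"meta-data", "validate-all"}


def with_unit(value: str) -> str:
    """Append "s" to bare numbers, assuming they are meant as seconds."""
    return value + "s" if value.isdigit() else value


def suggested_ops(meta_data_xml: str) -> list[str]:
    """Build crm 'op' clauses from the advised timeouts in the meta-data."""
    root = ET.fromstring(meta_data_xml)
    clauses = []
    for action in root.findall("./actions/action"):
        name = action.get("name")
        timeout = action.get("timeout")
        if name in SKIP or timeout is None:
            continue
        clause = f"op {name}"
        interval = action.get("interval")
        if interval is not None:
            clause += f' interval="{with_unit(interval)}"'
        clause += f' timeout="{with_unit(timeout)}"'
        clauses.append(clause)
    return clauses


if __name__ == "__main__":
    for clause in suggested_ops(META_DATA):
        print(clause)
```

The generated clauses only preset values; the timeouts that actually take effect are still whatever ends up in the cluster configuration.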



Re: [Pacemaker] metadata (timeout) ignored?

2010-01-21 Thread Markus M.

Hello,

Dejan Muhamedagic wrote:

>> return the value of 100 seconds for the stop action. Is there
>> another place to set the timeout for the stop action of this RA?
>Yes, in the cluster configuration. Like this:

Thank you, I see, and it works now!

This was really an RTFM question, sorry. But I wonder what the
intention of the OCF resource agent "meta-data" action is if the returned
output does not seem to be used anywhere.


With kind regards
Markus



Re: [Pacemaker] metadata (timeout) ignored?

2010-01-20 Thread Dejan Muhamedagic
Hi,

On Wed, Jan 20, 2010 at 09:45:46PM +0100, Markus M. wrote:
> Dejan Muhamedagic wrote:
> 
> >>Operations' defaults (advisory minimum):
> >>
> >>stop timeout=100
> >>
> >>So it seems for the "stop" action there is a timeout of 100 seconds
> >>defined. But at cluster shutdown I can see this in the ha-debug log:
> >
> >It says above that it's "advisory minimum" (the wording should
> >probably be changed). You have to set the timeouts yourself.
> 
> Sorry, maybe I've misunderstood something... I thought _I had set the
> timeout_ by making the OCF resource agent meta-data function
> return the value of 100 seconds for the stop action. Is there
> another place to set the timeout for the stop action of this RA?

Yes, in the cluster configuration. Like this:

primitive rsc_c001n07 ocf:heartbeat:IPaddr \
params ip="127.0.0.16" cidr_netmask="32" \
op stop timeout="100s"

Thanks,

Dejan

> The timeout is occurring after 20 seconds:
> 
> >>Jan 18 14:31:35 node1 crmd: [12844]: info: te_rsc_command:
> >>Initiating action 5: stop oracle_primary_stop_0 on node1 (local)
> ...
> >>Jan 18 14:31:55 node1 lrmd: [12841]: WARN: oracle_primary:stop
> >>process (PID 14386) timed out (try 1).  Killing with signal SIGTERM
> >>(15).
> 
> Regards
> Markus



Re: [Pacemaker] metadata (timeout) ignored?

2010-01-20 Thread Markus M.

Dejan Muhamedagic wrote:

>>Operations' defaults (advisory minimum):
>>
>>stop timeout=100
>>
>>So it seems for the "stop" action there is a timeout of 100 seconds
>>defined. But at cluster shutdown I can see this in the ha-debug log:
>
>It says above that it's "advisory minimum" (the wording should
>probably be changed). You have to set the timeouts yourself.

Sorry, maybe I've misunderstood something... I thought _I had set the
timeout_ by making the OCF resource agent meta-data function return
the value of 100 seconds for the stop action. Is there another place to
set the timeout for the stop action of this RA?

The timeout is occurring after 20 seconds:

>>Jan 18 14:31:35 node1 crmd: [12844]: info: te_rsc_command:
>>Initiating action 5: stop oracle_primary_stop_0 on node1 (local)
...
>>Jan 18 14:31:55 node1 lrmd: [12841]: WARN: oracle_primary:stop
>>process (PID 14386) timed out (try 1).  Killing with signal SIGTERM
>>(15).


Regards
Markus



Re: [Pacemaker] metadata (timeout) ignored?

2010-01-20 Thread Dejan Muhamedagic
Hi,

On Wed, Jan 20, 2010 at 04:28:49PM +0100, Markus M. wrote:
> Hello,
> 
> I have a question about metadata returned by an OCF resource agent
> using the "meta-data" command and about the behaviour of the cluster.
> 
> When checking the resource agent's metadata using crm I get this:
> 
> # crm
> crm(live)# ra
> crm(live)ra#  meta cluster_oracle ocf
> bla (ocf:heartbeat:cluster_oracle)
> 
> Master/Slave OCF Resource Agent for Oracle (clustered)
> 
> Parameters (* denotes required, [] the default):
> 
> oracle_role* (string): Ora role
> Required to assign the Oracle role. Must be "master" or "slave"
> 
> Operations' defaults (advisory minimum):
> 
> start    timeout=240
> promote  timeout=90
> demote   timeout=90
> notify   timeout=90
> stop     timeout=100
> monitor  timeout=20 interval=20 depth=0
> monitor  timeout=20 interval=10 depth=0
> 
> So it seems for the "stop" action there is a timeout of 100 seconds
> defined. But at cluster shutdown I can see this in the ha-debug log:

It says above that it's "advisory minimum" (the wording should
probably be changed). You have to set the timeouts yourself.

Thanks,

Dejan

> Jan 18 14:31:35 node1 crmd: [12844]: info: te_rsc_command:
> Initiating action 5: stop oracle_primary_stop_0 on node1 (local)
> Jan 18 14:31:35 node11 pengine: [12848]: notice: LogActions: Leave
> resource oracle_secondary  (Stopped)
> Jan 18 14:31:35 node1 lrmd: [12841]: info: rsc:oracle_primary:7: stop
> Jan 18 14:31:35 node1 crmd: [12844]: info: do_lrm_rsc_op: Performing
> key=5:10:0:40ea1f42-c929-40d6-a0ed-569a7c8944bc
> op=oracle_primary_stop_0 )
> Jan 18 14:31:35 node1 lrmd: [12841]: info: RA output:
> (oracle_primary:stop:stderr)
> /usr/lib/ocf/resource.d//heartbeat/cluster_oracle[247]:
> Jan 18 14:31:35 node1 pengine: [12848]: WARN: process_pe_message:
> Transition 10: WARNINGs found during PE processing. PEngine Input
> stored in: /var/lib/pengine/pe-warn-2220.bz2
> Jan 18 14:31:35 node1 pengine: [12848]: info: process_pe_message:
> Configuration WARNINGs found during PE processing.  Please run
> "crm_verify -L" to identify issues.
> Jan 18 14:31:55 node1 lrmd: [12841]: WARN: oracle_primary:stop
> process (PID 14386) timed out (try 1).  Killing with signal SIGTERM
> (15).
> Jan 18 14:31:55 node1 lrmd: [12841]: info: RA output:
> (oracle_primary:stop:stderr)
> Session terminated, killing shell...
> Jan 18 14:31:57 node1 lrmd: [12841]: info: RA output:
> (oracle_primary:stop:stderr)  ...killed.
> 
> Apparently a timeout occurred during the stop action after 20 seconds.
> But why, if the resource agent defined 100 seconds?
> 
> With kind regards
> Markus



[Pacemaker] metadata (timeout) ignored?

2010-01-20 Thread Markus M.

Hello,

I have a question about metadata returned by an OCF resource agent using
the "meta-data" command and about the behaviour of the cluster.


When checking the resource agent's metadata using crm I get this:

# crm
crm(live)# ra
crm(live)ra#  meta cluster_oracle ocf
bla (ocf:heartbeat:cluster_oracle)

Master/Slave OCF Resource Agent for Oracle (clustered)

Parameters (* denotes required, [] the default):

oracle_role* (string): Ora role
Required to assign the Oracle role. Must be "master" or "slave"

Operations' defaults (advisory minimum):

start    timeout=240
promote  timeout=90
demote   timeout=90
notify   timeout=90
stop     timeout=100
monitor  timeout=20 interval=20 depth=0
monitor  timeout=20 interval=10 depth=0

So it seems for the "stop" action there is a timeout of 100 seconds 
defined. But at cluster shutdown I can see this in the ha-debug log:


...
Jan 18 14:31:35 node1 crmd: [12844]: info: te_rsc_command: Initiating 
action 5: stop oracle_primary_stop_0 on node1 (local)
Jan 18 14:31:35 node11 pengine: [12848]: notice: LogActions: Leave 
resource oracle_secondary  (Stopped)
Jan 18 14:31:35 node1 lrmd: [12841]: info: rsc:oracle_primary:7: stop
Jan 18 14:31:35 node1 crmd: [12844]: info: do_lrm_rsc_op: Performing 
key=5:10:0:40ea1f42-c929-40d6-a0ed-569a7c8944bc op=oracle_primary_stop_0 )
Jan 18 14:31:35 node1 lrmd: [12841]: info: RA output: 
(oracle_primary:stop:stderr) 
/usr/lib/ocf/resource.d//heartbeat/cluster_oracle[247]:
Jan 18 14:31:35 node1 pengine: [12848]: WARN: process_pe_message: 
Transition 10: WARNINGs found during PE processing. PEngine Input stored 
in: /var/lib/pengine/pe-warn-2220.bz2
Jan 18 14:31:35 node1 pengine: [12848]: info: process_pe_message: 
Configuration WARNINGs found during PE processing.  Please run 
"crm_verify -L" to identify issues.
Jan 18 14:31:55 node1 lrmd: [12841]: WARN: oracle_primary:stop process 
(PID 14386) timed out (try 1).  Killing with signal SIGTERM (15).
Jan 18 14:31:55 node1 lrmd: [12841]: info: RA output: 
(oracle_primary:stop:stderr)
Session terminated, killing shell...
Jan 18 14:31:57 node1 lrmd: [12841]: info: RA output: 
(oracle_primary:stop:stderr)  ...killed.


Apparently a timeout occurred during the stop action after 20 seconds. But
why, if the resource agent defined 100 seconds?
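
The 20 seconds can be read directly off the log timestamps. A small Python sketch, assuming the syslog-style timestamp occupies the first 15 characters of each line and assuming the year 2010 (syslog omits it, and it does not affect the difference between two lines from the same year):

```python
from datetime import datetime

# Assumption: syslog lines carry no year, so a fixed one (2010, when this
# thread was posted) is prepended before parsing.
YEAR = "2010"
FMT = "%Y %b %d %H:%M:%S"


def elapsed_seconds(first_line: str, second_line: str) -> float:
    """Seconds between two log lines whose first 15 characters are the timestamp."""
    start = datetime.strptime(f"{YEAR} {first_line[:15]}", FMT)
    end = datetime.strptime(f"{YEAR} {second_line[:15]}", FMT)
    return (end - start).total_seconds()


started = "Jan 18 14:31:35 node1 lrmd: [12841]: info: rsc:oracle_primary:7: stop"
killed = "Jan 18 14:31:55 node1 lrmd: [12841]: WARN: oracle_primary:stop process"
print(elapsed_seconds(started, killed))  # 20.0
```

The result, 20 seconds, is the timeout that was actually applied to the stop operation, rather than the 100 seconds advised in the meta-data.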


With kind regards
Markus
