Re: [Pacemaker] bug in monitor timeout?

Dejan Muhamedagic Thu, 04 Oct 2012 01:41:44 -0700

Hi,

On Wed, Oct 03, 2012 at 10:07:06PM +0000, James Harper wrote:
> It seems like every time I modify a resource, things start timing out. Just 
> now I changed the location of where a ping resource could run and this 
> happened:
> Oct  4 07:07:07 bitvs5 lrmd: [3681]: WARN: perform_ra_op: the operation 
> monitor[52] on p_lvm_iscsi:0 for client 3686 stayed in operation list for 
> 22000 ms (longer than 10000 ms)


That's interesting. Normally such a change should result in just
a few operations. Did you take a look at the transition which
resulted from this change?

> Another oddity is that the resource for p_lvm_iscsi is defined as:
> 
> primitive p_lvm_iscsi ocf:heartbeat:LVM \
>         params volgrpname="vg-drbd" \
>         op start interval="0" timeout="30s" \
>         op stop interval="0" timeout="30s" \
>         op monitor interval="10s" timeout="30s"
> 
> so I don't know where the timeout of 10000ms is coming from??
> 
> When I change something with crm configure the cib process shoots up to 100% 
> CPU and stays there for a while, and the node becomes more-or-less 
> unresponsive, which may go some way to explaining why things time out. Is 
> this normal? It doesn't explain why lrmd complains that something took longer 
> than 10s when I set the timeout to 30s though, unless the interval somehow 
> interacts with that?

Ten seconds is an ad-hoc threshold and has nothing to do with the
timeouts configured on the resource. lrmd logs a warning if an
operation stays in its queue for longer than that. How many
resources do you have? You can also increase max-children (an
lrmd parameter), which is the number of operations that lrmd is
allowed to run concurrently (lrmadmin -p max-children n; the
default is 4).
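
To illustrate how queueing alone can produce such a wait, here is a
toy model in Python. It is only a sketch and not lrmd's actual
scheduler: it assumes all operations are queued at the same moment,
are started in order, and take equally long. The function name, the
count of 20 operations and the 5500 ms duration are made up for the
example.

```python
import heapq

WARN_MS = 10000  # lrmd's warning threshold for time spent in the queue


def queue_waits(durations_ms, max_children=4):
    """Return how long each operation waits before it is started.

    Hypothetical model: every operation is queued at t=0 and at most
    max_children operations run at the same time.
    """
    # Each entry is the time at which one execution slot becomes free.
    slots = [0] * max_children
    heapq.heapify(slots)
    waits = []
    for duration in durations_ms:
        start = heapq.heappop(slots)  # earliest free slot
        waits.append(start)           # queued at t=0, so wait == start
        heapq.heappush(slots, start + duration)
    return waits


if __name__ == "__main__":
    ops = [5500] * 20  # assumed: 20 operations of 5.5 s each
    for children in (4, 8):
        waits = queue_waits(ops, children)
        late = sum(1 for w in waits if w > WARN_MS)
        print(children, max(waits), late)
```

In this model the last operations wait 22000 ms with max-children
at 4 and 11000 ms with max-children at 8. None of that depends on
the 30s timeout set on the monitor operation.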

> Versions of software are all from Debian Wheezy:
> corosync 1.4.2-3
> pacemaker 1.1.7-1

I'd suggest opening a bugzilla entry and attaching an hb_report
(or crm_report, whichever your distribution ships).

Thanks,

Dejan

> thanks
> 
> James
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
