Re: [ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Ken Gaillot
On 05/12/2016 02:56 AM, Ulrich Windl wrote:
> Hi!
> 
> I have a question regarding an RA written by myself and pacemaker 
> 1.1.12-f47ea56 (SLES11 SP4):
> 
> During "probe" all resources' "monitor" actions are executed (regardless of 
> any ordering constraints). Therefore my RA considers a parameter as invalid 
> ("file does not exist") (the file will be provided once some supplying 
> resource is up) and returns rc=2.
> OK, this may not be optimal, but pacemaker makes it worse: It does not repeat 
> the probe once the resource would start, but keeps the state, preventing a 
> resource start:
> 
>  primitive_monitor_0 on h05 'invalid parameter' (2): call=73, 
> status=complete, exit-reason='none', last-rc-change='Wed May 11 17:03:39 
> 2016', queued=0ms, exec=82ms

Correct, OCF_ERR_CONFIGURED is a "fatal" error:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_how_are_ocf_return_codes_interpreted

> So you would say that monitor may only return "success" or "not running", but 
> I feel the RA should detect the condition that the resource could not run at 
> all at the present state.

OCF_ERR_CONFIGURED is meant to indicate that the resource could not
possibly run *as configured*, regardless of the system's current state.
So for example, a required parameter is missing or invalid.

You could possibly use OCF_ERR_ARGS in this case (a "hard" error that
bans the particular node, and means that the resource's configuration is
not valid on this particular node).
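
To make the distinction between the error classes concrete, here is a minimal sketch. The numeric values are the ones defined by the OCF resource agent API; the helper name rc_severity is hypothetical and is not part of ocf-shellfuncs. The class names follow the "soft"/"hard"/"fatal" terms used in Pacemaker Explained. Note that under this numbering an exit code of 2, as in the quoted log, is OCF_ERR_ARGS.

```shell
# Return codes as defined by the OCF resource agent API.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_ARGS=2
OCF_ERR_CONFIGURED=6
OCF_NOT_RUNNING=7

# Hypothetical helper (not part of ocf-shellfuncs): print the error class
# that Pacemaker Explained assigns to a few of the return codes.
rc_severity() {
    case "$1" in
        "$OCF_ERR_GENERIC")    echo "soft"  ;;  # restart or move the resource
        "$OCF_ERR_ARGS")       echo "hard"  ;;  # do not retry on this node
        "$OCF_ERR_CONFIGURED") echo "fatal" ;;  # do not start on any node
        *)                     echo "other" ;;  # not covered by this sketch
    esac
}
```

For example, `rc_severity "$OCF_ERR_ARGS"` prints "hard", while `rc_severity "$OCF_ERR_CONFIGURED"` prints "fatal".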

But, I suspect the right answer here is simply an order constraint
between the supplying resource and this resource. This resource's start
action, not monitor, should be the one that checks for the existence of
the supplied file.
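
A rough sketch of what such a start-time check could look like follows. The function name ra_start and the parameter name OCF_RESKEY_file are assumptions for illustration only (the original RA is not shown in this thread), and returning OCF_ERR_ARGS for a missing file is one possible choice, not the only one. The order constraint itself is part of the cluster configuration, not of the RA.

```shell
# Return codes as defined by the OCF resource agent API; a real RA would
# get these by sourcing ocf-shellfuncs.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_ARGS:=2}"

# Hypothetical start action: OCF_RESKEY_file stands for the file that the
# supplying resource is expected to provide once it is up.
ra_start() {
    if [ ! -r "$OCF_RESKEY_file" ]; then
        # With an order constraint in place the file should exist by now,
        # so a missing file is treated as an error on this node.
        echo "required file '$OCF_RESKEY_file' is missing" >&2
        return "$OCF_ERR_ARGS"
    fi
    # The actual start logic of the resource would go here.
    return "$OCF_SUCCESS"
}
```

With this split, monitor only has to report whether the resource is running, and the file check happens at the one point where the file is actually needed.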

> Shouldn't pacemaker reprobe resources before it tries to start them?

Probes are meant to check whether the resource is already active
anywhere. The decision of whether and where to start the resource takes
into account the result of the probes, so it doesn't make sense to
re-probe -- that's what the initial probe was for.

> My RA had previously passed all the ocf-tester checks, so this situation is 
> hard to test (unless you have a test cluster you can restart at any time).
> 
> (After manual resource cleanup the resource started as usual)
> 
> My monitor uses the following logic:
> ---
> monitor|status)
>     if validate; then
>         set_variables
>         check_resource || exit $OCF_NOT_RUNNING
>         status=$OCF_SUCCESS
>     else # cannot check status with invalid parameters
>         status=$?
>     fi
>     exit $status
>     ;;
> ---
> 
> Should I mess with ocf_is_probe?
> 
> Regards,
> Ulrich

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Kristoffer Grönlund
Ulrich Windl writes:

> Should I mess with ocf_is_probe?

That's probably the easiest way forward. Quite a few RAs have solved the
problem in exactly that way (see the vmware RA for example).
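
As a minimal sketch of that approach, assuming the monitor logic from the original post: validate, set_variables and check_resource are the poster's functions and are only stubbed here, OCF_RESKEY_file is a hypothetical parameter name, and the ocf_is_probe definition is a simplified stand-in so the example runs outside a cluster. A real RA would source ocf-shellfuncs instead.

```shell
# Return codes as defined by the OCF resource agent API.
: "${OCF_SUCCESS:=0}" "${OCF_ERR_ARGS:=2}" "${OCF_NOT_RUNNING:=7}"

# Simplified stand-in, used only if ocf-shellfuncs has not been sourced:
# a probe is a monitor action with an interval of 0.
if ! command -v ocf_is_probe >/dev/null 2>&1; then
    ocf_is_probe() {
        [ "$__OCF_ACTION" = "monitor" ] &&
            [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
    }
fi

# Placeholder stubs for the poster's functions (their real bodies are not
# shown in the thread). OCF_RESKEY_file is a hypothetical parameter name.
validate()       { [ -r "$OCF_RESKEY_file" ] || return "$OCF_ERR_ARGS"; }
set_variables()  { :; }
check_resource() { return 0; }

ra_monitor() {
    rc=0
    validate || rc=$?
    if [ "$rc" -ne 0 ]; then
        # During a probe the supplying resource may not be up yet, so the
        # resource cannot be active here: report "not running".
        if ocf_is_probe; then
            return "$OCF_NOT_RUNNING"
        fi
        # For a regular recurring monitor, pass the validation error on.
        return "$rc"
    fi
    set_variables
    check_resource || return "$OCF_NOT_RUNNING"
    return "$OCF_SUCCESS"
}
```

This assumes that the missing file is the only reason validation can fail during a probe; an RA with other validation checks may want to downgrade only that specific failure.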

Cheers,
Kristoffer


-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Q: monitor and probe result codes and consequences

2016-05-12 Thread Ulrich Windl
Hi!

I have a question regarding an RA written by myself and pacemaker 
1.1.12-f47ea56 (SLES11 SP4):

During "probe" all resources' "monitor" actions are executed (regardless of any 
ordering constraints). Therefore my RA considers a parameter as invalid ("file 
does not exist") (the file will be provided once some supplying resource is up) 
and returns rc=2.
OK, this may not be optimal, but pacemaker makes it worse: It does not repeat 
the probe once the resource would start, but keeps the state, preventing a 
resource start:

 primitive_monitor_0 on h05 'invalid parameter' (2): call=73, status=complete, 
exit-reason='none', last-rc-change='Wed May 11 17:03:39 2016', queued=0ms, 
exec=82ms

So you would say that monitor may only return "success" or "not running", but I 
feel the RA should detect the condition that the resource could not run at all 
at the present state.

Shouldn't pacemaker reprobe resources before it tries to start them?

My RA had previously passed all the ocf-tester checks, so this situation is hard 
to test (unless you have a test cluster you can restart at any time).

(After manual resource cleanup the resource started as usual)

My monitor uses the following logic:
---
monitor|status)
    if validate; then
        set_variables
        check_resource || exit $OCF_NOT_RUNNING
        status=$OCF_SUCCESS
    else # cannot check status with invalid parameters
        status=$?
    fi
    exit $status
    ;;
---

Should I mess with ocf_is_probe?

Regards,
Ulrich


