On 06/10/2011 09:07 AM, Dejan Muhamedagic wrote:
>> mhm ... one problem is that i cannot distinguish between an initial
>> probe and a probe from "crm resource reprobe".
>>
>> when i do this, my current postfix ocf ra reports "not running",
> 
> Even though it is started? Well, that sounds like a problem. But
> I don't really understand. You mentioned at the beginning of the
> thread that it is this error:
> 
>> ERROR: Postfix configuration directory '/data/mail/conf' does not exist. 3
> 
> If the resource runs then the directory must be present, right?

i'll try to start all over ;)

i have two nodes a and b in a failover configuration.
the configuration resides on a shared storage which is only mounted
on the active node:

> primitive m-mail-fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/mail" directory="/data/mail/" \
>         fstype="ext4" options="nosuid,nodev,noatime,nodiratime" \
>         op stop interval="0" timeout="60" \
>         op start interval="0" timeout="60"
> primitive m-mail-postfix ocf:ipax:postfix \
>         op monitor interval="30" timeout="30" \
>         params config_dir="/data/mail/conf/" \
>         meta target-role="Started"
> group group-mail-base m-mail-fs m-mail-postfix

at [1] you find the most current revision of my resource.

looking at VirtualDomain, i got an idea about handling the initial probe,
using ocf_is_probe to determine whether this is a probe or not:

if it is a probe, the checks do not generate an error (line 217ff),
and some checks aren't even run (e.g. postfix check, line 286ff).

so, in case of a probe, postfix_validate_all() will return OCF_SUCCESS.
(btw. before my changes, postfix_validate_all would return some
OCF_ERR_xxx instead)
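
roughly, the probe handling in the validation step can be pictured like
the sketch below. it is a simplified stand-in, not the code from [1]:
ocf_is_probe and ocf_log are stubbed so the snippet runs on its own (in a
real agent they come from the ocf shell function library), the IS_PROBE
variable is purely hypothetical, and the "-d" directory test is an
assumption about how the check is done.

```shell
#!/bin/sh
# Illustrative sketch only; stubs replace the real OCF shell functions.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5

# Stub (assumption): IS_PROBE is a hypothetical switch used only for this demo.
ocf_is_probe() { [ "${IS_PROBE:-0}" -eq 1 ]; }

# Stub (assumption): the real ocf_log writes to the cluster log.
ocf_log() { echo "$*" >&2; }

postfix_validate_all() {
    if [ ! -d "$config_dir" ]; then
        if ocf_is_probe; then
            # the shared storage may simply not be mounted on this node
            ocf_log info "Postfix configuration directory '$config_dir' not readable during probe."
        else
            ocf_log err "Postfix configuration directory '$config_dir' does not exist or is not readable."
            return $OCF_ERR_INSTALLED
        fi
    fi
    return $OCF_SUCCESS
}
```

with a missing directory, this returns OCF_SUCCESS during a probe and
OCF_ERR_INSTALLED for any other action.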


then, i formerly ran the following check:
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
>     case $1 in
>         stop)    exit $OCF_SUCCESS ;;
>         monitor) exit $OCF_NOT_RUNNING ;;
>         status)  exit $LSB_STATUS_STOPPED ;;
>         *)       exit $ret ;;
>     esac
> fi


this means: if monitor/status was issued and validation did not return
OCF_SUCCESS, we report the resource as not running (OCF_NOT_RUNNING for
monitor, LSB_STATUS_STOPPED for status). as far as i recall, this was
how the probing situation was handled before ocf_is_probe was available.

because my changes to postfix_validate_all() introduce ocf_is_probe and
return OCF_SUCCESS during a probe, i no longer enter this case.
this caused errors for the initial probe, so i made the following change:

> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ] || ocf_is_probe; then
>     case $1 in
>         stop)    exit $OCF_SUCCESS ;;
>         monitor) exit $OCF_NOT_RUNNING ;;
>         status)  exit $LSB_STATUS_STOPPED ;;
>         *)       exit $ret ;;
>     esac
> fi

(see the new ocf_is_probe in the condition?)

so we always enter this case in the event of a probe. this correctly
handles the initial probe and returns OCF_NOT_RUNNING so that pacemaker
can continue.


*but* a probe triggered by "crm resource reprobe" is also recognized by
ocf_is_probe. thus, this block returns OCF_NOT_RUNNING on *every* node:
on the standby node *not* running postfix (which is ok), but also
on the node which actually *is* running postfix. (it would also
return OCF_NOT_RUNNING if postfix had been started at system bootup...)

this makes the cluster believe the resource is not running and, because
of my configuration, the resource will be (re)started on the last
known location/node (which in fact is still running postfix).
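
to make the effect visible, here is a small sketch of the same decision
with the exit code printed instead of passed to "exit". it is only an
illustration: ocf_is_probe is stubbed, and IS_PROBE and the function name
early_exit_code are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: same decision as the quoted block, but the exit
# code is printed instead of passed to "exit" so the cases can be compared.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
LSB_STATUS_STOPPED=3

# Stub (assumption): IS_PROBE is a hypothetical stand-in for ocf_is_probe.
ocf_is_probe() { [ "${IS_PROBE:-0}" -eq 1 ]; }

# early_exit_code ACTION RET
# Prints the code the agent would exit with, or "continue" if the block is
# skipped and the real action handler gets to run.
early_exit_code() {
    action=$1
    ret=$2
    if [ "$ret" -ne "$OCF_SUCCESS" ] || ocf_is_probe; then
        case $action in
            stop)    echo "$OCF_SUCCESS" ;;
            monitor) echo "$OCF_NOT_RUNNING" ;;
            status)  echo "$LSB_STATUS_STOPPED" ;;
            *)       echo "$ret" ;;
        esac
    else
        echo "continue"
    fi
}

# validation succeeded (ret=0) in all three cases
IS_PROBE=0; echo "recurring monitor:   $(early_exit_code monitor 0)"
IS_PROBE=1; echo "probe, standby node: $(early_exit_code monitor 0)"
IS_PROBE=1; echo "probe, active node:  $(early_exit_code monitor 0)"
```

the last two lines print the same code (7), because nothing in the block
looks at whether postfix is actually running.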

i hope i managed to explain it properly. :)





one way to tackle this would be a mechanism to distinguish the
initial probe from a "manual" probe.

i could also revert my probing settings and live with an error of
e.g.

> ocf_log err "Postfix configuration directory '$config_dir' does not exist or is not readable."
> return $OCF_ERR_INSTALLED

instead of
> ocf_log info "Postfix configuration directory '$config_dir' not readable during probe."

but this isn't quite what i want...


another possibility would be to rewrite/drop this probe. but i don't
quite know how to do that properly.
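
one shape such a rewrite could take (an untested sketch, not a confirmed
fix): only short-circuit the probe when the configuration directory is
missing, and otherwise fall through to a real status check. everything
below is hypothetical scaffolding: ocf_is_probe is stubbed via IS_PROBE,
and postfix_running stands in for whatever status check the agent
actually uses.

```shell
#!/bin/sh
# Illustrative sketch only, not tested against a cluster.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Stubs (assumptions) so the sketch runs standalone. IS_PROBE and POSTFIX_UP
# are hypothetical switches; a real agent would use the library's
# ocf_is_probe and an actual status check of the postfix master process.
ocf_is_probe()    { [ "${IS_PROBE:-0}" -eq 1 ]; }
postfix_running() { [ "${POSTFIX_UP:-0}" -eq 1 ]; }

postfix_monitor() {
    # During a probe, a missing config dir suggests the filesystem is not
    # mounted here, so this postfix instance is treated as not running.
    if ocf_is_probe && [ ! -d "$config_dir" ]; then
        return $OCF_NOT_RUNNING
    fi
    # Otherwise report the real state, probe or not.
    if postfix_running; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

with this shape, a reprobe on the active node (directory present, postfix
up) would return OCF_SUCCESS, while the standby node would still report
OCF_NOT_RUNNING without logging an error.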

suggestions are welcome!

cheers,
raoul

[1]
https://github.com/raoulbhatia/resource-agents/blob/master/heartbeat/postfix
-- 
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bha...@ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            off...@ipax.at
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/