Hi Keisuke-san,

Reducing the list of recipients to linux-ha-dev, since this
isn't about pacemaker.

On Tue, Mar 16, 2010 at 05:26:14PM +0900, Keisuke MORI wrote:
> Hi,
> 
> Sorry for the rather long mail.
> I'm going to describe the issue of the Subject: and would like to
> suggest some changes to the agents package (and possibly Pacemaker, too).

Which part of pacemaker? As far as I can see only resource agents
and init scripts (heartbeat/corosync/openais) are involved.

> I would be glad if you could give me your thoughts and comments.
> 
> 
> 
> A pseudo RA which creates a stat file under HA_RSCTMP
> (/var/run/heartbeat/rsctmp), such as Dummy, MailTo, etc. do not
> work properly on the Pacemaker+Corosync stack.
> 
> When a node crashes and is rebooted, a stale stat file
> survives the reboot and hence the RA misbehaves as if the
> resource were already started when the cluster is launched again
> for the recovery.

What exactly did you observe, i.e. in which way did the resource
agents misbehave?
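
For reference, a pseudo agent of this kind typically decides
"running" purely from the presence of its state file. A minimal
sketch, not the actual Dummy code; pseudo_monitor and its argument
are hypothetical names, and only the OCF return codes (0 for
success, 7 for not running) are taken from the OCF convention:

```shell
#!/bin/sh
# Sketch of a state-file based monitor, as used by pseudo agents.
# pseudo_monitor is a hypothetical name, not a function of any real RA.

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# pseudo_monitor STATEFILE: report "running" iff the state file exists
pseudo_monitor() {
    if [ -f "$1" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

With such a monitor, a state file that survives a reboot makes the
initial probe report the resource as already running.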

> This problem does not occur on the Heartbeat stack because
> Heartbeat removes HA_RSCTMP at its startup,
> while on the Pacemaker stack neither Pacemaker nor Corosync removes it.
> 
> But removing them by Pacemaker does not seem to be correct -
> if they were removed at the cluster startup time then the
> maintenance mode would no longer work properly.
> 
> In my understanding, the "correct" behavior is:
>  - They should NOT be removed at the cluster startup time.
>  - They should be removed at the OS bootup time.

All true. Just a small correction: corosync/openais init scripts
are not part of Pacemaker.

It is indeed more correct to remove those files on boot, because
of the maintenance mode. Though in case the cluster stack on the
node was not stopped properly, either because it crashed or the
administrator lost patience and used kill -9, the state files
should perhaps be removed. So, there doesn't seem to be a single
right way to do this.
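
One possible approach for the boot case, as a rough sketch only:
an init script could remove just those state files which are older
than the last boot. This assumes Linux (a "btime" line in
/proc/stat) and GNU stat; boot_epoch, is_stale and clean_stale are
hypothetical names. It does not help with the kill -9 case, and it
does not descend into subdirectories such as the one IPaddr uses.

```shell
#!/bin/sh
# Sketch: remove state files that predate the last OS boot.
# Assumes Linux (/proc/stat has a "btime" line) and GNU stat (-c %Y).
# All function names here are hypothetical.

# boot_epoch: print the boot time in seconds since the epoch
boot_epoch() {
    awk '$1 == "btime" { print $2 }' /proc/stat
}

# is_stale FILE BOOT_EPOCH: succeed if FILE was last modified before boot
is_stale() {
    mtime=$(stat -c %Y "$1") || return 1
    [ "$mtime" -lt "$2" ]
}

# clean_stale DIR BOOT_EPOCH: remove regular files in DIR that predate boot
clean_stale() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if is_stale "$f" "$2"; then
            rm -f "$f"
        fi
    done
}
```

It would be called from the init script along the lines of
clean_stale /var/run/heartbeat/rsctmp "$(boot_epoch)". Files created
since boot are kept, so the maintenance mode is not disturbed.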

> My suggestion to address this issue is to make the following changes:
> 
>  - 1) change the HA_RSCTMP location to /var/run/resource-agents,
>       or any other subdirectory directly under /var/run.
>  - 2) set the directory permissions to 01777 (with sticky bit)
>  - 3) change the IPaddr/SendArp RAs not to use their own subdirectory
>       but instead add a prefix to the filename.
>  - 4) mark /var/run/heartbeat/rsctmp as obsolete;
>       Heartbeat/Pacemaker could preserve the current behavior
>       for a while for compatibility.
> 
> 
> The basic idea of the changes is that, we're now going to follow the
> file removal procedure defined by FHS(Filesystem Hierarchy Standard).
> 
> http://www.pathname.com/fhs/pub/fhs-2.3.html#VARRUNRUNTIMEVARIABLEDATA
> 
> FHS defines that any files under a subdirectory of /var/run
> should be removed at the OS bootup time.

I hope that the standard is really followed.

> Unfortunately the second level subdirectory is out of the scope and
> you can not rely on the removal (and that's the case of
> /var/run/heartbeat/rsctmp).

OK. Yes, the scheme you suggest is probably better than what we
currently have.
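
To make sure we mean the same thing, here is a rough sketch of
points 1) to 3) as I read them. The function names and the RSCTMP
variable are hypothetical; only the path and the 01777 mode come
from your proposal.

```shell
#!/bin/sh
# Sketch of the proposed layout: one flat directory right under
# /var/run, sticky bit set, and a per-agent filename prefix instead
# of a per-agent subdirectory. Names are hypothetical.

RSCTMP=${RSCTMP:-/var/run/resource-agents}

# ensure_rsctmp DIR: create DIR if missing, world-writable with sticky bit
ensure_rsctmp() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# state_file DIR PREFIX NAME: print the state file path for an agent
state_file() {
    printf '%s/%s-%s\n' "$1" "$2" "$3"
}
```

For example, state_file "$RSCTMP" IPaddr 10.0.0.1 would give
/var/run/resource-agents/IPaddr-10.0.0.1 in place of a file under
an IPaddr subdirectory.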

> I believe that the impact on existing RAs is minimal.
> If your RA is implemented "correctly" then you need to do nothing -
> just note that the location of the stat file has changed.
> 
> If your RA has /var/run/heartbeat/rsctmp hardcoded, or it
> creates its own subdirectory, you are encouraged to fix it because it
> may not work well with the maintenance mode, but you can
> continue to use the old rsctmp if you would like.
> 
> 
> I would like to hear your thoughts and comments.

Cheers,

Dejan

> 
> Regards,
> -- 
> Keisuke MORI
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/