Hi Lars,

Thank you all for reviewing the patch and making suggestions.

I think I understand your point with regard to the Heartbeat architecture,
but it would require rewriting almost all of the code ;-)

I will discuss with my colleagues what we can do for procd
as the next step.



Lars Marowsky-Bree <[EMAIL PROTECTED]> writes:

> On 2008-02-27T20:39:13, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
> Hi Keisuke-san,
>
> thanks for your patch and contribution. I have to apologize in the name
> of everyone for the late feedback.
>
> I really appreciate the idea of monitoring processes directly, and
> receiving async failure notifications to reduce fail-over times.
>
> I have just discussed this with Dejan and Andrew, and we think that the
> best path forward, alas necessary before inclusion, is to
>
> - Make procd independent of Pacemaker. It should talk only to the RAs
>   and the LRM.
>
> - RAs should "sign in" with it for the processes they want monitored,
>   instead of listing the processes in the procd configuration section
>   (means it gets decoupled from the CIB further). The RAs could write a
>   record to /var/run/heartbeat/procd/<resource-id>, for example. 
>   
>   The RAs would add/remove the required processes on start/promote or
>   demote/stop. (So procd itself would not need to be master-slave.)
>
>   I'm afraid that having users manually specify process lists in the CIB
>   really is not workable - the users will not be able to get this
>   right.
>
> - Instead of respawning procd, there should be a resource agent which
>   starts/stops (and monitors!) procd. You already have one, but why
>   doesn't it go into resources/OCF/ ?

So far we have only thought of running procd via respawn,
and we do not have such an RA yet.


>
> - procd should talk to the LRM to insert a "fake" failed resource
>   action, which would then cause the CRM/PE to handle the resource as
>   failed and initiate recovery. (This is not currently possible with the
>   LRM client library; you could exec crm_resource -F, which would mean
>   you no longer have a build-time dependency on the CRM.)
>
> - This would have the advantage of decoupling procd from pacemaker as
>   well as heartbeat. It could be included with the LRM/RA package build,
>   and possibly be useful with other cluster managers too.
>
> I think all that would help simplify the code.
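
To make the "sign in" idea above concrete, here is a rough sketch in C of
what the record handling could look like. It only illustrates the suggestion
and is not existing procd or LRM code. The function names, the
PROCD_PATH_MAX limit, and the record format (one PID per line) are
assumptions made for this example; the thread only says that a record would
be written under /var/run/heartbeat/procd/<resource-id>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PROCD_PATH_MAX 512 /* hypothetical limit, for this sketch only */

/* Build "<dir>/<rsc_id>" into buf. Returns 0 on success, -1 if the
 * resource id is empty, contains '/', or the path does not fit. */
static int procd_record_path(char *buf, size_t len,
                             const char *dir, const char *rsc_id)
{
    int n;

    if (rsc_id[0] == '\0' || strchr(rsc_id, '/') != NULL)
        return -1;
    n = snprintf(buf, len, "%s/%s", dir, rsc_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* "Sign in" (on start/promote): write one PID per line to a temporary
 * file, then rename it into place so that a reader never sees a
 * half-written record. The one-PID-per-line format is an assumption. */
static int procd_sign_in(const char *dir, const char *rsc_id,
                         const pid_t *pids, size_t npids)
{
    char path[PROCD_PATH_MAX], tmp[PROCD_PATH_MAX + 8];
    FILE *fp;
    size_t i;

    if (procd_record_path(path, sizeof(path), dir, rsc_id) != 0)
        return -1;
    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;
    for (i = 0; i < npids; i++)
        fprintf(fp, "%ld\n", (long)pids[i]);
    if (fclose(fp) != 0 || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* "Sign out" (on demote/stop): remove the record for this resource. */
static int procd_sign_out(const char *dir, const char *rsc_id)
{
    char path[PROCD_PATH_MAX];

    if (procd_record_path(path, sizeof(path), dir, rsc_id) != 0)
        return -1;
    return unlink(path);
}
```

A start action would call procd_sign_in() with the record directory, the
resource id, and the PIDs it wants monitored; the stop action would call
procd_sign_out(). A shell RA could do the same with a redirect and rm.
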
>
>
>> +#define RSCID_LEN      128 /* ref. include/lrm/lrm_api.h */
>> +#define MAX_PID_LEN    256 /* ref. lrm/lrmd/lrmd.h */
>> +#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */
>
> If you're referencing from other include files, please do include the
> includes as to avoid diverging header definitions.
>

Right.


Regards,

Keisuke MORI
NTT DATA Intellilink Corporation

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
