Hi,

On Tue, Apr 08, 2008 at 02:45:12PM +0200, Lars Marowsky-Bree wrote:
> On 2008-02-27T20:39:13, Keisuke MORI <[EMAIL PROTECTED]> wrote:
>
> Hi Keisuke-san,
>
> thanks for your patch and contribution. I have to apologize in the name
> of everyone for the late feedback.
>
> I really appreciate the idea of monitoring processes directly, and
> receiving async failure notifications to reduce fail-over times.
>
> I have just discussed this with Dejan and Andrew, and we think that the
> best path forward, alas necessary before inclusion, is to
>
> - Make procd independent of Pacemaker. It should talk only to the RAs
>   and the LRM.
>
> - RAs should "sign in" with it for the processes they want monitored,
>   instead of listing the processes in the procd configuration section
>   (means it gets decoupled from the CIB further). The RAs could write a
>   record to /var/run/heartbeat/procd/<resource-id>, for example.
>
>   The RAs would add/remove the required processes on start/promote or
>   demote/stop. (So procd itself would not need to be master-slave.)
>
>   I'm afraid that having users manually specify process lists in the CIB
>   really is not workable - the users will not be able to get this
>   right.
>
> - Instead of respawning procd, there should be a resource agent which
>   starts/stops (and monitors!) procd. You already have one, but why
>   doesn't it go into resources/OCF/ ?
>
> - procd should talk to the LRM to insert a "fake" failed resource
>   action, which would then cause the CRM/PE to handle the resource as
>   failed and initiate recovery. (This is not currently possible with the
>   LRM client library; you could exec crm_resource -F, which would mean
>   you no longer have a build-time dependency on the CRM.)

This is going to be implemented in the LRM. Please see (and
subscribe) here:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=1872

Cheers,

Dejan

> - This would have the advantage of decoupling procd from pacemaker as
>   well as heartbeat. It could be included with the LRM/RA package build,
>   and possibly be useful with other cluster managers too.
>
> I think all that would help simplify the code.
>
>
> > +#define RSCID_LEN 128 /* ref. include/lrm/lrm_api.h */
> > +#define MAX_PID_LEN 256 /* ref. lrm/lrmd/lrmd.h */
> > +#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */
>
> If you're referencing from other include files, please do include the
> includes as to avoid diverging header definitions.
>
>
> Regards,
>     Lars
>
> --
> Teamlead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
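
To make the "sign in" idea from the quoted review a bit more concrete, here is a rough sketch of what an RA-side helper could look like. Everything in it is an assumption made for illustration: the function names (sign_in, sign_out, read_record) are hypothetical, and the record format (one PID per line) is invented, since the mail only suggests writing "a record" to /var/run/heartbeat/procd/<resource-id>. It is not taken from the procd patch.

```python
import os
import tempfile
from pathlib import Path

# Directory suggested in the mail; the layout inside it is an assumption.
PROCD_DIR = Path("/var/run/heartbeat/procd")


def _record_path(resource_id, base_dir):
    # One file per resource id, so a path separator in the id is rejected.
    if not resource_id or "/" in resource_id:
        raise ValueError("invalid resource id: %r" % (resource_id,))
    return Path(base_dir) / resource_id


def sign_in(resource_id, pids, base_dir=PROCD_DIR):
    """Called by an RA on start/promote: record the PIDs to be monitored."""
    record = _record_path(resource_id, base_dir)
    record.parent.mkdir(parents=True, exist_ok=True)
    # Write a temporary dot-file in the same directory and rename it into
    # place, so a reader never sees a half-written record. A daemon
    # scanning the directory would have to skip dot-files.
    fd, tmp = tempfile.mkstemp(dir=record.parent, prefix="." + resource_id + ".")
    with os.fdopen(fd, "w") as f:
        f.write("".join("%d\n" % int(pid) for pid in pids))
    os.replace(tmp, record)
    return record


def sign_out(resource_id, base_dir=PROCD_DIR):
    """Called by an RA on demote/stop: drop the record if it exists."""
    try:
        _record_path(resource_id, base_dir).unlink()
    except FileNotFoundError:
        pass  # stop should succeed even if the resource never signed in


def read_record(resource_id, base_dir=PROCD_DIR):
    """What the monitoring side might do: return the registered PIDs."""
    try:
        text = _record_path(resource_id, base_dir).read_text()
    except FileNotFoundError:
        return []
    return [int(field) for field in text.split()]
```

With something along these lines, the process list would live with the RA, which knows which processes it started, rather than in the CIB. An RA's start action would call sign_in() after the service is up, and its stop action would call sign_out() before returning.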