Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Jan Friesse Wed, 19 Oct 2016 23:49:39 -0700


On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:

Hi Klaus,
Hi All,


I tried a prototype of a watchdog using the WD service.
  - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9

Please comment.

Thank you Hideo for providing the prototype.
Added the patch to my build and it seems to
be working as expected.

A few thoughts triggered by this approach:

- we have to alert the corosync-people as in
   a chat with Jan Friesse he pointed me to the
   fact that for corosync 3.x the wd-service was
   planned to be removed

Actually I didn't express myself correctly. What I wanted to say was "I'm considering the idea of removing it", simply because it's disabled in downstream.

BUT keep in mind that removing functionality = asking the community to find out whether somebody is actively using it.

And because there are active users and a future use case, removing wd is not an option.


   especially delicate as the binding is very loose
   so that - as is - it builds against a corosync with
   disabled wd-service without any complaints...

- as of now if you enable wd-service in the
   corosync-build it is on by default and would
   be hogging the watchdog presumably
   (there is obviously a pull request that makes
   it default to off)

- with my thoughts about adding an API to
   sbd previously in the thread I was trying to
   target closer observation of pacemaker_remoted
   as well (remote-nodes don't have corosync
   running)

   I guess it would be possible to run corosync
   with a static config as single-node cluster
   bound to localhost for that purpose.

   I read the thread about corosync-remote and
   that happening might make the special-handling
   for pacemaker-remote obsolete anyway ...

- to enable the approach to live alongside
   sbd it would be possible to make sbd use
   the corosync-API as well for watchdog purposes
   instead of opening the watchdog directly

   This shouldn't be a big deal for sbd used to
   observe a pacemaker-node as cluster-watcher
   (the part of sbd that sends cpg-pings to corosync)
   already builds against corosync.
   The blockdevice-part of sbd being basically
   generic it might be an issue though.
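
For readers unfamiliar with what "opening the watchdog directly" means: on Linux a watchdog device is normally owned by a single process, which arms it by opening it and keeps the node alive by writing to it. The sketch below illustrates that pattern in Python. The class name DirectWatchdog and its methods are made up for this example; this is not sbd or corosync code, and whether the 'V' "magic close" disarms the timer depends on the driver and its configuration.

```python
import os


class DirectWatchdog:
    """Hypothetical helper that feeds a Linux-style watchdog device directly."""

    def __init__(self, path="/dev/watchdog"):
        # Opening the device arms the timer. A real device usually refuses
        # a second open, so whoever opens it first owns ("hogs") it.
        self._fd = os.open(path, os.O_WRONLY)

    def feed(self):
        # Any write resets the countdown.
        os.write(self._fd, b"\0")

    def close(self, disarm=True):
        # Writing 'V' right before close asks the driver to stop the timer
        # (only honoured by drivers that support it). Without it, the node
        # is reset once the timeout expires.
        if disarm:
            os.write(self._fd, b"V")
        os.close(self._fd)
        self._fd = None
```

Because only one process can hold the device, two components that both want a watchdog (for example sbd and a corosync-based approach) have to agree on a single owner, which is the concern raised above.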

Regards,
Klaus



Best Regards,
Hideo Yamauchi.


----- Original Message -----

From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
To: "users@clusterlabs.org" <users@clusterlabs.org>
Cc:
Date: 2016/10/11, Tue 17:58
Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

Hi Klaus,

Thank you for the comment.

I am making a patch which is a prototype using the WD service.

Please wait a little.

Best Regards,
Hideo Yamauchi.




----- Original Message -----

  From: Klaus Wenninger <kwenn...@redhat.com>
  To: users@clusterlabs.org
  Cc:
  Date: 2016/10/10, Mon 21:03
  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

  On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote:

   Hi All,

   Our users may not necessarily use sbd.

   I confirmed that using the WD service of corosync is one way to do this
   without sbd.

   Pacemaker can have the WD service watch the pacemaker processes, using CMAP,
   and so trigger the watchdog.

  Have to have a look at that...
  But if we establish some in-between-layer in pacemaker we could have this
  as one of the possibilities besides e.g. sbd (with enhanced API), going for
  a watchdog-device directly, ...


   We can prepare a patch for pacemaker.

  Always helpful to discuss/clarify an idea once some code is available ...

   Has the discussion about using the WD service already ended?

  Not from my pov. Just a day off ;-)


   Best Regards,
   Hideo Yamauchi.


   ----- Original Message -----

   From: Klaus Wenninger <kwenn...@redhat.com>
   To: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>; users@clusterlabs.org
   Cc:
   Date: 2016/10/7, Fri 17:47
   Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

   On 10/07/2016 08:14 AM, Ulrich Windl wrote:

    Klaus Wenninger <kwenn...@redhat.com> wrote on 06.10.2016 at 18:03 in
    message <3980cfdd-ebd9-1597-f6bd-a1ca808f7...@redhat.com>:

    On 10/05/2016 04:22 PM, renayama19661...@ybb.ne.jp wrote:

    Hi All,

    If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?


    As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
    will reboot the node (unless the watchdog fails).
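
The idea quoted above (a daemon has to keep feeding a watchdog, and a frozen daemon therefore leads to a reset) can be sketched roughly as follows. This is only an illustration in Python; the names Watchdog, on_expire and clock are assumptions made for the example, not pacemaker, sbd or corosync code, and in a real setup a hardware watchdog would perform the reset.

```python
import time


class Watchdog:
    """Hypothetical software watchdog: fires once if not fed within `timeout` seconds."""

    def __init__(self, timeout, on_expire, clock=time.monotonic):
        self.timeout = timeout
        self.on_expire = on_expire      # e.g. "reset the node" in a real system
        self.clock = clock              # injectable so the logic can be tested
        self.last_fed = clock()
        self.fired = False

    def feed(self):
        # Called regularly by the observed process (crmd in this thread).
        self.last_fed = self.clock()

    def check(self):
        # Called periodically by the supervising side.
        # Returns True once the watchdog has expired.
        if not self.fired and self.clock() - self.last_fed > self.timeout:
            self.fired = True
            self.on_expire()
        return self.fired
```

A process stopped with SIGSTOP simply stops calling feed(), so the next check() after the timeout triggers the expiry action even though the process itself never reports an error.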

    Thank you for the comment.

    We are examining a watchdog for crmd, too.
    I will comment further once the examination has advanced.

    Was thinking of doing a small test implementation going
    a little in the direction Lars Ellenberg had been pointing out.

    a couple of thoughts I had so far:

    - add an API (via DBus or libqb - favoring libqb atm) to sbd
      an application can use to create a watchdog within sbd

    Why does it have to be done within sbd?

   Not necessarily, it could be spun off into a project of its own as well, or
   something already existing could be used.
   I remember having added a dbus-interface to
   https://sourceforge.net/projects/watchdog/ for a project once.
   If you have a suggestion I'm open.
   Starting from sbd would have the advantage of a smooth start:

   - cluster/pacemaker-watcher are there already and can
     be replaced/moved over time
   - the lifecycle of the daemon (when started/stopped) is
     already something that is in the code and in the people's minds

    - parameters for the first are a name and a timeout

    - first use-case would be crmd observation

    - later on we could think of removing pacemaker dependencies
      from sbd by moving the actual implementation of
      pacemaker-watcher and probably cluster-watcher as well
      into pacemaker - using the new API

    - this of course creates sbd dependency within pacemaker so
      that it would make sense to offer a simpler and self-contained
      implementation within pacemaker as an alternative

    I think the watchdog interface is so simple that you don't need a relay
    for it. The only limit I can imagine is the number of watchdogs available of
    some specific hardware.
   That is the point ;-)

      thus it would be favorable to have the dependency
      within a non-compulsory pacemaker-rpm so that
      we can offer an alternative that doesn't use sbd
      at maybe the cost of being less reliable or one
      that owns a hardware-watchdog by itself for systems
      where this is still unused.

      - e.g. via some kind of plugin (Andrew forgive me - no pils ;-)

      - or via an additional daemon
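
To make the proposal above a bit more concrete: the API would let an application create a watchdog by giving a name and a timeout, and a single daemon would own the one hardware watchdog on behalf of all of them. A minimal sketch of that relay logic in Python follows. All names (WatchdogRelay, register, feed, tick, feed_hardware) are hypothetical and chosen for this example; nothing here is an existing sbd, libqb or DBus interface.

```python
import time


class WatchdogRelay:
    """Hypothetical relay multiplexing named software watchdogs onto one hardware watchdog."""

    def __init__(self, feed_hardware, clock=time.monotonic):
        self._feed_hardware = feed_hardware  # callable that feeds the real device
        self._clock = clock                  # injectable for testing
        self._clients = {}                   # name -> (timeout, last_fed)

    def register(self, name, timeout):
        # The two parameters mentioned in the thread: a name and a timeout.
        self._clients[name] = (timeout, self._clock())

    def feed(self, name):
        timeout, _ = self._clients[name]
        self._clients[name] = (timeout, self._clock())

    def expired(self):
        # Names of all clients that have missed their deadline.
        now = self._clock()
        return sorted(name for name, (timeout, last) in self._clients.items()
                      if now - last > timeout)

    def tick(self):
        # Feed the hardware only while every registered client is on time;
        # otherwise stop feeding so the hardware watchdog resets the node.
        late = self.expired()
        if not late:
            self._feed_hardware()
        return late
```

With such a relay the number of software watchdogs is no longer limited by the number of hardware watchdogs, at the cost of the relay itself becoming a component that has to be reliable.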

    What did you have in mind?
    Maybe it makes sense to synchronize...

    Regards,
    Klaus

    Best Regards,
    Hideo Yamauchi.



    ----- Original Message -----

    From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
    To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
    Cc:
    Date: 2016/10/5, Wed 23:08
    Subject: Antw: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

     <renayama19661...@ybb.ne.jp> wrote on 21.09.2016 at 11:52
     in message <876439.61305...@web200311.mail.ssk.yahoo.co.jp>:

     Hi All,

     Was a final conclusion reached about this problem?

     If a user uses sbd, can the cluster avoid the problem of a SIGSTOP of crmd?

    As pointed out earlier, maybe crmd should feed a watchdog. Then stopping crmd
    will reboot the node (unless the watchdog fails).

     We are interested in this problem, too.

     Best Regards,

     Hideo Yamauchi.

_______________________________________________

     Users mailing list: Users@clusterlabs.org
    http://clusterlabs.org/mailman/listinfo/users
     Project Home: http://www.clusterlabs.org
     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

     Bugs: http://bugs.clusterlabs.org
