On 24/06/14 09:36, Kostiantyn Ponomarenko wrote:
Hi Chrissie,
But wait_for_all doesn't help when there is no connection between the nodes: if I then need to reboot the remaining working node, I won't get a working cluster afterwards - both nodes will be waiting for a connection to each other. That's why I am looking for a solution that could get one node working in this situation (after a reboot).
I've been thinking about some kind of marker which could help a node determine the state of the other node - like an external disk and a SCSI reservation command. Maybe you could suggest another kind of marker? I am not sure whether we can use the presence of a file on an external SSD as the marker. Something like: if there is a file, the other node is alive; if not, the node is dead.
More seriously, that solution is harder than it might seem - which is one reason qdiskd became as complex as it did, and why votequorum is as conservative as it is when it comes to declaring a workable cluster. If someone is there to manually reboot nodes, then it might as well be a human decision which node is capable of running services.
Chrissie
Digimer,
Thanks for the links and information.
Anyway, if I go this way, I will write my own daemon to determine the state of the other node.
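A minimal sketch of what such a daemon might look like, assuming a plain ping as the liveness probe - the peer hostname and start commands are placeholders:

#!/usr/bin/env python
# Sketch of a peer-check daemon: poll the peer with ping and only start
# the cluster stack once the peer answers. PEER and the start commands
# are placeholders; adjust for your distribution.
import subprocess
import time

PEER = "node2"  # hypothetical peer hostname

def peer_alive():
    # One ICMP echo request, 2-second timeout.
    return subprocess.call(["ping", "-c", "1", "-W", "2", PEER]) == 0

if __name__ == "__main__":
    while not peer_alive():
        time.sleep(5)  # keep polling; an operator can still intervene
    subprocess.call(["service", "corosync", "start"])
    subprocess.call(["service", "pacemaker", "start"])

Note that a ping only proves the link is up, not that the peer is healthy, so this automates the "log in and check" step but cannot safely decide to start services alone - which is Chrissie's point above.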
Also, the information about fence loops is new to me, thanks =)
Thank you,
Kostya
On Tue, Jun 24, 2014 at 10:55 AM, Christine Caulfield <ccaul...@redhat.com> wrote:
On 23/06/14 15:49, Digimer wrote:
Hi Kostya,
I'm having a little trouble understanding your question, sorry.
On boot, the node will not start anything, so after booting it, you log in, check that it can talk to the peer node (a simple ping is generally enough), then start the cluster. It will join the peer's existing cluster (even if that is a cluster of just the peer itself).

If you booted both nodes, say after a power outage, you check the connection (again, a simple ping is fine) and then start the cluster on both nodes at the same time.
wait_for_all helps with most of these situations. If a node goes down then it won't start services until it's seen the non-failed node, because wait_for_all prevents a newly rebooted node from doing anything on its own. This also takes care of the case where both nodes are rebooted together, of course, because that's the same as a new start.
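For reference, the votequorum settings being described boil down to a minimal corosync.conf quorum section like this (note that, per the votequorum(5) man page, two_node: 1 automatically enables wait_for_all unless it is explicitly overridden):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 1
}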
Chrissie
If one of the nodes needs to be shut down, say for repairs or upgrades, you migrate the services off of it and over to the peer node, then you stop the cluster (which tells the peer that the node is leaving the cluster). After that, the remaining node operates by itself. When you turn it back on, you rejoin the cluster and migrate the services back.
I think, maybe, you are looking at this as more complicated than it needs to be. Pacemaker and Corosync will handle most of this for you, once set up properly. What operating system do you plan to use, and what cluster stack? I suspect it will be corosync + pacemaker, which should work fine.
digimer
On 23/06/14 10:36 AM, Kostiantyn Ponomarenko wrote:
Hi Digimer,
Suppose I disable the cluster on startup - but what about the remaining node, if I need to reboot it? Even in the case of a lost connection between the two nodes, I need to have one node working and providing resources. How did you solve this situation?
Should it be a separate daemon which somehow checks the connection between the two nodes and decides whether to run Corosync and Pacemaker or to keep them down?
Thank you,
Kostya
On Mon, Jun 23, 2014 at 4:34 PM, Digimer <li...@alteeve.ca> wrote:
On 23/06/14 09:11 AM, Kostiantyn Ponomarenko wrote:
Hi guys,
I want to gather all the possible configuration variants for a 2-node cluster, because it has a lot of pitfalls and there is not a lot of information about it across the internet. I also have some questions about the configurations and their specific problems.
VARIANT 1:
-----------------
We can use the "two_node" and "wait_for_all" options from Corosync's votequorum, and set up fencing agents with a delay on one of them.
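To illustrate the delay part: assuming a pcs-based stack and IPMI fencing (the agent, addresses, and credentials below are placeholders), a static delay on the device that fences the preferred node gives that node a head start in the fence race, so the two nodes don't shoot each other at the same moment:

# Hypothetical example - adjust the agent and parameters to your hardware.
# The delay sits on the device that fences node1, so node1 is the
# preferred survivor: fencing it waits 15s, while node2 can be fenced
# immediately.
pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1" \
    ipaddr="192.168.0.1" login="admin" passwd="secret" delay=15
pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node2" \
    ipaddr="192.168.0.2" login="admin" passwd="secret"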
Here is a workflow (diagram) of this configuration:
1. Node starts.
2. Cluster (Corosync and Pacemaker) starts at boot time.
3. Wait for all nodes. Have all nodes joined?
   No: go to step 3.
   Yes: go to step 4.
4. Start resources.
5. Split-brain situation (something wrong with the connection between the nodes).
6. The fencing agent on one of the nodes reboots the other node (there is a configured delay on one of the fencing agents).
7. The rebooted node goes to step 1.
There are two (or more?) important things about this configuration:
1. The rebooted node remains waiting for all nodes to be visible (the connection has to be restored first).
2. Suppose the connection problem still exists and the node which rebooted the other one now has to be rebooted as well (for some reason). After the reboot, it is also stuck on step 3 because of the connection problem.
QUESTION:
-----------------
Is it possible to somehow assign the node that won the reboot race (i.e. rebooted the other one) a status like "primary", and allow it not to wait for all nodes after a reboot? And to drop this status once the other node has joined again. So, is it possible?
Right now that's the only configuration I know of for a 2-node cluster. Other variants are very much appreciated =)
VARIANT 2 (not implemented, just a suggestion):
-----------------
I've been thinking about using an external SSD drive (or another external drive). So, for example, the fencing agent could reserve the SSD using a SCSI command and only after that reboot the other node. The main idea is that the first node, as soon as the cluster starts on it, reserves the SSD until the other node joins the cluster; after that, the SCSI reservation is removed. (A rough sketch of the reservation commands follows the workflow below.)
1. Node starts.
2. Cluster (Corosync and Pacemaker) starts at boot time.
3. Reserve the SSD. Did it manage to reserve it?
   No: don't start resources (wait for all).
   Yes: go to step 4.
4. Start resources.
5. Remove the SCSI reservation when the other node has joined.
6. Split-brain situation (something wrong with the connection between the nodes).
7. The fencing agent tries to reserve the SSD. Did it manage to reserve it?
   No: maybe put the node in standby mode ...
   Yes: reboot the other node.
8. Optional: a single node can keep the SSD reservation while it is alone in the cluster, or until its shutdown.
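For anyone who wants to experiment with this idea, here is a rough sketch of steps 3 and 5 using sg_persist from sg3_utils - the device path and reservation key are placeholders, and this deliberately ignores the hard failure modes (stale reservations, a node that dies while holding the reservation, and so on):

#!/usr/bin/env python
# Rough sketch of VARIANT 2's reservation steps using sg_persist
# (sg3_utils). DEVICE and KEY are placeholders; real code would need
# far more error handling.
import subprocess

DEVICE = "/dev/sdX"  # the shared external SSD (placeholder)
KEY = "0xA"          # this node's reservation key (placeholder)

def try_reserve():
    """Step 3: register our key, then try to take a Write-Exclusive
    reservation (PROUT type 1). A non-zero exit status means a
    reservation conflict, i.e. the other node already holds it."""
    subprocess.call(["sg_persist", "--out", "--register",
                     "--param-sark=" + KEY, DEVICE])
    rc = subprocess.call(["sg_persist", "--out", "--reserve",
                          "--param-rk=" + KEY, "--prout-type=1", DEVICE])
    return rc == 0

def release():
    """Step 5: drop the reservation once the other node has joined."""
    subprocess.call(["sg_persist", "--out", "--release",
                     "--param-rk=" + KEY, "--prout-type=1", DEVICE])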
I am really looking forward to finding the best solution (or a couple of them =)). I hope I am not the only person who is interested in this topic.
Thank you,
Kostya
Hi Kostya,
I only build 2-node clusters, and I've not had problems with this going back to 2009, over dozens of clusters. The tricks I found are:
* Disable quorum (of course)
* Set up good fencing, and add a delay to the node you prefer (or pick one at random, if they're of equal value) to avoid dual fences
* Disable the cluster on startup, to prevent fence loops
That's it. With this, your 2-node cluster will be just fine.
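For the record, on a pcs-based corosync+pacemaker stack those tricks translate roughly as follows - a sketch under that assumption (on older cman-based stacks the commands differ):

# 1. Disable quorum - a 2-node cluster can never have meaningful quorum
pcs property set no-quorum-policy=ignore
# 2. Fencing with a delay on the preferred node: see the fence_ipmilan
#    example earlier in the thread
# 3. Don't start the cluster stack at boot, to prevent fence loops
pcs cluster disable --all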
As for your question: once a node is fenced successfully, the resource manager (Pacemaker) will take over any services lost on the fenced node, if that is how you configured it. A node that either gracefully leaves or dies/is fenced should not interfere with the remaining node.
The problem is when a node vanishes and fencing fails. Then, not knowing what the other node might be doing, the only safe option is to block, otherwise you risk a split-brain. This is why fencing is so important.
Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org