Re: [ClusterLabs] Resources are stopped and started when one node rejoins

Vladislav Bogdanov Mon, 28 Aug 2017 05:57:02 -0700

28.08.2017 14:03, Octavian Ciobanu wrote:

Hey Vladislav,


Thank you for the info. I've tried your suggestions but the behavior is
still the same. When an offline/standby node rejoins the cluster all the
resources are first stopped and then started. I've added the changes
I've made, see below in reply message, next to your suggestions.

Logs on DC (node where you see logs from the pengine process) should contain references to pe-input-XX.bz2 files. Something like "notice: Calculated transition XXXX, saving inputs in /var/lib/pacemaker/pengine/pe-input-XX.bz2"

Locate one for which Stop actions occur.

You can replay them with 'crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-XX.bz2' to see if that is the correct one (look in the middle of output).


After that you may add some debugging:

PCMK_debug=yes PCMK_logfile=./pcmk.log crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-XX.bz2


That will produce a big file with all debugging messages enabled.

Try to locate a reason for restarts there.
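
As a rough sketch of the "locate the pe-input file" step: the helper below pulls every pe-input path out of a log file so each one can be replayed in turn. The function name list_pe_inputs is made up for illustration, and where the DC writes its log depends on the distribution, so the log path is an assumption you would need to adjust.

```shell
# Hypothetical helper: list the pe-input files referenced in a log file.
# $1 is the path to a Pacemaker log copied from (or read on) the DC.
list_pe_inputs() {
    # -o prints only the matching part of each line; sort -u drops repeats
    grep -o '/var/lib/pacemaker/pengine/pe-input-[0-9]*\.bz2' "$1" | sort -u
}
```

For example, `list_pe_inputs /var/log/messages` (path assumed) prints one file per line, and each can then be passed to 'crm_simulate -S -x' as described above.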

Best,
Vladislav

Also please look inline (maybe the info there will be enough so you won't need to debug).


Once again thank you for info.

Best regards.
Octavian Ciobanu

On Sat, Aug 26, 2017 at 8:17 PM, Vladislav Bogdanov
<bub...@hoster-ok.com> wrote:

    26.08.2017 19:36, Octavian Ciobanu wrote:

        Thank you for your reply.

        There is no reason to set location for the resources, I think,
        because all the resources are set with clone options so they are
        started on all nodes at the same time.


    You still need to colocate "upper" resources with their
    dependencies. Otherwise pacemaker will try to start them even if
    their dependencies fail. Order without colocation has very limited
    use (usually when resources may run on different nodes). For clones
    that is even more exotic.


I've added colocation:

pcs constraint colocation add iSCSI1-clone with DLM-clone
pcs constraint colocation add iSCSI2-clone with DLM-clone
pcs constraint colocation add iSCSI3-clone with DLM-clone
pcs constraint colocation add Mount1-clone with iSCSI1-clone
pcs constraint colocation add Mount2-clone with iSCSI2-clone
pcs constraint colocation add Mount3-clone with iSCSI3-clone

The result is the same ... all clones are first stopped and then started
beginning with DLM resource and ending with the Mount ones.


Yep, that was not meant to fix your problem. Just to prevent future issues.



    For your original question: ensure you have interleave=true set for
    all your clones. You seem to miss it for iSCSI ones.
    interleave=false (default) is for different uses (when upper
    resources require all clone instances to be up).


Modified iSCSI resources and added interleave="true" and still no change
in behavior.

Weird... Probably you also do not need 'ordered="true"' for your DLM clone? Knowing what DLM is, it does not need ordering; its instances may be safely started in parallel.
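
The two meta-attribute suggestions (interleave on the iSCSI clones, no 'ordered' on the DLM clone) might look roughly like the sketch below. It only prints the commands so they can be reviewed first. The function name clone_meta_changes is made up for illustration, and the clone ids (DLM-clone, iSCSI1-clone to iSCSI3-clone) are assumed from the configuration quoted further down.

```shell
# Print (do not run) the meta-attribute changes for the clones.
clone_meta_changes() {
    # With "pcs resource meta", an option given without a value is removed.
    echo "pcs resource meta DLM-clone ordered="
    for n in 1 2 3; do
        echo "pcs resource meta iSCSI${n}-clone interleave=true"
    done
}

clone_meta_changes
```

If the printed commands look right for your cluster, they can be run by hand or piped to sh.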



    Also, just a minor note, iSCSI resources do not actually depend on
    dlm, mounts should depend on it.


I know, but the mount resource must know when the iSCSI resource it is
connected to has started, so the only solution I've seen was to place DLM
before iSCSI and then Mount. If there is another solution, a proper way
to do it, can you please give a reference or a place where I can read
how to do it?

You would want to colocate (and order) mount with both DLM and iSCSI. Multiple colocations/orders for the same resource are allowed. For mount you need DLM running and iSCSI disk connected. But you actually do not need DLM to connect iSCSI disk (so DLM and iSCSI resources may start in parallel).
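
A sketch of what that could look like with pcs, using the same 'pcs constraint order' and 'pcs constraint colocation add' forms already used in this thread. The function name mount_constraints is made up for illustration, and the clone ids are assumed from the original configuration. The commands are only printed, not executed.

```shell
# Print (do not run) the constraints for one mount.
# $1 = mount clone id, $2 = iSCSI clone id that the mount depends on.
mount_constraints() {
    # The mount is ordered after, and colocated with, both DLM and its iSCSI disk.
    for dep in DLM-clone "$2"; do
        echo "pcs constraint order $dep then $1"
        echo "pcs constraint colocation add $1 with $dep"
    done
}

mount_constraints Mount1-clone iSCSI1-clone
mount_constraints Mount2-clone iSCSI2-clone
mount_constraints Mount3-clone iSCSI3-clone
```

Note that for DLM and iSCSI to actually start in parallel, the existing "DLM-clone then iSCSIx-clone" order constraints (and the matching colocations, if added) would still have to be removed separately; this sketch does not do that.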



        And when it comes to stickiness, I forgot to mention it, but it
        is set to 200. I also have stonith configured to use VMware ESXi.

        Best regards
        Octavian Ciobanu

        On Sat, Aug 26, 2017 at 6:16 PM, John Keates <j...@keates.nl> wrote:

            While I am by no means a CRM/Pacemaker expert, I only see the
            resource primitives and the order constraints. Wouldn’t you need
            location and/or colocation as well as stickiness settings to
            prevent this from happening? What I think it might be doing is
            seeing the new node, then trying to move the resources (but not
            finding it a suitable target) and then moving them back where
            they came from, but fast enough for you to only see it as a
            restart.

            If you crm_resource -P, it should also restart all resources,
            but put them in the preferred spot. If they end up in the same
            place, you probably didn’t put any weighting in the config or
            have stickiness set to INF.

            Kind regards,

            John Keates

                On 26 Aug 2017, at 14:23, Octavian Ciobanu
                <coctavian1...@gmail.com> wrote:

                Hello all,

                While playing with cluster configuration I noticed a strange
                behavior. If I stop/standby cluster services on one node and
                reboot it, when it joins the cluster all the resources that
                were started and working on active nodes get stopped and
                restarted.

                My testing configuration is based on 4 nodes. One node is a
                storage node that makes 3 iSCSI targets available for the
                other nodes to use; it is not configured to join the cluster.
                The other three nodes are configured in a cluster using the
                following commands.

                pcs resource create DLM ocf:pacemaker:controld \
                    op monitor interval="60" on-fail="fence" \
                    clone meta clone-max="3" clone-node-max="1" \
                    interleave="true" ordered="true"
                pcs resource create iSCSI1 ocf:heartbeat:iscsi \
                    portal="10.0.0.1:3260" \
                    target="iqn.2017-08.example.com:tgt1" \
                    op start interval="0" timeout="20" \
                    op stop interval="0" timeout="20" \
                    op monitor interval="120" timeout="30" \
                    clone meta clone-max="3" clone-node-max="1"
                pcs resource create iSCSI2 ocf:heartbeat:iscsi \
                    portal="10.0.0.1:3260" \
                    target="iqn.2017-08.example.com:tgt2" \
                    op start interval="0" timeout="20" \
                    op stop interval="0" timeout="20" \
                    op monitor interval="120" timeout="30" \
                    clone meta clone-max="3" clone-node-max="1"
                pcs resource create iSCSI3 ocf:heartbeat:iscsi \
                    portal="10.0.0.1:3260" \
                    target="iqn.2017-08.example.com:tgt3" \
                    op start interval="0" timeout="20" \
                    op stop interval="0" timeout="20" \
                    op monitor interval="120" timeout="30" \
                    clone meta clone-max="3" clone-node-max="1"
                pcs resource create Mount1 ocf:heartbeat:Filesystem \
                    device="/dev/disk/by-label/MyCluster:Data1" \
                    directory="/mnt/data1" fstype="gfs2" \
                    options="noatime,nodiratime,rw" \
                    op monitor interval="90" on-fail="fence" \
                    clone meta clone-max="3" clone-node-max="1" interleave="true"
                pcs resource create Mount2 ocf:heartbeat:Filesystem \
                    device="/dev/disk/by-label/MyCluster:Data2" \
                    directory="/mnt/data2" fstype="gfs2" \
                    options="noatime,nodiratime,rw" \
                    op monitor interval="90" on-fail="fence" \
                    clone meta clone-max="3" clone-node-max="1" interleave="true"
                pcs resource create Mount3 ocf:heartbeat:Filesystem \
                    device="/dev/disk/by-label/MyCluster:Data3" \
                    directory="/mnt/data3" fstype="gfs2" \
                    options="noatime,nodiratime,rw" \
                    op monitor interval="90" on-fail="fence" \
                    clone meta clone-max="3" clone-node-max="1" interleave="true"
                pcs constraint order DLM-clone then iSCSI1-clone
                pcs constraint order DLM-clone then iSCSI2-clone
                pcs constraint order DLM-clone then iSCSI3-clone
                pcs constraint order iSCSI1-clone then Mount1-clone
                pcs constraint order iSCSI2-clone then Mount2-clone
                pcs constraint order iSCSI3-clone then Mount3-clone

                If I issue the command "pcs cluster standby node1" or
                "pcs cluster stop" on node 1 and then reboot the node, when
                the node gets back online (unstandby if it was put in standby
                mode) all the "MountX" resources get stopped on node 3 and 4
                and started again.

                Can anyone help me figure out where and what the mistake in
                my configuration is, as I would like to keep the started
                resources on active nodes (avoid stop and start of
                resources)?

                Thank you in advance
                Octavian Ciobanu
                _______________________________________________
                Users mailing list: Users@clusterlabs.org
                http://lists.clusterlabs.org/mailman/listinfo/users

                Project Home: http://www.clusterlabs.org
                Getting started:
                http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
                Bugs: http://bugs.clusterlabs.org


