On Tue, 2017-08-15 at 08:42 +0200, Jan Friesse wrote:
> Ken Gaillot wrote:
> > On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> >> On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
> >>> Hi,
> >>>
> >>> In Pacemaker-1.1.17, an attribute updated while pacemaker is starting
> >>> is not displayed in crm_mon.
> >>> In Pacemaker-1.1.16, it is displayed, so the results differ.
> >>>
> >>> https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> >>> This commit is the cause, but is the following result (3.) the
> >>> expected behavior?
> >>
> >> This turned out to be an odd one. The sequence of events is:
> >>
> >> 1. When the node leaves the cluster, the DC (correctly) wipes all of
> >> its transient attributes from attrd and the CIB.
> >>
> >> 2. Pacemaker is newly started on the node, and a transient attribute
> >> is set before the node joins the cluster.
> >>
> >> 3. The node joins the cluster, and its transient attributes (including
> >> the new value) are synced with the rest of the cluster, in both attrd
> >> and the CIB. So far, so good.
> >>
> >> 4. Because this is the node's first join since its crmd started, its
> >> crmd wipes all of its transient attributes again. The idea is that the
> >> node may have restarted so quickly that the DC hasn't yet done it
> >> (step 1 here), so clear them now to avoid any problems with old
> >> values. However, the crmd wipes only the CIB -- not attrd (arguably a
> >> bug).
> >
> > Whoops, clarification: the node may have restarted so quickly that
> > corosync didn't notice it left, so the DC would never have gotten the
>
> Corosync always notices that a node has left, no matter whether the
> node is gone longer than the token timeout or comes back within it.
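(As an aside, anyone who wants to confirm the attrd/CIB mismatch from
step 4 on their own cluster can compare attrd's view with the CIB
status section. A rough, untested sketch -- cibadmin's option spelling
and the exact nvpair name in the status section may differ by version;
KEY is the test attribute from the reproduction quoted below:

    # the value set during start-up is visible in attrd
    [root@node1 ~]# attrd_updater -Q -n KEY -A

    # but it is missing from the CIB status section until the next
    # natural write-out
    [root@node1 ~]# cibadmin -Q -o status | grep 'name="KEY"'

If the first command shows the value and the second returns nothing,
you are seeing the behavior described above.)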
Looking back at the original commit, it has a comment "OpenAIS has a
nasty habit of not being able to tell if a node is returning or didn't
leave in the first place", so it looks like it's only relevant on
legacy stacks.

>
> > "peer lost" message that triggers wiping its transient attributes.
> >
> > I suspect the crmd wipes only the CIB in this case because we assumed
> > attrd would be empty at this point -- missing exactly this case where
> > a value was set between start-up and first join.
> >
> >> 5. With the older pacemaker version, both the joining node and the DC
> >> would request a full write-out of all values from attrd. Because step
> >> 4 only wiped the CIB, this ends up restoring the new value. With the
> >> newer pacemaker version, this step is no longer done, so the value
> >> winds up staying in attrd but not in the CIB (until the next write-out
> >> naturally occurs).
> >>
> >> I don't have a solution yet, but step 4 is clearly the problem (rather
> >> than the new code that skips step 5, which is still a good idea
> >> performance-wise). I'll keep working on it.
> >>
> >>> [test case]
> >>> 1. Start pacemaker on two nodes at the same time, and update the
> >>> attribute during startup.
> >>> In this case, the attribute is displayed in crm_mon.
> >>>
> >>> [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ;
> >>>                 attrd_updater -n KEY -U V-1' ; \
> >>>                 ssh -f node3 'systemctl start pacemaker ;
> >>>                 attrd_updater -n KEY -U V-3'
> >>> [root@node1 ~]# crm_mon -QA1
> >>> Stack: corosync
> >>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >>>
> >>> 2 nodes configured
> >>> 0 resources configured
> >>>
> >>> Online: [ node1 node3 ]
> >>>
> >>> No active resources
> >>>
> >>>
> >>> Node Attributes:
> >>> * Node node1:
> >>>     + KEY : V-1
> >>> * Node node3:
> >>>     + KEY : V-3
> >>>
> >>>
> >>> 2. Restart pacemaker on node1, and update the attribute during startup.
> >>>
> >>> [root@node1 ~]# systemctl stop pacemaker
> >>> [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> >>>
> >>>
> >>> 3. The attribute is registered in attrd, but it is not registered in
> >>> the CIB, so the updated attribute is not displayed in crm_mon.
> >>>
> >>> [root@node1 ~]# attrd_updater -Q -n KEY -A
> >>> name="KEY" host="node3" value="V-3"
> >>> name="KEY" host="node1" value="V-10"
> >>>
> >>> [root@node1 ~]# crm_mon -QA1
> >>> Stack: corosync
> >>> Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> >>>
> >>> 2 nodes configured
> >>> 0 resources configured
> >>>
> >>> Online: [ node1 node3 ]
> >>>
> >>> No active resources
> >>>
> >>>
> >>> Node Attributes:
> >>> * Node node1:
> >>> * Node node3:
> >>>     + KEY : V-3
> >>>
> >>>
> >>> Best Regards

-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org