Andrei,

On 01.03.2021 12:26, Jan Friesse wrote:


Thanks for digging into logs. I believe Eric is hitting
https://github.com/corosync/corosync-qdevice/issues/10 (already fixed,
but may take some time to get into distributions) - it also contains
workaround.


I tested corosync-qnetd at df3c672, which should include these fixes. It
changed the behavior, but I still cannot explain it.

Again, ha1 + ha2 + qnetd; ha2 is the current DC; I disconnect ha1 (block
everything with ha1's source MAC); stonith is disabled.
So ha1 is blocked on both ha2 and qnetd, and the blocking is symmetric (I mean, nothing is sent to ha1 and nothing is received from ha1)?

corosync and corosync-qdevice on the nodes are still 2.4.5, if it matters.

Shouldn't really matter as long as both corosync-qdevice and corosync-qnetd are version 3.0.1.


ha2:

Mar 01 13:23:27 ha2 corosync[1576]:   [TOTEM ] A processor failed,
forming new configuration.
Mar 01 13:23:28 ha2 corosync[1576]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)
Mar 01 13:23:28 ha2 corosync[1576]:   [TOTEM ] A new membership
(192.168.1.2:3632) was formed. Members left: 1
Mar 01 13:23:28 ha2 corosync[1576]:   [TOTEM ] Failed to receive the
leave message. failed: 1
Mar 01 13:23:28 ha2 corosync[1576]:   [CPG   ] downlist left_list: 1
received
Mar 01 13:23:28 ha2 pacemaker-based[2032]:  notice: Node ha1 state is
now lost
Mar 01 13:23:28 ha2 pacemaker-based[2032]:  notice: Purged 1 peer with
id=1 and/or uname=ha1 from the membership cache
Mar 01 13:23:28 ha2 pacemaker-attrd[2035]:  notice: Lost attribute
writer ha1
Mar 01 13:23:28 ha2 pacemaker-attrd[2035]:  notice: Node ha1 state is
now lost
Mar 01 13:23:28 ha2 pacemaker-attrd[2035]:  notice: Removing all ha1
attributes for peer loss
Mar 01 13:23:28 ha2 pacemaker-attrd[2035]:  notice: Purged 1 peer with
id=1 and/or uname=ha1 from the membership cache
Mar 01 13:23:28 ha2 pacemaker-fenced[2033]:  notice: Node ha1 state is
now lost
Mar 01 13:23:28 ha2 pacemaker-fenced[2033]:  notice: Purged 1 peer with
id=1 and/or uname=ha1 from the membership cache
Mar 01 13:23:28 ha2 pacemaker-controld[2037]:  warning: Stonith/shutdown
of node ha1 was not expected
Mar 01 13:23:28 ha2 pacemaker-controld[2037]:  notice: State transition
S_IDLE -> S_POLICY_ENGINE
Mar 01 13:23:33 ha2 pacemaker-controld[2037]:  notice: High CPU load
detected: 1.200000
Mar 01 13:23:35 ha2 corosync[1576]:   [QUORUM] Members[1]: 2
Mar 01 13:23:35 ha2 corosync[1576]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Mar 01 13:23:35 ha2 pacemaker-attrd[2035]:  notice: Recorded local node
as attribute writer (was unset)
Mar 01 13:23:35 ha2 pacemaker-controld[2037]:  notice: Node ha1 state is
now lost
Mar 01 13:23:35 ha2 pacemaker-controld[2037]:  warning: Stonith/shutdown
of node ha1 was not expected
Mar 01 13:23:36 ha2 pacemaker-schedulerd[2036]:  notice:  * Promote
p_drbd0:0        (   Slave -> Master ha2 )
Mar 01 13:23:36 ha2 pacemaker-schedulerd[2036]:  notice:  * Start
p_fs_clust01     (                   ha2 )
Mar 01 13:23:36 ha2 pacemaker-schedulerd[2036]:  notice:  * Start
p_mysql_001      (                   ha2 )


So it was pretty fast to react (8 seconds).

ha1:

Mar 01 13:23:27 ha1 corosync[1552]:   [TOTEM ] A processor failed,
forming new configuration.
Mar 01 13:23:30 ha1 corosync[1552]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)
Mar 01 13:23:30 ha1 corosync[1552]:   [TOTEM ] A new membership
(192.168.1.1:3640) was formed. Members left: 2
Mar 01 13:23:30 ha1 corosync[1552]:   [TOTEM ] Failed to receive the
leave message. failed: 2
Mar 01 13:23:30 ha1 corosync[1552]:   [CPG   ] downlist left_list: 1
received
Mar 01 13:23:30 ha1 pacemaker-attrd[1738]:  notice: Node ha2 state is
now lost
Mar 01 13:23:30 ha1 pacemaker-attrd[1738]:  notice: Removing all ha2
attributes for peer loss
Mar 01 13:23:30 ha1 pacemaker-attrd[1738]:  notice: Purged 1 peer with
id=2 and/or uname=ha2 from the membership cache
Mar 01 13:23:30 ha1 pacemaker-based[1735]:  notice: Node ha2 state is
now lost
Mar 01 13:23:30 ha1 pacemaker-based[1735]:  notice: Purged 1 peer with
id=2 and/or uname=ha2 from the membership cache
Mar 01 13:23:30 ha1 pacemaker-controld[1740]:  notice: Our peer on the
DC (ha2) is dead
Mar 01 13:23:30 ha1 pacemaker-controld[1740]:  notice: State transition
S_NOT_DC -> S_ELECTION
Mar 01 13:23:30 ha1 pacemaker-fenced[1736]:  notice: Node ha2 state is
now lost
Mar 01 13:23:30 ha1 pacemaker-fenced[1736]:  notice: Purged 1 peer with
id=2 and/or uname=ha2 from the membership cache
Mar 01 13:23:32 ha1 corosync[1552]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)
Mar 01 13:23:32 ha1 corosync[1552]:   [TOTEM ] A new membership
(192.168.1.1:3644) was formed. Members
Mar 01 13:23:32 ha1 corosync[1552]:   [CPG   ] downlist left_list: 0
received
Mar 01 13:23:33 ha1 corosync[1552]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)
Mar 01 13:23:33 ha1 corosync[1552]:   [TOTEM ] A new membership
(192.168.1.1:3648) was formed. Members
Mar 01 13:23:33 ha1 corosync[1552]:   [CPG   ] downlist left_list: 0
received
Mar 01 13:23:35 ha1 corosync[1552]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)
...
Mar 01 13:24:05 ha1 corosync-qdevice[1563]: Can't connect to qnetd host.
(-5986): Network address not available (in use?)
Mar 01 13:24:05 ha1 corosync-qdevice[1563]: Mar 01 13:24:05 error
Can't connect to qnetd host. (-5986): Network address not available (in
use?)
Mar 01 13:24:05 ha1 corosync[1552]:   [VOTEQ ] waiting for quorum device
Qdevice poll (but maximum for 30000 ms)

Yes, this is strange and shouldn't really happen. There should be just one "waiting for quorum device Qdevice poll" message.

Mar 01 13:24:05 ha1 corosync[1552]:   [TOTEM ] A new membership
(192.168.1.1:3736) was formed. Members
Mar 01 13:24:05 ha1 corosync[1552]:   [CPG   ] downlist left_list: 0
received
Mar 01 13:24:05 ha1 corosync[1552]:   [QUORUM] This node is within the
non-primary component and will NOT provide any services.
Mar 01 13:24:05 ha1 corosync[1552]:   [QUORUM] Members[1]: 1
Mar 01 13:24:05 ha1 corosync[1552]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Mar 01 13:24:05 ha1 pacemaker-controld[1740]:  warning: Quorum lost
Mar 01 13:24:05 ha1 pacemaker-controld[1740]:  notice: Node ha2 state is
now lost
Mar 01 13:24:05 ha1 pacemaker-controld[1740]:  notice: State transition
S_ELECTION -> S_INTEGRATION
Mar 01 13:24:05 ha1 pacemaker-controld[1740]:  notice: Updating quorum
status to false (call=56)
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  warning: Blind faith:
not fencing unseen nodes
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  warning: Fencing and
resource management disabled due to lack of quorum
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Stop
p_drbd0:0     ( Master ha1 )   due to no quorum
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Stop
p_drbd1:0     (  Slave ha1 )   due to no quorum
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Stop
p_fs_clust01     (        ha1 )   due to no quorum
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Start
p_fs_clust02     (        ha1 )   due to no quorum (blocked)
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Stop
p_mysql_001      (        ha1 )   due to no quorum
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Start
p_mysql_006      (        ha1 )   due to no quorum (blocked)
Mar 01 13:24:05 ha1 pacemaker-schedulerd[1739]:  notice:  * Start
p_mysql_666      (        ha1 )   due to no quorum (blocked)
Mar 01 13:24:05 ha1 pacemaker-controld[1740]:  notice: Processing graph
0 (ref=pe_calc-dc-16145


So it took almost 40 seconds to make a decision. Somehow it is exactly
the opposite of what I observed before: the disconnected node was fast
and the connected node was slow.

While I can understand why the behavior changed for the connected node,
I still do not understand why the disconnected node now needs so much
time.

And what is worse, this is not reliable: the next time I tested, both
nodes reacted almost immediately (the disconnected node needed just
3 seconds to decide it was out of quorum). That is the most irritating
part, as one expects consistent behavior here.

That's a bit harder to explain but it has a reason.

Qnetd requires qdevice to send some message at least every dpd_interval_coefficient * heartbeat_interval (by default 1.5 * 8000 ms). If nothing is received in that timeframe, the client is disconnected by qnetd.

Qdevice sends a heartbeat every 8000 ms (by default). It also expects a reply from qnetd before the next heartbeat is sent, and if there is no reply it disconnects from qnetd.
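
Purely as an illustration of those two rules (the constants are the defaults quoted above; the helper names are just for this sketch, not real corosync code):

    # Defaults quoted above, in milliseconds
    HEARTBEAT_INTERVAL_MS = 8000        # qdevice -> qnetd heartbeat period
    DPD_INTERVAL_COEFFICIENT = 1.5      # qnetd dead-peer-detection multiplier
    QNETD_DPD_WINDOW_MS = DPD_INTERVAL_COEFFICIENT * HEARTBEAT_INTERVAL_MS  # 12000

    def qnetd_drops_client(ms_since_last_client_msg):
        # qnetd disconnects a client that has been silent for longer than
        # dpd_interval_coefficient * heartbeat_interval
        return ms_since_last_client_msg > QNETD_DPD_WINDOW_MS

    def qdevice_drops_connection(reply_received_before_next_heartbeat):
        # qdevice disconnects from qnetd if no reply arrived before the
        # next heartbeat was due
        return not reply_received_before_next_heartbeat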

So the typical situation is:
- node 1, node 2 and qnetd are running
- node 1 dies
- corosync on node 2 finds out node 1 is missing and waits for qdevice
- qdevice on node 2 sends new information to qnetd
- qnetd finds out that the cluster information provided by node 1 and node 2 diverges, so it waits until node 1 sends updated information
- qdevice on node 2 is waiting for the qnetd reply
- the qnetd timeout for node 1 expires, so node 1 is disconnected. That means the cluster membership information no longer diverges (there is only one node left), so qdevice on node 2 is informed about the new decision

The "randomness" of reaction time has two reasons. Reason 1 - qnetd-qdevice heartbeat are not connected with corosync one and they are running in parallel. Reason 2 is just a matter of when node 1 sent its last message to qnetd. If last message was sent right before it was shutdown then qnetd waits for (almost) full 12 sec, but if it was closer to the timeout expiration (so let's say heartbeat was sent 7 sec ago) then qnetd needs to wait for much shorter time (in this example it would be really just 5 sec). Combined with Reason 1 there is +-3 sec before corosync says "waiting for quorum device Qdevice poll ..." and qdevice providing the vote.



That is something inside corosync/corosync-qdevice. At least it seems to
improve the situation with qnetd response timing for the surviving nodes.


No matter what, are you able to provide a step-by-step reproducer of that 40 sec delay? Because it's definitely wrong. The worst case should be 16 sec.

Regards,
  Honza

