Hi Philipp,

Is there an incompatibility between rc2 and rc3? Trying to run both versions in one cluster fails, and I see:
drbd r0 [rc2 node]: Preparing remote state change 1013781642
drbd r0: Two-phase commit 1013781642 timeout   <-- rc3 node
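In case it helps anyone reproduce this, here is a minimal sketch for spotting such pairs in saved kernel logs. It assumes the logs of both nodes have been concatenated into one text and that the messages keep the wording shown above; the function name and the sample text are only illustrative, not part of DRBD.

```python
import re

# Patterns taken from the two messages quoted above; the exact wording is an
# assumption and may differ between DRBD versions.
PREPARE_RE = re.compile(r"Preparing remote state change (\d+)")
TIMEOUT_RE = re.compile(r"Two-phase commit (\d+) timeout")


def timed_out_state_changes(log_text):
    """Return IDs of state changes that were prepared and also timed out."""
    prepared = set(PREPARE_RE.findall(log_text))
    timed_out = TIMEOUT_RE.findall(log_text)
    return [change_id for change_id in timed_out if change_id in prepared]


# Hypothetical combined log from both nodes.
sample = (
    "drbd r0: Preparing remote state change 1013781642\n"
    "drbd r0: Two-phase commit 1013781642 timeout\n"
)
print(timed_out_state_changes(sample))  # ['1013781642']
```

A matching ID on both sides suggests that the peer received the prepare request but the commit never completed.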
Going back to rc2 solves the problem and the cluster will be healthy again.

Regards,
Rob

On 12/7/20 5:38 PM, Philipp Reisner wrote:
Hi,

It is time for a rc3, rc2 is already nearly 3 weeks old!

We were very busy ironing out details with the state engine for state
transitions when nodes establish a connection. Well, two partitions join.
It looks really good now. A new test tortures it in a way we never tested
it before. I am convinced that we have put an end to an entire class of
bugs.

While doing this two patches reached us that aim to cure possible sources
for inconsistencies in mirroring the data. One of those got merged, the
other one is still under investigation. We will take the time that is
necessary to fully understand that and have a proper fix in place.

This is a release candidate, please help testing it.

Changelog:

9.0.26-0rc3 (api:genl2/proto:86-118/transport:14)
--------
 * fix for writes not getting mirrored over a connection while the primary
   transitions through the WFBitMapS state
 * completed missing logic of the new two-phase-commit based connect process;
   avoid connecting partitions with a primary in each; ensure consistent
   decisions if the connect attempt will be retried

9.0.26-0rc2 (api:genl2/proto:86-118/transport:14)
--------
 * fix a crash if during resync a discard operation fails on the
   resync-target node
 * fix online verify to not clamp disk states to UpToDate
 * fix promoting resync-target nodes; the problem was that it could modify
   the bitmap of an ongoing resync; which leads to alarming log messages
 * pause a resync if the sync-source node becomes inconsistent; an example
   is a cascading resync where the upstream resync aborts and leaves the
   sync-source node for the downstream resync with an inconsistent disk;
   note, the node at the end of the chain could still have an outdated
   disk (better than inconsistent)
 * allow force primary on a sync-target node by breaking the resync
 * minor fixes to the compat tests

9.0.26-0rc1 (api:genl2/proto:86-118/transport:14)
--------
 * fix a case of a disk unexpectedly becoming Outdated by moving the
   exchange of the initial packets into the body of the two-phase-commit
   that happens at a connect
 * fix adding of new volumes to resources with a primary node
 * reliably detect split brain situation on both nodes
 * fix an unexpected occurrence of NetworkFailure state in a tight
   drbdsetup disconnect; drbdsetup connect sequence
 * fix online verify to return to Established from VerifyS if the VerifyT
   node was temporarily Inconsistent during the run
 * fix a corner case where a node ends up Outdated after the crash and
   rejoin of a primary node
 * implement 'blockdev --setro' in DRBD
 * following upstream changes to DRBD up to Linux 5.9 and ensure
   compatibility with Linux 5.8 and 5.9

https://www.linbit.com/downloads/drbd/9.0/drbd-9.0.26-0rc3.tar.gz
https://github.com/LINBIT/drbd/commit/9114a0383f72b87610cd9ee282676cf94213da5b

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user