Dear DRBD-users,

We are currently performing an upgrade from Proxmox VE 6 to VE 7 on a three-node linstor/drbd cluster. (Only two nodes are storage+compute nodes / satellites; the third is the linstor controller + quorum node.)

This is a testing environment that we built in preparation for the upgrade of the live cluster.

Before starting the upgrade we were on linstor 1.11, drbd-dkms 9.0.27 and pve 6.3. Our upgrade route was to first upgrade linstor to 1.20, then upgrade all nodes to pve 6.4 and drbd 9.2 (9.0.27-1 -> 9.2.0-1).
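For reference, the upgrade steps per node were roughly the following (exact package selection depends on the node's role and assumes the LINBIT proxmox repository; commands from memory):

  apt update
  apt install linstor-satellite linstor-controller linstor-client   # linstor 1.11 -> 1.20
  apt full-upgrade                                                   # pve 6.3 -> 6.4
  apt install drbd-dkms drbd-utils                                   # drbd 9.0.27-1 -> 9.2.0-1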

After a fresh boot of all nodes we were in a good state: healthy cluster, pve6to7 happy, drbd in sync and all packages up-to-date.

We then performed the upgrade of the first node to pve-7, which seemed to go well, and rebooted that node into pve 7.2-11. As we have three active VMs with three disk resources, this triggered a drbd resync.

Two resources came out fine:

drbd1000 Testserver1: Resync done (total 2 sec; paused 0 sec; 104448 K/sec)
drbd1002 Testserver1: Resync done (total 55 sec; paused 0 sec; 92120 K/sec)

The third resource, however, synced about 65% of the outdated data and then stalled (no more sync traffic, no progress in drbdmon).

The kernel message that seems to be relevant here is this:

drbd vm-101-disk-1/0 drbd1001: drbd_set_in_sync: sector=73703424s size=134479872 nonsense!

More kernel logs from the pve7 node can be found here
https://pastebin.com/aGjy7Sgp
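
In case the numbers help, this is just our arithmetic on that message (not a claim about what drbd expects there):

  size   = 134479872 bytes        ≈ 128.25 MiB
  sector = 73703424 * 512 bytes   ≈ 35.1 GiB offset into the device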

So far we have tried rebooting the pve7 node, but it always gets stuck in Inconsistent/SyncTarget (no sync progress percentage shown) and prints the kernel error message "drbd_set_in_sync: sector=73703424s size=134479872 nonsense".
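
(For completeness, we observe that stuck state simply via the standard status commands, e.g. the following; the resource name is the one from the linstor output below:)

  drbdadm status vm-101-disk-1
  drbdsetup status vm-101-disk-1 --verbose --statistics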

The linstor resources are backed by lvm_thin which is backed by a MegaRAID in RAID1 with SSD drives.

I don't know if this is relevant, but the VM in question has at some point in its lifetime been rolled back to a snapshot. (All snapshots were removed prior to the upgrades.)

At that time the rollback did work OK, but we noticed a huge increase in the allocated space on the backing device (IIRC it was equal to the virtual disk size). We have set "discard=on" in proxmox and ran "fstrim" in the VM, which cut down the space usage, but it is not equal on both nodes (the rough commands are sketched below the table):

root@Testserver3:~# linstor resource list-volumes
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node ┊ Resource ┊ StoragePool ┊ VolNr ┊ MinorNr ┊ DeviceName ┊ Allocated ┊ InUse ┊ State ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ Testserver1 ┊ vm-100-disk-1 ┊ ssd_thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 2.28 GiB ┊ InUse ┊ UpToDate ┊
┊ Testserver2 ┊ vm-100-disk-1 ┊ ssd_thin ┊ 0 ┊ 1000 ┊ /dev/drbd1000 ┊ 2.50 GiB ┊ Unused ┊ UpToDate ┊
┊ Testserver1 ┊ vm-101-disk-1 ┊ ssd_thin ┊ 0 ┊ 1001 ┊ /dev/drbd1001 ┊ 35.38 GiB ┊ InUse ┊ UpToDate ┊
┊ Testserver2 ┊ vm-101-disk-1 ┊ ssd_thin ┊ 0 ┊ 1001 ┊ /dev/drbd1001 ┊ 31.05 GiB ┊ Unused ┊ Inconsistent ┊
┊ Testserver1 ┊ vm-102-disk-1 ┊ ssd_thin ┊ 0 ┊ 1002 ┊ /dev/drbd1002 ┊ 7.04 GiB ┊ InUse ┊ UpToDate ┊
┊ Testserver2 ┊ vm-102-disk-1 ┊ ssd_thin ┊ 0 ┊ 1002 ┊ /dev/drbd1002 ┊ 7.04 GiB ┊ Unused ┊ UpToDate ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
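
The discard/fstrim step mentioned above was roughly this (VM ID, disk slot and storage name are placeholders for our actual config):

  # on the proxmox host: enable discard on the VM disk
  qm set 101 --scsi0 <storage>:vm-101-disk-1,discard=on
  # inside the VM: trim mounted filesystems
  fstrim -av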

The linstor-created resource looks like this:
https://pastebin.com/syLADBdC

relevant version numbers:

drbd-dkms: 9.2.0-1
linstor-(controller|satellite): 1.20.0-1
linstor-proxmox: 6.1.0-1
proxmox-ve versions: 6.4-1 (two nodes) and 7.2-1 (one node)
kernel: 5.4.203-1-pve (two nodes) and 5.15.64-1-pve (one node)

Any insight on this would be most welcome. I'll provide more details if you feel something is missing.

thanks and kind regards,
Nils