In the process of upgrading a Proxmox cluster from 6.4 to 7, I encountered a failure of linstor which prevents me from proceeding.
First, I upgraded all nodes to the latest linstor 1.14.0 and made sure that "linstor node list" shows all nodes Online. Next, I upgraded all nodes to the latest 6.4 pve-manager (6.4-13), then evacuated the first diskless node. Then I upgraded the first node to pve 7.0-11. After dealing with bonding problems (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=990428), the machine was a full member of the cluster again.

At this point, the new machine has
  linstor-proxmox 5.2.1
  drbd 9.1.3
  drbd-utils 9.18.2
while all others still have
  linstor-proxmox 5.1.6
  drbd 9.1.2
  drbd-utils 9.18.0

I started draining the next node. Migrating 7 VMs online with 75G total to the fresh node worked flawlessly, but the next migration failed: Proxmox reports "Resource did not became ready on node 'TARGET' within reasonable time, check Satellite for errors."

At that moment, no error was logged on the satellite, but a minute later an ErrorReport was created: "external command for stopping the DRBD resource failed", generated at deleteDrbd.

"linstor resource list" shows that resource as Unconnected(...) DELETING, while drbdadm on that satellite says "not in your config". I still see /dev/drbd/... for that resource, drbd_r and drbd_s processes, and 15 "drbdsetup down" processes for that resource that can't be killed, even with kill -9.

Restarting linstor-satellite creates several more "external command for stopping..." ErrorReports, and no further migration to that machine is possible.

How can I resolve this situation, preferably without a reboot (because the situation might pop up again)?

Regards,
Andreas