On 29.05.19 at 10:26, Robert Altnoeder wrote:
> On 5/28/19 6:16 PM, Alexander Karamanlidis wrote:
>
>> hangs forever because of tainted kernel
>
> Those hangs have nothing to do with the taint status that the kernel
> shows, since none of the problem-related taint flags are set.
> The kernel shows a taint of P O, which is
>
> - P: Proprietary module loaded
> - O: Out-of-tree module loaded
>
> That is a normal runtime status that does not indicate any problems.
>
> What's more interesting are the messages emitted by LINSTOR:
>
>> SUCCESS:
>> Suspended IO of 'vm-102-disk-1' on 'node2' for snapshot
>> SUCCESS:
>> Suspended IO of 'vm-102-disk-1' on 'node1' for snapshot
>>
>> ERROR:
>> Description:
>> (Node: 'node1') Preparing resources for layer StorageLayer failed
>> Cause:
>> External command timed out
>> Details:
>> External command: lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr --separator ; --noheadings --units k --nosuffix drbdpool
>>
>> VM 102 qmp command 'savevm-end' failed - unable to connect to VM 102 qmp socket - timeout after 5992 retries
>> snapshot create failed: starting cleanup
>> error with cfs lock 'storage-drbdpool': Could not remove vm-102-state-test123: got lock timeout - aborting command
>> TASK ERROR: Could not create cluster wide snapshot for: vm-102-disk-1: exit code 10
>
> Looks like LVM, or some subtask of it, is accessing the storage of
> vm-102-disk-1 through DRBD (maybe LVM scanning DRBD devices), which will
> hang, because I/O on that device is suspended in order to take a
> cluster-wide consistent snapshot.

Correct, yes. The subtask is the command LINSTOR is executing above:

  lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr --separator ";" --noheadings --units k --nosuffix drbdpool

The snapshot process stops at this point, and after 180 seconds I get the traces I provided, over and over again. However, if the same command is executed outside of a snapshot (normally from bash), it works just fine, so I guess it has something to do with the suspended I/O. I don't really understand how this can occur, though.
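To see what the command is actually blocking on, I will probably trace it during the next snapshot attempt while I/O is suspended. This is just a sketch of what I have in mind, not something I have tried yet (the trace file name is arbitrary):

  # Run the exact lvs call under strace while LINSTOR has I/O suspended.
  # If LVM really scans the DRBD devices, the last open/read entries in
  # the trace should point at a /dev/drbdXXXX device before the hang.
  strace -f -e trace=open,openat,read -o /tmp/lvs-snapshot.trace \
      lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr \
          --separator ";" --noheadings --units k --nosuffix drbdpool

  # From a second shell, while the command above hangs:
  grep drbd /tmp/lvs-snapshot.trace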
> My guess is that this is an LVM configuration error that causes LVM to
> access DRBD devices, a very common source of timeout problems of all kinds.

We didn't configure anything special for our LVM_THIN storage pools, except that we increased the metadata size to 4G. I also couldn't find any information about settings that need to be set when using LINSTOR with Proxmox and LVM_THIN storage pools. For double-checking, these are the exact steps we used:

  ssacli
    ctrl slot=0 create type=ld drives=1I:3:1,1I:3:2,1I:3:3,1I:3:4,2I:3:5,2I:3:6,2I:3:7,2I:3:8 raid=1+0
    ctrl slot=0 array B add spares=4I:2:6

  pvcreate /dev/sdb
  vgcreate drbdpool /dev/sdb
  lvcreate -l95%FREE --thinpool drbdpool/drbdpool
  lvextend --poolmetadatasize +4G drbdpool/drbdpool

Just to make sure: we also changed some drbd-options on the LINSTOR controller, so that they would be more suitable for our dedicated 25G DRBD network. These were the following:

  linstor controller drbd-options \
    --after-sb-0pri=discard-zero-changes \
    --after-sb-1pri=discard-secondary \
    --after-sb-2pri=disconnect

  linstor controller drbd-options \
    --max-buffers=36864 \
    --rcvbuf-size=2097152 \
    --sndbuf-size=1048576

  linstor controller drbd-options \
    --c-fill-target=10240 \
    --c-max-rate=737280 \
    --c-min-rate=20480 \
    --c-plan-ahead=10

  linstor controller drbd-options --verify-alg sha1 --csums-alg sha1

  linstor controller drbd-options --resync-rate=2000000

If there is more configuration needed, or we misconfigured something (for example an LVM device filter, see the sketch further down in this mail), we were not aware of it.

>> We also have LVM_THIN Storage Pools.
>
> Those also block whenever they run full, so checking that may be a good
> idea too.

Did that real quick; it looks like we have enough space left:

  root@node1:~# lvs
    LV                  VG       Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert
    drbdpool            drbdpool twi-aotz--   6.64t                   6.56   1.52
    vm-100-disk-0_00000 drbdpool Vwi-aotz--   4.00g drbdpool         99.98
    vm-102-disk-1_00000 drbdpool Vwi-aotz-- 100.02g drbdpool        100.00
    vm-103-disk-1_00000 drbdpool Vwi-aotz-- 100.02g drbdpool         76.55
    vm-104-disk-1_00000 drbdpool Vwi-aotz--   5.00g drbdpool        100.00
    vm-104-disk-2_00000 drbdpool Vwi-aotz-- 100.02g drbdpool         69.36
    vm-105-disk-1_00000 drbdpool Vwi-aotz--   5.00g drbdpool        100.00
    vm-105-disk-2_00000 drbdpool Vwi-aotz-- 115.03g drbdpool         52.83
    vm-106-disk-1_00000 drbdpool Vwi-aotz--   5.00g drbdpool        100.00
    vm-106-disk-2_00000 drbdpool Vwi-aotz-- 215.05g drbdpool         29.55
    vm-107-disk-1_00000 drbdpool Vwi-aotz--   5.00g drbdpool          0.02
    vm-108-disk-1_00000 drbdpool Vwi-aotz--   5.00g drbdpool        100.00

Maybe worth mentioning that we only set up LINSTOR and tested the snapshot function yesterday, so snapshots have never worked for us. Just to clarify: this is not something that worked in the past and then broke.
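For completeness, this is the lvm.conf change I mentioned above. From what I have read, the usual way to keep LVM from scanning DRBD devices is a filter in /etc/lvm/lvm.conf that rejects /dev/drbd*. We have not applied this yet, so treat it as a sketch rather than our working configuration:

  # /etc/lvm/lvm.conf (sketch, not applied on our nodes yet)
  devices {
      # Reject all DRBD devices so that pvscan/lvs/vgs never open
      # /dev/drbdXXXX (which would block while LINSTOR has I/O suspended
      # for a snapshot); accept everything else. As far as I understand,
      # global_filter (rather than filter) is also honoured by lvmetad.
      global_filter = [ "r|^/dev/drbd.*|", "a|.*|" ]
  }

If I read the man page correctly, the same idea can be tested without editing lvm.conf by passing the filter on the command line, e.g.:

  lvs --config 'devices { filter = [ "r|^/dev/drbd.*|", "a|.*|" ] }' \
      -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr \
      --separator ";" --noheadings --units k --nosuffix drbdpool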
In case it matters, this is our Proxmox storage.cfg entry for the DRBD resources:

  drbd: drbdpool
      content images,rootdir
      controller 10.1.128.158
      controllervm 100
      nodes node2,node1
      redundancy 2

> br,
> Robert

Thanks for the quick reply, Robert.

BR,
Alex

--
Kind regards,

Alexander Karamanlidis
IT Systemadministrator
Phone: +49 721 480 848 - 609

Lindenbaum GmbH
Conferencing - Virtual Dialogues

Head office: Ludwig-Erhard-Allee 34 im Park Office, 76131 Karlsruhe
Registration court: Amtsgericht Mannheim, HRB 706184
Managing director: Maarten Kronenburg
Tax number: 35007/02060, USt. ID: DE 263797265
