On 5/28/19 6:16 PM, Alexander Karamanlidis wrote:
> hangs forever because of tainted kernel

Those hangs have nothing to do with the taint status that the kernel
shows, since none of the problem-related taint flags are set.

The kernel shows a taint of P O, which is:
- P: Proprietary module loaded
- O: Out-of-tree module loaded

That is a normal runtime status that does not indicate any problems.

What's more interesting are the messages emitted by LINSTOR:

> SUCCESS:
>     Suspended IO of 'vm-102-disk-1' on 'node2' for snapshot
> SUCCESS:
>     Suspended IO of 'vm-102-disk-1' on 'node1' for snapshot
>
> ERROR:
> Description:
>     (Node: 'node1') Preparing resources for layer StorageLayer failed
> Cause:
>     External command timed out
> Details:
>     External command: lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr --separator ; --noheadings --units k --nosuffix drbdpool
> VM 102 qmp command 'savevm-end' failed - unable to connect to VM 102 qmp socket - timeout after 5992 retries
> snapshot create failed: starting cleanup
> error with cfs lock 'storage-drbdpool': Could not remove vm-102-state-test123: got lock timeout - aborting command
> TASK ERROR: Could not create cluster wide snapshot for: vm-102-disk-1: exit code 10

Looks like LVM, or some subtask of it, is accessing the storage of
vm-102-disk-1 through DRBD (maybe LVM scanning DRBD devices), which will
hang, because I/O on that device is suspended in order to take a
cluster-wide consistent snapshot.

My guess is that this is an LVM configuration error that causes LVM to
access DRBD devices, a very common source of timeout problems of all
kinds.

> We also have LVM_THIN Storage Pools.

Those also block whenever they run full, so checking that may be a good
idea too.

br,
Robert

_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user
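
For reference, the taint letters discussed above correspond to bits in the value the kernel exposes in /proc/sys/kernel/tainted. The following is a minimal Python sketch of how such a value can be decoded. The bit positions follow the kernel's tainted-kernels documentation; the function names and the PROBLEM_LETTERS set are hypothetical and only illustrate one possible reading of "problem-related" flags, not a definition taken from this thread.

```python
# Taint bits as listed in the Linux kernel's tainted-kernels documentation
# (bits 0-15 only; later kernels define a few more).
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    2: ("S", "kernel running on out-of-spec system"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check exception occurred"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "workaround for platform firmware bug applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel live-patched"),
}

# ASSUMPTION (illustration only): letters treated here as hints of a
# runtime problem. This selection is not taken from the thread.
PROBLEM_LETTERS = {"M", "B", "D", "W", "L"}


def decode_taint(mask):
    """Return (letter, meaning) pairs for every known bit set in mask."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if mask & (1 << bit)]


def has_problem_flags(mask):
    """True if any letter from the assumed PROBLEM_LETTERS set is present."""
    return any(letter in PROBLEM_LETTERS for letter, _ in decode_taint(mask))


if __name__ == "__main__":
    # P (bit 0) plus O (bit 12), the combination reported in this thread.
    mask = (1 << 0) | (1 << 12)
    for letter, meaning in decode_taint(mask):
        print(f"{letter}: {meaning}")
    print("problem-related flags set:", has_problem_flags(mask))
```

On a live system the mask could be read with int(open("/proc/sys/kernel/tainted").read()) and passed to decode_taint().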
