Hi,
I'm testing a 3-node cluster, freshly installed from the LINBIT repo for PVE with the latest versions.

I have configured LVM as backend storage and set redundancy to 1 for now.
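
For reference, the storage definition in /etc/pve/storage.cfg is roughly this (the storage id 'drbd1' is the one that shows up in the logs below):

drbd: drbd1
        content images
        redundancy 1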

So when I create a VM on node1, for example, it can happen that the resource
is allocated on node2 and I get a diskless resource on node1, but that does not seem to be a problem.
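
To see where the backing disk actually ended up I just check with something like:

drbdmanage list-assignments
drbdadm status vm-101-disk-1        # shows disk:Diskless on the node without local storage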

The problem is with restoring and creating VMs; here are some logs:




1) restoring a VM on pve215


Output from Proxmox:
restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
new volume ID is 'drbd1:vm-101-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-101-disk-1/0' (write zeros = 1)

** (process:6786): ERROR **: can't open file /dev/drbd/by-res/vm-101-disk-1/0 - Could not open '/dev/drbd/by-res/vm-101-disk-1/0': No such file or directory
/bin/bash: line 1:  6785 Broken pipe             lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo
                    6786 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
temporary volume 'drbd1:vm-101-disk-1' sucessfuly removed
TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783' failed: exit code 133
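
My guess is that the restore starts before the /dev/drbd/by-res/... symlink for the new volume exists on the node. As a workaround (just my own sketch, not a real fix) I can wait for the device node before kicking off the restore:

# wait up to ~30s for the DRBD device node of the freshly allocated volume
RES=vm-101-disk-1
for i in $(seq 1 30); do
    [ -e "/dev/drbd/by-res/$RES/0" ] && break
    sleep 1
done
drbdadm status "$RES"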



dmesg logs on pve215
[Fri Mar 24 17:33:53 2017] traps: vma[6786] trap int3 ip:7fea7dc35d30 sp:7ffe12b61f70 error:0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [6830])
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Diskless -> Attaching )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: Maximum number of peer devices = 7
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: recounting of set bits took additional 0ms
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Attaching -> UpToDate )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: attached to current UUID: 626E64181A1CD438
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: size = 2048 MB (2097152 KB)
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( UpToDate -> Detaching )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( Detaching -> Diskless )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1: Terminating worker thread


dmesg logs on pve216
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Preparing remote state change 4149564885 (primary_nodes=0, weak_nodes=0)
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Committing remote state change 4149564885
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108 pve214: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: ack_receiver terminated
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating ack_recv thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( TearDown -> Unconnected )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Restarting receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Unconnected -> Connecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connecting -> Disconnecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating sender thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108: drbd_bm_resize called with capacity == 0
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1: Terminating worker thread


dmesg logs on pve214
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Preparing cluster-wide state change 4149564885 (0->1 496/16)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: State change 4149564885: primary_nodes=0, weak_nodes=0
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Cluster is now split
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Committing cluster-wide state change 4149564885 (0ms)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1/0 drbd108 pve216: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: ack_receiver terminated
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating ack_recv thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Connection closed
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Disconnecting -> StandAlone )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating receiver thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating sender thread



2) restoring a VM on pve215

restore vma archive: lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
trying to aquire cfs lock 'storage-drbd1' ...TASK ERROR: command 'lzop -d -c /mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299' failed: got lock request timeout
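
This looks like the 'storage-drbd1' lock is left over from the previous failed restore. I only checked it by hand, something like this (if I remember the pmxcfs lock path correctly):

# is the cfs lock from the failed restore still held?
ls -ld /etc/pve/priv/lock/storage-drbd1 2>/dev/null
# is a worker still stuck holding it?
ps aux | grep -E 'vma extract|drbdmanage' | grep -v grep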



dmesg logs on pve215
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Starting worker thread (from drbdsetup [8064])
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Diskless -> Attaching )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: Maximum number of peer devices = 7
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Method to ensure write ordering: flush
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: drbd_bm_resize called with capacity == 4194304
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: resync bitmap: bits=524288 words=57344 pages=112
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: recounting of set bits took additional 0ms
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Attaching -> UpToDate )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: attached to current UUID: F1BF2127385E673F
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: size = 2048 MB (2097152 KB)


pve214
no log

pve216
no log


3) creating a new VM
If I try to create a new VM, it often happens that drbdmanage misses something:

root@pve214:~# drbdmanage list-assignments
+------------------------------------------------------------------------------+
| Node   | Resource      | Vol ID |                                      State |
|------------------------------------------------------------------------------|
| pve216 | vm-102-disk-1 |      * |                pending actions: commission |
| pve216 | vm-102-disk-1 |      0 |        pending actions: commission, attach |
+------------------------------------------------------------------------------+
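
When this happens I just keep polling until the pending actions disappear; a crude loop like this (the grep pattern is mine, there is probably a nicer drbdmanage way):

# wait until drbdmanage has no pending actions left for the new resource
while drbdmanage list-assignments | grep 'vm-102-disk-1' | grep -q 'pending actions'; do
    sleep 2
done
drbdmanage list-assignments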

but on pve216...
root@pve216:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  pve214 role:Primary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  pve215 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-102-disk-1 role:Secondary
  disk:UpToDate


dmesg on pve216
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Starting worker thread (from drbdsetup [31664])
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Diskless -> Attaching )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: Maximum number of peer devices = 7
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Method to ensure write ordering: flush
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: drbd_bm_resize called with capacity == 2097152
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: resync bitmap: bits=262144 words=28672 pages=56
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: recounting of set bits took additional 0ms
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Attaching -> UpToDate )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: attached to current UUID: D38CF07DC8601CD9
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: size = 1024 MB (1048576 KB)


Then I did an assign on pve214 and an unassign on pve216 (drbdmanage removed the LV too), and it worked.
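
In commands, that was roughly (from memory):

drbdmanage assign vm-102-disk-1 pve214
drbdmanage unassign vm-102-disk-1 pve216    # this also removed the backing LV
drbdmanage list-assignments                 # check that no pending actions remain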



I also have a PVE 4.3 cluster with drbdmanage 0.97. Sometimes a restore hangs there too, but there are ...

Thank you, and I hope this is useful for the developers.

Den
