Hi,
I'm testing a 3-node cluster, freshly installed from the LINBIT repo for
PVE with the latest versions.
I have configured LVM as the backend storage and set the redundancy to 1
for now. So when I create a VM on node1, for example, it can happen that
the resource is allocated on node2 and I get a diskless resource on node1,
but this does not seem to be a problem.
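For completeness, the storage is defined with the stock DRBD plugin in
/etc/pve/storage.cfg, roughly like this (a sketch; "drbd1" is the storage
ID that appears in the logs below, and the option names may differ between
versions):

    # /etc/pve/storage.cfg
    drbd: drbd1
            content images
            redundancy 1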
The problem is restoring and creating VMs; here are some logs:
1) restoring a VM on pve215
Output from proxmox:
restore vma archive: lzop -d -c
/mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma
extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
new volume ID is 'drbd1:vm-101-disk-1'
map 'drive-virtio0' to '/dev/drbd/by-res/vm-101-disk-1/0' (write zeros = 1)
** (process:6786): ERROR **: can't open file
/dev/drbd/by-res/vm-101-disk-1/0 - Could not open
'/dev/drbd/by-res/vm-101-disk-1/0': No such file or directory
/bin/bash: line 1: 6785 Broken pipe lzop -d -c
/mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo
6786 Trace/breakpoint trap | vma extract -v -r
/var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783
temporary volume 'drbd1:vm-101-disk-1' sucessfuly removed
TASK ERROR: command 'lzop -d -c
/mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma
extract -v -r /var/tmp/vzdumptmp6783.fifo - /var/tmp/vzdumptmp6783'
failed: exit code 133
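From the error it looks like the restore races with the creation of the
device node: vma extract tries to open /dev/drbd/by-res/vm-101-disk-1/0
before the volume is fully up. A quick manual check right after volume
creation would be (a sketch, using the resource name from the log above;
drbdadm status needs the DRBD 9 userland):

    ls -l /dev/drbd/by-res/vm-101-disk-1/0
    drbdadm status vm-101-disk-1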
dmesg logs on pve215
[Fri Mar 24 17:33:53 2017] traps: vma[6786] trap int3 ip:7fea7dc35d30
sp:7ffe12b61f70 error:0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Starting worker thread
(from drbdsetup [6830])
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Diskless
-> Attaching )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: Maximum number
of peer devices = 7
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1: Method to ensure write
ordering: flush
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: my node_id: 0
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize
called with capacity == 4194304
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: resync bitmap:
bits=524288 words=57344 pages=112
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: recounting of
set bits took additional 0ms
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: disk( Attaching
-> UpToDate )
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: attached to
current UUID: 626E64181A1CD438
[Fri Mar 24 17:33:53 2017] drbd vm-101-disk-1/0 drbd109: size = 2048 MB
(2097152 KB)
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( UpToDate
-> Detaching )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: disk( Detaching
-> Diskless )
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1/0 drbd109: drbd_bm_resize
called with capacity == 0
[Fri Mar 24 17:33:54 2017] drbd vm-101-disk-1: Terminating worker thread
pve216
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Preparing remote
state change 4149564885 (primary_nodes=0, weak_nodes=0)
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Committing remote
state change 4149564885
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connected ->
TearDown ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108 pve214: pdsk(
UpToDate -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: ack_receiver
terminated
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating
ack_recv thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( TearDown ->
Unconnected )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Restarting
receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Unconnected
-> Connecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn( Connecting
-> Disconnecting )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Connection closed
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: conn(
Disconnecting -> StandAlone )
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating
receiver thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1 pve214: Terminating sender
thread
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1/0 drbd108: drbd_bm_resize
called with capacity == 0
[Fri Mar 24 17:32:47 2017] drbd vm-100-disk-1: Terminating worker thread
pve214
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Preparing cluster-wide
state change 4149564885 (0->1 496/16)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: State change 4149564885:
primary_nodes=0, weak_nodes=0
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Cluster is now split
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1: Committing cluster-wide
state change 4149564885 (0ms)
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn( Connected ->
Disconnecting ) peer( Secondary -> Unknown )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1/0 drbd108 pve216: pdsk(
Diskless -> DUnknown ) repl( Established -> Off )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: ack_receiver
terminated
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating
ack_recv thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Connection closed
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: conn(
Disconnecting -> StandAlone )
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating
receiver thread
[Fri Mar 24 17:32:36 2017] drbd vm-100-disk-1 pve216: Terminating sender
thread
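Note in the dmesg above how the disk goes Attaching -> UpToDate and then
straight to Detaching -> Diskless again. To watch that flapping live during
a restore, one could leave this running on the target node (a sketch;
events2 is part of the DRBD 9 utils):

    drbdsetup events2 vm-101-disk-1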
2) restoring a VM on pve215
restore vma archive: lzop -d -c
/mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma
extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299
CFG: size: 281 name: qemu-server.conf
DEV: dev_id=1 size: 2147483648 devname: drive-virtio0
CTIME: Thu Mar 23 17:58:11 2017
trying to aquire cfs lock 'storage-drbd1' ...TASK ERROR: command 'lzop
-d -c
/mnt/pve/nfsbk249/dump/vzdump-qemu-497-2017_03_23-17_58_09.vma.lzo|vma
extract -v -r /var/tmp/vzdumptmp8299.fifo - /var/tmp/vzdumptmp8299'
failed: got lock request timeout
pve215
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Starting worker thread
(from drbdsetup [8064])
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Diskless
-> Attaching )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: Maximum number
of peer devices = 7
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1: Method to ensure write
ordering: flush
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: my node_id: 0
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: drbd_bm_resize
called with capacity == 4194304
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: resync bitmap:
bits=524288 words=57344 pages=112
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: recounting of
set bits took additional 0ms
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: disk( Attaching
-> UpToDate )
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: attached to
current UUID: F1BF2127385E673F
[Fri Mar 24 17:40:37 2017] drbd vm-101-disk-1/0 drbd111: size = 2048 MB
(2097152 KB)
pve214
no log
pve216
no log
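The "got lock request timeout" above makes me think the cluster-wide
storage lock was left over from the previous failed restore. If it happens
again, it may be worth checking for a stale lock directory (an assumption
on my side: pmxcfs keeps these locks under /etc/pve/priv/lock):

    ls -ld /etc/pve/priv/lock/storage-drbd1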
3)
If I try to create a new VM, it often happens that drbdmanage misses something:
root@pve214:~# drbdmanage list-assignments
+---------------------------------------------------------------------------+
| Node   | Resource      | Vol ID |                                    State |
|---------------------------------------------------------------------------|
| pve216 | vm-102-disk-1 |      * |              pending actions: commission |
| pve216 | vm-102-disk-1 |      0 |      pending actions: commission, attach |
+---------------------------------------------------------------------------+
but on pve216...
root@pve216:~# drbdadm status
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
pve214 role:Primary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
pve215 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
vm-102-disk-1 role:Secondary
disk:UpToDate

I also have a PVE 4.3 cluster with drbdmanage 0.97. Sometimes restore
hangs but there are

dmesg on pve216
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Starting worker thread
(from drbdsetup [31664])
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Diskless
-> Attaching )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: Maximum number
of peer devices = 7
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1: Method to ensure write
ordering: flush
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: my node_id: 0
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: drbd_bm_resize
called with capacity == 2097152
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: resync bitmap:
bits=262144 words=28672 pages=56
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: recounting of
set bits took additional 0ms
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: disk( Attaching
-> UpToDate )
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: attached to
current UUID: D38CF07DC8601CD9
[Sat Mar 25 10:44:47 2017] drbd vm-102-disk-1/0 drbd120: size = 1024 MB
(1048576 KB)
Then I did an assign on pve214 and an unassign on pve216 (drbdmanage
removed the LV too), and it worked.
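For the record, the workaround was roughly the following (a sketch;
drbdmanage syntax quoted from memory, resource and node names as above):

    drbdmanage assign vm-102-disk-1 pve214
    drbdmanage unassign vm-102-disk-1 pve216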
Thank you, and I hope this is useful for the developers.
Den
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user