On Thursday, December 3, 2020 10:16 PM, Frank Thommen
<[email protected]> wrote:
Dear all,
on our PVE cluster, the backup of a specific VM always fails (which
makes us worry, as it is our GitLab instance). The general backup plan
is "back up all VMs at 00:30". In the confirmation email we see that
the backup of this specific VM takes six to seven hours and then fails.
The error message in the overview table used to be:
vma_queue_write: write error - Broken pipe
with this detailed log:
------------------------------------------------------------------------
123: 2020-12-01 02:53:08 INFO: Starting Backup of VM 123 (qemu)
123: 2020-12-01 02:53:08 INFO: status = running
123: 2020-12-01 02:53:09 INFO: update VM 123: -lock backup
123: 2020-12-01 02:53:09 INFO: VM Name: odcf-vm123
123: 2020-12-01 02:53:09 INFO: include disk 'virtio0'
'ceph-rbd:vm-123-disk-0' 20G
123: 2020-12-01 02:53:09 INFO: include disk 'virtio1'
'ceph-rbd:vm-123-disk-2' 1000G
123: 2020-12-01 02:53:09 INFO: include disk 'virtio2'
'ceph-rbd:vm-123-disk-3' 2T
123: 2020-12-01 02:53:09 INFO: backup mode: snapshot
123: 2020-12-01 02:53:09 INFO: ionice priority: 7
123: 2020-12-01 02:53:09 INFO: creating archive
'/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_01-02_53_08.vma.lzo'
123: 2020-12-01 02:53:09 INFO: started backup task
'a38ff50a-f474-4b0a-a052-01a835d5c5c7'
123: 2020-12-01 02:53:12 INFO: status: 0% (167772160/3294239916032),
sparse 0% (31563776), duration 3, read/write 55/45 MB/s
[... etc. etc. ...]
123: 2020-12-01 09:42:14 INFO: status: 35%
(1170252365824/3294239916032), sparse 0% (26845003776), duration 24545,
read/write 59/56 MB/s
123: 2020-12-01 09:42:14 ERROR: vma_queue_write: write error - Broken pipe
123: 2020-12-01 09:42:14 INFO: aborting backup job
123: 2020-12-01 09:42:15 ERROR: Backup of VM 123 failed -
vma_queue_write: write error - Broken pipe
------------------------------------------------------------------------
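For what it's worth, the last status line of the first log allows a rough estimate of how long a complete run would take. The following is only a back-of-the-envelope sketch in Python: the byte counts and the duration are copied from that status line, the function name `estimate_total_seconds` is made up for illustration, and it assumes the average rate so far would hold for the rest of the disks (sparse regions may well go faster).

```python
def estimate_total_seconds(done_bytes, total_bytes, elapsed_seconds):
    """Extrapolate total run time, assuming the average rate so far holds."""
    avg_rate = done_bytes / elapsed_seconds  # bytes per second
    return total_bytes / avg_rate


# Values from the 35% status line (2020-12-01 09:42:14) in the log above.
DONE_BYTES = 1170252365824
TOTAL_BYTES = 3294239916032
ELAPSED_SECONDS = 24545

total_seconds = estimate_total_seconds(DONE_BYTES, TOTAL_BYTES, ELAPSED_SECONDS)
avg_mb_per_s = DONE_BYTES / ELAPSED_SECONDS / 1e6

print(f"average rate so far: {avg_mb_per_s:.1f} MB/s")
print(f"estimated full run:  {total_seconds / 3600:.1f} hours")
```

Under that assumption a full pass would take on the order of 19 hours, so the job is far from finished when it aborts after six to seven hours.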
Recently (since the upgrade to the newest PVE release) it has been
VM 123 qmp command 'query-backup' failed - got timeout
with this log:
------------------------------------------------------------------------
123: 2020-12-03 03:29:00 INFO: Starting Backup of VM 123 (qemu)
123: 2020-12-03 03:29:00 INFO: status = running
123: 2020-12-03 03:29:00 INFO: VM Name: odcf-vm123
123: 2020-12-03 03:29:00 INFO: include disk 'virtio0'
'ceph-rbd:vm-123-disk-0' 20G
123: 2020-12-03 03:29:00 INFO: include disk 'virtio1'
'ceph-rbd:vm-123-disk-2' 1000G
123: 2020-12-03 03:29:00 INFO: include disk 'virtio2'
'ceph-rbd:vm-123-disk-3' 2T
123: 2020-12-03 03:29:01 INFO: backup mode: snapshot
123: 2020-12-03 03:29:01 INFO: ionice priority: 7
123: 2020-12-03 03:29:01 INFO: creating vzdump archive
'/mnt/pve/cephfs/dump/vzdump-qemu-123-2020_12_03-03_29_00.vma.lzo'
123: 2020-12-03 03:29:01 INFO: started backup task
'cc7cde4e-20e8-4e26-a89a-f6f1aa9e9612'
123: 2020-12-03 03:29:01 INFO: resuming VM again
123: 2020-12-03 03:29:04 INFO: 0% (284.0 MiB of 3.0 TiB) in 3s, read:
94.7 MiB/s, write: 51.7 MiB/s
[... etc. etc. ...]
123: 2020-12-03 09:05:08 INFO: 36% (1.1 TiB of 3.0 TiB) in 5h 36m 7s,
read: 57.3 MiB/s, write: 53.6 MiB/s
123: 2020-12-03 09:22:57 ERROR: VM 123 qmp command 'query-backup' failed
- got timeout
123: 2020-12-03 09:22:57 INFO: aborting backup job
123: 2020-12-03 09:32:57 ERROR: VM 123 qmp command 'backup-cancel'
failed - unable to connect to VM 123 qmp socket - timeout after 5981 retries
123: 2020-12-03 09:32:57 ERROR: Backup of VM 123 failed - VM 123 qmp
command 'query-backup' failed - got timeout
------------------------------------------------------------------------
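The timestamps in the second log also show how long the VM stopped answering before the job gave up. Below is a small Python sketch that computes the gaps; the three timestamps are copied from the log, while the event labels and the helper name `gaps_in_seconds` are just my own wording for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the second log above; labels are informal.
EVENTS = [
    ("last progress line (36%)", "2020-12-03 09:05:08"),
    ("query-backup timeout", "2020-12-03 09:22:57"),
    ("backup-cancel gave up", "2020-12-03 09:32:57"),
]


def gaps_in_seconds(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    times = [(label, datetime.strptime(stamp, FMT)) for label, stamp in events]
    return [
        (a_label, b_label, int((b_time - a_time).total_seconds()))
        for (a_label, a_time), (b_label, b_time) in zip(times, times[1:])
    ]


for start, end, seconds in gaps_in_seconds(EVENTS):
    print(f"{start} -> {end}: {seconds // 60}m {seconds % 60}s")
```

So there is a gap of almost 18 minutes between the last progress line and the 'query-backup' timeout, and a further 10 minutes during which the QMP socket could not be reached at all. That might hint at the QEMU process itself hanging rather than only the write target being slow, but this is a guess from the timestamps alone.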
The VM has some fairly large vdisks (20G, 1T and 2T), all stored in Ceph.
There is still plenty of space in Ceph.
Can anyone give us some hint on how to investigate and debug this further?
Because it is a write error, maybe we should look at the backup destination.
Maybe it is a network connection issue? Maybe something wrong with the host?
Maybe the disk is full?
Which storage are you using for backup? Can you show us the corresponding entry
in /etc/pve/storage.cfg?
We are backing up to cephfs, which still has 8 TB or so free.
/etc/pve/storage.cfg is
------------
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

dir: data
        path /data
        content snippets,images,backup,iso,rootdir,vztmpl

cephfs: cephfs
        path /mnt/pve/cephfs
        content backup,vztmpl,iso
        maxfiles 5

rbd: ceph-rbd
        content images,rootdir
        krbd 0
        pool pve-pool1
------------
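To see at a glance which of these storages accept backups, the config can be read with a few lines of Python. This is only a simplified sketch, not the parser PVE itself uses: it assumes each section starts with a "type: id" line followed by "key value" lines, and the names `parse_storage_cfg` and `backup_targets` are made up here.

```python
# Contents of /etc/pve/storage.cfg as posted above.
STORAGE_CFG = """\
dir: local
path /var/lib/vz
content vztmpl,backup,iso
dir: data
path /data
content snippets,images,backup,iso,rootdir,vztmpl
cephfs: cephfs
path /mnt/pve/cephfs
content backup,vztmpl,iso
maxfiles 5
rbd: ceph-rbd
content images,rootdir
krbd 0
pool pve-pool1
"""


def parse_storage_cfg(text):
    """Map storage id -> dict of its properties (plus its 'type')."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # works with or without indentation
        if not line or line.startswith("#"):
            continue
        head, _, rest = line.partition(" ")
        if head.endswith(":") and rest:
            # Section header such as "cephfs: cephfs"
            current = {"type": head[:-1]}
            storages[rest.strip()] = current
        elif current is not None:
            # Property line such as "path /mnt/pve/cephfs"
            current[head] = rest.strip()
    return storages


storages = parse_storage_cfg(STORAGE_CFG)
backup_targets = [
    name
    for name, cfg in storages.items()
    if "backup" in cfg.get("content", "").split(",")
]
print(backup_targets)
```

Three storages (local, data and cephfs) allow backup content; the archive path in the logs, /mnt/pve/cephfs/dump, belongs to the cephfs one.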
Frank
best regards, Arjen
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user