Hi Roland,
thanks for your reply and the tips.
I have now written several hundred large and very large files to
/mnt/pve/gfs_vms and compared the md5 sums; no problems at all, not even
when reading them back.
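For reference, roughly how I did the test (just a sketch; file count, sizes
and paths here are arbitrary):

  mkdir -p /root/testfiles /mnt/pve/gfs_vms/md5test
  for i in $(seq 1 100); do
    dd if=/dev/urandom of=/root/testfiles/f$i bs=1M count=1024 status=none
  done
  (cd /root/testfiles && md5sum f*) > /root/md5.src
  cp /root/testfiles/f* /mnt/pve/gfs_vms/md5test/
  (cd /mnt/pve/gfs_vms/md5test && md5sum -c /root/md5.src)

The last step re-reads the copies from the gluster mount, so both the write
and the read path are covered, and every file checked out fine.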
When I set aio to threads, it unfortunately feels like it gets even worse
with the broken VMs. I have the following in the VM config:

scsi0: gfs_vms:200/vm-200-disk-0.qcow2,discard=on,aio=threads,size=10444M

Is that correct? Judging by the processes it should be right:
root 1708993 4.3 1.7 3370764 1174016 ? Sl 15:32 1:40 /usr/bin/kvm -id 200 -name testvm,debug-threads=on -no-shutdown
  -chardev socket,id=qmp,path=/var/run/qemu-server/200.qmp,server=on,wait=off -mon chardev=qmp,mode=control
  -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control
  -pidfile /var/run/qemu-server/200.pid -daemonize
  -smbios type=1,uuid=0da99a1f-a9ac-4999-a6c4-203cd39ff72e
  -smp 1,sockets=1,cores=1,maxcpus=1
  -nodefaults
  -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg
  -vnc unix:/var/run/qemu-server/200.vnc,password=on
  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
  -m 2048
  -object memory-backend-ram,id=ram-node0,size=2048M
  -numa node,nodeid=0,cpus=0,memdev=ram-node0
  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg
  -device vmgenid,guid=dc4109a1-7b6f-4735-9685-ca50a38744e2
  -device usb-tablet,id=tablet,bus=ehci.0,port=1
  -chardev socket,id=serial0,path=/var/run/qemu-server/200.serial0,server=on,wait=off
  -device isa-serial,chardev=serial0
  -device VGA,id=vga,bus=pcie.0,addr=0x1
  -chardev socket,path=/var/run/qemu-server/200.qga,server=on,wait=off,id=qga0
  -device virtio-serial,id=qga0,bus=pci.0,addr=0x8
  -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on
  -iscsi initiator-name=iqn.1993-08.org.debian:01:cbb6926f959d
  -drive file=gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2,if=none,id=drive-ide2,media=cdrom,aio=io_uring
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2
  -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5
  -drive file=gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2,if=none,id=drive-scsi0,aio=threads,discard=on,format=qcow2,cache=none,detect-zeroes=unmap
  -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101
  -netdev type=tap,id=net0,ifname=tap200i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
  -device virtio-net-pci,mac=5E:1F:9A:04:D6:6C,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024
  -machine type=q35+pve0
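(As an aside: instead of digging through the ps output, the generated
command line can also be printed directly, e.g.

  qm showcmd 200 --pretty | grep aio=

which shows the aio setting per drive.)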
I will now try the whole thing again with a local storage backend, but I
expect it will work fine with that.
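To keep the comparison clean I plan to keep qcow2 and only swap the backend,
roughly like this (a sketch; "local" is the directory storage from our
storage.cfg below):

  qm move_disk 200 scsi0 local --format qcow2

and then rerun the same tests inside the VM.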
Unfortunately a colleague set up the gluster stuff, so if that turns out to
be the cause, I will probably have to get more familiar with it...
We chose glusterfs because it seemed the least complicated option to us and
because we have a certain respect for e.g. Ceph.
What else could I try? Would it perhaps make sense to switch the image
format from qcow2 to raw? We chose qcow2 mainly for the snapshots and the
space savings; if that does not work properly with glusterfs, we may have to
reconsider that as well.
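If we end up trying raw, the conversion itself looks simple enough, something
like this (a sketch; done offline with the VM stopped, paths taken from the
FUSE mount shown in the clone output below):

  qemu-img convert -p -f qcow2 -O raw \
    /mnt/pve/gfs_vms/images/200/vm-200-disk-0.qcow2 \
    /mnt/pve/gfs_vms/images/200/vm-200-disk-0.raw

plus adjusting scsi0 in the VM config afterwards; but we would of course lose
the qcow2 snapshots that way.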
So far I have only ever run virtual machines with libvirt and without any
central storage. So a lot of new and rather complex topics are coming
together at the moment :-(. I would therefore be glad about any tip towards a
sensible setup :-).
Ciao and thanks,
Christian
On Tue, May 30, 2023 at 06:46:51PM +0200, Roland wrote:
If /mnt/pve/gfs_vms is a writeable path from inside the PVE host, did you
check whether there is also corruption when reading/writing large files
there, comparing with md5sum after the copy?
Furthermore, I remember there was a gluster/qcow2 issue with aio=native some
years ago; could you retry with aio=threads for the virtual disks?
regards
roland
On 30.05.23 at 18:32, Christian Schoepplein wrote:
Hi,
we are testing the current Proxmox version with a glusterfs storage backend
and have a strange issue with files getting corrupted inside the virtual
machines. For whatever reason, from one moment to the next binaries can no
longer be executed, scripts are damaged and so on. In the logs I get errors
like this:
May 30 11:22:36 ns1 dockerd[1234]: time="2023-05-30T11:22:36.874765091+02:00"
level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR:
could not insert 'bridge': Exec format error\nmodprobe: ERROR: could not insert 'br_netfilter':
Exec format error\ninsmod /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \ninsmod
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko \n, error: exit status 1"
On such a broken system, the file command reports the following:
root@ns1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: data
root@ns1:~#
On a normal system it looks like this:
root@gluster1:~# file /lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko
/lib/modules/5.15.0-72-generic/kernel/net/802/stp.ko: ELF 64-bit LSB
relocatable, x86-64, version 1 (SYSV),
BuildID[sha1]=1084f7cfcffbd4c607724fba287c0ea7fc5775
root@gluster1:~#
Not only kernel modules are affected. I saw the same behaviour with scripts,
icinga check modules, the sendmail binary and so on; I think it is totally
random :-(.
We have the problem with newly installed VMs, with VMs cloned from a template
created on our Proxmox host, and with VMs which we used before with libvirtd
and migrated to our new Proxmox machine. So IMHO it cannot be related to the
way we create new virtual machines...
We are using the following software:
root@proxmox1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@proxmox1:~#
root@proxmox1:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content rootdir,iso,images,vztmpl,backup,snippets
zfspool: local-zfs
pool rpool/data
content images,rootdir
sparse 1
glusterfs: gfs_vms
path /mnt/pve/gfs_vms
volume gfs_vms
content images
prune-backups keep-all=1
server gluster1.linova.de
server2 gluster2.linova.de
root@proxmox1:~#
The config of a typical VM looks like this:
root@proxmox1:~# cat /etc/pve/qemu-server/101.conf
#ns1
agent: enabled=1,fstrim_cloned_disks=1
boot: c
bootdisk: scsi0
cicustom: user=local:snippets/user-data
cores: 1
hotplug: disk,network,usb
ide2: gfs_vms:101/vm-101-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=10.200.32.9/22,gw=10.200.32.1
kvm: 1
machine: q35
memory: 2048
meta: creation-qemu=7.2.0,ctime=1683718002
name: ns1
nameserver: 10.200.0.5
net0: virtio=1A:61:75:25:C6:30,bridge=vmbr0
numa: 1
ostype: l26
scsi0: gfs_vms:101/vm-101-disk-0.qcow2,discard=on,size=10444M
scsihw: virtio-scsi-pci
searchdomain: linova.de
serial0: socket
smbios1: uuid=e2f503fe-4a66-4085-86c0-bb692add6b7a
sockets: 1
vmgenid: 3be6ec9d-7cfd-47c0-9f86-23c2e3ce5103
root@proxmox1:~#
Our glusterfs storage backend consists of three servers, all running Ubuntu
22.04 and glusterfs version 10.1. There are no errors in the logs on the
glusterfs hosts when a VM crashes, and because sometimes icinga plugins also
get corrupted, I have a very precise time range to search the logs for errors
and warnings.
However, I think it has something to do with our glusterfs setup. If I clone
a VM from a template I get the following:
root@proxmox1:~# qm clone 9000 200 --full --name testvm --description
"testvm" --storage gfs_vms
create full clone of drive ide2 (gfs_vms:9000/vm-9000-cloudinit.qcow2)
Formatting
'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-cloudinit.qcow2',
fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata
compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:17.753152 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure
ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:17.876879 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.877606 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:17.878275 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:27.761247 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
[2023-05-30 16:18:28.766999 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure
ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:28.936449 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0:
All subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.937547 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:28.938115 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:38.774387 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
create full clone of drive scsi0 (gfs_vms:9000/base-9000-disk-0.qcow2)
Formatting
'gluster://gluster1.linova.de/gfs_vms/images/200/vm-200-disk-0.qcow2',
fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata
compression_type=zlib size=10951327744 lazy_refcounts=off refcount_bits=16
[2023-05-30 16:18:39.962238 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure
ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:40.084300 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.084996 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:40.085505 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:49.970199 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
[2023-05-30 16:18:50.975729 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure
ios_sample_buf size is 1024 because ios_sample_interval is 0
[2023-05-30 16:18:51.768619 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769330 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:18:51.769822 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:00.984578 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
transferred 0.0 B of 10.2 GiB (0.00%)
[2023-05-30 16:19:02.030902 +0000] I
[io-stats.c:3706:ios_sample_buf_size_configure] 0-gfs_vms: Configure
ios_sample_buf size is 1024 because ios_sample_interval is 0
transferred 112.8 MiB of 10.2 GiB (1.08%)
transferred 230.8 MiB of 10.2 GiB (2.21%)
transferred 340.5 MiB of 10.2 GiB (3.26%)
...
transferred 10.1 GiB of 10.2 GiB (99.15%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
transferred 10.2 GiB of 10.2 GiB (100.00%)
[2023-05-30 16:19:29.804006 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-0: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.804807 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-1: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:29.805486 +0000] E [MSGID: 108006]
[afr-common.c:6140:__afr_handle_child_down_event] 0-gfs_vms-replicate-2: All
subvolumes are down. Going offline until at least one of them comes back up.
[2023-05-30 16:19:32.044693 +0000] I [io-stats.c:4038:fini] 0-gfs_vms:
io-stats translator unloaded
root@proxmox1:~#
Is this message about the subvolumes being down normal, or might it be the
reason for our strange problems?
I have no idea how to further debug the problem, so any idea or hint would be
great. Please also let me know if I can provide more info regarding our
setup.
Ciao and thanks a lot,
Schoepp