I updated a few hypervisors and their VMs to CentOS 6.10 on Monday;
today I awoke to an alert saying all VMs are down. It looks like a very
old bug crept back in.
The machine is a ProLiant DL380 G7 with Xeon X5675 and 96 GB, running
half a dozen smallish VMs. Hypervisor and all VMs have kernel
2.6.32-754.2.1.el6.x86_64. Around the time the VMs must have gone down,
there are quite a few error messages like the following in the system
log:
Aug 16 03:10:13 hyper-7 kernel: [265397.382552] vmwrite error: reg 6000 value fff7 (err -9)
Aug 16 03:10:13 hyper-7 kernel: [265397.421372] Pid: 9375, comm: qemu-kvm Not tainted 2.6.32-754.2.1.el6.x86_64 #1
Aug 16 03:10:13 hyper-7 kernel: [265397.464985] Call Trace:
Aug 16 03:10:13 hyper-7 kernel: [265397.481530] [] ? vmwrite_error+0x2c/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.520737] [] ? vmcs_writel+0x20/0x30 [kvm_intel]
Aug 16 03:10:13 hyper-7 kernel: [265397.560028] [] ? vmx_fpu_activate+0x93/0xc0 [kvm_intel]
Aug 16 03:10:14 hyper-7 kernel: [265397.600072] [] ? kvm_arch_vcpu_create+0x37/0x50 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.638183] [] ? kvm_vm_ioctl+0x601/0x1050 [kvm]
Aug 16 03:10:14 hyper-7 kernel: [265397.674367] [] ? free_one_page+0x191/0x440
Aug 16 03:10:14 hyper-7 kernel: [265397.708101] [] ? vfs_ioctl+0x29/0xc0
Aug 16 03:10:14 hyper-7 kernel: [265397.739124] [] ? __free_pages+0x46/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.773193] [] ? do_vfs_ioctl+0x3aa/0x590
Aug 16 03:10:14 hyper-7 kernel: [265397.805774] [] ? free_pages+0x49/0x50
Aug 16 03:10:14 hyper-7 kernel: [265397.839147] [] ? sys_ioctl+0x81/0xa0
Aug 16 03:10:14 hyper-7 kernel: [265397.870109] [] ? __audit_syscall_exit+0x25e/0x290
Aug 16 03:10:14 hyper-7 kernel: [265397.909358] [] ? system_call_fastpath+0x2f/0x34
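
For what it's worth, the "reg 6000" in the first line can be taken apart
using the VMCS field encoding layout from the Intel SDM (bit 0 access
type, bits 9:1 index, bits 11:10 type, bits 14:13 width). The following is
only a rough sketch; the function and table names are my own and not from
any library. If I read the layout correctly, 0x6000 is the natural-width
control field with index 0, which the SDM lists as the CR0 guest/host
mask, and fff7 has every low bit set except bit 3 (CR0.TS). That would fit
the vmx_fpu_activate frame in the trace, but it is my interpretation, not
something the log states.

```python
# Sketch: split a VMCS field encoding into its components.
# Bit layout assumed from the Intel SDM; all names here are illustrative.

WIDTHS = {0: "16-bit", 1: "64-bit", 2: "32-bit", 3: "natural-width"}
TYPES = {0: "control", 1: "read-only data", 2: "guest state", 3: "host state"}


def decode_vmcs_field(encoding):
    """Return the access type, index, type and width of a VMCS field."""
    return {
        "high_access": bool(encoding & 0x1),       # bit 0
        "index": (encoding >> 1) & 0x1FF,          # bits 9:1
        "type": TYPES[(encoding >> 10) & 0x3],     # bits 11:10
        "width": WIDTHS[(encoding >> 13) & 0x3],   # bits 14:13
    }


def clear_bits(value, width=16):
    """Return the positions of the zero bits in the low `width` bits."""
    return [bit for bit in range(width) if not (value >> bit) & 1]


print(decode_vmcs_field(0x6000))  # the "reg" from the log line
print(clear_bits(0xFFF7))         # the "value" from the log line
```
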
Curiously, the messages don't seem to indicate anything fatal in and of
themselves; there are two like this a minute after bootup and a dozen or
so more roughly a day later, none of which seems to have crashed
anything. However, it's the only obvious anomaly I could find around the
time, and as it's VT-x related, I reckon there's a connection.
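
In case anyone wants to compare with their own hosts, here is a minimal
sketch of how the vmwrite errors could be pulled out of the syslog
together with their kernel uptime stamps. The function name and the regex
are my own, and the path in the usage comment is just the usual CentOS 6
syslog location, so adjust as needed.

```python
import re

# Matches the kernel uptime stamp and the reg/value pair of a vmwrite error,
# e.g. "[265397.382552] vmwrite error: reg 6000 value fff7 (err -9)".
VMWRITE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+vmwrite error: reg ([0-9a-fA-F]+) value ([0-9a-fA-F]+)"
)


def find_vmwrite_errors(lines):
    """Return (uptime_seconds, reg, value) for each vmwrite error line."""
    hits = []
    for line in lines:
        match = VMWRITE_RE.search(line)
        if match:
            uptime = float(match.group(1))
            reg = int(match.group(2), 16)
            value = int(match.group(3), 16)
            hits.append((uptime, reg, value))
    return hits


# Usage (path is an assumption, not checked here):
#   with open("/var/log/messages") as log:
#       for uptime, reg, value in find_vmwrite_errors(log):
#           print("%.0f s after boot: reg %x value %x" % (uptime, reg, value))
```

Comparing the uptime stamps with the time the VMs went down should show
whether the errors cluster around the outage or are spread over the whole
uptime.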
The stack trace closely resembles this bug that turned up in 2015 and was
fixed long ago: https://lkml.org/lkml/2015/7/3/288
Has anyone seen this recently, and can anyone confirm or refute any of my
guesswork?
Cheers,
Matthias