Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
On Thu, Jun 21, 2018 at 02:58:59PM +0200, Mischa wrote:
> > On 19 Jun 2018, at 20:28, Mischa wrote:
> >
> >> On 19 Jun 2018, at 17:51, Mike Larkin wrote:
> >> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
> >>>> Synopsis: VMs stop intermitently after vcpu_run_loop error
> >>>> Category: system
> >>>> Environment:
> >>> System : OpenBSD 6.3
> >>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
> >>>
> >>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >>>
> >>> Architecture: OpenBSD.amd64
> >>> Machine : amd64
> >>>> Description:
> >>> Currently running 12 VMs on a single machine. After some random time,
> >>> VMs randomly shutdown. Sometimes a single VM and sometimes more. It's
> >>> always after an error message like the following a VM stops.
> >>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
> >>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
> >>
> >> This is almost surely the following bug, fixed in April (log from pmap.c):
> >>
> >> revision 1.113
> >> date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64;
> >> commitid: BaLjO2NVfYaZP00l;
> >> Better way of allocating EPT entries.
> >>
> >> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> >> occasionally caused VMs to fail after random amounts of time due to
> >> loading the pmap on the CPU and the processor updating A/D bits (which
> >> are reserved bits in EPT). This ultimately manifested itself as errors
> >> from vmd ("vcpu X run ioctl failed".)
> >>
> >> tested by many, on different types of HW, no regressions noted
> >>
> >> ---
> >>
> >> Can you try -current and see if you can still reproduce this problem?
>
> Tried -current today but got a kernel panic, seems to be unrelated to vmd but
> wasn't able to collect all the information that is needed to file the bug.
> Only got the trace. Will try -current again in a couple of days.
>
> The below is what I was able to collect, will do better next time.
>
> panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/usr/src/sys/net/if.c", line 1382
> Stopped at db_enter+0x12: popq %r11
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> 374176 65811070x100010 0x4003K vmd
> 94393 65811070x100010 0x4002 vmd
> 311214 91351070x100010 0x4000 vmd
> *299692 82346 910x12 01 snmpd
> db_enter() at db_enter+0x12
> panic() at panic+0x138
> __assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
> ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
> in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
> udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
> sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,80009bf8) at sosend+0x351
> sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
> sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
> syscall(eee152956d09f98c) at syscall+0x32a
> Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
> end of kernel
> end trace frame: 0x7f7c0410, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports. Insufficient info makes it difficult to find and fix bugs.
> ddb{1}>
>
> Mischa
>

Yeah, there is certainly nothing to do with vmm/vmd in that trace :)

If you're interested, You can probably apply this diff to 6.3-stable with
Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
> On 19 Jun 2018, at 20:28, Mischa wrote:
>
>> On 19 Jun 2018, at 17:51, Mike Larkin wrote:
>> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>>> Synopsis: VMs stop intermitently after vcpu_run_loop error
>>>> Category: system
>>>> Environment:
>>> System : OpenBSD 6.3
>>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>>
>>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>>
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>> Currently running 12 VMs on a single machine. After some random time,
>>> VMs randomly shutdown. Sometimes a single VM and sometimes more. It's
>>> always after an error message like the following a VM stops.
>>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
>>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>>>
>>
>> This is almost surely the following bug, fixed in April (log from pmap.c):
>>
>> revision 1.113
>> date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64;
>> commitid: BaLjO2NVfYaZP00l;
>> Better way of allocating EPT entries.
>>
>> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
>> occasionally caused VMs to fail after random amounts of time due to
>> loading the pmap on the CPU and the processor updating A/D bits (which
>> are reserved bits in EPT). This ultimately manifested itself as errors
>> from vmd ("vcpu X run ioctl failed".)
>>
>> tested by many, on different types of HW, no regressions noted
>>
>> ---
>>
>> Can you try -current and see if you can still reproduce this problem?

Tried -current today but got a kernel panic, seems to be unrelated to vmd but
wasn't able to collect all the information that is needed to file the bug.
Only got the trace. Will try -current again in a couple of days.

The below is what I was able to collect, will do better next time.

panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/usr/src/sys/net/if.c", line 1382
Stopped at db_enter+0x12: popq %r11
TID PID UID PRFLAGS PFLAGS CPU COMMAND
374176 65811070x100010 0x4003K vmd
94393 65811070x100010 0x4002 vmd
311214 91351070x100010 0x4000 vmd
*299692 82346 910x12 01 snmpd
db_enter() at db_enter+0x12
panic() at panic+0x138
__assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,80009bf8) at sosend+0x351
sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
syscall(eee152956d09f98c) at syscall+0x32a
Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
end of kernel
end trace frame: 0x7f7c0410, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{1}>

Mischa

>
> Will do! Will probably be able to upgrade to current this week.
>
>>> Side note: after a reboot of the host, all VMs stop at one point as it
>>> looks like VMM starts all the VMs at the same time. Looks like it's
>>> draining resources at that point.
>>>
>>
>> Yes, this is a known issue, I've had it on my to-do list to have some sort
>> of sequencing or delay, but never got around to it (Hint, hint, such a
Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
> On 19 Jun 2018, at 17:51, Mike Larkin wrote:
>
> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>> Synopsis: VMs stop intermitently after vcpu_run_loop error
>>> Category: system
>>> Environment:
>> System : OpenBSD 6.3
>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>
>> Architecture: OpenBSD.amd64
>> Machine : amd64
>>> Description:
>> Currently running 12 VMs on a single machine. After some random time,
>> VMs randomly shutdown. Sometimes a single VM and sometimes more. It's always
>> after an error message like the following a VM stops.
>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>>
>
> This is almost surely the following bug, fixed in April (log from pmap.c):
>
> revision 1.113
> date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64;
> commitid: BaLjO2NVfYaZP00l;
> Better way of allocating EPT entries.
>
> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> occasionally caused VMs to fail after random amounts of time due to
> loading the pmap on the CPU and the processor updating A/D bits (which
> are reserved bits in EPT). This ultimately manifested itself as errors
> from vmd ("vcpu X run ioctl failed".)
>
> tested by many, on different types of HW, no regressions noted
>
> ---
>
> Can you try -current and see if you can still reproduce this problem?

Will do! Will probably be able to upgrade to current this week.

>> Side note: after a reboot of the host, all VMs stop at one point as it looks
>> like VMM starts all the VMs at the same time. Looks like it's draining
>> resources at that point.
>>
>
> Yes, this is a known issue, I've had it on my to-do list to have some sort
> of sequencing or delay, but never got around to it (Hint, hint, such a fix
> would likely be an easy intro-to-vmd-hacking effort for someone who wanted
> to dip their toe in the water).

Wish I was able to do something.
Will get some more hardware up and running to host OpenBSD VMs and donate a part to the Foundation.

Mischa

>
> -ml
>
>>> How-To-Repeat:
>> Unfortunately I have not found a way to reproduce this, I thought I was
>> on to something when I loaded a Alpine Linux VM as well, but this is now
>> also happening without it running.
>>
>>> Fix:
>> No fix.
>>
>>
>> dmesg:
>> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 8544342016 (8148MB)
>> avail mem = 8278315008 (7894MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
>> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
>> bios0: Supermicro X9SCL/X9SCM
>> acpi0 at bios0: rev 2
>> acpi0: sleep states S0 S1 S4 S5
>> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST BERT BGRT
>> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) [...]
>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>> cpu0 at mainbus0: apid 0 (boot processor)
>> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
>> cpu0:
>>
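
To see at a glance which VMs were hit in a log excerpt like the one quoted above, the failing lines can be tallied per VM. This is only a minimal sketch: the regular expression is derived from the log lines in this report, and the helper name `failed_vms` is made up for illustration. It is not part of vmd or vmctl.

```python
import re
from collections import Counter

# Pattern modelled on the syslog lines quoted in this thread, e.g.
# "vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument"
RUN_IOCTL_FAILED = re.compile(
    r"vmd\[(?P<pid>\d+)\]: vcpu_run_loop: "
    r"vm (?P<vm>\d+) / vcpu (?P<vcpu>\d+) run ioctl failed"
)


def failed_vms(lines):
    """Count 'run ioctl failed' messages per VM id (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = RUN_IOCTL_FAILED.search(line)
        if match:
            counts[int(match.group("vm"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument",
        "Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument",
        "Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly",
    ]
    for vm_id, count in sorted(failed_vms(sample).items()):
        print(f"vm {vm_id}: {count} failure(s)")
```

Lines that do not match, such as the "event thread exited unexpectedly" message, are simply ignored, so the function can be fed a whole log file.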
Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
> >Synopsis: VMs stop intermitently after vcpu_run_loop error
> >Category: system
> >Environment:
> System : OpenBSD 6.3
> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> Currently running 12 VMs on a single machine. After some random time,
> VMs randomly shutdown. Sometimes a single VM and sometimes more. It's always
> after an error message like the following a VM stops.
> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>

This is almost surely the following bug, fixed in April (log from pmap.c):

revision 1.113
date: 2018/04/17 06:31:55; author: mlarkin; state: Exp; lines: +275 -64;
commitid: BaLjO2NVfYaZP00l;
Better way of allocating EPT entries.

Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
occasionally caused VMs to fail after random amounts of time due to
loading the pmap on the CPU and the processor updating A/D bits (which
are reserved bits in EPT). This ultimately manifested itself as errors
from vmd ("vcpu X run ioctl failed".)

tested by many, on different types of HW, no regressions noted

---

Can you try -current and see if you can still reproduce this problem?

>
> Side note: after a reboot of the host, all VMs stop at one point as it looks
> like VMM starts all the VMs at the same time. Looks like it's draining
> resources at that point.
>

Yes, this is a known issue, I've had it on my to-do list to have some sort
of sequencing or delay, but never got around to it (Hint, hint, such a fix would
likely be an easy intro-to-vmd-hacking effort for someone who wanted to dip their
toe in the water).

-ml

> >How-To-Repeat:
> Unfortunately I have not found a way to reproduce this, I thought I was
> on to something when I loaded a Alpine Linux VM as well, but this is now also
> happening without it running.
>
> >Fix:
> No fix.
>
>
> dmesg:
> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8544342016 (8148MB)
> avail mem = 8278315008 (7894MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
> bios0: Supermicro X9SCL/X9SCM
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST BERT BGRT
> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> acpitimer0: recalibrated TSC frequency 3100030275 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 100MHz
> cpu0: mwait min=64, max=64,
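
On the "sequencing or delay" idea from the reply above: until vmd staggers VM startup itself, a host-side workaround might look like the sketch below. It is not the fix Mike describes (that would live inside vmd). It assumes the VMs are marked `disable` in vm.conf so vmd does not start them at boot, and it starts them one by one with `vmctl start`. The VM names, the 10-second delay, and the function name `start_staggered` are placeholders.

```python
import subprocess
import time


def _run_vmctl(argv):
    """Run a vmctl command and return its exit status."""
    return subprocess.run(argv, check=False).returncode


def start_staggered(vm_names, delay_seconds=10.0, run=_run_vmctl, sleep=time.sleep):
    """Start VMs one at a time, pausing between starts.

    Returns the names of VMs whose start command exited non-zero.
    `run` and `sleep` are injectable so the logic can be tried without vmctl.
    """
    names = list(vm_names)
    failed = []
    for index, name in enumerate(names):
        if run(["vmctl", "start", name]) != 0:
            failed.append(name)
        # No need to wait after the last VM.
        if index < len(names) - 1:
            sleep(delay_seconds)
    return failed


if __name__ == "__main__":
    # Hypothetical VM names; replace with the names defined in vm.conf.
    not_started = start_staggered(["vm1", "vm2", "vm3"], delay_seconds=10.0)
    if not_started:
        print("failed to start:", ", ".join(not_started))
```

A fixed delay is the simplest possible policy. A script like this could instead wait until each guest answers on the network before starting the next one, at the cost of more moving parts.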
VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument
>Synopsis: VMs stop intermittently after vcpu_run_loop error
>Category: system
>Environment:
        System      : OpenBSD 6.3
        Details     : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
                      r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
        Currently running 12 VMs on a single machine. After some random time,
        VMs randomly shut down. Sometimes a single VM and sometimes more. A VM
        always stops right after an error message like the following.
Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: Invalid argument
Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly

        Side note: after a reboot of the host, all VMs stop at one point as it
        looks like VMM starts all the VMs at the same time. Looks like it's
        draining resources at that point.
>How-To-Repeat:
        Unfortunately I have not found a way to reproduce this. I thought I was
        on to something when I loaded an Alpine Linux VM as well, but this is
        now also happening without it running.
>Fix:
        No fix.


dmesg:
OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
    r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8544342016 (8148MB)
avail mem = 8278315008 (7894MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
bios0: Supermicro X9SCL/X9SCM
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST BERT BGRT
acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
acpitimer0: recalibrated TSC frequency 3100030275 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.02 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.02 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E3-1220 V2 @