Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-21 Thread Mike Larkin
On Thu, Jun 21, 2018 at 02:58:59PM +0200, Mischa wrote:
> > On 19 Jun 2018, at 20:28, Mischa  wrote:
> > 
> >> On 19 Jun 2018, at 17:51, Mike Larkin  wrote:
> >> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
> >>>> Synopsis:     VMs stop intermittently after vcpu_run_loop error
> >>>> Category:     system
> >>>> Environment:
> >>>   System  : OpenBSD 6.3
> >>>   Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
> >>>
> >>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >>> 
> >>>   Architecture: OpenBSD.amd64
> >>>   Machine : amd64
> >>>> Description:
> >>>   Currently running 12 VMs on a single machine. After some random time,
> >>> VMs randomly shut down, sometimes a single VM and sometimes more. A VM
> >>> always stops right after an error message like the following:
> >>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl 
> >>> failed: Invalid argument
> >>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited 
> >>> unexpectedly 
> >> 
> >> This is almost surely the following bug, fixed in April (log from pmap.c):
> >> 
> >> revision 1.113
> >> date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;
> >> commitid: BaLjO2NVfYaZP00l;
> >> Better way of allocating EPT entries.
> >> 
> >> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> >> occasionally caused VMs to fail after random amounts of time due to
> >> loading the pmap on the CPU and the processor updating A/D bits (which
> >> are reserved bits in EPT). This ultimately manifested itself as errors
> >> from vmd ("vcpu X run ioctl failed".)
> >> 
> >> tested by many, on different types of HW, no regressions noted
> >> 
> >> ---
> >> 
> >> Can you try -current and see if you can still reproduce this problem?
> 
> I tried -current today but got a kernel panic. It seems to be unrelated to
> vmd, but I wasn't able to collect all the information needed to file a bug
> report; I only got the trace. I will try -current again in a couple of days.
> 
> Below is what I was able to collect. I will do better next time.
> 
> panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/usr/src/sys/net/if.c", line 1382
> Stopped at  db_enter+0x12:  popq    %r11
> TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
>  374176   65811070x100010  0x4003K vmd
>   94393   65811070x100010  0x4002  vmd
>  311214   91351070x100010  0x4000  vmd
> *299692  82346 910x12  01  snmpd
> db_enter() at db_enter+0x12
> panic() at panic+0x138
> __assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
> ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
> in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
> udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
> sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,80009bf8) at sosend+0x351
> sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
> sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
> syscall(eee152956d09f98c) at syscall+0x32a
> Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
> end of kernel
> end trace frame: 0x7f7c0410, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{1}>
> 
> 
> Mischa
> 

Yeah, there is certainly nothing to do with vmm/vmd in that trace :)

If you're interested, you can probably apply this diff to 6.3-stable with

Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-21 Thread Mischa
> On 19 Jun 2018, at 20:28, Mischa  wrote:
> 
>> On 19 Jun 2018, at 17:51, Mike Larkin  wrote:
>> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>>> Synopsis:  VMs stop intermittently after vcpu_run_loop error
>>>> Category:  system
>>>> Environment:
>>> System  : OpenBSD 6.3
>>> Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>>  
>>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>>> 
>>> Architecture: OpenBSD.amd64
>>> Machine : amd64
>>>> Description:
>>> Currently running 12 VMs on a single machine. After some random time,
>>> VMs randomly shut down, sometimes a single VM and sometimes more. A VM
>>> always stops right after an error message like the following:
>>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl 
>>> failed: Invalid argument
>>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly  
>>>
>> 
>> This is almost surely the following bug, fixed in April (log from pmap.c):
>> 
>> revision 1.113
>> date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;  
>> commitid: BaLjO2NVfYaZP00l;
>> Better way of allocating EPT entries.
>> 
>> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
>> occasionally caused VMs to fail after random amounts of time due to
>> loading the pmap on the CPU and the processor updating A/D bits (which
>> are reserved bits in EPT). This ultimately manifested itself as errors
>> from vmd ("vcpu X run ioctl failed".)
>> 
>> tested by many, on different types of HW, no regressions noted
>> 
>> ---
>> 
>> Can you try -current and see if you can still reproduce this problem?

I tried -current today but got a kernel panic. It seems to be unrelated to
vmd, but I wasn't able to collect all the information needed to file a bug
report; I only got the trace. I will try -current again in a couple of days.

Below is what I was able to collect. I will do better next time.

panic: kernel diagnostic assertion "_kernel_lock_held()" failed: file "/usr/src/sys/net/if.c", line 1382
Stopped at  db_enter+0x12:  popq    %r11
TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
 374176   65811070x100010  0x4003K vmd
  94393   65811070x100010  0x4002  vmd
 311214   91351070x100010  0x4000  vmd
*299692  82346 910x12  01  snmpd
db_enter() at db_enter+0x12
panic() at panic+0x138
__assert(818eebd4,8000222de2e0,0,8000222de3d8) at __assert+0x24
ifa_ifwithaddr(1962634d45f6caa5,8000222de3d8) at ifa_ifwithaddr+0xed
in_pcbaddrisavail(c69add51b77cda1a,ff01f1cb6118,18,16) at in_pcbaddrisavail+0xd0
udp_output(8b483746c94285a0,237d,0,0) at udp_output+0x168
sosend(c197a5735fd92204,ff01edcbc798,80009bd8,6b,5,80009bf8) at sosend+0x351
sendit(cb9e953daf7fb585,80009bd8,8000222de6c0,8000222de5c0,8000222de6d0) at sendit+0x3fb
sys_sendmsg(e1aea756e75a2f45,1c0,80009bd8) at sys_sendmsg+0x15a
syscall(eee152956d09f98c) at syscall+0x32a
Xsyscall_untramp(6,0,0,0,0,1c) at Xsyscall_untramp+0xe4
end of kernel
end trace frame: 0x7f7c0410, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{1}>


Mischa

> 
> Will do! Will probably be able to upgrade to current this week.
> 
>>> Side note: after a reboot of the host, all VMs stop at one point as it 
>>> looks like VMM starts all the VMs at the same time. Looks like it's 
>>> draining resources at that point.
>>> 
>> 
>> Yes, this is a known issue, I've had it on my to-do list to have some sort
>> of sequencing or delay, but never got around to it (Hint, hint, such a 

Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-19 Thread Mischa
> On 19 Jun 2018, at 17:51, Mike Larkin  wrote:
> On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
>>> Synopsis:   VMs stop intermittently after vcpu_run_loop error
>>> Category:   system
>>> Environment:
>>  System  : OpenBSD 6.3
>>  Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>   
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> 
>>  Architecture: OpenBSD.amd64
>>  Machine : amd64
>>> Description:
>>  Currently running 12 VMs on a single machine. After some random time,
>> VMs randomly shut down, sometimes a single VM and sometimes more. A VM
>> always stops right after an error message like the following:
>> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl 
>> failed: Invalid argument
>> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly   
>>   
> 
> This is almost surely the following bug, fixed in April (log from pmap.c):
> 
> revision 1.113
> date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;  
> commitid: BaLjO2NVfYaZP00l;
> Better way of allocating EPT entries.
> 
> Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
> occasionally caused VMs to fail after random amounts of time due to
> loading the pmap on the CPU and the processor updating A/D bits (which
> are reserved bits in EPT). This ultimately manifested itself as errors
> from vmd ("vcpu X run ioctl failed".)
> 
> tested by many, on different types of HW, no regressions noted
> 
> ---
> 
> Can you try -current and see if you can still reproduce this problem?

Will do! Will probably be able to upgrade to current this week.

>> Side note: after a reboot of the host, all VMs stop at one point as it looks 
>> like VMM starts all the VMs at the same time. Looks like it's draining 
>> resources at that point.
>> 
> 
> Yes, this is a known issue, I've had it on my to-do list to have some sort
> of sequencing or delay, but never got around to it (Hint, hint, such a fix
> would likely be an easy intro-to-vmd-hacking effort for someone who wanted
> to dip their toe in the water).

Wish I was able to do something. Will get some more hardware up and running to 
host OpenBSD VMs and donate a part to the Foundation.

Mischa

> 
> -ml
> 
>>> How-To-Repeat:
>>  Unfortunately I have not found a way to reproduce this. I thought I was
>> on to something when I loaded an Alpine Linux VM as well, but this is now
>> also happening without it running.
>> 
>>> Fix:
>>  No fix.
>> 
>> 
>> dmesg:
>> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>>
>> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>> real mem = 8544342016 (8148MB)
>> avail mem = 8278315008 (7894MB)
>> mpath0 at root
>> scsibus0 at mpath0: 256 targets
>> mainbus0 at root
>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
>> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
>> bios0: Supermicro X9SCL/X9SCM
>> acpi0 at bios0: rev 2
>> acpi0: sleep states S0 S1 S4 S5
>> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST 
>> HEST BERT BGRT
>> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) 
>> USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) 
>> PXSX(S4) RP02(S4) [...]
>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
>> cpu0 at mainbus0: apid 0 (boot processor)
>> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
>> cpu0: 
>> 

Re: VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-19 Thread Mike Larkin
On Tue, Jun 19, 2018 at 03:42:06PM +0200, obs...@high5.nl wrote:
> >Synopsis:     VMs stop intermittently after vcpu_run_loop error
> >Category:     system
> >Environment:
>   System  : OpenBSD 6.3
>   Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
>
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Currently running 12 VMs on a single machine. After some random time,
> VMs randomly shut down, sometimes a single VM and sometimes more. A VM
> always stops right after an error message like the following:
> Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl 
> failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl 
> failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl 
> failed: Invalid argument
> Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: 
> Invalid argument
> Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
>  

This is almost surely the following bug, fixed in April (log from pmap.c):

revision 1.113
date: 2018/04/17 06:31:55;  author: mlarkin;  state: Exp;  lines: +275 -64;  
commitid: BaLjO2NVfYaZP00l;
Better way of allocating EPT entries.

Don't use the standard pmap PTE functions to manipulate EPT PTEs. This
occasionally caused VMs to fail after random amounts of time due to
loading the pmap on the CPU and the processor updating A/D bits (which
are reserved bits in EPT). This ultimately manifested itself as errors
from vmd ("vcpu X run ioctl failed".)

tested by many, on different types of HW, no regressions noted


---

Can you try -current and see if you can still reproduce this problem?
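
To illustrate the commit message above, here is a minimal sketch (not the
kernel code) of why the ordinary accessed/dirty bit positions clash with the
EPT entry layout. The bit positions follow the Intel SDM as far as can be
told here and should be treated as assumptions; the entry value and the
function name are hypothetical.

```python
# Bit positions as described in the Intel SDM; treat them as assumptions
# and check the manual before relying on them.
PTE_ACCESSED = 1 << 5   # "A" bit in an ordinary x86-64 page-table entry
PTE_DIRTY = 1 << 6      # "D" bit in an ordinary x86-64 page-table entry

# In a non-leaf EPT PML4 entry, bits 7:3 are reserved and must be zero.
EPT_PML4E_RESERVED = 0b11111 << 3


def ept_reserved_bits_set(entry):
    """Return the reserved bits (7:3) that are set in an EPT PML4 entry."""
    return entry & EPT_PML4E_RESERVED


if __name__ == "__main__":
    # Hypothetical entry: read/write/execute permissions plus a made-up
    # physical address of the next-level table.
    clean = 0x12345000 | 0b111
    # What an ordinary page walk might write back if the same table were
    # loaded as a normal pmap.
    touched = clean | PTE_ACCESSED
    print(hex(ept_reserved_bits_set(clean)))
    print(hex(ept_reserved_bits_set(touched)))
```

A non-zero result for the second entry corresponds to the kind of invalid
EPT state the commit describes, which vmd then reports as a failed run ioctl.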

> 
> Side note: after a reboot of the host, all VMs stop at one point as it looks 
> like VMM starts all the VMs at the same time. Looks like it's draining 
> resources at that point.
> 

Yes, this is a known issue, I've had it on my to-do list to have some sort
of sequencing or delay, but never got around to it (Hint, hint, such a fix
would likely be an easy intro-to-vmd-hacking effort for someone who wanted
to dip their toe in the water).
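
Until vmd has such sequencing, a workaround outside vmd could look like the
sketch below. It is not the in-vmd fix suggested above. It assumes the VMs
are marked `disable` in vm.conf so that vmd does not start them itself, and
it starts them one at a time with `vmctl start`. The function name, the VM
names, and the delay are hypothetical.

```python
import subprocess
import time


def staggered_start(vm_names, delay_seconds=10,
                    run=subprocess.run, sleep=time.sleep):
    """Start each VM with `vmctl start <name>`, pausing between starts.

    `run` and `sleep` are injectable so the schedule can be exercised
    without touching vmd. Returns the list of commands that were issued.
    """
    issued = []
    for index, name in enumerate(vm_names):
        if index > 0:
            sleep(delay_seconds)   # give the previous VM time to boot
        cmd = ["vmctl", "start", name]
        run(cmd, check=False)      # keep going even if one VM fails to start
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Dry run with hypothetical VM names: print instead of executing.
    staggered_start(["vm1", "vm2", "vm3"], delay_seconds=0,
                    run=lambda cmd, check: print(" ".join(cmd)))
```

A suitable delay depends on how long the guests take to boot on the host in
question; the default of 10 seconds is only a placeholder.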

-ml

> >How-To-Repeat:
>   Unfortunately I have not found a way to reproduce this. I thought I was
> on to something when I loaded an Alpine Linux VM as well, but this is now also
> happening without it running.
> 
> >Fix:
>   No fix.
> 
> 
> dmesg:
> OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
> 
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8544342016 (8148MB)
> avail mem = 8278315008 (7894MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
> bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
> bios0: Supermicro X9SCL/X9SCM
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST 
> HEST BERT BGRT
> acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) 
> USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) 
> PXSX(S4) RP02(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> acpitimer0: recalibrated TSC frequency 3100030275 Hz
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 100MHz
> cpu0: mwait min=64, max=64, 

VMs "crash" with: vcpu_run_loop: vm 19 / vcpu 0 run ioctl failed: Invalid argument

2018-06-19 Thread obsdml
>Synopsis:  VMs stop intermittently after vcpu_run_loop error
>Category:  system
>Environment:
System  : OpenBSD 6.3
Details : OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018
 
r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Currently running 12 VMs on a single machine. After some random time,
VMs randomly shut down, sometimes a single VM and sometimes more. A VM
always stops right after an error message like the following:
Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[64667]: vcpu_run_loop: vm 9 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[53125]: vcpu_run_loop: vm 10 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[61348]: vcpu_run_loop: vm 7 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[59411]: vcpu_run_loop: vm 4 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[52083]: vcpu_run_loop: vm 12 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[21423]: vcpu_run_loop: vm 3 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[40185]: vcpu_run_loop: vm 5 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[60512]: vcpu_run_loop: vm 8 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[44187]: vcpu_run_loop: vm 1 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:16:49 j5 vmd[46728]: vcpu_run_loop: vm 2 / vcpu 0 run ioctl failed: 
Invalid argument
Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly 
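
To see at a glance which VMs failed together, log lines like the ones above
can be grouped by timestamp. The following is a minimal sketch; the function
name is hypothetical, and the pattern assumes the syslog format shown here,
matching whitespace loosely because the lines were wrapped after "failed:".

```python
import re

# Matches the vmd failure message shown above. Whitespace is matched
# loosely so that a line break after "failed:" does not prevent a match.
FAILURE_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+vmd\[(?P<pid>\d+)\]:\s+"
    r"vcpu_run_loop:\s+vm\s+(?P<vm>\d+)\s+/\s+vcpu\s+(?P<vcpu>\d+)\s+"
    r"run\s+ioctl\s+failed:\s+Invalid\s+argument"
)


def failures_by_timestamp(log_text):
    """Return {timestamp: sorted list of VM ids that failed at that time}."""
    grouped = {}
    for match in FAILURE_RE.finditer(log_text):
        grouped.setdefault(match.group("stamp"), []).append(int(match.group("vm")))
    return {stamp: sorted(vms) for stamp, vms in grouped.items()}


if __name__ == "__main__":
    sample = """\
Jun 19 11:16:49 j5 vmd[59907]: vcpu_run_loop: vm 11 / vcpu 0 run ioctl failed:
Invalid argument
Jun 19 11:16:49 j5 vmd[51030]: vcpu_run_loop: vm 6 / vcpu 0 run ioctl failed:
Invalid argument
Jun 19 11:20:30 j5 vmd[81571]: vmd: vm 22 event thread exited unexpectedly
"""
    for stamp, vms in failures_by_timestamp(sample).items():
        print(stamp, "->", len(vms), "VM(s):", vms)
```

Lines that do not contain the vcpu_run_loop failure, such as the "event
thread exited unexpectedly" message, are ignored by this pattern.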

Side note: after a reboot of the host, all VMs stop at one point as it looks 
like VMM starts all the VMs at the same time. Looks like it's draining 
resources at that point.

>How-To-Repeat:
Unfortunately I have not found a way to reproduce this. I thought I was
on to something when I loaded an Alpine Linux VM as well, but this is now also
happening without it running.

>Fix:
No fix.


dmesg:
OpenBSD 6.3 (GENERIC.MP) #4: Sun Jun 17 11:22:20 CEST 2018

r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8544342016 (8148MB)
avail mem = 8278315008 (7894MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe9750 (56 entries)
bios0: vendor American Megatrends Inc. version "2.0b" date 09/17/2012
bios0: Supermicro X9SCL/X9SCM
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC FPDT MCFG HPET SSDT SPMI SSDT SSDT EINJ ERST HEST 
BERT BGRT
acpi0: wakeup devices PS2K(S4) PS2M(S4) UAR1(S4) UAR2(S4) P0P1(S4) USB1(S4) 
USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) USB7(S4) PXSX(S4) RP01(S4) 
PXSX(S4) RP02(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.48 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
acpitimer0: recalibrated TSC frequency 3100030275 Hz
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.02 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 3100.02 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Xeon(R) CPU E3-1220 V2 @