Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-02 Thread Theo de Raadt
Mike Larkin wrote:

> On Thu, Aug 02, 2018 at 09:22:53AM +0200, Nulani t'Acraya wrote:
> > Hello,
> > 
> > Something similar also appears to be affecting bhyve, at least on an
> > AMD Opteron 4228 HE. The error produced differs depending on whether
> > bhyve is instructed to ignore accesses to model-specific registers that
> > are not implemented in the current CPU. I haven't had to toggle that flag
> > previously. I've included the dmesg and trace from both setups below.
> > 
> > A snapshot of -current with a build date of 1533181438 - Thu Aug 2 03:43:58
> > UTC 2018 boots successfully with ignore_bad_msr set to on. I'm not entirely
> > sure whether Bryan's patch made it into that snapshot, but if it has, it
> > appears to fix the issue on bhyve as well. Thanks!
> > 
> > Sincerely,
> > Nulani.
> > 
> 
> The problem is not ignoring access to bad MSRs.
> 
> The problem is that these hypervisors don't know that the MSR we are accessing
> is indeed a real, valid MSR. And then they panic.

Indeed.

AMD said their hardware has this chicken bit in a specific MSR.  It was
previously undocumented.

The hypervisors have a whitelist of MSRs. This MSR is not on the list,
so access to it is blocked.

The hypervisors need to be updated.

The people who run that code can tell them to do so.
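
To illustrate the whitelist behaviour described above, here is a minimal C
sketch of how a hypervisor might filter guest MSR accesses. It is not taken
from KVM, bhyve, or vmm(4): the function hv_handle_msr_access, the enum, and
the idea of passing the list as an array are all hypothetical. MSR_DE_CFG
(0xc0011029) is the register the patch touches; MSR_EFER is included only as
an example of an MSR a hypervisor would already know about.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_EFER	0xc0000080u	/* long-known MSR, used as an example */
#define MSR_DE_CFG	0xc0011029u	/* AMD decode configuration MSR */

/* Hypothetical outcome of a trapped guest rdmsr/wrmsr. */
enum msr_result {
	MSR_OK,		/* access is emulated or passed through */
	MSR_INJECT_GP	/* unknown MSR: guest receives a #GP fault */
};

/*
 * Hypothetical filter: an MSR that is not on the allow-list is treated
 * as nonexistent, even if the physical CPU really implements it.
 */
enum msr_result
hv_handle_msr_access(const uint32_t *allowed, size_t n, uint32_t msr)
{
	for (size_t i = 0; i < n; i++) {
		if (allowed[i] == msr)
			return MSR_OK;
	}
	return MSR_INJECT_GP;
}
```

With a list that predates AMD's documentation of the bit, an access to
MSR_DE_CFG yields MSR_INJECT_GP, which the guest sees as a protection fault.
Adding the MSR to the list is the kind of hypervisor update being asked for.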



Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-02 Thread Mike Larkin
On Thu, Aug 02, 2018 at 09:22:53AM +0200, Nulani t'Acraya wrote:
> Hello,
> 
> Something similar also appears to be affecting bhyve, at least on an
> AMD Opteron 4228 HE. The error produced differs depending on whether
> bhyve is instructed to ignore accesses to model-specific registers that
> are not implemented in the current CPU. I haven't had to toggle that flag
> previously. I've included the dmesg and trace from both setups below.
> 
> A snapshot of -current with a build date of 1533181438 - Thu Aug 2 03:43:58
> UTC 2018 boots successfully with ignore_bad_msr set to on. I'm not entirely
> sure whether Bryan's patch made it into that snapshot, but if it has, it
> appears to fix the issue on bhyve as well. Thanks!
> 
> Sincerely,
> Nulani.
> 

The problem is not ignoring access to bad MSRs.

The problem is that these hypervisors don't know that the MSR we are accessing
is indeed a real, valid MSR. And then they panic.

-ml

> ### 6.3 without ignore_bad_msr ###
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2018 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.3 (GENERIC) #7: Sun Jul 29 11:30:47 CEST 2018
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> real mem = 1056964608 (1008MB)
> avail mem = 1019158528 (971MB)
> warning: no entropy supplied by boot loader
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf101f (9 entries)
> bios0: vendor BHYVE version "1.00" date 03/14/2014
> bios0: bhyve BHYVE
> acpi0 at bios0: rev 2
> acpi0: sleep states S5
> acpi0: tables DSDT APIC FACP HPET MCFG
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 32 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Opteron(tm) Processor 4228 HE, 2800.42 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,XOP,SKINIT,WDT,FMA4,ITSC
> cpu0: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB
> 64b/line 16-way L2 cache, 8MB 64b/line 64-way L3 cache
> cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
> cpu0: DTLB 32 4KB entries fully associative, 32 4MB entries fully associative
> kernel: protection fault trap, code=0
> Stopped at  0x81219c59: wrmsr
> ddb> trace
> 81219c59(80031700,81a7fff0,81a7d028,81c
> 06a58,80031724,0) at 0x81219c59
> 81008d2e(80023100,81c06a58,80031700,81a
> 7d000,81008d2e,81c069b0) at 0x81008d2e
> 813618b8(0,800232c4,80023298,800232c4,8
> 1c06a38,816c51a0) at 0x813618b8
> 816c4c36(80020400,81c06b60,81ab3f38,800
> 31200,80031224,0) at 0x816c4c36
> 813618b8(81c06b60,80020400,80020470,800
> 20460,80023280,8100d040) at 0x813618b8
> 8100c571(80023180,81c06c50,81a811a8,800
> 20400,80020424,0) at 0x8100c571
> 813618b8(800014a67023,80023180,3c,104,800014a67042,
> 8140b6f0) at 0x813618b8
> 8140a766(80023100,81c06d88,81aa1ea8,800
> 23180,800231a4,0) at 0x8140a766
> 813618b8(81c06d88,80023100,81a8ba98,800
> 23100,80023124,811a9830) at 0x813618b8
> 811a95a1(0,0,0,81c06db0,81c06e20,300010) at 
> 0xf
> fff811a95a1
> 813618b8(0,81827131,81a9fd8a,81c06e78,b28,0) 
> at
>  0x813618b8
> 81361a53(0,0,0,0,81c8,0) at 0x81361a53
> 8101187b(0,0,8101187b,81c06ef0,0,0) at 
> 0x810118
> 7b
> 8116b8c3(0,0,0,0,8116b8c3,81c06f20) at 
> 0x8116b8
> c3
> end trace frame: 0x0, count: -14
> ddb> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
> *    0       0     -1      0  7     0x10200                swapper
> 
> ###6.3 with ignore_bad_msr###
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2018 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 6.3 (GENERIC) #7: Sun Jul 29 11:30:47 CEST 2018
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> real mem = 1056964608 (1008MB)
> avail mem = 1019158528 (971MB)
> warning: no entropy supplied by boot loader
> 

Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-02 Thread Penty Wenngren
On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> After installing the 014_amdlfence patch released yesterday for 6.3, my
> OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> I suppose this would also happen on vmm(4) and bhyve, however I don't have
> any such AMD hosts available for testing.
> 
> It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> 
> I guess not many people are running OpenBSD as a VM, and even less on AMD
> hardware. But still, a syspatch leaving the system unable to boot is
> probably not a good thing. :)
> 

Perhaps there are some differences between Ryzen and Threadripper then. I run
Ubuntu 18.04 on a 1950X Threadripper and a couple of OpenBSD hosts under
libvirt/KVM and it works well here:

# uname -a
Linux 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 
x86_64 x86_64 GNU/Linux


[libvirt CPU definition; the XML markup was lost in the archive.
Surviving values: model EPYC, vendor AMD.]


OpenBSD 6.3 (GENERIC.MP) #7: Sun Jul 29 11:43:12 CEST 2018

r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4278054912 (4079MB)
avail mem = 4141334528 (3949MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbc60 (17 entries)
bios0: vendor SeaBIOS version "1.10.2-1ubuntu1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC Processor, 3400.37 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD EPYC Processor, 3399.97 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2 at mainbus0: apid 2 (application processor)
cpu2: AMD EPYC Processor, 3399.98 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3 at mainbus0: apid 3 (application processor)
cpu3: AMD EPYC Processor, 3399.98 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,XSAVEOPT,XSAVEC,XGETBV1
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu4 at mainbus0: apid 4 (application processor)
cpu4: AMD EPYC Processor, 3399.97 MHz
cpu4: 

Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-02 Thread Nulani t'Acraya
Hello,

Something similar also appears to be affecting bhyve, at least on an
AMD Opteron 4228 HE. The error produced differs depending on whether
bhyve is instructed to ignore accesses to model-specific registers that
are not implemented in the current CPU. I haven't had to toggle that flag
previously. I've included the dmesg and trace from both setups below.

A snapshot of -current with a build date of 1533181438 - Thu Aug 2 03:43:58
UTC 2018 boots successfully with ignore_bad_msr set to on. I'm not entirely
sure whether Bryan's patch made it into that snapshot, but if it has, it
appears to fix the issue on bhyve as well. Thanks!

Sincerely,
Nulani.

### 6.3 without ignore_bad_msr ###
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2018 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.3 (GENERIC) #7: Sun Jul 29 11:30:47 CEST 2018
r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 1056964608 (1008MB)
avail mem = 1019158528 (971MB)
warning: no entropy supplied by boot loader
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf101f (9 entries)
bios0: vendor BHYVE version "1.00" date 03/14/2014
bios0: bhyve BHYVE
acpi0 at bios0: rev 2
acpi0: sleep states S5
acpi0: tables DSDT APIC FACP HPET MCFG
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) Processor 4228 HE, 2800.42 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AES,XSAVE,AVX,HV,NXE,MMXX,FFXSR,PAGE1GB,LONG,LAHF,CMPLEG,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,XOP,SKINIT,WDT,FMA4,ITSC
cpu0: 64KB 64b/line 2-way I-cache, 16KB 64b/line 4-way D-cache, 2MB
64b/line 16-way L2 cache, 8MB 64b/line 64-way L3 cache
cpu0: ITLB 48 4KB entries fully associative, 24 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 32 4MB entries fully associative
kernel: protection fault trap, code=0
Stopped at  0x81219c59: wrmsr
ddb> trace
81219c59(80031700,81a7fff0,81a7d028,81c
06a58,80031724,0) at 0x81219c59
81008d2e(80023100,81c06a58,80031700,81a
7d000,81008d2e,81c069b0) at 0x81008d2e
813618b8(0,800232c4,80023298,800232c4,8
1c06a38,816c51a0) at 0x813618b8
816c4c36(80020400,81c06b60,81ab3f38,800
31200,80031224,0) at 0x816c4c36
813618b8(81c06b60,80020400,80020470,800
20460,80023280,8100d040) at 0x813618b8
8100c571(80023180,81c06c50,81a811a8,800
20400,80020424,0) at 0x8100c571
813618b8(800014a67023,80023180,3c,104,800014a67042,
8140b6f0) at 0x813618b8
8140a766(80023100,81c06d88,81aa1ea8,800
23180,800231a4,0) at 0x8140a766
813618b8(81c06d88,80023100,81a8ba98,800
23100,80023124,811a9830) at 0x813618b8
811a95a1(0,0,0,81c06db0,81c06e20,300010) at 0xf
fff811a95a1
813618b8(0,81827131,81a9fd8a,81c06e78,b28,0) at
 0x813618b8
81361a53(0,0,0,0,81c8,0) at 0x81361a53
8101187b(0,0,8101187b,81c06ef0,0,0) at 0x810118
7b
8116b8c3(0,0,0,0,8116b8c3,81c06f20) at 0x8116b8
c3
end trace frame: 0x0, count: -14
ddb> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*    0       0     -1      0  7     0x10200                swapper

###6.3 with ignore_bad_msr###
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2018 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 6.3 (GENERIC) #7: Sun Jul 29 11:30:47 CEST 2018
r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 1056964608 (1008MB)
avail mem = 1019158528 (971MB)
warning: no entropy supplied by boot loader
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0xf101f (9 entries)
bios0: vendor BHYVE version "1.00" date 03/14/2014
bios0: bhyve BHYVE
acpi0 at bios0: rev 2
acpi0: sleep states S5
acpi0: tables DSDT APIC FACP HPET MCFG
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Opteron(tm) 

Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Bryan Steele
On Wed, Aug 01, 2018 at 01:07:33PM -0700, Mike Larkin wrote:
> On Wed, Aug 01, 2018 at 12:14:59PM -0400, Bryan Steele wrote:
> > On Wed, Aug 01, 2018 at 11:27:26AM -0400, Bryan Steele wrote:
> > > On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> > > > After installing the 014_amdlfence patch released yesterday for 6.3, my
> > > > OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> > > > 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> > > > I suppose this would also happen on vmm(4) and bhyve, however I don't have
> > > > any such AMD hosts available for testing.
> > > 
> > > Hi Elmer,
> > > 
> > > This was tested in vmm(4), which does work; unfortunately, there was no
> > > extensive testing in other virtualization software. The MSR that is
> > > being set here is only mentioned in AMD's whitepaper, and I had no reason
> > > to believe any special consideration was needed for guest VMs on AMD
> > > processors.
> > > 
> > > > It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> > > > (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> > > > 
> > > > I guess not many people are running OpenBSD as a VM, and even less on AMD
> > > > hardware. But still, a syspatch leaving the system unable to boot is
> > > > probably not a good thing. :)
> > > > 
> > > 
> > > Even so, I would like to apologize. This situation is unfortunate, and
> > > I'll try to work with other developers to find the best way forward.
> > > But, I regret I am only but an amateur magician.
> > > 
> > > -Bryan.
> > 
> > Actually, it looks like this is at least partially a KVM/QEMU bug. In
> > the meantime I guess the solution would be to do as you suggested and
> > set a different CPU model for now until Linux distros include a fix for
> > this.
> > 
> > https://lkml.org/lkml/2018/2/21/1202
> > 
> > Afterwards, on the OpenBSD side, it looks like one small change may be
> > required in addition..
> > 
> > -Bryan.
> > 
> > Index: sys/arch/amd64/amd64/identcpu.c
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> > retrieving revision 1.95.2.2
> > diff -u -p -u -r1.95.2.2 identcpu.c
> > --- sys/arch/amd64/amd64/identcpu.c 30 Jul 2018 14:45:05 -  1.95.2.2
> > +++ sys/arch/amd64/amd64/identcpu.c 1 Aug 2018 16:09:50 -
> > @@ -650,8 +650,10 @@ identifycpu(struct cpu_info *ci)
> >  
> > msr = rdmsr(MSR_DE_CFG);
> >  #define DE_CFG_SERIALIZE_LFENCE(1 << 1)
> > -   msr |= DE_CFG_SERIALIZE_LFENCE;
> > -   wrmsr(MSR_DE_CFG, msr);
> > +   if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
> > +   msr |= DE_CFG_SERIALIZE_LFENCE;
> > +   wrmsr(MSR_DE_CFG, msr);
> > +   }
> > }
> > }
> >  
> > 
> 
> As expected, -current works properly on real AMD hardware. So my assumption
> about KVM doing something odd seems to be correct.
> 
> The issue should be reported upstream to the KVM folks. But if the diff above
> also fixes the issue (I didn't test because I cannot reproduce it), ok mlarkin.
> 
> -ml

I committed a fix for the potential MSR write #GP bug to -current:

https://marc.info/?l=openbsd-cvs&m=153315564121057&w=2

Unfortunately, for the MSR read issue on older KVMs, it would require
adding additional code to determine if we're running under KVM, there's
really not much at all we can do here..
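
For illustration only, a minimal C sketch of what "determining if we're
running under KVM" could involve. Hypervisors conventionally report a 12-byte
vendor signature in EBX, ECX, and EDX of CPUID leaf 0x40000000, and KVM's
signature is "KVMKVMKVM" padded with NUL bytes. The helper names below are
hypothetical and this is not code from the OpenBSD tree. The functions take
the register values as arguments, so the logic can be exercised without
executing CPUID.

```c
#include <stdint.h>
#include <string.h>

/*
 * Unpack the vendor signature returned by CPUID leaf 0x40000000.
 * The 12 bytes arrive in EBX, ECX, EDX (in that order), lowest byte
 * first. The result is NUL-terminated in out[12].
 */
void
hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
	uint32_t regs[3] = { ebx, ecx, edx };

	for (int i = 0; i < 3; i++) {
		for (int j = 0; j < 4; j++)
			out[i * 4 + j] = (char)((regs[i] >> (8 * j)) & 0xff);
	}
	out[12] = '\0';
}

/* Return nonzero if the register values spell KVM's signature. */
int
hv_is_kvm(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[13];

	hv_signature(ebx, ecx, edx, sig);
	return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}
```

A real guest would first check the hypervisor-present bit (CPUID leaf 1, ECX
bit 31) before querying leaf 0x40000000. Even with such a check, the kernel
would still need a policy for what to do once KVM is detected, which is the
extra code the message above is reluctant to add.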

I agree these seem like KVM bugs, as this does not happen on real
hardware, and at least also not in OpenBSD vmm(4).

-Bryan.



Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Mike Larkin
On Wed, Aug 01, 2018 at 12:14:59PM -0400, Bryan Steele wrote:
> On Wed, Aug 01, 2018 at 11:27:26AM -0400, Bryan Steele wrote:
> > On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> > > After installing the 014_amdlfence patch released yesterday for 6.3, my
> > > OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> > > 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> > > I suppose this would also happen on vmm(4) and bhyve, however I don't have
> > > any such AMD hosts available for testing.
> > 
> > Hi Elmer,
> > 
> > This was tested in vmm(4), which does work; unfortunately, there was no
> > extensive testing in other virtualization software. The MSR that is
> > being set here is only mentioned in AMD's whitepaper, and I had no reason
> > to believe any special consideration was needed for guest VMs on AMD
> > processors.
> > 
> > > It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> > > (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> > > 
> > > I guess not many people are running OpenBSD as a VM, and even less on AMD
> > > hardware. But still, a syspatch leaving the system unable to boot is
> > > probably not a good thing. :)
> > > 
> > 
> > Even so, I would like to apologize. This situation is unfortunate, and
> > I'll try to work with other developers to find the best way forward.
> > But, I regret I am only but an amateur magician.
> > 
> > -Bryan.
> 
> Actually, it looks like this is at least partially a KVM/QEMU bug. In
> the meantime I guess the solution would be to do as you suggested and
> set a different CPU model for now until Linux distros include a fix for
> this.
> 
> https://lkml.org/lkml/2018/2/21/1202
> 
> Afterwards, on the OpenBSD side, it looks like one small change may be
> required in addition..
> 
> -Bryan.
> 
> Index: sys/arch/amd64/amd64/identcpu.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> retrieving revision 1.95.2.2
> diff -u -p -u -r1.95.2.2 identcpu.c
> --- sys/arch/amd64/amd64/identcpu.c   30 Jul 2018 14:45:05 -  1.95.2.2
> +++ sys/arch/amd64/amd64/identcpu.c   1 Aug 2018 16:09:50 -
> @@ -650,8 +650,10 @@ identifycpu(struct cpu_info *ci)
>  
>   msr = rdmsr(MSR_DE_CFG);
>  #define DE_CFG_SERIALIZE_LFENCE  (1 << 1)
> - msr |= DE_CFG_SERIALIZE_LFENCE;
> - wrmsr(MSR_DE_CFG, msr);
> + if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
> + msr |= DE_CFG_SERIALIZE_LFENCE;
> + wrmsr(MSR_DE_CFG, msr);
> + }
>   }
>   }
>  
> 

As expected, -current works properly on real AMD hardware. So my assumption
about KVM doing something odd seems to be correct.

The issue should be reported upstream to the KVM folks. But if the diff above
also fixes the issue (I didn't test because I cannot reproduce it), ok mlarkin.

-ml



Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Mike Larkin
On Wed, Aug 01, 2018 at 12:14:59PM -0400, Bryan Steele wrote:
> On Wed, Aug 01, 2018 at 11:27:26AM -0400, Bryan Steele wrote:
> > On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> > > After installing the 014_amdlfence patch released yesterday for 6.3, my
> > > OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> > > 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> > > I suppose this would also happen on vmm(4) and bhyve, however I don't have
> > > any such AMD hosts available for testing.
> > 
> > Hi Elmer,
> > 
> > This was tested in vmm(4), which does work; unfortunately, there was no
> > extensive testing in other virtualization software. The MSR that is
> > being set here is only mentioned in AMD's whitepaper, and I had no reason
> > to believe any special consideration was needed for guest VMs on AMD
> > processors.
> > 
> > > It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> > > (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> > > 
> > > I guess not many people are running OpenBSD as a VM, and even less on AMD
> > > hardware. But still, a syspatch leaving the system unable to boot is
> > > probably not a good thing. :)
> > > 
> > 
> > Even so, I would like to apologize. This situation is unfortunate, and
> > I'll try to work with other developers to find the best way forward.
> > But, I regret I am only but an amateur magician.
> > 
> > -Bryan.
> 
> Actually, it looks like this is at least partially a KVM/QEMU bug. In
> the meantime I guess the solution would be to do as you suggested and
> set a different CPU model for now until Linux distros include a fix for
> this.
> 
> https://lkml.org/lkml/2018/2/21/1202
> 
> Afterwards, on the OpenBSD side, it looks like one small change may be
> required in addition..
> 
> -Bryan.
> 
> Index: sys/arch/amd64/amd64/identcpu.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
> retrieving revision 1.95.2.2
> diff -u -p -u -r1.95.2.2 identcpu.c
> --- sys/arch/amd64/amd64/identcpu.c   30 Jul 2018 14:45:05 -  1.95.2.2
> +++ sys/arch/amd64/amd64/identcpu.c   1 Aug 2018 16:09:50 -
> @@ -650,8 +650,10 @@ identifycpu(struct cpu_info *ci)
>  
>   msr = rdmsr(MSR_DE_CFG);
>  #define DE_CFG_SERIALIZE_LFENCE  (1 << 1)
> - msr |= DE_CFG_SERIALIZE_LFENCE;
> - wrmsr(MSR_DE_CFG, msr);
> + if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
> + msr |= DE_CFG_SERIALIZE_LFENCE;
> + wrmsr(MSR_DE_CFG, msr);
> + }
>   }
>   }
>  
> 

As far as I can tell, nowhere does AMD claim this is a RO MSR. Thus, KVM might
not be doing the right thing here by #GPing the guest on write. It is possible
though, that it is described this way in some document I don't have access to.

The diff you propose above is safe however, so if you want to make that change,
ok mlarkin.
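
Why the proposed change is safe can be shown with a small stand-alone C
sketch. It mirrors the logic of the diff but runs against a simulated
register: struct fake_msr, fake_rdmsr, fake_wrmsr, and set_lfence_serializing
are hypothetical stand-ins, not kernel functions. The point is that the write
is skipped entirely when the bit already reads as set, so a hypervisor that
reports the bit but faults on writes is never asked to handle a write.

```c
#include <stdint.h>

#define DE_CFG_SERIALIZE_LFENCE	(1ULL << 1)

/* Simulated MSR that also counts how many writes were attempted. */
struct fake_msr {
	uint64_t value;
	int writes;
};

uint64_t
fake_rdmsr(const struct fake_msr *m)
{
	return m->value;
}

void
fake_wrmsr(struct fake_msr *m, uint64_t v)
{
	m->value = v;
	m->writes++;
}

/* Same shape as the diff: only write when the bit is not already set. */
void
set_lfence_serializing(struct fake_msr *m)
{
	uint64_t msr = fake_rdmsr(m);

	if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
		msr |= DE_CFG_SERIALIZE_LFENCE;
		fake_wrmsr(m, msr);
	}
}
```

On a register that starts at zero, the function performs exactly one write.
On a register that already has the bit set, it performs none. The read still
happens in both cases, so this does not help on hosts that fault on rdmsr.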

I will test the diff on real hardware today. I stand by my belief that KVM is
doing something in a nonstandard way here, but I'll check real hardware to
make sure.

-ml



Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Bryan Steele
On Wed, Aug 01, 2018 at 11:27:26AM -0400, Bryan Steele wrote:
> On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> > After installing the 014_amdlfence patch released yesterday for 6.3, my
> > OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> > 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> > I suppose this would also happen on vmm(4) and bhyve, however I don't have
> > any such AMD hosts available for testing.
> 
> Hi Elmer,
> 
> This was tested in vmm(4), which does work; unfortunately, there was no
> extensive testing in other virtualization software. The MSR that is
> being set here is only mentioned in AMD's whitepaper, and I had no reason
> to believe any special consideration was needed for guest VMs on AMD
> processors.
> 
> > It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> > (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> > 
> > I guess not many people are running OpenBSD as a VM, and even less on AMD
> > hardware. But still, a syspatch leaving the system unable to boot is
> > probably not a good thing. :)
> > 
> 
> Even so, I would like to apologize. This situation is unfortunate, and
> I'll try to work with other developers to find the best way forward.
> But, I regret I am only but an amateur magician.
> 
> -Bryan.

Actually, it looks like this is at least partially a KVM/QEMU bug. In
the meantime I guess the solution would be to do as you suggested and
set a different CPU model for now until Linux distros include a fix for
this.

https://lkml.org/lkml/2018/2/21/1202

Afterwards, on the OpenBSD side, it looks like one small change may be
required in addition..

-Bryan.

Index: sys/arch/amd64/amd64/identcpu.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.95.2.2
diff -u -p -u -r1.95.2.2 identcpu.c
--- sys/arch/amd64/amd64/identcpu.c	30 Jul 2018 14:45:05 -0000	1.95.2.2
+++ sys/arch/amd64/amd64/identcpu.c	1 Aug 2018 16:09:50 -0000
@@ -650,8 +650,10 @@ identifycpu(struct cpu_info *ci)
 
 			msr = rdmsr(MSR_DE_CFG);
 #define DE_CFG_SERIALIZE_LFENCE	(1 << 1)
-			msr |= DE_CFG_SERIALIZE_LFENCE;
-			wrmsr(MSR_DE_CFG, msr);
+			if ((msr & DE_CFG_SERIALIZE_LFENCE) == 0) {
+				msr |= DE_CFG_SERIALIZE_LFENCE;
+				wrmsr(MSR_DE_CFG, msr);
+			}
 		}
 	}
 



Re: 014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Bryan Steele
On Wed, Aug 01, 2018 at 03:46:25PM +0200, Elmer Skjødt Henriksen wrote:
> After installing the 014_amdlfence patch released yesterday for 6.3, my
> OpenBSD VM crashes on boot. It's running under KVM on a Linux box (Ubuntu
> 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
> I suppose this would also happen on vmm(4) and bhyve, however I don't have
> any such AMD hosts available for testing.

Hi Elmer,

This was tested in vmm(4), which does work; unfortunately, there was no
extensive testing in other virtualization software. The MSR that is
being set here is only mentioned in AMD's whitepaper, and I had no reason
to believe any special consideration was needed for guest VMs on AMD
processors.

> It occurs both using libvirt's "EPYC" CPU model and using "host-passthrough"
> (i.e. no virtual CPU model), but the "core2duo" CPU model works fine.
> 
> I guess not many people are running OpenBSD as a VM, and even less on AMD
> hardware. But still, a syspatch leaving the system unable to boot is
> probably not a good thing. :)
> 

Even so, I would like to apologize. This situation is unfortunate, and
I'll try to work with other developers to find the best way forward.
But, I regret I am only but an amateur magician.

-Bryan.

> Kernel output:
> >> OpenBSD/amd64 BOOT 3.34
> boot>
> booting hd0a:/bsd: 8616075+2454544+262168+0+671744
> [646904+98+712056+493074]=0xd39630
> entry point at 0x1000158
> [ using 1852976 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>   The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2018 OpenBSD. All rights reserved.
> https://www.OpenBSD.org
> 
> OpenBSD 6.3 (GENERIC.MP) #7: Sun Jul 29 11:43:12 CEST 2018
> 
> r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 2130546688 (2031MB)
> avail mem = 2058960896 (1963MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf6880 (10 entries)
> bios0: vendor SeaBIOS version "1.10.2-1ubuntu1" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: rev 0
> acpi0: sleep states S5
> acpi0: tables DSDT FACP APIC
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: AMD Ryzen 7 1700 Eight-Core Processor, 2994.73 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
> 64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache
> cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> kernel: protection fault trap, code=0
> Stopped at  identifycpu+0x7ad:  rdmsr
> ddb{0}> trace
> identifycpu(81a99ff0,80039400,81d40a58,8000210b9000
> ,81d40a60,12ad28e092a02002) at identifycpu+0x7ad
> cpu_attach(80023100,81d40a58,81a97040,80039400,
> 80039424,12ad28e092a02002) at cpu_attach+0x326
> config_attach(0,8001c744,8001c718,8001c744,81d4
> 0a38,813ce5d0) at config_attach+0x1d8
> acpimadt_attach(80020400,81d40b60,81aa84d0,8003
> 9b80,80039ba4,12ad28e092a02002) at acpimadt_attach+0x3be
> config_attach(81d40b60,80020400,80020470,800204
> 60,8001c700,81683350) at config_attach+0x1d8
> acpi_attach(80023180,81d40c50,81abf0d8,80020400
> ,80020424,12ad28e092a02002) at acpi_attach+0x5c1
> config_attach(8000210b7884,80023180,50,118,8000210b78b0,fff
> f81256180) at config_attach+0x1d8
> bios_attach(80023100,81d40d88,81aa2188,80023180
> ,800231a4,12ad28e092a02002) at bios_attach+0x636
> config_attach(81d40d88,80023100,81ab0bb0,800231
> 00,80023124,81456d90) at config_attach+0x1d8
> mainbus_attach(0,0,12ad28e092a02002,81d40db0,81d40e20,30001
> 0) at mainbus_attach+0x71
> config_attach(0,819a78b4,81ac8fd2,81d40e78,b28,0) at config_attach+0x1d8
> config_rootfound(0,0,0,0,81d3a008,12ad28e092a02002) at config_rootfound+0xd3
> cpu_configure(0,0,8141440b,81d40ef0,0,0) at cpu_configure+0x1b
> main(0,0,0,12ad28e092a02002,814eff28,81d40f20) at main+0x4a8
> end trace frame: 0x0, count: -14
> ddb{0}> ps
>PID TID   PPID

014_amdlfence.patch breaks OpenBSD VMs on AMD systems

2018-08-01 Thread Elmer Skjødt Henriksen
After installing the 014_amdlfence patch released yesterday for 6.3, my 
OpenBSD VM crashes on boot. It's running under KVM on a Linux box 
(Ubuntu 18.04 w/ kernel 4.15) on an AMD Ryzen 7 1700 (microcode 0x8001137).
I suppose this would also happen on vmm(4) and bhyve, however I don't 
have any such AMD hosts available for testing.


It occurs both using libvirt's "EPYC" CPU model and using 
"host-passthrough" (i.e. no virtual CPU model), but the "core2duo" CPU 
model works fine.


I guess not many people are running OpenBSD as a VM, and even less on 
AMD hardware. But still, a syspatch leaving the system unable to boot is 
probably not a good thing. :)


Kernel output:
>> OpenBSD/amd64 BOOT 3.34
boot>
booting hd0a:/bsd: 8616075+2454544+262168+0+671744 
[646904+98+712056+493074]=0xd39630

entry point at 0x1000158
[ using 1852976 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2018 OpenBSD. All rights reserved. 
https://www.OpenBSD.org


OpenBSD 6.3 (GENERIC.MP) #7: Sun Jul 29 11:43:12 CEST 2018

r...@syspatch-63-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2130546688 (2031MB)
avail mem = 2058960896 (1963MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf6880 (10 entries)
bios0: vendor SeaBIOS version "1.10.2-1ubuntu1" date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 1700 Eight-Core Processor, 2994.73 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA,IBPB,XSAVEOPT,XSAVEC,XGETBV1
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 
64b/line 16-way L2 cache, 16MB 64b/line 16-way L3 cache

cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
kernel: protection fault trap, code=0
Stopped at  identifycpu+0x7ad:  rdmsr
ddb{0}> trace
identifycpu(81a99ff0,80039400,81d40a58,8000210b9000
,81d40a60,12ad28e092a02002) at identifycpu+0x7ad
cpu_attach(80023100,81d40a58,81a97040,80039400,
80039424,12ad28e092a02002) at cpu_attach+0x326
config_attach(0,8001c744,8001c718,8001c744,81d4
0a38,813ce5d0) at config_attach+0x1d8
acpimadt_attach(80020400,81d40b60,81aa84d0,8003
9b80,80039ba4,12ad28e092a02002) at acpimadt_attach+0x3be
config_attach(81d40b60,80020400,80020470,800204
60,8001c700,81683350) at config_attach+0x1d8
acpi_attach(80023180,81d40c50,81abf0d8,80020400
,80020424,12ad28e092a02002) at acpi_attach+0x5c1
config_attach(8000210b7884,80023180,50,118,8000210b78b0,fff
f81256180) at config_attach+0x1d8
bios_attach(80023100,81d40d88,81aa2188,80023180
,800231a4,12ad28e092a02002) at bios_attach+0x636
config_attach(81d40d88,80023100,81ab0bb0,800231
00,80023124,81456d90) at config_attach+0x1d8
mainbus_attach(0,0,12ad28e092a02002,81d40db0,81d40e20,30001
0) at mainbus_attach+0x71
config_attach(0,819a78b4,81ac8fd2,81d40e78,b28,0) at config_attach+0x1d8
config_rootfound(0,0,0,0,81d3a008,12ad28e092a02002) at config_rootfound+0xd3
cpu_configure(0,0,8141440b,81d40ef0,0,0) at cpu_configure+0x1b
main(0,0,0,12ad28e092a02002,814eff28,81d40f20) at main+0x4a8
end trace frame: 0x0, count: -14
ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*0   0 -1  0  7 0x10200swapper