Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-11-06 Thread Paul de Weerd
All,

Just to close the loop on this one.  Over the weekend I upgraded the
machine with the "AMD EPYC 3201 8-Core Processor" to the latest
snapshot (and the VMs that run on it), and much like with Jesper
Wallin's Ryzen laptop, all is fine now.

Thanks again to Scott and Mike for the super quick fix!

Paul

On Mon, Oct 31, 2022 at 02:15:00PM +0100, Paul de Weerd wrote:
| On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
| | You get a #GP in your VM when trying to rdmsr(MSR_HWCR).  My guess is
| | we need to expand the MSR read bitmap for SVM.
| | 
| | This patch compiles, but I can't test it.  Does it fix the panic?
| 
| To test this patch, I'd have to upgrade the hypervisor.  That's a bit
| more involved, I'll plan it ASAP and report back, but it may be a few
| days.
| 
| Thank you Scott and Mike!
| 
| Paul
| 
| | CC dv@ mlarkin@
| | 
| | Index: vmm.c
| | ===
| | RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
| | retrieving revision 1.323
| | diff -u -p -r1.323 vmm.c
| | --- vmm.c   7 Sep 2022 18:44:09 -   1.323
| | +++ vmm.c   31 Oct 2022 12:38:30 -
| | @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
| | /* allow reading TSC */
| | svm_setmsrbr(vcpu, MSR_TSC);
| |  
| | +   /* allow reading HWCR and PSTATEDEF for TSC calibration */
| | +   svm_setmsrbr(vcpu, MSR_HWCR);
| | +   svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
| | +
| | /* Guest VCPU ASID */
| | if (vmm_alloc_vpid()) {
| | DPRINTF("%s: could not allocate asid\n", __func__);
| | 
| 
| -- 
| >[<++>-]<+++.>+++[<-->-]<.>+++[<+
| +++>-]<.>++[<>-]<+.--.[-]
|  http://www.weirdnet.nl/ 
| 

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Mike Larkin
On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
> On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote:
> > Hi folks,
> >
> > I just upgraded a VM on my AMD EPYC host.  I get the following
> > protection fault during boot:
> >
> > ddb> bo re
> > rebooting...
> > Using drive 0, partition 3.
> > Loading..
> > probing: pc0 com0 mem[638K 3838M 256M a20=on]
> > disk: hd0+
> > >> OpenBSD/amd64 BOOT 3.55
> > \
> > com0: 115200 baud
> > switching console to com0
> > >> OpenBSD/amd64 BOOT 3.55
> > boot>
> > NOTE: random seed is being reused.
> > booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 
> > [1143945+128+1225080+928182]=0x170d440
> > entry point at 0x81001000
> > [ using 3298368 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
> > https://www.OpenBSD.org
> >
> > OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > real mem = 4278177792 (4079MB)
> > avail mem = 4131221504 (3939MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries)
> > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> > bios0: OpenBSD VMM
> > acpi at bios0 not configured
> > cpu0 at mainbus0: (uniprocessor)
> > kernel: protection fault trap, code=0
> > Stopped at  tsc_identify+0xcd:  rdmsr
> > ddb> ps
> >PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > *0   0 -1  0  7 0x10200swapper
> > ddb> trace
> > tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10)
> >  at tsc_identify+0xcd
> > identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424)
> >  at identifycpu+0x2e4
> > cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300)
> >  at cpu_attach+0x16f
> > config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8)
> >  at config_attach+0x1f4
> > mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at 
> > mainbus_attach+0x151
> > config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at 
> > config_attach+0x1f4
> > cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00)
> >  at cpu_configure+0x33
> > main(0,0,0,0,0,1) at main+0x379
> > end trace frame: 0x0, count: -8
> > ddb> show reg
> > rdi   0x822a3035cpu_vendor+0xd
> > rsi   0x81f04410cmd0646_9_tim_udma+0x170f5
> > rbp   0x82714c30end+0x314c30
> > rbx   0x20202020
> > rdx0
> > rcx   0xc0010015
> > rax0
> > r8 0
> > r9  0x40
> > r10   0x2bc299b68ee7cba5
> > r11   0x75a3a544d54dd7b9
> > r12  0x1
> > r13   0x8002c424
> > r14   0x822c7ff0cpu_info_full_primary+0x1ff0
> > r15   0x82714c40end+0x314c40
> > rip   0x819e1f4dtsc_identify+0xcd
> > cs   0x8
> > rflags   0x10202__ALIGN_SIZE+0xf202
> > rsp   0x82714c10end+0x314c10
> > ss  0x10
> > tsc_identify+0xcd:  rdmsr
> > ddb>
>
> You get a #GP in your VM when trying to rdmsr(MSR_HWCR).  My guess is
> we need to expand the MSR read bitmap for SVM.
>
> This patch compiles, but I can't test it.  Does it fix the panic?
>
> CC dv@ mlarkin@
>
> Index: vmm.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
> retrieving revision 1.323
> diff -u -p -r1.323 vmm.c
> --- vmm.c 7 Sep 2022 18:44:09 -   1.323
> +++ vmm.c 31 Oct 2022 12:38:30 -
> @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
>   /* allow reading TSC */
>   svm_setmsrbr(vcpu, MSR_TSC);
>
> + /* allow reading HWCR and PSTATEDEF for TSC calibration */
> + svm_setmsrbr(vcpu, MSR_HWCR);
> + svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
> +
>   /* Guest VCPU ASID */
>   if (vmm_alloc_vpid()) {
>   DPRINTF("%s: could not allocate asid\n", __func__);
>

This is the same diff I would have come up with myself, and since it is reported
to fix the issue, ok mlarkin@ on this. Thanks Scott.

-ml



Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Jesper Wallin
Hi

I had the same issue on my laptop (AMD Ryzen 7 PRO 3700U) and the patch
solved it on my machine at least.


Jesper Wallin

On Mon, Oct 31, 2022 at 02:15:00PM +0100, Paul de Weerd wrote:
> On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
> | You get a #GP in your VM when trying to rdmsr(MSR_HWCR).  My guess is
> | we need to expand the MSR read bitmap for SVM.
> | 
> | This patch compiles, but I can't test it.  Does it fix the panic?
> 
> To test this patch, I'd have to upgrade the hypervisor.  That's a bit
> more involved, I'll plan it ASAP and report back, but it may be a few
> days.
> 
> Thank you Scott and Mike!
> 
> Paul
> 
> | CC dv@ mlarkin@
> | 
> | Index: vmm.c
> | ===
> | RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
> | retrieving revision 1.323
> | diff -u -p -r1.323 vmm.c
> | --- vmm.c   7 Sep 2022 18:44:09 -   1.323
> | +++ vmm.c   31 Oct 2022 12:38:30 -
> | @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
> | /* allow reading TSC */
> | svm_setmsrbr(vcpu, MSR_TSC);
> |  
> | +   /* allow reading HWCR and PSTATEDEF for TSC calibration */
> | +   svm_setmsrbr(vcpu, MSR_HWCR);
> | +   svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
> | +
> | /* Guest VCPU ASID */
> | if (vmm_alloc_vpid()) {
> | DPRINTF("%s: could not allocate asid\n", __func__);
> | 
> 
> -- 
> >[<++>-]<+++.>+++[<-->-]<.>+++[<+
> +++>-]<.>++[<>-]<+.--.[-]
>  http://www.weirdnet.nl/ 
> 



Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Paul de Weerd
On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
| You get a #GP in your VM when trying to rdmsr(MSR_HWCR).  My guess is
| we need to expand the MSR read bitmap for SVM.
| 
| This patch compiles, but I can't test it.  Does it fix the panic?

To test this patch, I'd have to upgrade the hypervisor.  That's a bit
more involved, I'll plan it ASAP and report back, but it may be a few
days.

Thank you Scott and Mike!

Paul

| CC dv@ mlarkin@
| 
| Index: vmm.c
| ===
| RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
| retrieving revision 1.323
| diff -u -p -r1.323 vmm.c
| --- vmm.c 7 Sep 2022 18:44:09 -   1.323
| +++ vmm.c 31 Oct 2022 12:38:30 -
| @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
|   /* allow reading TSC */
|   svm_setmsrbr(vcpu, MSR_TSC);
|  
| + /* allow reading HWCR and PSTATEDEF for TSC calibration */
| + svm_setmsrbr(vcpu, MSR_HWCR);
| + svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
| +
|   /* Guest VCPU ASID */
|   if (vmm_alloc_vpid()) {
|   DPRINTF("%s: could not allocate asid\n", __func__);
| 

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Scott Cheloha
On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote:
> Hi folks,
> 
> I just upgraded a VM on my AMD EPYC host.  I get the following
> protection fault during boot:
> 
> ddb> bo re
> rebooting...
> Using drive 0, partition 3.
> Loading..
> probing: pc0 com0 mem[638K 3838M 256M a20=on] 
> disk: hd0+
> >> OpenBSD/amd64 BOOT 3.55
> \
> com0: 115200 baud
> switching console to com0
> >> OpenBSD/amd64 BOOT 3.55
> boot> 
> NOTE: random seed is being reused.
> booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 
> [1143945+128+1225080+928182]=0x170d440
> entry point at 0x81001000
> [ using 3298368 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California.  All rights reserved.
> Copyright (c) 1995-2022 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> 
> OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> real mem = 4278177792 (4079MB)
> avail mem = 4131221504 (3939MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries)
> bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> bios0: OpenBSD VMM
> acpi at bios0 not configured
> cpu0 at mainbus0: (uniprocessor)
> kernel: protection fault trap, code=0
> Stopped at  tsc_identify+0xcd:  rdmsr
> ddb> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> *0   0 -1  0  7 0x10200swapper
> ddb> trace
> tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10)
>  at tsc_identify+0xcd
> identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424)
>  at identifycpu+0x2e4
> cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300)
>  at cpu_attach+0x16f
> config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8)
>  at config_attach+0x1f4
> mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at 
> mainbus_attach+0x151
> config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at 
> config_attach+0x1f4
> cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00)
>  at cpu_configure+0x33
> main(0,0,0,0,0,1) at main+0x379
> end trace frame: 0x0, count: -8
> ddb> show reg
> rdi   0x822a3035cpu_vendor+0xd
> rsi   0x81f04410cmd0646_9_tim_udma+0x170f5
> rbp   0x82714c30end+0x314c30
> rbx   0x20202020
> rdx0
> rcx   0xc0010015
> rax0
> r8 0
> r9  0x40
> r10   0x2bc299b68ee7cba5
> r11   0x75a3a544d54dd7b9
> r12  0x1
> r13   0x8002c424
> r14   0x822c7ff0cpu_info_full_primary+0x1ff0
> r15   0x82714c40end+0x314c40
> rip   0x819e1f4dtsc_identify+0xcd
> cs   0x8
> rflags   0x10202__ALIGN_SIZE+0xf202
> rsp   0x82714c10end+0x314c10
> ss  0x10
> tsc_identify+0xcd:  rdmsr
> ddb> 

You get a #GP in your VM when trying to rdmsr(MSR_HWCR).  My guess is
we need to expand the MSR read bitmap for SVM.

This patch compiles, but I can't test it.  Does it fix the panic?

CC dv@ mlarkin@

Index: vmm.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.323
diff -u -p -r1.323 vmm.c
--- vmm.c   7 Sep 2022 18:44:09 -   1.323
+++ vmm.c   31 Oct 2022 12:38:30 -
@@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
 	/* allow reading TSC */
 	svm_setmsrbr(vcpu, MSR_TSC);
 
+	/* allow reading HWCR and PSTATEDEF for TSC calibration */
+	svm_setmsrbr(vcpu, MSR_HWCR);
+	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
+
 	/* Guest VCPU ASID */
 	if (vmm_alloc_vpid()) {
 		DPRINTF("%s: could not allocate asid\n", __func__);
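
As a rough illustration of what "expanding the MSR read bitmap" means: on SVM the hypervisor gives the CPU an MSR permission map with two bits per MSR (read intercept, then write intercept). While the read bit is set, a guest rdmsr traps to the hypervisor instead of reading the register, and the #GP above is presumably the result of such an intercepted read. The sketch below finds the read bit for MSR_HWCR (0xc0010015, the value in rcx in the ddb register dump) and clears it. The layout assumed here is three 2 KB vectors covering 0x0-0x1fff, 0xc0000000-0xc0001fff and 0xc0010000-0xc0011fff, as described in the AMD architecture manual. The msrpm_* names are hypothetical, and this is not the actual svm_setmsrbr() code from vmm.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSRPM_SIZE	8192u		/* assumed size of the permission map, in bytes */
#define MSR_HWCR	0xc0010015u	/* matches rcx in the ddb register dump */

/* Start with every read and write intercepted (all bits set). */
static void
msrpm_init(uint8_t *map)
{
	assert(map != NULL);
	memset(map, 0xff, MSRPM_SIZE);
}

/*
 * Return the bit offset of the read-intercept bit for an MSR,
 * or -1 if the MSR is outside the three covered ranges.
 */
static long
msrpm_read_bit(uint32_t msr)
{
	long base_byte;

	if (msr <= 0x1fffu)
		base_byte = 0x0000;
	else if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
		base_byte = 0x0800;
	else if (msr >= 0xc0010000u && msr <= 0xc0011fffu)
		base_byte = 0x1000;
	else
		return -1;

	/* Two bits per MSR; the lower bit of the pair is the read intercept. */
	return base_byte * 8 + (long)(msr & 0x1fffu) * 2;
}

/* Clear the read-intercept bit so the guest may read the MSR directly. */
static int
msrpm_allow_read(uint8_t *map, uint32_t msr)
{
	long bit = msrpm_read_bit(msr);

	assert(map != NULL);
	if (bit < 0)
		return -1;
	map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
	return 0;
}

/* Report whether a guest read of the MSR would still be intercepted. */
static int
msrpm_read_intercepted(const uint8_t *map, uint32_t msr)
{
	long bit = msrpm_read_bit(msr);

	if (bit < 0)
		return 1;	/* MSRs outside the map are treated as intercepted */
	return (map[bit / 8] >> (bit % 8)) & 1;
}
```

With a freshly initialised map, msrpm_read_intercepted(map, MSR_HWCR) reports an intercept. After msrpm_allow_read(map, MSR_HWCR) it no longer does, and the write bit next to it is left set, so writes remain intercepted.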



Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Mike Larkin
On Mon, Oct 31, 2022 at 01:14:59PM +0100, Paul de Weerd wrote:
> Had some time over lunch.  Disabling the code path that calls
> tsc_freq_msr lets me boot into the VM again:
>
> Index: tsc.c
> ===
> RCS file: /home/OpenBSD/cvs/src/sys/arch/amd64/amd64/tsc.c,v
> retrieving revision 1.30
> diff -u -p -r1.30 tsc.c
> --- tsc.c 24 Oct 2022 00:56:33 -  1.30
> +++ tsc.c 31 Oct 2022 12:12:32 -
> @@ -179,7 +179,7 @@ tsc_identify(struct cpu_info *ci)
>   tsc_is_invariant = 1;
>
>   tsc_frequency = tsc_freq_cpuid(ci);
> - if (tsc_frequency == 0)
> + if (tsc_frequency == 42)
>   tsc_frequency = tsc_freq_msr(ci);
>   if (tsc_frequency > 0)
>   delay_init(tsc_delay, 5000);
>
> Obviously not a fix, but at least a smoking gun.
>
> Paul
>

The tsc freq msr probably needs to be passed through in this case. I'll take
a look.

-ml

> On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote:
> | Hi folks,
> |
> | I just upgraded a VM on my AMD EPYC host.  I get the following
> | protection fault during boot:
> |
> | ddb> bo re
> | rebooting...
> | Using drive 0, partition 3.
> | Loading..
> | probing: pc0 com0 mem[638K 3838M 256M a20=on]
> | disk: hd0+
> | >> OpenBSD/amd64 BOOT 3.55
> | \
> | com0: 115200 baud
> | switching console to com0
> | >> OpenBSD/amd64 BOOT 3.55
> | boot>
> | NOTE: random seed is being reused.
> | booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 
> [1143945+128+1225080+928182]=0x170d440
> | entry point at 0x81001000
> | [ using 3298368 bytes of bsd ELF symbol table ]
> | Copyright (c) 1982, 1986, 1989, 1991, 1993
> | The Regents of the University of California.  All rights reserved.
> | Copyright (c) 1995-2022 OpenBSD. All rights reserved.  
> https://www.OpenBSD.org
> |
> | OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
> | dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> | real mem = 4278177792 (4079MB)
> | avail mem = 4131221504 (3939MB)
> | random: good seed from bootblocks
> | mpath0 at root
> | scsibus0 at mpath0: 256 targets
> | mainbus0 at root
> | bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries)
> | bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
> | bios0: OpenBSD VMM
> | acpi at bios0 not configured
> | cpu0 at mainbus0: (uniprocessor)
> | kernel: protection fault trap, code=0
> | Stopped at  tsc_identify+0xcd:  rdmsr
> | ddb> ps
> |PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> | *0   0 -1  0  7 0x10200swapper
> | ddb> trace
> | 
> tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10)
>  at tsc_identify+0xcd
> | 
> identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424)
>  at identifycpu+0x2e4
> | 
> cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300)
>  at cpu_attach+0x16f
> | 
> config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8)
>  at config_attach+0x1f4
> | mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at 
> mainbus_attach+0x151
> | config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at 
> config_attach+0x1f4
> | 
> cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00)
>  at cpu_configure+0x33
> | main(0,0,0,0,0,1) at main+0x379
> | end trace frame: 0x0, count: -8
> | ddb> show reg
> | rdi   0x822a3035cpu_vendor+0xd
> | rsi   0x81f04410cmd0646_9_tim_udma+0x170f5
> | rbp   0x82714c30end+0x314c30
> | rbx   0x20202020
> | rdx0
> | rcx   0xc0010015
> | rax0
> | r8 0
> | r9  0x40
> | r10   0x2bc299b68ee7cba5
> | r11   0x75a3a544d54dd7b9
> | r12  0x1
> | r13   0x8002c424
> | r14   0x822c7ff0cpu_info_full_primary+0x1ff0
> | r15   0x82714c40end+0x314c40
> | rip   0x819e1f4dtsc_identify+0xcd
> | cs   0x8
> | rflags   0x10202__ALIGN_SIZE+0xf202
> | rsp   0x82714c10end+0x314c10
> | ss  0x10
> | tsc_identify+0xcd:  rdmsr
> | ddb>
> |
> | When trying to boot bsd.rd I get:
> |
> | fatal protection fault in supervisor mode
> | trap type 4 code  rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl 
> e rsp 81a06d10
> | gsbase 0x818f5ff0  kgsbase 0x0
> | panic: trap type 4, code=, pc=811d5fb2
> |
> | This snapshot works 

Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Paul de Weerd
Had some time over lunch.  Disabling the code path that calls
tsc_freq_msr lets me boot into the VM again:

Index: tsc.c
===================================================================
RCS file: /home/OpenBSD/cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.30
diff -u -p -r1.30 tsc.c
--- tsc.c   24 Oct 2022 00:56:33 -  1.30
+++ tsc.c   31 Oct 2022 12:12:32 -
@@ -179,7 +179,7 @@ tsc_identify(struct cpu_info *ci)
 	tsc_is_invariant = 1;
 
 	tsc_frequency = tsc_freq_cpuid(ci);
-	if (tsc_frequency == 0)
+	if (tsc_frequency == 42)
 		tsc_frequency = tsc_freq_msr(ci);
 	if (tsc_frequency > 0)
 		delay_init(tsc_delay, 5000);

Obviously not a fix, but at least a smoking gun.
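
To spell out why the one-line change sidesteps the fault, here is a minimal sketch of the same fallback logic with stand-in probes. It assumes the CPUID probe returns 0 inside the guest, which is inferred from the MSR path being reached and not confirmed in this thread. The stub names and the 2500000000 Hz value are made up for illustration.

```c
#include <stdint.h>

/* Counts how often the (stand-in) MSR probe runs. */
static int msr_probe_calls;

/* Stand-in for tsc_freq_cpuid(): assume the guest's CPUID reports no frequency. */
static uint64_t
probe_cpuid_stub(void)
{
	return 0;
}

/*
 * Stand-in for tsc_freq_msr(): in the affected guest this is where the
 * rdmsr faulted; here it only records the call and returns an arbitrary value.
 */
static uint64_t
probe_msr_stub(void)
{
	msr_probe_calls++;
	return 2500000000ULL;
}

/*
 * Same shape as the code in the diff: fall back to the MSR probe only when
 * the CPUID result equals `trigger` (0 in the original, 42 in the test hack).
 */
static uint64_t
pick_frequency(uint64_t trigger)
{
	uint64_t freq = probe_cpuid_stub();

	if (freq == trigger)
		freq = probe_msr_stub();
	return freq;
}
```

With a trigger of 0 the MSR probe runs. With a trigger of 42 it is never reached, so the frequency stays 0 and the delay_init() call guarded by `tsc_frequency > 0` is skipped as well.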

Paul

On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote:
| Hi folks,
| 
| I just upgraded a VM on my AMD EPYC host.  I get the following
| protection fault during boot:
| 
| ddb> bo re
| rebooting...
| Using drive 0, partition 3.
| Loading..
| probing: pc0 com0 mem[638K 3838M 256M a20=on] 
| disk: hd0+
| >> OpenBSD/amd64 BOOT 3.55
| \
| com0: 115200 baud
| switching console to com0
| >> OpenBSD/amd64 BOOT 3.55
| boot> 
| NOTE: random seed is being reused.
| booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 
[1143945+128+1225080+928182]=0x170d440
| entry point at 0x81001000
| [ using 3298368 bytes of bsd ELF symbol table ]
| Copyright (c) 1982, 1986, 1989, 1991, 1993
| The Regents of the University of California.  All rights reserved.
| Copyright (c) 1995-2022 OpenBSD. All rights reserved.  https://www.OpenBSD.org
| 
| OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
| dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
| real mem = 4278177792 (4079MB)
| avail mem = 4131221504 (3939MB)
| random: good seed from bootblocks
| mpath0 at root
| scsibus0 at mpath0: 256 targets
| mainbus0 at root
| bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries)
| bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
| bios0: OpenBSD VMM
| acpi at bios0 not configured
| cpu0 at mainbus0: (uniprocessor)
| kernel: protection fault trap, code=0
| Stopped at  tsc_identify+0xcd:  rdmsr
| ddb> ps
|PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
| *0   0 -1  0  7 0x10200swapper
| ddb> trace
| 
tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10)
 at tsc_identify+0xcd
| 
identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424)
 at identifycpu+0x2e4
| 
cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300)
 at cpu_attach+0x16f
| 
config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8)
 at config_attach+0x1f4
| mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at 
mainbus_attach+0x151
| config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at 
config_attach+0x1f4
| 
cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00)
 at cpu_configure+0x33
| main(0,0,0,0,0,1) at main+0x379
| end trace frame: 0x0, count: -8
| ddb> show reg
| rdi   0x822a3035cpu_vendor+0xd
| rsi   0x81f04410cmd0646_9_tim_udma+0x170f5
| rbp   0x82714c30end+0x314c30
| rbx   0x20202020
| rdx0
| rcx   0xc0010015
| rax0
| r8 0
| r9  0x40
| r10   0x2bc299b68ee7cba5
| r11   0x75a3a544d54dd7b9
| r12  0x1
| r13   0x8002c424
| r14   0x822c7ff0cpu_info_full_primary+0x1ff0
| r15   0x82714c40end+0x314c40
| rip   0x819e1f4dtsc_identify+0xcd
| cs   0x8
| rflags   0x10202__ALIGN_SIZE+0xf202
| rsp   0x82714c10end+0x314c10
| ss  0x10
| tsc_identify+0xcd:  rdmsr
| ddb> 
| 
| When trying to boot bsd.rd I get:
| 
| fatal protection fault in supervisor mode
| trap type 4 code  rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl e 
rsp 81a06d10
| gsbase 0x818f5ff0  kgsbase 0x0
| panic: trap type 4, code=, pc=811d5fb2
| 
| This snapshot works fine in VMs running on my old Intel-based
| workstation, so I suspect the AMD CPU may have something to do with
| it.  Included below is the dmesg of the hypervisor (yes, that should
| also be upgraded at some point...).
| 
| I still have an old bsd.rd that I can boot into from the previous
| snapshot:
| 
| OpenBSD 7.2 (RAMDISK_CD) #715: Thu Sep 22 11:51:48 MDT 2022
| 
| 

kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace

2022-10-31 Thread Paul de Weerd
Hi folks,

I just upgraded a VM on my AMD EPYC host.  I get the following
protection fault during boot:

ddb> bo re
rebooting...
Using drive 0, partition 3.
Loading..
probing: pc0 com0 mem[638K 3838M 256M a20=on] 
disk: hd0+
>> OpenBSD/amd64 BOOT 3.55
\
com0: 115200 baud
switching console to com0
>> OpenBSD/amd64 BOOT 3.55
boot> 
NOTE: random seed is being reused.
booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 [1143945+128+1225080+928182]=0x170d440
entry point at 0x81001000
[ using 3298368 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2022 OpenBSD. All rights reserved.  https://www.OpenBSD.org

OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 4278177792 (4079MB)
avail mem = 4131221504 (3939MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries)
bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011
bios0: OpenBSD VMM
acpi at bios0 not configured
cpu0 at mainbus0: (uniprocessor)
kernel: protection fault trap, code=0
Stopped at  tsc_identify+0xcd:  rdmsr
ddb> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*    0       0     -1      0  7     0x10200                swapper
ddb> trace
tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10)
 at tsc_identify+0xcd
identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424)
 at identifycpu+0x2e4
cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300)
 at cpu_attach+0x16f
config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8)
 at config_attach+0x1f4
mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at mainbus_attach+0x151
config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at config_attach+0x1f4
cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00)
 at cpu_configure+0x33
main(0,0,0,0,0,1) at main+0x379
end trace frame: 0x0, count: -8
ddb> show reg
rdi           0x822a3035    cpu_vendor+0xd
rsi           0x81f04410    cmd0646_9_tim_udma+0x170f5
rbp           0x82714c30    end+0x314c30
rbx           0x20202020
rdx                    0
rcx           0xc0010015
rax                    0
r8                     0
r9                  0x40
r10   0x2bc299b68ee7cba5
r11   0x75a3a544d54dd7b9
r12                  0x1
r13           0x8002c424
r14           0x822c7ff0    cpu_info_full_primary+0x1ff0
r15           0x82714c40    end+0x314c40
rip           0x819e1f4d    tsc_identify+0xcd
cs                   0x8
rflags           0x10202    __ALIGN_SIZE+0xf202
rsp           0x82714c10    end+0x314c10
ss                  0x10
tsc_identify+0xcd:  rdmsr
ddb> 

When trying to boot bsd.rd I get:

fatal protection fault in supervisor mode
trap type 4 code  rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl e rsp 81a06d10
gsbase 0x818f5ff0  kgsbase 0x0
panic: trap type 4, code=, pc=811d5fb2

This snapshot works fine in VMs running on my old Intel-based
workstation, so I suspect the AMD CPU may have something to do with
it.  Included below is the dmesg of the hypervisor (yes, that should
also be upgraded at some point...).

I still have an old bsd.rd that I can boot into from the previous
snapshot:

OpenBSD 7.2 (RAMDISK_CD) #715: Thu Sep 22 11:51:48 MDT 2022

Looking at CVS history between Sep 22 and today, this commit from
Scott sticks out (hence the CC: to cheloha@):

https://marc.info/?l=openbsd-cvs&m=166657262528344&w=2

Later tonight I can try reverting this commit to see if it helps
things.  Will follow up when there's something to report.

Cheers,

Paul

--- dmesg (of the hypervisor) 
OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 68567597056 (65391MB)
avail mem = 66472255488 (63392MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdab19000 (51 entries)
bios0: vendor American Megatrends Inc. version "1.0c" date 06/30/2020
bios0: Supermicro Super Server
acpi0 at bios0: ACPI 6.1
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SPMI SSDT MCFG SSDT CRAT CDIT BERT EINJ HEST HPET SSDT UEFI IVRS SSDT WSMT
acpi0: wakeup devices