Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
All,

Just to close the loop on this one. Over the weekend I upgraded the
machine with the "AMD EPYC 3201 8-Core Processor" to the latest
snapshot (and the VMs that run on it), and much like with Jesper
Wallin's Ryzen laptop, all is fine now.

Thanks again to Scott and Mike for the super quick fix!

Paul

On Mon, Oct 31, 2022 at 02:15:00PM +0100, Paul de Weerd wrote:
| On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
| | You get a #GP in your VM when trying to rdmsr(MSR_HWCR). My guess is
| | we need to expand the MSR read bitmap for SVM.
| |
| | This patch compiles, but I can't test it. Does it fix the panic?
|
| To test this patch, I'd have to upgrade the hypervisor. That's a bit
| more involved, I'll plan it ASAP and report back, but it may be a few
| days.
|
| Thank you Scott and Mike!
|
| Paul
|
| | CC dv@ mlarkin@
| |
| | Index: vmm.c
| | ===
| | RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
| | retrieving revision 1.323
| | diff -u -p -r1.323 vmm.c
| | --- vmm.c 7 Sep 2022 18:44:09 - 1.323
| | +++ vmm.c 31 Oct 2022 12:38:30 -
| | @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
| |  	/* allow reading TSC */
| |  	svm_setmsrbr(vcpu, MSR_TSC);
| |
| | +	/* allow reading HWCR and PSTATEDEF for TSC calibration */
| | +	svm_setmsrbr(vcpu, MSR_HWCR);
| | +	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
| | +
| |  	/* Guest VCPU ASID */
| |  	if (vmm_alloc_vpid()) {
| |  		DPRINTF("%s: could not allocate asid\n", __func__);
| |
|
| --
| >[<++>-]<+++.>+++[<-->-]<.>+++[<+
| +++>-]<.>++[<>-]<+.--.[-]
| http://www.weirdnet.nl/
|

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
http://www.weirdnet.nl/
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote: > On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote: > > Hi folks, > > > > I just upgraded a VM on my AMD EPYC host. I get the following > > protection fault during boot: > > > > ddb> bo re > > rebooting... > > Using drive 0, partition 3. > > Loading.. > > probing: pc0 com0 mem[638K 3838M 256M a20=on] > > disk: hd0+ > > >> OpenBSD/amd64 BOOT 3.55 > > \ > > com0: 115200 baud > > switching console to com0 > > >> OpenBSD/amd64 BOOT 3.55 > > boot> > > NOTE: random seed is being reused. > > booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 > > [1143945+128+1225080+928182]=0x170d440 > > entry point at 0x81001000 > > [ using 3298368 bytes of bsd ELF symbol table ] > > Copyright (c) 1982, 1986, 1989, 1991, 1993 > > The Regents of the University of California. All rights reserved. > > Copyright (c) 1995-2022 OpenBSD. All rights reserved. > > https://www.OpenBSD.org > > > > OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022 > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > > real mem = 4278177792 (4079MB) > > avail mem = 4131221504 (3939MB) > > random: good seed from bootblocks > > mpath0 at root > > scsibus0 at mpath0: 256 targets > > mainbus0 at root > > bios0 at mainbus0: SMBIOS rev. 
2.4 @ 0xf36b0 (12 entries) > > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 > > bios0: OpenBSD VMM > > acpi at bios0 not configured > > cpu0 at mainbus0: (uniprocessor) > > kernel: protection fault trap, code=0 > > Stopped at tsc_identify+0xcd: rdmsr > > ddb> ps > >PID TID PPIDUID S FLAGS WAIT COMMAND > > *0 0 -1 0 7 0x10200swapper > > ddb> trace > > tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10) > > at tsc_identify+0xcd > > identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424) > > at identifycpu+0x2e4 > > cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300) > > at cpu_attach+0x16f > > config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8) > > at config_attach+0x1f4 > > mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at > > mainbus_attach+0x151 > > config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at > > config_attach+0x1f4 > > cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00) > > at cpu_configure+0x33 > > main(0,0,0,0,0,1) at main+0x379 > > end trace frame: 0x0, count: -8 > > ddb> show reg > > rdi 0x822a3035cpu_vendor+0xd > > rsi 0x81f04410cmd0646_9_tim_udma+0x170f5 > > rbp 0x82714c30end+0x314c30 > > rbx 0x20202020 > > rdx0 > > rcx 0xc0010015 > > rax0 > > r8 0 > > r9 0x40 > > r10 0x2bc299b68ee7cba5 > > r11 0x75a3a544d54dd7b9 > > r12 0x1 > > r13 0x8002c424 > > r14 0x822c7ff0cpu_info_full_primary+0x1ff0 > > r15 0x82714c40end+0x314c40 > > rip 0x819e1f4dtsc_identify+0xcd > > cs 0x8 > > rflags 0x10202__ALIGN_SIZE+0xf202 > > rsp 0x82714c10end+0x314c10 > > ss 0x10 > > tsc_identify+0xcd: rdmsr > > ddb> > > You get a #GP in your VM when trying to rdmsr(MSR_HWCR). My guess is > we need to expand the MSR read bitmap for SVM. > > This patch compiles, but I can't test it. Does it fix the panic? 
>
> CC dv@ mlarkin@
>
> Index: vmm.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
> retrieving revision 1.323
> diff -u -p -r1.323 vmm.c
> --- vmm.c 7 Sep 2022 18:44:09 - 1.323
> +++ vmm.c 31 Oct 2022 12:38:30 -
> @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
>  	/* allow reading TSC */
>  	svm_setmsrbr(vcpu, MSR_TSC);
>
> +	/* allow reading HWCR and PSTATEDEF for TSC calibration */
> +	svm_setmsrbr(vcpu, MSR_HWCR);
> +	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
> +
>  	/* Guest VCPU ASID */
>  	if (vmm_alloc_vpid()) {
>  		DPRINTF("%s: could not allocate asid\n", __func__);
>

This is the same diff I would have come up with myself, and since it is
reported to fix the issue, ok mlarkin@ on this. Thanks Scott.

-ml
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
Hi

I had the same issue on my laptop (AMD Ryzen 7 PRO 3700U) and the patch
solved it on my machine at least.

Jesper Wallin

On Mon, Oct 31, 2022 at 02:15:00PM +0100, Paul de Weerd wrote:
> On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
> | You get a #GP in your VM when trying to rdmsr(MSR_HWCR). My guess is
> | we need to expand the MSR read bitmap for SVM.
> |
> | This patch compiles, but I can't test it. Does it fix the panic?
>
> To test this patch, I'd have to upgrade the hypervisor. That's a bit
> more involved, I'll plan it ASAP and report back, but it may be a few
> days.
>
> Thank you Scott and Mike!
>
> Paul
>
> | CC dv@ mlarkin@
> |
> | Index: vmm.c
> | ===
> | RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
> | retrieving revision 1.323
> | diff -u -p -r1.323 vmm.c
> | --- vmm.c 7 Sep 2022 18:44:09 - 1.323
> | +++ vmm.c 31 Oct 2022 12:38:30 -
> | @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
> |  	/* allow reading TSC */
> |  	svm_setmsrbr(vcpu, MSR_TSC);
> |
> | +	/* allow reading HWCR and PSTATEDEF for TSC calibration */
> | +	svm_setmsrbr(vcpu, MSR_HWCR);
> | +	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
> | +
> |  	/* Guest VCPU ASID */
> |  	if (vmm_alloc_vpid()) {
> |  		DPRINTF("%s: could not allocate asid\n", __func__);
> |
>
> --
> >[<++>-]<+++.>+++[<-->-]<.>+++[<+
> +++>-]<.>++[<>-]<+.--.[-]
> http://www.weirdnet.nl/
>
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
On Mon, Oct 31, 2022 at 07:39:01AM -0500, Scott Cheloha wrote:
| You get a #GP in your VM when trying to rdmsr(MSR_HWCR). My guess is
| we need to expand the MSR read bitmap for SVM.
|
| This patch compiles, but I can't test it. Does it fix the panic?

To test this patch, I'd have to upgrade the hypervisor. That's a bit
more involved, I'll plan it ASAP and report back, but it may be a few
days.

Thank you Scott and Mike!

Paul

| CC dv@ mlarkin@
|
| Index: vmm.c
| ===
| RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
| retrieving revision 1.323
| diff -u -p -r1.323 vmm.c
| --- vmm.c 7 Sep 2022 18:44:09 - 1.323
| +++ vmm.c 31 Oct 2022 12:38:30 -
| @@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
|  	/* allow reading TSC */
|  	svm_setmsrbr(vcpu, MSR_TSC);
|
| +	/* allow reading HWCR and PSTATEDEF for TSC calibration */
| +	svm_setmsrbr(vcpu, MSR_HWCR);
| +	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
| +
|  	/* Guest VCPU ASID */
|  	if (vmm_alloc_vpid()) {
|  		DPRINTF("%s: could not allocate asid\n", __func__);
|

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
http://www.weirdnet.nl/
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote: > Hi folks, > > I just upgraded a VM on my AMD EPYC host. I get the following > protection fault during boot: > > ddb> bo re > rebooting... > Using drive 0, partition 3. > Loading.. > probing: pc0 com0 mem[638K 3838M 256M a20=on] > disk: hd0+ > >> OpenBSD/amd64 BOOT 3.55 > \ > com0: 115200 baud > switching console to com0 > >> OpenBSD/amd64 BOOT 3.55 > boot> > NOTE: random seed is being reused. > booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 > [1143945+128+1225080+928182]=0x170d440 > entry point at 0x81001000 > [ using 3298368 bytes of bsd ELF symbol table ] > Copyright (c) 1982, 1986, 1989, 1991, 1993 > The Regents of the University of California. All rights reserved. > Copyright (c) 1995-2022 OpenBSD. All rights reserved. https://www.OpenBSD.org > > OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022 > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > real mem = 4278177792 (4079MB) > avail mem = 4131221504 (3939MB) > random: good seed from bootblocks > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 
2.4 @ 0xf36b0 (12 entries) > bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 > bios0: OpenBSD VMM > acpi at bios0 not configured > cpu0 at mainbus0: (uniprocessor) > kernel: protection fault trap, code=0 > Stopped at tsc_identify+0xcd: rdmsr > ddb> ps >PID TID PPIDUID S FLAGS WAIT COMMAND > *0 0 -1 0 7 0x10200swapper > ddb> trace > tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10) > at tsc_identify+0xcd > identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424) > at identifycpu+0x2e4 > cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300) > at cpu_attach+0x16f > config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8) > at config_attach+0x1f4 > mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at > mainbus_attach+0x151 > config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at > config_attach+0x1f4 > cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00) > at cpu_configure+0x33 > main(0,0,0,0,0,1) at main+0x379 > end trace frame: 0x0, count: -8 > ddb> show reg > rdi 0x822a3035cpu_vendor+0xd > rsi 0x81f04410cmd0646_9_tim_udma+0x170f5 > rbp 0x82714c30end+0x314c30 > rbx 0x20202020 > rdx0 > rcx 0xc0010015 > rax0 > r8 0 > r9 0x40 > r10 0x2bc299b68ee7cba5 > r11 0x75a3a544d54dd7b9 > r12 0x1 > r13 0x8002c424 > r14 0x822c7ff0cpu_info_full_primary+0x1ff0 > r15 0x82714c40end+0x314c40 > rip 0x819e1f4dtsc_identify+0xcd > cs 0x8 > rflags 0x10202__ALIGN_SIZE+0xf202 > rsp 0x82714c10end+0x314c10 > ss 0x10 > tsc_identify+0xcd: rdmsr > ddb> You get a #GP in your VM when trying to rdmsr(MSR_HWCR). My guess is we need to expand the MSR read bitmap for SVM. This patch compiles, but I can't test it. Does it fix the panic? 

CC dv@ mlarkin@

Index: vmm.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/vmm.c,v
retrieving revision 1.323
diff -u -p -r1.323 vmm.c
--- vmm.c 7 Sep 2022 18:44:09 - 1.323
+++ vmm.c 31 Oct 2022 12:38:30 -
@@ -2705,6 +2705,10 @@ vcpu_reset_regs_svm(struct vcpu *vcpu, s
 	/* allow reading TSC */
 	svm_setmsrbr(vcpu, MSR_TSC);

+	/* allow reading HWCR and PSTATEDEF for TSC calibration */
+	svm_setmsrbr(vcpu, MSR_HWCR);
+	svm_setmsrbr(vcpu, MSR_PSTATEDEF(0));
+
 	/* Guest VCPU ASID */
 	if (vmm_alloc_vpid()) {
 		DPRINTF("%s: could not allocate asid\n", __func__);
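
For readers who have not looked at the SVM MSR permission map, here is a minimal userland sketch of the idea behind "expanding the MSR read bitmap". It is not the vmm(4) code. Every name below is hypothetical, and the layout (three MSR ranges of 2 KiB each, two bits per MSR, read bit first, a set bit meaning "intercept") follows my reading of the AMD programmer's manual. The sketch assumes the map starts out intercepting everything, so a guest rdmsr of an MSR nobody opened up is trapped by the hypervisor, which would explain the #GP at tsc_identify+0xcd. The HWCR number is the value in rcx in the register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the vmm(4) source. */
#define MSRPM_BYTES	8192		/* two 4 KiB pages */
#define MSR_TSC_NUM	0x00000010U
#define MSR_HWCR_NUM	0xc0010015U	/* matches rcx in the ddb dump */

/*
 * Return the bit index of the read-intercept bit for an MSR, or -1 if
 * the MSR lies outside the three ranges the map covers.  Each MSR owns
 * two adjacent bits: the even bit gates reads, the odd bit gates writes.
 */
static long
msrpm_read_bit(uint32_t msr)
{
	uint32_t base, off;

	if (msr <= 0x00001fffU) {
		base = 0x0000;
		off = msr;
	} else if (msr >= 0xc0000000U && msr <= 0xc0001fffU) {
		base = 0x0800;
		off = msr - 0xc0000000U;
	} else if (msr >= 0xc0010000U && msr <= 0xc0011fffU) {
		base = 0x1000;
		off = msr - 0xc0010000U;
	} else
		return -1;

	return (long)base * 8 + (long)off * 2;
}

/* Assumption of this sketch: start from "intercept everything". */
static void
msrpm_init(uint8_t *pm)
{
	assert(pm != NULL);
	memset(pm, 0xff, MSRPM_BYTES);
}

/* Clear the read-intercept bit so a guest rdmsr is passed through. */
static int
msrpm_allow_read(uint8_t *pm, uint32_t msr)
{
	long bit = msrpm_read_bit(msr);

	if (bit < 0)
		return -1;
	pm[bit / 8] &= (uint8_t)~(1U << (bit % 8));
	return 0;
}

/* Nonzero if a guest rdmsr of this MSR would be intercepted. */
static int
msrpm_read_intercepted(const uint8_t *pm, uint32_t msr)
{
	long bit = msrpm_read_bit(msr);

	if (bit < 0)
		return 1;	/* treated as intercepted in this sketch */
	return (pm[bit / 8] >> (bit % 8)) & 1;
}
```

With this model, calling msrpm_allow_read() for HWCR plays the role that svm_setmsrbr(vcpu, MSR_HWCR) appears to play in the diff: the read stops trapping and the guest sees the host value.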
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
On Mon, Oct 31, 2022 at 01:14:59PM +0100, Paul de Weerd wrote: > Had some time over lunch. Disabling the code path that calls > tsc_freq_msr lets me boot into the VM again: > > Index: tsc.c > === > RCS file: /home/OpenBSD/cvs/src/sys/arch/amd64/amd64/tsc.c,v > retrieving revision 1.30 > diff -u -p -r1.30 tsc.c > --- tsc.c 24 Oct 2022 00:56:33 - 1.30 > +++ tsc.c 31 Oct 2022 12:12:32 - > @@ -179,7 +179,7 @@ tsc_identify(struct cpu_info *ci) > tsc_is_invariant = 1; > > tsc_frequency = tsc_freq_cpuid(ci); > - if (tsc_frequency == 0) > + if (tsc_frequency == 42) > tsc_frequency = tsc_freq_msr(ci); > if (tsc_frequency > 0) > delay_init(tsc_delay, 5000); > > Obviously not a fix, but at least a smoking gun. > > Paul > The tsc freq msr probably needs to be passed through in this case. I'll take a look. -ml > On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote: > | Hi folks, > | > | I just upgraded a VM on my AMD EPYC host. I get the following > | protection fault during boot: > | > | ddb> bo re > | rebooting... > | Using drive 0, partition 3. > | Loading.. > | probing: pc0 com0 mem[638K 3838M 256M a20=on] > | disk: hd0+ > | >> OpenBSD/amd64 BOOT 3.55 > | \ > | com0: 115200 baud > | switching console to com0 > | >> OpenBSD/amd64 BOOT 3.55 > | boot> > | NOTE: random seed is being reused. > | booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 > [1143945+128+1225080+928182]=0x170d440 > | entry point at 0x81001000 > | [ using 3298368 bytes of bsd ELF symbol table ] > | Copyright (c) 1982, 1986, 1989, 1991, 1993 > | The Regents of the University of California. All rights reserved. > | Copyright (c) 1995-2022 OpenBSD. All rights reserved. 
> https://www.OpenBSD.org > | > | OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022 > | dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC > | real mem = 4278177792 (4079MB) > | avail mem = 4131221504 (3939MB) > | random: good seed from bootblocks > | mpath0 at root > | scsibus0 at mpath0: 256 targets > | mainbus0 at root > | bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf36b0 (12 entries) > | bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 > | bios0: OpenBSD VMM > | acpi at bios0 not configured > | cpu0 at mainbus0: (uniprocessor) > | kernel: protection fault trap, code=0 > | Stopped at tsc_identify+0xcd: rdmsr > | ddb> ps > |PID TID PPIDUID S FLAGS WAIT COMMAND > | *0 0 -1 0 7 0x10200swapper > | ddb> trace > | > tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10) > at tsc_identify+0xcd > | > identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424) > at identifycpu+0x2e4 > | > cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300) > at cpu_attach+0x16f > | > config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8) > at config_attach+0x1f4 > | mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at > mainbus_attach+0x151 > | config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at > config_attach+0x1f4 > | > cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00) > at cpu_configure+0x33 > | main(0,0,0,0,0,1) at main+0x379 > | end trace frame: 0x0, count: -8 > | ddb> show reg > | rdi 0x822a3035cpu_vendor+0xd > | rsi 0x81f04410cmd0646_9_tim_udma+0x170f5 > | rbp 0x82714c30end+0x314c30 > | rbx 0x20202020 > | rdx0 > | rcx 0xc0010015 > | rax0 > | r8 0 > | r9 0x40 > | r10 0x2bc299b68ee7cba5 > | r11 0x75a3a544d54dd7b9 > | r12 0x1 > | r13 0x8002c424 > | r14 0x822c7ff0cpu_info_full_primary+0x1ff0 > | r15 0x82714c40end+0x314c40 > | rip 0x819e1f4dtsc_identify+0xcd > | cs 0x8 > | rflags 0x10202__ALIGN_SIZE+0xf202 > | rsp 0x82714c10end+0x314c10 > | ss 0x10 > | 
tsc_identify+0xcd: rdmsr > | ddb> > | > | When trying to boot bsd.rd I get: > | > | fatal protection fault in supervisor mode > | trap type 4 code rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl > e rsp 81a06d10 > | gsbase 0x818f5ff0 kgsbase 0x0 > | panic: trap type 4, code=, pc=811d5fb2 > | > | This snapshot works
Re: kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
Had some time over lunch. Disabling the code path that calls
tsc_freq_msr lets me boot into the VM again:

Index: tsc.c
===
RCS file: /home/OpenBSD/cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.30
diff -u -p -r1.30 tsc.c
--- tsc.c 24 Oct 2022 00:56:33 - 1.30
+++ tsc.c 31 Oct 2022 12:12:32 -
@@ -179,7 +179,7 @@ tsc_identify(struct cpu_info *ci)
 	tsc_is_invariant = 1;

 	tsc_frequency = tsc_freq_cpuid(ci);
-	if (tsc_frequency == 0)
+	if (tsc_frequency == 42)
 		tsc_frequency = tsc_freq_msr(ci);
 	if (tsc_frequency > 0)
 		delay_init(tsc_delay, 5000);

Obviously not a fix, but at least a smoking gun.

Paul

On Mon, Oct 31, 2022 at 12:43:50PM +0100, Paul de Weerd wrote:
| Hi folks,
|
| I just upgraded a VM on my AMD EPYC host. I get the following
| protection fault during boot:
|
| ddb> bo re
| rebooting...
| Using drive 0, partition 3.
| Loading..
| probing: pc0 com0 mem[638K 3838M 256M a20=on]
| disk: hd0+
| >> OpenBSD/amd64 BOOT 3.55
| \
| com0: 115200 baud
| switching console to com0
| >> OpenBSD/amd64 BOOT 3.55
| boot>
| NOTE: random seed is being reused.
| booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 [1143945+128+1225080+928182]=0x170d440
| entry point at 0x81001000
| [ using 3298368 bytes of bsd ELF symbol table ]
| Copyright (c) 1982, 1986, 1989, 1991, 1993
| The Regents of the University of California. All rights reserved.
| Copyright (c) 1995-2022 OpenBSD. All rights reserved. https://www.OpenBSD.org
|
| OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022
| dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
| real mem = 4278177792 (4079MB)
| avail mem = 4131221504 (3939MB)
| random: good seed from bootblocks
| mpath0 at root
| scsibus0 at mpath0: 256 targets
| mainbus0 at root
| bios0 at mainbus0: SMBIOS rev.
2.4 @ 0xf36b0 (12 entries) | bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 | bios0: OpenBSD VMM | acpi at bios0 not configured | cpu0 at mainbus0: (uniprocessor) | kernel: protection fault trap, code=0 | Stopped at tsc_identify+0xcd: rdmsr | ddb> ps |PID TID PPIDUID S FLAGS WAIT COMMAND | *0 0 -1 0 7 0x10200swapper | ddb> trace | tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10) at tsc_identify+0xcd | identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424) at identifycpu+0x2e4 | cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300) at cpu_attach+0x16f | config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8) at config_attach+0x1f4 | mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at mainbus_attach+0x151 | config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at config_attach+0x1f4 | cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00) at cpu_configure+0x33 | main(0,0,0,0,0,1) at main+0x379 | end trace frame: 0x0, count: -8 | ddb> show reg | rdi 0x822a3035cpu_vendor+0xd | rsi 0x81f04410cmd0646_9_tim_udma+0x170f5 | rbp 0x82714c30end+0x314c30 | rbx 0x20202020 | rdx0 | rcx 0xc0010015 | rax0 | r8 0 | r9 0x40 | r10 0x2bc299b68ee7cba5 | r11 0x75a3a544d54dd7b9 | r12 0x1 | r13 0x8002c424 | r14 0x822c7ff0cpu_info_full_primary+0x1ff0 | r15 0x82714c40end+0x314c40 | rip 0x819e1f4dtsc_identify+0xcd | cs 0x8 | rflags 0x10202__ALIGN_SIZE+0xf202 | rsp 0x82714c10end+0x314c10 | ss 0x10 | tsc_identify+0xcd: rdmsr | ddb> | | When trying to boot bsd.rd I get: | | fatal protection fault in supervisor mode | trap type 4 code rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl e rsp 81a06d10 | gsbase 0x818f5ff0 kgsbase 0x0 | panic: trap type 4, code=, pc=811d5fb2 | | This snapshot works fine in VMs running on my old Intel-based | workstation, so I suspect the AMD CPU may have something to do with | it. 
Included below is the dmesg of the hypervisor (yes, that should | also be upgraded at some point...). | | I still have an old bsd.rd that I can boot into from the previous | snapshot: | | OpenBSD 7.2 (RAMDISK_CD) #715: Thu Sep 22 11:51:48 MDT 2022 | |
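
To spell out why the one-line tsc.c hack at the top of this message sidesteps the fault: tsc_identify() tries tsc_freq_cpuid() first and only falls back to tsc_freq_msr() when the first probe returns 0. On this guest the CPUID probe presumably returns 0, so the MSR probe runs and hits the rdmsr. Comparing against 42 instead of 0 means the fallback is never taken. Below is a minimal, self-contained sketch of that "first probe that returns nonzero wins" pattern. The probe type, the function names and the stub frequency are all hypothetical stand-ins, not code from tsc.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe type: returns a frequency in Hz, or 0 if unknown. */
typedef uint64_t (*freq_probe_t)(void);

/*
 * Walk the probes in order and return the first nonzero result.
 * Later probes are only called when every earlier one returned 0,
 * which mirrors the cpuid-then-msr ordering in the diff.
 */
static uint64_t
first_nonzero_freq(const freq_probe_t *probes, size_t n)
{
	size_t i;
	uint64_t hz;

	for (i = 0; i < n; i++) {
		if (probes[i] == NULL)
			continue;
		hz = probes[i]();
		if (hz != 0)
			return hz;
	}
	return 0;
}

/* Stand-in for a CPUID probe that cannot determine the frequency. */
static uint64_t
probe_cpuid_stub(void)
{
	return 0;
}

/* Stand-in for an MSR probe; the value is made up for the example. */
static uint64_t
probe_msr_stub(void)
{
	return 2000000000ULL;
}
```

Dropping the MSR stub from the probe list is the sketch's equivalent of the "== 42" hack: the chain returns 0 and, as in the diff, delay_init() would then not be called with the TSC.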
kernel protection fault during boot on vmm(4) VM running on AMD EPYC cpu with tsc_identify in trace
Hi folks, I just upgraded a VM on my AMD EPYC host. I get the following protection fault during boot: ddb> bo re rebooting... Using drive 0, partition 3. Loading.. probing: pc0 com0 mem[638K 3838M 256M a20=on] disk: hd0+ >> OpenBSD/amd64 BOOT 3.55 \ com0: 115200 baud switching console to com0 >> OpenBSD/amd64 BOOT 3.55 boot> NOTE: random seed is being reused. booting hd0a:/bsd: 15615256+3781640+298464+0+1171456 [1143945+128+1225080+928182]=0x170d440 entry point at 0x81001000 [ using 3298368 bytes of bsd ELF symbol table ] Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. Copyright (c) 1995-2022 OpenBSD. All rights reserved. https://www.OpenBSD.org OpenBSD 7.2-current (GENERIC) #784: Fri Oct 28 21:50:59 MDT 2022 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC real mem = 4278177792 (4079MB) avail mem = 4131221504 (3939MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 
2.4 @ 0xf36b0 (12 entries) bios0: vendor SeaBIOS version "1.14.0p0-OpenBSD-vmm" date 01/01/2011 bios0: OpenBSD VMM acpi at bios0 not configured cpu0 at mainbus0: (uniprocessor) kernel: protection fault trap, code=0 Stopped at tsc_identify+0xcd: rdmsr ddb> ps PID TID PPIDUID S FLAGS WAIT COMMAND *0 0 -1 0 7 0x10200swapper ddb> trace tsc_identify(822c7ff0,822c7ff0,68a34bffd15c67e6,822c7ff0,10,82714c10) at tsc_identify+0xcd identifycpu(822c7ff0,822c7ff0,bca189629b3de454,8002c400,822c7ff0,8002c424) at identifycpu+0x2e4 cpu_attach(8002c300,8002c400,82714d98,8002c300,980a70616799eafd,8002c300) at cpu_attach+0x16f config_attach(8002c300,82289250,82714d98,8138d1b0,6c550c45866795b6,82714db8) at config_attach+0x1f4 mainbus_attach(0,8002c300,0,0,819b798732a62156,0) at mainbus_attach+0x151 config_attach(0,822891a8,0,0,6c550c4586f4e2c4,0) at config_attach+0x1f4 cpu_configure(f588b7541b8b8d14,0,0,8002e000,81abb8d3,82714f00) at cpu_configure+0x33 main(0,0,0,0,0,1) at main+0x379 end trace frame: 0x0, count: -8 ddb> show reg rdi 0x822a3035cpu_vendor+0xd rsi 0x81f04410cmd0646_9_tim_udma+0x170f5 rbp 0x82714c30end+0x314c30 rbx 0x20202020 rdx0 rcx 0xc0010015 rax0 r8 0 r9 0x40 r10 0x2bc299b68ee7cba5 r11 0x75a3a544d54dd7b9 r12 0x1 r13 0x8002c424 r14 0x822c7ff0cpu_info_full_primary+0x1ff0 r15 0x82714c40end+0x314c40 rip 0x819e1f4dtsc_identify+0xcd cs 0x8 rflags 0x10202__ALIGN_SIZE+0xf202 rsp 0x82714c10end+0x314c10 ss 0x10 tsc_identify+0xcd: rdmsr ddb> When trying to boot bsd.rd I get: fatal protection fault in supervisor mode trap type 4 code rip 811d5fb2 cs 8 rflags 10202 cr2 0 cpl e rsp 81a06d10 gsbase 0x818f5ff0 kgsbase 0x0 panic: trap type 4, code=, pc=811d5fb2 This snapshot works fine in VMs running on my old Intel-based workstation, so I suspect the AMD CPU may have something to do with it. Included below is the dmesg of the hypervisor (yes, that should also be upgraded at some point...). 
I still have an old bsd.rd that I can boot into from the previous snapshot: OpenBSD 7.2 (RAMDISK_CD) #715: Thu Sep 22 11:51:48 MDT 2022 Looking at CVS history between Sep 22 and today, this commit from Scott sticks out (hence the CC: to cheloha@): https://marc.info/?l=openbsd-cvs=166657262528344=2 Later tonight I can try reverting this commit to see if it helps things. Will follow up when there's something to report. Cheers, Paul --- dmesg (of the hypervisor) OpenBSD 7.1 (GENERIC.MP) #465: Mon Apr 11 18:03:57 MDT 2022 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 68567597056 (65391MB) avail mem = 66472255488 (63392MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xdab19000 (51 entries) bios0: vendor American Megatrends Inc. version "1.0c" date 06/30/2020 bios0: Supermicro Super Server acpi0 at bios0: ACPI 6.1 acpi0: sleep states S0 S5 acpi0: tables DSDT FACP APIC FPDT FIDT SSDT SPMI SSDT MCFG SSDT CRAT CDIT BERT EINJ HEST HPET SSDT UEFI IVRS SSDT WSMT acpi0: wakeup devices