Re: crash on booting GENERIC.MP since upgrade to Jan 18 snapshot
On Mon, Jan 31, 2022 at 12:14:32PM -0300, Martin Pieuchot wrote:
> On 31/01/22(Mon) 00:54, Thomas Frohwein wrote:
> > On Sat, 29 Jan 2022 12:15:10 -0300
> > Martin Pieuchot wrote:
> > 
> > > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > > Jonathan Gray wrote:
> > > > 
> > > > > does this diff to revert uvm_fault.c rev 1.124 change anything?
> > > > 
> > > > Unfortunately no. The same pmap error as in the original bug report
> > > > occurs with a kernel with this diff.
> > > 
> > > Could you submit a new bug report? Could you manage to include ps and the
> > > trace of all the CPUs when the pmap corruption occurs?
> > 
> > See below
> > 
> > > Do you have some steps to reproduce the corruption? Which program is
> > > currently running? Is it multi-threaded? What is the simplest scenario
> > > to trigger the corruption?
> > 
> > It's during boot of the MP kernel. The only scenario I can provide is
> > booting this machine with an MP kernel from January 18 or newer. If I
> > boot the SP kernel, or build an MP kernel with jsg@'s diff that adds
> > `pool_debug = 2`, the panic does _not_ occur.
> 
> This indicates some race is present and not triggered if more context
> switches occur.
> 
> > Here is some new (hand-typed from a picture) output when I boot a freshly
> > downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> > hyperthread CPU; I have _not_ enabled hyperthreading). I attached the
> > dmesg from booting bsd.sp, too.
> 
> Thanks, so most CPUs already reached the idle loop and are not yet running
> anything.
> 
> Nobody is holding the KERNEL_LOCK(), the faulting process obviously
> isn't, and I don't understand which one it is.
> 
> Note that the corruption occurred on CPU2. We don't know where it
> occurred the previous time. This is interesting to watch to understand
> between which CPUs the race is occurring.
> 
> > ... (boot, see dmesg in original bugs@ submission)
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> > va 7f7fb000 ppa ff000
> 
> That's the faulting address, right? This is the same as in the first
> report. It seems to be inside the level 1 page table range, is it?
> What does that mean?

It is exactly one page below the amd64 VM_MAXUSER_ADDRESS 7f7fc000.
A rounding error or off-by-one somewhere?

> I don't understand which process is triggering the fault. Maybe
> somebody (jsg@?) could craft a diff to figure out if this same address
> faults and which thread/context is faulting it in SP and/or with
> pool_debug = 2.

I'm not sure what you mean here. It doesn't trigger with pool_debug=2.

It would be interesting if not starting xenodm on boot could be tried;
that should avoid the drm mmap path on boot.

> Something like:
> 
> 	if (va == 0x7f7fb000)
> 		db_enter();
> 
> > panic: pmap_get_ptp: unmanaged user PTP
> > Stopped at db_enter+0x10:	popq	%rbp
> >     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> > * 28644      1    0        0       0   2K  swapper
> > db_enter() at db_enter+0x10
> > panic(81f3dd1f) at panic+0xbf
> > pmap_get_ptp(fd888e52ee58,7f7fb000) at pmap_get_ptp+0x303
> > pmap_enter(fd888e52ee58,7f7fb000,13d151000,3,22) at pmap_enter+0x188
> > uvm_fault_lower(8000156852a0,8000156852d8,800015685220,0) at
> > uvm_fault_lower+0x63d
> > uvm_fault(fd888e52fdd0,7f7fb000,0,2) at uvm_fault+0x1b3
> > kpageflttrap(800015685420,7f7fbff5) at kpageflttrap+0x12c
> > kerntrap(800015685420) at kerntrap+0x91
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > copyout() at copyout+0x53
> > end trace frame: 0x0, count: 5
> > https://www.openbsd.org/ [...]
> > ddb{2}> show panic
> > *cpu2: pmap_get_ptp: unmanaged user PTP
> > ddb{2}> mach ddbcpu 0
> > Stopped at x86_ipi_db+0x12:	leave
> > x86_ipi_db(822acff0) at x86_ipi_db+0x12
> > x86_ipi_handler() at x86_ipi_handler+0x80
> > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> > acpicpu_idle() at acpicpu_idle+0x203
> > sched_idle(f822acff0) at sched_idle+0x280
> > end trace frame: 0x0, count: 10
> > ddb{0}> mach ddbcpu 1
> > Stopped at x86_ipi_db+0x12:	leave
> > x86_ipi_db(800015363ff0) at x86_ipi_db+0x12
> > x86_ipi_handler() at x86_ipi_handler+0x80
> > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> > acpicpu_idle() at acpicpu_idle+0x203
> > sched_idle(800015363ff0) at sched_idle+0x280
> > end trace frame: 0x0, count: 10
> > ddb{1}> mach ddbcpu 3
> > Stopped at x86_ipi_db+0x12:	leave
> > x86_ipi_db(800015375ff0) at x86_ipi_db+0x12
> > x86_ipi_handler() at x86_ipi_handler+0x80
> > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> > acpicpu_idle() at acpicpu_idle+0x203
> > sched_idle(800015375ff0) at sched_idle+0x280
> > end trace frame: 0x0, count: 10
> > ddb{3}> mach ddbcpu 4
> > Stopped at
Re: Interrupts hover above 40% when idle on Dell Latitude E7450
On Sun, Jan 30, 2022 at 10:41:34AM -0500, Ryan Kavanagh wrote:
> On Sun, Jan 30, 2022 at 12:39:02AM -0600, Scott Cheloha wrote:
> > > btrace -e 'profile:hz:100 { @[kstack] = count(); }' > /tmp/btrace.out
> > > 
> > > for ten seconds and ran the output through
> > > 
> > > https://github.com/brendangregg/FlameGraph/raw/master/stackcollapse-bpftrace.pl
> > > https://github.com/brendangregg/FlameGraph/raw/master/flamegraph.pl
> > > 
> > > The output of stackcollapse-bpftrace.pl and flamegraph.pl are attached
> > > as btrace.collapsed and btrace.svg.
> > 
> > The flamegraph suggests that you spent 10% of that time servicing
> > ichiic(4) interrupts from idle.
> > 
> > That could be a fluke though.
> 
> In case it was a fluke, I've regenerated the flamegraph on 7.0
> GENERIC.MP#293 amd64 using 10 seconds of output on an idle machine.
> Please see attached.
> 
> > What does the main systat view look like in the interrupt column?
> > 
> >     $ systat 1
> 
> Again on #293:
> 
> Interrupts (range after idling for a few seconds)
>  247 total     (235-260)
>  200 clock     (200-200)
>   21 ipi       (16-23)
>    1 acpi0     (0-1)
>    6 inteldrm  (5-7)
>      azalia1   (0-0)
>   11 iwm0      (10-16)
>      ehci0     (0-0)
>    1 ahci0     (0-1)
>    1 ichiic0   (0-1)
>    6 pckbc0    (0-0)
>      pckbc0    (0-0)

Based on these numbers and the similar-looking flamegraph I'd say you're
spending a relatively large amount of time handling ichiic(4) interrupts.

I don't know anything about that device, but my guess is that it is slow
if you're spending that much time in x86_bus_space_io_read_1() and its
_write_1() counterpart.

Someone else is going to have to weigh in on what might be the cause and
solution.

Thank you for providing the traces.
Re: crash on booting GENERIC.MP since upgrade to Jan 18 snapshot
On 31/01/22(Mon) 19:18, Jonathan Gray wrote:
> On Mon, Jan 31, 2022 at 12:54:53AM -0700, Thomas Frohwein wrote:
> > On Sat, 29 Jan 2022 12:15:10 -0300
> > Martin Pieuchot wrote:
> > 
> > > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > > Jonathan Gray wrote:
> > > > 
> > > > > does this diff to revert uvm_fault.c rev 1.124 change anything?
> > > > 
> > > > Unfortunately no. The same pmap error as in the original bug report
> > > > occurs with a kernel with this diff.
> > > 
> > > Could you submit a new bug report? Could you manage to include ps and the
> > > trace of all the CPUs when the pmap corruption occurs?
> > 
> > See below
> > 
> > > Do you have some steps to reproduce the corruption? Which program is
> > > currently running? Is it multi-threaded? What is the simplest scenario
> > > to trigger the corruption?
> > 
> > It's during boot of the MP kernel. The only scenario I can provide is
> > booting this machine with an MP kernel from January 18 or newer. If I
> > boot the SP kernel, or build an MP kernel with jsg@'s diff that adds
> > `pool_debug = 2`, the panic does _not_ occur.
> 
> That pool_debug change also avoids what Paul de Weerd sees on a
> Dell XPS 13 9305 with i7-1165G7, as does running SP
> 
> panic: pool_do_get: idrpl: page empty
> Stopped at db_enter+0x10:	popq	%rbp
>      TID   PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> * 293226  4683    0  0x14000   0x200   0K  drmwq

How can this error happen? Does that mean there's a corruption in the
pool? Is some synchronisation incorrect or some lock missing?

David, you know the pool subsystem better than us, do you have any
insight? Thanks!

> db_enter() at db_enter+0x10
> panic(81f08e21) at panic+0xbf
> pool_do_get(823b3710,1,80001d9a11e4) at pool_do_get+0x2f6
> pool_get(823b3710,1) at pool_get+0x96
> idr_alloc(803cc2e0,80fba500,1,0,5) at idr_alloc+0x78
> __drm_mode_object_add(803cc078,80fba500,,1,8102dda0)
> at __drm_mode_object_add+0xa6
> drm_property_create_blob(803cc078,80,8119ef80) at
> drm_property_create_blob+0xa7
> drm_property_replace_global_blob(803cc078,80e9c950,80,8119ef80,80e9c828,8095a180)
> at drm_property_replace_global_blob+0x84
> drm_connector_update_edid_property(80e9c800,8119ef80) at
> drm_connector_update_edid_property+0x118
> intel_connector_update_modes(80e9c800,8119ef80) at
> intel_connector_update_modes+0x15
> intel_dp_get_modes(80e9c800) at intel_dp_get_modes+0x33
> drm_helper_probe_single_connector_modes(80e9c800,f00,870) at
> drm_helper_probe_single_connector_modes+0x353
> drm_client_modeset_probe(80edda00,f00,870) at
> drm_client_modeset_probe+0x281
> drm_fb_helper_hotplug_event(80edda00) at
> drm_fb_helper_hotplug_event+0xd3
> end trace frame: 0x80001d9a1800, count: 0
> 
> some tiger lake machines don't see either problem
> for example thinkpad x1 nano, framework laptop
> 
> > Here is some new (hand-typed from a picture) output when I boot a freshly
> > downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> > hyperthread CPU; I have _not_ enabled hyperthreading). I attached the
> > dmesg from booting bsd.sp, too.
> > 
> > ... (boot, see dmesg in original bugs@ submission)
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> > va 7f7fb000 ppa ff000
> > panic: pmap_get_ptp: unmanaged user PTP
> > Stopped at db_enter+0x10:	popq	%rbp
> >     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> > * 28644      1    0        0       0   2K  swapper
> > db_enter() at db_enter+0x10
> > panic(81f3dd1f) at panic+0xbf
> > pmap_get_ptp(fd888e52ee58,7f7fb000) at pmap_get_ptp+0x303
> > pmap_enter(fd888e52ee58,7f7fb000,13d151000,3,22) at pmap_enter+0x188
> > uvm_fault_lower(8000156852a0,8000156852d8,800015685220,0) at
> > uvm_fault_lower+0x63d
> > uvm_fault(fd888e52fdd0,7f7fb000,0,2) at uvm_fault+0x1b3
> > kpageflttrap(800015685420,7f7fbff5) at kpageflttrap+0x12c
> > kerntrap(800015685420) at kerntrap+0x91
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > copyout() at copyout+0x53
> > end trace frame: 0x0, count: 5
> > https://www.openbsd.org/ [...]
> > ddb{2}> show panic
> > *cpu2: pmap_get_ptp: unmanaged user PTP
> > ddb{2}> mach ddbcpu 0
> > Stopped at x86_ipi_db+0x12:	leave
> > x86_ipi_db(822acff0) at x86_ipi_db+0x12
> > x86_ipi_handler() at x86_ipi_handler+0x80
> > Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> > acpicpu_idle() at acpicpu_idle+0x203
> > sched_idle(f822acff0) at sched_idle+0x280
> > end trace frame: 0x0, count: 10
> > ddb{0}> mach ddbcpu 1
> > Stopped at
Re: crash on booting GENERIC.MP since upgrade to Jan 18 snapshot
On 31/01/22(Mon) 00:54, Thomas Frohwein wrote:
> On Sat, 29 Jan 2022 12:15:10 -0300
> Martin Pieuchot wrote:
> 
> > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > Jonathan Gray wrote:
> > > 
> > > > does this diff to revert uvm_fault.c rev 1.124 change anything?
> > > 
> > > Unfortunately no. The same pmap error as in the original bug report
> > > occurs with a kernel with this diff.
> > 
> > Could you submit a new bug report? Could you manage to include ps and the
> > trace of all the CPUs when the pmap corruption occurs?
> 
> See below
> 
> > Do you have some steps to reproduce the corruption? Which program is
> > currently running? Is it multi-threaded? What is the simplest scenario
> > to trigger the corruption?
> 
> It's during boot of the MP kernel. The only scenario I can provide is
> booting this machine with an MP kernel from January 18 or newer. If I
> boot the SP kernel, or build an MP kernel with jsg@'s diff that adds
> `pool_debug = 2`, the panic does _not_ occur.

This indicates some race is present and not triggered if more context
switches occur.

> Here is some new (hand-typed from a picture) output when I boot a freshly
> downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> hyperthread CPU; I have _not_ enabled hyperthreading). I attached the
> dmesg from booting bsd.sp, too.

Thanks, so most CPUs already reached the idle loop and are not yet running
anything.

Nobody is holding the KERNEL_LOCK(), the faulting process obviously
isn't, and I don't understand which one it is.

Note that the corruption occurred on CPU2. We don't know where it
occurred the previous time. This is interesting to watch to understand
between which CPUs the race is occurring.

> ... (boot, see dmesg in original bugs@ submission)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> va 7f7fb000 ppa ff000

That's the faulting address, right? This is the same as in the first
report. It seems to be inside the level 1 page table range, is it?
What does that mean?

I don't understand which process is triggering the fault. Maybe
somebody (jsg@?) could craft a diff to figure out if this same address
faults and which thread/context is faulting it in SP and/or with
pool_debug = 2.

Something like:

	if (va == 0x7f7fb000)
		db_enter();

> panic: pmap_get_ptp: unmanaged user PTP
> Stopped at db_enter+0x10:	popq	%rbp
>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> * 28644      1    0        0       0   2K  swapper
> db_enter() at db_enter+0x10
> panic(81f3dd1f) at panic+0xbf
> pmap_get_ptp(fd888e52ee58,7f7fb000) at pmap_get_ptp+0x303
> pmap_enter(fd888e52ee58,7f7fb000,13d151000,3,22) at pmap_enter+0x188
> uvm_fault_lower(8000156852a0,8000156852d8,800015685220,0) at
> uvm_fault_lower+0x63d
> uvm_fault(fd888e52fdd0,7f7fb000,0,2) at uvm_fault+0x1b3
> kpageflttrap(800015685420,7f7fbff5) at kpageflttrap+0x12c
> kerntrap(800015685420) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> copyout() at copyout+0x53
> end trace frame: 0x0, count: 5
> https://www.openbsd.org/ [...]
> ddb{2}> show panic
> *cpu2: pmap_get_ptp: unmanaged user PTP
> ddb{2}> mach ddbcpu 0
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(822acff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(f822acff0) at sched_idle+0x280
> end trace frame: 0x0, count: 10
> ddb{0}> mach ddbcpu 1
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(800015363ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(800015363ff0) at sched_idle+0x280
> end trace frame: 0x0, count: 10
> ddb{1}> mach ddbcpu 3
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(800015375ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(800015375ff0) at sched_idle+0x280
> end trace frame: 0x0, count: 10
> ddb{3}> mach ddbcpu 4
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(80001537eff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(80001537eff0) at sched_idle+0x280
> end trace frame: 0x0, count: 10
> ddb{4}> mach ddbcpu 5
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(800015387ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(800015387ff0) at
Re: crash on booting GENERIC.MP since upgrade to Jan 18 snapshot
On Mon, Jan 31, 2022 at 12:54:53AM -0700, Thomas Frohwein wrote:
> On Sat, 29 Jan 2022 12:15:10 -0300
> Martin Pieuchot wrote:
> 
> > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > Jonathan Gray wrote:
> > > 
> > > > does this diff to revert uvm_fault.c rev 1.124 change anything?
> > > 
> > > Unfortunately no. The same pmap error as in the original bug report
> > > occurs with a kernel with this diff.
> > 
> > Could you submit a new bug report? Could you manage to include ps and the
> > trace of all the CPUs when the pmap corruption occurs?
> 
> See below
> 
> > Do you have some steps to reproduce the corruption? Which program is
> > currently running? Is it multi-threaded? What is the simplest scenario
> > to trigger the corruption?
> 
> It's during boot of the MP kernel. The only scenario I can provide is
> booting this machine with an MP kernel from January 18 or newer. If I
> boot the SP kernel, or build an MP kernel with jsg@'s diff that adds
> `pool_debug = 2`, the panic does _not_ occur.
> 
> Here is some new (hand-typed from a picture) output when I boot a freshly
> downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> hyperthread CPU; I have _not_ enabled hyperthreading). I attached the
> dmesg from booting bsd.sp, too.
> 
> ... (boot, see dmesg in original bugs@ submission)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> va 7f7fb000 ppa ff000
> panic: pmap_get_ptp: unmanaged user PTP
> Stopped at db_enter+0x10:	popq	%rbp
>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> * 28644      1    0        0       0   2K  swapper
> db_enter() at db_enter+0x10
> panic(81f3dd1f) at panic+0xbf
> pmap_get_ptp(fd888e52ee58,7f7fb000) at pmap_get_ptp+0x303
> pmap_enter(fd888e52ee58,7f7fb000,13d151000,3,22) at pmap_enter+0x188
> uvm_fault_lower(8000156852a0,8000156852d8,800015685220,0) at
> uvm_fault_lower+0x63d
> uvm_fault(fd888e52fdd0,7f7fb000,0,2) at uvm_fault+0x1b3
> kpageflttrap(800015685420,7f7fbff5) at kpageflttrap+0x12c
> kerntrap(800015685420) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> copyout() at copyout+0x53
> end trace frame: 0x0, count: 5

does this diff to provide stolen memory data help?

Index: sys/dev/pci/drm/i915/i915_drv.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
retrieving revision 1.135
diff -u -p -r1.135 i915_drv.c
--- sys/dev/pci/drm/i915/i915_drv.c	19 Jan 2022 02:20:06 -0000	1.135
+++ sys/dev/pci/drm/i915/i915_drv.c	31 Jan 2022 11:20:04 -0000
@@ -2350,6 +2350,7 @@ inteldrm_match(struct device *parent, vo
 }
 
 int	drm_gem_init(struct drm_device *);
+void	intel_init_stolen_res(struct inteldrm_softc *);
 
 void
 inteldrm_attach(struct device *parent, struct device *self, void *aux)
@@ -2469,6 +2470,7 @@ inteldrm_attach(struct device *parent, s
 		return;
 	}
 	dev->pdev->irq = -1;
+	intel_init_stolen_res(dev_priv);
 	config_mountroot(self, inteldrm_attachhook);
 }
Index: sys/dev/pci/drm/i915/intel_stolen.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_stolen.c,v
retrieving revision 1.2
diff -u -p -r1.2 intel_stolen.c
--- sys/dev/pci/drm/i915/intel_stolen.c	14 Jan 2022 06:53:11 -0000	1.2
+++ sys/dev/pci/drm/i915/intel_stolen.c	31 Jan 2022 11:25:37 -0000
@@ -163,7 +163,7 @@ intel_init_stolen_res(struct inteldrm_so
 
 	if (GRAPHICS_VER(dev_priv) >= 3 && GRAPHICS_VER(dev_priv) < 11)
 		stolen_base = gen3_stolen_base(dev_priv);
-	else if (GRAPHICS_VER(dev_priv) == 11)
+	else if (GRAPHICS_VER(dev_priv) == 11 || GRAPHICS_VER(dev_priv) == 12)
 		stolen_base = gen11_stolen_base(dev_priv);
 
 	if (IS_I830(dev_priv) || IS_I845G(dev_priv))
@@ -177,7 +177,7 @@ intel_init_stolen_res(struct inteldrm_so
 		stolen_size = gen6_stolen_size(dev_priv);
 	else if (GRAPHICS_VER(dev_priv) == 8)
 		stolen_size = gen8_stolen_size(dev_priv);
-	else if (GRAPHICS_VER(dev_priv) >= 9 && GRAPHICS_VER(dev_priv) < 12)
+	else if (GRAPHICS_VER(dev_priv) >= 9 && GRAPHICS_VER(dev_priv) <= 12)
 		stolen_size = gen9_stolen_size(dev_priv);
 
 	if (stolen_base == 0 || stolen_size == 0)
Index: sys/dev/pci/drm/i915/gt/intel_ggtt.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/i915/gt/intel_ggtt.c,v
retrieving revision 1.4
diff -u -p -r1.4 intel_ggtt.c
--- sys/dev/pci/drm/i915/gt/intel_ggtt.c	26 Jan 2022 01:46:12 -0000	1.4
+++ sys/dev/pci/drm/i915/gt/intel_ggtt.c	31 Jan 2022 11:33:05 -0000
@@ -1320,10 +1320,10 @@ static int ggtt_probe_hw(struct i915_ggt
 	}
 
 	/* GMADR is the
Re: crash on booting GENERIC.MP since upgrade to Jan 18 snapshot
On Mon, Jan 31, 2022 at 12:54:53AM -0700, Thomas Frohwein wrote:
> On Sat, 29 Jan 2022 12:15:10 -0300
> Martin Pieuchot wrote:
> 
> > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > Jonathan Gray wrote:
> > > 
> > > > does this diff to revert uvm_fault.c rev 1.124 change anything?
> > > 
> > > Unfortunately no. The same pmap error as in the original bug report
> > > occurs with a kernel with this diff.
> > 
> > Could you submit a new bug report? Could you manage to include ps and the
> > trace of all the CPUs when the pmap corruption occurs?
> 
> See below
> 
> > Do you have some steps to reproduce the corruption? Which program is
> > currently running? Is it multi-threaded? What is the simplest scenario
> > to trigger the corruption?
> 
> It's during boot of the MP kernel. The only scenario I can provide is
> booting this machine with an MP kernel from January 18 or newer. If I
> boot the SP kernel, or build an MP kernel with jsg@'s diff that adds
> `pool_debug = 2`, the panic does _not_ occur.

That pool_debug change also avoids what Paul de Weerd sees on a
Dell XPS 13 9305 with i7-1165G7, as does running SP

panic: pool_do_get: idrpl: page empty
Stopped at db_enter+0x10:	popq	%rbp
     TID   PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
* 293226  4683    0  0x14000   0x200   0K  drmwq
db_enter() at db_enter+0x10
panic(81f08e21) at panic+0xbf
pool_do_get(823b3710,1,80001d9a11e4) at pool_do_get+0x2f6
pool_get(823b3710,1) at pool_get+0x96
idr_alloc(803cc2e0,80fba500,1,0,5) at idr_alloc+0x78
__drm_mode_object_add(803cc078,80fba500,,1,8102dda0)
at __drm_mode_object_add+0xa6
drm_property_create_blob(803cc078,80,8119ef80) at
drm_property_create_blob+0xa7
drm_property_replace_global_blob(803cc078,80e9c950,80,8119ef80,80e9c828,8095a180)
at drm_property_replace_global_blob+0x84
drm_connector_update_edid_property(80e9c800,8119ef80) at
drm_connector_update_edid_property+0x118
intel_connector_update_modes(80e9c800,8119ef80) at
intel_connector_update_modes+0x15
intel_dp_get_modes(80e9c800) at intel_dp_get_modes+0x33
drm_helper_probe_single_connector_modes(80e9c800,f00,870) at
drm_helper_probe_single_connector_modes+0x353
drm_client_modeset_probe(80edda00,f00,870) at
drm_client_modeset_probe+0x281
drm_fb_helper_hotplug_event(80edda00) at
drm_fb_helper_hotplug_event+0xd3
end trace frame: 0x80001d9a1800, count: 0

some tiger lake machines don't see either problem
for example thinkpad x1 nano, framework laptop

> Here is some new (hand-typed from a picture) output when I boot a freshly
> downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> hyperthread CPU; I have _not_ enabled hyperthreading). I attached the
> dmesg from booting bsd.sp, too.
> 
> ... (boot, see dmesg in original bugs@ submission)
> wsdisplay0: screen 1-5 added (std, vt100 emulation)
> iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> va 7f7fb000 ppa ff000
> panic: pmap_get_ptp: unmanaged user PTP
> Stopped at db_enter+0x10:	popq	%rbp
>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> * 28644      1    0        0       0   2K  swapper
> db_enter() at db_enter+0x10
> panic(81f3dd1f) at panic+0xbf
> pmap_get_ptp(fd888e52ee58,7f7fb000) at pmap_get_ptp+0x303
> pmap_enter(fd888e52ee58,7f7fb000,13d151000,3,22) at pmap_enter+0x188
> uvm_fault_lower(8000156852a0,8000156852d8,800015685220,0) at
> uvm_fault_lower+0x63d
> uvm_fault(fd888e52fdd0,7f7fb000,0,2) at uvm_fault+0x1b3
> kpageflttrap(800015685420,7f7fbff5) at kpageflttrap+0x12c
> kerntrap(800015685420) at kerntrap+0x91
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> copyout() at copyout+0x53
> end trace frame: 0x0, count: 5
> https://www.openbsd.org/ [...]
> ddb{2}> show panic
> *cpu2: pmap_get_ptp: unmanaged user PTP
> ddb{2}> mach ddbcpu 0
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(822acff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x203
> sched_idle(f822acff0) at sched_idle+0x280
> end trace frame: 0x0, count: 10
> ddb{0}> mach ddbcpu 1
> Stopped at x86_ipi_db+0x12:	leave
> x86_ipi_db(800015363ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80