Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq
On Thu, May 06, 2021 at 02:36:21PM +0200, Mark Kettenis wrote: > > Date: Thu, 6 May 2021 21:59:12 +1000 > > From: Jonathan Gray > > > > On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote: > > > > > > [...] > > > > > > On a hunch I added additional parameter checks to task_add(9) and > > > task_del(9) and caught intel(4) doing something strange. > > > > > > [...] > > > > > > And here is the panic on my machine. I had to reconstruct it from > > > OCR, the machine has no serial port, sorry if there are typos. > > > > boot crash can be helpful for such machines Oh nice, thanks. > > > [...] > > > > > > db_enter() at db_enter+Oxa > > > panic(81db24fb) at panic+0x12f > > > task del(0,810633e0) at task_del+Oxa8 > > > edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a > > > intel_dp_aux_xfer(81063128, 82512a20,4, > > > 82512400,2,0) at intel_dp_aux_xfer+0x18b > > > intel_dp_aux_transfer(810631e8, 82512a88) at > > > intel_dp_aux_transfer+0x183 > > > drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at > > > drm_dp_dpcd_access+Oxa9 > > > drm_dp_dpcd_read(810631e8,0,8106313a, f) at > > > drm_dp_dpcd_read+0x61 > > > intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45 > > > intel_dp_init_connector(81063000, 81064000) at > > > intel_dp_init_connector+0x988 > > > intel_ddi_init(80272000,0) at intel_ddi_init+0x454 > > > intel_modeset_init(80272000) at intel_modeset_init+0x1c9f > > > i915_driver_probe(80272000, 82052f98) at > > > i915_driver_probe+0x7df > > > inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46 > > > end trace frame: Ox82512700, count: 0 > > > > > > From the backtrace, I gather the following: > > > > > > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro > > > > cancel_delayed_work() whoops > > > that calls task_del(). And for whatever reason the taskq passed to > > > task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call > > > somewhere prior to this point? > > > > the call to that is in > > INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work); > > but tq doesn't get set until work is scheduled as there are interfaces > > to pick a tq when scheduling work. > > > > So perhaps you want something like this to catch the cases where work is > > cancelled before it is scheduled. > > Yeah, I think that makes sense. > > ok kettenis@ This patch fixes the panic, running with it now. ok cheloha@ Caveat: my confidence in my understanding of these interfaces is pretty low. -Scott
Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq
> Date: Thu, 6 May 2021 21:59:12 +1000 > From: Jonathan Gray > > On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote: > > Hi, > > > > On a hunch I added additional parameter checks to task_add(9) and > > task_del(9) and caught intel(4) doing something strange. > > > > The patch is straightforward: check that the taskq pointer tq is not > > NULL. In the current code we return early if a flag is set or cleared > > in the task w, in which case we don't catch bogus taskq inputs, which > > is why the machine boots fine without the extra checks. > > > > The patch: > > > > Index: kern_task.c > > === > > RCS file: /cvs/src/sys/kern/kern_task.c,v > > retrieving revision 1.31 > > diff -u -p -r1.31 kern_task.c > > --- kern_task.c 1 Aug 2020 08:40:20 - 1.31 > > +++ kern_task.c 5 May 2021 21:29:08 - > > @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task * > > { > > int rv = 0; > > > > + if (tq == NULL) > > + panic("%s: NULL taskq", __func__); > > + > > if (ISSET(w->t_flags, TASK_ONQUEUE)) > > return (0); > > > > @@ -378,6 +381,9 @@ int > > task_del(struct taskq *tq, struct task *w) > > { > > int rv = 0; > > + > > + if (tq == NULL) > > + panic("%s: NULL taskq", __func__); > > > > if (!ISSET(w->t_flags, TASK_ONQUEUE)) > > return (0); > > > > And here is the panic on my machine. I had to reconstruct it from > > OCR, the machine has no serial port, sorry if there are typos. > > boot crash can be helpful for such machines > > > > > panic: task_del: NULL taskq > > Stopped at db_enter+0xa: popq %rbp > > TIDPID UID PRFLAGSPFLAGS CPU COMMAND > > 513524 448830 Ox14000 0x2003 update > > 352928 824020 0x14000 0x2002 cleaner > > 382195 660350 Ox14000 0x2001 reaper > > ... > > db_enter() at db_enter+Oxa > > panic(81db24fb) at panic+0x12f > > task del(0,810633e0) at task_del+Oxa8 > > edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a > > intel_dp_aux_xfer(81063128, 82512a20,4, > > 82512400,2,0) at intel_dp_aux_xfer+0x18b > > intel_dp_aux_transfer(810631e8, 82512a88) at > > intel_dp_aux_transfer+0x183 > > drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at > > drm_dp_dpcd_access+Oxa9 > > drm_dp_dpcd_read(810631e8,0,8106313a, f) at > > drm_dp_dpcd_read+0x61 > > intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45 > > intel_dp_init_connector(81063000, 81064000) at > > intel_dp_init_connector+0x988 > > intel_ddi_init(80272000,0) at intel_ddi_init+0x454 > > intel_modeset_init(80272000) at intel_modeset_init+0x1c9f > > i915_driver_probe(80272000, 82052f98) at > > i915_driver_probe+0x7df > > inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46 > > end trace frame: Ox82512700, count: 0 > > > > >From the backtrace, I gather the following: > > > > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro > > cancel_delayed_work() > > > that calls task_del(). And for whatever reason the taskq passed to > > task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call > > somewhere prior to this point? > > the call to that is in > INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work); > but tq doesn't get set until work is scheduled as there are interfaces > to pick a tq when scheduling work. > > So perhaps you want something like this to catch the cases where work is > cancelled before it is scheduled. Yeah, I think that makes sense. ok kettenis@ > Index: sys/dev/pci/drm/include/linux/workqueue.h > === > RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v > retrieving revision 1.4 > diff -u -p -r1.4 workqueue.h > --- sys/dev/pci/drm/include/linux/workqueue.h 14 Feb 2021 03:42:55 - > 1.4 > +++ sys/dev/pci/drm/include/linux/workqueue.h 6 May 2021 11:51:14 - > @@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq, > static inline void > cancel_work_sync(struct work_struct *work) > { > - task_del(work->tq, >task); > + if (work->tq != NULL) > + task_del(work->tq, >task); > } > > #define work_pending(work) task_pending(&(work)->task) > @@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct > static inline bool > cancel_delayed_work(struct delayed_work *dwork) > { > + if (dwork->tq == NULL) > + return false; > if (timeout_del(>to)) > return true; > return task_del(dwork->tq, >work.task); > @@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work > static inline bool > cancel_delayed_work_sync(struct delayed_work *dwork) > { > + if (dwork->tq == NULL) > + return false; > if
Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq
On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote: > Hi, > > On a hunch I added additional parameter checks to task_add(9) and > task_del(9) and caught intel(4) doing something strange. > > The patch is straightforward: check that the taskq pointer tq is not > NULL. In the current code we return early if a flag is set or cleared > in the task w, in which case we don't catch bogus taskq inputs, which > is why the machine boots fine without the extra checks. > > The patch: > > Index: kern_task.c > === > RCS file: /cvs/src/sys/kern/kern_task.c,v > retrieving revision 1.31 > diff -u -p -r1.31 kern_task.c > --- kern_task.c 1 Aug 2020 08:40:20 - 1.31 > +++ kern_task.c 5 May 2021 21:29:08 - > @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task * > { > int rv = 0; > > + if (tq == NULL) > + panic("%s: NULL taskq", __func__); > + > if (ISSET(w->t_flags, TASK_ONQUEUE)) > return (0); > > @@ -378,6 +381,9 @@ int > task_del(struct taskq *tq, struct task *w) > { > int rv = 0; > + > + if (tq == NULL) > + panic("%s: NULL taskq", __func__); > > if (!ISSET(w->t_flags, TASK_ONQUEUE)) > return (0); > > And here is the panic on my machine. I had to reconstruct it from > OCR, the machine has no serial port, sorry if there are typos. boot crash can be helpful for such machines > > panic: task_del: NULL taskq > Stopped at db_enter+0xa: popq %rbp > TID PID UID PRFLAGSPFLAGS CPU COMMAND > 513524 448830 Ox14000 0x2003 update > 352928 824020 0x14000 0x2002 cleaner > 382195 660350 Ox14000 0x2001 reaper > ... > db_enter() at db_enter+Oxa > panic(81db24fb) at panic+0x12f > task del(0,810633e0) at task_del+Oxa8 > edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a > intel_dp_aux_xfer(81063128, 82512a20,4, 82512400,2,0) > at intel_dp_aux_xfer+0x18b > intel_dp_aux_transfer(810631e8, 82512a88) at > intel_dp_aux_transfer+0x183 > drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at > drm_dp_dpcd_access+Oxa9 > drm_dp_dpcd_read(810631e8,0,8106313a, f) at > drm_dp_dpcd_read+0x61 > intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45 > intel_dp_init_connector(81063000, 81064000) at > intel_dp_init_connector+0x988 > intel_ddi_init(80272000,0) at intel_ddi_init+0x454 > intel_modeset_init(80272000) at intel_modeset_init+0x1c9f > i915_driver_probe(80272000, 82052f98) at > i915_driver_probe+0x7df > inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46 > end trace frame: Ox82512700, count: 0 > > >From the backtrace, I gather the following: > > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro cancel_delayed_work() > that calls task_del(). And for whatever reason the taskq passed to > task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call > somewhere prior to this point? the call to that is in INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work); but tq doesn't get set until work is scheduled as there are interfaces to pick a tq when scheduling work. So perhaps you want something like this to catch the cases where work is cancelled before it is scheduled. Index: sys/dev/pci/drm/include/linux/workqueue.h === RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v retrieving revision 1.4 diff -u -p -r1.4 workqueue.h --- sys/dev/pci/drm/include/linux/workqueue.h 14 Feb 2021 03:42:55 - 1.4 +++ sys/dev/pci/drm/include/linux/workqueue.h 6 May 2021 11:51:14 - @@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq, static inline void cancel_work_sync(struct work_struct *work) { - task_del(work->tq, >task); + if (work->tq != NULL) + task_del(work->tq, >task); } #define work_pending(work) task_pending(&(work)->task) @@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct static inline bool cancel_delayed_work(struct delayed_work *dwork) { + if (dwork->tq == NULL) + return false; if (timeout_del(>to)) return true; return task_del(dwork->tq, >work.task); @@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work static inline bool cancel_delayed_work_sync(struct delayed_work *dwork) { + if (dwork->tq == NULL) + return false; if (timeout_del(>to)) return true; return task_del(dwork->tq, >work.task);
intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq
Hi, On a hunch I added additional parameter checks to task_add(9) and task_del(9) and caught intel(4) doing something strange. The patch is straightforward: check that the taskq pointer tq is not NULL. In the current code we return early if a flag is set or cleared in the task w, in which case we don't catch bogus taskq inputs, which is why the machine boots fine without the extra checks. The patch: Index: kern_task.c === RCS file: /cvs/src/sys/kern/kern_task.c,v retrieving revision 1.31 diff -u -p -r1.31 kern_task.c --- kern_task.c 1 Aug 2020 08:40:20 - 1.31 +++ kern_task.c 5 May 2021 21:29:08 - @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task * { int rv = 0; + if (tq == NULL) + panic("%s: NULL taskq", __func__); + if (ISSET(w->t_flags, TASK_ONQUEUE)) return (0); @@ -378,6 +381,9 @@ int task_del(struct taskq *tq, struct task *w) { int rv = 0; + + if (tq == NULL) + panic("%s: NULL taskq", __func__); if (!ISSET(w->t_flags, TASK_ONQUEUE)) return (0); And here is the panic on my machine. I had to reconstruct it from OCR, the machine has no serial port, sorry if there are typos. panic: task_del: NULL taskq Stopped at db_enter+0xa: popq %rbp TIDPID UID PRFLAGSPFLAGS CPU COMMAND 513524 448830 Ox14000 0x2003 update 352928 824020 0x14000 0x2002 cleaner 382195 660350 Ox14000 0x2001 reaper ... db_enter() at db_enter+Oxa panic(81db24fb) at panic+0x12f task del(0,810633e0) at task_del+Oxa8 edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a intel_dp_aux_xfer(81063128, 82512a20,4, 82512400,2,0) at intel_dp_aux_xfer+0x18b intel_dp_aux_transfer(810631e8, 82512a88) at intel_dp_aux_transfer+0x183 drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at drm_dp_dpcd_access+Oxa9 drm_dp_dpcd_read(810631e8,0,8106313a, f) at drm_dp_dpcd_read+0x61 intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45 intel_dp_init_connector(81063000, 81064000) at intel_dp_init_connector+0x988 intel_ddi_init(80272000,0) at intel_ddi_init+0x454 intel_modeset_init(80272000) at intel_modeset_init+0x1c9f i915_driver_probe(80272000, 82052f98) at i915_driver_probe+0x7df inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46 end trace frame: Ox82512700, count: 0 >From the backtrace, I gather the following: edp_panel_vdd_on() calls clear_delayed_work() which is just a macro that calls task_del(). And for whatever reason the taskq passed to task_del() is NULL. Maybe there is a missing INIT_DELAYED_WORK() call somewhere prior to this point? And yes, I know, this isn't a bug in the code as-is, but I'm putting this on bugs@ because I'm pretty certain the taskq shouldn't be NULL. It "works" without the additional checks, yes, but something is off. My dmesg is attached. Happy to provide more detail, reproduce it, etc. CC dlg@: Maybe there is a discussion to be had about always entering the taskq mutex during task_add() and task_del() to catch this problem in the future? -Scott OpenBSD 6.9-current (GENERIC.MP) #0: Tue May 4 14:43:41 CDT 2021 ssc@jetsam.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 16895528960 (16112MB) avail mem = 16368230400 (15609MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x9f03b000 (63 entries) bios0: vendor LENOVO version "N23ET59W (1.34 )" date 11/08/2018 bios0: LENOVO 20KHCTO1WW acpi0 at bios0: ACPI 5.0 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SSDT SSDT TPM2 UEFI SSDT SSDT HPET APIC MCFG ECDT SSDT SSDT BOOT BATB SSDT SSDT SSDT LPIT WSMT SSDT SSDT SSDT DBGP DBG2 MSDM DMAR NHLT ASF! FPDT UEFI acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) PXSX(S4) RP07(S4) [...] acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 2399 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, 1791.41 MHz, 06-8e-0a cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: