Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq

2021-05-06 Thread Scott Cheloha
On Thu, May 06, 2021 at 02:36:21PM +0200, Mark Kettenis wrote:
> > Date: Thu, 6 May 2021 21:59:12 +1000
> > From: Jonathan Gray 
> > 
> > On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> > > 
> > > [...]
> > > 
> > > On a hunch I added additional parameter checks to task_add(9) and
> > > task_del(9) and caught intel(4) doing something strange.
> > > 
> > > [...]
> > > 
> > > And here is the panic on my machine.  I had to reconstruct it from
> > > OCR, the machine has no serial port, sorry if there are typos.
> > 
> > boot crash can be helpful for such machines

Oh nice, thanks.

> > > [...]
> > > 
> > > db_enter() at db_enter+Oxa
> > > panic(81db24fb) at panic+0x12f
> > > task del(0,810633e0) at task_del+Oxa8
> > > edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a
> > > intel_dp_aux_xfer(81063128, 82512a20,4, 
> > > 82512400,2,0) at intel_dp_aux_xfer+0x18b
> > > intel_dp_aux_transfer(810631e8, 82512a88) at 
> > > intel_dp_aux_transfer+0x183
> > > drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at 
> > > drm_dp_dpcd_access+Oxa9
> > > drm_dp_dpcd_read(810631e8,0,8106313a, f) at 
> > > drm_dp_dpcd_read+0x61
> > > intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45
> > > intel_dp_init_connector(81063000, 81064000) at 
> > > intel_dp_init_connector+0x988
> > > intel_ddi_init(80272000,0) at intel_ddi_init+0x454
> > > intel_modeset_init(80272000) at intel_modeset_init+0x1c9f
> > > i915_driver_probe(80272000, 82052f98) at 
> > > i915_driver_probe+0x7df
> > > inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46
> > > end trace frame: Ox82512700, count: 0
> > > 
> > > From the backtrace, I gather the following:
> > > 
> > > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro
> > 
> > cancel_delayed_work()

whoops

> > > that calls task_del().  And for whatever reason the taskq passed to
> > > task_del() is NULL.  Maybe there is a missing INIT_DELAYED_WORK() call
> > > somewhere prior to this point?
> > 
> > the call to that is in
> > INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work);
> > but tq doesn't get set until work is scheduled as there are interfaces
> > to pick a tq when scheduling work.
> > 
> > So perhaps you want something like this to catch the cases where work is
> > cancelled before it is scheduled.
> 
> Yeah, I think that makes sense.
> 
> ok kettenis@

This patch fixes the panic, running with it now.

ok cheloha@

Caveat: my confidence in my understanding of these interfaces is
pretty low.

-Scott



Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq

2021-05-06 Thread Mark Kettenis
> Date: Thu, 6 May 2021 21:59:12 +1000
> From: Jonathan Gray 
> 
> On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> > Hi,
> > 
> > On a hunch I added additional parameter checks to task_add(9) and
> > task_del(9) and caught intel(4) doing something strange.
> > 
> > The patch is straightforward: check that the taskq pointer tq is not
> > NULL.  In the current code we return early if a flag is set or cleared
> > in the task w, in which case we don't catch bogus taskq inputs, which
> > is why the machine boots fine without the extra checks.
> > 
> > The patch:
> > 
> > Index: kern_task.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_task.c,v
> > retrieving revision 1.31
> > diff -u -p -r1.31 kern_task.c
> > --- kern_task.c 1 Aug 2020 08:40:20 -   1.31
> > +++ kern_task.c 5 May 2021 21:29:08 -
> > @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
> >  {
> > int rv = 0;
> >  
> > +   if (tq == NULL)
> > +   panic("%s: NULL taskq", __func__);
> > +
> > if (ISSET(w->t_flags, TASK_ONQUEUE))
> > return (0);
> >  
> > @@ -378,6 +381,9 @@ int
> >  task_del(struct taskq *tq, struct task *w)
> >  {
> > int rv = 0;
> > +
> > +   if (tq == NULL)
> > +   panic("%s: NULL taskq", __func__);
> >  
> > if (!ISSET(w->t_flags, TASK_ONQUEUE))
> > return (0);
> > 
> > And here is the panic on my machine.  I had to reconstruct it from
> > OCR, the machine has no serial port, sorry if there are typos.
> 
> boot crash can be helpful for such machines
> 
> > 
> > panic: task_del: NULL taskq
> > Stopped at db_enter+0xa: popq %rbp
> > TIDPID  UID PRFLAGSPFLAGS  CPU  COMMAND
> >  513524  448830 Ox14000 0x2003  update
> >  352928  824020 0x14000 0x2002  cleaner
> >  382195  660350 Ox14000 0x2001  reaper
> > ...
> > db_enter() at db_enter+Oxa
> > panic(81db24fb) at panic+0x12f
> > task del(0,810633e0) at task_del+Oxa8
> > edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a
> > intel_dp_aux_xfer(81063128, 82512a20,4, 
> > 82512400,2,0) at intel_dp_aux_xfer+0x18b
> > intel_dp_aux_transfer(810631e8, 82512a88) at 
> > intel_dp_aux_transfer+0x183
> > drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at 
> > drm_dp_dpcd_access+Oxa9
> > drm_dp_dpcd_read(810631e8,0,8106313a, f) at 
> > drm_dp_dpcd_read+0x61
> > intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45
> > intel_dp_init_connector(81063000, 81064000) at 
> > intel_dp_init_connector+0x988
> > intel_ddi_init(80272000,0) at intel_ddi_init+0x454
> > intel_modeset_init(80272000) at intel_modeset_init+0x1c9f
> > i915_driver_probe(80272000, 82052f98) at 
> > i915_driver_probe+0x7df
> > inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46
> > end trace frame: Ox82512700, count: 0
> > 
> > >From the backtrace, I gather the following:
> > 
> > edp_panel_vdd_on() calls clear_delayed_work() which is just a macro
> 
> cancel_delayed_work()
> 
> > that calls task_del().  And for whatever reason the taskq passed to
> > task_del() is NULL.  Maybe there is a missing INIT_DELAYED_WORK() call
> > somewhere prior to this point?
> 
> the call to that is in
> INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work);
> but tq doesn't get set until work is scheduled as there are interfaces
> to pick a tq when scheduling work.
> 
> So perhaps you want something like this to catch the cases where work is
> cancelled before it is scheduled.

Yeah, I think that makes sense.

ok kettenis@

> Index: sys/dev/pci/drm/include/linux/workqueue.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 workqueue.h
> --- sys/dev/pci/drm/include/linux/workqueue.h 14 Feb 2021 03:42:55 -  
> 1.4
> +++ sys/dev/pci/drm/include/linux/workqueue.h 6 May 2021 11:51:14 -
> @@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq, 
>  static inline void
>  cancel_work_sync(struct work_struct *work)
>  {
> - task_del(work->tq, >task);
> + if (work->tq != NULL)
> + task_del(work->tq, >task);
>  }
>  
>  #define work_pending(work)   task_pending(&(work)->task)
> @@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct
>  static inline bool
>  cancel_delayed_work(struct delayed_work *dwork)
>  {
> + if (dwork->tq == NULL)
> + return false;
>   if (timeout_del(>to))
>   return true;
>   return task_del(dwork->tq, >work.task);
> @@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work 
>  static inline bool
>  cancel_delayed_work_sync(struct delayed_work *dwork)
>  {
> + if (dwork->tq == NULL)
> + return false;
>   if 

Re: intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq

2021-05-06 Thread Jonathan Gray
On Wed, May 05, 2021 at 04:47:50PM -0500, Scott Cheloha wrote:
> Hi,
> 
> On a hunch I added additional parameter checks to task_add(9) and
> task_del(9) and caught intel(4) doing something strange.
> 
> The patch is straightforward: check that the taskq pointer tq is not
> NULL.  In the current code we return early if a flag is set or cleared
> in the task w, in which case we don't catch bogus taskq inputs, which
> is why the machine boots fine without the extra checks.
> 
> The patch:
> 
> Index: kern_task.c
> ===
> RCS file: /cvs/src/sys/kern/kern_task.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 kern_task.c
> --- kern_task.c   1 Aug 2020 08:40:20 -   1.31
> +++ kern_task.c   5 May 2021 21:29:08 -
> @@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
>  {
>   int rv = 0;
>  
> + if (tq == NULL)
> + panic("%s: NULL taskq", __func__);
> +
>   if (ISSET(w->t_flags, TASK_ONQUEUE))
>   return (0);
>  
> @@ -378,6 +381,9 @@ int
>  task_del(struct taskq *tq, struct task *w)
>  {
>   int rv = 0;
> +
> + if (tq == NULL)
> + panic("%s: NULL taskq", __func__);
>  
>   if (!ISSET(w->t_flags, TASK_ONQUEUE))
>   return (0);
> 
> And here is the panic on my machine.  I had to reconstruct it from
> OCR, the machine has no serial port, sorry if there are typos.

boot crash can be helpful for such machines

> 
> panic: task_del: NULL taskq
> Stopped at db_enter+0xa: popq %rbp
> TID  PID  UID PRFLAGSPFLAGS  CPU  COMMAND
>  513524  448830 Ox14000 0x2003  update
>  352928  824020 0x14000 0x2002  cleaner
>  382195  660350 Ox14000 0x2001  reaper
> ...
> db_enter() at db_enter+Oxa
> panic(81db24fb) at panic+0x12f
> task del(0,810633e0) at task_del+Oxa8
> edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a
> intel_dp_aux_xfer(81063128, 82512a20,4, 82512400,2,0) 
> at intel_dp_aux_xfer+0x18b
> intel_dp_aux_transfer(810631e8, 82512a88) at 
> intel_dp_aux_transfer+0x183
> drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at 
> drm_dp_dpcd_access+Oxa9
> drm_dp_dpcd_read(810631e8,0,8106313a, f) at 
> drm_dp_dpcd_read+0x61
> intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45
> intel_dp_init_connector(81063000, 81064000) at 
> intel_dp_init_connector+0x988
> intel_ddi_init(80272000,0) at intel_ddi_init+0x454
> intel_modeset_init(80272000) at intel_modeset_init+0x1c9f
> i915_driver_probe(80272000, 82052f98) at 
> i915_driver_probe+0x7df
> inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46
> end trace frame: Ox82512700, count: 0
> 
> >From the backtrace, I gather the following:
> 
> edp_panel_vdd_on() calls clear_delayed_work() which is just a macro

cancel_delayed_work()

> that calls task_del().  And for whatever reason the taskq passed to
> task_del() is NULL.  Maybe there is a missing INIT_DELAYED_WORK() call
> somewhere prior to this point?

the call to that is in
INIT_DELAYED_WORK(_dp->panel_vdd_work, edp_panel_vdd_work);
but tq doesn't get set until work is scheduled as there are interfaces
to pick a tq when scheduling work.

So perhaps you want something like this to catch the cases where work is
cancelled before it is scheduled.

Index: sys/dev/pci/drm/include/linux/workqueue.h
===
RCS file: /cvs/src/sys/dev/pci/drm/include/linux/workqueue.h,v
retrieving revision 1.4
diff -u -p -r1.4 workqueue.h
--- sys/dev/pci/drm/include/linux/workqueue.h   14 Feb 2021 03:42:55 -  
1.4
+++ sys/dev/pci/drm/include/linux/workqueue.h   6 May 2021 11:51:14 -
@@ -95,7 +95,8 @@ queue_work(struct workqueue_struct *wq, 
 static inline void
 cancel_work_sync(struct work_struct *work)
 {
-   task_del(work->tq, >task);
+   if (work->tq != NULL)
+   task_del(work->tq, >task);
 }
 
 #define work_pending(work) task_pending(&(work)->task)
@@ -169,6 +170,8 @@ mod_delayed_work(struct workqueue_struct
 static inline bool
 cancel_delayed_work(struct delayed_work *dwork)
 {
+   if (dwork->tq == NULL)
+   return false;
if (timeout_del(>to))
return true;
return task_del(dwork->tq, >work.task);
@@ -177,6 +180,8 @@ cancel_delayed_work(struct delayed_work 
 static inline bool
 cancel_delayed_work_sync(struct delayed_work *dwork)
 {
+   if (dwork->tq == NULL)
+   return false;
if (timeout_del(>to))
return true;
return task_del(dwork->tq, >work.task);



intel(4): edp_panel_vdd_on calls task_del(9) with NULL taskq

2021-05-05 Thread Scott Cheloha
Hi,

On a hunch I added additional parameter checks to task_add(9) and
task_del(9) and caught intel(4) doing something strange.

The patch is straightforward: check that the taskq pointer tq is not
NULL.  In the current code we return early if a flag is set or cleared
in the task w, in which case we don't catch bogus taskq inputs, which
is why the machine boots fine without the extra checks.

The patch:

Index: kern_task.c
===
RCS file: /cvs/src/sys/kern/kern_task.c,v
retrieving revision 1.31
diff -u -p -r1.31 kern_task.c
--- kern_task.c 1 Aug 2020 08:40:20 -   1.31
+++ kern_task.c 5 May 2021 21:29:08 -
@@ -354,6 +354,9 @@ task_add(struct taskq *tq, struct task *
 {
int rv = 0;
 
+   if (tq == NULL)
+   panic("%s: NULL taskq", __func__);
+
if (ISSET(w->t_flags, TASK_ONQUEUE))
return (0);
 
@@ -378,6 +381,9 @@ int
 task_del(struct taskq *tq, struct task *w)
 {
int rv = 0;
+
+   if (tq == NULL)
+   panic("%s: NULL taskq", __func__);
 
if (!ISSET(w->t_flags, TASK_ONQUEUE))
return (0);

And here is the panic on my machine.  I had to reconstruct it from
OCR, the machine has no serial port, sorry if there are typos.

panic: task_del: NULL taskq
Stopped at db_enter+0xa: popq %rbp
TIDPID  UID PRFLAGSPFLAGS  CPU  COMMAND
 513524  448830 Ox14000 0x2003  update
 352928  824020 0x14000 0x2002  cleaner
 382195  660350 Ox14000 0x2001  reaper
...
db_enter() at db_enter+Oxa
panic(81db24fb) at panic+0x12f
task del(0,810633e0) at task_del+Oxa8
edp_panel vdd_on(81063128) at edp_panel_vdd_on+0x6a
intel_dp_aux_xfer(81063128, 82512a20,4, 82512400,2,0) 
at intel_dp_aux_xfer+0x18b
intel_dp_aux_transfer(810631e8, 82512a88) at 
intel_dp_aux_transfer+0x183
drm_dp_dpcd_access(810631e8,9,0,8106313a, 1) at 
drm_dp_dpcd_access+Oxa9
drm_dp_dpcd_read(810631e8,0,8106313a, f) at 
drm_dp_dpcd_read+0x61
intel_dp_read_dpcd(81063128) at intel_dp_read_dpcd+0x45
intel_dp_init_connector(81063000, 81064000) at 
intel_dp_init_connector+0x988
intel_ddi_init(80272000,0) at intel_ddi_init+0x454
intel_modeset_init(80272000) at intel_modeset_init+0x1c9f
i915_driver_probe(80272000, 82052f98) at i915_driver_probe+0x7df
inteldrm_attachhook(80272000) at inteldrm_attachhook+0x46
end trace frame: Ox82512700, count: 0

>From the backtrace, I gather the following:

edp_panel_vdd_on() calls clear_delayed_work() which is just a macro
that calls task_del().  And for whatever reason the taskq passed to
task_del() is NULL.  Maybe there is a missing INIT_DELAYED_WORK() call
somewhere prior to this point?

And yes, I know, this isn't a bug in the code as-is, but I'm putting
this on bugs@ because I'm pretty certain the taskq shouldn't be NULL.
It "works" without the additional checks, yes, but something is off.

My dmesg is attached.

Happy to provide more detail, reproduce it, etc.

CC dlg@: Maybe there is a discussion to be had about always entering
the taskq mutex during task_add() and task_del() to catch this problem
in the future?

-Scott

OpenBSD 6.9-current (GENERIC.MP) #0: Tue May  4 14:43:41 CDT 2021
ssc@jetsam.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16895528960 (16112MB)
avail mem = 16368230400 (15609MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x9f03b000 (63 entries)
bios0: vendor LENOVO version "N23ET59W (1.34 )" date 11/08/2018
bios0: LENOVO 20KHCTO1WW
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT TPM2 UEFI SSDT SSDT HPET APIC MCFG ECDT SSDT 
SSDT BOOT BATB SSDT SSDT SSDT LPIT WSMT SSDT SSDT SSDT DBGP DBG2 MSDM DMAR NHLT 
ASF! FPDT UEFI
acpi0: wakeup devices GLAN(S4) XHC_(S3) XDCI(S4) HDAS(S4) RP01(S4) PXSX(S4) 
RP02(S4) PXSX(S4) PXSX(S4) RP04(S4) PXSX(S4) RP05(S4) PXSX(S4) RP06(S4) 
PXSX(S4) RP07(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, 1791.41 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: