On Thursday, 18 December 2025 13:38:25 Central European Standard Time Boris 
Brezillon wrote:
> On Wed, 17 Dec 2025 15:29:38 +0100
> Nicolas Frattaroli <[email protected]> wrote:
> 
> > Add a function to modify an IRQ's mask. If the IRQ is currently active,
> > it will write to the register, otherwise it will only set the struct
> > member.
> > 
> > There's no locking done to guarantee exclusion with the other two
> > functions that touch the IRQ mask, and it should only be called from a
> > context where the circumstances guarantee no concurrent access is
> > performed.
> > 
> > Signed-off-by: Nicolas Frattaroli <[email protected]>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h 
> > b/drivers/gpu/drm/panthor/panthor_device.h
> > index f35e52b9546a..894d28b3eb02 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -470,6 +470,13 @@ static int panthor_request_ ## __name ## _irq(struct 
> > panthor_device *ptdev,                    \
> >                                      panthor_ ## __name ## 
> > _irq_threaded_handler,           \
> >                                      IRQF_SHARED, KBUILD_MODNAME "-" # 
> > __name,              \
> >                                      pirq);                                 
> >                 \
> > +}                                                                          
> >                 \
> > +                                                                           
> >                 \
> > +static inline void panthor_ ## __name ## _irq_mask_set(struct panthor_irq 
> > *pirq, u32 mask) \
> > +{                                                                          
> >                 \
> > +   pirq->mask = mask;                                                      
> >                 \
> > +   if (!atomic_read(&pirq->suspended))                                     
> >                 \
> > +           gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, mask);        
> >                 \
> 
> This is racy if called outside the (threaded) IRQ handler, which I
> believe is the case since its called when a trace point is enabled:
> 
> 1. _irq_raw_handler() sets INT_MASK to zero to avoid receiving
> interrupts from this IRQ line until the threaded handler has processed
> events
> 
> 2. _irq_mask_set() sets INT_MASK to something non-zero, meaning the
> interrupt will be re-enabled before events have been processed

Good catch, I only considered the case where the irq mask is modified
by a PM runtime suspend/resume, not by the actual handler functions
itself.

> this leads to at least one spurious interrupt being received before we
> set INT_MASK to zero again. Probably not the end of the world, but if
> we can avoid it, that'd be better.
> 
> Also, I'd like to see if we could re-purpose panthor_irq::mask to be
> the mask of events the user wants to monitor instead of a pure proxy of
> INT_MASK. If we do that, and we make panthor_irq::mask an atomic_t,

Yeah, I was wondering why panthor_irq::mask changed on IRQ suspend
and resume, since that information is already stored in the
"suspended" atomic.

> we can add panthor_xxx_irq_{enable,disable}_event() helpers that would
> do the atomic_{or,and} on panthor_irq::mask, and write the new value
> to _INT_MASK if:
> - we're processing events in the threaded handler (we would need
>   another field, or we'd need to turn suspended into a state that can
>   encode more than just "suspended or not")

Right, I assume synchronize_irq will not really fix the race,
since a single synchronization point does not a mutually exclusive
section make.

I thought about suspending IRQs, synchronising, then changing the
mask, then re-enabling them, but that feels potentially disruptive.
Though I may be misunderstanding the hardware here; if a bit is
disabled in the INT_MASK and the hardware would want to fire such
an interrupt, does it discard it? Because in that case, any solution
where we suspend IRQs -> wait for threaded handler to finish ->
modify mask -> resume IRQs could lead to us missing out on critical
knowledge, like that a GPU fault occurred.

In a perfect world, this hardware would use a register access
pattern that allows for race-free updates of non-overlapping bits,
like FIELD_PREP_WM16 implements. ;) But we do not live in such a
world.

> - the device is not suspended (that test you already have)
> 
> >  }
> >  
> >  extern struct workqueue_struct *panthor_cleanup_wq;
> > 
> 
> 

Thanks for the review, I'll try to come up with a solution that
won't potentially blow our legs off.

Kind regards,
Nicolas Frattaroli


Reply via email to