Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-08 Thread Marc Kleine-Budde
On 08.03.2021 15:56:53, Marc Kleine-Budde wrote:
> > > meanwhile this patch hit v5.10.x stable and caused a performance
> > > degradation on our use case:
> > > 
> > > It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> > > controller. CAN stands for Controller Area Network and here used to
> > > connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> > > Transport Protocol) transfer is running. With this patch, we see CAN
> > > frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> > > 
> > > Reverting this patch, restores the old performance.
> > > 
> > > What is the best way to solve this dilemma? Identify the critical path
> > > in our use case? Is there a way we can get around the irqsave in
> > > led_trigger_event()?
> > > 
> > 
> > Probably, we can change from rwlock to rcu here, POC code as follow,
> > only compile tested. Marc, could you see whether this help the
> > performance on your platform? Please note that I haven't test it in a
> > running kernel and I'm not that familir with led subsystem, so use it
> > with caution ;-)
> > 
> > (While at it, I think maybe we miss the leddev_list_lock in net/mac80211
> > in the patch)
> 
> I can confirm, this patch basically restores the old performance.

Hmmm - it first looked OK, now it doesn't - let me do some better testing.

Sorry for the noise,
Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-08 Thread Marc Kleine-Budde
On 07.03.2021 10:02:32, Boqun Feng wrote:
> On Sat, Mar 06, 2021 at 09:39:54PM +0100, Marc Kleine-Budde wrote:
> > Hello *,
> > 
> > On 02.11.2020 11:41:52, Andrea Righi wrote:
> > > We have the following potential deadlock condition:
> > > 
> > >  
> > >  WARNING: possible irq lock inversion dependency detected
> > >  5.10.0-rc2+ #25 Not tainted
> > >  
> > >  swapper/3/0 just changed the state of lock:
> > >  8880063bd618 (>lock){-...}-{2:2}, at: 
> > > ata_bmdma_interrupt+0x27/0x200
> > >  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
> > >   (>leddev_list_lock){.+.?}-{2:2}
> > > 
> > >  and interrupts could create inverse lock ordering between them.
> > 
> > [...]
> > 
> > > ---
> > >  drivers/leds/led-triggers.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> > > index 91da90cfb11d..16d1a93a10a8 100644
> > > --- a/drivers/leds/led-triggers.c
> > > +++ b/drivers/leds/led-triggers.c
> > > @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
> > >   enum led_brightness brightness)
> > >  {
> > >   struct led_classdev *led_cdev;
> > > + unsigned long flags;
> > >  
> > >   if (!trig)
> > >   return;
> > >  
> > > - read_lock(>leddev_list_lock);
> > > + read_lock_irqsave(>leddev_list_lock, flags);
> > >   list_for_each_entry(led_cdev, >led_cdevs, trig_list)
> > >   led_set_brightness(led_cdev, brightness);
> > > - read_unlock(>leddev_list_lock);
> > > + read_unlock_irqrestore(>leddev_list_lock, flags);
> > >  }
> > >  EXPORT_SYMBOL_GPL(led_trigger_event);
> > 
> > meanwhile this patch hit v5.10.x stable and caused a performance
> > degradation on our use case:
> > 
> > It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> > controller. CAN stands for Controller Area Network and here used to
> > connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> > Transport Protocol) transfer is running. With this patch, we see CAN
> > frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> > 
> > Reverting this patch, restores the old performance.
> > 
> > What is the best way to solve this dilemma? Identify the critical path
> > in our use case? Is there a way we can get around the irqsave in
> > led_trigger_event()?
> > 
> 
> Probably, we can change from rwlock to rcu here, POC code as follow,
> only compile tested. Marc, could you see whether this help the
> performance on your platform? Please note that I haven't test it in a
> running kernel and I'm not that familir with led subsystem, so use it
> with caution ;-)
> 
> (While at it, I think maybe we miss the leddev_list_lock in net/mac80211
> in the patch)

I can confirm, this patch basically restores the old performance.

Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-07 Thread Hans de Goede
Hi,

On 3/7/21 5:13 PM, Pavel Machek wrote:
> Hi!
> 
>>> --- a/drivers/leds/led-triggers.c
>>> +++ b/drivers/leds/led-triggers.c
>>> @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
>>> enum led_brightness brightness)
>>>  {
>>> struct led_classdev *led_cdev;
>>> +   unsigned long flags;
>>>  
>>> if (!trig)
>>> return;
>>>  
>>> -   read_lock(>leddev_list_lock);
>>> +   read_lock_irqsave(>leddev_list_lock, flags);
>>> list_for_each_entry(led_cdev, >led_cdevs, trig_list)
>>> led_set_brightness(led_cdev, brightness);
>>> -   read_unlock(>leddev_list_lock);
>>> +   read_unlock_irqrestore(>leddev_list_lock, flags);
>>>  }
>>>  EXPORT_SYMBOL_GPL(led_trigger_event)
>>
>> meanwhile this patch hit v5.10.x stable and caused a performance
>> degradation on our use case:
>>
>> It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
>> controller. CAN stands for Controller Area Network and here used to
>> connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
>> Transport Protocol) transfer is running. With this patch, we see CAN
>> frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
>>
>> Reverting this patch, restores the old performance.
>>
>> What is the best way to solve this dilemma? Identify the critical path
>> in our use case? Is there a way we can get around the irqsave in
>> led_trigger_event()?
> 
> Hans was pushing for this patch, perhaps he has some ideas...

I was not pushing for this particular fix, I was asking about a fix
for the lockdep identified potential deadlock.

And you replied that this was already fixed in your for-next branch
when I asked, so all in all, other then reporting the potential deadlock
(after it was already fixed) I have very little do to with this patch.

With that all said, I must say that I'm surprised that switching from
read_lock() to read_lock_irqsave() causes such a hefty penalty, so I
wonder what is really going on here. Using the irqsave version disables
interrupts, but AFAIK only on the current core and only for the duration
of the led_set_brightness() call(s) . 

Is the system perhaps pinning IRQs to a specific CPU in combination with
a led_set_brightness() somehow taking much longer then it should?

Note that led_set_brightness() calls are not allowed to block, if they
block they should use the brightness_set_blocking callback in their
led_class_dev struct not the regular brightness_set callback. In which case
the LED-core will defer the actually setting of the LED to a workqueue.

So one thing which might be worthwhile to check is if any of the LED
drivers on the system in question are using the brightness_set callback,
where they should be using the blocking one.

Regards,

Hans



Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-07 Thread Pavel Machek
Hi!

> > ---
> >  drivers/leds/led-triggers.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> > index 91da90cfb11d..16d1a93a10a8 100644
> > --- a/drivers/leds/led-triggers.c
> > +++ b/drivers/leds/led-triggers.c
> > @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
> > enum led_brightness brightness)
> >  {
> > struct led_classdev *led_cdev;
> > +   unsigned long flags;
> >  
> > if (!trig)
> > return;
> >  
> > -   read_lock(>leddev_list_lock);
> > +   read_lock_irqsave(>leddev_list_lock, flags);
> > list_for_each_entry(led_cdev, >led_cdevs, trig_list)
> > led_set_brightness(led_cdev, brightness);
> > -   read_unlock(>leddev_list_lock);
> > +   read_unlock_irqrestore(>leddev_list_lock, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(led_trigger_event);
> 
> meanwhile this patch hit v5.10.x stable and caused a performance
> degradation on our use case:
> 
> It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> controller. CAN stands for Controller Area Network and here used to
> connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> Transport Protocol) transfer is running. With this patch, we see CAN
> frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> 
> Reverting this patch, restores the old performance.
> 
> What is the best way to solve this dilemma? Identify the critical path
> in our use case? Is there a way we can get around the irqsave in
> led_trigger_event()?

6ms is quite long. Are you actively using any triggers? Do you have
LED blinking on CAN access?

Can you verify if it is cli/sti taking too long, or if the
led_set_brightness takes too long?

Best regards,
Pavel

-- 
http://www.livejournal.com/~pavelmachek


signature.asc
Description: PGP signature


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-07 Thread Pavel Machek
Hi!

> > --- a/drivers/leds/led-triggers.c
> > +++ b/drivers/leds/led-triggers.c
> > @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
> > enum led_brightness brightness)
> >  {
> > struct led_classdev *led_cdev;
> > +   unsigned long flags;
> >  
> > if (!trig)
> > return;
> >  
> > -   read_lock(>leddev_list_lock);
> > +   read_lock_irqsave(>leddev_list_lock, flags);
> > list_for_each_entry(led_cdev, >led_cdevs, trig_list)
> > led_set_brightness(led_cdev, brightness);
> > -   read_unlock(>leddev_list_lock);
> > +   read_unlock_irqrestore(>leddev_list_lock, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(led_trigger_event);
> 
> meanwhile this patch hit v5.10.x stable and caused a performance
> degradation on our use case:
> 
> It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> controller. CAN stands for Controller Area Network and here used to
> connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> Transport Protocol) transfer is running. With this patch, we see CAN
> frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> 
> Reverting this patch, restores the old performance.
> 
> What is the best way to solve this dilemma? Identify the critical path
> in our use case? Is there a way we can get around the irqsave in
> led_trigger_event()?

Hans was pushing for this patch, perhaps he has some ideas...
Pavel
-- 
http://www.livejournal.com/~pavelmachek


signature.asc
Description: PGP signature


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-06 Thread Andrea Righi
On Sun, Mar 07, 2021 at 10:02:32AM +0800, Boqun Feng wrote:
> On Sat, Mar 06, 2021 at 09:39:54PM +0100, Marc Kleine-Budde wrote:
> > Hello *,
> > 
> > On 02.11.2020 11:41:52, Andrea Righi wrote:
> > > We have the following potential deadlock condition:
> > > 
> > >  
> > >  WARNING: possible irq lock inversion dependency detected
> > >  5.10.0-rc2+ #25 Not tainted
> > >  
> > >  swapper/3/0 just changed the state of lock:
> > >  8880063bd618 (>lock){-...}-{2:2}, at: 
> > > ata_bmdma_interrupt+0x27/0x200
> > >  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
> > >   (>leddev_list_lock){.+.?}-{2:2}
> > > 
> > >  and interrupts could create inverse lock ordering between them.
> > 
> > [...]
> > 
> > > ---
> > >  drivers/leds/led-triggers.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> > > index 91da90cfb11d..16d1a93a10a8 100644
> > > --- a/drivers/leds/led-triggers.c
> > > +++ b/drivers/leds/led-triggers.c
> > > @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
> > >   enum led_brightness brightness)
> > >  {
> > >   struct led_classdev *led_cdev;
> > > + unsigned long flags;
> > >  
> > >   if (!trig)
> > >   return;
> > >  
> > > - read_lock(>leddev_list_lock);
> > > + read_lock_irqsave(>leddev_list_lock, flags);
> > >   list_for_each_entry(led_cdev, >led_cdevs, trig_list)
> > >   led_set_brightness(led_cdev, brightness);
> > > - read_unlock(>leddev_list_lock);
> > > + read_unlock_irqrestore(>leddev_list_lock, flags);
> > >  }
> > >  EXPORT_SYMBOL_GPL(led_trigger_event);
> > 
> > meanwhile this patch hit v5.10.x stable and caused a performance
> > degradation on our use case:
> > 
> > It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> > controller. CAN stands for Controller Area Network and here used to
> > connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> > Transport Protocol) transfer is running. With this patch, we see CAN
> > frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> > 
> > Reverting this patch, restores the old performance.
> > 
> > What is the best way to solve this dilemma? Identify the critical path
> > in our use case? Is there a way we can get around the irqsave in
> > led_trigger_event()?
> > 
> 
> Probably, we can change from rwlock to rcu here, POC code as follow,
> only compile tested. Marc, could you see whether this help the
> performance on your platform? Please note that I haven't test it in a
> running kernel and I'm not that familir with led subsystem, so use it
> with caution ;-)

If we don't want to touch the led subsystem at all maybe we could try to
fix the problem in libata, we just need to prevent calling
led_trigger_blink_oneshot() with >lock held from
ata_qc_complete(), maybe doing the led blinking from another context (a
workqueue for example)?

-Andrea


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-06 Thread Boqun Feng
On Sat, Mar 06, 2021 at 09:39:54PM +0100, Marc Kleine-Budde wrote:
> Hello *,
> 
> On 02.11.2020 11:41:52, Andrea Righi wrote:
> > We have the following potential deadlock condition:
> > 
> >  
> >  WARNING: possible irq lock inversion dependency detected
> >  5.10.0-rc2+ #25 Not tainted
> >  
> >  swapper/3/0 just changed the state of lock:
> >  8880063bd618 (>lock){-...}-{2:2}, at: 
> > ata_bmdma_interrupt+0x27/0x200
> >  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
> >   (>leddev_list_lock){.+.?}-{2:2}
> > 
> >  and interrupts could create inverse lock ordering between them.
> 
> [...]
> 
> > ---
> >  drivers/leds/led-triggers.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> > index 91da90cfb11d..16d1a93a10a8 100644
> > --- a/drivers/leds/led-triggers.c
> > +++ b/drivers/leds/led-triggers.c
> > @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
> > enum led_brightness brightness)
> >  {
> > struct led_classdev *led_cdev;
> > +   unsigned long flags;
> >  
> > if (!trig)
> > return;
> >  
> > -   read_lock(>leddev_list_lock);
> > +   read_lock_irqsave(>leddev_list_lock, flags);
> > list_for_each_entry(led_cdev, >led_cdevs, trig_list)
> > led_set_brightness(led_cdev, brightness);
> > -   read_unlock(>leddev_list_lock);
> > +   read_unlock_irqrestore(>leddev_list_lock, flags);
> >  }
> >  EXPORT_SYMBOL_GPL(led_trigger_event);
> 
> meanwhile this patch hit v5.10.x stable and caused a performance
> degradation on our use case:
> 
> It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
> controller. CAN stands for Controller Area Network and here used to
> connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
> Transport Protocol) transfer is running. With this patch, we see CAN
> frames delayed for ~6ms, the usual gap between CAN frames is 240µs.
> 
> Reverting this patch, restores the old performance.
> 
> What is the best way to solve this dilemma? Identify the critical path
> in our use case? Is there a way we can get around the irqsave in
> led_trigger_event()?
> 

Probably, we can change from rwlock to rcu here, POC code as follow,
only compile tested. Marc, could you see whether this help the
performance on your platform? Please note that I haven't test it in a
running kernel and I'm not that familir with led subsystem, so use it
with caution ;-)

(While at it, I think maybe we miss the leddev_list_lock in net/mac80211
in the patch)

Regards,
Boqun
--->8
diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
index 4e7b78a84149..ae68ccab6cc9 100644
--- a/drivers/leds/led-triggers.c
+++ b/drivers/leds/led-triggers.c
@@ -171,10 +171,12 @@ int led_trigger_set(struct led_classdev *led_cdev, struct 
led_trigger *trig)
 
/* Remove any existing trigger */
if (led_cdev->trigger) {
-   write_lock_irqsave(_cdev->trigger->leddev_list_lock, flags);
-   list_del(_cdev->trig_list);
-   write_unlock_irqrestore(_cdev->trigger->leddev_list_lock,
+   spin_lock_irqsave(_cdev->trigger->leddev_list_lock, flags);
+   list_del_rcu(_cdev->trig_list);
+   spin_unlock_irqrestore(_cdev->trigger->leddev_list_lock,
flags);
+   /* Wait for the readers gone */
+   synchronize_rcu();
cancel_work_sync(_cdev->set_brightness_work);
led_stop_software_blink(led_cdev);
if (led_cdev->trigger->deactivate)
@@ -186,9 +188,9 @@ int led_trigger_set(struct led_classdev *led_cdev, struct 
led_trigger *trig)
led_set_brightness(led_cdev, LED_OFF);
}
if (trig) {
-   write_lock_irqsave(>leddev_list_lock, flags);
-   list_add_tail(_cdev->trig_list, >led_cdevs);
-   write_unlock_irqrestore(>leddev_list_lock, flags);
+   spin_lock_irqsave(>leddev_list_lock, flags);
+   list_add_tail_rcu(_cdev->trig_list, >led_cdevs);
+   spin_unlock_irqrestore(>leddev_list_lock, flags);
led_cdev->trigger = trig;
 
if (trig->activate)
@@ -223,9 +225,12 @@ int led_trigger_set(struct led_classdev *led_cdev, struct 
led_trigger *trig)
trig->deactivate(led_cdev);
 err_activate:
 
-   write_lock_irqsave(_cdev->trigger->leddev_list_lock, flags);
-   list_del(_cdev->trig_list);
-   write_unlock_irqrestore(_cdev->trigger->leddev_list_lock, flags);
+   spin_lock_irqsave(_cdev->trigger->leddev_list_lock, flags);
+   list_del_rcu(_cdev->trig_list);
+   spin_unlock_irqrestore(_cdev->trigger->leddev_list_lock, flags);
+
+   /* XXX could use call_rcu() here? */
+   

Re: [PATCH] leds: trigger: fix potential deadlock with libata

2021-03-06 Thread Marc Kleine-Budde
Hello *,

On 02.11.2020 11:41:52, Andrea Righi wrote:
> We have the following potential deadlock condition:
> 
>  
>  WARNING: possible irq lock inversion dependency detected
>  5.10.0-rc2+ #25 Not tainted
>  
>  swapper/3/0 just changed the state of lock:
>  8880063bd618 (>lock){-...}-{2:2}, at: 
> ata_bmdma_interrupt+0x27/0x200
>  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
>   (>leddev_list_lock){.+.?}-{2:2}
> 
>  and interrupts could create inverse lock ordering between them.

[...]

> ---
>  drivers/leds/led-triggers.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/leds/led-triggers.c b/drivers/leds/led-triggers.c
> index 91da90cfb11d..16d1a93a10a8 100644
> --- a/drivers/leds/led-triggers.c
> +++ b/drivers/leds/led-triggers.c
> @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
>   enum led_brightness brightness)
>  {
>   struct led_classdev *led_cdev;
> + unsigned long flags;
>  
>   if (!trig)
>   return;
>  
> - read_lock(>leddev_list_lock);
> + read_lock_irqsave(>leddev_list_lock, flags);
>   list_for_each_entry(led_cdev, >led_cdevs, trig_list)
>   led_set_brightness(led_cdev, brightness);
> - read_unlock(>leddev_list_lock);
> + read_unlock_irqrestore(>leddev_list_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(led_trigger_event);

meanwhile this patch hit v5.10.x stable and caused a performance
degradation on our use case:

It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
controller. CAN stands for Controller Area Network and here used to
connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
Transport Protocol) transfer is running. With this patch, we see CAN
frames delayed for ~6ms, the usual gap between CAN frames is 240µs.

Reverting this patch, restores the old performance.

What is the best way to solve this dilemma? Identify the critical path
in our use case? Is there a way we can get around the irqsave in
led_trigger_event()?

regards,
Marc

-- 
Pengutronix e.K. | Marc Kleine-Budde   |
Embedded Linux   | https://www.pengutronix.de  |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2020-11-25 Thread Andrea Righi
On Wed, Nov 25, 2020 at 03:15:18PM +0100, Andrea Righi wrote:
...
> > I'd hate to see this in stable 3 days after Linus merges it...
> > 
> > Do these need _irqsave, too?
> > 
> > drivers/leds/led-triggers.c:   read_lock(>leddev_list_lock);
> > drivers/leds/led-triggers.c:   read_unlock(>leddev_list_lock);
> > drivers/leds/led-triggers.c:   read_lock(>leddev_list_lock);
> > drivers/leds/led-triggers.c:   read_unlock(>leddev_list_lock);
> > 
> > Best regards,
> 
> I think also led_trigger_blink_setup() needs to use irqsave/irqrestore,
> in fact:
> 
> $ git grep "led_trigger_blink("
> drivers/leds/led-triggers.c:void led_trigger_blink(struct led_trigger *trig,
> drivers/power/supply/power_supply_leds.c:   
> led_trigger_blink(psy->charging_blink_full_solid_trig,
> include/linux/leds.h:void led_trigger_blink(struct led_trigger *trigger, 
> unsigned long *delay_on,
> include/linux/leds.h:static inline void led_trigger_blink(struct led_trigger 
> *trigger,
> 
> power_supply_leds.c is using led_trigger_blink() from a workqueue
> context, so potentially the same deadlock condition can also happen.
> 
> Let me know if you want me to send a new patch to include also this
> case.

Just sent (and tested) a v2 of this patch that changes also
led_trigger_blink_setup().

-Andrea


Re: [PATCH] leds: trigger: fix potential deadlock with libata

2020-11-25 Thread Andrea Righi
Hi Pavel,

On Wed, Nov 25, 2020 at 01:46:48PM +0100, Pavel Machek wrote:
> Hi!
> 
> > We have the following potential deadlock condition:
> > 
> >  
> >  WARNING: possible irq lock inversion dependency detected
> >  5.10.0-rc2+ #25 Not tainted
> >  
> >  swapper/3/0 just changed the state of lock:
> >  8880063bd618 (>lock){-...}-{2:2}, at: 
> > ata_bmdma_interrupt+0x27/0x200
> >  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
> >   (>leddev_list_lock){.+.?}-{2:2}
> > 
> >  and interrupts could create inverse lock ordering between them.
> > 
> >  other info that might help us debug this:
> >   Possible interrupt unsafe locking scenario:
> > 
> > CPU0CPU1
> > 
> >lock(>leddev_list_lock);
> > local_irq_disable();
> > lock(>lock);
> > lock(>leddev_list_lock);
> >
> >  lock(>lock);
> > 
> >   *** DEADLOCK ***
> > 
> >  no locks held by swapper/3/0.
> > 
> >  the shortest dependencies between 2nd lock and 1st lock:
> >   -> (>leddev_list_lock){.+.?}-{2:2} ops: 46 {
> >  HARDIRQ-ON-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  IN-SOFTIRQ-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >kbd_bh+0x9e/0xc0
> >tasklet_action_common.constprop.0+0xe9/0x100
> >tasklet_action+0x22/0x30
> >__do_softirq+0xcc/0x46d
> >run_ksoftirqd+0x3f/0x70
> >smpboot_thread_fn+0x116/0x1f0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  SOFTIRQ-ON-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  INITIAL READ USE at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >}
> >... key  at: [] __key.0+0x0/0x10
> >... acquired at:
> > _raw_read_lock+0x42/0x90
> > led_trigger_blink_oneshot+0x3b/0x90
> > ledtrig_disk_activity+0x3c/0xa0
> > ata_qc_complete+0x26/0x450
> > ata_do_link_abort+0xa3/0xe0
> > ata_port_freeze+0x2e/0x40
> > ata_hsm_qc_complete+0x94/0xa0
> > ata_sff_hsm_move+0x177/0x7a0
> > ata_sff_pio_task+0xc7/0x1b0
> > process_one_work+0x240/0x560
> > worker_thread+0x58/0x3d0
> > kthread+0x151/0x170
> > ret_from_fork+0x1f/0x30
> > 
> >  -> (>lock){-...}-{2:2} ops: 69 {
> > IN-HARDIRQ-W at:
> >  lock_acquire+0x15f/0x420
> >  _raw_spin_lock_irqsave+0x52/0xa0
> >  ata_bmdma_interrupt+0x27/0x200
> >  __handle_irq_event_percpu+0xd5/0x2b0
> >  handle_irq_event+0x57/0xb0
> >  handle_edge_irq+0x8c/0x230
> >  asm_call_irq_on_stack+0xf/0x20
> >  common_interrupt+0x100/0x1c0
> >  asm_common_interrupt+0x1e/0x40
> >  native_safe_halt+0xe/0x10
> >  arch_cpu_idle+0x15/0x20
> >  default_idle_call+0x59/0x1c0
> >  do_idle+0x22c/0x2c0
> >  cpu_startup_entry+0x20/0x30
> >  start_secondary+0x11d/0x150
> >  secondary_startup_64_no_verify+0xa6/0xab
> > INITIAL USE at:
> > lock_acquire+0x15f/0x420
> > _raw_spin_lock_irqsave+0x52/0xa0
> > ata_dev_init+0x54/0xe0
> > 

Re: [PATCH] leds: trigger: fix potential deadlock with libata

2020-11-25 Thread Boqun Feng
On Wed, Nov 25, 2020 at 01:46:48PM +0100, Pavel Machek wrote:
> Hi!
> 
> > We have the following potential deadlock condition:
> > 
> >  
> >  WARNING: possible irq lock inversion dependency detected
> >  5.10.0-rc2+ #25 Not tainted
> >  
> >  swapper/3/0 just changed the state of lock:
> >  8880063bd618 (>lock){-...}-{2:2}, at: 
> > ata_bmdma_interrupt+0x27/0x200
> >  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
> >   (>leddev_list_lock){.+.?}-{2:2}
> > 
> >  and interrupts could create inverse lock ordering between them.
> > 
> >  other info that might help us debug this:
> >   Possible interrupt unsafe locking scenario:
> > 
> > CPU0CPU1
> > 
> >lock(>leddev_list_lock);
> > local_irq_disable();
> > lock(>lock);
> > lock(>leddev_list_lock);
> >
> >  lock(>lock);
> > 
> >   *** DEADLOCK ***
> > 
> >  no locks held by swapper/3/0.
> > 
> >  the shortest dependencies between 2nd lock and 1st lock:
> >   -> (>leddev_list_lock){.+.?}-{2:2} ops: 46 {
> >  HARDIRQ-ON-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  IN-SOFTIRQ-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >kbd_bh+0x9e/0xc0
> >tasklet_action_common.constprop.0+0xe9/0x100
> >tasklet_action+0x22/0x30
> >__do_softirq+0xcc/0x46d
> >run_ksoftirqd+0x3f/0x70
> >smpboot_thread_fn+0x116/0x1f0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  SOFTIRQ-ON-R at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >  INITIAL READ USE at:
> >lock_acquire+0x15f/0x420
> >_raw_read_lock+0x42/0x90
> >led_trigger_event+0x2b/0x70
> >rfkill_global_led_trigger_worker+0x94/0xb0
> >process_one_work+0x240/0x560
> >worker_thread+0x58/0x3d0
> >kthread+0x151/0x170
> >ret_from_fork+0x1f/0x30
> >}
> >... key  at: [] __key.0+0x0/0x10
> >... acquired at:
> > _raw_read_lock+0x42/0x90
> > led_trigger_blink_oneshot+0x3b/0x90
> > ledtrig_disk_activity+0x3c/0xa0
> > ata_qc_complete+0x26/0x450
> > ata_do_link_abort+0xa3/0xe0
> > ata_port_freeze+0x2e/0x40
> > ata_hsm_qc_complete+0x94/0xa0
> > ata_sff_hsm_move+0x177/0x7a0
> > ata_sff_pio_task+0xc7/0x1b0
> > process_one_work+0x240/0x560
> > worker_thread+0x58/0x3d0
> > kthread+0x151/0x170
> > ret_from_fork+0x1f/0x30
> > 
> >  -> (>lock){-...}-{2:2} ops: 69 {
> > IN-HARDIRQ-W at:
> >  lock_acquire+0x15f/0x420
> >  _raw_spin_lock_irqsave+0x52/0xa0
> >  ata_bmdma_interrupt+0x27/0x200
> >  __handle_irq_event_percpu+0xd5/0x2b0
> >  handle_irq_event+0x57/0xb0
> >  handle_edge_irq+0x8c/0x230
> >  asm_call_irq_on_stack+0xf/0x20
> >  common_interrupt+0x100/0x1c0
> >  asm_common_interrupt+0x1e/0x40
> >  native_safe_halt+0xe/0x10
> >  arch_cpu_idle+0x15/0x20
> >  default_idle_call+0x59/0x1c0
> >  do_idle+0x22c/0x2c0
> >  cpu_startup_entry+0x20/0x30
> >  start_secondary+0x11d/0x150
> >  secondary_startup_64_no_verify+0xa6/0xab
> > INITIAL USE at:
> > lock_acquire+0x15f/0x420
> > _raw_spin_lock_irqsave+0x52/0xa0
> > ata_dev_init+0x54/0xe0
> > ata_link_init+0x8b/0xd0

Re: [PATCH] leds: trigger: fix potential deadlock with libata

2020-11-25 Thread Pavel Machek
Hi!

> We have the following potential deadlock condition:
> 
>  
>  WARNING: possible irq lock inversion dependency detected
>  5.10.0-rc2+ #25 Not tainted
>  
>  swapper/3/0 just changed the state of lock:
>  8880063bd618 (>lock){-...}-{2:2}, at: 
> ata_bmdma_interrupt+0x27/0x200
>  but this lock took another, HARDIRQ-READ-unsafe lock in the past:
>   (>leddev_list_lock){.+.?}-{2:2}
> 
>  and interrupts could create inverse lock ordering between them.
> 
>  other info that might help us debug this:
>   Possible interrupt unsafe locking scenario:
> 
> CPU0CPU1
> 
>lock(>leddev_list_lock);
> local_irq_disable();
> lock(>lock);
> lock(>leddev_list_lock);
>
>  lock(>lock);
> 
>   *** DEADLOCK ***
> 
>  no locks held by swapper/3/0.
> 
>  the shortest dependencies between 2nd lock and 1st lock:
>   -> (>leddev_list_lock){.+.?}-{2:2} ops: 46 {
>  HARDIRQ-ON-R at:
>lock_acquire+0x15f/0x420
>_raw_read_lock+0x42/0x90
>led_trigger_event+0x2b/0x70
>rfkill_global_led_trigger_worker+0x94/0xb0
>process_one_work+0x240/0x560
>worker_thread+0x58/0x3d0
>kthread+0x151/0x170
>ret_from_fork+0x1f/0x30
>  IN-SOFTIRQ-R at:
>lock_acquire+0x15f/0x420
>_raw_read_lock+0x42/0x90
>led_trigger_event+0x2b/0x70
>kbd_bh+0x9e/0xc0
>tasklet_action_common.constprop.0+0xe9/0x100
>tasklet_action+0x22/0x30
>__do_softirq+0xcc/0x46d
>run_ksoftirqd+0x3f/0x70
>smpboot_thread_fn+0x116/0x1f0
>kthread+0x151/0x170
>ret_from_fork+0x1f/0x30
>  SOFTIRQ-ON-R at:
>lock_acquire+0x15f/0x420
>_raw_read_lock+0x42/0x90
>led_trigger_event+0x2b/0x70
>rfkill_global_led_trigger_worker+0x94/0xb0
>process_one_work+0x240/0x560
>worker_thread+0x58/0x3d0
>kthread+0x151/0x170
>ret_from_fork+0x1f/0x30
>  INITIAL READ USE at:
>lock_acquire+0x15f/0x420
>_raw_read_lock+0x42/0x90
>led_trigger_event+0x2b/0x70
>rfkill_global_led_trigger_worker+0x94/0xb0
>process_one_work+0x240/0x560
>worker_thread+0x58/0x3d0
>kthread+0x151/0x170
>ret_from_fork+0x1f/0x30
>}
>... key  at: [] __key.0+0x0/0x10
>... acquired at:
> _raw_read_lock+0x42/0x90
> led_trigger_blink_oneshot+0x3b/0x90
> ledtrig_disk_activity+0x3c/0xa0
> ata_qc_complete+0x26/0x450
> ata_do_link_abort+0xa3/0xe0
> ata_port_freeze+0x2e/0x40
> ata_hsm_qc_complete+0x94/0xa0
> ata_sff_hsm_move+0x177/0x7a0
> ata_sff_pio_task+0xc7/0x1b0
> process_one_work+0x240/0x560
> worker_thread+0x58/0x3d0
> kthread+0x151/0x170
> ret_from_fork+0x1f/0x30
> 
>  -> (>lock){-...}-{2:2} ops: 69 {
> IN-HARDIRQ-W at:
>  lock_acquire+0x15f/0x420
>  _raw_spin_lock_irqsave+0x52/0xa0
>  ata_bmdma_interrupt+0x27/0x200
>  __handle_irq_event_percpu+0xd5/0x2b0
>  handle_irq_event+0x57/0xb0
>  handle_edge_irq+0x8c/0x230
>  asm_call_irq_on_stack+0xf/0x20
>  common_interrupt+0x100/0x1c0
>  asm_common_interrupt+0x1e/0x40
>  native_safe_halt+0xe/0x10
>  arch_cpu_idle+0x15/0x20
>  default_idle_call+0x59/0x1c0
>  do_idle+0x22c/0x2c0
>  cpu_startup_entry+0x20/0x30
>  start_secondary+0x11d/0x150
>  secondary_startup_64_no_verify+0xa6/0xab
> INITIAL USE at:
> lock_acquire+0x15f/0x420
> _raw_spin_lock_irqsave+0x52/0xa0
> ata_dev_init+0x54/0xe0
> ata_link_init+0x8b/0xd0
> ata_port_alloc+0x1f1/0x210
> ata_host_alloc+0xf1/0x130
> ata_host_alloc_pinfo+0x14/0xb0
> ata_pci_sff_prepare_host+0x41/0xa0
> ata_pci_bmdma_prepare_host+0x14/0x30
> 

[PATCH] leds: trigger: fix potential deadlock with libata

2020-11-02 Thread Andrea Righi
We have the following potential deadlock condition:

 
 WARNING: possible irq lock inversion dependency detected
 5.10.0-rc2+ #25 Not tainted
 
 swapper/3/0 just changed the state of lock:
 8880063bd618 (>lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200
 but this lock took another, HARDIRQ-READ-unsafe lock in the past:
  (>leddev_list_lock){.+.?}-{2:2}

 and interrupts could create inverse lock ordering between them.

 other info that might help us debug this:
  Possible interrupt unsafe locking scenario:

CPU0CPU1

   lock(>leddev_list_lock);
local_irq_disable();
lock(>lock);
lock(>leddev_list_lock);
   
 lock(>lock);

  *** DEADLOCK ***

 no locks held by swapper/3/0.

 the shortest dependencies between 2nd lock and 1st lock:
  -> (>leddev_list_lock){.+.?}-{2:2} ops: 46 {
 HARDIRQ-ON-R at:
   lock_acquire+0x15f/0x420
   _raw_read_lock+0x42/0x90
   led_trigger_event+0x2b/0x70
   rfkill_global_led_trigger_worker+0x94/0xb0
   process_one_work+0x240/0x560
   worker_thread+0x58/0x3d0
   kthread+0x151/0x170
   ret_from_fork+0x1f/0x30
 IN-SOFTIRQ-R at:
   lock_acquire+0x15f/0x420
   _raw_read_lock+0x42/0x90
   led_trigger_event+0x2b/0x70
   kbd_bh+0x9e/0xc0
   tasklet_action_common.constprop.0+0xe9/0x100
   tasklet_action+0x22/0x30
   __do_softirq+0xcc/0x46d
   run_ksoftirqd+0x3f/0x70
   smpboot_thread_fn+0x116/0x1f0
   kthread+0x151/0x170
   ret_from_fork+0x1f/0x30
 SOFTIRQ-ON-R at:
   lock_acquire+0x15f/0x420
   _raw_read_lock+0x42/0x90
   led_trigger_event+0x2b/0x70
   rfkill_global_led_trigger_worker+0x94/0xb0
   process_one_work+0x240/0x560
   worker_thread+0x58/0x3d0
   kthread+0x151/0x170
   ret_from_fork+0x1f/0x30
 INITIAL READ USE at:
   lock_acquire+0x15f/0x420
   _raw_read_lock+0x42/0x90
   led_trigger_event+0x2b/0x70
   rfkill_global_led_trigger_worker+0x94/0xb0
   process_one_work+0x240/0x560
   worker_thread+0x58/0x3d0
   kthread+0x151/0x170
   ret_from_fork+0x1f/0x30
   }
   ... key  at: [] __key.0+0x0/0x10
   ... acquired at:
_raw_read_lock+0x42/0x90
led_trigger_blink_oneshot+0x3b/0x90
ledtrig_disk_activity+0x3c/0xa0
ata_qc_complete+0x26/0x450
ata_do_link_abort+0xa3/0xe0
ata_port_freeze+0x2e/0x40
ata_hsm_qc_complete+0x94/0xa0
ata_sff_hsm_move+0x177/0x7a0
ata_sff_pio_task+0xc7/0x1b0
process_one_work+0x240/0x560
worker_thread+0x58/0x3d0
kthread+0x151/0x170
ret_from_fork+0x1f/0x30

 -> (>lock){-...}-{2:2} ops: 69 {
IN-HARDIRQ-W at:
 lock_acquire+0x15f/0x420
 _raw_spin_lock_irqsave+0x52/0xa0
 ata_bmdma_interrupt+0x27/0x200
 __handle_irq_event_percpu+0xd5/0x2b0
 handle_irq_event+0x57/0xb0
 handle_edge_irq+0x8c/0x230
 asm_call_irq_on_stack+0xf/0x20
 common_interrupt+0x100/0x1c0
 asm_common_interrupt+0x1e/0x40
 native_safe_halt+0xe/0x10
 arch_cpu_idle+0x15/0x20
 default_idle_call+0x59/0x1c0
 do_idle+0x22c/0x2c0
 cpu_startup_entry+0x20/0x30
 start_secondary+0x11d/0x150
 secondary_startup_64_no_verify+0xa6/0xab
INITIAL USE at:
lock_acquire+0x15f/0x420
_raw_spin_lock_irqsave+0x52/0xa0
ata_dev_init+0x54/0xe0
ata_link_init+0x8b/0xd0
ata_port_alloc+0x1f1/0x210
ata_host_alloc+0xf1/0x130
ata_host_alloc_pinfo+0x14/0xb0
ata_pci_sff_prepare_host+0x41/0xa0
ata_pci_bmdma_prepare_host+0x14/0x30
piix_init_one+0x21f/0x600
local_pci_probe+0x48/0x80
pci_device_probe+0x105/0x1c0
really_probe+0x221/0x490
driver_probe_device+0xe9/0x160