subject:"Re\: \[GIT pull\] irq updates for 4.13"

Re: [GIT pull] irq updates for 4.13

2017-07-17 Thread Pavel Machek

On Mon 2017-07-17 13:01:58, Linus Torvalds wrote:
> On Sat, Jul 15, 2017 at 1:24 PM, Pavel Machek  wrote:
> >
> > But I don't see the commit in 4.13-rc0. Could we get it in now, so
> > that problem is fixed in -rc1?
> 
> It didn't hit rc1, but it's in my tree now.

Thanks for head-up. Yes, current tree works ok for me.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [GIT pull] irq updates for 4.13

2017-07-17 Thread Linus Torvalds

On Sat, Jul 15, 2017 at 1:24 PM, Pavel Machek  wrote:
>
> But I don't see the commit in 4.13-rc0. Could we get it in now, so
> that problem is fixed in -rc1?

It didn't hit rc1, but it's in my tree now.

Linus

Re: [GIT pull] irq updates for 4.13

2017-07-16 Thread Tony Lindgren

* Pavel Machek  [170715 13:24]:
> Hi!
> 
> > > On Tue, Jul 11, 2017 at 11:41:52PM +0200, Thomas Gleixner wrote:
> > > > [...]
> > > >
> > > > Here is a revised version of the previous patch with the conditional
> > > > locking removed and a bunch of comments added.
> > > 
> > > That one also fixes Droid 4 boot.
> > > 
> > > Tested-by: Sebastian Reichel 
> > 
> > Sill works for me too:
> > 
> > Tested-by: Tony Lindgren 
> 
> I seen the announcement
> 
> Date: Wed, 12 Jul 2017 01:18:50 -0700
> From: tip-bot for Thomas Gleixner 
> To: linux-tip-comm...@vger.kernel.org
> Subject: [tip:irq/urgent] genirq: Keep chip buslock across
> irq_request/release_resources()
> 
> But I don't see the commit in 4.13-rc0. Could we get it in now, so
> that problem is fixed in -rc1?

Seems we missed that for -rc1. In general, I've noticed that the rule
of having code sit in next before the merge window really helps
preventing regressions during the merge window. That is as long
people keep testing next on almost daily basis and report
regressions promptly.. and I do just to avoid chasing regressions
during the -rc cycle.

And the real reason why I think catching the regressions in next
helps is the fact that people react to regressions much faster to
revert patches compared to after things get merged into the mainline
kernel :p

In this case Thomas reacted within hours and fixed the issue, so
no issues there and thanks for doing that. But we still got -rc1
with a regression that probably could have been prevented with
enough time in next. So maybe we should be more strict with the
next requirement? Just sayin.

Tony

Re: [GIT pull] irq updates for 4.13

2017-07-15 Thread Pavel Machek

Hi!

> > On Tue, Jul 11, 2017 at 11:41:52PM +0200, Thomas Gleixner wrote:
> > > [...]
> > >
> > > Here is a revised version of the previous patch with the conditional
> > > locking removed and a bunch of comments added.
> > 
> > That one also fixes Droid 4 boot.
> > 
> > Tested-by: Sebastian Reichel 
> 
> Sill works for me too:
> 
> Tested-by: Tony Lindgren 

I seen the announcement

Date: Wed, 12 Jul 2017 01:18:50 -0700
From: tip-bot for Thomas Gleixner 
To: linux-tip-comm...@vger.kernel.org
Subject: [tip:irq/urgent] genirq: Keep chip buslock across
irq_request/release_resources()

But I don't see the commit in 4.13-rc0. Could we get it in now, so
that problem is fixed in -rc1?

Thanks,
Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [GIT pull] irq updates for 4.13

2017-07-12 Thread Geert Uytterhoeven

Hi Grygorii,

On Tue, Jul 11, 2017 at 5:39 PM, Grygorii Strashko
 wrote:
> On 07/11/2017 09:41 AM, Thomas Gleixner wrote:
>> On Tue, 11 Jul 2017, Tony Lindgren wrote:
>>> * Thomas Gleixner  [170711 02:48]:
>>> And "external abort on non-linefetch" means something is not clocked
>>> in this case. The following alone makes things boot for me again, but I 
>>> don't
>>> quite follow what has now changed with the ordering.. Thomas, any ideas?
>>
>> Ah. Now that makes sense.
>>
>> Unpatched the ordering is:
>>
>> chip_bus_lock(desc);
>> irq_request_resources(desc);
>>
>> Now the offending change reordered the calls. OMAP gpio has:
>>
>>  omap_gpio_irq_bus_lock()
>> pm_runtime_get_sync(bank->chip.parent);
>>
>> So that at least explains the error. So that omap gpio irq chip (ab)uses
>> the bus_lock() callback to do runtime power management. Sigh, I did not
>> expect that. Let me have a deeper look if that's OMAP only or whether this
>> happens in other places as well.
>
> It was the only one way to power on GPIO bank when the first GPIO IRQ is 
> requested,
> as all other irqchip callbacks are under raw_lock while pm_runtime uses 
> spinlock, as
> result on -RT it was not possible to use PM runtime in other irqchip 
> callbacks.
>
> Now, I think, It might be possible to use irq_chip_pm_get(), but there is one 
> problem -
> OMAP Power management platform code can call 
> omap2_gpio_prepare_for_idle()/omap2_gpio_resume_after_idle()
> which expected to disable GPIO banks using PM runtime and current driver 
> implementation
> expect to have PM runtime usage_count = 1.
>
> Tony, Potentially we can use pm_runtime_force_suspend()/resume() there, but 
> they are not compatible with
> irqoff context (CPUIdle late stages).
>
> In other words, below patch should fix this issue, but will break CPUIdle on 
> OMAP :(
>
> --
> From dfca1c806f03ad6bdd72b634d71c96d39bda2046 Mon Sep 17 00:00:00 2001
> From: Grygorii Strashko 
> Date: Tue, 11 Jul 2017 10:36:23 -0500
> Subject: [PATCH] gpio: omap: switch to use irq_chip_pm_get/put()
>
> Signed-off-by: Grygorii Strashko 

That looks similar to how gpio-rcar was fixed, which used to do the
same trick in its bus_lock() callbacks.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Sebastian Reichel  [170711 15:51]:
> Hi,
> 
> On Tue, Jul 11, 2017 at 11:41:52PM +0200, Thomas Gleixner wrote:
> > [...]
> >
> > Here is a revised version of the previous patch with the conditional
> > locking removed and a bunch of comments added.
> 
> That one also fixes Droid 4 boot.
> 
> Tested-by: Sebastian Reichel 

Sill works for me too:

Tested-by: Tony Lindgren 

Thanks,

Tony

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 11:41:52PM +0200, Thomas Gleixner wrote:
> [...]
>
> Here is a revised version of the previous patch with the conditional
> locking removed and a bunch of comments added.

That one also fixes Droid 4 boot.

Tested-by: Sebastian Reichel 

-- Sebastian

> 8<
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1090,6 +1090,16 @@ setup_irq_thread(struct irqaction *new,
>  /*
>   * Internal function to register an irqaction - typically used to
>   * allocate special interrupts that are part of the architecture.
> + *
> + * Locking rules:
> + *
> + * desc->request_mutex   Provides serialization against a concurrent 
> free_irq()
> + *   chip_bus_lock   Provides serialization for slow bus operations
> + * desc->lockProvides serialization against hard interrupts
> + *
> + * chip_bus_lock and desc->lock are sufficient for all other management and
> + * interrupt related functions. desc->request_mutex solely serializes
> + * request/free_irq().
>   */
>  static int
>  __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
> @@ -1167,20 +1177,35 @@ static int
>   if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)
>   new->flags &= ~IRQF_ONESHOT;
>  
> + /*
> +  * Protects against a concurrent __free_irq() call which might wait
> +  * for synchronize_irq() to complete without holding the optional
> +  * chip bus lock and desc->lock.
> +  */
>   mutex_lock(&desc->request_mutex);
> +
> + /*
> +  * Acquire bus lock as the irq_request_resources() callback below
> +  * might rely on the serialization or the magic power management
> +  * functions which are abusing the irq_bus_lock() callback,
> +  */
> + chip_bus_lock(desc);
> +
> + /* First installed action requests resources. */
>   if (!desc->action) {
>   ret = irq_request_resources(desc);
>   if (ret) {
>   pr_err("Failed to request resources for %s (irq %d) on 
> irqchip %s\n",
>  new->name, irq, desc->irq_data.chip->name);
> - goto out_mutex;
> + goto out_bus_unlock;
>   }
>   }
>  
> - chip_bus_lock(desc);
> -
>   /*
>* The following block of code has to be executed atomically
> +  * protected against a concurrent interrupt and any of the other
> +  * management calls which are not serialized via
> +  * desc->request_mutex or the optional bus lock.
>*/
>   raw_spin_lock_irqsave(&desc->lock, flags);
>   old_ptr = &desc->action;
> @@ -1286,10 +1311,8 @@ static int
>   ret = __irq_set_trigger(desc,
>   new->flags & IRQF_TRIGGER_MASK);
>  
> - if (ret) {
> - irq_release_resources(desc);
> + if (ret)
>   goto out_unlock;
> - }
>   }
>  
>   desc->istate &= ~(IRQS_AUTODETECT | IRQS_SPURIOUS_DISABLED | \
> @@ -1385,12 +1408,10 @@ static int
>  out_unlock:
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>  
> - chip_bus_sync_unlock(desc);
> -
>   if (!desc->action)
>   irq_release_resources(desc);
> -
> -out_mutex:
> +out_bus_unlock:
> + chip_bus_sync_unlock(desc);
>   mutex_unlock(&desc->request_mutex);
>  
>  out_thread:
> @@ -1472,6 +1493,7 @@ static struct irqaction *__free_irq(unsi
>   WARN(1, "Trying to free already-free IRQ %d\n", irq);
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>   chip_bus_sync_unlock(desc);
> + mutex_unlock(&desc->request_mutex);
>   return NULL;
>   }
>  
> @@ -1498,6 +1520,20 @@ static struct irqaction *__free_irq(unsi
>  #endif
>  
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
> + /*
> +  * Drop bus_lock here so the changes which were done in the chip
> +  * callbacks above are synced out to the irq chips which hang
> +  * behind a slow bus (I2C, SPI) before calling synchronize_irq().
> +  *
> +  * Aside of that the bus_lock can also be taken from the threaded
> +  * handler in irq_finalize_oneshot() which results in a deadlock
> +  * because synchronize_irq() would wait forever for the thread to
> +  * complete, which is blocked on the bus lock.
> +  *
> +  * The still held desc->request_mutex() protects against a
> +  * concurrent request_irq() of this irq so the release of resources
> +  * and timing data is properly serialized.
> +  */
>   chip_bus_sync_unlock(desc);
>  
>   unregister_handler_proc(irq, action);
> @@ -1530,8 +1566,15 @@ static struct irqaction *__free_irq(unsi
>   }
>   }
>  
> + /* Last action releases resources */
>   if (!desc->

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Linus Torvalds

On Tue, Jul 11, 2017 at 2:41 PM, Thomas Gleixner  wrote:
>
> Here is a revised version of the previous patch with the conditional
> locking removed and a bunch of comments added.

This one looks good to me. Thanks,

Linus

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Linus Torvalds wrote:

> On Tue, Jul 11, 2017 at 10:52 AM, Thomas Gleixner  wrote:
> >
> > Not completely, because of the free path issues. See the other mail. Tony
> > confirmed that it works. I wait for Sebastian and queue it with a proper
> > changelog, ok?
> 
> Ugh, I absolutely detest your ugly "bool buslock" parameter to
> irq_release_resources().

Yes, that was a knee jerk reaction to avoid the buslock/unlock dance if the
chip does not have a irq_release_resources() callback. Silly in hindsight
as this is really not a fast path.

> And there seems to be no reason for it.
> 
> Why don't you just move the
> 
> chip_bus_sync_unlock(desc);
> 
> call in __free_irq() down to just before you release the request_mutex?

See below.

> In fact, looking at __free_irq(), I note that it's locking is
> completely broken shit. Look at the
> 
> if (!action) {
> WARN(1, "Trying to free already-free IRQ %d\n", irq);
> 
> error case, and look for where it unlocks request_mutex. Yeah, it doesn't.

Yes, I noticed that and my patch fixed it already.

> Why not fix those stupid bugs and clean things up at the same time?
> Make the rule be that as you take the request_mutex lock, you then
> also do the chip_bus_lock().
> 
> And when you release the request_mutex lock, you do
> chip_bus_sync_unlock() just before.
> 
> And no, I have no idea what the locking rules are for
> irq_finalize_oneshot() - it does that chip_bus_lock() without having
> any external serialization. Is that ok? Are the chip handlers able to
> deal with that? Same seems to go for free_percpu_irq().

The extra serialization is only required to protect stuff across
request/free. Everything else is serialized via bus_lock (if the irq chip
has it) and the desc->lock spinlock.

> Comments? Even if they are "Linus, you're way out of line, and you
> can't just move that chip_bus_sync_unlock() down like that because of
> XYZ, you moron".
> 
> For example, it's entirely possible that we can't do the
> "synchronize_irq()" waiting while we hold that chip_bus_lock().  But
> the ones I looked at seemed to all take sleeping locks (or no locks at
> all - doing other things), which implies that they certainly can't be
> blocking irq delivery.

We can't do that move for two reasons:

   1) The data which has been changed between bus_lock/un_lock is cached in
  the irq chip driver private data and needs to go out to the irq chip
  via the slow bus (usually SPI or I2C).

  That's the reason why this bus_lock/unlock magic exists in the first
  place, as you cannot do SPI/I2C transactions while holding desc->lock
  with interrupts disabled.

   2) synchronize_irq() will actually deadlock, if there is a handler on
  flight. These chips use threaded handlers for obvious reasons, as
  they allow to do SPI/I2C communication. When the threaded handler
  returns then bus_lock needs to be taken in irq_finalize_oneshot() as
  we need to talk to the actual irq chip once more. After that the
  threaded handler is marked done, which makes synchronize_irq() return.

  So if we hold bus_lock accross the synchronize_irq() call, the
  handler cannot mark itself done because it blocks on the bus
  lock. That in turn makes synchronize_irq() wait forever on the
  threaded handler to complete

  Preventing that would require even uglier conditional locking in
  irq_finalize_oneshot(). /me runs and hides

Here is a revised version of the previous patch with the conditional
locking removed and a bunch of comments added.

Thanks,

tglx

8<
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1090,6 +1090,16 @@ setup_irq_thread(struct irqaction *new,
 /*
  * Internal function to register an irqaction - typically used to
  * allocate special interrupts that are part of the architecture.
+ *
+ * Locking rules:
+ *
+ * desc->request_mutex Provides serialization against a concurrent free_irq()
+ *   chip_bus_lock Provides serialization for slow bus operations
+ * desc->lock  Provides serialization against hard interrupts
+ *
+ * chip_bus_lock and desc->lock are sufficient for all other management and
+ * interrupt related functions. desc->request_mutex solely serializes
+ * request/free_irq().
  */
 static int
 __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
@@ -1167,20 +1177,35 @@ static int
if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)
new->flags &= ~IRQF_ONESHOT;

+   /*
+* Protects against a concurrent __free_irq() call which might wait
+* for synchronize_irq() to complete without holding the optional
+* chip bus lock and desc->lock.
+*/
mutex_lock(&desc->request_mutex);
+
+   /*
+* Acquire bus lock as the irq_request_resources() callback below
+* might rely on the serialization or the mag

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 11:16:03AM -0700, Linus Torvalds wrote:
> On Tue, Jul 11, 2017 at 10:52 AM, Thomas Gleixner  wrote:
> > Not completely, because of the free path issues. See the other mail. Tony
> > confirmed that it works. I wait for Sebastian and queue it with a proper
> > changelog, ok?
>
> Ugh, I absolutely detest your ugly "bool buslock" parameter to
> irq_release_resources().

So /me will skip testing Thomas' patch for now.

> And there seems to be no reason for it.
> 
> Why don't you just move the
> 
> chip_bus_sync_unlock(desc);
> 
> call in __free_irq() down to just before you release the request_mutex?
> 
> In fact, looking at __free_irq(), I note that it's locking is
> completely broken shit. Look at the
> 
> if (!action) {
> WARN(1, "Trying to free already-free IRQ %d\n", irq);
> 
> error case, and look for where it unlocks request_mutex. Yeah, it doesn't.
> 
> So honestly, I think this code is broken, and it's broken partly
> because it has some really bad locking and logic rules.
> 
> Why not fix those stupid bugs and clean things up at the same time?
> Make the rule be that as you take the request_mutex lock, you then
> also do the chip_bus_lock().
> 
> And when you release the request_mutex lock, you do
> chip_bus_sync_unlock() just before.
> 
> And no, I have no idea what the locking rules are for
> irq_finalize_oneshot() - it does that chip_bus_lock() without having
> any external serialization. Is that ok? Are the chip handlers able to
> deal with that? Same seems to go for free_percpu_irq().
> 
> Anyway, patch attached (AGAIN, TOTALLY UNTESTED) showing what I mean,
> and fixing (well, modulo any bugs I introduced by my untested sh*t)
> that definite bug in lack of unlocking.
> 
> But that "bool buslock" thing really is too disgusting. Conditional
> locking should not be done. It's a sign of serious problems, imnsho.
> 
> Comments? Even if they are "Linus, you're way out of line, and you
> can't just move that chip_bus_sync_unlock() down like that because of
> XYZ, you moron".
> 
> For example, it's entirely possible that we can't do the
> "synchronize_irq()" waiting while we hold that chip_bus_lock().  But
> the ones I looked at seemed to all take sleeping locks (or no locks at
> all - doing other things), which implies that they certainly can't be
> blocking irq delivery.
> 
> So I'm *not* claiming that the attached patch is necessarily right. I
> just really don't like your conditional lock thing, and this would
> seem to possibly be a clean way around it if it works.

This fixes boot on Droid 4:

Tested-by: Sebastian Reichel 

-- Sebastian

>  kernel/irq/manage.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 5624b2dd6b58..c4cbda784ea5 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1168,17 +1168,17 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
> struct irqaction *new)
>   new->flags &= ~IRQF_ONESHOT;
>  
>   mutex_lock(&desc->request_mutex);
> + chip_bus_lock(desc);
> +
>   if (!desc->action) {
>   ret = irq_request_resources(desc);
>   if (ret) {
>   pr_err("Failed to request resources for %s (irq %d) on 
> irqchip %s\n",
>  new->name, irq, desc->irq_data.chip->name);
> - goto out_mutex;
> + goto out_unlock_chip_bus;
>   }
>   }
>  
> - chip_bus_lock(desc);
> -
>   /*
>* The following block of code has to be executed atomically
>*/
> @@ -1385,12 +1385,11 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
> struct irqaction *new)
>  out_unlock:
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>  
> - chip_bus_sync_unlock(desc);
> -
>   if (!desc->action)
>   irq_release_resources(desc);
>  
> -out_mutex:
> +out_unlock_chip_bus:
> + chip_bus_sync_unlock(desc);
>   mutex_unlock(&desc->request_mutex);
>  
>  out_thread:
> @@ -1472,6 +1471,7 @@ static struct irqaction *__free_irq(unsigned int irq, 
> void *dev_id)
>   WARN(1, "Trying to free already-free IRQ %d\n", irq);
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>   chip_bus_sync_unlock(desc);
> + mutex_unlock(&desc->request_mutex);
>   return NULL;
>   }
>  
> @@ -1498,7 +1498,6 @@ static struct irqaction *__free_irq(unsigned int irq, 
> void *dev_id)
>  #endif
>  
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
> - chip_bus_sync_unlock(desc);
>  
>   unregister_handler_proc(irq, action);
>  
> @@ -1535,6 +1534,7 @@ static struct irqaction *__free_irq(unsigned int irq, 
> void *dev_id)
>   irq_remove_timings(desc);
>   }
>  
> + chip_bus_sync_unlock(desc);
>   mutex_unlock(&desc->req

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Linus Torvalds

On Tue, Jul 11, 2017 at 10:52 AM, Thomas Gleixner  wrote:
>
> Not completely, because of the free path issues. See the other mail. Tony
> confirmed that it works. I wait for Sebastian and queue it with a proper
> changelog, ok?

Ugh, I absolutely detest your ugly "bool buslock" parameter to
irq_release_resources().

And there seems to be no reason for it.

Why don't you just move the

chip_bus_sync_unlock(desc);

call in __free_irq() down to just before you release the request_mutex?

In fact, looking at __free_irq(), I note that it's locking is
completely broken shit. Look at the

if (!action) {
WARN(1, "Trying to free already-free IRQ %d\n", irq);

error case, and look for where it unlocks request_mutex. Yeah, it doesn't.

So honestly, I think this code is broken, and it's broken partly
because it has some really bad locking and logic rules.

Why not fix those stupid bugs and clean things up at the same time?
Make the rule be that as you take the request_mutex lock, you then
also do the chip_bus_lock().

And when you release the request_mutex lock, you do
chip_bus_sync_unlock() just before.

And no, I have no idea what the locking rules are for
irq_finalize_oneshot() - it does that chip_bus_lock() without having
any external serialization. Is that ok? Are the chip handlers able to
deal with that? Same seems to go for free_percpu_irq().

Anyway, patch attached (AGAIN, TOTALLY UNTESTED) showing what I mean,
and fixing (well, modulo any bugs I introduced by my untested sh*t)
that definite bug in lack of unlocking.

But that "bool buslock" thing really is too disgusting. Conditional
locking should not be done. It's a sign of serious problems, imnsho.

Comments? Even if they are "Linus, you're way out of line, and you
can't just move that chip_bus_sync_unlock() down like that because of
XYZ, you moron".

For example, it's entirely possible that we can't do the
"synchronize_irq()" waiting while we hold that chip_bus_lock().  But
the ones I looked at seemed to all take sleeping locks (or no locks at
all - doing other things), which implies that they certainly can't be
blocking irq delivery.

So I'm *not* claiming that the attached patch is necessarily right. I
just really don't like your conditional lock thing, and this would
seem to possibly be a clean way around it if it works.

Linus
 kernel/irq/manage.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 5624b2dd6b58..c4cbda784ea5 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1168,17 +1168,17 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
new->flags &= ~IRQF_ONESHOT;

mutex_lock(&desc->request_mutex);
+   chip_bus_lock(desc);
+
if (!desc->action) {
ret = irq_request_resources(desc);
if (ret) {
pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
   new->name, irq, desc->irq_data.chip->name);
-   goto out_mutex;
+   goto out_unlock_chip_bus;
}
}

-   chip_bus_lock(desc);
-
/*
 * The following block of code has to be executed atomically
 */
@@ -1385,12 +1385,11 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
 out_unlock:
raw_spin_unlock_irqrestore(&desc->lock, flags);

-   chip_bus_sync_unlock(desc);
-
if (!desc->action)
irq_release_resources(desc);

-out_mutex:
+out_unlock_chip_bus:
+   chip_bus_sync_unlock(desc);
mutex_unlock(&desc->request_mutex);

 out_thread:
@@ -1472,6 +1471,7 @@ static struct irqaction *__free_irq(unsigned int irq, 
void *dev_id)
WARN(1, "Trying to free already-free IRQ %d\n", irq);
raw_spin_unlock_irqrestore(&desc->lock, flags);
chip_bus_sync_unlock(desc);
+   mutex_unlock(&desc->request_mutex);
return NULL;
}

@@ -1498,7 +1498,6 @@ static struct irqaction *__free_irq(unsigned int irq, 
void *dev_id)
 #endif

raw_spin_unlock_irqrestore(&desc->lock, flags);
-   chip_bus_sync_unlock(desc);

unregister_handler_proc(irq, action);

@@ -1535,6 +1534,7 @@ static struct irqaction *__free_irq(unsigned int irq, 
void *dev_id)
irq_remove_timings(desc);
}

+   chip_bus_sync_unlock(desc);
mutex_unlock(&desc->request_mutex);

irq_chip_pm_put(&desc->irq_data);

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Linus Torvalds wrote:
> On Tue, Jul 11, 2017 at 9:19 AM, Thomas Gleixner  wrote:
> >
> > What I do not understand here is that we have already power management
> > around all of that.
> >
> >irq_chip_pm_get(&desc->irq_data);
> >...
> >chip_bus_lock(desc);
> >...
> >chip_bus_unlock_sync(desc);
> >...
> >irq_chip_pm_put(&desc->irq_data);
> >
> > So why is that not sufficient and needs extra magic in that GPIO driver?
> 
> Well, irq_chip_pm_get/put() isn't called just over the operation, it's
> called over the *whole* sequence of the irq being enabled at all.
> 
> So the different (right now) is that
> 
>  - chip_bus_lock/unlock_sync() is purely done around the actual
> operations to set up and tear down the irq data.
> 
>So this just covers the very short setup/teardown.
> 
>  - irq_chip_pm_get/put() is called around the *whole* "irqs can be active" 
> block
> 
>This covers the whole lifetime of the irq, from setup to free.
> 
> Very different.
> 
> I'd really prefer my simple patch for now, leaving everything working
> the way it used to work. I *think* it's ok for RT too. Yes?

Not completely, because of the free path issues. See the other mail. Tony
confirmed that it works. I wait for Sebastian and queue it with a proper
changelog, ok?

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Thomas Gleixner  [170711 10:17]:
> On Tue, 11 Jul 2017, Tony Lindgren wrote:
> > * Linus Torvalds  [170711 08:40]:
> > > On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  
> > > wrote:
> > > >
> > > > Ah. Now that makes sense.
> > > >
> > > > Unpatched the ordering is:
> > > >
> > > >   chip_bus_lock(desc);
> > > >   irq_request_resources(desc);
> > > 
> > > I *looked* at that ordering and then went "Naah, that makes no sense".
> > > 
> > > But if that's the only issue, how about we just re-order those things
> > > - we still don't need to move the irq_request_resources() into the
> > > spinlock, we just move it to below the chip_bus_lock().
> > > 
> > > IOW, something like the (COMPLETELY UNTEESTED!) attached patch.
> > 
> > Yeah that fixes the issue:
> > 
> > Tested-by: Tony Lindgren 
> > 
> > > This assumes that the chip_bus_lock() thing is still ok for the RT
> > > case, but it looks like it might be: the only other one I looked at
> > > (apart from the gpio-omap one) used a mutex.
> > 
> > Yeah and the ordering below makes more sense to me at least. That is
> > assuming we want to call chip_bus_lock() before we start calling the
> > chip functions :)
> 
> We can do that, just the free path is ugly and does not really work that
> way.

OK

> __free_irq()
>   
>   chip_bus_sync_unlock(desc);
>   ...
>   synchronize_irq(irq);
>   ...
>   if (!desc->action) {
>   irq_release_resources();
>   irq_remove_timings();
>   }
>   mutex_unlock(&desc->request_mutex);
> 
> We can't release request_mutex early otherwise we run into the issue of a
> concurrent request_irq() trying to reuse stuff which we just release, but
> we can't reacquire bus_lock under request_mutex either when we change the
> lock ordering to bus_lock -> desc->request_mutex -> desc->lock.
> 
> We really want to have both the release_resources() and the
> remove_timings() calls outside of the spinlocked region. That's not only a
> RT issue, there have been requests for making the resource call 'sleepable'
> for mainline as well.
> 
> Below is a slightly different fix, which keeps the lock order
> 
>   desc->request_mutex -> bus_lock -> desc->lock
> 
> intact and conditionally reacquired the bus lock for the release call.

Yeah that fixes the issue too:

Tested-by: Tony Lindgren 

Regards,

Tony


> 8<
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1036,13 +1036,20 @@ static int irq_request_resources(struct
>   return c->irq_request_resources ? c->irq_request_resources(d) : 0;
>  }
>  
> -static void irq_release_resources(struct irq_desc *desc)
> +static void irq_release_resources(struct irq_desc *desc, bool buslock)
>  {
>   struct irq_data *d = &desc->irq_data;
>   struct irq_chip *c = d->chip;
>  
> - if (c->irq_release_resources)
> - c->irq_release_resources(d);
> + if (!c->irq_release_resources)
> + return;
> + if (buslock)
> + chip_bus_lock(desc);
> +
> + c->irq_release_resources(d);
> +
> + if (buslock)
> + chip_bus_sync_unlock(desc);
>  }
>  
>  static int
> @@ -1168,17 +1175,16 @@ static int
>   new->flags &= ~IRQF_ONESHOT;
>  
>   mutex_lock(&desc->request_mutex);
> + chip_bus_lock(desc);
>   if (!desc->action) {
>   ret = irq_request_resources(desc);
>   if (ret) {
>   pr_err("Failed to request resources for %s (irq %d) on 
> irqchip %s\n",
>  new->name, irq, desc->irq_data.chip->name);
> - goto out_mutex;
> + goto out_bus;
>   }
>   }
>  
> - chip_bus_lock(desc);
> -
>   /*
>* The following block of code has to be executed atomically
>*/
> @@ -1286,10 +1292,8 @@ static int
>   ret = __irq_set_trigger(desc,
>   new->flags & IRQF_TRIGGER_MASK);
>  
> - if (ret) {
> - irq_release_resources(desc);
> + if (ret)
>   goto out_unlock;
> - }
>   }
>  
>   desc->istate &= ~(IRQS_AUTODETECT | IRQS_SPURIOUS_DISABLED | \
> @@ -1385,12 +1389,10 @@ static int
>  out_unlock:
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>  
> - chip_bus_sync_unlock(desc);
> -
>   if (!desc->action)
> - irq_release_resources(desc);
> -
> -out_mutex:
> + irq_release_resources(desc, false);
> +out_bus:
> + chip_bus_sync_unlock(desc);
>   mutex_unlock(&desc->request_mutex);
>  
>  out_thread:
> @@ -1472,6 +1474,7 @@ static struct irqaction *__free_irq(unsi
>   WARN(1, "Trying to free already-free IRQ %d\n", irq);
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>   chip_bus_sync_unlock(desc

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Tony Lindgren wrote:
> * Linus Torvalds  [170711 08:40]:
> > On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
> > >
> > > Ah. Now that makes sense.
> > >
> > > Unpatched the ordering is:
> > >
> > >   chip_bus_lock(desc);
> > >   irq_request_resources(desc);
> > 
> > I *looked* at that ordering and then went "Naah, that makes no sense".
> > 
> > But if that's the only issue, how about we just re-order those things
> > - we still don't need to move the irq_request_resources() into the
> > spinlock, we just move it to below the chip_bus_lock().
> > 
> > IOW, something like the (COMPLETELY UNTEESTED!) attached patch.
> 
> Yeah that fixes the issue:
> 
> Tested-by: Tony Lindgren 
> 
> > This assumes that the chip_bus_lock() thing is still ok for the RT
> > case, but it looks like it might be: the only other one I looked at
> > (apart from the gpio-omap one) used a mutex.
> 
> Yeah and the ordering below makes more sense to me at least. That is
> assuming we want to call chip_bus_lock() before we start calling the
> chip functions :)

We can do that, just the free path is ugly and does not really work that
way.

__free_irq()

chip_bus_sync_unlock(desc);
...
synchronize_irq(irq);
...
if (!desc->action) {
irq_release_resources();
irq_remove_timings();
}
mutex_unlock(&desc->request_mutex);

We can't release request_mutex early otherwise we run into the issue of a
concurrent request_irq() trying to reuse stuff which we just release, but
we can't reacquire bus_lock under request_mutex either when we change the
lock ordering to bus_lock -> desc->request_mutex -> desc->lock.

We really want to have both the release_resources() and the
remove_timings() calls outside of the spinlocked region. That's not only a
RT issue, there have been requests for making the resource call 'sleepable'
for mainline as well.

Below is a slightly different fix, which keeps the lock order

  desc->request_mutex -> bus_lock -> desc->lock

intact and conditionally reacquired the bus lock for the release call.

Thanks,

tglx

8<
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1036,13 +1036,20 @@ static int irq_request_resources(struct
return c->irq_request_resources ? c->irq_request_resources(d) : 0;
 }
 
-static void irq_release_resources(struct irq_desc *desc)
+static void irq_release_resources(struct irq_desc *desc, bool buslock)
 {
struct irq_data *d = &desc->irq_data;
struct irq_chip *c = d->chip;
 
-   if (c->irq_release_resources)
-   c->irq_release_resources(d);
+   if (!c->irq_release_resources)
+   return;
+   if (buslock)
+   chip_bus_lock(desc);
+
+   c->irq_release_resources(d);
+
+   if (buslock)
+   chip_bus_sync_unlock(desc);
 }
 
 static int
@@ -1168,17 +1175,16 @@ static int
new->flags &= ~IRQF_ONESHOT;
 
mutex_lock(&desc->request_mutex);
+   chip_bus_lock(desc);
if (!desc->action) {
ret = irq_request_resources(desc);
if (ret) {
pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
   new->name, irq, desc->irq_data.chip->name);
-   goto out_mutex;
+   goto out_bus;
}
}
 
-   chip_bus_lock(desc);
-
/*
 * The following block of code has to be executed atomically
 */
@@ -1286,10 +1292,8 @@ static int
ret = __irq_set_trigger(desc,
new->flags & IRQF_TRIGGER_MASK);
 
-   if (ret) {
-   irq_release_resources(desc);
+   if (ret)
goto out_unlock;
-   }
}
 
desc->istate &= ~(IRQS_AUTODETECT | IRQS_SPURIOUS_DISABLED | \
@@ -1385,12 +1389,10 @@ static int
 out_unlock:
raw_spin_unlock_irqrestore(&desc->lock, flags);
 
-   chip_bus_sync_unlock(desc);
-
if (!desc->action)
-   irq_release_resources(desc);
-
-out_mutex:
+   irq_release_resources(desc, false);
+out_bus:
+   chip_bus_sync_unlock(desc);
mutex_unlock(&desc->request_mutex);
 
 out_thread:
@@ -1472,6 +1474,7 @@ static struct irqaction *__free_irq(unsi
WARN(1, "Trying to free already-free IRQ %d\n", irq);
raw_spin_unlock_irqrestore(&desc->lock, flags);
chip_bus_sync_unlock(desc);
+   mutex_unlock(&desc->request_mutex);
return NULL;
}
 
@@ -1531,7 +1534,7 @@ static struct irqaction *__free_irq(unsi
}
 
if (!desc->action) {
-   irq_release_resources(des

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 09:20:44AM -0700, Tony Lindgren wrote:
> * Sebastian Reichel  [170711 07:41]:
> > Ack, that also works for me. The strange thing is, that I added the
> > following before and it did not print anything.
> > 
> > if (!pm_runtime_enabled(bank->chip.parent))
> > dev_err(bank->chip.parent, "runtime pm issue!\n");
> 
> Enabled but not active, you should have tested for !pm_runtime_active()?

oh right.

-- Sebastian


signature.asc
Description: PGP signature

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Thomas Gleixner  [170711 09:20]:
> On Tue, 11 Jul 2017, Linus Torvalds wrote:
> 
> > On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
> > >
> > > Ah. Now that makes sense.
> > >
> > > Unpatched the ordering is:
> > >
> > >   chip_bus_lock(desc);
> > >   irq_request_resources(desc);
> > 
> > I *looked* at that ordering and then went "Naah, that makes no sense".
> > 
> > But if that's the only issue, how about we just re-order those things
> > - we still don't need to move the irq_request_resources() into the
> > spinlock, we just move it to below the chip_bus_lock().
> > 
> > IOW, something like the (COMPLETELY UNTEESTED!) attached patch.
> > 
> > This assumes that the chip_bus_lock() thing is still ok for the RT
> > case, but it looks like it might be: the only other one I looked at
> > (apart from the gpio-omap one) used a mutex.
> 
> I looked through all of them and the only special case is gpio-omap.
> 
> What I do not understand here is that we have already power management
> around all of that.
> 
>irq_chip_pm_get(&desc->irq_data);
>...
>chip_bus_lock(desc);
>...
>chip_bus_unlock_sync(desc);
>...
>irq_chip_pm_put(&desc->irq_data);
> 
> So why is that not sufficient and needs extra magic in that GPIO driver?

Yeah it seems we should eventually be able to use irq_chip_pm_get()
like Grygorii just explained.

But aren't we currently calling chip functions with irq_request_resources()
outside the chip_bus_lock() too in addition to the gpio-omap runtime PM
issue? It seems that the patch from Linus fixes that, no?

Regards,

Tony

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Linus Torvalds

On Tue, Jul 11, 2017 at 9:19 AM, Thomas Gleixner  wrote:
>
> What I do not understand here is that we have already power management
> around all of that.
>
>irq_chip_pm_get(&desc->irq_data);
>...
>chip_bus_lock(desc);
>...
>chip_bus_unlock_sync(desc);
>...
>irq_chip_pm_put(&desc->irq_data);
>
> So why is that not sufficient and needs extra magic in that GPIO driver?

Well, irq_chip_pm_get/put() isn't called just over the operation, it's
called over the *whole* sequence of the irq being enabled at all.

So the different (right now) is that

 - chip_bus_lock/unlock_sync() is purely done around the actual
operations to set up and tear down the irq data.

   So this just covers the very short setup/teardown.

 - irq_chip_pm_get/put() is called around the *whole* "irqs can be active" block

   This covers the whole lifetime of the irq, from setup to free.

Very different.

I'd really prefer my simple patch for now, leaving everything working
the way it used to work. I *think* it's ok for RT too. Yes?

.. and then maybe longer term people can look more at this and clean
up the oddities.

   Linus

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Sebastian Reichel  [170711 07:41]:
> Ack, that also works for me. The strange thing is, that I added the
> following before and it did not print anything.
> 
> if (!pm_runtime_enabled(bank->chip.parent))
> dev_err(bank->chip.parent, "runtime pm issue!\n");

Enabled but not active, you should have tested for !pm_runtime_active()?

Regards,

Tony

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Linus Torvalds wrote:

> On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
> >
> > Ah. Now that makes sense.
> >
> > Unpatched the ordering is:
> >
> >   chip_bus_lock(desc);
> >   irq_request_resources(desc);
> 
> I *looked* at that ordering and then went "Naah, that makes no sense".
> 
> But if that's the only issue, how about we just re-order those things
> - we still don't need to move the irq_request_resources() into the
> spinlock, we just move it to below the chip_bus_lock().
> 
> IOW, something like the (COMPLETELY UNTEESTED!) attached patch.
> 
> This assumes that the chip_bus_lock() thing is still ok for the RT
> case, but it looks like it might be: the only other one I looked at
> (apart from the gpio-omap one) used a mutex.

I looked through all of them and the only special case is gpio-omap.

What I do not understand here is that we have already power management
around all of that.

   irq_chip_pm_get(&desc->irq_data);
   ...
   chip_bus_lock(desc);
   ...
   chip_bus_unlock_sync(desc);
   ...
   irq_chip_pm_put(&desc->irq_data);

So why is that not sufficient and needs extra magic in that GPIO driver?

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Grygorii Strashko  [170711 08:40]:
> Tony, Potentially we can use pm_runtime_force_suspend()/resume() there, but 
> they are not compatible with
> irqoff context (CPUIdle late stages).
> 
> In other words, below patch should fix this issue, but will break CPUIdle on 
> OMAP :(

Thanks, yea let's take a look at it but let's not break cpuidle! People
are using mainline kernel with batteries.

Regards,

Tony

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 08:40:10AM -0700, Linus Torvalds wrote:
> On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
> >
> > Ah. Now that makes sense.
> >
> > Unpatched the ordering is:
> >
> >   chip_bus_lock(desc);
> >   irq_request_resources(desc);
> 
> I *looked* at that ordering and then went "Naah, that makes no sense".
> 
> But if that's the only issue, how about we just re-order those things
> - we still don't need to move the irq_request_resources() into the
> spinlock, we just move it to below the chip_bus_lock().
> 
> IOW, something like the (COMPLETELY UNTEESTED!) attached patch.

That patch on top of 9967468c0a10 fixes boot on Droid 4.

> This assumes that the chip_bus_lock() thing is still ok for the RT
> case, but it looks like it might be: the only other one I looked at
> (apart from the gpio-omap one) used a mutex.

-- Sebastian


signature.asc
Description: PGP signature

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Linus Torvalds  [170711 08:40]:
> On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
> >
> > Ah. Now that makes sense.
> >
> > Unpatched the ordering is:
> >
> >   chip_bus_lock(desc);
> >   irq_request_resources(desc);
> 
> I *looked* at that ordering and then went "Naah, that makes no sense".
> 
> But if that's the only issue, how about we just re-order those things
> - we still don't need to move the irq_request_resources() into the
> spinlock, we just move it to below the chip_bus_lock().
> 
> IOW, something like the (COMPLETELY UNTEESTED!) attached patch.

Yeah that fixes the issue:

Tested-by: Tony Lindgren 

> This assumes that the chip_bus_lock() thing is still ok for the RT
> case, but it looks like it might be: the only other one I looked at
> (apart from the gpio-omap one) used a mutex.

Yeah and the ordering below makes more sense to me at least. That is
assuming we want to call chip_bus_lock() before we start calling the
chip functions :)

Regards,

Tony


>  kernel/irq/manage.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 5624b2dd6b58..ea1b9404c041 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1168,17 +1168,17 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
> struct irqaction *new)
>   new->flags &= ~IRQF_ONESHOT;
>  
>   mutex_lock(&desc->request_mutex);
> + chip_bus_lock(desc);
> +
>   if (!desc->action) {
>   ret = irq_request_resources(desc);
>   if (ret) {
>   pr_err("Failed to request resources for %s (irq %d) on 
> irqchip %s\n",
>  new->name, irq, desc->irq_data.chip->name);
> - goto out_mutex;
> + goto out_unlock_chip_bus;
>   }
>   }
>  
> - chip_bus_lock(desc);
> -
>   /*
>* The following block of code has to be executed atomically
>*/
> @@ -1385,12 +1385,11 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
> struct irqaction *new)
>  out_unlock:
>   raw_spin_unlock_irqrestore(&desc->lock, flags);
>  
> - chip_bus_sync_unlock(desc);
> -
>   if (!desc->action)
>   irq_release_resources(desc);
>  
> -out_mutex:
> +out_unlock_chip_bus:
> + chip_bus_sync_unlock(desc);
>   mutex_unlock(&desc->request_mutex);
>  
>  out_thread:

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Thomas Gleixner  [170711 08:07]:
> On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> > On Tue, 11 Jul 2017, Tony Lindgren wrote:
> > > * Thomas Gleixner  [170711 02:48]:
> > > And "external abort on non-linefetch" means something is not clocked
> > > in this case. The following alone makes things boot for me again, but I 
> > > don't
> > > quite follow what has now changed with the ordering.. Thomas, any ideas?
> > 
> > Ah. Now that makes sense.
> > 
> > Unpatched the ordering is:
> > 
> >   chip_bus_lock(desc);
> >   irq_request_resources(desc);
> > 
> > Now the offending change reordered the calls. OMAP gpio has:
> > 
> > omap_gpio_irq_bus_lock()
> >pm_runtime_get_sync(bank->chip.parent);
> > 
> > So that at least explains the error. So that omap gpio irq chip (ab)uses
> > the bus_lock() callback to do runtime power management. Sigh, I did not
> > expect that. Let me have a deeper look if that's OMAP only or whether this
> > happens in other places as well.

OK that explains.

> So OMAP-GPIO is the only driver which abuses bus_lock/unlock() in that way
> and gets surprised.

OK. Grygorii, care to take a look if there's a better way to deal with
runtime PM for gpio-omap?

Meanwhile, below is my fix again with a proper description so we can
have working -rc1.

Regards,

Tony

8< --
>From tony Mon Sep 17 00:00:00 2001
From: Tony Lindgren 
Date: Tue, 11 Jul 2017 06:40:50 -0700
Subject: [PATCH] gpio: omap: Fix external abort on non-linefetch

Commit 46e48e257360 ("genirq: Move irq resource handling out of spinlocked
region") caused external abort on non-linefetch for n900 and droid 4. Turns
out this was caused by runtime PM use in gpio-omap in chip_bus_lock:

* Thomas Gleixner  [170711 08:07]:
> > Unpatched the ordering is:
> >
> >   chip_bus_lock(desc);
> >   irq_request_resources(desc);
> >
> > Now the offending change reordered the calls. OMAP gpio has:
> >
> > omap_gpio_irq_bus_lock()
> >pm_runtime_get_sync(bank->chip.parent);
> >
> > So that at least explains the error. So that omap gpio irq chip (ab)uses
> > the bus_lock() callback to do runtime power management. Sigh, I did not
> > expect that.

Let's fix it with pm_runtime_get_sync() for now, then further improvments
can be done when we have a better way to deal with it.

Reported-by: Pavel Machek 
Signed-off-by: Tony Lindgren 
---
 drivers/gpio/gpio-omap.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -917,13 +917,24 @@ static int omap_gpio_get_direction(struct gpio_chip 
*chip, unsigned offset)
struct gpio_bank *bank;
unsigned long flags;
void __iomem *reg;
-   int dir;
+   int error, dir;
 
bank = gpiochip_get_data(chip);
reg = bank->base + bank->regs->direction;
+   error = pm_runtime_get_sync(bank->chip.parent);
+   if (error < 0) {
+   dev_err(bank->chip.parent,
+   "Could not enable gpio bank %p: %d\n",
+   bank, error);
+   pm_runtime_put_noidle(bank->chip.parent);
+
+   return error;
+   }
raw_spin_lock_irqsave(&bank->lock, flags);
dir = !!(readl_relaxed(reg) & BIT(offset));
raw_spin_unlock_irqrestore(&bank->lock, flags);
+   pm_runtime_put_sync(bank->chip.parent);
+
return dir;
 }
 
-- 
2.13.2

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Linus Torvalds

On Tue, Jul 11, 2017 at 7:41 AM, Thomas Gleixner  wrote:
>
> Ah. Now that makes sense.
>
> Unpatched the ordering is:
>
>   chip_bus_lock(desc);
>   irq_request_resources(desc);

I *looked* at that ordering and then went "Naah, that makes no sense".

But if that's the only issue, how about we just re-order those things
- we still don't need to move the irq_request_resources() into the
spinlock, we just move it to below the chip_bus_lock().

IOW, something like the (COMPLETELY UNTEESTED!) attached patch.

This assumes that the chip_bus_lock() thing is still ok for the RT
case, but it looks like it might be: the only other one I looked at
(apart from the gpio-omap one) used a mutex.

 Linus
 kernel/irq/manage.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 5624b2dd6b58..ea1b9404c041 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1168,17 +1168,17 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
new->flags &= ~IRQF_ONESHOT;
 
mutex_lock(&desc->request_mutex);
+   chip_bus_lock(desc);
+
if (!desc->action) {
ret = irq_request_resources(desc);
if (ret) {
pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
   new->name, irq, desc->irq_data.chip->name);
-   goto out_mutex;
+   goto out_unlock_chip_bus;
}
}
 
-   chip_bus_lock(desc);
-
/*
 * The following block of code has to be executed atomically
 */
@@ -1385,12 +1385,11 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
 out_unlock:
raw_spin_unlock_irqrestore(&desc->lock, flags);
 
-   chip_bus_sync_unlock(desc);
-
if (!desc->action)
irq_release_resources(desc);
 
-out_mutex:
+out_unlock_chip_bus:
+   chip_bus_sync_unlock(desc);
mutex_unlock(&desc->request_mutex);
 
 out_thread:

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Grygorii Strashko



On 07/11/2017 09:41 AM, Thomas Gleixner wrote:
> On Tue, 11 Jul 2017, Tony Lindgren wrote:
>> * Thomas Gleixner  [170711 02:48]:
>> And "external abort on non-linefetch" means something is not clocked
>> in this case. The following alone makes things boot for me again, but I don't
>> quite follow what has now changed with the ordering.. Thomas, any ideas?
> 
> Ah. Now that makes sense.
> 
> Unpatched the ordering is:
> 
> chip_bus_lock(desc);
> irq_request_resources(desc);
> 
> Now the offending change reordered the calls. OMAP gpio has:
> 
>  omap_gpio_irq_bus_lock()
> pm_runtime_get_sync(bank->chip.parent);
> 
> So that at least explains the error. So that omap gpio irq chip (ab)uses
> the bus_lock() callback to do runtime power management. Sigh, I did not
> expect that. Let me have a deeper look if that's OMAP only or whether this
> happens in other places as well.

It was the only one way to power on GPIO bank when the first GPIO IRQ is 
requested,
as all other irqchip callbacks are under raw_lock while pm_runtime uses 
spinlock, as
result on -RT it was not possible to use PM runtime in other irqchip callbacks.

Now, I think, It might be possible to use irq_chip_pm_get(), but there is one 
problem -
OMAP Power management platform code can call 
omap2_gpio_prepare_for_idle()/omap2_gpio_resume_after_idle()
which expected to disable GPIO banks using PM runtime and current driver 
implementation
expect to have PM runtime usage_count = 1.

Tony, Potentially we can use pm_runtime_force_suspend()/resume() there, but 
they are not compatible with
irqoff context (CPUIdle late stages).

In other words, below patch should fix this issue, but will break CPUIdle on 
OMAP :(

--
>From dfca1c806f03ad6bdd72b634d71c96d39bda2046 Mon Sep 17 00:00:00 2001
From: Grygorii Strashko 
Date: Tue, 11 Jul 2017 10:36:23 -0500
Subject: [PATCH] gpio: omap: switch to use irq_chip_pm_get/put()

Signed-off-by: Grygorii Strashko 
---
 drivers/gpio/gpio-omap.c | 23 +--
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index ba58c8b..b614475 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -787,26 +787,6 @@ static void omap_gpio_irq_shutdown(struct irq_data *d)
raw_spin_unlock_irqrestore(&bank->lock, flags);
 }
 
-static void omap_gpio_irq_bus_lock(struct irq_data *data)
-{
-   struct gpio_bank *bank = omap_irq_data_get_bank(data);
-
-   if (!BANK_USED(bank))
-   pm_runtime_get_sync(bank->chip.parent);
-}
-
-static void gpio_irq_bus_sync_unlock(struct irq_data *data)
-{
-   struct gpio_bank *bank = omap_irq_data_get_bank(data);
-
-   /*
-* If this is the last IRQ to be freed in the bank,
-* disable the bank module.
-*/
-   if (!BANK_USED(bank))
-   pm_runtime_put(bank->chip.parent);
-}
-
 static void omap_gpio_ack_irq(struct irq_data *d)
 {
struct gpio_bank *bank = omap_irq_data_get_bank(d);
@@ -1168,10 +1148,9 @@ static int omap_gpio_probe(struct platform_device *pdev)
irqc->irq_unmask = omap_gpio_unmask_irq,
irqc->irq_set_type = omap_gpio_irq_type,
irqc->irq_set_wake = omap_gpio_wake_enable,
-   irqc->irq_bus_lock = omap_gpio_irq_bus_lock,
-   irqc->irq_bus_sync_unlock = gpio_irq_bus_sync_unlock,
irqc->name = dev_name(&pdev->dev);
irqc->flags = IRQCHIP_MASK_ON_SUSPEND;
+   irqc->parent_device = dev;
 
bank->irq = platform_get_irq(pdev, 0);
if (bank->irq <= 0) {
-- 
2.10.1


-- 
regards,
-grygorii

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> On Tue, 11 Jul 2017, Tony Lindgren wrote:
> > * Thomas Gleixner  [170711 02:48]:
> > And "external abort on non-linefetch" means something is not clocked
> > in this case. The following alone makes things boot for me again, but I 
> > don't
> > quite follow what has now changed with the ordering.. Thomas, any ideas?
> 
> Ah. Now that makes sense.
> 
> Unpatched the ordering is:
> 
> chip_bus_lock(desc);
> irq_request_resources(desc);
> 
> Now the offending change reordered the calls. OMAP gpio has:
> 
> omap_gpio_irq_bus_lock()
>pm_runtime_get_sync(bank->chip.parent);
> 
> So that at least explains the error. So that omap gpio irq chip (ab)uses
> the bus_lock() callback to do runtime power management. Sigh, I did not
> expect that. Let me have a deeper look if that's OMAP only or whether this
> happens in other places as well.

So OMAP-GPIO is the only driver which abuses bus_lock/unlock() in that way
and gets surprised.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 06:51:32AM -0700, Tony Lindgren wrote:
> * Thomas Gleixner  [170711 02:48]:
> > On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> > 
> > So Tony actually provided the part of dmesg which shows the initial
> > failure, which subsequently leads to the splat Sebastian reported.
> > 
> > Unhandled fault: external abort on non-linefetch (0x1028) at 0xfb050034
> > pgd = c0004000 [fb050034] *pgd=49011452(bad)
> > Internal error: : 1028 [#1] SMP ARM
> > Workqueue: events deferred_probe_work_func
> > task: ce1d41c0 task.stack: ce1fc000
> > PC is at omap_gpio_get_direction+0x2c/0x44
> > LR is at _raw_spin_lock_irqsave+0x40/0x4c
> > pc : []lr : []psr: 6093
> > sp : ce1fdb78  ip : c0dce42c  fp : ce22d810
> > r10: ce22d800  r9 :   r8 : ce22d900
> > r7 : 0016  r6 : ce223864  r5 : fb050034  r4 : 0020
> > r3 : ce1d41c0  r2 :   r1 : a013  r0 : a013
> > Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > Control: 10c5387d  Table: 80004019  DAC: 0051
> > Process kworker/0:1 (pid: 14, stack limit = 0xce1fc218)
> > 
> > The callstack is:
> > 
> > omap_gpio_get_direction
> > gpiochip_lock_as_irq
> > gpiochip_irq_reqres
> > __setup_irq
> > request_threaded_irq
> > smc_probe
> > smc_drv_probe
> > platform_drv_probe
> > 
> > 
> > So the SMC91X network driver request an IRQ, which ends up calling into the
> > GPIO interrupt setup and that fails. I have no idea why that would not fail
> > with the patch reverted. Dusting off a Beaglebone board
> 
> And "external abort on non-linefetch" means something is not clocked
> in this case. The following alone makes things boot for me again, but I don't
> quite follow what has now changed with the ordering.. Thomas, any ideas?
> 
> Anyways, adding Linus W and Grygorii to Cc since things now point to
> gpio-omap.
> 
> Regards,
> 
> Tony

Ack, that also works for me. The strange thing is, that I added the
following before and it did not print anything.

if (!pm_runtime_enabled(bank->chip.parent))
dev_err(bank->chip.parent, "runtime pm issue!\n");

-- Sebastian

> 8< -
> diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
> --- a/drivers/gpio/gpio-omap.c
> +++ b/drivers/gpio/gpio-omap.c
> @@ -919,13 +919,24 @@ static int omap_gpio_get_direction(struct gpio_chip 
> *chip, unsigned offset)
>   struct gpio_bank *bank;
>   unsigned long flags;
>   void __iomem *reg;
> - int dir;
> + int error, dir;
>  
>   bank = gpiochip_get_data(chip);
>   reg = bank->base + bank->regs->direction;
> + error = pm_runtime_get_sync(bank->chip.parent);
> + if (error < 0) {
> + dev_err(bank->chip.parent,
> + "Could not enable gpio bank %p: %d\n",
> + bank, error);
> + pm_runtime_put_noidle(bank->chip.parent);
> +
> + return error;
> + }
>   raw_spin_lock_irqsave(&bank->lock, flags);
>   dir = !!(readl_relaxed(reg) & BIT(offset));
>   raw_spin_unlock_irqrestore(&bank->lock, flags);
> + pm_runtime_put_sync(bank->chip.parent);
> +
>   return dir;
>  }
>  
> -- 
> 2.13.2


signature.asc
Description: PGP signature

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Tony Lindgren wrote:
> * Thomas Gleixner  [170711 02:48]:
> And "external abort on non-linefetch" means something is not clocked
> in this case. The following alone makes things boot for me again, but I don't
> quite follow what has now changed with the ordering.. Thomas, any ideas?

Ah. Now that makes sense.

Unpatched the ordering is:

  chip_bus_lock(desc);
  irq_request_resources(desc);

Now the offending change reordered the calls. OMAP gpio has:

omap_gpio_irq_bus_lock()
   pm_runtime_get_sync(bank->chip.parent);

So that at least explains the error. So that omap gpio irq chip (ab)uses
the bus_lock() callback to do runtime power management. Sigh, I did not
expect that. Let me have a deeper look if that's OMAP only or whether this
happens in other places as well.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 02:51:23PM +0100, Marc Zyngier wrote:
> On 11/07/17 12:21, Sebastian Reichel wrote:
> > Hi,
> > 
> > On Tue, Jul 11, 2017 at 12:52:17PM +0200, Thomas Gleixner wrote:
> >> On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> >>> On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> >>> So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
> >>> wonder how the raw_spin_lock() succeeded. That does not make any sense.
> >>
> >> can you please apply the patch below on top of 4.12? It's a backport
> >> isolating the resource request changes.
> > 
> > Full bootlog for v4.12 + your patch is below. I used the same
> > .config with oldconfig.
> 
> [...]
> 
> > [1.329315] cpcap-core spi1.0: CPCAP vendor: ST rev: 2.10 (1a)
> > [1.336914] Unhandled fault: imprecise external abort (0x1406) at 
> > 0x
> > [1.343994] pgd = c0004000
> > [1.346710] [] *pgd=
> > [1.350341] Internal error: : 1406 [#1] SMP ARM
> > [1.354888] Modules linked in:
> > [1.357971] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> > 4.12.0-1-g2a481f732c4b #1539
> > [1.365936] Hardware name: Generic OMAP4 (Flattened Device Tree)
> > [1.371978] task: ee8aadc0 task.stack: ee8ac000
> > [1.376556] PC is at lock_release+0x25c/0x360
> > [1.380950] LR is at lock_release+0x25c/0x360
> > [1.385314] pc : []lr : []psr: 2093
> > [1.385314] sp : ee8adb60  ip : c10fc43c  fp : eea05010
> > [1.396881] r10: 0001  r9 : c10f1e70  r8 : 6093
> > [1.402130] r7 : c1007b6c  r6 : c0520658  r5 : ee9fd274  r4 : a013
> > [1.408691] r3 : ee8aadc0  r2 : 0003  r1 : 0003  r0 : 
> > [1.415252] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  
> > Segment none
> > [1.422515] Control: 10c5387d  Table: 8000404a  DAC: 0051
> > [1.428314] Process swapper/0 (pid: 1, stack limit = 0xee8ac218)
> > [1.434356] Stack: (0xee8adb60 to 0xee8ae000)
> > [1.438751] db60: a013 c0520648 0007 a013 ee9fd264 ee9fd264 
> > 0007 eea05100
> > [1.446960] db80: c061fe0c eea05000 eea05010 c0add86c  fc310134 
> > ee9fd264 c0520658
> > [1.455200] dba0: 0020 ee9fd2a4 ee9fd070  eea05100 c05196f0 
> > ee9fd2a4 eea05010
> > [1.463439] dbc0: eef1d580 c0519c20 eea05000 0021 eef1d580 c01a9508 
> > 000f c01aa584
> > [1.471649] dbe0: ee8000c0 6013  eef1d580  c01a77b8 
> > eef9c400 0021
> > [1.479888] dc00: c061fe0c eea05000 eea05010 c01a9890 2084 0204 
> > eef9c400 c10a7bc8
> > [1.488128] dc20: 0001 eef18600    c0620d7c 
> > c0d93c60 eef9c400
> > [1.496368] dc40:  efd93038 ef6a85d0 014e 0021 0084 
> > 0084 eef18600
> > [1.504577] dc60: eef1de90 0021 0084 eef19000 0010 eef1e93c 
> > c0b613dc c0620f54
> > [1.512817] dc80: c10a7bc8 ee8adc8c 0004 eef19000 eef19000 eef18600 
> > 0021 eef17f90
> > [1.521057] dca0: eef1e810 c062bd68  c10a7bc8 eef17f98 ee8aadc0 
> > 0001 
> > [1.529266] dcc0: c10a7bc8 0010   eef19000 eef19000 
> > eef17f90 
> > [1.537506] dce0: 0010 001a 0013   c062bef0 
> > 000a 001a
> > [1.545745] dd00:  0013  eef19000 c10a7b78  
> > c10a7b88 
> > [1.553985] dd20:  c06a06b8 eef19000 c18bfe4c  c05fcd9c 
> >  ee8add70
> > [1.562194] dd40: c05fcee8 0001  c18bfe08  c05fb2d4 
> > ee9eccd4 eef13c54
> > [1.570434] dd60: eef19000 eef19034 c10affe0 c05fca58 eef19000 0001 
> > c18bfe08 eef19008
> > [1.578674] dd80: eef19000 c10affe0  c05fc0d4 eef19008 eecdc000 
> > eef19000 c05fa478
> > [1.586914] dda0:  eef19000 eef19260  eef19000 eecdc000 
> >  eea61c10
> > [1.595123] ddc0: 0001  c0d7f208 c06a184c eecdc000 ef6e9a2c 
> > ef6e9a7c eef19000
> > [1.603363] dde0: 0001 c06a20ac  0002 c0add880 eea61c10 
> > c06a1bd8 002dc6c0
> > [1.611602] de00: eea61c10 eef17210 eecdc000 eecdc000 eea61c10 eea61c10 
> > c0da4ba0 c0da4b98
> > [1.619812] de20: 01f0 c06a243c  eecdc4e0 eecdc000 eecdc000 
> > eea61c10 c06a5e60
> > [1.628051] de40:  6013 c1897138 0004 8132535b eea61c10 
> > ffed c10b0c74
> > [1.636291] de60: fdfb   c0f66858 c0f005a8 c05fecf8 
> > eea61c10 c18bfe4c
> > [1.644531] de80:  c10b0c74  c05fcd9c eea61c10 c10b0c74 
> > eea61c44 
> > [1.652740] dea0: c10f8000 0007 c0f66858 c05fcee4  c10b0c74 
> > c05fce24 c05fb228
> > [1.660980] dec0: ee8a58a4 eea5ac50 c10b0c74 eef13580 c10a5720 c05fc2e4 
> > c0d332c0 c0f4365c
> > [1.669219] dee0:  c10b0c74 c0f4365c  c0e4f6ec c05fdd28 
> > e000 c0f4365c
> > [1.677429] df00:  c0101874 0134  efffec00

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Tony Lindgren

* Thomas Gleixner  [170711 02:48]:
> On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> 
> So Tony actually provided the part of dmesg which shows the initial
> failure, which subsequently leads to the splat Sebastian reported.
> 
> Unhandled fault: external abort on non-linefetch (0x1028) at 0xfb050034
> pgd = c0004000 [fb050034] *pgd=49011452(bad)
> Internal error: : 1028 [#1] SMP ARM
> Workqueue: events deferred_probe_work_func
> task: ce1d41c0 task.stack: ce1fc000
> PC is at omap_gpio_get_direction+0x2c/0x44
> LR is at _raw_spin_lock_irqsave+0x40/0x4c
> pc : []lr : []psr: 6093
> sp : ce1fdb78  ip : c0dce42c  fp : ce22d810
> r10: ce22d800  r9 :   r8 : ce22d900
> r7 : 0016  r6 : ce223864  r5 : fb050034  r4 : 0020
> r3 : ce1d41c0  r2 :   r1 : a013  r0 : a013
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 80004019  DAC: 0051
> Process kworker/0:1 (pid: 14, stack limit = 0xce1fc218)
> 
> The callstack is:
> 
> omap_gpio_get_direction
> gpiochip_lock_as_irq
> gpiochip_irq_reqres
> __setup_irq
> request_threaded_irq
> smc_probe
> smc_drv_probe
> platform_drv_probe
> 
> 
> So the SMC91X network driver request an IRQ, which ends up calling into the
> GPIO interrupt setup and that fails. I have no idea why that would not fail
> with the patch reverted. Dusting off a Beaglebone board

And "external abort on non-linefetch" means something is not clocked
in this case. The following alone makes things boot for me again, but I don't
quite follow what has now changed with the ordering.. Thomas, any ideas?

Anyways, adding Linus W and Grygorii to Cc since things now point to
gpio-omap.

Regards,

Tony

8< -
diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -919,13 +919,24 @@ static int omap_gpio_get_direction(struct gpio_chip 
*chip, unsigned offset)
struct gpio_bank *bank;
unsigned long flags;
void __iomem *reg;
-   int dir;
+   int error, dir;
 
bank = gpiochip_get_data(chip);
reg = bank->base + bank->regs->direction;
+   error = pm_runtime_get_sync(bank->chip.parent);
+   if (error < 0) {
+   dev_err(bank->chip.parent,
+   "Could not enable gpio bank %p: %d\n",
+   bank, error);
+   pm_runtime_put_noidle(bank->chip.parent);
+
+   return error;
+   }
raw_spin_lock_irqsave(&bank->lock, flags);
dir = !!(readl_relaxed(reg) & BIT(offset));
raw_spin_unlock_irqrestore(&bank->lock, flags);
+   pm_runtime_put_sync(bank->chip.parent);
+
return dir;
 }
 
-- 
2.13.2

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Marc Zyngier

On 11/07/17 12:21, Sebastian Reichel wrote:
> Hi,
> 
> On Tue, Jul 11, 2017 at 12:52:17PM +0200, Thomas Gleixner wrote:
>> On Tue, 11 Jul 2017, Thomas Gleixner wrote:
>>> On Tue, 11 Jul 2017, Sebastian Reichel wrote:
>>> So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
>>> wonder how the raw_spin_lock() succeeded. That does not make any sense.
>>
>> can you please apply the patch below on top of 4.12? It's a backport
>> isolating the resource request changes.
> 
> Full bootlog for v4.12 + your patch is below. I used the same
> .config with oldconfig.

[...]

> [1.329315] cpcap-core spi1.0: CPCAP vendor: ST rev: 2.10 (1a)
> [1.336914] Unhandled fault: imprecise external abort (0x1406) at 
> 0x
> [1.343994] pgd = c0004000
> [1.346710] [] *pgd=
> [1.350341] Internal error: : 1406 [#1] SMP ARM
> [1.354888] Modules linked in:
> [1.357971] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 4.12.0-1-g2a481f732c4b #1539
> [1.365936] Hardware name: Generic OMAP4 (Flattened Device Tree)
> [1.371978] task: ee8aadc0 task.stack: ee8ac000
> [1.376556] PC is at lock_release+0x25c/0x360
> [1.380950] LR is at lock_release+0x25c/0x360
> [1.385314] pc : []lr : []psr: 2093
> [1.385314] sp : ee8adb60  ip : c10fc43c  fp : eea05010
> [1.396881] r10: 0001  r9 : c10f1e70  r8 : 6093
> [1.402130] r7 : c1007b6c  r6 : c0520658  r5 : ee9fd274  r4 : a013
> [1.408691] r3 : ee8aadc0  r2 : 0003  r1 : 0003  r0 : 
> [1.415252] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [1.422515] Control: 10c5387d  Table: 8000404a  DAC: 0051
> [1.428314] Process swapper/0 (pid: 1, stack limit = 0xee8ac218)
> [1.434356] Stack: (0xee8adb60 to 0xee8ae000)
> [1.438751] db60: a013 c0520648 0007 a013 ee9fd264 ee9fd264 
> 0007 eea05100
> [1.446960] db80: c061fe0c eea05000 eea05010 c0add86c  fc310134 
> ee9fd264 c0520658
> [1.455200] dba0: 0020 ee9fd2a4 ee9fd070  eea05100 c05196f0 
> ee9fd2a4 eea05010
> [1.463439] dbc0: eef1d580 c0519c20 eea05000 0021 eef1d580 c01a9508 
> 000f c01aa584
> [1.471649] dbe0: ee8000c0 6013  eef1d580  c01a77b8 
> eef9c400 0021
> [1.479888] dc00: c061fe0c eea05000 eea05010 c01a9890 2084 0204 
> eef9c400 c10a7bc8
> [1.488128] dc20: 0001 eef18600    c0620d7c 
> c0d93c60 eef9c400
> [1.496368] dc40:  efd93038 ef6a85d0 014e 0021 0084 
> 0084 eef18600
> [1.504577] dc60: eef1de90 0021 0084 eef19000 0010 eef1e93c 
> c0b613dc c0620f54
> [1.512817] dc80: c10a7bc8 ee8adc8c 0004 eef19000 eef19000 eef18600 
> 0021 eef17f90
> [1.521057] dca0: eef1e810 c062bd68  c10a7bc8 eef17f98 ee8aadc0 
> 0001 
> [1.529266] dcc0: c10a7bc8 0010   eef19000 eef19000 
> eef17f90 
> [1.537506] dce0: 0010 001a 0013   c062bef0 
> 000a 001a
> [1.545745] dd00:  0013  eef19000 c10a7b78  
> c10a7b88 
> [1.553985] dd20:  c06a06b8 eef19000 c18bfe4c  c05fcd9c 
>  ee8add70
> [1.562194] dd40: c05fcee8 0001  c18bfe08  c05fb2d4 
> ee9eccd4 eef13c54
> [1.570434] dd60: eef19000 eef19034 c10affe0 c05fca58 eef19000 0001 
> c18bfe08 eef19008
> [1.578674] dd80: eef19000 c10affe0  c05fc0d4 eef19008 eecdc000 
> eef19000 c05fa478
> [1.586914] dda0:  eef19000 eef19260  eef19000 eecdc000 
>  eea61c10
> [1.595123] ddc0: 0001  c0d7f208 c06a184c eecdc000 ef6e9a2c 
> ef6e9a7c eef19000
> [1.603363] dde0: 0001 c06a20ac  0002 c0add880 eea61c10 
> c06a1bd8 002dc6c0
> [1.611602] de00: eea61c10 eef17210 eecdc000 eecdc000 eea61c10 eea61c10 
> c0da4ba0 c0da4b98
> [1.619812] de20: 01f0 c06a243c  eecdc4e0 eecdc000 eecdc000 
> eea61c10 c06a5e60
> [1.628051] de40:  6013 c1897138 0004 8132535b eea61c10 
> ffed c10b0c74
> [1.636291] de60: fdfb   c0f66858 c0f005a8 c05fecf8 
> eea61c10 c18bfe4c
> [1.644531] de80:  c10b0c74  c05fcd9c eea61c10 c10b0c74 
> eea61c44 
> [1.652740] dea0: c10f8000 0007 c0f66858 c05fcee4  c10b0c74 
> c05fce24 c05fb228
> [1.660980] dec0: ee8a58a4 eea5ac50 c10b0c74 eef13580 c10a5720 c05fc2e4 
> c0d332c0 c0f4365c
> [1.669219] dee0:  c10b0c74 c0f4365c  c0e4f6ec c05fdd28 
> e000 c0f4365c
> [1.677429] df00:  c0101874 0134  efffec00 efffecdd 
> c0e50efc 0134
> [1.685668] df20: 0134 c015f5dc c0e4f6ec  0006 0006 
> efffecdd 
> [1.693908] df40: c0f7f7cc 0006 c10f8000 c0f6684c c0f7fe74 c10f8000 
> c0f66850 c10f8000
> [1.702148] df60: 0007 c0f00eb

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> On Tue, Jul 11, 2017 at 12:52:17PM +0200, Thomas Gleixner wrote:
> > On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> > > On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> > > So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
> > > wonder how the raw_spin_lock() succeeded. That does not make any sense.
> > 
> > can you please apply the patch below on top of 4.12? It's a backport
> > isolating the resource request changes.
> 
> Full bootlog for v4.12 + your patch is below. I used the same
> .config with oldconfig.
> [1.329315] cpcap-core spi1.0: CPCAP vendor: ST rev: 2.10 (1a)
> [1.336914] Unhandled fault: imprecise external abort (0x1406) at 
> 0x
> [1.343994] pgd = c0004000
> [1.346710] [] *pgd=
> [1.350341] Internal error: : 1406 [#1] SMP ARM
> [1.354888] Modules linked in:
> [1.357971] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 4.12.0-1-g2a481f732c4b #1539
> [1.365936] Hardware name: Generic OMAP4 (Flattened Device Tree)
> [1.371978] task: ee8aadc0 task.stack: ee8ac000
> [1.376556] PC is at lock_release+0x25c/0x360
> [1.380950] LR is at lock_release+0x25c/0x360

Slightly different function, but in the same way non sensical.

I used your config with the extra bits for beaglebone XM enabled and it
just works. The irq_request_resources() callback is invoked and ends up in
omap_gpio_get_direction() without any complaints.

I'm running out of ideas right now.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Sebastian Reichel

Hi,

On Tue, Jul 11, 2017 at 12:52:17PM +0200, Thomas Gleixner wrote:
> On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> > On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> > So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
> > wonder how the raw_spin_lock() succeeded. That does not make any sense.
> 
> can you please apply the patch below on top of 4.12? It's a backport
> isolating the resource request changes.

Full bootlog for v4.12 + your patch is below. I used the same
.config with oldconfig.

-- Sebastian

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.12.0-1-g2a481f732c4b (sre@earth) (gcc 
version 6.3.0 20170516 (Debian 6.3.0-18) ) #1539 SMP Tue Jul 11 13:13:20 CEST 
2017
[0.00] CPU: ARMv7 Processor [411fc093] revision 3 (ARMv7), cr=10c5387d
[0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing 
instruction cache
[0.00] OF: fdt: Machine model: Motorola Droid 4 XT894
[0.00] earlycon: omap8250 at MMIO 0x4802 (options '')
[0.00] bootconsole [omap8250] enabled
[0.00] Memory policy: Data cache writealloc
[0.00] cma: Reserved 16 MiB at 0xbe80
[0.00] OMAP4: Map 0xbfb0 to fe60 for dram barrier
[0.00] OMAP4430 ES2.3
[0.00] percpu: Embedded 18 pages/cpu @ef69b000 s41256 r8192 d24280 
u73728
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 259136
[0.00] Kernel command line: root=/dev/mmcblk0p1 rootwait rw 
console=tty0 console=ttyS2,115200 fbcon=rotate:1 earlyprintk earlycon
[0.00] PID hash table entries: 4096 (order: 2, 16384 bytes)
[0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[0.00] Memory: 992184K/1043456K available (10240K kernel code, 985K 
rwdata, 3396K rodata, 1024K init, 8063K bss, 34888K reserved, 16384K 
cma-reserved, 240640K highmem)
[0.00] Virtual kernel memory layout:
[0.00] vector  : 0x - 0x1000   (   4 kB)
[0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
[0.00] vmalloc : 0xf080 - 0xff80   ( 240 MB)
[0.00] lowmem  : 0xc000 - 0xf000   ( 768 MB)
[0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
[0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
[0.00]   .text : 0xc0008000 - 0xc0b0   (11232 kB)
[0.00]   .init : 0xc0f0 - 0xc100   (1024 kB)
[0.00]   .data : 0xc100 - 0xc10f6514   ( 986 kB)
[0.00].bss : 0xc10f8000 - 0xc18d7f74   (8064 kB)
[0.00] Running RCU self tests
[0.00] Hierarchical RCU implementation.
[0.00]  RCU lockdep checking is enabled.
[0.00] NR_IRQS:16 nr_irqs:16 16
[0.00] L2C: platform modifies aux control register: 0x5e47 -> 
0x7e47
[0.00] L2C: DT/platform modifies aux control register: 0x5e47 -> 
0x7e47
[0.00] L2C-310 erratum 727915 enabled
[0.00] L2C-310 enabling early BRESP for Cortex-A9
[0.00] L2C-310 ID prefetch enabled, offset 8 lines
[0.00] L2C-310 cache controller enabled, 16 ways, 1024 kB
[0.00] L2C-310: CACHE_ID 0x41c4, AUX_CTRL 0x7e47
[0.00] ti_dt_clocks_register: failed to lookup clock node dss_fck
[0.00] OMAP clockevent source: timer1 at 32768 Hz
[0.00] clocksource: 32k_counter: mask: 0x max_cycles: 
0x, max_idle_ns: 58327039986419 ns
[0.00] sched_clock: 32 bits at 32kHz, resolution 30517ns, wraps every 
6553584741ns
[0.008605] OMAP clocksource: 32k_counter at 32768 Hz
[0.014862] Console: colour dummy device 80x30
[0.021179] console [tty0] enabled
[0.024688] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., 
Ingo Molnar
[0.032745] ... MAX_LOCKDEP_SUBCLASSES:  8
[0.036956] ... MAX_LOCK_DEPTH:  48
[0.041320] ... MAX_LOCKDEP_KEYS:8191
[0.045806] ... CLASSHASH_SIZE:  4096
[0.050354] ... MAX_LOCKDEP_ENTRIES: 32768
[0.054931] ... MAX_LOCKDEP_CHAINS:  65536
[0.059539] ... CHAINHASH_SIZE:  32768
[0.064147]  memory used by lock dependency info: 5167 kB
[0.069732]  per task-struct memory footprint: 1536 bytes
[0.075347] Calibrating delay loop... 2387.14 BogoMIPS (lpj=11935744)
[0.136138] pid_max: default: 32768 minimum: 301
[0.141235] Security Framework initialized
[0.145568] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
[0.152435] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
[0.161865] CPU: Testing write buffer coherency: ok
[0.167938] CPU0: thread -1, cpu 0, socket 0, mpidr 8000
[0.173858] smp: CPU1 parked within kernel, needs reset (0x80011ba8 
0x80067478)
[0.182342] Setting up static identity map for 0x8010 - 0x80100078
[0.190734] smp

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

Sebastian,

On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
> wonder how the raw_spin_lock() succeeded. That does not make any sense.

can you please apply the patch below on top of 4.12? It's a backport
isolating the resource request changes.

Thanks,

tglx

8<--
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1198,6 +1198,18 @@ static int
if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)
new->flags &= ~IRQF_ONESHOT;
 
+   mutex_lock(&desc->request_mutex);
+   if (!desc->action) {
+   ret = irq_request_resources(desc);
+   if (ret) {
+   pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
+  new->name, irq, desc->irq_data.chip->name);
+   goto out_mutex;
+   }
+   }
+
+   chip_bus_lock(desc);
+
/*
 * The following block of code has to be executed atomically
 */
@@ -1298,13 +1310,6 @@ static int
}
 
if (!shared) {
-   ret = irq_request_resources(desc);
-   if (ret) {
-   pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
-  new->name, irq, desc->irq_data.chip->name);
-   goto out_mask;
-   }
-
init_waitqueue_head(&desc->wait_for_threads);
 
/* Setup the type (level, edge polarity) if configured: */
@@ -1312,10 +1317,8 @@ static int
ret = __irq_set_trigger(desc,
new->flags & IRQF_TRIGGER_MASK);
 
-   if (ret) {
-   irq_release_resources(desc);
+   if (ret)
goto out_mask;
-   }
}
 
desc->istate &= ~(IRQS_AUTODETECT | IRQS_SPURIOUS_DISABLED | \
@@ -1373,6 +1376,8 @@ static int
}
 
raw_spin_unlock_irqrestore(&desc->lock, flags);
+   chip_bus_sync_unlock(desc);
+   mutex_unlock(&desc->request_mutex);
 
/*
 * Strictly no need to wake it up, but hung_task complains
@@ -1402,6 +1407,13 @@ static int
 
 out_mask:
raw_spin_unlock_irqrestore(&desc->lock, flags);
+   chip_bus_sync_unlock(desc);
+   if (!desc->action)
+   irq_release_resources(desc);
+
+out_mutex:
+   mutex_unlock(&desc->request_mutex);
+
free_cpumask_var(mask);
 
 out_thread:
@@ -1443,9 +1455,7 @@ int setup_irq(unsigned int irq, struct i
if (retval < 0)
return retval;
 
-   chip_bus_lock(desc);
retval = __setup_irq(irq, desc, act);
-   chip_bus_sync_unlock(desc);
 
if (retval)
irq_chip_pm_put(&desc->irq_data);
@@ -1469,6 +1479,7 @@ static struct irqaction *__free_irq(unsi
if (!desc)
return NULL;
 
+   mutex_lock(&desc->request_mutex);
chip_bus_lock(desc);
raw_spin_lock_irqsave(&desc->lock, flags);
 
@@ -1501,7 +1512,6 @@ static struct irqaction *__free_irq(unsi
if (!desc->action) {
irq_settings_clr_disable_unlazy(desc);
irq_shutdown(desc);
-   irq_release_resources(desc);
}
 
 #ifdef CONFIG_SMP
@@ -1543,6 +1553,11 @@ static struct irqaction *__free_irq(unsi
}
}
 
+   if (!desc->action)
+   irq_release_resources(desc);
+
+   mutex_unlock(&desc->request_mutex);
+
irq_chip_pm_put(&desc->irq_data);
module_put(desc->owner);
kfree(action->secondary);
@@ -1699,9 +1714,7 @@ int request_threaded_irq(unsigned int ir
return retval;
}
 
-   chip_bus_lock(desc);
retval = __setup_irq(irq, desc, action);
-   chip_bus_sync_unlock(desc);
 
if (retval) {
irq_chip_pm_put(&desc->irq_data);
@@ -1949,9 +1962,7 @@ int setup_percpu_irq(unsigned int irq, s
if (retval < 0)
return retval;
 
-   chip_bus_lock(desc);
retval = __setup_irq(irq, desc, act);
-   chip_bus_sync_unlock(desc);
 
if (retval)
irq_chip_pm_put(&desc->irq_data);
@@ -2005,9 +2016,7 @@ int request_percpu_irq(unsigned int irq,
return retval;
}
 
-   chip_bus_lock(desc);
retval = __setup_irq(irq, desc, action);
-   chip_bus_sync_unlock(desc);
 
if (retval) {
irq_chip_pm_put(&desc->irq_data);
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Core internal functions to deal with irq descriptors
@@ -45,6 +46,7 @@ struct pt_regs;
  * IRQF_FORCE_RESUME set
  * @rcu:   rcu head for delayed free

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Sebastian Reichel wrote:
> There you go (this is basically 9967468c0a10). The referenced
> cpcap is a PMIC, that uses one of OMAP's GPIOs to generate
> interrupts and (among other things) provides an interrupt
> controller.
> 
> [1.328521] cpcap-core spi1.0: CPCAP vendor: ST rev: 2.10 (1a)
> [1.336334] Unhandled fault: imprecise external abort (0x1406) at 
> 0x
> [1.343536] pgd = c0004000
> [1.346282] [] *pgd=
> [1.349914] Internal error: : 1406 [#1] SMP ARM
> [1.354492] Modules linked in:
> [1.357574] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 4.12.0-10625-gcb473a1c6f03 #531
> [1.365447] Hardware name: Generic OMAP4 (Flattened Device Tree)
> [1.371520] task: ee8aae00 task.stack: ee8ac000
> [1.376098] PC is at do_raw_spin_unlock+0x58/0x120
> [1.380920] LR is at _raw_spin_unlock_irqrestore+0x24/0x44
> [1.386444] pc : []lr : []psr: 6093
> [1.392761] sp : ee8adba0  ip : c10fe40c  fp : eea0ac10
> [1.398040] r10: eea0ac00  r9 : c061f764  r8 : eea0ad04
> [1.403289] r7 : 0007  r6 : eea07e64  r5 : e000  r4 : eea07e64
> [1.409881] r3 :   r2 :   r1 : ee8aae00  r0 : eea07e64
> [1.416442] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
> none
> [1.423736] Control: 10c5387d  Table: 8000404a  DAC: 0051
> [1.429534] Process swapper/0 (pid: 1, stack limit = 0xee8ac218)
> [1.435577] Stack: (0xee8adba0 to 0xee8ae000)

> [1.728546] [] (do_raw_spin_unlock) from [] 
> (_raw_spin_unlock_irqrestore+0x24/0x44)
> [1.738037] [] (_raw_spin_unlock_irqrestore) from [] 
> (omap_gpio_get_direction+0x38/0x44)
> [1.747955] [] (omap_gpio_get_direction) from [] 
> (gpiochip_lock_as_irq+0x98/0xe4)
> [1.757232] [] (gpiochip_lock_as_irq) from [] 
> (gpiochip_irq_reqres+0x2c/0x6c)
> [1.766174] [] (gpiochip_irq_reqres) from [] 
> (__setup_irq+0x478/0x6ec)
> [1.774536] [] (__setup_irq) from [] 
> (request_threaded_irq+0xcc/0x14c)

So this crashes in do_raw_spin_unlock_irqrestore() !?! I just have to
wonder how the raw_spin_lock() succeeded. That does not make any sense.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-11 Thread Thomas Gleixner

On Tue, 11 Jul 2017, Thomas Gleixner wrote:
> On Mon, 10 Jul 2017, Linus Torvalds wrote:
> > On Mon, Jul 10, 2017 at 6:35 AM, Sebastian Reichel
> >  wrote:
> > >
> > > This patch apparently breaks OMAP platform:
> > >
> > > 46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
> > > commit 46e48e257360f0845fe17089713cbad4db611e70
> > > Author: Thomas Gleixner 
> > > Date:   Thu Jun 29 23:33:38 2017 +0200
> > >
> > > genirq: Move irq resource handling out of spinlocked region
> > >
> > > Boot failure log from Droid 4:
> > > [ ... snip snip ..]
> > >
> > > Droid 4 boots current master again after applying the patch below
> > > (which is git revet of above patch, but I provide the patch, since
> > > it did not revet cleanly).
> > 
> > Hmm. Do you actually need the full revert?
> > 
> > I think it's only the __setup_irq() part that looks like it may be garbage.
> > 
> > For example, I think it releases the resources twice if the
> > __irq_set_trigger() call fails.
> 
> Yes, I missed that. Sorry.
> 
> > But it looks questionably in other ways too - notably, the change to
> > make the request call be in the same context as the freeing is done is
> > apparently done entirely for symmetry reasons, not for any actual
> > *reason* reasons.
> 
> There is a reasons reason. The whole purpose was to move out the
> request/free resources call from the spinlocked and irq disabled reason.
> I noticed the free ordering issue, when I was working on that.
> 
> The fact that the patch breaks the OMAP boot, points to something else.
> 
> The only user of the irq_request_resources() callback at the moment is the
> GPIO subsystem and some pinctrl drivers, which are not involved in the OMAP
> case. In case of OMAP it uses the gpiolib generic implementation which
> does:
> 
>   try_module_get(chip->gpiodev->owner);
>   gpiochip_lock_as_irq(chip, d->hwirq);

So Tony actually provided the part of dmesg which shows the initial
failure, which subsequently leads to the splat Sebastian reported.

Unhandled fault: external abort on non-linefetch (0x1028) at 0xfb050034
pgd = c0004000 [fb050034] *pgd=49011452(bad)
Internal error: : 1028 [#1] SMP ARM
Workqueue: events deferred_probe_work_func
task: ce1d41c0 task.stack: ce1fc000
PC is at omap_gpio_get_direction+0x2c/0x44
LR is at _raw_spin_lock_irqsave+0x40/0x4c
pc : []lr : []psr: 6093
sp : ce1fdb78  ip : c0dce42c  fp : ce22d810
r10: ce22d800  r9 :   r8 : ce22d900
r7 : 0016  r6 : ce223864  r5 : fb050034  r4 : 0020
r3 : ce1d41c0  r2 :   r1 : a013  r0 : a013
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 80004019  DAC: 0051
Process kworker/0:1 (pid: 14, stack limit = 0xce1fc218)

The callstack is:

omap_gpio_get_direction
gpiochip_lock_as_irq
gpiochip_irq_reqres
__setup_irq
request_threaded_irq
smc_probe
smc_drv_probe
platform_drv_probe


So the SMC91X network driver request an IRQ, which ends up calling into the
GPIO interrupt setup and that fails. I have no idea why that would not fail
with the patch reverted. Dusting off a Beaglebone board

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Thomas Gleixner

On Mon, 10 Jul 2017, Linus Torvalds wrote:
> On Mon, Jul 10, 2017 at 6:35 AM, Sebastian Reichel
>  wrote:
> >
> > This patch apparently breaks OMAP platform:
> >
> > 46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
> > commit 46e48e257360f0845fe17089713cbad4db611e70
> > Author: Thomas Gleixner 
> > Date:   Thu Jun 29 23:33:38 2017 +0200
> >
> > genirq: Move irq resource handling out of spinlocked region
> >
> > Boot failure log from Droid 4:
> > [ ... snip snip ..]
> >
> > Droid 4 boots current master again after applying the patch below
> > (which is git revet of above patch, but I provide the patch, since
> > it did not revet cleanly).
> 
> Hmm. Do you actually need the full revert?
> 
> I think it's only the __setup_irq() part that looks like it may be garbage.
> 
> For example, I think it releases the resources twice if the
> __irq_set_trigger() call fails.

Yes, I missed that. Sorry.

> But it looks questionably in other ways too - notably, the change to
> make the request call be in the same context as the freeing is done is
> apparently done entirely for symmetry reasons, not for any actual
> *reason* reasons.

There is a reasons reason. The whole purpose was to move out the
request/free resources call from the spinlocked and irq disabled reason.
I noticed the free ordering issue, when I was working on that.

The fact that the patch breaks the OMAP boot, points to something else.

The only user of the irq_request_resources() callback at the moment is the
GPIO subsystem and some pinctrl drivers, which are not involved in the OMAP
case. In case of OMAP it uses the gpiolib generic implementation which
does:

  try_module_get(chip->gpiodev->owner);
  gpiochip_lock_as_irq(chip, d->hwirq);

I have no idea at the moment why this would break anything. The double
release in the __irq_set_trigger() error path is the only issue I can find
there.

Sebastian, can you please provide a .config and a full boot log, preferably
with initcall_debug on the kernel command line?

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Linus Torvalds

On Mon, Jul 10, 2017 at 1:15 PM, Sebastian Reichel
 wrote:
>>
>> So Sebastian, can you test if it's ok to revert just the __setup_irq()
>> part, but leave the smaller part in __free_irq() that just moves the
>> irq_release_resources() around at freeing time?
>
> Looking at my patch it implements what you describe (by coincidence,
> since git revert could not do a clean revert) as far as I can see.

Heh, yes, I didn't look at your patch as much as I looked at the
revert description.

And yes, going back to look at your patch it looks like it only
reverts the __setup_irq() parts.

Waiting for Thomas to comment on this whole thing..

Linus

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Sebastian Reichel

Hi Linus,

On Mon, Jul 10, 2017 at 10:01:22AM -0700, Linus Torvalds wrote:
> On Mon, Jul 10, 2017 at 6:35 AM, Sebastian Reichel
>  wrote:
> >
> > This patch apparently breaks OMAP platform:
> >
> > 46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
> > commit 46e48e257360f0845fe17089713cbad4db611e70
> > Author: Thomas Gleixner 
> > Date:   Thu Jun 29 23:33:38 2017 +0200
> >
> > genirq: Move irq resource handling out of spinlocked region
> >
> > Boot failure log from Droid 4:
> > [ ... snip snip ..]
> >
> > Droid 4 boots current master again after applying the patch below
> > (which is git revet of above patch, but I provide the patch, since
> > it did not revet cleanly).
> 
> Hmm. Do you actually need the full revert?

It's technically not a full revert - I actually did not revert the
__free_irq changes.

> I think it's only the __setup_irq() part that looks like it may be garbage.
> 
> For example, I think it releases the resources twice if the
> __irq_set_trigger() call fails.
> 
> But it looks questionably in other ways too - notably, the change to
> make the request call be in the same context as the freeing is done is
> apparently done entirely for symmetry reasons, not for any actual
> *reason* reasons.
> 
> So I suspect just the __setup_irq() parts should be reverted, because
> they look both buggy and pointless. But the actual *real* part of the
> patch was the two-liner __free_irq() part, and that looks sane to me.
> 
> So Sebastian, can you test if it's ok to revert just the __setup_irq()
> part, but leave the smaller part in __free_irq() that just moves the
> irq_release_resources() around at freeing time?

Looking at my patch it implements what you describe (by coincidence,
since git revert could not do a clean revert) as far as I can see.
It seems Pavel also understood it this way, since his patch is identical 
to the one I provided.

-- Sebastian


signature.asc
Description: PGP signature

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Pavel Machek

Hi!

> > This patch apparently breaks OMAP platform:
> >
> > 46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
> > commit 46e48e257360f0845fe17089713cbad4db611e70
> > Author: Thomas Gleixner 
> > Date:   Thu Jun 29 23:33:38 2017 +0200
> >
> > genirq: Move irq resource handling out of spinlocked region
> >
> > Boot failure log from Droid 4:
> > [ ... snip snip ..]
> >
> > Droid 4 boots current master again after applying the patch below
> > (which is git revet of above patch, but I provide the patch, since
> > it did not revet cleanly).
> 
> Hmm. Do you actually need the full revert?
> 
> I think it's only the __setup_irq() part that looks like it may be garbage.
> 
> For example, I think it releases the resources twice if the
> __irq_set_trigger() call fails.
> 
> But it looks questionably in other ways too - notably, the change to
> make the request call be in the same context as the freeing is done is
> apparently done entirely for symmetry reasons, not for any actual
> *reason* reasons.
> 
> So I suspect just the __setup_irq() parts should be reverted, because
> they look both buggy and pointless. But the actual *real* part of the
> patch was the two-liner __free_irq() part, and that looks sane to me.
> 
> So Sebastian, can you test if it's ok to revert just the __setup_irq()
> part, but leave the smaller part in __free_irq() that just moves the
> irq_release_resources() around at freeing time?

If I understood it correctly, you wanted to test this:

And yes, this is enough to fix boot on N900 for me.

Thanks,
Pavel

commit 285358d48dec82f13fa76724bff434897883d188
Author: Pavel 
Date:   Mon Jul 10 21:35:25 2017 +0200

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 5624b2d..528bfc3 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1168,14 +1168,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
new->flags &= ~IRQF_ONESHOT;
 
mutex_lock(&desc->request_mutex);
-   if (!desc->action) {
-   ret = irq_request_resources(desc);
-   if (ret) {
-   pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
-  new->name, irq, desc->irq_data.chip->name);
-   goto out_mutex;
-   }
-   }
 
chip_bus_lock(desc);
 
@@ -1279,6 +1271,13 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
}
 
if (!shared) {
+   ret = irq_request_resources(desc);
+   if (ret) {
+   pr_err("Failed to request resources for %s (irq %d) on 
irqchip %s\n",
+  new->name, irq, desc->irq_data.chip->name);
+   goto out_unlock;
+   }
+
init_waitqueue_head(&desc->wait_for_threads);
 
/* Setup the type (level, edge polarity) if configured: */
@@ -1387,10 +1386,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, 
struct irqaction *new)
 
chip_bus_sync_unlock(desc);
 
-   if (!desc->action)
-   irq_release_resources(desc);
-
-out_mutex:
mutex_unlock(&desc->request_mutex);
 
 out_thread:


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Linus Torvalds

On Mon, Jul 10, 2017 at 6:35 AM, Sebastian Reichel
 wrote:
>
> This patch apparently breaks OMAP platform:
>
> 46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
> commit 46e48e257360f0845fe17089713cbad4db611e70
> Author: Thomas Gleixner 
> Date:   Thu Jun 29 23:33:38 2017 +0200
>
> genirq: Move irq resource handling out of spinlocked region
>
> Boot failure log from Droid 4:
> [ ... snip snip ..]
>
> Droid 4 boots current master again after applying the patch below
> (which is git revet of above patch, but I provide the patch, since
> it did not revet cleanly).

Hmm. Do you actually need the full revert?

I think it's only the __setup_irq() part that looks like it may be garbage.

For example, I think it releases the resources twice if the
__irq_set_trigger() call fails.

But it looks questionably in other ways too - notably, the change to
make the request call be in the same context as the freeing is done is
apparently done entirely for symmetry reasons, not for any actual
*reason* reasons.

So I suspect just the __setup_irq() parts should be reverted, because
they look both buggy and pointless. But the actual *real* part of the
patch was the two-liner __free_irq() part, and that looks sane to me.

So Sebastian, can you test if it's ok to revert just the __setup_irq()
part, but leave the smaller part in __free_irq() that just moves the
irq_release_resources() around at freeing time?

Thomas? Comments?

   Linus

Re: [GIT pull] irq updates for 4.13

2017-07-10 Thread Sebastian Reichel

Hi,

On Sun, Jul 09, 2017 at 10:49:57AM +0200, Thomas Gleixner wrote:
>   - Move the interrupt resource management logic out of the spin locked,
> irq disabled region to avoid unnecessary restrictions of the resource
> callbacks

This patch apparently breaks OMAP platform:

46e48e257360f0845fe17089713cbad4db611e70 is the first bad commit
commit 46e48e257360f0845fe17089713cbad4db611e70
Author: Thomas Gleixner 
Date:   Thu Jun 29 23:33:38 2017 +0200

genirq: Move irq resource handling out of spinlocked region

Boot failure log from Droid 4:

[1.346984] cpcap-core spi1.0: CPCAP vendor: ST rev: 2.10 (1a)
[1.354766] Unhandled fault: imprecise external abort (0x1406) at 0xfeff
[1.354797] [ cut here ]
[1.354827] WARNING: CPU: 0 PID: 0 at drivers/bus/omap_l3_noc.c:147 
l3_interrupt_handler+0x21c/0x348
[1.354827] 4400.ocp:L3 Custom Error: MASTER MPU TARGET L4CFG (Read): 
Data Access in User mode during Functional access
[1.354827] Modules linked in:
[1.354858] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
4.12.0-10335-gb8c0c26d6667 #516
[1.354858] Hardware name: Generic OMAP4 (Flattened Device Tree)
[1.354888] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[1.354888] [] (show_stack) from [] 
(dump_stack+0xac/0xe0)
[1.354919] [] (dump_stack) from [] (__warn+0xd8/0x104)
[1.354919] [] (__warn) from [] 
(warn_slowpath_fmt+0x34/0x44)
[1.354919] [] (warn_slowpath_fmt) from [] 
(l3_interrupt_handler+0x21c/0x348)
[1.354949] [] (l3_interrupt_handler) from [] 
(__handle_irq_event_percpu+0x48/0x3b4)
[1.354949] [] (__handle_irq_event_percpu) from [] 
(handle_irq_event_percpu+0x1c/0x58)
[1.354949] [] (handle_irq_event_percpu) from [] 
(handle_irq_event+0x38/0x5c)
[1.354980] [] (handle_irq_event) from [] 
(handle_fasteoi_irq+0xb4/0x178)
[1.354980] [] (handle_fasteoi_irq) from [] 
(generic_handle_irq+0x20/0x34)
[1.354980] [] (generic_handle_irq) from [] 
(__handle_domain_irq+0x64/0xe0)
[1.354980] [] (__handle_domain_irq) from [] 
(gic_handle_irq+0x54/0xb8)
[1.355010] [] (gic_handle_irq) from [] 
(__irq_svc+0x70/0x98)
[1.355010] Exception stack(0xc1001f38 to 0xc1001f80)
[1.355010] 1f20:   
0001 0001
[1.355041] 1f40:  c100abc0 c100 c1007bcc c1007b68 c0f8a938 
c1007f68 c1046133
[1.355041] 1f60:    c1001f88 c019a828 c010822c 
2013 
[1.355041] [] (__irq_svc) from [] 
(arch_cpu_idle+0x20/0x3c)
[1.355072] [] (arch_cpu_idle) from [] 
(do_idle+0x168/0x21c)
[1.355072] [] (do_idle) from [] 
(cpu_startup_entry+0x18/0x1c)
[1.355072] [] (cpu_startup_entry) from [] 
(start_kernel+0x348/0x3c0)
[1.355102] [] (start_kernel) from [<8000807c>] (0x8000807c)
[1.355133] ---[ end trace 03269d8f047e066b ]---
[1.584167] pgd = c0004000
[1.586914] [feff] *pgd=
[1.590515] Internal error: : 1406 [#1] SMP ARM
[1.595062] Modules linked in:
[1.598144] CPU: 1 PID: 1 Comm: swapper/0 Tainted: GW   
4.12.0-10335-gb8c0c26d6667 #516
[1.607238] Hardware name: Generic OMAP4 (Flattened Device Tree)
[1.613281] task: ee8aae00 task.stack: ee8ac000
[1.617858] PC is at debug_lockdep_rcu_enabled+0x38/0x48
[1.623199] LR is at lock_release+0x25c/0x360
[1.627593] pc : []lr : []psr: 2093
[1.633880] sp : ee8adb80  ip : c10fe40c  fp : eea0fc10
[1.639160] r10: 0001  r9 : c10f4230  r8 : 6093
[1.644409] r7 : c1007b68  r6 : c051b3f8  r5 : eea0ce74  r4 : a013
[1.650970] r3 : ee8aae00  r2 : 0001  r1 : 0003  r0 : 001f
[1.657531] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
none
[1.664825] Control: 10c5387d  Table: 8000404a  DAC: 0051
[1.670593] Process swapper/0 (pid: 1, stack limit = 0xee8ac218)
[1.676635] Stack: (0xee8adb80 to 0xee8ae000)
[1.681030] db80: a013 c051b3e8 0007 a013 eea0ce64 eea0ce64 
0007 eea0fd04
[1.689270] dba0: c06208d8 eea0fc00 eea0fc10 c0af76ec  fc310134 
eea0ce64 c051b3f8
[1.697479] dbc0: 0020 eea0cea4 eea0cc70  eea0fd04 c0514430 
eea0cea4 eea0fc10
[1.705718] dbe0: eef44cc0 c0514960 eea0fc00 0021 eef44cc0 c01ac370 
ee8000c0 6013
[1.713958] dc00: eed2d57c eef44cc0  c01aa628 eef42000 0021 
c06208d8 eea0fc00
[1.722198] dc20: eea0fc10 c01ac73c 2084 eef42000 0204 c10a88cc 
0001 eee8a200
[1.730407] dc40:   0010 c0621460 c0da9f6c eef42000 
 0111
[1.738647] dc60: 0021 0084 0084 eee8a200 eef22610 0021 
0084 eef23800
[1.746887] dc80: 0010 eef2513c c0b69968 c0621618 c10a88cc ee8adc9c 
0004 eef23800
[1.755126] dca0: eef23800 eee8a200 0021 eef22710 eef25010 c062d618 
 c10a88cc
[1.763366] dcc0: eef22718 ee8aae00 0001  c10a88cc 0010 
 
[1.77

Re: [GIT pull] irq updates for 4.13

2017-07-06 Thread Max Gurtovoy




On 7/4/2017 11:48 PM, Max Gurtovoy wrote:



On 7/4/2017 10:10 PM, Thomas Gleixner wrote:

On Tue, 4 Jul 2017, Linus Torvalds wrote:

On Tue, Jul 4, 2017 at 8:17 AM, Jens Axboe  wrote:

On 07/03/2017 06:00 PM, Linus Torvalds wrote:


If they ever do come online, does that get fixed? I don't know.
Somebody should check.


Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
go, so that part is fine. That's exercised everytime the laptop is
suspended and resumed.


I don't think that's true any more. Commit fe631457ff3e changed it to
map the initial CPU's sequentially whether they are online or not.
Only after you run out of hardware queues will we start playing games.

That's what worries me about the conflict - the two changes did very
different things to the same code. I'd really like somebody to take a
look at my resolution, and just in general how those two different
changes work together.


Hmm, I leave that to Christoph. He wrote the irq stuff and reviewed
fe631457ff3e.

Thanks,

tglx



Hi Linus,
From code reviewing the changes you made during the merge, I'm good with
my commit purpose (fix the mapping between CPUs and HWQs). You actually
replaced the "struct cpumask *online_mask" with cpu_online(cpu) function
and it's fine. I'll run some tests to see that I'm not missing something.
Regarding the second patch from Christoph, let's wait for his review (I
can also test other scenarios like offline/online CPU's after initial
mapping of CPUs to HWQs).

Cheers,
Max.


FYI,
My tests for the mapping between (72) CPUs and (64) HWQs passed 
successfully with 4.12.0+.

Re: [GIT pull] irq updates for 4.13

2017-07-05 Thread Christoph Hellwig

On Mon, Jul 03, 2017 at 05:00:03PM -0700, Linus Torvalds wrote:
> I'm not at all understanding why that second commit came in through
> the irq tree at all, in fact. Very annoying. Why was that not sent
> through the block tree? It doesn't seem to have anything fundamentally
> to do with irqs, really: it's a driver CPU choice for irq chocie.

It depends on a major IRQ layer rework, and it had at that time no
clash with the block tree at all.  That's why I suggested to Thomas
and Jens that we should take it through the irq code.  Then Max'
code showed up last minute and wrecked that plan, and no one noticed
in time.  Blame it on me for not noticing this in time.

> Anyway, I absolutely detested that code, and the obvious resolution
> was too disgusting to live. So I did an evil merge and moved some
> things around in the merge to make it at least not cause me to dig my
> eyes out.
> 
> But I'd like people to look at that - not so much due to the evil
> merge itself (but check that too, by any means), but just because the
> code seems fundamentally broken for the hotplug case. We end up
> picking a possible metric shit-ton of CPU's for queue 0, if they were
> "possible but not online".

I think this code also needs to move over to cpu_possible instead
of cpu_online to match what we did for the MSI-X based mapping.  But
I'll need a little more coffee and get back into it.

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Jens Axboe

On 07/04/2017 12:34 PM, Linus Torvalds wrote:
> On Tue, Jul 4, 2017 at 8:17 AM, Jens Axboe  wrote:
>> On 07/03/2017 06:00 PM, Linus Torvalds wrote:
>>>
>>> If they ever do come online, does that get fixed? I don't know.
>>> Somebody should check.
>>
>> Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
>> go, so that part is fine. That's exercised everytime the laptop is
>> suspended and resumed.
> 
> I don't think that's true any more. Commit fe631457ff3e changed it to
> map the initial CPU's sequentially whether they are online or not.
> Only after you run out of hardware queues will we start playing games.
> 
> That's what worries me about the conflict - the two changes did very
> different things to the same code. I'd really like somebody to take a
> look at my resolution, and just in general how those two different
> changes work together.

OK, I'll take a look at it tomorrow.

-- 
Jens Axboe

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Max Gurtovoy




On 7/4/2017 10:10 PM, Thomas Gleixner wrote:

On Tue, 4 Jul 2017, Linus Torvalds wrote:

On Tue, Jul 4, 2017 at 8:17 AM, Jens Axboe  wrote:

On 07/03/2017 06:00 PM, Linus Torvalds wrote:


If they ever do come online, does that get fixed? I don't know.
Somebody should check.


Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
go, so that part is fine. That's exercised everytime the laptop is
suspended and resumed.


I don't think that's true any more. Commit fe631457ff3e changed it to
map the initial CPU's sequentially whether they are online or not.
Only after you run out of hardware queues will we start playing games.

That's what worries me about the conflict - the two changes did very
different things to the same code. I'd really like somebody to take a
look at my resolution, and just in general how those two different
changes work together.


Hmm, I leave that to Christoph. He wrote the irq stuff and reviewed
fe631457ff3e.

Thanks,

tglx



Hi Linus,
From code reviewing the changes you made during the merge, I'm good 
with my commit purpose (fix the mapping between CPUs and HWQs). You 
actually replaced the "struct cpumask *online_mask" with cpu_online(cpu) 
function and it's fine. I'll run some tests to see that I'm not missing 
something.
Regarding the second patch from Christoph, let's wait for his review (I 
can also test other scenarios like offline/online CPU's after initial 
mapping of CPUs to HWQs).


Cheers,
Max.

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Thomas Gleixner

On Tue, 4 Jul 2017, Linus Torvalds wrote:
> On Tue, Jul 4, 2017 at 8:17 AM, Jens Axboe  wrote:
> > On 07/03/2017 06:00 PM, Linus Torvalds wrote:
> >>
> >> If they ever do come online, does that get fixed? I don't know.
> >> Somebody should check.
> >
> > Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
> > go, so that part is fine. That's exercised everytime the laptop is
> > suspended and resumed.
> 
> I don't think that's true any more. Commit fe631457ff3e changed it to
> map the initial CPU's sequentially whether they are online or not.
> Only after you run out of hardware queues will we start playing games.
> 
> That's what worries me about the conflict - the two changes did very
> different things to the same code. I'd really like somebody to take a
> look at my resolution, and just in general how those two different
> changes work together.

Hmm, I leave that to Christoph. He wrote the irq stuff and reviewed
fe631457ff3e.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Linus Torvalds

On Tue, Jul 4, 2017 at 8:17 AM, Jens Axboe  wrote:
> On 07/03/2017 06:00 PM, Linus Torvalds wrote:
>>
>> If they ever do come online, does that get fixed? I don't know.
>> Somebody should check.
>
> Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
> go, so that part is fine. That's exercised everytime the laptop is
> suspended and resumed.

I don't think that's true any more. Commit fe631457ff3e changed it to
map the initial CPU's sequentially whether they are online or not.
Only after you run out of hardware queues will we start playing games.

That's what worries me about the conflict - the two changes did very
different things to the same code. I'd really like somebody to take a
look at my resolution, and just in general how those two different
changes work together.

 Linus

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Jens Axboe

On 07/03/2017 06:00 PM, Linus Torvalds wrote:
> But I'd like people to look at that - not so much due to the evil
> merge itself (but check that too, by any means), but just because the
> code seems fundamentally broken for the hotplug case. We end up
> picking a possible metric shit-ton of CPU's for queue 0, if they were
> "possible but not online".
> 
> If they ever do come online, does that get fixed? I don't know.
> Somebody should check.

Yes, the blk-mq cpu hotplug code updates mappings when CPUs come and
go, so that part is fine. That's exercised everytime the laptop is
suspended and resumed.

-- 
Jens Axboe

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Thomas Gleixner

On Tue, 4 Jul 2017, Thomas Gleixner wrote:
> On Mon, 3 Jul 2017, Linus Torvalds wrote:
> 
> > On Mon, Jul 3, 2017 at 12:42 AM, Thomas Gleixner  wrote:
> > >
> > > please pull the latest irq-core-for-linus git tree from:
> > >
> > >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> > > irq-core-for-linus
> > 
> > Ugh, this caused conflicts with the block tree, with commits
> > 
> >  - fe631457ff3e: "blk-mq: map all HWQ also in hyperthreaded system"
> > 
> >  - 5f042e7cbd9e "blk-mq: Include all present CPUs in the default queue 
> > mapping"
> > 
> > clashing.
> > 
> > I'm not at all understanding why that second commit came in through
> > the irq tree at all, in fact. Very annoying. Why was that not sent
> > through the block tree? It doesn't seem to have anything fundamentally
> > to do with irqs, really: it's a driver CPU choice for irq chocie.
> 
> There is a dependency. The changes in the block code rely on the new
> features of the generic interrupt affinity management. See below.

That said, we should have kept the colliding changes back, wait until
block and irq got merged and then apply them again properly.

Sorry for the inconveniance.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-04 Thread Thomas Gleixner

On Mon, 3 Jul 2017, Linus Torvalds wrote:

> On Mon, Jul 3, 2017 at 12:42 AM, Thomas Gleixner  wrote:
> >
> > please pull the latest irq-core-for-linus git tree from:
> >
> >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> > irq-core-for-linus
> 
> Ugh, this caused conflicts with the block tree, with commits
> 
>  - fe631457ff3e: "blk-mq: map all HWQ also in hyperthreaded system"
> 
>  - 5f042e7cbd9e "blk-mq: Include all present CPUs in the default queue 
> mapping"
> 
> clashing.
> 
> I'm not at all understanding why that second commit came in through
> the irq tree at all, in fact. Very annoying. Why was that not sent
> through the block tree? It doesn't seem to have anything fundamentally
> to do with irqs, really: it's a driver CPU choice for irq chocie.

There is a dependency. The changes in the block code rely on the new
features of the generic interrupt affinity management. See below.

> Anyway, I absolutely detested that code, and the obvious resolution
> was too disgusting to live. So I did an evil merge and moved some
> things around in the merge to make it at least not cause me to dig my
> eyes out.
> 
> But I'd like people to look at that - not so much due to the evil
> merge itself (but check that too, by any means), but just because the
> code seems fundamentally broken for the hotplug case. We end up
> picking a possible metric shit-ton of CPU's for queue 0, if they were
> "possible but not online".

The mechanism is:

Spread out the queues and the associated interrupts accross the possible
CPUs. This results in a queue/interrupt per group of CPUs (group can be a
single CPU)

If a group is offline, then the interrupt is kept in managed shutdown
mode. If a CPU of the group comes online then the core management starts up
the interrupt and makes it affine to that CPU.

If the last CPU of a group goes offline, the interrupt is not moved to some
random other CPU. It's put in managed shutdown mode and then restarted when
the a CPU of the group comes online again.

That exercise avoids exactly the 'metric tons of irqs' moved to random CPUs
and then brought back to the target CPUs when they come online
again. On/offline seems to be (ab)used frequently for power management
purposes nowadays.

Sorry, if I did not make that clear enough in the pull request message.

Thanks,

tglx

Re: [GIT pull] irq updates for 4.13

2017-07-03 Thread Linus Torvalds

On Mon, Jul 3, 2017 at 12:42 AM, Thomas Gleixner  wrote:
>
> please pull the latest irq-core-for-linus git tree from:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> irq-core-for-linus

Ugh, this caused conflicts with the block tree, with commits

 - fe631457ff3e: "blk-mq: map all HWQ also in hyperthreaded system"

 - 5f042e7cbd9e "blk-mq: Include all present CPUs in the default queue mapping"

clashing.

I'm not at all understanding why that second commit came in through
the irq tree at all, in fact. Very annoying. Why was that not sent
through the block tree? It doesn't seem to have anything fundamentally
to do with irqs, really: it's a driver CPU choice for irq chocie.

Anyway, I absolutely detested that code, and the obvious resolution
was too disgusting to live. So I did an evil merge and moved some
things around in the merge to make it at least not cause me to dig my
eyes out.

But I'd like people to look at that - not so much due to the evil
merge itself (but check that too, by any means), but just because the
code seems fundamentally broken for the hotplug case. We end up
picking a possible metric shit-ton of CPU's for queue 0, if they were
"possible but not online".

If they ever do come online, does that get fixed? I don't know.
Somebody should check.

Linus

52 matches

Mail list logo