Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-28 Thread Thomas Gleixner
On Tue, 25 Aug 2015, Felipe Balbi wrote:
 Hi Ingo,

Thanks for not cc'ing the irq maintainer 
 
 I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
 devm_request_*irq().
 
 If we using devm_request_*irq(), that irq will be freed after device
 drivers' -remove() gets called. If on -remove(), we're calling
 pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
 gated and, because we do an extra call to the device's IRQ handler when
 CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
 IRQ handler, we try to read a register which is clocked by the device's
 clock.
 
 This is, of course, really old code which has been in tree for many,
 many years. I guess nobody has been running their tests in the setup
 mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
 -remove(), a register read on IRQ handler, and a shared IRQ handler),
 so that's why we never caught this before.
 
 Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
 if driver *must* be ready to receive, and handle, an IRQ even during
 module removal, I wonder what the IRQ handler should do. We can't, in
 most cases, call pm_runtime_put_sync() from IRQ handler.

Well, a shared interrupt handler must handle this situation, no matter
what. Assume the following:

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;
u32 state;

state = readl(dd-base);
...
}

void module_exit(void)
{   
/* Write to the device interrupt register */
disable_device_irq(dd-base);
/*
 * After this point the device does not longer
 * raise an interrupt
 */
iounmap(dd-base);
free_irq();

If the other device which shares the interrupt line raises an
interrupt after the unmap and before free_irq() removed the device
handler from the irq, the machine is toast, because the dev_irq
handler is still called.

If the handler is shut down after critical parts of the driver/device
are shut down, then you can 

 - either can change the setup/teardown ordering

disable_device_irq(dd-base);
free_irq();
iounmap(dd-base);

 - or have a proper flag in the private data which tells the interrupt
   handler to sod off.

irqreturn_t dev_irq(int irq, void *data)
{
struct devdata *dd = data;

if (dd-shutdown)
return IRQ_NONE;
...

void module_exit(void)
{   
disable_device_irq(dd-base);
dd-shutdown = 1;

/* On an SMP machine you also need: */  
synchronize_irq(dd-irq);

So for the problem at hand, the devm magic needs to make sure that the
crucial parts are still alive when the devm allocated irq is released.

I have no idea how that runtime PM stuff is integrated into devm (I
fear not at all), so it's hard to give you a proper advise on that.

Thanks,

tglx
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-27 Thread Felipe Balbi
On Wed, Aug 26, 2015 at 05:36:24PM -0300, Ezequiel Garcia wrote:
 On 26 August 2015 at 17:24, Felipe Balbi ba...@ti.com wrote:
 [..]
 
  static irqreturn_t tw68_irq(int irq, void *dev_id)
  {
  struct tw68_dev *dev = dev_id;
  u32 status, orig;
  int loop;
 
  status = orig = tw_readl(TW68_INTSTAT)  dev-pci_irqmask;
 
  Now try to read that register when your clock is gated. That's the
  problem I'm talking about. Everything about the handler is functioning
  correctly; however clocks are gated in -remove() and free_irq() is
  only called *AFTER* -remove() has returned.
 
 
 Yeah, it's pretty clear you are talking about clocks here. That's
 why I said read won't stall in the next paragraph.
 
  [etc]
  }
 
  The IRQ handler accesses the device struct and then
  reads through PCI. So if you use devm_request_irq
  you need to make sure the device struct is still allocated
  after remove(), and the PCI read won't stall or crash.
 
  dude, that's not the problem I'm talking about. I still have my
  private_data around, what I don't have is:
 
__
__ ____| | ___   ___| | __
   / _` |  / __| |/ _ \ / __| |/ /
  | (_| | | (__| | (_) | (__|   
   \__,_|  \___|_|\___/ \___|_|\_\
 
 
 
 Yes, *you* may have your private data around and have a clock gated,
 others (the tw68 for instance) may have its region released and unmapped.
 
 And yet others may have $whatever resource released in the
 remove() and assume it's available in the IRQ handler.
 
 I honestly can't think why using request_irq / free_irq to solve this
 is a workaround.

because it'll, eventually, boil down to not using devm_* at all and
that's pretty stupid.

-- 
balbi


signature.asc
Description: Digital signature


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Ezequiel Garcia
On 26 August 2015 at 17:03, Felipe Balbi ba...@ti.com wrote:
 On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote:
 On 26 August 2015 at 16:38, Felipe Balbi ba...@ti.com wrote:
  Hi,
 
  On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
  Felipe,
 
  On 25 August 2015 at 16:58, Felipe Balbi ba...@ti.com wrote:
   Hi Ingo,
  
   I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
   devm_request_*irq().
  
 
  I may be jumping on the gun here, but I believe here's your problem.
  Using devm_request_irq with shared IRQs is not a good idea.
 
  Or at least, it forces you to handle interrupts after your device
  is _completely_ removed (e.g. your IRQ cookie could be bogus).
 
  AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
  spurious interrupts, that are expected to happen anyway.
 
  Your IRQ is shared, and so you may get any IRQ at any time,
  coming from another device (not yours).
 
  So, if I'm right, my suggestion is simple: use request_irq/free_irq
  and free your IRQ before you disable your clocks, remove your device,
  etc.
 
  yeah, that's just a workaround though. Specially with IRQ flags coming
  from DT, driver might have no knowledge that its IRQ is shared to start
  with.
 

 Really? Is there any way the DT can set the IRQF_SHARED flag?
 The caller of devm_request_irq / request_irq needs to pass the irqflags,
 so I'd say the driver is perfectly aware of the IRQ being shared or not.

 Maybe you can clarify what I'm missing here.

 hmm, that's true. Now that I look at it, DT can pass triggering flags.

  Besides, removing devm_* is just a workaround to the real problem. It
  seems, to me at least, that drivers shouldn't be calling
  pm_runtime_put_sync(); pm_runtime_disable() from their -remove(),
  rather the bus driver should be responsible for doing so; much
  usb_bus_type and pci_bus_type do. Of course, this has the side effect of
  requiring buses to make sure that by the time -probe() is called, that
  device's clocks are stable and running and the device is active.
 
  However, that's not done today for the platform_bus_type and, frankly,
  that would be somewhat of a complex problem to solve, considering that
  different SoCs integrate IPs the way they want.
 
  Personally, I think removing devm_* is but a workaround to the real
  thing we're dealing with.
 

 I don't see any problems here: if your interrupt is shared, then you must
 be prepared to handle it any time, coming from any sources (not only
 your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
 make sure all the drivers passing IRQF_SHARED comply with that rule.

 you need to be sure of that with non-shared IRQs anyway.

Not entirely. If your IRQ is not shared, then you usually have a register
to enable or unmask your peripheral interrupts. So the driver is in control
of when it will get interrupts.

If the IRQ is shared, this won't do. This is what I mean by shared IRQs
must be prepared to receive an interrupt any time, in the sense that
the driver has no way of preventing IRQs (because they may be
coming from any source).

In the same sense, shared IRQs handlers need to double-check
the IRQ is coming to the current device by checking some IRQ
status register to see if there's pending work.

 Also, an IRQ
 which isn't shared in SoC A, might become shared in SoC B which uses the
 same IP.

 So you either avoid using devm_request_irq, or you prepare your handler
 accordingly to be ready to handle an interrupt _any time_.

 the handler is ready to handle at any time, what isn't correct is the
 fact that clocks get gated before IRQ is freed.

 There should be no such special case as if your handler is shared,
 don't use devm_request_*irq() because if we just disable PM_RUNTIME, it
 works as expected anyway.


Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED
then you _must_ be prepared to get an IRQ *after* your remove() has
been called.

Let's consider this snippet from tw68:

static irqreturn_t tw68_irq(int irq, void *dev_id)
{
struct tw68_dev *dev = dev_id;
u32 status, orig;
int loop;

status = orig = tw_readl(TW68_INTSTAT)  dev-pci_irqmask;
[etc]
}

The IRQ handler accesses the device struct and then
reads through PCI. So if you use devm_request_irq
you need to make sure the device struct is still allocated
after remove(), and the PCI read won't stall or crash.

Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)

Still, I don't think that's a good idea, since it relies on
the IRQ being freed *before* the device struct.
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Ezequiel Garcia
Felipe,

On 25 August 2015 at 16:58, Felipe Balbi ba...@ti.com wrote:
 Hi Ingo,

 I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
 devm_request_*irq().


I may be jumping on the gun here, but I believe here's your problem.
Using devm_request_irq with shared IRQs is not a good idea.

Or at least, it forces you to handle interrupts after your device
is _completely_ removed (e.g. your IRQ cookie could be bogus).

AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
spurious interrupts, that are expected to happen anyway.

Your IRQ is shared, and so you may get any IRQ at any time,
coming from another device (not yours).

So, if I'm right, my suggestion is simple: use request_irq/free_irq
and free your IRQ before you disable your clocks, remove your device, etc.

 If we using devm_request_*irq(), that irq will be freed after device
 drivers' -remove() gets called. If on -remove(), we're calling
 pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
 gated and, because we do an extra call to the device's IRQ handler when
 CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
 IRQ handler, we try to read a register which is clocked by the device's
 clock.

 This is, of course, really old code which has been in tree for many,
 many years. I guess nobody has been running their tests in the setup
 mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
 -remove(), a register read on IRQ handler, and a shared IRQ handler),
 so that's why we never caught this before.

 Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
 if driver *must* be ready to receive, and handle, an IRQ even during
 module removal, I wonder what the IRQ handler should do. We can't, in
 most cases, call pm_runtime_put_sync() from IRQ handler.

 I'm guessing the only way would be to move pm_runtime calls into the bus
 driver (in this case, the platform_bus) and make sure it only gets
 called after device managed resources are freed ?

 Any hints would be greatly appreciated.


Hope it helps!
-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Felipe Balbi
Hi,

On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
 Felipe,
 
 On 25 August 2015 at 16:58, Felipe Balbi ba...@ti.com wrote:
  Hi Ingo,
 
  I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
  devm_request_*irq().
 
 
 I may be jumping on the gun here, but I believe here's your problem.
 Using devm_request_irq with shared IRQs is not a good idea.
 
 Or at least, it forces you to handle interrupts after your device
 is _completely_ removed (e.g. your IRQ cookie could be bogus).
 
 AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
 spurious interrupts, that are expected to happen anyway.
 
 Your IRQ is shared, and so you may get any IRQ at any time,
 coming from another device (not yours).
 
 So, if I'm right, my suggestion is simple: use request_irq/free_irq
 and free your IRQ before you disable your clocks, remove your device,
 etc.

yeah, that's just a workaround though. Specially with IRQ flags coming
from DT, driver might have no knowledge that its IRQ is shared to start
with.

Besides, removing devm_* is just a workaround to the real problem. It
seems, to me at least, that drivers shouldn't be calling
pm_runtime_put_sync(); pm_runtime_disable() from their -remove(),
rather the bus driver should be responsible for doing so; much
usb_bus_type and pci_bus_type do. Of course, this has the side effect of
requiring buses to make sure that by the time -probe() is called, that
device's clocks are stable and running and the device is active.

However, that's not done today for the platform_bus_type and, frankly,
that would be somewhat of a complex problem to solve, considering that
different SoCs integrate IPs the way they want.

Personally, I think removing devm_* is but a workaround to the real
thing we're dealing with.

-- 
balbi


signature.asc
Description: Digital signature


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Felipe Balbi
On Wed, Aug 26, 2015 at 04:53:27PM -0300, Ezequiel Garcia wrote:
 On 26 August 2015 at 16:38, Felipe Balbi ba...@ti.com wrote:
  Hi,
 
  On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
  Felipe,
 
  On 25 August 2015 at 16:58, Felipe Balbi ba...@ti.com wrote:
   Hi Ingo,
  
   I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
   devm_request_*irq().
  
 
  I may be jumping on the gun here, but I believe here's your problem.
  Using devm_request_irq with shared IRQs is not a good idea.
 
  Or at least, it forces you to handle interrupts after your device
  is _completely_ removed (e.g. your IRQ cookie could be bogus).
 
  AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
  spurious interrupts, that are expected to happen anyway.
 
  Your IRQ is shared, and so you may get any IRQ at any time,
  coming from another device (not yours).
 
  So, if I'm right, my suggestion is simple: use request_irq/free_irq
  and free your IRQ before you disable your clocks, remove your device,
  etc.
 
  yeah, that's just a workaround though. Specially with IRQ flags coming
  from DT, driver might have no knowledge that its IRQ is shared to start
  with.
 
 
 Really? Is there any way the DT can set the IRQF_SHARED flag?
 The caller of devm_request_irq / request_irq needs to pass the irqflags,
 so I'd say the driver is perfectly aware of the IRQ being shared or not.
 
 Maybe you can clarify what I'm missing here.

hmm, that's true. Now that I look at it, DT can pass triggering flags.

  Besides, removing devm_* is just a workaround to the real problem. It
  seems, to me at least, that drivers shouldn't be calling
  pm_runtime_put_sync(); pm_runtime_disable() from their -remove(),
  rather the bus driver should be responsible for doing so; much
  usb_bus_type and pci_bus_type do. Of course, this has the side effect of
  requiring buses to make sure that by the time -probe() is called, that
  device's clocks are stable and running and the device is active.
 
  However, that's not done today for the platform_bus_type and, frankly,
  that would be somewhat of a complex problem to solve, considering that
  different SoCs integrate IPs the way they want.
 
  Personally, I think removing devm_* is but a workaround to the real
  thing we're dealing with.
 
 
 I don't see any problems here: if your interrupt is shared, then you must
 be prepared to handle it any time, coming from any sources (not only
 your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
 make sure all the drivers passing IRQF_SHARED comply with that rule.

you need to be sure of that with non-shared IRQs anyway. Also, an IRQ
which isn't shared in SoC A, might become shared in SoC B which uses the
same IP.

 So you either avoid using devm_request_irq, or you prepare your handler
 accordingly to be ready to handle an interrupt _any time_.

the handler is ready to handle at any time, what isn't correct is the
fact that clocks get gated before IRQ is freed.

There should be no such special case as if your handler is shared,
don't use devm_request_*irq() because if we just disable PM_RUNTIME, it
works as expected anyway.

-- 
balbi


signature.asc
Description: Digital signature


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Ezequiel Garcia
On 26 August 2015 at 16:38, Felipe Balbi ba...@ti.com wrote:
 Hi,

 On Wed, Aug 26, 2015 at 04:29:52PM -0300, Ezequiel Garcia wrote:
 Felipe,

 On 25 August 2015 at 16:58, Felipe Balbi ba...@ti.com wrote:
  Hi Ingo,
 
  I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
  devm_request_*irq().
 

 I may be jumping on the gun here, but I believe here's your problem.
 Using devm_request_irq with shared IRQs is not a good idea.

 Or at least, it forces you to handle interrupts after your device
 is _completely_ removed (e.g. your IRQ cookie could be bogus).

 AFAICS, the CONFIG_DEBUG_SHIRQ option is just triggering a couple
 spurious interrupts, that are expected to happen anyway.

 Your IRQ is shared, and so you may get any IRQ at any time,
 coming from another device (not yours).

 So, if I'm right, my suggestion is simple: use request_irq/free_irq
 and free your IRQ before you disable your clocks, remove your device,
 etc.

 yeah, that's just a workaround though. Specially with IRQ flags coming
 from DT, driver might have no knowledge that its IRQ is shared to start
 with.


Really? Is there any way the DT can set the IRQF_SHARED flag?
The caller of devm_request_irq / request_irq needs to pass the irqflags,
so I'd say the driver is perfectly aware of the IRQ being shared or not.

Maybe you can clarify what I'm missing here.

 Besides, removing devm_* is just a workaround to the real problem. It
 seems, to me at least, that drivers shouldn't be calling
 pm_runtime_put_sync(); pm_runtime_disable() from their -remove(),
 rather the bus driver should be responsible for doing so; much
 usb_bus_type and pci_bus_type do. Of course, this has the side effect of
 requiring buses to make sure that by the time -probe() is called, that
 device's clocks are stable and running and the device is active.

 However, that's not done today for the platform_bus_type and, frankly,
 that would be somewhat of a complex problem to solve, considering that
 different SoCs integrate IPs the way they want.

 Personally, I think removing devm_* is but a workaround to the real
 thing we're dealing with.


I don't see any problems here: if your interrupt is shared, then you must
be prepared to handle it any time, coming from any sources (not only
your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
make sure all the drivers passing IRQF_SHARED comply with that rule.

So you either avoid using devm_request_irq, or you prepare your handler
accordingly to be ready to handle an interrupt _any time_.

-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Ezequiel Garcia
On 26 August 2015 at 17:24, Felipe Balbi ba...@ti.com wrote:
[..]

 static irqreturn_t tw68_irq(int irq, void *dev_id)
 {
 struct tw68_dev *dev = dev_id;
 u32 status, orig;
 int loop;

 status = orig = tw_readl(TW68_INTSTAT)  dev-pci_irqmask;

 Now try to read that register when your clock is gated. That's the
 problem I'm talking about. Everything about the handler is functioning
 correctly; however clocks are gated in -remove() and free_irq() is
 only called *AFTER* -remove() has returned.


Yeah, it's pretty clear you are talking about clocks here. That's
why I said read won't stall in the next paragraph.

 [etc]
 }

 The IRQ handler accesses the device struct and then
 reads through PCI. So if you use devm_request_irq
 you need to make sure the device struct is still allocated
 after remove(), and the PCI read won't stall or crash.

 dude, that's not the problem I'm talking about. I still have my
 private_data around, what I don't have is:

   __
   __ ____| | ___   ___| | __
  / _` |  / __| |/ _ \ / __| |/ /
 | (_| | | (__| | (_) | (__|   
  \__,_|  \___|_|\___/ \___|_|\_\



Yes, *you* may have your private data around and have a clock gated,
others (the tw68 for instance) may have its region released and unmapped.

And yet others may have $whatever resource released in the
remove() and assume it's available in the IRQ handler.

I honestly can't think why using request_irq / free_irq to solve this
is a workaround.

 Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)

 Still, I don't think that's a good idea, since it relies on
 the IRQ being freed *before* the device struct.

 that's not an issue at all. If you're using devm_request_irq() you're
 likely using devm_kzalloc() for the device struct anyway. Also, you
 called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed
 before your private data; there's nothing wrong there.


-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar
--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: CONFIG_DEBUG_SHIRQ and PM

2015-08-26 Thread Felipe Balbi
Hi,

On Wed, Aug 26, 2015 at 05:15:51PM -0300, Ezequiel Garcia wrote:

snip

  be prepared to handle it any time, coming from any sources (not only
  your device). And CONFIG_DEBUG_SHIRQ does exactly that, in order to
  make sure all the drivers passing IRQF_SHARED comply with that rule.
 
  you need to be sure of that with non-shared IRQs anyway.
 
 Not entirely. If your IRQ is not shared, then you usually have a register
 to enable or unmask your peripheral interrupts. So the driver is in control
 of when it will get interrupts.
 
 If the IRQ is shared, this won't do. This is what I mean by shared IRQs
 must be prepared to receive an interrupt any time, in the sense that
 the driver has no way of preventing IRQs (because they may be
 coming from any source).

right, the problem is much less likely on non-shared lines but the fine
that a line is shared or not is a function of HW integration, not the
e.g. USB Controller, so that knowledge really doesn't fit the driver in
a sense.

We might as well get rid of IRQF_SHARED and assume all lines are
shareable.

 In the same sense, shared IRQs handlers need to double-check
 the IRQ is coming to the current device by checking some IRQ
 status register to see if there's pending work.

you should the status register even on non-shared IRQs to catch spurious
right ?

  Also, an IRQ
  which isn't shared in SoC A, might become shared in SoC B which uses the
  same IP.
 
  So you either avoid using devm_request_irq, or you prepare your handler
  accordingly to be ready to handle an interrupt _any time_.
 
  the handler is ready to handle at any time, what isn't correct is the
  fact that clocks get gated before IRQ is freed.
 
  There should be no such special case as if your handler is shared,
  don't use devm_request_*irq() because if we just disable PM_RUNTIME, it
  works as expected anyway.
 
 
 Yeah, I meant to say: if you use devm_request_irq with IRQF_SHARED
 then you _must_ be prepared to get an IRQ *after* your remove() has
 been called.
 
 Let's consider this snippet from tw68:
 
 static irqreturn_t tw68_irq(int irq, void *dev_id)
 {
 struct tw68_dev *dev = dev_id;
 u32 status, orig;
 int loop;
 
 status = orig = tw_readl(TW68_INTSTAT)  dev-pci_irqmask;

Now try to read that register when your clock is gated. That's the
problem I'm talking about. Everything about the handler is functioning
correctly; however clocks are gated in -remove() and free_irq() is
only called *AFTER* -remove() has returned.

 [etc]
 }
 
 The IRQ handler accesses the device struct and then
 reads through PCI. So if you use devm_request_irq
 you need to make sure the device struct is still allocated
 after remove(), and the PCI read won't stall or crash.

dude, that's not the problem I'm talking about. I still have my
private_data around, what I don't have is:

  __
  __ ____| | ___   ___| | __
 / _` |  / __| |/ _ \ / __| |/ /
| (_| | | (__| | (_) | (__|
 \__,_|  \___|_|\___/ \___|_|\_\


 Interestingly, tw68 uses devm_request_irq with IRQF_SHARED :-)
 
 Still, I don't think that's a good idea, since it relies on
 the IRQ being freed *before* the device struct.

that's not an issue at all. If you're using devm_request_irq() you're
likely using devm_kzalloc() for the device struct anyway. Also, you
called devm_kzalloc() before devm_request_irq() so IRQ *will* be freed
before your private data; there's nothing wrong there.

-- 
balbi


signature.asc
Description: Digital signature


CONFIG_DEBUG_SHIRQ and PM

2015-08-25 Thread Felipe Balbi
Hi Ingo,

I'm facing an issue with CONFIG_DEBUG_SHIRQ and pm_runtime when using
devm_request_*irq().

If we using devm_request_*irq(), that irq will be freed after device
drivers' -remove() gets called. If on -remove(), we're calling
pm_runtime_put_sync(); pm_runtime_disable(), device's clocks might get
gated and, because we do an extra call to the device's IRQ handler when
CONFIG_DEBUG_SHIRQ=y, we might trigger an abort exception if, inside the
IRQ handler, we try to read a register which is clocked by the device's
clock.

This is, of course, really old code which has been in tree for many,
many years. I guess nobody has been running their tests in the setup
mentioned above (CONFIG_DEBUG_SHIRQ=y, pm_runtime_put_sync() on
-remove(), a register read on IRQ handler, and a shared IRQ handler),
so that's why we never caught this before.

Disabling CONFIG_DEBUG_SHIRQ, of course, makes the problem go away, but
if driver *must* be ready to receive, and handle, an IRQ even during
module removal, I wonder what the IRQ handler should do. We can't, in
most cases, call pm_runtime_put_sync() from IRQ handler.

I'm guessing the only way would be to move pm_runtime calls into the bus
driver (in this case, the platform_bus) and make sure it only gets
called after device managed resources are freed ?

Any hints would be greatly appreciated.

cheers

-- 
balbi


signature.asc
Description: Digital signature