Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Nov 30, 2006, at 12:07, Maciej W. Rozycki wrote: On Mon, 23 Oct 2006, Maciej W. Rozycki wrote: I'm not too enthusiastic about requiring the ethernet drivers to call phy_disconnect in a separate thread after "close" is called. Assuming there's not some sort of "squash work queue" function that can be invoked with rtnl_lock held, I think phy_disconnect should schedule itself to flush the queue. This would also require that mdiobus_unregister hold off on freeing phydevs if any of the phys were still waiting for pending flush_pending calls to finish. Which would, in turn, require mdiobus_unregister to schedule cleaning up memory for some later time. This could work, indeed. I'm not enthusiastic about that implementation, either, but it maintains the abstractions I consider important for this code. The ethernet driver should not need to know what structures the PHY lib uses to implement its interrupt handling, and how to work around their failings, IMHO. Agreed. So what's the plan? Here's a new version of the patch that addresses your other concerns. So I think the problem is we still don't understand the problem, and the solution to the problem, except that it's causing your driver to lock up. Most of the changes below are fine with me. The confusing one is still the check for current_is_keventd(). This is related in some way to why the driver code invokes phy_disconnect from a work_queue. I admit, though, I'm not familiar enough with the work queue infrastructure to understand the problem. But I'm very certain that creating a work queue for the sole purpose of disconnecting from the PHY is crufty. Can you try again to convey how this solves your problem, so we can try to figure out if there's a better way? Andy - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Mon, 23 Oct 2006, Maciej W. Rozycki wrote: > > I'm not too enthusiastic about requiring the ethernet drivers to call > > phy_disconnect in a separate thread after "close" is called. Assuming > > there's > > not some sort of "squash work queue" function that can be invoked with > > rtnl_lock held, I think phy_disconnect should schedule itself to flush the > > queue. This would also require that mdiobus_unregister hold off on freeing > > phydevs if any of the phys were still waiting for pending flush_pending > > calls > > to finish. Which would, in turn, require mdiobus_unregister to schedule > > cleaning up memory for some later time. > > This could work, indeed. > > > I'm not enthusiastic about that implementation, either, but it maintains the > > abstractions I consider important for this code. The ethernet driver should > > not need to know what structures the PHY lib uses to implement its interrupt > > handling, and how to work around their failings, IMHO. > > Agreed. So what's the plan? Here's a new version of the patch that addresses your other concerns. Maciej patch-mips-2.6.18-20060920-phy-irq-18 diff -up --recursive --new-file linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c linux-mips-2.6.18-20060920/drivers/net/phy/phy.c --- linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c 2006-08-05 04:58:46.0 + +++ linux-mips-2.6.18-20060920/drivers/net/phy/phy.c2006-11-30 17:58:37.0 + @@ -7,6 +7,7 @@ * Author: Andy Fleming * * Copyright (c) 2004 Freescale Semiconductor, Inc. + * Copyright (c) 2006 Maciej W. Rozycki * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the @@ -32,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq { struct phy_device *phydev = phy_dat; + if (PHY_HALTED == phydev->state) + return IRQ_NONE;/* It can't be ours. */ + /* The MDIO bus is not allowed to be written in interrupt * context, so we need to disable the irq here. A work * queue will write the PHY to disable and clear the @@ -577,6 +583,13 @@ int phy_stop_interrupts(struct phy_devic if (err) phy_error(phydev); + /* +* Finish any pending work; we might have been scheduled +* to be called from keventd ourselves, though. +*/ + if (!current_is_keventd()) + flush_scheduled_work(); + free_irq(phydev->irq, phydev); return err; @@ -596,14 +609,17 @@ static void phy_change(void *data) goto phy_err; spin_lock(&phydev->lock); + if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state)) phydev->state = PHY_CHANGELINK; - spin_unlock(&phydev->lock); enable_irq(phydev->irq); /* Reenable interrupts */ - err = phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED); + if (PHY_HALTED != phydev->state) + err = phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED); + + spin_unlock(&phydev->lock); if (err) goto irq_enable_err; @@ -624,15 +640,15 @@ void phy_stop(struct phy_device *phydev) if (PHY_HALTED == phydev->state) goto out_unlock; - if (phydev->irq != PHY_POLL) { - /* Clear any pending interrupts */ - phy_clear_interrupt(phydev); + phydev->state = PHY_HALTED; + if (phydev->irq != PHY_POLL) { /* Disable PHY Interrupts */ phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED); - } - phydev->state = PHY_HALTED; + /* Clear any pending interrupts */ + phy_clear_interrupt(phydev); + } out_unlock: spin_unlock(&phydev->lock); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Fri, 20 Oct 2006, Andy Fleming wrote: > I've been trying to figure out this problem since you posted this, and I'm not > sure I understand it fully (And I apologize profusely for the horror that is > the PHY interrupt handling code. I'd love to rewrite it if there's some First of all I don't see much of the need to use soft timers with an interrupt-driven PHY. Most of the state changes could be invoked straight from phy_change(), perhaps with an exception for the autonegotiation timeout. > cleaner way.). But let me see if I can follow the chain of reasoning that led > to this patch, and see if we can figure out a solution that doesn't involve > creating a work queue just for bringing down the PHY. Avoiding a separate work queue was my intent as well. > 1) Invoking phy_stop is meant to stop the system from looking for PHY status > updates. Currently, another PHY sharing the interrupt can cause the HALTED > PHY to reenable interrupts. Checking for HALTED in the interrupt handler > fixes this, but it's incorrect. The phy_interrupt handler does not grab the > lock, and so you could get this: > > phy_stop > lock > clear any pending interrupts > disable interrupts on this PHY > ---> interrupt from another PHY causes this PHY's interrupt handler to be > invoked > HALTED isn't set, so phy_change is scheduled > <--- set HALTED, unlock > > scheduled work is done: > interrupt is reenabled > > Sadly, I think the only way to fix this problem is to have phy_change check > for HALTED, and react appropriately. Please have a look at how I have rewritten phy_stop() to avoid this problem with no need for a lock -- HALTED is set first and only then interrupts are masked and cleared for this PHY. It can be done without locking because the interrupt handler is strictly a consumer and phy_stop() is strictly a producer and we do not care about any other transitions [1]. > 2) The PHY lib doesn't clear out remaining work in the work queue when it's > bringing down a PHY. This is clearly wrong, but I'm confused how *any* driver > does this? It seems to me that any network driver which has a work queue is > going to be unable to flush the pending work when it is brought down. So > what's the solution to this? The only driver that seems to care is tg3.c and it gets away by other means. Note that network drivers can quite easily handle and ignore deferred interrupt work as long as it arrives before they are removed from memory. All that is required is calling flush_scheduled_work() from their module_exit() call at the very latest. > I'm not too enthusiastic about requiring the ethernet drivers to call > phy_disconnect in a separate thread after "close" is called. Assuming there's > not some sort of "squash work queue" function that can be invoked with > rtnl_lock held, I think phy_disconnect should schedule itself to flush the > queue. This would also require that mdiobus_unregister hold off on freeing > phydevs if any of the phys were still waiting for pending flush_pending calls > to finish. Which would, in turn, require mdiobus_unregister to schedule > cleaning up memory for some later time. This could work, indeed. > I'm not enthusiastic about that implementation, either, but it maintains the > abstractions I consider important for this code. The ethernet driver should > not need to know what structures the PHY lib uses to implement its interrupt > handling, and how to work around their failings, IMHO. Agreed. > >@@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq > > { > > struct phy_device *phydev = phy_dat; > > > >+if (PHY_HALTED == phydev->state) > >+return IRQ_NONE;/* It can't be ours. */ > >+ > > > As I mentioned, this doesn't protect it, since it doesn't grab the lock. And > it can't grab the lock, or we'd have to disable interrupts while doing phy > transactions. And we can't do that, because one design goal is to allow some > bus drivers to use interrupts to signal that the transaction has completed. > Admittedly, this is still quite broken right now. I'm looking into using > semaphores, on the theory that I can sleep when I grab them. But that would > still prevent taking the semaphore in the interrupt controller. This needs to > be moved to phy_change (which you have done, anyway), and we just have to let > the actual handler figure out whether it's safe to do anything. There is no problem with accessing the state here -- see above[1] and below[2]. > >@@ -577,6 +583,13 @@ int phy_stop_interrupts(struct phy_devic > > if (err) > > phy_error(phydev); > > > >+/* > >+ * Finish any pending work; we might have been scheduled > >+ * to be called from keventd ourselves, though. > >+ */ > >+if (!current_is_keventd()) > >+flush_scheduled_work(); > >+ > > > And this is what is making you move your call to phy_disconnect to a work > queue function, right? Does th
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Fri, 20 Oct 2006 16:40:20 -0500 Andy Fleming <[EMAIL PROTECTED]> wrote: > >The solution is to ignore phy_interrupt() calls if the reported > > device > >has already been halted and calling flush_scheduled_work() from > >phy_stop_interrupts() (but guarded with current_is_keventd() in > > case > >the function has been called through keventd from the MAC device's > >close call to avoid a deadlock on the netlink lock). > > > I've been trying to figure out this problem since you posted this, > and I'm not sure I understand it fully Me either, but I basically gave up. If it is deadlocky for keventd to call flush_scheduled_work() I fail to see why it is not deadlocky for other processes. If the caller of flush_scheduled_work() holds rtnl_lock, and if a queued work callback also takes rtnl_lock then the flush_scheduled_work() caller will deadlock regardless of whether or not it is keventd. Because the flush_scheduled_work() caller holds a lock which will prevent the flush from completing. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Oct 3, 2006, at 10:18, Maciej W. Rozycki wrote: Hello, This patch fixes a couple of problems discovered with interrupt handling in the phylib core, namely: 1. The driver uses timer and workqueue calls, but does not include nor . Good catch. 2. The driver uses schedule_work() for handling interrupts, but does not make sure any pending work scheduled thus has been completed before driver's structures get freed from memory. This is especially important as interrupts may keep arriving if the line is shared with another PHY. The solution is to ignore phy_interrupt() calls if the reported device has already been halted and calling flush_scheduled_work() from phy_stop_interrupts() (but guarded with current_is_keventd() in case the function has been called through keventd from the MAC device's close call to avoid a deadlock on the netlink lock). I've been trying to figure out this problem since you posted this, and I'm not sure I understand it fully (And I apologize profusely for the horror that is the PHY interrupt handling code. I'd love to rewrite it if there's some cleaner way.). But let me see if I can follow the chain of reasoning that led to this patch, and see if we can figure out a solution that doesn't involve creating a work queue just for bringing down the PHY. 1) Invoking phy_stop is meant to stop the system from looking for PHY status updates. Currently, another PHY sharing the interrupt can cause the HALTED PHY to reenable interrupts. Checking for HALTED in the interrupt handler fixes this, but it's incorrect. The phy_interrupt handler does not grab the lock, and so you could get this: phy_stop lock clear any pending interrupts disable interrupts on this PHY ---> interrupt from another PHY causes this PHY's interrupt handler to be invoked HALTED isn't set, so phy_change is scheduled <--- set HALTED, unlock scheduled work is done: interrupt is reenabled Sadly, I think the only way to fix this problem is to have phy_change check for HALTED, and react appropriately. 2) The PHY lib doesn't clear out remaining work in the work queue when it's bringing down a PHY. This is clearly wrong, but I'm confused how *any* driver does this? It seems to me that any network driver which has a work queue is going to be unable to flush the pending work when it is brought down. So what's the solution to this? I'm not too enthusiastic about requiring the ethernet drivers to call phy_disconnect in a separate thread after "close" is called. Assuming there's not some sort of "squash work queue" function that can be invoked with rtnl_lock held, I think phy_disconnect should schedule itself to flush the queue. This would also require that mdiobus_unregister hold off on freeing phydevs if any of the phys were still waiting for pending flush_pending calls to finish. Which would, in turn, require mdiobus_unregister to schedule cleaning up memory for some later time. I'm not enthusiastic about that implementation, either, but it maintains the abstractions I consider important for this code. The ethernet driver should not need to know what structures the PHY lib uses to implement its interrupt handling, and how to work around their failings, IMHO. And now, to be more specific, a few comments on the patch, directly: Please consider. Maciej Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]> patch-mips-2.6.18-20060920-phy-irq-16 diff -up --recursive --new-file linux-mips-2.6.18-20060920.macro/ drivers/net/phy/phy.c linux-mips-2.6.18-20060920/drivers/net/phy/phy.c --- linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c 2006-08-05 04:58:46.0 + +++ linux-mips-2.6.18-20060920/drivers/net/phy/phy.c 2006-10-03 14:19:21.0 + @@ -7,6 +7,7 @@ * Author: Andy Fleming * * Copyright (c) 2004 Freescale Semiconductor, Inc. + * Copyright (c) 2006 Maciej W. Rozycki * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the @@ -32,6 +33,8 @@ #include #include #include +#include +#include #include #include @@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq { struct phy_device *phydev = phy_dat; + if (PHY_HALTED == phydev->state) + return IRQ_NONE;/* It can't be ours. */ + As I mentioned, this doesn't protect it, since it doesn't grab the lock. And it can't grab the lock, or we'd have to disable interrupts while doing phy transactions. And we can't do that, because one design goal is to allow some bus drivers to use interrupts to signal that the transaction has completed. Admittedly, this is still quite broken right now. I'm looking into using semaphores, on the theory that I can sleep when I grab them. But that would still pr
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Mon, 16 Oct 2006, Andrew Morton wrote: > Vaguely. Why doesn't it deadlock if !current_is_keventd()? I mean, > whether or not the caller is keventd, the flush_scheduled_work() caller > will still be dependent upon rtnl_lock() being acquirable. This !current_is_keventd() condition is just what it is, i.e. a check whether phy_stop_interrupts() has been scheduled through keventd or not. If it has, then flush_scheduled_work() cannot be called; it is not needed anyway. Otherwise phy_stop_interrupts() has to make sure no deferred calls to phy_change() will be made once it has finished. In all cases the assumption is the caller has made sure rtnl_lock() is not held at the time phy_stop_interrupts() is called. That's the very reason for scheduling phy_disconnect() (and hence phy_stop_interrupts()) through keventd. Maciej - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Mon, 16 Oct 2006 15:50:55 +0100 (BST) "Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote: > Andrew, > > > I don't get it. If some code does > > > > rtnl_lock(); > > flush_scheduled_work(); > > > > and there's some work scheduled which does rtnl_lock() then it'll deadlock. > > > > But it'll deadlock whether or not the caller of flush_scheduled_work() is > > keventd. > > > > Calling flush_scheduled_work() under locks is generally a bad idea. > > Indeed -- this is why I avoid it. > > > I'd have thought that was still deadlockable. Could you describe the > > deadlock more completely please? > > The simplest sequence of calls that prevents races here is as follows: > > unregister_netdev(); > rtnl_lock(); > unregister_netdevice(); > dev_close(); > sbmac_close(); > phy_stop(); > phy_disconnect(); > phy_stop_interrupts(); > phy_disable_interrupts(); > flush_scheduled_work(); > free_irq(); > phy_detach(); > mdiobus_unregister(); > rtnl_unlock(); > > We want to call flush_scheduled_work() from phy_stop_interrupts(), because > there may still be calls to phy_change() waiting for the keventd to > process and mdiobus_unregister() frees structures needed by phy_change(). > This may deadlock because of the call to rtnl_lock() though. > > So the modified sequence I have implemented is as follows: > > unregister_netdev(); > rtnl_lock(); > unregister_netdevice(); > dev_close(); > sbmac_close(); > phy_stop(); > schedule_work(); [sbmac_phy_disconnect()] > rtnl_unlock(); > > and then: > > sbmac_phy_disconnect(); > phy_disconnect(); > phy_stop_interrupts(); > phy_disable_interrupts(); > free_irq(); > phy_detach(); > mdiobus_unregister(); > > This guarantees calls to phy_change() for this PHY unit will not be run > after mdiobus_unregister(), because any such calls have been placed in the > queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt > handler from issuing new calls to phy_change()). > > We still need flush_scheduled_work() to be called from > phy_stop_interrupts() though, in case some other MAC driver calls > phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction > layer of phylib used) directly rather than using keventd. This is > possible if phy_disconnect() is called from the driver's module_exit() > call, which may make sense for devices that are known not to have their > MII interface available as an external connector. Hence the: > > if (!current_is_keventd()) > flush_scheduled_work(); > > sequence in phy_stop_interrupts(). Of course, we can force all drivers > using phylib (in the interrupt mode) to call phy_disconnect() through > keventd instead. > > Does it sound clearer? > Vaguely. Why doesn't it deadlock if !current_is_keventd()? I mean, whether or not the caller is keventd, the flush_scheduled_work() caller will still be dependent upon rtnl_lock() being acquirable. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
Andrew, > I don't get it. If some code does > > rtnl_lock(); > flush_scheduled_work(); > > and there's some work scheduled which does rtnl_lock() then it'll deadlock. > > But it'll deadlock whether or not the caller of flush_scheduled_work() is > keventd. > > Calling flush_scheduled_work() under locks is generally a bad idea. Indeed -- this is why I avoid it. > I'd have thought that was still deadlockable. Could you describe the > deadlock more completely please? The simplest sequence of calls that prevents races here is as follows: unregister_netdev(); rtnl_lock(); unregister_netdevice(); dev_close(); sbmac_close(); phy_stop(); phy_disconnect(); phy_stop_interrupts(); phy_disable_interrupts(); flush_scheduled_work(); free_irq(); phy_detach(); mdiobus_unregister(); rtnl_unlock(); We want to call flush_scheduled_work() from phy_stop_interrupts(), because there may still be calls to phy_change() waiting for the keventd to process and mdiobus_unregister() frees structures needed by phy_change(). This may deadlock because of the call to rtnl_lock() though. So the modified sequence I have implemented is as follows: unregister_netdev(); rtnl_lock(); unregister_netdevice(); dev_close(); sbmac_close(); phy_stop(); schedule_work(); [sbmac_phy_disconnect()] rtnl_unlock(); and then: sbmac_phy_disconnect(); phy_disconnect(); phy_stop_interrupts(); phy_disable_interrupts(); free_irq(); phy_detach(); mdiobus_unregister(); This guarantees calls to phy_change() for this PHY unit will not be run after mdiobus_unregister(), because any such calls have been placed in the queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt handler from issuing new calls to phy_change()). We still need flush_scheduled_work() to be called from phy_stop_interrupts() though, in case some other MAC driver calls phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction layer of phylib used) directly rather than using keventd. This is possible if phy_disconnect() is called from the driver's module_exit() call, which may make sense for devices that are known not to have their MII interface available as an external connector. Hence the: if (!current_is_keventd()) flush_scheduled_work(); sequence in phy_stop_interrupts(). Of course, we can force all drivers using phylib (in the interrupt mode) to call phy_disconnect() through keventd instead. Does it sound clearer? Maciej - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Fri, 6 Oct 2006 12:26:22 +0100 (BST) "Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote: > On Thu, 5 Oct 2006, Andrew Morton wrote: > > > > 2. The driver uses schedule_work() for handling interrupts, but does not > > >make sure any pending work scheduled thus has been completed before > > >driver's structures get freed from memory. This is especially > > >important as interrupts may keep arriving if the line is shared with > > >another PHY. > > > > > >The solution is to ignore phy_interrupt() calls if the reported device > > >has already been halted and calling flush_scheduled_work() from > > >phy_stop_interrupts() (but guarded with current_is_keventd() in case > > >the function has been called through keventd from the MAC device's > > >close call to avoid a deadlock on the netlink lock). > > > > > > > eww, hack. > > > > Also not module-friendly: > > > > WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined! > > > > Does this > > > > static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > > { > > if (cwq->thread == current) { > > /* > > * Probably keventd trying to flush its own queue. So simply run > > * it by hand rather than deadlocking. > > */ > > run_workqueue(cwq); > > > > not work??? > > It does, too well even! -- in the case I am trying to take care of we are > run with "rtnl_mutex" held as a result of rtnl_lock() called from > unregister_netdev() and there is some work already in the workqueue > (linkwatch_event(), apparently) wanting to acquire the same lock. Hence a > deadlock. I don't get it. If some code does rtnl_lock(); flush_scheduled_work(); and there's some work scheduled which does rtnl_lock() then it'll deadlock. But it'll deadlock whether or not the caller of flush_scheduled_work() is keventd. Calling flush_scheduled_work() under locks is generally a bad idea. > It seems problematic elsewhere as well -- see tg3.c -- but a > different way to deal with it has been found there. > > I am not that fond of this solution, but at least it works, unlike > calling flush_scheduled_work() here, which deadlocks the current CPU in my > system instantly. Any suggestions as to how to do this differently? > Perhaps linkwatch_event() should be scheduled differently (using a > separate workqueue?)... I'd have thought that was still deadlockable. Could you describe the deadlock more completely please? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Thu, 5 Oct 2006, Andrew Morton wrote: > > 2. The driver uses schedule_work() for handling interrupts, but does not > >make sure any pending work scheduled thus has been completed before > >driver's structures get freed from memory. This is especially > >important as interrupts may keep arriving if the line is shared with > >another PHY. > > > >The solution is to ignore phy_interrupt() calls if the reported device > >has already been halted and calling flush_scheduled_work() from > >phy_stop_interrupts() (but guarded with current_is_keventd() in case > >the function has been called through keventd from the MAC device's > >close call to avoid a deadlock on the netlink lock). > > > > eww, hack. > > Also not module-friendly: > > WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined! > > Does this > > static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > { > if (cwq->thread == current) { > /* >* Probably keventd trying to flush its own queue. So simply run >* it by hand rather than deadlocking. >*/ > run_workqueue(cwq); > > not work??? It does, too well even! -- in the case I am trying to take care of we are run with "rtnl_mutex" held as a result of rtnl_lock() called from unregister_netdev() and there is some work already in the workqueue (linkwatch_event(), apparently) wanting to acquire the same lock. Hence a deadlock. It seems problematic elsewhere as well -- see tg3.c -- but a different way to deal with it has been found there. I am not that fond of this solution, but at least it works, unlike calling flush_scheduled_work() here, which deadlocks the current CPU in my system instantly. Any suggestions as to how to do this differently? Perhaps linkwatch_event() should be scheduled differently (using a separate workqueue?)... Or does dev_close() have to be called under this lock from unregister_netdevice()? Perhaps code like this would do: while (dev->flags & IFF_UP) { rtnl_unlock(); dev_close(dev); rtnl_lock(); } Maciej - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes
On Tue, 3 Oct 2006 16:18:35 +0100 (BST) "Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote: > > 2. The driver uses schedule_work() for handling interrupts, but does not >make sure any pending work scheduled thus has been completed before >driver's structures get freed from memory. This is especially >important as interrupts may keep arriving if the line is shared with >another PHY. > >The solution is to ignore phy_interrupt() calls if the reported device >has already been halted and calling flush_scheduled_work() from >phy_stop_interrupts() (but guarded with current_is_keventd() in case >the function has been called through keventd from the MAC device's >close call to avoid a deadlock on the netlink lock). > eww, hack. Also not module-friendly: WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined! Does this static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) { if (cwq->thread == current) { /* * Probably keventd trying to flush its own queue. So simply run * it by hand rather than deadlocking. */ run_workqueue(cwq); not work??? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html