Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-12-04 Thread Andy Fleming


On Nov 30, 2006, at 12:07, Maciej W. Rozycki wrote:


On Mon, 23 Oct 2006, Maciej W. Rozycki wrote:

I'm not too enthusiastic about requiring the ethernet drivers to  
call
phy_disconnect in a separate thread after "close" is called.   
Assuming there's
not some sort of "squash work queue" function that can be invoked  
with
rtnl_lock held, I think phy_disconnect should schedule itself to  
flush the
queue.  This would also require that mdiobus_unregister hold off  
on freeing
phydevs if any of the phys were still waiting for pending  
flush_pending calls
to finish.  Which would, in turn, require mdiobus_unregister to  
schedule

cleaning up memory for some later time.


 This could work, indeed.

I'm not enthusiastic about that implementation, either, but it  
maintains the
abstractions I consider important for this code.  The ethernet  
driver should
not need to know what structures the PHY lib uses to implement  
its interrupt

handling, and how to work around their failings, IMHO.


 Agreed.


 So what's the plan?

 Here's a new version of the patch that addresses your other concerns.



So I think the problem is we still don't understand the problem, and  
the solution to the problem, except that it's causing your driver to  
lock up.  Most of the changes below are fine with me.  The confusing  
one is still the check for current_is_keventd().  This is related in  
some way to why the driver code invokes phy_disconnect from a  
work_queue.  I admit, though, I'm not familiar enough with the work  
queue infrastructure to understand the problem.  But I'm very certain  
that creating a work queue for the sole purpose of disconnecting from  
the PHY is crufty.


Can you try again to convey how this solves your problem, so we can  
try to figure out if there's a better way?


Andy


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-11-30 Thread Maciej W. Rozycki
On Mon, 23 Oct 2006, Maciej W. Rozycki wrote:

> > I'm not too enthusiastic about requiring the ethernet drivers to call
> > phy_disconnect in a separate thread after "close" is called.  Assuming 
> > there's
> > not some sort of "squash work queue" function that can be invoked with
> > rtnl_lock held, I think phy_disconnect should schedule itself to flush the
> > queue.  This would also require that mdiobus_unregister hold off on freeing
> > phydevs if any of the phys were still waiting for pending flush_pending 
> > calls
> > to finish.  Which would, in turn, require mdiobus_unregister to schedule
> > cleaning up memory for some later time.
> 
>  This could work, indeed.
> 
> > I'm not enthusiastic about that implementation, either, but it maintains the
> > abstractions I consider important for this code.  The ethernet driver should
> > not need to know what structures the PHY lib uses to implement its interrupt
> > handling, and how to work around their failings, IMHO.
> 
>  Agreed.

 So what's the plan?

 Here's a new version of the patch that addresses your other concerns.

  Maciej

patch-mips-2.6.18-20060920-phy-irq-18
diff -up --recursive --new-file 
linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c 
linux-mips-2.6.18-20060920/drivers/net/phy/phy.c
--- linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c  2006-08-05 
04:58:46.0 +
+++ linux-mips-2.6.18-20060920/drivers/net/phy/phy.c2006-11-30 
17:58:37.0 +
@@ -7,6 +7,7 @@
  * Author: Andy Fleming
  *
  * Copyright (c) 2004 Freescale Semiconductor, Inc.
+ * Copyright (c) 2006  Maciej W. Rozycki
  *
  * This program is free software; you can redistribute  it and/or modify it
  * under  the terms of  the GNU General  Public License as published by the
@@ -32,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq
 {
struct phy_device *phydev = phy_dat;
 
+   if (PHY_HALTED == phydev->state)
+   return IRQ_NONE;/* It can't be ours.  */
+
/* The MDIO bus is not allowed to be written in interrupt
 * context, so we need to disable the irq here.  A work
 * queue will write the PHY to disable and clear the
@@ -577,6 +583,13 @@ int phy_stop_interrupts(struct phy_devic
if (err)
phy_error(phydev);
 
+   /*
+* Finish any pending work; we might have been scheduled
+* to be called from keventd ourselves, though.
+*/
+   if (!current_is_keventd())
+   flush_scheduled_work();
+
free_irq(phydev->irq, phydev);
 
return err;
@@ -596,14 +609,17 @@ static void phy_change(void *data)
goto phy_err;
 
spin_lock(&phydev->lock);
+
if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
phydev->state = PHY_CHANGELINK;
-   spin_unlock(&phydev->lock);
 
enable_irq(phydev->irq);
 
/* Reenable interrupts */
-   err = phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED);
+   if (PHY_HALTED != phydev->state)
+   err = phy_config_interrupt(phydev, PHY_INTERRUPT_ENABLED);
+
+   spin_unlock(&phydev->lock);
 
if (err)
goto irq_enable_err;
@@ -624,15 +640,15 @@ void phy_stop(struct phy_device *phydev)
if (PHY_HALTED == phydev->state)
goto out_unlock;
 
-   if (phydev->irq != PHY_POLL) {
-   /* Clear any pending interrupts */
-   phy_clear_interrupt(phydev);
+   phydev->state = PHY_HALTED;
 
+   if (phydev->irq != PHY_POLL) {
/* Disable PHY Interrupts */
phy_config_interrupt(phydev, PHY_INTERRUPT_DISABLED);
-   }
 
-   phydev->state = PHY_HALTED;
+   /* Clear any pending interrupts */
+   phy_clear_interrupt(phydev);
+   }
 
 out_unlock:
spin_unlock(&phydev->lock);
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-23 Thread Maciej W. Rozycki
On Fri, 20 Oct 2006, Andy Fleming wrote:

> I've been trying to figure out this problem since you posted this, and I'm not
> sure I understand it fully (And I apologize profusely for the horror that is
> the PHY interrupt handling code.  I'd love to rewrite it if there's some

 First of all I don't see much of the need to use soft timers with an 
interrupt-driven PHY.  Most of the state changes could be invoked straight 
from phy_change(), perhaps with an exception for the autonegotiation 
timeout.

> cleaner way.).  But let me see if I can follow the chain of reasoning that led
> to this patch, and see if we can figure out a solution that doesn't involve
> creating a work queue just for bringing down the PHY.

 Avoiding a separate work queue was my intent as well.

> 1) Invoking phy_stop is meant to stop the system from looking for PHY status
> updates.  Currently, another PHY sharing the interrupt can cause the HALTED
> PHY to reenable interrupts.  Checking for HALTED in the interrupt handler
> fixes this, but it's incorrect.  The phy_interrupt handler does not grab the
> lock, and so you could get this:
> 
> phy_stop
>   lock
>   clear any pending interrupts
>   disable interrupts on this PHY
> ---> interrupt from another PHY causes this PHY's interrupt handler to be
> invoked
>   HALTED isn't set, so phy_change is scheduled
> <--- set HALTED, unlock
> 
> scheduled work is done:
>   interrupt is reenabled
> 
> Sadly, I think the only way to fix this problem is to have phy_change check
> for HALTED, and react appropriately.

 Please have a look at how I have rewritten phy_stop() to avoid this 
problem with no need for a lock -- HALTED is set first and only then 
interrupts are masked and cleared for this PHY.  It can be done without 
locking because the interrupt handler is strictly a consumer and 
phy_stop() is strictly a producer and we do not care about any other 
transitions [1].

> 2) The PHY lib doesn't clear out remaining work in the work queue when it's
> bringing down a PHY.  This is clearly wrong, but I'm confused how *any* driver
> does this?  It seems to me that any network driver which has a work queue is
> going to be unable to flush the pending work when it is brought down.  So
> what's the solution to this?

 The only driver that seems to care is tg3.c and it gets away by other 
means.  Note that network drivers can quite easily handle and ignore 
deferred interrupt work as long as it arrives before they are removed from 
memory.  All that is required is calling flush_scheduled_work() from their 
module_exit() call at the very latest.

> I'm not too enthusiastic about requiring the ethernet drivers to call
> phy_disconnect in a separate thread after "close" is called.  Assuming there's
> not some sort of "squash work queue" function that can be invoked with
> rtnl_lock held, I think phy_disconnect should schedule itself to flush the
> queue.  This would also require that mdiobus_unregister hold off on freeing
> phydevs if any of the phys were still waiting for pending flush_pending calls
> to finish.  Which would, in turn, require mdiobus_unregister to schedule
> cleaning up memory for some later time.

 This could work, indeed.

> I'm not enthusiastic about that implementation, either, but it maintains the
> abstractions I consider important for this code.  The ethernet driver should
> not need to know what structures the PHY lib uses to implement its interrupt
> handling, and how to work around their failings, IMHO.

 Agreed.

> >@@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq
> > {
> >  struct phy_device *phydev = phy_dat;
> >
> >+if (PHY_HALTED == phydev->state)
> >+return IRQ_NONE;/* It can't be ours.  */
> >+
> 
> 
> As I mentioned, this doesn't protect it, since it doesn't grab the lock.  And
> it can't grab the lock, or we'd have to disable interrupts while doing phy
> transactions.  And we can't do that, because one design goal is to allow some
> bus drivers to use interrupts to signal that the transaction has completed.
> Admittedly, this is still quite broken right now.  I'm looking into using
> semaphores, on the theory that I can sleep when I grab them.  But that would
> still prevent taking the semaphore in the interrupt controller.  This needs to
> be moved to phy_change (which you have done, anyway), and we just have to let
> the actual handler figure out whether it's safe to do anything.

 There is no problem with accessing the state here -- see above[1] and
below[2].

> >@@ -577,6 +583,13 @@ int phy_stop_interrupts(struct phy_devic
> >  if (err)
> >   phy_error(phydev);
> >
> >+/*
> >+ * Finish any pending work; we might have been scheduled
> >+ * to be called from keventd ourselves, though.
> >+ */
> >+if (!current_is_keventd())
> >+flush_scheduled_work();
> >+
> 
> 
> And this is what is making you move your call to phy_disconnect to a work
> queue function, right?  Does th

Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-20 Thread Andrew Morton
On Fri, 20 Oct 2006 16:40:20 -0500
Andy Fleming <[EMAIL PROTECTED]> wrote:

> >The solution is to ignore phy_interrupt() calls if the reported  
> > device
> >has already been halted and calling flush_scheduled_work() from
> >phy_stop_interrupts() (but guarded with current_is_keventd() in  
> > case
> >the function has been called through keventd from the MAC device's
> >close call to avoid a deadlock on the netlink lock).
> 
> 
> I've been trying to figure out this problem since you posted this,  
> and I'm not sure I understand it fully

Me either, but I basically gave up.

If it is deadlocky for keventd to call flush_scheduled_work() I fail to see
why it is not deadlocky for other processes.

If the caller of flush_scheduled_work() holds rtnl_lock, and if a queued
work callback also takes rtnl_lock then the flush_scheduled_work() caller
will deadlock regardless of whether or not it is keventd.

Because the flush_scheduled_work() caller holds a lock which will prevent
the flush from completing.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-20 Thread Andy Fleming


On Oct 3, 2006, at 10:18, Maciej W. Rozycki wrote:


Hello,

 This patch fixes a couple of problems discovered with interrupt  
handling

in the phylib core, namely:

1. The driver uses timer and workqueue calls, but does not include
nor .



Good catch.




2. The driver uses schedule_work() for handling interrupts, but  
does not

   make sure any pending work scheduled thus has been completed before
   driver's structures get freed from memory.  This is especially
   important as interrupts may keep arriving if the line is shared  
with

   another PHY.

   The solution is to ignore phy_interrupt() calls if the reported  
device

   has already been halted and calling flush_scheduled_work() from
   phy_stop_interrupts() (but guarded with current_is_keventd() in  
case

   the function has been called through keventd from the MAC device's
   close call to avoid a deadlock on the netlink lock).



I've been trying to figure out this problem since you posted this,  
and I'm not sure I understand it fully (And I apologize profusely for  
the horror that is the PHY interrupt handling code.  I'd love to  
rewrite it if there's some cleaner way.).  But let me see if I can  
follow the chain of reasoning that led to this patch, and see if we  
can figure out a solution that doesn't involve creating a work queue  
just for bringing down the PHY.


1) Invoking phy_stop is meant to stop the system from looking for PHY  
status updates.  Currently, another PHY sharing the interrupt can  
cause the HALTED PHY to reenable interrupts.  Checking for HALTED in  
the interrupt handler fixes this, but it's incorrect.  The  
phy_interrupt handler does not grab the lock, and so you could get this:


phy_stop
lock
clear any pending interrupts
disable interrupts on this PHY
---> interrupt from another PHY causes this PHY's interrupt handler  
to be invoked

HALTED isn't set, so phy_change is scheduled
<--- set HALTED, unlock

scheduled work is done:
interrupt is reenabled

Sadly, I think the only way to fix this problem is to have phy_change  
check for HALTED, and react appropriately.


2) The PHY lib doesn't clear out remaining work in the work queue  
when it's bringing down a PHY.  This is clearly wrong, but I'm  
confused how *any* driver does this?  It seems to me that any network  
driver which has a work queue is going to be unable to flush the  
pending work when it is brought down.  So what's the solution to this?


I'm not too enthusiastic about requiring the ethernet drivers to call  
phy_disconnect in a separate thread after "close" is called.   
Assuming there's not some sort of "squash work queue" function that  
can be invoked with rtnl_lock held, I think phy_disconnect should  
schedule itself to flush the queue.  This would also require that  
mdiobus_unregister hold off on freeing phydevs if any of the phys  
were still waiting for pending flush_pending calls to finish.  Which  
would, in turn, require mdiobus_unregister to schedule cleaning up  
memory for some later time.


I'm not enthusiastic about that implementation, either, but it  
maintains the abstractions I consider important for this code.  The  
ethernet driver should not need to know what structures the PHY lib  
uses to implement its interrupt handling, and how to work around  
their failings, IMHO.


And now, to be more specific, a few comments on the patch, directly:



 Please consider.

  Maciej

Signed-off-by: Maciej W. Rozycki <[EMAIL PROTECTED]>

patch-mips-2.6.18-20060920-phy-irq-16
diff -up --recursive --new-file linux-mips-2.6.18-20060920.macro/ 
drivers/net/phy/phy.c linux-mips-2.6.18-20060920/drivers/net/phy/phy.c
--- linux-mips-2.6.18-20060920.macro/drivers/net/phy/phy.c	 
2006-08-05 04:58:46.0 +
+++ linux-mips-2.6.18-20060920/drivers/net/phy/phy.c	2006-10-03  
14:19:21.0 +

@@ -7,6 +7,7 @@
  * Author: Andy Fleming
  *
  * Copyright (c) 2004 Freescale Semiconductor, Inc.
+ * Copyright (c) 2006  Maciej W. Rozycki
  *
  * This program is free software; you can redistribute  it and/or  
modify it
  * under  the terms of  the GNU General  Public License as  
published by the

@@ -32,6 +33,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 

 #include 
 #include 
@@ -484,6 +487,9 @@ static irqreturn_t phy_interrupt(int irq
 {
struct phy_device *phydev = phy_dat;

+   if (PHY_HALTED == phydev->state)
+   return IRQ_NONE;/* It can't be ours.  */
+



As I mentioned, this doesn't protect it, since it doesn't grab the  
lock.  And it can't grab the lock, or we'd have to disable interrupts  
while doing phy transactions.  And we can't do that, because one  
design goal is to allow some bus drivers to use interrupts to signal  
that the transaction has completed.  Admittedly, this is still quite  
broken right now.  I'm looking into using semaphores, on the theory  
that I can sleep when I grab them.  But that would still pr

Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-17 Thread Maciej W. Rozycki
On Mon, 16 Oct 2006, Andrew Morton wrote:

> Vaguely.  Why doesn't it deadlock if !current_is_keventd()?  I mean,
> whether or not the caller is keventd, the flush_scheduled_work() caller
> will still be dependent upon rtnl_lock() being acquirable.

 This !current_is_keventd() condition is just what it is, i.e. a check 
whether phy_stop_interrupts() has been scheduled through keventd or not.  
If it has, then flush_scheduled_work() cannot be called; it is not needed 
anyway.  Otherwise phy_stop_interrupts() has to make sure no deferred 
calls to phy_change() will be made once it has finished.

 In all cases the assumption is the caller has made sure rtnl_lock() is 
not held at the time phy_stop_interrupts() is called.  That's the very 
reason for scheduling phy_disconnect() (and hence phy_stop_interrupts()) 
through keventd.

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-16 Thread Andrew Morton
On Mon, 16 Oct 2006 15:50:55 +0100 (BST)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

> Andrew,
> 
> > I don't get it.  If some code does
> > 
> > rtnl_lock();
> > flush_scheduled_work();
> > 
> > and there's some work scheduled which does rtnl_lock() then it'll deadlock.
> > 
> > But it'll deadlock whether or not the caller of flush_scheduled_work() is
> > keventd.
> > 
> > Calling flush_scheduled_work() under locks is generally a bad idea.
> 
>  Indeed -- this is why I avoid it.
> 
> > I'd have thought that was still deadlockable.  Could you describe the
> > deadlock more completely please?
> 
>  The simplest sequence of calls that prevents races here is as follows:
> 
> unregister_netdev();
>   rtnl_lock();
>   unregister_netdevice();
> dev_close();
>   sbmac_close();
> phy_stop();
> phy_disconnect();
>   phy_stop_interrupts();
> phy_disable_interrupts();
> flush_scheduled_work();
> free_irq();
>   phy_detach();
> mdiobus_unregister();
>   rtnl_unlock();
> 
> We want to call flush_scheduled_work() from phy_stop_interrupts(), because 
> there may still be calls to phy_change() waiting for the keventd to 
> process and mdiobus_unregister() frees structures needed by phy_change().  
> This may deadlock because of the call to rtnl_lock() though.
> 
>  So the modified sequence I have implemented is as follows:
> 
> unregister_netdev();
>   rtnl_lock();
>   unregister_netdevice();
> dev_close();
>   sbmac_close();
> phy_stop();
> schedule_work(); [sbmac_phy_disconnect()]
>   rtnl_unlock();
> 
> and then:
> 
> sbmac_phy_disconnect();
>   phy_disconnect();
> phy_stop_interrupts();
>   phy_disable_interrupts();
>   free_irq();
> phy_detach();
>   mdiobus_unregister();
> 
> This guarantees calls to phy_change() for this PHY unit will not be run 
> after mdiobus_unregister(), because any such calls have been placed in the 
> queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt 
> handler from issuing new calls to phy_change()).
> 
>  We still need flush_scheduled_work() to be called from 
> phy_stop_interrupts() though, in case some other MAC driver calls 
> phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction 
> layer of phylib used) directly rather than using keventd.  This is 
> possible if phy_disconnect() is called from the driver's module_exit() 
> call, which may make sense for devices that are known not to have their 
> MII interface available as an external connector.  Hence the:
> 
> if (!current_is_keventd())
>   flush_scheduled_work();
> 
> sequence in phy_stop_interrupts().  Of course, we can force all drivers 
> using phylib (in the interrupt mode) to call phy_disconnect() through 
> keventd instead.
> 
>  Does it sound clearer?
> 

Vaguely.  Why doesn't it deadlock if !current_is_keventd()?  I mean,
whether or not the caller is keventd, the flush_scheduled_work() caller
will still be dependent upon rtnl_lock() being acquirable.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-16 Thread Maciej W. Rozycki
Andrew,

> I don't get it.  If some code does
> 
>   rtnl_lock();
>   flush_scheduled_work();
> 
> and there's some work scheduled which does rtnl_lock() then it'll deadlock.
> 
> But it'll deadlock whether or not the caller of flush_scheduled_work() is
> keventd.
> 
> Calling flush_scheduled_work() under locks is generally a bad idea.

 Indeed -- this is why I avoid it.

> I'd have thought that was still deadlockable.  Could you describe the
> deadlock more completely please?

 The simplest sequence of calls that prevents races here is as follows:

unregister_netdev();
  rtnl_lock();
  unregister_netdevice();
dev_close();
  sbmac_close();
phy_stop();
phy_disconnect();
  phy_stop_interrupts();
phy_disable_interrupts();
flush_scheduled_work();
free_irq();
  phy_detach();
mdiobus_unregister();
  rtnl_unlock();

We want to call flush_scheduled_work() from phy_stop_interrupts(), because 
there may still be calls to phy_change() waiting for the keventd to 
process and mdiobus_unregister() frees structures needed by phy_change().  
This may deadlock because of the call to rtnl_lock() though.

 So the modified sequence I have implemented is as follows:

unregister_netdev();
  rtnl_lock();
  unregister_netdevice();
dev_close();
  sbmac_close();
phy_stop();
schedule_work(); [sbmac_phy_disconnect()]
  rtnl_unlock();

and then:

sbmac_phy_disconnect();
  phy_disconnect();
phy_stop_interrupts();
  phy_disable_interrupts();
  free_irq();
phy_detach();
  mdiobus_unregister();

This guarantees calls to phy_change() for this PHY unit will not be run 
after mdiobus_unregister(), because any such calls have been placed in the 
queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt 
handler from issuing new calls to phy_change()).

 We still need flush_scheduled_work() to be called from 
phy_stop_interrupts() though, in case some other MAC driver calls 
phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction 
layer of phylib used) directly rather than using keventd.  This is 
possible if phy_disconnect() is called from the driver's module_exit() 
call, which may make sense for devices that are known not to have their 
MII interface available as an external connector.  Hence the:

if (!current_is_keventd())
  flush_scheduled_work();

sequence in phy_stop_interrupts().  Of course, we can force all drivers 
using phylib (in the interrupt mode) to call phy_disconnect() through 
keventd instead.

 Does it sound clearer?

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-06 Thread Andrew Morton
On Fri, 6 Oct 2006 12:26:22 +0100 (BST)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

> On Thu, 5 Oct 2006, Andrew Morton wrote:
> 
> > > 2. The driver uses schedule_work() for handling interrupts, but does not 
> > >make sure any pending work scheduled thus has been completed before 
> > >driver's structures get freed from memory.  This is especially 
> > >important as interrupts may keep arriving if the line is shared with 
> > >another PHY.
> > > 
> > >The solution is to ignore phy_interrupt() calls if the reported device 
> > >has already been halted and calling flush_scheduled_work() from 
> > >phy_stop_interrupts() (but guarded with current_is_keventd() in case 
> > >the function has been called through keventd from the MAC device's 
> > >close call to avoid a deadlock on the netlink lock).
> > > 
> > 
> > eww, hack.
> > 
> > Also not module-friendly:
> > 
> > WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!
> > 
> > Does this
> > 
> > static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> > {
> > if (cwq->thread == current) {
> > /*
> >  * Probably keventd trying to flush its own queue. So simply run
> >  * it by hand rather than deadlocking.
> >  */
> > run_workqueue(cwq);
> > 
> > not work???
> 
>  It does, too well even! -- in the case I am trying to take care of we are 
> run with "rtnl_mutex" held as a result of rtnl_lock() called from 
> unregister_netdev() and there is some work already in the workqueue 
> (linkwatch_event(), apparently) wanting to acquire the same lock.  Hence a 
> deadlock.

I don't get it.  If some code does

rtnl_lock();
flush_scheduled_work();

and there's some work scheduled which does rtnl_lock() then it'll deadlock.

But it'll deadlock whether or not the caller of flush_scheduled_work() is
keventd.

Calling flush_scheduled_work() under locks is generally a bad idea.

>  It seems problematic elsewhere as well -- see tg3.c -- but a 
> different way to deal with it has been found there.
> 
>  I am not that fond of this solution, but at least it works, unlike 
> calling flush_scheduled_work() here, which deadlocks the current CPU in my 
> system instantly.  Any suggestions as to how to do this differently?  
> Perhaps linkwatch_event() should be scheduled differently (using a 
> separate workqueue?)...

I'd have thought that was still deadlockable.  Could you describe the
deadlock more completely please?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-06 Thread Maciej W. Rozycki
On Thu, 5 Oct 2006, Andrew Morton wrote:

> > 2. The driver uses schedule_work() for handling interrupts, but does not 
> >make sure any pending work scheduled thus has been completed before 
> >driver's structures get freed from memory.  This is especially 
> >important as interrupts may keep arriving if the line is shared with 
> >another PHY.
> > 
> >The solution is to ignore phy_interrupt() calls if the reported device 
> >has already been halted and calling flush_scheduled_work() from 
> >phy_stop_interrupts() (but guarded with current_is_keventd() in case 
> >the function has been called through keventd from the MAC device's 
> >close call to avoid a deadlock on the netlink lock).
> > 
> 
> eww, hack.
> 
> Also not module-friendly:
> 
> WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!
> 
> Does this
> 
> static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
> {
>   if (cwq->thread == current) {
>   /*
>* Probably keventd trying to flush its own queue. So simply run
>* it by hand rather than deadlocking.
>*/
>   run_workqueue(cwq);
> 
> not work???

 It does, too well even! -- in the case I am trying to take care of we are 
run with "rtnl_mutex" held as a result of rtnl_lock() called from 
unregister_netdev() and there is some work already in the workqueue 
(linkwatch_event(), apparently) wanting to acquire the same lock.  Hence a 
deadlock.  It seems problematic elsewhere as well -- see tg3.c -- but a 
different way to deal with it has been found there.

 I am not that fond of this solution, but at least it works, unlike 
calling flush_scheduled_work() here, which deadlocks the current CPU in my 
system instantly.  Any suggestions as to how to do this differently?  
Perhaps linkwatch_event() should be scheduled differently (using a 
separate workqueue?)...  Or does dev_close() have to be called under this 
lock from unregister_netdevice()?  Perhaps code like this would do:

while (dev->flags & IFF_UP) {
rtnl_unlock();
dev_close(dev);
rtnl_lock();
}

  Maciej
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

2006-10-05 Thread Andrew Morton
On Tue, 3 Oct 2006 16:18:35 +0100 (BST)
"Maciej W. Rozycki" <[EMAIL PROTECTED]> wrote:

> 
> 2. The driver uses schedule_work() for handling interrupts, but does not 
>make sure any pending work scheduled thus has been completed before 
>driver's structures get freed from memory.  This is especially 
>important as interrupts may keep arriving if the line is shared with 
>another PHY.
> 
>The solution is to ignore phy_interrupt() calls if the reported device 
>has already been halted and calling flush_scheduled_work() from 
>phy_stop_interrupts() (but guarded with current_is_keventd() in case 
>the function has been called through keventd from the MAC device's 
>close call to avoid a deadlock on the netlink lock).
> 

eww, hack.

Also not module-friendly:

WARNING: "current_is_keventd" [drivers/net/phy/libphy.ko] undefined!

Does this

static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
{
if (cwq->thread == current) {
/*
 * Probably keventd trying to flush its own queue. So simply run
 * it by hand rather than deadlocking.
 */
run_workqueue(cwq);

not work???
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html