Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-30 Thread Jeff Garzik

Jarek Poplawski wrote:

Hi,

Very below is my patch proposal with a comment, which in my opinion
is precious enough to save it for future help in reading and
understanding the code.

I hope Alan will not blame me I've not asked for his permission before
sending, and he would ack this patch as it is or at least most of this.

Thanks & regards,
Jarek P.

On Wed, Jul 25, 2007 at 03:46:56PM +0100, Alan Cox wrote:

The code in question lib8390.c does

disable_irq();
fiddle_with_the_network_card_hardware()
enable_irq();

...

No idea how this affects the network card, as the code there must be
able to handle interrupts, which are not originated from the card due to
interrupt sharing.

I think, in this last yesterday's patch Ingo could be right, yet!
The comment at the beginnig points this is done like that because
of chip's slowness. And problems with timing are mysterious.

On the other hand author of this code didn't use spin_lock_irqsave
for some reason, probably after testing this option too. So, I hope
this is the right path, but alas, I'm not sure this patch has to
prove this 100%.

The author (me) didn't use spin_lock_irqsave because the slowness of the
card means that approach caused horrible problems like losing serial data
at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA
chips with FPGA front ends.


Anyway, in my opinion this situation where interrupts could/have_to
be used for such strange things should confirm the need of more
options for handling irqs individually.

Ok the logic behind the 8390 is very simple:

Things to know
- IRQ delivery is asynchronous to the PCI bus
- Blocking the local CPU IRQ via spin locks was too slow
- The chip has register windows needing locking work

So the path was once (I say once as people appear to have changed it
in the mean time and it now looks rather bogus if the changes to use
disable_irq_nosync_irqsave are disabling the local IRQ)


Take the page lock
Mask the IRQ on chip
Disable the IRQ (but not mask locally- someone seems to have
broken this with the lock validator stuff)
[This must be _nosync as the page lock may otherwise
deadlock us]
Drop the page lock and turn IRQs back on

At this point an existing IRQ may still be running but we can't
get a new one

Take the lock (so we know the IRQ has terminated) but don't mask
the IRQs on the processor
Set irqlock [for debug]

Transmit (slow as )

re-enable the IRQ


We have to use disable_irq because otherwise you will get delayed
interrupts on the APIC bus deadlocking the transmit path.

Quite hairy but the chip simply wasn't designed for SMP and you can't
even ACK an interrupt without risking corrupting other parallel
activities on the chip.

Alan


-->

From: Jarek Poplawski <[EMAIL PROTECTED]>

Subject: lib8390: comment on locking by Alan Cox

Additional explanation of problems with locking by Alan Cox.

Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Paul Gortmaker <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>


applied


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Alan Cox
On Thu, 26 Jul 2007 14:44:01 +0200
Jarek Poplawski <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Very below is my patch proposal with a comment, which in my opinion
> is precious enough to save it for future help in reading and
> understanding the code.
> 
> I hope Alan will not blame me I've not asked for his permission before
> sending, and he would ack this patch as it is or at least most of this.

Fine by me

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20->2.6.21 - networking dies after random time

2007-07-26 Thread Jarek Poplawski
Hi,

Very below is my patch proposal with a comment, which in my opinion
is precious enough to save it for future help in reading and
understanding the code.

I hope Alan will not blame me I've not asked for his permission before
sending, and he would ack this patch as it is or at least most of this.

Thanks & regards,
Jarek P.

On Wed, Jul 25, 2007 at 03:46:56PM +0100, Alan Cox wrote:
> > > The code in question lib8390.c does
> > > 
> > >   disable_irq();
> > >   fiddle_with_the_network_card_hardware()
> > >   enable_irq();
> > ...
> > > 
> > > No idea how this affects the network card, as the code there must be
> > > able to handle interrupts, which are not originated from the card due to
> > > interrupt sharing.
> > 
> > I think, in this last yesterday's patch Ingo could be right, yet!
> > The comment at the beginnig points this is done like that because
> > of chip's slowness. And problems with timing are mysterious.
> > 
> > On the other hand author of this code didn't use spin_lock_irqsave
> > for some reason, probably after testing this option too. So, I hope
> > this is the right path, but alas, I'm not sure this patch has to
> > prove this 100%.
> 
> The author (me) didn't use spin_lock_irqsave because the slowness of the
> card means that approach caused horrible problems like losing serial data
> at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA
> chips with FPGA front ends.
> 
> > Anyway, in my opinion this situation where interrupts could/have_to
> > be used for such strange things should confirm the need of more
> > options for handling irqs individually.
> 
> Ok the logic behind the 8390 is very simple:
> 
> Things to know
>   - IRQ delivery is asynchronous to the PCI bus
>   - Blocking the local CPU IRQ via spin locks was too slow
>   - The chip has register windows needing locking work
> 
> So the path was once (I say once as people appear to have changed it
> in the mean time and it now looks rather bogus if the changes to use
> disable_irq_nosync_irqsave are disabling the local IRQ)
> 
> 
>   Take the page lock
>   Mask the IRQ on chip
>   Disable the IRQ (but not mask locally- someone seems to have
>   broken this with the lock validator stuff)
>   [This must be _nosync as the page lock may otherwise
>   deadlock us]
>   Drop the page lock and turn IRQs back on
>   
>   At this point an existing IRQ may still be running but we can't
>   get a new one
> 
>   Take the lock (so we know the IRQ has terminated) but don't mask
> the IRQs on the processor
>   Set irqlock [for debug]
> 
>   Transmit (slow as )
> 
>   re-enable the IRQ
> 
> 
> We have to use disable_irq because otherwise you will get delayed
> interrupts on the APIC bus deadlocking the transmit path.
> 
> Quite hairy but the chip simply wasn't designed for SMP and you can't
> even ACK an interrupt without risking corrupting other parallel
> activities on the chip.
> 
> Alan
> 
-->

From: Jarek Poplawski <[EMAIL PROTECTED]>

Subject: lib8390: comment on locking by Alan Cox

Additional explanation of problems with locking by Alan Cox.

Signed-off-by: Jarek Poplawski <[EMAIL PROTECTED]>
Cc: Alan Cox <[EMAIL PROTECTED]>
Cc: Paul Gortmaker <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>

---

diff -Nurp 2.6.23-rc1-/drivers/net/lib8390.c 2.6.23-rc1/drivers/net/lib8390.c
--- 2.6.23-rc1-/drivers/net/lib8390.c   2007-07-09 01:32:17.0 +0200
+++ 2.6.23-rc1/drivers/net/lib8390.c2007-07-26 13:55:17.0 +0200
@@ -143,6 +143,52 @@ static void __NS8390_init(struct net_dev
  * annoying the transmit function is called bh atomic. That places
  * restrictions on the user context callers as disable_irq won't save
  * them.
+ *
+ * Additional explanation of problems with locking by Alan Cox:
+ *
+ * "The author (me) didn't use spin_lock_irqsave because the slowness of 
the
+ * card means that approach caused horrible problems like losing serial 
data
+ * at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA
+ * chips with FPGA front ends.
+ * 
+ * Ok the logic behind the 8390 is very simple:
+ * 
+ * Things to know
+ * - IRQ delivery is asynchronous to the PCI bus
+ * - Blocking the local CPU IRQ via spin locks was too slow
+ * - The chip has register windows needing locking work
+ * 
+ * So the path was once (I say once as people appear to have changed it
+ * in the mean time and it now looks rather bogus if the changes to use
+ * disable_irq_nosync_irqsave are disabling the local IRQ)
+ * 
+ * 
+ * Take the page lock
+ * Mask the IRQ on chip
+ * Disable the IRQ (but not mask locally- someone seems to have
+ * broken this with the lock validator stuff)
+ * [This must be _nosync as the page lock may oth