Re: [PATCH 1/5] spidernet: add missing initialization

2008-01-11 Thread Linas Vepstas
Hi,

On 11/01/2008, Jens Osterkamp <[EMAIL PROTECTED]> wrote:
> Hi Ishizaki,
>
> Linas has left the company and is no longer doing kernel related stuff,
> so I suggest, given Jeff is ok with that, that the two of us take over
> spidernet maintainership.
>
> Jens
>
> ---
>
> Change maintainership for spidernet.
>
> Signed-off-by: Jens Osterkamp <[EMAIL PROTECTED]>

Fine with me ...

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>

> Index: linux-2.6/MAINTAINERS
> ===
> --- linux-2.6.orig/MAINTAINERS  2008-01-11 13:32:04.0 +0100
> +++ linux-2.6/MAINTAINERS   2008-01-11 13:41:32.0 +0100
> @@ -3613,8 +3613,10 @@
>  S:     Supported
>
>  SPIDERNET NETWORK DRIVER for CELL
> -P: Linas Vepstas
> -M: [EMAIL PROTECTED]
> +P: Ishizaki Kou
> +M: [EMAIL PROTECTED]
> +P: Jens Osterkamp
> +M: [EMAIL PROTECTED]
>  L: netdev@vger.kernel.org
>  S: Supported
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ehea: Add kdump support

2007-11-26 Thread Linas Vepstas

Hi,

On Mon, Nov 26, 2007 at 01:41:37PM -0200, Luke Browning wrote:
> On Mon, 2007-11-26 at 19:16 +1100, Michael Ellerman wrote:
> 
> > For kdump we have to assume that the kernel is fundamentally broken,

If I may so humbly suggest: since ehea is a power6 thing only,
we should refocus our energies on "hypervisor assisted dump",
which solves all of these problems. 

In short, upon crash, the hypervisor will reset the 
pci devices into working order, and will then boot
a new fresh kernel into a tiny corner of ram. The rest
of ram is not cleared, and can be dumped. After the 
dump, the mem is returned to general use.

The key point here, for ehea, is "the hypervisor
will reset the device state to something rational".

Preliminary patches are at
http://patchwork.ozlabs.org/linuxppc/patch?id=14884
and following.

--linas


[PATCH] netdev: create an "is_napi_enabled()" call

2007-11-13 Thread Linas Vepstas

In certain rare cases, it can be nice to be able to check
if napi is enabled or not. Create an is_napi_enabled() call.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
---
Actually, I'm confused about whether I'd mailed this previously.
It might be a duplicate submission.

 include/linux/netdevice.h |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.23-rc8-mm1/include/linux/netdevice.h
===
--- linux-2.6.23-rc8-mm1.orig/include/linux/netdevice.h 2007-11-09 17:36:51.0 -0600
+++ linux-2.6.23-rc8-mm1/include/linux/netdevice.h  2007-11-09 17:40:19.0 -0600
@@ -384,6 +384,17 @@ static inline void napi_enable(struct na
    clear_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+/**
+ * is_napi_enabled - return non-zero if napi enabled
+ * @n: napi context
+ *
+ * Return true if napi is enabled.
+ */
+static inline bool is_napi_enabled(struct napi_struct *n)
+{
+   return !test_bit(NAPI_STATE_SCHED, &n->state);
+}
+
 /*
  * The DEVICE structure.
  * Actually, this whole structure is a big mistake.  It mixes I/O
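
The predicate relies on napi_enable() clearing NAPI_STATE_SCHED, as the
context lines of the patch show. Below is a minimal userspace model of
that single bit, not kernel code. The model_* names are hypothetical,
and the starting condition (a freshly registered context has the bit
set) is an assumption. The model also shows a caveat: the same bit is
set while a poll is scheduled, so the test reads "not enabled" during
polling too.

```c
#include <stdbool.h>

/* Userspace model only: one state word with a single "scheduled" bit,
 * standing in for NAPI_STATE_SCHED in struct napi_struct. */
#define MODEL_STATE_SCHED (1UL << 0)

struct model_napi {
    unsigned long state;
};

/* Assumed starting condition: a freshly registered context has the
 * bit set, so it counts as disabled until it is enabled. */
static void model_napi_init(struct model_napi *n)
{
    n->state = MODEL_STATE_SCHED;
}

/* Mirrors the clear_bit() shown in the patch context for napi_enable(). */
static void model_napi_enable(struct model_napi *n)
{
    n->state &= ~MODEL_STATE_SCHED;
}

/* Scheduling a poll sets the very same bit. */
static void model_napi_schedule(struct model_napi *n)
{
    n->state |= MODEL_STATE_SCHED;
}

/* Same test as the proposed is_napi_enabled(). */
static bool model_is_napi_enabled(const struct model_napi *n)
{
    return !(n->state & MODEL_STATE_SCHED);
}
```

A caller that only needs to know "has napi_enable() run yet?" during
open or error recovery, when no poll can be in flight, is the case
where this test gives an unambiguous answer.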


Re: [PATCH 2/2]: e1000: avoid lockup during error recovery

2007-11-09 Thread Linas Vepstas
On Fri, Nov 09, 2007 at 06:02:34PM +0100, Ingo Oeser wrote:
> Linas Vepstas schrieb:
> > + * napi_enabled_p - return non-zero if napi enabled
> > + * 
> > + * Mnemonic: _p stands for "predicate", returning a yes/no
> > + * answer to the question.
> 
> Call it "is_napi_enabled()" and nobody will ask :-)

Heh. The suffix _p is standard coding style for lisp/scheme
and first-order logic interpreters.  This was my lame attempt
to introduce it to the kernel. I guess that lame duck won't fly.

--linas



Re: [PATCH] PCI: export pci_restore_msi_state()

2007-11-09 Thread Linas Vepstas
On Thu, Nov 08, 2007 at 07:21:01PM -0600, Wen Xiong wrote:
> Hi Linas,
> 
> I saw you have submitted several patches to support pci-express network 
> adapters EEH. But looks only this patch fixed something in linux kernel 
> code.

And it's an old patch, submitted long ago ... I've resubmitted, because
it seems that it's the best/most correct thing to do.

> Do you mean I can test EEH callback functions in device driver after I 
> apply  this patch in the kernel?

Yes, please.  Note, however, I was never able to make the pci-e
version of the e1000 work. It comes up, generates interrupts, and
registers are readable and writeable. But it behaves as if the PHY
is turned off -- no network traffic goes through. (I tried turning
PHY on explicitly; that didn't help). So there is still something
wrong somewhere, probably in the e1000 device driver. I'm guessing
the pci-e to pci-x bridge chip on that card is not quite resetting
the card completely.

> Do you do "pci_save_msi_state" somewhere in the kernel? Or you suggest to 
> do "pci_save_msi_state" and "pci_restore_msi_state" in each device driver?

There is no "save state", the msi state can't be saved. The MSI regs
are write-only, and they are controlled by firmware. The restore_state
function is the only one you need.

--linas


Re: [PATCH 2/2]: e1000: avoid lockup during error recovery

2007-11-07 Thread Linas Vepstas
On Wed, Nov 07, 2007 at 02:45:18PM -0800, Kok, Auke wrote:
> [adding netdev, jeff G to the Cc]
> 
> Linas Vepstas wrote:
> > On Wed, Nov 07, 2007 at 01:50:17PM -0800, Kok, Auke wrote:
> >> Linas Vepstas wrote:
> >>> If a PCI bus error is encountered during device open, the
> >>> error recovery routines will attempt to close the device.
> >>> If napi has not yet been enabled, the napi disable in the
> >>> close will hang. 
> >>>
> >>> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> >>>
> >>> 
> >>> The "elegance" of this solution is arguable: one could
> >>> say it's "better" to perform this check in e1000_down().
> >>> However, doing so will disrupt a commonly used path,
> >>> whereas here, the hack is in the infrequently used
> >>> error path, and thus less intrusive. 
> >>>
> >>>  drivers/net/e1000/e1000_main.c |9 -
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >> I think this is OK, but it's quite awful looking if you ask me.
> > 
> > Yeah, ... 
> > 
> > There are several alternatives: below are two. If you
> > find one to be more appealing.. could you use it? Consider them
> > to be "signed-off-by"; I have not actually compiled or tested
> > either of them.
> 
> I'm not a particular fan of putting extra state tracking in the driver for
> something we could extract from the napi subsystem already.
> 
> Jeff, Stephen, can't we have a generic napi_enabled() inline in netdevice.h
> that tests for NAPI_STATE_SCHED ?

Like this?

 include/linux/netdevice.h |   12 
 1 file changed, 12 insertions(+)

Index: linux-2.6.23-rc8-mm1/include/linux/netdevice.h
===
--- linux-2.6.23-rc8-mm1.orig/include/linux/netdevice.h 2007-09-26 15:07:05.0 -0500
+++ linux-2.6.23-rc8-mm1/include/linux/netdevice.h  2007-11-07 17:14:50.0 -0600
@@ -384,6 +384,18 @@ static inline void napi_enable(struct na
    clear_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+/**
+ * napi_enabled_p - return non-zero if napi enabled
+ * @n: napi context
+ * 
+ * Mnemonic: _p stands for "predicate", returning a yes/no
+ * answer to the question.
+ */
+static inline int napi_enabled_p(struct napi_struct *n)
+{
+   return !test_bit(NAPI_STATE_SCHED, &n->state);
+}
+
 /*
  * The DEVICE structure.
  * Actually, this whole structure is a big mistake.  It mixes I/O


> I wonder if there isn't something in the PCI error recovery missing the point
> and we can solve this problem better for all drivers somehow.

Well, there's also scsi, which doesn't use napi :-)
For the most part, error recovery is a fairly cut-n-paste
set of steps. However, I don't quite have enough confidence
to say "yea verily, all network adapters will use these 
same steps."

--linas
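
To make the lockup concrete: the disable side waits until it can take
the SCHED bit, so if the bit was never cleared by an enable, the wait
never ends. The sketch below is a userspace model under that
assumption. The model_* names are hypothetical, and the bounded retry
count exists only so the "hang" can be observed instead of looping
forever.

```c
#include <stdbool.h>

#define MODEL_STATE_SCHED (1UL << 0)

struct model_napi {
    unsigned long state;
};

/* Model of a disable that waits until it can take the SCHED bit.
 * The real call sleeps and retries; here we give up after max_tries
 * so the hang shows up as a false return value. */
static bool model_napi_disable(struct model_napi *n, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (!(n->state & MODEL_STATE_SCHED)) {
            n->state |= MODEL_STATE_SCHED;
            return true;
        }
    }
    return false; /* would have waited forever */
}

/* Error-path close guarded by the predicate: if the bit is still set,
 * napi was never enabled, so there is nothing to disable. */
static bool model_error_path_close(struct model_napi *n)
{
    if (n->state & MODEL_STATE_SCHED)   /* i.e. !napi_enabled_p(n) */
        return true;
    return model_napi_disable(n, 1000);
}
```

The guard only covers the "open failed before enable" case discussed
here; it says nothing about a poll that is genuinely in progress.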


Re: [PATCH] PCI: export pci_restore_msi_state()

2007-11-07 Thread Linas Vepstas
Hi,

On Wed, Nov 07, 2007 at 03:43:59PM -0600, Linas Vepstas wrote:
> 
> PCI error recovery usually involves the PCI adapter being reset.
> If the device is using MSI, the reset will cause the MSI state 
> to be lost; the device driver needs to restore the MSI state.
> 
> The pci_restore_msi_state() routine is currently protected
> by CONFIG_PM; remove this, and also export the symbol, so
> that it can be used in a module.
> 
> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

The long time delay has managed to muddle notes & recollection 
of prior discussions. This patch should have had a 

Signed-off-by: Matt Carlson <[EMAIL PROTECTED]>
Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

on it; it's the same patch that was submitted a long time ago.

During the discussions of 21 Oct, it was proposed that there
should be an arch hook for pci_restore_msi_state(), so that
it would be treated at the same level as msi setup and teardown.
I'd volunteered to write that patch. 

When I sat down to do it, however, I realized that I did not
actually *need* it. And so I wondered: why am I writing un-needed,
but theoretically proper, code? So I punted, and I didn't.
Does that make sense?

That's also why this patch is just a resubmission of the old
patch. The original thread is here:

http://www.mail-archive.com/netdev@vger.kernel.org/msg51296.html

--linas





[PATCH] PCI: export pci_restore_msi_state()

2007-11-07 Thread Linas Vepstas

PCI error recovery usually involves the PCI adapter being reset.
If the device is using MSI, the reset will cause the MSI state 
to be lost; the device driver needs to restore the MSI state.

The pci_restore_msi_state() routine is currently protected
by CONFIG_PM; remove this, and also export the symbol, so
that it can be used in a module.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


I am so sorry I wasn't able to send this 3 weeks ago, when
I first wrote the patch. There was simply no functional
hardware available to actually run this stuff :-(

Patches that use this, including those for tg3 and e1000e and ixgbe
i.e. MSI-using drivers, are to follow "real soon now". 

 drivers/pci/msi.c   |3 +--
 drivers/pci/pci.h   |6 --
 include/linux/pci.h |2 ++
 3 files changed, 3 insertions(+), 8 deletions(-)

Index: linux-2.6.23-rc8-mm1/drivers/pci/msi.c
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/msi.c 2007-10-16 15:14:20.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/msi.c  2007-10-16 15:14:42.0 -0500
@@ -224,7 +224,6 @@ static struct msi_desc* alloc_msi_entry(
    return entry;
 }
 
-#ifdef CONFIG_PM
 static void __pci_restore_msi_state(struct pci_dev *dev)
 {
    int pos;
@@ -282,7 +281,7 @@ void pci_restore_msi_state(struct pci_de
    __pci_restore_msi_state(dev);
    __pci_restore_msix_state(dev);
 }
-#endif /* CONFIG_PM */
+EXPORT_SYMBOL_GPL(pci_restore_msi_state);
 
 /**
  * msi_capability_init - configure device's MSI capability structure
Index: linux-2.6.23-rc8-mm1/drivers/pci/pci.h
===
--- linux-2.6.23-rc8-mm1.orig/drivers/pci/pci.h 2007-10-16 15:14:20.0 -0500
+++ linux-2.6.23-rc8-mm1/drivers/pci/pci.h  2007-10-16 15:19:33.0 -0500
@@ -45,12 +45,6 @@ static inline void pci_no_msi(void) { }
 static inline void pci_msi_init_pci_dev(struct pci_dev *dev) { }
 #endif
 
-#if defined(CONFIG_PCI_MSI) && defined(CONFIG_PM)
-void pci_restore_msi_state(struct pci_dev *dev);
-#else
-static inline void pci_restore_msi_state(struct pci_dev *dev) {}
-#endif
-
 static inline int pci_no_d1d2(struct pci_dev *dev)
 {
    unsigned int parent_dstates = 0;
Index: linux-2.6.23-rc8-mm1/include/linux/pci.h
===
--- linux-2.6.23-rc8-mm1.orig/include/linux/pci.h   2007-10-01 13:26:38.0 -0500
+++ linux-2.6.23-rc8-mm1/include/linux/pci.h    2007-10-16 15:19:07.0 -0500
@@ -665,6 +665,7 @@ static inline int pci_enable_msix(struct
    struct msix_entry *entries, int nvec) {return -1;}
 static inline void pci_disable_msix(struct pci_dev *dev) {}
 static inline void msi_remove_pci_irq_vectors(struct pci_dev *dev) {}
+static inline void pci_restore_msi_state(struct pci_dev *dev) {}
 #else
 extern int pci_enable_msi(struct pci_dev *dev);
 extern void pci_disable_msi(struct pci_dev *dev);
@@ -672,6 +673,7 @@ extern int pci_enable_msix(struct pci_de
    struct msix_entry *entries, int nvec);
 extern void pci_disable_msix(struct pci_dev *dev);
 extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
+extern void pci_restore_msi_state(struct pci_dev *dev);
 #endif
 
 #ifdef CONFIG_HT_IRQ


Re: [PATCH] Read back MSI message in rtas_setup_msi_irqs() so restore works

2007-11-07 Thread Linas Vepstas
On Tue, Oct 23, 2007 at 02:23:44PM +1000, Michael Ellerman wrote:
> There are plans afoot to use pci_restore_msi_state() to restore MSI
> state after a device reset. In order for this to work for the RTAS MSI
> backend, we need to read back the MSI message from config space after
> it has been setup by firmware.
> 
> This should be sufficient for restoring the MSI state after a device
> reset, however we will need to revisit this for suspend to disk if that
> is ever implemented on pseries.
> 
> Signed-off-by: Michael Ellerman <[EMAIL PROTECTED]>
> ---
> 
> Linas, can you test this on your setup with your EEH stuff? I haven't got
> any MSI supporting hardware/firmware combination.

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>

I *finally* was able to get onto some hardware long enough to run this.
And that took a lot of work. Sigh. Yes, this is exactly what I'd wanted.

--linas
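
Why the read-back matters can be shown with a small userspace model.
This is a sketch, not the kernel implementation: the msi_msg field
names follow the address_lo / address_hi / data members discussed on
the list, while struct model_dev and every model_* function are
hypothetical. The assumption is that restore simply rewrites whatever
message was cached at setup time.

```c
#include <stdint.h>
#include <string.h>

struct msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

struct model_dev {
    struct msi_msg hw;      /* what the device's config space holds */
    struct msi_msg cached;  /* stands in for the saved entry->msg */
};

/* OS-driven setup: program the device and remember the message. */
static void model_write_msi_msg(struct model_dev *d, const struct msi_msg *m)
{
    d->hw = *m;
    d->cached = *m;
}

/* Firmware-driven setup: the device is programmed, the cache is not. */
static void model_firmware_setup(struct model_dev *d, const struct msi_msg *m)
{
    d->hw = *m;
}

/* Read back what firmware programmed, so a later restore has valid data. */
static void model_read_back(struct model_dev *d)
{
    d->cached = d->hw;
}

/* A device reset wipes the MSI registers. */
static void model_reset(struct model_dev *d)
{
    memset(&d->hw, 0, sizeof(d->hw));
}

/* Restore rewrites the cached message, whatever it contains. */
static void model_restore(struct model_dev *d)
{
    d->hw = d->cached;
}

static int model_msg_equal(const struct msi_msg *a, const struct msi_msg *b)
{
    return a->address_lo == b->address_lo &&
           a->address_hi == b->address_hi &&
           a->data == b->data;
}
```

In this model, setup followed by reset and restore recovers the message
on the OS-driven path, and on the firmware-driven path only if
model_read_back() ran before the reset.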


Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas
On Tue, Oct 23, 2007 at 07:24:27AM +1000, Benjamin Herrenschmidt wrote:
> 
> On Mon, 2007-10-22 at 13:13 -0500, Linas Vepstas wrote:
> > On Mon, Oct 22, 2007 at 11:49:24AM +1000, Michael Ellerman wrote:
> > > 
> > > On pseries there's a chance it will work for PCI error recovery, but if
> > > so it's just lucky that firmware has left everything configured the same
> > > way. 
> > 
> > > ? The papr is quite clear that it is up to the OS to restore the msi
> > state after an eeh error.
> 
> Via direct config space access or via firmware change-msi calls ?

Direct config space access. It says that the OS is supposed to read the
MSI config (after it's been set up), save it, and restore it (via direct
config space writes) if the device is ever reset.

> I don't know why you keep talking about powerpc laptops here ... 

Well, there are Apple laptops, right?  Aren't those the "powermac" 
platform?  Now, I don't know if they support MSI, but if they do,
I get the impression that they might not restore msi state correctly,
after being put into hardware suspend.  But perhaps I'm mistaken;
I was simply grepping for various msi-related functions in various
arch subdirectories, comparing x86 to other arches, and noticed 
that code that would restore msi state seems to be missing for
most arches and most powerpc platforms.

--linas



Re: [BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas
On Fri, Oct 19, 2007 at 05:53:08PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 19 Oct 2007 19:46:10 -0500
> 
> > FWIW, it looks like not all that many arches do this; the output
> > for grep -r address_hi * is pretty thin. Then, looking at
> > i386/kernel/io_apic.c as an example, one can see that the 
> > msi state save happens "by accident" if CONFIG_SMP is enabled;
> > and so it's surely broken on uniprocessor machines.
> 
> I don't see this, in all cases write_msi_msg() will transfer
> the given "*msg" to entry->msg by this assignment in
> drivers/pci/msi.c:
> 
> void write_msi_msg(unsigned int irq, struct msi_msg *msg)
> {
>  ...
>   entry->msg = *msg;
> }
> 
> So as long as write_msi_msg() is invoked, it will be saved
> properly.

As Michael Ellerman points out, the pseries msi setup is done
by firmware, and so this bit never happens. 

As discussed in the other thread, I'll try to set up a patch
for an arch callback for restoring msi state.

-linas


Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas
On Sun, Oct 21, 2007 at 09:45:20PM -0700, David Miller wrote:
> 
> The core issue is that the ARCH level MSI code invokes
> write_msi_msg(), not the generic code, exactly because there
> are platform level issues wherein the firmware is the only
> legal way to write the MSI settings in PCI config space.
> 
> However, the MSI state restore code was not architected similarly.  It
> does the write_msi_msg() directly, instead of letting platform level
> code do it in ARCH hooks.

Yes, exactly.

> Therefore I think we need to attack this in two stages:
> 
> 1) First changeset moves the write_msi_msg() call currently in
>    __pci_restore_msi_state() into an ARCH overridable handler.
> 
>    This would allow powerpc to deal with this properly.

Yes!
I'll try to put together a patch later today, if I can get
a fabled "round tuit".

>    pci_restore_msi_state() can get exported to modules in this
>    change

OK.

> 2) The Tigon3 error recovery changes, as they were.
> 
> But I have to ask, can anyone see how e1000 handles MSI properly
> in its PCI error support?

It doesn't. None of them do. :-(  I didn't get access to msi-capable 
hardware until a few weeks ago; that's why this is coming up just now.

--linas


Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas
On Mon, Oct 22, 2007 at 11:49:24AM +1000, Michael Ellerman wrote:
> 
> On pseries there's a chance it will work for PCI error recovery, but if
> so it's just lucky that firmware has left everything configured the same
> way. 

? The papr is quite clear that it is up to the OS to restore the msi
state after an eeh error.

> Yes I think so. That way we can properly reconfigure via the firmware
> interface. The other option would be to design some new arch hook to do
> resume, but just doing a disable/enable seems simpler to me.

Err, if you read the code for suspend/resume, it never actually calls
disable/enable (and thus doesn't go to the firmware); it calls the
restore_msi_state() function!

If suspend/resume needs to call firmware to restore the state, then,
at the moment, suspend/resume is broken.  As I mentioned earlier,
I presumed that no powerpc laptops currently use msi-enabled devices,
as otherwise, this would have been flushed out.

--linas




Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-22 Thread Linas Vepstas
On Sun, Oct 21, 2007 at 04:21:31PM -0700, David Miller wrote:
> From: "Matt Carlson" <[EMAIL PROTECTED]>
> Date: Fri, 19 Oct 2007 14:36:56 -0700
> 
> > This patch exports the pci_restore_msi_state() function.  This function
> > is needed to restore the MSI state during PCI error recovery.
> > 
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> > Signed-off-by: Matt Carlson <[EMAIL PROTECTED]>
> > Signed-off-by: Michael Chan <[EMAIL PROTECTED]>
> 
> I'm not so sure about this.
> 
> Perhaps, instead, you should do a pci_msi_disable() and
> pci_msi_enable() in the error detection and recovery sequence.
> 
> Or, alternatively, save/restore those MSI registers by hand.
> 
> I'm trying to figure out how the E1000 driver handles this correctly,
> but I can't see it just by reading it over quickly.

The e1000 and the ixgb are broken as well ... right now, any
driver that uses msi together with the pci error recovery will
fail to get recovered correctly.  There are several distinct bugs;
one is that, msi state is not being restored; and the call to 
pci_restore_msi_state() was supposed to aid with that.

I'd rather not use pci_msi_disable(), because that has the
side-effect of enabling legacy interrupts; I'm concerned that
this will have the potential for causing havoc of all sorts.

--linas


[BUG] powerpc does not save msi state [was Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-19 Thread Linas Vepstas
Hi,

On Fri, Oct 19, 2007 at 05:27:06PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 19 Oct 2007 19:04:21 -0500
> 
> > I'm working in linux-2.6.23-rc8-mm1 at the moment, and I don't see
> > that happening. viz. read_msi_msg() is not called anywhere, and I need
> > to have valid msg->address_lo and msg->address_hi and msg->data
> > in order to be able to restore.
> 
> See the pci_restore_msi_state() call done from pci_restore_state()
> in drivers/pci/pci.c, that pci_restore_msi_state() code in
> drivers/pci/msi.c very much relies upon the entry->msg values
> being uptodate and valid.
> 
> The MSI arch layer code is supposed to fill the entry->msg values in
> via arch_setup_msi_irq().  Perhaps the pseries code is forgetting to
> do that.

Yep.  Thank you for confirming the correct location for the fix.

FWIW, it looks like not all that many arches do this; the output
for grep -r address_hi * is pretty thin. Then, looking at
i386/kernel/io_apic.c as an example, one can see that the 
msi state save happens "by accident" if CONFIG_SMP is enabled;
and so it's surely broken on uniprocessor machines.

I'm cc'ing the powerpc mailing list to point this out: 
it looks like only cell/axon_msi.c and mpic_u3msi.c 
bother to do anything.  I guess that there aren't any old 
macintosh laptops that have msi on them? Because without
this, suspend and resume breaks.

Paul,
On the off chance you're reading this, I'll send a pseries
patch on Monday, with luck (and some other patches too).
I'm not touching any of the other platforms; you and benh
would know those better.

--linas


Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-19 Thread Linas Vepstas
On Fri, Oct 19, 2007 at 06:12:03PM -0700, Michael Chan wrote:
> On Fri, 2007-10-19 at 19:04 -0500, [EMAIL PROTECTED] wrote:
> > I'm working in linux-2.6.23-rc8-mm1 at the moment, and I don't see
> > that happening. viz. read_msi_msg() is not called anywhere, and I need
> > to have valid msg->address_lo and msg->address_hi and msg->data
> > in order to be able to restore.
> > 
> > In particular, this has to happen after the call to
> > arch_setup_msi_irqs
> > as otherwise, the arch hasn't yet filled these fields with correct
> > values.
> > 
> > Perhaps this is fixed in the kernel you're working with?
> 
> It's possible that this doesn't work on pseries.  I've only tested
> pci_restore_msi_state() on x86 in the context of suspend and resume.
> During resume, the MSI state gets restored correctly on x86.

:-) Yes, I think that is being done in arch/i386/kernel/io_apic.c
and arch/ia64/kernel/msi_ia64.c etc., but it's not being done
on most of the powerpc's.  It's possible that none of the
old macintosh laptops use msi, and so no one noticed before; 
I know that no one ever suspends/resumes the big servers I work 
on, sooo :-)

Actually, looking at arch/i386/kernel/io_apic.c, it looks like
the msi state is being saved only when CONFIG_SMP is set, so 
it seems to me that the restore will fail on uni systems ... 
are there any of those left? 

--linas



Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-19 Thread Linas Vepstas
On Fri, Oct 19, 2007 at 05:36:17PM -0700, Michael Chan wrote:
> On Fri, 2007-10-19 at 18:29 -0500, [EMAIL PROTECTED] wrote:
> > On Fri, Oct 19, 2007 at 02:36:56PM -0700, Matt Carlson wrote:
> > > This patch exports the pci_restore_msi_state() function.  This function
> > > is needed to restore the MSI state during PCI error recovery.
> > > 
> > > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> > > Signed-off-by: Matt Carlson <[EMAIL PROTECTED]>
> > > Signed-off-by: Michael Chan <[EMAIL PROTECTED]>
> > 
> > Davem,
> > 
> > This patch is generically needed for recovery from PCI errors, 
> > and not just the tg3 that Matt is working on.
> > 
> > Matt, there are also several msi-related bugs in the pseries
> > architecture implementation, those patches will go out to 
> > > Paul Mackerras separately. I was hoping today ... but things 
> > came up. One little iddy-biddy problem is that the pseries
> > is not actually *saving* the msi state, and so, ahem, the 
> > restore isn't quite working out either. I'm still trying
> > to navigate around that.
> > 
> Linas, the MSI state is saved automatically when the driver calls
> pci_enable_msi(), so it doesn't need to be saved by pseries code.

I'm working in linux-2.6.23-rc8-mm1 at the moment, and I don't see
that happening. viz. read_msi_msg() is not called anywhere, and I need
to have valid msg->address_lo and msg->address_hi and msg->data
in order to be able to restore.

In particular, this has to happen after the call to arch_setup_msi_irqs
as otherwise, the arch hasn't yet filled these fields with correct values.

Perhaps this is fixed in the kernel you're working with?

--linas



Re: [PATCH 5/7] pci: Export the pci_restore_msi_state() function

2007-10-19 Thread Linas Vepstas
On Fri, Oct 19, 2007 at 02:36:56PM -0700, Matt Carlson wrote:
> This patch exports the pci_restore_msi_state() function.  This function
> is needed to restore the MSI state during PCI error recovery.
> 
> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> Signed-off-by: Matt Carlson <[EMAIL PROTECTED]>
> Signed-off-by: Michael Chan <[EMAIL PROTECTED]>

Davem,

This patch is generically needed for recovery from PCI errors, 
and not just the tg3 that Matt is working on.

Matt, there are also several msi-related bugs in the pseries
architecture implementation, those patches will go out to 
Paul Mackerras separately. I was hoping today ... but things 
came up. One little iddy-biddy problem is that the pseries
is not actually *saving* the msi state, and so, ahem, the 
restore isn't quite working out either. I'm still trying
to navigate around that.

--linas



Re: [PATCH] tg3: add PCI error recovery support

2007-10-12 Thread Linas Vepstas
Hi,

On Thu, Aug 02, 2007 at 05:13:39PM -0700, Michael Chan wrote:
> On Thu, 2007-08-02 at 18:12 -0500, [EMAIL PROTECTED] wrote:
> > On Thu, Jul 26, 2007 at 06:12:00PM -0700, Michael Chan wrote:
> > > On Thu, 2007-07-26 at 17:57 -0500, [EMAIL PROTECTED] wrote:
> > > > [... PCI error recovery patches for tg3 ...]
> > > 
> > > Thanks.  We will review and submit probably in 2.6.24.
> > 
> > OK, thanks. Not to nag, but it helps to get these into subsystem 
> > maintainer trees well before then, as otherwise the 2.6.24 window 
> > will get missed (seeing how I just missed 2.6.23 ..)
> 
> Yes, I know.  David Miller hasn't opened up a 2.6.24 tree yet.  We'll
> probably have a number of patches to be submitted for 2.6.24, and we'll
> add this one to the set.  Thanks.

Did the patch make it in?  Now seems to be the time ...

--linas

p.s. I'll send a generic MSI patch to the PCI mailing list; it appears 
the MSI issue is generic, affecting lots of adapters.



Re: [PATCH] spidernet: fix interrupt reason recognition

2007-09-04 Thread Linas Vepstas
On Fri, Aug 31, 2007 at 06:46:17AM -0400, Jeff Garzik wrote:
> Ishizaki Kou wrote:
> >This patch solves a problem that the spidernet driver sometimes fails
> >to handle IRQ.
> >
> >The problem happens because,
> >- In Cell architecture, interrupts may arrive at an interrupt
> >  controller, even if they are masked by the setting on registers of
> >  devices. It happens when interrupt packets are sent just before 
> >  the interrupts are masked.
> >- spidernet interrupt handler compares interrupt reasons with
> >  interrupt masks, so when such interrupts occur, spidernet interrupt
> >  handler returns IRQ_NONE.
> >- When all interrupt handlers return IRQ_NONE, the linux kernel disables
> >  the IRQ and it no longer delivers interrupts to the interrupt handlers.
> >
> >spidernet doesn't work after above sequence, because it can't receive
> >interrupts.
> > 
> >This patch changes the spidernet interrupt handler so that it compares
> >the interrupt reason with SPIDER_NET_INTX_MASK_VALUE.
> >
> >Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]>
> >---
> >
> >Linas-san,
> >
> >Please apply this to 2.6.23, because this problem sometimes happens
> >and then we cannot use the ethernet port any more.
> >
> >And also, please apply the following Arnd-san's patch to fix a problem
> >that spidernet driver sometimes causes a BUG_ON at open.
> 
> Linas?  ACK?  Alive?  :)

Argh. I read the code; it looked fine. I was going to compile it and
forward it formally and etc. and then I got busy ...

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>

--linas
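
The failure mode described in the quoted changelog can be sketched in
userspace. This is an illustration only: the mask value below is made
up (the real SPIDER_NET_INTX_MASK_VALUE is defined in the driver), and
the model_* names are hypothetical.

```c
#include <stdint.h>

enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical stand-in for SPIDER_NET_INTX_MASK_VALUE. */
#define MODEL_INTX_MASK_VALUE 0x0000000fu

/* Old behaviour as described: compare the interrupt reason against
 * the mask currently programmed into the device. If the interrupt was
 * sent just before masking, current_mask is already 0 here. */
static enum model_irqreturn model_handler_old(uint32_t status,
                                              uint32_t current_mask)
{
    return (status & current_mask) ? MODEL_IRQ_HANDLED : MODEL_IRQ_NONE;
}

/* Patched behaviour: compare against the constant set of interrupts
 * the driver cares about, regardless of the current mask setting. */
static enum model_irqreturn model_handler_new(uint32_t status)
{
    return (status & MODEL_INTX_MASK_VALUE) ? MODEL_IRQ_HANDLED
                                            : MODEL_IRQ_NONE;
}
```

With a late interrupt (status set, current mask already zero) the old
comparison reports IRQ_NONE and the new one reports the interrupt as
handled, which is what keeps the kernel from disabling the IRQ line.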


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 02:44:36PM -0700, David Miller wrote:
> From: David Stevens <[EMAIL PROTECTED]>
> Date: Fri, 24 Aug 2007 09:50:58 -0700
> 
> > Problem is if it increases rapidly, you may drop packets
> > before you notice that the ring is full in the current estimated
> > interval.
> 
> This is one of many reasons why hardware interrupt mitigation
> is really needed for this.

When turning off interrupts, don't turn them *all* off.
Leave the queue-full interrupt always on.

--linas


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 11:11:56PM +0200, Jan-Bernd Themann wrote:
> (when they are available for
> POWER in our case). 

hrtimer worked fine on the powerpc cell arch last summer.
I assume they work on p5 and p6 too, no ??

> I tried to implement something with "normal" timers, but the result
> was everything but great. The timers seem to be far too slow.
> I'm not sure if it helps to increase it from 1000HZ to 2500HZ
> or more.

Heh. Do the math. Even on 1gigabit cards, that's not enough:

(1gigabit/sec) x (byte/8 bits) x (packet/1500bytes) x (sec/1000 jiffy) 

is 83 packets a jiffy (for big packets, even more for small packets, 
and more again for 10 gigabit cards). So polling once per jiffy is a 
latency disaster.
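
The arithmetic above can be checked with a small helper. This is only a
sketch: the function name is made up, it uses integer division, and it
ignores Ethernet framing overhead (preamble, inter-frame gap), so real
rates are slightly lower.

```c
#include <stdint.h>

/*
 * Full-size packets arriving per jiffy on a saturated link.
 * Returns 0 if packet_bytes or hz is zero, to avoid dividing by zero.
 */
uint64_t packets_per_jiffy(uint64_t link_bits_per_sec,
			   uint64_t packet_bytes, uint64_t hz)
{
	if (packet_bytes == 0 || hz == 0)
		return 0;
	return link_bits_per_sec / (8u * packet_bytes * hz);
}
```

With a 1 gigabit link, 1500-byte packets and HZ=1000 this gives 83;
raising HZ to 2500 still leaves 33 packets per jiffy.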

--linas  



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
> Linas Vepstas <[EMAIL PROTECTED]> wrote:
> > On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> >> 3) On modern systems the incoming packets are processed very fast. Especially
> >> on SMP systems when we use multiple queues we process only a few packets
> >> per napi poll cycle. So NAPI does not work very well here and the interrupt
> >> rate is still high.
> > 
> > worst-case network ping-pong app: send one
> > packet, wait for reply, send one packet, etc.
> 
> Possible solution / possible brainfart:
> 
> Introduce a timer, but don't start to use it to combine packets unless you
> receive n packets within the timeframe. If you receive less than m packets
> within one timeframe, stop using the timer. The system should now have a
> decent response time when the network is idle, and when the network is
> busy, nobody will complain about the latency.-)

Ohh, that was inspirational. Let me free-associate some wild ideas.

Suppose we keep a running average of the recent packet arrival rate.
Let's say it's 10 per millisecond ("typical" for a gigabit eth running
flat-out).  If we could poll the driver at a rate of 10-20 per
millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
then we could potentially service the card without ever having to enable 
interrupts on the card, and without hurting latency.

If the packet arrival rate becomes slow enough, we go back to an
interrupt-driven scheme (to keep latency down).
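
A minimal sketch of this idea, under stated assumptions: the names, the
1/8 averaging weight and the two thresholds are all made up for
illustration, and nothing like this exists in the driver. It keeps a
fixed-point running average of packets per interval and switches
between interrupt mode and polling mode with hysteresis, in the spirit
of the n and m thresholds suggested above.

```c
#include <stdint.h>

enum rx_mode { RX_MODE_IRQ, RX_MODE_POLL };

struct rx_estimator {
	uint32_t avg_x8;     /* running average of packets/interval, times 8 */
	enum rx_mode mode;
};

/* Hypothetical thresholds, in packets per interval. */
#define POLL_ENTER_PKTS 8u   /* switch to polling at or above this average */
#define POLL_LEAVE_PKTS 2u   /* return to interrupts below this average */

enum rx_mode rx_estimator_update(struct rx_estimator *e,
				 uint32_t pkts_this_interval)
{
	/* avg += (sample - avg) / 8, kept in fixed point scaled by 8 */
	e->avg_x8 = e->avg_x8 - e->avg_x8 / 8u + pkts_this_interval;

	if (e->mode == RX_MODE_IRQ && e->avg_x8 >= POLL_ENTER_PKTS * 8u)
		e->mode = RX_MODE_POLL;
	else if (e->mode == RX_MODE_POLL && e->avg_x8 < POLL_LEAVE_PKTS * 8u)
		e->mode = RX_MODE_IRQ;
	return e->mode;
}
```

With these numbers, a steady 10 packets per interval moves the
estimator into polling mode after a few intervals, while a one-packet
ping-pong load never leaves interrupt mode, so its latency is
unaffected.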

The main problem here is that, even for HZ=1000 machines, this amounts 
to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
using the high-resolution timers. And, umm, don't the HR timers require
a cpu timer interrupt to make them go? So it's not clear that this is much
of a win.

The eHEA is a 10 gigabit device, so it can expect 80-100 packets per
millisecond for large packets, and even more, say 1K packets per
millisec, for small packets. (Even the spec for my 1Gb spidernet card
claims its internal rate is 1M packets/sec.) 

Another possibility is to set HZ to 5000 or 2 or something humongous
... after all, CPUs are now faster! But, since this might be wasteful,
maybe we could make HZ be dynamically variable: have high HZ rates when
there's lots of network/disk activity, and low HZ rates when not. That
means a non-constant jiffy.

If all drivers used interrupt mitigation, then the variable-high
frequency jiffy could take their place, and be more "fair" to everyone.
Most drivers would be polled most of the time when they're busy, and 
only use interrupts when they're not.
 
--linas


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 08:52:03AM -0700, Stephen Hemminger wrote:
> 
> You need hardware support for deferred interrupts. Most devices have it
> (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
> thing you want done by the stack, you want the hardware to hold off
> interrupts until X packets or Y usecs have expired.

Just to be clear, in the previous email I posted on this thread, I
described a worst-case network ping-pong test case (send a packet, wait
for reply), and found out that a deferred interrupt scheme just damaged
the performance of the test case.  Since the folks who came up with the
test case were adamant, I turned off the deferred interrupts.
While deferred interrupts are an "obvious" solution, I decided that
they weren't a good solution. (And I have no other solution to offer.)

--linas



Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Linas Vepstas
On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
> 3) On modern systems the incoming packets are processed very fast. Especially
>    on SMP systems when we use multiple queues we process only a few packets
>    per napi poll cycle. So NAPI does not work very well here and the
>    interrupt rate is still high.

I saw this too, on a system that is "modern" but not terribly fast, and
only slightly (2-way) smp. (the spidernet)

I experimented with various solutions, none were terribly exciting.  The
thing that killed all of them was a crazy test case that someone sprung on
me:  They had written a worst-case network ping-pong app: send one
packet, wait for reply, send one packet, etc.  

If I waited (indefinitely) for a second packet to show up, the test case 
completely stalled (since no second packet would ever arrive).  And if I 
introduced a timer to wait for a second packet, then I just increased 
the latency in the response to the first packet, and this was noticed, 
and folks complained.  
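
The trade-off can be put in numbers with a toy model of a "hold off
until X packets or Y usecs" scheme. This is a sketch under assumptions:
the function name and parameters are invented, and real hardware timers
differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Time (usec, relative to the first packet) at which a coalescing NIC
 * would raise its interrupt: when max_frames packets have arrived, or
 * when max_usecs have elapsed since the first packet, whichever comes
 * first.  arrival_us[] is ascending and arrival_us[0] is 0.
 */
uint32_t irq_delay_us(const uint32_t *arrival_us, size_t n,
		      size_t max_frames, uint32_t max_usecs)
{
	if (max_frames == 0)
		return 0;	/* no coalescing: interrupt immediately */
	if (n >= max_frames && arrival_us[max_frames - 1] <= max_usecs)
		return arrival_us[max_frames - 1];
	return max_usecs;	/* not enough packets: the timer decides */
}
```

For the ping-pong case only one packet ever arrives, so every round
trip pays the full max_usecs; a burst of back-to-back packets reaches
the frame threshold long before the timer expires.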

In the end, I just let it be, and let the system work as a busy-beaver, 
with the high interrupt rate. Is this a wise thing to do?  I was
thinking that, if the system is under heavy load, then the interrupt
rate would fall, since (for less pathological network loads) more 
packets would queue up before the poll was serviced.  But I did not
actually measure the interrupt rate under heavy load ... 

--linas


Re: [PATCH] spidernet: fix interrupt reason recognition

2007-08-20 Thread Linas Vepstas
On Mon, Aug 20, 2007 at 10:13:27PM +0900, Ishizaki Kou wrote:
> Please apply this to 2.6.23.

I'll review and forward shortly.  Kick me if you don't see a formal
reply in a few days.

> And also, please apply the following Arnd-san's patch to fix a problem
> that spidernet driver sometimes causes a BUG_ON at open.
> 
>  http://patchwork.ozlabs.org/cbe-oss-dev/patch?id=12211

Are you sure? This patch no longer applies cleanly, in part because
your patch "[PATCH] spidernet: improve interrupt handling" 
from Mon, 09 Jul 2007 added a spider_net_enable_interrupts(card); 
at the end of spider_net_open().  Because of this, it seems like 
Arnd's patch is no longer needed, right?

--linas


Re: [PATCH] spidernet: enable poll() before registering interrupts

2007-08-20 Thread Linas Vepstas
On Thu, Jul 12, 2007 at 01:19:11AM +0200, Arnd Bergmann wrote:
> We must not call netif_poll_enable after enabling interrupts,
> because an interrupt might come in and set the __LINK_STATE_RX_SCHED
> bit before we get to clear that bit again. If that happens,
> the next call to the ->poll() function will oops.
> 
> Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>
> ---
> This was found during testing with the fedora kernel,
> with all patches from netdev-2.6.git applied.
> 
> It may not be the right fix, but this is currently the
> only way I can get that kernel to boot.
> 
> One part I don't understand at the moment is that Christian
> Krafft reported the same problem with tg3, but that driver
> has all interrupts disabled at the device while calling
> the request_irq() function, which seems to be the best
> solution for avoiding the bug in the first place.

It appears that this patch does not apply cleanly any more,
and I think that's a good thing!

An intervening patch changed the init so that the
hardware interrupts aren't enabled until after the
request_irq, and after the poll_enable().  Thus,
it seems this patch is no longer needed, right?

I'll pursue with Kou Ishizaki, who pointed out that
I'd missed your email.

--linas



Re: [PATCH] spidernet: enable poll() before registering interrupts

2007-08-20 Thread Linas Vepstas
On Thu, Jul 12, 2007 at 01:19:11AM +0200, Arnd Bergmann wrote:
> Index: linux-2.6/drivers/net/spider_net.c

Sorry, this one got lost in my mailbox.  Will attend to it shortly.

--linas



Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-14 Thread Linas Vepstas
On Mon, Aug 13, 2007 at 10:07:25AM -0700, Joe Perches wrote:
> On Mon, 2007-08-13 at 10:45 -0500, Linas Vepstas wrote:
> > Not quite right. spider-pic is not part of spider_net.
> 
> SPIDERNET NETWORK DRIVER for CELL
> P: Linas Vepstas
> M: [EMAIL PROTECTED]
> L: netdev@vger.kernel.org
> S: Supported
> F: Documentation/networking/spider_net.txt
> F: drivers/net/spider_net*

Works for me.

Acked-by: Linas Vepstas <[EMAIL PROTECTED]>


Re: [PATCH] [459/2many] MAINTAINERS - SPIDERNET NETWORK DRIVER for CELL

2007-08-13 Thread Linas Vepstas
On Sun, Aug 12, 2007 at 11:36:42PM -0700, [EMAIL PROTECTED] wrote:
> Add file pattern to MAINTAINER entry
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b616562..fa8fb1c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4377,6 +4377,9 @@ P:  Linas Vepstas
>  M:   [EMAIL PROTECTED]
>  L:   netdev@vger.kernel.org
>  S:   Supported
> +F:   Documentation/networking/spider_net.txt
> +F:   arch/powerpc/platforms/cell/spider-pic.c
> +F:   drivers/net/spider_net*

Not quite right. spider-pic is not part of spider_net.
The rest looks fine.

--linas



Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Linas Vepstas
On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote:
> 
> Changes to http://www.spinics.net/lists/netdev/msg36912.html
> 
> 1) A new field called "features" has been added to the net_lro_mgr struct.
>    It is set by the driver to indicate:
>    - LRO_F_NAPI:            Use NAPI / netif_rx to pass packets to stack
>
>    - LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN
>                             packets but does not modify ETH protocol (ETH_P_8021Q)
>
> 2) Padded frames are not aggregated for now. Bug fixed
>
> 3) Correct header length now used. No minimal header length for aggregated
>    packets used anymore.
>
> 4) Statistic counters were introduced. They are stored in a new struct in
>    the net_lro_mgr. This has the advantage that no locking is required in
>    cases where the driver uses multiple lro_mgrs for different receive queues.
>    Thus we get the following statistics per lro_mgr / eth device:
>    - Number of aggregated packets
>    - Number of flushed packets
>    - Number of times we run out of lro_desc.
>
>    The ratio of "aggregated packets" and "flushed packets" give you an
>    idea how well LRO is working.

I'd like to see an edited form of this, together with an introduction to
LRO, written up in the Documentation subdirectory.  

As someone with some driver experience, but not on the bleeding edge,
some basic newbie questions pop into mind:

-- what is LRO?
-- Basic principles of operation?
-- Can I use it in my driver?  
-- Does my hardware have to have some special feature before I can use it?
-- What sort of performance improvement does it provide? Throughput?
   Latency? CPU usage? How does it affect DMA allocation? Does it 
   improve only a certain type of traffic (large/small packets, etc.)
-- Example code? What's the API? How should my driver use it?

Right now, I can maybe find answers by doing lots of googling.  I'd like
to have some quick way of getting a grip on this.

--linas
   


[PATCH] tg3: add PCI error recovery support

2007-07-18 Thread Linas Vepstas

Add support for PCI Error Recovery for the tg3 ethernet
device driver. The general principles of operation are
described in Documentation/pci-error-recovery.txt
Other drivers having similar structure include e100,
e1000, ixgb, s2io, ipr, sym53c8xx_2, and lpfc

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Cc: Michael Chan <[EMAIL PROTECTED]>



Michael, you are listed as the tg3 maintainer; could you
please forward upstream if you agree?  

Tested on the PCI-E version of this adapter, on power6, 
for 85 (artificial) error injections (overnight) while
ftp'ing dvd iso images over the link. Worked well.

 drivers/net/tg3.c |  108 +-
 1 file changed, 107 insertions(+), 1 deletion(-)

Index: linux-2.6.22-git2/drivers/net/tg3.c
===
--- linux-2.6.22-git2.orig/drivers/net/tg3.c 2007-07-17 11:07:30.0 -0500
+++ linux-2.6.22-git2/drivers/net/tg3.c 2007-07-18 15:10:09.0 -0500
@@ -64,7 +64,7 @@
 
 #define DRV_MODULE_NAME "tg3"
 #define PFX DRV_MODULE_NAME": "
-#define DRV_MODULE_VERSION "3.77"
+#define DRV_MODULE_VERSION "3.77-a"
 #define DRV_MODULE_RELDATE "May 31, 2007"
 
 #define TG3_DEF_MAC_MODE   0
@@ -12126,11 +12126,117 @@ out:
return err;
 }
 
+/**
+ * tg3_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci connection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected. 
+ */
+static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
+   pci_channel_state_t state)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct tg3 *tp = netdev_priv(netdev);
+   struct device *dev = &netdev->dev;
+
+   dev_info(dev, "PCI I/O error detected on %s\n", netdev->name);
+
+   if (!netif_running(netdev))
+   return PCI_ERS_RESULT_NEED_RESET;
+
+   /* Want to make sure that the reset task doesn't run */
+   cancel_work_sync(&tp->reset_task);
+   tg3_netif_stop(tp);
+   del_timer_sync(&tp->timer);
+   netif_device_detach(netdev);
+   pci_disable_device(pdev);
+
+   if (state == pci_channel_io_perm_failure) {
+   /* avoid hang in dev_close() with rtnl_lock held */
+   netif_poll_enable(netdev);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+   return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * tg3_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch, as if from a cold-boot.
+ * At this point, the card has experienced a hard reset,
+ * followed by fixups by BIOS, and has its config space
+ * set up identically to what it was at cold boot.
+ */
+static pci_ers_result_t tg3_io_slot_reset(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct tg3 *tp = netdev_priv(netdev);
+   int err;
+
+   if (!netif_running(netdev))
+   return PCI_ERS_RESULT_RECOVERED;
+
+   if (pci_enable_device(pdev)) {
+   printk(KERN_ERR "tg3: %s: "
+  "Cannot re-enable PCI device after reset.\n", netdev->name);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   pci_set_master(pdev);
+   pci_restore_state(tp->pdev);
+   netif_device_attach(netdev);
+
+   tg3_full_lock(tp, 0);
+   tp->tg3_flags |= TG3_FLAG_INIT_COMPLETE;
+   err = tg3_restart_hw(tp, 1);
+   tg3_full_unlock(tp);
+   if (err) {
+   printk(KERN_ERR "tg3: %s: "
+  "Cannot restart hardware after reset.\n", netdev->name);
+   return PCI_ERS_RESULT_DISCONNECT;
+   }
+
+   return PCI_ERS_RESULT_RECOVERED;
+}
+
+/**
+ * tg3_io_resume - called when traffic can start flowing again.
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the error recovery driver tells
+ * us that it's OK to resume normal operation.
+ */
+static void tg3_io_resume(struct pci_dev *pdev)
+{
+   struct net_device *netdev = pci_get_drvdata(pdev);
+   struct tg3 *tp = netdev_priv(netdev);
+
+   if (!netif_running(netdev))
+   return;
+
+   netif_wake_queue(netdev);
+
+   tp->timer.expires = jiffies + tp->timer_offset;
+   add_timer(&tp->timer);
+
+   tg3_netif_start(tp);
+}
+
+static struct pci_error_handlers tg3_err_handler = {
+   .error_detected = tg3_io_error_detected,
+   .slot_reset = tg3_io_slot_reset,
+   .resume = tg3_io_resume,
+};
+
 static struct pci_driver tg3_driver = {
.name   = DRV_MODULE_NAME,
.id_table   = tg3_pci_tbl,
  

Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas
On Fri, Jul 13, 2007 at 03:47:02PM -0700, David Miller wrote:
> From: [EMAIL PROTECTED] (Linas Vepstas)
> Date: Fri, 13 Jul 2007 15:05:15 -0500
> 
> > 
> > This is a patch (& bug report) for a crash in sysctl_set_parent() 
> > in 2.6.22-git2. 
> > 
> > Problem: 2.6.22-git2 crashes with a stack trace 
> > 
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> 
> Thanks for tracking this down, I'll apply your patch.

NAK. As I just explained in another email, this bug
was introduced by the "send-to-self" patch I habitually
apply -- so habitually, that I forgot I was not working 
with a "clean" tree. So it goes ... 

--linas
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-16 Thread Linas Vepstas
On Fri, Jul 13, 2007 at 07:06:56PM -0600, Eric W. Biederman wrote:
> > .data   = &ipv4_devconf.loop,
> > .maxlen = sizeof(int),
> > .mode   = 0644,
> > +   .child  = 0x0,
> > .proc_handler   = &proc_dointvec,
> > },
> Where did this entry above in devinet_sysctl come from?

My bad.
I habitually apply the "send-to-self" patch, since some of the
network testing that I do is easiest if I load up all of the
adapters in the same box. (If you're not familiar with this patch ...
it's great, and I wish it was integrated into mainline. It allows
one to drive network traffic through the physical devices, even
if they are in the same box.  Without it, the network stack is
too clever, and won't allow you to do this.)

> > +   {
> > +   .ctl_name   = 0,
> > +   .procname   = 0,
> > +   },
> I probably would have just done:
> + {},

Yes, in retrospect, this would have been the simplest solution.

> What added the additional entry to devinet_root_dir?  I don't see that
> in Linus' tree?
> 
> The result may be fine but if it isn't named in a per network device
> manner we are adding duplicate entries to the root /proc/sys directory
> which is wrong.
> 
> Actually come to think of it I am concerned that someone added a
> settable entry into /proc/sys/ it should at least be in /proc/sys/net/
> where it won't conflict with other uses of that directory.  Especially
> as things like network devices have user controlled names.

Sigh. Silly me. Haste makes waste.

--linas



[PATCH] crash in 2.6.22-git2 sysctl_set_parent()

2007-07-13 Thread Linas Vepstas

This is a patch (& bug report) for a crash in sysctl_set_parent() 
in 2.6.22-git2. 

Problem: 2.6.22-git2 crashes with a stack trace 
[c1d0fb00] c0067b4c .sysctl_set_parent+0x48/0x7c
[c1d0fb90] c0069b40 .register_sysctl_table+0x7c/0xf4
[c1d0fc30] c065e710 .devinet_init+0x88/0xb0
[c1d0fcc0] c065db74 .ip_rt_init+0x17c/0x32c
[c1d0fd70] c065deec .ip_init+0x10/0x34
[c1d0fdf0] c065e898 .inet_init+0x160/0x3dc
[c1d0fea0] c0630bc4 .kernel_init+0x204/0x3c8

A bit of poking around makes it clear what the problem is:
In sysctl_set_parent(), the for loop 

   for (; table->ctl_name || table->procname; table++) {

walks off the end of the table, and into garbage.  Basically,
this for-loop iterator expects all table arrays to be 
"null terminated".  However, net/ipv4/devinet.c statically 
declares an array that is not null-terminated.  The patch 
below fixes that; it works for me.  It's somewhat conservative;
if one wishes to assume that the compiler will always zero out
the empty parts of the structure, then this patch can be shrunk
to one line: +  ctl_table   devinet_root_dir[3];
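
The one-line variant relies on a C rule: array elements without an
initializer are zero-initialized. A small user-space sketch shows the
effect; the struct and names here are simplified stand-ins invented for
illustration, not the kernel's ctl_table.

```c
#include <stddef.h>

/* Simplified stand-in for ctl_table: only the two fields the loop tests. */
struct demo_ctl_table {
	int ctl_name;
	const char *procname;
};

/* Walk the table the way the loop above does: stop at the first entry
 * whose ctl_name and procname are both zero. */
size_t demo_count_entries(const struct demo_ctl_table *table)
{
	size_t n = 0;

	for (; table->ctl_name || table->procname; table++)
		n++;
	return n;
}

/* Two real entries in a three-element array: the third element has no
 * initializer, so it is all zeroes and acts as the terminator. */
const struct demo_ctl_table demo_root_dir[3] = {
	{ .ctl_name = 1, .procname = "net" },
	{ .ctl_name = 2, .procname = "loop" },
};
```

Here demo_count_entries(demo_root_dir) stops after two entries. Had the
array been declared with exactly two elements, the loop would read past
the end, which is the kind of walk-off described above.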

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


I tried to audit some of the code to see where else there 
might be similar badly-formed static declarations.  This is hard,
as there's a lot of code. Most seems fine.


 net/core/neighbour.c |4 
 net/ipv4/devinet.c   |7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

Index: linux-2.6.22-git2/net/ipv4/devinet.c
===
--- linux-2.6.22-git2.orig/net/ipv4/devinet.c 2007-07-13 14:23:21.0 -0500
+++ linux-2.6.22-git2/net/ipv4/devinet.c 2007-07-13 14:24:15.0 -0500
@@ -1424,7 +1424,7 @@ static struct devinet_sysctl_table {
ctl_table   devinet_dev[2];
ctl_table   devinet_conf_dir[2];
ctl_table   devinet_proto_dir[2];
-   ctl_table   devinet_root_dir[2];
+   ctl_table   devinet_root_dir[3];
 } devinet_sysctl = {
.devinet_vars = {
DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
@@ -1493,8 +1493,13 @@ static struct devinet_sysctl_table {
.data   = &ipv4_devconf.loop,
.maxlen = sizeof(int),
.mode   = 0644,
+   .child  = 0x0,
.proc_handler   = &proc_dointvec,
},
+   {
+   .ctl_name   = 0,
+   .procname   = 0,
+   },
},
 };
 


Re: [Cbe-oss-dev] [PATCH] spidernet: don't use debug flag

2007-07-11 Thread Linas Vepstas
On Wed, Jul 11, 2007 at 04:57:38PM +0900, Ishizaki Kou wrote:
[...]
> I need more investigation. Please drop the patch.

OK.

--linas

p.s. I tested ifdown/ifup, and didn't see any problems.
Does your bug happen immediately, or does it take many attempts 
to trigger it?




[PATCH] spidernet: improve interrupt handling

2007-07-09 Thread Linas Vepstas

From: Ishizaki Kou <[EMAIL PROTECTED]>

We intend this patch to improve spidernet interrupt handling to be
more strict.  We had following problem and this patch solves it.

 -when CONFIG_DEBUG_SHIRQ=y, request_irq() calls handler().
 -when spider_net_open() is called, it calls request_irq() which calls
  spider_net_interrupt().
 -if some specific interrupt bit is set at this timing, it calls
  netif_rx_schedule() and spider_net_poll() is scheduled.
 -spider_net_open() calls netif_poll_enable() which clears the bit 
  __LINK_STATE_RX_SCHED.
 -when spider_net_poll() is called, it calls netif_rx_complete() which
  causes BUG_ON() because __LINK_STATE_RX_SCHED is not set.
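
The sequence above can be replayed in a small user-space model. This is
only a sketch under assumptions: every name is invented, the three
flags stand in for the real device mask, the __LINK_STATE_RX_SCHED bit
and a scheduled poll, and the "old" and "new" functions model the
ordering before and after this patch.

```c
#include <stdbool.h>

struct demo_dev {
	bool rx_sched;      /* stands in for __LINK_STATE_RX_SCHED */
	bool irq_enabled;   /* device interrupt mask open? */
	bool poll_pending;  /* a poll has been scheduled */
};

/* Interrupt handler: schedules a poll only if the device may interrupt. */
void demo_irq(struct demo_dev *d)
{
	if (!d->irq_enabled)
		return;
	d->rx_sched = true;
	d->poll_pending = true;
}

/* Models the netif_poll_enable() step described above: clears the bit. */
void demo_poll_enable(struct demo_dev *d)
{
	d->rx_sched = false;
}

/* Poll completion: returns false where the kernel would hit BUG_ON(). */
bool demo_poll_complete(struct demo_dev *d)
{
	if (!d->poll_pending)
		return true;      /* nothing was scheduled */
	d->poll_pending = false;
	if (!d->rx_sched)
		return false;     /* bit already cleared */
	d->rx_sched = false;
	return true;
}

/* Old ordering: interrupts are live while request_irq() calls the handler. */
bool demo_open_old(struct demo_dev *d)
{
	d->irq_enabled = true;
	demo_irq(d);
	demo_poll_enable(d);
	return demo_poll_complete(d);
}

/* New ordering: the mask stays closed until after the poll is enabled. */
bool demo_open_new(struct demo_dev *d)
{
	d->irq_enabled = false;
	demo_irq(d);              /* handler runs, sees nothing enabled */
	demo_poll_enable(d);
	d->irq_enabled = true;
	demo_irq(d);              /* first real interrupt */
	return demo_poll_complete(d);
}
```

In this model demo_open_old() ends in the failing branch, while
demo_open_new() completes the poll normally.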

Signed-off-by: Kou Ishizaki <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


Jeff, please apply for 2.6.23

Linas.

 drivers/net/spider_net.c |   59 +++
 1 file changed, 45 insertions(+), 14 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c 2007-06-14 17:23:32.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c 2007-07-09 14:10:04.0 -0500
@@ -1478,11 +1478,17 @@ static void
 spider_net_handle_error_irq(struct spider_net_card *card, u32 status_reg)
 {
u32 error_reg1, error_reg2;
+   u32 mask_reg1, mask_reg2;
u32 i;
int show_error = 1;
 
error_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1STS);
error_reg2 = spider_net_read_reg(card, SPIDER_NET_GHIINT2STS);
+   mask_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1MSK);
+   mask_reg2 = spider_net_read_reg(card,SPIDER_NET_GHIINT2MSK);
+
+   error_reg1 &= mask_reg1;
+   error_reg2 &= mask_reg2;
 
/* check GHIINT0STS */
if (status_reg)
@@ -1710,9 +1716,11 @@ spider_net_interrupt(int irq, void *ptr)
 {
struct net_device *netdev = ptr;
struct spider_net_card *card = netdev_priv(netdev);
-   u32 status_reg;
+   u32 status_reg, mask_reg;
 
status_reg = spider_net_read_reg(card, SPIDER_NET_GHIINT0STS);
+   mask_reg = spider_net_read_reg(card, SPIDER_NET_GHIINT0MSK);
+   status_reg &= mask_reg;
 
if (!status_reg)
return IRQ_NONE;
@@ -1754,6 +1762,38 @@ spider_net_poll_controller(struct net_de
 #endif /* CONFIG_NET_POLL_CONTROLLER */
 
 /**
+ * spider_net_enable_interrupts - enable interrupts
+ * @card: card structure
+ *
+ * spider_net_enable_interrupts enables several interrupts
+ */
+static void 
+spider_net_enable_interrupts(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GHIINT0MSK,
+SPIDER_NET_INT0_MASK_VALUE);
+   spider_net_write_reg(card, SPIDER_NET_GHIINT1MSK,
+SPIDER_NET_INT1_MASK_VALUE);
+   spider_net_write_reg(card, SPIDER_NET_GHIINT2MSK,
+SPIDER_NET_INT2_MASK_VALUE);
+}
+
+/**
+ * spider_net_disable_interrupts - disable interrupts
+ * @card: card structure
+ *
+ * spider_net_disable_interrupts disables all the interrupts
+ */
+static void 
+spider_net_disable_interrupts(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GHIINT0MSK, 0);
+   spider_net_write_reg(card, SPIDER_NET_GHIINT1MSK, 0);
+   spider_net_write_reg(card, SPIDER_NET_GHIINT2MSK, 0);
+   spider_net_write_reg(card, SPIDER_NET_GMACINTEN, 0);
+}
+
+/**
  * spider_net_init_card - initializes the card
  * @card: card structure
  *
@@ -1773,6 +1813,7 @@ spider_net_init_card(struct spider_net_c
spider_net_write_reg(card, SPIDER_NET_GMACOPEMD,
spider_net_read_reg(card, SPIDER_NET_GMACOPEMD) | 0x4);
 
+   spider_net_disable_interrupts(card);
 }
 
 /**
@@ -1860,14 +1901,6 @@ spider_net_enable_card(struct spider_net
spider_net_write_reg(card, SPIDER_NET_GMACOPEMD,
 SPIDER_NET_OPMODE_VALUE);
 
-   /* set interrupt mask registers */
-   spider_net_write_reg(card, SPIDER_NET_GHIINT0MSK,
-SPIDER_NET_INT0_MASK_VALUE);
-   spider_net_write_reg(card, SPIDER_NET_GHIINT1MSK,
-SPIDER_NET_INT1_MASK_VALUE);
-   spider_net_write_reg(card, SPIDER_NET_GHIINT2MSK,
-SPIDER_NET_INT2_MASK_VALUE);
-
spider_net_write_reg(card, SPIDER_NET_GDTDMACCNTR,
 SPIDER_NET_GDTBSTA);
 }
@@ -2044,6 +2077,8 @@ spider_net_open(struct net_device *netde
netif_carrier_on(netdev);
netif_poll_enable(netdev);
 
+   spider_net_enable_interrupts(card);
+
return 0;
 
 register_int_failed:
@@ -2216,11 +2251,7 @@ spider_net_stop(struct net_device *netde
del_timer_sync(&card->tx_timer);
del_timer_sync(&card->aneg_time

Re: [PATCH] spidernet: improve interrupt handling

2007-07-09 Thread Linas Vepstas
On Mon, Jul 09, 2007 at 05:48:08PM +0900, Ishizaki Kou wrote:
> We intend this patch to improve spidernet interrupt handling to be
> more strict.  We had following problem and this patch solves it.

Looks reasonable to me. I'll forward it upstream. In the future,
could you use "diff -Nupr"? It adds some extra information
(the name of the subroutine) to the patch chunks; this makes
it easier to read.

QUILT_DIFF_OPTS="-Nupr"
in ~/.quiltrc if you use quilt.

--linas



Re: [PATCH] spidernet: don't use debug flag

2007-07-09 Thread Linas Vepstas
On Mon, Jul 09, 2007 at 05:45:21PM +0900, Ishizaki Kou wrote:
> GDTDCEIDIS flag is defined that it is for debug and should not be used.

!? Certainly, my spec doesn't say anything like this;
I don't know of any other way of turning off the descriptor 
chain end interrupt; leaving it on hurts performance in a big way.

I get the following TX performance numbers:

pkt sz   rate w/o patch  rate w/patch
(bytes)  (Mbits/sec)     (Mbits/sec)
-------  --------------  ------------
    400             503           353
    200             239            88
    100             122            44
     60              73            26

That's not quite a 3x performance degradation.

In addition, with your patch, the number of interrupts jumps
from just about zero, to about 55K/second. From what I can tell, 
this huge interrupt rate eats up all the CPU cycles, which is
why the performance drops so drastically.

> We met some troubles on Celleb platform by setting this flag.
>  -network does not recover after ifconfig down, then up operations.

Can you be more specific?  I can't imagine why this flag would
have anything to do with ifdown/ifup. The device open/close 
routines should reset all hardware state; this shouldn't make
any difference. (It doesn't for me, at least).  

--linas


Re: [Cbe-oss-dev] [PATCH] ps3: gigabit ethernet driver for PS3, take3

2007-07-09 Thread Linas Vepstas
On Mon, Jul 09, 2007 at 10:50:19AM +0900, Akira Tsukamoto wrote:
> Hi,
> 
> On Fri, 6 Jul 2007 13:02:41 -0500, [EMAIL PROTECTED] (Linas Vepstas) 
> mentioned: 
> > of the spidernet device driver.  Please note that the old
> > spidernet had absolutely disastrous performance for transmit;
> > it also had a variety of crazy hangs and lockups under 
> > high-stress conditions; or NFS operation, or certain back-to-back
> > tcp usage scenarios. A few dozen bugfixes went in since
> > the time that the gelic snapshot was taken.
> 
> I think we know the problems of old spidernet issues, we uses QS20 also 
> in our lab.
> Current gelic has fixed them all separated from your work and it have no 
> remaining issues or performance problem.
> In our measurement, PS3 network performance is better than IBM QS20 right
> now.

!! Well, gee, I wish this plan had been made public. I'd spent something
between 3-6 months working full-time on spidernet issues. This was a lot
of work and effort.  Worse, this was *not* my main job; it was rather to 
help the neighbors prevent a disaster in the making.  Certainly, I would 
not have spent any time on this, if I'd known someone else was also working 
to fix the same bugs. :-(

> I totally understand the difficulty of your effort in fixing spidernet 
> without the decent documentation which we have, but the changes we made were 
> significantly large, as you can see if you diff the gelic driver against the 
> current spidernet driver (totally different), 

The differences are very large, because very large changes have been made
to the spidernet. However, a diff between gelic and the old spidernet is
much smaller.  So I'm somewhat confused by this.  Perhaps I am mistaken, 
and should read the code more carefully.

> so the conclusion of discussion at 
> 3C common-linux wg (Sony, IBM and Toshiba) 

?? What working group is this?  This is the first time that I am hearing
of it, and, as the only active spidernet maintainer, I'd hope to have
been a part of such a discussion. Oh well.

-- Linas Vepstas


Re: [Cbe-oss-dev] [PATCH] ps3: gigabit ethernet driver for PS3, take3

2007-07-06 Thread Linas Vepstas
On Thu, Jul 05, 2007 at 11:47:20AM -0500, jschopp wrote:
> 
> This is the third submission of the network driver for PS3.
> The differences from the previous one are:

I notice that this is mostly a cut-n-paste of a very old version
of the spidernet device driver.  Please note that the old
spidernet had absolutely disastrous performance for transmit;
it also had a variety of crazy hangs and lockups under 
high-stress conditions, NFS operation, or certain back-to-back
tcp usage scenarios. A few dozen bugfixes have gone in since
the time that the gelic snapshot was taken.

Wouldn't it be better to just add the hypervisor bits-n-pieces 
to spidernet?  That way, you get not only the various fixes, but 
also the benefit of fairly regular testing...


--linas


[PATCH] spidernet: Replace literal with const

2007-06-14 Thread Linas Vepstas

Replace literal with const; add bit definitions.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>



On Wed, Jun 13, 2007 at 04:12:00PM -0400, Jeff Garzik wrote:
> A follow-up patch needs to remove the above magic numbers (==numeric 
> constants), replacing them with named constants

Here it is. Lightly stress-tested (about 1/2 hour), as this patch
tests some additional bits.

 drivers/net/spider_net.c |2 +-
 drivers/net/spider_net.h |   19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 15:39:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-14 17:23:32.0 -0500
@@ -1235,7 +1235,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
 
-   if (hwdescr->dmac_cmd_status & 0xfcf4) {
+   if (hwdescr->dmac_cmd_status & SPIDER_NET_DESCR_BAD_STATUS) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hw_buf_addr);
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 15:39:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-14 17:34:56.0 -0500
@@ -359,6 +359,18 @@ enum spider_net_int2_status {
 #define SPIDER_NET_DMAC_UDP            0x0003
 #define SPIDER_NET_TXDCEST             0x0800
 
+#define SPIDER_NET_DESCR_RXFDIS        0x0001
+#define SPIDER_NET_DESCR_RXDCEIS       0x0002
+#define SPIDER_NET_DESCR_RXDEN0IS      0x0004
+#define SPIDER_NET_DESCR_RXINVDIS      0x0008
+#define SPIDER_NET_DESCR_RXRERRIS      0x0010
+#define SPIDER_NET_DESCR_RXFDCIMS      0x0100
+#define SPIDER_NET_DESCR_RXDCEIMS      0x0200
+#define SPIDER_NET_DESCR_RXDEN0IMS     0x0400
+#define SPIDER_NET_DESCR_RXINVDIMS     0x0800
+#define SPIDER_NET_DESCR_RXRERRMIS     0x1000
+#define SPIDER_NET_DESCR_UNUSED        0x077fe0e0
+
 #define SPIDER_NET_DESCR_IND_PROC_MASK 0xF000
 #define SPIDER_NET_DESCR_COMPLETE      0x /* used in rx and tx */
 #define SPIDER_NET_DESCR_RESPONSE_ERROR 0x1000 /* used in rx and tx */
@@ -369,6 +381,13 @@ enum spider_net_int2_status {
 #define SPIDER_NET_DESCR_NOT_IN_USE    0xF000
 #define SPIDER_NET_DESCR_TXDESFLG      0x0080
 
+#define SPIDER_NET_DESCR_BAD_STATUS   (SPIDER_NET_DESCR_RXDEN0IS | \
+   SPIDER_NET_DESCR_RXRERRIS | \
+   SPIDER_NET_DESCR_RXDEN0IMS | \
+   SPIDER_NET_DESCR_RXINVDIMS | \
+   SPIDER_NET_DESCR_RXRERRMIS | \
+   SPIDER_NET_DESCR_UNUSED)
+
 /* Descriptor, as defined by the hardware */
 struct spider_net_hw_descr {
u32 buf_addr;
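
As a cross-check on the named constants, the sketch below rebuilds the mask from the bit values as printed in the patch above and compares it with the old literal 0xfcf4. Two assumptions apply: the EX_ names are local to this example (they are not the driver header), and the printed values are read as low-order bits, which is a guess because this archive has visibly dropped runs of zeros in other constants. Under those assumptions the low 16 bits of the new mask equal the old literal, and the extra bits tested all come from the "unused" field.

```c
#include <stdint.h>

/* Bit values copied as printed in the patch; EX_ names are hypothetical
 * and exist only for this example. */
#define EX_DESCR_RXDEN0IS   0x00000004u
#define EX_DESCR_RXRERRIS   0x00000010u
#define EX_DESCR_RXDEN0IMS  0x00000400u
#define EX_DESCR_RXINVDIMS  0x00000800u
#define EX_DESCR_RXRERRMIS  0x00001000u
#define EX_DESCR_UNUSED     0x077fe0e0u

/* Same composition as SPIDER_NET_DESCR_BAD_STATUS in the patch. */
#define EX_DESCR_BAD_STATUS (EX_DESCR_RXDEN0IS  | \
                             EX_DESCR_RXRERRIS  | \
                             EX_DESCR_RXDEN0IMS | \
                             EX_DESCR_RXINVDIMS | \
                             EX_DESCR_RXRERRMIS | \
                             EX_DESCR_UNUSED)

/* The magic number that the named mask replaces. */
#define EX_OLD_LITERAL      0x0000fcf4u

/* Nonzero if any "bad status" bit is set in a descriptor status word. */
static inline int ex_descr_is_bad(uint32_t cmd_status)
{
	return (cmd_status & EX_DESCR_BAD_STATUS) != 0;
}

/* Bits tested by the new mask that the old literal did not cover. */
static inline uint32_t ex_extra_bits(void)
{
	return EX_DESCR_BAD_STATUS & ~EX_OLD_LITERAL;
}
```

With these values, ex_extra_bits() yields 0x077f0000, which would account for the "additional bits" mentioned in the cover note.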


Re: [PATCH 4/15] spidernet: silence the ramfull messages

2007-06-14 Thread Linas Vepstas
On Wed, Jun 13, 2007 at 04:12:00PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c   2007-06-11 10:02:34.0 -0500
> >+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:45:25.0 -0500
> >@@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
> > goto bad_desc;
> > }
> > 
> >-if (hwdescr->dmac_cmd_status & 0xfefe) {
> >+if (hwdescr->dmac_cmd_status & 0xfcf4) {
> > pr_err("%s: bad status, cmd_status=x%08x\n",
> >card->netdev->name,
> >hwdescr->dmac_cmd_status);
> 
> 
> A follow-up patch needs to remove the above magic numbers (==numeric 
> constants), replacing them with named constants

I thought laziness was a virtue ... oh, wait, wrong programming language.

Patch coming shortly.

--linas


Re: [Cbe-oss-dev] [PATCH 12/15] spidernet: increase the NAPI weight

2007-06-14 Thread Linas Vepstas
On Wed, Jun 13, 2007 at 10:49:51PM +0200, Arnd Bergmann wrote:
> On Wednesday 13 June 2007, Jeff Garzik wrote:
> > > +/* We really really want to empty the ring buffer every time,
> > > + * so as to avoid the RX ram full bug. So set the napi weight
> > > + * to the ring size.
> > > + */
> > > +#define SPIDER_NET_NAPI_WEIGHT   SPIDER_NET_RX_DESCRIPTORS_DEFAULT
> > 
> > I don't see why spider_net should have a different NAPI weight from 
> > other drivers

It was a lame attempt to try to trick napi into draining the entire
RX queue in one go, with the goal of avoiding the dreaded rx ram full.
I'm not sure it made much of a difference, so we can let this slide.

> Would it help to do it the other way round, as in

No, that would shorten the RX queue, thus making it more likely
to overflow. At gigabit speeds, it's pretty easy to fill this thing
up multiple times per jiffy. The driver should continue to operate 
either way, but the larger queue should keep it from being a busy 
beaver.
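
To put a rough number on "multiple times per jiffy", here is a back-of-the-envelope sketch. It assumes worst-case minimum-size Ethernet frames (84 bytes on the wire, counting preamble and inter-frame gap); the HZ value and ring size are parameters chosen by the caller, and the function names are made up for this example.

```c
#include <stdint.h>

/* Minimum Ethernet frame on the wire: 64-byte frame + 8-byte preamble
 * + 12-byte inter-frame gap. */
#define EX_MIN_WIRE_BYTES 84u

/* Worst-case number of frames arriving during one jiffy at a given
 * link speed. hz is the kernel tick rate (an assumption of the caller). */
static uint64_t ex_frames_per_jiffy(uint64_t bits_per_sec, unsigned int hz)
{
	uint64_t frames_per_sec = bits_per_sec / (EX_MIN_WIRE_BYTES * 8u);

	return frames_per_sec / hz;
}

/* How many times a ring of ring_size descriptors could fill completely
 * during one jiffy under that worst-case load. */
static uint64_t ex_ring_fills_per_jiffy(uint64_t bits_per_sec,
					unsigned int hz,
					unsigned int ring_size)
{
	return ex_frames_per_jiffy(bits_per_sec, hz) / ring_size;
}
```

For a 1 Gbit/s link, HZ=250 and a 256-descriptor ring, this gives about 5952 frames per jiffy, enough to fill the ring some 23 times; at HZ=1000 it is still about 5 times. Real traffic with larger frames would be gentler.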

--linas


Re: [PATCH 1/15] spidernet: null out skb pointer after its been used.

2007-06-14 Thread Linas Vepstas
On Wed, Jun 13, 2007 at 04:10:17PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >Avoid kernel crash in mm/slab.c due to double-free of pointer.
> >
> >If the ethernet interface is brought down while there is still
> >RX traffic in flight, the device shutdown routine can end up
> >trying to double-free an skb, leading to a crash in mm/slab.c
> >Avoid the double-free by nulling out the skb pointer.
> >
> >Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> >
> >
> > drivers/net/spider_net.c |1 +
> > 1 file changed, 1 insertion(+)
> 
> applied 1-5, 7 to #upstream-fixes (2.6.22)
> 
> patch #6 was ignored, because it was already upstream

Thank you!

--linas


Re: [PATCH 0/15] spidernet driver bug fixes

2007-06-13 Thread Linas Vepstas
On Tue, Jun 12, 2007 at 08:04:18PM -0400, Jeff Garzik wrote:
> >
> >>Should I just drop all spidernet patches and start over?
> >
> >No. Apply the series I just sent you, dropping the one called
> >"patch 6/15", the one from Florin Malita, as it appears you'd
> >previously picked this up.  The rest of the patches should apply
> >cleanly; I just checked. I just did a "git pull" of 
> >git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
> >and checked. The result of patching is exactly as it should be.
> >
> 
> As I just stated, many of the patches in the "current" patch series have 
> already been applied to netdev-2.6.git#upstream:
> 
> Linas Vepstas (11):
>   s2io: add PCI error recovery support
>   s2io: add PCI error recovery support
>   spidernet: beautify error messages
>   spidernet: move a block of code around
>   spidernet: zero out a pointer.
>   spidernet: null out skb pointer after its been used.
>   spidernet: Don't terminate the RX ring
>   spidernet: enhance the dump routine
>   spidernet: reset the card when an rxramfull is seen
>   spidernet: service TX later.
>   spidernet: increase the NAPI weight
> 
> These are clearly duplicating some of the patches in your patchseries, 
> which means you are woefully out of sync with upstream.

My apologies; I'm trying.  Seems that I've tripped over a git "feature".

"git branch" shows that I'm on "upstream".  So I performed a "git pull" 
(without any additional arguments) assuming that it would sync to your
"upstream" branch.  And so my email was based on this.

Some googling seems to show that "git pull" has a bug/feature of
ignoring the branch that one is working in, and pulling "master"
no matter what.  I have no clue why; this seems broken to me.

So ... let me try again ... 
git pull git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 
upstream
...
Automatic merge failed; fix up by hand

So not only did "git pull" not fetch the correct branch, but it also
wrecked the repository.  Glug. I have no clue how to recover from this.

I suggest dropping the above series of spidernet patches, and reapplying 
the series of 15 I'd sent in. 

--linas



Re: [PATCH 0/15] spidernet driver bug fixes

2007-06-12 Thread Linas Vepstas
On Tue, Jun 12, 2007 at 07:00:17PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> >>On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> >>>On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> >>>>On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> >>>>>The major bug fixes are: 
> >>>>I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> >>>Yeah, I suppose, I admit I've lost track of the process. 
> >>You need to order your bug fixes first in the queue. 
> >
> >OK, here are the patches, re-ordered. There is a different number
> >than last time, as I threw out one, merged one, and got cold feet
> >on a third one.  They still pass the tests.
> >
> >The first five patches focus on three serious bugs, fixing crashes or
> >hangs.
> >
> >-- patch 1 -- kernel crash when ifdown while receiving packets.
> >-- patch 2,3,4 -- device driver deadlocks on "RX ram full" messages.
> >  (kernel stays up, ifdown/up clear the problem).
> >-- patch 5 -- misconfigured TX interrupts results in 3x-4x perf
> >  degradation for small packets.
> >
> >-- patch 6 -- rx stats may be mangled
> >-- patch 7 -- hw checksum sometimes breaks ipv6 operation
> >
> >-- patches 8-15 -- misc tweaks, and documentation.
> >
> >
> >I re-ran my stress tests with patches 1-7 applied; they pass.
> 
> This is a bit frustrating, because this includes many patches that you 
> ALREADY told me to queue for 2.6.23, which I did, in 
> netdev-2.6.git#upstream.

Sigh. I redid the series so as to avoid this problem, per the 
previous conversation. 

> Should I just drop all spidernet patches and start over?

No. Apply the series I just sent you, dropping the one called
"patch 6/15", the one from Florin Malita, as it appears you'd
previously picked this up.  The rest of the patches should apply
cleanly; I just checked. I just did a "git pull" of 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
and checked. The result of patching is exactly as it should be.

Just in case it wasn't clear, I'd like to see patches 1-5 go
into 2.6.22 ... as these address the most critical complaints I'd
gotten recently.

--linas



[PATCH 15/15] spidernet: driver documentation

2007-06-11 Thread Linas Vepstas

Documentation for the spidernet driver.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 Documentation/networking/spider_net.txt |  204 
 1 file changed, 204 insertions(+)

Index: linux-2.6.22-rc1/Documentation/networking/spider_net.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc1/Documentation/networking/spider_net.txt    2007-06-11 11:53:31.0 -0500
@@ -0,0 +1,204 @@
+
+The Spidernet Device Driver
+===========================
+
+Written by Linas Vepstas <[EMAIL PROTECTED]>
+
+Version of 7 June 2007
+
+Abstract
+
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=============================
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty nor full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one.  Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty".  If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer.  Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA
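
The descriptor life cycle described in the document above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the ring is a plain array rather than a linked list, the ex_ names and the ring size of 8 are invented for the example, and the "hw" index stands in for the hardware's GDACTDPA pointer. It is not the driver's code.

```c
#include <stddef.h>

/* The three descriptor states named in the text. */
enum ex_state { EX_EMPTY, EX_FULL, EX_NOT_IN_USE };

#define EX_RING_SIZE 8 /* hypothetical; chosen small for illustration */

struct ex_ring {
	enum ex_state descr[EX_RING_SIZE];
	size_t hw;   /* stands in for GDACTDPA: descr being filled */
	size_t tail; /* next descr the OS will process */
	size_t head; /* next descr the OS will refill */
};

/* Device startup: every descr is "empty", all pointers coincide. */
static void ex_ring_init(struct ex_ring *r)
{
	for (size_t i = 0; i < EX_RING_SIZE; i++)
		r->descr[i] = EX_EMPTY;
	r->hw = r->tail = r->head = 0;
}

/* Hardware side: fill the current descr and advance. Returns 0 when
 * the current descr is not empty, i.e. the hardware would halt. */
static int ex_hw_receive(struct ex_ring *r)
{
	if (r->descr[r->hw] != EX_EMPTY)
		return 0;
	r->descr[r->hw] = EX_FULL;
	r->hw = (r->hw + 1) % EX_RING_SIZE;
	return 1;
}

/* OS tail: process one full descr, mark it "not-in-use", advance.
 * Returns 0 when the tail has caught up with the hardware. */
static int ex_os_process_tail(struct ex_ring *r)
{
	if (r->descr[r->tail] != EX_FULL)
		return 0;
	r->descr[r->tail] = EX_NOT_IN_USE;
	r->tail = (r->tail + 1) % EX_RING_SIZE;
	return 1;
}

/* OS head: give one "not-in-use" descr a fresh buffer (elided here),
 * mark it "empty", advance. Returns 0 when head has caught up. */
static int ex_os_refill_head(struct ex_ring *r)
{
	if (r->descr[r->head] != EX_NOT_IN_USE)
		return 0;
	r->descr[r->head] = EX_EMPTY;
	r->head = (r->head + 1) % EX_RING_SIZE;
	return 1;
}
```

Calling ex_hw_receive() a few times, then draining with ex_os_process_tail() and ex_os_refill_head() until both return 0, leaves all three indices equal and every descr empty, which matches the idle state the document describes.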

[PATCH 14/15] spidernet: fix misnamed flag

2007-06-11 Thread Linas Vepstas

The transmit frame tail bit is strangely misnamed as 
"no checksum". Fix the name to what it should be:
"transmit frame tail". No functional change, 
just a name change.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |2 +-
 drivers/net/spider_net.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 11:53:27.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:29.0 -0500
@@ -716,7 +716,7 @@ spider_net_prepare_tx_descr(struct spide
hwdescr->data_status = 0;
 
hwdescr->dmac_cmd_status =
-   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_TXFRMTL;
spin_unlock_irqrestore(&chain->lock, flags);
 
if (skb->ip_summed == CHECKSUM_PARTIAL)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 11:53:26.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:53:29.0 -0500
@@ -354,7 +354,7 @@ enum spider_net_int2_status {
 #define SPIDER_NET_GPRDAT_MASK 0x
 
 #define SPIDER_NET_DMAC_NOINTR_COMPLETE 0x0080
-#define SPIDER_NET_DMAC_NOCS           0x0004
+#define SPIDER_NET_DMAC_TXFRMTL        0x0004
 #define SPIDER_NET_DMAC_TCP            0x0002
 #define SPIDER_NET_DMAC_UDP            0x0003
 #define SPIDER_NET_TXDCEST             0x0800


[PATCH 13/15] spidernet: move a block of code around

2007-06-11 Thread Linas Vepstas


Put the enable and disable routines next to one-another, 
as this makes verifying their symmetry that much easier.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 11:53:24.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:27.0 -0500
@@ -501,6 +501,20 @@ spider_net_enable_rxdmac(struct spider_n
 }
 
 /**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turning off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
  * spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
  * @card: card structure
  *
@@ -656,20 +670,6 @@ write_hash:
 }
 
 /**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
-   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
-SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
  * spider_net_prepare_tx_descr - fill tx descriptor with skb data
  * @card: card structure
  * @descr: descriptor structure to fill out


[PATCH 12/15] spidernet: increase the NAPI weight

2007-06-11 Thread Linas Vepstas

Another way of minimizing the likelihood of the RX ram overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 11:50:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:53:26.0 -0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
 
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
-#define SPIDER_NET_WATCHDOG_TIMEOUT    50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT    3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024


[PATCH 11/15] spidernet: service TX later.

2007-06-11 Thread Linas Vepstas


When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 11:53:21.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:24.0 -0500
@@ -1287,7 +1287,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
 
-   spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
 
while (packets_to_do) {
@@ -1312,6 +1311,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
 
+   spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {


[PATCH 10/15] spidernet: invalidate unused pointer.

2007-06-11 Thread Linas Vepstas

Invalidate a pointer as it's pci_unmap'ed; this is a bit of 
paranoia to make sure hardware doesn't continue trying to 
DMA to it. 

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 11:51:19.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:53:21.0 -0500
@@ -1187,6 +1187,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   u32 hw_buf_addr;
int status;
 
status = spider_net_get_descr_status(hwdescr);
@@ -1200,7 +1201,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
 
/* unmap descriptor */
-   pci_unmap_single(card->pdev, hwdescr->buf_addr,
+   hw_buf_addr = hwdescr->buf_addr;
+   hwdescr->buf_addr = 0x;
+   pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
 
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1237,7 +1240,7 @@ spider_net_decode_one_descr(struct spide
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   card->netdev->name,
   hwdescr->dmac_cmd_status);
-   pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+   pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);
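
The ordering used in the patch above (save the bus address, overwrite the descriptor field, then unmap using the saved copy) can be sketched in isolation. Everything here is a stand-in: ex_unmap() is a hypothetical replacement for the real DMA unmap call, and EX_INVALID_ADDR is an arbitrary poison value picked for the example, since the value in the archived patch is not legible.

```c
#include <stdint.h>

/* Hypothetical poison value; the one used by the driver is not
 * recoverable from the archived patch. */
#define EX_INVALID_ADDR 0xffffffffu

/* Cut-down stand-in for the hardware descriptor. */
struct ex_hw_descr {
	uint32_t buf_addr;
};

/* Hypothetical stand-in for the DMA unmap call; it only records the
 * address it was asked to release. */
static uint32_t ex_last_unmapped;

static void ex_unmap(uint32_t bus_addr)
{
	ex_last_unmapped = bus_addr;
}

/* Poison the descriptor before releasing the mapping, so that the
 * descriptor never holds an address that is no longer mapped.
 * Returns the saved address for later diagnostics. */
static uint32_t ex_release_buffer(struct ex_hw_descr *hwdescr)
{
	uint32_t saved = hwdescr->buf_addr;

	hwdescr->buf_addr = EX_INVALID_ADDR;
	ex_unmap(saved);
	return saved;
}
```

Keeping the saved copy is also what lets the later error print report the original buffer address instead of the poison value.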


[PATCH 9/15] spidernet: enhance the dump routine

2007-06-11 Thread Linas Vepstas

Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   78 ++-
 1 file changed, 70 insertions(+), 8 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 11:50:03.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:51:19.0 -0500
@@ -1022,34 +1022,94 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
 }
 
-#ifdef DEBUG
 static void show_rx_chain(struct spider_net_card *card)
 {
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+   struct spider_net_hw_descr *hwd = start->hwdescr;
+   struct device *dev = &card->netdev->dev;
+   u32 curr_desc, next_desc;
int status;
 
+   int tot = 0;
int cnt = 0;
-   int cstat = spider_net_get_descr_status(descr);
-   printk(KERN_INFO "RX chain tail at descr=%ld\n",
-(start - card->descr) - card->tx_chain.num_desc);
+   int off = start - chain->ring;
+   int cstat = hwd->dmac_cmd_status;
+
+   dev_info(dev, "Total number of descrs=%d\n",
+   chain->num_desc);
+   dev_info(dev, "Chain tail located at descr=%d, status=0x%x\n",
+   off, cstat);
+
+   curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+   next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
-   status = spider_net_get_descr_status(descr);
+   hwd = descr->hwdescr;
+   off = descr - chain->ring;
+   status = hwd->dmac_cmd_status;
+
+   if (descr == chain->head)
+   dev_info(dev, "Chain head is at %d, head status=0x%x\n",
+off, status);
+
+   if (curr_desc == descr->bus_addr)
+   dev_info(dev, "HW curr desc (GDACTDPA) is at %d, status=0x%x\n",
+off, status);
+
+   if (next_desc == descr->bus_addr)
+   dev_info(dev, "HW next desc (GDACNEXTDA) is at %d, status=0x%x\n",
+off, status);
+
+   if (hwd->next_descr_addr == 0)
+   dev_info(dev, "chain is cut at %d\n", off);
+
 if (cstat != status) {
-   printk(KERN_INFO "Have %d descrs with stat=x%08x\n", cnt, cstat);
+   int from = (chain->num_desc + off - cnt) % chain->num_desc;
+   int to = (chain->num_desc + off - 1) % chain->num_desc;
+   dev_info(dev, "Have %d (from %d to %d) descrs "
+"with stat=0x%08x\n", cnt, from, to, cstat);
cstat = status;
cnt = 0;
}
+
cnt ++;
+   tot ++;
+   descr = descr->next;
+   } while (descr != start);
+
+   dev_info(dev, "Last %d descrs with stat=0x%08x "
+"for a total of %d descrs\n", cnt, cstat, tot);
+
+#ifdef DEBUG
+   /* Now dump the whole ring */
+   descr = start;
+   do
+   {
+   struct spider_net_hw_descr *hwd = descr->hwdescr;
+   status = spider_net_get_descr_status(hwd);
+   cnt = descr - chain->ring;
+   dev_info(dev, "Descr %d stat=0x%08x skb=%p\n",
+cnt, status, descr->skb);
+   dev_info(dev, "bus addr=%08x buf addr=%08x sz=%d\n",
+descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+   dev_info(dev, "next=%08x result sz=%d valid sz=%d\n",
+hwd->next_descr_addr, hwd->result_size,
+hwd->valid_size);
+   dev_info(dev, "dmac=%08x data stat=%08x data err=%08x\n",
+hwd->dmac_cmd_status, hwd->data_status,
+hwd->data_error);
+   dev_info(dev, "\n");
+
descr = descr->next;
} while (descr != start);
-   printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
 #endif
 
+}
+
 /**
  * spider_net_resync_head_ptr - Advance head ptr past empty descrs
  *
@@ -1197,6 +1257,8 @@ spider_net_decode_one_descr(struct spide
return 1;
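
The run summary that the enhanced dump routine prints (groups of consecutive descrs sharing one status, with "from" and "to" offsets that wrap around the ring) can be sketched on its own. This is a simplified model, not the driver code: the ring is an array of status words, and struct ex_run and ex_summarize() are names invented for the example. The wrap-around arithmetic mirrors the "(num_desc + off - cnt) % num_desc" expression in the patch.

```c
/* One run of consecutive descriptors sharing the same status. */
struct ex_run {
	int from;        /* offset of first descr in the run */
	int to;          /* offset of last descr in the run */
	int count;       /* number of descrs in the run */
	unsigned status; /* the shared status word */
};

/* Walk a ring of n status words once, beginning at offset start (the
 * tail), and record each run. Stores at most max_runs runs in out and
 * returns how many were stored. */
static int ex_summarize(const unsigned *status, int n, int start,
			struct ex_run *out, int max_runs)
{
	int nruns = 0;
	int cnt = 0;
	unsigned cstat = status[start];

	for (int i = 0; i < n; i++) {
		int off = (start + i) % n;

		if (status[off] != cstat) {
			/* The run that just ended covers the cnt descrs
			 * immediately before off, modulo the ring size. */
			if (nruns < max_runs) {
				out[nruns].from = (n + off - cnt) % n;
				out[nruns].to = (n + off - 1) % n;
				out[nruns].count = cnt;
				out[nruns].status = cstat;
				nruns++;
			}
			cstat = status[off];
			cnt = 0;
		}
		cnt++;
	}

	/* The final run ends at the descr just before start. */
	if (nruns < max_runs) {
		out[nruns].from = (n + start - cnt) % n;
		out[nruns].to = (n + start - 1) % n;
		out[nruns].count = cnt;
		out[nruns].status = cstat;
		nruns++;
	}
	return nruns;
}
```

Adding n before subtracting keeps the left operand of % non-negative, which matters in C because the sign of a negative remainder follows the dividend.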
 

[PATCH 8/15] spidernet: beautify error messages

2007-06-11 Thread Linas Vepstas

Use dev_err() to print device error messages.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   64 ---
 1 file changed, 34 insertions(+), 30 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 13:09:46.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 13:11:29.0 -0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
  bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Not enough memory to allocate rx buffer\n");
+   dev_err(&card->netdev->dev,
+   "Not enough memory to allocate rx buffer\n");
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,7 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Could not iommu-map rx buffer\n");
+   dev_err(&card->netdev->dev, "Could not iommu-map rx buffer\n");
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -692,7 +693,7 @@ spider_net_prepare_tx_descr(struct spide
buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (pci_dma_mapping_error(buf)) {
if (netif_msg_tx_err(card) && net_ratelimit())
-   pr_err("could not iommu-map packet (%p, %i). "
+   dev_err(&card->netdev->dev, "could not iommu-map packet (%p, %i). "
  "Dropping packet\n", skb->data, skb->len);
card->spider_stats.tx_iommu_map_error++;
return -ENOMEM;
@@ -832,9 +833,8 @@ spider_net_release_tx_chain(struct spide
case SPIDER_NET_DESCR_PROTECTION_ERROR:
case SPIDER_NET_DESCR_FORCE_END:
if (netif_msg_tx_err(card))
-   pr_err("%s: forcing end of tx descriptor "
-  "with status x%02x\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev, "forcing end of tx descriptor "
+  "with status x%02x\n", status);
card->netdev_stats.tx_errors++;
break;
 
@@ -1147,8 +1147,8 @@ spider_net_decode_one_descr(struct spide
 (status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
 (status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: dropping RX descriptor with state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "dropping RX descriptor with state %d\n", status);
card->netdev_stats.rx_dropped++;
goto bad_desc;
}
@@ -1156,8 +1156,8 @@ spider_net_decode_one_descr(struct spide
if ( (status != SPIDER_NET_DESCR_COMPLETE) &&
 (status != SPIDER_NET_DESCR_FRAME_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: RX descriptor with unknown state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "RX descriptor with unknown state %d\n", status);
card->spider_stats.rx_desc_unk_state++;
goto bad_desc;
}
@@ -1165,16 +1165,15 @@ spider_net_decode_one_descr(struct spide
/* The cases we'll throw away the packet immediately */
if (hwdescr->data_error & SPIDER_NET_DESTROY_RX_FLAGS) {
if (netif_msg_rx_err(card))
-   pr_err("%s: error in received descriptor found, "
+   dev_err(&card->netdev->dev,
+  "error in received descriptor found, "
   "data_status=x%08x, data_error=x%08x\n",
-  card->netdev->name,
   hwdescr->data_status, hwdescr->

[PATCH 7/15] spidernet: checksum and ethtool

2007-06-11 Thread Linas Vepstas

From: Stephen Hemminger <[EMAIL PROTECTED]>

It doesn't look like spidernet hardware can really checksum all protocols,
the code looks like it does IPV4 only.  If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.

The driver doesn't need its own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.
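
The difference between the two feature flags can be sketched as a small
stand-alone C model. The enum and function names below are hypothetical
and are not kernel symbols; the sketch only assumes that NETIF_F_IP_CSUM
advertises TCP/UDP-over-IPv4 checksumming, while NETIF_F_HW_CSUM
advertises checksumming of any protocol.

```c
#include <assert.h>	/* only needed by callers that check results */

/* Hypothetical model of the two checksum capabilities; not kernel code. */
enum csum_cap { CAP_IP_CSUM, CAP_HW_CSUM };
enum l3_proto { L3_IPV4, L3_IPV6 };
enum l4_proto { L4_TCP, L4_UDP, L4_OTHER };

/*
 * Returns 1 if a device advertising 'cap' may be handed this packet
 * for transmit checksum offload, 0 if software must checksum it.
 */
static int can_offload_tx_csum(enum csum_cap cap, enum l3_proto l3,
			       enum l4_proto l4)
{
	if (cap == CAP_HW_CSUM)
		return 1;	/* device claims it can checksum anything */

	/* CAP_IP_CSUM: only TCP or UDP carried over IPv4 */
	return l3 == L3_IPV4 && (l4 == L4_TCP || l4 == L4_UDP);
}
```

Under this model, a device that only handles IPv4 but advertises the
"any protocol" capability would also be handed IPv6 packets, which is
the kind of breakage the flag change is meant to avoid.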

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

-

 drivers/net/spider_net.c |4 ++--
 drivers/net/spider_net_ethtool.c |   21 +++--
 2 files changed, 5 insertions(+), 20 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 17:28:55.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:28:58.0 -0500
@@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);
 
-   if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_PARTIAL)
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
switch (ip_hdr(skb)->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
@@ -2300,7 +2300,7 @@ spider_net_setup_netdev(struct spider_ne
 
spider_net_setup_netdev_ops(netdev);
 
-   netdev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
+   netdev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX;
/* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
 *  NETIF_F_HW_VLAN_FILTER */
 
Index: linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net_ethtool.c  2007-06-08 17:27:01.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c   2007-06-08 17:28:58.0 -0500
@@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne
return 0;
 }
 
-static uint32_t
-spider_net_ethtool_get_tx_csum(struct net_device *netdev)
-{
-return (netdev->features & NETIF_F_HW_CSUM) != 0;
-}
-
-static int
-spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data)
-{
-if (data)
-netdev->features |= NETIF_F_HW_CSUM;
-else
-netdev->features &= ~NETIF_F_HW_CSUM;
-
-return 0;
-}
 
 static void
 spider_net_ethtool_get_ringparam(struct net_device *netdev,
@@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht
.get_wol= spider_net_ethtool_get_wol,
.get_msglevel   = spider_net_ethtool_get_msglevel,
.set_msglevel   = spider_net_ethtool_set_msglevel,
+   .get_link   = ethtool_op_get_link,
.nway_reset = spider_net_ethtool_nway_reset,
.get_rx_csum= spider_net_ethtool_get_rx_csum,
.set_rx_csum= spider_net_ethtool_set_rx_csum,
-   .get_tx_csum= spider_net_ethtool_get_tx_csum,
-   .set_tx_csum= spider_net_ethtool_set_tx_csum,
+   .get_tx_csum= ethtool_op_get_tx_csum,
+   .set_tx_csum= ethtool_op_set_tx_csum,
.get_ringparam  = spider_net_ethtool_get_ringparam,
.get_strings= spider_net_get_strings,
.get_stats_count= spider_net_get_stats_count,


[PATCH 6/15] spidernet: skb used after netif_receive_skb

2007-06-11 Thread Linas Vepstas

From: Florin Malita <[EMAIL PROTECTED]>

The stats update code in spider_net_pass_skb_up() is touching the skb 
after it's been passed up to the stack. To avoid that, just update the 
stats first.
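
The ordering issue can be illustrated with a minimal user-space sketch.
All names here (struct pkt, struct rx_stats, stack_receive, pass_pkt_up)
are hypothetical stand-ins, not driver symbols; stack_receive() plays
the role of netif_receive_skb() and is assumed to take ownership of the
packet and possibly free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the skb and the driver's counters. */
struct pkt { unsigned int len; };
struct rx_stats { unsigned long rx_packets; unsigned long rx_bytes; };

/* Stand-in for netif_receive_skb(): takes ownership, may free. */
static void stack_receive(struct pkt *p)
{
	free(p);
}

/* Read everything needed from the packet BEFORE handing it off. */
static void pass_pkt_up(struct pkt *p, struct rx_stats *stats)
{
	assert(p != NULL);

	stats->rx_packets++;
	stats->rx_bytes += p->len;	/* last access to p */

	stack_receive(p);		/* p must not be touched after this */
}
```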

Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>



 drivers/net/spider_net.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 17:40:02.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:40:09.0 -0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
 */
}
 
-   /* pass skb up to stack */
-   netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+   /* pass skb up to stack */
+   netif_receive_skb(skb);
 }
 
 #ifdef DEBUG


[PATCH 2/15] spidernet: Cure RX ram full bug

2007-06-11 Thread Linas Vepstas

This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When the RX ram full condition occurs, 
a certain bug/feature is triggered that has to be specially handled. 
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a 
deadlock condition, as the tail pointer will be pointing at this descr, 
which, from the OS point of view, is empty; the OS will be waiting for 
this descr to be filled. However, the hardware has skipped this descr, 
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill, 
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa080
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x4081
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa080

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full". 
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is 
descr 254, since tail was at 255.) Thus, the system is deadlocked, 
and there can be no forward progress; the OS thinks there's nothing 
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes. 
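
The resync idea can be sketched with a simplified user-space model of
the ring. This is not the driver's code: the types and the function
ring_resync_tail() are hypothetical, locking is omitted, and the ring is
an array of states rather than a linked list of hardware descriptors.
The sketch only shows the search for the next full descr, bounded to one
lap around the ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor states, mirroring the three described above. */
enum descr_state { DESCR_EMPTY, DESCR_FULL, DESCR_NOT_IN_USE };

struct rx_ring {
	enum descr_state *state;	/* one entry per descriptor */
	size_t num;			/* total number of descrs */
	size_t head;
	size_t tail;
};

/*
 * Starting at the tail, skip over every descr that is not full and
 * stop at the first full one. Searches at most one lap. Returns 1 and
 * moves the tail if a full descr was found, 0 (tail unchanged) if not.
 */
static int ring_resync_tail(struct rx_ring *r)
{
	size_t i, pos = r->tail;

	assert(r->num > 0);

	for (i = 0; i < r->num; i++) {
		if (r->state[pos] == DESCR_FULL) {
			r->tail = pos;
			return 1;
		}
		pos = (pos + 1) % r->num;
	}
	return 0;
}
```

In the hung state shown in the print above, the tail sits on an empty
descr while full descrs start just after it; a search of this kind
moves the tail forward to the first of them so that processing can
resume.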


Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   86 +++
 drivers/net/spider_net.h |3 +
 2 files changed, 82 insertions(+), 7 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 15:48:10.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 10:02:12.0 -0500
@@ -1051,6 +1051,66 @@ static void show_rx_chain(struct spider_
 #endif
 
 /**
+ * spider_net_resync_head_ptr - Advance head ptr past empty descrs
+ *
+ * If the driver fails to keep up and empty the queue, then the
+ * hardware will run out of room to put incoming packets. This
+ * will cause the hardware to skip descrs that are full (instead
+ * of halting/retrying). Thus, once the driver runs, it will need
+ * to "catch up" to where the hardware chain pointer is at.
+ */
+static void spider_net_resync_head_ptr(struct spider_net_card *card)
+{
+   unsigned long flags;
+   struct spider_net_descr

[PATCH 5/15] spidernet: turn off descriptor chain end interrupt.

2007-06-11 Thread Linas Vepstas

At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake, and it damages small packet
transmit performance, as it results in a huge storm of interrupts.  
Turn it off.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-08 17:40:02.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-08 17:40:05.0 -0500
@@ -222,6 +222,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_GDTBSTA 0x0300
 #define SPIDER_NET_GDTDCEIDIS  0x0002
 #define SPIDER_NET_DMA_TX_VALUESPIDER_NET_TX_DMA_EN | \
+   SPIDER_NET_GDTDCEIDIS | \
SPIDER_NET_GDTBSTA
 
 #define SPIDER_NET_DMA_TX_FEND_VALUE   0x00030003
@@ -332,8 +333,7 @@ enum spider_net_int2_status {
SPIDER_NET_GRISPDNGINT
 };
 
-#define SPIDER_NET_TXINT   ( (1 << SPIDER_NET_GDTFDCINT) | \
- (1 << SPIDER_NET_GDTDCEINT) )
+#define SPIDER_NET_TXINT   (1 << SPIDER_NET_GDTFDCINT)
 
 /* We rely on flagged descriptor interrupts */
 #define SPIDER_NET_RXINT   ( (1 << SPIDER_NET_GDAFDCINT) )


[PATCH 4/15] spidernet: silence the ramfull messages

2007-06-11 Thread Linas Vepstas

Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.
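
The approach taken in the diff below is a one-shot latch: the first
ramfull interrupt triggers recovery, later ones are ignored until the
poll routine has drained the ring. A minimal sketch of that pattern,
using a hypothetical struct card_model and handler names rather than
the driver's own:

```c
#include <assert.h>

/* Hypothetical model of the per-card latch; not the driver's struct. */
struct card_model {
	int ignore_rx_ramfull;	/* 1 while a recovery is already pending */
	int recoveries;		/* how many times recovery actually ran */
};

/* Called for every "RX ram full" interrupt. Returns 1 if it acted. */
static int on_ramfull_irq(struct card_model *c)
{
	if (c->ignore_rx_ramfull)
		return 0;	/* recovery already scheduled; stay quiet */

	c->ignore_rx_ramfull = 1;
	c->recoveries++;	/* stands in for resync + refill + schedule */
	return 1;
}

/* Called when polling finds no more packets: re-arm the latch. */
static void on_poll_done(struct card_model *c)
{
	c->ignore_rx_ramfull = 0;
}
```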

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   20 +++-
 drivers/net/spider_net.h |1 +
 2 files changed, 12 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-11 10:02:34.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-11 11:45:25.0 -0500
@@ -1172,7 +1172,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
 
-   if (hwdescr->dmac_cmd_status & 0xfefe) {
+   if (hwdescr->dmac_cmd_status & 0xfcf4) {
pr_err("%s: bad status, cmd_status=x%08x\n",
   card->netdev->name,
   hwdescr->dmac_cmd_status);
@@ -1251,6 +1251,7 @@ spider_net_poll(struct net_device *netde
if (no_more_packets) {
netif_rx_complete(netdev);
spider_net_rx_irq_on(card);
+   card->ignore_rx_ramfull = 0;
return 0;
}
 
@@ -1521,15 +1522,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
-   if (netif_msg_intr(card) && net_ratelimit())
-   pr_err("Spider RX RAM full, incoming packets "
-  "might be discarded!\n");
/* Could happen when rx chain is full */
-   spider_net_resync_head_ptr(card);
-   spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
-   card->num_rx_ints ++;
-   netif_rx_schedule(card->netdev);
+   if (card->ignore_rx_ramfull == 0) {
+   card->ignore_rx_ramfull = 1;
+   spider_net_resync_head_ptr(card);
+   spider_net_refill_rx_chain(card);
+   spider_net_enable_rxdmac(card);
+   card->num_rx_ints ++;
+   netif_rx_schedule(card->netdev);
+   }
show_error = 0;
break;
 
@@ -2305,6 +2306,7 @@ spider_net_setup_netdev(struct spider_ne
 
netdev->irq = card->pdev->irq;
card->num_rx_ints = 0;
+   card->ignore_rx_ramfull = 0;
 
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-11 10:02:25.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-11 11:45:50.0 -0500
@@ -462,6 +462,7 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
int num_rx_ints;
+   int ignore_rx_ramfull;
 
/* for ethtool */
int msg_enable;


[PATCH 3/15] spidernet: Don't terminate the RX ring

2007-06-11 Thread Linas Vepstas


The terminated RX ring will cause trouble during the RX ram full
conditions, leading to a hung driver, as the hardware can't find
the next descr.  There is no real reason to terminate the RX ring; 
it doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 17:35:33.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 17:36:19.0 -0500
@@ -460,13 +460,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
-   hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
 SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
-   wmb();
-   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
 
return 0;
@@ -541,12 +537,16 @@ spider_net_refill_rx_chain(struct spider
 static int
 spider_net_alloc_rx_skbs(struct spider_net_card *card)
 {
-   int result;
-   struct spider_net_descr_chain *chain;
+   struct spider_net_descr_chain *chain = &card->rx_chain;
+   struct spider_net_descr *start = chain->tail;
+   struct spider_net_descr *descr = start;
 
-   result = -ENOMEM;
+   /* Link up the hardware chain pointers */
+   do {
+   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+   descr = descr->next;
+   } while (descr != start);
 
-   chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
 * we've got a problem. If not, spider_net_refill_rx_chain
 * will do the rest at the end of this function. */
@@ -563,7 +563,7 @@ spider_net_alloc_rx_skbs(struct spider_n
 
 error:
spider_net_free_rx_chain_contents(card);
-   return result;
+   return -ENOMEM;
 }
 
 /**


[PATCH 1/15] spidernet: null out skb pointer after it's been used.

2007-06-11 Thread Linas Vepstas

Avoid kernel crash in mm/slab.c due to double-free of pointer.

If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c.
Avoid the double-free by nulling out the skb pointer.
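
The pattern can be sketched in user space as follows. The names
(struct descr_model, hand_off, release_buf) are hypothetical and only
model the idea: once a buffer has been handed to a consumer that owns
it, the descriptor forgets the pointer, so a later shutdown pass that
frees whatever is still attached finds nothing to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor holding one owned buffer; not driver code. */
struct descr_model { void *buf; };

/* Hand the buffer to a consumer that takes ownership of it. */
static void hand_off(struct descr_model *d, void (*consume)(void *))
{
	void *buf = d->buf;

	d->buf = NULL;		/* forget it before anyone else can see it */
	consume(buf);
}

/* Shutdown path: free whatever the descriptor still owns. */
static void release_buf(struct descr_model *d)
{
	free(d->buf);		/* free(NULL) is a no-op */
	d->buf = NULL;
}
```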

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-08 15:45:33.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-08 15:48:10.0 -0500
@@ -1131,6 +1131,7 @@ spider_net_decode_one_descr(struct spide
 
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+   descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
 


[PATCH 0/15] spidernet driver bug fixes

2007-06-11 Thread Linas Vepstas
On Fri, Jun 08, 2007 at 01:20:20PM -0400, Jeff Garzik wrote:
> On Fri, Jun 08, 2007 at 12:06:08PM -0500, Linas Vepstas wrote:
> > On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> > > On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > > > 
> > > > The major bug fixes are: 
> > > I realise it's late, but shouldn't "major bugfixes" be going into 22 ?
> > Yeah, I suppose, I admit I've lost track of the process. 
>
> You need to order your bug fixes first in the queue. 

OK, here are the patches, re-ordered. There is a different number
than last time, as I threw out one, merged one, and got cold feet
on a third one.  They still pass the tests.

The first five patches focus on three serious bugs, fixing crashes or
hangs.

-- patch 1 -- kernel crash when ifdown while receiving packets.
-- patch 2,3,4 -- device driver deadlocks on "RX ram full" mesgs.
  (kernel stays up, ifdown/up clear the problem).
-- patch 5 -- misconfigured TX interrupts result in 3x-4x perf
  degradation for small packets.

-- patch 6 -- rx stats may be mangled
-- patch 7 -- hw checksum sometimes breaks ipv6 operation

-- patches 8-15 -- misc tweaks, and documentation.


I re-ran my stress tests with patches 1-7 applied; they pass.

I suggest that patches 1-5 or 1-7 be applied asap.

--linas


Re: [Cbe-oss-dev] [PATCH 0/18] spidernet driver bug fixes

2007-06-08 Thread Linas Vepstas
On Fri, Jun 08, 2007 at 11:12:31AM +1000, Michael Ellerman wrote:
> On Thu, 2007-06-07 at 14:17 -0500, Linas Vepstas wrote:
> > Jeff, please apply for the 2.6.23 kernel tree.  The patch series
> > consists of two major bugfixes, and several bits of cleanup.
> > 
> > The major bug fixes are: 
> > 
> > 1) a rare but fatal bug involving "RX ram full" messages, 
> >which results in a driver deadlock.
> > 
> > 2) misconfigured TX interrupts, causing a severe performance
> >degradation for small packets.
> 
> I realise it's late, but shouldn't "major bugfixes" be going into 22 ?

Yeah, I suppose, I admit I've lost track of the process. 

I'm not sure how to submit patches for this case. The "major fixes"
are patches 6/18, 13/18 14/18 and 17/18; (the rest of the patches are 
cruft-fixes). Taken alone, these four will not apply cleanly. 

I could prepare a new set, with just these four; assuming these are
accepted into 2.6.22, then once 22 comes out, Jeff's .23 tree won't 
merge cleanly.  

What's the right way to do this?

--linas


[PATCH 18/18] spidernet: driver documentation

2007-06-07 Thread Linas Vepstas

Documentation for the spidernet driver.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 Documentation/networking/spider_net.txt |  204 
 1 file changed, 204 insertions(+)

Index: linux-2.6.22-rc1/Documentation/networking/spider_net.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc1/Documentation/networking/spider_net.txt   2007-06-07 14:01:52.0 -0500
@@ -0,0 +1,204 @@
+
+The Spidernet Device Driver
+===
+
+Written by Linas Vepstas <[EMAIL PROTECTED]>
+
+Version of 7 June 2007
+
+Abstract
+
+This document sketches the structure of portions of the spidernet
+device driver in the Linux kernel tree. The spidernet is a gigabit
+ethernet device built into the Toshiba southbridge commonly used
+in the SONY Playstation 3 and the IBM QS20 Cell blade.
+
+The Structure of the RX Ring.
+=
+The receive (RX) ring is a circular linked list of RX descriptors,
+together with three pointers into the ring that are used to manage its
+contents.
+
+The elements of the ring are called "descriptors" or "descrs"; they
+describe the received data. This includes a pointer to a buffer
+containing the received data, the buffer size, and various status bits.
+
+There are three primary states that a descriptor can be in: "empty",
+"full" and "not-in-use".  An "empty" or "ready" descriptor is ready
+to receive data from the hardware. A "full" descriptor has data in it,
+and is waiting to be emptied and processed by the OS. A "not-in-use"
+descriptor is neither empty nor full; it is simply not ready. It may
+not even have a data buffer in it, or is otherwise unusable.
+
+During normal operation, on device startup, the OS (specifically, the
+spidernet device driver) allocates a set of RX descriptors and RX
+buffers. These are all marked "empty", ready to receive data. This
+ring is handed off to the hardware, which sequentially fills in the
+buffers, and marks them "full". The OS follows up, taking the full
+buffers, processing them, and re-marking them empty.
+
+This filling and emptying is managed by three pointers, the "head"
+and "tail" pointers, managed by the OS, and a hardware current
+descriptor pointer (GDACTDPA). The GDACTDPA points at the descr
+currently being filled. When this descr is filled, the hardware
+marks it full, and advances the GDACTDPA by one.  Thus, when there is
+flowing RX traffic, every descr behind it should be marked "full",
+and everything in front of it should be "empty".  If the hardware
+discovers that the current descr is not empty, it will signal an
+interrupt, and halt processing.
+
+The tail pointer tails or trails the hardware pointer. When the
+hardware is ahead, the tail pointer will be pointing at a "full"
+descr. The OS will process this descr, and then mark it "not-in-use",
+and advance the tail pointer.  Thus, when there is flowing RX traffic,
+all of the descrs in front of the tail pointer should be "full", and
+all of those behind it should be "not-in-use". When RX traffic is not
+flowing, then the tail pointer can catch up to the hardware pointer.
+The OS will then note that the current tail is "empty", and halt
+processing.
+
+The head pointer (somewhat mis-named) follows after the tail pointer.
+When traffic is flowing, then the head pointer will be pointing at
+a "not-in-use" descr. The OS will perform various housekeeping duties
+on this descr. This includes allocating a new data buffer and
+dma-mapping it so as to make it visible to the hardware. The OS will
+then mark the descr as "empty", ready to receive data. Thus, when there
+is flowing RX traffic, everything in front of the head pointer should
+be "not-in-use", and everything behind it should be "empty". If no
+RX traffic is flowing, then the head pointer can catch up to the tail
+pointer, at which point the OS will notice that the head descr is
+"empty", and it will halt processing.
+
+Thus, in an idle system, the GDACTDPA, tail and head pointers will
+all be pointing at the same descr, which should be "empty". All of the
+other descrs in the ring should be "empty" as well.
+
+The show_rx_chain() routine will print out the locations of the
+GDACTDPA, tail and head pointers. It will also summarize the contents
+of the ring, starting at the tail pointer, and listing the status
+of the descrs that follow.
+
+A typical example of the output, for a nearly idle system, might be
+
+net eth1: Total number of descrs=256
+net eth1: Chain tail located at descr=20
+net eth1: Chain head is at 20
+net eth1: HW curr desc (GDACTDPA

[PATCH 17/18] spidernet: turn off descriptor chain end interrupt.

2007-06-07 Thread Linas Vepstas

At some point, the transmit descriptor chain end interrupt (TXDCEINT)
was turned on. This is a mistake, and it damages small packet
transmit performance, as it results in a huge storm of interrupts.  
Turn it off.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-07 11:56:31.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-07 11:56:36.0 -0500
@@ -227,6 +227,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_GDTBSTA 0x0300
 #define SPIDER_NET_GDTDCEIDIS  0x0002
 #define SPIDER_NET_DMA_TX_VALUESPIDER_NET_TX_DMA_EN | \
+   SPIDER_NET_GDTDCEIDIS | \
SPIDER_NET_GDTBSTA
 
 #define SPIDER_NET_DMA_TX_FEND_VALUE   0x00030003
@@ -337,8 +338,7 @@ enum spider_net_int2_status {
SPIDER_NET_GRISPDNGINT
 };
 
-#define SPIDER_NET_TXINT   ( (1 << SPIDER_NET_GDTFDCINT) | \
- (1 << SPIDER_NET_GDTDCEINT) )
+#define SPIDER_NET_TXINT   (1 << SPIDER_NET_GDTFDCINT)
 
 /* We rely on flagged descriptor interrupts */
 #define SPIDER_NET_RXINT   ( (1 << SPIDER_NET_GDAFDCINT) )


[PATCH 16/18] spidernet: fix misnamed flag

2007-06-07 Thread Linas Vepstas

The transmit frame tail bit is strangely misnamed as 
"no checksum". Fix the name to what it should be:
"transmit frame tail". No functional change, 
just a name change.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |2 +-
 drivers/net/spider_net.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 11:56:23.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:56:31.0 -0500
@@ -720,7 +720,7 @@ spider_net_prepare_tx_descr(struct spide
hwdescr->data_status = 0;
 
hwdescr->dmac_cmd_status =
-   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_TXFRMTL;
spin_unlock_irqrestore(&chain->lock, flags);
 
if (skb->ip_summed == CHECKSUM_PARTIAL)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-07 11:55:06.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-07 11:56:31.0 -0500
@@ -354,7 +354,7 @@ enum spider_net_int2_status {
 #define SPIDER_NET_GPRDAT_MASK 0x
 
 #define SPIDER_NET_DMAC_NOINTR_COMPLETE0x0080
-#define SPIDER_NET_DMAC_NOCS   0x0004
+#define SPIDER_NET_DMAC_TXFRMTL0x0004
 #define SPIDER_NET_DMAC_TCP0x0002
 #define SPIDER_NET_DMAC_UDP0x0003
 #define SPIDER_NET_TXDCEST 0x0800


[PATCH 15/18] spidernet: minor RX optimization

2007-06-07 Thread Linas Vepstas


A minor optimization on the RX side is that the hardware does 
not need to be kicked if space did not open up in the RX ring.
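
A minimal sketch of the idea, assuming a hypothetical model in which
"kicking" the hardware is just a counter: the refill loop counts how
many descrs it actually prepared and only kicks when that count is
non-zero. The struct and function names are illustrative and are not
taken from the driver.

```c
#include <assert.h>

/* Hypothetical refill model; 'kicks' stands in for enabling the RX DMAC. */
struct refill_model {
	int free_slots;	/* descrs waiting for a fresh buffer */
	int kicks;	/* how often the hardware was restarted */
};

/*
 * Fill as many free slots as there are buffers available.
 * Kick the hardware only if at least one slot was refilled.
 * Returns the number of slots refilled.
 */
static int refill(struct refill_model *m, int buffers_available)
{
	int cnt = 0;

	while (m->free_slots > 0 && buffers_available > 0) {
		m->free_slots--;
		buffers_available--;
		cnt++;
	}

	if (cnt)
		m->kicks++;

	return cnt;
}
```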

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 11:56:10.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:56:23.0 -0500
@@ -525,6 +525,7 @@ spider_net_refill_rx_chain(struct spider
 {
struct spider_net_descr_chain *chain = &card->rx_chain;
unsigned long flags;
+   int cnt = 0;
 
/* one context doing the refill (and a second context seeing that
 * and omitting it) is ok. If called by NAPI, we'll be called again
@@ -538,9 +539,13 @@ spider_net_refill_rx_chain(struct spider
if (spider_net_prepare_rx_descr(card, chain->head))
break;
chain->head = chain->head->next;
+   cnt ++;
}
 
spin_unlock_irqrestore(&chain->lock, flags);
+
+   if (cnt)
+   spider_net_enable_rxdmac(card);
 }
 
 /**
@@ -573,7 +578,6 @@ spider_net_alloc_rx_skbs(struct spider_n
/* This will allocate the rest of the rx buffers;
 * if not, it's business as usual later on. */
spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
return 0;
 
 error:
@@ -1305,7 +1309,6 @@ spider_net_poll(struct net_device *netde
netdev->quota -= packets_done;
*budget -= packets_done;
spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
 
spider_net_cleanup_tx_ring(card);
 
@@ -1590,7 +1593,6 @@ spider_net_handle_error_irq(struct spide
card->ignore_rx_ramfull = 1;
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
}
@@ -1611,7 +1613,6 @@ spider_net_handle_error_irq(struct spide
/* Could happen when rx chain is full */
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
show_error = 0;
@@ -1625,7 +1626,6 @@ spider_net_handle_error_irq(struct spide
/* Could happen when rx chain is full */
spider_net_resync_head_ptr(card);
spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
card->num_rx_ints ++;
netif_rx_schedule(card->netdev);
show_error = 0;


[PATCH 14/18] spidernet: silence the ramfull messages

2007-06-07 Thread Linas Vepstas

Although the previous patch resolved issues with hangs when the
RX ram full interrupt is encountered, there are still situations
where lots of RX ramfull interrupts arrive, resulting in a noisy
log in syslog. There is no need for this.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   20 ++--
 drivers/net/spider_net.h |3 ++-
 2 files changed, 12 insertions(+), 11 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 11:53:55.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:56:10.0 -0500
@@ -1314,6 +1314,7 @@ spider_net_poll(struct net_device *netde
if (no_more_packets) {
netif_rx_complete(netdev);
spider_net_rx_irq_on(card);
+   card->ignore_rx_ramfull = 0;
return 0;
}
 
@@ -1584,17 +1585,15 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
-   if (netif_msg_intr(card) && net_ratelimit()) {
-   dev_info(&card->netdev->dev, "Spider RX RAM full, "
-   "incoming packets might be discarded!\n");
-   show_rx_chain(card);
-   }
/* Could happen when rx chain is full */
-   spider_net_resync_head_ptr(card);
-   spider_net_refill_rx_chain(card);
-   spider_net_enable_rxdmac(card);
-   card->num_rx_ints ++;
-   netif_rx_schedule(card->netdev);
+   if (card->ignore_rx_ramfull == 0) {
+   card->ignore_rx_ramfull = 1;
+   spider_net_resync_head_ptr(card);
+   spider_net_refill_rx_chain(card);
+   spider_net_enable_rxdmac(card);
+   card->num_rx_ints ++;
+   netif_rx_schedule(card->netdev);
+   }
show_error = 0;
break;
 
@@ -2374,6 +2373,7 @@ spider_net_setup_netdev(struct spider_ne
 
netdev->irq = card->pdev->irq;
card->num_rx_ints = 0;
+   card->ignore_rx_ramfull = 0;
 
dn = pci_device_to_OF_node(card->pdev);
if (!dn)
Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-07 
11:52:35.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-07 11:55:06.0 
-0500
@@ -164,7 +164,7 @@ extern char spider_net_driver_name[];
 
 /** interrupt mask registers */
 #define SPIDER_NET_INT0_MASK_VALUE 0x3f7fe2c7
-#define SPIDER_NET_INT1_MASK_VALUE 0x7ff7
+#define SPIDER_NET_INT1_MASK_VALUE 0x5ff5
 /* no MAC aborts -> auto retransmission */
 #define SPIDER_NET_INT2_MASK_VALUE 0xffef7ff1
 
@@ -467,6 +467,7 @@ struct spider_net_card {
atomic_t tx_timeout_task_counter;
wait_queue_head_t waitq;
int num_rx_ints;
+   int ignore_rx_ramfull;
 
/* for ethtool */
int msg_enable;


[PATCH 13/18] spidernet: Cure RX ram full bug

2007-06-07 Thread Linas Vepstas


This patch fixes a rare deadlock that can occur when the kernel
is not able to empty out the RX ring quickly enough. Below follows
a detailed description of the bug and the fix.

As long as the OS can empty out the RX buffers at a rate faster than
the hardware can fill them, there is no problem. If, for some reason,
the OS fails to empty the RX ring fast enough, the hardware GDACTDPA
pointer will catch up to the head, notice the not-empty condition,
and stop. However, RX packets may still continue arriving on the wire.
The spidernet chip can save some limited number of these in local RAM.
When this local ram fills up, the spider chip will issue an interrupt
indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit
will be set in GHIINT1STS).  When the RX ram full condition occurs, 
a certain bug/feature is triggered that has to be specially handled. 
This section describes the special handling for this condition.

When the OS finally has a chance to run, it will empty out the RX ring.
In particular, it will clear the descriptor on which the hardware had
stopped. However, once the hardware has decided that a certain
descriptor is invalid, it will not restart at that descriptor; instead
it will restart at the next descr. This potentially will lead to a 
deadlock condition, as the tail pointer will be pointing at this descr, 
which, from the OS point of view, is empty; the OS will be waiting for 
this descr to be filled. However, the hardware has skipped this descr, 
and is filling the next descrs. Since the OS doesn't see this, there
is a potential deadlock, with the OS waiting for one descr to fill, 
while the hardware is waiting for a different set of descrs to become
empty.

A call to show_rx_chain() at this point indicates the nature of the
problem. A typical print when the network is hung shows the following:

net eth1: Spider RX RAM full, incoming packets might be discarded!
net eth1: Total number of descrs=256
net eth1: Chain tail located at descr=255
net eth1: Chain head is at 255
net eth1: HW curr desc (GDACTDPA) is at 0
net eth1: Have 1 descrs with stat=xa080
net eth1: HW next desc (GDACNEXTDA) is at 1
net eth1: Have 127 descrs with stat=x40800101
net eth1: Have 1 descrs with stat=x4081
net eth1: Have 126 descrs with stat=x40800101
net eth1: Last 1 descrs with stat=xa080

Both the tail and head pointers are pointing at descr 255, which is
marked xa... which is "empty". Thus, from the OS point of view, there
is nothing to be done. In particular, there is the implicit assumption
that everything in front of the "empty" descr must surely also be empty,
as explained in the last section. The OS is waiting for descr 255 to
become non-empty, which, in this case, will never happen.

The HW pointer is at descr 0. This descr is marked 0x4.. or "full". 
Since it's already full, the hardware can do nothing more, and thus has
halted processing. Notice that descrs 0 through 253 are all marked
"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is 
descr 254, since tail was at 255.) Thus, the system is deadlocked, 
and there can be no forward progress; the OS thinks there's nothing 
to do, and the hardware has nowhere to put incoming data.

This bug/feature is worked around with the spider_net_resync_head_ptr()
routine. When the driver receives RX interrupts, but an examination
of the RX chain seems to show it is empty, then it is probable that
the hardware has skipped a descr or two (sometimes dozens under heavy
network conditions). The spider_net_resync_head_ptr() subroutine will
search the ring for the next full descr, and the driver will resume
operations there.  Since this will leave "holes" in the ring, there
is also a spider_net_resync_tail_ptr() that will skip over such holes. 
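
As an illustration only (this is not the driver code; ring_next_full,
descr_state and the small ring used in the note below are made-up
names and sizes), the "search the ring for the next full descr" step
can be modelled in plain C as a scan bounded to one lap of the ring:

```c
#include <assert.h>

/* Simplified descriptor status; the real hardware has more states. */
enum descr_state { DESCR_EMPTY, DESCR_FULL };

/* Return the index of the first full descr at or after 'start',
 * scanning at most one lap of an n-entry ring.
 * Return -1 if no descr in the ring is full. */
static int ring_next_full(const enum descr_state *state, int n, int start)
{
    int i;

    assert(n > 0 && start >= 0 && start < n);
    for (i = 0; i < n; i++) {
        int idx = (start + i) % n;  /* wrap around the ring */
        if (state[idx] == DESCR_FULL)
            return idx;
    }
    return -1;
}
```

Scaled down to an 8-entry ring with descrs 0 through 5 full and 6, 7
empty, a scan starting at the stuck tail (7) lands on descr 0, which
mirrors the hung state shown in the dump above. Bounding the scan to
one lap keeps it from spinning when the ring really is empty.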


Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   86 +++
 drivers/net/spider_net.h |1 
 2 files changed, 81 insertions(+), 6 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:52:24.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:53:55.0 
-0500
@@ -,6 +,65 @@ static void show_rx_chain(struct spider_
 }
 
 /**
+ * spider_net_resync_head_ptr - Advance head ptr past empty descrs
+ *
+ * If the driver fails to keep up and empty the queue, then the
+ * hardware will run out of room to put incoming packets. This
+ * will cause the hardware to skip descrs that are full (instead
+ * of halting/retrying). Thus, once the driver runs, it will need
+ * to "catch up" to where the hardware chain pointer is at.
+ */
+static void spider_net_resync_head_ptr(struct spider_net_card *card)
+{
+   unsigned long flags;
+   struct spider_net_descr_chain *

[PATCH 12/18] spidernet: don't flag rare packets as bad packets

2007-06-07 Thread Linas Vepstas

The current error checking is flagging some perfectly normal, but
usually rare packets as being bad. Do not flag these packets.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:52:20.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:52:24.0 
-0500
@@ -1174,7 +1174,7 @@ spider_net_decode_one_descr(struct spide
goto bad_desc;
}
 
-   if (hwdescr->dmac_cmd_status & 0xfefe) {
+   if (hwdescr->dmac_cmd_status & 0xfcf4) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   hwdescr->dmac_cmd_status);
pr_err("buf_addr=x%08x\n", hw_buf_addr);
@@ -1543,10 +1543,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCDCEINT: /* fallthrough */
case SPIDER_NET_GDBDCEINT: /* fallthrough */
case SPIDER_NET_GDADCEINT:
-   if (netif_msg_intr(card) && net_ratelimit())
-   dev_err(&card->netdev->dev, "got descriptor chain end 
interrupt, "
-  "restarting DMAC %c.\n",
-  'D'-(i-SPIDER_NET_GDDDCEINT)/3);
+   /* Could happen when rx chain is full */
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
show_error = 0;
@@ -1557,7 +1554,7 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GDCINVDINT: /* fallthrough */
case SPIDER_NET_GDBINVDINT: /* fallthrough */
case SPIDER_NET_GDAINVDINT:
-   /* could happen when rx chain is full */
+   /* Could happen when rx chain is full */
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
show_error = 0;


[PATCH 11/18] spidernet: increase the NAPI weight

2007-06-07 Thread Linas Vepstas

Another way of minimizing the likelihood of the RX ram overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.h
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.h  2007-06-07 
11:51:47.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.h   2007-06-07 11:52:22.0 
-0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
 
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
-#define SPIDER_NET_WATCHDOG_TIMEOUT50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024


[PATCH 10/18] spidernet: service TX later.

2007-06-07 Thread Linas Vepstas

When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:52:17.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:52:20.0 
-0500
@@ -1224,7 +1224,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
 
-   spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
 
while (packets_to_do) {
@@ -1243,6 +1242,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
 
+   spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {


[PATCH 9/18] spidernet: reset the card when an rxramfull is seen

2007-06-07 Thread Linas Vepstas

Some versions of the spider have a firmware bug, where the
RX ring sequencer goes crazy when the RX RAM on the device
fills up. Apparently the only viable workaround is a soft
reset of the card.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:52:12.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:52:17.0 
-0500
@@ -1518,11 +1518,16 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
-   if (netif_msg_intr(card) && net_ratelimit())
-   dev_err(&card->netdev->dev, "Spider RX RAM full, 
incoming packets "
-  "might be discarded!\n");
+   if (netif_msg_intr(card) && net_ratelimit()) {
+   dev_err(&card->netdev->dev, "Spider RX RAM full, "
+   "incoming packets might be discarded!\n");
+   show_rx_chain(card);
+   }
spider_net_rx_irq_off(card);
-   netif_rx_schedule(card->netdev);
+
+   /* If the card is spewing rxramfulls, then reset */
+   atomic_inc(&card->tx_timeout_task_counter);
+   schedule_work(&card->tx_timeout_task);
show_error = 0;
break;
 
@@ -2100,6 +2105,8 @@ spider_net_workaround_rxramfull(struct s
 {
int i, sequencer = 0;
 
+   dev_info(&card->pdev->dev, "calling rxramfull workaround\n");
+
/* cancel reset */
spider_net_write_reg(card, SPIDER_NET_CKRCTRL,
 SPIDER_NET_CKRCTRL_RUN_VALUE);


[PATCH 8/18] spidernet: enhance the dump routine

2007-06-07 Thread Linas Vepstas

Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   78 ++-
 1 file changed, 70 insertions(+), 8 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
14:07:26.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 14:07:55.0 
-0500
@@ -1022,34 +1022,94 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
 }
 
-#ifdef DEBUG
 static void show_rx_chain(struct spider_net_card *card)
 {
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+   struct spider_net_hw_descr *hwd = start->hwdescr;
+   struct device *dev = &card->netdev->dev;
+   u32 curr_desc, next_desc;
int status;
 
+   int tot = 0;
int cnt = 0;
-   int cstat = spider_net_get_descr_status(descr);
-   printk(KERN_INFO "RX chain tail at descr=%ld\n",
-(start - card->descr) - card->tx_chain.num_desc);
+   int off = start - chain->ring;
+   int cstat = hwd->dmac_cmd_status;
+
+   dev_info(dev, "Total number of descrs=%d\n",
+   chain->num_desc);
+   dev_info(dev, "Chain tail located at descr=%d, status=0x%x\n",
+   off, cstat);
+
+   curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+   next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
-   status = spider_net_get_descr_status(descr);
+   hwd = descr->hwdescr;
+   off = descr - chain->ring;
+   status = hwd->dmac_cmd_status;
+
+   if (descr == chain->head)
+   dev_info(dev, "Chain head is at %d, head status=0x%x\n",
+off, status);
+
+   if (curr_desc == descr->bus_addr)
+   dev_info(dev, "HW curr desc (GDACTDPA) is at %d, 
status=0x%x\n",
+off, status);
+
+   if (next_desc == descr->bus_addr)
+   dev_info(dev, "HW next desc (GDACNEXTDA) is at %d, 
status=0x%x\n",
+off, status);
+
+   if (hwd->next_descr_addr == 0)
+   dev_info(dev, "chain is cut at %d\n", off);
+
if (cstat != status) {
-   printk(KERN_INFO "Have %d descrs with stat=x%08x\n", 
cnt, cstat);
+   int from = (chain->num_desc + off - cnt) % 
chain->num_desc;
+   int to = (chain->num_desc + off - 1) % chain->num_desc;
+   dev_info(dev, "Have %d (from %d to %d) descrs "
+"with stat=0x%08x\n", cnt, from, to, cstat);
cstat = status;
cnt = 0;
}
+
cnt ++;
+   tot ++;
+   descr = descr->next;
+   } while (descr != start);
+
+   dev_info(dev, "Last %d descrs with stat=0x%08x "
+"for a total of %d descrs\n", cnt, cstat, tot);
+
+#ifdef DEBUG
+   /* Now dump the whole ring */
+   descr = start;
+   do
+   {
+   struct spider_net_hw_descr *hwd = descr->hwdescr;
+   status = spider_net_get_descr_status(hwd);
+   cnt = descr - chain->ring;
+   dev_info(dev, "Descr %d stat=0x%08x skb=%p\n",
+cnt, status, descr->skb);
+   dev_info(dev, "bus addr=%08x buf addr=%08x sz=%d\n",
+descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+   dev_info(dev, "next=%08x result sz=%d valid sz=%d\n",
+hwd->next_descr_addr, hwd->result_size,
+hwd->valid_size);
+   dev_info(dev, "dmac=%08x data stat=%08x data err=%08x\n",
+hwd->dmac_cmd_status, hwd->data_status,
+hwd->data_error);
+   dev_info(dev, "\n");
+
descr = descr->next;
} while (descr != start);
-   printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
 #endif
 
+}
+
 /**
  * spider_net_decode_one_descr - processes an RX descriptor
  * @card: card structure
@@ -1137,6 +1197,8 @@ spider_net_decode_one_descr(struct spide
r

[PATCH 7/18] spidernet: Don't terminate the RX ring

2007-06-07 Thread Linas Vepstas

There is no real reason to terminate the RX ring; it
doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.
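
A minimal stand-alone sketch of linking the ring once, up front,
instead of cutting and re-linking it on every refill. The struct
layouts, ring_init, ring_link_hw and the bus addresses are
simplifications assumed for illustration; only the linking loop
follows the shape of the loop in the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified models of the hardware and software descriptors. */
struct hw_descr {
    uint32_t next_descr_addr;   /* 0 would mean "chain is cut here" */
};

struct descr {
    struct hw_descr hw;
    uint32_t bus_addr;          /* address the hardware sees */
    struct descr *next, *prev;
};

/* Build the software ring; bus addresses are made up (base + i * 0x20). */
static void ring_init(struct descr *ring, int n, uint32_t base)
{
    int i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        ring[i].bus_addr = base + (uint32_t)i * 0x20u;
        ring[i].next = &ring[(i + 1) % n];
        ring[i].prev = &ring[(i + n - 1) % n];
        ring[i].hw.next_descr_addr = 0;
    }
}

/* Link up the hardware chain pointers so the chain is a closed circle. */
static void ring_link_hw(struct descr *start)
{
    struct descr *d = start;

    do {
        d->prev->hw.next_descr_addr = d->bus_addr;
        d = d->next;
    } while (d != start);
}
```

After ring_link_hw() no descriptor has a zero next pointer, so the
refill path only has to publish the buffer address and the status
word.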

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:52.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:55.0 
-0500
@@ -461,13 +461,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
-   hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
 SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
-   wmb();
-   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
 
return 0;
@@ -556,12 +552,16 @@ spider_net_refill_rx_chain(struct spider
 static int
 spider_net_alloc_rx_skbs(struct spider_net_card *card)
 {
-   int result;
-   struct spider_net_descr_chain *chain;
+   struct spider_net_descr_chain *chain = &card->rx_chain;
+   struct spider_net_descr *start = chain->tail;
+   struct spider_net_descr *descr = start;
 
-   result = -ENOMEM;
+   /* Link up the hardware chain pointers */
+   do {
+   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+   descr = descr->next;
+   } while (descr != start);
 
-   chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
 * we've got a problem. If not, spider_net_refill_rx_chain
 * will do the rest at the end of this function. */
@@ -578,7 +578,7 @@ spider_net_alloc_rx_skbs(struct spider_n
 
 error:
spider_net_free_rx_chain_contents(card);
-   return result;
+   return -ENOMEM;
 }
 
 /**


[PATCH 6/18] spidernet: null out skb pointer after its been used.

2007-06-07 Thread Linas Vepstas

If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c.
Avoid the double-free by nulling out the skb pointer.
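
The pattern, sketched in user-space C with malloc/free standing in
for skb handling (slot, slot_pass_up, slot_teardown and
consume_and_free are hypothetical names, not driver functions): once
ownership of the buffer has been handed away, the slot forgets the
pointer, so a later teardown has nothing left to free.

```c
#include <assert.h>
#include <stdlib.h>

struct slot {
    void *buf;      /* NULL means "this slot owns nothing" */
};

static int frees;   /* counts real frees, for illustration only */

/* A consumer that takes ownership of the buffer and frees it. */
static void consume_and_free(void *p)
{
    free(p);
    frees++;
}

/* Hand the buffer to a consumer, then forget it. */
static void slot_pass_up(struct slot *s, void (*consume)(void *))
{
    assert(s != NULL && consume != NULL);
    consume(s->buf);
    s->buf = NULL;  /* without this, teardown would free it again */
}

/* Shutdown path: free only what the slot still owns. */
static void slot_teardown(struct slot *s)
{
    assert(s != NULL);
    if (s->buf) {
        free(s->buf);
        frees++;
        s->buf = NULL;
    }
}
```

Calling slot_pass_up() followed by slot_teardown() frees the buffer
exactly once.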

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:51.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:52.0 
-0500
@@ -1132,6 +1132,7 @@ spider_net_decode_one_descr(struct spide
 
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+   descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
 


[PATCH 5/18] spidernet: zero out a pointer.

2007-06-07 Thread Linas Vepstas

Invalidate a pointer as it's pci_unmap'ed; this is a bit of 
paranoia to make sure hardware doesn't continue trying to 
DMA to it. 

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:48.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:51.0 
-0500
@@ -1067,6 +1067,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   u32 hw_buf_addr;
int status;
 
status = spider_net_get_descr_status(hwdescr);
@@ -1080,7 +1081,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
 
/* unmap descriptor */
-   pci_unmap_single(card->pdev, hwdescr->buf_addr,
+   hw_buf_addr = hwdescr->buf_addr;
+   hwdescr->buf_addr = 0x;
+   pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
 
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1114,7 +1117,7 @@ spider_net_decode_one_descr(struct spide
if (hwdescr->dmac_cmd_status & 0xfefe) {
dev_err(&card->netdev->dev, "bad status, cmd_status=x%08x\n",
   hwdescr->dmac_cmd_status);
-   pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+   pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);


[PATCH 4/18] spidernet: move a block of code around

2007-06-07 Thread Linas Vepstas


Put the enable and disable routines next to one another, 
as this makes verifying their symmetry that much easier.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:47.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:48.0 
-0500
@@ -505,6 +505,20 @@ spider_net_enable_rxdmac(struct spider_n
 }
 
 /**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turning off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
  * spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
  * @card: card structure
  *
@@ -656,20 +670,6 @@ write_hash:
 }
 
 /**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
-   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
-SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
  * spider_net_prepare_tx_descr - fill tx descriptor with skb data
  * @card: card structure
  * @descr: descriptor structure to fill out


[PATCH 2/18] spidernet: checksum and ethtool

2007-06-07 Thread Linas Vepstas

From: Stephen Hemminger <[EMAIL PROTECTED]>

It doesn't look like spidernet hardware can really checksum all protocols,
the code looks like it does IPV4 only.  If so, it should use NETIF_F_IP_CSUM
instead of NETIF_F_HW_CSUM.

The driver doesn't need its own get/set for ethtool tx csum, and it
should use the standard ethtool_op_get_link.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>

-

 drivers/net/spider_net.c |4 ++--
 drivers/net/spider_net_ethtool.c |   21 +++--
 2 files changed, 5 insertions(+), 20 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:40.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:45.0 
-0500
@@ -718,7 +718,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);
 
-   if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == 
CHECKSUM_PARTIAL)
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
switch (ip_hdr(skb)->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
@@ -2225,7 +2225,7 @@ spider_net_setup_netdev(struct spider_ne
 
spider_net_setup_netdev_ops(netdev);
 
-   netdev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
+   netdev->features = NETIF_F_IP_CSUM | NETIF_F_LLTX;
/* some time: NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
 *  NETIF_F_HW_VLAN_FILTER */
 
Index: linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net_ethtool.c  2007-06-07 
11:49:01.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net_ethtool.c   2007-06-07 
11:51:45.0 -0500
@@ -134,22 +134,6 @@ spider_net_ethtool_set_rx_csum(struct ne
return 0;
 }
 
-static uint32_t
-spider_net_ethtool_get_tx_csum(struct net_device *netdev)
-{
-return (netdev->features & NETIF_F_HW_CSUM) != 0;
-}
-
-static int
-spider_net_ethtool_set_tx_csum(struct net_device *netdev, uint32_t data)
-{
-if (data)
-netdev->features |= NETIF_F_HW_CSUM;
-else
-netdev->features &= ~NETIF_F_HW_CSUM;
-
-return 0;
-}
 
 static void
 spider_net_ethtool_get_ringparam(struct net_device *netdev,
@@ -200,11 +184,12 @@ const struct ethtool_ops spider_net_etht
.get_wol= spider_net_ethtool_get_wol,
.get_msglevel   = spider_net_ethtool_get_msglevel,
.set_msglevel   = spider_net_ethtool_set_msglevel,
+   .get_link   = ethtool_op_get_link,
.nway_reset = spider_net_ethtool_nway_reset,
.get_rx_csum= spider_net_ethtool_get_rx_csum,
.set_rx_csum= spider_net_ethtool_set_rx_csum,
-   .get_tx_csum= spider_net_ethtool_get_tx_csum,
-   .set_tx_csum= spider_net_ethtool_set_tx_csum,
+   .get_tx_csum= ethtool_op_get_tx_csum,
+   .set_tx_csum= ethtool_op_set_tx_csum,
.get_ringparam  = spider_net_ethtool_get_ringparam,
.get_strings= spider_net_get_strings,
.get_stats_count= spider_net_get_stats_count,


[PATCH 3/18] spidernet: beautify error messages

2007-06-07 Thread Linas Vepstas

Use dev_err() to print device error messages.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   67 ---
 drivers/net/spider_net.h |2 -
 2 files changed, 36 insertions(+), 33 deletions(-)

Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:45.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:47.0 
-0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
  bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Not enough memory to allocate rx buffer\n");
+   dev_err(&card->netdev->dev,
+   "Not enough memory to allocate rx buffer\n");
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,7 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Could not iommu-map rx buffer\n");
+   dev_err(&card->netdev->dev, "Could not iommu-map rx 
buffer\n");
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -692,7 +693,7 @@ spider_net_prepare_tx_descr(struct spide
buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
if (pci_dma_mapping_error(buf)) {
if (netif_msg_tx_err(card) && net_ratelimit())
-   pr_err("could not iommu-map packet (%p, %i). "
+   dev_err(&card->netdev->dev, "could not iommu-map packet 
(%p, %i). "
  "Dropping packet\n", skb->data, skb->len);
card->spider_stats.tx_iommu_map_error++;
return -ENOMEM;
@@ -832,9 +833,8 @@ spider_net_release_tx_chain(struct spide
case SPIDER_NET_DESCR_PROTECTION_ERROR:
case SPIDER_NET_DESCR_FORCE_END:
if (netif_msg_tx_err(card))
-   pr_err("%s: forcing end of tx descriptor "
-  "with status x%02x\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev, "forcing end of tx 
descriptor "
+  "with status x%02x\n", status);
card->netdev_stats.tx_errors++;
break;
 
@@ -1087,8 +1087,8 @@ spider_net_decode_one_descr(struct spide
 (status == SPIDER_NET_DESCR_PROTECTION_ERROR) ||
 (status == SPIDER_NET_DESCR_FORCE_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: dropping RX descriptor with state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "dropping RX descriptor with state %d\n", 
status);
card->netdev_stats.rx_dropped++;
goto bad_desc;
}
@@ -1096,8 +1096,8 @@ spider_net_decode_one_descr(struct spide
if ( (status != SPIDER_NET_DESCR_COMPLETE) &&
 (status != SPIDER_NET_DESCR_FRAME_END) ) {
if (netif_msg_rx_err(card))
-   pr_err("%s: RX descriptor with unknown state %d\n",
-  card->netdev->name, status);
+   dev_err(&card->netdev->dev,
+  "RX descriptor with unknown state %d\n", status);
card->spider_stats.rx_desc_unk_state++;
goto bad_desc;
}
@@ -1105,16 +1105,14 @@ spider_net_decode_one_descr(struct spide
/* The cases we'll throw away the packet immediately */
if (hwdescr->data_error & SPIDER_NET_DESTROY_RX_FLAGS) {
if (netif_msg_rx_err(card))
-   pr_err("%s: error in received descriptor found, "
+   dev_err(&card->netdev->dev, "error in received 
descriptor found, "
   "data_status=x%08x, data_error=x%08x\n",
-  card->netdev->name,
   hwdescr->data_status, hwdescr->

[PATCH 1/18] spidernet: skb used after netif_receive_skb

2007-06-07 Thread Linas Vepstas

From: Florin Malita <[EMAIL PROTECTED]>

The stats update code in spider_net_pass_skb_up() is touching the skb 
after it's been passed up to the stack. To avoid that, just update the 
stats first.
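
A small user-space sketch of the same ordering rule (pkt, rx_stats,
deliver and stack_takes_ownership are illustrative names, with free()
standing in for whatever the stack does with the packet): anything
that must be read from the packet is read before the hand-off, since
the receiver may free it.

```c
#include <assert.h>
#include <stdlib.h>

struct pkt {
    unsigned int len;
};

struct rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
};

static int handed;  /* counts hand-offs, for illustration only */

/* Stand-in for the network stack: it owns the packet and frees it. */
static void stack_takes_ownership(struct pkt *p)
{
    free(p);
    handed++;
}

/* Update statistics first, then pass the packet up. */
static void deliver(struct pkt *p, struct rx_stats *st,
                    void (*hand_off)(struct pkt *))
{
    assert(p != NULL && st != NULL && hand_off != NULL);
    st->rx_packets++;
    st->rx_bytes += p->len;  /* read while we still own p */
    hand_off(p);             /* p must not be touched after this */
}
```

Swapping the last two steps in deliver() would read p->len from
memory that has already been freed.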

Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>



 drivers/net/spider_net.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: linux-2.6.22-rc1/drivers/net/spider_net.c
===
--- linux-2.6.22-rc1.orig/drivers/net/spider_net.c  2007-06-07 
11:51:04.0 -0500
+++ linux-2.6.22-rc1/drivers/net/spider_net.c   2007-06-07 11:51:40.0 
-0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
 */
}
 
-   /* pass skb up to stack */
-   netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+   /* pass skb up to stack */
+   netif_receive_skb(skb);
 }
 
 #ifdef DEBUG


[PATCH 0/18] spidernet driver bug fixes

2007-06-07 Thread Linas Vepstas

Jeff, please apply for the 2.6.23 kernel tree.  The patch series
consists of two major bugfixes, and several bits of cleanup.

The major bug fixes are: 

1) a rare but fatal bug involving "RX ram full" messages, 
   which results in a driver deadlock.

2) misconfigured TX interrupts, causing a severe performance
   degradation for small packets.

Minor updates include an expanded ring dump routine, and 
documentation for some portions of the device driver.

--linas



Re: [PATCH] spidernet: checksum and ethtool

2007-06-07 Thread Linas Vepstas
On Tue, May 29, 2007 at 05:24:36PM -0700, Stephen Hemminger wrote:
> It doesn't look like spidernet hardware can really checksum all protocols,
> the code looks like it does IPV4 only.  If so, it should use NETIF_F_IP_CSUM
> instead of NETIF_F_HW_CSUM.
> 
> The driver doesn't need it's own get/set for ethtool tx csum, and it
> should use the standard ethtool_op_get_link.

Can you provide a signed-off-by line, please?  I was hoping to submit
upstream today.

--linas



Re: [PATCH] spidernet: checksum and ethtool

2007-06-01 Thread Linas Vepstas
On Tue, May 29, 2007 at 05:24:36PM -0700, Stephen Hemminger wrote:
> It doesn't look like spidernet hardware can really checksum all protocols,
> the code looks like it does IPV4 only.  If so, it should use NETIF_F_IP_CSUM
> instead of NETIF_F_HW_CSUM.
> 
> The driver doesn't need it's own get/set for ethtool tx csum, and it
> should use the standard ethtool_op_get_link.
> 
> NOT TESTED (no CELL hardware).

It seems to work. I've been distracted with other spidernet issues;
I will forward your patch upstream early next week.

--linas


Re: [PATCH] s2io: don't run MSI handlers if device is offline.

2007-05-25 Thread Linas Vepstas
On Thu, May 24, 2007 at 05:20:24PM -0400, Jeff Garzik wrote:
> Linas Vepstas wrote:
> >Don't run any of the MSI handlers if the channel is off;
> >also don't gather device statatistics. Also, netif_wake 
> >not needed, per suggestions from
> >Sivakumar Subramani <[EMAIL PROTECTED]>.
> >
> Why are the interrupt handlers being called at all, then?
> 
> This seems to be papering over another bug.

Ahh ... are you suggesting that the arch should be performing 
this check in driver-independent code, and not delivering the
MSI if the pci channel is deemed offline? Yes, right, good point. 

So far, I hadn't really given this much thought, and had been 
letting the dev drivers perform this check, but, yes, it is probably 
better to have one common check performed in the driver-independent 
arch code, and the interrupt dropped, if the device is offlined.

I'll look into this.
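
A rough sketch of the idea, in plain user-space C rather than kernel code. All of the names here (fake_dev, deliver_irq, driver_handler) are made up for illustration; the channel_offline field merely plays the role of pci_channel_offline(), and nothing below is an existing arch interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
enum irq_result { IRQ_DROPPED = 0, IRQ_DELIVERED = 1 };

struct fake_dev {
	bool channel_offline;   /* plays the role of pci_channel_offline() */
	int  handled;           /* counts how often the driver handler ran */
};

typedef void (*irq_handler_fn)(struct fake_dev *dev);

/* A driver handler that no longer carries its own offline check. */
static void driver_handler(struct fake_dev *dev)
{
	dev->handled++;
}

/* Common delivery path: drop the interrupt if the device is offline,
 * otherwise call into the driver. */
static enum irq_result deliver_irq(struct fake_dev *dev, irq_handler_fn handler)
{
	assert(dev != NULL && handler != NULL);
	if (dev->channel_offline)
		return IRQ_DROPPED;
	handler(dev);
	return IRQ_DELIVERED;
}
```

With the check in one common place, each driver handler could assume the channel is usable whenever it is called.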

--linas


[PATCH 10/10] spidernet: increase the NAPI weight

2007-05-22 Thread Linas Vepstas

Another way of minimizing the likelihood of the RX RAM overflowing
is to empty out the entire rx ring every chance we get. Change
the crazy watchdog timeout from 50 seconds to 3 seconds, while
we're here.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.h |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.h
===
--- netdev-2.6.orig/drivers/net/spider_net.h  2007-05-22 18:03:24.0 -0500
+++ netdev-2.6/drivers/net/spider_net.h 2007-05-22 18:03:43.0 -0500
@@ -56,8 +56,13 @@ extern char spider_net_driver_name[];
 
 #define SPIDER_NET_RX_CSUM_DEFAULT 1
 
-#define SPIDER_NET_WATCHDOG_TIMEOUT50*HZ
-#define SPIDER_NET_NAPI_WEIGHT 64
+#define SPIDER_NET_WATCHDOG_TIMEOUT3*HZ
+
+/* We really really want to empty the ring buffer every time,
+ * so as to avoid the RX ram full bug. So set the napi weight
+ * to the ring size.
+ */
+#define SPIDER_NET_NAPI_WEIGHT SPIDER_NET_RX_DESCRIPTORS_DEFAULT
 
 #define SPIDER_NET_FIRMWARE_SEQS   6
 #define SPIDER_NET_FIRMWARE_SEQWORDS   1024
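
To illustrate the effect of the weight change in the patch above, here is a minimal user-space sketch of a budgeted poll loop. The names (demo_ring, demo_poll) and the ring size of 256 are assumptions for illustration only and are not taken from the driver:

```c
#include <assert.h>

#define DEMO_RING_SIZE 256   /* assumed ring size, for illustration only */

struct demo_ring {
	int pending;             /* filled descriptors waiting to be processed */
};

/* One poll pass: process at most `weight` packets and
 * return how many were actually processed. */
static int demo_poll(struct demo_ring *ring, int weight)
{
	int done = 0;

	assert(weight > 0);
	while (done < weight && ring->pending > 0) {
		ring->pending--;
		done++;
	}
	return done;
}
```

With a weight of 64, one pass over a completely full ring leaves 192 descriptors still occupied; with the weight equal to the ring size, a single pass can empty it.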


[PATCH 9/10] spidernet: service TX later.

2007-05-22 Thread Linas Vepstas

When entering the netdev poll routine, empty out the RX
chain first, before cleaning up the TX chain. This should
help avoid RX buffer overflows.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:39.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:41.0 -0500
@@ -1212,7 +1212,6 @@ spider_net_poll(struct net_device *netde
int packets_to_do, packets_done = 0;
int no_more_packets = 0;
 
-   spider_net_cleanup_tx_ring(card);
packets_to_do = min(*budget, netdev->quota);
 
while (packets_to_do) {
@@ -1231,6 +1230,8 @@ spider_net_poll(struct net_device *netde
spider_net_refill_rx_chain(card);
spider_net_enable_rxdmac(card);
 
+   spider_net_cleanup_tx_ring(card);
+
/* if all packets are in the stack, enable interrupts and return 0 */
/* if not, return 1 */
if (no_more_packets) {


[PATCH 8/10] spidernet: reset the card when an rxramfull is seen

2007-05-22 Thread Linas Vepstas

Some versions of the spider have a firmware bug, where the
RX ring sequencer goes crazy when the RX RAM on the device
fills up. Apparently the only viable workaround is a soft
reset of the card.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:37.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:39.0 -0500
@@ -1506,11 +1506,17 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFBFLLINT: /* fallthrough */
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
-   if (netif_msg_intr(card) && net_ratelimit())
-   pr_err("%s: Spider RX RAM full, incoming packets "
-  "might be discarded!\n", card->netdev->name);
+   if (netif_msg_intr(card) && net_ratelimit()) {
+   pr_err("%s: Spider RX RAM full, reseting device.\n",
+  card->netdev->name);
+   show_rx_chain(card);
+   }
spider_net_rx_irq_off(card);
netif_rx_schedule(card->netdev);
+
+   /* If the card is spewing rxramfulls, then reset */
+   atomic_inc(&card->tx_timeout_task_counter);
+   schedule_work(&card->tx_timeout_task);
show_error = 0;
break;
 
@@ -2087,6 +2093,8 @@ spider_net_workaround_rxramfull(struct s
 {
int i, sequencer = 0;
 
+   printk(KERN_INFO "%s: calling rxramfull workaround\n", card->netdev->name);
+
/* cancel reset */
spider_net_write_reg(card, SPIDER_NET_CKRCTRL,
 SPIDER_NET_CKRCTRL_RUN_VALUE);


[PATCH 7/10] spidernet: enhance the dump routine

2007-05-22 Thread Linas Vepstas

Crazy device problems are hard to debug, when one does not have
good trace info. This patch makes a major enhancement to the
device dump routine.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   62 ---
 1 file changed, 54 insertions(+), 8 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:35.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:37.0 -0500
@@ -1024,34 +1024,78 @@ spider_net_pass_skb_up(struct spider_net
netif_receive_skb(skb);
 }
 
-#ifdef DEBUG
 static void show_rx_chain(struct spider_net_card *card)
 {
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *start= chain->tail;
struct spider_net_descr *descr= start;
+   struct spider_net_hw_descr *hwd = start->hwdescr;
+   char *iface = card->netdev->name;
+   u32 curr_desc, next_desc;
int status;
 
int cnt = 0;
-   int cstat = spider_net_get_descr_status(descr);
-   printk(KERN_INFO "RX chain tail at descr=%ld\n",
-(start - card->descr) - card->tx_chain.num_desc);
+   int off = 0;
+   int cstat = hwd->dmac_cmd_status;
+
+   printk(KERN_INFO "%s: Total number of descrs=%d\n",
+   iface, chain->num_desc);
+   printk(KERN_INFO "%s: Chain tail located at descr=%d\n",
+   iface, (int) (start - chain->ring));
+
+   curr_desc = spider_net_read_reg(card, SPIDER_NET_GDACTDPA);
+   next_desc = spider_net_read_reg(card, SPIDER_NET_GDACNEXTDA);
+
status = cstat;
do
{
-   status = spider_net_get_descr_status(descr);
+   hwd = descr->hwdescr;
+   off = descr - chain->ring;
+   if (descr==chain->head)
+   printk(KERN_INFO "%s: chain head is at %d\n", iface, off);
+   if (curr_desc == descr->bus_addr)
+   printk(KERN_INFO "%s: hw curr desc is at %d\n", iface, off);
+   if (next_desc == descr->bus_addr)
+   printk(KERN_INFO "%s: hw next desc is at %d\n", iface, off);
+   if (hwd->next_descr_addr == 0)
+   printk(KERN_INFO "%s: chain is cut at %d\n", iface, off);
+   status = hwd->dmac_cmd_status;
if (cstat != status) {
-   printk(KERN_INFO "Have %d descrs with stat=x%08x\n", cnt, cstat);
+   printk(KERN_INFO "%s: Have %d descrs with stat=x%08x\n",
+   iface, cnt, cstat);
cstat = status;
cnt = 0;
}
cnt ++;
descr = descr->next;
} while (descr != start);
-   printk(KERN_INFO "Last %d descrs with stat=x%08x\n", cnt, cstat);
-}
+   printk(KERN_INFO "%s: Last %d descrs with stat=x%08x\n",
+   iface, cnt, cstat);
+
+#ifdef DEBUG
+   /* Now dump the whole ring */
+   descr = start;
+   do
+   {
+   struct spider_net_hw_descr *hwd = descr->hwdescr;
+   status = spider_net_get_descr_status(hwd);
+   cnt = descr - chain->ring;
+   printk(KERN_INFO "Descr %d stat=0x%08x skb=%p\n",
+   cnt, status, descr->skb);
+   printk(KERN_INFO "bus addr=%08x buf addr=%08x sz=%d\n",
+   descr->bus_addr, hwd->buf_addr, hwd->buf_size);
+   printk(KERN_INFO "next=%08x result sz=%d valid sz=%d\n",
+   hwd->next_descr_addr, hwd->result_size, hwd->valid_size);
+   printk(KERN_INFO "dmac=%08x data stat=%08x data err=%08x\n",
+   hwd->dmac_cmd_status, hwd->data_status, hwd->data_error);
+   printk(KERN_INFO "\n");
+
+   descr = descr->next;
+   } while (descr != start);
 #endif
 
+}
+
 /**
  * spider_net_decode_one_descr - processes an RX descriptor
  * @card: card structure
@@ -1141,6 +1185,8 @@ spider_net_decode_one_descr(struct spide
return 1;
 
 bad_desc:
+   if (netif_msg_rx_err(card))
+   show_rx_chain(card);
dev_kfree_skb_irq(descr->skb);
descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;


[PATCH 6/10] spidernet: Don't terminate the RX ring

2007-05-22 Thread Linas Vepstas

There is no real reason to terminate the RX ring; it
doesn't make the operation any smoother, and it does
require an extra sync. So don't do it.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:34.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:35.0 -0500
@@ -462,13 +462,9 @@ spider_net_prepare_rx_descr(struct spide
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
hwdescr->buf_addr = buf;
-   hwdescr->next_descr_addr = 0;
wmb();
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_CARDOWNED |
 SPIDER_NET_DMAC_NOINTR_COMPLETE;
-
-   wmb();
-   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
}
 
return 0;
@@ -557,12 +553,16 @@ spider_net_refill_rx_chain(struct spider
 static int
 spider_net_alloc_rx_skbs(struct spider_net_card *card)
 {
-   int result;
-   struct spider_net_descr_chain *chain;
+   struct spider_net_descr_chain *chain = &card->rx_chain;
+   struct spider_net_descr *start= chain->tail;
+   struct spider_net_descr *descr = start;
 
-   result = -ENOMEM;
+   /* Link up the hardware chain pointers */
+   do {
+   descr->prev->hwdescr->next_descr_addr = descr->bus_addr;
+   descr = descr->next;
+   } while (descr != start);
 
-   chain = &card->rx_chain;
/* Put at least one buffer into the chain. if this fails,
 * we've got a problem. If not, spider_net_refill_rx_chain
 * will do the rest at the end of this function. */
@@ -579,7 +579,7 @@ spider_net_alloc_rx_skbs(struct spider_n
 
 error:
spider_net_free_rx_chain_contents(card);
-   return result;
+   return -ENOMEM;
 }
 
 /**


[PATCH 5/10] spidernet: null out skb pointer after its been used.

2007-05-22 Thread Linas Vepstas

If the ethernet interface is brought down while there is still
RX traffic in flight, the device shutdown routine can end up
trying to double-free an skb, leading to a crash in mm/slab.c
Avoid the double-free by nulling out the skb pointer.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |1 +
 1 file changed, 1 insertion(+)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:32.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:34.0 -0500
@@ -1136,6 +1136,7 @@ spider_net_decode_one_descr(struct spide
 
/* Ok, we've got a packet in descr */
spider_net_pass_skb_up(descr, card);
+   descr->skb = NULL;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
return 1;
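
The ownership rule behind this fix can be sketched in plain user-space C. This is an illustration only: demo_slot, demo_pass_up and demo_shutdown are hypothetical names, and malloc/free stand in for skb allocation and release. Once a buffer has been handed to a new owner, the ring forgets the pointer, so a later shutdown pass frees only what the ring still owns:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_slot {
	void *buf;               /* buffer owned by this ring slot, or NULL */
};

static int demo_free_count;  /* how many buffers the shutdown path freed */

/* RX path: hand the buffer to its new owner, then forget it. */
static void *demo_pass_up(struct demo_slot *slot)
{
	void *p = slot->buf;

	slot->buf = NULL;        /* the step the patch adds */
	return p;
}

/* Shutdown path: free whatever the ring still owns. */
static void demo_shutdown(struct demo_slot *slots, int n)
{
	for (int i = 0; i < n; i++) {
		if (slots[i].buf) {
			free(slots[i].buf);
			demo_free_count++;
		}
		slots[i].buf = NULL;
	}
}
```

Without the NULL assignment in demo_pass_up, the shutdown loop would free a buffer that the new owner may also free.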
 


[PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Linas Vepstas

Invalidate a pointer as it's pci_unmap'ed; this is a bit of 
paranoia to make sure hardware doesn't continue trying to 
DMA to it.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:30.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:32.0 -0500
@@ -1069,6 +1069,7 @@ spider_net_decode_one_descr(struct spide
struct spider_net_descr_chain *chain = &card->rx_chain;
struct spider_net_descr *descr = chain->tail;
struct spider_net_hw_descr *hwdescr = descr->hwdescr;
+   u32 hw_buf_addr;
int status;
 
status = spider_net_get_descr_status(hwdescr);
@@ -1082,7 +1083,9 @@ spider_net_decode_one_descr(struct spide
chain->tail = descr->next;
 
/* unmap descriptor */
-   pci_unmap_single(card->pdev, hwdescr->buf_addr,
+   hw_buf_addr = hwdescr->buf_addr;
+   hwdescr->buf_addr = 0x0;
+   pci_unmap_single(card->pdev, hw_buf_addr,
SPIDER_NET_MAX_FRAME, PCI_DMA_FROMDEVICE);
 
if ( (status == SPIDER_NET_DESCR_RESPONSE_ERROR) ||
@@ -1118,7 +1121,7 @@ spider_net_decode_one_descr(struct spide
pr_err("%s: bad status, cmd_status=x%08x\n",
   card->netdev->name,
   hwdescr->dmac_cmd_status);
-   pr_err("buf_addr=x%08x\n", hwdescr->buf_addr);
+   pr_err("buf_addr=x%08x\n", hw_buf_addr);
pr_err("buf_size=x%08x\n", hwdescr->buf_size);
pr_err("next_descr_addr=x%08x\n", hwdescr->next_descr_addr);
pr_err("result_size=x%08x\n", hwdescr->result_size);


Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Linas Vepstas
On Thu, May 17, 2007 at 09:32:56AM +1000, Michael Ellerman wrote:
> > +   hwdescr->buf_addr = 0x0;
> 
> If you're going to be paranoid, shouldn't you do something here to make
> sure the value's hit the device?

I thought the whole point of paranoia is that it's inexplicable.

Here's a delusional reply: I didn't see any point to it. 
1) a wmb would add overhead
2) the hardware is supposed to be looking at the status flag,
   anyway, and not misbehaving.
3) there is a wmb when the descr is actually refilled in such
   a way as to actually mean something to the hardware.

All that I really accomplished here is a minor trick to 
aid in debug printing when looking for something bad.
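
The ordering argument in points 2) and 3) can be sketched in user-space C11. This is only an analogy under stated assumptions: the release store stands in for the driver's wmb(), the constants are illustrative and not the real SPIDER_NET_DESCR_* values, and demo_descr, demo_retire, demo_refill and demo_device_fetch are made-up names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DESCR_NOT_IN_USE 0u   /* illustrative values, not the real */
#define DESCR_CARDOWNED  1u   /* SPIDER_NET_DESCR_* constants      */

struct demo_descr {
	uint32_t buf_addr;            /* plain field, read by the "device" */
	_Atomic uint32_t cmd_status;  /* ownership flag the device checks  */
};

/* Retire: clear the address with no barrier. The descriptor is not
 * card-owned at this point, so the status flag still guards it. */
static void demo_retire(struct demo_descr *d)
{
	d->buf_addr = 0;
}

/* Refill: write the new address, then hand ownership to the device.
 * The release store plays the role of wmb() followed by the status write. */
static void demo_refill(struct demo_descr *d, uint32_t new_buf)
{
	d->buf_addr = new_buf;
	atomic_store_explicit(&d->cmd_status, DESCR_CARDOWNED,
			      memory_order_release);
}

/* Device side: only trust buf_addr after seeing the ownership flag.
 * Returns 1 and fills *out if the descriptor is card-owned, else 0. */
static int demo_device_fetch(struct demo_descr *d, uint32_t *out)
{
	if (atomic_load_explicit(&d->cmd_status, memory_order_acquire)
	    != DESCR_CARDOWNED)
		return 0;
	*out = d->buf_addr;
	return 1;
}
```

A well-behaved consumer never looks at the zeroed address, which is why the barrier only matters on the refill path.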

--linas



[PATCH 3/10] spidernet: move a block of code around

2007-05-22 Thread Linas Vepstas

Put the enable and disable routines next to one another, 
as this makes verifying their symmetry that much easier.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:24.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:30.0 -0500
@@ -506,6 +506,20 @@ spider_net_enable_rxdmac(struct spider_n
 }
 
 /**
+ * spider_net_disable_rxdmac - disables the receive DMA controller
+ * @card: card structure
+ *
+ * spider_net_disable_rxdmac terminates processing on the DMA controller
+ * by turing off the DMA controller, with the force-end flag set.
+ */
+static inline void
+spider_net_disable_rxdmac(struct spider_net_card *card)
+{
+   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
+SPIDER_NET_DMA_RX_FEND_VALUE);
+}
+
+/**
  * spider_net_refill_rx_chain - refills descriptors/skbs in the rx chains
  * @card: card structure
  *
@@ -657,20 +671,6 @@ write_hash:
 }
 
 /**
- * spider_net_disable_rxdmac - disables the receive DMA controller
- * @card: card structure
- *
- * spider_net_disable_rxdmac terminates processing on the DMA controller by
- * turing off DMA and issueing a force end
- */
-static void
-spider_net_disable_rxdmac(struct spider_net_card *card)
-{
-   spider_net_write_reg(card, SPIDER_NET_GDADMACCNTR,
-SPIDER_NET_DMA_RX_FEND_VALUE);
-}
-
-/**
  * spider_net_prepare_tx_descr - fill tx descriptor with skb data
  * @card: card structure
  * @descr: descriptor structure to fill out


[PATCH 2/10] spidernet: beautify error messages

2007-05-22 Thread Linas Vepstas

Make error messages print which interface they apply to.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |   10 ++
 drivers/net/spider_net.h |2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-22 18:03:16.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:24.0 -0500
@@ -434,7 +434,8 @@ spider_net_prepare_rx_descr(struct spide
  bufsize + SPIDER_NET_RXBUF_ALIGN - 1);
if (!descr->skb) {
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Not enough memory to allocate rx buffer\n");
+   pr_err("%s: Not enough memory to allocate rx buffer\n",
+   card->netdev->name);
card->spider_stats.alloc_rx_skb_error++;
return -ENOMEM;
}
@@ -455,7 +456,8 @@ spider_net_prepare_rx_descr(struct spide
dev_kfree_skb_any(descr->skb);
descr->skb = NULL;
if (netif_msg_rx_err(card) && net_ratelimit())
-   pr_err("Could not iommu-map rx buffer\n");
+   pr_err("%s: Could not iommu-map rx buffer\n",
+ card->netdev->name);
card->spider_stats.rx_iommu_map_error++;
hwdescr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
} else {
@@ -1455,8 +1457,8 @@ spider_net_handle_error_irq(struct spide
case SPIDER_NET_GRFAFLLINT: /* fallthrough */
case SPIDER_NET_GRMFLLINT:
if (netif_msg_intr(card) && net_ratelimit())
-   pr_err("Spider RX RAM full, incoming packets "
-  "might be discarded!\n");
+   pr_err("%s: Spider RX RAM full, incoming packets "
+  "might be discarded!\n", card->netdev->name);
spider_net_rx_irq_off(card);
netif_rx_schedule(card->netdev);
show_error = 0;
Index: netdev-2.6/drivers/net/spider_net.h
===
--- netdev-2.6.orig/drivers/net/spider_net.h  2007-05-21 17:40:49.0 -0500
+++ netdev-2.6/drivers/net/spider_net.h 2007-05-22 18:03:24.0 -0500
@@ -25,7 +25,7 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
-#define VERSION "2.0 A"
+#define VERSION "2.0 B"
 
 #include "sungem_phy.h"
 


[PATCH 1/10] spidernet: skb used after netif_receive_skb

2007-05-22 Thread Linas Vepstas
From: Florin Malita <[EMAIL PROTECTED]>

The stats update code in spider_net_pass_skb_up() is touching the skb 
after it's been passed up to the stack. To avoid that, just update the 
stats first.

Signed-off-by: Florin Malita <[EMAIL PROTECTED]>
Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>


 drivers/net/spider_net.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 108adbf..1df2f0b 100644
Index: netdev-2.6/drivers/net/spider_net.c
===
--- netdev-2.6.orig/drivers/net/spider_net.c  2007-05-21 17:40:49.0 -0500
+++ netdev-2.6/drivers/net/spider_net.c 2007-05-22 18:03:16.0 -0500
@@ -1014,12 +1014,12 @@ spider_net_pass_skb_up(struct spider_net
 */
}
 
-   /* pass skb up to stack */
-   netif_receive_skb(skb);
-
/* update netdevice statistics */
card->netdev_stats.rx_packets++;
card->netdev_stats.rx_bytes += skb->len;
+
+   /* pass skb up to stack */
+   netif_receive_skb(skb);
 }
 
 #ifdef DEBUG


[PATCH] s2io: don't run MSI handlers if device is offline.

2007-05-22 Thread Linas Vepstas

Don't run any of the MSI handlers if the channel is off;
also don't gather device statistics. Also, netif_wake_queue 
is not needed, per suggestions from
Sivakumar Subramani <[EMAIL PROTECTED]>.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Cc: Ramkrishna Vepa <[EMAIL PROTECTED]>
Cc: Sivakumar Subramani <[EMAIL PROTECTED]>
Cc: Sreenivasa Honnur <[EMAIL PROTECTED]>
Cc: Rastapur Santosh <[EMAIL PROTECTED]>
Cc: Wen Xiong <[EMAIL PROTECTED]>


diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index e46e164..871c37c 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -4202,6 +4202,9 @@ static irqreturn_t s2io_msi_handle(int i
struct mac_info *mac_control;
struct config_param *config;
 
+   if (pci_channel_offline(sp->pdev))
+   return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
mac_control = &sp->mac_control;
config = &sp->config;
@@ -4232,6 +4235,9 @@ static irqreturn_t s2io_msix_ring_handle
struct ring_info *ring = (struct ring_info *)dev_id;
struct s2io_nic *sp = ring->nic;
 
+   if (pci_channel_offline(sp->pdev))
+   return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
 
rx_intr_handler(ring);
@@ -4246,6 +4252,9 @@ static irqreturn_t s2io_msix_fifo_handle
struct fifo_info *fifo = (struct fifo_info *)dev_id;
struct s2io_nic *sp = fifo->nic;
 
+   if (pci_channel_offline(sp->pdev))
+   return IRQ_NONE;
+
atomic_inc(&sp->isr_cnt);
tx_intr_handler(fifo);
atomic_dec(&sp->isr_cnt);
@@ -4428,6 +4437,9 @@ static void s2io_updt_stats(struct s2io_
u64 val64;
int cnt = 0;
 
+   if (pci_channel_offline(sp->pdev))
+   return;
+
if (atomic_read(&sp->card_state) == CARD_UP) {
/* Apprx 30us on a 133 MHz bus */
val64 = SET_UPDT_CLICKS(10) |
@@ -8122,5 +8134,4 @@ static void s2io_io_resume(struct pci_de
}
 
netif_device_attach(netdev);
-   netif_wake_queue(netdev);
 }


Re: [PATCH 1/2 v4] s2io: add PCI error recovery support

2007-05-22 Thread Linas Vepstas
On Mon, May 21, 2007 at 06:51:45PM -0400, Jeff Garzik wrote:
> 
> >The part that confuses me is that I'd gotten a message from Jeff
> >back in March (well before 2.6.21 came out), saying it was in his
> >development tree; yet, the patch its not in 2.6.22-rc; Torvalds
> >hasn't yet pulled from it?
> 
> It only appeared in my tree on May 14.  I tend to drop patches that are 
> repeatedly revised, allowing the dust to settle.

OK, a new patch is coming. I did not want to pester you until after -rc1
came out, but perhaps that was the wrong strategy.

--linas



Re: [PATCH 1/2 v4] s2io: add PCI error recovery support

2007-05-21 Thread Linas Vepstas
On Mon, May 21, 2007 at 02:48:47PM -0700, Andrew Morton wrote:
> On Mon, 21 May 2007 13:58:53 -0500
> [EMAIL PROTECTED] (Linas Vepstas) wrote:
> > This patch adds PCI error recovery support to the 
> 
> This is already in Jeff's development tree.  Your new patch neither
> applies nor unapplies, so if you've changed it, Jeff is now sitting
> on an old version.  I assume he'd like an incremental update patch.

Ahh ! 

I assume I have to git-pull
  /pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
or something like that. Will try that now.

The part that confuses me is that I'd gotten a message from Jeff
back in March (well before 2.6.21 came out), saying it was in his
development tree; yet the patch is not in 2.6.22-rc. Has Torvalds
not yet pulled from it?

--linas

