Using 2.6.21-rc1 (x86-64) I can get an oops in the forcedeth driver,
usually in under about 5s, with heavy network load (near line-rate GE;
simply using netcat and /dev/zero from one host to another suffices).

In nv_tx_done we have:

        if (flags & NV_TX_LASTPACKET) {
                if (flags & NV_TX_ERROR) {
                        if (flags & NV_TX_UNDERFLOW)
                                np->stats.tx_fifo_errors++;
                        if (flags & NV_TX_CARRIERLOST)
                                np->stats.tx_carrier_errors++;
                        np->stats.tx_errors++;
                } else {
                        np->stats.tx_packets++;
                        np->stats.tx_bytes += np->get_tx_ctx->skb->len;
                }
                dev_kfree_skb_any(np->get_tx_ctx->skb);
                np->get_tx_ctx->skb = NULL;
        }

Now, it seems that sometimes, for reasons I've not really looked into
as yet, np->get_tx_ctx->skb is NULL, so things go kaput (cr2 ends
up being 0x88, which I assume is the offset of len in skb).
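
To illustrate why a fault address of 0x88 points at ->len: reading a
member through a NULL pointer faults at address 0 plus the member's
offset. A minimal sketch, assuming a made-up struct whose padding is
chosen so that len lands at 0x88 (mock_skb and len_fault_address are
hypothetical names; the real struct sk_buff layout depends on kernel
version and config):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff. The padding is picked only
 * so that len sits at offset 0x88, matching the cr2 value reported
 * above; it is NOT the real sk_buff layout. */
struct mock_skb {
	unsigned char pad[0x88];	/* placeholder for the fields before len */
	unsigned int len;
};

/* Address the CPU would try to load for base->len. With base == 0
 * (a NULL skb) this is the value that shows up in cr2. */
static uintptr_t len_fault_address(uintptr_t base)
{
	return base + offsetof(struct mock_skb, len);
}
```

On a real build, checking offsetof(struct sk_buff, len) against the
kernel's own headers (or with a debugger on vmlinux) would confirm or
refute the guess.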

Now, if I do something along the lines of:

diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a363148..59027aa 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -1918,7 +1918,12 @@ static void nv_tx_done(struct net_device *dev)
                                        np->stats.tx_errors++;
                                } else {
                                        np->stats.tx_packets++;
-                                       np->stats.tx_bytes += np->get_tx_ctx->skb->len;
+                                       /* XXX for some reason under heavy load,
+                                          np->get_tx_ctx->skb can be null */
+                                       if (likely(np->get_tx_ctx->skb))
+                                               np->stats.tx_bytes += np->get_tx_ctx->skb->len;
+                                       else
+                                               printk(KERN_ERR "XXX saw null skb\n");
                                }
                                dev_kfree_skb_any(np->get_tx_ctx->skb);
                                np->get_tx_ctx->skb = NULL;

the problem goes away completely: I can do hours of traffic, 100s of
GBs, where it would break in a few seconds before.  However, I never
see the printk actually print anything, so I'm a bit mystified.  I
disassembled the code in the original case and it seems perfectly
sane.
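
For reference, the guarded accounting from the patch can be modelled
in user space roughly as below. This is only a sketch under stated
assumptions: the mock_* types and account_tx_done are hypothetical
stand-ins, not the forcedeth structures. It differs from the patch in
two ways: a counter replaces the printk, and the skb pointer is read
once into a local so the NULL test and the ->len access see the same
value.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the forcedeth or kernel types. */
struct mock_skb { unsigned int len; };
struct mock_tx_ctx { struct mock_skb *skb; };
struct mock_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long null_skb_seen;	/* counter in place of the printk */
};

/* Account for one successfully completed packet. ctx->skb is read
 * exactly once, so the check and the use cannot disagree. */
static void account_tx_done(struct mock_stats *stats, struct mock_tx_ctx *ctx)
{
	struct mock_skb *skb = ctx->skb;

	stats->tx_packets++;
	if (skb)
		stats->tx_bytes += skb->len;
	else
		stats->null_skb_seen++;
	ctx->skb = NULL;	/* the driver would also free the skb here */
}
```

A counter like null_skb_seen can be inspected after the run, which
would help separate "the NULL case never happens with this change"
from "the message is not reaching the log".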

Can anyone explain why I see ->skb == NULL and why the above change
seems to make that go away?  (Or perhaps why the printk isn't
working).
