Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-29 Thread JaniD++
Hello, list,

I also hit this issue from time to time (roughly daily), and I can help track
it down if somebody sends me a debug patch! ;-)

Cheers,
Janos

----- Original Message -----
From: "Jesse Brandeburg" <[EMAIL PROTECTED]>
To: "Arnaldo Carvalho de Melo" <[EMAIL PROTECTED]>
Cc: "Phil Oester" <[EMAIL PROTECTED]>; "David S. Miller"
<[EMAIL PROTECTED]>; ; "Jesse Brandeburg"
<[EMAIL PROTECTED]>
Sent: Monday, March 27, 2006 10:39 PM
Subject: Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed


> On 3/27/06, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> > On 3/27/06, Phil Oester <[EMAIL PROTECTED]> wrote:
> > > David S. Miller wrote:
> > > > E1000 has some TSO bug most likely, try reproducing with
> > > > "ethtool -K eth0 tso off", replacing "eth0" with your actual
> > > > e1000 interface name(s).
> > > >
> > > That does seem to have solved the problem.
> > >
> > > > Suggested next steps?  Should I leave TSO disabled, or pester the e1000
> > > > maintainers?
> >
> > Both, probably 8)
>
> I'm here, and pesterable. I don't understand the connection between
> the assertions mentioned and TSO.  Can someone explain to me the
> possible relation of e1000 (and maybe TSO) and sk->sk_forward_alloc?
> I looked through the code and it appears that sk_forward_alloc tracks
> memory allocations (truesize) for sockets.  It's not immediately clear
> whether it's transmit or receive or both.
>
> I know I can't ask you to solve our bug, but I'm looking for some clues about a
> 1) reproducible bug
> 2) area of the driver I might look at (transmit or receive - TSO
> implies transmit, but we don't mess with truesize in transmit)
>
> The reports of this seem awful intermittent and on the surface it
> seems like a stack bug.  I need some help connecting the dots.
>
> Thanks,
>   Jesse
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-27 Thread David S. Miller
From: "Jesse Brandeburg" <[EMAIL PROTECTED]>
Date: Mon, 27 Mar 2006 12:39:43 -0800

> The reports of this seem awful intermittent and on the surface it
> seems like a stack bug.  I need some help connecting the dots.

That assertion would trigger if the driver caused a double-free of the
SKB, or somehow otherwise modified the SKB from its original form at
transmit time.

For example, if you were to move the page offset and length pointers
around (to work around a hw bug or similar), it could break the
accounting if not done correctly.
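
To make the accounting point concrete, here is a toy Python model (not kernel code). It assumes a simplified scheme, based on a reading of the 2.6-era stream code rather than anything stated in this thread: memory is charged to the socket in whole quanta, each buffer subtracts its truesize when queued and adds it back when freed, and at teardown whole quanta are handed back, so the remainder should be zero. ToySocket, QUANTUM and leftover_after are made-up names for illustration only.

```python
QUANTUM = 4096  # assumed charge unit (one page); an illustration, not a kernel constant


class ToySocket:
    """Toy stand-in for a socket's forward-allocation counter (not kernel code)."""

    def __init__(self):
        self.forward_alloc = 0   # bytes pre-charged but not used by any queued buffer
        self.quanta_charged = 0  # whole quanta taken from the (imaginary) global pool

    def charge(self, truesize):
        """Account a buffer of `truesize` bytes to the socket."""
        while self.forward_alloc < truesize:
            self.forward_alloc += QUANTUM
            self.quanta_charged += 1
        self.forward_alloc -= truesize

    def uncharge(self, truesize):
        """Give a buffer's bytes back when it is freed."""
        self.forward_alloc += truesize

    def reclaim(self):
        """Return whole quanta to the pool; the leftover is expected to be zero."""
        whole = self.forward_alloc // QUANTUM
        self.quanta_charged -= whole
        self.forward_alloc -= whole * QUANTUM
        return self.forward_alloc


def leftover_after(charged_size, freed_size, frees=1):
    """Charge one buffer, free it `frees` times at `freed_size`, then reclaim."""
    sk = ToySocket()
    sk.charge(charged_size)
    for _ in range(frees):
        sk.uncharge(freed_size)
    return sk.reclaim()
```

In this model leftover_after(1500, 1500) is 0, while a free at a different size (leftover_after(1500, 1400)) or a second free (leftover_after(1500, 1500, frees=2)) leaves a non-zero remainder, which is the kind of condition the assertion reports.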

There was a recent case of a use-after-free in the transmit path, if
you remember, where -EFAULT was being returned from the
->hard_start_xmit() method of the e1000 driver instead of a proper
NETDEV_TX_* value, which causes the caller to think the packet had not
been freed, when in fact it was, so it would loop and try to resend
the freed SKB.
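
A toy Python sketch of that return-value contract (not driver code) may help. It assumes a simplified caller that retries whenever the transmit function returns anything other than NETDEV_TX_OK; Buffer, buggy_xmit, fixed_xmit and send are hypothetical names, and the real stack's requeue logic is more involved than this loop.

```python
import errno

NETDEV_TX_OK = 0    # driver consumed the buffer; caller must not touch it again
NETDEV_TX_BUSY = 1  # driver did not take the buffer; caller still owns it


class Buffer:
    """Toy packet buffer that complains when touched after being freed."""

    def __init__(self):
        self.freed = False

    def free(self):
        if self.freed:
            raise RuntimeError("buffer used after free")
        self.freed = True


def buggy_xmit(buf):
    """Hypothetical driver path: frees the buffer, then reports an errno."""
    buf.free()
    return -errno.EFAULT


def fixed_xmit(buf):
    """Same path, but reports that the buffer was consumed."""
    buf.free()
    return NETDEV_TX_OK


def send(xmit, buf, max_tries=2):
    """Hypothetical caller: any result other than NETDEV_TX_OK means 'retry'."""
    for attempt in range(1, max_tries + 1):
        if xmit(buf) == NETDEV_TX_OK:
            return attempt
    return None
```

With fixed_xmit the caller stops after the first attempt. With buggy_xmit the caller sees a non-OK result, retries with the already-freed buffer, and the model raises RuntimeError on the second attempt.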

There may be other kinds of bugs lurking in the e1000 transmit path.


Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-27 Thread Jesse Brandeburg
On 3/27/06, Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> wrote:
> On 3/27/06, Phil Oester <[EMAIL PROTECTED]> wrote:
> > David S. Miller wrote:
> > > E1000 has some TSO bug most likely, try reproducing with
> > > "ethtool -K eth0 tso off", replacing "eth0" with your actual
> > > e1000 interface name(s).
> > >
> > That does seem to have solved the problem.
> >
> > Suggested next steps?  Should I leave TSO disabled, or pester the e1000
> > maintainers?
>
> Both, probably 8)

I'm here, and pesterable. I don't understand the connection between
the assertions mentioned and TSO.  Can someone explain to me the
possible relation of e1000 (and maybe TSO) and sk->sk_forward_alloc? 
I looked through the code and it appears that sk_forward_alloc tracks
memory allocations (truesize) for sockets.  It's not immediately clear
whether it's transmit or receive or both.

I know I can't ask you to solve our bug, but I'm looking for some clues about a
1) reproducible bug
2) area of the driver I might look at (transmit or receive - TSO
implies transmit, but we don't mess with truesize in transmit)

The reports of this seem awful intermittent and on the surface it
seems like a stack bug.  I need some help connecting the dots.

Thanks,
  Jesse


Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-27 Thread Arnaldo Carvalho de Melo
On 3/27/06, Phil Oester <[EMAIL PROTECTED]> wrote:
> David S. Miller wrote:
> > E1000 has some TSO bug most likely, try reproducing with
> > "ethtool -K eth0 tso off", replacing "eth0" with your actual
> > e1000 interface name(s).
> >
> That does seem to have solved the problem.
>
> Suggested next steps?  Should I leave TSO disabled, or pester the e1000
> maintainers?

Both, probably 8)

- Arnaldo


Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-27 Thread Phil Oester

David S. Miller wrote:
> E1000 has some TSO bug most likely, try reproducing with
> "ethtool -K eth0 tso off", replacing "eth0" with your actual
> e1000 interface name(s).

That does seem to have solved the problem.

Suggested next steps?  Should I leave TSO disabled, or pester the e1000 
maintainers?


Phil


Re: 2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-24 Thread David S. Miller
From: Phil Oester <[EMAIL PROTECTED]>
Date: Fri, 24 Mar 2006 17:03:35 -0800

> Upgraded a number of heavily used squid boxes to 2.6.16 from 2.6.13
> this week, and noticed these errors in the logs:
> 
> KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
> KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
> 
> but then looked through the logs and noticed they were occurring
> even on 2.6.13.  I see some talk about the messages in the archives,
> but no resolution.  Some of the discussion centered around e1000, which these
> boxes do use.  I checked another squid box which uses e100, and it
> does not have these errors.
> 
> Thoughts?

E1000 has some TSO bug most likely, try reproducing with
"ethtool -K eth0 tso off", replacing "eth0" with your actual
e1000 interface name(s).
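
For boxes with several interfaces, a small Python helper along these lines could build the command for each e1000 port. It is only a sketch: it assumes each interface exposes a device/driver symlink under /sys/class/net whose target directory is named after the driver, and interfaces_using_driver, tso_off_command and disable_tso are made-up names. The ethtool invocation itself is the one quoted above.

```python
import os
import subprocess


def interfaces_using_driver(driver, sysfs_net="/sys/class/net"):
    """Return interface names whose device/driver symlink points at `driver`."""
    found = []
    for name in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, name, "device", "driver")
        # Virtual interfaces such as "lo" have no device/driver link and are skipped.
        if os.path.islink(link) and os.path.basename(os.path.realpath(link)) == driver:
            found.append(name)
    return found


def tso_off_command(iface):
    """Build the ethtool command that turns TSO off for one interface."""
    return ["ethtool", "-K", iface, "tso", "off"]


def disable_tso(driver="e1000", dry_run=True):
    """Return the commands for every matching interface; run them unless dry_run."""
    commands = [tso_off_command(i) for i in interfaces_using_driver(driver)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # requires root privileges
    return commands
```

Calling disable_tso() with the default dry_run=True only returns the command lists, so they can be reviewed before anything is changed.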

Thanks.


2.6.16: assertion (!sk->sk_forward_alloc) failed

2006-03-24 Thread Phil Oester
Upgraded a number of heavily used squid boxes to 2.6.16 from 2.6.13
this week, and noticed these errors in the logs:

KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)

but then looked through the logs and noticed they were occurring
even on 2.6.13.  I see some talk about the messages in the archives,
but no resolution.  Some of the discussion centered around e1000, which these
boxes do use.  I checked another squid box which uses e100, and it
does not have these errors.

Thoughts?

Phil