Fw: Assert: CPU #..., mangle/filter comefrom(....) = ...

2006-02-08 Thread Andrew Morton


Begin forwarded message:

Date: Thu, 09 Feb 2006 06:10:49 +0100
From: Knut Petersen <[EMAIL PROTECTED]>
To: linux-kernel@vger.kernel.org
Subject: Assert: CPU #..., mangle/filter comefrom() = ...


As there probably is a reason to printk assertions:

[   28.616455] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   28.617112] ASSERT: CPU #0, mangle comefrom(f75ae890) = 1
[   28.617115] ASSERT: CPU #0, filter comefrom(f79ca090) = 2
[   28.934256] ASSERT: CPU #0, mangle comefrom(f7051890) = 1
[   28.934260] ASSERT: CPU #0, filter comefrom(f7aa9090) = 2
[   32.766440] ASSERT: CPU #0, mangle comefrom(f709c090) = 1
[   32.766443] ASSERT: CPU #0, filter comefrom(f71f3090) = 2

cu,
 Knut
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-02-08 Thread Jesse Brandeburg
On 2/8/06, David S. Miller <[EMAIL PROTECTED]> wrote:
> From: Jesse Brandeburg <[EMAIL PROTECTED]>
> Date: Wed, 8 Feb 2006 12:07:14 -0800
>
> > this should be on netdev (cc'd), i included some of the thread here.
>  ...
> > I though Herbert had fixed these, and it looks like half the patches
> > got into 2.6.14.3, but not the fix to the fix committed on 9-6 (not in
> > 2.6.14.* at all)
>
> What are the changeset IDs so I can fix this?

I think the commit id that is missing from 2.6.14.X is
fb5f5e6e0cebd574be737334671d1aa8f170d5f3

but here is the web link if i gave the wrong info
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fb5f5e6e0cebd574be737334671d1aa8f170d5f3
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] check connect(2) status for IPv6 UDP socket

2006-02-08 Thread Herbert Xu
David S. Miller <[EMAIL PROTECTED]> wrote:
> From: Nicolas DICHTEL <[EMAIL PROTECTED]>
> Date: Mon, 06 Feb 2006 12:00:30 +0100
> 
>> in the same way of this patch, why dst_entry are stored for
>> RAW socket ? In case of specific IPSec rules for ICMPv6,
>> xfrm state can be different for the same destination.
>> Attached, a proposed patch.
> 
> We cache the flow we used to store that dst into the socket,
> and we'll verify that on the next sendmsg() call so it's OK.
> 
> See the checks done in ip6_dst_lookup() when we have a cached
> route attached to the socket.

I think he's saying that the checks in ip6_dst_lookup is not enough
for IPsec because it only checks the destination address and oif
instead of all the addresses/protocol/ports.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] recent commit breaks multi-descriptor receives with ip fragments

2006-02-08 Thread Jesse Brandeburg


On Tue, 7 Feb 2006, Herbert Xu wrote:

I share your feelings towards this patch.  However, what e1000 is doing
is broken.  It should be filling in the frags array, not frag_list.


Okay, this implies all of our data will be in pages.  Does this mean that 
we have to copy the header back to skb->data?  eth_type_trans requires 
that the header be at skb->data, and kmap'ing the header creates the 
non-trivial problem of having to call kunmap at some point.


My experience with alloc_page/put_page in the "packet split" e1000 code is
that it is slower to call alloc_page/put_page (they show up in top 
10 oprofile) at certain packet rates, and will negatively affect our

performance as compared to using the slab allocator.  Do you think so too?

All we want to do is have a single packet represented by several 
descriptors indicated up the stack.


9K jumbos:
2k
2k
2k
2k
1k

If you think it would be benefical, I think we and others could use some 
more structure in the stack to support this.


As another option, for now we can back out the patch that made memory 
utilization so much more efficient by using small descriptors all the 
time, and go back to our monolithic descriptor usage.


Jesse
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
On Thu, 2006-09-02 at 03:40 +0300, Alexey Kuznetsov wrote:
> Hello!
> 
> > BTW, Alexey - if you have a chance can you look at the breakage of
> > sendmsg() in relation to multicast that exists today?
> 
> If it is is about groups > 31, which cannot be mapped to nl_groups,
> it is possible just to add setsockopt(), setting dst_group. It is just
> to complete the API.
> 

Thats one way to do it (and may infact resolve things)- but not exactly
the way it is at the moment:
netlink_sendmsg() has changed such that even if libnetlink user call 
specified more than one target group, in the kernel it will only got to
the first one; you should really take a look at the whole of
netlink_sendmsg() but here is a sample snippet:

---
if (msg->msg_namelen) {
if (addr->nl_family != AF_NETLINK)
return -EINVAL;
dst_pid = addr->nl_pid;
dst_group = ffs(addr->nl_groups);
if (dst_group && !netlink_capable(sock,
NL_NONROOT_SEND))
return -EPERM;
} else {
dst_pid = nlk->dst_pid;
dst_group = nlk->dst_group;
}

-

which used to be:
-
 if (msg->msg_namelen) {
 if (addr->nl_family != AF_NETLINK)
 return -EINVAL;
 dst_pid = addr->nl_pid;
 dst_groups = addr->nl_groups;
 if (dst_groups && !netlink_capable(sock, NL_NONROOT_SEND))
 return -EPERM;
 } else {
 dst_pid = nlk->dst_pid;
 dst_groups = nlk->dst_groups;
 }
---

note: there is no "groups" anymore.

> Do something use this? I do not think so.
> 

I am not aware of anyone using it but desktop people want > 32 mcast
groups.

> 
> > (libnetlink assumes its existence). I can explain more if what i said
> > didnt make sense.
> 
> Please...
> 

Basically I have an app that sent to multi groups by using one of the
calls in libnetlink [rtnl_talk()] with say groups = 0x3; which
translates to 0x1 and 0x2 in the kernel and such a message will be
delivered to two listeners one subscribed to 0x1 and another to 0x2. -
Now it will only get delivered to the socket listening on 0x1.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] drivers/net/wireless: correct reported ssid lengths

2006-02-08 Thread John W. Linville
On Mon, Jan 30, 2006 at 02:28:43PM -0800, Jean Tourrilhes wrote:
> On Mon, Jan 30, 2006 at 01:28:44PM -0500, Dan Williams wrote:
> > 
> > I'll post a revert patch for the patch I originally sent so that we go
> > back to the original behavior.
> 
>   Sorry, I may have overreacted. I think that's a worthwhile
> change, let's just plan it properly.

Is this resolved?  Does Dan still need to send a patch?

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread David S. Miller
From: jamal <[EMAIL PROTECTED]>
Date: Wed, 08 Feb 2006 19:07:25 -0500

> On Thu, 2006-09-02 at 01:46 +0300, Alexey Kuznetsov wrote:
> > Hello!
> > 
> > > What about the dilemma of when there are no netlink sockets
> > > involved? ;-> 
> > > i.e what is the semantics when there is no netlink socket to map them
> > > to, such as in the case of ioctl?
> > 
> > 0, which is legal address of kernel socket.
> > 
> > In this case it means that "kernel" did the operation.
> 
> Ok, that settles it then - so I have no qualms with the patch.
> All along i was looking at the nl_pid as a representation of
> "who ordered the operation" not "who did the operation"
> I actually have found such knowledge useful in user space.
> It is a good thing Hasso is writing the manpages because he
> has good grasp of the whole thing.

I'll put in the patch either tonight or tomorrow.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Alexey Kuznetsov
Hello!

> BTW, Alexey - if you have a chance can you look at the breakage of
> sendmsg() in relation to multicast that exists today?

If it is is about groups > 31, which cannot be mapped to nl_groups,
it is possible just to add setsockopt(), setting dst_group. It is just
to complete the API.

Do something use this? I do not think so.


> (libnetlink assumes its existence). I can explain more if what i said
> didnt make sense.

Please...

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] recent commit breaks multi-descriptor receives with ip fragments

2006-02-08 Thread David S. Miller
From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 16:30:04 -0800 (Pacific Standard Time)

> Basically it appears that code has been there for a long time (since 2.4) 
> and both e1000 and other drivers use it.  If the change to 
> skb_copy_datagram_iovec is reverted, this method of chaining skbs 
> continues to work.  I don't see any clear reason why it shouldn't work.

I think regardless of your valid points you should fix e1000
to chain page/offset/len pairs into a single SKB because using
a frag list of SKBs is very wasteful and is going to break
TCP socket buffer accounting.

I think there was even a thread about this side effect e1000
has on TCP recently.

Please fix e1000, then I'll revert the skb_copy_datagram_iovec()
change being discussed.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] recent commit breaks multi-descriptor receives with ip fragments

2006-02-08 Thread Jesse Brandeburg

On Tue, 7 Feb 2006, David S. Miller wrote:

From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Tue, 7 Feb 2006 17:41:28 -0800 (Pacific Standard Time)

> so we generally call dev_alloc_skb to get the receive buffers to give to
> our hardware.  When we use multiple receive buffers what is the right way
> to allocate memory to give buffers to the hardware and then later, to
> "chain" the descriptors together to make the packet?  Using skb's is the
> common way as far as I've understood it.
>
> Your input on the correct way to do these things is greatly appreciated.

Allocate a single SKB and fill in the skb_shared_info() page/offset/len
pairs, making sure to take proper references to the pages you add.
Coalesce when possible.


Given that there is code in the kernel in
net/ipv4/ip_fragment.c: ip_frag_reasm

/* If the first fragment is fragmented itself, we split
 * it to two chunks: the first with data and paged part
 * and the second, holding only fragments. */

checks for a frag_list and in response, *creates* the frag 
list inside a frag list, when does that code get used?


Basically it appears that code has been there for a long time (since 2.4) 
and both e1000 and other drivers use it.  If the change to 
skb_copy_datagram_iovec is reverted, this method of chaining skbs 
continues to work.  I don't see any clear reason why it shouldn't work.


Jesse
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
On Thu, 2006-09-02 at 01:46 +0300, Alexey Kuznetsov wrote:
> Hello!
> 
> > What about the dilemma of when there are no netlink sockets
> > involved? ;-> 
> > i.e what is the semantics when there is no netlink socket to map them
> > to, such as in the case of ioctl?
> 
> 0, which is legal address of kernel socket.
> 
> In this case it means that "kernel" did the operation.

Ok, that settles it then - so I have no qualms with the patch.
All along i was looking at the nl_pid as a representation of
"who ordered the operation" not "who did the operation"
I actually have found such knowledge useful in user space.
It is a good thing Hasso is writing the manpages because he
has good grasp of the whole thing.

Alexey - if you have time take a look at the sendmsg() thing i 
referenced earlier.

cheers,
jamal


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Herbert Xu
On Wed, Feb 08, 2006 at 03:58:59PM -0800, David S. Miller wrote:
> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 09 Feb 2006 10:49:37 +1100
> 
> > The difference between gcc -pedantic and sparse is that it doesn't
> > warn about obviously correct cases like p != 0 or p = 0.
> 
> So obviously correct that you left out an equals sign in the
> second case :-)

Actually I intended that to be an assignment as this is something that
has generated many sparse patches.  Now if this was a comparison
mistakenly written as an assignment, gcc would pick it up anyway:

a.c:3: warning: suggest parentheses around assignment used as truth value

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Randy.Dunlap
On Wed, 8 Feb 2006, David S. Miller wrote:

> From: Herbert Xu <[EMAIL PROTECTED]>
> Date: Thu, 09 Feb 2006 10:49:37 +1100
>
> > The difference between gcc -pedantic and sparse is that it doesn't
> > warn about obviously correct cases like p != 0 or p = 0.
>
> So obviously correct that you left out an equals sign in the
> second case :-)

You should thank him.  8;)

-- 
~Randy
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6 address autoconfiguration does not work after device down/up cycle

2006-02-08 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Wed, 08 Feb 2006 15:59:31 -0800 (PST)), 
"David S. Miller" <[EMAIL PROTECTED]> says:

> > Good spot.  Please drop {}, otherwise, I agree.  Thank you.
> 
> I'll take care of fixing that when I put in the fix.

Okay, thanks.

Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread David S. Miller
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Thu, 09 Feb 2006 10:49:37 +1100

> The difference between gcc -pedantic and sparse is that it doesn't
> warn about obviously correct cases like p != 0 or p = 0.

So obviously correct that you left out an equals sign in the
second case :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6 address autoconfiguration does not work after device down/up cycle

2006-02-08 Thread David S. Miller
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Thu, 09 Feb 2006 08:44:36 +0900 (JST)

> Hello.
> 
> In article <[EMAIL PROTECTED]> (at Wed, 08 Feb 2006 17:17:24 +0200), Kristian 
> Slavov <[EMAIL PROTECTED]> says:
> 
> > During NETDEV_DOWN we clear IF_READY, and we don't set it back in 
> > NETDEV_UP. While starting to perform DAD on the link-local address, we 
> > notice that the device is not in IF_READY, and we abort autoconfiguration 
> > process (which would eventually send router solicitations).
> 
> Good spot.  Please drop {}, otherwise, I agree.  Thank you.

I'll take care of fixing that when I put in the fix.
Thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Herbert Xu
Alexey Dobriyan <[EMAIL PROTECTED]> wrote:
> 
> Oh, and for the record: current sparse from Linus doesn't warn about
> this. Slightly modified sparse warns. Bugs which were uncovered by
> more or less trivial and slightly broken sparse patch [1] are:
> 
>[PATCH] dscc4: fix dscc4_init_dummy_skb check
>[PATCH] opl3sa2: fix adding second dma channel
>[PATCH] gusclassic: fix adding second dma channel
>[PATCH] ipw2200: fix ->eeprom[EEPROM_VERSION] check
>[PATCH] ixj: fix writing silence check

The first two can also be caught with gcc -pedantic.  In fact, catching
the last two should be doable there as well.  The difference between
gcc -pedantic and sparse is that it doesn't warn about obviously correct
cases like p != 0 or p = 0.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6 address autoconfiguration does not work after device down/up cycle

2006-02-08 Thread YOSHIFUJI Hideaki / 吉藤英明
Hello.

In article <[EMAIL PROTECTED]> (at Wed, 08 Feb 2006 17:17:24 +0200), Kristian 
Slavov <[EMAIL PROTECTED]> says:

> During NETDEV_DOWN we clear IF_READY, and we don't set it back in 
> NETDEV_UP. While starting to perform DAD on the link-local address, we 
> notice that the device is not in IF_READY, and we abort autoconfiguration 
> process (which would eventually send router solicitations).

Good spot.  Please drop {}, otherwise, I agree.  Thank you.

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Paul Mackerras
Alexey Dobriyan writes:

> The fact that they can be represented by the same bit patterns is
> irrelevant.

Indeed it is.  The fact that the C standard says that "0" is a valid
representation for a null pointer in C source code *is* relevant,
though.  That is in fact something that *wasn't* in K&R C but was
introduced in ISO Standard C, and I think it makes sense.

Anyway, this whole discussion is silly.  Can we move on to discussing
whether "disc" or "disk" is the correct spelling, or something else
equally productive? :)

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Andi Kleen
On Thursday 09 February 2006 00:07, David Stevens wrote:
> From: Stephen Hemminger <[EMAIL PROTECTED]>
> 
> > IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
> > It gets rid of the need to refcount as much but keeps the safety from
> > buggy protocols.  Ipv6 could probably use ifindex as well.
> 
> A couple years ago, we identified a performance problem in
> IPv6 because it had the index and not the dev in a per-packet piece
> of code. The result was spending a lot of time searching the device
> list just to increment a (required) per-device MIB counter.
> I don't know if that's still there or not, but of course
> those sorts of implications should be considered, too.

It's using a hash these days.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Herbert Xu
On Wed, Feb 08, 2006 at 02:47:12PM -0800, David S. Miller wrote:
> From: David Stevens <[EMAIL PROTECTED]>
> Date: Wed, 8 Feb 2006 14:45:08 -0800
> 
> > Why would sparse complain about this? 0 is a well-defined
> > pointer value (the only value guaranteed to be by the language).
> 
> Because sparse goes beyond the standards and tries to
> catch cases that usually end up being bugs.

Well to be frank the only significant NULL/0 bug that I know can't
be caught by converting 0 to NULL anyway.  The bug only exists on
architectures where the representations of null-pointers of types
are different.  On those architectures, if you have something like

int func(char *fmt, ...);

...

func("foo", NULL);

Then this may break if func actually expected a int * while NULL
could be something different altogether.  The only safe way to
write this is to have

func("foo", (int *)NULL);

or

func("foo", (int *)0);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
jamal wrote:
> a) if user-app using netlink modifies something, the event will report
> the nl_pid mapped to its pid or other non-zero value depending on
> the number of netlink sockets mapped to that process/user-app

It will always be nl_pid of the socket. And don't forget that this pid and 
negative numbers stuff is valid only if application doesn't set nl_pid. 
It's completely legal for application to do that before calling bind() 
and if there's no conflict (no already open netlink socket with this 
nl_pid), everything is OK.


-- 
Hasso Tepper
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread David Stevens
From: Stephen Hemminger <[EMAIL PROTECTED]>

> IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
> It gets rid of the need to refcount as much but keeps the safety from
> buggy protocols.  Ipv6 could probably use ifindex as well.

A couple years ago, we identified a performance problem in
IPv6 because it had the index and not the dev in a per-packet piece
of code. The result was spending a lot of time searching the device
list just to increment a (required) per-device MIB counter.
I don't know if that's still there or not, but of course
those sorts of implications should be considered, too.

+-DLS

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Alexey Dobriyan
On Wed, Feb 08, 2006 at 02:45:08PM -0800, David Stevens wrote:
> [EMAIL PROTECTED] wrote on 02/08/2006 02:19:20 PM:
> 
> > James Carlson <[EMAIL PROTECTED]> wrote:
> > > Alexey Dobriyan writes:
> > >> - if (ap == 0)
> > >> + if (!ap)
> > >
> > > And the solution is to treat it as a boolean instead?!  I'm not sure
> > > which is more ugly.
> >
> > Treating it as a boolean looks good to me.  It's better than the
> existing
> > code because it shuts sparse up.
>
> Why would sparse complain about this? 0 is a well-defined
> pointer value (the only value guaranteed to be by the language).
> From K&R ANSI C edition, section 5.4:
>
> "Pointers and integers are not interchangeable. Zero is the sole
> exception: the constant zero may be assigned to a pointer, and a pointer
> may be compared with the constant zero."

O is an integer, NULL is a pointer. You compare integers with integers
and pointers with pointers. You don't compare integers with pointers
because they are fundamentally different objects. The former counts
things, the latter points to them.

The fact that they can be represented by the same bit patterns is
irrelevant. The fact that K&R, C89, C99 make

p = malloc();
if (p == NULL)

legal C is irrelevant.

Because

p = malloc();
if (!p)
err;

is idiomatic C, I've used this form.


Oh, and for the record: current sparse from Linus doesn't warn about
this. Slightly modified sparse warns. Bugs which were uncovered by
more or less trivial and slightly broken sparse patch [1] are:

[PATCH] dscc4: fix dscc4_init_dummy_skb check
[PATCH] opl3sa2: fix adding second dma channel
[PATCH] gusclassic: fix adding second dma channel
[PATCH] ipw2200: fix ->eeprom[EEPROM_VERSION] check
[PATCH] ixj: fix writing silence check

ppp-NULL.patch is the biggest part of
187 files changed, 494 insertions(+), 499 deletions(-)
so I've flushed it first.

[1]:

--- a/evaluate.c
+++ b/evaluate.c
@@ -934,6 +934,10 @@ static struct symbol *evaluate_compare(s
/* Pointer types? */
if (is_ptr_type(ltype) || is_ptr_type(rtype)) {
// FIXME! Check the types for compatibility
+   if (is_ptr_type(ltype) && !is_ptr_type(rtype))
+   bad_expr_type(expr);
+   if (!is_ptr_type(ltype) && is_ptr_type(rtype))
+   bad_expr_type(expr);
expr->op = modify_for_unsigned(expr->op);
goto OK;
}



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Rick Jones

David S. Miller wrote:

From: David Stevens <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 14:45:08 -0800



   Why would sparse complain about this? 0 is a well-defined
pointer value (the only value guaranteed to be by the language).



Because sparse goes beyond the standards and tries to
catch cases that usually end up being bugs.


So, how might it tell when someone is mistakenly using a pointer as a boolean if 
it takes a pass on if(foo) or if(!foo)  instead of looking for if(foo == NULL) 
or if(foo != NULL)?(or I suppose if (NULL == foo) and if (NULL != foo))


it would seem that the compars with NULL, when read by pseudorandom folks would 
be more likely to reinforce that foo is a pointer rather than a boolean/flag.


rick jones
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Paul Mackerras
David S. Miller writes:

> Because sparse goes beyond the standards and tries to
> catch cases that usually end up being bugs.

When has a pointer comparison with an explicit "0" ever caused a bug?

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Alexey Kuznetsov
Hello!

> What about the dilemma of when there are no netlink sockets
> involved? ;-> 
> i.e what is the semantics when there is no netlink socket to map them
> to, such as in the case of ioctl?

0, which is legal address of kernel socket.

In this case it means that "kernel" did the operation.
Moreover, it is legal (not good, but legal) to use 0 even when it was
an explicit responce to a netlink request. Kernel did the work, eventually.


> a) if user-app using netlink modifies something, the event will report
> the nl_pid mapped to its pid or other non-zero value depending on
> the number of netlink sockets mapped to that process/user-app

It is pid of netlink socket requesting the operation.
If this pid is negative, it will be negative number.

The idea was that someone who heard the message, could ask the author
for an additional information via netlink. It is not used, of course,
but however only real netlink addresses should be used, otherwise
you could decide that the change was made by an innocent party,
which bound to this number by some reason.


> b) if the same user-app used ioctl instead the reported nl_pid is 0
> in the event

Exactly.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread David S. Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 14:45:08 -0800

> Why would sparse complain about this? 0 is a well-defined
> pointer value (the only value guaranteed to be by the language).

Because sparse goes beyond the standards and tries to
catch cases that usually end up being bugs.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Paul Mackerras
James Carlson writes:

> Alexey Dobriyan writes:
> > -   if (ap == 0)
> > +   if (!ap)
> 
> And the solution is to treat it as a boolean instead?!  I'm not sure
> which is more ugly.
> 
> Why wouldn't explicit comparison against NULL be the preferred fix?

I just think this whole "you shouldn't compare a pointer to 0" thing
is completely silly, and patches like this have about the same level
of usefulness as patches that change "color" to "colour" or vice
versa.

However, the head penguin seems to have some bee in his bonnet about 0
not being a pointer value (despite what the C standard says), so
whatever.  *shrug*

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread David S. Miller
From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 23:20:38 +0100

> But are you sure the hash table used for that would be strong enough
> to handle the load?

With RCU locking I think it is.

> And what happens when a ifindex is reused? Then packets could end up
> on the wrong interface. That might be a security issue.

It is an interesting issue, but I don't think it's a real problem.

For example, if the interface is downed, the route gets invalidated
and that would prevent sending.  The packet would go to /dev/null
because of what we do in net/core/dst.c/dst_ifdown().
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread David Stevens
[EMAIL PROTECTED] wrote on 02/08/2006 02:19:20 PM:

> James Carlson <[EMAIL PROTECTED]> wrote:
> > Alexey Dobriyan writes:
> >> - if (ap == 0)
> >> + if (!ap)
> > 
> > And the solution is to treat it as a boolean instead?!  I'm not sure
> > which is more ugly.
> 
> Treating it as a boolean looks good to me.  It's better than the 
existing
> code because it shuts sparse up.

Why would sparse complain about this? 0 is a well-defined
pointer value (the only value guaranteed to be by the language).
From K&R ANSI C edition, section 5.4:

"Pointers and integers are not interchangeable. Zero is the sole
exception: the constant zero may be assigned to a pointer, and a pointer
may be compared with the constant zero."

+-DLS

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network delays, mysterious push packets

2006-02-08 Thread David S. Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 13:08:41 -0800

> If you want to find a patch that made things better/worse 
> (and it is post git). Look into "git bisect"

He's referencing bkbits URLs, I don't think he even knows
what GIT is :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] forcedeth: Add support for MSI/MSIX

2006-02-08 Thread Ayaz Abdulla
The ethernet controller is integrated into the chipset. Therefore, no 
mismatch between MSI/MSIX chipset support and device support should exist.


Ayaz


Roland Dreier wrote:

Roland> Is forcedeth ever used with a discrete ethernet
Roland> controller?  I thought that nforce NICs are always
Roland> intergrated in the chipset, and hence the entire scope for
Roland> MSI problems is within a single chip.  However, I don't
Roland> know if there are devices that report MSI capability but
Roland> where it doesn't work (that's a question for nvidia).

Just to be clear -- I meant nforce devices that report MSI capability
but where it doesn't work.

I know for a fact that there are many many devices in the world that
report an MSI capability but where it doesn't work ;)

 - R.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Roland Dreier
James> And the solution is to treat it as a boolean instead?!  I'm
James> not sure which is more ugly.

James> Why wouldn't explicit comparison against NULL be the
James> preferred fix?

"if (ptr)" and "if (!ptr)" are the preferred idiom for testing whether
a pointer is NULL.  What is gained by writing "if (ptr != NULL)" ?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-02-08 Thread David S. Miller
From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Wed, 08 Feb 2006 13:02:15 -0500

> Ravinandan Arakali wrote:
> > Hi,
> > Just wondering if anybody got a chance to review the below patch.
> > This version(as per Rick's comment on v1 patch) includes support
> > for TCP timestamps.
> 
> It's been merged in the 'lro' branch of netdev-2.6.git for a little 
> while now.  Once it gets additional review (and hopefully testing), I am 
> OK with it going upstream.

Now that the timestamp support is in there I'm fine with these
changes.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Herbert Xu
James Carlson <[EMAIL PROTECTED]> wrote:
> Alexey Dobriyan writes:
>> - if (ap == 0)
>> + if (!ap)
> 
> And the solution is to treat it as a boolean instead?!  I'm not sure
> which is more ugly.

Treating it as a boolean looks good to me.  It's better than the existing
code because it shuts sparse up.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Andi Kleen
On Wednesday 08 February 2006 20:12, Stephen Hemminger wrote:
> On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
> "David S. Miller" <[EMAIL PROTECTED]> wrote:
> 
> > From: Stephen Hemminger <[EMAIL PROTECTED]>
> > Date: Tue, 7 Feb 2006 16:19:42 -0800
> > 
> > > Also, isn't a lot of the problem reduced if network devices
> > > are affinitied?
> > 
> > Not for routing/firewalling, we touch the destination device's
> > counters on input softing of the source device.
> 
> IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
> It gets rid of the need to refcount as much but keeps the safety from
> buggy protocols.  Ipv6 could probably use ifindex as well.

But are you sure the hash table used for that would be strong enough to handle
the load?

And what happens when a ifindex is reused? Then packets could end up
on the wrong interface. That might be a security issue.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : No need to update last_rx in loopback driver

2006-02-08 Thread David S. Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Wed, 08 Feb 2006 16:06:31 +0100

> loopback driver carefully uses per_cpu storage for statistics but updates 
> loopback_dev.last_rx

This has been discussed before, this is an attribute every
driver must keep uptodate.  Things like bonding use it,
for example.

Yes, loopback isn't usable with bonding, but it's a bad
precedence to set that a useful metric is set for some
devices and not for others.

Find a cleaner way to fix this SMP cacheline pingpong,
thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
jamal wrote:
> So the question is what would be the "address"/nl_pid of something
> issued by an ioctl (refer to my earlier email to Alexey).

It's the kernel who creates this message and puts it to the netlink 
domain, so I'd say 0.


-- 
Hasso Tepper
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] check connect(2) status for IPv6 UDP socket

2006-02-08 Thread David S. Miller
From: Nicolas DICHTEL <[EMAIL PROTECTED]>
Date: Mon, 06 Feb 2006 12:00:30 +0100

> in the same way of this patch, why dst_entry are stored for
> RAW socket ? In case of specific IPSec rules for ICMPv6,
> xfrm state can be different for the same destination.
> Attached, a proposed patch.

We cache the flow we used to store that dst into the socket,
and we'll verify that on the next sendmsg() call so it's OK.

See the checks done in ip6_dst_lookup() when we have a cached
route attached to the socket.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-02-08 Thread David S. Miller
From: Jesse Brandeburg <[EMAIL PROTECTED]>
Date: Wed, 8 Feb 2006 12:07:14 -0800

> this should be on netdev (cc'd), i included some of the thread here.
 ...
> I though Herbert had fixed these, and it looks like half the patches
> got into 2.6.14.3, but not the fix to the fix committed on 9-6 (not in
> 2.6.14.* at all)

What are the changeset IDs so I can fix this?

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread James Carlson
Alexey Dobriyan writes:
> - if (ap == 0)
> + if (!ap)

And the solution is to treat it as a boolean instead?!  I'm not sure
which is more ugly.

Why wouldn't explicit comparison against NULL be the preferred fix?

-- 
James Carlson 42.703N 71.076W <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ppp: don't use 0 in pointer context

2006-02-08 Thread Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 ppp is the biggest offender.

 drivers/net/ppp_async.c   |   34 ++--
 drivers/net/ppp_generic.c |  128 +++---
 drivers/net/ppp_synctty.c |   26 -
 drivers/net/pppoe.c   |2
 4 files changed, 95 insertions(+), 95 deletions(-)

--- a/drivers/net/ppp_async.c
+++ b/drivers/net/ppp_async.c
@@ -159,7 +159,7 @@ ppp_asynctty_open(struct tty_struct *tty
 
err = -ENOMEM;
ap = kmalloc(sizeof(*ap), GFP_KERNEL);
-   if (ap == 0)
+   if (!ap)
goto out;
 
/* initialize the asyncppp structure */
@@ -215,7 +215,7 @@ ppp_asynctty_close(struct tty_struct *tt
ap = tty->disc_data;
tty->disc_data = NULL;
write_unlock_irq(&disc_data_lock);
-   if (ap == 0)
+   if (!ap)
return;
 
/*
@@ -230,10 +230,10 @@ ppp_asynctty_close(struct tty_struct *tt
tasklet_kill(&ap->tsk);
 
ppp_unregister_channel(&ap->chan);
-   if (ap->rpkt != 0)
+   if (ap->rpkt)
kfree_skb(ap->rpkt);
skb_queue_purge(&ap->rqueue);
-   if (ap->tpkt != 0)
+   if (ap->tpkt)
kfree_skb(ap->tpkt);
kfree(ap);
 }
@@ -285,13 +285,13 @@ ppp_asynctty_ioctl(struct tty_struct *tt
int err, val;
int __user *p = (int __user *)arg;
 
-   if (ap == 0)
+   if (!ap)
return -ENXIO;
err = -EFAULT;
switch (cmd) {
case PPPIOCGCHAN:
err = -ENXIO;
-   if (ap == 0)
+   if (!ap)
break;
err = -EFAULT;
if (put_user(ppp_channel_index(&ap->chan), p))
@@ -301,7 +301,7 @@ ppp_asynctty_ioctl(struct tty_struct *tt
 
case PPPIOCGUNIT:
err = -ENXIO;
-   if (ap == 0)
+   if (!ap)
break;
err = -EFAULT;
if (put_user(ppp_unit_number(&ap->chan), p))
@@ -354,7 +354,7 @@ ppp_asynctty_receive(struct tty_struct *
struct asyncppp *ap = ap_get(tty);
unsigned long flags;
 
-   if (ap == 0)
+   if (!ap)
return;
spin_lock_irqsave(&ap->recv_lock, flags);
ppp_async_input(ap, buf, cflags, count);
@@ -373,7 +373,7 @@ ppp_asynctty_wakeup(struct tty_struct *t
struct asyncppp *ap = ap_get(tty);
 
clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags);
-   if (ap == 0)
+   if (!ap)
return;
set_bit(XMIT_WAKEUP, &ap->xmit_flags);
tasklet_schedule(&ap->tsk);
@@ -688,7 +688,7 @@ ppp_async_push(struct asyncppp *ap)
tty_stuffed = 1;
continue;
}
-   if (ap->optr >= ap->olim && ap->tpkt != 0) {
+   if (ap->optr >= ap->olim && ap->tpkt) {
if (ppp_async_encode(ap)) {
/* finished processing ap->tpkt */
clear_bit(XMIT_FULL, &ap->xmit_flags);
@@ -708,7 +708,7 @@ ppp_async_push(struct asyncppp *ap)
clear_bit(XMIT_BUSY, &ap->xmit_flags);
/* any more work to do? if not, exit the loop */
if (!(test_bit(XMIT_WAKEUP, &ap->xmit_flags)
- || (!tty_stuffed && ap->tpkt != 0)))
+ || (!tty_stuffed && ap->tpkt)))
break;
/* more work to do, see if we can do it now */
if (test_and_set_bit(XMIT_BUSY, &ap->xmit_flags))
@@ -719,7 +719,7 @@ ppp_async_push(struct asyncppp *ap)
 
 flush:
clear_bit(XMIT_BUSY, &ap->xmit_flags);
-   if (ap->tpkt != 0) {
+   if (ap->tpkt) {
kfree_skb(ap->tpkt);
ap->tpkt = NULL;
clear_bit(XMIT_FULL, &ap->xmit_flags);
@@ -852,7 +852,7 @@ ppp_async_input(struct asyncppp *ap, con
s = 0;
for (i = 0; i < count; ++i) {
c = buf[i];
-   if (flags != 0 && flags[i] != 0)
+   if (flags && flags[i] != 0)
continue;
s |= (c & 0x80)? SC_RCV_B7_1: SC_RCV_B7_0;
c = ((c >> 4) ^ c) & 0xf;
@@ -869,7 +869,7 @@ ppp_async_input(struct asyncppp *ap, con
n = scan_ordinary(ap, buf, count);
 
f = 0;
-   if (flags != 0 && (ap->state & SC_TOSS) == 0) {
+   if (flags && (ap->state & SC_TOSS) == 0) {
/* check the flags to see if any char had an error */
for (j = 0; j < n; ++j)
if ((f = flags[j]) != 0)
@@ -882,9 +882,9 @@ ppp_async_input(struct asyncppp *ap, con
} else if (n > 0 && (ap->state & SC_TOSS) == 0) {
/* stuff the chars in the skb */
s

Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
On Wed, 2006-08-02 at 23:41 +0200, Hasso Tepper wrote:
> jamal wrote:

> > At the moment if a route (v6 or v4) was added by quagga and i had a
> > socket that was listening in a different process - what pid will i see
> > (in my user space app)? Is it of the quagga process or is it 0?
> 
> No, it's 0xefff for most of Quagga users. The reason behind is that 
> Quagga uses the first netlink socket it opens only for listening 
> multicast groups. For sending messages to the kernel it opens second 
> socket. Quagga lets kernel assign nl_pid's to the socket (by setting it 
> to 0 before calling bind()). The first socket gets nl_pid equal to 
> process pid of quagga (actually it's zebra daemon of quagga suite ;) and 
> the second gets (0 - 4097) -

Right - this is what i referred to as "the negative number". if you add
a third socket it will go below 0xefff etc
  
> net/netlink/af_netlink.c:netlink_autobind() for details.
> 
> As far as I understand it - the only connection between process pid and 
> nl_pid member in sockaddr_nl structure is that Alexey happened to use pid 
> of the process opening netlink socket as base of algorithm assigning 
> unique unicast address to this socket if application doesn't set it 
> itself (ie. it's 0 before calling bind()).

So the question is what would be the "address"/nl_pid of something
issued by an ioctl (refer to my earlier email to Alexey).

cheers,
jamal 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
jamal wrote:
> On Wed, 2006-08-02 at 22:50 +0200, Hasso Tepper wrote:
> > jamal wrote:
> > > Ok, thanks for the reminder Hasso.
> > > so essentially at the moment the pid that will show up (if
> > > quagga added the v6 route) will be that of quagga, correct?
> >
> > No. Quote from Alexey:
>
> I sense you missed my question. I understand the variants of PIDs that
> can occur(real, 0 or ones that need you to do a getname()), the
> question is:
> At the moment if a route (v6 or v4) was added by quagga and i had a
> socket that was listening in a different process - what pid will i see
> (in my user space app)? Is it of the quagga process or is it 0?

No, it's 0xefff for most of Quagga users. The reason behind is that 
Quagga uses the first netlink socket it opens only for listening 
multicast groups. For sending messages to the kernel it opens second 
socket. Quagga lets kernel assign nl_pid's to the socket (by setting it 
to 0 before calling bind()). The first socket gets nl_pid equal to 
process pid of quagga (actually it's zebra daemon of quagga suite ;) and 
the second gets (0 - 4097) - see  
net/netlink/af_netlink.c:netlink_autobind() for details.

As far as I understand it - the only connection between process pid and 
nl_pid member in sockaddr_nl structure is that Alexey happened to use pid 
of the process opening netlink socket as base of algorithm assigning 
unique unicast address to this socket if application doesn't set it 
itself (ie. it's 0 before calling bind()).


-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
On Thu, 2006-09-02 at 00:02 +0300, Alexey Kuznetsov wrote:

> I am afraid It is not logical and inconsistent and really breaks
> netlink. :-)
> 
> Those nl_pid's are _addresses_ of netlink sockets. You cannot fill them
> with random numbers.
> 

Ok, Alexey:
What about the dilemma of when there are no netlink sockets
involved? ;-> 
i.e what is the semantics when there is no netlink socket to map them
to, such as in the case of ioctl?  This is where the majority of those
changes occurred (as well as in the case of the patch you posted). 

by consistency i mean this:

a) if user-app using netlink modifies something, the event will report
the nl_pid mapped to its pid or other non-zero value depending on
the number of netlink sockets mapped to that process/user-app
b) if the same user-app used ioctl instead the reported nl_pid is 0
in the event


cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network delays, mysterious push packets

2006-02-08 Thread Stephen Hemminger
On Wed, 08 Feb 2006 12:18:23 -0800
David Carlton <[EMAIL PROTECTED]> wrote:

> On Mon, 06 Feb 2006 15:50:28 -0800 (PST), "David S. Miller" <[EMAIL 
> PROTECTED]> said:
> 
> > From: David Carlton <[EMAIL PROTECTED]>
> > Date: Mon, 06 Feb 2006 14:38:10 -0800
> 
> >> I'm working on an application that we're trying to switch from a 2.4
> >> kernel to a 2.6 kernel.  (I believe we're using 2.6.9.)  One part of
> >> the program periodically sends out chunks of data (whose size is just
> >> over 1MB) via tcp.
> 
> > Please reproduce with something more current and report to the correct
> > mailing list (netdev@vger.kernel.org).
> 
> Thanks for the suggestion.  The problem went away with 2.6.15 (I
> believe uname -a called it 2.6.15.3), which is nice.
> 
> I'll try to see if I can figure out which patch caused the problem (it
> may or may not be easy for us to switch to that kernel); I don't
> suppose you have any idea what patch caused the problem?  (The other
> endpoint is that the problem is still present in 2.6.10, where for the
> latter we're using whatever the latest 2.6.10 kernel is that's
> available for Fedora Core 2.)  Also, is
>  the right
> place to look for revision history?  Recent commits don't have a
> change message, which makes me think I'm looking in the wrong place.
> 
> I put up a tcpdump at
> , if anybody's
> curious.  The problems start with packet 7367; the problematic
> transmission is from 64.12.82.96:33826 to 64.12.82.49:34483 (which
> starts at packet 6482).  The tcpdump was taken on the sender side, and
> some packets are missing or corrupted in the dump (for reasons that I
> assume have nothing to do with the problem at hand.)
> 

I bet the new appropriate byte count (abc) logic is fixing your app.
You can get old behavior with sysctl.ipv4.tcp_abc=0.

If you want to find a patch that made things better/worse 
(and it is post git). Look into "git bisect"


-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
On Wed, 2006-08-02 at 22:50 +0200, Hasso Tepper wrote:
> jamal wrote:
> > Ok, thanks for the reminder Hasso.
> > so essentially at the moment the pid that will show up (if
> > quagga added the v6 route) will be that of quagga, correct?
> 
> No. Quote from Alexey:
> 

I sense you missed my question. I understand the variants of PIDs that
can occur(real, 0 or ones that need you to do a getname()), the question
is:
At the moment if a route (v6 or v4) was added by quagga and i had a
socket that was listening in a different process - what pid will i see
(in my user space app)? Is it of the quagga process or is it 0?

cheers,
jamal

PS:-I will read your man-page - I have it printed, just havent got
around to it yet.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Alexey Kuznetsov
Hello!

> Ok, thanks for the reminder Hasso.
> so essentially at the moment the pid that will show up (if
> quagga added the v6 route) will be that of quagga, correct?

In the part, where v6 routes added via netlink it was a long expected
thing.. I was lazy in many places to pass identity of author of change
through the layers. Use of 0 was enough good replacement for bummers,
because you never can distinguish the report about change
made by you from a report made by kernel in any case.

What's about use of current->pid, they must be replaced with 0s,
I am sorry I did not search for this bug in net/ipv6...


> Same with v4 iirc? If yes, then isnt it logical that if ifconfig added
> an ip address it should naturally be ifconfig's pid that shows up (for
> consistency)?

I am afraid It is not logical and inconsistent and really breaks
netlink. :-)

Those nl_pid's are _addresses_ of netlink sockets. You cannot fill them
with random numbers.


> [1] I understand the point put forward by Alexey that: had he called
> that field "port" instead of "pid" when he created netlink, things would
> have had different semantics - I am just talking about consistency here.

The semantics is clear. nl_pid is not a pid. :-)

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Which driver for SysKonnect SK-9Exxx PCIe NIC?

2006-02-08 Thread Stephen Hemminger
On Wed, 08 Feb 2006 12:20:22 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:

> I have a dual-port SysKonnect NIC.  In 'lspci' I see one entry
> (was expecting two..but maybe that is a separate issue).  The
> chipset appears to be SK-9Exx 10/100/1000Base-T adapter (rev 12).
> 
> skge does not seem to support this NIC (in 2.6.15).  Is there
> another driver that does?
> 
> Thanks,
> Ben
> 

sky2 from 2.6.16 


-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
jamal wrote:
> Ok, thanks for the reminder Hasso.
> so essentially at the moment the pid that will show up (if
> quagga added the v6 route) will be that of quagga, correct?

No. Quote from Alexey:

"Netlink "pid" has nothing to do with current->pid. I called it 
incorrectly, if it was named "port", the confusion would be avoided."

And quote from netlink.7 I wrote:

"nl_pid is the unicast address of netlink socket.  It's always 0 if the 
destination is in the kernel.  For a  userspace  process,  nl_pid  is 
usually  the  PID  of  the process owning the destination socket. 
However, nl_pid identifies a netlink socket, not a process.  If a process 
owns several netlink sockets, then nl_pid can only be equal to the 
process ID for at most one socket.

There are two ways to  assign  nl_pid to  a  netlink  socket.  If the 
application sets nl_pid before calling bind(2), then it is up to the 
application to make sure that nl_pid is unique.  If the application sets 
it to 0, the kernel takes care of assigning it.  The kernel assigns the 
process ID  to  the  first  netlink socket the process opens and assigns 
a unique nl_pid to every netlink socket that the process subsequently 
creates."

Once again link to the archive to the message I posted with new netlink.7 
no one bother to comment yet ;).

http://www.mail-archive.com/netdev%40vger.kernel.org/msg06166.html


-- 
Hasso Tepper
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal

Ok, thanks for the reminder Hasso.
so essentially at the moment the pid that will show up (if
quagga added the v6 route) will be that of quagga, correct?
Same with v4 iirc? If yes, then isnt it logical that if ifconfig added
an ip address it should naturally be ifconfig's pid that shows up (for
consistency)? Note we can add ip addresses via netlink with newer
kernels and the apps/user-space pid will show up[1].

cheers,
jamal

[1] I understand the point put forward by Alexey that: had he called
that field "port" instead of "pid" when he created netlink, things would
have had different semantics - I am just talking about consistency here.

On Wed, 2006-08-02 at 22:08 +0200, Hasso Tepper wrote:
> Hasso Tepper wrote:
> > No, that was different issue and isn't related with issue Alexey
> > poiting to. The issue I complained and you fixed it, Jamal, was that
> > IPv6 related netlink messages had always pid 0 even if they were issued
> > by application.
> 
> It was unclear, so I have to correct myself - "... by application via 
> netlink socket." That was the change:
> http://www.kernel.org/hg/linux-2.6/?cmd=changeset;node=061262a38daa1717e8343846bc9a8fd5712bd07a
> 
> > What Alexey pointing to is the change you did earlier - set pid in the
> > messages not related to netlink sockets - ie. changes initiated by user
> > using ioctls for example.
> 
> That was the change Alexey is talking about:
> http://www.kernel.org/hg/linux-2.6/?cmd=changeset;node=bd982ca8d4365a3c34e66d8d1be1df04466a19e8
> 
> 

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Which driver for SysKonnect SK-9Exxx PCIe NIC?

2006-02-08 Thread Ben Greear

I have a dual-port SysKonnect NIC.  In 'lspci' I see one entry
(was expecting two..but maybe that is a separate issue).  The
chipset appears to be SK-9Exx 10/100/1000Base-T adapter (rev 12).

skge does not seem to support this NIC (in 2.6.15).  Is there
another driver that does?

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network delays, mysterious push packets

2006-02-08 Thread David Carlton
On Mon, 06 Feb 2006 15:50:28 -0800 (PST), "David S. Miller" <[EMAIL PROTECTED]> 
said:

> From: David Carlton <[EMAIL PROTECTED]>
> Date: Mon, 06 Feb 2006 14:38:10 -0800

>> I'm working on an application that we're trying to switch from a 2.4
>> kernel to a 2.6 kernel.  (I believe we're using 2.6.9.)  One part of
>> the program periodically sends out chunks of data (whose size is just
>> over 1MB) via tcp.

> Please reproduce with something more current and report to the correct
> mailing list (netdev@vger.kernel.org).

Thanks for the suggestion.  The problem went away with 2.6.15 (I
believe uname -a called it 2.6.15.3), which is nice.

I'll try to see if I can figure out which patch caused the problem (it
may or may not be easy for us to switch to that kernel); I don't
suppose you have any idea what patch caused the problem?  (The other
endpoint is that the problem is still present in 2.6.10, where for the
latter we're using whatever the latest 2.6.10 kernel is that's
available for Fedora Core 2.)  Also, is
 the right
place to look for revision history?  Recent commits don't have a
change message, which makes me think I'm looking in the wrong place.

I put up a tcpdump at
, if anybody's
curious.  The problems start with packet 7367; the problematic
transmission is from 64.12.82.96:33826 to 64.12.82.49:34483 (which
starts at packet 6482).  The tcpdump was taken on the sender side, and
some packets are missing or corrupted in the dump (for reasons that I
assume have nothing to do with the problem at hand.)

David Carlton
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Ben Greear

Stephen Hemminger wrote:

On Wed, 08 Feb 2006 11:24:24 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:



Stephen Hemminger wrote:


On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
"David S. Miller" <[EMAIL PROTECTED]> wrote:




From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 7 Feb 2006 16:19:42 -0800




Also, isn't a lot of the problem reduced if network devices
are affinitied?


Not for routing/firewalling, we touch the destination device's
counters on input softing of the source device.



IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
It gets rid of the need to refcount as much but keeps the safety from
buggy protocols.  Ipv6 could probably use ifindex as well.


If we do this, can we keep a skb->dev pointer and assign it lazily
(sort of like we do with the timestamp?)  That way, we can hopefully
optimize to not bump the refcount in the hot path, but older protocols
can easily be made to work as they have been...



No, just fix the protocols.


Well imagine that the protocol *does* need the netdev.  If we have
to look it up once, might as well store the result in case the next
user of the skb also needs it.  It's a net gain of 4 bytes in
the skb to preserve the skb pointer..seems like a reasonable tradeoff
to me.

Eventually, might be able to 'fix' all the protocols..but surely that
won't happen over-night.


If there is any ifidx -> skb lookups in the hot path, that is liable
to kill us since we'll have to take a lock and grovel through
the netdevice hash table?


The hash table locking could be per chain. or lockless.


Even a per chain lock would be worse than the atomic inc/dec
of the refcount, wouldn't it?

Now, a lockless table..that would be nice.  That should be a win
regardless of any skb work.  Sending packets to a packet-socket
uses a dev_get for every packet, for instance...

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Arnaldo Carvalho de Melo
On 2/8/06, Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> On Wed, 08 Feb 2006 11:24:24 -0800
> Ben Greear <[EMAIL PROTECTED]> wrote:
>
> > Stephen Hemminger wrote:
> > > On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
> > > "David S. Miller" <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >>From: Stephen Hemminger <[EMAIL PROTECTED]>
> > >>Date: Tue, 7 Feb 2006 16:19:42 -0800
> > >>
> > >>
> > >>>Also, isn't a lot of the problem reduced if network devices
> > >>>are affinitied?
> > >>
> > >>Not for routing/firewalling, we touch the destination device's
> > >>counters on input softing of the source device.
> > >
> > >
> > > IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
> > > It gets rid of the need to refcount as much but keeps the safety from
> > > buggy protocols.  Ipv6 could probably use ifindex as well.
> >
> > If we do this, can we keep a skb->dev pointer and assign it lazily
> > (sort of like we do with the timestamp?)  That way, we can hopefully
> > optimize to not bump the refcount in the hot path, but older protocols
> > can easily be made to work as they have been...
>
> No, just fix the protocols.

Exactly, no point in keeping cruft, we still have SOCKOPS_WRAPPED, etc, it just
piles up.

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
Hasso Tepper wrote:
> No, that was different issue and isn't related with issue Alexey
> poiting to. The issue I complained and you fixed it, Jamal, was that
> IPv6 related netlink messages had always pid 0 even if they were issued
> by application.

It was unclear, so I have to correct myself - "... by application via 
netlink socket." That was the change:
http://www.kernel.org/hg/linux-2.6/?cmd=changeset;node=061262a38daa1717e8343846bc9a8fd5712bd07a

> What Alexey pointing to is the change you did earlier - set pid in the
> messages not related to netlink sockets - ie. changes initiated by user
> using ioctls for example.

That was the change Alexey is talking about:
http://www.kernel.org/hg/linux-2.6/?cmd=changeset;node=bd982ca8d4365a3c34e66d8d1be1df04466a19e8


-- 
Hasso Tepper
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KERNEL: assertion (!sk->sk_forward_alloc) failed

2006-02-08 Thread Jesse Brandeburg
this should be on netdev (cc'd), i included some of the thread here.

On 2/7/06, Yoseph Basri <[EMAIL PROTECTED]> wrote:
> Hello kernel maillist,
>
> I'm new member maillist.
>
> Currently, I receive the warning log from my kernel.
>
> Since update to
> Linux  2.6.14.3 #1 SMP Fri Nov 25 20:20:05 SGT 2005 i686 GNU/Linux from 2.4,
>
> I am getting the warning log:
>
> kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at
> net/core/stream.c (279)
> kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at
> net/ipv4/af_inet.c (148)
>
> Any information about this issue, and how to solve this problem ?
>
> Another server that i rolled back from 2.6.14 to 2.4 kernel and has no
> warning again. is this bug from 2.6 kernel?
>
> Thanks for your info.

On 2/7/06, Yoseph Basri <[EMAIL PROTECTED]> wrote:
> Hi Boris,
>
> I think so, here is from the dmeg info:
>
> Intel(R) PRO/1000 Network Driver - version 6.0.60-k2
> Copyright (c) 1999-2005 Intel Corporation.
> ACPI: PCI Interrupt :06:08.0[A] -> GSI 29 (level, low) -> IRQ 16
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> ACPI: PCI Interrupt :06:08.1[B] -> GSI 30 (level, low) -> IRQ 17
> e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
> e100: Intel(R) PRO/100 Network Driver, 3.4.14-k2-NAPI
> e100: Copyright(c) 1999-2005 Intel Corporation
>
> any info regarding this warning?
>
> YB
>
> On 2/7/06, Boris B. Zhmurov <[EMAIL PROTECTED]> wrote:
> > Hello, Yoseph Basri.
> >
> > On 07.02.2006 12:46 you said the following:
> >
> > > I am getting the warning log:
> > >
> > > kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at
> > > net/core/stream.c (279)
> > > kernel: KERNEL: assertion (!sk->sk_forward_alloc) failed at
> > > net/ipv4/af_inet.c (148)
> >
> >
> > Do you have Intel Pro 1000 network card? Same here...

whats the relevance of e1000?

I though Herbert had fixed these, and it looks like half the patches
got into 2.6.14.3, but not the fix to the fix committed on 9-6 (not in
2.6.14.* at all)

Jesse
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Stephen Hemminger
On Wed, 08 Feb 2006 11:24:24 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
> > "David S. Miller" <[EMAIL PROTECTED]> wrote:
> > 
> > 
> >>From: Stephen Hemminger <[EMAIL PROTECTED]>
> >>Date: Tue, 7 Feb 2006 16:19:42 -0800
> >>
> >>
> >>>Also, isn't a lot of the problem reduced if network devices
> >>>are affinitied?
> >>
> >>Not for routing/firewalling, we touch the destination device's
> >>counters on input softing of the source device.
> > 
> > 
> > IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
> > It gets rid of the need to refcount as much but keeps the safety from
> > buggy protocols.  Ipv6 could probably use ifindex as well.
> 
> If we do this, can we keep a skb->dev pointer and assign it lazily
> (sort of like we do with the timestamp?)  That way, we can hopefully
> optimize to not bump the refcount in the hot path, but older protocols
> can easily be made to work as they have been...

No, just fix the protocols.

> If there is any ifidx -> skb lookups in the hot path, that is liable
> to kill us since we'll have to take a lock and grovel through
> the netdevice hash table?

The hash table locking could be per chain. or lockless.


-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Ben Greear

Stephen Hemminger wrote:

On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
"David S. Miller" <[EMAIL PROTECTED]> wrote:



From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 7 Feb 2006 16:19:42 -0800



Also, isn't a lot of the problem reduced if network devices
are affinitied?


Not for routing/firewalling, we touch the destination device's
counters on input softing of the source device.



IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
It gets rid of the need to refcount as much but keeps the safety from
buggy protocols.  Ipv6 could probably use ifindex as well.


If we do this, can we keep a skb->dev pointer and assign it lazily
(sort of like we do with the timestamp?)  That way, we can hopefully
optimize to not bump the refcount in the hot path, but older protocols
can easily be made to work as they have been...

If there is any ifidx -> skb lookups in the hot path, that is liable
to kill us since we'll have to take a lock and grovel through
the netdevice hash table?

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6 address autoconfiguration does not work after device down/up cycle

2006-02-08 Thread Juha-Matti Tapio
On Wed, Feb 08, 2006 at 05:17:24PM +0200, Kristian Slavov wrote:
> During NETDEV_DOWN we clear IF_READY, and we don't set it back in 
> NETDEV_UP. While starting to perform DAD on the link-local address, we 
> notice that the device is not in IF_READY, and we abort autoconfiguration 
> process (which would eventually send router solicitations).
> The following patch seems to get the work done. Patch is against 2.6.15.3.

Ack. This fixes the problems with sungem that I reported a few days ago.
Thank you. 


signature.asc
Description: Digital signature


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Stephen Hemminger
On Tue, 07 Feb 2006 16:26:01 -0800 (PST)
"David S. Miller" <[EMAIL PROTECTED]> wrote:

> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Tue, 7 Feb 2006 16:19:42 -0800
> 
> > Also, isn't a lot of the problem reduced if network devices
> > are affinitied?
> 
> Not for routing/firewalling, we touch the destination device's
> counters on input softing of the source device.

IMHO converting skb->dev to skb->devindex and using ifindex sounds best.
It gets rid of the need to refcount as much but keeps the safety from
buggy protocols.  Ipv6 could probably use ifindex as well.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
OSDL http://developer.osdl.org/~shemminger
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-02-08 Thread Jeff Garzik

Ravinandan Arakali wrote:

Hi,
Just wondering if anybody got a chance to review the below patch.
This version(as per Rick's comment on v1 patch) includes support
for TCP timestamps.


It's been merged in the 'lro' branch of netdev-2.6.git for a little 
while now.  Once it gets additional review (and hopefully testing), I am 
OK with it going upstream.


Jeff



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Hasso Tepper
jamal wrote:
> On Wed, 2006-08-02 at 18:27 +0300, Alexey Kuznetsov wrote:
> > When a netlink message is not related to a netlink socket,
> > it is issued by kernel socket with pid 0. Netlink "pid" has nothing
> > to do with current->pid. I called it incorrectly, if it was named
> > "port", the confusion would be avoided.

This confusion was the main reason I rewrote rtnetlink.7 manpage. I 
received 0 comments till now, though ... does it mean it's so good? ;) 
Can you, Alexey, comment at least pid part?

http://www.mail-archive.com/netdev%40vger.kernel.org/msg06166.html

> > Jamal, please, review. Did you have reasons to do this?
>
> The reason was driven by some apps such as quagga/zebra which
> get confused when they see pid of 0 for things _they_ added.
> Essentially there was lack of consistency, at times the app that made
> the kernel change has its pid appear on the resulting netlink message
> and at others it was 0 or the large (negative) number when you had
> more than 1 socket within the same process.
>
> CCing Hasso Tepper and more details of the original fix are
> here:
> http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

No, that was different issue and isn't related with issue Alexey poiting 
to. The issue I complained and you fixed it, Jamal, was that IPv6 related 
netlink messages had always pid 0 even if they were issued by 
application.

What Alexey pointing to is the change you did earlier - set pid in the 
messages not related to netlink sockets - ie. changes initiated by user 
using ioctls for example.

-- 
Hasso Tepper
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO) feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs

2006-02-08 Thread Ravinandan Arakali
Hi,
Just wondering if anybody got a chance to review the below patch.
This version(as per Rick's comment on v1 patch) includes support
for TCP timestamps.

Thanks,
Ravi

-Original Message-
From: Ravinandan Arakali [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 25, 2006 11:53 AM
To: [EMAIL PROTECTED]; netdev@vger.kernel.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: [PATCH 2.6.16-rc1] S2io: Large Receive Offload (LRO)
feature(v2) for Neterion (s2io) 10GbE Xframe PCI-X and PCI-E NICs


Hi,
Below is a patch for the Large Receive Offload feature.
Please review and let us know your comments.

LRO algorithm was described in an OLS 2005 presentation, located at
ftp.s2io.com
user: linuxdocs
password: HALdocs

The same ftp site has Programming Manual for Xframe-I ASIC.
LRO feature is supported on Neterion Xframe-I, Xframe-II and
Xframe-Express 10GbE NICs.

Brief description:
The Large Receive Offload(LRO) feature is a stateless offload
that is complementary to TSO feature but on the receive path.
The idea is to combine and collapse(upto 64K maximum) in the
driver, in-sequence TCP packets belonging to the same session.
It is mainly designed to improve 1500 mtu receive performance,
since Jumbo frame performance is already close to 10GbE line
rate. Some performance numbers are attached below.

Implementation details:
1. Handle packet chains from multiple sessions(current default
MAX_LRO_SESSSIONS=32).
2. Examine each packet for eligiblity to aggregate. A packet is
considered eligible if it meets all the below criteria.
  a. It is a TCP/IP packet and L2 type is not LLC or SNAP.
  b. The packet has no checksum errors(L3 and L4).
  c. There are no IP options. The only TCP option supported is timestamps.
  d. Search and locate the LRO object corresponding to this
 socket and ensure packet is in TCP sequence.
  e. It's not a special packet(SYN, FIN, RST, URG, PSH etc. flags are not
set).
  f. TCP payload is non-zero(It's not a pure ACK).
  g. It's not an IP-fragmented packet.
3. If a packet is found eligible, the LRO object is updated with
   information such as next sequence number expected, current length
   of aggregated packet and so on. If not eligible or max packets
   reached, update IP and TCP headers of first packet in the chain
   and pass it up to stack.
4. The frag_list in skb structure is used to chain packets into one
   large packet.

Kernel changes required: None

Performance results:
Main focus of the initial testing was on 1500 mtu receiver, since this
is a bottleneck not covered by the existing stateless offloads.

There are couple disclaimers about the performance results below:
1. Your mileage will vary We initially concentrated on couple pci-x
2.0 platforms that are powerful enough to push 10 GbE NIC and do not
have bottlenecks other than cpu%;  testing on other platforms is still
in progress. On some lower end systems we are seeing lower gains.

2. Current LRO implementation is still (for the most part) software based,
and therefore performance potential of the feature is far from being
realized.
Full hw implementation of LRO is expected in the next version of Xframe
ASIC.

Performance delta(with MTU=1500) going from LRO disabled to enabled:
IBM 2-way Xeon (x366) : 3.5 to 7.1 Gbps
2-way Opteron : 4.5 to 6.1 Gbps

Signed-off-by: Ravinandan Arakali <[EMAIL PROTECTED]>
---

diff -urpN old/drivers/net/s2io.c new_ts/drivers/net/s2io.c
--- old/drivers/net/s2io.c  2006-01-19 04:31:05.0 -0800
+++ new_ts/drivers/net/s2io.c   2006-01-24 08:56:25.0 -0800
@@ -57,6 +57,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 

 #include 
 #include 
@@ -66,7 +69,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"

-#define DRV_VERSION "Version 2.0.9.4"
+#define DRV_VERSION "2.0.11.2"

 /* S2io Driver name & version. */
 static char s2io_driver_name[] = "Neterion";
@@ -168,6 +171,11 @@ static char ethtool_stats_keys[][ETH_GST
{"\n DRIVER STATISTICS"},
{"single_bit_ecc_errs"},
{"double_bit_ecc_errs"},
+   ("lro_aggregated_pkts"),
+   ("lro_flush_both_count"),
+   ("lro_out_of_sequence_pkts"),
+   ("lro_flush_due_to_max_pkts"),
+   ("lro_avg_aggr_pkts"),
 };

 #define S2IO_STAT_LEN sizeof(ethtool_stats_keys)/ ETH_GSTRING_LEN
@@ -317,6 +325,12 @@ static unsigned int indicate_max_pkts;
 static unsigned int rxsync_frequency = 3;
 /* Interrupt type. Values can be 0(INTA), 1(MSI), 2(MSI_X) */
 static unsigned int intr_type = 0;
+/* Large receive offload feature */
+static unsigned int lro = 0;
+/* Max pkts to be aggregated by LRO at one time. If not specified,
+ * aggregation happens until we hit max IP pkt size(64K)
+ */
+static unsigned int lro_max_pkts = 0x;

 /*
  * S2IO device table.
@@ -1476,6 +1490,19 @@ static int init_nic(struct s2io_nic *nic
writel((u32) (val64 >> 32), (add + 4));
val64 = readq(&bar0->mac_cfg);

+  

Re: [PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread jamal
Hi Alexey,

On Wed, 2006-08-02 at 18:27 +0300, Alexey Kuznetsov wrote:
> Hello!
> 
> When a netlink message is not related to a netlink socket,
> it is issued by kernel socket with pid 0. Netlink "pid" has nothing
> to do with current->pid. I called it incorrectly, if it was named "port",
> the confusion would be avoided.
> 
> Jamal, please, review. Did you have reasons to do this?
> 

The reason was driven by some apps such as quagga/zebra which
get confused when they see pid of 0 for things _they_ added.
Essentially there was lack of consistency, at times the app that made
the kernel change has its pid appear on the resulting netlink message
and at others it was 0 or the large (negative) number when you had
more than 1 socket within the same process.
 
CCing Hasso Tepper and more details of the original fix are
here:
http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

BTW, Alexey - if you have a chance can you look at the breakage of
sendmsg() in relation to multicast that exists today? It came in as a
result of trying to extend the number of multicast groups to be > 32
(libnetlink assumes its existence). I can explain more if what i said
didnt make sense.

cheers,
jamal

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] illegal use of pid in rtnetlink

2006-02-08 Thread Alexey Kuznetsov
Hello!

When a netlink message is not related to a netlink socket,
it is issued by kernel socket with pid 0. Netlink "pid" has nothing
to do with current->pid. I called it incorrectly, if it was named "port",
the confusion would be avoided.

Jamal, please, review. Did you have reasons to do this?

Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8700379..eca2976 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -455,7 +455,7 @@ void rtmsg_ifinfo(int type, struct net_d
if (!skb)
return;
 
-   if (rtnetlink_fill_ifinfo(skb, dev, type, current->pid, 0, change, 0) < 
0) {
+   if (rtnetlink_fill_ifinfo(skb, dev, type, 0, 0, change, 0) < 0) {
kfree_skb(skb);
return;
}
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 95b9d81..3ffa60d 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1135,7 +1135,7 @@ static void rtmsg_ifa(int event, struct 
 
if (!skb)
netlink_set_err(rtnl, 0, RTNLGRP_IPV4_IFADDR, ENOBUFS);
-   else if (inet_fill_ifaddr(skb, ifa, current->pid, 0, event, 0) < 0) {
+   else if (inet_fill_ifaddr(skb, ifa, 0, 0, event, 0) < 0) {
kfree_skb(skb);
netlink_set_err(rtnl, 0, RTNLGRP_IPV4_IFADDR, EINVAL);
} else {
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index ef4724d..0f4145b 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1045,7 +1045,7 @@ fib_convert_rtentry(int cmd, struct nlms
}
 
nl->nlmsg_flags = NLM_F_REQUEST;
-   nl->nlmsg_pid = current->pid;
+   nl->nlmsg_pid = 0;
nl->nlmsg_seq = 0;
nl->nlmsg_len = NLMSG_LENGTH(sizeof(*rtm));
if (cmd == SIOCDELRT) {
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IPv6 address autoconfiguration does not work after device down/up cycle

2006-02-08 Thread Kristian Slavov

Hi,

If you set network interface down and up again, the IPv6 address 
autoconfiguration does not work. 'ip addr' shows that the link-local 
address is in tentative state. We don't even react to periodical router 
advertisements.


During NETDEV_DOWN we clear IF_READY, and we don't set it back in 
NETDEV_UP. While starting to perform DAD on the link-local address, we 
notice that the device is not in IF_READY, and we abort autoconfiguration 
process (which would eventually send router solicitations).


The following patch seems to get the work done. Patch is against 2.6.15.3.

BR,
--
Krisu
diff -Naur orig/net/ipv6/addrconf.c new/net/ipv6/addrconf.c
--- orig/net/ipv6/addrconf.c2006-02-08 17:05:20.0 +0200
+++ new/net/ipv6/addrconf.c 2006-02-08 17:07:15.0 +0200
@@ -2164,6 +2164,10 @@
dev->name);
break;
}
+
+   if (idev) {
+   idev->if_flags |= IF_READY;
+   }
} else {
if (!netif_carrier_ok(dev)) {
/* device is still not ready. */


[PATCH] NET : No need to update last_rx in loopback driver

2006-02-08 Thread Eric Dumazet
loopback driver carefully uses per_cpu storage for statistics but updates 
loopback_dev.last_rx


As last_rx is only used by bond driver to detect link activity, and loopback 
dev wont be used as a bond slave, can we avoid this write access that slowdown 
SMP platforms, because of cache ping pongs ?


If this patch is not correct, I suspect we should update trans_start in 
loopback driver since the condition in the bond driver is bidirectional :


if (((jiffies - slave->dev->trans_start) <= delta_in_ticks) &&
((jiffies - slave->dev->last_rx) <= delta_in_ticks)) {

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
--- a/drivers/net/loopback.c2006-02-07 12:10:55.0 +0100
+++ b/drivers/net/loopback.c2006-02-08 16:49:08.0 +0100
@@ -147,7 +147,6 @@
return 0;
}
 #endif
-   dev->last_rx = jiffies;
 
lb_stats = &per_cpu(loopback_stats, get_cpu());
lb_stats->rx_bytes += skb->len;


[PATCH] fix a severe bug in netlink

2006-02-08 Thread Alexey Kuznetsov
Hello!

netlink overrun was broken while improvement of netlink.
Destination socket is used in the place where it was meant to be source socket,
so that now overrun is never sent to user netlink sockets, when it should be,
and it even can be set on kernel socket, which results in complete deadlock
of rtnetlink.

Suggested fix is to restore status quo passing source socket as additional
argument to netlink_attachskb().

A little explanation: overrun is set on a socket, when it failed
to receive some message and sender of this messages does not or even
have no way to handle this error. This happens in two cases:
1. when kernel sends something. Kernel never retransmits and cannot
   wait for buffer space. 
2. when user sends a broadcast and the message was not delivered
   to some recipients. 


Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 6a2ccf7..c256ebe 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -160,7 +160,8 @@ extern int netlink_unregister_notifier(s
 
 /* finegrained unicast helpers: */
 struct sock *netlink_getsockbyfilp(struct file *filp);
-int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock, long 
timeo);
+int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
+   long timeo, struct sock *ssk);
 void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
 int netlink_sendskb(struct sock *sk, struct sk_buff *skb, int protocol);
 
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 59302fc..fd2e26b 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1018,7 +1018,8 @@ retry:
goto out;
}
 
-   ret = netlink_attachskb(sock, nc, 0, 
MAX_SCHEDULE_TIMEOUT);
+   ret = netlink_attachskb(sock, nc, 0,
+   MAX_SCHEDULE_TIMEOUT, NULL);
if (ret == 1)
goto retry;
if (ret) {
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2101b45..6b9772d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -702,7 +702,8 @@ struct sock *netlink_getsockbyfilp(struc
  * 0: continue
  * 1: repeat lookup - reference dropped while waiting for socket memory.
  */
-int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock, long 
timeo)
+int netlink_attachskb(struct sock *sk, struct sk_buff *skb, int nonblock,
+   long timeo, struct sock *ssk)
 {
struct netlink_sock *nlk;
 
@@ -712,7 +713,7 @@ int netlink_attachskb(struct sock *sk, s
test_bit(0, &nlk->state)) {
DECLARE_WAITQUEUE(wait, current);
if (!timeo) {
-   if (!nlk->pid)
+   if (!ssk || nlk_sk(ssk)->pid == 0)
netlink_overrun(sk);
sock_put(sk);
kfree_skb(skb);
@@ -797,7 +798,7 @@ retry:
kfree_skb(skb);
return PTR_ERR(sk);
}
-   err = netlink_attachskb(sk, skb, nonblock, timeo);
+   err = netlink_attachskb(sk, skb, nonblock, timeo, ssk);
if (err == 1)
goto retry;
if (err)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPVS] Shrink ip_vs_*.c includes

2006-02-08 Thread Arnaldo Carvalho de Melo
On 2/8/06, Roberto Nibali <[EMAIL PROTECTED]> wrote:

> I've thought about the linux/modules.h header (ignoring the fact
> mentioned by Herbert because of its small chance of happening) and my
> suggestion was/is to move it to ip_vs.h. I'll ping Arnaldo regarding
> this cleanup, since he's done it before and certainly has some scripts
> to automate this.

Well, I use a graphviz based tool to see the "picture", i.e. the
header hierarchy,
like this one:

http://oops.ghostprotocols.net:81/acme/tcp.h.ps

The tools are called hviz and ghviz and are available at:

http://www.kernel.org/pub/linux/kernel/people/acme/

I've been thinking about using sparse to do a include "linker", i.e. a tool
that would find all the symbols "provided" by .h files and the symbols
"required" by .h and .c files, that would tell what headers are needed
and what headers currently being #included aren't needed and should
be dropped, but have other priorities in my TODO list right now :-\

- Arnaldo
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Acx100-devel] [PATCH] acx: make firmware statistics parsing more intelligent

2006-02-08 Thread Andreas Mohr
Hi,

On Sat, Feb 04, 2006 at 01:31:16PM +0200, Denis Vlasenko wrote:
> On Friday 03 February 2006 12:58, Andreas Mohr wrote:
> > - use "% 8" modulo instead of more complicated "% 5" calculation
> 
> Why? There will be HZ=100 and HZ=250 boxes, 8ms doesn't fit well.
> 10ms is ok, 5ms or 2.5ms is more or less ok too, but 8ms?
> 
> But I'll apply this anyway.

Your objection is right, I think, since trading in a more "normal" scheduling
slice length for a *slight* gain in code size doesn't make too much sense.
So maybe still use "% 5"?

Sorry for the wasted discussion...

Andreas Mohr
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mv643xx_eth: remove repeated includes of linux/in.h and linux/ip.h

2006-02-08 Thread Dale Farnsworth
From: Dale Farnsworth <[EMAIL PROTECTED]>

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

---

These includes were added twice:
in commit 78a5e534758349fd3effc90ce1152b55368f52ee by Olaf Hering and
in commit b6298c22c5e9f698812e2520003ee178aad50c10 by Al Viro.
This patch reverts 78a5e534758349fd3effc90ce1152b55368f52ee.

They probably should have been included before linux/tcp.h in
the first place.

 drivers/net/mv643xx_eth.c |2 --
 1 file changed, 2 deletions(-)

Index: linux-2.6-mv643xx_enet/drivers/net/mv643xx_eth.c
===
--- linux-2.6-mv643xx_enet.orig/drivers/net/mv643xx_eth.c
+++ linux-2.6-mv643xx_enet/drivers/net/mv643xx_eth.c
@@ -37,8 +37,6 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include 
 #include 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] get rid of circular list of adev's

2006-02-08 Thread Denis Vlasenko
Hi folks,

If someone has cardbus acx, please test this one.

Basically, in acxpci_e_remove() we will check the presence
of the card by reading a register and if it is not all ones,
we will command device to shut down.

Previously it was done in acxpci_e_cleanup_module().

+INLINE_IO int
+adev_present(acx_device_t *adev)
+{
+   /* fast version (accesses the first register, IO_ACX_SOFT_RESET,
+* which should be safe): */
+   return readl(adev->iobase) != 0x;
+}


@@ -1785,6 +1725,43 @@ acxpci_e_remove(struct pci_dev *pdev)

adev = ndev2adev(ndev);

+   /* If device wasn't hot unplugged... */
+   if (adev_present(adev)) {
+
+   acx_sem_lock(adev);
+
+   /* disable both Tx and Rx to shut radio down properly */
+   acx_s_issue_cmd(adev, ACX1xx_CMD_DISABLE_TX, NULL, 0);
+   acx_s_issue_cmd(adev, ACX1xx_CMD_DISABLE_RX, NULL, 0);
...

--
vda
diff -urpN current.0/acx_struct.h current.1/acx_struct.h
--- current.0/acx_struct.h	Sat Feb  4 13:11:55 2006
+++ current.1/acx_struct.h	Wed Feb  8 12:48:33 2006
@@ -1220,14 +1220,9 @@ struct acx_device {
 	unsigned long		lock_time;
 #endif
 
-	/*** Device chain ***/
-	struct acx_device	*next;		/* link for list of devices */
-
 	/*** Linux network device ***/
 	struct net_device	*ndev;		/* pointer to linux netdevice */
-	struct net_device	*prev_nd;	/* FIXME: We should not chain via our
-		 * private struct acx_device _and_
-		 * the struct net_device */
+
 	/*** Device statistics ***/
 	struct net_device_stats	stats;		/* net device statistics */
 #ifdef WIRELESS_EXT
diff -urpN current.0/pci.c current.1/pci.c
--- current.0/pci.c	Sat Feb  4 14:49:33 2006
+++ current.1/pci.c	Wed Feb  8 13:34:42 2006
@@ -174,11 +174,13 @@ write_flush(acx_device_t *adev)
 	readb(adev->iobase);
 }
 
-
-/***
-*/
-static struct net_device *root_adev_newest = NULL;
-DECLARE_MUTEX(root_adev_sem);
+INLINE_IO int
+adev_present(acx_device_t *adev)
+{
+	/* fast version (accesses the first register, IO_ACX_SOFT_RESET,
+	 * which should be safe): */
+	return readl(adev->iobase) != 0x;
+}
 
 
 /***
@@ -1273,62 +1275,6 @@ acx_show_card_eeprom_id(acx_device_t *ad
 
 
 /***
-*/
-static void
-acxpci_s_device_chain_add(struct net_device *ndev)
-{
-	acx_device_t *adev = ndev2adev(ndev);
-
-	down(&root_adev_sem);
-	adev->prev_nd = root_adev_newest;
-	root_adev_newest = ndev;
-	adev->ndev = ndev;
-	up(&root_adev_sem);
-}
-
-static void
-acxpci_s_device_chain_remove(struct net_device *ndev)
-{
-	struct net_device *querydev;
-	struct net_device *olderdev;
-	struct net_device *newerdev;
-
-	down(&root_adev_sem);
-	querydev = root_adev_newest;
-	newerdev = NULL;
-	while (querydev) {
-		olderdev = ndev2adev(querydev)->prev_nd;
-		if (0 == strcmp(querydev->name, ndev->name)) {
-			if (!newerdev) {
-/* if we were at the beginning of the
- * list, then it's the list head that
- * we need to update to point at the
- * next older device */
-root_adev_newest = olderdev;
-			} else {
-/* it's the device that is newer than us
- * that we need to update to point at
- * the device older than us */
-ndev2adev(newerdev)->prev_nd = olderdev;
-			}
-			break;
-		}
-		/* "newerdev" is actually the device of the old iteration,
-		 * but since the list starts (root_adev_newest)
-		 * with the newest devices,
-		 * it's newer than the ones following.
-		 * Oh the joys of iterating from newest to oldest :-\ */
-		newerdev = querydev;
-
-		/* keep checking old devices for matches until we hit the end
-		 * of the list */
-		querydev = olderdev;
-	}
-	up(&root_adev_sem);
-}
-
-
-/***
 ** acxpci_free_desc_queues
 **
 ** Releases the queues that have been allocated, the
@@ -1643,9 +1589,6 @@ acxpci_e_probe(struct pci_dev *pdev, con
 #endif
 	SET_NETDEV_DEV(ndev, &pdev->dev);
 
-	/* register new dev in linked list */
-	acxpci_s_device_chain_add(ndev);
-
 	log(L_IRQ|L_INIT, "using IRQ %d\n", pdev->irq);
 
 	/* need to be able to restore PCI state after a suspend */
@@ -1725,7 +1668,6 @@ fail_init_mac:
 fail_read_eeprom_version:
 fail_reset:
 
-	acxpci_s_device_chain_remove(ndev);
 	free_netdev(ndev);
 fail_alloc_netdev:
 fail_irq:
@@ -1759,11 +1701,8 @@ done:
 /***
 ** acxpci_e_remove
 **
-** Deallocate PCI resources for the acx chip.
-**
-** This should NOT execute any other hardware operations on the card,
-** since the card might already be ejected. Instead, that should be done
-** in cleanup_module, since the card is most likely still available there.
+** Shut device down (if not hot unplugged)
+** and deallocate PCI resources for the acx chip.
 **
 ** pdev - ptr to PCI de

Re: Strange IPsec freeze/partial fix

2006-02-08 Thread Olaf Kirch
On Wed, Feb 08, 2006 at 07:46:48AM +1100, Herbert Xu wrote:
> I suggest that we simply bail out always.  If the dst decides to die
> on us later on, the packet will be dropped anyway.  So there is no
> great urgency to retry here.  Once we have the proper resolution
> queueing, we can then do the retry again.

Yes, that is simpler, and should work as well.

> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Acked-by: Olaf Kirch <[EMAIL PROTECTED]>

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
[EMAIL PROTECTED] |/ | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Andi Kleen
On Wednesday 08 February 2006 11:34, Eric Dumazet wrote:

> 1) Instead of storing a 2-uple {pointer,generation} (and using 12 or 16 bytes 
> on 64 bits platforms), we could just use a 32 bit quantity 
> [(ifindex<<8)+(gen_number)]

That would add an 2^24 netdevice limit. Someone will sooner or later
run into this.

Also I'm not sure it's worth it just to save 4 bytes. It would cost more
TLB misses too.

> 2) Not easy to not free the devices because their size is variable (see 
> alloc_netdev()) depending on private data for the driver.

you mean reuse?

I guess when you have lots of devices they are very likely to be all
the same size (e.g. all vlan or all PPP) So it would be reasonable to only 
reuse a device with the same size.

Only drawback is that it might be a long search for the right size
if it's at the other end of a long list and there isn't a hash table
or similar.  But I guess that case could be ignored at first because it should 
be 
very unlikely and if it really should be a problem for somebody then some
suitable data structure could be added.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Eric Dumazet

Andi Kleen a écrit :

On Wednesday 08 February 2006 01:44, David S. Miller wrote:

From: Ben Greear <[EMAIL PROTECTED]>
Date: Tue, 07 Feb 2006 16:39:52 -0800


Rick Jones wrote:
In the realm of straw ideas, how often are netdevs added and removed, 
and would leaving a tombstone behind consume too much memory?

In certain cases...say, with vlans, you could very often create and
destroy net devices.  I think that giving up and leaking the memory
is not a good idea.

I think he's suggesting another thing.  Reattach the skb->dev
to some dummy device that always persists, when a device goes
down.


One possible way would be to combine the tombstone with a global 
generation number.


tombstone would be just the netdevice memory itself not freed, but
possibly reused.

Instead of just having a pointer to struct netdevice in the skb have 
a pointer and a generation number. Each use checks the generation

number and if it doesn't match the net devices throw away the
skb because its device is gone. This check is cheap on SMP because 
it's fully read only and can happen with shared cache lines.


Then never free struct net_devices, but only reuse them with
higher generation numbers.

The "use" could be some central place like dev_queue_xmit()
and perhaps some places in netfilter. I don't think every
reference of skb->dev would need to check it because most could
probably tolerate stale devices

Disadvantages: 
- When the device is freed but not reused yet it must be 
in a state that it can be safely accessed by all users

[should be easy]
- If there is some workload that only needs temporarily a lot
of devices then the system can never recover to use less for them.
Probably not a big issue.
- It would need a few bytes in the skb (and possibly other data structures
referencing the device) to store the generation count.
- The generation number check would add a few cycles, but I guess it's much 
cheaper
than the cache misses from the reference counts.
- If the generation count wraps then skbs could be sent to the wrong devices.
With 32bit probably not a big issue.



1) Instead of storing a 2-uple {pointer,generation} (and using 12 or 16 bytes 
on 64 bits platforms), we could just use a 32 bit quantity 
[(ifindex<<8)+(gen_number)]


And have a global table (yes... thats tricky) :

struct {
struct net_device *dev;
unsigned int netgen;
} iftable[MAXDEVICES];
/*
 * We assume 'gen' is valid : not out of bounds
 */
static void *get_dev_by_netgen(unsigned int netgen)
{
unsigned int idx = (netgen >> 8);
struct net_device *dev;
BUG_ON(idx >= MAXDEVICES);
dev = iftable[idx];
if (dev && iftable[idx].netgen != netgen)
dev = NULL;
return dev;
}

This would actually save some bytes on 64bits platforms, and the cpu cycles we 
loose in validating a 'netgen' are still beter than cache misses.


2) Not easy to not free the devices because their size is variable (see 
alloc_netdev()) depending on private data for the driver.



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Packet socket: directly access the mmapped ring buffer

2006-02-08 Thread Evgeniy Polyakov
On Wed, Feb 08, 2006 at 02:26:32PM +0900, FUJITA Tomonori ([EMAIL PROTECTED]) 
wrote:
> Mike Christie and I've developed the SCSI Userspace target
> framework. Target LLDs (for Fibre channel, iSCSI HBAs, etc) pass SCSI
> commands to SCSI commands to the user-space daemon. The daemon
> executes the commands and sends the results back to the LLDs.
> 
> Please refer scsi-ml for further details.
> 
> http://thread.gmane.org/gmane.linux.scsi/22409
> 
> We need efficient kernel and user-space communication interface and
> used netlink. Jeff Garzik suggested the packet socket mmap'd ring
> buffer.
> 
> The mmap'd ring buffer is really nice, but we want to access directly
> the ring buffer withough going through the networking stack to avoid
> memory allocation and overhead.

Depending on calling context, it can be not sufficient to leave sofirqs
enabled in that function.
 
> Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
> Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
> ---
> 
>  include/net/af_packet.h |6 ++
>  net/packet/af_packet.c  |   17 +
>  2 files changed, 23 insertions(+), 0 deletions(-)
>  create mode 100644 include/net/af_packet.h
> 
> c627f3a1da6e5e7e9e46d58401adcf168ea45787
> diff --git a/include/net/af_packet.h b/include/net/af_packet.h
> new file mode 100644
> index 000..5a75e07
> --- /dev/null
> +++ b/include/net/af_packet.h
> @@ -0,0 +1,6 @@
> +#ifndef __LINUX_NET_AFPACKET_H
> +#define __LINUX_NET_AFPACKET_H
> +
> +extern struct tpacket_hdr *packet_socket_frame(struct sock *sk);
> +
> +#endif
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 9db7dbd..b5fbd74 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -562,6 +562,23 @@ drop:
>  }
>  
>  #ifdef CONFIG_PACKET_MMAP
> +struct tpacket_hdr *packet_socket_frame(struct sock *sk)
> +{
> + struct packet_sock *po;
> + struct tpacket_hdr *h;
> +
> + po = pkt_sk(sk);
> + spin_lock(&sk->sk_receive_queue.lock);
> + h = (struct tpacket_hdr *) packet_lookup_frame(po, po->head);
> + if (h->tp_status)
> + h = ERR_PTR(-ENOBUFS);
> + else
> + po->head = po->head != po->frame_max ? po->head+1 : 0;
> + spin_unlock(&sk->sk_receive_queue.lock);
> + return h;
> +}
> +EXPORT_SYMBOL_GPL(packet_socket_frame);
> +
>  static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev, struct 
> packet_type *pt, struct net_device *orig_dev)
>  {
>   struct sock *sk;
> -- 
> 1.1.3

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Acx100-devel] [PATCH] acxsm: Add _{get,set}_encodeext and improve logging in _encode

2006-02-08 Thread Denis Vlasenko
On Tuesday 07 February 2006 22:04, Carlos Martín wrote:
> Content-Transfer-Encoding: quoted-printable

Patch was damaged. I think sending them as plain-text attachment
would work better.

> Add _{get,set}_encodeext and improve logging in _encode
> 
> The code in _{get,set}_encode has been reordered a bit so we have
> better logging (function entry and exit) and _{get,set}_encodeext have
> been implemented as a wrapper for the ieee80211 stack functions.

Applied, thanks!

--
vda
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH,2.4,SECURITY,NET] orinoco: CVE-2005-3180: Information leakage due to incorrect padding

2006-02-08 Thread Horms
> [PATCH] Better fixup for the orinoco driver
> 
> The latest kernel added a pretty ugly fix for the orinoco etherleak bug
> which contains bogus skb->len checks already done by the caller and causes
> copies of all odd sized frames (which are quite common)
> 
> While the skb->len check should be ripped out the other fix is harder to do
> properly so I'm proposing for this the -mm tree only until next 2.6.x so
> that it gets tested.
> 
> Instead of copying buffers around blindly this code implements a padding
> aware version of the hermes buffer writing function which does padding as
> the buffer is loaded and thus more cleanly and without bogus 1.5K copies.
> 
> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

The above is a patch included in 2.6.16 as a fix for CVE-2005-3180.  It to
be applicable to 2.4.  I have made a backport below, with the only
semi-significant change being including the ALIGN macro in orinoco.c, as it
doesn't exist in 2.4.

As yet untested

Signed-off-by: Horms <[EMAIL PROTECTED]>

index 0c06b14..b99edd3 100644
--- a/drivers/net/wireless/hermes.c
+++ b/drivers/net/wireless/hermes.c
@@ -448,6 +448,43 @@ int hermes_bap_pwrite(hermes_t *hw, int 
return err;
 }
 
+/* Write a block of data to the chip's buffer with padding if
+ * neccessary, via the BAP. Synchronization/serialization is the
+ * caller's problem. len must be even.
+ *
+ * Returns: < 0 on internal failure (errno), 0 on success, > 0 on error from 
firmware
+ */
+int hermes_bap_pwrite_pad(hermes_t *hw, int bap, const void *buf, unsigned 
data_len, unsigned len,
+ u16 id, u16 offset)
+{
+   int dreg = bap ? HERMES_DATA1 : HERMES_DATA0;
+   int err = 0;
+
+   if (len < 0 || len % 2 || data_len > len)
+   return -EINVAL;
+
+   err = hermes_bap_seek(hw, bap, id, offset);
+   if (err)
+   goto out;
+
+   /* Transfer all the complete words of data */
+   hermes_write_words(hw, dreg, buf, data_len/2);
+   /* If there is an odd byte left over pad and transfer it */
+   if (data_len & 1) {
+   u8 end[2];
+   end[1] = 0;
+   end[0] = ((unsigned char *)buf)[data_len - 1];
+   hermes_write_words(hw, dreg, end, 1);
+   data_len ++;
+   }
+   /* Now send zeros for the padding */
+   if (data_len < len)
+   hermes_clear_words(hw, dreg, (len - data_len) / 2);
+   /* Complete */
+ out:
+   return err;
+}
+
 /* Read a Length-Type-Value record from the card.
  *
  * If length is NULL, we ignore the length read from the card, and
@@ -534,6 +571,7 @@ EXPORT_SYMBOL(hermes_allocate);
 
 EXPORT_SYMBOL(hermes_bap_pread);
 EXPORT_SYMBOL(hermes_bap_pwrite);
+EXPORT_SYMBOL(hermes_bap_pwrite_pad);
 EXPORT_SYMBOL(hermes_read_ltv);
 EXPORT_SYMBOL(hermes_write_ltv);
 
index 5c01d0d..5a7e587 100644
--- a/drivers/net/wireless/hermes.h
+++ b/drivers/net/wireless/hermes.h
@@ -319,6 +319,8 @@ int hermes_bap_pread(hermes_t *hw, int b
   u16 id, u16 offset);
 int hermes_bap_pwrite(hermes_t *hw, int bap, const void *buf, unsigned len,
u16 id, u16 offset);
+int hermes_bap_pwrite_pad(hermes_t *hw, int bap, const void *buf,
+   unsigned data_len, unsigned len, u16 id, u16 offset);
 int hermes_read_ltv(hermes_t *hw, int bap, u16 rid, unsigned buflen,
u16 *length, void *buf);
 int hermes_write_ltv(hermes_t *hw, int bap, u16 rid,
index 5b5ca26..ec4003f 100644
--- a/drivers/net/wireless/orinoco.c
+++ b/drivers/net/wireless/orinoco.c
@@ -2312,6 +2312,8 @@ orinoco_stat_gather(struct net_device *d
}
 }
 
+#define ALIGN(x,a) (((x)+(a)-1)&~((a)-1))
+
 static int
 orinoco_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -2407,14 +2409,22 @@ orinoco_xmit(struct sk_buff *skb, struct
stats->tx_errors++;
goto fail;
}
+   /* Actual xfer length - allow for padding */
+   len = ALIGN(data_len, 2);
+   if (len < ETH_ZLEN - ETH_HLEN)
+   len = ETH_ZLEN - ETH_HLEN;
} else { /* IEEE 802.3 frame */
data_len = len + ETH_HLEN;
data_off = HERMES_802_3_OFFSET;
p = skb->data;
+   /* Actual xfer length - round up for odd length packets */
+   len = ALIGN(data_len, 2);
+   if (len < ETH_ZLEN)
+   len = ETH_ZLEN;
}
 
-   /* Round up for odd length packets */
-   err = hermes_bap_pwrite(hw, USER_BAP, p, RUP_EVEN(data_len), txfid, 
data_off);
+   err = hermes_bap_pwrite_pad(hw, USER_BAP, p, data_len, len,
+   txfid, data_off);
if (err) {
printk(KERN_ERR "%s: Error %d writing packet to BAP\n",
   dev->name, err);

Re: [PATCH] NET : SMP optimization of netdevice refcount

2006-02-08 Thread Andi Kleen
On Wednesday 08 February 2006 01:44, David S. Miller wrote:
> From: Ben Greear <[EMAIL PROTECTED]>
> Date: Tue, 07 Feb 2006 16:39:52 -0800
> 
> > Rick Jones wrote:
> > > In the realm of straw ideas, how often are netdevs added and removed, 
> > > and would leaving a tombstone behind consume too much memory?
> > 
> > In certain cases...say, with vlans, you could very often create and
> > destroy net devices.  I think that giving up and leaking the memory
> > is not a good idea.
> 
> I think he's suggesting another thing.  Reattach the skb->dev
> to some dummy device that always persists, when a device goes
> down.

One possible way would be to combine the tombstone with a global 
generation number.

tombstone would be just the netdevice memory itself not freed, but
possibly reused.

Instead of just having a pointer to struct netdevice in the skb have 
a pointer and a generation number. Each use checks the generation
number and if it doesn't match the net devices throw away the
skb because its device is gone. This check is cheap on SMP because 
it's fully read only and can happen with shared cache lines.

Then never free struct net_devices, but only reuse them with
higher generation numbers.

The "use" could be some central place like dev_queue_xmit()
and perhaps some places in netfilter. I don't think every
reference of skb->dev would need to check it because most could
probably tolerate stale devices

Disadvantages: 
- When the device is freed but not reused yet it must be 
in a state that it can be safely accessed by all users
[should be easy]
- If there is some workload that only needs temporarily a lot
of devices then the system can never recover to use less for them.
Probably not a big issue.
- It would need a few bytes in the skb (and possibly other data structures
referencing the device) to store the generation count.
- The generation number check would add a few cycles, but I guess it's much 
cheaper
than the cache misses from the reference counts.
- If the generation count wraps then skbs could be sent to the wrong devices.
With 32bit probably not a big issue.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPVS] Shrink ip_vs_*.c includes

2006-02-08 Thread Horms
On Tue, Feb 07, 2006 at 09:07:34PM -0800, David S. Miller wrote:
> From: Horms <[EMAIL PROTECTED]>
> Date: Wed, 8 Feb 2006 12:09:29 +0900
> 
> > Unfortunately this seems like it is going to be more tedious than 
> > we first thought. I would guess writing some sort of tool to analyse
> > symbols and headers is the way to go. Else it seems more or less
> > impossible to clean up headers, even on a small scale.
> 
> It's doable on a small scale, you just have to approach the problem
> from the other direction.  Ie. pick a header file and audit the use of
> that specific header file across the tree.
> 
> Folks have done this with headers like linux/sched.h and friends in
> the past.

Point taken. 

I imagine a tool could digest this reasonably efficiently.
Its just doing it by hand which is somewhat labourious.

> And it's worthwhile because anything that minimises kernel rebuild
> when touching a header file helps streamline development.

Agreed

-- 
Horms
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPVS] Shrink ip_vs_*.c includes

2006-02-08 Thread Roberto Nibali

The correct way to go about this is to go through each included header
file and check if any of its symbols are used in the source file.

Or if this is too tedious just leave it alone.


Hi Herbert, 

thanks for your feedback. 

Dave, 


please discard this patch for now.

Ratz,

Unfortunately this seems like it is going to be more tedious than 
we first thought. I would guess writing some sort of tool to analyse

symbols and headers is the way to go. Else it seems more or less
impossible to clean up headers, even on a small scale.


I've thought about the linux/modules.h header (ignoring the fact 
mentioned by Herbert because of its small chance of happening) and my 
suggestion was/is to move it to ip_vs.h. I'll ping Arnaldo regarding 
this cleanup, since he's done it before and certainly has some scripts 
to automate this.


Thanks for the review and discussion,
Roberto Nibali, ratz
--
echo 
'[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html