Re: [PATCH 0/7] NetXen: Make driver use multiple PCI functions
On Tuesday 24 April 2007 23:31, Jeff Garzik wrote:
> Mithlesh Thukral wrote:
> > hi All,
> >
> > Thanks Stephen for your suggestion.
> > I am resending the 7 patches after incorporating the suggestion.
> >
> > These patches are with respect to netdev#upstream and we wish their
> > inclusion in 2.6.22 kernel.
> > Out of these the first 2 patches were already accepted into the netdev
> > tree, but we have requested them to be dropped. So we are resending
> > those 2.
> > Please see the following thread for more details :
> > http://www.spinics.net/lists/netdev/msg26805.html
>
> So what does that mean? If the patches were accepted, then you must send
> further patches cumulative to what is currently in the tree.
>
> If it is already accepted, you cannot drop a patch.

My apologies for the confusion created by my email.
We wish inclusion of these 7 patches in the netdev tree. All these patches
are cumulative with what is currently present in the tree.

Thanks,
Mithlesh Thukral

> Jeff

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]:Replace with time_before in net/ipv4/ipip.c
Hi,

Replacing (jiffies - errtime < TIMEOUT) with time_before in net/ipv4/ipip.c

thanks.

Signed-off-by: Shani Moideen [EMAIL PROTECTED]

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 3ec5ce0..d2bc835 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -108,6 +108,7 @@
 #include <linux/init.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/if_ether.h>
+#include <linux/jiffies.h>
 
 #include <net/sock.h>
 #include <net/ip.h>
@@ -324,7 +325,7 @@ static int ipip_err(struct sk_buff *skb, u32 info)
 	if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED)
 		goto out;
 
-	if (jiffies - t->err_time < IPTUNNEL_ERR_TIMEO)
+	if (time_before(jiffies , t->err_time + IPTUNNEL_ERR_TIMEO))
 		t->err_count++;
 	else
 		t->err_count = 1;
@@ -590,7 +591,7 @@ static int ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	if (tunnel->err_count > 0) {
-		if (jiffies - tunnel->err_time < IPTUNNEL_ERR_TIMEO) {
+		if (time_before(jiffies , tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
 			tunnel->err_count--;
 			dst_link_failure(skb);
 		} else

--
Shani
[PATCH]:Replacing with time_before in net/ipv4/ip_gre.c
Hi,

Replacing with time_before in net/ipv4/ip_gre.c

thanks.

Signed-off-by: Shani Moideen [EMAIL PROTECTED]

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9151da6..05cd63b 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -28,6 +28,7 @@
 #include <linux/igmp.h>
 #include <linux/netfilter_ipv4.h>
 #include <linux/if_ether.h>
+#include <linux/jiffies.h>
 
 #include <net/sock.h>
 #include <net/ip.h>
@@ -376,7 +377,7 @@ static void ipgre_err(struct sk_buff *skb, u32 info)
 	if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED)
 		goto out;
 
-	if (jiffies - t->err_time < IPTUNNEL_ERR_TIMEO)
+	if (time_before(jiffies , t->err_time + IPTUNNEL_ERR_TIMEO))
 		t->err_count++;
 	else
 		t->err_count = 1;
@@ -801,7 +802,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 #endif
 
 	if (tunnel->err_count > 0) {
-		if (jiffies - tunnel->err_time < IPTUNNEL_ERR_TIMEO) {
+		if (time_before(jiffies , tunnel->err_time + IPTUNNEL_ERR_TIMEO)) {
 			tunnel->err_count--;
 
 			dst_link_failure(skb);

Shani
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Hi Mr. David

I have modified my patch according to your advice. I think -EHOSTUNREACH is
only for the input path. In the output path, we can simply check for
-ENETUNREACH (^_^). The patch is shown at the end of this mail.

BTW: my E-mail has been changed to [EMAIL PROTECTED]

Functions that need fixing:
tcp_v4_connect();
ip4_datagram_connect();
udp_sendmsg();

> I think we need to make these checks more carefully.
>
> Route lookup can fail for several reasons other than no route
> being available. Two examples are:
>
> 1) Out of memory error while creating route
> 2) IPSEC disallows communication to that flow ID
>
> As a result, we'll probably best limiting the counter increment
> when the error is either -EHOSTUNREACH or -ENETUNREACH.

Signed-off-by: Wei Dong [EMAIL PROTECTED]

diff -ruNp a/net/ipv4/datagram.c b/net/ipv4/datagram.c
--- a/net/ipv4/datagram.c	2007-04-25 15:20:19.0 +0800
+++ b/net/ipv4/datagram.c	2007-04-25 15:21:42.0 +0800
@@ -50,8 +50,12 @@ int ip4_datagram_connect(struct sock *sk
 			       RT_CONN_FLAGS(sk), oif,
 			       sk->sk_protocol,
 			       inet->sport, usin->sin_port, sk);
-	if (err)
+	if (err) {
+		if (err == -ENETUNREACH)
+			IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES);
 		return err;
+	}
+
 	if ((rt->rt_flags & RTCF_BROADCAST) && !sock_flag(sk, SOCK_BROADCAST)) {
 		ip_rt_put(rt);
 		return -EACCES;
diff -ruNp a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c	2007-04-25 15:20:19.0 +0800
+++ b/net/ipv4/tcp_ipv4.c	2007-04-25 15:21:42.0 +0800
@@ -192,8 +192,11 @@ int tcp_v4_connect(struct sock *sk, stru
 			       RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			       IPPROTO_TCP,
 			       inet->sport, usin->sin_port, sk);
-	if (tmp < 0)
+	if (tmp < 0) {
+		if (tmp == -ENETUNREACH)
+			IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES);
 		return tmp;
+	}
 
 	if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {
 		ip_rt_put(rt);
diff -ruNp a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c	2007-04-25 15:20:19.0 +0800
+++ b/net/ipv4/udp.c	2007-04-25 15:21:42.0 +0800
@@ -630,8 +630,11 @@ int udp_sendmsg(struct kiocb *iocb, stru
 					       .dport = dport } } };
 		security_sk_classify_flow(sk, &fl);
 		err = ip_route_output_flow(&rt, &fl, sk, !(msg->msg_flags&MSG_DONTWAIT));
-		if (err)
+		if (err) {
+			if (err == -ENETUNREACH)
+				IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES);
 			goto out;
+		}
 
 		err = -EACCES;
 		if ((rt->rt_flags & RTCF_BROADCAST) &&
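The rule discussed in this thread (count a lookup failure as "no route" only for specific error codes) can be sketched in plain userspace C. This is a minimal sketch, not the kernel code: `out_no_routes`, `is_no_route_error()` and `handle_route_lookup()` are hypothetical names, and the sketch accepts both -ENETUNREACH and -EHOSTUNREACH as in the quoted suggestion, whereas the posted patch checks -ENETUNREACH only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the IPSTATS_MIB_OUTNOROUTES counter. */
static unsigned long out_no_routes;

/* Nonzero only when the lookup error means "no route available".
 * Errors such as -ENOMEM or a policy refusal are deliberately excluded. */
static int is_no_route_error(int err)
{
	return err == -ENETUNREACH || err == -EHOSTUNREACH;
}

/* Same shape as the patched error paths: count if appropriate, then
 * hand the error back to the caller unchanged. */
static int handle_route_lookup(int err)
{
	if (err) {
		if (is_no_route_error(err))
			out_no_routes++;
		return err;
	}
	return 0;
}
```

Calling `handle_route_lookup(-ENOMEM)` returns the error without touching the counter, which is the behaviour the review comment asks for.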
Re: Getting the new RxRPC patches upstream
Oleg Nesterov [EMAIL PROTECTED] wrote:

> Yes sure. Note that this is documented:
>
> 	/*
> 	 * Kill off a pending schedule_delayed_work(). Note that the work callback
> 	 * function may still be running on return from cancel_delayed_work(). Run
> 	 * flush_workqueue() or cancel_work_sync() to wait on it.
> 	 */

No, it isn't documented. It says that the *work* callback may be running, but
does not mention the timer callback. However, just looking at the
cancellation function source made it clear that this would wait for the timer
handler to return first.

However, is it worth just making cancel_delayed_work() a void function and not
returning anything? I'm not sure the return value is very useful.

David
Re: [PATCH]:Replace with time_before in net/ipv4/ipip.c
From: Shani Moideen [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 11:30:13 +0530

> Replacing (jiffies - errtime < TIMEOUT) with time_before in net/ipv4/ipip.c
>
> thanks.
>
> Signed-off-by: Shani Moideen [EMAIL PROTECTED]

The test you are replacing actually gives a larger window of
acceptance than time_before() does.

It's been a long standing issue whether we should give up
this semantic advantage for the sake of code cleanliness in
the networking code.
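One reading of the "larger window" remark can be checked with a small userspace model. The sketch below assumes a 32-bit tick counter and re-implements the signed-difference idea behind time_before(); the function names and the timeout value used in the note that follows are made up for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit model of jiffies, so wrap behaviour is the same on any host. */
typedef uint32_t jiffies32_t;

/* Userspace model of time_before(a, b): true if a is before b,
 * decided by the sign of the wrapped difference. */
static int model_time_before(jiffies32_t a, jiffies32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/* Old test: unsigned age of the stamp compared against the timeout. */
static int old_within_timeout(jiffies32_t now, jiffies32_t stamp,
			      jiffies32_t timeo)
{
	return (jiffies32_t)(now - stamp) < timeo;
}

/* New test: "now" is before "stamp + timeo". */
static int new_within_timeout(jiffies32_t now, jiffies32_t stamp,
			      jiffies32_t timeo)
{
	return model_time_before(now, (jiffies32_t)(stamp + timeo));
}
```

With a timeout of 3000 ticks, both tests agree for a fresh stamp and across a counter wrap. They differ once the stamp is more than 2^31 ticks old: `old_within_timeout(1000u + 0x90000000u, 1000u, 3000u)` is 0 (timed out), while `new_within_timeout()` with the same arguments is 1. The unsigned subtraction stays correct for ages up to nearly 2^32 ticks, the signed comparison only up to 2^31.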
Re: Fix ipOutNoRoutes counter error for TCP and UDP
Please do not post in HTML, nobody will read it, including me.

Please use plain ASCII text for all mailing list postings,
especially those containing patches.

Thank you.
Re: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic
* Herbert Xu ([EMAIL PROTECTED]) wrote:
> Jarek Poplawski [EMAIL PROTECTED] wrote:
> >
> > My proposal is: maybe Eric could change this in xfrm6_tunnel_rcv()
> > from xfrm6_tunnel.c e.g. like this:
> >
> > 	return xfrm6_rcv_spi(skb, spi) > 0 ? : 0;
> >
> > and, if no errors in testing, he could resubmit this patch?
>
> I agree, this is the right fix.

The fix proposed by Jarek indeed fixes the problem, tested on two
boxes, with an -rc5 kernel and yesterday's git.

Acked-by: Eric Sesterhenn [EMAIL PROTECTED]

--- linux-2.6/net/ipv6/xfrm6_tunnel.c.orig	2007-04-25 00:22:30.0 +0200
+++ linux-2.6/net/ipv6/xfrm6_tunnel.c	2007-04-25 00:22:45.0 +0200
@@ -261,7 +261,7 @@ static int xfrm6_tunnel_rcv(struct sk_bu
 	__be32 spi;
 
 	spi = xfrm6_tunnel_spi_lookup((xfrm_address_t *)&iph->saddr);
-	return xfrm6_rcv_spi(skb, spi);
+	return xfrm6_rcv_spi(skb, spi) > 0 ? : 0;
 }
 
 static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
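The `x ? : y` form in the fix is the GNU C conditional with an omitted middle operand, accepted by GCC and Clang but not part of ISO C. A minimal sketch of what the expression evaluates to, using hypothetical helper names and plain ints in place of the real xfrm6_rcv_spi() call:

```c
#include <assert.h>

/* Same expression shape as the fix. "a ? : b" means "a ? a : b" with a
 * evaluated once. Here a is the comparison (rc > 0), whose value is 0 or 1,
 * so the result is 1 for a positive rc and 0 otherwise. */
static int status_like_fix(int rc)
{
	return rc > 0 ? : 0;
}

/* For contrast: an explicit middle operand passes a positive rc through. */
static int positive_or_zero(int rc)
{
	return rc > 0 ? rc : 0;
}
```

So `status_like_fix(-22)` and `status_like_fix(0)` both give 0, `status_like_fix(7)` gives 1, and `positive_or_zero(7)` gives 7. In either form a negative return value no longer reaches the caller.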
Re: Problem with commit a0ee18b9b7d3847976c6fb315c06a34fb296de0e
Hi,

On Tuesday 24 April 2007 00:23:13 Ismail Dönmez wrote:
> On Tuesday 24 April 2007 00:17:40 Thomas Graf wrote:
> > * Ismail Dönmez [EMAIL PROTECTED] 2007-04-23 22:09
> > > Yes I know the fix is in but I wondered why its creating such problems
> > > with 2.6.18 kernel, guess it depends on some other commits.
> >
> > As long as you apply the complete patch including the additional sanity
> > check for RTN_MAX it should work perfectly fine on 2.6.18.
>
> The sanity check part doesn't seem to apply to 2.6.18.
>
> > I can't think of any connection between the patch and the errors
> > you are seeing. Are you absolutely sure the errors you see are
> > directly connected to applying the patch?
>
> Yes actually I am but I'll re-test and see. Thanks.

I was able to reproduce the same problem with Linus' GIT tree too. Since I
started to see these after I applied the commit
a0ee18b9b7d3847976c6fb315c06a34fb296de0e to the 2.6.18 tree, there is a big
possibility that the commit is the culprit.

I attach the relevant dmesg messages. The problem happened after 12 hours of
uptime, and the net connection gets stable again after 1-2 minutes.

Regards,
ismail

--
Life is a game, and if you aren't in it to win,
what the heck are you still doing here?

-- Linus Torvalds (talking about open source development)

Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
printk: 1 messages suppressed.
Neighbour table overflow.
printk: 4 messages suppressed.
Neighbour table overflow.
printk: 3 messages suppressed.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
printk: 3 messages suppressed.
Neighbour table overflow.
printk: 15 messages suppressed.
Neighbour table overflow.
printk: 6 messages suppressed.
Neighbour table overflow.
printk: 13 messages suppressed.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
Neighbour table overflow.
printk: 2 messages suppressed.
Neighbour table overflow.
printk: 1 messages suppressed.
Neighbour table overflow.
Neighbour table overflow.
Re: Getting the new RxRPC patches upstream
On 04/25, David Howells wrote:

> Oleg Nesterov [EMAIL PROTECTED] wrote:
>
> > Yes sure. Note that this is documented:
> >
> > 	/*
> > 	 * Kill off a pending schedule_delayed_work(). Note that the work callback
> > 	 * function may still be running on return from cancel_delayed_work(). Run
> > 	 * flush_workqueue() or cancel_work_sync() to wait on it.
> > 	 */
>
> No, it isn't documented. It says that the *work* callback may be running, but
> does not mention the timer callback. However, just looking at the
> cancellation function source made it clear that this would wait for the timer
> handler to return first.

Ah yes, it says nothing about what the returned value means...

> However, is it worth just making cancel_delayed_work() a void function and not
> returning anything? I'm not sure the return value is very useful.

cancel_rearming_delayed_work() needs this, tty_io.c, probably somebody else.

Oleg.
Re: Getting the new RxRPC patches upstream
Oleg Nesterov [EMAIL PROTECTED] wrote:

> Ah yes, it says nothing about what the returned value means...

Yeah... If you could amend that as part of your patch, that'd be great.

David
[PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #3]
The first of these patches together provide secure client-side RxRPC
connectivity as a Linux kernel socket family. Only the RxRPC transport/session
side is supplied - the presentation side (marshalling the data) is left to the
client. Copies of the patches can be found here:

	http://people.redhat.com/~dhowells/rxrpc/series
	http://people.redhat.com/~dhowells/rxrpc/01-move-skb-generic.diff
	http://people.redhat.com/~dhowells/rxrpc/02-cancel_delayed_work.diff
	http://people.redhat.com/~dhowells/rxrpc/03-keys.diff
	http://people.redhat.com/~dhowells/rxrpc/04-timer-exports.diff
	http://people.redhat.com/~dhowells/rxrpc/05-af_rxrpc.diff

Further patches make the in-kernel AFS filesystem use AF_RXRPC and delete the
old RxRPC implementation:

	http://people.redhat.com/~dhowells/rxrpc/06-afs-cleanup.diff
	http://people.redhat.com/~dhowells/rxrpc/07-af_rxrpc-kernel.diff
	http://people.redhat.com/~dhowells/rxrpc/08-af_rxrpc-afs.diff
	http://people.redhat.com/~dhowells/rxrpc/09-af_rxrpc-delete-old.diff

And then the rest of the patches extend AFS to provide automatic unmounting of
automount trees, security support and directory-level write support (create,
mkdir, etc.):

	http://people.redhat.com/~dhowells/rxrpc/10-afs-multimount.diff
	http://people.redhat.com/~dhowells/rxrpc/11-afs-security.diff
	http://people.redhat.com/~dhowells/rxrpc/12-afs-doc.diff
	http://people.redhat.com/~dhowells/rxrpc/13-netlink-support-MSG_TRUNC.diff
	http://people.redhat.com/~dhowells/rxrpc/14-afs-get-capabilities.diff
	http://people.redhat.com/~dhowells/rxrpc/15-afs-initcallbackstate3.diff
	http://people.redhat.com/~dhowells/rxrpc/16-afs-dir-write-support.diff

Note that file-level write support is not yet complete and so is not included
in this patch set.

The userspace access methods make use of the control data passed to/by
sendmsg() and recvmsg(). See the three simple test programs:

	http://people.redhat.com/~dhowells/rxrpc/klog.c
	http://people.redhat.com/~dhowells/rxrpc/rxrpc.c
	http://people.redhat.com/~dhowells/rxrpc/listen.c

The klog program is provided to go and get a Kerberos IV key from the AFS
kaserver. Currently it must be edited before compiling to note the right
server IP address and the appropriate credentials.

These programs can be compiled by:

	make klog rxrpc listen CFLAGS="-Wall -g" LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

Then a ticket can be obtained by:

	./klog

If a security key is acquired in this way, then all subsequent AFS operations
- including VL lookups and mounts - performed with that session keyring will
be authenticated using that key. The key can be viewed like so:

	[EMAIL PROTECTED] ~]# keyctl show
	Session Keyring
	       -3 --alswrv      0     0  keyring: _ses.3268
	        2 --alswrv      0     0   \_ keyring: _uid.0
	111416553 --als--v      0     0       \_ rxrpc: [EMAIL PROTECTED]

TODO:

 (*) Make certain parameters (such as connection timeouts) userspace
     configurable.

 (*) Make userspace utilities use it; librxrpc.

 (*) Userspace documentation.

 (*) KerberosV security.

Changes:

 (*) SOCK_RPC has been removed. SOCK_DGRAM is now used instead.

 (*) I've added a facility whereby calls can be made to destinations other
     than the connect() address of a client socket by making use of msg_name
     in the msghdr struct when using sendmsg() to send the first data packet
     of a call. Indeed, a client socket need not be connected before being
     used so.

 (*) I've also added a facility whereby client calls may also be made on
     server sockets, again by using msg_name in the msghdr struct. In such a
     case, the server's local transport endpoint is used.

 (*) I've made the write buffer space check available to various callers
     (sk_write_space) and implemented poll support.

 (*) Rewrote rxrpc_recvmsg(). It now concatenates adjacent data messages from
     the same call when delivering them.

 (*) Updated the documentation to include notes on recvmsg, cover control
     messages and cover SOL_RXRPC-level socket options.

 (*) Provided an in-kernel interface to give in-kernel utilities easier access
     to the facility.

 (*) Made fs/afs/ use it.

 (*) Deleted the old contents of net/rxrpc/.

 (*) Use the scatterlist interface to the crypto API for now. The patch that
     added the direct access interface conflicts with patches Herbert Xu is
     producing, so I've dropped it for the moment.

 (*) Moved a bug fix to make secure connection reuse work from the
     af_rxrpc-kernel patch to the af_rxrpc main patch.

 (*) Make RxRPC use its own private work queues rather than keventd's to avoid
     deadlocks when AFS tries to use keventd too. This also puts encryption
     in the private work queue rather than keventd's queue as that might take
     a relatively long time to
[PATCH 03/16] AF_RXRPC: Key facility changes for AF_RXRPC [try #3]
Export the keyring key type definition and document its availability.

Add alternative types into the key's type_data union to make it more useful.
Not all users necessarily want to use it as a list_head (AF_RXRPC doesn't, for
example), so make it clear that it can be used in other ways.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 Documentation/keys.txt  |   12 ++++++++++++
 include/linux/key.h     |    2 ++
 security/keys/keyring.c |    2 ++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 60c665d..81d9aa0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -859,6 +859,18 @@ payload contents for more information.
 	void unregister_key_type(struct key_type *type);
 
 
+Under some circumstances, it may be desirable to desirable to deal with a
+bundle of keys. The facility provides access to the keyring type for managing
+such a bundle:
+
+	struct key_type key_type_keyring;
+
+This can be used with a function such as request_key() to find a specific
+keyring in a process's keyrings. A keyring thus found can then be searched
+with keyring_search(). Note that it is not possible to use request_key() to
+search a specific keyring, so using keyrings in this way is of limited utility.
+
+
 ===================================
 NOTES ON ACCESSING PAYLOAD CONTENTS
 ===================================
diff --git a/include/linux/key.h b/include/linux/key.h
index 169f05e..a9220e7 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -160,6 +160,8 @@ struct key {
 	 */
 	union {
 		struct list_head	link;
+		unsigned long		x[2];
+		void			*p[2];
 	} type_data;
 
 	/* key data
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index ad45ce7..88292e3 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -66,6 +66,8 @@ struct key_type key_type_keyring = {
 	.read		= keyring_read,
 };
 
+EXPORT_SYMBOL(key_type_keyring);
+
 /*
  * semaphore to serialise link/link calls to prevent two link calls in parallel
  * introducing a cycle
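The type_data change gives key types two extra views of the same two words of storage. A minimal userspace sketch of the idea follows; `struct list_head` here is a local copy of the kernel's list node layout, and `union type_data` and `store_pointers()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* The same three views as the patched union in struct key. */
union type_data {
	struct list_head link;	/* for key types that chain keys together */
	unsigned long x[2];	/* two integers */
	void *p[2];		/* two opaque pointers */
};

/* A key type that has no use for a list link can keep two private
 * pointers in the same storage instead. */
static void store_pointers(union type_data *td, void *a, void *b)
{
	td->p[0] = a;
	td->p[1] = b;
}
```

On common Linux ABIs the union is no larger than the original `list_head` member, so adding the views does not grow the containing structure. Only one view may be in use for a given key at a time.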
[PATCH 01/16] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code [try #3]
Move generic skbuff stuff from XFRM code to generic code so that AF_RXRPC can
use it too.

The kdoc comments I've attached to the functions need to be checked by whoever
wrote the functions, as I had to make some guesses about how they work.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 include/linux/skbuff.h |    6 ++
 include/net/esp.h      |    2 -
 net/core/skbuff.c      |  188
 net/xfrm/xfrm_algo.c   |  169 ---
 4 files changed, 194 insertions(+), 171 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5992f65..c905d42 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -83,6 +83,7 @@
  */
 
 struct net_device;
+struct scatterlist;
 
 #ifdef CONFIG_NETFILTER
 struct nf_conntrack {
@@ -361,6 +362,11 @@ extern struct sk_buff *skb_realloc_headroom(struct sk_buff *skb,
 extern struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
 				       int newheadroom, int newtailroom,
 				       gfp_t priority);
+extern int	       skb_to_sgvec(struct sk_buff *skb,
+				    struct scatterlist *sg, int offset,
+				    int len);
+extern int	       skb_cow_data(struct sk_buff *skb, int tailbits,
+				    struct sk_buff **trailer);
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	kfree_skb(a)
 extern void	      skb_over_panic(struct sk_buff *skb, int len,
diff --git a/include/net/esp.h b/include/net/esp.h
index 713d039..d05d8d2 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -40,8 +40,6 @@ struct esp_data
 	} auth;
 };
 
-extern int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len);
-extern int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
 extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 
 static inline int esp_mac_digest(struct esp_data *esp, struct sk_buff *skb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 336958f..aa02bd4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -55,6 +55,7 @@
 #include <linux/cache.h>
 #include <linux/rtnetlink.h>
 #include <linux/init.h>
+#include <linux/scatterlist.h>
 
 #include <net/protocol.h>
 #include <net/dst.h>
@@ -2005,6 +2006,190 @@ void __init skb_init(void)
 						NULL, NULL);
 }
 
+/**
+ *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
+ *	@skb: Socket buffer containing the buffers to be mapped
+ *	@sg: The scatter-gather list to map into
+ *	@offset: The offset into the buffer's contents to start mapping
+ *	@len: Length of buffer space to be mapped
+ *
+ *	Fill the specified scatter-gather list with mappings/pointers into a
+ *	region of the buffer space attached to a socket buffer.
+ */
+int
+skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+{
+	int start = skb_headlen(skb);
+	int i, copy = start - offset;
+	int elt = 0;
+
+	if (copy > 0) {
+		if (copy > len)
+			copy = len;
+		sg[elt].page = virt_to_page(skb->data + offset);
+		sg[elt].offset = (unsigned long)(skb->data + offset) % PAGE_SIZE;
+		sg[elt].length = copy;
+		elt++;
+		if ((len -= copy) == 0)
+			return elt;
+		offset += copy;
+	}
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+
+		BUG_TRAP(start <= offset + len);
+
+		end = start + skb_shinfo(skb)->frags[i].size;
+		if ((copy = end - offset) > 0) {
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+			if (copy > len)
+				copy = len;
+			sg[elt].page = frag->page;
+			sg[elt].offset = frag->page_offset+offset-start;
+			sg[elt].length = copy;
+			elt++;
+			if (!(len -= copy))
+				return elt;
+			offset += copy;
+		}
+		start = end;
+	}
+
+	if (skb_shinfo(skb)->frag_list) {
+		struct sk_buff *list = skb_shinfo(skb)->frag_list;
+
+		for (; list; list = list->next) {
+			int end;
+
+			BUG_TRAP(start <= offset + len);
+
+			end = start + list->len;
+			if ((copy = end - offset) > 0) {
+				if (copy > len)
+					copy = len;
+				elt += skb_to_sgvec(list, sg+elt, offset - start, copy);
+				if ((len -= copy) == 0)
[PATCH 02/16] cancel_delayed_work: use del_timer() instead of del_timer_sync() [try #3]
del_timer_sync() buys nothing for cancel_delayed_work(), but it is less
efficient since it locks the timer unconditionally, and may wait for the
completion of the delayed_work_timer_fn().

cancel_delayed_work() == 0 means:

	before this patch:
		work->func may still be running or queued

	after this patch:
		work->func may still be running or queued, or
		delayed_work_timer_fn->__queue_work() in progress.

		The latter doesn't differ from the caller's POV,
		delayed_work_timer_fn() is called with _PENDING
		bit set.

cancel_delayed_work() == 1 with this patch adds a new possibility:

	delayed_work->work was cancelled, but delayed_work_timer_fn
	is still running (this is only possible for the re-arming
	works on single-threaded workqueue).

	In this case the timer was re-started by work->func(), nobody
	else can do this. This in turn means that delayed_work_timer_fn
	has already passed __queue_work() (and won't touch delayed_work)
	because nobody else can queue delayed_work->work.

Signed-off-by: Oleg Nesterov [EMAIL PROTECTED]
Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 include/linux/workqueue.h |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2a7b38d..b8abfc7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -191,14 +191,15 @@ int execute_in_process_context(work_func_t fn, struct execute_work *);
 
 /*
  * Kill off a pending schedule_delayed_work(). Note that the work callback
- * function may still be running on return from cancel_delayed_work(). Run
- * flush_scheduled_work() to wait on it.
+ * function may still be running on return from cancel_delayed_work(), unless
+ * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
+ * cancel_work_sync() to wait on it.
  */
 static inline int cancel_delayed_work(struct delayed_work *work)
 {
 	int ret;
 
-	ret = del_timer_sync(&work->timer);
+	ret = del_timer(&work->timer);
 	if (ret)
 		work_release(&work->work);
 	return ret;
[PATCH 10/16] AFS: Handle multiple mounts of an AFS superblock correctly [try #3]
Handle multiple mounts of an AFS superblock correctly, checking to see whether
the superblock is already initialised after calling sget() rather than just
unconditionally stamping all over it.

Also delete the silent parameter to afs_fill_super() as it's not used and can,
in any case, be obtained from sb->s_flags.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/afs/super.c |   26 ++++++++++++++++----------
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/afs/super.c b/fs/afs/super.c
index efc4fe6..77e6875 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -212,7 +212,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 /*
  * fill in the superblock
  */
-static int afs_fill_super(struct super_block *sb, void *data, int silent)
+static int afs_fill_super(struct super_block *sb, void *data)
 {
 	struct afs_mount_params *params = data;
 	struct afs_super_info *as = NULL;
@@ -319,17 +319,23 @@ static int afs_get_sb(struct file_system_type *fs_type,
 		goto error;
 	}
 
-	sb->s_flags = flags;
-
-	ret = afs_fill_super(sb, &params, flags & MS_SILENT ? 1 : 0);
-	if (ret < 0) {
-		up_write(&sb->s_umount);
-		deactivate_super(sb);
-		goto error;
+	if (!sb->s_root) {
+		/* initial superblock/root creation */
+		_debug("create");
+		sb->s_flags = flags;
+		ret = afs_fill_super(sb, &params);
+		if (ret < 0) {
+			up_write(&sb->s_umount);
+			deactivate_super(sb);
+			goto error;
+		}
+		sb->s_flags |= MS_ACTIVE;
+	} else {
+		_debug("reuse");
+		ASSERTCMP(sb->s_flags, &, MS_ACTIVE);
 	}
-	sb->s_flags |= MS_ACTIVE;
 
-	simple_set_mnt(mnt, sb);
+	simple_set_mnt(mnt, sb);
 	afs_put_volume(params.volume);
 	afs_put_cell(params.default_cell);
 	_leave(" = 0 [%p]", sb);
[PATCH 13/16] commit ad495d7b6cfcd1bc2eaf06c42699be0bb5d84234 [try #3]
[NETLINK]: Mirror UDP MSG_TRUNC semantics.

If the user passes MSG_TRUNC in via msg_flags, return
the full packet size not the truncated size.

Idea from Herbert Xu and Thomas Graf.

Signed-off-by: David S. Miller [EMAIL PROTECTED]
---

 net/netlink/af_netlink.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c48b0f4..5890210 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1242,6 +1242,9 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 
 	scm_recv(sock, msg, siocb->scm, flags);
 
+	if (flags & MSG_TRUNC)
+		copied = skb->len;
+
 out:
 	netlink_rcv_wake(sk);
 	return err ? : copied;
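The MSG_TRUNC semantics being mirrored can be observed from userspace. The sketch below is an illustration under stated assumptions rather than a netlink test: it uses an AF_UNIX datagram socketpair as a stand-in, which to my knowledge reports the full datagram length under MSG_TRUNC on Linux 3.4 and later, and `recv_reported_len()` is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send len bytes over a datagram socketpair, receive them into a buffer of
 * bufsz bytes with the given recv() flags, and return what recv() reported.
 * Returns -1 on any setup failure. */
static ssize_t recv_reported_len(size_t len, size_t bufsz, int flags)
{
	char payload[256] = {0};
	char buf[256];
	int sv[2];
	ssize_t n;

	if (len > sizeof(payload) || bufsz > sizeof(buf))
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], payload, len, 0) != (ssize_t)len) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	/* With a short buffer the excess bytes of the datagram are discarded;
	 * MSG_TRUNC only changes the length that is reported. */
	n = recv(sv[1], buf, bufsz, flags);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

Receiving a 100-byte datagram into a 16-byte buffer reports 16 without the flag and 100 with MSG_TRUNC, which lets a caller detect truncation and size its buffer for the next read.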
[PATCH 15/16] AFS: Implement the CB.InitCallBackState3 operation [try #3]
Implement the CB.InitCallBackState3 operation for the fileserver to call.
This reduces the amount of network traffic because if this op is aborted, the
fileserver will then attempt a CB.InitCallBackState operation.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/afs/AFS_CM.h    |    1 +
 fs/afs/cmservice.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/afs/AFS_CM.h b/fs/afs/AFS_CM.h
index d4bd201..7b4d4fa 100644
--- a/fs/afs/AFS_CM.h
+++ b/fs/afs/AFS_CM.h
@@ -23,6 +23,7 @@ enum AFS_CM_Operations {
 	CBGetCE			= 208,	/* get cache file description */
 	CBGetXStatsVersion	= 209,	/* get version of extended statistics */
 	CBGetXStats		= 210,	/* get contents of extended statistics data */
+	CBInitCallBackState3	= 213,	/* initialise callback state, version 3 */
 	CBGetCapabilities	= 65538, /* get client capabilities */
 };
 
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 5139723..3d58861 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -20,6 +20,8 @@ struct workqueue_struct *afs_cm_workqueue;
 
 static int afs_deliver_cb_init_call_back_state(struct afs_call *,
 					       struct sk_buff *, bool);
+static int afs_deliver_cb_init_call_back_state3(struct afs_call *,
+						struct sk_buff *, bool);
 static int afs_deliver_cb_probe(struct afs_call *, struct sk_buff *, bool);
 static int afs_deliver_cb_callback(struct afs_call *, struct sk_buff *, bool);
 static int afs_deliver_cb_get_capabilities(struct afs_call *, struct sk_buff *,
@@ -47,6 +49,16 @@ static const struct afs_call_type afs_SRXCBInitCallBackState = {
 };
 
 /*
+ * CB.InitCallBackState3 operation type
+ */
+static const struct afs_call_type afs_SRXCBInitCallBackState3 = {
+	.name		= "CB.InitCallBackState3",
+	.deliver	= afs_deliver_cb_init_call_back_state3,
+	.abort_to_error	= afs_abort_to_error,
+	.destructor	= afs_cm_destructor,
+};
+
+/*
  * CB.Probe operation type
  */
 static const struct afs_call_type afs_SRXCBProbe = {
@@ -83,6 +95,9 @@ bool afs_cm_incoming_call(struct afs_call *call)
 	case CBInitCallBackState:
 		call->type = &afs_SRXCBInitCallBackState;
 		return true;
+	case CBInitCallBackState3:
+		call->type = &afs_SRXCBInitCallBackState3;
+		return true;
 	case CBProbe:
 		call->type = &afs_SRXCBProbe;
 		return true;
@@ -312,6 +327,37 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call,
 }
 
 /*
+ * deliver request data to a CB.InitCallBackState3 call
+ */
+static int afs_deliver_cb_init_call_back_state3(struct afs_call *call,
+						struct sk_buff *skb,
+						bool last)
+{
+	struct afs_server *server;
+	struct in_addr addr;
+
+	_enter(",{%u},%d", skb->len, last);
+
+	if (!last)
+		return 0;
+
+	/* no unmarshalling required */
+	call->state = AFS_CALL_REPLYING;
+
+	/* we'll need the file server record as that tells us which set of
+	 * vnodes to operate upon */
+	memcpy(&addr, &skb->nh.iph->saddr, 4);
+	server = afs_find_server(&addr);
+	if (!server)
+		return -ENOTCONN;
+	call->server = server;
+
+	INIT_WORK(&call->work, SRXAFSCB_InitCallBackState);
+	schedule_work(&call->work);
+	return 0;
+}
+
+/*
  * allow the fileserver to see if the cache manager is still alive
  */
 static void SRXAFSCB_Probe(struct work_struct *work)
[PATCH 14/16] AFS: Add support for the CB.GetCapabilities operation [try #3]
Add support for the CB.GetCapabilities operation with which the fileserver can ask the client for the following information: (1) The list of network interfaces it has available as IPv4 address + netmask plus the MTUs. (2) The client's UUID. (3) The extended capabilities of the client, for which the only current one is unified error mapping (abort code interpretation). To support this, the patch adds the following routines to AFS: (1) A function to iterate through all the network interfaces using RTNETLINK to extract IPv4 addresses and MTUs. (2) A function to iterate through all the network interfaces using RTNETLINK to pull out the MAC address of the lowest index interface to use in UUID construction. Signed-Off-By: David Howells [EMAIL PROTECTED] --- fs/afs/AFS_CM.h|3 fs/afs/Makefile|1 fs/afs/cmservice.c | 98 ++ fs/afs/internal.h | 42 fs/afs/main.c | 49 + fs/afs/rxrpc.c | 39 fs/afs/use-rtnetlink.c | 473 7 files changed, 705 insertions(+), 0 deletions(-) diff --git a/fs/afs/AFS_CM.h b/fs/afs/AFS_CM.h index 7c8e3d4..d4bd201 100644 --- a/fs/afs/AFS_CM.h +++ b/fs/afs/AFS_CM.h @@ -23,6 +23,9 @@ enum AFS_CM_Operations { CBGetCE = 208, /* get cache file description */ CBGetXStatsVersion = 209, /* get version of extended statistics */ CBGetXStats = 210, /* get contents of extended statistics data */ + CBGetCapabilities = 65538, /* get client capabilities */ }; +#define AFS_CAP_ERROR_TRANSLATION 0x1 + #endif /* AFS_FS_H */ diff --git a/fs/afs/Makefile b/fs/afs/Makefile index cca198b..01545eb 100644 --- a/fs/afs/Makefile +++ b/fs/afs/Makefile @@ -18,6 +18,7 @@ kafs-objs := \ security.o \ server.o \ super.o \ + use-rtnetlink.o \ vlclient.o \ vlocation.o \ vnode.o \ diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 9cb3ac5..5139723 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -22,6 +22,8 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *, struct sk_buff *, bool); static int afs_deliver_cb_probe(struct afs_call *, struct sk_buff *, 
bool); static int afs_deliver_cb_callback(struct afs_call *, struct sk_buff *, bool); +static int afs_deliver_cb_get_capabilities(struct afs_call *, struct sk_buff *, + bool); static void afs_cm_destructor(struct afs_call *); /* @@ -55,6 +57,16 @@ static const struct afs_call_type afs_SRXCBProbe = { }; /* + * CB.GetCapabilities operation type + */ +static const struct afs_call_type afs_SRXCBGetCapabilites = { + .name = CB.GetCapabilities, + .deliver= afs_deliver_cb_get_capabilities, + .abort_to_error = afs_abort_to_error, + .destructor = afs_cm_destructor, +}; + +/* * route an incoming cache manager call * - return T if supported, F if not */ @@ -74,6 +86,9 @@ bool afs_cm_incoming_call(struct afs_call *call) case CBProbe: call-type = afs_SRXCBProbe; return true; + case CBGetCapabilities: + call-type = afs_SRXCBGetCapabilites; + return true; default: return false; } @@ -328,3 +343,86 @@ static int afs_deliver_cb_probe(struct afs_call *call, struct sk_buff *skb, schedule_work(call-work); return 0; } + +/* + * allow the fileserver to ask about the cache manager's capabilities + */ +static void SRXAFSCB_GetCapabilities(struct work_struct *work) +{ + struct afs_interface *ifs; + struct afs_call *call = container_of(work, struct afs_call, work); + int loop, nifs; + + struct { + struct /* InterfaceAddr */ { + __be32 nifs; + __be32 uuid[11]; + __be32 ifaddr[32]; + __be32 netmask[32]; + __be32 mtu[32]; + } ia; + struct /* Capabilities */ { + __be32 capcount; + __be32 caps[1]; + } cap; + } reply; + + _enter(); + + nifs = 0; + ifs = kcalloc(32, sizeof(*ifs), GFP_KERNEL); + if (ifs) { + nifs = afs_get_ipv4_interfaces(ifs, 32, false); + if (nifs 0) { + kfree(ifs); + ifs = NULL; + nifs = 0; + } + } + + memset(reply, 0, sizeof(reply)); + reply.ia.nifs = htonl(nifs); + + reply.ia.uuid[0] = htonl(afs_uuid.time_low); + reply.ia.uuid[1] = htonl(afs_uuid.time_mid); + reply.ia.uuid[2] = htonl(afs_uuid.time_hi_and_version); + reply.ia.uuid[3] = htonl((s8)
[PATCH 12/16] AFS: Update the AFS fs documentation [try #3]
Update the AFS fs documentation. Signed-Off-By: David Howells [EMAIL PROTECTED] --- Documentation/filesystems/afs.txt | 214 +++-- 1 files changed, 154 insertions(+), 60 deletions(-) diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt index 2f4237d..12ad6c7 100644 --- a/Documentation/filesystems/afs.txt +++ b/Documentation/filesystems/afs.txt @@ -1,31 +1,82 @@ + kAFS: AFS FILESYSTEM -ABOUT -= +Contents: + + - Overview. + - Usage. + - Mountpoints. + - Proc filesystem. + - The cell database. + - Security. + - Examples. + + + +OVERVIEW + -This filesystem provides a fairly simple AFS filesystem driver. It is under -development and only provides very basic facilities. It does not yet support -the following AFS features: +This filesystem provides a fairly simple secure AFS filesystem driver. It is +under development and does not yet provide the full feature set. The features +it does support include: - (*) Write support. - (*) Communications security. - (*) Local caching. - (*) pioctl() system call. - (*) Automatic mounting of embedded mountpoints. + (*) Security (currently only AFS kaserver and KerberosIV tickets). + (*) File reading. + (*) Automounting. + +It does not yet support the following AFS features: + + (*) Write support. + + (*) Local caching. + + (*) pioctl() system call. 
+ + +=== +COMPILATION +=== + +The filesystem should be enabled by turning on the kernel configuration +options: + + CONFIG_AF_RXRPC - The RxRPC protocol transport + CONFIG_RXKAD- The RxRPC Kerberos security handler + CONFIG_AFS - The AFS filesystem + +Additionally, the following can be turned on to aid debugging: + + CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled + CONFIG_AFS_DEBUG- Permit AFS debugging to be enabled + +They permit the debugging messages to be turned on dynamically by manipulating +the masks in the following files: + + /sys/module/af_rxrpc/parameters/debug + /sys/module/afs/parameters/debug + + += USAGE = When inserting the driver modules the root cell must be specified along with a list of volume location server IP addresses: - insmod rxrpc.o + insmod af_rxrpc.o + insmod rxkad.o insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91 -The first module is a driver for the RxRPC remote operation protocol, and the -second is the actual filesystem driver for the AFS filesystem. +The first module is the AF_RXRPC network protocol driver. This provides the +RxRPC remote operation protocol and may also be accessed from userspace. See: + + Documentation/networking/rxrpc.txt + +The second module is the kerberos RxRPC security driver, and the third module +is the actual filesystem driver for the AFS filesystem. Once the module has been loaded, more modules can be added by the following procedure: @@ -33,7 +84,7 @@ procedure: echo add grand.central.org 18.7.14.88:128.2.191.224 /proc/fs/afs/cells Where the parameters to the add command are the name of a cell and a list of -volume location servers within that cell. +volume location servers within that cell, with the latter separated by colons. Filesystems can be mounted anywhere by commands similar to the following: @@ -42,11 +93,6 @@ Filesystems can be mounted anywhere by commands similar to the following: mount -t afs #root.afs. /afs mount -t afs #root.cell. 
/afs/cambridge - NB: When using this on Linux 2.4, the mount command has to be different, - since the filesystem doesn't have access to the device name argument: - - mount -t afs none /afs -ovol=#root.afs. - Where the initial character is either a hash or a percent symbol depending on whether you definitely want a R/W volume (hash) or whether you'd prefer a R/O volume, but are willing to use a R/W volume instead (percent). @@ -60,55 +106,66 @@ named volume will be looked up in the cell specified during insmod. Additional cells can be added through /proc (see later section). +=== MOUNTPOINTS === -AFS has a concept of mountpoints. These are specially formatted symbolic links -(of the same form as the device name passed to mount). kAFS presents these -to the user as directories that have special properties: +AFS has a concept of mountpoints. In AFS terms, these are specially formatted +symbolic links (of the same form as the device name passed to mount). kAFS +presents these to the user as directories that have a follow-link capability +(ie: symbolic link semantics). If anyone attempts to access them, they will +automatically cause the target volume to be mounted (if possible) on that site. - (*) They cannot
Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior
On Tue, 2007-24-04 at 21:05 -0700, Stephen Hemminger wrote:
> Peter P Waskiewicz Jr wrote:
>
> Only if this is binary compatible with older kernels.

It is not. But I think that is a lesser problem; the bigger question is: why
would you need to change a qdisc just so you can support egress multiqueues?

BTW, is there any reason this is being cc'd to lkml?

cheers,
jamal

PS: I haven't read the kernel patches (I am congested and about 1000 messages
behind on netdev), and my opinions may be influenced by an approach I have
taken in trying to help someone fix up a wireless driver with multiqueue
support.
Re: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic
On Wed, Apr 25, 2007 at 10:27:59AM +0200, Eric Sesterhenn / Snakebyte wrote:
> * Herbert Xu ([EMAIL PROTECTED]) wrote:
> > Jarek Poplawski [EMAIL PROTECTED] wrote:
> > > My proposal is: maybe Eric could change this in xfrm6_tunnel_rcv()
> > > from xfrm6_tunnel.c e.g. like this:
> > >
> > > return xfrm6_rcv_spi(skb, spi) > 0 ? : 0;
> > >
> > > and, if no errors in testing, he could resubmit this patch?
> >
> > I agree, this is the right fix.
>
> The fix proposed by Jarek indeed fixes the problem, tested on two boxes,
> with an -rc5 kernel and yesterday's git.
>
> Acked-by: Eric Sesterhenn [EMAIL PROTECTED]
>
> --- linux-2.6/net/ipv6/xfrm6_tunnel.c.orig	2007-04-25 00:22:30.0 +0200
> +++ linux-2.6/net/ipv6/xfrm6_tunnel.c	2007-04-25 00:22:45.0 +0200
> @@ -261,7 +261,7 @@ static int xfrm6_tunnel_rcv(struct sk_bu
>  	__be32 spi;
>
>  	spi = xfrm6_tunnel_spi_lookup((xfrm_address_t *)&iph->saddr);
> -	return xfrm6_rcv_spi(skb, spi);
> +	return xfrm6_rcv_spi(skb, spi) > 0 ? : 0;
>  }
>
>  static int xfrm6_tunnel_err(struct sk_buff *skb, struct inet6_skb_parm *opt,

I think the main idea of fixing the problem, plus the testing, is Eric's
credit alone; I have only proposed a change in placement, of cosmetic value.
So, Eric, considering all the massive work you've done with rather feeble
support, please fix the comment and sign this patch (at least I'm not going
to). I would also ask Andrew to change the attribution of this patch in -mm.

Thanks & regards,
Jarek P.
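For readers who have not met the conditional with an omitted middle operand used in the proposed return statement: it is a GNU C extension, so the sketch below assumes GCC or Clang. The function names are made up for illustration and are not kernel functions; the second function is just an ISO C spelling that gives the same result.

```c
/*
 * GNU C extension: "a ? : b" means "a ? a : b", with "a" evaluated once.
 * Because '>' binds tighter than '?:', the expression below parses as
 * (ret > 0) ? (ret > 0) : 0, i.e. it yields 1 for a positive ret and
 * 0 for zero or a negative error code.
 */
static int positive_flag_gnu(int ret)
{
	return ret > 0 ? : 0;
}

/* Hypothetical portable equivalent, written out in ISO C. */
static int positive_flag_iso(int ret)
{
	return ret > 0 ? 1 : 0;
}
```

In other words, a positive return from the inner call is passed on as 1, while zero and negative values are both reported as 0.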
[GIT PATCH] [net-2.6.22] IPv6, IPv4 Updates
Dave, Please consider pulling following commits available on net-2.6.22-20070425a-inet6-cleanup-20070425 branch at git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev.git. HEADLINES - [IPV6] SIT: Unify code path to get hash array index. [IPV4] IPIP: Unify code path to get hash array index. [IPV4] IP_GRE: Unify code path to get hash array index. [IPV6]: Export in6addr_any for future use. [IPV6] XFRM: Use ip6addr_any where applicable. [IPV6] NDISC: Unify main process of sending ND messages. DIFFSTAT include/linux/in6.h|2 net/ipv4/ip_gre.c | 23 ++-- net/ipv4/ipip.c| 22 +--- net/ipv6/addrconf.c|2 net/ipv6/ndisc.c | 283 ++-- net/ipv6/sit.c | 23 ++-- net/ipv6/xfrm6_input.c |4 - 7 files changed, 112 insertions(+), 247 deletions(-) CHANGESETS -- commit ed808452811f1b5b55727ab6c5336a488d5689b4 Author: YOSHIFUJI Hideaki [EMAIL PROTECTED] Date: Tue Apr 24 20:44:47 2007 +0900 [IPV6] SIT: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 27fe10f..1efa95a 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -99,10 +99,10 @@ static struct ip_tunnel * ipip6_tunnel_lookup(__be32 remote, __be32 local) return NULL; } -static struct ip_tunnel ** ipip6_bucket(struct ip_tunnel *t) +static struct ip_tunnel **__ipip6_bucket(struct ip_tunnel_parm *parms) { - __be32 remote = t-parms.iph.daddr; - __be32 local = t-parms.iph.saddr; + __be32 remote = parms-iph.daddr; + __be32 local = parms-iph.saddr; unsigned h = 0; int prio = 0; @@ -117,6 +117,11 @@ static struct ip_tunnel ** ipip6_bucket(struct ip_tunnel *t) return tunnels[prio][h]; } +static inline struct ip_tunnel **ipip6_bucket(struct ip_tunnel *t) +{ + return __ipip6_bucket(t-parms); +} + static void ipip6_tunnel_unlink(struct ip_tunnel *t) { struct ip_tunnel **tp; @@ -147,19 +152,9 @@ static struct ip_tunnel * ipip6_tunnel_locate(struct ip_tunnel_parm *parms, int __be32 local = parms-iph.saddr; struct ip_tunnel *t, **tp, *nt; struct net_device 
*dev; - unsigned h = 0; - int prio = 0; char name[IFNAMSIZ]; - if (remote) { - prio |= 2; - h ^= HASH(remote); - } - if (local) { - prio |= 1; - h ^= HASH(local); - } - for (tp = tunnels[prio][h]; (t = *tp) != NULL; tp = t-next) { + for (tp = __ipip6_bucket(parms); (t = *tp) != NULL; tp = t-next) { if (local == t-parms.iph.saddr remote == t-parms.iph.daddr) return t; } --- commit 2f66586f53dd6319323c7d0c6ac0d4a4fb522865 Author: YOSHIFUJI Hideaki [EMAIL PROTECTED] Date: Tue Apr 24 20:44:47 2007 +0900 [IPV4] IPIP: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index 37ab391..ebd2f2d 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -157,10 +157,10 @@ static struct ip_tunnel * ipip_tunnel_lookup(__be32 remote, __be32 local) return NULL; } -static struct ip_tunnel **ipip_bucket(struct ip_tunnel *t) +static struct ip_tunnel **__ipip_bucket(struct ip_tunnel_parm *parms) { - __be32 remote = t-parms.iph.daddr; - __be32 local = t-parms.iph.saddr; + __be32 remote = parms-iph.daddr; + __be32 local = parms-iph.saddr; unsigned h = 0; int prio = 0; @@ -175,6 +175,10 @@ static struct ip_tunnel **ipip_bucket(struct ip_tunnel *t) return tunnels[prio][h]; } +static inline struct ip_tunnel **ipip_bucket(struct ip_tunnel *t) +{ + return __ipip_bucket(t-parms); +} static void ipip_tunnel_unlink(struct ip_tunnel *t) { @@ -206,19 +210,9 @@ static struct ip_tunnel * ipip_tunnel_locate(struct ip_tunnel_parm *parms, int c __be32 local = parms-iph.saddr; struct ip_tunnel *t, **tp, *nt; struct net_device *dev; - unsigned h = 0; - int prio = 0; char name[IFNAMSIZ]; - if (remote) { - prio |= 2; - h ^= HASH(remote); - } - if (local) { - prio |= 1; - h ^= HASH(local); - } - for (tp = tunnels[prio][h]; (t = *tp) != NULL; tp = t-next) { + for (tp = __ipip_bucket(parms); (t = *tp) != NULL; tp = t-next) { if (local == t-parms.iph.saddr remote == t-parms.iph.daddr) return t; } --- commit 
e8b22bea08420e24a09e32972f455c21206fe102 Author: YOSHIFUJI Hideaki [EMAIL PROTECTED] Date: Tue Apr 24 20:44:48 2007 +0900 [IPV4] IP_GRE: Unify code path to get hash array index. Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] diff --git a/net/ipv4
Re: netlink locking warnings in 2.6.21-rc7-mm1
David Miller wrote:
> I think I see what might be the problem, nlk->cb_mutex is set to
> rtnl_mutex and this is used for other purposes in various code paths
> here, maybe there is a double mutex_unlock() or similar due to that?

Nothing in the callbacks should be touching the rtnl; that would have been
broken before, since we already used to hold it during the first invocation
of the dump callback. The only difference is that we now hold it during the
entire dump operation. The cb_mutex is only set on socket creation, so
there's also nothing that should be rewriting it. I can't see what's wrong
here.
Re: netlink locking warnings in 2.6.21-rc7-mm1
Herbert Xu wrote:
> David Miller [EMAIL PROTECTED] wrote:
> > I think I see what might be the problem, nlk->cb_mutex is set to
> > rtnl_mutex and this is used for other purposes in various code paths
> > here, maybe there is a double mutex_unlock() or similar due to that?
>
> Indeed, the RTNL is held during the processing of all RTNETLINK messages
> so we'd be trying to lock it recursively here which is not allowed.

No, it is released before calling netlink_dump_start().

> Actually I'm not quite sure what the benefit is for allowing an override
> CB mutex. Since we still have to take it and we always allocate memory for
> a mutex anyway this would seem to be strictly worse than just using our
> own mutex.

The idea was that netlink families that don't want to consistently hold the
same mutex used for queue processing during the entire dump operation can
still have per-socket mutexes just to protect the callback data and have
concurrent dump continuations.
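The choice between an override mutex and a per-socket mutex discussed above can be pictured with a small userspace sketch. It uses pthread mutexes in place of kernel mutexes, and `demo_sock`, `demo_sock_init`, `demo_dump` and `demo_count_cb` are illustrative assumptions, not the kernel's netlink structures or functions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket that runs dump callbacks. */
struct demo_sock {
	pthread_mutex_t *cb_mutex;	/* lock actually taken around callbacks */
	pthread_mutex_t own_mutex;	/* per-socket default lock */
};

/*
 * Chosen once at creation time: use the caller's shared lock if one is
 * given, otherwise fall back to the socket's own lock so that dumps on
 * different sockets do not serialise against each other.
 */
static void demo_sock_init(struct demo_sock *sk, pthread_mutex_t *override)
{
	pthread_mutex_init(&sk->own_mutex, NULL);
	sk->cb_mutex = override ? override : &sk->own_mutex;
}

/* Run one dump step with whichever lock was selected at creation. */
static int demo_dump(struct demo_sock *sk, int (*cb)(void *), void *arg)
{
	int ret;

	pthread_mutex_lock(sk->cb_mutex);
	ret = cb(arg);
	pthread_mutex_unlock(sk->cb_mutex);
	return ret;
}

/* Trivial callback: bump a counter and return its new value. */
static int demo_count_cb(void *arg)
{
	int *count = arg;

	return ++*count;
}
```

A family that passes a shared lock gets dumps serialised with its queue processing; one that passes NULL only protects its own callback data.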
Re: [PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #3]
Andrew Morton [EMAIL PROTECTED] wrote:
> I'm ducking all feature and cleanup patches now, and probably shall
> continue to do so for some weeks. The priority (which I believe to be
> increasingly urgent) is to fix the 2.6.21 regressions and to stabilise the
> things which we presently have queued for 2.6.22. Not to mention the
> 1000ish unaddressed bug reports in bugzilla and elsewhere.

Fair enough. I think the idea is for them (or at least some of them) to go
through one of DaveM's net git trees anyway.

David
Re: Problem with commit a0ee18b9b7d3847976c6fb315c06a34fb296de0e
On Wednesday 25 April 2007 11:52:52 Ismail Dönmez wrote:
> Hi,
>
> On Tuesday 24 April 2007 00:23:13 Ismail Dönmez wrote:
> > On Tuesday 24 April 2007 00:17:40 Thomas Graf wrote:
> > > * Ismail Dönmez [EMAIL PROTECTED] 2007-04-23 22:09
> > > > Yes I know the fix is in but I wondered why it's creating such
> > > > problems with 2.6.18 kernel, guess it depends on some other commits.
> > >
> > > As long as you apply the complete patch including the additional
> > > sanity check for RTN_MAX it should work perfectly fine on 2.6.18.
> >
> > The sanity check part doesn't seem to apply to 2.6.18.
> >
> > > I can't think of any connection between the patch and the errors you
> > > are seeing. Are you absolutely sure the errors you see are directly
> > > connected to applying the patch?
> >
> > Yes actually I am but I'll re-test and see. Thanks.
>
> I was able to reproduce the same problem with Linus' GIT tree too. Since I
> started to see these after I applied the commit
> a0ee18b9b7d3847976c6fb315c06a34fb296de0e to 2.6.18 tree, there is a big
> possibility that the commit is the culprit. I attach the relevant dmesg
> messages. Problem happened after 12 hours of uptime and net connection
> gets stable again after 1-2 minutes.

Ignore this; my laptop seems to be totally dying now, including the wireless
card too. Sorry for the noise.

Regards,
ismail

--
Life is a game, and if you aren't in it to win, what the heck are you still
doing here? -- Linus Torvalds (talking about open source development)
Re: Getting the new RxRPC patches upstream
David Miller [EMAIL PROTECTED] wrote:
> Is it possible for your changes to be purely networking and not need those
> changes outside of the networking?

See my latest patchset release. I've reduced the dependencies on
non-networking changes to:

 (1) Oleg Nesterov's patch to change cancel_delayed_work() to use del_timer()
     rather than del_timer_sync() [patch 02/16].

     This patch can be discarded without compilation failure at the expense
     of making AFS slightly less efficient. It also makes AF_RXRPC slightly
     less efficient, but only in the rmmod path.

 (2) A symbol export in the keyring stuff plus a proliferation of the types
     available in the struct key::type_data union [patch 03/16]. This does
     not conflict with any other patches that I know about.

 (3) A symbol export in the timer stuff [patch 04/16].

Everything else that remains after the reduction is confined to the AF_RXRPC
or AFS code, save for a couple of networking patches in my patchset that you
already have and I just need to make the thing compile.

I'm not sure that I can make the AF_RXRPC patches totally independent of the
AFS patches as the two sets need to interleave since the last AF_RXRPC patch
deletes the old RxRPC code - which the old AFS code depends on.

David
Re: 2.6.20.7 mss negotiation and path mtu discovery mostly broken?
Ristuccia, Brian [EMAIL PROTECTED] writes:
> I'm seeing a problem where the kernel attempts to send packets with a MSS
> larger than the one negotiated when the TCP connection is established.
> Even after ICMP can't fragment messages arrive, the kernel still attempts
> to increase the MSS rather aggressively. The end result is extremely poor
> throughput when sending to a network with a smaller MTU.
>
> In /proc/sys/net/ipv4:
>
> ip_no_pmtu_disc:0
> tcp_mtu_probing:0
>
> The sending host (10.2.10.254) has an MTU of 9000. The destination host
> (12.33.234.69) has an MTU of 1500. There is one router between the hosts
> which will drop packets with the DF flag when they don't fit the
> destination interface's MTU and generates the required icmp can't fragment
> message.
>
> The dump shows the initial handshake with correct mss options sent:
>
> 08:39:55.493029 IP 12.33.234.69.35026 > 10.2.10.254.22: S 2768979373:2768979373(0) win 5840 mss 1460,sackOK,timestamp 3873837730 0,nop,wscale 2
> 08:39:55.493119 IP 10.2.10.254.22 > 12.33.234.69.35026: S 963242385:963242385(0) ack 2768979374 win 17896 mss 8960,sackOK,timestamp 413751

The MSS clamp for sending to 10.2.10.254.22 is 8960. MSS is only one way --
each uses what the other tells it.

> In the following dump, the system eventually gets in a state where it
> oscillates between sending undeliverable 2896 byte packets and deliverable
> 1448 byte ones.

This should only happen on PMTU expire, which is normally ~15mins. Perhaps
you misconfigured it manually using sysctl.

-And
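The sizes quoted in this report follow from simple header arithmetic. The sketch below assumes IPv4 and TCP headers without options (20 bytes each) and a timestamp option that occupies 12 bytes once padded; the constant and function names are made up for illustration.

```c
enum {
	IPV4_HDR_LEN  = 20,	/* IPv4 header without options */
	TCP_HDR_LEN   = 20,	/* TCP header without options */
	TCP_TSOPT_LEN = 12,	/* 10-byte timestamp option plus 2 bytes padding */
};

/* MSS a host would advertise for a given interface or path MTU. */
static int mss_for_mtu(int mtu)
{
	return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}

/* Payload left in each segment once the timestamp option is carried. */
static int payload_with_timestamps(int mss)
{
	return mss - TCP_TSOPT_LEN;
}

/* A sender should not exceed the peer's MSS or what the path MTU allows. */
static int send_mss(int peer_mss, int path_mtu)
{
	int path_mss = mss_for_mtu(path_mtu);

	return peer_mss < path_mss ? peer_mss : path_mss;
}
```

With these assumptions an MTU of 1500 gives the advertised 1460, an MTU of 9000 gives 8960, and 1460 minus the timestamp option gives the 1448-byte segments mentioned in the report; 2896 is twice 1448.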
RE: 2.6.20.7 mss negotiation and path mtu discovery mostly broken?
> > 08:39:55.493029 IP 12.33.234.69.35026 > 10.2.10.254.22: S 2768979373:2768979373(0) win 5840 mss 1460,sackOK,timestamp 3873837730 0,nop,wscale 2
> > 08:39:55.493119 IP 10.2.10.254.22 > 12.33.234.69.35026: S 963242385:963242385(0) ack 2768979374 win 17896 mss 8960,sackOK,timestamp 413751
>
> The MSS clamp for sending to 10.2.10.254.22 is 8960. MSS is only one
> way -- each uses what the other tells it.

Right - except that 10.2.10.254 keeps sending to 12.33.234.69 with an
increasingly large MSS, even though 12.33.234.69 has asked for no larger
than 1460.

> > In the following dump, the system eventually gets in a state where it
> > oscillates between sending undeliverable 2896 byte packets and
> > deliverable 1448 byte ones.
>
> This should only happen on PMTU expire, which is normally ~15mins. Perhaps
> you misconfigured it manually using sysctl.

This is /proc/sys/net/ipv4/route/mtu_expires? It's 600.

--
Brian Ristuccia
Re: [PATCH] usb-net/pegasus: fix pegasus carrier detection
In general I agree with the reasoning below. However, isn't it better to
remove the code that sets carrier on/off in intr_callback()? There's a
reliable way of getting the link status by reading the MII. Once the return
value of read_mii_word() is checked correctly, set_carrier() is good enough.
If 2 seconds is too long an interval we could reduce it to 1 second or, if
needed, less.

I'd like to avoid adding additional flags per device, as it will take forever
to collect information about their correct behavior and update pegasus.h.

In short, I think this part of your patch should be enough:

---
@@ -847,10 +848,16 @@ static void intr_callback(struct urb *urb)
 		 * d[0].NO_CARRIER kicks in only with failed TX.
 		 * ... so monitoring with MII may be safest.
 		 */
-		if (d[0] & NO_CARRIER)
-			netif_carrier_off(net);
-		else
-			netif_carrier_on(net);
-
 		/* bytes 3-4 == rx_lostpkt, reg 2E/2F */
 		pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
@@ -950,7 +957,7 @@ static void set_carrier(struct net_device *net)
 	pegasus_t *pegasus = netdev_priv(net);
 	u16 tmp;

-	if (!read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
+	if (read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
 		return;
---

cheers,
Petko

On Tue, 24 Apr 2007, Dan Williams wrote:
> On Tue, 2007-04-24 at 20:48 +0300, [EMAIL PROTECTED] wrote:
> On Tue, Apr 24, 2007 at 12:49:12PM -0400, Jeff Garzik wrote:
>
> Long term, Greg seemed OK with moving the net drivers from drivers/usb/net
> to drivers/net/usb, in line with the current policy of placing net drivers
> in drivers/net/*, bus agnostic. After that move, sending to netdev and me
> (as you did here) would be the preferred avenue.
>
> Speaking of which, do you want me to do this in the 2.6.22-rc1 timeframe?
> Usually big code moves like this are good to do right after rc1 comes out
> as the major churn is usually completed then.
>
> Sorry to interfere, but could you guys wait until tomorrow before applying
> the patch to your respective GIT trees? I'd like to check if the code is
> doing the right thing and avoid patch reversal.
>
> Original problem was that the patch I referenced in the commit message
> from Jan 6 2006 switched the return value semantics of read_mii_word().
> Before the patch, read_mii_word returned 1 on success, 0 on error. After
> the patch, it returns the generally accepted 0 on success and !0 on error.
> That causes set_carrier() to return immediately rather than fiddle with
> netif_carrier_*. When the Jan 6 2006 patch went in changing the return
> values, set_carrier() was not updated for the new return values. Nothing
> else in the code cares about read_mii_word()'s return value except
> set_carrier().
>
> But when the card is brought up and no cable is plugged in,
> intr_callback() gets called repeatedly, which itself repeatedly calls
> netif_carrier_on() due to the NO_CARRIER check. The comment there about
> "NO_CARRIER kicks in on TX failure" seems accurate, because even with no
> cable plugged in, and therefore no packets getting transmitted, the
> NO_CARRIER check is never true on the Belkin part. Therefore,
> netif_carrier_on() is always called as a result of the failure of
> d[0] & NO_CARRIER, turning carrier back on even if there is no cable
> plugged in. This bulldozes over the MII carrier_check routine too.
>
> I don't think the intr_callback() code should ever turn the carrier _on_,
> because there's that 2*HZ MII carrier check which can certainly handle the
> carrier on/off stuff. LINK_STATUS appears valid on the Belkin part too, so
> we can add that as a reverse-quirk and use LINK_STATUS on parts where it
> works. If you think that the NO_CARRIER check should be in _addition_ to
> the LINK_STATUS check, that's fine with me, provided that the NO_CARRIER
> check only turns carrier off.
>
> Dan
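The return-value mix-up described in this thread is easy to model in miniature. The sketch below is a userspace model, not the driver: `demo_nic`, `demo_read_mii` and `demo_set_carrier` are invented names. The 0-on-success convention comes from the discussion above, and the link-status bit value 0x0004 matches BMSR_LSTATUS in linux/mii.h; whether the driver tests exactly this bit is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BMSR_LSTATUS 0x0004	/* link-status bit, as BMSR_LSTATUS */

/* Hypothetical device state for the model. */
struct demo_nic {
	bool carrier;		/* what the stack believes about the link */
	uint16_t bmsr;		/* simulated MII status register */
	bool read_fails;	/* simulate a failed register read */
};

/* Follows the "0 on success, non-zero on error" convention. */
static int demo_read_mii(struct demo_nic *nic, uint16_t *val)
{
	if (nic->read_fails)
		return -1;
	*val = nic->bmsr;
	return 0;
}

/*
 * Periodic carrier check. With a 0-on-success read, the early return
 * must trigger on a NON-zero result; testing "!demo_read_mii(...)"
 * instead would bail out on every successful read and never update
 * the carrier state.
 */
static void demo_set_carrier(struct demo_nic *nic)
{
	uint16_t tmp;

	if (demo_read_mii(nic, &tmp))
		return;		/* read failed: leave carrier unchanged */
	nic->carrier = (tmp & DEMO_BMSR_LSTATUS) != 0;
}
```

The design point is the same one made in the thread: once the read result is tested the right way round, a single periodic check is enough to drive carrier both on and off.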
Re: [PATCH] usb-net/pegasus: fix pegasus carrier detection
On Wed, 2007-04-25 at 17:58 +0300, Petko Manolov wrote: In general i agree with the reasoning below. However, isn't it better to remove the code that sets carrier on/off in intr_callback()? I'm fine with this; whatever makes carrier status work makes me happy :) Dan There's a reliable way of getting the link status by reading the MII. After correct checking of the return value from read_mii_word(), set_carrier() is what is good enough. If 2 seconds is too long of an interval we could reduce it to 1 second or, if needed, less. I'd like to avoid adding additional flags per device as it will take forever to collect information about their correct behavior and update pegasus.h. In short i think this part of your patch should be enough: --- @@ -847,10 +848,16 @@ static void intr_callback(struct urb *urb) * d[0].NO_CARRIER kicks in only with failed TX. * ... so monitoring with MII may be safest. */ - if (d[0] NO_CARRIER) - netif_carrier_off(net); - else - netif_carrier_on(net); - /* bytes 3-4 == rx_lostpkt, reg 2E/2F */ pegasus-stats.rx_missed_errors += ((d[3] 0x7f) 8) | d[4]; @@ -950,7 +957,7 @@ static void set_carrier(struct net_device *net) pegasus_t *pegasus = netdev_priv(net); u16 tmp; - if (!read_mii_word(pegasus, pegasus-phy, MII_BMSR, tmp)) + if (read_mii_word(pegasus, pegasus-phy, MII_BMSR, tmp)) return; --- cheers, Petko On Tue, 24 Apr 2007, Dan Williams wrote: On Tue, 2007-04-24 at 20:48 +0300, [EMAIL PROTECTED] wrote: On Tue, Apr 24, 2007 at 12:49:12PM -0400, Jeff Garzik wrote: Long term, Greg seemed OK with moving the net drivers from drivers/usb/net to drivers/usb/net, in line with the current policy of placing net drivers in drivers/net/*, bus agnostic. After that move, sending to netdev and me (as you did here) would be the preferred avenue. Speaking of which, do you want me to do this in the 2.6.22-rc1 timeframe? Usually big code moves like this are good to do right after rc1 comes out as the major churn is usually completed then. 
Sorry to interfere, but could you guys wait until tomorrow before applying the patch to your respective GIT trees? I'd like to check if the code is doing the right thing and avoid patch reversal. Original problem was that the patch I referenced in the commit message from Jan 6 2006 switched the return value semantics from read_mii_word(). Before the patch, read_mii_word returned 1 on success, 0 on error. After the patch, it returns the generally accepted 0 on success and !0 on error. That causes set_carrier() to return immediately rather than fiddle with netif_carrier_*. When the Jan 6 2006 patch went in changing the return values, set_carrier() was not updated for the new return values. Nothing else in the code cares about read_mii_word()'s return value except set_carrier(). But when the card is brought up and no cable is plugged in, intr_callback() gets called repeatedly, which itself repeatedly calls netif_carrier_on() due to the NO_CARRIER check. The comment there about NO_CARRIER kicks in on TX failure seems accurate, because even with no cable plugged in, and therefore no packets getting transmitted, the NO_CARRIER check is never true on the Belkin part. Therefore, netif_carrier_on() is always called as a result of the failure of d[0] NO_CARRIER, turning carrier back on even if there is no cable plugged in. This bulldozes over the MII carrier_check routine too. I don't think the intr_callback() code should ever turn the carrier _on_, because there's that 2*HZ MII carrier check which can certainly handle the carrier on/off stuff. LINK_STATUS appears valid on the Belkin part too, so we can add that as a reverse-quirk and use LINK_STATUS on parts where it works. If you think that the NO_CARRIER check should be in _addition_ to the LINK_STATUS check, that's fine with me, provided that the NO_CARRIER check only turns carrier off. 
Dan
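The return-value mix-up described above can be shown in a few lines of plain C. This is only a userspace sketch, not the pegasus code: struct sketch_dev, the sketch_* functions and the SKETCH_BMSR_LSTATUS constant are all hypothetical stand-ins, and the "MII read" is simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link-status bit; stands in for the BMSR bit the driver tests. */
#define SKETCH_BMSR_LSTATUS 0x0004

/* Hypothetical device state; none of these names come from pegasus.c. */
struct sketch_dev {
	bool     mii_ok;   /* whether the simulated MII read succeeds */
	uint16_t bmsr;     /* simulated register contents */
	bool     carrier;  /* what the stack currently believes */
};

/* New convention from the thread: 0 on success, non-zero on error. */
int sketch_read_mii(const struct sketch_dev *dev, uint16_t *val)
{
	assert(dev != NULL && val != NULL);
	if (!dev->mii_ok)
		return -1;
	*val = dev->bmsr;
	return 0;
}

/* Caller still written for the old 1-on-success convention. */
void sketch_set_carrier_stale(struct sketch_dev *dev)
{
	uint16_t tmp = 0;

	if (!sketch_read_mii(dev, &tmp))  /* true on success now, so it bails out */
		return;
	dev->carrier = (tmp & SKETCH_BMSR_LSTATUS) != 0;
}

/* Caller updated for the 0-on-success convention. */
void sketch_set_carrier_fixed(struct sketch_dev *dev)
{
	uint16_t tmp = 0;

	if (sketch_read_mii(dev, &tmp))   /* bail out only on a failed read */
		return;
	dev->carrier = (tmp & SKETCH_BMSR_LSTATUS) != 0;
}
```

With a successful read that reports no link, the stale caller returns early and leaves carrier untouched, while the fixed caller clears it.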
Re: [PATCH] usb-net/pegasus: fix pegasus carrier detection
On Wed, 25 Apr 2007, Dan Williams wrote:

On Wed, 2007-04-25 at 17:58 +0300, Petko Manolov wrote:

In general i agree with the reasoning below. However, isn't it better to
remove the code that sets carrier on/off in intr_callback()?

I'm fine with this; whatever makes carrier status work makes me happy :)

Great. Are you going to submit the new patch or this hard labor will lay on
my shoulders? :)

Petko
[PATCH][XFRM] export SAD info
Dave,

Something I've been meaning to do since you made the hash changes. I will be
doing one also for policy. Against latest Linus tree because I am having
strange challenges syncing net-2.6.22.

cheers,
jamal

[XFRM] export SAD info

On a system with a lot of SAs, counting SAD entries chews useful CPU time
since you need to dump the whole SAD to user space; i.e. something like

	ip xfrm state ls | grep -i src | wc -l

I have seen this take literally minutes on 40K SAs when the system is
swapping. With this patch, some of the SAD info (that was already being
tracked) is exposed to user space. i.e. you do:

	ip xfrm state count

And you get the count; you can also pass -s to the command line and get the
hash info.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 1fb99604e38f27c1ad4cb74b11f148c34d0d3be6
tree 1bb35db627ac5d3d2f370d0fc993ba6b80392696
parent 146d97b89c83c9460012185bfd584d21a3b5fe19
author Jamal Hadi Salim [EMAIL PROTECTED] Wed, 25 Apr 2007 11:30:21 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Wed, 25 Apr 2007 11:30:21 -0400

 include/linux/xfrm.h  |   25 ++
 include/net/xfrm.h    |    8 +++
 net/xfrm/xfrm_state.c |   12 ++-
 net/xfrm/xfrm_user.c  |   56 +
 4 files changed, 100 insertions(+), 1 deletions(-)

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 15ca89e..9c656a5 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -181,6 +181,10 @@ enum {
 	XFRM_MSG_MIGRATE,
 #define XFRM_MSG_MIGRATE XFRM_MSG_MIGRATE
+	XFRM_MSG_NEWSADINFO,
+#define XFRM_MSG_NEWSADINFO XFRM_MSG_NEWSADINFO
+	XFRM_MSG_GETSADINFO,
+#define XFRM_MSG_GETSADINFO XFRM_MSG_GETSADINFO
 	__XFRM_MSG_MAX
 };
 #define XFRM_MSG_MAX (__XFRM_MSG_MAX - 1)
@@ -234,6 +238,17 @@ enum xfrm_ae_ftype_t {
 #define XFRM_AE_MAX (__XFRM_AE_MAX - 1)
 };
+/* SAD Table filter flags */
+enum xfrm_sad_ftype_t {
+	XFRM_SAD_UNSPEC,
+	XFRM_SAD_HMASK=1,
+	XFRM_SAD_HMAX=2,
+	XFRM_SAD_CNT=4,
+	__XFRM_SAD_MAX
+
+#define XFRM_SAD_MAX (__XFRM_SAD_MAX - 1)
+};
+
 struct xfrm_userpolicy_type {
 	__u8		type;
 	__u16		reserved1;
@@ -265,6 +280,16 @@ enum xfrm_attr_type_t {
 #define XFRMA_MAX (__XFRMA_MAX - 1)
 };
+enum xfrm_sadattr_type_t {
+	XFRMA_SAD_UNSPEC,
+	XFRMA_SADHMASK,
+	XFRMA_SADHMAX,
+	XFRMA_SADCNT,
+	__XFRMA_SAD_MAX
+
+#define XFRMA_SAD_MAX (__XFRMA_SAD_MAX - 1)
+};
+
 struct xfrm_usersa_info {
 	struct xfrm_selector		sel;
 	struct xfrm_id			id;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 5a00aa8..4922e9f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -416,6 +416,13 @@ struct xfrm_audit
 	u32	secid;
 };
+/* SAD metadata, add more later */
+struct xfrm_sadinfo
+{
+	u32 sadhcnt; /* current hash bkts */
+	u32 sadhmcnt; /* max allowed hash bkts */
+	u32 sadcnt; /* current running count */
+};
 #ifdef CONFIG_AUDITSYSCALL
 extern void xfrm_audit_log(uid_t auid, u32 secid, int type, int result,
 			   struct xfrm_policy *xp, struct xfrm_state *x);
@@ -938,6 +945,7 @@ static inline int xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **s
 extern struct xfrm_state *xfrm_find_acq_byseq(u32 seq);
 extern int xfrm_state_delete(struct xfrm_state *x);
 extern void xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
+extern void xfrm_sad_getinfo(struct xfrm_sadinfo *si);
 extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
 extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq);
 extern void xfrm_replay_notify(struct xfrm_state *x, int event);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c1581fb..98e5ce3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -53,7 +53,7 @@ static struct hlist_head *xfrm_state_bysrc __read_mostly;
 static struct hlist_head *xfrm_state_byspi __read_mostly;
 static unsigned int xfrm_state_hmask __read_mostly;
 static unsigned int xfrm_state_hashmax __read_mostly = 1 * 1024 * 1024;
-static u32 xfrm_state_num;
+static unsigned int xfrm_state_num;
 static unsigned int xfrm_state_genid;
 static inline unsigned int xfrm_dst_hash(xfrm_address_t *daddr,
@@ -421,6 +421,16 @@ restart:
 }
 EXPORT_SYMBOL(xfrm_state_flush);
+void xfrm_sad_getinfo(struct xfrm_sadinfo *si)
+{
+	spin_lock_bh(&xfrm_state_lock);
+	si->sadcnt = xfrm_state_num;
+	si->sadhcnt = xfrm_state_hmask;
+	si->sadhmcnt = xfrm_state_hashmax;
+	spin_unlock_bh(&xfrm_state_lock);
+}
+EXPORT_SYMBOL(xfrm_sad_getinfo);
+
 static int
 xfrm_init_tempsel(struct xfrm_state *x, struct flowi *fl,
 		  struct xfrm_tmpl *tmpl,
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 816e369..089159a 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -672,6 +672,61 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff
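The saving the patch is after can be illustrated outside the kernel. The sketch below is an assumption-laden userspace model, not the XFRM code: struct sketch_sad, struct sketch_state and every sketch_* function are hypothetical, the SAD is modelled as a simple linked list, and the locking that xfrm_sad_getinfo() does is left out. It contrasts walking every entry with copying counters that are already maintained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same three fields as the patch's struct xfrm_sadinfo. */
struct sketch_sadinfo {
	uint32_t sadhcnt;   /* hash bucket info */
	uint32_t sadhmcnt;  /* max allowed hash buckets */
	uint32_t sadcnt;    /* running count of entries */
};

/* Hypothetical stand-in for one SA entry. */
struct sketch_state {
	struct sketch_state *next;
};

/* Hypothetical stand-in for the SAD and its bookkeeping. */
struct sketch_sad {
	struct sketch_state *head;
	unsigned int num;      /* updated on every insert */
	unsigned int hmask;
	unsigned int hashmax;
};

void sketch_sad_insert(struct sketch_sad *sad, struct sketch_state *x)
{
	assert(sad != NULL && x != NULL);
	x->next = sad->head;
	sad->head = x;
	sad->num++;
}

/* O(n): roughly what counting through a full dump costs. */
unsigned int sketch_count_by_walk(const struct sketch_sad *sad)
{
	unsigned int n = 0;
	const struct sketch_state *x;

	for (x = sad->head; x != NULL; x = x->next)
		n++;
	return n;
}

/* O(1): copy out counters that are tracked anyway (locking omitted). */
void sketch_sad_getinfo(const struct sketch_sad *sad, struct sketch_sadinfo *si)
{
	assert(sad != NULL && si != NULL);
	si->sadcnt = sad->num;
	si->sadhcnt = sad->hmask;
	si->sadhmcnt = sad->hashmax;
}
```

Both paths report the same number; only the second stays constant-time as the table grows.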
[PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #4]
The first of these patches together provide secure client-side RxRPC
connectivity as a Linux kernel socket family. Only the RxRPC
transport/session side is supplied - the presentation side (marshalling the
data) is left to the client. Copies of the patches can be found here:

	http://people.redhat.com/~dhowells/rxrpc/series
	http://people.redhat.com/~dhowells/rxrpc/01-move-skb-generic.diff
	http://people.redhat.com/~dhowells/rxrpc/02-cancel_delayed_work.diff
	http://people.redhat.com/~dhowells/rxrpc/03-keys.diff
	http://people.redhat.com/~dhowells/rxrpc/04-timer-exports.diff
	http://people.redhat.com/~dhowells/rxrpc/05-af_rxrpc.diff

Further patches make the in-kernel AFS filesystem use AF_RXRPC and delete the
old RxRPC implementation:

	http://people.redhat.com/~dhowells/rxrpc/06-afs-cleanup.diff
	http://people.redhat.com/~dhowells/rxrpc/07-af_rxrpc-kernel.diff
	http://people.redhat.com/~dhowells/rxrpc/08-af_rxrpc-afs.diff
	http://people.redhat.com/~dhowells/rxrpc/09-af_rxrpc-delete-old.diff

And then the rest of the patches extend AFS to provide automatic unmounting
of automount trees, security support and directory-level write support
(create, mkdir, etc.):

	http://people.redhat.com/~dhowells/rxrpc/10-afs-multimount.diff
	http://people.redhat.com/~dhowells/rxrpc/11-afs-security.diff
	http://people.redhat.com/~dhowells/rxrpc/12-afs-doc.diff
	http://people.redhat.com/~dhowells/rxrpc/13-netlink-support-MSG_TRUNC.diff
	http://people.redhat.com/~dhowells/rxrpc/14-afs-get-capabilities.diff
	http://people.redhat.com/~dhowells/rxrpc/15-afs-initcallbackstate3.diff
	http://people.redhat.com/~dhowells/rxrpc/16-afs-dir-write-support.diff

Note that file-level write support is not yet complete and so is not included
in this patch set.

The userspace access methods make use of the control data passed to/by
sendmsg() and recvmsg(). See the three simple test programs:

	http://people.redhat.com/~dhowells/rxrpc/klog.c
	http://people.redhat.com/~dhowells/rxrpc/rxrpc.c
	http://people.redhat.com/~dhowells/rxrpc/listen.c

The klog program is provided to go and get a Kerberos IV key from the AFS
kaserver. Currently it must be edited before compiling to note the right
server IP address and the appropriate credentials.

These programs can be compiled by:

	make klog rxrpc listen CFLAGS="-Wall -g" LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

Then a ticket can be obtained by:

	./klog

If a security key is acquired in this way, then all subsequent AFS operations
- including VL lookups and mounts - performed with that session keyring will
be authenticated using that key. The key can be viewed like so:

	[EMAIL PROTECTED] ~]# keyctl show
	Session Keyring
	       -3 --alswrv      0     0  keyring: _ses.3268
	        2 --alswrv      0     0   \_ keyring: _uid.0
	111416553 --als--v      0     0   \_ rxrpc: [EMAIL PROTECTED]

TODO:

 (*) Make certain parameters (such as connection timeouts) userspace
     configurable.

 (*) Make userspace utilities use it; librxrpc.

 (*) Userspace documentation.

 (*) KerberosV security.

Changes:

 (*) SOCK_RPC has been removed. SOCK_DGRAM is now used instead.

 (*) I've added a facility whereby calls can be made to destinations other
     than the connect() address of a client socket by making use of msg_name
     in the msghdr struct when using sendmsg() to send the first data packet
     of a call. Indeed, a client socket need not be connected before being
     used so.

 (*) I've also added a facility whereby client calls may also be made on
     server sockets, again by using msg_name in the msghdr struct. In such a
     case, the server's local transport endpoint is used.

 (*) I've made the write buffer space check available to various callers
     (sk_write_space) and implemented poll support.

 (*) Rewrote rxrpc_recvmsg(). It now concatenates adjacent data messages
     from the same call when delivering them.

 (*) Updated the documentation to include notes on recvmsg, cover control
     messages and cover SOL_RXRPC-level socket options.

 (*) Provided an in-kernel interface to give in-kernel utilities easier
     access to the facility.

 (*) Made fs/afs/ use it.

 (*) Deleted the old contents of net/rxrpc/.

 (*) Use the scatterlist interface to the crypto API for now. The patch that
     added the direct access interface conflicts with patches Herbert Xu is
     producing, so I've dropped it for the moment.

 (*) Moved a bug fix to make secure connection reuse work from the
     af_rxrpc-kernel patch to the af_rxrpc main patch.

 (*) Make RxRPC use its own private work queues rather than keventd's to
     avoid deadlocks when AFS tries to use keventd too. This also puts
     encryption in the private work queue rather than keventd's queue as
     that might take a relatively long time to
[PATCH 02/16] cancel_delayed_work: use del_timer() instead of del_timer_sync() [try #4]
del_timer_sync() buys nothing for cancel_delayed_work(), but it is less
efficient since it locks the timer unconditionally, and may wait for the
completion of the delayed_work_timer_fn().

cancel_delayed_work() == 0 means:

	before this patch:
		work->func may still be running or queued

	after this patch:
		work->func may still be running or queued, or
		delayed_work_timer_fn->__queue_work() in progress.

		The latter doesn't differ from the caller's POV,
		delayed_work_timer_fn() is called with _PENDING bit set.

cancel_delayed_work() == 1 with this patch adds a new possibility:

	delayed_work->work was cancelled, but delayed_work_timer_fn is still
	running (this is only possible for the re-arming works on
	single-threaded workqueue).

	In this case the timer was re-started by work->func(), nobody else
	can do this. This in turn means that delayed_work_timer_fn has
	already passed __queue_work() (and won't touch delayed_work) because
	nobody else can queue delayed_work->work.

Signed-off-by: Oleg Nesterov [EMAIL PROTECTED]
Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 include/linux/workqueue.h |    7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2a7b38d..b8abfc7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -191,14 +191,15 @@ int execute_in_process_context(work_func_t fn, struct execute_work *);
 /*
  * Kill off a pending schedule_delayed_work().  Note that the work callback
- * function may still be running on return from cancel_delayed_work().  Run
- * flush_scheduled_work() to wait on it.
+ * function may still be running on return from cancel_delayed_work(), unless
+ * it returns 1 and the work doesn't re-arm itself. Run flush_workqueue() or
+ * cancel_work_sync() to wait on it.
  */
 static inline int cancel_delayed_work(struct delayed_work *work)
 {
 	int ret;
-	ret = del_timer_sync(&work->timer);
+	ret = del_timer(&work->timer);
 	if (ret)
 		work_release(&work->work);
 	return ret;
[PATCH 01/16] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code [try #4]
Move generic skbuff stuff from XFRM code to generic code so that AF_RXRPC can
use it too.

The kdoc comments I've attached to the functions need to be checked by
whoever wrote them as I had to make some guesses about the workings of these
functions.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 include/linux/skbuff.h |    6 ++
 include/net/esp.h      |    2 -
 net/core/skbuff.c      |  188
 net/xfrm/xfrm_algo.c   |  169 ---
 4 files changed, 194 insertions(+), 171 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5992f65..c905d42 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -83,6 +83,7 @@
  */
 struct net_device;
+struct scatterlist;
 #ifdef CONFIG_NETFILTER
 struct nf_conntrack {
@@ -361,6 +362,11 @@ extern struct sk_buff *skb_realloc_headroom(struct sk_buff *skb,
 extern struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
 				       int newheadroom, int newtailroom,
 				       gfp_t priority);
+extern int	       skb_to_sgvec(struct sk_buff *skb,
+				    struct scatterlist *sg, int offset,
+				    int len);
+extern int	       skb_cow_data(struct sk_buff *skb, int tailbits,
+				    struct sk_buff **trailer);
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	kfree_skb(a)
 extern void	      skb_over_panic(struct sk_buff *skb, int len,
diff --git a/include/net/esp.h b/include/net/esp.h
index 713d039..d05d8d2 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -40,8 +40,6 @@ struct esp_data
 	} auth;
 };
-extern int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len);
-extern int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
 extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 static inline int esp_mac_digest(struct esp_data *esp, struct sk_buff *skb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 336958f..aa02bd4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -55,6 +55,7 @@
 #include <linux/cache.h>
 #include <linux/rtnetlink.h>
 #include <linux/init.h>
+#include <linux/scatterlist.h>
 #include <net/protocol.h>
 #include <net/dst.h>
@@ -2005,6 +2006,190 @@ void __init skb_init(void)
 						NULL, NULL);
 }
+/**
+ *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
+ *	@skb: Socket buffer containing the buffers to be mapped
+ *	@sg: The scatter-gather list to map into
+ *	@offset: The offset into the buffer's contents to start mapping
+ *	@len: Length of buffer space to be mapped
+ *
+ *	Fill the specified scatter-gather list with mappings/pointers into a
+ *	region of the buffer space attached to a socket buffer.
+ */
+int
+skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+{
+	int start = skb_headlen(skb);
+	int i, copy = start - offset;
+	int elt = 0;
+
+	if (copy > 0) {
+		if (copy > len)
+			copy = len;
+		sg[elt].page = virt_to_page(skb->data + offset);
+		sg[elt].offset = (unsigned long)(skb->data + offset) % PAGE_SIZE;
+		sg[elt].length = copy;
+		elt++;
+		if ((len -= copy) == 0)
+			return elt;
+		offset += copy;
+	}
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+
+		BUG_TRAP(start <= offset + len);
+
+		end = start + skb_shinfo(skb)->frags[i].size;
+		if ((copy = end - offset) > 0) {
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+			if (copy > len)
+				copy = len;
+			sg[elt].page = frag->page;
+			sg[elt].offset = frag->page_offset+offset-start;
+			sg[elt].length = copy;
+			elt++;
+			if (!(len -= copy))
+				return elt;
+			offset += copy;
+		}
+		start = end;
+	}
+
+	if (skb_shinfo(skb)->frag_list) {
+		struct sk_buff *list = skb_shinfo(skb)->frag_list;
+
+		for (; list; list = list->next) {
+			int end;
+
+			BUG_TRAP(start <= offset + len);
+
+			end = start + list->len;
+			if ((copy = end - offset) > 0) {
+				if (copy > len)
+					copy = len;
+				elt += skb_to_sgvec(list, sg+elt, offset - start, copy);
+				if ((len -= copy) == 0)
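The core loop of skb_to_sgvec(), walking a chain of buffer pieces and emitting one entry per piece that overlaps the requested byte range, can be sketched in plain C without any kernel types. Everything here is hypothetical (struct sketch_seg, struct sketch_sg, sketch_map_range); it models only the offset/length arithmetic over a flat array of segments, not pages, fragments or frag lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one contiguous piece of a larger logical buffer. */
struct sketch_seg {
	size_t len;
};

/* Hypothetical scatter-gather entry: which piece, where in it, how much. */
struct sketch_sg {
	size_t seg;
	size_t offset;
	size_t length;
};

/*
 * Map the byte range [offset, offset + len) of the logical buffer onto the
 * pieces in segs[], writing one entry per overlapping piece into out[].
 * Returns the number of entries written.
 */
size_t sketch_map_range(const struct sketch_seg *segs, size_t nsegs,
			size_t offset, size_t len,
			struct sketch_sg *out, size_t max_out)
{
	size_t start = 0;  /* logical offset at which segs[i] begins */
	size_t n = 0;
	size_t i;

	for (i = 0; i < nsegs && len > 0; i++) {
		size_t end = start + segs[i].len;

		if (offset < end) {
			size_t copy = end - offset;  /* bytes left in this piece */

			if (copy > len)
				copy = len;
			assert(n < max_out);
			out[n].seg = i;
			out[n].offset = offset - start;
			out[n].length = copy;
			n++;
			len -= copy;
			offset += copy;
		}
		start = end;
	}
	return n;
}
```

For pieces of 10, 20 and 30 bytes, a request for 30 bytes starting at offset 5 yields three entries: the last 5 bytes of the first piece, all 20 of the second, and the first 5 of the third.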
[PATCH 03/16] AF_RXRPC: Key facility changes for AF_RXRPC [try #4]
Export the keyring key type definition and document its availability.

Add alternative types into the key's type_data union to make it more useful.
Not all users necessarily want to use it as a list_head (AF_RXRPC doesn't,
for example), so make it clear that it can be used in other ways.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 Documentation/keys.txt  |   12
 include/linux/key.h     |    2 ++
 security/keys/keyring.c |    2 ++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 60c665d..81d9aa0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -859,6 +859,18 @@ payload contents for more information.
 	void unregister_key_type(struct key_type *type);
+Under some circumstances, it may be desirable to desirable to deal with a
+bundle of keys. The facility provides access to the keyring type for managing
+such a bundle:
+
+	struct key_type key_type_keyring;
+
+This can be used with a function such as request_key() to find a specific
+keyring in a process's keyrings. A keyring thus found can then be searched
+with keyring_search(). Note that it is not possible to use request_key() to
+search a specific keyring, so using keyrings in this way is of limited utility.
+
+
 ===
 NOTES ON ACCESSING PAYLOAD CONTENTS
 ===
diff --git a/include/linux/key.h b/include/linux/key.h
index 169f05e..a9220e7 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -160,6 +160,8 @@ struct key {
 	 */
 	union {
 		struct list_head	link;
+		unsigned long		x[2];
+		void			*p[2];
 	} type_data;
 	/* key data
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index ad45ce7..88292e3 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -66,6 +66,8 @@ struct key_type key_type_keyring = {
 	.read		= keyring_read,
 };
+EXPORT_SYMBOL(key_type_keyring);
+
 /*
  * semaphore to serialise link/link calls to prevent two link calls in parallel
  * introducing a cycle
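A small userspace sketch of why the extra union members are cheap: a list_head is two pointers, so two longs or two void pointers fit in the same storage. The types below (struct sketch_list_head, struct sketch_key, sketch_key_set_pair) are hypothetical stand-ins and only mirror the shape of the type_data union in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Hypothetical key holding only the union under discussion. */
struct sketch_key {
	union {
		struct sketch_list_head link;  /* original use: list linkage */
		unsigned long x[2];            /* alternative: two integers */
		void *p[2];                    /* alternative: two pointers */
	} type_data;
};

/* A key type that does not need list linkage can stash two pointers. */
void sketch_key_set_pair(struct sketch_key *key, void *first, void *second)
{
	assert(key != NULL);
	key->type_data.p[0] = first;
	key->type_data.p[1] = second;
}
```

The design choice is that each key type picks exactly one view of the union; mixing views on the same key would clobber the data.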
[PATCH 10/16] AFS: Handle multiple mounts of an AFS superblock correctly [try #4]
Handle multiple mounts of an AFS superblock correctly, checking to see
whether the superblock is already initialised after calling sget() rather
than just unconditionally stamping all over it.

Also delete the "silent" parameter to afs_fill_super() as it's not used and
can, in any case, be obtained from sb->s_flags.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/afs/super.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/afs/super.c b/fs/afs/super.c
index efc4fe6..77e6875 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -212,7 +212,7 @@ static int afs_test_super(struct super_block *sb, void *data)
 /*
  * fill in the superblock
  */
-static int afs_fill_super(struct super_block *sb, void *data, int silent)
+static int afs_fill_super(struct super_block *sb, void *data)
 {
 	struct afs_mount_params *params = data;
 	struct afs_super_info *as = NULL;
@@ -319,17 +319,23 @@ static int afs_get_sb(struct file_system_type *fs_type,
 		goto error;
 	}
-	sb->s_flags = flags;
-
-	ret = afs_fill_super(sb, &params, flags & MS_SILENT ? 1 : 0);
-	if (ret < 0) {
-		up_write(&sb->s_umount);
-		deactivate_super(sb);
-		goto error;
+	if (!sb->s_root) {
+		/* initial superblock/root creation */
+		_debug("create");
+		sb->s_flags = flags;
+		ret = afs_fill_super(sb, &params);
+		if (ret < 0) {
+			up_write(&sb->s_umount);
+			deactivate_super(sb);
+			goto error;
+		}
+		sb->s_flags |= MS_ACTIVE;
+	} else {
+		_debug("reuse");
+		ASSERTCMP(sb->s_flags, &, MS_ACTIVE);
 	}
-	sb->s_flags |= MS_ACTIVE;
-	simple_set_mnt(mnt, sb);
+	simple_set_mnt(mnt, sb);
 	afs_put_volume(params.volume);
 	afs_put_cell(params.default_cell);
 	_leave(" = 0 [%p]", sb);
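The shape of the fix, fill the superblock only on first use and merely sanity-check it on reuse, can be sketched in userspace C. None of the names below are from fs/afs: struct sketch_sb, SKETCH_ACTIVE, sketch_fill_super() and sketch_get_sb() are hypothetical, and the "root" pointer just marks whether initialisation has happened.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ACTIVE 0x1u  /* hypothetical stand-in for MS_ACTIVE */

/* Hypothetical superblock: root is NULL until it has been filled in. */
struct sketch_sb {
	void *root;
	unsigned int flags;
	int fill_count;  /* how many times the fill routine ran */
};

/* Hypothetical fill routine; always succeeds in this sketch. */
int sketch_fill_super(struct sketch_sb *sb)
{
	sb->root = sb;
	sb->fill_count++;
	return 0;
}

/* Called once per mount request for a superblock that may already exist. */
int sketch_get_sb(struct sketch_sb *sb, unsigned int flags)
{
	assert(sb != NULL);
	if (!sb->root) {
		/* first mount: initialise and mark active */
		int ret;

		sb->flags = flags;
		ret = sketch_fill_super(sb);
		if (ret < 0)
			return ret;
		sb->flags |= SKETCH_ACTIVE;
	} else {
		/* later mount: reuse, do not overwrite existing state */
		assert(sb->flags & SKETCH_ACTIVE);
	}
	return 0;
}
```

Mounting the same superblock twice then runs the fill routine once, and the second mount's flags do not overwrite the first's.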
[PATCH 15/16] AFS: Implement the CB.InitCallBackState3 operation [try #4]
Implement the CB.InitCallBackState3 operation for the fileserver to call.
This reduces the amount of network traffic because if this op is aborted,
the fileserver will then attempt a CB.InitCallBackState operation.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/afs/afs_cm.h    |    1 +
 fs/afs/cmservice.c |   46 ++
 2 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/fs/afs/afs_cm.h b/fs/afs/afs_cm.h
index d4bd201..7b4d4fa 100644
--- a/fs/afs/afs_cm.h
+++ b/fs/afs/afs_cm.h
@@ -23,6 +23,7 @@ enum AFS_CM_Operations {
 	CBGetCE			= 208,	/* get cache file description */
 	CBGetXStatsVersion	= 209,	/* get version of extended statistics */
 	CBGetXStats		= 210,	/* get contents of extended statistics data */
+	CBInitCallBackState3	= 213,	/* initialise callback state, version 3 */
 	CBGetCapabilities	= 65538,	/* get client capabilities */
 };
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index f8ad36b..32deb04 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -20,6 +20,8 @@ struct workqueue_struct *afs_cm_workqueue;
 static int afs_deliver_cb_init_call_back_state(struct afs_call *,
 					       struct sk_buff *, bool);
+static int afs_deliver_cb_init_call_back_state3(struct afs_call *,
+						struct sk_buff *, bool);
 static int afs_deliver_cb_probe(struct afs_call *, struct sk_buff *, bool);
 static int afs_deliver_cb_callback(struct afs_call *, struct sk_buff *, bool);
 static int afs_deliver_cb_get_capabilities(struct afs_call *, struct sk_buff *,
@@ -47,6 +49,16 @@ static const struct afs_call_type afs_SRXCBInitCallBackState = {
 };
 /*
+ * CB.InitCallBackState3 operation type
+ */
+static const struct afs_call_type afs_SRXCBInitCallBackState3 = {
+	.name		= "CB.InitCallBackState3",
+	.deliver	= afs_deliver_cb_init_call_back_state3,
+	.abort_to_error	= afs_abort_to_error,
+	.destructor	= afs_cm_destructor,
+};
+
+/*
  * CB.Probe operation type
  */
 static const struct afs_call_type afs_SRXCBProbe = {
@@ -83,6 +95,9 @@ bool afs_cm_incoming_call(struct afs_call *call)
 	case CBInitCallBackState:
 		call->type = &afs_SRXCBInitCallBackState;
 		return true;
+	case CBInitCallBackState3:
+		call->type = &afs_SRXCBInitCallBackState3;
+		return true;
 	case CBProbe:
 		call->type = &afs_SRXCBProbe;
 		return true;
@@ -312,6 +327,37 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *call,
 }
 /*
+ * deliver request data to a CB.InitCallBackState3 call
+ */
+static int afs_deliver_cb_init_call_back_state3(struct afs_call *call,
+						struct sk_buff *skb,
+						bool last)
+{
+	struct afs_server *server;
+	struct in_addr addr;
+
+	_enter(",{%u},%d", skb->len, last);
+
+	if (!last)
+		return 0;
+
+	/* no unmarshalling required */
+	call->state = AFS_CALL_REPLYING;
+
+	/* we'll need the file server record as that tells us which set of
+	 * vnodes to operate upon */
+	memcpy(&addr, &skb->nh.iph->saddr, 4);
+	server = afs_find_server(&addr);
+	if (!server)
+		return -ENOTCONN;
+	call->server = server;
+
+	INIT_WORK(&call->work, SRXAFSCB_InitCallBackState);
+	schedule_work(&call->work);
+	return 0;
+}
+
+/*
  * allow the fileserver to see if the cache manager is still alive
  */
 static void SRXAFSCB_Probe(struct work_struct *work)
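The routing step that the patch extends, picking a call type from the incoming operation ID and rejecting anything unknown, reduces to a switch. The sketch below is a userspace model under stated assumptions: the two operation IDs (213 and 65538) are the ones shown in the patch, while struct sketch_call, struct sketch_call_type and sketch_incoming_call() are hypothetical and carry only a name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Operation IDs as they appear in the patch's afs_cm.h hunk. */
enum sketch_cm_operations {
	SKETCH_CBInitCallBackState3 = 213,
	SKETCH_CBGetCapabilities    = 65538,
};

/* Hypothetical, stripped-down call type: just a printable name. */
struct sketch_call_type {
	const char *name;
};

/* Hypothetical incoming call record. */
struct sketch_call {
	unsigned int operation_id;
	const struct sketch_call_type *type;
};

static const struct sketch_call_type sketch_init3_type = {
	.name = "CB.InitCallBackState3",
};

static const struct sketch_call_type sketch_getcaps_type = {
	.name = "CB.GetCapabilities",
};

/* Route an incoming call; returns true if the operation is supported. */
bool sketch_incoming_call(struct sketch_call *call)
{
	assert(call != NULL);
	switch (call->operation_id) {
	case SKETCH_CBInitCallBackState3:
		call->type = &sketch_init3_type;
		return true;
	case SKETCH_CBGetCapabilities:
		call->type = &sketch_getcaps_type;
		return true;
	default:
		call->type = NULL;
		return false;
	}
}
```

An unsupported ID returns false, which is the case where, per the description above, the peer would fall back to the older operation.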
Re: [PATCH][XFRM] export SAD info
That patch has xfrm_state_num being mucked with; just ignore that bit. I need
to send a patch against net-2.6.22 and i will clean that up - just need some
feedback. Would it make sense to have those vars as u32 instead of unsigned
int?

cheers,
jamal
[PATCH 13/16] commit ad495d7b6cfcd1bc2eaf06c42699be0bb5d84234 [try #4]
[NETLINK]: Mirror UDP MSG_TRUNC semantics.

If the user passes MSG_TRUNC in via msg_flags, return the full packet size
not the truncated size.

Idea from Herbert Xu and Thomas Graf.

Signed-off-by: David S. Miller [EMAIL PROTECTED]
---

 net/netlink/af_netlink.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index c48b0f4..5890210 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1242,6 +1242,9 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	scm_recv(sock, msg, siocb->scm, flags);
+	if (flags & MSG_TRUNC)
+		copied = skb->len;
+
 out:
 	netlink_rcv_wake(sk);
 	return err ? : copied;
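From userspace, the MSG_TRUNC convention looks like this. The sketch is an illustration under assumptions, not a netlink test: it uses an AF_UNIX datagram socketpair purely so that it needs no network or netlink setup, which relies on Linux 3.4 or later honouring MSG_TRUNC for UNIX datagram sockets in the same way. sketch_recv_len() is a hypothetical helper.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send a datagram of `sent` bytes over a local socketpair, then receive it
 * into a buffer with only `room` bytes available, passing `flags` to recv().
 * Returns whatever recv() returned, or -1 if setup failed.
 */
ssize_t sketch_recv_len(size_t sent, size_t room, int flags)
{
	char out[256] = { 0 };
	char in[256];
	int sv[2];
	ssize_t n = -1;

	assert(sent <= sizeof(out) && room <= sizeof(in));
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], out, sent, 0) == (ssize_t)sent)
		n = recv(sv[1], in, room, flags);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

Without the flag the caller only learns how many bytes were copied; with MSG_TRUNC it learns the real datagram size and can retry with a large enough buffer.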
[PATCH 14/16] AFS: Add support for the CB.GetCapabilities operation [try #4]
Add support for the CB.GetCapabilities operation with which the fileserver
can ask the client for the following information:

 (1) The list of network interfaces it has available as IPv4 address +
     netmask plus the MTUs.

 (2) The client's UUID.

 (3) The extended capabilities of the client, for which the only current one
     is unified error mapping (abort code interpretation).

To support this, the patch adds the following routines to AFS:

 (1) A function to iterate through all the network interfaces using
     RTNETLINK to extract IPv4 addresses and MTUs.

 (2) A function to iterate through all the network interfaces using
     RTNETLINK to pull out the MAC address of the lowest index interface to
     use in UUID construction.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 fs/afs/Makefile        |    1
 fs/afs/afs_cm.h        |    3
 fs/afs/cmservice.c     |   98 ++
 fs/afs/internal.h      |   42
 fs/afs/main.c          |   49 +
 fs/afs/rxrpc.c         |   39
 fs/afs/use-rtnetlink.c |  473
 7 files changed, 705 insertions(+), 0 deletions(-)

diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index cca198b..01545eb 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -18,6 +18,7 @@ kafs-objs := \
 	security.o \
 	server.o \
 	super.o \
+	use-rtnetlink.o \
 	vlclient.o \
 	vlocation.o \
 	vnode.o \
diff --git a/fs/afs/afs_cm.h b/fs/afs/afs_cm.h
index 7c8e3d4..d4bd201 100644
--- a/fs/afs/afs_cm.h
+++ b/fs/afs/afs_cm.h
@@ -23,6 +23,9 @@ enum AFS_CM_Operations {
 	CBGetCE			= 208,	/* get cache file description */
 	CBGetXStatsVersion	= 209,	/* get version of extended statistics */
 	CBGetXStats		= 210,	/* get contents of extended statistics data */
+	CBGetCapabilities	= 65538,	/* get client capabilities */
 };
+#define AFS_CAP_ERROR_TRANSLATION	0x1
+
 #endif /* AFS_FS_H */
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index 7e184bb..f8ad36b 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -22,6 +22,8 @@ static int afs_deliver_cb_init_call_back_state(struct afs_call *,
 					       struct sk_buff *, bool);
 static int afs_deliver_cb_probe(struct afs_call *, struct sk_buff *, bool);
 static int afs_deliver_cb_callback(struct afs_call *, struct sk_buff *, bool);
+static int afs_deliver_cb_get_capabilities(struct afs_call *, struct sk_buff *,
+					   bool);
 static void afs_cm_destructor(struct afs_call *);
 /*
@@ -55,6 +57,16 @@ static const struct afs_call_type afs_SRXCBProbe = {
 };
 /*
+ * CB.GetCapabilities operation type
+ */
+static const struct afs_call_type afs_SRXCBGetCapabilites = {
+	.name		= "CB.GetCapabilities",
+	.deliver	= afs_deliver_cb_get_capabilities,
+	.abort_to_error	= afs_abort_to_error,
+	.destructor	= afs_cm_destructor,
+};
+
+/*
  * route an incoming cache manager call
  * - return T if supported, F if not
  */
@@ -74,6 +86,9 @@ bool afs_cm_incoming_call(struct afs_call *call)
 	case CBProbe:
 		call->type = &afs_SRXCBProbe;
 		return true;
+	case CBGetCapabilities:
+		call->type = &afs_SRXCBGetCapabilites;
+		return true;
 	default:
 		return false;
 	}
@@ -328,3 +343,86 @@ static int afs_deliver_cb_probe(struct afs_call *call, struct sk_buff *skb,
 	schedule_work(&call->work);
 	return 0;
 }
+
+/*
+ * allow the fileserver to ask about the cache manager's capabilities
+ */
+static void SRXAFSCB_GetCapabilities(struct work_struct *work)
+{
+	struct afs_interface *ifs;
+	struct afs_call *call = container_of(work, struct afs_call, work);
+	int loop, nifs;
+
+	struct {
+		struct /* InterfaceAddr */ {
+			__be32 nifs;
+			__be32 uuid[11];
+			__be32 ifaddr[32];
+			__be32 netmask[32];
+			__be32 mtu[32];
+		} ia;
+		struct /* Capabilities */ {
+			__be32 capcount;
+			__be32 caps[1];
+		} cap;
+	} reply;
+
+	_enter("");
+
+	nifs = 0;
+	ifs = kcalloc(32, sizeof(*ifs), GFP_KERNEL);
+	if (ifs) {
+		nifs = afs_get_ipv4_interfaces(ifs, 32, false);
+		if (nifs < 0) {
+			kfree(ifs);
+			ifs = NULL;
+			nifs = 0;
+		}
+	}
+
+	memset(&reply, 0, sizeof(reply));
+	reply.ia.nifs = htonl(nifs);
+
+	reply.ia.uuid[0] = htonl(afs_uuid.time_low);
+	reply.ia.uuid[1] = htonl(afs_uuid.time_mid);
+	reply.ia.uuid[2] = htonl(afs_uuid.time_hi_and_version);
+	reply.ia.uuid[3] = htonl((s8)
[PATCH 12/16] AFS: Update the AFS fs documentation [try #4]
Update the AFS fs documentation. Signed-Off-By: David Howells [EMAIL PROTECTED] --- Documentation/filesystems/afs.txt | 214 +++-- 1 files changed, 154 insertions(+), 60 deletions(-) diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt index 2f4237d..12ad6c7 100644 --- a/Documentation/filesystems/afs.txt +++ b/Documentation/filesystems/afs.txt @@ -1,31 +1,82 @@ + kAFS: AFS FILESYSTEM -ABOUT -= +Contents: + + - Overview. + - Usage. + - Mountpoints. + - Proc filesystem. + - The cell database. + - Security. + - Examples. + + + +OVERVIEW + -This filesystem provides a fairly simple AFS filesystem driver. It is under -development and only provides very basic facilities. It does not yet support -the following AFS features: +This filesystem provides a fairly simple secure AFS filesystem driver. It is +under development and does not yet provide the full feature set. The features +it does support include: - (*) Write support. - (*) Communications security. - (*) Local caching. - (*) pioctl() system call. - (*) Automatic mounting of embedded mountpoints. + (*) Security (currently only AFS kaserver and KerberosIV tickets). + (*) File reading. + (*) Automounting. + +It does not yet support the following AFS features: + + (*) Write support. + + (*) Local caching. + + (*) pioctl() system call. 
+ + +=== +COMPILATION +=== + +The filesystem should be enabled by turning on the kernel configuration +options: + + CONFIG_AF_RXRPC - The RxRPC protocol transport + CONFIG_RXKAD- The RxRPC Kerberos security handler + CONFIG_AFS - The AFS filesystem + +Additionally, the following can be turned on to aid debugging: + + CONFIG_AF_RXRPC_DEBUG - Permit AF_RXRPC debugging to be enabled + CONFIG_AFS_DEBUG- Permit AFS debugging to be enabled + +They permit the debugging messages to be turned on dynamically by manipulating +the masks in the following files: + + /sys/module/af_rxrpc/parameters/debug + /sys/module/afs/parameters/debug + + += USAGE = When inserting the driver modules the root cell must be specified along with a list of volume location server IP addresses: - insmod rxrpc.o + insmod af_rxrpc.o + insmod rxkad.o insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91 -The first module is a driver for the RxRPC remote operation protocol, and the -second is the actual filesystem driver for the AFS filesystem. +The first module is the AF_RXRPC network protocol driver. This provides the +RxRPC remote operation protocol and may also be accessed from userspace. See: + + Documentation/networking/rxrpc.txt + +The second module is the kerberos RxRPC security driver, and the third module +is the actual filesystem driver for the AFS filesystem. Once the module has been loaded, more modules can be added by the following procedure: @@ -33,7 +84,7 @@ procedure: echo add grand.central.org 18.7.14.88:128.2.191.224 /proc/fs/afs/cells Where the parameters to the add command are the name of a cell and a list of -volume location servers within that cell. +volume location servers within that cell, with the latter separated by colons. Filesystems can be mounted anywhere by commands similar to the following: @@ -42,11 +93,6 @@ Filesystems can be mounted anywhere by commands similar to the following: mount -t afs #root.afs. /afs mount -t afs #root.cell. 
/afs/cambridge - NB: When using this on Linux 2.4, the mount command has to be different, - since the filesystem doesn't have access to the device name argument: - - mount -t afs none /afs -ovol=#root.afs. - Where the initial character is either a hash or a percent symbol depending on whether you definitely want a R/W volume (hash) or whether you'd prefer a R/O volume, but are willing to use a R/W volume instead (percent). @@ -60,55 +106,66 @@ named volume will be looked up in the cell specified during insmod. Additional cells can be added through /proc (see later section). +=== MOUNTPOINTS === -AFS has a concept of mountpoints. These are specially formatted symbolic links -(of the same form as the device name passed to mount). kAFS presents these -to the user as directories that have special properties: +AFS has a concept of mountpoints. In AFS terms, these are specially formatted +symbolic links (of the same form as the device name passed to mount). kAFS +presents these to the user as directories that have a follow-link capability +(ie: symbolic link semantics). If anyone attempts to access them, they will +automatically cause the target volume to be mounted (if possible) on that site. - (*) They cannot
Re: [PATCH] usb-net/pegasus: fix pegasus carrier detection
On Wed, 2007-04-25 at 18:09 +0300, Petko Manolov wrote:
On Wed, 25 Apr 2007, Dan Williams wrote:
On Wed, 2007-04-25 at 17:58 +0300, Petko Manolov wrote:

In general i agree with the reasoning below. However, isn't it better to remove the code that sets carrier on/off in intr_callback()?

I'm fine with this; whatever makes carrier status work makes me happy :)

Great. Are you going to submit the new patch or this hard labor will lay on my shoulders? :)

Well, it looked like you already had one; but if you'd like I'll whip up a new one.

Dan

Petko

There's a reliable way of getting the link status by reading the MII. After correct checking of the return value from read_mii_word(), set_carrier() is what is good enough. If 2 seconds is too long of an interval we could reduce it to 1 second or, if needed, less.

I'd like to avoid adding additional flags per device as it will take forever to collect information about their correct behavior and update pegasus.h.

In short i think this part of your patch should be enough:

---
@@ -847,10 +848,16 @@ static void intr_callback(struct urb *urb)
 		 * d[0].NO_CARRIER kicks in only with failed TX.
 		 * ... so monitoring with MII may be safest.
 		 */
-		if (d[0] & NO_CARRIER)
-			netif_carrier_off(net);
-		else
-			netif_carrier_on(net);
-
 		/* bytes 3-4 == rx_lostpkt, reg 2E/2F */
 		pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
@@ -950,7 +957,7 @@ static void set_carrier(struct net_device *net)
 	pegasus_t *pegasus = netdev_priv(net);
 	u16 tmp;

-	if (!read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
+	if (read_mii_word(pegasus, pegasus->phy, MII_BMSR, &tmp))
 		return;
---

cheers,
Petko

On Tue, 24 Apr 2007, Dan Williams wrote:
On Tue, 2007-04-24 at 20:48 +0300, [EMAIL PROTECTED] wrote:
On Tue, Apr 24, 2007 at 12:49:12PM -0400, Jeff Garzik wrote:

Long term, Greg seemed OK with moving the net drivers from drivers/usb/net to drivers/net/usb, in line with the current policy of placing net drivers in drivers/net/*, bus agnostic. After that move, sending to netdev and me (as you did here) would be the preferred avenue.

Speaking of which, do you want me to do this in the 2.6.22-rc1 timeframe? Usually big code moves like this are good to do right after rc1 comes out as the major churn is usually completed then.

Sorry to interfere, but could you guys wait until tomorrow before applying the patch to your respective GIT trees? I'd like to check if the code is doing the right thing and avoid patch reversal.

Original problem was that the patch I referenced in the commit message from Jan 6 2006 switched the return value semantics from read_mii_word(). Before the patch, read_mii_word returned 1 on success, 0 on error. After the patch, it returns the generally accepted 0 on success and !0 on error. That causes set_carrier() to return immediately rather than fiddle with netif_carrier_*. When the Jan 6 2006 patch went in changing the return values, set_carrier() was not updated for the new return values. Nothing else in the code cares about read_mii_word()'s return value except set_carrier().

But when the card is brought up and no cable is plugged in, intr_callback() gets called repeatedly, which itself repeatedly calls netif_carrier_on() due to the NO_CARRIER check. The comment there about NO_CARRIER kicks in on TX failure seems accurate, because even with no cable plugged in, and therefore no packets getting transmitted, the NO_CARRIER check is never true on the Belkin part. Therefore, netif_carrier_on() is always called as a result of the failure of d[0] & NO_CARRIER, turning carrier back on even if there is no cable plugged in. This bulldozes over the MII carrier_check routine too.

I don't think the intr_callback() code should ever turn the carrier _on_, because there's that 2*HZ MII carrier check which can certainly handle the carrier on/off stuff. LINK_STATUS appears valid on the Belkin part too, so we can add that as a reverse-quirk and use LINK_STATUS on parts where it works.

If you think that the NO_CARRIER check should be in _addition_ to the LINK_STATUS check, that's fine with me, provided that the NO_CARRIER check only turns carrier off.

Dan
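The return-value mix-up described in this thread can be illustrated with a small user-space sketch. It is not the driver code: `poll_carrier()`, the `bmsr_read_fn` type and the fake readers are hypothetical names, and the reader is only assumed to follow the post-2006 convention the thread describes (0 on success, non-zero on error). `BMSR_LSTATUS` is the standard MII link-status bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BMSR_LSTATUS 0x0004	/* link-status bit of the MII BMSR register */

enum carrier { CARRIER_UNCHANGED, CARRIER_OFF, CARRIER_ON };

/* Hypothetical stand-in for read_mii_word(): 0 on success, non-zero on error. */
typedef int (*bmsr_read_fn)(uint16_t *value);

/* Decide the carrier state from one MII poll, the job set_carrier() has. */
enum carrier poll_carrier(bmsr_read_fn read_bmsr)
{
	uint16_t bmsr = 0;

	assert(read_bmsr != NULL);
	if (read_bmsr(&bmsr))		/* non-zero: read failed, leave carrier alone */
		return CARRIER_UNCHANGED;
	return (bmsr & BMSR_LSTATUS) ? CARRIER_ON : CARRIER_OFF;
}

/* Fake readers, used only to exercise the sketch. */
int fake_read_link_up(uint16_t *value)   { *value = BMSR_LSTATUS; return 0; }
int fake_read_link_down(uint16_t *value) { *value = 0; return 0; }
int fake_read_error(uint16_t *value)     { (void)value; return -1; }
```

With the test inverted (`!read_bmsr(&bmsr)`), every successful read would return early and the carrier state would never be updated, which matches the symptom Dan reports.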
[PATCH 04/16] AF_RXRPC: Make it possible to merely try to cancel timers from a module [try #4]
Export try_to_del_timer_sync() for use by the AF_RXRPC module.

Signed-Off-By: David Howells [EMAIL PROTECTED]
---

 kernel/timer.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index dd6c2c1..b22bd39 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -505,6 +505,8 @@ out:
 	return ret;
 }

+EXPORT_SYMBOL(try_to_del_timer_sync);
+
 /**
  * del_timer_sync - deactivate a timer and wait for the handler to finish.
  * @timer: the timer to be deactivated
Re: [PATCH] usb-net/pegasus: fix pegasus carrier detection
The patch went upstream ~24 hours ago:

	c43c49bd61fdb9bb085ddafcaadb17d06f95ec43

Upstream is the base for any new patches.

	Jeff
Re: [PATCH] cfg80211: new wireless config infrastructure
Hi there,

John W. Linville wrote:
> From: Johannes Berg [EMAIL PROTECTED]
> --- /dev/null
> +++ b/net/wireless/core.c
> @@ -0,0 +1,209 @@
> +/*
> + * This is the linux wireless configuration interface.
> + *
> + * Copyright 2006, 2007 Johannes Berg [EMAIL PROTECTED]
> + */
> +
> +#include <linux/if.h>
> +#include <linux/module.h>
> +#include <linux/err.h>
> +#include <linux/mutex.h>
> +#include <linux/list.h>
> +#include <linux/nl80211.h>
> +#include <linux/debugfs.h>
> +#include <linux/notifier.h>
> +#include <linux/device.h>
> +#include <net/genetlink.h>
> +#include <net/cfg80211.h>
> +#include <net/wireless.h>
> +#include "core.h"
> +#include "sysfs.h"
> +
> +/* name for sysfs, %d is appended */
> +#define PHY_NAME "phy"
> +
> +MODULE_AUTHOR("Johannes Berg");
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("wireless configuration support");
> +
> +/* RCU might be appropriate here since we usually
> + * only read the list, and that can happen quite
> + * often because we need to do it for each command */
> +LIST_HEAD(cfg80211_drv_list);
> +DEFINE_MUTEX(cfg80211_drv_mutex);
> +static int wiphy_counter;
> +
> +/* for debugfs */
> +static struct dentry *ieee80211_debugfs_dir;
> +
> +/* exported functions */
> +
> +struct wiphy *wiphy_new(struct cfg80211_ops *ops, int sizeof_priv)
> +{
> +	struct cfg80211_registered_device *drv;
> +	int alloc_size;
> +
> +	alloc_size = sizeof(*drv) + sizeof_priv;
> +
> +	drv = kzalloc(alloc_size, GFP_KERNEL);
> +	if (!drv)
> +		return NULL;
> +
> +	drv->ops = ops;
> +
> +	mutex_lock(&cfg80211_drv_mutex);
> +
> +	if (unlikely(wiphy_counter<0)) {

		mutex_unlock(&cfg80211_drv_mutex);

> +		/* ugh, wrapped! */
> +		kfree(drv);
> +		return NULL;
> +	}
> +	drv->idx = wiphy_counter;
> +
> +	/* give it a proper name */
> +	snprintf(drv->wiphy.dev.bus_id, BUS_ID_SIZE,
> +		 PHY_NAME "%d", drv->idx);
> +
> +	/* now increase counter for the next time */
> +	wiphy_counter++;
> +	mutex_unlock(&cfg80211_drv_mutex);

Since drv and its contents are not visible to anyone yet, I suggest the following code flow for that:

	mutex_lock(&cfg80211_drv_mutex);
	drv->idx = wiphy_counter;

	/* increase counter for the next time, if id didn't wrap */
	if (drv->idx >= 0)
		wiphy_counter++;
	mutex_unlock(&cfg80211_drv_mutex);

	if (drv->idx < 0) {
		kfree(drv);
		return NULL;
	}

	/* give it a proper name */
	snprintf(drv->wiphy.dev.bus_id, BUS_ID_SIZE,
		 PHY_NAME "%d", drv->idx);

	[enqueue to all lists here]

Rest looks good so far.

Regards

Ingo Oeser
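The suggested flow (take the index, advance the counter only while it is still valid, and check the result after dropping the lock) can be sketched in user space roughly as follows. `take_index()` is a hypothetical helper, locking is left out, and marking exhaustion with an explicit -1 is an assumption made here so the sketch does not rely on signed overflow, which ISO C leaves undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hand out the current index and advance the counter for the next caller.
 * A negative return value means the index space is exhausted; the counter
 * then stays negative, so every later call fails the same way.
 */
int take_index(int *counter)
{
	int idx;

	assert(counter != NULL);
	idx = *counter;
	if (idx >= 0)
		*counter = (idx == INT_MAX) ? -1 : idx + 1;
	return idx;
}
```

A caller would hold the mutex only around `take_index()` and do the error handling (freeing the object, returning NULL) afterwards, which is the point of the reordering.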
Re: [Fwd: [PATCH] [TIPC]: Enhancements to msg_set_bits() routine]
Hi Jon,

Jon Paul Maloy wrote:
> 2) The code has been optimized to minimize the number of run-time
> endianness conversion operations by leveraging the fact that the
> mask (and, in some cases, the value as well) is constant and the
> necessary conversion can be performed by the compiler.

3) It can be checked by sparse, if you use proper types.

> diff --git a/net/tipc/msg.h b/net/tipc/msg.h
> index 62d5490..5c64e55 100644
> --- a/net/tipc/msg.h
> +++ b/net/tipc/msg.h
> @@ -71,8 +71,11 @@ static inline void msg_set_word(struct tipc_msg *m, u32 w, u32 val)
>  static inline void msg_set_bits(struct tipc_msg *m, u32 w,
>  				u32 pos, u32 mask, u32 val)

static inline void msg_set_bits(struct tipc_msg *m, u32 w,
				u32 pos, __be32 mask, __be32 val)

Care to resubmit?

Best Regards

Ingo Oeser
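Point 2 above can be sketched in user space: instead of converting the stored word to host order and back, the byte-order conversion is applied to the shifted mask and value, which a compiler may fold at build time when they are constants. The `struct msg` layout, `MSG_WORDS` and the `msg_bits()` reader below are assumptions for illustration, not the TIPC definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

#define MSG_WORDS 10	/* hypothetical header size in 32-bit words */

struct msg {
	uint32_t hdr[MSG_WORDS];	/* words kept in network byte order */
};

/* Read a bit field: convert the stored word once, then shift and mask. */
static inline uint32_t msg_bits(const struct msg *m, uint32_t w,
				uint32_t pos, uint32_t mask)
{
	assert(w < MSG_WORDS && pos < 32);
	return (ntohl(m->hdr[w]) >> pos) & mask;
}

/* Write a bit field: convert mask and value, never the stored word. */
static inline void msg_set_bits(struct msg *m, uint32_t w,
				uint32_t pos, uint32_t mask, uint32_t val)
{
	assert(w < MSG_WORDS && pos < 32);
	val = (val & mask) << pos;
	mask <<= pos;
	m->hdr[w] &= ~htonl(mask);	/* clear the field in place */
	m->hdr[w] |= htonl(val);	/* then set the new bits */
}
```

Declaring the converted operands with a distinct type, as suggested in point 3, is what would let sparse flag a caller that mixes host-order and network-order values.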
RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior
> -----Original Message-----
> From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal
> Sent: Wednesday, April 25, 2007 4:37 AM
> To: Stephen Hemminger
> Cc: Waskiewicz Jr, Peter P; netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]; cramerj; Kok, Auke-jan H; Leech, Christopher; [EMAIL PROTECTED]
> Subject: Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior
>
> On Tue, 2007-24-04 at 21:05 -0700, Stephen Hemminger wrote:
> > Peter P Waskiewicz Jr wrote:
> >
> > Only if this binary compatible with older kernels.
>
> It is not. But i think that is a lesser problem, the bigger question is:
> Why would you need to change a qdisc just so you can support egress
> multiqueues?

The previous version of my multiqueue patches I sent for consideration had feedback from Patrick McHardy asking that the user be able to configure the PRIO qdisc to run with multiqueue support or not. That is why TC needed a modification, since I agreed with Patrick that this would be a useful option.

All the versions of multiqueue network device support I've sent for consideration had PRIO modified to support multiqueue devices, since it lends itself well for the model of multiple, independent flows.

> BTW, is there any reason this is being cced to lkml?

Since this change affects how tc interacts with the qdisc layer, I cced lkml.

> cheers,
> jamal
>
> PS:- I havent read the kernel patches (i am congested and about 1000
> messages behind on netdev) and my opinions may be influenced by an
> approach i have in trying to help someone fixup a wireless driver with
> multiqueue support.

As long as someone is looking at them, I'll be happy. :-)

Thanks,

-PJ Waskiewicz
[PATCH] infinite recursion in netlink
Hello!

Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.

The bug is present in all kernel versions since the feature appeared.

The patch also makes some minimal cleanup:

1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup

Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index fc920f6..cac06c4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -776,6 +776,8 @@ static void nl_fib_lookup(struct fib_res
 							    .nl_u = { .ip4_u = { .daddr = frn->fl_addr,
 							    .tos = frn->fl_tos,
 							    .scope = frn->fl_scope } } };
+
+	frn->err = -ENOENT;
 	if (tb) {
 		local_bh_disable();

@@ -787,6 +789,7 @@ static void nl_fib_lookup(struct fib_res
 			frn->nh_sel = res.nh_sel;
 			frn->type = res.type;
 			frn->scope = res.scope;
+			fib_res_put(&res);
 		}
 		local_bh_enable();
 	}
@@ -801,6 +804,9 @@ static void nl_fib_input(struct sock *sk
 	struct fib_table *tb;

 	skb = skb_dequeue(&sk->sk_receive_queue);
+	if (skb == NULL)
+		return;
+
 	nlh = (struct nlmsghdr *)skb->data;
 	if (skb->len < NLMSG_SPACE(0) || skb->len < nlh->nlmsg_len ||
 	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*frn))) {
@@ -813,7 +819,7 @@ static void nl_fib_input(struct sock *sk

 	nl_fib_lookup(frn, tb);

-	pid = nlh->nlmsg_pid;           /*pid of sending process */
+	pid = NETLINK_CB(skb).pid;       /* pid of sending process */
 	NETLINK_CB(skb).pid = 0;         /* from kernel */
 	NETLINK_CB(skb).dst_group = 0;  /* unicast */
 	netlink_unicast(sk, skb, pid, MSG_DONTWAIT);
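The last hunk is the actual fix. As a rough model of why it matters, assume (as this sketch does) that the pid in the message header is whatever the sender wrote, that the pid recorded by the netlink layer identifies the real sending socket, and that pid 0 addresses the kernel itself. The struct and helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_PID 0u	/* in this model, pid 0 addresses the kernel */

struct fake_request {
	uint32_t header_pid;	/* models nlh->nlmsg_pid: set by the sender */
	uint32_t socket_pid;	/* models NETLINK_CB(skb).pid: set on receive */
};

/* Old behaviour: trust the header field when choosing the reply target. */
uint32_t reply_pid_old(const struct fake_request *req)
{
	assert(req != NULL);
	return req->header_pid;
}

/* New behaviour: reply to the socket the request really came from. */
uint32_t reply_pid_new(const struct fake_request *req)
{
	assert(req != NULL);
	return req->socket_pid;
}

/* A reply addressed to the kernel is fed back into the same handler. */
int reply_loops_back(uint32_t pid)
{
	return pid == KERNEL_PID;
}
```

In this model a request whose header pid was left at zero makes the old code address its reply to the kernel, where it would be handled as a new request, consistent with the recursion described in the commit message.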
sysctls
I note that the networking tree is adding new sysctls:

<<<<<<< HEAD/include/linux/sysctl.h
	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
=======
	NET_IPV6_OPTIMISTIC_DAD=24,
	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
>>>>>>> /include/linux/sysctl.h

(Well, it's trying to - there are some git rejects in net-2.6.22)

But we kind-of decided a while back to stop doing that and to use CTL_UNNUMBERED. Frankly, I don't 100% remember the thinking - Eric, can you please remind us?
Re: [PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #3]
From: David Howells [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 14:38:32 +0100

> I think the idea is for them (or at least some of them) to go through one of
> DaveM's net git trees anyway.

Then please generate your patches against my net-2.6.21 GIT tree.

Most of your initial patches in the series (the SKB routine one for example) are already in my tree.
Re: netlink locking warnings in 2.6.21-rc7-mm1
I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one warning:

PM: Removing info for No Bus::06:0b.0
eth0: no IPv6 routers present
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
PM: Adding info for No Bus:eth1
ipw2200: Detected geography ZZA (11 802.11bg channels, 13 802.11a channels)
ipw2200: Failed to send WEP_KEY: Aborted due to RF kill switch.
ipw2200: Failed to send WEP_KEY: Command timed out.
ipw2200: Failed to send WEP_KEY: Command timed out.
BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock()
 [c012d18a] debug_mutex_unlock+0x5a/0x134
 [c02d67e2] __mutex_unlock_slowpath+0x9d/0xcf
 [f8c3618b] ipw_wx_set_encode+0x0/0x82 [ipw2200]
 [c028b92c] rtnl_unlock+0xa/0x29
 [c0286651] dev_ioctl+0x3d0/0x402
 [c014b078] __handle_mm_fault+0x7c6/0x7e8
 [c01a649b] selinux_file_alloc_security+0x1f/0x40
 [c027b943] sock_ioctl+0x0/0x1be
 [c0162925] do_ioctl+0x19/0x4d
 [c0162b58] vfs_ioctl+0x1ff/0x216
 [c0162bbb] sys_ioctl+0x4c/0x65
 [c0103b0c] syscall_call+0x7/0xb
 [c02d] unix_dgram_sendmsg+0x76/0x400
 ===

It's 100% reproducible here, using http://userweb.kernel.org/~akpm/config-sony.txt

The weird ASSERT_RTNL warnings aren't there, so something else in -mm (prior to git-net.patch in the series file) would appear to be interacting with net changes.
Re: sysctls
Andrew Morton [EMAIL PROTECTED] writes:

> I note that the networking tree is adding new sysctls:
>
> <<<<<<< HEAD/include/linux/sysctl.h
> 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> =======
> 	NET_IPV6_OPTIMISTIC_DAD=24,
> 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> >>>>>>> /include/linux/sysctl.h
>
> (Well, it's trying to - there are some git rejects in net-2.6.22)
>
> But we kind-of decided a while back to stop doing that and to use
> CTL_UNNUMBERED. Frankly, I don't 100% remember the thinking - Eric, can you
> please remind us?

The thinking is this:

Binary sysctl numbers are a problem because of patch conflicts like the above, and the related user space breakage they cause.

In practice no one uses binary sysctl numbers.

So the policy should be to add new sysctl's using CTL_UNNUMBERED (to prevent patch conflicts and user space breakage).

There may be cases where someone actually needs the binary sysctl interface. Once there is a demonstrated need we can go back and very carefully add numbers for these very few cases, with a strong review process.

Adding binary sysctl numbers should be done as carefully as and with as much review as adding syscall numbers, and distro kernels and other stable kernels should never get a sysctl number backport until the number first reaches Linus's tree. To avoid difference in meaning between different kernels.

Given that no one except on BSD uses the binary sysctl interface anyway my personal preference is to just freeze it and to reduce the number of binary sysctls we support if possible.

Eric
Re: sysctls
On Wed, Apr 25, 2007 at 01:45:19PM -0600, Eric W. Biederman wrote:
> Andrew Morton [EMAIL PROTECTED] writes:
>
> > I note that the networking tree is adding new sysctls:
> >
> > <<<<<<< HEAD/include/linux/sysctl.h
> > 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> > =======
> > 	NET_IPV6_OPTIMISTIC_DAD=24,
> > 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> > >>>>>>> /include/linux/sysctl.h
> >
> > (Well, it's trying to - there are some git rejects in net-2.6.22)
> >
> > But we kind-of decided a while back to stop doing that and to use
> > CTL_UNNUMBERED. Frankly, I don't 100% remember the thinking - Eric, can you
> > please remind us?
>
> The thinking is this:
>
> Binary sysctl numbers are a problem because of patch conflicts like
> the above, and the related user space breakage they cause.
>
> In practice no one uses binary sysctl numbers.
>
> So the policy should be to add new sysctl's using CTL_UNNUMBERED (to
> prevent patch conflicts and user space breakage).
>
> There may be cases where someone actually needs the binary sysctl
> interface. Once there is a demonstrated need we can go back and very
> carefully add numbers for these very few cases, with a strong review
> process.
>
> Adding binary sysctl numbers should be done as carefully as and with
> as much review as adding syscall numbers, and distro kernels and other
> stable kernels should never get a sysctl number backport until the
> number first reaches Linus's tree. To avoid difference in meaning
> between different kernels.
>
> Given that no one except on BSD uses the binary sysctl interface
> anyway my personal preference is to just freeze it and to reduce the
> number of binary sysctls we support if possible.
>
> Eric

I did the optimistic dad sysctl, and have no strict use for numbered sysctls (I was just unaware of the policy). I'll work up a patch to use register_sysctl_table with CTL_UNNUMBERED in the next few days.

Regards
Neil
Re: [PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #3]
David Miller [EMAIL PROTECTED] wrote:

> Then please generate your patches against my net-2.6.21 GIT tree.
>
> Most of your initial patches in the series (the SKB routine one for
> example) are already in my tree.

Do you mean your net-2.6.22 GIT tree?

Do you want me to make it available as a GIT tree for you to pull? Or would you prefer patches?

David
Re: sysctls
From: Andrew Morton [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 12:29:24 -0700

> I note that the networking tree is adding new sysctls:
>
> <<<<<<< HEAD/include/linux/sysctl.h
> 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> =======
> 	NET_IPV6_OPTIMISTIC_DAD=24,
> 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> >>>>>>> /include/linux/sysctl.h
>
> (Well, it's trying to - there are some git rejects in net-2.6.22)

I knew this was going to happen because of Yoshifuji's security fix, the conflict is trivial to resolve.

I'll rebase the net-2.6.22 tree later today since all we should have before 2.6.21-final is the netlink OOPS'er fix Alexey just posted.
Re: [PATCH 00/16] AF_RXRPC socket family and AFS rewrite [try #3]
From: David Howells [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 20:56:47 +0100

> David Miller [EMAIL PROTECTED] wrote:
>
> > Then please generate your patches against my net-2.6.21 GIT tree.
> >
> > Most of your initial patches in the series (the SKB routine one for
> > example) are already in my tree.
>
> Do you mean your net-2.6.22 GIT tree?
>
> Do you want me to make it available as a GIT tree for you to pull? Or would
> you prefer patches?

Just patches is perfectly fine.

Also, if it's easier to diff against -mm, that works too since Andrew integrates my net-2.6.22 tree into -mm most of the time.
Re: [PATCH] infinite recursion in netlink
On Wed, Apr 25, 2007 at 10:38:56PM +0400, Alexey Kuznetsov wrote:
> Hello!
>
> Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
> which resulted in infinite recursion and stack overflow.
>
> The bug is present in all kernel versions since the feature appeared.

Any hint on when this feature appeared so that we can notify the distros for older releases?

thanks,

greg k-h
Re: [PATCH] infinite recursion in netlink
From: Greg KH [EMAIL PROTECTED] Date: Wed, 25 Apr 2007 12:59:41 -0700 On Wed, Apr 25, 2007 at 10:38:56PM +0400, Alexey Kuznetsov wrote: Hello! Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. Any hint on when this feature appeared so that we can notify the distros for older releases? It's been there since Jun 20th, 2005 commit 246955fe4c38bd706ae30e37c64892c94213775d Author: Robert Olsson [EMAIL PROTECTED] Date: Mon Jun 20 13:36:39 2005 -0700 [NETLINK]: fib_lookup() via netlink Below is a more generic patch to do fib_lookup via netlink. For others we should say that we discussed this as a way to verify route selection. It's also possible there are others uses for this. In short the fist half of struct fib_result_nl is filled in by caller and netlink call fills in the other half and returns it. In case anyone is interested there is a corresponding user app to compare the full routing table this was used to test implementation of the LC-trie. Signed-off-by: David S. 
Miller [EMAIL PROTECTED] diff --git a/include/linux/netlink.h b/include/linux/netlink.h index e38407a..561d4dc 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -14,6 +14,7 @@ #define NETLINK_SELINUX7 /* SELinux event notifications */ #define NETLINK_ARPD 8 #define NETLINK_AUDIT 9 /* auditing */ +#define NETLINK_FIB_LOOKUP 10 #define NETLINK_ROUTE6 11 /* af_inet6 route comm channel */ #define NETLINK_IP6_FW 13 #define NETLINK_DNRTMSG14 /* DECnet routing messages */ diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index e5a5f6b..a4208a3 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -109,6 +109,20 @@ struct fib_result { #endif }; +struct fib_result_nl { + u32 fl_addr; /* To be looked up*/ + u32 fl_fwmark; + unsigned char fl_tos; + unsigned char fl_scope; + unsigned char tb_id_in; + + unsigned char tb_id; /* Results */ + unsigned char prefixlen; + unsigned char nh_sel; + unsigned char type; + unsigned char scope; + int err; +}; #ifdef CONFIG_IP_ROUTE_MULTIPATH diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 563e7d6..cd8e45a 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -516,6 +516,60 @@ static void fib_del_ifaddr(struct in_ifaddr *ifa) #undef BRD1_OK } +static void nl_fib_lookup(struct fib_result_nl *frn, struct fib_table *tb ) +{ + + struct fib_result res; + struct flowifl = { .nl_u = { .ip4_u = { .daddr = frn-fl_addr, + .fwmark = frn-fl_fwmark, + .tos = frn-fl_tos, + .scope = frn-fl_scope } } }; + if (tb) { + local_bh_disable(); + + frn-tb_id = tb-tb_id; + frn-err = tb-tb_lookup(tb, fl, res); + + if (!frn-err) { + frn-prefixlen = res.prefixlen; + frn-nh_sel = res.nh_sel; + frn-type = res.type; + frn-scope = res.scope; + } + local_bh_enable(); + } +} + +static void nl_fib_input(struct sock *sk, int len) +{ + struct sk_buff *skb = NULL; +struct nlmsghdr *nlh = NULL; + struct fib_result_nl *frn; + int err; + u32 pid; + struct fib_table *tb; + + skb = skb_recv_datagram(sk, 0, 0, 
err); + nlh = (struct nlmsghdr *)skb-data; + + frn = (struct fib_result_nl *) NLMSG_DATA(nlh); + tb = fib_get_table(frn-tb_id_in); + + nl_fib_lookup(frn, tb); + + pid = nlh-nlmsg_pid; /*pid of sending process */ + NETLINK_CB(skb).groups = 0; /* not in mcast group */ + NETLINK_CB(skb).pid = 0; /* from kernel */ + NETLINK_CB(skb).dst_pid = pid; + NETLINK_CB(skb).dst_groups = 0; /* unicast */ + netlink_unicast(sk, skb, pid, MSG_DONTWAIT); +} + +static void nl_fib_lookup_init(void) +{ + netlink_kernel_create(NETLINK_FIB_LOOKUP, nl_fib_input); +} + static void fib_disable_ip(struct net_device *dev, int force) { if (fib_sync_down(0, dev, force)) @@ -604,6 +658,7 @@ void __init ip_fib_init(void) register_netdevice_notifier(fib_netdev_notifier); register_inetaddr_notifier(fib_inetaddr_notifier); + nl_fib_lookup_init(); } EXPORT_SYMBOL(inet_addr_type); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a
Re: sysctls
David Miller [EMAIL PROTECTED] writes:

> From: Andrew Morton [EMAIL PROTECTED]
> Date: Wed, 25 Apr 2007 12:29:24 -0700
>
> > I note that the networking tree is adding new sysctls:
> >
> > <<<<<<< HEAD/include/linux/sysctl.h
> > 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> > =======
> > 	NET_IPV6_OPTIMISTIC_DAD=24,
> > 	NET_IPV6_ACCEPT_SOURCE_ROUTE=25,
> > >>>>>>> /include/linux/sysctl.h
> >
> > (Well, it's trying to - there are some git rejects in net-2.6.22)
>
> I knew this was going to happen because of Yoshifuji's security fix,
> the conflict is trivial to resolve.
>
> I'll rebase the net-2.6.22 tree later today since all we should have
> before 2.6.21-final is the netlink OOPS'er fix Alexey just posted.

David for clarity do you happen to know of anyone using binary sysctl values? In particular is there any reason not to use CTL_UNNUMBERED for new networking sysctls?

Eric
Re: [PATCH] infinite recursion in netlink
From: Alexey Kuznetsov [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 22:38:56 +0400

> Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel,
> which resulted in infinite recursion and stack overflow.
>
> The bug is present in all kernel versions since the feature appeared.
>
> The patch also makes some minimal cleanup:
>
> 1. Return something consistent (-ENOENT) when fib table is missing
> 2. Do not crash when queue is empty (does not happen, but yet)
> 3. Put result of lookup
>
> Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]

Applied, thanks a lot Alexey.
2.6.20.7 mss negotiation and path mtu discovery mostly broken?
I had previously posted this message to linux-kernel, but David Miller asked me to post here instead. Some replies to my message on l-k have already been copied here.

I'm seeing a problem where the kernel attempts to send packets with a MSS larger than the one negotiated when the TCP connection is established. Even after ICMP can't fragment messages arrive, the kernel still attempts to increase the MSS rather aggressively. The end result is extremely poor throughput when sending to a network with a smaller MTU.

In /proc/sys/net/ipv4:

ip_no_pmtu_disc:0
tcp_mtu_probing:0

The sending host (10.2.10.254) has an MTU of 9000. The destination host (12.33.234.69) has an MTU of 1500. There is one router between the hosts which will drop packets with the DF flag when they don't fit the destination interface's MTU and generates the required ICMP can't fragment message.

The dump shows the initial handshake with correct mss options sent:

08:39:55.493029 IP 12.33.234.69.35026 > 10.2.10.254.22: S 2768979373:2768979373(0) win 5840 <mss 1460,sackOK,timestamp 3873837730 0,nop,wscale 2>
08:39:55.493119 IP 10.2.10.254.22 > 12.33.234.69.35026: S 963242385:963242385(0) ack 2768979374 win 17896 <mss 8960,sackOK,timestamp 413751 3873837730,nop,wscale 5>

Then I see the system send larger packets (larger than the mss), provoking a can't fragment from the router. Now I suppose it might be reasonable to occasionally probe a larger MSS when the current MSS is a result of reductions due to path mtu discovery. After all, the path taken could change over time. But when the current MSS is at the value negotiated by the MSS option during the TCP handshake, it seems like there's no sense in trying to send with a larger MSS. Even if there were, there's certainly no justification for making such an attempt every other packet (2.6.18) or every fourth packet (2.6.20.7).
In the following dump, the system eventually gets in a state where it oscillates between sending undeliverable 2896 byte packets and deliverable 1448 byte ones.

08:39:55.649689 IP 10.2.10.254.22 > 12.33.234.69.35026: . 5906:10250(4344) ack 1794 win 674 <nop,nop,timestamp 413790 3873837887>
08:39:55.650532 IP 10.2.10.1 > 10.2.10.254: icmp 92: 12.33.234.69 unreachable - need to frag (mtu 1500)
08:39:55.689774 IP 12.33.234.69.35026 > 10.2.10.254.22: . ack 5906 win 4544 <nop,nop,timestamp 3873837927 413790>
08:39:55.689784 IP 10.2.10.254.22 > 12.33.234.69.35026: . 10250:13146(2896) ack 1794 win 674 <nop,nop,timestamp 413800 3873837927>
08:39:55.690497 IP 10.2.10.1 > 10.2.10.254: icmp 92: 12.33.234.69 unreachable - need to frag (mtu 1500)
08:39:55.902494 IP 10.2.10.254.22 > 12.33.234.69.35026: . 5906:7354(1448) ack 1794 win 674 <nop,nop,timestamp 413853 3873837927>

Since any sane router will only generate can't fragment ICMP's at a limited rate, for two hosts on gigabit ethernet, one on a MTU 1500 subnet and another on a MTU 9000 subnet, I can move only 40-50KB/sec over an affected TCP connection.

I was unable to find any reference to this problem in the kernel changelogs, or even any reports of anyone else having a similar problem. The above dumps are from 2.6.19.7. I could also reproduce the problem in 2.6.18, although the dumps looked slightly different. I was unable to reproduce this problem with the 2.6.9-42.0.10.ELsmp kernel which ships in RHEL4. I can send a pcap dump to anyone interested.

-- Brian Ristuccia

This email message and any attachments are confidential information of Starent Networks, Corp. The information transmitted may not be used to create or change any contractual obligations of Starent Networks, Corp. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this e-mail and its attachments by persons or entities other than the intended recipient is prohibited.
If you are not the intended recipient, please notify the sender immediately -- by replying to this message or by sending an email to [EMAIL PROTECTED] -- and destroy all copies of this message and any attachments without reading or disclosing their contents. Thank you.
Re: sysctls
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Wed, 25 Apr 2007 14:06:34 -0600 David for clarity do you happen to know of anyone using binary sysctl values? None at all. In particular is there any reason not to use CTL_UNNUMBERED for new networking sysctls? Neil said he would send me a patch to do that. Thanks.
RE: 2.6.20.7 mss negotiation and path mtu discovery mostly broken?
I'm seeing a problem where the kernel attempts to send packets with a MSS larger than the one negotiated when the TCP connection is established. Even after ICMP can't fragment messages arrive, the kernel still attempts to increase the MSS rather aggressively. The end result is extremely poor throughput when sending to a network with a smaller MTU. I've tracked this problem to the TSO feature in the bnx2 driver. Turning off TSO with ethtool -K eth1 tso off seems to work around the problem. It appears that the bnx2 device is not using the correct mss when performing segmentation offload. -Brian
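The numbers in the dumps from this thread can be checked with a little arithmetic. What follows is a minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each) and a 12-byte timestamp option in every data segment, as the nop,nop,timestamp fields in the capture suggest. The function names are made up for this illustration and come from neither the kernel nor the bnx2 driver.

```c
#include <assert.h>

/* Assumed header sizes: IPv4 without options, TCP without options,
 * and the timestamp option padded to 12 bytes (nop,nop,timestamp). */
enum { IPV4_HDR = 20, TCP_HDR = 20, TCP_TS_OPT = 12 };

/* MSS a host would advertise for a given interface MTU. */
int mss_for_mtu(int mtu)
{
    return mtu - IPV4_HDR - TCP_HDR;
}

/* Payload bytes per segment once the timestamp option is carried. */
int payload_per_segment(int mss)
{
    return mss - TCP_TS_OPT;
}

/* Number of whole segments a captured burst corresponds to,
 * or 0 if the length is not an exact multiple of the payload size. */
int segments_in_burst(int burst_len, int payload)
{
    assert(payload > 0);
    return (burst_len % payload == 0) ? burst_len / payload : 0;
}
```

With these assumptions, MTU 1500 gives the advertised mss of 1460 and a 1448-byte payload, and the 2896- and 4344-byte sends in the dump are exactly two and three such payloads. That is what a capture taken on a sender with TSO enabled would be expected to show, since the capture sees the large buffer before the NIC splits it. It is consistent with the TSO report above but does not by itself show which mss the device used on the wire.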
Re: sysctls
David Miller [EMAIL PROTECTED] writes: From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Wed, 25 Apr 2007 14:06:34 -0600 David for clarity do you happen to know of anyone using binary sysctl values? None at all. In particular is there any reason not to use CTL_UNNUMBERED for new networking sysctls? Neil said he would send me a patch to do that. Thanks. I just wanted to be certain I wasn't missing something, when asking people not to use binary sysctl values. Eric
Re: [Security] [PATCH] infinite recursion in netlink
On Wed, 25 Apr 2007, Alexey Kuznetsov wrote: Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.

So I assume it's this line that actually _fixes_ it:

- pid = nlh->nlmsg_pid; /*pid of sending process */
+ pid = NETLINK_CB(skb).pid; /* pid of sending process */
NETLINK_CB(skb).pid = 0; /* from kernel */
NETLINK_CB(skb).dst_group = 0; /* unicast */
netlink_unicast(sk, skb, pid, MSG_DONTWAIT);

No? If so, shouldn't we also have some safety-net to make sure it doesn't still get routed back forever, ie adding something like

if (!pid) { skb_free(skb); return -EINVAL; }

or similar? I don't know the netlink layer from a dolphin, but if the old code could cause infinite recursion, it sounds like the new code could too with the right pid, since the only change is the choice of pid. Yes/No/This is why Linus is a dickweed and doesn't understand the problem? Linus
Re: [Security] [PATCH] infinite recursion in netlink
From: Linus Torvalds [EMAIL PROTECTED] Date: Wed, 25 Apr 2007 13:15:12 -0700 (PDT) If so, shouldn't we also have some safety-net to make sure it doesn't still get routed back forever, ie adding something like if (!pid) { skb_free(skb); return -EINVAL; } or similar? I don't know the netlink layer from a dolphin, but if the old code could cause infinite recursion, it sounds like the new code could too with the right pid, since the only change is the choice of pid. Netlink pids are more like port numbers in the socket sense, do not confuse them with process pids or similar. The kernel explicitly assigns them to sockets, and zero is special. The fact that the process pid of the socket creator is used as an initial selection heuristic, is just that, a heuristic. Alexey's fix is 100% the right way to go IMHO.
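The difference between the two pid sources discussed in this thread can be shown with a toy model. This is a sketch only, not kernel code: struct toy_msg and the three functions are invented for illustration, and it assumes what the message above states, namely that port 0 is reserved for the kernel and is never assigned to a userspace socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: port id 0 stands for the kernel itself. */
enum { KERNEL_PORT = 0 };

struct toy_msg {
    uint32_t hdr_pid;     /* like nlmsg_pid: written by the sender, untrusted */
    uint32_t sender_port; /* like NETLINK_CB(skb).pid: recorded by the kernel */
};

/* Old behaviour: the reply goes wherever the message header claims. */
uint32_t reply_port_old(const struct toy_msg *m)
{
    assert(m != NULL);
    return m->hdr_pid;
}

/* Fixed behaviour: the reply goes to the port recorded for the sending socket. */
uint32_t reply_port_fixed(const struct toy_msg *m)
{
    assert(m != NULL);
    return m->sender_port;
}

/* A reply addressed to port 0 is handed back to the kernel's own input handler. */
int loops_back_to_kernel(uint32_t port)
{
    return port == KERNEL_PORT;
}
```

In this model a sender that leaves hdr_pid at 0 makes the old code address its reply to the kernel, which is the misrouting described in the patch. The fixed code cannot be steered that way because the sender does not control sender_port.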
Re: netlink locking warnings in 2.6.21-rc7-mm1
Andrew Morton wrote: I just retested bare net-2.6.22, pulled 30 minutes ago. I got just one warning: BUG: at kernel/mutex-debug.c:82 debug_mutex_unlock() [c012d18a] debug_mutex_unlock+0x5a/0x134 [c02d67e2] __mutex_unlock_slowpath+0x9d/0xcf [f8c3618b] ipw_wx_set_encode+0x0/0x82 [ipw2200] [c028b92c] rtnl_unlock+0xa/0x29 [c0286651] dev_ioctl+0x3d0/0x402 [c014b078] __handle_mm_fault+0x7c6/0x7e8 [c01a649b] selinux_file_alloc_security+0x1f/0x40 [c027b943] sock_ioctl+0x0/0x1be [c0162925] do_ioctl+0x19/0x4d [c0162b58] vfs_ioctl+0x1ff/0x216 [c0162bbb] sys_ioctl+0x4c/0x65 [c0103b0c] syscall_call+0x7/0xb [c02d] unix_dgram_sendmsg+0x76/0x400 === It's 100% reproducible here, using http://userweb.kernel.org/~akpm/config-sony.txt The weird ASSERT_RTNL warnings aren't there, so something else in -mm (prior to git-net.patch in the series file) would appear to be interacting with net changes. I think I found the problem, the rtnl_mutex was reinitialized on every rtnetlink socket creation. This is most likely responsible for both warnings. [NETLINK]: don't reinitialize callback mutex Don't reinitialize the callback mutex the netlink_kernel_create caller handed in, it is supposed to already be initialized and could already be held by someone. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 9cc4e9c2d8b022c10ded98610a3cd76a8b89cf49 tree e53f10a158858e20ef2e9922cabc5bf43980708d parent 7255fbb088e3f1b8be97472a38f645a8da595fe2 author Patrick McHardy [EMAIL PROTECTED] Wed, 25 Apr 2007 22:47:20 +0200 committer Patrick McHardy [EMAIL PROTECTED] Wed, 25 Apr 2007 22:47:20 +0200 net/netlink/af_netlink.c |8 ++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index ec16c9b..64d4b27 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -388,8 +388,12 @@ static int __netlink_create(struct socket *sock, struct mutex *cb_mutex, sock_init_data(sock, sk); nlk = nlk_sk(sk); - nlk-cb_mutex = cb_mutex ? 
: nlk-cb_def_mutex; - mutex_init(nlk-cb_mutex); + if (cb_mutex) + nlk-cb_mutex = cb_mutex; + else { + nlk-cb_mutex = nlk-cb_def_mutex; + mutex_init(nlk-cb_mutex); + } init_waitqueue_head(nlk-wait); sk-sk_destruct = netlink_sock_destruct;
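The rule behind this patch is that only the party that owns a mutex may initialise it. Here is a userspace sketch of the same pattern using pthreads, assuming a structure shaped like the one in the diff. struct toy_sock and toy_sock_init() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket's callback-mutex fields. */
struct toy_sock {
    pthread_mutex_t *cb_mutex;    /* mutex actually used for callbacks */
    pthread_mutex_t cb_def_mutex; /* private default, used when none is supplied */
};

/* Initialise only the mutex this object owns. A caller-supplied mutex is
 * already initialised and may be held right now, so it is left untouched. */
void toy_sock_init(struct toy_sock *s, pthread_mutex_t *cb_mutex)
{
    assert(s != NULL);
    if (cb_mutex) {
        s->cb_mutex = cb_mutex;
    } else {
        s->cb_mutex = &s->cb_def_mutex;
        pthread_mutex_init(s->cb_mutex, NULL);
    }
}
```

If the function initialised cb_mutex unconditionally, a second caller passing the same shared mutex would reset it while the first holder still believed it was locked. That is the situation the patch description warns about for rtnl_mutex.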
Re: netlink locking warnings in 2.6.21-rc7-mm1
From: Patrick McHardy [EMAIL PROTECTED] Date: Wed, 25 Apr 2007 22:51:43 +0200 [NETLINK]: don't reinitialize callback mutex Don't reinitialize the callback mutex the netlink_kernel_create caller handed in, it is supposed to already be initialized and could already be held by someone. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] Applied, thanks a lot for tracking this down Patrick.
very strange inet_sock corruption with rpc
Hi All To support a piece of custom functionality, we needed to add 2 members to the struct inet_sock. During testing, we started seeing an interesting corruption. Following a hunch, we've completely ripped out all of our code with the exception of 5 lines that do this: diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index ce6da97..605f5c0 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -140,6 +140,8 @@ struct inet_sock { __be32 addr; struct flowifl; } cork; + void *foo; + u32 bar; }; #define IPCORK_OPT 1 /* ip-options has been held in ipcork.opt */ diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index cf358c8..98ad2c2 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -335,6 +335,9 @@ lookup_protocol: sk_refcnt_debug_inc(sk); + inet-foo = NULL; + inet-bar = 0; + if (inet-num) { /* It assumes that any protocol which allows * the user to assign a number at socket (Variables were really named something else, but I hacked this into net-2.6 to see if I could reproduce). With just the above patch, I can catch a corruption of the inet_sock in the inet_csk_bind_conflict() with this: diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 43fb160..5cd5b6d 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -45,6 +45,18 @@ int inet_csk_bind_conflict(const struct sock *sk, int reuse = sk-sk_reuse; sk_for_each_bound(sk2, node, tb-owners) { + if (inet_sk(sk2)-foo) { + printk(KERN_WARN sk2 might be corrupt. Info:\n); + printk(KERN_WARN \tsk2 = %p\n, sk2); + printk(KERN_WARN \ttb-port = %d\n, tb-port); + printk(KERN_WARN \tinet_sk(sk2)-num = %d\n, + inet_sk(sk2)-num); + printk(KERN_WARN \tinet_sk(sk2)-foo = %p\n, + inet_sk(sk2)-foo); + printk(KERN_WARN \tinet_sk(sk2)-bar = %p\n, + inet_sk(sk2)-bar); + WARN_ON(1); + } Nobody outside of inet_create() writes to the foo pointer so it should always be NULL.
I've enabled SLAB debugging, stack overflow debugging, VM debugging and nothing triggers. The corruption is triggered after about 10 minutes of running the following script: nfspath = $1 localpath = $2 while true; do mount $nfspath $localpath sleep 5 cp /boot/vmlinuz $localpath sleep 5 rm $localpath/vmlinuz sleep 5 umount $localpath done And looks like this: sk2 might be corrupt. Info: sk2 = 8100f004d080 tb-port = 844 inet_sk(sk2)-num = 61695 inet_sk(sk2)-foo = 24242424243f243f inet_sk(sk2)-bar = 3f24243f BUG: at net/ipv4/inet_connection_sock.c:58 inet_csk_bind_conflict() Call Trace: [803cc591] inet_csk_bind_conflict+0xcb/0x178 [803cc4c6] inet_csk_bind_conflict+0x0/0x178 [803cc2ff] inet_csk_get_port+0x11a/0x1ef [803ddf51] inet_bind+0x117/0x1f5 [88184e13] :sunrpc:xs_bindresvport+0x4e/0xbf [881853a4] :sunrpc:xs_tcp_connect_worker+0x0/0x2a0 [88185433] :sunrpc:xs_tcp_connect_worker+0x8f/0x2a0 [80248bd3] run_workqueue+0x8f/0x137 [80245687] worker_thread+0x0/0x14a [8024579b] worker_thread+0x114/0x14a [8027e544] default_wake_function+0x0/0xe [8022ff49] kthread+0xd1/0x100 [80258f68] child_rip+0xa/0x12 [8022fe78] kthread+0x0/0x100 [80258f5e] child_rip+0x0/0x12 It looks like someone is stepping all over the inet_sock. We'll continue looking, but if anyone has any ideas of what might be going on, I'd appreciate it. It looks like a serious bug lurking somewhere. -vlad p.s the mount is using nfsv3 over UDP (nothing fancy at all) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sysctls
On Wed, 25 Apr 2007 15:53:19 -0400 Neil Horman [EMAIL PROTECTED] wrote: I did the optimistic dad sysctl, and have no strict use for numbered sysctls (I was just unaware of the policy). I'll work up a patch to use register_sysctl_table with CTL_UNNUMBERED in the next few days. I don't think you need to add a call to register_sysctl_table(), if that's what you're proposing. Just drop the changes to sysctl.h and use CTL_UNNUMBERED in sysctl.c.
Re: [GIT PATCH] [net-2.6.22] IPv6, IPv4 Updates
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] Date: Wed, 25 Apr 2007 21:55:21 +0900 (JST) Please consider pulling following commits available on net-2.6.22-20070425a-inet6-cleanup-20070425 branch at git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev.git. HEADLINES - [IPV6] SIT: Unify code path to get hash array index. [IPV4] IPIP: Unify code path to get hash array index. [IPV4] IP_GRE: Unify code path to get hash array index. [IPV6]: Export in6addr_any for future use. [IPV6] XFRM: Use ip6addr_any where applicable. [IPV6] NDISC: Unify main process of sending ND messages. Pulled, thanks a lot!
Re: very strange inet_sock corruption with rpc
On Wed, 2007-04-25 at 17:03 -0400, Vlad Yasevich wrote: [...] BUG: at net/ipv4/inet_connection_sock.c:58 inet_csk_bind_conflict() Call Trace: [803cc591] inet_csk_bind_conflict+0xcb/0x178 [803cc4c6] inet_csk_bind_conflict+0x0/0x178 [803cc2ff] inet_csk_get_port+0x11a/0x1ef [803ddf51] inet_bind+0x117/0x1f5 [88184e13] :sunrpc:xs_bindresvport+0x4e/0xbf [881853a4] :sunrpc:xs_tcp_connect_worker+0x0/0x2a0 [88185433] :sunrpc:xs_tcp_connect_worker+0x8f/0x2a0 If you are using NFS over UDP, why is a TCP routine getting called by sunrpc? [...] p.s the mount is using nfsv3 over UDP (nothing fancy at all)
Re: 2.6.20.7 mss negotiation and path mtu discovery mostly broken?
Ristuccia, Brian [EMAIL PROTECTED] wrote:

08:39:55.649689 IP 10.2.10.254.22 > 12.33.234.69.35026: . 5906:10250(4344) ack 1794 win 674 <nop,nop,timestamp 413790 3873837887>
08:39:55.650532 IP 10.2.10.1 > 10.2.10.254: icmp 92: 12.33.234.69 unreachable - need to frag (mtu 1500)

Where was this dump taken, on 10.2.10.254? If so could you either take the dump further down the route or show the full contents (with tcpdump -x) of the ICMP error here so that we can see what the actual packet size was? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu 许志壬 [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] infinite recursion in netlink
Greg KH wrote: On Wed, Apr 25, 2007 at 10:38:56PM +0400, Alexey Kuznetsov wrote: Hello! Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow. The bug is present in all kernel versions since the feature appeared. Any hint on when this feature appeared so that we can notify the distros for older releases? thanks, 2.6.13 if I'm not mistaken, confirmed on debian testing by Simeon Miteff. From man 7 netlink: NETLINK_W1 and NETLINK_FIB_LOOKUP appeared in Linux 2.6.13. Jaco
Re: Bluetooth patches for 2.6.21-rc7
From: Marcel Holtmann [EMAIL PROTECTED] Date: Thu, 26 Apr 2007 01:05:55 +0200 I have two last minute patches before the final 2.6.21 kernel hits the streets. One is a kernel memory leak that has been classified as security issue. The second one is a sysfs fix to correct a wrong use of class and bus devices. I don't think this one will make it as Linus is very eager to get the release out at this point :-) If it doesn't, I will make sure to push it into the -stable branch, so no worries.
Bluetooth patches for 2.6.21-rc7
Hi Dave, I have two last minute patches before the final 2.6.21 kernel hits the streets. One is a kernel memory leak that has been classified as security issue. The second one is a sysfs fix to correct a wrong use of class and bus devices. Regards Marcel Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6.git This will update the following files: net/bluetooth/hci_sock.c |9 + net/bluetooth/hci_sysfs.c |9 - net/bluetooth/l2cap.c |6 ++ 3 files changed, 23 insertions(+), 1 deletion(-) through these ChangeSets: Commit: 9457de6253a222a8c340b0442fb63c172069d962 Author: Marcel Holtmann [EMAIL PROTECTED] Wed, 25 Apr 2007 22:38:39 +0200 [Bluetooth] Attach host adapters to the Bluetooth bus The Bluetooth host adapters are attached to the Bluetooth class and the low-level connections are children of these class devices. Having class devices as parent of bus devices breaks a lot of reasonable assumptions about sysfs. The host adapters should be attached to the Bluetooth bus to simplify the dependency resolving. For compatibility an additional symlink from the Bluetooth class will be used. Signed-off-by: Marcel Holtmann [EMAIL PROTECTED] Commit: 32f1cf0a4643018f8473065d645dbc6b5772e93c Author: Marcel Holtmann [EMAIL PROTECTED] Wed, 25 Apr 2007 22:38:34 +0200 [Bluetooth] Fix L2CAP and HCI setsockopt() information leaks The L2CAP and HCI setsockopt() implementations have a small information leak that makes it possible to leak kernel stack memory to userspace. If the optlen parameter is 0, no data will be copied by copy_from_user(), but the uninitialized stack buffer will be read and stored later. A call to getsockopt() can now retrieve the leaked information. To fix this problem the stack buffer given to copy_from_user() must be initialized with the current settings. 
Signed-off-by: Marcel Holtmann [EMAIL PROTECTED]
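The fix described in the second commit message, seeding the stack buffer with the current settings before copying from userspace, can be sketched in plain C. This is an illustration under assumptions: struct toy_opts, its two fields and the function names are hypothetical, since the real Bluetooth option structures are not shown in this thread, and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option block, invented for this sketch. */
struct toy_opts {
    unsigned short mtu;
    unsigned short flush_to;
};

/* Stand-in for copy_from_user(): copies len bytes, which may be zero. */
static void toy_copy_from_user(void *dst, const void *src, size_t len)
{
    if (len)
        memcpy(dst, src, len);
}

/* Fixed pattern: start from the current settings, then overlay whatever the
 * caller supplied, so a short or zero optlen cannot store stack garbage. */
void toy_setsockopt(struct toy_opts *current, const void *optval, size_t optlen)
{
    struct toy_opts opts;
    size_t len;

    assert(current != NULL);
    opts = *current; /* the fix: initialise before copying */
    len = optlen < sizeof(opts) ? optlen : sizeof(opts);
    toy_copy_from_user(&opts, optval, len);
    *current = opts;
}
```

Without the `opts = *current` line, a call with optlen 0 would write whatever happened to be on the stack into the stored settings, and a later read of those settings would hand it back to the caller.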
Re: Bluetooth patches for 2.6.21-rc7
Hi Dave, I have two last minute patches before the final 2.6.21 kernel hits the streets. One is a kernel memory leak that has been classified as security issue. The second one is a sysfs fix to correct a wrong use of class and bus devices. I don't think this one will make it as Linus is very eager to get the release out at this point :-) I realized that 2.6.21 is almost out of the door. This is why I put Linus on CC. His call. Regards Marcel
Re: sysctls
From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 25 Apr 2007 14:50:18 -0700 On Wed, 25 Apr 2007 15:53:19 -0400 Neil Horman [EMAIL PROTECTED] wrote: I did the optimistic dad sysctl, and have no strict use for numbered sysctls (I was just unaware of the policy). I'll work up a patch to use register_sysctl_table with CTL_UNNUMBERED in the next few days. I don't think you need to add a call to register_sysctl_table(), if that's what you're proposing. Just drop the changes to sysctl.h and use CTL_UNNUMBERED in sysctl.c. Ok, I'll take care of this.
Re: netlink locking warnings in 2.6.21-rc7-mm1
On Wed, 25 Apr 2007 22:51:43 +0200 Patrick McHardy [EMAIL PROTECTED] wrote: I think I found the problem, the rtnl_mutex was reinitialized on every rtnetlink socket creation. This is most likely responsible for both warnings. Yup, no warnings any more, thanks.
[PATCH 1/4] bridge: don't change packet type
The change to forward STP bpdu's (for usermode STP) through normal path, changed the packet type in the process. Since link local stuff is multicast, it should stay pkt_type = PACKET_MULTICAST. The code was probably copy/pasted incorrectly from the bridge pseudo-device receive path. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- bridge-2.6.22.orig/net/bridge/br_input.c +++ bridge-2.6.22/net/bridge/br_input.c @@ -131,12 +131,9 @@ struct sk_buff *br_handle_frame(struct n if (!is_valid_ether_addr(eth_hdr(skb)-h_source)) goto drop; - if (unlikely(is_link_local(dest))) { - skb-pkt_type = PACKET_HOST; - + if (unlikely(is_link_local(dest))) return (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb-dev, NULL, br_handle_local_finish) == 0) ? skb : NULL; - } switch (p-state) { case BR_STATE_FORWARDING: --
[PATCH 0/4] Bridge patches for 2.6.22
Here are some patches for bridge code in 2.6.22. The user mode RSTP from Aji is working. Anyone who wants to test it can get it from: git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/rstp.git Thanks
[PATCH 2/4] bridge: drop PAUSE frames
Pause frames should never make it out of the network device into the stack. But if a device was misconfigured, it might happen. So drop pause frames in bridge. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- bridge-2.6.22.orig/include/linux/if_ether.h +++ bridge-2.6.22/include/linux/if_ether.h @@ -61,6 +61,7 @@ #define ETH_P_8021Q0x8100 /* 802.1Q VLAN Extended Header */ #define ETH_P_IPX 0x8137 /* IPX over DIX */ #define ETH_P_IPV6 0x86DD /* IPv6 over bluebook */ +#define ETH_P_PAUSE0x8808 /* IEEE Pause frames. See 802.3 31B */ #define ETH_P_SLOW 0x8809 /* Slow Protocol. See 802.3ad 43B */ #define ETH_P_WCCP 0x883E /* Web-cache coordination protocol * defined in draft-wilson-wrec-wccp-v2-00.txt */ --- bridge-2.6.22.orig/net/bridge/br_input.c +++ bridge-2.6.22/net/bridge/br_input.c @@ -131,9 +131,14 @@ struct sk_buff *br_handle_frame(struct n if (!is_valid_ether_addr(eth_hdr(skb)-h_source)) goto drop; - if (unlikely(is_link_local(dest))) + if (unlikely(is_link_local(dest))) { + /* Pause frames shouldn't be passed up by driver anyway */ + if (skb-protocol == htons(ETH_P_PAUSE)) + goto drop; + return (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb-dev, NULL, br_handle_local_finish) == 0) ? skb : NULL; + } switch (p-state) { case BR_STATE_FORWARDING: -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] bridge: missing rtnl
Writing to /sys/class/net/brX/bridge/stp_state causes a warning because RTNL is not held when call br_stp_if.c Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- net/bridge/br_sysfs_br.c |2 ++ 1 file changed, 2 insertions(+) --- bridge-2.6.22.orig/net/bridge/br_sysfs_br.c +++ bridge-2.6.22/net/bridge/br_sysfs_br.c @@ -149,9 +149,11 @@ static ssize_t show_stp_state(struct dev static void set_stp_state(struct net_bridge *br, unsigned long val) { + rtnl_lock(); spin_unlock_bh(br-lock); br_stp_set_enabled(br, val); spin_lock_bh(br-lock); + rtnl_unlock(); } static ssize_t store_stp_state(struct device *d, --
[PATCH 3/4] bridge: if no STP then forward all BPDUs
If a bridge is not running STP, then it has no way to detect a cycle in the network. But if it is not running STP and some other machine or device is running STP, then if STP BPDU's get forwarded to it can detect the cycle. This is how the old 2.4 and early 2.6 code worked. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- net/bridge/br_input.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) --- bridge-2.6.22.orig/net/bridge/br_input.c +++ bridge-2.6.22/net/bridge/br_input.c @@ -136,8 +136,14 @@ struct sk_buff *br_handle_frame(struct n if (skb-protocol == htons(ETH_P_PAUSE)) goto drop; - return (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb-dev, - NULL, br_handle_local_finish) == 0) ? skb : NULL; + /* Process STP BPDU's through normal netif_receive_skb() path */ + if (p-br-stp_enabled != BR_NO_STP) { + if (NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, skb-dev, + NULL, br_handle_local_finish)) + return NULL; + else + return skb; + } } switch (p-state) { --
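Taken together, patches 1 to 3 leave br_handle_frame() with a short decision sequence for link-local destinations. The sketch below models that sequence in userspace C. It assumes that "link local" means the reserved group block 01:80:C2:00:00:00 through 01:80:C2:00:00:0F, since the body of is_link_local() is not shown in these patches, and it takes the EtherType in host byte order rather than comparing against htons(). All names with a toy_ or TOY_ prefix are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_ETH_P_PAUSE 0x8808 /* value taken from patch 2/4 */

enum toy_verdict { TOY_DROP, TOY_LOCAL_IN, TOY_FORWARD_PATH };

/* Assumption: link-local means 01:80:C2:00:00:0X. */
int toy_is_link_local(const uint8_t dest[TOY_ETH_ALEN])
{
    static const uint8_t prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };
    return memcmp(dest, prefix, sizeof(prefix)) == 0 && (dest[5] & 0xF0) == 0;
}

/* Decision order after patches 1-3: pause frames are dropped, other
 * link-local frames go to local input only when STP is enabled, and
 * everything else continues to the port-state switch. */
enum toy_verdict toy_classify(const uint8_t dest[TOY_ETH_ALEN],
                              uint16_t ethertype, int stp_enabled)
{
    assert(dest != NULL);
    if (toy_is_link_local(dest)) {
        if (ethertype == TOY_ETH_P_PAUSE)
            return TOY_DROP;
        if (stp_enabled)
            return TOY_LOCAL_IN;
    }
    return TOY_FORWARD_PATH;
}
```

With STP disabled, a BPDU addressed to 01:80:C2:00:00:00 falls through to the normal forwarding path, which is the behaviour patch 3/4 restores. The pkt_type handling from patch 1/4 is outside what this model covers.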
[PATCH] usb-net/pegasus: simplify carrier detection
Simplify pegasus carrier detection; rely only on the periodic MII polling. Reverts pieces of c43c49bd61fdb9bb085ddafcaadb17d06f95ec43.

Signed-off-by: Dan Williams [EMAIL PROTECTED]

--- a/drivers/usb/net/pegasus.h 2007-04-25 21:21:00.0 -0400
+++ b/drivers/usb/net/pegasus.h 2007-04-25 21:21:13.0 -0400
@@ -11,7 +11,6 @@
 
 #define	PEGASUS_II		0x8000
 #define	HAS_HOME_PNA		0x4000
-#define	TRUST_LINK_STATUS	0x2000
 
 #define	PEGASUS_MTU		1536
 #define	RX_SKBS			4
@@ -204,7 +203,7 @@
 PEGASUS_DEV( "Allied Telesyn Int. AT-USB100", VENDOR_ALLIEDTEL, 0xb100,
 		DEFAULT_GPIO_RESET | PEGASUS_II )
 PEGASUS_DEV( "Belkin F5D5050 USB Ethernet", VENDOR_BELKIN, 0x0121,
-		DEFAULT_GPIO_RESET | PEGASUS_II | TRUST_LINK_STATUS )
+		DEFAULT_GPIO_RESET | PEGASUS_II )
 PEGASUS_DEV( "Billionton USB-100", VENDOR_BILLIONTON, 0x0986,
 		DEFAULT_GPIO_RESET )
 PEGASUS_DEV( "Billionton USBLP-100", VENDOR_BILLIONTON, 0x0987,
--- a/drivers/usb/net/pegasus.c 2007-04-25 21:20:32.0 -0400
+++ b/drivers/usb/net/pegasus.c 2007-04-25 21:22:15.0 -0400
@@ -848,16 +848,6 @@
 		 * d[0].NO_CARRIER kicks in only with failed TX.
 		 * ... so monitoring with MII may be safest.
 		 */
-		if (pegasus->features & TRUST_LINK_STATUS) {
-			if (d[5] & LINK_STATUS)
-				netif_carrier_on(net);
-			else
-				netif_carrier_off(net);
-		} else {
-			/* Never set carrier _on_ based on ! NO_CARRIER */
-			if (d[0] & NO_CARRIER)
-				netif_carrier_off(net);
-		}
 
 		/* bytes 3-4 == rx_lostpkt, reg 2E/2F */
 		pegasus->stats.rx_missed_errors += ((d[3] & 0x7f) << 8) | d[4];
Re: [RFC][PATCH -mm take4 2/6] support multiple logging
Well.. before you can finish this work we need to decide upon what the interface to userspace will be.

- The miscdev isn't appropriate

Why isn't miscdev appropriate? Is it just that miscdev is conventionally not used for networking?

Yes it's rather odd, especially for networking. What does the miscdev _do_ anyway? Is it purely a target for the ioctls?

Yes, I use miscdev purely for the ioctls. I want to use sysfs and ioctl to implement the dynamic configurability. The sysfs attributes show and change the netconsole configuration (IP address, port and so on). A userland application adds and removes netconsole ports through the ioctl.

I thought that the dynamic configurability could be realized in the kernel only (e.g. only sysfs, no userland application). But I think we need a function that automatically resolves the destination MAC address from the IP address, and because of the cost of that resolution I should implement it in a userland application, not in the netconsole kernel module. Netconsole will become more useful with that function.

Some other speculations:
1. Would it be possible to add ioctl's to /dev/console? This would be more in keeping with older Unix style model.
2. Using sysfs makes sense if there is a device object that exists to add the sysfs attributes to.
3. Procfs is handy for summary type tables.
4. Netlink does feel like overkill for this. Although newer generic netlink makes it easier.

If I use sysfs, is /sys/class/misc/netconsole/port[0-9]* the proper location for the attributes of each netconsole port, or should they go elsewhere in /sys/?

Stephen Hemminger said

The configuration of netconsole's looks like the configuration of routes.

I think so too. So I am thinking of ioctl commands for adding and removing ports, plus a userland application, similar to the route(8) command, that uses those ioctls, e.g.

1. add port
# netconfig add 192.168.0.10

2. remove port
# netconfig remove 1

3. show port info
# netconfig
id  status   Source IP    Source Port  Destination IP  Destination Port  Destination MAC
1   enable   192.168.0.1  6665         192.168.0.10                      00:11:22:33:44:55
2   disable  192.168.0.1  6665         192.168.0.20                      00:11:22:33:44:66

route(8) command uses ioctl for Netlink. But I'm going to implement ioctl's to /dev/console because of the above comments.

Thank you for your comments. Any comments are very welcome.

--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]
Re: [RFC][PATCH -mm take4 2/6] support multiple logging
From: Keiichi KII [EMAIL PROTECTED]
Date: Thu, 26 Apr 2007 13:02:04 +0900

> Stephen Hemminger said
>
> > The configuration of netconsole's looks like the configuration of routes.
>
> I think so too. So I think ioctl commands for adding/removing port and the following userland application like route(8) command by using the ioctl.

Like the route command itself, the route changing ioctl()s are old deprecated BSD compatible functionality. All current routing configuration is done using netlink and the 'ip' utility.
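For readers unfamiliar with the netlink side, here is a minimal sketch of what a routing request looks like on the wire, using only the userspace headers `<linux/netlink.h>` and `<linux/rtnetlink.h>`. The struct name `route_dump_req` and the helper `build_route_dump_req()` are made up for illustration. The sketch only fills in the message and does not open a socket or send anything.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical container: a netlink header followed by a route message. */
struct route_dump_req {
	struct nlmsghdr nlh;
	struct rtmsg rtm;
};

/* Fill in a "dump all IPv4 routes" request; the caller chooses the sequence number. */
static void build_route_dump_req(struct route_dump_req *req, unsigned int seq)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));

	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->rtm)); /* header + payload */
	req->nlh.nlmsg_type = RTM_GETROUTE;                  /* routing query */
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;   /* ask for every entry */
	req->nlh.nlmsg_seq = seq;
	req->rtm.rtm_family = AF_INET;
}
```

A real client would then send this buffer over a socket created with `socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)` and parse the multipart reply. That part is omitted here.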
Re: [Bugme-new] [Bug 8342] New: sctp_getsockopt_local_addrs_old() calls copy_to_user() while a spinlock is held
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Mon, 23 Apr 2007 13:43:35 -0400

> [PATCH] [SCTP] Fix sctp_getsockopt_local_addrs_old() to use local storage
>
> sctp_getsockopt_local_addrs_old() in net/sctp/socket.c calls copy_to_user() while the spinlock addr_lock is held. This should not be done as copy_to_user() might sleep. The call to sctp_copy_laddrs_to_user() while holding the lock is also problematic as it calls copy_to_user().
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

As Andrew Morton just noticed and fixed in -mm, you're passing in int pointers to arguments that should be size_t pointers, specifically for some of the calls to sctp_copy_laddrs(). Please fix this, and please start testing builds on 64-bit platforms (even if via cross compile) so that you can catch these as the warnings generated by the compiler on this one were obvious.
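The "use local storage" pattern from the quoted patch description can be modelled in userspace roughly as follows. Everything here is a stand-in: `fake_lock` replaces the spinlock, `may_sleep_copy()` replaces copy_to_user() and asserts that the lock is not held, and `get_addrs()` is a hypothetical name, not the SCTP function. Sizes are kept in `size_t` throughout, in line with the review comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 8

/* Toy lock that only records whether it is held (not a real spinlock). */
struct fake_lock { bool held; };

static void fake_lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = true; }
static void fake_lock_release(struct fake_lock *l) { assert(l->held); l->held = false; }

struct addr_table {
	struct fake_lock lock;
	size_t count;
	int addrs[MAX_ADDRS];
};

/* Stand-in for copy_to_user(): it may sleep, so the lock must not be held. */
static void may_sleep_copy(const struct fake_lock *l, int *dst, const int *src, size_t n)
{
	assert(!l->held);
	memcpy(dst, src, n * sizeof(*src));
}

/* Snapshot the entries under the lock, then copy them out after unlocking. */
static size_t get_addrs(struct addr_table *t, int *user_buf, size_t cap)
{
	int local[MAX_ADDRS];
	size_t n;

	fake_lock_acquire(&t->lock);
	n = t->count < cap ? t->count : cap;
	if (n > MAX_ADDRS)
		n = MAX_ADDRS;
	memcpy(local, t->addrs, n * sizeof(local[0]));
	fake_lock_release(&t->lock);

	may_sleep_copy(&t->lock, user_buf, local, n);
	return n;
}
```

Calling `may_sleep_copy()` between acquire and release would trip the assertion, which is the userspace analogue of the bug the patch addresses.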
Re: [PATCH 1/4] bridge: don't change packet type
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 16:47:38 -0700

> The change to forward STP bpdu's (for usermode STP) through normal path, changed the packet type in the process. Since link local stuff is multicast, it should stay pkt_type = PACKET_MULTICAST. The code was probably copy/pasted incorrectly from the bridge pseudo-device receive path.
>
> Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied.
Re: [PATCH 2/4] bridge: drop PAUSE frames
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 16:47:39 -0700

> Pause frames should never make it out of the network device into the stack. But if a device was misconfigured, it might happen. So drop pause frames in bridge.
>
> Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

It can happen when in promiscuous mode, but that's the only legal case I can think of. But that case shouldn't be hitting this path I don't think.

So this change is borderline and if anything we should put an assertion somewhere maybe, but applied for now.
Re: [PATCH 4/4] bridge: missing rtnl
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 16:47:41 -0700

> Writing to /sys/class/net/brX/bridge/stp_state causes a warning because RTNL is not held when calling into br_stp_if.c.
>
> Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks Stephen.
Re: [Security] [PATCH] infinite recursion in netlink
On Wed, Apr 25, 2007 at 01:15:12PM -0700, Linus Torvalds wrote:
> On Wed, 25 Apr 2007, Alexey Kuznetsov wrote:
> > Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.

Wait, I just had the bright idea of actually testing this before I pushed out a 2.6.20.9 kernel with another fix in it, and nope, still crashes, even with this patch :(

Full stackdump in a picture (forgot to have netconsole running) at: http://www.kroah.com/netlink_oops.jpg

Any thoughts? I'll go try 2.6.21 now too...

thanks,

greg k-h
Re: [Security] [PATCH] infinite recursion in netlink
From: Greg KH [EMAIL PROTECTED]
Date: Wed, 25 Apr 2007 22:29:12 -0700

> On Wed, Apr 25, 2007 at 01:15:12PM -0700, Linus Torvalds wrote:
> > On Wed, 25 Apr 2007, Alexey Kuznetsov wrote:
> > > Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.
>
> Wait, I just had the bright idea of actually testing this before I pushed out a 2.6.20.9 kernel with another fix in it, and nope, still crashes, even with this patch :(
>
> Full stackdump in a picture (forgot to have netconsole running) at: http://www.kroah.com/netlink_oops.jpg
>
> Any thoughts? I'll go try 2.6.21 now too...

Crap. We should have let this one simmer for a day to get more eyes on it.

Thanks for catching this Greg.
Re: [Security] [PATCH] infinite recursion in netlink
* Greg KH ([EMAIL PROTECTED]) wrote:
> On Wed, Apr 25, 2007 at 01:15:12PM -0700, Linus Torvalds wrote:
> > On Wed, 25 Apr 2007, Alexey Kuznetsov wrote:
> > > Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.
>
> Wait, I just had the bright idea of actually testing this before I pushed out a 2.6.20.9 kernel with another fix in it, and nope, still crashes, even with this patch :(

Odd, I tested it too (on linus-git), and it's fixed (it was definitely the problem, of sending back to kernel).

thanks,
-chris
Re: [Security] [PATCH] infinite recursion in netlink
On Wed, Apr 25, 2007 at 10:32:01PM -0700, David Miller wrote:
> From: Greg KH [EMAIL PROTECTED]
> Date: Wed, 25 Apr 2007 22:29:12 -0700
>
> > On Wed, Apr 25, 2007 at 01:15:12PM -0700, Linus Torvalds wrote:
> > > On Wed, 25 Apr 2007, Alexey Kuznetsov wrote:
> > > > Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.
> >
> > Wait, I just had the bright idea of actually testing this before I pushed out a 2.6.20.9 kernel with another fix in it, and nope, still crashes, even with this patch :(
> >
> > Full stackdump in a picture (forgot to have netconsole running) at: http://www.kroah.com/netlink_oops.jpg
> >
> > Any thoughts? I'll go try 2.6.21 now too...
>
> Crap. We should have let this one simmer for a day to get more eyes on it.
>
> Thanks for catching this Greg.

Odd, 2.6.21 doesn't crash at all.

Can anyone verify that I made the 2.6.20.8 release correctly with the proper patch?

thanks,

greg k-h
Re: [Security] [PATCH] infinite recursion in netlink
On Wed, Apr 25, 2007 at 10:44:20PM -0700, Greg KH wrote:
> On Wed, Apr 25, 2007 at 10:32:01PM -0700, David Miller wrote:
> > From: Greg KH [EMAIL PROTECTED]
> > Date: Wed, 25 Apr 2007 22:29:12 -0700
> >
> > > On Wed, Apr 25, 2007 at 01:15:12PM -0700, Linus Torvalds wrote:
> > > > On Wed, 25 Apr 2007, Alexey Kuznetsov wrote:
> > > > > Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.
> > >
> > > Wait, I just had the bright idea of actually testing this before I pushed out a 2.6.20.9 kernel with another fix in it, and nope, still crashes, even with this patch :(
> > >
> > > Full stackdump in a picture (forgot to have netconsole running) at: http://www.kroah.com/netlink_oops.jpg
> > >
> > > Any thoughts? I'll go try 2.6.21 now too...
> >
> > Crap. We should have let this one simmer for a day to get more eyes on it.
> >
> > Thanks for catching this Greg.
>
> Odd, 2.6.21 doesn't crash at all.
>
> Can anyone verify that I made the 2.6.20.8 release correctly with the proper patch?

fyi, here's the patch that I applied, perhaps 2.6.20 needed something else too?

thanks,

greg k-h

Subject: NETLINK: Infinite recursion in netlink.
From: Alexey Kuznetsov [EMAIL PROTECTED]

[NETLINK]: Infinite recursion in netlink.

Reply to NETLINK_FIB_LOOKUP messages were misrouted back to kernel, which resulted in infinite recursion and stack overflow.

The bug is present in all kernel versions since the feature appeared.

The patch also makes some minimal cleanup:

1. Return something consistent (-ENOENT) when fib table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup

Signed-off-by: Alexey Kuznetsov [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]

---
 net/ipv4/fib_frontend.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -772,6 +772,8 @@ static void nl_fib_lookup(struct fib_res
 				       .nl_u = { .ip4_u = { .daddr = frn->fl_addr,
 							    .tos = frn->fl_tos,
 							    .scope = frn->fl_scope } } };
+
+	frn->err = -ENOENT;
 	if (tb) {
 		local_bh_disable();
 
@@ -783,6 +785,7 @@ static void nl_fib_lookup(struct fib_res
 			frn->nh_sel = res.nh_sel;
 			frn->type = res.type;
 			frn->scope = res.scope;
+			fib_res_put(&res);
 		}
 		local_bh_enable();
 	}
@@ -797,6 +800,9 @@ static void nl_fib_input(struct sock *sk
 	struct fib_table *tb;
 
 	skb = skb_dequeue(&sk->sk_receive_queue);
+	if (skb == NULL)
+		return;
+
 	nlh = (struct nlmsghdr *)skb->data;
 	if (skb->len < NLMSG_SPACE(0) || skb->len < nlh->nlmsg_len ||
 	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*frn))) {
@@ -809,7 +815,7 @@ static void nl_fib_input(struct sock *sk
 
 	nl_fib_lookup(frn, tb);
 
-	pid = nlh->nlmsg_pid;           /*pid of sending process */
+	pid = NETLINK_CB(skb).pid;       /* pid of sending process */
 	NETLINK_CB(skb).pid = 0;         /* from kernel */
 	NETLINK_CB(skb).dst_group = 0;  /* unicast */
 	netlink_unicast(sk, skb, pid, MSG_DONTWAIT);