Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 22 Jun 2007 21:30:05 -0700

> [EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:
>
> > On 23/06/07 02:04, David Stevens wrote:
> > > Why not make the application that writes resolv.conf
> > > also listen on a raw ICMPv6 socket? I don't believe you'd need
> > > any kernel changes, then, and it seems pretty simple and
> > > straightforward.
> >
> > Because then it requires yet another network daemon, RA in
> > the kernel means there's no need for one to manage adding
> > auto-configured IP addresses... what's wrong with doing the
> > same for DNS?
>
> It's not yet another one, since you have to run something
> to get it in resolv.conf, anyway. That seems much better to me
> than having the kernel track data that can only be used at the
> application layer. The app itself looks like it'd be really simple.
>
> Auto-configured addresses are used by the kernel. It has to
> have those addresses. But the kernel doesn't do DNS look-ups, or
> write resolv.conf; that's the difference, for me.

I totally agree with David, this stuff definitely does not belong in
the kernel.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On 23/06/07 05:30, David Stevens wrote:
> [EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:

Is there a reason why you're CC:ing the Sender? Doesn't that end up in
the mailbox(es) of the netdev admin(s)?

>> On 23/06/07 02:04, David Stevens wrote:
>>> Why not make the application that writes resolv.conf
>>> also listen on a raw ICMPv6 socket? I don't believe you'd need
>>> any kernel changes, then, and it seems pretty simple and
>>> straightforward.
>> Because then it requires yet another network daemon, RA in
>> the kernel means there's no need for one to manage adding
>> auto-configured IP addresses... what's wrong with doing the
>> same for DNS?
>
> It's not yet another one, since you have to run something
> to get it in resolv.conf, anyway. That seems much better to me

Well, it'd be the library including it - so there'd be no daemon
application involved.

> than having the kernel track data that can only be used at the
> application layer. The app itself looks like it'd be really simple.

Keeping application data in the kernel does start to get silly though,
e.g. everything in dhcp-options(5)... but DNS is used almost everywhere.
This could be a configuration option so that anyone who doesn't want it
can disable it completely.

> Auto-configured addresses are used by the kernel. It has to
> have those addresses. But the kernel doesn't do DNS look-ups, or
> write resolv.conf; that's the difference, for me.

Using DHCPv6 as an example, auto-configuration does not have to be in
the kernel at all either.

-- 
Simon Arlott
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
[EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:

> On 23/06/07 02:04, David Stevens wrote:
> > Why not make the application that writes resolv.conf
> > also listen on a raw ICMPv6 socket? I don't believe you'd need
> > any kernel changes, then, and it seems pretty simple and
> > straightforward.
>
> Because then it requires yet another network daemon, RA in
> the kernel means there's no need for one to manage adding
> auto-configured IP addresses... what's wrong with doing the
> same for DNS?

	It's not yet another one, since you have to run something
to get it in resolv.conf, anyway. That seems much better to me
than having the kernel track data that can only be used at the
application layer. The app itself looks like it'd be really simple.

	Auto-configured addresses are used by the kernel. It has to
have those addresses. But the kernel doesn't do DNS look-ups, or
write resolv.conf; that's the difference, for me.

 +-DLS
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Fri, 2007-06-22 at 20:09 -0400, C. Scott Ananian wrote:
> > > diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h linux-2.6.22-rc5/include/net/ip6_rdnss.h
> > > --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h	1969-12-31 19:00:00.0 -0500
> > > +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h	2007-06-21 18:16:33.0 -0400
> > > @@ -0,0 +1,58 @@
> > > +#ifndef _NET_IP6_RDNSS_H
> > > +#define _NET_IP6_RDNSS_H
> > > +
> > > +#ifdef __KERNEL__
> > > +
> > > +#include
> > > +
> > > +struct nd_opt_rdnss {
> > > +	__u8	type;
> > > +	__u8	length;
> > > +#if defined(__BIG_ENDIAN_BITFIELD)
> > > +	__u8	priority:4,
> > > +		open:1,
> > > +		reserved1:3;
> > > +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> > > +	__u8	reserved1:3,
> > > +		open:1,
> > > +		priority:4;
> > > +#else
> > > +# error not little or big endian
> > > +#endif
> >
> > That is not endianness-safe. Don't use foo:x at all
> > for stuff where a specific endianness is needed. The
> > compiler doesn't make any guarantee about it.
>
> This was copied directly from include/net/ip6_route.h. I believe that
> it does in fact work, and I (for one) find this much more readable
> than the alternative. If it is in fact broken, then
> include/net/ip6_route.h (and the 35 other files which use this #ifdef
> in this manner) should be fixed.

Though in general, we shouldn't be using bitfields, FYI. They are known
to generate really crappy code on many architectures, and patches that
contain them have been smacked down quite hard by people we all know
are better hackers than us :)

Dan
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On 23/06/07 02:04, David Stevens wrote:
> Why not make the application that writes resolv.conf
> also listen on a raw ICMPv6 socket? I don't believe you'd need
> any kernel changes, then, and it seems pretty simple and
> straightforward.

Because then it requires yet another network daemon, RA in the kernel
means there's no need for one to manage adding auto-configured IP
addresses... what's wrong with doing the same for DNS?

I don't think it should be in resolv.conf format though. Can't you make
a change to glibc too to have it use it properly? Something like
"hosts: files rdnss dns" or an option that can be added to resolv.conf?

-- 
Simon Arlott
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Scott,

	Why not make the application that writes resolv.conf
also listen on a raw ICMPv6 socket? I don't believe you'd need
any kernel changes, then, and it seems pretty simple and
straightforward.

 +-DLS
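David's alternative (a user-space program that listens for Router Advertisements itself) can be sketched roughly as follows. This is a minimal sketch, not code from the thread. It assumes the Linux/glibc `<netinet/icmp6.h>` socket-filter API and the option layout from Scott's patch: type 25 (his `ND_OPT_RDNSS_INFO`), a length byte counted in 8-octet units, the lifetime at byte offset 4, and addresses starting at byte 8. `open_ra_socket()`, `parse_rdnss()` and `RDNSS_OPT_TYPE` are hypothetical names. Writing resolv.conf and expiring entries are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>
#include <arpa/inet.h>

#define RDNSS_OPT_TYPE	25	/* assumed: same value as ND_OPT_RDNSS_INFO in the patch */
#define RA_HDR_LEN	16	/* sizeof(struct nd_router_advert) */

/* Open a raw ICMPv6 socket that only delivers Router Advertisements.
 * Needs CAP_NET_RAW. Returns the descriptor, or -1 on error. */
int open_ra_socket(void)
{
	struct icmp6_filter filt;
	int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);

	if (fd < 0)
		return -1;
	ICMP6_FILTER_SETBLOCKALL(&filt);
	ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, &filt);
	if (setsockopt(fd, IPPROTO_ICMPV6, ICMP6_FILTER, &filt, sizeof(filt)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Walk the ND options after the RA header and copy up to max RDNSS
 * addresses into out. *lifetime receives the lifetime (in seconds) of the
 * last RDNSS option seen. Returns the number of addresses copied, or -1
 * if the buffer is not an RA or an option length is invalid. */
int parse_rdnss(const uint8_t *ra, size_t len, struct in6_addr *out,
		int max, uint32_t *lifetime)
{
	size_t off = RA_HDR_LEN;
	int n = 0;

	if (len < RA_HDR_LEN || ra[0] != ND_ROUTER_ADVERT)
		return -1;
	while (off + 2 <= len) {
		/* The length byte counts 8-octet units; 0 is invalid. */
		size_t optlen = (size_t)ra[off + 1] * 8;

		if (optlen == 0 || off + optlen > len)
			return -1;
		if (ra[off] == RDNSS_OPT_TYPE && optlen >= 24) {
			uint32_t lt;

			/* bytes 2-3: flags/reserved, bytes 4-7: lifetime */
			memcpy(&lt, ra + off + 4, sizeof(lt));
			*lifetime = ntohl(lt);
			/* addresses start at byte 8, 16 bytes each */
			for (size_t a = off + 8; a + 16 <= off + optlen && n < max; a += 16)
				memcpy(&out[n++], ra + a, 16);
		}
		off += optlen;
	}
	return n;
}
```

A daemon would loop on `recv()` from that socket and pass each datagram to `parse_rdnss()`. Raw ICMPv6 sockets deliver the message starting at the ICMPv6 header, so there is no IPv6 header to strip first.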
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
> > diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h linux-2.6.22-rc5/include/net/ip6_rdnss.h
> > --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h	1969-12-31 19:00:00.0 -0500
> > +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h	2007-06-21 18:16:33.0 -0400
> > @@ -0,0 +1,58 @@
> > +#ifndef _NET_IP6_RDNSS_H
> > +#define _NET_IP6_RDNSS_H
> > +
> > +#ifdef __KERNEL__
> > +
> > +#include
> > +
> > +struct nd_opt_rdnss {
> > +	__u8	type;
> > +	__u8	length;
> > +#if defined(__BIG_ENDIAN_BITFIELD)
> > +	__u8	priority:4,
> > +		open:1,
> > +		reserved1:3;
> > +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> > +	__u8	reserved1:3,
> > +		open:1,
> > +		priority:4;
> > +#else
> > +# error not little or big endian
> > +#endif
>
> That is not endianness-safe. Don't use foo:x at all
> for stuff where a specific endianness is needed. The
> compiler doesn't make any guarantee about it.

This was copied directly from include/net/ip6_route.h. I believe that
it does in fact work, and I (for one) find this much more readable
than the alternative. If it is in fact broken, then
include/net/ip6_route.h (and the 35 other files which use this #ifdef
in this manner) should be fixed.
 --scott

-- 
( http://cscott.net/ )
Re: [PATCH] inetdevice.h must include sysctl.h
From: "Satyam Sharma" <[EMAIL PROTECTED]>
Date: Sat, 23 Jun 2007 05:26:52 +0530

> [PATCH] include sysctl.h from inetdevice.h
>
> When CONFIG_INET=y and CONFIG_SYSCTL=n:
>
> In file included from net/core/netpoll.c:16:
> include/linux/inetdevice.h:15: error:
> '__NET_IPV4_CONF_MAX' undeclared here (not in a function)
> make[2]: *** [net/core/netpoll.o] Error 1
> make[1]: *** [net/core] Error 2
> make: *** [net] Error 2
>
> So #include sysctl.h from inetdevice.h.
>
> Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

Patch applied, thank you.
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Saturday 23 June 2007 01:26:19 C. Scott Ananian wrote:
> +struct rdns6_info {
> +	rwlock_t		lock;
> +	struct timer_list	expiry_timer;
> +	struct rdns6_entry	*rdnss_list;
> +	struct inet6_dev	* in6_dev; /* back pointer for netlink notify */
> +	int			expire_all : 1, /* remove entries on ifdown */
> +				free_me : 1; /* safely free this struct */
> +};

Sparse will complain about that. I suggest you do:

+struct rdns6_info {
+	rwlock_t		lock;
+	struct timer_list	expiry_timer;
+	struct rdns6_entry	*rdnss_list;
+	struct inet6_dev	* in6_dev; /* back pointer for netlink notify */
+	u8			expire_all; /* remove entries on ifdown */
+	u8			free_me; /* safely free this struct */
+};

Will generate better code and struct size shouldn't increase.
So it's a net win.

-- 
Greetings Michael.
Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
On Saturday 23 June 2007 01:26:19 C. Scott Ananian wrote:
> Attached is my first draft of a patch to implement RDNSS-in-Router
> Advertisements support for IPv6 (
> http://tools.ietf.org/html/draft-jeong-dnsop-ipv6-dns-discovery-12 )
> as implemented in radvd ( http://www.litech.org/radvd/ ). It
> currently exports the autoconfigured DNS list as /proc/net/ipv6_dns --
> ultimately it ought to (a) implement inotify on this file, so that
> glibc could use it like /etc/resolv.conf and get notifications when
> the DNS list changes, and (b) export the DNS list via netlink as well.
>
> Comments & discussion, please!
>  --scott

> diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h linux-2.6.22-rc5/include/net/ip6_rdnss.h
> --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h	1969-12-31 19:00:00.0 -0500
> +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h	2007-06-21 18:16:33.0 -0400
> @@ -0,0 +1,58 @@
> +#ifndef _NET_IP6_RDNSS_H
> +#define _NET_IP6_RDNSS_H
> +
> +#ifdef __KERNEL__
> +
> +#include
> +
> +struct nd_opt_rdnss {
> +	__u8	type;
> +	__u8	length;
> +#if defined(__BIG_ENDIAN_BITFIELD)
> +	__u8	priority:4,
> +		open:1,
> +		reserved1:3;
> +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> +	__u8	reserved1:3,
> +		open:1,
> +		priority:4;
> +#else
> +# error not little or big endian
> +#endif

That is not endianness-safe. Don't use foo:x at all
for stuff where a specific endianness is needed. The
compiler doesn't make any guarantee about it.

Please do

	__u8 flags;

#define FOOBAR_RESERVED		0x07
#define FOOBAR_OPEN		0x08
#define FOOBAR_PRIORITY		0xF0

and use them in the code.

In general I try to avoid the foo:x stuff, as it has little or
no gain. It just generates worse code.

-- 
Greetings Michael.
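Michael's mask-based alternative to the bitfield layout might look like this when written out. It is a sketch only: the `RDNSS_FLAG_*` names, the accessor functions and the use of `<stdint.h>` types instead of `__u8`/`__be32` are hypothetical choices for illustration. The mask values are the ones Michael gives, and they match the big-endian bitfield order in the patch (priority in the top four bits, then open, then three reserved bits).

```c
#include <stdint.h>

/* Hypothetical names modelled on the FOOBAR_* masks above. */
#define RDNSS_FLAG_RESERVED	0x07
#define RDNSS_FLAG_OPEN		0x08
#define RDNSS_FLAG_PRIORITY	0xF0
#define RDNSS_PRIORITY_SHIFT	4

struct nd_opt_rdnss_hdr {
	uint8_t  type;
	uint8_t  length;	/* in units of 8 octets */
	uint8_t  flags;		/* only accessed through the masks above */
	uint8_t  reserved2;
	uint32_t lifetime;	/* network byte order */
	/* followed by one or more 16-byte addresses */
};

static inline unsigned int rdnss_priority(const struct nd_opt_rdnss_hdr *o)
{
	return (o->flags & RDNSS_FLAG_PRIORITY) >> RDNSS_PRIORITY_SHIFT;
}

static inline int rdnss_open(const struct nd_opt_rdnss_hdr *o)
{
	return (o->flags & RDNSS_FLAG_OPEN) != 0;
}

/* Build the flags byte; the reserved bits are always sent as zero. */
static inline void rdnss_set_flags(struct nd_opt_rdnss_hdr *o,
				   unsigned int priority, int open)
{
	o->flags = (uint8_t)(((priority << RDNSS_PRIORITY_SHIFT) & RDNSS_FLAG_PRIORITY) |
			     (open ? RDNSS_FLAG_OPEN : 0));
}
```

Every access is an explicit shift and mask on a single byte, so the result is the same on big- and little-endian hosts and no `__BIG_ENDIAN_BITFIELD` block is needed.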
[RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration
Attached is my first draft of a patch to implement RDNSS-in-Router
Advertisements support for IPv6
( http://tools.ietf.org/html/draft-jeong-dnsop-ipv6-dns-discovery-12 )
as implemented in radvd ( http://www.litech.org/radvd/ ). It currently
exports the autoconfigured DNS list as /proc/net/ipv6_dns. Ultimately it
ought to (a) implement inotify on this file, so that glibc could use it
like /etc/resolv.conf and get notifications when the DNS list changes,
and (b) export the DNS list via netlink as well.

Comments & discussion, please!
 --scott

[ps. I'm copy-and-pasting the patch into gmail, against my better
judgement. Let me know if it doesn't apply for you, and I'll resend in
a less-clever mail agent.]

-- 
( http://cscott.net/ )

diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_fib.h linux-2.6.22-rc5/include/net/ip6_fib.h
--- linux-2.6.22-rc5-orig/include/net/ip6_fib.h	2007-06-16 22:09:12.0 -0400
+++ linux-2.6.22-rc5/include/net/ip6_fib.h	2007-06-20 14:17:58.0 -0400
@@ -79,6 +79,7 @@ struct rt6key
 };
 
 struct fib6_table;
+struct rdns6_info;
 
 struct rt6_info
 {
@@ -105,6 +106,8 @@ struct rt6_info
 	struct rt6key			rt6i_src;
 
 	u8				rt6i_protocol;
+
+	struct rdns6_info		*rt6i_rdnss;
 };
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h linux-2.6.22-rc5/include/net/ip6_rdnss.h
--- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h	1969-12-31 19:00:00.0 -0500
+++ linux-2.6.22-rc5/include/net/ip6_rdnss.h	2007-06-21 18:16:33.0 -0400
@@ -0,0 +1,58 @@
+#ifndef _NET_IP6_RDNSS_H
+#define _NET_IP6_RDNSS_H
+
+#ifdef __KERNEL__
+
+#include
+
+struct nd_opt_rdnss {
+	__u8	type;
+	__u8	length;
+#if defined(__BIG_ENDIAN_BITFIELD)
+	__u8	priority:4,
+		open:1,
+		reserved1:3;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+	__u8	reserved1:3,
+		open:1,
+		priority:4;
+#else
+# error not little or big endian
+#endif
+	__u8	reserved2;
+	__be32	lifetime;
+	struct in6_addr rdnss[1]; /* 1 or more */
+};
+
+struct rdns6_entry {
+	struct rdns6_entry	*next;
+	struct in6_addr		rdnss;
+	__u8			priority;
+	__u8			open;
+	__u32			lifetime;
+	unsigned long		expires;
+};
+
+struct rdns6_info {
+	rwlock_t		lock;
+	struct timer_list	expiry_timer;
+	struct rdns6_entry	*rdnss_list;
+	struct inet6_dev	* in6_dev; /* back pointer for netlink notify */
+	int			expire_all : 1, /* remove entries on ifdown */
+				free_me : 1; /* safely free this struct */
+};
+
+/* Receive and process an RA message with the given RDNSS options. */
+extern void	rdns6_ra_rcv(struct inet6_dev *dev, struct rt6_info *rt,
+			     struct nd_opt_rdnss **opts, int opt_cnt);
+/* Expire all of the dns server info from a route (as on an ifdown). */
+extern void	rdns6_info_expire_all(struct rt6_info *rt);
+/* Delete the DNS list information from a struct rt6_info. */
+extern void	rdns6_info_del(struct rt6_info *rt);
+
+/* Generate the /proc/net/ipv6_dns file. */
+extern int	rdns6_proc_info(char *buffer, char **start,
+				off_t offset, int length);
+
+#endif
+#endif
diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ndisc.h linux-2.6.22-rc5/include/net/ndisc.h
--- linux-2.6.22-rc5-orig/include/net/ndisc.h	2007-06-16 22:09:12.0 -0400
+++ linux-2.6.22-rc5/include/net/ndisc.h	2007-06-18 15:30:00.0 -0400
@@ -24,6 +24,7 @@ enum {
 	ND_OPT_MTU = 5,			/* RFC2461 */
 	__ND_OPT_ARRAY_MAX,
 	ND_OPT_ROUTE_INFO = 24,		/* RFC4191 */
+	ND_OPT_RDNSS_INFO = 25,		/* draft/radvd */
 	__ND_OPT_MAX
 };
 
diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/net/ipv6/Makefile linux-2.6.22-rc5/net/ipv6/Makefile
--- linux-2.6.22-rc5-orig/net/ipv6/Makefile	2007-06-16 22:09:12.0 -0400
+++ linux-2.6.22-rc5/net/ipv6/Makefile	2007-06-18 16:39:02.0 -0400
@@ -8,7 +8,7 @@ ipv6-objs :=	af_inet6.o anycast.o ip6_ou
 		route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
 		raw.o protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \
 		exthdrs.o sysctl_net_ipv6.o datagram.o \
-		ip6_flowlabel.o inet6_connection_sock.o
+		ip6_flowlabel.o inet6_connection_sock.o ip6_rdnss.o
 
 ipv6-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o \
 	xfrm6_input.o \
 	xfrm6_outp
[PATCH] sctp: lock_sock_nested in sctp_sock_migrate
I'm not sure that I've gotten either the sctp or lockdep details right,
but with this patch I don't get lockdep yelling at me any more :)

--

sctp: lock_sock_nested in sctp_sock_migrate

sctp_sock_migrate() grabs the socket lock on a newly allocated socket
while holding the socket lock on an old socket. lockdep worries that
this might be a recursive lock attempt.

  task/3026 is trying to acquire lock:
   (sk_lock-AF_INET){--..}, at: [] sctp_sock_migrate+0x2e3/0x327 [sctp]
  but task is already holding lock:
   (sk_lock-AF_INET){--..}, at: [] sctp_accept+0xdf/0x1e3 [sctp]

This patch tells lockdep that this locking is safe by using
lock_sock_nested().

Signed-off-by: Zach Brown <[EMAIL PROTECTED]>

diff -r 8adcfdf2545b net/sctp/socket.c
--- a/net/sctp/socket.c	Fri Jun 22 11:11:33 2007 -0700
+++ b/net/sctp/socket.c	Fri Jun 22 15:05:22 2007 -0700
@@ -6084,8 +6084,11 @@ static void sctp_sock_migrate(struct soc
 	 * queued to the backlog.  This prevents a potential race between
 	 * backlog processing on the old socket and new-packet processing
 	 * on the new socket.
-	 */
-	sctp_lock_sock(newsk);
+	 *
+	 * The caller has just allocated newsk so we can guarantee that other
+	 * paths won't try to lock it and then oldsk.
+	 */
+	lock_sock_nested(newsk, SINGLE_DEPTH_NESTING);
 	sctp_assoc_migrate(assoc, newsk);
 
 	/* If the association on the newsk is already closed before accept()
[PATCH] net: Basic network namespace infrastructure.
This is the basic infrastructure needed to support network namespaces.
This infrastructure is:
- Registration functions to support initializing per network namespace
  data when a network namespace is created or destroyed.

- struct net. The network namespace data structure. This structure will
  grow as variables are made per network namespace, but this is the
  minimal starting point.

- Functions to grab a reference to the network namespace. I provide both
  get/put functions, which keep a network namespace from being freed, and
  hold/release functions, which serve as weak references and will warn if
  their count is not zero when the data structure is freed. The weak
  references are useful for dealing with more complicated data structures
  like the ipv4 route cache.

- A list of all of the network namespaces so we can iterate over them.

- A slab for the network namespace data structures allowing leaks to be
  spotted.

I have deliberately chosen not to make it possible to compile out the
code, as the support for per-network namespace initialization and
uninitialization needs to always be compiled in once code has started
using it (even if we don't have network namespaces), and because no one
has ever measured any performance overhead specific to the network
namespace infrastructure. As code to compile out the network namespace
pointers etc. is complicated, it is best to avoid that code unless that
complexity is justified.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 include/net/net_namespace.h |   66 ++
 net/core/Makefile           |    2 +-
 net/core/net_namespace.c    |  291 +++
 3 files changed, 358 insertions(+), 1 deletions(-)
 create mode 100644 include/net/net_namespace.h
 create mode 100644 net/core/net_namespace.c

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
new file mode 100644
index 000..c909b3a
--- /dev/null
+++ b/include/net/net_namespace.h
@@ -0,0 +1,66 @@
+/*
+ * Operations on the network namespace
+ */
+#ifndef __NET_NET_NAMESPACE_H
+#define __NET_NET_NAMESPACE_H
+
+#include
+#include
+#include
+
+struct net {
+	atomic_t		count;		/* To decide when the network namespace
+						 * should go
+						 */
+	atomic_t		use_count;	/* For references we destroy on demand */
+	struct list_head	list;		/* list of network namespace structures */
+	struct work_struct	work;		/* work struct for freeing */
+};
+
+extern struct net init_net;
+extern struct list_head net_namespace_list;
+
+extern void __put_net(struct net *net);
+
+static inline struct net *get_net(struct net *net)
+{
+	atomic_inc(&net->count);
+	return net;
+}
+
+static inline void put_net(struct net *net)
+{
+	if (atomic_dec_and_test(&net->count))
+		__put_net(net);
+}
+
+static inline struct net *hold_net(struct net *net)
+{
+	atomic_inc(&net->use_count);
+	return net;
+}
+
+static inline void release_net(struct net *net)
+{
+	atomic_dec(&net->use_count);
+}
+
+extern void net_lock(void);
+extern void net_unlock(void);
+
+#define for_each_net(VAR)				\
+	list_for_each_entry(VAR, &net_namespace_list, list);
+
+
+struct pernet_operations {
+	struct list_head list;
+	int (*init)(struct net *net);
+	void (*exit)(struct net *net);
+};
+
+extern int register_pernet_subsys(struct pernet_operations *);
+extern void unregister_pernet_subsys(struct pernet_operations *);
+extern int register_pernet_device(struct pernet_operations *);
+extern void unregister_pernet_device(struct pernet_operations *);
+
+#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/Makefile b/net/core/Makefile
index 4751613..ea9b3f3 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -3,7 +3,7 @@
 #
 
 obj-y := sock.o request_sock.o skbuff.o iovec.o datagram.o stream.o scm.o \
-	 gen_stats.o gen_estimator.o
+	 gen_stats.o gen_estimator.o net_namespace.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
new file mode 100644
index 000..397c15f
--- /dev/null
+++ b/net/core/net_namespace.c
@@ -0,0 +1,291 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/*
+ *	Our network namespace constructor/destructor lists
+ */
+
+static LIST_HEAD(pernet_list);
+static struct list_head *first_device = &pernet_list;
+static DEFINE_MUTEX(net_mutex);
+
+static DEFINE_MUTEX(net_list_mutex);
+LIST_HEAD(net_namespace_list);
+
+static struct kmem_cache *net_cachep;
+
+struct net init_net;
+
+void net_lock(void)
+{
+	mutex_lock(&net_list_mutex);
+}
+
+void net_unlock(void)
+{
+	mutex_unlock(&net_list_mutex);
+}
+
+static struct net *net_alloc(void)
+{
+	return kmem_cache_alloc(net_cachep, GFP_KERNEL);
+}
+
+static void net_free(struct net *net)
+{
+	if (!net)
+		return;
+
+	if (unlikely(
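The two reference counts in `struct net` (strong `count` via get_net/put_net, weak `use_count` via hold_net/release_net) can be modelled in user space to show how they interact. This is a simplified sketch under stated assumptions. C11 atomics stand in for `atomic_t`. The last put frees the object directly, whereas in the patch `__put_net()` takes care of teardown (its body is not shown above). The warning text and the `nets_freed` counter are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct net_model {
	atomic_int count;	/* strong refs: get_net()/put_net() */
	atomic_int use_count;	/* weak refs: hold_net()/release_net() */
};

static int nets_freed;	/* illustration only: counts completed frees */

static struct net_model *net_model_alloc(void)
{
	struct net_model *net = malloc(sizeof(*net));

	if (!net)
		return NULL;
	atomic_init(&net->count, 1);	/* the creator holds the first strong ref */
	atomic_init(&net->use_count, 0);
	return net;
}

/* Simplified stand-in for the free path: weak refs do not keep the object
 * alive, they only trigger a warning if any are still outstanding. */
static void net_model_free(struct net_model *net)
{
	int weak = atomic_load(&net->use_count);

	if (weak != 0)
		fprintf(stderr, "net_model_free: %d weak references outstanding\n", weak);
	free(net);
	nets_freed++;
}

static struct net_model *get_net(struct net_model *net)
{
	atomic_fetch_add(&net->count, 1);
	return net;
}

static void put_net(struct net_model *net)
{
	/* fetch_sub returns the old value: 1 means this was the last strong ref */
	if (atomic_fetch_sub(&net->count, 1) == 1)
		net_model_free(net);
}

static struct net_model *hold_net(struct net_model *net)
{
	atomic_fetch_add(&net->use_count, 1);
	return net;
}

static void release_net(struct net_model *net)
{
	atomic_fetch_sub(&net->use_count, 1);
}
```

The design point this illustrates is that only `count` decides the object's lifetime. `use_count` acts as a leak detector for structures, such as the route cache, that are torn down on demand rather than holding the namespace alive.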
RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
> Patrick McHardy wrote:
> > Waskiewicz Jr, Peter P wrote:
> >
> >>Thought about this more last night and this morning. As far as I can
> >>tell, I still need this. If the qdisc gets loaded with multiqueue
> >>turned on, I can just use the value of band to assign
> >>skb->queue_mapping. But if the qdisc is loaded without multiqueue
> >>support, then I need to assign a value of zero to queue_mapping, or
> >>not assign it at all (it will be zero'd out before the call to
> >>->enqueue() in dev_queue_xmit()). But I'd rather not have a
> >>conditional in the hotpath checking if the qdisc is multiqueue; I'd
> >>rather have the array to match the bands so I can just do an assignment.
> >>
> >>What do you think?
> >
> > I very much doubt that it has any measurable impact. You can also add
> > a small inline function
> >
> > void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
>
> OK I didn't really listen obviously :) A compile time option
> won't help. Just remove it and assign it conditionally.

Sounds good. Thanks Patrick.

-PJ
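Patrick's suggestion (drop the band-to-queue array and assign conditionally through a small inline helper) could look roughly like this. It is a user-space sketch, not the sch_prio code. `struct skb_model` and `struct prio_sched_model` are hypothetical stand-ins for `struct sk_buff` and the qdisc's private data. The helper mirrors the `skb_set_queue_mapping()` signature Patrick proposes. Following PJ's note, the sketch assumes `queue_mapping` has already been zeroed before `->enqueue()`, so the non-multiqueue case needs no store at all.

```c
#include <stdint.h>

struct skb_model {
	uint16_t queue_mapping;		/* assumed zeroed before enqueue */
};

struct prio_sched_model {
	int mq;				/* hypothetical: 1 if loaded with multiqueue */
};

/* Mirrors the helper signature proposed in the thread. */
static inline void skb_set_queue_mapping(struct skb_model *skb, unsigned int queue)
{
	skb->queue_mapping = (uint16_t)queue;
}

/* Called after classification has chosen band: one predictable branch,
 * no lookup table to keep in sync with the bands. */
static inline void prio_map_queue(const struct prio_sched_model *q,
				  struct skb_model *skb, unsigned int band)
{
	if (q->mq)
		skb_set_queue_mapping(skb, band);
}
```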
[PATCH 1/9] pasemi_mac: Fix TX interrupt threshold
It was mistakenly set to interrupt on the second packet instead of the
first, causing some interesting latency behaviour.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -755,7 +755,7 @@ static int pasemi_mac_open(struct net_de
 		flags |= PAS_MAC_CFG_PCFG_TSR_1G | PAS_MAC_CFG_PCFG_SPD_1G;
 
 	pci_write_config_dword(mac->iob_pdev, PAS_IOB_DMA_RXCH_CFG(mac->dma_rxch),
-			       PAS_IOB_DMA_RXCH_CFG_CNTTH(1));
+			       PAS_IOB_DMA_RXCH_CFG_CNTTH(0));
 
 	pci_write_config_dword(mac->iob_pdev, PAS_IOB_DMA_TXCH_CFG(mac->dma_txch),
 			       PAS_IOB_DMA_TXCH_CFG_CNTTH(32));

-- 
[PATCH 2/9] pasemi_mac: Clean TX ring in poll
Clean the TX ring in the poll call, to avoid sitting on mapped buffers
for a long time. NFS doesn't seem to like it much, for example.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -1052,6 +1052,7 @@ static int pasemi_mac_poll(struct net_de
 	int pkts, limit = min(*budget, dev->quota);
 	struct pasemi_mac *mac = netdev_priv(dev);
 
+	pasemi_mac_clean_tx(mac);
 	pkts = pasemi_mac_clean_rx(mac, limit);
 
 	dev->quota -= pkts;

-- 
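The effect of calling the TX cleaner from the poll routine can be illustrated with a toy ring. This is a hedged sketch rather than the pasemi_mac code. `struct tx_ring_model`, `clean_tx()`, `poll_model()` and the ring size are hypothetical. Free-running `next_to_use`/`next_to_clean` counters masked by a power-of-two ring size are an assumption modelled on the field names in the driver. The RX side is stubbed out, and the "unmap" step is only noted in a comment.

```c
#include <stdbool.h>

#define TX_RING_SIZE 8			/* assumed; must be a power of two */

struct tx_ring_model {
	unsigned int next_to_use;	/* producer index (transmit path) */
	unsigned int next_to_clean;	/* consumer index (cleanup path) */
	bool done[TX_RING_SIZE];	/* set when hardware has sent the slot */
};

/* Reclaim completed descriptors in order and return how many were freed.
 * In a real driver this is where the DMA buffers would be unmapped. */
static int clean_tx(struct tx_ring_model *r)
{
	int count = 0;

	while (r->next_to_clean != r->next_to_use) {
		unsigned int i = r->next_to_clean & (TX_RING_SIZE - 1);

		if (!r->done[i])
			break;		/* hardware not finished with this slot */
		r->done[i] = false;
		r->next_to_clean++;
		count++;
	}
	return count;
}

/* Poll: clean TX first so mapped buffers are released on every poll, not
 * only when a TX interrupt fires, then do (stubbed) RX work within budget. */
static int poll_model(struct tx_ring_model *tx, int rx_pending, int budget)
{
	clean_tx(tx);
	return rx_pending < budget ? rx_pending : budget;
}
```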
[PATCH 0/9] pasemi_mac patches for 2.6.23
Hi,

pasemi_mac patches: minor tweaks, bugfixes and perf enhancements.

Please consider for the 2.6.23 merge window.

Thanks,

-Olof
[PATCH 4/9] pasemi_mac: Use MMIO instead of pci config accessors
Move away from using the pci config access functions for simple register
access. Our device has all of the registers in the config space (hey,
from the hardware point of view it looks reasonable :-), so we need to
somehow get to it. Newer firmwares have it in the device tree such that
we can just get it and ioremap it there (in case it ever moves in future
products). For now, provide a hardcoded fallback for older firmwares.

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -81,46 +81,47 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
-static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
 {
 	unsigned int val;
 
-	pci_read_config_dword(mac->iob_pdev, reg, &val);
+	val = in_le32(mac->iob_regs+reg);
+
 	return val;
 }
 
-static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
 			  unsigned int val)
 {
-	pci_write_config_dword(mac->iob_pdev, reg, val);
+	out_le32(mac->iob_regs+reg, val);
 }
 
-static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
 {
 	unsigned int val;
 
-	pci_read_config_dword(mac->pdev, reg, &val);
+	val = in_le32(mac->regs+reg);
 	return val;
 }
 
-static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
 			  unsigned int val)
 {
-	pci_write_config_dword(mac->pdev, reg, val);
+	out_le32(mac->regs+reg, val);
 }
 
-static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
 {
 	unsigned int val;
 
-	pci_read_config_dword(mac->dma_pdev, reg, &val);
+	val = in_le32(mac->dma_regs+reg);
 	return val;
 }
 
-static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
 			  unsigned int val)
 {
-	pci_write_config_dword(mac->dma_pdev, reg, val);
+	out_le32(mac->dma_regs+reg, val);
 }
 
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
@@ -585,7 +586,6 @@ static int pasemi_mac_clean_tx(struct pa
 	}
 	mac->tx->next_to_clean += count;
 	spin_unlock_irqrestore(&mac->tx->lock, flags);
-
 	netif_wake_queue(mac->netdev);
 
 	return count;
@@ -1077,6 +1077,73 @@ static int pasemi_mac_poll(struct net_de
 	}
 }
 
+static inline void __iomem * __devinit map_onedev(struct pci_dev *p, int index)
+{
+	struct device_node *dn;
+	void __iomem *ret;
+
+	dn = pci_device_to_OF_node(p);
+	if (!dn)
+		goto fallback;
+
+	ret = of_iomap(dn, index);
+	if (!ret)
+		goto fallback;
+
+	return ret;
+fallback:
+	/* This is hardcoded and ugly, but we have some firmware versions
+	 * who don't provide the register space in the device tree. Luckily
+	 * they are at well-known locations so we can just do the math here.
+	 */
+	return ioremap(0xe000 + (p->devfn << 12), 0x1000);
+}
+
+static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac)
+{
+	struct resource res;
+	struct device_node *dn;
+	int err;
+
+	mac->dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
+	if (!mac->dma_pdev) {
+		dev_err(&mac->pdev->dev, "Can't find DMA Controller\n");
+		return -ENODEV;
+	}
+
+	mac->iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
+	if (!mac->iob_pdev) {
+		dev_err(&mac->pdev->dev, "Can't find I/O Bridge\n");
+		return -ENODEV;
+	}
+
+	mac->regs = map_onedev(mac->pdev, 0);
+	mac->dma_regs = map_onedev(mac->dma_pdev, 0);
+	mac->iob_regs = map_onedev(mac->iob_pdev, 0);
+
+	if (!mac->regs || !mac->dma_regs || !mac->iob_regs) {
+		dev_err(&mac->pdev->dev, "Can't map registers\n");
+		return -ENODEV;
+	}
+
+	/* The dma status structure is located in the I/O bridge, and
+	 * is cache coherent.
+	 */
+	if (!dma_status) {
+		dn = pci_device_to_OF_node(mac->iob_pdev);
+		if (dn)
+			err = of_address_to_resource(dn, 1, &res);
+		if (!dn || err) {
+			/* Fallback for old firmware */
+			res.start = 0xfd80;
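The new accessors read and write 32-bit little-endian registers at a byte offset from an ioremapped base (`in_le32(mac->regs+reg)`). The same idea can be modelled portably in user space, which may help readers unfamiliar with the PowerPC `in_le32`/`out_le32` helpers. This is a hypothetical sketch: a plain byte array stands in for the mapped register block, and `model_in_le32()`/`model_out_le32()` assemble the value byte by byte. They omit the I/O ordering barriers that the real kernel helpers provide.

```c
#include <stdint.h>

/* Portable stand-ins for in_le32()/out_le32(): the device registers are
 * little-endian whatever the CPU byte order. No I/O barriers here. */
static uint32_t model_in_le32(const volatile uint8_t *addr)
{
	return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
	       ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
}

static void model_out_le32(volatile uint8_t *addr, uint32_t val)
{
	addr[0] = (uint8_t)val;
	addr[1] = (uint8_t)(val >> 8);
	addr[2] = (uint8_t)(val >> 16);
	addr[3] = (uint8_t)(val >> 24);
}

struct mac_model {
	volatile uint8_t *regs;		/* base of the mapped register block */
};

/* reg is a byte offset, as with the void __iomem pointer arithmetic in
 * the patch. */
static inline uint32_t read_mac_reg(struct mac_model *mac, unsigned int reg)
{
	return model_in_le32(mac->regs + reg);
}

static inline void write_mac_reg(struct mac_model *mac, unsigned int reg,
				 uint32_t val)
{
	model_out_le32(mac->regs + reg, val);
}
```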
[PATCH 3/9] pasemi_mac: Abstract out register access
Abstract out the PCI config read/write accesses into reg read/write ones, still calling the pci accessors on the back end. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -81,6 +81,48 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bit static struct pasdma_status *dma_status; +static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac->iob_pdev, reg, &val); + return val; +} + +static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac->iob_pdev, reg, val); +} + +static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac->pdev, reg, &val); + return val; +} + +static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac->pdev, reg, val); +} + +static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg) +{ + unsigned int val; + + pci_read_config_dword(mac->dma_pdev, reg, &val); + return val; +} + +static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg, + unsigned int val) +{ + pci_write_config_dword(mac->dma_pdev, reg, val); +} + static int pasemi_get_mac_addr(struct pasemi_mac *mac) { struct pci_dev *pdev = mac->pdev; @@ -166,22 +208,21 @@ static int pasemi_mac_setup_rx_resources memset(ring->buffers, 0, RX_RING_SIZE * sizeof(u64)); - pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_BASEL(chan_id), - PAS_DMA_RXCHAN_BASEL_BRBL(ring->dma)); + write_dma_reg(mac, PAS_DMA_RXCHAN_BASEL(chan_id), PAS_DMA_RXCHAN_BASEL_BRBL(ring->dma)); + + write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id), + PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) | + PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2)); + + write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id), + PAS_DMA_RXCHAN_CFG_HBU(1)); - 
pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_BASEU(chan_id), - PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) | - PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2)); - - pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_CFG(chan_id), - PAS_DMA_RXCHAN_CFG_HBU(1)); - - pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXINT_BASEL(mac->dma_if), - PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers))); - - pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXINT_BASEU(mac->dma_if), - PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) | - PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3)); + write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac->dma_if), + PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers))); + + write_dma_reg(mac, PAS_DMA_RXINT_BASEU(mac->dma_if), + PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) | + PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3)); ring->next_to_fill = 0; ring->next_to_clean = 0; @@ -233,18 +274,18 @@ static int pasemi_mac_setup_tx_resources memset(ring->desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr)); - pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_BASEL(chan_id), - PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma)); + write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id), + PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma)); val = PAS_DMA_TXCHAN_BASEU_BRBH(ring->dma >> 32); val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE >> 2); - pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_BASEU(chan_id), val); + write_dma_reg(mac, PAS_DMA_TXCHAN_BASEU(chan_id), val); - pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_CFG(chan_id), - PAS_DMA_TXCHAN_CFG_TY_IFACE | - PAS_DMA_TXCHAN_CFG_TATTR(mac->dma_if) | - PAS_DMA_TXCHAN_CFG_UP | - PAS_DMA_TXCHAN_CFG_WT(2)); + write_dma_reg(mac, PAS_DMA_TXCHAN_CFG(chan_id), + PAS_DMA_TXCHAN_CFG_TY_IFACE | + PAS_DMA_TXCHAN_CFG_TATTR(mac->dma_if) | + PAS_DMA_TXCHAN_CFG_UP | + PAS_DMA_TXCHAN_CFG_WT(2)); ring->next_to_use = 0; ring->next_to_clean = 0; @@ -383,12 +424,8 @@ static void pasemi_mac_replenish_rx_ring wmb(); -
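The patch routes every config-space access through small per-block helpers (read_dma_reg(), write_mac_reg() and so on), so the backend can change later without touching any call site. Below is a minimal userspace sketch of the same pattern. The pluggable `struct reg_backend` and the fake register file are hypothetical stand-ins for pci_read_config_dword()/pci_write_config_dword(). They are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backend: in the driver, this role is played by the PCI
 * config accessors applied to a particular pci_dev. */
struct reg_backend {
	uint32_t (*read)(void *ctx, unsigned int reg);
	void (*write)(void *ctx, unsigned int reg, uint32_t val);
	void *ctx;
};

/* Stand-in for struct pasemi_mac: one backend per register block. */
struct fake_mac {
	struct reg_backend dma;
	struct reg_backend mac;
};

/* Call sites only ever use these helpers, never the backend directly. */
static uint32_t dma_read_reg(struct fake_mac *m, unsigned int reg)
{
	return m->dma.read(m->dma.ctx, reg);
}

static void dma_write_reg(struct fake_mac *m, unsigned int reg, uint32_t val)
{
	m->dma.write(m->dma.ctx, reg, val);
}

/* Fake register file, indexed by byte offset / 4 (offsets < 256). */
struct regfile { uint32_t r[64]; unsigned int writes; };

static uint32_t regfile_read(void *ctx, unsigned int reg)
{
	return ((struct regfile *)ctx)->r[reg / 4];
}

static void regfile_write(void *ctx, unsigned int reg, uint32_t val)
{
	struct regfile *rf = ctx;

	rf->r[reg / 4] = val;
	rf->writes++;
}
```

To swap PCI config cycles for, say, memory-mapped I/O, only the backend functions change. The dma_read_reg()/dma_write_reg() callers stay the same.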
[PATCH 8/9] pasemi_mac: Reduce locking when cleaning TX ring
Postpone pci unmap and skb free of the transmitted buffers to outside of the tx ring lock, batching them up 32 at a time. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -562,37 +562,56 @@ static int pasemi_mac_clean_tx(struct pa int i; struct pasemi_mac_buffer *info; struct pas_dma_xct_descr *dp; - int start, count; + unsigned int start, count, limit; + unsigned int total_count; int flags; + struct sk_buff *skbs[32]; + dma_addr_t dmas[32]; + total_count = 0; +restart: spin_lock_irqsave(&mac->tx->lock, flags); start = mac->tx->next_to_clean; + limit = min(mac->tx->next_to_use, start+32); + count = 0; - for (i = start; i < mac->tx->next_to_use; i++) { + for (i = start; i < limit; i++) { dp = &TX_DESC(mac, i); + if (unlikely(dp->mactx & XCT_MACTX_O)) + /* Not yet transmitted */ break; - count++; - info = &TX_DESC_INFO(mac, i); - - pci_unmap_single(mac->dma_pdev, info->dma, -info->skb->len, PCI_DMA_TODEVICE); - dev_kfree_skb_irq(info->skb); + skbs[count] = info->skb; + dmas[count] = info->dma; info->skb = NULL; info->dma = 0; dp->mactx = 0; dp->ptr = 0; + + count++; } mac->tx->next_to_clean += count; spin_unlock_irqrestore(&mac->tx->lock, flags); netif_wake_queue(mac->netdev); - return count; + for (i = 0; i < count; i++) { + pci_unmap_single(mac->dma_pdev, dmas[i], +skbs[i]->len, PCI_DMA_TODEVICE); + dev_kfree_skb_irq(skbs[i]); + } + + total_count += count; + + /* If the batch was full, try to clean more */ + if (count == 32) + goto restart; + + return total_count; }
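The idea in this patch is to collect at most 32 finished entries while holding the ring lock, drop the lock, do the expensive unmap/free work, and go around again if the batch was full. Below is a minimal userspace sketch of that loop. The ring layout, RING_SIZE and the `locked` flag that models spin_lock_irqsave() are assumptions made for illustration. Unlike the quoted hunk, it uses free-running indices with an explicit wrap (`i % RING_SIZE`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 128   /* hypothetical ring size, power of two */
#define BATCH     32    /* same batch size as the patch */

struct desc { bool hw_owned; void *buf; };

struct tx_ring {
	struct desc d[RING_SIZE];
	unsigned int next_to_use;    /* free-running producer index */
	unsigned int next_to_clean;  /* free-running consumer index */
	bool locked;                 /* models the tx ring spinlock being held */
	unsigned int frees_under_lock;
};

static void ring_lock(struct tx_ring *r)   { r->locked = true; }
static void ring_unlock(struct tx_ring *r) { r->locked = false; }

/* Models pci_unmap_single() + dev_kfree_skb_irq(). */
static void release_buf(struct tx_ring *r, void *buf)
{
	if (r->locked)
		r->frees_under_lock++;
	free(buf);
}

static unsigned int clean_tx(struct tx_ring *r)
{
	void *batch[BATCH];
	unsigned int total = 0, count, i, limit;

	do {
		ring_lock(r);
		limit = r->next_to_use;
		if (limit - r->next_to_clean > BATCH)
			limit = r->next_to_clean + BATCH;
		count = 0;
		for (i = r->next_to_clean; i != limit; i++) {
			struct desc *dp = &r->d[i % RING_SIZE];

			if (dp->hw_owned)        /* not yet transmitted */
				break;
			batch[count++] = dp->buf;
			dp->buf = NULL;
		}
		r->next_to_clean += count;
		ring_unlock(r);

		/* The per-buffer work runs with the lock dropped. */
		for (i = 0; i < count; i++)
			release_buf(r, batch[i]);
		total += count;
	} while (count == BATCH);   /* full batch: there may be more */

	return total;
}
```

With 70 completed entries, the loop runs three times (32 + 32 + 6) and never frees a buffer while the lock is held.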
[PATCH 7/9] pasemi_mac: Minor performance tweaks
Various minor performance tweaks, do some explicit prefetching of packet data, etc. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -481,6 +481,7 @@ static int pasemi_mac_clean_rx(struct pa rmb(); dp = &RX_DESC(mac, n); + prefetchw(dp); macrx = dp->macrx; if (!(macrx & XCT_MACRX_O)) @@ -502,8 +503,10 @@ static int pasemi_mac_clean_rx(struct pa if (info->dma == dma) break; } + prefetchw(info); skb = info->skb; + prefetchw(skb); info->dma = 0; pci_unmap_single(mac->dma_pdev, dma, skb->len, @@ -526,9 +529,7 @@ static int pasemi_mac_clean_rx(struct pa skb_put(skb, len); - skb->protocol = eth_type_trans(skb, mac->netdev); - - if ((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK) { + if (likely((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) { skb->ip_summed = CHECKSUM_COMPLETE; skb->csum = (macrx & XCT_MACRX_CSUM_M) >> XCT_MACRX_CSUM_S; @@ -538,6 +539,7 @@ static int pasemi_mac_clean_rx(struct pa mac->stats.rx_bytes += len; mac->stats.rx_packets++; + skb->protocol = eth_type_trans(skb, mac->netdev); netif_receive_skb(skb); dp->ptr = 0; @@ -569,7 +571,7 @@ static int pasemi_mac_clean_tx(struct pa for (i = start; i < mac->tx->next_to_use; i++) { dp = &TX_DESC(mac, i); - if (!dp || (dp->mactx & XCT_MACTX_O)) + if (unlikely(dp->mactx & XCT_MACTX_O)) break; count++; @@ -957,7 +959,7 @@ static int pasemi_mac_start_tx(struct sk struct pasemi_mac_txring *txring; struct pasemi_mac_buffer *info; struct pas_dma_xct_descr *dp; - u64 dflags; + u64 dflags, mactx, ptr; dma_addr_t map; int flags; @@ -985,6 +987,9 @@ static int pasemi_mac_start_tx(struct sk if (dma_mapping_error(map)) return NETDEV_TX_BUSY; + mactx = dflags | XCT_MACTX_LLEN(skb->len); + ptr = XCT_PTR_LEN(skb->len) | XCT_PTR_ADDR(map); + txring = mac->tx; spin_lock_irqsave(&txring->lock, flags); @@ -1005,12 +1010,11 @@ static int pasemi_mac_start_tx(struct sk } } - dp = 
&TX_DESC(mac, txring->next_to_use); info = &TX_DESC_INFO(mac, txring->next_to_use); - dp->mactx = dflags | XCT_MACTX_LLEN(skb->len); - dp->ptr = XCT_PTR_LEN(skb->len) | XCT_PTR_ADDR(map); + dp->mactx = mactx; + dp->ptr = ptr; info->dma = map; info->skb = skb;
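In the kernel, likely()/unlikely() are thin wrappers around GCC's `__builtin_expect()`, and a write-intent prefetch can be expressed with `__builtin_prefetch(addr, 1)`. Below is a userspace sketch that assumes GCC or Clang and defines equivalent macros locally. It also shows the idea from the last hunk: compute the descriptor words before the (omitted) TX lock is taken, so only the two stores remain inside it. The SK_LLEN/SK_PTR encodings are hypothetical and do not match the real XCT_* layouts.

```c
#include <assert.h>
#include <stdint.h>

/* Local equivalents of the kernel helpers (GCC/Clang builtins). */
#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x)  __builtin_expect(!!(x), 0)
#define prefetchw(p) __builtin_prefetch((p), 1)

/* Hypothetical descriptor encodings, for illustration only. */
#define SK_LLEN(len)      ((uint64_t)(len) << 32)
#define SK_PTR(len, addr) (((uint64_t)(len) << 44) | \
			   ((uint64_t)(addr) & 0xfffffffffffULL))

struct sk_desc { uint64_t mactx; uint64_t ptr; };

static void fill_desc(struct sk_desc *dp, uint64_t dflags,
		      unsigned int len, uint64_t map)
{
	/* Computed before the lock would be taken ... */
	uint64_t mactx = dflags | SK_LLEN(len);
	uint64_t ptr = SK_PTR(len, map);

	prefetchw(dp);
	/* ... so that only these stores need to be inside it. */
	dp->mactx = mactx;
	dp->ptr = ptr;
}

/* Ownership test with the branch hinted as rarely taken. */
static int is_hw_owned(const struct sk_desc *dp, uint64_t own_bit)
{
	return unlikely(dp->mactx & own_bit) ? 1 : 0;
}
```

The hints and the prefetch change only code layout and cache behaviour. The results are the same without them.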
[PATCH 9/9] pasemi_mac: Enable LLTX
Enable LLTX on pasemi_mac: we're already doing sufficient locking in the driver to enable it. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -1239,7 +1239,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c dev->set_multicast_list = pasemi_mac_set_rx_mode; dev->weight = 64; dev->poll = pasemi_mac_poll; - dev->features = NETIF_F_HW_CSUM; + dev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX; err = pasemi_mac_map_regs(mac); if (err)
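When NETIF_F_LLTX is set, the core transmit path skips its own per-device TX lock around the driver's xmit routine and relies on the driver's locking instead. That is why the commit message stresses that pasemi_mac already locks its TX ring. Below is a minimal model of that decision. The flag values are made up for the sketch; the real NETIF_F_* constants live in <linux/netdevice.h>.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch only. */
#define F_HW_CSUM (1u << 3)
#define F_LLTX    (1u << 12)

struct fake_dev {
	unsigned int features;
	int core_lock_taken;    /* acquisitions of the core TX lock */
	int driver_lock_taken;  /* acquisitions of the driver's ring lock */
};

/* The driver takes its own TX ring lock unconditionally. */
static int driver_xmit(struct fake_dev *dev)
{
	dev->driver_lock_taken++;
	return 0;
}

/* Core path: take the per-device TX lock only if the driver has
 * not declared lockless (LLTX) transmit. */
static int core_xmit(struct fake_dev *dev)
{
	if (!(dev->features & F_LLTX))
		dev->core_lock_taken++;
	return driver_xmit(dev);
}
```

With the flag set, each packet pays for one lock instead of two.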
[PATCH 5/9] pasemi_mac: Enable L2 caching of packet headers
Enable settings to target L2 for the first few cachelines of the packet, since we'll access them to get to the various headers. Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -216,7 +216,7 @@ static int pasemi_mac_setup_rx_resources PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2)); write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id), - PAS_DMA_RXCHAN_CFG_HBU(1)); + PAS_DMA_RXCHAN_CFG_HBU(2)); write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac->dma_if), PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers))); @@ -225,6 +225,9 @@ static int pasemi_mac_setup_rx_resources PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) | PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3)); + write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if), + PAS_DMA_RXINT_CFG_DHL(2)); + ring->next_to_fill = 0; ring->next_to_clean = 0; Index: netdev-2.6/drivers/net/pasemi_mac.h === --- netdev-2.6.orig/drivers/net/pasemi_mac.h +++ netdev-2.6/drivers/net/pasemi_mac.h @@ -218,6 +218,14 @@ enum { #define PAS_DMA_RXINT_RCMDSTA_ACT 0x0001 #define PAS_DMA_RXINT_RCMDSTA_DROPS_M 0xfffe #define PAS_DMA_RXINT_RCMDSTA_DROPS_S 17 +#define PAS_DMA_RXINT_CFG(i) (0x204+(i)*_PAS_DMA_RXINT_STRIDE) +#define PAS_DMA_RXINT_CFG_DHL_M 0x0700 +#define PAS_DMA_RXINT_CFG_DHL_S 24 +#define PAS_DMA_RXINT_CFG_DHL(x) (((x) << PAS_DMA_RXINT_CFG_DHL_S) & \ +PAS_DMA_RXINT_CFG_DHL_M) +#define PAS_DMA_RXINT_CFG_WIF 0x0002 +#define PAS_DMA_RXINT_CFG_WIL 0x0001 + #define PAS_DMA_RXINT_INCR(i) (0x210+(i)*_PAS_DMA_RXINT_STRIDE) #define PAS_DMA_RXINT_INCR_INCR_M 0x #define PAS_DMA_RXINT_INCR_INCR_S 0
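PAS_DMA_RXINT_CFG_DHL(x) follows the usual shift-then-mask pattern for register fields. The mask keeps an out-of-range argument from spilling into neighbouring bits. Some mask constants in the header quoted above appear truncated in this archive, so the sketch below uses its own assumed constants for a 3-bit field at bits 24..26. The extract and read-modify-write helpers are hypothetical additions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: a 3-bit field occupying bits 24..26. */
#define EX_DHL_S    24
#define EX_DHL_M    0x07000000u

/* Encode: shift the value into place, then mask to the field width. */
#define EX_DHL(x)     ((((uint32_t)(x)) << EX_DHL_S) & EX_DHL_M)
/* Decode (hypothetical helper): pull the field back out. */
#define EX_DHL_GET(v) ((((uint32_t)(v)) & EX_DHL_M) >> EX_DHL_S)

/* Read-modify-write: clear the old field, then OR in the new value. */
static uint32_t set_dhl(uint32_t reg, unsigned int lines)
{
	return (reg & ~EX_DHL_M) | EX_DHL(lines);
}
```

For example, EX_DHL(2) is 0x02000000, and EX_DHL(9) is silently truncated to 0x01000000 rather than corrupting bit 27.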
[PATCH 6/9] pasemi_mac: Simplify memcpy for short receives
No need to copy over the skipped align bytes (besides, NET_IP_ALIGN is 0 on ppc64). Signed-off-by: Olof Johansson <[EMAIL PROTECTED]> Index: netdev-2.6/drivers/net/pasemi_mac.c === --- netdev-2.6.orig/drivers/net/pasemi_mac.c +++ netdev-2.6/drivers/net/pasemi_mac.c @@ -516,9 +516,7 @@ static int pasemi_mac_clean_rx(struct pa netdev_alloc_skb(mac->netdev, len + NET_IP_ALIGN); if (new_skb) { skb_reserve(new_skb, NET_IP_ALIGN); - memcpy(new_skb->data - NET_IP_ALIGN, - skb->data - NET_IP_ALIGN, - len + NET_IP_ALIGN); + memcpy(new_skb->data, skb->data, len); /* save the skb in buffer_info as good */ skb = new_skb; }
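The short-receive path allocates a fresh buffer with alignment headroom, reserves that headroom, and (after this patch) copies only the `len` bytes of packet data. Below is a userspace sketch of the same shape. `struct ex_buf` and EX_IP_ALIGN are hypothetical stand-ins for the skb and NET_IP_ALIGN. A non-zero headroom of 2 is assumed so that the reservation is visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EX_IP_ALIGN 2   /* assumed headroom; NET_IP_ALIGN is 0 on ppc64 */

struct ex_buf {
	unsigned char *head;  /* start of the allocation */
	unsigned char *data;  /* start of packet data (head + headroom) */
	size_t len;
};

/* Like netdev_alloc_skb(dev, len + NET_IP_ALIGN) + skb_reserve(). */
static int ex_buf_alloc(struct ex_buf *b, size_t len)
{
	b->head = malloc(len + EX_IP_ALIGN);
	if (!b->head)
		return -1;
	b->data = b->head + EX_IP_ALIGN;
	b->len = 0;
	return 0;
}

/* Copy only the packet bytes; the headroom holds nothing of value. */
static int ex_copy_short(struct ex_buf *dst, const struct ex_buf *src)
{
	if (ex_buf_alloc(dst, src->len))
		return -1;
	memcpy(dst->data, src->data, src->len);
	dst->len = src->len;
	return 0;
}
```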
[RFD] L2 Network namespace infrastructure
Currently, all of the prerequisite work for implementing a network namespace (i.e. virtualization of the network stack within one kernel) has already been merged or is in the process of being merged. Therefore it is now time for a bit of high-level design review of the network namespace work, and time to begin sending patches.

-- User space semantics

If you are in a different network namespace, it looks like you have a separate, independent copy of the network stack. User-visible kernel structures that will appear to be per network namespace include network devices, routing tables, sockets, and netfilter rules.

-- The basic design

There will be a network namespace structure that holds the global variables for a network namespace, making those global variables per network namespace. One of those per-namespace global variables will be the loopback device, which means the network namespace a packet resides in can be found simply by examining the network device or the socket the packet is traversing.

Either a pointer to this global structure will be passed into the functions that need to reference per-namespace variables, or a structure that is already passed in (such as the network device) will be modified to contain a pointer to the network namespace structure.

Depending upon the data structure, it will either be modified to hold a per-entry network namespace pointer, or there will be a separate copy per network namespace. For large global data structures like the ipv4 routing cache hash table, adding an additional pointer to the entries appears the more reasonable solution.

The initialization and cleanup functions will be refactored into functions that do the work on a per-namespace basis and functions that perform truly global initialization and cleanup. A registration mechanism will be available to register functions that are per network namespace.
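A per-namespace state structure, combined with a registry of per-namespace init/exit hooks, can be sketched roughly as follows. Every name here (`struct ex_net`, `ex_register_pernet()`, and so on) is hypothetical and is not the infrastructure patch Eric mentions. The sketch only shows the shape: subsystems register once, every new namespace runs all registered inits, and a failure unwinds the ones that already ran.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace state. */
struct ex_net {
	int loopback_up;     /* stands in for the per-namespace loopback */
	int route_entries;   /* stands in for per-namespace routing state */
};

struct ex_pernet_ops {
	int (*init)(struct ex_net *net);
	void (*exit)(struct ex_net *net);
};

#define EX_MAX_OPS 8
static const struct ex_pernet_ops *ex_ops[EX_MAX_OPS];
static size_t ex_nops;

/* Subsystems register once; their init then runs for every namespace. */
static int ex_register_pernet(const struct ex_pernet_ops *ops)
{
	if (ex_nops == EX_MAX_OPS)
		return -1;
	ex_ops[ex_nops++] = ops;
	return 0;
}

/* Set up a new namespace; on failure, undo the inits that succeeded. */
static int ex_net_setup(struct ex_net *net)
{
	size_t i;

	for (i = 0; i < ex_nops; i++) {
		if (ex_ops[i]->init(net) != 0) {
			while (i-- > 0)
				if (ex_ops[i]->exit)
					ex_ops[i]->exit(net);
			return -1;
		}
	}
	return 0;
}

static int lo_init(struct ex_net *net) { net->loopback_up = 1; return 0; }
static void lo_exit(struct ex_net *net) { net->loopback_up = 0; }
static int rt_init(struct ex_net *net) { net->route_entries = 0; return 0; }
static int fail_init(struct ex_net *net) { (void)net; return -1; }

static const struct ex_pernet_ops lo_ops = { lo_init, lo_exit };
static const struct ex_pernet_ops rt_ops = { rt_init, NULL };
static const struct ex_pernet_ops fail_ops = { fail_init, NULL };
```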
Since it is a namespace, a clone flag will exist to create it during clone or unshare, as with the other namespaces that have been implemented. An additional network stack feature will allow you to migrate network devices between namespaces.

When this is complete, all of the features of the network stack (ipv4, ipv6, decnet, sysctls, virtual devices, routing tables, scheduling, ipsec, netfilter, etc.) should be able to operate in a per-namespace fashion.

--- The implementation plan

The plan is to first get the network namespace infrastructure merged, so that pieces of the network stack can be made to operate in a per-namespace fashion.

Then the plan is to proceed as if we were removing a global kernel lock. For each layer of the networking stack, pass the per-namespace parameter down to the functions and modify the functions to verify they are only operating on the initial network namespace. Then, one piece at a time, update the code to handle multiple network namespaces, and push the network namespace information down to the lower levels.

This plan calls for a lot of patches that are essentially noise. But the result is simple, generally obviously correct patches that can be easily reviewed, can be safely merged one at a time, and don't impose any additional ongoing maintenance overhead. In my current proof-of-concept patchset it takes about 100 patches before ipv4 is up and working.

--- Performance

In initial measurements, the only performance overhead we have been able to measure is getting the packet to the network namespace. Going through ethernet bridging or routing seems to trigger copies of the packet that slow things down. When packets go directly to the network namespace, no performance penalty has yet been measured.

--- The question

At the design level, does this approach sound reasonable?

Eric

p.s. I will follow up shortly with a patch that is one implementation of the basic network namespace infrastructure. Feel free to cut it to shreds (as it is likely overkill), but it should help put the pieces of what I am talking about into perspective.
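The two-stage conversion in the implementation plan can be illustrated with one hypothetical function. In stage 1, the namespace parameter is threaded through but the body still touches global state, so anything other than the initial namespace is refused. In stage 2, the state has moved into the namespace and the guard disappears. None of these names come from the proposed patches.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical namespace object and initial namespace. */
struct net_ns { int route_count; };
static struct net_ns init_ns;

/* Global state that stage 1 still relies on. */
static int global_route_count;

/* Stage 1: callers now pass a namespace, but only the initial
 * namespace is accepted because the data is still global. */
static int route_add_stage1(struct net_ns *ns)
{
	if (ns != &init_ns)
		return -EINVAL;
	global_route_count++;
	return 0;
}

/* Stage 2: the state lives in the namespace, so the guard goes away. */
static int route_add_stage2(struct net_ns *ns)
{
	ns->route_count++;
	return 0;
}
```

Each stage is a small, mechanically reviewable patch, which is the property the plan relies on.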
[PATCH] Ethernet driver for EISA only SNI RM200/RM400 machines
Hi, This is a new ethernet driver, which uses the code taken out of lasi_82596 (done by the other patch I just sent). Thomas. Ethernet driver for EISA only SNI RM200/RM400 machines Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]> --- diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index b0d0d73..af5c90f 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -435,6 +435,13 @@ config LASI_82596 Say Y here to support the builtin Intel 82596 ethernet controller found in Hewlett-Packard PA-RISC machines with 10Mbit ethernet. +config SNI_82596 + tristate "SNI RM ethernet" + depends on NET_ETHERNET && SNI_RM + help + Say Y here to support the on-board Intel 82596 ethernet controller + built into SNI RM machines. + config MIPS_JAZZ_SONIC tristate "MIPS JAZZ onboard SONIC Ethernet support" depends on NET_ETHERNET && MACH_JAZZ diff --git a/drivers/net/Makefile b/drivers/net/Makefile index d268b49..b03270c 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -161,6 +161,7 @@ obj-$(CONFIG_ELPLUS) += 3c505.o obj-$(CONFIG_AC3200) += ac3200.o 8390.o obj-$(CONFIG_APRICOT) += 82596.o obj-$(CONFIG_LASI_82596) += lasi_82596.o +obj-$(CONFIG_SNI_82596) += sni_82596.o obj-$(CONFIG_MVME16x_NET) += 82596.o obj-$(CONFIG_BVME6000_NET) += 82596.o obj-$(CONFIG_SC92031) += sc92031.o diff --git a/drivers/net/sni_82596.c b/drivers/net/sni_82596.c new file mode 100644 index 000..a37d08a --- /dev/null +++ b/drivers/net/sni_82596.c @@ -0,0 +1,204 @@ +/* + * sni_82596.c -- driver for intel 82596 ethernet controller, as + * used in older SNI RM machines + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SNI_82596_DRIVER_VERSION "SNI RM 82596 driver - Revision: 0.01" + +static char sni_82596_string[] = "snirm_82596"; + +#define DMA_ALLOC dma_alloc_coherent +#define DMA_FREE dma_free_coherent +#define DMA_WBACK(priv, addr, len) do { } while
(0) +#define DMA_INV(priv, addr, len) do { } while (0) +#define DMA_WBACK_INV(priv, addr, len) do { } while (0) + +#define SYSBUS 0x4400 + +/* big endian CPU, 82596 little endian */ +#define SWAP32(x) cpu_to_le32((u32)(x)) +#define SWAP16(x) cpu_to_le16((u16)(x)) + +#define OPT_MPU_16BIT 0x01 + +static inline void CA(struct net_device *dev); +static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x); + +#include "lib82596.c" + +MODULE_AUTHOR("Thomas Bogendoerfer"); +MODULE_DESCRIPTION("i82596 driver"); +MODULE_LICENSE("GPL"); +module_param(i596_debug, int, 0); +MODULE_PARM_DESC(i596_debug, "82596 debug mask"); + +static inline void CA(struct net_device *dev) +{ + struct i596_private *lp = netdev_priv(dev); + + writel(0, lp->ca); +} + + +static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x) +{ + struct i596_private *lp = netdev_priv(dev); + + u32 v = (u32) (c) | (u32) (x); + + if (lp->options & OPT_MPU_16BIT) { + writew(v & 0xffff, lp->mpu_port); + wmb(); udelay(1); /* order writes to MPU port */ + writew(v >> 16, lp->mpu_port); + } else { + writel(v, lp->mpu_port); + wmb(); udelay(1); /* order writes to MPU port */ + writel(v, lp->mpu_port); + } +} + + +static int __devinit sni_82596_probe(struct platform_device *dev) +{ + struct net_device *netdevice; + struct i596_private *lp; + struct resource *res, *ca, *idprom, *options; + int retval = -ENODEV; + static int init; + void __iomem *mpu_addr = NULL; + void __iomem *ca_addr = NULL; + u8 __iomem *eth_addr = NULL; + + if (init == 0) { + printk(KERN_INFO SNI_82596_DRIVER_VERSION "\n"); + init++; + } + + res = platform_get_resource(dev, IORESOURCE_MEM, 0); + if (!res) + goto probe_failed; + mpu_addr = ioremap_nocache(res->start, 4); + if (!mpu_addr) { + retval = -ENOMEM; + goto probe_failed; + } + ca = platform_get_resource(dev, IORESOURCE_MEM, 1); + if (!ca) + goto probe_failed; + ca_addr = ioremap_nocache(ca->start, 4); + if (!ca_addr) { + retval = -ENOMEM; + goto probe_failed; + } +
idprom = platform_get_resource(dev, IORESOURCE_MEM, 2); + if (!idprom) + goto probe_failed; + eth_addr = ioremap_nocache(idprom->start, 0x10); + if (!eth_addr) { + retval = -ENOMEM; + goto probe_failed; + } + options = platform_get_resource(dev, 0, 0); + if (!options) + goto probe_failed; + + printk(KERN_INFO "Fo
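MPU_PORT() ORs the command bits and the address into one 32-bit word. It then either writes that word as two 16-bit halves, low half first, or writes the same 32-bit word twice, depending on OPT_MPU_16BIT. Below is a userspace sketch of that sequence, using a mock port that logs each write in place of writew()/writel(). The wmb()/udelay() between the writes is omitted, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mock I/O port that records every value written to it. */
struct mock_port { uint32_t log[4]; int n; };

static void mock_write(struct mock_port *p, uint32_t v)
{
	p->log[p->n++] = v;
}

#define EX_OPT_MPU_16BIT 0x01

static void ex_mpu_port(struct mock_port *p, int options,
			uint32_t cmd, uint32_t addr)
{
	uint32_t v = cmd | addr;   /* command in the low bits of the address */

	if (options & EX_OPT_MPU_16BIT) {
		mock_write(p, v & 0xffff);   /* low half first */
		mock_write(p, v >> 16);      /* then the high half */
	} else {
		mock_write(p, v);            /* full word, written twice */
		mock_write(p, v);
	}
}
```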
Re: Linksys Gigabit USB2.0 adapter (asix) regression
David Hollis wrote: >> To rule out the possibility of the nic being defective, I connected the >> USB nic to a windows computer. There it works, although the ethernet >> connection is a bit flaky (just like it seems...). >> >> Then I did a diff on the respective kernel sources of 2.6.20.3 and >> 2.6.22-rc2 (asix.c and usbnet.c) and found a few changes, but they do not >> seem to be related to my problem. >> >> I am at the end of my repertoire here; can anyone please make some >> suggestions for further testing, or even better, fix it ;-) > > You wouldn't happen to know what PHY that device is using? The AX88178 > (Gigabit USB Ethernet) support in the driver currently only supports the > Marvell PHY, which is the only one I've actually encountered to-date. > If you can rebuild the driver from your kernel sources but with DEBUG > enabled (uncomment it at the top of asix.c) No problem, I will do it on Sunday. No need to build the driver out-of-tree, btw. > After you build the module, load it with insmod ./asix.ko, plug in your > device and send me the dmesg output. I'm particularly interested in the > PHYID=0x12345678 line. That will tell me what PHY chip is being used in > that device and if I need to add support for it. Will do. Thanks!
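For context, a clause 22 MII PHY exposes its identifier in registers 2 and 3. Concatenating the two registers gives a 32-bit ID, and the low 4 bits of register 3 carry the silicon revision. It is an assumption here that asix.c builds its PHYID debug value exactly this way. The helpers below are hypothetical and only show how such an ID can be compared against a supported model while ignoring the revision.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 802.3 clause 22 PHY identifier register numbers. */
#define EX_MII_PHYSID1 2
#define EX_MII_PHYSID2 3

/* Concatenate the two 16-bit identifier registers. */
static uint32_t ex_phy_id(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/* Ignore the 4-bit revision so one entry matches all revisions. */
static int ex_phy_matches(uint32_t id, uint32_t model)
{
	return (id & 0xfffffff0u) == (model & 0xfffffff0u);
}
```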
Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
Patrick McHardy wrote: > Waskiewicz Jr, Peter P wrote: > >>Thought about this more last night and this morning. As far as I can >>tell, I still need this. If the qdisc gets loaded with multiqueue >>turned on, I can just use the value of band to assign >>skb->queue_mapping. But if the qdisc is loaded without multiqueue >>support, then I need to assign a value of zero to queue_mapping, or not >>assign it at all (it will be zeroed out before the call to ->enqueue() >>in dev_queue_xmit()). But I'd rather not have a conditional in the >>hotpath checking if the qdisc is multiqueue; I'd rather have the array >>to match the bands so I can just do an assignment. >> >>What do you think? > > > > I very much doubt that it has any measurable impact. You can > also add a small inline function > > void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue) OK, I obviously didn't really listen :) A compile-time option won't help. Just remove it and assign it conditionally.
Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
Patrick McHardy wrote: > void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue) > { > #ifdef CONFIG_NET_SCH_MULTIQUEUE > skb->queue_mapping = queue; > #else > skb->queue_mapping = 0; > #endif Maybe even use it everywhere and guard skb->queue_mapping with an #ifdef; on 32 bit it does enlarge the skb.
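The suggestion is to have every caller go through the accessor, so that the field itself can sit behind the #ifdef and vanish from the structure when the option is off. Below is a userspace sketch of that arrangement. `struct ex_skb` and the ex_* accessors are hypothetical, and the config symbol is simply defined at the top. Remove that line to build the other variant, in which readers always see queue 0.

```c
#include <assert.h>

/* Remove this line to build the configuration without the field. */
#define CONFIG_NET_SCH_MULTIQUEUE 1

struct ex_skb {
	unsigned int len;
#ifdef CONFIG_NET_SCH_MULTIQUEUE
	unsigned short queue_mapping;   /* only present when configured */
#endif
};

static inline void ex_skb_set_queue_mapping(struct ex_skb *skb,
					    unsigned int queue)
{
#ifdef CONFIG_NET_SCH_MULTIQUEUE
	skb->queue_mapping = queue;
#else
	(void)skb;
	(void)queue;
#endif
}

static inline unsigned int ex_skb_get_queue_mapping(const struct ex_skb *skb)
{
#ifdef CONFIG_NET_SCH_MULTIQUEUE
	return skb->queue_mapping;
#else
	(void)skb;
	return 0;
#endif
}
```

Because no caller names the field directly, both configurations compile without further #ifdefs at the call sites.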
Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
Waskiewicz Jr, Peter P wrote: >>> #include >>>@@ -40,9 +42,13 @@ >>> struct prio_sched_data >>> { >>> int bands; >>>+#ifdef CONFIG_NET_SCH_RR >>>+int curband; /* for round-robin */ >>>+#endif >>> struct tcf_proto *filter_list; >>> u8 prio2band[TC_PRIO_MAX+1]; >>> struct Qdisc *queues[TCQ_PRIO_BANDS]; >>>+u16 band2queue[TC_PRIO_MAX + 1]; >>> >> >>Why is this still here? It's a 1:1 mapping. > > > Thought about this more last night and this morning. As far as I can > tell, I still need this. If the qdisc gets loaded with multiqueue > turned on, I can just use the value of band to assign > skb->queue_mapping. But if the qdisc is loaded without multiqueue > support, then I need to assign a value of zero to queue_mapping, or not > assign it at all (it will be zeroed out before the call to ->enqueue() > in dev_queue_xmit()). But I'd rather not have a conditional in the > hotpath checking if the qdisc is multiqueue; I'd rather have the array > to match the bands so I can just do an assignment. > > What do you think? I very much doubt that it has any measurable impact. You can also add a small inline function void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue) { #ifdef CONFIG_NET_SCH_MULTIQUEUE skb->queue_mapping = queue; #else skb->queue_mapping = 0; #endif }
RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue
> > #include > > @@ -40,9 +42,13 @@ > > struct prio_sched_data > > { > > int bands; > > +#ifdef CONFIG_NET_SCH_RR > > + int curband; /* for round-robin */ > > +#endif > > struct tcf_proto *filter_list; > > u8 prio2band[TC_PRIO_MAX+1]; > > struct Qdisc *queues[TCQ_PRIO_BANDS]; > > + u16 band2queue[TC_PRIO_MAX + 1]; > > > > Why is this still here? It's a 1:1 mapping. Thought about this more last night and this morning. As far as I can tell, I still need this. If the qdisc gets loaded with multiqueue turned on, I can just use the value of band to assign skb->queue_mapping. But if the qdisc is loaded without multiqueue support, then I need to assign a value of zero to queue_mapping, or not assign it at all (it will be zeroed out before the call to ->enqueue() in dev_queue_xmit()). But I'd rather not have a conditional in the hotpath checking if the qdisc is multiqueue; I'd rather have the array to match the bands so I can just do an assignment. What do you think? Thanks, -PJ
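The two options under discussion yield the same queue for every band. One fills a band2queue table once at setup (identity with multiqueue, all zeros without) so the hot path is a single load. The other tests a flag on each packet. Below is a userspace sketch of both. EX_BANDS and all names are hypothetical, and the sketch makes no claim about which option is faster.

```c
#include <assert.h>

#define EX_BANDS 16   /* hypothetical; TC_PRIO_MAX + 1 in the quoted struct */

struct ex_prio {
	int multiqueue;                       /* fixed at qdisc setup */
	unsigned short band2queue[EX_BANDS];  /* identity or all zero */
};

/* Option A: precompute the mapping once at setup time. */
static void ex_prio_setup(struct ex_prio *q, int multiqueue)
{
	int b;

	q->multiqueue = multiqueue;
	for (b = 0; b < EX_BANDS; b++)
		q->band2queue[b] = multiqueue ? b : 0;
}

static unsigned int ex_queue_table(const struct ex_prio *q, int band)
{
	return q->band2queue[band];
}

/* Option B: no table, a conditional per packet. */
static unsigned int ex_queue_branch(const struct ex_prio *q, int band)
{
	return q->multiqueue ? (unsigned int)band : 0;
}
```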
Re: [IPROUTE]: Add nested compat attribute
Patrick McHardy wrote: > extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, > int len); > extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr > *rta, int len); > +extern int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct > rtattr *rta, void **data, int len); Same change as in the kernel patch: avoid the cast in the caller and make the signature match the existing ones better. [IPROUTE]: Add nested compat attribute Add a nested compat attribute type that can be used to convert attributes that contain a structure to nested attributes in a backwards compatible way. The attribute looks like this: struct { [ compat contents ] struct rtattr { .rta_len = total size, .rta_type = type, } rta; struct old_structure struct; [ nested top-level attribute ] struct rtattr { .rta_len = nest size, .rta_type = type, } nest_attr; [ optional 0 .. n nested attributes ] struct rtattr { .rta_len = private attribute len, .rta_type = private attribute type, } nested_attr; struct nested_data data; }; Since both userspace and kernel deal correctly with attributes that are larger than expected, old versions will just parse the compat part and ignore the rest.
Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 7d4cee8b6e0ddceda300ad1e4242cd068a722f0c tree ea8b1245518d6ed352ef5d8e379588fa76981de9 parent cd71a8e07f57a74d52e62cc1fed39c03ad64bc08 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:13:08 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:13:08 +0200 include/libnetlink.h |9 + lib/libnetlink.c | 46 ++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/include/libnetlink.h b/include/libnetlink.h index 49e248e..b67c5a5 100644 --- a/include/libnetlink.h +++ b/include/libnetlink.h @@ -39,15 +39,24 @@ extern int rtnl_send(struct rtnl_handle extern int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data); extern int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data, int alen); extern int addraw_l(struct nlmsghdr *n, int maxlen, const void *data, int len); +extern struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type); +extern int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest); +extern struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, const void *data, int len); +extern int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *nest); extern int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data); extern int rta_addattr_l(struct rtattr *rta, int maxlen, int type, const void *data, int alen); extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len); extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr *rta, int len); +extern int __parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, int len); #define parse_rtattr_nested(tb, max, rta) \ (parse_rtattr((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta))) +#define parse_rtattr_nested_compat(tb, max, rta, data, len) \ +({ data = RTA_PAYLOAD(rta) >= len ? 
RTA_DATA(rta) : NULL; \ + __parse_rtattr_nested_compat(tb, max, rta, len); }) + extern int rtnl_listen(struct rtnl_handle *, rtnl_filter_t handler, void *jarg); extern int rtnl_from_file(FILE *, rtnl_filter_t handler, diff --git a/lib/libnetlink.c b/lib/libnetlink.c index 555dd5c..12883fe 100644 --- a/lib/libnetlink.c +++ b/lib/libnetlink.c @@ -527,6 +527,39 @@ int addraw_l(struct nlmsghdr *n, int max return 0; } +struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type) +{ + struct rtattr *nest = NLMSG_TAIL(n); + + addattr_l(n, maxlen, type, NULL, 0); + return nest; +} + +int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest) +{ + nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest; + return n->nlmsg_len; +} + +struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, + const void *data, int len) +{ + struct rtattr *start = NLMSG_TAIL(n); + + addattr_l(n, maxlen, type, data, len); + addattr_nest(n, maxlen, type); + return start; +} + +int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *start) +{ + struct rtattr *nest = (void *)start + NLMSG_ALIGN(start->rta_len); + + start->rta_len = (void *)NLMSG_TAIL(n) - (void *)start; + addattr_nest_end(n, nest); + return n->nlmsg_len; +} + int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data) { int len = RTA_LENGTH(4); @@ -589,3 +622,16 @@ int parse_rtattr_byindex(struct
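On the building side, addattr_nest_compat() first emits the attribute holding the old structure, then an empty nest header of the same type. After the nested attributes have been appended, addattr_nest_compat_end() stretches the outer length over everything and fixes up the nest's own length. Below is a standalone sketch of that bookkeeping. It uses a local 4-byte header in place of `struct rtattr`, so none of it is the iproute2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for struct rtattr and RTA_ALIGN(). */
struct ex_attr { uint16_t len; uint16_t type; };
#define EX_ALIGN(n) (((n) + 3u) & ~3u)
#define EX_HDRLEN   4u

/* uint32_t storage keeps every attribute 4-byte aligned. */
struct ex_msg { uint32_t words[64]; size_t len; };

static struct ex_attr *ex_tail(struct ex_msg *m)
{
	return (struct ex_attr *)((unsigned char *)m->words + m->len);
}

/* Append one attribute, like addattr_l(); no bounds check here. */
static struct ex_attr *ex_put(struct ex_msg *m, uint16_t type,
			      const void *data, size_t dlen)
{
	struct ex_attr *a = ex_tail(m);

	a->type = type;
	a->len = (uint16_t)(EX_HDRLEN + dlen);
	if (dlen)
		memcpy((unsigned char *)a + EX_HDRLEN, data, dlen);
	m->len += EX_ALIGN(a->len);
	return a;
}

/* Old structure first, then an empty nest header of the same type. */
static struct ex_attr *ex_nest_compat(struct ex_msg *m, uint16_t type,
				      const void *data, size_t dlen)
{
	struct ex_attr *start = ex_put(m, type, data, dlen);

	ex_put(m, type, NULL, 0);
	return start;
}

/* Locate the nest before start->len changes, then fix both lengths. */
static void ex_nest_compat_end(struct ex_msg *m, struct ex_attr *start)
{
	struct ex_attr *nest = (struct ex_attr *)((unsigned char *)start +
						  EX_ALIGN(start->len));
	unsigned char *tail = (unsigned char *)ex_tail(m);

	start->len = (uint16_t)(tail - (unsigned char *)start);
	nest->len = (uint16_t)(tail - (unsigned char *)nest);
}
```

With a 4-byte old structure and one 4-byte nested attribute, the outer attribute spans 20 bytes and the nest spans 12.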
Re: [RTNETLINK]: Add nested compat attribute
Patrick McHardy wrote: > extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr > *rta, int len); > +extern int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, > + struct rtattr *rta, void **data, int len); > This version is a bit nicer because it avoids a cast in the caller and makes the signature match the existing functions better. In the previous version the call would have looked like this: if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt, sizeof(*qopt))) return -EINVAL; now it is: if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt, qopt, sizeof(*qopt))) return -EINVAL; [RTNETLINK]: Add nested compat attribute Add a nested compat attribute type that can be used to convert attributes that contain a structure to nested attributes in a backwards compatible way. The attribute looks like this: struct { [ compat contents ] struct rtattr { .rta_len = total size, .rta_type = type, } rta; struct old_structure struct; [ nested top-level attribute ] struct rtattr { .rta_len = nest size, .rta_type = type, } nest_attr; [ optional 0 .. n nested attributes ] struct rtattr { .rta_len = private attribute len, .rta_type = private attribute type, } nested_attr; struct nested_data data; }; Since both userspace and kernel deal correctly with attributes that are larger than expected, old versions will just parse the compat part and ignore the rest.
Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 1e16b5e521172515c53ce96a0bc3cf0e2d77c001 tree c9880c58391e2df77ecab3d9b6a6849947714eb3 parent c4edf5d552b1450d903a7e7e2d846f2169087e10 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:06:54 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:06:54 +0200 include/linux/rtnetlink.h | 18 ++ net/core/rtnetlink.c | 14 ++ 2 files changed, 32 insertions(+), 0 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 6127858..d40b0c9 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -570,10 +570,16 @@ static __inline__ int rtattr_strcmp(cons } extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len); +extern int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, + struct rtattr *rta, int len); #define rtattr_parse_nested(tb, max, rta) \ rtattr_parse((tb), (max), RTA_DATA((rta)), RTA_PAYLOAD((rta))) +#define rtattr_parse_nested_compat(tb, max, rta, data, len) \ +({ data = RTA_PAYLOAD(rta) >= len ? 
RTA_DATA(rta) : NULL; \ + __rtattr_parse_nested_compat(tb, max, rta, len); }) + extern int rtnetlink_send(struct sk_buff *skb, u32 pid, u32 group, int echo); extern int rtnl_unicast(struct sk_buff *skb, u32 pid); extern int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group, @@ -638,6 +644,18 @@ #define RTA_NEST_END(skb, start) \ ({ (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \ (skb)->len; }) +#define RTA_NEST_COMPAT(skb, type, attrlen, data) \ +({ struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \ + RTA_PUT(skb, type, attrlen, data); \ + RTA_NEST(skb, type); \ + __start; }) + +#define RTA_NEST_COMPAT_END(skb, start) \ +({ struct rtattr *__nest = (void *)(start) + NLMSG_ALIGN((start)->rta_len); \ + (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \ + RTA_NEST_END(skb, __nest); \ + (skb)->len; }) + #define RTA_NEST_CANCEL(skb, start) \ ({ if (start) \ skb_trim(skb, (unsigned char *) (start) - (skb)->data); \ diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 06c0c5a..54c17e4 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -97,6 +97,19 @@ int rtattr_parse(struct rtattr *tb[], in return 0; } +int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, +struct rtattr *rta, int len) +{ + if (RTA_PAYLOAD(rta) < len) + return -1; + if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) { + rta = RTA_DATA(rta) + RTA_ALIGN(len); + return rtattr_parse_nested(tb, maxattr, rta); + } + memset(tb, 0, sizeof(struct rtattr *) * maxattr); + return 0; +} + static struct rtnl_link *rtnl_msg_handlers[NPROTO]; static inline int rtm_msgindex(int msgtype) @@ -1297,6 +1310,7 @@ void __init rtnetlink_init(void) EXPORT_SYMBOL(__rta_fill); EXPORT_SYMBOL(rtattr_strlcpy); EXPORT_SYMBOL(rtattr_parse); +EXPORT_SYMBOL(__rtattr_pa
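On the parsing side, __rtattr_parse_nested_compat() requires the payload to hold at least the old structure. If room for another attribute header remains after the aligned structure, that remainder is the nest and its contents are parsed as usual. Otherwise this is an old-style attribute and the table is cleared. Below is a standalone sketch that mirrors that logic with a local header type in place of `struct rtattr`. The nest sanity check is an extra not present in the quoted kernel code, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ex_attr { uint16_t len; uint16_t type; };
#define EX_ALIGN(n)   (((n) + 3u) & ~3u)
#define EX_HDRLEN     4u
#define EX_DATA(a)    ((unsigned char *)(a) + EX_HDRLEN)
#define EX_PAYLOAD(a) ((unsigned int)(a)->len - EX_HDRLEN)

/* Index attributes by type into tb[type - 1], like rtattr_parse(). */
static void ex_parse(struct ex_attr *tb[], unsigned int max,
		     unsigned char *p, unsigned int len)
{
	memset(tb, 0, sizeof(*tb) * max);
	while (len >= EX_HDRLEN) {
		struct ex_attr *a = (struct ex_attr *)p;

		if (a->len < EX_HDRLEN || a->len > len)
			break;
		if (a->type >= 1 && a->type <= max)
			tb[a->type - 1] = a;
		len -= EX_ALIGN(a->len) < len ? EX_ALIGN(a->len) : len;
		p += EX_ALIGN(a->len);
	}
}

static int ex_parse_nested_compat(struct ex_attr *tb[], unsigned int max,
				  struct ex_attr *rta, unsigned int len,
				  void **data)
{
	unsigned int payload = EX_PAYLOAD(rta);

	*data = payload >= len ? EX_DATA(rta) : NULL;
	if (payload < len)
		return -1;                   /* too short for the old struct */
	if (payload >= EX_ALIGN(len) + EX_HDRLEN) {
		struct ex_attr *nest =
			(struct ex_attr *)(EX_DATA(rta) + EX_ALIGN(len));

		/* Extra sanity check, not in the quoted kernel code. */
		if (nest->len < EX_HDRLEN ||
		    nest->len > payload - EX_ALIGN(len))
			return -1;
		ex_parse(tb, max, EX_DATA(nest), EX_PAYLOAD(nest));
		return 0;
	}
	memset(tb, 0, sizeof(*tb) * max);    /* old-style: no nest present */
	return 0;
}
```

An old kernel or tool that only knows the structure reads the first `len` payload bytes and never looks at the nest.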
Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3
Patrick McHardy wrote:
> Oliver Hartkopp wrote:
>
>> Is it the right approach to let netif_receive_skb() set the iif value, or
>> should we rather set this value on our own before invoking netif_rx()?
>>
>
> netif_receive_skb is meant to be used as a default; the driver can
> override this if it makes sense. If you touch it anyway, you might
> as well set it to the final value.

The CAN bus is really not a highly sophisticated network technology, so it
does not need more than the default internal network transport mechanics
the Linux kernel already provides in an excellent manner.

I also thought about setting skb->iif myself to ensure the correct value is
set - maybe Jamal also has an opinion on this.

The CAN bus only transports CAN frames with an 11/29-bit CAN identifier (for
CSMA/CA arbitration) and up to 8 bytes of payload. There is no space for
VLANs or other addressing schemes known from Ethernet or other network
media. So in contrast to all the fancy VLANs, routing, filters, NAT and
whatever, CAN is really dumb ;-)

Regards,
Oliver
Re: [patch] alpha: fix alignment problem in csum_ipv6_magic()
Ivan Kokshaysky wrote:
> On Thu, Jun 21, 2007 at 04:35:01PM -0700, Andrew Morton wrote:
>> In http://bugzilla.kernel.org/show_bug.cgi?id=8659, Dustin is reporting
>> that this patch broke tcp-on-ipv6.
>
> Oops. Two instructions operating on the 'len' arg ($18) got swapped...
> This should fix the ev6 version; the ev5 one seems to be OK.
>
> Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>
> Ivan.
>
> --- 2.6.22-rc4-mm2/arch/alpha/lib/ev6-csum_ipv6_magic.S	Fri Jun 22 15:02:23 2007
> +++ linux/arch/alpha/lib/ev6-csum_ipv6_magic.S	Fri Jun 22 15:05:38 2007
> @@ -76,18 +76,18 @@ csum_ipv6_magic:
>  	cmoveq	$6,$31,$22	# E : src aligned?
>  	ldq_u	$23,15($17)	# L : Latency: 3
> -	or	$18,$4,$18	# E : 00CCDDAABBCC
> -	extql	$1,$6,$1	# U : U L L U :
> +	inswl	$18,3,$18	# U : 00CCDD00
> +	addl	$19,$7,$19	# E : U L U L : bbaabb00
>  	or	$0,$22,$0	# E : 1st src word complete
> -	extqh	$5,$6,$5	# U :
> -	addl	$19,$7,$19	# E : bbaabb00
> -	and	$17,7,$6	# E : L U L U : dst misalignment
> +	extql	$1,$6,$1	# U :
> +	or	$18,$4,$18	# E : 00CCDDAABBCC
> +	extqh	$5,$6,$5	# U : L U L U
>  
> -	inswl	$18,3,$18	# U : 00CCDD00
> -	or	$1,$5,$1	# E : 2nd src word complete
> +	and	$17,7,$6	# E : dst misalignment
>  	extql	$2,$6,$2	# U :
> -	extqh	$3,$6,$22	# U : U L U U :
> +	or	$1,$5,$1	# E : 2nd src word complete
> +	extqh	$3,$6,$22	# U : L U L U :
>  	cmoveq	$6,$31,$22	# E : dst aligned?
>  	extql	$3,$6,$3	# U :

Awesome! Works like a champ!

Thank you guys so very much! You rock!

-Dustin
[PATCH]: Nested compat attribute sch_prio example
These two patches contain some example code how to use the nested compat attribute in sch_prio. [NET_SCHED]: sch_prio: nested compat attribute test Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 2ceff1c312a93446c95c41c3a54245a15278fe07 tree 7335b4957440cec27894dd38f3a707b4344f21ea parent dece87e23c7cfa1159d3be0ea5b0db89a0fc5872 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:10:44 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:10:44 +0200 include/linux/pkt_sched.h |9 + net/sched/sch_prio.c | 15 --- 2 files changed, 21 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index 2599d39..0bedabe 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -101,6 +101,15 @@ struct tc_prio_qopt __u8 priomap[TC_PRIO_MAX+1]; /* Map: logical priority -> PRIO band */ }; +enum +{ + TCA_PRIO_UNPSEC, + TCA_PRIO_TEST, + __TCA_PRIO_MAX +}; + +#define TCA_PRIO_MAX (__TCA_PRIO_MAX - 1) + /* TBF section */ struct tc_tbf_qopt diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 6d7542c..40a13e8 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -198,10 +198,12 @@ prio_destroy(struct Qdisc* sch) static int prio_tune(struct Qdisc *sch, struct rtattr *opt) { struct prio_sched_data *q = qdisc_priv(sch); - struct tc_prio_qopt *qopt = RTA_DATA(opt); + struct tc_prio_qopt *qopt; + struct rtattr *tb[TCA_PRIO_MAX]; int i; - if (opt->rta_len < RTA_LENGTH(sizeof(*qopt))) + if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt, + sizeof(*qopt))) return -EINVAL; if (qopt->bands > TCQ_PRIO_BANDS || qopt->bands < 2) return -EINVAL; @@ -211,6 +213,9 @@ static int prio_tune(struct Qdisc *sch, return -EINVAL; } + if (tb[TCA_PRIO_TEST-1]) + printk("TCA_PRIO_TEST: %u\n", *(u32 *)RTA_DATA(tb[TCA_PRIO_TEST-1])); + sch_tree_lock(sch); q->bands = qopt->bands; memcpy(q->prio2band, qopt->priomap, TC_PRIO_MAX+1); @@ -268,11 +273,15 @@ static int prio_dump(struct 
Qdisc *sch, { struct prio_sched_data *q = qdisc_priv(sch); unsigned char *b = skb_tail_pointer(skb); + struct rtattr *nest; struct tc_prio_qopt opt; opt.bands = q->bands; memcpy(&opt.priomap, q->prio2band, TC_PRIO_MAX+1); - RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt); + + nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), &opt); + RTA_PUT_U32(skb, TCA_PRIO_TEST, 321); + RTA_NEST_COMPAT_END(skb, nest); return skb->len; rtattr_failure: [IPROUTE]: sch_prio: nested compat attribute test Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 37e8758e1238bf26172f25a0e3b0dec9c8c4f986 tree c79e320def8f4c5a5ed7f037f3ca6ec68487b375 parent d283ea3c852f54941ec785ad39dbfa4586f518c7 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:00:04 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:00:04 +0200 include/linux/pkt_sched.h |9 + tc/q_prio.c | 13 ++--- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h index dc61a85..77eaab1 100644 --- a/include/linux/pkt_sched.h +++ b/include/linux/pkt_sched.h @@ -101,6 +101,15 @@ struct tc_prio_qopt __u8 priomap[TC_PRIO_MAX+1]; /* Map: logical priority -> PRIO band */ }; +enum +{ + TCA_PRIO_UNSPEC, + TCA_PRIO_TEST, + __TCA_PRIO_MAX +}; + +#define TCA_PRIO_MAX (__TCA_PRIO_MAX - 1) + /* TBF section */ struct tc_tbf_qopt diff --git a/tc/q_prio.c b/tc/q_prio.c index d696e1b..4934416 100644 --- a/tc/q_prio.c +++ b/tc/q_prio.c @@ -40,6 +40,7 @@ static int prio_parse_opt(struct qdisc_u int pmap_mode = 0; int idx = 0; struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }}; + struct rtattr *nest; while (argc > 0) { if (strcmp(*argv, "bands") == 0) { @@ -90,7 +91,9 @@ static int prio_parse_opt(struct qdisc_u opt.priomap[idx] = opt.priomap[TC_PRIO_BESTEFFORT]; } */ - addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)); + nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt)); + addattr32(n, 1024, 
TCA_PRIO_TEST, 123); + addattr_nest_compat_end(n, nest); return 0; } @@ -98,16 +101,20 @@ int prio_print_opt(struct qdisc_util *qu { int i; struct tc_prio_qopt *qopt; + struct rtattr *tb[TCA_PRIO_MAX+1]; if (opt == NULL) return 0; - if (RTA_PAYLOAD(opt) < sizeof(*qopt)) + if (parse_rtattr_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt, sizeof(*qopt))) return -1; - qopt = RTA_DATA(opt); + fprintf(f, "bands %u priomap ", qopt->bands); for (i=0; i<=TC_PRIO_MAX; i++) fprintf(f, " %d", qopt->priomap[i]); + + if (tb[TCA_PRIO_TEST]) + fprintf(f, " TCA_PRIO_TEST: %u ", *(__u32 *)RTA_DATA(tb[TCA_PRIO_TEST])); return 0; }
[RTNETLINK]: Add nested compat attribute
This patch adds a new attribute type that can be used to replace non-nested
attributes that contain structures by nested ones in a compatible way. This
can be used in cases like Peter's, who is trying to extend sch_prio, which
currently uses a fixed structure without any holes. Switching to nested
attributes makes sure that the next person won't run into the same problem.

[RTNETLINK]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert attributes
that contain a structure to nested attributes in a backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.
Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit dece87e23c7cfa1159d3be0ea5b0db89a0fc5872 tree c14be602de94b258e0343816b6c1809233a2ff5f parent c4edf5d552b1450d903a7e7e2d846f2169087e10 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:52:21 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:52:21 +0200 include/linux/rtnetlink.h | 14 ++ net/core/rtnetlink.c | 16 2 files changed, 30 insertions(+), 0 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 6127858..6731e7f 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -570,6 +570,8 @@ static __inline__ int rtattr_strcmp(cons } extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len); +extern int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, + struct rtattr *rta, void **data, int len); #define rtattr_parse_nested(tb, max, rta) \ rtattr_parse((tb), (max), RTA_DATA((rta)), RTA_PAYLOAD((rta))) @@ -638,6 +640,18 @@ #define RTA_NEST_END(skb, start) \ ({ (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \ (skb)->len; }) +#define RTA_NEST_COMPAT(skb, type, attrlen, data) \ +({ struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \ + RTA_PUT(skb, type, attrlen, data); \ + RTA_NEST(skb, type); \ + __start; }) + +#define RTA_NEST_COMPAT_END(skb, start) \ +({ struct rtattr *__nest = (void *)(start) + NLMSG_ALIGN((start)->rta_len); \ + (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \ + RTA_NEST_END(skb, __nest); \ + (skb)->len; }) + #define RTA_NEST_CANCEL(skb, start) \ ({ if (start) \ skb_trim(skb, (unsigned char *) (start) - (skb)->data); \ diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 06c0c5a..c25d23b 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -97,6 +97,21 @@ int rtattr_parse(struct rtattr *tb[], in return 0; } +int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr, + struct rtattr 
*rta, void **data, int len) +{ + if (RTA_PAYLOAD(rta) < len) + return -1; + *data = RTA_DATA(rta); + + if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) { + rta = RTA_DATA(rta) + RTA_ALIGN(len); + return rtattr_parse_nested(tb, maxattr, rta); + } + memset(tb, 0, sizeof(struct rtattr *) * maxattr); + return 0; +} + static struct rtnl_link *rtnl_msg_handlers[NPROTO]; static inline int rtm_msgindex(int msgtype) @@ -1297,6 +1312,7 @@ void __init rtnetlink_init(void) EXPORT_SYMBOL(__rta_fill); EXPORT_SYMBOL(rtattr_strlcpy); EXPORT_SYMBOL(rtattr_parse); +EXPORT_SYMBOL(rtattr_parse_nested_compat); EXPORT_SYMBOL(rtnetlink_put_metrics); EXPORT_SYMBOL(rtnl_lock); EXPORT_SYMBOL(rtnl_trylock);
[IPROUTE]: Add nested compat attribute
This patch adds a new attribute type that can be used to replace non-nested
attributes that contain structures by nested ones in a compatible way. This
can be used in cases like Peter's, who is trying to extend sch_prio, which
currently uses a fixed structure without any holes. Switching to nested
attributes makes sure that the next person won't run into the same problem.

[IPROUTE]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert attributes
that contain a structure to nested attributes in a backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.
Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit d283ea3c852f54941ec785ad39dbfa4586f518c7 tree d2300d75cf50897386e670591fa963acd2bbd21b parent cd71a8e07f57a74d52e62cc1fed39c03ad64bc08 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:59:27 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:59:27 +0200 include/libnetlink.h |5 + lib/libnetlink.c | 48 2 files changed, 53 insertions(+), 0 deletions(-) diff --git a/include/libnetlink.h b/include/libnetlink.h index 49e248e..bd426e6 100644 --- a/include/libnetlink.h +++ b/include/libnetlink.h @@ -39,11 +39,16 @@ extern int rtnl_send(struct rtnl_handle extern int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data); extern int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data, int alen); extern int addraw_l(struct nlmsghdr *n, int maxlen, const void *data, int len); +extern struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type); +extern int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest); +extern struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, const void *data, int len); +extern int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *nest); extern int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data); extern int rta_addattr_l(struct rtattr *rta, int maxlen, int type, const void *data, int alen); extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len); extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr *rta, int len); +extern int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, void **data, int len); #define parse_rtattr_nested(tb, max, rta) \ (parse_rtattr((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta))) diff --git a/lib/libnetlink.c b/lib/libnetlink.c index 555dd5c..2add7e9 100644 --- a/lib/libnetlink.c +++ b/lib/libnetlink.c @@ -527,6 +527,39 @@ int addraw_l(struct nlmsghdr *n, int max 
return 0; } +struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type) +{ + struct rtattr *nest = NLMSG_TAIL(n); + + addattr_l(n, maxlen, type, NULL, 0); + return nest; +} + +int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest) +{ + nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest; + return n->nlmsg_len; +} + +struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, + const void *data, int len) +{ + struct rtattr *start = NLMSG_TAIL(n); + + addattr_l(n, maxlen, type, data, len); + addattr_nest(n, maxlen, type); + return start; +} + +int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *start) +{ + struct rtattr *nest = (void *)start + NLMSG_ALIGN(start->rta_len); + + start->rta_len = (void *)NLMSG_TAIL(n) - (void *)start; + addattr_nest_end(n, nest); + return n->nlmsg_len; +} + int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data) { int len = RTA_LENGTH(4); @@ -589,3 +622,18 @@ int parse_rtattr_byindex(struct rtattr * fprintf(stderr, "!!!Deficit %d, rta_len=%d\n", len, rta->rta_len); return i; } + +int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, + void **data, int len) +{ + if (RTA_PAYLOAD(rta) < len) + return -1; + *data = RTA_DATA(rta); + + if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) { + rta = RTA_DATA(rta) + RTA_ALIGN(len); + return parse_rtattr_nested(tb, max, rta); + } + memset(tb, 0, sizeof(struct rtattr *) * max); + return 0; +}
Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3
Oliver Hartkopp wrote:
> Patrick McHardy wrote:
>> Urs Thuermann wrote:
>>> * Use skb->iif instead of skb->cb to pass receiving interface from
>>>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
>>
>> skb->iif doesn't necessarily point to the incoming network device
>> as seen by netif_receive_skb; for layered devices it currently
>> always points to the first interface that received a packet.
>
> This is exactly the intention.
>
>> It's so far also only used for traffic classification; please explain
>> how you're using it and what values it is set to on which paths.
>
> As you might have seen in Documentation/networking/can.txt (hint, hint,
> hint!), CAN has no routing, no ARP, no MAC addressing and is a
> broadcast-only medium. So if there is any reasonable addressing on CAN at
> all, it consists of the CAN frame's "CAN identifier" and the CAN bus this
> CAN frame is sent/received on. For this reason the information about the
> interface the CAN frame has been received on has to be made available to
> the user application if it needs this information.
>
> Until your hint about our misuse of skb->cb, we (successfully) transported
> this information inside skb->cb to socket level. But indeed skb->iif is
> the better (and in our opinion the right) place to transport this
> information inside the skb to socket level.

Let's hear Jamal's opinion on this; to be honest, I never understood how
exactly it is supposed to be used.

> In both cases (receiving real CAN frames from the CAN netdev / performing
> the loopback of CAN frames) we set skb->iif to zero to let
> netif_receive_skb() set the iif value to the current skb->dev index. So
> skb->iif is set to the first interface the CAN frame is received on,
> which is what we need and intended here.
>
> Is it the right approach to let netif_receive_skb() set the iif value, or
> should we rather set this value on our own before invoking netif_rx()?

netif_receive_skb is meant to be used as a default; the driver can
override this if it makes sense. If you touch it anyway, you might as
well set it to the final value.
Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3
Patrick McHardy wrote:
> Urs Thuermann wrote:
>
>> * Use skb->iif instead of skb->cb to pass receiving interface from
>>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
>>
>
> skb->iif doesn't necessarily point to the incoming network device
> as seen by netif_receive_skb; for layered devices it currently
> always points to the first interface that received a packet.
>

This is exactly the intention.

> It's so far also only used for traffic classification; please explain
> how you're using it and what values it is set to on which paths.
>

As you might have seen in Documentation/networking/can.txt (hint, hint,
hint!), CAN has no routing, no ARP, no MAC addressing and is a
broadcast-only medium. So if there is any reasonable addressing on CAN at
all, it consists of the CAN frame's "CAN identifier" and the CAN bus this
CAN frame is sent/received on. For this reason the information about the
interface the CAN frame has been received on has to be made available to
the user application if it needs this information.

Until your hint about our misuse of skb->cb, we (successfully) transported
this information inside skb->cb to socket level. But indeed skb->iif is the
better (and in our opinion the right) place to transport this information
inside the skb to socket level.

In both cases (receiving real CAN frames from the CAN netdev / performing
the loopback of CAN frames) we set skb->iif to zero to let
netif_receive_skb() set the iif value to the current skb->dev index. So
skb->iif is set to the first interface the CAN frame is received on, which
is what we need and intended here.

Is it the right approach to let netif_receive_skb() set the iif value, or
should we rather set this value on our own before invoking netif_rx()?

Best regards,
Oliver
Re: Linksys Gigabit USB2.0 adapter (asix) regression
On Wed, 2007-06-20 at 13:56 +0200, Erik Slagter wrote:
> To rule out the possibility of the NIC being defective, I connected the
> USB NIC to a Windows computer. There it works, although the Ethernet
> connection is a bit flaky (just like it seems...).
>
> Then I did a diff on the respective kernel sources of 2.6.20.3 and
> 2.6.22-rc2 (asix.c and usbnet.c). I found a few changes, but they do not
> seem to be related to my problem.
>
> I am at the end of my repertoire here; can anyone please make some
> suggestions for further testing, or even better, fix it ;-)

You wouldn't happen to know what PHY that device is using? The AX88178
(Gigabit USB Ethernet) support in the driver currently only supports the
Marvell PHY, which is the only one I've actually encountered to date.

If you can, rebuild the driver from your kernel sources but with DEBUG
enabled (uncomment it at the top of asix.c). You can build the driver
out-of-tree by creating a Makefile with these contents:

obj-m += asix.o
EXTRA_CFLAGS += -DDEBUG

all:
	make -C /lib/modules/`uname -r`/build SUBDIRS=`pwd`

clean:
	make -C /lib/modules/`uname -r`/build SUBDIRS=`pwd` clean

(You'll also need to copy usbnet.h into that directory.)

After you build the module, load it with insmod ./asix.ko, plug in your
device and send me the dmesg output. I'm particularly interested in the
PHYID=0x12345678 line. That will tell me what PHY chip is being used in
that device and whether I need to add support for it.

--
David Hollis <[EMAIL PROTECTED]>
Re: 2.6.20->2.6.21 - networking dies after random time
On Fri, Jun 22, 2007 at 10:56:44AM +0200, Marcin Ślusarz wrote:
...
> When I disable the on-board network card in BIOS (controlled by skge),
> the ne2k-pci card is still locking up. So I think it's strictly a
> ne2k-pci card bug. I made some tests and I know how to reproduce it fast
> (on my machine) - just make some heavy network traffic...
...

I'm no good at hardware, but I guess this log may not be enough. So, if
nobody finds something more sensible, maybe you can try some of these
suggestions:

- you've written it was OK with 2.6.20; it would be interesting to check
  whether there were any changes in the config (besides new options), or
  even to retry 2.6.20 with the "current" config after make oldconfig;
- during such problems it's better to turn off as many unnecessary
  options/drivers as possible to find out whether it's really the network
  driver, e.g. no SMP, no TV cards, ACPI only basic, without options, etc.;
- if possible, try a newer kernel, e.g. 2.6.22-rc5;
- if possible, try another, fresh distro (e.g. some live CD/DVD/USB
  bootable one);
- there was a lockdep warning from tvtime/bttv;
- try to get some more debugging output (help: modinfo ne2k-pci).

Regards,
Jarek P.

PS: for anybody interested - here is the beginning of this story:
http://marc.info/?l=linux-kernel&m=118202978609968&w=2
Re: [BUG] Sky2 driver in 2.6.22-rc5-git1-cfs-v17
On tor, 2007-06-21 at 21:13 -0700, Stephen Hemminger wrote:
> On Fri, 22 Jun 2007 04:45:25 +0200
> Ian Kumlien <[EMAIL PROTECTED]> wrote:
>
> > On tor, 2007-06-21 at 18:57 -0700, Stephen Hemminger wrote:
> > > Redirected off LKML, netdev is the proper list.
> >
> > Thanks =)
> >
> > > On Thu, 21 Jun 2007 22:51:32 +0200
> > > Ian Kumlien <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > recently I have started to see this in my dmesg:
> > > >
> > > > NETDEV WATCHDOG: eth0: transmit timed out
> > > > sky2 eth0: tx timeout
> > > > sky2 eth0: transmit ring 449 .. 408 report=449 done=449
> > > > sky2 eth0: disabling interface
> > > > sky2 eth0: enabling interface
> > > > sky2 eth0: ram buffer 48K
> > > > sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
> > > >
> > > > I'm not using MSI since it seems to have caused problems in the past.
> > > >
> > > > I run with a 9k MTU.
> > > >
> > > > sky2 eth0: transmit ring 18 .. 489 report=18 done=18
> > > > I assume ring max is 512 (i.e. 1-512) since:
> > > > Ring parameters for eth0:
> > > > Current hardware settings:
> > > > RX:             168
> > > > RX Mini:        0
> > > > RX Jumbo:       0
> > > > TX:             511
> > > >
> > > > And 489 + 41 - 18 = 512
> > > >
> > > > sky2 eth0: transmit ring 197 .. 156 report=197 done=197
> > > > sky2 eth0: transmit ring 480 .. 439 report=480 done=480
> > > > sky2 eth0: transmit ring 413 .. 372 report=413 done=413
> > > > sky2 eth0: transmit ring 320 .. 279 report=320 done=320
> > > >
> > > > Otherwise, they are all off by 41.
> > > >
> > > > Is this a known bug?
> > > no
> >
> > Damn =P
> >
> > > > Comments? Ideas?
> > > >
> > > Which chip version? Probably Yukon EC; that seems to be the only one
> > > that does gigabit with a RAM buffer.
> >
> > sky2 :02:00.0: v1.14 addr 0xdbffc000 irq 18 Yukon-EC (0xb6) rev 2
> >
> > > Does it work alright if you set the transmit ring size smaller with
> > > ethtool? There might be an off-by-one bug in the worst case
> > > calculations about list element usage.
> >
> > I tried this... but not with a specific size; I think I did 480, and
> > yes, it timed out... any ideas on a more educated value?
> >
> > --
> > Ian Kumlien -- http://pomac.netswarm.net
>
> Also try setting the idle_timeout module parameter to something like 10 (ms).
> It will fix problems with lost interrupts.

I have changed it now, and I'm leaving it running...

One interesting bit is that when I lowered it from 511 to 510, the magic
number was 42, not 41.

--
Ian Kumlien -- http://pomac.netswarm.net
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Urs Thuermann wrote:
> Patrick McHardy <[EMAIL PROTECTED]> writes:
>
>> Is there a reason why you're still doing the "allocate n devices
>> on init" thing instead of using the rtnl_link API?
>
> Sorry, it's simply a matter of time. We have been extremely busy with
> other projects and two presentations (mgmt, customers, and press) over
> the last two weeks and have worked on the other changes this week. I'm
> sorry I haven't yet been able to look at your rtnl_link code closely
> enough, but it's definitely on my todo list. Starting on Sunday I'll
> be on a business trip to .jp for a week, and I hope I get to it in
> that week, otherwise on return.

Sorry, but being busy is no reason for merging code that has deprecated
(at least by me :)) behaviour. Please change this before submitting for
inclusion.
Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3
Urs Thuermann wrote:
> * Use skb->iif instead of skb->cb to pass receiving interface from
>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().

skb->iif doesn't necessarily point to the incoming network device as seen
by netif_receive_skb; for layered devices it currently always points to
the first interface that received a packet. It's so far also only used
for traffic classification; please explain how you're using it and what
values it is set to on which paths.
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Patrick McHardy <[EMAIL PROTECTED]> writes:

> Is there a reason why you're still doing the "allocate n devices
> on init" thing instead of using the rtnl_link API?

Sorry, it's simply a matter of time. We have been extremely busy with
other projects and two presentations (mgmt, customers, and press) over the
last two weeks and have worked on the other changes this week. I'm sorry I
haven't yet been able to look at your rtnl_link code closely enough, but
it's definitely on my todo list. Starting on Sunday I'll be on a business
trip to .jp for a week, and I hope I get to it in that week, otherwise on
return.

urs
via-rhine: Transmit timed out problem
Hi,

I'm experiencing a strange problem with a VIA Rhine network card on Ubuntu
7.04 (2.6.20-16-generic #2 SMP). The hardware seems to have got into an
inconsistent state, since rmmod'ing the via-rhine driver and modprobe'ing
it back didn't help. After the problem had appeared, I could see the
following in dmesg:

[ 8601.971189] irq 21: nobody cared (try booting with the "irqpoll" option)
[ 8601.971214] [] __report_bad_irq+0x24/0x80
[ 8601.971229] [] note_interrupt+0x25e/0x290
[ 8601.971238] [] handle_IRQ_event+0x30/0x60
[ 8601.971245] [] handle_fasteoi_irq+0xc1/0xf0
[ 8601.971252] [] do_IRQ+0x40/0x80
[ 8601.971259] [] common_interrupt+0x23/0x30
[ 8601.971269] [] mwait_idle_with_hints+0x46/0x60
[ 8601.971276] [] cpu_idle+0x49/0xd0
[ 8601.971289] ===
[ 8601.971291] handlers:
[ 8601.971293] [] (usb_hcd_irq+0x0/0x60 [usbcore])
[ 8601.971311] [] (rhine_interrupt+0x0/0xb80 [via_rhine])
[ 8601.971324] Disabling IRQ #21
[ 8637.970985] NETDEV WATCHDOG: eth0: transmit timed out
[ 8637.971135] eth0: Transmit timed out, status 1003, PHY status 786d, resetting
[ 8637.971163] via-rhine: Reset not complete yet. Trying harder.
[ 8637.971754] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8640.749432] via-rhine: Reset not complete yet. Trying harder.
[ 8640.750018] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8644.746689] NETDEV WATCHDOG: eth0: transmit timed out
[ 8644.746838] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8644.747446] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8648.743327] NETDEV WATCHDOG: eth0: transmit timed out
[ 8648.743476] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8648.744083] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8651.070635] eth0: no IPv6 routers present
[ 8670.723818] NETDEV WATCHDOG: eth0: transmit timed out
[ 8670.723968] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8670.723995] via-rhine: Reset not complete yet. Trying harder.
[ 8670.724578] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8726.668036] NETDEV WATCHDOG: eth0: transmit timed out

The interrupt apparently went unhandled and was then disabled by the
kernel. The transmission seemed to time out for some reason (probably the
hardware got into an inconsistent state?).

Some related information:

[EMAIL PROTECTED]:~% lspci |grep -i rhine
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 7c)
[EMAIL PROTECTED]:~% uname -a
Linux coreduo 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686 GNU/Linux
[EMAIL PROTECTED]:~% dmesg|grep rhine
[2.982700] via-rhine.c:v1.10-LK1.4.2 Sept-11-2006 Written by Donald Becker

Is that information sufficient for debugging? Let me know if you need any
additional data.

Kirill
[NET 04/05]: dev: secondary unicast address support
[NET]: dev: secondary unicast address support Add support for configuring secondary unicast addresses on network devices. To support this devices capable of filtering multiple unicast addresses need to change their set_multicast_list function to configure unicast filters as well and assign it to dev->set_rx_mode instead of dev->set_multicast_list. Other devices are put into promiscous mode when secondary unicast addresses are present. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 099e4ab74adb9418155132b093533f152a31b583 tree 7c8f52672f7b6e1323a479545225d88a2eb35670 parent 02536a101d6fd8b1924b1e05c44409c7b4568335 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:46 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:46 +0200 include/linux/netdevice.h | 12 +++- net/core/dev.c| 144 - net/core/dev_mcast.c | 37 +--- 3 files changed, 139 insertions(+), 54 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b2db124..46585dc 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -393,6 +393,9 @@ struct net_device unsigned char addr_len; /* hardware address length */ unsigned short dev_id; /* for shared network cards */ + struct dev_addr_list*uc_list; /* Secondary unicast mac addresses */ + int uc_count; /* Number of installed ucasts */ + int uc_promisc; struct dev_addr_list*mc_list; /* Multicast mac addresses */ int mc_count; /* Number of installed mcasts */ int promiscuity; @@ -498,6 +501,8 @@ struct net_device void *saddr, unsigned len); int (*rebuild_header)(struct sk_buff *skb); +#define HAVE_SET_RX_MODE + void(*set_rx_mode)(struct net_device *dev); #define HAVE_MULTICAST void(*set_multicast_list)(struct net_device *dev); #define HAVE_SET_MAC_ADDR @@ -1004,8 +1009,11 @@ extern struct net_device *alloc_netdev(int sizeof_priv, const char *name, void (*setup)(struct net_device *)); extern int register_netdev(struct net_device *dev); extern 
voidunregister_netdev(struct net_device *dev); -/* Functions used for multicast support */ -extern voiddev_mc_upload(struct net_device *dev); +/* Functions used for secondary unicast and multicast support */ +extern voiddev_set_rx_mode(struct net_device *dev); +extern void__dev_set_rx_mode(struct net_device *dev); +extern int dev_unicast_delete(struct net_device *dev, void *addr, int alen); +extern int dev_unicast_add(struct net_device *dev, void *addr, int alen); extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all); extern int dev_mc_add(struct net_device *dev, void *addr, int alen, int newonly); extern voiddev_mc_discard(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 1496715..50a4e1e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -942,7 +942,7 @@ int dev_open(struct net_device *dev) /* * Initialize multicasting status */ - dev_mc_upload(dev); + dev_set_rx_mode(dev); /* * Wakeup transmit queue engine @@ -2496,17 +2496,7 @@ int netdev_set_master(struct net_device *slave, struct net_device *master) return 0; } -/** - * dev_set_promiscuity - update promiscuity count on a device - * @dev: device - * @inc: modifier - * - * Add or remove promiscuity from a device. While the count in the device - * remains above zero the interface remains promiscuous. Once it hits zero - * the device reverts back to normal filtering operation. A negative inc - * value is used to drop promiscuity on the device. - */ -void dev_set_promiscuity(struct net_device *dev, int inc) +static void __dev_set_promiscuity(struct net_device *dev, int inc) { unsigned short old_flags = dev->flags; @@ -2515,7 +2505,6 @@ void dev_set_promiscuity(struct net_device *dev, int inc) else dev->flags |= IFF_PROMISC; if (dev->flags != old_flags) { - dev_mc_upload(dev); printk(KERN_INFO "device %s %s promiscuous mode\n", dev->name, (dev->flags & IFF_PROMISC) ? 
"entered" : "left"); @@ -2529,6 +2518,25 @@ void dev_set_promiscuity(struct net_device *dev, int inc) } /
[E1000 05/05]: Secondary unicast address support
[E1000]: Secondary unicast address support Add support for configuring secondary unicast addresses. Unicast addresses take precedence over multicast addresses when filling the exact address filters to avoid going to promiscuous mode. When more unicast addresses are present than filter slots, unicast filtering is disabled and all slots can be used for multicast addresses. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 9613e4e4017b8bb68fcdd28cf5f9ae00bff18e28 tree e19261eea046a0404af0b26e2b99725ee33ae3c2 parent 099e4ab74adb9418155132b093533f152a31b583 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:48 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:48 +0200 drivers/net/e1000/e1000_main.c | 47 ++-- 1 files changed, 31 insertions(+), 16 deletions(-) diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index cf8af92..716fc8f 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -149,7 +149,7 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter, struct e1000_tx_ring *tx_ring); static void e1000_clean_rx_ring(struct e1000_adapter *adapter, struct e1000_rx_ring *rx_ring); -static void e1000_set_multi(struct net_device *netdev); +static void e1000_set_rx_mode(struct net_device *netdev); static void e1000_update_phy_info(unsigned long data); static void e1000_watchdog(unsigned long data); static void e1000_82547_tx_fifo_stall(unsigned long data); @@ -513,7 +513,7 @@ static void e1000_configure(struct e1000_adapter *adapter) struct net_device *netdev = adapter->netdev; int i; - e1000_set_multi(netdev); + e1000_set_rx_mode(netdev); e1000_restore_vlan(adapter); e1000_init_manageability(adapter); @@ -924,7 +924,7 @@ e1000_probe(struct pci_dev *pdev, netdev->stop = &e1000_close; netdev->hard_start_xmit = &e1000_xmit_frame; netdev->get_stats = &e1000_get_stats; - netdev->set_multicast_list = &e1000_set_multi; + netdev->set_rx_mode = &e1000_set_rx_mode;
netdev->set_mac_address = &e1000_set_mac; netdev->change_mtu = &e1000_change_mtu; netdev->do_ioctl = &e1000_ioctl; @@ -2412,21 +2412,22 @@ e1000_set_mac(struct net_device *netdev, void *p) } /** - * e1000_set_multi - Multicast and Promiscuous mode set + * e1000_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set * @netdev: network interface device structure * - * The set_multi entry point is called whenever the multicast address - * list or the network interface flags are updated. This routine is - * responsible for configuring the hardware for proper multicast, + * The set_rx_mode entry point is called whenever the unicast or multicast + * address lists or the network interface flags are updated. This routine is + * responsible for configuring the hardware for proper unicast, multicast, * promiscuous mode, and all-multi behavior. **/ static void -e1000_set_multi(struct net_device *netdev) +e1000_set_rx_mode(struct net_device *netdev) { struct e1000_adapter *adapter = netdev_priv(netdev); struct e1000_hw *hw = &adapter->hw; - struct dev_mc_list *mc_ptr; + struct dev_addr_list *uc_ptr; + struct dev_addr_list *mc_ptr; uint32_t rctl; uint32_t hash_value; int i, rar_entries = E1000_RAR_ENTRIES; @@ -2449,9 +2450,16 @@ e1000_set_multi(struct net_device *netdev) rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE); } else if (netdev->flags & IFF_ALLMULTI) { rctl |= E1000_RCTL_MPE; - rctl &= ~E1000_RCTL_UPE; } else { - rctl &= ~(E1000_RCTL_UPE | E1000_RCTL_MPE); + rctl &= ~E1000_RCTL_MPE; + } + + uc_ptr = NULL; + if (netdev->uc_count > rar_entries - 1) { + rctl |= E1000_RCTL_UPE; + } else if (!(netdev->flags & IFF_PROMISC)) { + rctl &= ~E1000_RCTL_UPE; + uc_ptr = netdev->uc_list; } E1000_WRITE_REG(hw, RCTL, rctl); @@ -2461,7 +2469,10 @@ e1000_set_multi(struct net_device *netdev) if (hw->mac_type == e1000_82542_rev2_0) e1000_enter_82542_rst(adapter); - /* load the first 14 multicast address into the exact filters 1-14 + /* load the first 14 addresses into the exact 
filters 1-14. Unicast +* addresses take precedence to avoid disabling unicast filtering +* when possible. +* * RAR 0 is used for the station MAC adddress * if there are not 14 addresses, go ahead and clear the filters * -- with 82571 controllers only 0-13 entries are filled here @@ -2469,8 +2480,11 @@ e1000_set_multi(struct net_device *netdev) mc_ptr = netdev->mc_list; for (i = 1; i < rar_entries; i++) { - if (m
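The slot allocation described in the commit message can be sketched as a pure function. Assumptions: `plan_rar()` and `struct rar_plan` are hypothetical; `rar_entries` is 15 in the examples because the driver comment says exact filters 1-14 are loaded with RAR 0 reserved for the station MAC; and multicast addresses that do not fit are counted as going to the multicast hash table, since the excerpt is cut off before that part of the loop. The 82571 special case mentioned in the comment is not modeled.

```c
#include <assert.h>

/* Hypothetical summary of how the exact-match (RAR) slots get used. */
struct rar_plan {
	int unicast_promisc; /* UPE set: accept all unicast frames */
	int uc_in_rar;       /* secondary unicast addresses in exact slots */
	int mc_in_rar;       /* multicast addresses in exact slots */
	int mc_hashed;       /* multicast addresses left for hashing */
};

static struct rar_plan plan_rar(int rar_entries, int uc_count, int mc_count,
				int promisc)
{
	struct rar_plan p = {0, 0, 0, 0};
	int free_slots = rar_entries - 1; /* slot 0 holds the station MAC */
	int left;

	assert(rar_entries >= 1 && uc_count >= 0 && mc_count >= 0);

	/* Too many unicast addresses (or promiscuous mode): stop filtering
	 * unicast and keep every free slot for multicast. */
	if (promisc || uc_count > free_slots)
		p.unicast_promisc = 1;
	else
		p.uc_in_rar = uc_count; /* unicast takes precedence */

	left = free_slots - p.uc_in_rar;
	p.mc_in_rar = mc_count < left ? mc_count : left;
	p.mc_hashed = mc_count - p.mc_in_rar;
	return p;
}
```

For instance, with 3 secondary unicast and 20 multicast addresses, all 3 unicast addresses and 11 multicast addresses get exact slots and 9 multicast addresses fall back to hashing; with 20 unicast addresses, unicast filtering is given up and all 14 free slots stay available to multicast.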
[NET 03/05]: dev_mcast: switch to generic net_device address lists
[NET]: dev_mcast: switch to generic net_device address lists Use generic net_device address lists for multicast list handling. Some defines are used to keep drivers working. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 02536a101d6fd8b1924b1e05c44409c7b4568335 tree 6624b4f7f6fb0b10bac091ca43b733dfd1609afc parent 6d8fd140951de7cc8faab4922dba74dd1db3cae5 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:28 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:28 +0200 include/linux/netdevice.h | 17 +++- net/core/dev_mcast.c | 96 +++-- 2 files changed, 22 insertions(+), 91 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3785a8a..b2db124 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -189,15 +189,12 @@ struct dev_addr_list /* * We tag multicasts with these structures. */ - -struct dev_mc_list -{ - struct dev_mc_list *next; - __u8dmi_addr[MAX_ADDR_LEN]; - unsigned char dmi_addrlen; - int dmi_users; - int dmi_gusers; -}; + +#define dev_mc_listdev_addr_list +#define dmi_addr da_addr +#define dmi_addrlenda_addrlen +#define dmi_users da_users +#define dmi_gusers da_gusers struct hh_cache { @@ -396,7 +393,7 @@ struct net_device unsigned char addr_len; /* hardware address length */ unsigned short dev_id; /* for shared network cards */ - struct dev_mc_list *mc_list; /* Multicast mac addresses */ + struct dev_addr_list*mc_list; /* Multicast mac addresses */ int mc_count; /* Number of installed mcasts */ int promiscuity; int allmulti; diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c index 80bb2e3..7029074 100644 --- a/net/core/dev_mcast.c +++ b/net/core/dev_mcast.c @@ -102,47 +102,20 @@ void dev_mc_upload(struct net_device *dev) int dev_mc_delete(struct net_device *dev, void *addr, int alen, int glbl) { - int err = 0; - struct dev_mc_list *dmi, **dmip; + int err; netif_tx_lock_bh(dev); + err = __dev_addr_delete(&dev->mc_list, addr, alen, glbl); + if 
(!err) { + dev->mc_count--; - for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) { /* -* Find the entry we want to delete. The device could -* have variable length entries so check these too. +* We have altered the list, so the card +* loaded filter is now wrong. Fix it */ - if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && - alen == dmi->dmi_addrlen) { - if (glbl) { - int old_glbl = dmi->dmi_gusers; - dmi->dmi_gusers = 0; - if (old_glbl == 0) - break; - } - if (--dmi->dmi_users) - goto done; - - /* -* Last user. So delete the entry. -*/ - *dmip = dmi->next; - dev->mc_count--; - - kfree(dmi); - - /* -* We have altered the list, so the card -* loaded filter is now wrong. Fix it -*/ - __dev_mc_upload(dev); - - netif_tx_unlock_bh(dev); - return 0; - } + + __dev_mc_upload(dev); } - err = -ENOENT; -done: netif_tx_unlock_bh(dev); return err; } @@ -153,46 +126,15 @@ done: int dev_mc_add(struct net_device *dev, void *addr, int alen, int glbl) { - int err = 0; - struct dev_mc_list *dmi, *dmi1; - - dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC); + int err; netif_tx_lock_bh(dev); - for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) { - if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 && - dmi->dmi_addrlen == alen) { - if (glbl) { - int old_glbl = dmi->dmi_gusers; - dmi->dmi_gusers = 1; - if (old_glbl) - goto done; - } - dmi->dmi_users++; - goto done; - } - } - - if ((dmi = dmi1) == NULL) { - netif_tx_unlock_bh(dev); -
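The compatibility `#define`s mean that driver code still written against `struct dev_mc_list` and the `dmi_*` field names compiles unchanged against the new `struct dev_addr_list`. A minimal userspace sketch, assuming `MAX_ADDR_LEN` is 32 and using a hypothetical helper `count_ipv4_mcast()` that counts entries carrying the IPv4 multicast MAC prefix 01:00:5e:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADDR_LEN 32 /* assumed, matching include/linux/netdevice.h */

typedef unsigned char u8;

/* Generic entry, field names as in the [NET 02/05] patch. */
struct dev_addr_list {
	struct dev_addr_list *next;
	u8 da_addr[MAX_ADDR_LEN];
	u8 da_addrlen;
	int da_users;
	int da_gusers;
};

/* Compatibility aliases, as added by this patch. */
#define dev_mc_list  dev_addr_list
#define dmi_addr     da_addr
#define dmi_addrlen  da_addrlen
#define dmi_users    da_users
#define dmi_gusers   da_gusers

/* Legacy-style code using only the old names still compiles. */
static int count_ipv4_mcast(struct dev_mc_list *list)
{
	struct dev_mc_list *dmi;
	int n = 0;

	for (dmi = list; dmi != NULL; dmi = dmi->next) {
		assert(dmi->dmi_addrlen <= MAX_ADDR_LEN);
		if (dmi->dmi_addrlen == 6 && dmi->dmi_addr[0] == 0x01 &&
		    dmi->dmi_addr[1] == 0x00 && dmi->dmi_addr[2] == 0x5e)
			n++;
	}
	return n;
}
```

Because the aliases are plain preprocessor macros, any other identifier spelled `dmi_addr` (a local variable, say) in a file that includes the header is rewritten too, which is one reason to treat such shims as transitional.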
[NET 01/05]: dev_mcast: unexport dev_mc_upload
[NET]: dev_mcast: unexport dev_mc_upload dev_mc_add/dev_mc_delete take care of uploading the list when necessary and that's the only interface other code should use. Also remove two incorrect calls in DECnet. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit cdf660f0bd4cca9d2cbe86a31adc60d6fa8a60ec tree 2f08c8240b7da9b17725896c3f7eb9c7a960c92c parent 45da27ba265dba3c740c45d47f584c30d7066f82 author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 00:56:00 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 00:56:00 +0200 net/core/dev_mcast.c |1 - net/decnet/dn_dev.c |3 --- 2 files changed, 0 insertions(+), 4 deletions(-) diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c index 5a54053..80bb2e3 100644 --- a/net/core/dev_mcast.c +++ b/net/core/dev_mcast.c @@ -292,4 +292,3 @@ void __init dev_mcast_init(void) EXPORT_SYMBOL(dev_mc_add); EXPORT_SYMBOL(dev_mc_delete); -EXPORT_SYMBOL(dev_mc_upload); diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c index ab41c18..e31549e 100644 --- a/net/decnet/dn_dev.c +++ b/net/decnet/dn_dev.c @@ -461,7 +461,6 @@ static int dn_dev_insert_ifa(struct dn_dev *dn_db, struct dn_ifaddr *ifa) if (ifa->ifa_local != dn_eth2dn(dev->dev_addr)) { dn_dn2eth(mac_addr, ifa->ifa_local); dev_mc_add(dev, mac_addr, ETH_ALEN, 0); - dev_mc_upload(dev); } } @@ -1064,8 +1063,6 @@ static int dn_eth_up(struct net_device *dev) else dev_mc_add(dev, dn_rt_all_rt_mcast, ETH_ALEN, 0); - dev_mc_upload(dev); - dn_db->use_long = 1; return 0;
[NET 02/05]: dev: introduce generic net_device address lists
[NET]: dev: introduce generic net_device address lists Introduce struct dev_addr_list and list maintenance functions based on dev_mc_list and the related functions. This will be used by follow-up patches for both multicast and secondary unicast addresses. Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 6d8fd140951de7cc8faab4922dba74dd1db3cae5 tree b80412116a867d544808f140e76cdf22bbc8b248 parent cdf660f0bd4cca9d2cbe86a31adc60d6fa8a60ec author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:26 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:26 +0200 include/linux/netdevice.h | 11 +++ net/core/dev.c| 69 + 2 files changed, 80 insertions(+), 0 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e7913ee..3785a8a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -177,6 +177,14 @@ struct netif_rx_stats DECLARE_PER_CPU(struct netif_rx_stats, netdev_rx_stat); +struct dev_addr_list +{ + struct dev_addr_list*next; + u8 da_addr[MAX_ADDR_LEN]; + u8 da_addrlen; + int da_users; + int da_gusers; +}; /* * We tag multicasts with these structures. 
@@ -1004,6 +1012,9 @@ extern void dev_mc_upload(struct net_device *dev); extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all); extern int dev_mc_add(struct net_device *dev, void *addr, int alen, int newonly); extern voiddev_mc_discard(struct net_device *dev); +extern int __dev_addr_delete(struct dev_addr_list **list, void *addr, int alen, int all); +extern int __dev_addr_add(struct dev_addr_list **list, void *addr, int alen, int newonly); +extern void__dev_addr_discard(struct dev_addr_list **list); extern voiddev_set_promiscuity(struct net_device *dev, int inc); extern voiddev_set_allmulti(struct net_device *dev, int inc); extern voidnetdev_state_change(struct net_device *dev); diff --git a/net/core/dev.c b/net/core/dev.c index 2609062..1496715 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2551,6 +2551,75 @@ void dev_set_allmulti(struct net_device *dev, int inc) dev_mc_upload(dev); } +int __dev_addr_delete(struct dev_addr_list **list, void *addr, int alen, + int glbl) +{ + struct dev_addr_list *da; + + for (; (da = *list) != NULL; list = &da->next) { + if (memcmp(da->da_addr, addr, da->da_addrlen) == 0 && + alen == da->da_addrlen) { + if (glbl) { + int old_glbl = da->da_gusers; + da->da_gusers = 0; + if (old_glbl == 0) + break; + } + if (--da->da_users) + return 0; + + *list = da->next; + kfree(da); + return 0; + } + } + return -ENOENT; +} + +int __dev_addr_add(struct dev_addr_list **list, void *addr, int alen, int glbl) +{ + struct dev_addr_list *da; + + for (da = *list; da != NULL; da = da->next) { + if (memcmp(da->da_addr, addr, da->da_addrlen) == 0 && + da->da_addrlen == alen) { + if (glbl) { + int old_glbl = da->da_gusers; + da->da_gusers = 1; + if (old_glbl) + return 0; + } + da->da_users++; + return 0; + } + } + + da = kmalloc(sizeof(*da), GFP_ATOMIC); + if (da == NULL) + return -ENOMEM; + memcpy(da->da_addr, addr, alen); + da->da_addrlen = alen; + da->da_users = 1; + da->da_gusers = glbl ? 
1 : 0; + da->next = *list; + *list = da; + return 0; +} + +void __dev_addr_discard(struct dev_addr_list **list) +{ + struct dev_addr_list *tmp; + + while (*list != NULL) { + tmp = *list; + *list = tmp->next; + if (tmp->da_users > tmp->da_gusers) + printk("__dev_addr_discard: address leakage! " + "da_users=%d\n", tmp->da_users); + kfree(tmp); + } +} + unsigned dev_get_flags(const struct net_device *dev) { unsigned flags;
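The reference-counting rules of `__dev_addr_add()` and `__dev_addr_delete()` are easy to misread, so here is a userspace port under stated assumptions: `malloc()`/`free()` replace `kmalloc(GFP_ATOMIC)`/`kfree()`, the functions are renamed (`addr_list_add` and `addr_list_delete` are hypothetical names), the length comparison is moved before `memcmp()` so a shorter `addr` is never over-read, and an `assert()` on `alen` is added. `da_users` counts every reference, while `da_gusers` records whether a single "global" reference is held, so repeated global adds do not stack.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ADDR_LEN 32

struct dev_addr_list {
	struct dev_addr_list *next;
	unsigned char da_addr[MAX_ADDR_LEN];
	unsigned char da_addrlen;
	int da_users;  /* all references, global ones included */
	int da_gusers; /* 1 if a global reference is held, else 0 */
};

static int addr_list_delete(struct dev_addr_list **list, const void *addr,
			    int alen, int glbl)
{
	struct dev_addr_list *da;

	for (; (da = *list) != NULL; list = &da->next) {
		if (da->da_addrlen == alen &&
		    memcmp(da->da_addr, addr, alen) == 0) {
			if (glbl) {
				int old_glbl = da->da_gusers;
				da->da_gusers = 0;
				if (old_glbl == 0)
					break; /* no global ref to drop */
			}
			if (--da->da_users)
				return 0; /* still referenced */
			*list = da->next; /* last user: unlink and free */
			free(da);
			return 0;
		}
	}
	return -ENOENT;
}

static int addr_list_add(struct dev_addr_list **list, const void *addr,
			 int alen, int glbl)
{
	struct dev_addr_list *da;

	assert(alen >= 0 && alen <= MAX_ADDR_LEN); /* added sanity check */
	for (da = *list; da != NULL; da = da->next) {
		if (da->da_addrlen == alen &&
		    memcmp(da->da_addr, addr, alen) == 0) {
			if (glbl) {
				int old_glbl = da->da_gusers;
				da->da_gusers = 1;
				if (old_glbl)
					return 0; /* global ref already held */
			}
			da->da_users++;
			return 0;
		}
	}

	da = malloc(sizeof(*da));
	if (da == NULL)
		return -ENOMEM;
	memcpy(da->da_addr, addr, alen);
	da->da_addrlen = alen;
	da->da_users = 1;
	da->da_gusers = glbl ? 1 : 0;
	da->next = *list; /* new entries go to the head */
	*list = da;
	return 0;
}
```

For example, two plain adds followed by a global add leave `da_users == 3` and `da_gusers == 1`; after one global delete, a second global delete fails with `-ENOENT` because no global reference remains.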
[NET 00/05]: Secondary unicast address support v2
This is an updated version of the secondary unicast address patches. I've introduced a common structure and helpers for both unicast and multicast addresses, so that virtual software devices that want to synchronize addresses to a lower device can reuse the code. Additionally, I fixed a deadlock when putting the device into promiscuous mode, renamed dev->set_address_list to dev->set_rx_mode and cleaned the code up a bit.

One remaining question is how to handle the case where too many unicast addresses are configured and the device is put into promiscuous mode, or unicast filtering is disabled by the driver. In that case we don't get the message that is normally printed by dev_set_promiscuity(), and no audit log entry is written. I'm not sure whether that can already happen when configuring multicast, but I thought it was worth mentioning.

 drivers/net/e1000/e1000_main.c | 47 ++---
 include/linux/netdevice.h      | 40 +---
 net/core/dev.c                 | 213 ---
 net/core/dev_mcast.c           | 128 +++-
 net/decnet/dn_dev.c            | 3 -
 5 files changed, 269 insertions(+), 162 deletions(-)

Patrick McHardy (5):
      [NET]: dev_mcast: unexport dev_mc_upload
      [NET]: dev: introduce generic net_device address lists
      [NET]: dev_mcast: switch to generic net_device address lists
      [NET]: dev: secondary unicast address support
      [E1000]: Secondary unicast address support
Re: [RFC NET 00/02]: Secondary unicast address support
Eric W. Biederman wrote:
> Ben Greear <[EMAIL PROTECTED]> writes:
>> Patrick McHardy wrote:
>>> Eric W. Biederman wrote:
>>>> For the macvlan code do we need to do anything special if we transmit to a mac we would normally receive? Another unicast mac of the same nic for example.
>>> That doesn't happen under normal circumstances. I don't believe it would work.
>> Assuming you mean you want to send between two mac-vlans on the same physical nic... This can work if your mac-vlans are on different subnets and you are routing between them (and if you have my send-to-self patch or have another way to let a system send packets to itself).
> Ok. I didn't know if you could trigger this case without having the endpoints in separate namespaces. I was suspecting the routing code would realize what we were doing, realize the route is local and route through lo.

The routing code will short-circuit by default. It takes quite a bit of effort to make them _not_ short circuit..that is what I was talking about. Mac-vlans will be just like any other ethernet nics as far as routing goes.

>> A normal ethernet switch will NOT turn a packet around on the same interface it was received, so that is why you must have them on different subnets and have a router in between.
> Yes. That is essentially the configuration I was wondering about.
>> For sending directly to yourself, something like the 'veth' driver is probably more useful.
> True. And I think it has a place. However the common case with the tunnel devices is to just hook them all up to an ethernet bridge as well as a real ethernet device. The far ends of the ethernet tunnels are dropped into different namespaces. Which gets a very similar effect to the mac vlan code. I'm just wondering if I can avoid setting up an ethernet tunnel device when my primary purpose is to talk to the outside world, but occasionally want a little in-the-box traffic.

mac-vlans should work on veth devices just fine, and the veths will also short-circuit route (at least if they are in the same namespace). I'm not sure I understand what you are trying to do..but in general both veth and mac-vlans should act like ethernet nics..so if you can find some way that does _not_ hold, please let us know.

Thanks,
Ben

-- 
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc http://www.candelatech.com
Re: [RFC NET 00/02]: Secondary unicast address support
Ben Greear wrote:
> Patrick McHardy wrote:
>
>> Eric W. Biederman wrote:
>>
>>> For the macvlan hash you just use an upper byte. Is that just a
>>> simple starting place, or do we not need a more complex hash?
>>
>> That gave me an idea, since the default addresses are random
>> anyway I'm now using an incrementing counter for the upper byte.
>
> Is there not a (relatively) easy way to hash the entire 6 bytes?
>
> I'd prefer to be able to set the MACs to anything I want, without
> worrying about trivially hitting a worst-case hash scenario.

That would only happen if all your addresses have the same high byte. I can't see a reason why you would want to do this; even with manually configured addresses it's still reasonable to expect a uniform distribution.
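Ben's concern can be illustrated with a small userspace sketch that compares hashing on one byte of the address with a hash over all six bytes. Which byte the macvlan patch actually uses is not shown in this thread, so the byte index is a parameter; the XOR fold, `HASH_BUCKETS` and all function names are assumptions for illustration, not the macvlan code.

```c
#include <assert.h>
#include <string.h>

#define HASH_BUCKETS 256 /* assumed: one bucket per byte value */

/* Hash on a single byte of the MAC address. */
static unsigned int hash_one_byte(const unsigned char addr[6], int idx)
{
	return addr[idx];
}

/* Hash over all six bytes by XOR-folding them into one byte. */
static unsigned int hash_all_bytes(const unsigned char addr[6])
{
	unsigned int h = 0;
	int i;

	for (i = 0; i < 6; i++)
		h ^= addr[i];
	return h % HASH_BUCKETS;
}

/* Count how many distinct buckets n addresses land in. */
static int distinct_buckets(unsigned char (*addrs)[6], int n, int idx,
			    int full)
{
	unsigned char used[HASH_BUCKETS];
	int i, count = 0;

	memset(used, 0, sizeof(used));
	for (i = 0; i < n; i++) {
		unsigned int b = full ? hash_all_bytes(addrs[i])
				      : hash_one_byte(addrs[i], idx);
		if (!used[b]) {
			used[b] = 1;
			count++;
		}
	}
	return count;
}
```

With 256 addresses that differ only in the last byte, a hash on byte 0 puts them all in one bucket, while hashing on byte 5 or XOR-folding all six bytes spreads them over all 256 buckets. XOR folding is only a cheap example: addresses whose bytes are permutations of each other still collide, so a real implementation might prefer a stronger mixing function.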
Re: [patch] alpha: fix alignment problem in csum_ipv6_magic()
On Thu, Jun 21, 2007 at 04:35:01PM -0700, Andrew Morton wrote: > In http://bugzilla.kernel.org/show_bug.cgi?id=8659, Dustin is reporting > that this patch broke tcp-on-ipv6. Oops. Two instructions operating on the 'len' arg ($18) got swapped... This should fix the ev6 version; the ev5 one seems to be OK. Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]> Ivan. --- 2.6.22-rc4-mm2/arch/alpha/lib/ev6-csum_ipv6_magic.S Fri Jun 22 15:02:23 2007 +++ linux/arch/alpha/lib/ev6-csum_ipv6_magic.S Fri Jun 22 15:05:38 2007 @@ -76,18 +76,18 @@ csum_ipv6_magic: cmoveq $6,$31,$22 # E : src aligned? ldq_u $23,15($17) # L : Latency: 3 - or $18,$4,$18 # E : 00CCDDAABBCC - extql $1,$6,$1# U : U L L U : + inswl $18,3,$18 # U : 00CCDD00 + addl$19,$7,$19 # E : U L U L : bbaabb00 or $0,$22,$0 # E : 1st src word complete - extqh $5,$6,$5# U : - addl$19,$7,$19 # E : bbaabb00 - and $17,7,$6# E : L U L U : dst misalignment + extql $1,$6,$1# U : + or $18,$4,$18 # E : 00CCDDAABBCC + extqh $5,$6,$5# U : L U L U - inswl $18,3,$18 # U : 00CCDD00 - or $1,$5,$1# E : 2nd src word complete + and $17,7,$6# E : dst misalignment extql $2,$6,$2# U : - extqh $3,$6,$22 # U : U L U U : + or $1,$5,$1# E : 2nd src word complete + extqh $3,$6,$22 # U : L U L U : cmoveq $6,$31,$22 # E : dst aligned? extql $3,$6,$3# U :
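For readers who do not read Alpha assembly, a portable C reference of what `csum_ipv6_magic()` is expected to compute may help: the ones' complement sum over the IPv6 pseudo-header (source address, destination address, 32-bit upper-layer length, three zero bytes and the next-header value, per RFC 2460 section 8.1) plus an already-accumulated sum, folded into 16 bits and complemented. This is a sketch, not the kernel's prototype: the function name is hypothetical, the result is returned as a host-order integer of the big-endian checksum, and `partial` stands for the payload sum a caller would pass in.

```c
#include <assert.h>
#include <stdint.h>

/* Sum 'len' bytes (len even) as big-endian 16-bit words, unfolded. */
static uint64_t sum_be16(const uint8_t *p, int len)
{
	uint64_t acc = 0;
	int i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)p[i] << 8) | p[i + 1];
	return acc;
}

static uint16_t ipv6_pseudo_csum(const uint8_t saddr[16],
				 const uint8_t daddr[16],
				 uint32_t len, uint8_t proto, uint32_t partial)
{
	uint64_t acc = partial;

	acc += sum_be16(saddr, 16);
	acc += sum_be16(daddr, 16);
	acc += (len >> 16) + (len & 0xffff); /* 32-bit upper-layer length */
	acc += proto;                        /* 3 zero bytes, then next header */
	while (acc >> 16)                    /* fold carries back in */
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

A reference like this gives a cheap cross-check for hand-scheduled assembly: feed both the same random inputs and compare results, which would catch a swapped pair of instructions on `len` such as the one fixed here.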
Re: [PATCH 1/3] NetXen: Fix MSI issues by using PCI function 0
On Friday 22 June 2007 10:40:41 Mithlesh Thukral wrote: > NetXen: Fix issue of MSI not working correctly > NetXen driver uses PCI function 0 to provide the functionality of MSI. > The patch makes driver check the bus master bit for function 0 and > enable it after the card initialization. > > Signed-off-by: Milan Bag <[EMAIL PROTECTED]> > Signed-off-by: Wen Xiong <[EMAIL PROTECTED]> > Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]> > > --- > > drivers/net/netxen/netxen_nic_main.c | 13 ++--- > 1 files changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/net/netxen/netxen_nic_main.c > b/drivers/net/netxen/netxen_nic_main.c > index 6167b58..e68356b 100644 > --- a/drivers/net/netxen/netxen_nic_main.c > +++ b/drivers/net/netxen/netxen_nic_main.c > @@ -355,13 +355,6 @@ #endif > /* initialize the adapter */ > netxen_initialize_adapter_hw(adapter); > > -#ifdef CONFIG_PPC > - if ((adapter->ahw.boardcfg.board_type == > - NETXEN_BRDTYPE_P2_SB31_10G_IMEZ) && > - (pci_func_id == 2)) > - goto err_out_free_adapter; > -#endif /* CONFIG_PPC */ > - > /* >* Adapter in our case is quad port so initialize it before >* initializing the ports > @@ -509,6 +502,12 @@ #endif > NETXEN_CAM_RAM(0x1fc))); > if (val == 0x) { > /* This is the first boot after power up */ > + netxen_nic_read_w0(adapter, NETXEN_PCIE_REG(0x4), &val); > + if (!(val & 0x4)) { > + val |= 0x4; > + netxen_nic_write_w0(adapter, NETXEN_PCIE_REG(0x4), val); > + mdelay(100); > + } msleep()? Or wait, what is this delay trying to do? Commit the register access? The better way to commit a register write is to read back the value, usually. -- Greetings Michael.
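Michael's suggestion (commit a posted register write by reading it back instead of sleeping) can be sketched in userspace. `CMD_BUS_MASTER` is bit 2 (0x4), which appears to be the bus-master bit of the PCI command register at offset 0x4 that the patch tests; `reg_read()`/`reg_write()` are hypothetical stand-ins for `readl()`/`writel()` and operate on a plain variable so the logic can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

#define CMD_BUS_MASTER 0x4u /* bus-master bit in the PCI command register */

/* Hypothetical accessors; a driver would use readl()/writel() on MMIO. */
static uint32_t reg_read(volatile uint32_t *reg) { return *reg; }
static void reg_write(volatile uint32_t *reg, uint32_t val) { *reg = val; }

/* Set bus mastering if it is clear. Returns 1 if the bit was changed. */
static int enable_bus_master(volatile uint32_t *cmd)
{
	uint32_t val = reg_read(cmd);

	if (val & CMD_BUS_MASTER)
		return 0;
	reg_write(cmd, val | CMD_BUS_MASTER);
	/* Reading back forces a posted write out to the device, so no
	 * fixed mdelay() is needed just to "commit" it. */
	(void)reg_read(cmd);
	return 1;
}
```

If the hardware really needs settling time after enabling bus mastering, that is a separate requirement, and a sleeping delay such as `msleep()` would be the usual choice over `mdelay()`.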
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Urs Thuermann wrote: > This patch adds the virtual CAN bus (vcan) network driver. > The vcan device is just a loopback device for CAN frames, no > real CAN hardware is involved. Is there a reason why you're still doing the "allocate n devices on init" thing instead of using the rtnl_link API?
Re: [PATCH 3/3] NetXen: Fix the rmmod error on PBlades due incorrect cleanup
On Friday 22 June 2007 10:42:38 Mithlesh Thukral wrote: > diff --git a/drivers/net/netxen/netxen_nic_hw.c > b/drivers/net/netxen/netxen_nic_hw.c > index 4e958c9..f0df6fb 100644 > --- a/drivers/net/netxen/netxen_nic_hw.c > +++ b/drivers/net/netxen/netxen_nic_hw.c > @@ -378,6 +378,7 @@ int netxen_nic_hw_resources(struct netxe > crb_rcvpeg_state)); > while (state != PHAN_PEG_RCV_INITIALIZED && loops < 20) { > udelay(100); > + schedule(); > /* Window 1 call */ > state = readl(NETXEN_CRB_NORMALIZE(adapter, > recv_crb_registers Better do msleep(1); instead of udelay+schedule. > @@ -700,7 +701,7 @@ void netxen_nic_pci_change_crbwindow(str > adapter->curr_window = 0; > } > > -void netxen_load_firmware(struct netxen_adapter *adapter) > +int netxen_load_firmware(struct netxen_adapter *adapter) > { > int i; > u32 data, size = 0; > @@ -712,15 +713,25 @@ void netxen_load_firmware(struct netxen_ > writel(1, NETXEN_CRB_NORMALIZE(adapter, NETXEN_ROMUSB_GLB_CAS_RST)); > > for (i = 0; i < size; i++) { > - if (netxen_rom_fast_read(adapter, flashaddr, (int *)&data) != > 0) { > - DPRINTK(ERR, > - "Error in netxen_rom_fast_read(). Will skip" > - "loading flash image\n"); > - return; > + while (netxen_rom_fast_read(adapter, flashaddr, (int *)&data) > != 0) { > + long timeout = 2 * HZ; > + while (timeout) { > + if (signal_pending(current)) { > + printk( "%s: Got a signal, exiting\n", > __FUNCTION__ ); > + return -1; > + } > + set_current_state(TASK_INTERRUPTIBLE); > + timeout = schedule_timeout(timeout); > + } You're opencoding msleep_interruptible() here? And this sleeps two seconds between each rom-read attempt. Is that really your intention? I'd say better attempt to read more often and sleep less each time. > off = netxen_nic_pci_set_window(adapter, memaddr); > addr = pci_base_offset(adapter, off); > writel(data, addr); > + while (readl(addr) != data) { > + mdelay(100); > + writel(data, addr); > + } Add a timeout. Else this will result in a system hang, if the hardware is faulty. 
> diff --git a/drivers/net/netxen/netxen_nic_init.c > b/drivers/net/netxen/netxen_nic_init.c > index 15f6dc5..8f5f4f8 100644 > --- a/drivers/net/netxen/netxen_nic_init.c > +++ b/drivers/net/netxen/netxen_nic_init.c > @@ -408,8 +408,12 @@ static inline int > do_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp) > { > if (jiffies > (last_schedule_time + (8 * HZ))) { > - last_schedule_time = jiffies; > - schedule(); > + if (last_schedule_time) { > + last_schedule_time = jiffies; > + schedule(); > + } else { > + last_schedule_time = jiffies; > + } Why this strange thing? I'd simply call cond_resched() instead of all this custom schedule timekeeping. That's best for system latency. > -void netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val) > +int netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val) > { > u32 val = 0; > - int loops = 0; > > if (!pegtune_val) { > - val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_CMDPEG_STATE)); > - while (val != PHAN_INITIALIZE_COMPLETE && > - val != PHAN_INITIALIZE_ACK && loops < 20) { > - udelay(100); > - schedule(); > - val = > - readl(NETXEN_CRB_NORMALIZE > + do { > + long timeout = 10 * HZ; > + while (timeout) { > + if (signal_pending(current)) { > + printk(KERN_INFO"%s: Got a signal, > exiting\n", __FUNCTION__ ); > + printk(KERN_INFO"%s: val=0x%x, > pegtune_val=0x%x\n", __FUNCTION__, > + val, pegtune_val ); > + return -1; > + } > + set_current_state(TASK_INTERRUPTIBLE); > + timeout = schedule_timeout(timeout); > + } > + val = readl(NETXEN_CRB_NORMALIZE > (adapter, CRB_CMDPEG_STATE)); msleep_interruptible()? > @@ -1278,11 +1
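Michael's point about the unbounded `while (readl(addr) != data)` loop can be sketched as a write-and-verify helper with a retry limit. Everything here is hypothetical: `struct fake_reg` simulates a device that drops its first N writes, and `nanosleep()` stands in for the `msleep()` a driver would use, since it sleeps rather than busy-waiting the way `mdelay()` does.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Simulated register so the loop can be tested without hardware. */
struct fake_reg {
	uint32_t value;
	int writes_to_ignore; /* device drops this many writes first */
};

static void fake_write(struct fake_reg *r, uint32_t v)
{
	if (r->writes_to_ignore > 0)
		r->writes_to_ignore--;
	else
		r->value = v;
}

static uint32_t fake_read(const struct fake_reg *r) { return r->value; }

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Write and verify, giving up after max_tries instead of spinning
 * forever on faulty hardware. Returns 0 on success, -1 on timeout. */
static int write_verify(struct fake_reg *r, uint32_t data, int max_tries,
			long delay_ms)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		fake_write(r, data);
		if (fake_read(r) == data)
			return 0;
		sleep_ms(delay_ms);
	}
	return -1;
}
```

The caller then decides what a timeout means (fail the probe, log an error), rather than hanging the system as the unbounded loop would.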
Re: Fwd: [PATCH] [-mm] ACPI: export ACPI events via netlink
On Thu, 2007-06-21 at 11:47 -0400, jamal wrote:
> On Wed, 2007-20-06 at 13:25 +0200, Johannes Berg wrote:
>
> > Ok. That's definitely a bug in nl80211 as we have it in development
> > right now.
>
> Sorry, have never looked at that code.

No worries, I was just stating that.

> You can use setsockopt to set the multicast groups. What you can't do
> with that is subscribe to many groups in one shot.
> The call in iproute2 hasn't reflected this reality yet.

Ah, ok, I see now. I was under the impression that groups was always just a u32.

> > I'd really like to be able to reserve multicast groups with special
> > semantics too, especially I might want to permit/deny non-CAP_NET_ADMIN
> > users from binding specific multicast groups. That isn't actually
> > possible with netlink nor genetlink right now afaict.
>
> This would be hard - but doable via SELinux interface. I think you
> should be able to extend your tool to make calls to that interface.

Why do you think that would be hard? It'd basically just mean replacing the netlink_capable(sock, NL_NONROOT_RECV) calls with a call that actually tests depending on the group(s) it wants.

> > If we register multiple IDs then we'll end up filling up the generic
> > netlink family space really soon.
>
> There's a huge number of these groups; and not just that, but considering
> that some genetlink users may not be interested in such multicast
> groups, it is quite usable to have many groups as long as we avoid
> conflict.

Yeah, never mind, I thought that the number of groups was limited to 32.

> The multicast issue wasn't well-attacked. We have a group magically
> assigned to a user based on their allocated id. It should be feasible
> to add an API to the kernel for registering for many groups and allow
> user space to discover these groups before registering. Maybe that's
> the path to proceed to.

Yeah, sounds reasonable: you could ask the controller which groups are attached to a family and then get the IDs for those groups by name.

johannes
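The two subscription styles discussed above can be contrasted in userspace C. The `nl_groups` bitmask in `struct sockaddr_nl` can only name groups 1 to 32, whereas `setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP)` takes a group number, one group per call, which matches jamal's remark that you cannot subscribe to many groups in one shot. The helper names are hypothetical; `SOL_NETLINK` gets a fallback definition (270, its value in the Linux headers) in case the C library does not expose it.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* value from the Linux kernel headers */
#endif

/* Old style: bit (group - 1) in sockaddr_nl.nl_groups at bind() time.
 * Only groups 1..32 fit; returns 0 for groups that cannot be expressed. */
static unsigned int legacy_group_mask(unsigned int group)
{
	if (group == 0 || group > 32)
		return 0;
	return 1u << (group - 1);
}

/* Group-number style: no 32-group limit, but one group per call.
 * Returns 0 on success, -1 with errno set on failure. */
static int join_group(int fd, unsigned int group)
{
	return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
			  &group, sizeof(group));
}
```

A caller would open a netlink socket, for example `socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC)`, and call `join_group()` once for each group ID it has discovered.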
Re: [WIP][PATCHES] Network xmit batching
On Thu, Jun 21, 2007 at 02:00:07PM -0700, Rick Jones ([EMAIL PROTECTED]) wrote:
> > Simple test included test -> desktop and vice versa traffic with 128 and
> > 4096 block size in netperf-2.4.3 setup.
>
> Is that in conjunction with setting the test-specific -D to set
> TCP_NODELAY, or was Nagle left-on? If the latter, perhaps timing issues
> could be why the confidence intervals weren't hit since the relative
> batching of 128byte sends into larger segments is something of a race.

I used these parameters:

netperf -l 60 -H kano -t TCP_STREAM -i 10,2 -I 99,5 -- -m 128 -s 128K -S 128K

so without nodelay. With nodelay I've gotten:

batch-128: 128.91 mbit/sec
mainline-128: 140.57 mbit/sec

which is about 5 times less than without nodelay (~760 mbit/s), although the nodelay results look more realistic.

> rick jones

-- 
Evgeniy Polyakov
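Rick's `-D` question refers to netperf's test-specific option for setting TCP_NODELAY. At the socket API level, disabling Nagle is a single `setsockopt()` call; the sketch below (the function name is hypothetical) sets it on a fresh TCP socket and reads it back. It makes no claim about exactly how netperf applies the option internally.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, disable Nagle, and read the option back.
 * Returns the value read back (non-zero means TCP_NODELAY is on),
 * or -1 on error. */
static int tcp_nodelay_roundtrip(void)
{
	int one = 1, val = 0;
	socklen_t len = sizeof(val);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0 ||
	    getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return val;
}
```

With Nagle disabled, each 128-byte send tends to leave as its own segment instead of being coalesced, which is consistent with the much lower nodelay throughput reported above.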
Re: [PATCH] ipv4: Only destroy inet devices when we receive an NETDEV_UNREGISTER event
Never mind. I saw this and I thought it was an old obscure bug. But it appears it is a new condition that has already been fixed. Eric
Re: 2.6.20->2.6.21 - networking dies after random time
2007/6/19, Jarek Poplawski <[EMAIL PROTECTED]>:
> On Mon, Jun 18, 2007 at 08:10:00AM -0700, Stephen Hemminger wrote:
> > On Mon, 18 Jun 2007 13:08:49 +0200
> > Jarek Poplawski <[EMAIL PROTECTED]> wrote:
> >
> > > On 16-06-2007 23:35, Marcin Ślusarz wrote:
> > > > hi
> > > > after upgrading kernel from 2.6.20 to 2.6.21.3 i'm experiencing really
> > > > strange problem - my _both_ network cards dies after random uptime -
> > > > sometimes it's a few minutes, sometimes hours, sometimes it does not
> > > > happen for a couple of days...
> > > > today it happened for the first time without nvidia module and almost
> > > > immediately after system start
> > > >
> > > > here is the output of some commands which might help debug this: ...
> > > It looks like skge driver enables different device than probbed.
> > > Maybe you've something old/wrong about eth0/eth1 in /etc configs?
> >
> > More likely it is just user level device renaming. Most distro's
> > rename devices (if needed) using udev.
>
> On the other hand it's interesting, why it's not always, and why
> sometimes it took so long?

Sorry for the delay, but I was offline for the last week and probably will be for some time :|

When I disable the on-board network card (handled by skge) in the BIOS, the ne2k-pci card still locks up, so I think it's strictly an ne2k-pci bug. I did some tests and I know how to reproduce it quickly (on my machine): just generate some heavy network traffic...

As I'm offline right now I can't bisect it, but I turned on more debugging; maybe you can deduce something...
[0.00] Linux version 2.6.21.3 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Gentoo 4.1.2)) #4 PREEMPT Wed Jun 20 22:37:05 CEST 2007
[0.00] Command line: root=/dev/sda5 video=vesafb vga=794
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: - 0009fc00 (usable)
[0.00] BIOS-e820: 0009fc00 - 000a (reserved)
[0.00] BIOS-e820: 000e4000 - 0010 (reserved)
[0.00] BIOS-e820: 0010 - 3ffb (usable)
[0.00] BIOS-e820: 3ffb - 3ffc (ACPI data)
[0.00] BIOS-e820: 3ffc - 3fff (ACPI NVS)
[0.00] BIOS-e820: 3fff - 4000 (reserved)
[0.00] BIOS-e820: ff78 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 262064) 1 entries of 256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FA810, 0021 (r2 ACPIAM)
[0.00] ACPI: XSDT 3FFB0100, 003C (r1 A M I OEMXSDT 1427 MSFT 97)
[0.00] ACPI: FACP 3FFB0290, 00F4 (r3 A M I OEMFACP 1427 MSFT 97)
[0.00] ACPI: DSDT 3FFB03E0, 38A1 (r1 A0036 A00360011 MSFT 10D)
[0.00] ACPI: FACS 3FFC, 0040
[0.00] ACPI: APIC 3FFB0390, 004A (r1 A M I OEMAPIC 1427 MSFT 97)
[0.00] ACPI: OEMB 3FFC0040, 003F (r1 A M I OEMBIOS 1427 MSFT 97)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 262064) 1 entries of 256 used
[0.00] Zone PFN ranges:
[0.00] DMA 0 -> 4096
[0.00] DMA32 4096 -> 1048576
[0.00] Normal 1048576 -> 1048576
[0.00] early_node_map[2] active PFN ranges
[0.00] 0: 0 -> 159
[0.00] 0: 256 -> 262064
[0.00] On node 0 totalpages: 261967
[0.00] DMA zone: 56 pages used for memmap
[0.00] DMA zone: 2549 pages reserved
[0.00] DMA zone: 1394 pages, LIFO batch:0
[0.00] DMA32 zone: 3526 pages used for memmap
[0.00] DMA32 zone: 254442 pages, LIFO batch:31
[0.00] Normal zone: 0 pages used for memmap
[0.00] Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] ACPI: IRQ9 used by override.
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Nosave address range: 0009f000 - 000a
[0.00] Nosave address range: 000a - 000e4000
[0.00] Nosave address range: 000e4000 - 0010
[0.00] Allocating PCI resources starting at 5000 (gap: 4000:bf78)
[0.00] Built 1 zonelist
[PATCH 3/3] NetXen: Fix the rmmod error on PBlades due to incorrect cleanup
NetXen: Graceful unloading of the NetXen driver.

To allow graceful handling of NetXen module load/unload sequences, the modified code lets the driver's close routine be invoked via the unregister_netdev() call in the driver's remove routine, which frees the command buffer list and flushes the queues. Next, the dummy DMA buffer that the hardware uses is released after its functionality has been disabled. Finally, the other software resources are released and the hardware is left in a reset state for a future load/unload.

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>
---
 drivers/net/netxen/netxen_nic.h      |   80 +
 drivers/net/netxen/netxen_nic_hdr.h  |    2 
 drivers/net/netxen/netxen_nic_hw.c   |   25 -
 drivers/net/netxen/netxen_nic_init.c |   52 ++-
 drivers/net/netxen/netxen_nic_main.c |  112 +
 5 files changed, 206 insertions(+), 65 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 62aeab9..0e3be92 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -952,6 +952,26 @@ struct netxen_adapter {
 	int (*stop_port) (struct netxen_adapter *);
 };				/* netxen_adapter structure */
 
+/*
+ * NetXen dma watchdog control structure
+ *
+ *	Bit 0		: enabled => R/O: 1 watchdog active, 0 inactive
+ *	Bit 1		: disable_request => 1 req disable dma watchdog
+ *	Bit 2		: enable_request =>  1 req enable dma watchdog
+ *	Bit 3-31	: unused
+ */
+
+typedef u32 dma_watchdog_ctrl_t;
+
+#define netxen_set_dma_watchdog_disable_req(config_word) \
+	_netxen_set_bits(config_word, 1, 1, 1)
+#define netxen_set_dma_watchdog_enable_req(config_word) \
+	_netxen_set_bits(config_word, 2, 1, 1)
+#define netxen_get_dma_watchdog_enabled(config_word) \
+	((config_word) & 0x1)
+#define netxen_get_dma_watchdog_disabled(config_word) \
+	(((config_word) >> 1) & 0x1)
+
 /* Max number of xmit producer threads that can run simultaneously */
 #define	MAX_XMIT_PRODUCERS		16
 
@@ -1031,8 +1051,8 @@ int netxen_nic_erase_pxe(struct netxen_a
 /* Functions from netxen_nic_init.c */
 void netxen_free_adapter_offload(struct netxen_adapter *adapter);
 int netxen_initialize_adapter_offload(struct netxen_adapter *adapter);
-void netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val);
-void netxen_load_firmware(struct netxen_adapter *adapter);
+int netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val);
+int netxen_load_firmware(struct netxen_adapter *adapter);
 int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose);
 int netxen_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp);
 int netxen_rom_fast_read_words(struct netxen_adapter *adapter, int addr,
@@ -1230,6 +1250,62 @@ static inline void get_brd_name_by_type(
 		name = "Unknown";
 }
 
+static inline int
+dma_watchdog_shutdown_request(struct netxen_adapter *adapter)
+{
+	dma_watchdog_ctrl_t ctrl;
+
+	/* check if already inactive */
+	if (netxen_nic_hw_read_wx(adapter,
+	    NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+		printk(KERN_ERR "failed to read dma watchdog status\n");
+
+	if (netxen_get_dma_watchdog_enabled(ctrl) == 0)
+		return 1;
+
+	/* Send the disable request */
+	netxen_set_dma_watchdog_disable_req(ctrl);
+	netxen_crb_writelit_adapter(adapter,
+		NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), ctrl);
+
+	return 0;
+}
+
+static inline int
+dma_watchdog_shutdown_poll_result(struct netxen_adapter *adapter)
+{
+	dma_watchdog_ctrl_t ctrl;
+
+	if (netxen_nic_hw_read_wx(adapter,
+	    NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+		printk(KERN_ERR "failed to read dma watchdog status\n");
+
+	return ((netxen_get_dma_watchdog_enabled(ctrl) == 0) &&
+		(netxen_get_dma_watchdog_disabled(ctrl) == 0));
+}
+
+static inline int
+dma_watchdog_wakeup(struct netxen_adapter *adapter)
+{
+	dma_watchdog_ctrl_t ctrl;
+
+	if (netxen_nic_hw_read_wx(adapter,
+	    NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+		printk(KERN_ERR "failed to read dma watchdog status\n");
+
+	if (netxen_get_dma_watchdog_enabled(ctrl))
+		return 1;
+
+	/* send the wakeup request */
+	netxen_set_dma_watchdog_enable_req(ctrl);
+
+	netxen_crb_writelit_adapter(adapter,
+		NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), ctrl);
+
+	return 0;
+}
+
+
 int netxen_is_flash_supported(struct netxen_adapter *adapter);
 int netxen_get_flash_mac_addr(struct netxen_adapter *adapter, u64 mac[]);
 extern void netxen_change_ringparam(struct netxen_adapter *adapter);
diff --git a/d
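The control-word layout documented in the patch (bit 0 enabled, read-only; bit 1 disable request; bit 2 enable request) and the request/poll handshake of dma_watchdog_shutdown_request() and dma_watchdog_shutdown_poll_result() can be modelled in plain C. This sketch uses explicit bit masks in place of the driver's _netxen_set_bits() helper, leaves out the register reads and writes, and simulates the firmware's reaction with a hypothetical function; none of these names exist in the driver. Note that, as written in the patch, a failed netxen_nic_hw_read_wx() is only logged, after which ctrl is used uninitialised.

```c
#include <stdint.h>

/* Bit layout from the patch comment; the mask names are hypothetical. */
#define WDOG_ENABLED     (1u << 0) /* R/O: 1 = watchdog active */
#define WDOG_DISABLE_REQ (1u << 1) /* 1 = request disable */
#define WDOG_ENABLE_REQ  (1u << 2) /* 1 = request enable */

/*
 * Mirrors dma_watchdog_shutdown_request(): returns 1 if the watchdog is
 * already inactive, otherwise sets the disable-request bit and returns 0.
 */
static int wdog_shutdown_request(uint32_t *ctrl)
{
	if (!(*ctrl & WDOG_ENABLED))
		return 1;
	*ctrl |= WDOG_DISABLE_REQ;
	return 0;
}

/*
 * Mirrors dma_watchdog_shutdown_poll_result(): shutdown is complete once
 * both the enabled bit and the pending disable request are clear.
 */
static int wdog_shutdown_done(uint32_t ctrl)
{
	return !(ctrl & WDOG_ENABLED) && !(ctrl & WDOG_DISABLE_REQ);
}

/* Hypothetical stand-in for the firmware honouring a disable request. */
static void fake_firmware_ack(uint32_t *ctrl)
{
	if (*ctrl & WDOG_DISABLE_REQ)
		*ctrl &= ~(WDOG_ENABLED | WDOG_DISABLE_REQ);
}
```

A caller would issue the request once and then poll the "done" predicate, presumably with a delay between reads; the calling code in netxen_nic_main.c is cut off in the excerpt above.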
[PATCH 2/3] NetXen: Make use of per port interrupt mask scheme
NetXen: Make use of per port interrupt scheme.

This patch makes the driver inform the firmware that it supports the per-port interrupt mask scheme. The driver also checks whether the firmware supports the per-port interrupt scheme. If it does, interrupts are enabled/disabled for each port individually instead of for the entire card, as was done until now.

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>
---
 drivers/net/netxen/netxen_nic.h          |  104 +
 drivers/net/netxen/netxen_nic_hw.c       |    5 -
 drivers/net/netxen/netxen_nic_init.c     |    2 
 drivers/net/netxen/netxen_nic_main.c     |   28 +++--
 drivers/net/netxen/netxen_nic_phan_reg.h |   14 ++
 5 files changed, 121 insertions(+), 32 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 91f25e0..62aeab9 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -937,6 +937,7 @@ struct netxen_adapter {
 	struct netxen_ring_ctx *ctx_desc;
 	struct pci_dev *ctx_desc_pdev;
 	dma_addr_t ctx_desc_phys_addr;
+	int intr_scheme;
 	int (*enable_phy_interrupts) (struct netxen_adapter *);
 	int (*disable_phy_interrupts) (struct netxen_adapter *);
 	void (*handle_phy_intr) (struct netxen_adapter *);
@@ -1080,37 +1081,102 @@ struct net_device_stats *netxen_nic_get_
 
 static inline void netxen_nic_disable_int(struct netxen_adapter *adapter)
 {
-	/*
-	 * ISR_INT_MASK: Can be read from window 0 or 1.
-	 */
-	writel(0x7ff, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+	uint32_t	mask = 0x7ff;
+	int count = 0;
+
+	DPRINTK(1,INFO,"Entered ISR Disable \n");
+
+	switch(adapter->portnum) {
+	case 0:
+		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+		break;
+	case 1:
+		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+		break;
+	case 2:
+		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+		break;
+	case 3:
+		writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_3));
+		break;
+	}
+
+	if (adapter->intr_scheme != -1 &&
+	    adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+		writel(mask,
+			(void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+	}
+	/* Window = 0 or 1 */
+	if (!(adapter->flags & NETXEN_NIC_MSI_ENABLED)) {
+		do {
+			writel(0x, (void *)
+			(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_TARGET_STATUS)));
+			mask = readl((void *)
+				(pci_base_offset(adapter, ISR_INT_VECTOR)));
+		} while (((mask & 0x80) != 0) && (++count < 32));
+
+		if ((mask & 0x80) != 0) {
+			printk(KERN_NOTICE "Could not disable interrupt completely\n");
+		}
+	}
+
+	DPRINTK(1,INFO,"Done with Disable Int\n");
+
+	return;
 }
 
 static inline void netxen_nic_enable_int(struct netxen_adapter *adapter)
 {
 	u32 mask;
 
-	switch (adapter->ahw.board_type) {
-	case NETXEN_NIC_GBE:
-		mask = 0x77b;
-		break;
-	case NETXEN_NIC_XGBE:
-		mask = 0x77f;
-		break;
-	default:
-		mask = 0x7ff;
-		break;
-	}
+	DPRINTK(1, INFO, "Entered ISR Enable \n");
+
+	if (adapter->intr_scheme != -1 &&
+	    adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+		switch (adapter->ahw.board_type) {
+		case NETXEN_NIC_GBE:
+			mask = 0x77b;
+			break;
+		case NETXEN_NIC_XGBE:
+			mask = 0x77f;
+			break;
+		default:
+			mask = 0x7ff;
+			break;
+		}
 
-	writel(mask, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+		writel(mask,
+			(void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+	}
+	switch (adapter->portnum) {
+	case 0:
+		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+		break;
+	case 1:
+		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+		break;
+	case 2:
+		writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+
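The decisions this patch adds to netxen_nic_disable_int() and netxen_nic_enable_int() can be isolated from the register accesses: whether the card-wide ISR_INT_MASK is touched at all, which card-wide mask value is used per board type, and the bounded loop that re-reads the legacy interrupt vector until bit 0x80 clears. A minimal sketch under stated assumptions: the enum, the perport_scheme parameter (standing in for INTR_SCHEME_PERPORT, whose value the excerpt does not show), the read_vector callback and the fake_vector test double are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins for NETXEN_NIC_GBE / NETXEN_NIC_XGBE / others. */
enum board_kind { BOARD_GBE, BOARD_XGBE, BOARD_OTHER };

/*
 * Mirrors the patch's test: the card-wide ISR_INT_MASK is written only
 * when the interrupt scheme is known (not -1) and is not the per-port one.
 */
static int uses_card_wide_mask(int intr_scheme, int perport_scheme)
{
	return intr_scheme != -1 && intr_scheme != perport_scheme;
}

/* Card-wide mask value chosen by netxen_nic_enable_int(). */
static uint32_t card_wide_enable_mask(enum board_kind kind)
{
	switch (kind) {
	case BOARD_GBE:
		return 0x77b;
	case BOARD_XGBE:
		return 0x77f;
	default:
		return 0x7ff;
	}
}

/*
 * Mirrors the non-MSI drain loop in netxen_nic_disable_int(): re-read the
 * vector until bit 0x80 clears, giving up after 32 reads. read_vector
 * stands in for the ISR_INT_TARGET_STATUS write plus ISR_INT_VECTOR read.
 * Returns 0 on success, -1 where the driver prints
 * "Could not disable interrupt completely".
 */
static int drain_legacy_interrupt(uint32_t (*read_vector)(void *), void *ctx)
{
	uint32_t vec;
	int count = 0;

	do {
		vec = read_vector(ctx);
	} while ((vec & 0x80) && ++count < 32);

	return (vec & 0x80) ? -1 : 0;
}

/* Test double: reports the interrupt as still pending *ctx more times. */
static uint32_t fake_vector(void *ctx)
{
	int *pending = ctx;

	if (*pending > 0) {
		(*pending)--;
		return 0x80;
	}
	return 0;
}
```

With the interrupt pending for 5 reads the loop succeeds on the sixth; if it never clears, it stops after exactly 32 reads and reports failure, matching the patch's count limit.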
[PATCH 1/3] NetXen: Fix MSI issues by using PCI function 0
NetXen: Fix issue of MSI not working correctly

The NetXen driver uses PCI function 0 to provide MSI functionality. This patch makes the driver check the bus master bit for function 0 and enable it after card initialization.

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>
---
 drivers/net/netxen/netxen_nic_main.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 6167b58..e68356b 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -355,13 +355,6 @@ #endif
 	/* initialize the adapter */
 	netxen_initialize_adapter_hw(adapter);
 
-#ifdef CONFIG_PPC
-	if ((adapter->ahw.boardcfg.board_type ==
-	     NETXEN_BRDTYPE_P2_SB31_10G_IMEZ) &&
-	    (pci_func_id == 2))
-		goto err_out_free_adapter;
-#endif				/* CONFIG_PPC */
-
 	/*
 	 * Adapter in our case is quad port so initialize it before
 	 * initializing the ports
@@ -509,6 +502,12 @@ #endif
 				NETXEN_CAM_RAM(0x1fc)));
 		if (val == 0x) {
 			/* This is the first boot after power up */
+			netxen_nic_read_w0(adapter, NETXEN_PCIE_REG(0x4), &val);
+			if (!(val & 0x4)) {
+				val |= 0x4;
+				netxen_nic_write_w0(adapter, NETXEN_PCIE_REG(0x4), val);
+				mdelay(100);
+			}
 			val = readl(NETXEN_CRB_NORMALIZE(adapter,
 					NETXEN_ROMUSB_GLB_SW_RESET));
 			printk(KERN_INFO"NetXen: read 0x%08x for reset reg.\n",val);
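The new hunk reads offset 0x4 through NETXEN_PCIE_REG() and sets bit 0x4 if it is clear. In standard PCI configuration space, offset 0x04 is the command register and bit 2 (0x4) is Bus Master Enable, named PCI_COMMAND and PCI_COMMAND_MASTER in the kernel's linux/pci_regs.h. The read-modify-write decision reduces to the sketch below; the helper name is hypothetical.

```c
#include <stdint.h>

/* Bus Master Enable: bit 2 of the PCI command register (config offset 0x04). */
#define CMD_BUS_MASTER 0x4u

/*
 * Hypothetical helper mirroring the hunk: returns the command word with
 * bus mastering enabled and sets *needs_write when the bit was clear,
 * i.e. when the driver must write it back (and then wait, as the patch
 * does with mdelay(100)).
 */
static uint32_t ensure_bus_master(uint32_t cmd, int *needs_write)
{
	*needs_write = !(cmd & CMD_BUS_MASTER);
	return cmd | CMD_BUS_MASTER;
}
```

For a driver's own PCI function the PCI core already offers pci_set_master(); the raw register write here is presumably needed because the commit message says function 0 specifically must have bus mastering enabled for MSI.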
[PATCH 0/3] NetXen: Updates and bug fixes for NetXen 1/10G driver
Hi All,

I will be sending updates for the NetXen NIC 1/10G Ethernet driver in the following emails. These are bug fixes and better interrupt handling schemes. All these patches have been tested on x86 machines and PowerPC blades.

Thanks,
Mithlesh Thukral
Re: [PATCH 0/2] NetXen: Updates and bug fixes for NetXen 1/10G driver
All,

I am recalling these 2 patches. Please don't review them. I will resend them along with a new patch which has come up. Sorry for the inconvenience.

Thanks,
Mithlesh Thukral

On Thursday 21 June 2007 22:34, Mithlesh Thukral wrote:
> Hi All,
>
> I will be sending updates for NetXen NIC 1/10 G Ethernet driver
> in the following emails. These are bug fixes and better interrupt
> handling schemes. These have been test on x86 machines and
> PowerPC blades.
>
> Thanks,
> Mithlesh Thukral
Re: [PATCH] CONFIG_INET depend on CONFIG_SYSCTL
At Tue, 12 Jun 2007 23:05:45 -0700 (PDT),
David Miller wrote:
>
> From: Yoshinori Sato <[EMAIL PROTECTED]>
> Date: Wed, 13 Jun 2007 14:59:16 +0900
>
> > At Tue, 12 Jun 2007 01:08:55 -0700 (PDT),
> > David Miller wrote:
> >
> > > 2) It is much better to add the appropriate CONFIG_SYSCTL
> > >    ifdefs to the INET code than to force it on for everyone.
> >
> > It examined that, but many corrections become necessary.
>
> I understand, but embedded people will not be happy that
> SYSFS is a requirement for IPV4 networking. Every little
> bit of space savings matters for them.

Sorry for the late reply.

I have not checked it in detail, but there seem to be only a few parts that depend on SYSFS. I need to check whether those SYSFS-dependent parts can be separated out. It may take some time, but I will try.

--
Yoshinori Sato
<[EMAIL PROTECTED]>
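David's point (2) is usually realised in the kernel with a common idiom: the real function is compiled only when the config option is set, and the #else branch supplies an empty static inline stub, so call sites need no #ifdef of their own and the disabled case typically compiles away. A minimal userspace sketch of that idiom; all function names are hypothetical and not taken from the INET code.

```c
#include <stdio.h>

#ifdef CONFIG_SYSCTL
/* Built only when sysctl support is configured in. */
static int example_inet_sysctl_register(void)
{
	/* real code would register its ctl_table here */
	puts("sysctl table registered");
	return 0;
}
#else
/* Stub: nothing to register, and callers still compile unchanged. */
static inline int example_inet_sysctl_register(void)
{
	return 0;
}
#endif

/* The call site stays free of #ifdefs in both configurations. */
static int example_inet_init(void)
{
	return example_inet_sysctl_register();
}
```

Building with and without -DCONFIG_SYSCTL exercises both branches; example_inet_init() returns 0 either way.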