Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread David Miller
From: David Stevens <[EMAIL PROTECTED]>
Date: Fri, 22 Jun 2007 21:30:05 -0700

> [EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:
> 
> > On 23/06/07 02:04, David Stevens wrote:
> > > Why not make the application that writes resolv.conf
> > > also listen on a raw ICMPv6 socket? I don't believe you'd need
> > > any kernel changes, then, and it seems pretty simple and
> > > straightforward.
> > 
> > Because then it requires yet another network daemon, RA in 
> > the kernel means there's no need for one to manage adding 
> > auto-configured IP addresses... what's wrong with doing the 
> > same for DNS?
> 
> It's not yet another one, since you have to run something
> to get it in resolv.conf, anyway. That seems much better to me
> than having the kernel track data that can only be used at the
> application layer. The app itself looks like it'd be really simple.
> Auto-configured addresses are used by the kernel. It has to
> have those addresses. But the kernel doesn't do DNS look-ups, or
> write resolv.conf; that's the difference, for me.

I totally agree with David, this stuff definitely does not belong
in the kernel.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread Simon Arlott
On 23/06/07 05:30, David Stevens wrote:
> [EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:
Is there a reason why you're CC:ing the Sender? Doesn't that end 
up in the mailbox(es) of the netdev admin(s)?

>> On 23/06/07 02:04, David Stevens wrote:
>>> Why not make the application that writes resolv.conf
>>> also listen on a raw ICMPv6 socket? I don't believe you'd need
>>> any kernel changes, then, and it seems pretty simple and
>>> straightforward.
>> Because then it requires yet another network daemon, RA in 
>> the kernel means there's no need for one to manage adding 
>> auto-configured IP addresses... what's wrong with doing the 
>> same for DNS?
> 
> It's not yet another one, since you have to run something
> to get it in resolv.conf, anyway. That seems much better to me

Well, it'd be the library including it - so there'd be no daemon 
application involved.

> than having the kernel track data that can only be used at the
> application layer. The app itself looks like it'd be really simple.

Keeping application data in the kernel does start to get silly though, 
e.g. everything in dhcp-options(5)... but DNS is used almost 
everywhere. This could be a configuration option so that anyone who 
doesn't want it can disable it completely.

> Auto-configured addresses are used by the kernel. It has to
> have those addresses. But the kernel doesn't do DNS look-ups, or
> write resolv.conf; that's the difference, for me.

Using DHCPv6 as an example, auto-configuration does not have to be in 
the kernel at all either.

-- 
Simon Arlott


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread David Stevens
[EMAIL PROTECTED] wrote on 06/22/2007 06:17:46 PM:

> On 23/06/07 02:04, David Stevens wrote:
> > Why not make the application that writes resolv.conf
> > also listen on a raw ICMPv6 socket? I don't believe you'd need
> > any kernel changes, then, and it seems pretty simple and
> > straightforward.
> 
> Because then it requires yet another network daemon, RA in 
> the kernel means there's no need for one to manage adding 
> auto-configured IP addresses... what's wrong with doing the 
> same for DNS?

It's not yet another one, since you have to run something
to get it in resolv.conf, anyway. That seems much better to me
than having the kernel track data that can only be used at the
application layer. The app itself looks like it'd be really simple.
Auto-configured addresses are used by the kernel. It has to
have those addresses. But the kernel doesn't do DNS look-ups, or
write resolv.conf; that's the difference, for me.

+-DLS



Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread Dan Williams
On Fri, 2007-06-22 at 20:09 -0400, C. Scott Ananian wrote:
> > > diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h
> > > linux-2.6.22-rc5/include/net/ip6_rdnss.h
> > > --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h1969-12-31
> > > 19:00:00.0 -0500
> > > +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h2007-06-21
> > > 18:16:33.0 -0400 @@ -0,0 +1,58 @@
> > > +#ifndef _NET_IP6_RDNSS_H
> > > +#define _NET_IP6_RDNSS_H
> > > +
> > > +#ifdef __KERNEL__
> > > +
> > > +#include 
> > > +
> > > +struct nd_opt_rdnss {
> > > +__u8 type;
> > > +__u8 length;
> > > +#if defined(__BIG_ENDIAN_BITFIELD)
> > > +__u8 priority:4,
> > > +open:1,
> > > +reserved1:3;
> > > +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> > > +__u8 reserved1:3,
> > > +open:1,
> > > +priority:4;
> > > +#else
> > > +# error not little or big endian
> > > +#endif
> >
> > That is not endianness-safe. Don't use foo:x at all
> > for stuff where a specific endianness is needed. The
> > compiler doesn't make any guarantee about it.
> 
> This was copied directly from include/net/ip6_route.h.  I believe that
> it does in fact work, and I (for one) find this much more readable
> than the alternative.  If it is in fact broken, then
> include/net/ip6_route.h (and the 35 other files which use this #ifdef
> in this manner) should be fixed.

Though in general, we shouldn't be using bitfields, FYI.  They are known
to generate really crappy code on many architectures, and patches that
contain them have been smacked down quite hard by people we all know are
better hackers than us :)

Dan




Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread Simon Arlott
On 23/06/07 02:04, David Stevens wrote:
> Why not make the application that writes resolv.conf
> also listen on a raw ICMPv6 socket? I don't believe you'd need
> any kernel changes, then, and it seems pretty simple and
> straightforward.

Because then it requires yet another network daemon, RA in 
the kernel means there's no need for one to manage adding 
auto-configured IP addresses... what's wrong with doing the 
same for DNS?

I don't think it should be in resolv.conf format though. Can't 
you make a change to glibc too to have it use it properly?

Something like "hosts: files rdnss dns" or an option that can 
be added to resolv.conf?

-- 
Simon Arlott


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread David Stevens
Scott,
Why not make the application that writes resolv.conf
also listen on a raw ICMPv6 socket? I don't believe you'd need
any kernel changes, then, and it seems pretty simple and
straightforward.

+-DLS
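
[Sketch of the userspace approach suggested above. This is a minimal illustration, not code from the thread: the names open_ra_socket and parse_rdnss are hypothetical, and the option layout (type 25, length in 8-octet units, flags byte, reserved byte, 32-bit lifetime, then one or more 16-byte addresses) is assumed from struct nd_opt_rdnss in the posted patch. The socket needs CAP_NET_RAW; the parser works on the options that follow the RA header.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

#define RDNSS_OPT_TYPE  25  /* ND_OPT_RDNSS_INFO in the posted patch */
#define RDNSS_FIXED_LEN 8   /* type, length, flags, reserved2, lifetime */

/* Open a raw ICMPv6 socket that passes only Router Advertisements.
 * Needs CAP_NET_RAW (usually root); returns -1 on failure. */
int open_ra_socket(void)
{
    struct icmp6_filter filt;
    int fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);

    if (fd < 0)
        return -1;
    ICMP6_FILTER_SETBLOCKALL(&filt);
    ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, &filt);
    if (setsockopt(fd, IPPROTO_ICMPV6, ICMP6_FILTER, &filt, sizeof(filt)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Walk the options that follow the RA header and copy up to max server
 * addresses from the first RDNSS option found.
 * Returns the number copied, 0 if there is no RDNSS option, -1 if malformed. */
int parse_rdnss(const uint8_t *opts, size_t len,
                struct in6_addr *out, int max, uint32_t *lifetime)
{
    size_t off = 0;

    assert(opts != NULL && out != NULL && lifetime != NULL && max > 0);
    while (off + 2 <= len) {
        uint8_t type = opts[off];
        size_t optlen = (size_t)opts[off + 1] * 8;  /* units of 8 octets */
        const uint8_t *p = opts + off;
        int n, i;

        if (optlen == 0 || off + optlen > len)
            return -1;
        if (type != RDNSS_OPT_TYPE) {
            off += optlen;
            continue;
        }
        if (optlen == RDNSS_FIXED_LEN || (optlen - RDNSS_FIXED_LEN) % 16 != 0)
            return -1;  /* need one or more whole addresses */
        n = (int)((optlen - RDNSS_FIXED_LEN) / 16);
        if (n > max)
            n = max;
        /* lifetime is big-endian on the wire */
        *lifetime = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
                    ((uint32_t)p[6] << 8) | (uint32_t)p[7];
        for (i = 0; i < n; i++)
            memcpy(&out[i], p + RDNSS_FIXED_LEN + 16 * i, 16);
        return n;
    }
    return 0;
}
```

[A daemon built this way would recv() on the socket, skip the fixed RA header, call the parser on the remainder, and rewrite resolv.conf when the list changes. No kernel state is involved.]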



Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread C. Scott Ananian

> > diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h
> > linux-2.6.22-rc5/include/net/ip6_rdnss.h
> > --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h 1969-12-31
> > 19:00:00.0 -0500
> > +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h 2007-06-21
> > 18:16:33.0 -0400 @@ -0,0 +1,58 @@
> > +#ifndef _NET_IP6_RDNSS_H
> > +#define _NET_IP6_RDNSS_H
> > +
> > +#ifdef __KERNEL__
> > +
> > +#include 
> > +
> > +struct nd_opt_rdnss {
> > +__u8 type;
> > +__u8 length;
> > +#if defined(__BIG_ENDIAN_BITFIELD)
> > +__u8 priority:4,
> > +open:1,
> > +reserved1:3;
> > +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> > +__u8 reserved1:3,
> > +open:1,
> > +priority:4;
> > +#else
> > +# error not little or big endian
> > +#endif
>
> That is not endianness-safe. Don't use foo:x at all
> for stuff where a specific endianness is needed. The
> compiler doesn't make any guarantee about it.


This was copied directly from include/net/ip6_route.h.  I believe that
it does in fact work, and I (for one) find this much more readable
than the alternative.  If it is in fact broken, then
include/net/ip6_route.h (and the 35 other files which use this #ifdef
in this manner) should be fixed.
--scott

--
( http://cscott.net/ )


Re: [PATCH] inetdevice.h must include sysctl.h

2007-06-22 Thread David Miller
From: "Satyam Sharma" <[EMAIL PROTECTED]>
Date: Sat, 23 Jun 2007 05:26:52 +0530

> [PATCH] include sysctl.h from inetdevice.h
> 
> When CONFIG_INET=y and CONFIG_SYSCTL=n:
> 
> In file included from net/core/netpoll.c:16:
> include/linux/inetdevice.h:15: error:
> '__NET_IPV4_CONF_MAX' undeclared here (not in a function)
> make[2]: *** [net/core/netpoll.o] Error 1
> make[1]: *** [net/core] Error 2
> make: *** [net] Error 2
> 
> So #include sysctl.h from inetdevice.h.
> 
> Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

Patch applied, thank you.


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread Michael Buesch
On Saturday 23 June 2007 01:26:19 C. Scott Ananian wrote:
> +struct rdns6_info {
> +   rwlock_t            lock;
> +   struct timer_list   expiry_timer;
> +   struct rdns6_entry *rdnss_list;
> +   struct inet6_dev *  in6_dev; /* back pointer for netlink notify */
> +   int expire_all : 1, /* remove entries on ifdown */
> +   free_me : 1; /* safely free this struct */
> +};

Sparse will complain about that.
I suggest you do:

+struct rdns6_info {
+   rwlock_t            lock;
+   struct timer_list   expiry_timer;
+   struct rdns6_entry *rdnss_list;
+   struct inet6_dev *  in6_dev; /* back pointer for netlink notify */
+   u8 expire_all; /* remove entries on ifdown */
+   u8 free_me; /* safely free this struct */
+};

Will generate better code and
struct size shouldn't increase. So it's a net win.
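
[Illustration of the point above, as a userspace sketch. The struct names are hypothetical stand-ins for rdns6_info, and the bitfields are declared "signed int" explicitly so the behaviour does not depend on how the compiler treats a plain int bitfield. A signed one-bit field can only hold 0 or -1, so a flag set to 1 does not read back as 1; the two-byte version does, and on common ABIs the padding after the pointer means the struct is no larger.]

```c
#include <assert.h>
#include <stdint.h>

/* Flags as signed one-bit bitfields, as in the posted struct. */
struct info_bitfield {
    void *list;
    signed int expire_all : 1;
    signed int free_me : 1;
};

/* Flags as plain bytes, as suggested in the reply. */
struct info_bytes {
    void *list;
    uint8_t expire_all;
    uint8_t free_me;
};

/* Store v in the one-bit flag and return what reads back. */
int bitfield_roundtrip(int v)
{
    struct info_bitfield b = { 0 };

    assert(v == 0 || v == 1);
    b.free_me = v;          /* 1 does not fit in a signed 1-bit field */
    return b.free_me;
}

/* Store v in the byte flag and return what reads back. */
int byte_roundtrip(int v)
{
    struct info_bytes b = { 0 };

    assert(v == 0 || v == 1);
    b.free_me = (uint8_t)v;
    return b.free_me;
}
```

[Code that only tests the flag for truth works either way; code that compares it against 1 does not, which is one reason checkers flag the signed form.]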

-- 
Greetings Michael.


Re: [RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread Michael Buesch
On Saturday 23 June 2007 01:26:19 C. Scott Ananian wrote:
> Attached is my first draft of a patch to implement RDNSS-in-Router
> Advertisements support for IPv6 (
> http://tools.ietf.org/html/draft-jeong-dnsop-ipv6-dns-discovery-12 )
> as implemented in radvd ( http://www.litech.org/radvd/ ).  It
> currently exports the autoconfigured DNS list as /proc/net/ipv6_dns --
> ultimately it ought to (a) implement inotify on this file, so that
> glibc could use it like /etc/resolv.conf and get notifications when
> the DNS list changes, and (b) export the DNS list via netlink as well.
> 
> Comments & discussion, please!
>  --scott

> diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h
> linux-2.6.22-rc5/include/net/ip6_rdnss.h
> --- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h   1969-12-31
> 19:00:00.0 -0500
> +++ linux-2.6.22-rc5/include/net/ip6_rdnss.h2007-06-21
> 18:16:33.0 -0400 @@ -0,0 +1,58 @@
> +#ifndef _NET_IP6_RDNSS_H
> +#define _NET_IP6_RDNSS_H
> +
> +#ifdef __KERNEL__
> +
> +#include 
> +
> +struct nd_opt_rdnss {
> +   __u8    type;
> +   __u8    length;
> +#if defined(__BIG_ENDIAN_BITFIELD)
> +   __u8    priority:4,
> +           open:1,
> +           reserved1:3;
> +#elif defined(__LITTLE_ENDIAN_BITFIELD)
> +   __u8    reserved1:3,
> +           open:1,
> +           priority:4;
> +#else
> +# error not little or big endian
> +#endif

That is not endianness-safe. Don't use foo:x at all
for stuff where a specific endianness is needed. The
compiler doesn't make any guarantee about it.

Please do
__u8 flags;
#define FOOBAR_RESERVED 0x07
#define FOOBAR_OPEN 0x08
#define FOOBAR_PRIORITY 0xF0

and use them in the code.

In general I try to avoid the foo:x stuff, as it
has little or no gain. It just generates worse code.
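
[A sketch of how the mask approach above might look in use. The FOOBAR_* values are the ones given in this message; FOOBAR_PRIORITY_SHIFT and the rdnss_* helper names are assumptions added for illustration. Because the masks are applied to a single byte, the result is the same on big- and little-endian hosts and no #ifdef is needed.]

```c
#include <assert.h>
#include <stdint.h>

#define FOOBAR_RESERVED       0x07
#define FOOBAR_OPEN           0x08
#define FOOBAR_PRIORITY       0xF0
#define FOOBAR_PRIORITY_SHIFT 4     /* assumed helper, not in the original */

/* Extract the 4-bit priority from the top nibble of the flags byte. */
static inline unsigned int rdnss_priority(uint8_t flags)
{
    return (flags & FOOBAR_PRIORITY) >> FOOBAR_PRIORITY_SHIFT;
}

/* Return 1 if the open bit is set, 0 otherwise. */
static inline int rdnss_open(uint8_t flags)
{
    return (flags & FOOBAR_OPEN) != 0;
}

/* Build a flags byte; the reserved bits are left as zero. */
static inline uint8_t rdnss_make_flags(unsigned int priority, int open)
{
    assert(priority <= 15);
    return (uint8_t)((priority << FOOBAR_PRIORITY_SHIFT) |
                     (open ? FOOBAR_OPEN : 0));
}
```

[The struct then carries a plain "__u8 flags;" member and the accessors replace direct bitfield reads.]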

-- 
Greetings Michael.


[RFD] First draft of RDNSS-in-RA support for IPv6 DNS autoconfiguration

2007-06-22 Thread C. Scott Ananian

Attached is my first draft of a patch to implement RDNSS-in-Router
Advertisements support for IPv6 (
http://tools.ietf.org/html/draft-jeong-dnsop-ipv6-dns-discovery-12 )
as implemented in radvd ( http://www.litech.org/radvd/ ).  It
currently exports the autoconfigured DNS list as /proc/net/ipv6_dns --
ultimately it ought to (a) implement inotify on this file, so that
glibc could use it like /etc/resolv.conf and get notifications when
the DNS list changes, and (b) export the DNS list via netlink as well.

Comments & discussion, please!
--scott

[ps. i'm copy-and-pasting the patch into gmail, against my better
judgement.  let me know if it doesn't apply for you, and i'll resend
from a less-clever mail agent.]
--
( http://cscott.net/ )


diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_fib.h
linux-2.6.22-rc5/include/net/ip6_fib.h
--- linux-2.6.22-rc5-orig/include/net/ip6_fib.h 2007-06-16
22:09:12.0 -0400
+++ linux-2.6.22-rc5/include/net/ip6_fib.h  2007-06-20 14:17:58.0 
-0400
@@ -79,6 +79,7 @@ struct rt6key
};

struct fib6_table;
+struct rdns6_info;

struct rt6_info
{
@@ -105,6 +106,8 @@ struct rt6_info
struct rt6key   rt6i_src;

u8  rt6i_protocol;
+
+struct rdns6_info   *rt6i_rdnss;
};

static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h
linux-2.6.22-rc5/include/net/ip6_rdnss.h
--- linux-2.6.22-rc5-orig/include/net/ip6_rdnss.h   1969-12-31
19:00:00.0 -0500
+++ linux-2.6.22-rc5/include/net/ip6_rdnss.h2007-06-21 18:16:33.0 
-0400
@@ -0,0 +1,58 @@
+#ifndef _NET_IP6_RDNSS_H
+#define _NET_IP6_RDNSS_H
+
+#ifdef __KERNEL__
+
+#include 
+
+struct nd_opt_rdnss {
+   __u8            type;
+   __u8            length;
+#if defined(__BIG_ENDIAN_BITFIELD)
+   __u8            priority:4,
+                   open:1,
+                   reserved1:3;
+#elif defined(__LITTLE_ENDIAN_BITFIELD)
+   __u8            reserved1:3,
+                   open:1,
+                   priority:4;
+#else
+# error not little or big endian
+#endif
+   __u8            reserved2;
+   __be32          lifetime;
+   struct in6_addr rdnss[1];   /* 1 or more */
+};
+
+struct rdns6_entry {
+   struct rdns6_entry *next;
+   struct in6_addr rdnss;
+   __u8            priority;
+   __u8            open;
+   __u32           lifetime;
+   unsigned long   expires;
+};
+
+struct rdns6_info {
+   rwlock_t            lock;
+   struct timer_list   expiry_timer;
+   struct rdns6_entry *rdnss_list;
+   struct inet6_dev *  in6_dev; /* back pointer for netlink notify */
+   int expire_all : 1, /* remove entries on ifdown */
+       free_me : 1; /* safely free this struct */
+};
+
+/* Receive and process an RA message with the given RDNSS options. */
+extern void rdns6_ra_rcv(struct inet6_dev *dev, struct rt6_info *rt,
+                         struct nd_opt_rdnss **opts, int opt_cnt);
+/* Expire all of the dns server info from a route (as on an ifdown). */
+extern void rdns6_info_expire_all(struct rt6_info *rt);
+/* Delete the DNS list information from a struct rt6_info. */
+extern void rdns6_info_del(struct rt6_info *rt);
+
+/* Generate the /proc/net/ipv6_dns file. */
+extern int rdns6_proc_info(char *buffer, char **start,
+                           off_t offset, int length);
+
+#endif
+#endif
diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/include/net/ndisc.h
linux-2.6.22-rc5/include/net/ndisc.h
--- linux-2.6.22-rc5-orig/include/net/ndisc.h   2007-06-16
22:09:12.0 -0400
+++ linux-2.6.22-rc5/include/net/ndisc.h2007-06-18 15:30:00.0 
-0400
@@ -24,6 +24,7 @@ enum {
ND_OPT_MTU = 5, /* RFC2461 */
__ND_OPT_ARRAY_MAX,
ND_OPT_ROUTE_INFO = 24, /* RFC4191 */
+   ND_OPT_RDNSS_INFO = 25, /* draft/radvd */
__ND_OPT_MAX
};

diff -ruHpN -X dontdiff linux-2.6.22-rc5-orig/net/ipv6/Makefile
linux-2.6.22-rc5/net/ipv6/Makefile
--- linux-2.6.22-rc5-orig/net/ipv6/Makefile 2007-06-16 22:09:12.0 
-0400
+++ linux-2.6.22-rc5/net/ipv6/Makefile  2007-06-18 16:39:02.0 -0400
@@ -8,7 +8,7 @@ ipv6-objs :=af_inet6.o anycast.o ip6_ou
route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
raw.o protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \
exthdrs.o sysctl_net_ipv6.o datagram.o \
-   ip6_flowlabel.o inet6_connection_sock.o
+   ip6_flowlabel.o inet6_connection_sock.o ip6_rdnss.o

ipv6-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o xfrm6_input.o \
xfrm6_outp

[PATCH] sctp: lock_sock_nested in sctp_sock_migrate

2007-06-22 Thread Zach Brown
I'm not sure that I've gotten either the sctp or lockdep details right,
but with this patch I don't get lockdep yelling at me any more :)

--

sctp: lock_sock_nested in sctp_sock_migrate

sctp_sock_migrate() grabs the socket lock on a newly allocated socket while
holding the socket lock on an old socket.  lockdep worries that this might
be a recursive lock attempt.

 task/3026 is trying to acquire lock:
  (sk_lock-AF_INET){--..}, at: [] sctp_sock_migrate+0x2e3/0x327 [sctp]
 but task is already holding lock:
  (sk_lock-AF_INET){--..}, at: [] sctp_accept+0xdf/0x1e3 [sctp]

This patch tells lockdep that this locking is safe by using
lock_sock_nested().

Signed-off-by: Zach Brown <[EMAIL PROTECTED]>

diff -r 8adcfdf2545b net/sctp/socket.c
--- a/net/sctp/socket.c Fri Jun 22 11:11:33 2007 -0700
+++ b/net/sctp/socket.c Fri Jun 22 15:05:22 2007 -0700
@@ -6084,8 +6084,11 @@ static void sctp_sock_migrate(struct soc
 * queued to the backlog.  This prevents a potential race between
 * backlog processing on the old socket and new-packet processing
 * on the new socket.
-*/
-   sctp_lock_sock(newsk);
+*
+* The caller has just allocated newsk so we can guarantee that other
+* paths won't try to lock it and then oldsk.
+*/
+   lock_sock_nested(newsk, SINGLE_DEPTH_NESTING);
sctp_assoc_migrate(assoc, newsk);
 
/* If the association on the newsk is already closed before accept()



[PATCH] net: Basic network namespace infrastructure.

2007-06-22 Thread Eric W. Biederman

This is the basic infrastructure needed to support network
namespaces.  This infrastructure is:
- Registration functions to support initializing per network
  namespace data when a network namespace is created or destroyed.

- struct net.  The network namespace data structure.
  This structure will grow as variables are made per network
  namespace but this is the minimal starting point.

- Functions to grab a reference to the network namespace.
  I provide both get/put functions that keep a network namespace
  from being freed.  And hold/release functions serve as weak references
  and will warn if their count is not zero when the data structure
  is freed.  Useful for dealing with more complicated data structures
  like the ipv4 route cache.

- A list of all of the network namespaces so we can iterate over them.

- A slab for the network namespace data structures allowing leaks
  to be spotted.

I have deliberately chosen not to make it possible to compile out the
code, as the support for per-network namespace initialization and
uninitialization needs to always be compiled in once code has
started using it (even if we don't have network namespaces),
and because no one has ever measured any performance overhead
specific to network namespace infrastructure.  As code
to compile out the network namespace pointers etc. is complicated,
it is best to avoid that code unless that complexity is justified.
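
[A userspace sketch of the two kinds of reference described above, under stated assumptions: C11 atomics stand in for the kernel's atomic_t, the namespace is freed directly rather than from a work item, and every demo_* name is hypothetical. get/put keep the structure alive; hold/release only maintain a counter that is checked when the last strong reference goes away.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct demo_net {
    atomic_int count;      /* strong references: the last put frees */
    atomic_int use_count;  /* weak references: checked at free time */
};

int demo_freed;          /* namespaces freed so far */
int demo_leak_warnings;  /* frees that found use_count != 0 */

/* Allocate a namespace holding one strong reference for the caller. */
struct demo_net *demo_net_alloc(void)
{
    struct demo_net *net = malloc(sizeof(*net));

    if (!net)
        return NULL;
    atomic_init(&net->count, 1);
    atomic_init(&net->use_count, 0);
    return net;
}

struct demo_net *demo_get_net(struct demo_net *net)
{
    assert(net != NULL);
    atomic_fetch_add(&net->count, 1);
    return net;
}

void demo_put_net(struct demo_net *net)
{
    assert(net != NULL);
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref */
    if (atomic_fetch_sub(&net->count, 1) != 1)
        return;
    if (atomic_load(&net->use_count) != 0) {
        fprintf(stderr, "demo_net %p freed with use_count != 0\n", (void *)net);
        demo_leak_warnings++;
    }
    demo_freed++;
    free(net);
}

struct demo_net *demo_hold_net(struct demo_net *net)
{
    assert(net != NULL);
    atomic_fetch_add(&net->use_count, 1);
    return net;
}

void demo_release_net(struct demo_net *net)
{
    assert(net != NULL);
    atomic_fetch_sub(&net->use_count, 1);
}
```

[The weak counter never delays the free; it only makes a forgotten release visible, which is the debugging aid the description refers to.]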

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 include/net/net_namespace.h |   66 ++
 net/core/Makefile   |2 +-
 net/core/net_namespace.c|  291 +++
 3 files changed, 358 insertions(+), 1 deletions(-)
 create mode 100644 include/net/net_namespace.h
 create mode 100644 net/core/net_namespace.c

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
new file mode 100644
index 000..c909b3a
--- /dev/null
+++ b/include/net/net_namespace.h
@@ -0,0 +1,66 @@
+/*
+ * Operations on the network namespace
+ */
+#ifndef __NET_NET_NAMESPACE_H
+#define __NET_NET_NAMESPACE_H
+
+#include 
+#include 
+#include 
+
+struct net {
+   atomic_t count; /* To decide when the network namespace
+* should go
+*/
+   atomic_t use_count; /* For references we destroy on demand */
+   struct list_head list;  /* list of network namespace structures */
+   struct work_struct work;/* work struct for freeing */
+};
+
+extern struct net init_net;
+extern struct list_head net_namespace_list;
+
+extern void __put_net(struct net *net);
+
+static inline struct net *get_net(struct net *net)
+{
+   atomic_inc(&net->count);
+   return net;
+}
+
+static inline void put_net(struct net *net)
+{
+   if (atomic_dec_and_test(&net->count))
+   __put_net(net);
+}
+
+static inline struct net *hold_net(struct net *net)
+{
+   atomic_inc(&net->use_count);
+   return net;
+}
+
+static inline void release_net(struct net *net)
+{
+   atomic_dec(&net->use_count);
+}
+
+extern void net_lock(void);
+extern void net_unlock(void);
+
+#define for_each_net(VAR)  \
+   list_for_each_entry(VAR, &net_namespace_list, list)
+
+
+struct pernet_operations {
+   struct list_head list;
+   int (*init)(struct net *net);
+   void (*exit)(struct net *net);
+};
+
+extern int register_pernet_subsys(struct pernet_operations *);
+extern void unregister_pernet_subsys(struct pernet_operations *);
+extern int register_pernet_device(struct pernet_operations *);
+extern void unregister_pernet_device(struct pernet_operations *);
+
+#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/Makefile b/net/core/Makefile
index 4751613..ea9b3f3 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -3,7 +3,7 @@
 #
 
 obj-y := sock.o request_sock.o skbuff.o iovec.o datagram.o stream.o scm.o \
-gen_stats.o gen_estimator.o
+gen_stats.o gen_estimator.o net_namespace.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
new file mode 100644
index 000..397c15f
--- /dev/null
+++ b/net/core/net_namespace.c
@@ -0,0 +1,291 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Our network namespace constructor/destructor lists
+ */
+
+static LIST_HEAD(pernet_list);
+static struct list_head *first_device = &pernet_list;
+static DEFINE_MUTEX(net_mutex);
+
+static DEFINE_MUTEX(net_list_mutex);
+LIST_HEAD(net_namespace_list);
+
+static struct kmem_cache *net_cachep;
+
+struct net init_net;
+
+void net_lock(void)
+{
+   mutex_lock(&net_list_mutex);
+}
+
+void net_unlock(void)
+{
+   mutex_unlock(&net_list_mutex);
+}
+
+static struct net *net_alloc(void)
+{
+   return kmem_cache_alloc(net_cachep, GFP_KERNEL);
+}
+
+static void net_free(struct net *net)
+{
+   if (!net)
+   return;
+
+   if (unlikely(

RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-22 Thread Waskiewicz Jr, Peter P
> Patrick McHardy wrote:
> > Waskiewicz Jr, Peter P wrote:
> > 
> >>Thought about this more last night and this morning.  As 
> far as I can 
> >>tell, I still need this.  If the qdisc gets loaded with multiqueue 
> >>turned on, I can just use the value of band to assign
> >>skb->queue_mapping.  But if the qdisc is loaded without multiqueue
> >>support, then I need to assign a value of zero to queue_mapping, or 
> >>not assign it at all (it will be zero'd out before the call to 
> >>->enqueue() in dev_queue_xmit()).  But I'd rather not have a 
> >>conditional in the hotpath checking if the qdisc is multiqueue; I'd 
> >>rather have the array to match the bands so I can just do 
> an assignment.
> >>
> >>What do you think?
> > 
> > 
> > 
> > I very much doubt that it has any measurable impact. You 
> can also add 
> > a small inline function
> > 
> > void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
> 
> 
> OK I didn't really listen obviously :) A compile time option 
> won't help. Just remove it and assign it conditionally.

Sounds good.  Thanks Patrick.

-PJ


[PATCH 1/9] pasemi_mac: Fix TX interrupt threshold

2007-06-22 Thread Olof Johansson
It was mistakenly set to interrupt on the second packet instead of first,
causing some interesting latency behaviour.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -755,7 +755,7 @@ static int pasemi_mac_open(struct net_de
flags |= PAS_MAC_CFG_PCFG_TSR_1G | PAS_MAC_CFG_PCFG_SPD_1G;
 
pci_write_config_dword(mac->iob_pdev, PAS_IOB_DMA_RXCH_CFG(mac->dma_rxch),
-  PAS_IOB_DMA_RXCH_CFG_CNTTH(1));
+  PAS_IOB_DMA_RXCH_CFG_CNTTH(0));
 
pci_write_config_dword(mac->iob_pdev, PAS_IOB_DMA_TXCH_CFG(mac->dma_txch),
   PAS_IOB_DMA_TXCH_CFG_CNTTH(32));

--


[PATCH 2/9] pasemi_mac: Clean TX ring in poll

2007-06-22 Thread Olof Johansson
Clean the TX ring in the poll call, to avoid sitting on mapped buffers
for a long time. NFS doesn't seem to like it much, for example.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -1052,6 +1052,7 @@ static int pasemi_mac_poll(struct net_de
int pkts, limit = min(*budget, dev->quota);
struct pasemi_mac *mac = netdev_priv(dev);
 
+   pasemi_mac_clean_tx(mac);
pkts = pasemi_mac_clean_rx(mac, limit);
 
dev->quota -= pkts;

--


[PATCH 0/9] pasemi_mac patches for 2.6.23

2007-06-22 Thread Olof Johansson
Hi,

pasemi_mac patches: minor tweaks, bugfixes and perf enhancements.

Please consider for the 2.6.23 merge window.


Thanks,

-Olof


[PATCH 4/9] pasemi_mac: Use MMIO instead of pci config accessors

2007-06-22 Thread Olof Johansson
Move away from using the pci config access functions for simple register
access.  Our device has all of the registers in the config space (hey,
from the hardware point of view it looks reasonable :-), so we need to
somehow get to it. Newer firmwares have it in the device tree such that
we can just get it and ioremap it there (in case it ever moves in future
products). For now, provide a hardcoded fallback for older firmwares.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -81,46 +81,47 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
-static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac->iob_pdev, reg, &val);
+   val = in_le32(mac->iob_regs+reg);
+
return val;
 }
 
-static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->iob_pdev, reg, val);
+   out_le32(mac->iob_regs+reg, val);
 }
 
-static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac->pdev, reg, &val);
+   val = in_le32(mac->regs+reg);
return val;
 }
 
-static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->pdev, reg, val);
+   out_le32(mac->regs+reg, val);
 }
 
-static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
+static inline unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
 {
unsigned int val;
 
-   pci_read_config_dword(mac->dma_pdev, reg, &val);
+   val = in_le32(mac->dma_regs+reg);
return val;
 }
 
-static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
+static inline void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
  unsigned int val)
 {
-   pci_write_config_dword(mac->dma_pdev, reg, val);
+   out_le32(mac->dma_regs+reg, val);
 }
 
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
@@ -585,7 +586,6 @@ static int pasemi_mac_clean_tx(struct pa
}
mac->tx->next_to_clean += count;
spin_unlock_irqrestore(&mac->tx->lock, flags);
-
netif_wake_queue(mac->netdev);
 
return count;
@@ -1077,6 +1077,73 @@ static int pasemi_mac_poll(struct net_de
}
 }
 
+static inline void __iomem * __devinit map_onedev(struct pci_dev *p, int index)
+{
+   struct device_node *dn;
+   void __iomem *ret;
+
+   dn = pci_device_to_OF_node(p);
+   if (!dn)
+   goto fallback;
+
+   ret = of_iomap(dn, index);
+   if (!ret)
+   goto fallback;
+
+   return ret;
+fallback:
+   /* This is hardcoded and ugly, but we have some firmware versions
+* who don't provide the register space in the device tree. Luckily
+* they are at well-known locations so we can just do the math here.
+*/
+   return ioremap(0xe000 + (p->devfn << 12), 0x1000);
+}
+
+static int __devinit pasemi_mac_map_regs(struct pasemi_mac *mac)
+{
+   struct resource res;
+   struct device_node *dn;
+   int err;
+
+   mac->dma_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa007, NULL);
+   if (!mac->dma_pdev) {
+   dev_err(&mac->pdev->dev, "Can't find DMA Controller\n");
+   return -ENODEV;
+   }
+
+   mac->iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
+   if (!mac->iob_pdev) {
+   dev_err(&mac->pdev->dev, "Can't find I/O Bridge\n");
+   return -ENODEV;
+   }
+
+   mac->regs = map_onedev(mac->pdev, 0);
+   mac->dma_regs = map_onedev(mac->dma_pdev, 0);
+   mac->iob_regs = map_onedev(mac->iob_pdev, 0);
+
+   if (!mac->regs || !mac->dma_regs || !mac->iob_regs) {
+   dev_err(&mac->pdev->dev, "Can't map registers\n");
+   return -ENODEV;
+   }
+
+   /* The dma status structure is located in the I/O bridge, and
+* is cache coherent.
+*/
+   if (!dma_status) {
+   dn = pci_device_to_OF_node(mac->iob_pdev);
+   if (dn)
+   err = of_address_to_resource(dn, 1, &res);
+   if (!dn || err) {
+   /* Fallback for old firmware */
+   res.start = 0xfd80;
+

[PATCH 3/9] pasemi_mac: Abstract out register access

2007-06-22 Thread Olof Johansson
Abstract out the PCI config read/write accesses into reg read/write ones, still
calling the pci accessors on the back end.
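
For illustration only, here is a userspace sketch of the accessor pattern this patch introduces. The `fake_dev` structure and its array-backed register file are assumptions made up for the example (the driver itself uses `struct pasemi_mac` and the PCI config accessors); the point is just that callers only see `read_reg()`/`write_reg()`, so the back end can be swapped later without touching them.

```c
#include <stdint.h>

#define NREGS 64

/* Hypothetical stand-in for a device; not the driver's real structure. */
struct fake_dev {
	uint32_t regs[NREGS];	/* pretend register file, indexed by offset / 4 */
};

/* Back end: an array access here, but it could become MMIO later. */
static uint32_t backend_read(struct fake_dev *dev, unsigned int reg)
{
	return dev->regs[(reg / 4) % NREGS];
}

static void backend_write(struct fake_dev *dev, unsigned int reg, uint32_t val)
{
	dev->regs[(reg / 4) % NREGS] = val;
}

/* Front end: the only functions the rest of the code calls. */
static uint32_t read_reg(struct fake_dev *dev, unsigned int reg)
{
	return backend_read(dev, reg);
}

static void write_reg(struct fake_dev *dev, unsigned int reg, uint32_t val)
{
	backend_write(dev, reg, val);
}
```

Changing how registers are reached then means editing the two back-end helpers only.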

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -81,6 +81,48 @@ MODULE_PARM_DESC(debug, "PA Semi MAC bit
 
 static struct pasdma_status *dma_status;
 
+static unsigned int read_iob_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac->iob_pdev, reg, &val);
+   return val;
+}
+
+static void write_iob_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac->iob_pdev, reg, val);
+}
+
+static unsigned int read_mac_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac->pdev, reg, &val);
+   return val;
+}
+
+static void write_mac_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac->pdev, reg, val);
+}
+
+static unsigned int read_dma_reg(struct pasemi_mac *mac, unsigned int reg)
+{
+   unsigned int val;
+
+   pci_read_config_dword(mac->dma_pdev, reg, &val);
+   return val;
+}
+
+static void write_dma_reg(struct pasemi_mac *mac, unsigned int reg,
+ unsigned int val)
+{
+   pci_write_config_dword(mac->dma_pdev, reg, val);
+}
+
 static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 {
struct pci_dev *pdev = mac->pdev;
@@ -166,22 +208,21 @@ static int pasemi_mac_setup_rx_resources
 
memset(ring->buffers, 0, RX_RING_SIZE * sizeof(u64));
 
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_BASEL(chan_id),
-  PAS_DMA_RXCHAN_BASEL_BRBL(ring->dma));
+   write_dma_reg(mac, PAS_DMA_RXCHAN_BASEL(chan_id), PAS_DMA_RXCHAN_BASEL_BRBL(ring->dma));
+
+   write_dma_reg(mac, PAS_DMA_RXCHAN_BASEU(chan_id),
+  PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) |
+  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2));
+
+   write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id),
+  PAS_DMA_RXCHAN_CFG_HBU(1));
 
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_BASEU(chan_id),
-  PAS_DMA_RXCHAN_BASEU_BRBH(ring->dma >> 32) |
-  PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2));
-
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXCHAN_CFG(chan_id),
-  PAS_DMA_RXCHAN_CFG_HBU(1));
-
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXINT_BASEL(mac->dma_if),
-  PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers)));
-
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_RXINT_BASEU(mac->dma_if),
-  PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) |
-  PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
+   write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac->dma_if),
+  PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers)));
+
+   write_dma_reg(mac, PAS_DMA_RXINT_BASEU(mac->dma_if),
+  PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) |
+  PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
 
ring->next_to_fill = 0;
ring->next_to_clean = 0;
@@ -233,18 +274,18 @@ static int pasemi_mac_setup_tx_resources
 
memset(ring->desc, 0, TX_RING_SIZE * sizeof(struct pas_dma_xct_descr));
 
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_BASEL(chan_id),
-  PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma));
+   write_dma_reg(mac, PAS_DMA_TXCHAN_BASEL(chan_id),
+  PAS_DMA_TXCHAN_BASEL_BRBL(ring->dma));
val = PAS_DMA_TXCHAN_BASEU_BRBH(ring->dma >> 32);
val |= PAS_DMA_TXCHAN_BASEU_SIZ(TX_RING_SIZE >> 2);
 
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_BASEU(chan_id), val);
+   write_dma_reg(mac, PAS_DMA_TXCHAN_BASEU(chan_id), val);
 
-   pci_write_config_dword(mac->dma_pdev, PAS_DMA_TXCHAN_CFG(chan_id),
-  PAS_DMA_TXCHAN_CFG_TY_IFACE |
-  PAS_DMA_TXCHAN_CFG_TATTR(mac->dma_if) |
-  PAS_DMA_TXCHAN_CFG_UP |
-  PAS_DMA_TXCHAN_CFG_WT(2));
+   write_dma_reg(mac, PAS_DMA_TXCHAN_CFG(chan_id),
+  PAS_DMA_TXCHAN_CFG_TY_IFACE |
+  PAS_DMA_TXCHAN_CFG_TATTR(mac->dma_if) |
+  PAS_DMA_TXCHAN_CFG_UP |
+  PAS_DMA_TXCHAN_CFG_WT(2));
 
ring->next_to_use = 0;
ring->next_to_clean = 0;
@@ -383,12 +424,8 @@ static void pasemi_mac_replenish_rx_ring
 
wmb();
 
-  

[PATCH 8/9] pasemi_mac: Reduce locking when cleaning TX ring

2007-06-22 Thread Olof Johansson
Postpone pci unmap and skb free of the transmitted buffers to outside
of the tx ring lock, batching them up 32 at a time.
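
A simplified userspace sketch of the batching idea follows. It is not the driver code: the `ring` structure, the toy spinlock built on C11 `atomic_flag`, and the use of `free()` in place of the unmap and skb free are all assumptions for the example, and the check for descriptors that are not yet transmitted is left out.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define RING_SIZE 128
#define BATCH 32

/* Hypothetical ring of heap buffers standing in for the TX ring. */
struct ring {
	atomic_flag lock;		/* toy spinlock */
	void *buf[RING_SIZE];
	unsigned int next_to_clean;
	unsigned int next_to_use;
};

static void ring_init(struct ring *r)
{
	unsigned int i;

	atomic_flag_clear(&r->lock);
	for (i = 0; i < RING_SIZE; i++)
		r->buf[i] = NULL;
	r->next_to_clean = r->next_to_use = 0;
}

static void ring_lock(struct ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin */
}

static void ring_unlock(struct ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Release all pending buffers; returns how many were released. */
static unsigned int ring_clean(struct ring *r)
{
	void *batch[BATCH];
	unsigned int total = 0, count, i;

	do {
		count = 0;

		/* Under the lock: only detach pointers from the ring. */
		ring_lock(r);
		while (count < BATCH && r->next_to_clean != r->next_to_use) {
			unsigned int slot = r->next_to_clean % RING_SIZE;

			batch[count++] = r->buf[slot];
			r->buf[slot] = NULL;
			r->next_to_clean++;
		}
		ring_unlock(r);

		/* Outside the lock: the comparatively expensive release. */
		for (i = 0; i < count; i++)
			free(batch[i]);

		total += count;
	} while (count == BATCH);	/* a full batch means more may be pending */

	return total;
}
```

The lock is held only while pointers are copied out, so a concurrent transmit path waits for at most one batch of cheap work.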


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -562,37 +562,56 @@ static int pasemi_mac_clean_tx(struct pa
int i;
struct pasemi_mac_buffer *info;
struct pas_dma_xct_descr *dp;
-   int start, count;
+   unsigned int start, count, limit;
+   unsigned int total_count;
int flags;
+   struct sk_buff *skbs[32];
+   dma_addr_t dmas[32];
 
+   total_count = 0;
+restart:
spin_lock_irqsave(&mac->tx->lock, flags);
 
start = mac->tx->next_to_clean;
+   limit = min(mac->tx->next_to_use, start+32);
+
count = 0;
 
-   for (i = start; i < mac->tx->next_to_use; i++) {
+   for (i = start; i < limit; i++) {
dp = &TX_DESC(mac, i);
+
if (unlikely(dp->mactx & XCT_MACTX_O))
+   /* Not yet transmitted */
break;
 
-   count++;
-
info = &TX_DESC_INFO(mac, i);
-
-   pci_unmap_single(mac->dma_pdev, info->dma,
-info->skb->len, PCI_DMA_TODEVICE);
-   dev_kfree_skb_irq(info->skb);
+   skbs[count] = info->skb;
+   dmas[count] = info->dma;
 
info->skb = NULL;
info->dma = 0;
dp->mactx = 0;
dp->ptr = 0;
+
+   count++;
}
mac->tx->next_to_clean += count;
spin_unlock_irqrestore(&mac->tx->lock, flags);
netif_wake_queue(mac->netdev);
 
-   return count;
+   for (i = 0; i < count; i++) {
+   pci_unmap_single(mac->dma_pdev, dmas[i],
+skbs[i]->len, PCI_DMA_TODEVICE);
+   dev_kfree_skb_irq(skbs[i]);
+   }
+
+   total_count += count;
+
+   /* If the batch was full, try to clean more */
+   if (count == 32)
+   goto restart;
+
+   return total_count;
 }
 
 

--
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/9] pasemi_mac: Minor performance tweaks

2007-06-22 Thread Olof Johansson
Various minor performance tweaks, do some explicit prefetching of packet
data, etc.
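
As a rough userspace illustration of the two kinds of hint used here (prefetch for write and branch prediction), assuming the GCC/Clang builtins `__builtin_prefetch` and `__builtin_expect` are available. The `desc` structure and `sum_completed()` are invented for the example and only loosely mirror the descriptor loops in the driver.

```c
#include <stddef.h>

/* Userspace approximations of the kernel's likely()/unlikely() hints. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical descriptor: bit 0 of flags set means "still owned by hardware". */
struct desc {
	unsigned long flags;
	unsigned long len;
};

/* Sum the lengths of completed descriptors, stopping at the first one
 * that is still owned by hardware. Each consumed entry is cleared. */
static unsigned long sum_completed(struct desc *ring, size_t n)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Hint that the next entry will be written soon; the second
		 * argument 1 requests a prefetch for write, like prefetchw(). */
		if (i + 1 < n)
			__builtin_prefetch(&ring[i + 1], 1);

		/* Finding an unfinished descriptor is the uncommon case. */
		if (unlikely(ring[i].flags & 1))
			break;

		total += ring[i].len;
		ring[i].len = 0;	/* the write the prefetch anticipated */
	}
	return total;
}
```

Both builtins are hints only; the result is the same with or without them.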

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -481,6 +481,7 @@ static int pasemi_mac_clean_rx(struct pa
rmb();
 
dp = &RX_DESC(mac, n);
+   prefetchw(dp);
macrx = dp->macrx;
 
if (!(macrx & XCT_MACRX_O))
@@ -502,8 +503,10 @@ static int pasemi_mac_clean_rx(struct pa
if (info->dma == dma)
break;
}
+   prefetchw(info);
 
skb = info->skb;
+   prefetchw(skb);
info->dma = 0;
 
pci_unmap_single(mac->dma_pdev, dma, skb->len,
@@ -526,9 +529,7 @@ static int pasemi_mac_clean_rx(struct pa
 
skb_put(skb, len);
 
-   skb->protocol = eth_type_trans(skb, mac->netdev);
-
-   if ((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK) {
+   if (likely((macrx & XCT_MACRX_HTY_M) == XCT_MACRX_HTY_IPV4_OK)) {
skb->ip_summed = CHECKSUM_COMPLETE;
skb->csum = (macrx & XCT_MACRX_CSUM_M) >>
   XCT_MACRX_CSUM_S;
@@ -538,6 +539,7 @@ static int pasemi_mac_clean_rx(struct pa
mac->stats.rx_bytes += len;
mac->stats.rx_packets++;
 
+   skb->protocol = eth_type_trans(skb, mac->netdev);
netif_receive_skb(skb);
 
dp->ptr = 0;
@@ -569,7 +571,7 @@ static int pasemi_mac_clean_tx(struct pa
 
for (i = start; i < mac->tx->next_to_use; i++) {
dp = &TX_DESC(mac, i);
-   if (!dp || (dp->mactx & XCT_MACTX_O))
+   if (unlikely(dp->mactx & XCT_MACTX_O))
break;
 
count++;
@@ -957,7 +959,7 @@ static int pasemi_mac_start_tx(struct sk
struct pasemi_mac_txring *txring;
struct pasemi_mac_buffer *info;
struct pas_dma_xct_descr *dp;
-   u64 dflags;
+   u64 dflags, mactx, ptr;
dma_addr_t map;
int flags;
 
@@ -985,6 +987,9 @@ static int pasemi_mac_start_tx(struct sk
if (dma_mapping_error(map))
return NETDEV_TX_BUSY;
 
+   mactx = dflags | XCT_MACTX_LLEN(skb->len);
+   ptr   = XCT_PTR_LEN(skb->len) | XCT_PTR_ADDR(map);
+
txring = mac->tx;
 
spin_lock_irqsave(&txring->lock, flags);
@@ -1005,12 +1010,11 @@ static int pasemi_mac_start_tx(struct sk
}
}
 
-
dp = &TX_DESC(mac, txring->next_to_use);
info = &TX_DESC_INFO(mac, txring->next_to_use);
 
-   dp->mactx = dflags | XCT_MACTX_LLEN(skb->len);
-   dp->ptr   = XCT_PTR_LEN(skb->len) | XCT_PTR_ADDR(map);
+   dp->mactx = mactx;
+   dp->ptr = ptr;
info->dma = map;
info->skb = skb;
 



[PATCH 9/9] pasemi_mac: Enable LLTX

2007-06-22 Thread Olof Johansson
Enable LLTX on pasemi_mac: we're already doing sufficient locking
in the driver to enable it.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>

Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -1239,7 +1239,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c
dev->set_multicast_list = pasemi_mac_set_rx_mode;
dev->weight = 64;
dev->poll = pasemi_mac_poll;
-   dev->features = NETIF_F_HW_CSUM;
+   dev->features = NETIF_F_HW_CSUM | NETIF_F_LLTX;
 
err = pasemi_mac_map_regs(mac);
if (err)



[PATCH 5/9] pasemi_mac: Enable L2 caching of packet headers

2007-06-22 Thread Olof Johansson
Enable settings to target L2 for the first few cachelines of the packet,
since we'll access them to get to the various headers.


Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -216,7 +216,7 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXCHAN_BASEU_SIZ(RX_RING_SIZE >> 2));
 
write_dma_reg(mac, PAS_DMA_RXCHAN_CFG(chan_id),
-  PAS_DMA_RXCHAN_CFG_HBU(1));
+  PAS_DMA_RXCHAN_CFG_HBU(2));
 
write_dma_reg(mac, PAS_DMA_RXINT_BASEL(mac->dma_if),
   PAS_DMA_RXINT_BASEL_BRBL(__pa(ring->buffers)));
@@ -225,6 +225,9 @@ static int pasemi_mac_setup_rx_resources
   PAS_DMA_RXINT_BASEU_BRBH(__pa(ring->buffers) >> 32) |
   PAS_DMA_RXINT_BASEU_SIZ(RX_RING_SIZE >> 3));
 
+   write_dma_reg(mac, PAS_DMA_RXINT_CFG(mac->dma_if),
+  PAS_DMA_RXINT_CFG_DHL(2));
+
ring->next_to_fill = 0;
ring->next_to_clean = 0;
 
Index: netdev-2.6/drivers/net/pasemi_mac.h
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.h
+++ netdev-2.6/drivers/net/pasemi_mac.h
@@ -218,6 +218,14 @@ enum {
 #define PAS_DMA_RXINT_RCMDSTA_ACT   0x0001
 #define PAS_DMA_RXINT_RCMDSTA_DROPS_M   0xfffe
 #define PAS_DMA_RXINT_RCMDSTA_DROPS_S   17
+#define PAS_DMA_RXINT_CFG(i)   (0x204+(i)*_PAS_DMA_RXINT_STRIDE)
+#define PAS_DMA_RXINT_CFG_DHL_M 0x0700
+#define PAS_DMA_RXINT_CFG_DHL_S 24
+#define PAS_DMA_RXINT_CFG_DHL(x) (((x) << PAS_DMA_RXINT_CFG_DHL_S) & \
+                                 PAS_DMA_RXINT_CFG_DHL_M)
+#define PAS_DMA_RXINT_CFG_WIF   0x0002
+#define PAS_DMA_RXINT_CFG_WIL   0x0001
+
 #define PAS_DMA_RXINT_INCR(i)  (0x210+(i)*_PAS_DMA_RXINT_STRIDE)
 #define PAS_DMA_RXINT_INCR_INCR_M   0x
 #define PAS_DMA_RXINT_INCR_INCR_S   0



[PATCH 6/9] pasemi_mac: Simplify memcpy for short receives

2007-06-22 Thread Olof Johansson
No need to copy over the skipped align bytes (besides, NET_IP_ALIGN is
0 on ppc64).
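
A small userspace sketch of the same simplification, for illustration. `ALIGN_PAD` and `copy_frame()` are hypothetical names standing in for `NET_IP_ALIGN` and the short-receive copy path; the sketch only shows that the reserved headroom needs no copying.

```c
#include <string.h>

#define ALIGN_PAD 2	/* hypothetical headroom, chosen non-zero for the example */

/* Copy a received frame into a fresh buffer that reserves ALIGN_PAD bytes
 * of headroom. Only the len payload bytes are copied; the headroom holds
 * nothing worth preserving. Returns a pointer to the payload in dst_buf. */
static unsigned char *copy_frame(unsigned char *dst_buf,
				 const unsigned char *src_data, size_t len)
{
	unsigned char *dst_data = dst_buf + ALIGN_PAD;	/* like skb_reserve() */

	memcpy(dst_data, src_data, len);
	return dst_data;
}
```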

Signed-off-by: Olof Johansson <[EMAIL PROTECTED]>


Index: netdev-2.6/drivers/net/pasemi_mac.c
===
--- netdev-2.6.orig/drivers/net/pasemi_mac.c
+++ netdev-2.6/drivers/net/pasemi_mac.c
@@ -516,9 +516,7 @@ static int pasemi_mac_clean_rx(struct pa
netdev_alloc_skb(mac->netdev, len + NET_IP_ALIGN);
if (new_skb) {
skb_reserve(new_skb, NET_IP_ALIGN);
-   memcpy(new_skb->data - NET_IP_ALIGN,
-   skb->data - NET_IP_ALIGN,
-   len + NET_IP_ALIGN);
+   memcpy(new_skb->data, skb->data, len);
/* save the skb in buffer_info as good */
skb = new_skb;
}



[RFD] L2 Network namespace infrastructure

2007-06-22 Thread Eric W. Biederman

Currently all of the prerequisite work for implementing a network
namespace (i.e. virtualization of the network stack with one kernel)
has already been merged or is in the process of being merged.

Therefore it is now time for a bit of high level design review of
the network namespace work and time to begin sending patches.

-- User space semantics

If you are in a different network namespace it looks like you have a
separate independent copy of the network stack.

User visible kernel structures that will appear to be per network
namespace include network devices, routing tables, sockets, and
netfilter rules.


-- The basic design

There will be a network namespace structure that holds the global
variables for a network namespace, making those global variables
per network namespace.

One of those per network namespace global variables will be the
loopback device.  Which means the network namespace a packet resides
in can be found simply by examining the network device or the socket
the packet is traversing.

Either a pointer to this global structure will be passed into
the functions that need to reference per network namespace variables
or a structure that is already passed in (such as the network device)
will be modified to contain a pointer to the network namespace
structure.
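
To make the second option concrete, here is a minimal userspace sketch. All structure and function names (`net_ns`, `net_dev`, `packet_ns()`, and so on) are hypothetical and chosen for the example; they are not the names used in the patchset.

```c
#include <stddef.h>

struct net_dev;

/* Hypothetical per-namespace container for former globals. */
struct net_ns {
	int id;
	struct net_dev *loopback;	/* one loopback device per namespace */
	unsigned long rx_packets;	/* formerly a global counter */
};

/* A structure that is already passed around gains a namespace pointer. */
struct net_dev {
	const char *name;
	struct net_ns *ns;
};

struct packet {
	struct net_dev *dev;	/* device the packet is traversing */
};

/* The namespace of a packet follows from the device it is on. */
static struct net_ns *packet_ns(const struct packet *pkt)
{
	return pkt->dev->ns;
}

/* A function that used to touch a global now touches per-namespace state. */
static void count_rx(const struct packet *pkt)
{
	packet_ns(pkt)->rx_packets++;
}
```

Nothing in `count_rx()` needs an extra argument; the namespace is reached through data the function already receives.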

Depending upon the data structure it will either be modified to hold
a per entry network namespace pointer or there will be a separate
copy per network namespace.  For large global data structures like
the ipv4 routing cache hash table adding an additional pointer to the
entries appears the more reasonable solution.

The initialization and cleanup functions will be refactored into
functions that do the work on a network namespace basis and functions
that perform truly global initialization and cleanup.  And a
registration mechanism will be available to register functions that
are per network namespace.
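
One possible shape for such a registration mechanism, sketched in userspace C. The names `ns_ops`, `register_ns_ops()`, `ns_setup()` and `ns_teardown()` are assumptions for the example, and the sketch ignores locking and namespaces that already exist at registration time.

```c
#include <stddef.h>

#define MAX_OPS 8

struct net_ns {
	int id;
	int routes_ready;
};

/* Hypothetical per-namespace hooks a subsystem would register. */
struct ns_ops {
	int  (*init)(struct net_ns *ns);
	void (*exit)(struct net_ns *ns);
};

static const struct ns_ops *registered[MAX_OPS];
static size_t nr_registered;

static int register_ns_ops(const struct ns_ops *ops)
{
	if (nr_registered == MAX_OPS)
		return -1;
	registered[nr_registered++] = ops;
	return 0;
}

/* Creating a namespace runs every registered init; on failure, the
 * hooks that already ran are undone in reverse order. */
static int ns_setup(struct net_ns *ns)
{
	size_t i;

	for (i = 0; i < nr_registered; i++) {
		if (registered[i]->init(ns) != 0) {
			while (i-- > 0)
				registered[i]->exit(ns);
			return -1;
		}
	}
	return 0;
}

/* Destroying a namespace runs the exit hooks in reverse order. */
static void ns_teardown(struct net_ns *ns)
{
	size_t i = nr_registered;

	while (i-- > 0)
		registered[i]->exit(ns);
}

/* Example subsystem with per-namespace state. */
static int routes_init(struct net_ns *ns) { ns->routes_ready = 1; return 0; }
static void routes_exit(struct net_ns *ns) { ns->routes_ready = 0; }
static const struct ns_ops routes_ops = { routes_init, routes_exit };
```

Truly global setup (allocating caches, registering protocols) would stay in the subsystem's ordinary init function and not go through these hooks.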

It is a namespace so like the other namespaces that have been
implemented a clone flag will exist to create the namespace during
clone or unshare.

There will be an additional network stack feature that will allow
you to migrate network devices between namespaces.

When complete all of the features of the network stack ipv4, ipv6,
decnet, sysctls, virtual devices, routing tables, scheduling, ipsec,
netfilter, etc. should be able to operate in a per network namespace
fashion.

--- The implementation plan

The plan for implementing this is to first get network namespace
infrastructure merged.  So that pieces of the network stack can be
made to operate in a per network namespace fashion.

Then the plan is to proceed as if we are doing a global kernel lock
removal.  For each layer of the networking stack pass down the per
network namespace parameter to the functions and modify the functions
to verify they are only operating on the initial network namespace.
Then one piece at a time update the code to handle working in
multiple network namespaces, and push the network namespace
information down to the lower levels.
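
The two stages can be sketched as follows. This is a userspace illustration with made-up names (`init_ns`, `route_lookup_step1()`, `route_lookup_step2()`), not code from the patchset: the first stage adds the parameter but rejects anything except the initial namespace, the second actually uses per-namespace state.

```c
#include <errno.h>
#include <stddef.h>

struct net_ns {
	int id;
	unsigned int route_base;	/* placeholder for per-namespace routing state */
};

/* Hypothetical name for the namespace that exists at boot. */
static struct net_ns init_ns = { 0, 0 };

/* Stage 1: the function takes a namespace argument but refuses anything
 * other than the initial namespace, so behaviour is unchanged. */
static int route_lookup_step1(struct net_ns *ns, unsigned int dst)
{
	if (ns != &init_ns)
		return -EINVAL;
	return (int)((init_ns.route_base + dst) & 0xff);
}

/* Stage 2: the check is dropped and the per-namespace state is used. */
static int route_lookup_step2(struct net_ns *ns, unsigned int dst)
{
	return (int)((ns->route_base + dst) & 0xff);
}
```

Each stage-1 patch is mechanical, and each stage-2 patch is small enough to review on its own.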

This plan calls for a lot of patches that are essentially noise.  But
the result is simple and generally obviously correct patches, that can
be easily reviewed, and can be safely merged one at a time, and don't
impose any additional ongoing maintenance overhead.

In my current proof of concept patchset it takes about 100 patches
before ipv4 is up and working.

--- Performance

In initial measurements the only performance overhead we have been
able to measure is getting the packet to the network namespace.
Going through ethernet bridging or routing seems to trigger copies
of the packet that slow things down.  When packets go directly to
the network namespace no performance penalty has yet been measured.

--- The question

At the design level does this approach sound reasonable?

Eric

p.s.  I will follow up shortly with a patch that is one implementation
of the basic network namespace infrastructure.  Feel free to cut it
to shreds (as it is likely overkill) but it should help put the pieces
of what I am talking about into perspective.


[PATCH] Ethernet driver for EISA only SNI RM200/RM400 machines

2007-06-22 Thread Thomas Bogendoerfer
Hi,

This is a new ethernet driver, which uses the code taken out of lasi_82596
(done by the other patch I just sent).

Thomas.


Ethernet driver for EISA only SNI RM200/RM400 machines

Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]>
---

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b0d0d73..af5c90f 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -435,6 +435,13 @@ config LASI_82596
  Say Y here to support the builtin Intel 82596 ethernet controller
  found in Hewlett-Packard PA-RISC machines with 10Mbit ethernet.
 
+config SNI_82596
+   tristate "SNI RM ethernet"
+   depends on NET_ETHERNET && SNI_RM
+   help
+ Say Y here to support the on-board Intel 82596 ethernet controller
+ built into SNI RM machines.
+
 config MIPS_JAZZ_SONIC
tristate "MIPS JAZZ onboard SONIC Ethernet support"
depends on NET_ETHERNET && MACH_JAZZ
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index d268b49..b03270c 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -161,6 +161,7 @@ obj-$(CONFIG_ELPLUS) += 3c505.o
 obj-$(CONFIG_AC3200) += ac3200.o 8390.o
 obj-$(CONFIG_APRICOT) += 82596.o
 obj-$(CONFIG_LASI_82596) += lasi_82596.o
+obj-$(CONFIG_SNI_82596) += sni_82596.o
 obj-$(CONFIG_MVME16x_NET) += 82596.o
 obj-$(CONFIG_BVME6000_NET) += 82596.o
 obj-$(CONFIG_SC92031) += sc92031.o
diff --git a/drivers/net/sni_82596.c b/drivers/net/sni_82596.c
new file mode 100644
index 000..a37d08a
--- /dev/null
+++ b/drivers/net/sni_82596.c
@@ -0,0 +1,204 @@
+/*
+ * sni_82596.c -- driver for intel 82596 ethernet controller, as
+ *   used in older SNI RM machines
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SNI_82596_DRIVER_VERSION "SNI RM 82596 driver - Revision: 0.01"
+
+static char sni_82596_string[] = "snirm_82596";
+
+#define DMA_ALLOC  dma_alloc_coherent
+#define DMA_FREE   dma_free_coherent
+#define DMA_WBACK(priv, addr, len) do { } while (0)
+#define DMA_INV(priv, addr, len)   do { } while (0)
+#define DMA_WBACK_INV(priv, addr, len) do { } while (0)
+
+#define SYSBUS  0x4400
+
+/* big endian CPU, 82596 little endian */
+#define SWAP32(x)   cpu_to_le32((u32)(x))
+#define SWAP16(x)   cpu_to_le16((u16)(x))
+
+#define OPT_MPU_16BIT0x01
+
+static inline void CA(struct net_device *dev);
+static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x);
+
+#include "lib82596.c"
+
+MODULE_AUTHOR("Thomas Bogendoerfer");
+MODULE_DESCRIPTION("i82596 driver");
+MODULE_LICENSE("GPL");
+module_param(i596_debug, int, 0);
+MODULE_PARM_DESC(i596_debug, "82596 debug mask");
+
+static inline void CA(struct net_device *dev)
+{
+   struct i596_private *lp = netdev_priv(dev);
+   
+   writel(0, lp->ca);
+}
+
+
+static inline void MPU_PORT(struct net_device *dev, int c, dma_addr_t x)
+{
+   struct i596_private *lp = netdev_priv(dev);
+
+   u32 v = (u32) (c) | (u32) (x);
+   
+   if (lp->options & OPT_MPU_16BIT) {
+   writew(v & 0x, lp->mpu_port);
+   wmb(); udelay(1); /* order writes to MPU port */
+   writew(v >> 16, lp->mpu_port);
+   } else {
+   writel(v, lp->mpu_port);
+   wmb(); udelay(1); /* order writes to MPU port */
+   writel(v, lp->mpu_port);
+   }
+}
+
+
+static int __devinit sni_82596_probe(struct platform_device *dev)
+{
+   struct  net_device *netdevice;
+   struct i596_private *lp;
+   struct  resource *res, *ca, *idprom, *options;
+   int retval = -ENODEV;
+   static int init;
+   void __iomem *mpu_addr = NULL;
+   void __iomem *ca_addr = NULL;
+   u8 __iomem *eth_addr = NULL;
+   
+   if (init == 0) {
+   printk(KERN_INFO SNI_82596_DRIVER_VERSION "\n");
+   init++;
+   }
+   
+   res = platform_get_resource(dev, IORESOURCE_MEM, 0);
+   if (!res)
+   goto probe_failed;
+   mpu_addr = ioremap_nocache(res->start, 4);
+   if (!mpu_addr) {
+   retval = -ENOMEM;
+   goto probe_failed;
+   }
+   ca = platform_get_resource(dev, IORESOURCE_MEM, 1);
+   if (!ca)
+   goto probe_failed;
+   ca_addr = ioremap_nocache(ca->start, 4);
+   if (!ca_addr) {
+   retval = -ENOMEM;
+   goto probe_failed;
+   }
+   idprom = platform_get_resource(dev, IORESOURCE_MEM, 2);
+   if (!idprom)
+   goto probe_failed;
+   eth_addr = ioremap_nocache(idprom->start, 0x10);
+   if (!eth_addr) {
+   retval = -ENOMEM;
+   goto probe_failed;
+   }
+   options = platform_get_resource(dev, 0, 0);
+   if (!options)
+   goto probe_failed;
+
+   printk(KERN_INFO "Fo

Re: Linksys Gigabit USB2.0 adapter (asix) regression

2007-06-22 Thread Erik Slagter
David Hollis wrote:

>> To rule out the possibility of the nic being defective, I connected the
>> USB nic to a windows computer. There it works, although the ethernet
>> connection is a bit flaky (just like it seems...).
>>
>> Then I did a diff on the respective kernel sources of 2.6.20.3 and
>> 2.6.22-rc2 (asix.c and usbnet.c), I found a few changes, but they do not
>> seem to be related to my problem.
>>
>> I am at the end of my repertoire here, can anyone please make some
>> suggestions for further testing or even better, fix it ;-)
> 
> You wouldn't happen to know what PHY that device is using?  The AX88178
> (Gigabit USB Ethernet) support in the driver currently only supports the
> Marvell PHY, which is the only one I've actually encountered to-date.
> If you can rebuild the driver from your kernel sources but with DEBUG
> enabled (uncomment it at the top of asix.c)

No problem, I will do it on sunday. No need to build the driver
out-of-tree btw.

> After you build the module, load it with insmod ./asix.ko, plug in your
> device and send me the dmesg output.  I'm particularly interested in the
> PHYID=0x12345678 line.  That will tell me what PHY chip is being used in
> that device and if I need to add support for it.

Will do. Thanks!



Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-22 Thread Patrick McHardy
Patrick McHardy wrote:
> Waskiewicz Jr, Peter P wrote:
> 
>>Thought about this more last night and this morning.  As far as I can
>>tell, I still need this.  If the qdisc gets loaded with multiqueue
>>turned on, I can just use the value of band to assign
>>skb->queue_mapping.  But if the qdisc is loaded without multiqueue
>>support, then I need to assign a value of zero to queue_mapping, or not
>>assign it at all (it will be zero'd out before the call to ->enqueue()
>>in dev_queue_xmit()).  But I'd rather not have a conditional in the
>>hotpath checking if the qdisc is multiqueue; I'd rather have the array
>>to match the bands so I can just do an assignment.
>>
>>What do you think?
> 
> 
> 
> I very much doubt that it has any measurable impact. You can
> also add a small inline function
> 
> void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)


OK I didn't really listen obviously :) A compile time option
won't help. Just remove it and assign it conditionally.


Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-22 Thread Patrick McHardy
Patrick McHardy wrote:
> void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
> {
> #ifdef CONFIG_NET_SCH_MULTIQUEUE
>   skb->queue_mapping = queue;
> #else
>   skb->queue_mapping = 0;
> #endif


Maybe even use it everywhere and guard skb->queue_mapping by
an #ifdef, on 32 bit it does enlarge the skb.



Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-22 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>> #include 
>>>@@ -40,9 +42,13 @@
>>> struct prio_sched_data
>>> {
>>> int bands;
>>>+#ifdef CONFIG_NET_SCH_RR
>>>+int curband; /* for round-robin */
>>>+#endif
>>> struct tcf_proto *filter_list;
>>> u8  prio2band[TC_PRIO_MAX+1];
>>> struct Qdisc *queues[TCQ_PRIO_BANDS];
>>>+u16 band2queue[TC_PRIO_MAX + 1];
>>>  
>>
>>Why is this still here? It's a 1:1 mapping.
> 
> 
> Thought about this more last night and this morning.  As far as I can
> tell, I still need this.  If the qdisc gets loaded with multiqueue
> turned on, I can just use the value of band to assign
> skb->queue_mapping.  But if the qdisc is loaded without multiqueue
> support, then I need to assign a value of zero to queue_mapping, or not
> assign it at all (it will be zero'd out before the call to ->enqueue()
> in dev_queue_xmit()).  But I'd rather not have a conditional in the
> hotpath checking if the qdisc is multiqueue; I'd rather have the array
> to match the bands so I can just do an assignment.
> 
> What do you think?


I very much doubt that it has any measurable impact. You can
also add a small inline function

void skb_set_queue_mapping(struct sk_buff *skb, unsigned int queue)
{
#ifdef CONFIG_NET_SCH_MULTIQUEUE
	skb->queue_mapping = queue;
#else
	skb->queue_mapping = 0;
#endif
}


RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-22 Thread Waskiewicz Jr, Peter P
> >  #include 
> > @@ -40,9 +42,13 @@
> >  struct prio_sched_data
> >  {
> > int bands;
> > +#ifdef CONFIG_NET_SCH_RR
> > +   int curband; /* for round-robin */
> > +#endif
> > struct tcf_proto *filter_list;
> > u8  prio2band[TC_PRIO_MAX+1];
> > struct Qdisc *queues[TCQ_PRIO_BANDS];
> > +   u16 band2queue[TC_PRIO_MAX + 1];
> >   
> 
> Why is this still here? It's a 1:1 mapping.

Thought about this more last night and this morning.  As far as I can
tell, I still need this.  If the qdisc gets loaded with multiqueue
turned on, I can just use the value of band to assign
skb->queue_mapping.  But if the qdisc is loaded without multiqueue
support, then I need to assign a value of zero to queue_mapping, or not
assign it at all (it will be zero'd out before the call to ->enqueue()
in dev_queue_xmit()).  But I'd rather not have a conditional in the
hotpath checking if the qdisc is multiqueue; I'd rather have the array
to match the bands so I can just do an assignment.
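
To make the two alternatives concrete, here is a userspace sketch. The `sched` and `pkt` structures and the function names are simplified stand-ins invented for the example, not the qdisc code: one variant assigns from a table filled in at setup time, the other tests a flag on every packet.

```c
#include <stdint.h>

#define MAX_BANDS 16

struct pkt {
	uint16_t queue_mapping;
};

struct sched {
	int multiqueue;			/* hypothetical runtime flag */
	uint16_t band2queue[MAX_BANDS];	/* identity if multiqueue, else all 0 */
};

static void sched_init(struct sched *q, int multiqueue)
{
	int i;

	q->multiqueue = multiqueue;
	for (i = 0; i < MAX_BANDS; i++)
		q->band2queue[i] = multiqueue ? (uint16_t)i : 0;
}

/* Variant 1: table lookup, no branch in the hot path. */
static void enqueue_table(const struct sched *q, struct pkt *p, int band)
{
	p->queue_mapping = q->band2queue[band];
}

/* Variant 2: conditional assignment, no table. */
static void enqueue_cond(const struct sched *q, struct pkt *p, int band)
{
	p->queue_mapping = q->multiqueue ? (uint16_t)band : 0;
}
```

Both variants produce the same mapping; the trade-off is an extra array per qdisc against a well-predicted branch per packet.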

What do you think?

Thanks,
-PJ


Re: [IPROUTE]: Add nested compat attribute

2007-06-22 Thread Patrick McHardy
Patrick McHardy wrote:
>  extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len);
>  extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr *rta, int len);
> +extern int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, void **data, int len);


Same change as in the kernel patch, avoid cast in caller
and make the signature match existing ones better.

[IPROUTE]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.
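
The layout can be illustrated with a self-contained userspace sketch. It does not use libnetlink: `struct attr` is a stand-in for `struct rtattr`, and `msgbuf`, `put_attr()`, `nest_compat_begin()` and `nest_compat_end()` are hypothetical names that loosely follow what `addattr_l()`, `addattr_nest_compat()` and `addattr_nest_compat_end()` do in the patch below.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct rtattr: 16-bit length (header included), 16-bit type. */
struct attr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr))

struct msgbuf {
	unsigned char data[256];
	size_t used;
};

/* Append one attribute; returns a pointer to its header, or NULL if full. */
static struct attr *put_attr(struct msgbuf *m, uint16_t type,
			     const void *payload, size_t plen)
{
	size_t len = ATTR_HDRLEN + plen;
	struct attr *a;

	if (m->used + ATTR_ALIGN(len) > sizeof(m->data))
		return NULL;
	a = (struct attr *)(m->data + m->used);
	a->len = (uint16_t)len;
	a->type = type;
	if (plen)
		memcpy((unsigned char *)a + ATTR_HDRLEN, payload, plen);
	m->used += ATTR_ALIGN(len);
	return a;
}

/* Begin: compat attribute carrying the old structure, followed by an
 * empty nest header of the same type. */
static struct attr *nest_compat_begin(struct msgbuf *m, uint16_t type,
				      const void *old, size_t oldlen)
{
	struct attr *start = put_attr(m, type, old, oldlen);

	if (!start || !put_attr(m, type, NULL, 0))
		return NULL;
	return start;
}

/* End: the nest header sits right after the aligned old structure; both
 * lengths are fixed up to reach the current end of the buffer. */
static void nest_compat_end(struct msgbuf *m, struct attr *start)
{
	struct attr *nest = (struct attr *)((unsigned char *)start +
					    ATTR_ALIGN(start->len));
	unsigned char *tail = m->data + m->used;

	start->len = (uint16_t)(tail - (unsigned char *)start);
	nest->len = (uint16_t)(tail - (unsigned char *)nest);
}
```

Nested attributes are added with `put_attr()` between the begin and end calls; the outer length then covers the old structure and the whole nest.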

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 7d4cee8b6e0ddceda300ad1e4242cd068a722f0c
tree ea8b1245518d6ed352ef5d8e379588fa76981de9
parent cd71a8e07f57a74d52e62cc1fed39c03ad64bc08
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:13:08 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:13:08 +0200

 include/libnetlink.h |9 +
 lib/libnetlink.c |   46 ++
 2 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 49e248e..b67c5a5 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -39,15 +39,24 @@ extern int rtnl_send(struct rtnl_handle 
 extern int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data);
 extern int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data, int alen);
 extern int addraw_l(struct nlmsghdr *n, int maxlen, const void *data, int len);
+extern struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type);
+extern int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest);
+extern struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, const void *data, int len);
+extern int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *nest);
 extern int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data);
 extern int rta_addattr_l(struct rtattr *rta, int maxlen, int type, const void *data, int alen);
 
 extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len);
 extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr *rta, int len);
+extern int __parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, int len);
 
 #define parse_rtattr_nested(tb, max, rta) \
(parse_rtattr((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta)))
 
+#define parse_rtattr_nested_compat(tb, max, rta, data, len) \
+({ data = RTA_PAYLOAD(rta) >= len ? RTA_DATA(rta) : NULL; \
+   __parse_rtattr_nested_compat(tb, max, rta, len); })
+
 extern int rtnl_listen(struct rtnl_handle *, rtnl_filter_t handler,
   void *jarg);
 extern int rtnl_from_file(FILE *, rtnl_filter_t handler,
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 555dd5c..12883fe 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -527,6 +527,39 @@ int addraw_l(struct nlmsghdr *n, int max
return 0;
 }
 
+struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type)
+{
+   struct rtattr *nest = NLMSG_TAIL(n);
+
+   addattr_l(n, maxlen, type, NULL, 0);
+   return nest;
+}
+
+int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest)
+{
+   nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest;
+   return n->nlmsg_len;
+}
+
+struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type,
+  const void *data, int len)
+{
+   struct rtattr *start = NLMSG_TAIL(n);
+
+   addattr_l(n, maxlen, type, data, len);
+   addattr_nest(n, maxlen, type);
+   return start;
+}
+
+int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *start)
+{
+   struct rtattr *nest = (void *)start + NLMSG_ALIGN(start->rta_len);
+
+   start->rta_len = (void *)NLMSG_TAIL(n) - (void *)start;
+   addattr_nest_end(n, nest);
+   return n->nlmsg_len;
+}
+
 int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data)
 {
int len = RTA_LENGTH(4);
@@ -589,3 +622,16 @@ int parse_rtattr_byindex(struct

Re: [RTNETLINK]: Add nested compat attribute

2007-06-22 Thread Patrick McHardy
Patrick McHardy wrote:
>  extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr 
> *rta, int len);
> +extern int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
> +   struct rtattr *rta, void **data, int len);
>  


This version is a bit nicer because it avoids a cast
in the caller and makes the signature match the
existing functions better.

In the previous version the call would have looked like this:

	if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt,
				       sizeof(*qopt)))
		return -EINVAL;

now it's:

	if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt,
				       qopt, sizeof(*qopt)))
		return -EINVAL;
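
The difference between the two call forms above can be sketched in plain C. This is only an illustration under assumptions: demo_parse_fn, demo_parse_macro and struct demo_opt are hypothetical names, and the macro uses the standard comma operator where the patch uses a GNU statement expression. The point is that a function taking void **data forces the caller to cast the address of its typed pointer, while a macro can assign to the typed pointer directly.

```c
#include <stddef.h>

/* Hypothetical stand-in for a qdisc option structure. */
struct demo_opt {
	int bands;
};

/*
 * Function form: the result comes back through a void ** out-parameter.
 * A caller holding a "struct demo_opt *" would have to pass
 * (void **)&ptr or (void *)&ptr, which is the cast the thread mentions.
 */
int demo_parse_fn(void *payload, size_t payload_len, void **data, size_t len)
{
	if (payload_len < len)
		return -1;
	*data = payload;
	return 0;
}

/*
 * Macro form: assigns to the caller's typed pointer directly, so no cast
 * is needed. Note that "data" is evaluated more than once, so it should
 * be a plain variable without side effects.
 */
#define demo_parse_macro(payload, payload_len, data, len) \
	((data) = (payload_len) >= (len) ? (payload) : NULL, \
	 (data) ? 0 : -1)
```

With the macro, a call such as demo_parse_macro(&opt, sizeof(opt), qopt, sizeof(*qopt)) leaves qopt set to NULL and returns -1 when the payload is too short, which mirrors the behaviour of the macro in the patch.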



[RTNETLINK]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 1e16b5e521172515c53ce96a0bc3cf0e2d77c001
tree c9880c58391e2df77ecab3d9b6a6849947714eb3
parent c4edf5d552b1450d903a7e7e2d846f2169087e10
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:06:54 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 19:06:54 +0200

 include/linux/rtnetlink.h |   18 ++
 net/core/rtnetlink.c  |   14 ++
 2 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6127858..d40b0c9 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -570,10 +570,16 @@ static __inline__ int rtattr_strcmp(cons
 }
 
 extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len);
+extern int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
+   struct rtattr *rta, int len);
 
 #define rtattr_parse_nested(tb, max, rta) \
rtattr_parse((tb), (max), RTA_DATA((rta)), RTA_PAYLOAD((rta)))
 
+#define rtattr_parse_nested_compat(tb, max, rta, data, len) \
+({ data = RTA_PAYLOAD(rta) >= len ? RTA_DATA(rta) : NULL; \
+   __rtattr_parse_nested_compat(tb, max, rta, len); })
+
 extern int rtnetlink_send(struct sk_buff *skb, u32 pid, u32 group, int echo);
 extern int rtnl_unicast(struct sk_buff *skb, u32 pid);
 extern int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group,
@@ -638,6 +644,18 @@ #define RTA_NEST_END(skb, start) \
 ({ (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
(skb)->len; })
 
+#define RTA_NEST_COMPAT(skb, type, attrlen, data) \
+({ struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \
+   RTA_PUT(skb, type, attrlen, data); \
+   RTA_NEST(skb, type); \
+   __start; })
+
+#define RTA_NEST_COMPAT_END(skb, start) \
+({ struct rtattr *__nest = (void *)(start) + NLMSG_ALIGN((start)->rta_len); \
+   (start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
+   RTA_NEST_END(skb, __nest); \
+   (skb)->len; })
+
 #define RTA_NEST_CANCEL(skb, start) \
 ({ if (start) \
skb_trim(skb, (unsigned char *) (start) - (skb)->data); \
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 06c0c5a..54c17e4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -97,6 +97,19 @@ int rtattr_parse(struct rtattr *tb[], in
return 0;
 }
 
+int __rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
+struct rtattr *rta, int len)
+{
+   if (RTA_PAYLOAD(rta) < len)
+   return -1;
+   if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) {
+   rta = RTA_DATA(rta) + RTA_ALIGN(len);
+   return rtattr_parse_nested(tb, maxattr, rta);
+   }
+   memset(tb, 0, sizeof(struct rtattr *) * maxattr);
+   return 0;
+}
+
 static struct rtnl_link *rtnl_msg_handlers[NPROTO];
 
 static inline int rtm_msgindex(int msgtype)
@@ -1297,6 +1310,7 @@ void __init rtnetlink_init(void)
 EXPORT_SYMBOL(__rta_fill);
 EXPORT_SYMBOL(rtattr_strlcpy);
 EXPORT_SYMBOL(rtattr_parse);
+EXPORT_SYMBOL(__rtattr_pa

Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3

2007-06-22 Thread Oliver Hartkopp
Patrick McHardy wrote:
> Oliver Hartkopp wrote:
>
>>
>> Is it the right approach to let netif_receive_skb() set the iif-value or
>> should we better set this value on our own before invoking netif_rx()?
>>   
>
> netif_receive_skb is meant to be used as a default, the driver can
> override this if it makes sense. If you touch it anyway you might
> as well set it to the final value.

The CAN bus is really not a highly sophisticated network technology, so
it does not need more than the default internal network transport
mechanics the Linux kernel already provides in an excellent manner.

I also thought about setting skb->iif myself to ensure the correct value
is set - maybe Jamal also has an opinion on this. The CAN bus only
transports CAN-frames with an 11/29 bit CAN-Identifier (for CSMA/CA
arbitration) and up to 8 bytes of payload. There is no space for VLANs
and other addressing schemes that are known from Ethernet or other
network media. So in contrast to all the fancy VLANs, routing, filtering,
NAT and whatever, CAN is really dumb ;-)
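
To make the size limits above concrete, here is a minimal sketch of a frame with an 11- or 29-bit identifier and at most 8 payload bytes. All names (struct demo_can_frame, demo_can_fill, the DEMO_* constants) are hypothetical and chosen for illustration; this is not the structure used by the PF_CAN patches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CAN_SFF_MASK 0x7FFu	/* 11-bit identifier */
#define DEMO_CAN_EFF_MASK 0x1FFFFFFFu	/* 29-bit identifier */
#define DEMO_CAN_MAX_DLEN 8		/* at most 8 payload bytes */

/* Hypothetical frame layout, for illustration only. */
struct demo_can_frame {
	uint32_t id;		/* 11- or 29-bit CAN identifier */
	uint8_t extended;	/* non-zero if id uses the 29-bit format */
	uint8_t len;		/* payload length, 0..8 */
	uint8_t data[DEMO_CAN_MAX_DLEN];
};

/* Fill a frame; returns 0 on success, -1 if id or length do not fit. */
int demo_can_fill(struct demo_can_frame *f, uint32_t id, int extended,
		  const void *data, size_t len)
{
	uint32_t mask = extended ? DEMO_CAN_EFF_MASK : DEMO_CAN_SFF_MASK;

	if ((id & ~mask) != 0 || len > DEMO_CAN_MAX_DLEN)
		return -1;

	memset(f, 0, sizeof(*f));
	f->id = id;
	f->extended = extended ? 1 : 0;
	f->len = (uint8_t)len;
	if (len)
		memcpy(f->data, data, len);
	return 0;
}
```

Apart from the identifier and the bus it arrived on, there is nothing in such a frame that could carry addressing information, which is why the receiving interface matters to the application.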

Regards,
Oliver




Re: [patch] alpha: fix alignment problem in csum_ipv6_magic()

2007-06-22 Thread Dustin Marquess

Ivan Kokshaysky wrote:
> On Thu, Jun 21, 2007 at 04:35:01PM -0700, Andrew Morton wrote:
>> In http://bugzilla.kernel.org/show_bug.cgi?id=8659, Dustin is reporting
>> that this patch broke tcp-on-ipv6.
>
> Oops. Two instructions operating on the 'len' arg ($18) got swapped...
> This should fix ev6 version, ev5 one seems to be ok.
>
> Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>
>
> Ivan.
>
> --- 2.6.22-rc4-mm2/arch/alpha/lib/ev6-csum_ipv6_magic.S	Fri Jun 22 15:02:23 2007
> +++ linux/arch/alpha/lib/ev6-csum_ipv6_magic.S	Fri Jun 22 15:05:38 2007
> @@ -76,18 +76,18 @@ csum_ipv6_magic:
>  
>  	cmoveq	$6,$31,$22	# E : src aligned?
>  	ldq_u	$23,15($17)	# L : Latency: 3
> -	or	$18,$4,$18	# E : 00CCDDAABBCC
> -	extql	$1,$6,$1	# U : U L L U :
> +	inswl	$18,3,$18	# U : 00CCDD00
> +	addl	$19,$7,$19	# E : U L U L : bbaabb00
>  
>  	or	$0,$22,$0	# E : 1st src word complete
> -	extqh	$5,$6,$5	# U :
> -	addl	$19,$7,$19	# E : bbaabb00
> -	and	$17,7,$6	# E : L U L U : dst misalignment
> +	extql	$1,$6,$1	# U :
> +	or	$18,$4,$18	# E : 00CCDDAABBCC
> +	extqh	$5,$6,$5	# U : L U L U
>  
> -	inswl	$18,3,$18	# U : 00CCDD00
> -	or	$1,$5,$1	# E : 2nd src word complete
> +	and	$17,7,$6	# E : dst misalignment
>  	extql	$2,$6,$2	# U :
> -	extqh	$3,$6,$22	# U : U L U U :
> +	or	$1,$5,$1	# E : 2nd src word complete
> +	extqh	$3,$6,$22	# U : L U L U :
>  
>  	cmoveq	$6,$31,$22	# E : dst aligned?
>  	extql	$3,$6,$3	# U :




Awesome! Works like a champ! Thank you guys so very much! You rock!

-Dustin


[PATCH]: Nested compat attribute sch_prio example

2007-06-22 Thread Patrick McHardy

These two patches contain some example code showing how to use
the nested compat attribute in sch_prio.


[NET_SCHED]: sch_prio: nested compat attribute test

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 2ceff1c312a93446c95c41c3a54245a15278fe07
tree 7335b4957440cec27894dd38f3a707b4344f21ea
parent dece87e23c7cfa1159d3be0ea5b0db89a0fc5872
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:10:44 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:10:44 +0200

 include/linux/pkt_sched.h |9 +
 net/sched/sch_prio.c  |   15 ---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 2599d39..0bedabe 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -101,6 +101,15 @@ struct tc_prio_qopt
 	__u8	priomap[TC_PRIO_MAX+1];	/* Map: logical priority -> PRIO band */
 };
 
+enum
+{
+	TCA_PRIO_UNSPEC,
+	TCA_PRIO_TEST,
+	__TCA_PRIO_MAX
+};
+
+#define TCA_PRIO_MAX	(__TCA_PRIO_MAX - 1)
+
 /* TBF section */
 
 struct tc_tbf_qopt
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 6d7542c..40a13e8 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -198,10 +198,12 @@ prio_destroy(struct Qdisc* sch)
 static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
-	struct tc_prio_qopt *qopt = RTA_DATA(opt);
+	struct tc_prio_qopt *qopt;
+	struct rtattr *tb[TCA_PRIO_MAX];
 	int i;
 
-	if (opt->rta_len < RTA_LENGTH(sizeof(*qopt)))
+	if (rtattr_parse_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt,
+   sizeof(*qopt)))
 		return -EINVAL;
 	if (qopt->bands > TCQ_PRIO_BANDS || qopt->bands < 2)
 		return -EINVAL;
@@ -211,6 +213,9 @@ static int prio_tune(struct Qdisc *sch, 
 			return -EINVAL;
 	}
 
+	if (tb[TCA_PRIO_TEST-1])
+		printk("TCA_PRIO_TEST: %u\n", *(u32 *)RTA_DATA(tb[TCA_PRIO_TEST-1]));
+
 	sch_tree_lock(sch);
 	q->bands = qopt->bands;
 	memcpy(q->prio2band, qopt->priomap, TC_PRIO_MAX+1);
@@ -268,11 +273,15 @@ static int prio_dump(struct Qdisc *sch, 
 {
 	struct prio_sched_data *q = qdisc_priv(sch);
 	unsigned char *b = skb_tail_pointer(skb);
+	struct rtattr *nest;
 	struct tc_prio_qopt opt;
 
 	opt.bands = q->bands;
 	memcpy(&opt.priomap, q->prio2band, TC_PRIO_MAX+1);
-	RTA_PUT(skb, TCA_OPTIONS, sizeof(opt), &opt);
+
+	nest = RTA_NEST_COMPAT(skb, TCA_OPTIONS, sizeof(opt), &opt);
+	RTA_PUT_U32(skb, TCA_PRIO_TEST, 321);
+	RTA_NEST_COMPAT_END(skb, nest);
 	return skb->len;
 
 rtattr_failure:
[IPROUTE]: sch_prio: nested compat attribute test

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 37e8758e1238bf26172f25a0e3b0dec9c8c4f986
tree c79e320def8f4c5a5ed7f037f3ca6ec68487b375
parent d283ea3c852f54941ec785ad39dbfa4586f518c7
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:00:04 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 18:00:04 +0200

 include/linux/pkt_sched.h |9 +
 tc/q_prio.c   |   13 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index dc61a85..77eaab1 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -101,6 +101,15 @@ struct tc_prio_qopt
 	__u8	priomap[TC_PRIO_MAX+1];	/* Map: logical priority -> PRIO band */
 };
 
+enum
+{
+	TCA_PRIO_UNSPEC,
+	TCA_PRIO_TEST,
+	__TCA_PRIO_MAX
+};
+
+#define TCA_PRIO_MAX	(__TCA_PRIO_MAX - 1)
+
 /* TBF section */
 
 struct tc_tbf_qopt
diff --git a/tc/q_prio.c b/tc/q_prio.c
index d696e1b..4934416 100644
--- a/tc/q_prio.c
+++ b/tc/q_prio.c
@@ -40,6 +40,7 @@ static int prio_parse_opt(struct qdisc_u
 	int pmap_mode = 0;
 	int idx = 0;
 	struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }};
+	struct rtattr *nest;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "bands") == 0) {
@@ -90,7 +91,9 @@ static int prio_parse_opt(struct qdisc_u
 			opt.priomap[idx] = opt.priomap[TC_PRIO_BESTEFFORT];
 	}
 */
-	addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+	nest = addattr_nest_compat(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+	addattr32(n, 1024, TCA_PRIO_TEST, 123);
+	addattr_nest_compat_end(n, nest);
 	return 0;
 }
 
@@ -98,16 +101,20 @@ int prio_print_opt(struct qdisc_util *qu
 {
 	int i;
 	struct tc_prio_qopt *qopt;
+	struct rtattr *tb[TCA_PRIO_MAX+1];
 
 	if (opt == NULL)
 		return 0;
 
-	if (RTA_PAYLOAD(opt)  < sizeof(*qopt))
+	if (parse_rtattr_nested_compat(tb, TCA_PRIO_MAX, opt, (void *)&qopt, sizeof(*qopt)))
 		return -1;
-	qopt = RTA_DATA(opt);
+
 	fprintf(f, "bands %u priomap ", qopt->bands);
 	for (i=0; i<=TC_PRIO_MAX; i++)
 		fprintf(f, " %d", qopt->priomap[i]);
+
+	if (tb[TCA_PRIO_TEST])
+		fprintf(f, " TCA_PRIO_TEST: %u ", *(__u32 *)RTA_DATA(tb[TCA_PRIO_TEST]));
 	return 0;
 }
 


[RTNETLINK]: Add nested compat attribute

2007-06-22 Thread Patrick McHardy

This patch adds a new attribute type that can be used
to replace non-nested attributes that contain structures
with nested ones in a compatible way.

This can be used in cases like Peter's, who is trying to
extend sch_prio, which currently uses a fixed structure
without any holes.

Switching to nested attributes makes sure that the next
person won't run into the same problem.
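
The layout described in the commit message below can be modelled in a few lines of stand-alone userspace C. This is a sketch under assumptions, not the kernel or iproute code: struct demo_attr stands in for struct rtattr, DEMO_ALIGN for RTA_ALIGN, struct demo_old_opt for the old fixed structure, and every demo_* name is hypothetical. It shows that an old reader, which only checks for a minimum payload size, still finds its structure at the front, while a new reader also walks the nested attributes placed behind it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for struct rtattr: len covers header plus payload. */
struct demo_attr {
	uint16_t len;
	uint16_t type;
};

#define DEMO_ALIGN(n)	(((size_t)(n) + 3u) & ~(size_t)3u)
#define DEMO_HDR	DEMO_ALIGN(sizeof(struct demo_attr))
#define DEMO_BUF_MAX	64

/* The fixed structure that old readers expect as the whole payload. */
struct demo_old_opt {
	uint32_t bands;
	uint8_t priomap[16];
};

enum { DEMO_ATTR_UNSPEC, DEMO_ATTR_TEST };
#define DEMO_TYPE_OPTIONS 2

/* Headers are copied with memcpy so no buffer alignment is assumed. */
static void put_hdr(unsigned char *p, uint16_t len, uint16_t type)
{
	struct demo_attr a = { len, type };
	memcpy(p, &a, sizeof(a));
}

static struct demo_attr get_hdr(const unsigned char *p)
{
	struct demo_attr a;
	memcpy(&a, p, sizeof(a));
	return a;
}

/* Writer: old structure first, then (optionally) a nest with one u32. */
size_t demo_build(unsigned char buf[DEMO_BUF_MAX],
		  const struct demo_old_opt *opt,
		  int with_nest, uint32_t test_value)
{
	size_t nest = DEMO_HDR + DEMO_ALIGN(sizeof(*opt));
	size_t off = nest;

	memset(buf, 0, DEMO_BUF_MAX);
	memcpy(buf + DEMO_HDR, opt, sizeof(*opt));
	if (with_nest) {
		off += DEMO_HDR;	/* room for the nest header */
		put_hdr(buf + off, (uint16_t)(DEMO_HDR + sizeof(test_value)),
			DEMO_ATTR_TEST);
		memcpy(buf + off + DEMO_HDR, &test_value, sizeof(test_value));
		off += DEMO_ALIGN(DEMO_HDR + sizeof(test_value));
		/* the nest length is only known once its children exist */
		put_hdr(buf + nest, (uint16_t)(off - nest), DEMO_TYPE_OPTIONS);
	}
	put_hdr(buf, (uint16_t)off, DEMO_TYPE_OPTIONS);
	return off;
}

/* Old reader: accepts any payload at least as large as the structure. */
int demo_parse_old(const unsigned char *buf, struct demo_old_opt *out)
{
	struct demo_attr a = get_hdr(buf);

	if (a.len < DEMO_HDR + sizeof(*out))
		return -1;
	memcpy(out, buf + DEMO_HDR, sizeof(*out));
	return 0;
}

/* New reader: same check, then walks the nest if there is room for one. */
int demo_parse_compat(const unsigned char *buf, struct demo_old_opt *out,
		      uint32_t *test_value, int *has_test)
{
	struct demo_attr a = get_hdr(buf), n, c;
	size_t nest_off = DEMO_HDR + DEMO_ALIGN(sizeof(*out));
	size_t off, end;

	*has_test = 0;
	if (demo_parse_old(buf, out) < 0)
		return -1;
	if (a.len < nest_off + DEMO_HDR)
		return 0;	/* old-style sender, no nested part */

	n = get_hdr(buf + nest_off);
	end = nest_off + n.len;
	if (n.len < DEMO_HDR || end > a.len)
		return -1;
	for (off = nest_off + DEMO_HDR; off + DEMO_HDR <= end;
	     off += DEMO_ALIGN(c.len)) {
		c = get_hdr(buf + off);
		if (c.len < DEMO_HDR || off + c.len > end)
			return -1;
		if (c.type == DEMO_ATTR_TEST &&
		    c.len == DEMO_HDR + sizeof(*test_value)) {
			memcpy(test_value, buf + off + DEMO_HDR,
			       sizeof(*test_value));
			*has_test = 1;
		}
	}
	return 0;
}
```

In this model a message built with the nested part is 36 bytes long (4-byte header, 20-byte structure, 4-byte nest header, 8-byte nested attribute). demo_parse_old() reads the first 24 bytes and ignores the rest, which is the compatibility property the patch relies on.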


[RTNETLINK]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit dece87e23c7cfa1159d3be0ea5b0db89a0fc5872
tree c14be602de94b258e0343816b6c1809233a2ff5f
parent c4edf5d552b1450d903a7e7e2d846f2169087e10
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:52:21 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:52:21 +0200

 include/linux/rtnetlink.h |   14 ++
 net/core/rtnetlink.c  |   16 
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6127858..6731e7f 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -570,6 +570,8 @@ static __inline__ int rtattr_strcmp(cons
 }
 
 extern int rtattr_parse(struct rtattr *tb[], int maxattr, struct rtattr *rta, int len);
+extern int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
+  struct rtattr *rta, void **data, int len);
 
 #define rtattr_parse_nested(tb, max, rta) \
 	rtattr_parse((tb), (max), RTA_DATA((rta)), RTA_PAYLOAD((rta)))
@@ -638,6 +640,18 @@ #define RTA_NEST_END(skb, start) \
 ({	(start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
 	(skb)->len; })
 
+#define RTA_NEST_COMPAT(skb, type, attrlen, data) \
+({	struct rtattr *__start = (struct rtattr *)skb_tail_pointer(skb); \
+	RTA_PUT(skb, type, attrlen, data); \
+	RTA_NEST(skb, type); \
+	__start; })
+
+#define RTA_NEST_COMPAT_END(skb, start) \
+({	struct rtattr *__nest = (void *)(start) + NLMSG_ALIGN((start)->rta_len); \
+	(start)->rta_len = skb_tail_pointer(skb) - (unsigned char *)(start); \
+	RTA_NEST_END(skb, __nest); \
+	(skb)->len; })
+
 #define RTA_NEST_CANCEL(skb, start) \
 ({	if (start) \
 		skb_trim(skb, (unsigned char *) (start) - (skb)->data); \
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 06c0c5a..c25d23b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -97,6 +97,21 @@ int rtattr_parse(struct rtattr *tb[], in
 	return 0;
 }
 
+int rtattr_parse_nested_compat(struct rtattr *tb[], int maxattr,
+			   struct rtattr *rta, void **data, int len)
+{
+	if (RTA_PAYLOAD(rta) < len)
+		return -1;
+	*data = RTA_DATA(rta);
+
+	if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) {
+		rta = RTA_DATA(rta) + RTA_ALIGN(len);
+		return rtattr_parse_nested(tb, maxattr, rta);
+	}
+	memset(tb, 0, sizeof(struct rtattr *) * maxattr);
+	return 0;
+}
+
 static struct rtnl_link *rtnl_msg_handlers[NPROTO];
 
 static inline int rtm_msgindex(int msgtype)
@@ -1297,6 +1312,7 @@ void __init rtnetlink_init(void)
 EXPORT_SYMBOL(__rta_fill);
 EXPORT_SYMBOL(rtattr_strlcpy);
 EXPORT_SYMBOL(rtattr_parse);
+EXPORT_SYMBOL(rtattr_parse_nested_compat);
 EXPORT_SYMBOL(rtnetlink_put_metrics);
 EXPORT_SYMBOL(rtnl_lock);
 EXPORT_SYMBOL(rtnl_trylock);


[IPROUTE]: Add nested compat attribute

2007-06-22 Thread Patrick McHardy

This patch adds a new attribute type that can be used
to replace non-nested attributes that contain structures
with nested ones in a compatible way.

This can be used in cases like Peter's, who is trying to
extend sch_prio, which currently uses a fixed structure
without any holes.

Switching to nested attributes makes sure that the next
person won't run into the same problem.


[IPROUTE]: Add nested compat attribute

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
	[ compat contents ]
	struct rtattr {
		.rta_len	= total size,
		.rta_type	= type,
	} rta;
	struct old_structure struct;

	[ nested top-level attribute ]
	struct rtattr {
		.rta_len	= nest size,
		.rta_type	= type,
	} nest_attr;

	[ optional 0 .. n nested attributes ]
	struct rtattr {
		.rta_len	= private attribute len,
		.rta_type	= private attribute type,
	} nested_attr;
	struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit d283ea3c852f54941ec785ad39dbfa4586f518c7
tree d2300d75cf50897386e670591fa963acd2bbd21b
parent cd71a8e07f57a74d52e62cc1fed39c03ad64bc08
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:59:27 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 17:59:27 +0200

 include/libnetlink.h |5 +
 lib/libnetlink.c |   48 
 2 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/include/libnetlink.h b/include/libnetlink.h
index 49e248e..bd426e6 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -39,11 +39,16 @@ extern int rtnl_send(struct rtnl_handle 
 extern int addattr32(struct nlmsghdr *n, int maxlen, int type, __u32 data);
 extern int addattr_l(struct nlmsghdr *n, int maxlen, int type, const void *data, int alen);
 extern int addraw_l(struct nlmsghdr *n, int maxlen, const void *data, int len);
+extern struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type);
+extern int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest);
+extern struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type, const void *data, int len);
+extern int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *nest);
 extern int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data);
 extern int rta_addattr_l(struct rtattr *rta, int maxlen, int type, const void *data, int alen);
 
 extern int parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len);
 extern int parse_rtattr_byindex(struct rtattr *tb[], int max, struct rtattr *rta, int len);
+extern int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta, void **data, int len);
 
 #define parse_rtattr_nested(tb, max, rta) \
 	(parse_rtattr((tb), (max), RTA_DATA(rta), RTA_PAYLOAD(rta)))
diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 555dd5c..2add7e9 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -527,6 +527,39 @@ int addraw_l(struct nlmsghdr *n, int max
 	return 0;
 }
 
+struct rtattr *addattr_nest(struct nlmsghdr *n, int maxlen, int type)
+{
+	struct rtattr *nest = NLMSG_TAIL(n);
+
+	addattr_l(n, maxlen, type, NULL, 0);
+	return nest;
+}
+
+int addattr_nest_end(struct nlmsghdr *n, struct rtattr *nest)
+{
+	nest->rta_len = (void *)NLMSG_TAIL(n) - (void *)nest;
+	return n->nlmsg_len;
+}
+
+struct rtattr *addattr_nest_compat(struct nlmsghdr *n, int maxlen, int type,
+   const void *data, int len)
+{
+	struct rtattr *start = NLMSG_TAIL(n);
+
+	addattr_l(n, maxlen, type, data, len);
+	addattr_nest(n, maxlen, type);
+	return start;
+}
+
+int addattr_nest_compat_end(struct nlmsghdr *n, struct rtattr *start)
+{
+	struct rtattr *nest = (void *)start + NLMSG_ALIGN(start->rta_len);
+
+	start->rta_len = (void *)NLMSG_TAIL(n) - (void *)start;
+	addattr_nest_end(n, nest);
+	return n->nlmsg_len;
+}
+
 int rta_addattr32(struct rtattr *rta, int maxlen, int type, __u32 data)
 {
 	int len = RTA_LENGTH(4);
@@ -589,3 +622,18 @@ int parse_rtattr_byindex(struct rtattr *
 		fprintf(stderr, "!!!Deficit %d, rta_len=%d\n", len, rta->rta_len);
 	return i;
 }
+
+int parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rta,
+			   void **data, int len)
+{
+	if (RTA_PAYLOAD(rta) < len)
+		return -1;
+	*data = RTA_DATA(rta);
+
+	if (RTA_PAYLOAD(rta) >= RTA_ALIGN(len) + sizeof(struct rtattr)) {
+		rta = RTA_DATA(rta) + RTA_ALIGN(len);
+		return parse_rtattr_nested(tb, max, rta);
+	}
+	memset(tb, 0, sizeof(struct rtattr *) * max);
+	return 0;
+}


Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3

2007-06-22 Thread Patrick McHardy

Oliver Hartkopp wrote:
> Patrick McHardy wrote:
>> Urs Thuermann wrote:
>>> * Use skb->iif instead of skb->cb to pass receiving interface from
>>>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
>>
>> skb->iif doesn't necessarily point to the incoming network device
>> as seen by netif_receive_skb, for layered devices it currently
>> always points to the first interface that received a packet.
>
> This is exactly the intention.
>
>> It's so far also only used for traffic classification, please explain
>> how you're using it and what values it is set to on which paths.
>
> As you might have seen in Documentation/networking/can.txt (hint, hint,
> hint!) the CAN has no routing, no ARP, no MAC addressing and is a
> broadcast only medium. So if there is (at least) any reasonable
> addressing on CAN it consists of the CAN-frame's "CAN-Identifier" and
> the CAN-bus this CAN-frame is sent/received on.
>
> For this reason the information about the interface the CAN-frame has
> been received on has to be made available to the user-application if it
> needs this information. Until your hint about our skb->cb misuse, we
> (successfully) transported this information inside skb->cb to
> socket-level. But indeed skb->iif is the better (and in our opinion the
> right) place to transport this information inside the skb to the
> socket-level.

Let's hear Jamal's opinion on this, to be honest I never understood
how exactly it is supposed to be used.

> In both cases (receiving real CAN-frames from the CAN-netdev /
> performing the loopback of CAN-frames) we set skb->iif to zero to let
> netif_receive_skb() set the iif-value to the current skb->dev index. So
> skb->iif is set to the first interface the CAN-frame is received on,
> which is what we need & intended here.
>
> Is it the right approach to let netif_receive_skb() set the iif-value or
> should we better set this value on our own before invoking netif_rx()?

netif_receive_skb is meant to be used as a default, the driver can
override this if it makes sense. If you touch it anyway you might
as well set it to the final value.
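
The default behaviour discussed in this thread can be sketched as a small model. This is an illustration of the semantics as described by the participants (iif is filled in only when it is still zero, so it keeps the first receiving interface for layered devices), not the kernel's netif_receive_skb(); struct demo_dev, struct demo_skb and both functions are hypothetical.

```c
/* Hypothetical, minimal stand-ins for a net device and a packet buffer. */
struct demo_dev {
	int ifindex;
};

struct demo_skb {
	struct demo_dev *dev;	/* device currently handling the packet */
	int iif;		/* 0 means "not set yet" */
};

/* Default described in the thread: fill iif only when it is unset. */
void demo_receive(struct demo_skb *skb)
{
	if (!skb->iif)
		skb->iif = skb->dev->ifindex;
}

/*
 * A layered device receives the packet on the lower device, re-targets
 * it to the upper device and passes it through the receive path again.
 */
void demo_receive_layered(struct demo_skb *skb, struct demo_dev *upper)
{
	demo_receive(skb);
	skb->dev = upper;
	demo_receive(skb);
}
```

In this model a driver that sets iif itself before the first receive keeps its value, and a packet that travels through a layered device still reports the index of the first interface, which matches both Patrick's description and Oliver's stated intention.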



Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3

2007-06-22 Thread Oliver Hartkopp
Patrick McHardy wrote:
> Urs Thuermann wrote:
>   
>> * Use skb->iif instead of skb->cb to pass receiving interface from
>>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
>> 
>
>
> skb->iif doesn't necessarily point to the incoming network device
> as seen seen by netif_receive_skb, for layered devices it currently
> always points to the first interface that received a packet.
>   

This is exactly the intention.

> Its so far also only used for traffic classification, please explain
> how you're using it and what values it is set to on which paths.
>   

As you might have seen in Documentation/networking/can.txt (hint, hint,
hint!) the CAN has no routing, no ARP, no MAC addressing and is a
broadcast only medium. So if there is any reasonable addressing on CAN
at all, it consists of the CAN-frame's "CAN-Identifier" and
the CAN-bus this CAN-frame is sent/received on.

For this reason the information about the interface the CAN-frame has
been received on has to be made available to the user-application if it
needs this information. Until your hint about our skb->cb misuse, we
(successfully) transported this information inside skb->cb to
socket-level. But indeed skb->iif is the better (and in our opinion the
right) place to transport this information inside the skb to the
socket-level.

In both cases (receiving real CAN-frames from the CAN-netdev /
performing the loopback of CAN-frames) we set skb->iif to zero to let
netif_receive_skb() set the iif-value to the current skb->dev index. So
skb->iif is set to the first interface the CAN-frame is received on,
which is what we need & intended here.

Is it the right approach to let netif_receive_skb() set the iif-value or
should we better set this value on our own before invoking netif_rx()?

Best regards,
Oliver




Re: Linksys Gigabit USB2.0 adapter (asix) regression

2007-06-22 Thread David Hollis
On Wed, 2007-06-20 at 13:56 +0200, Erik Slagter wrote:

> To rule out the possibility of the nic being defective, I connected the
> USB nic to a windows computer. There it works, although the ethernet
> connection is a bit flaky (just like it seems...).
> 
> Then I did a diff on the respective kernel sources of 2.6.20.3 and
> 2.6.22-rc2 (asix.c and usbnet.c), I found a few changes, but they do not
> seem to be related to my problem.
> 
> I am the and of my repertoire here, can anyone please do some
> suggestions for further testing or even better, fix it ;-)

You wouldn't happen to know what PHY that device is using?  The AX88178
(Gigabit USB Ethernet) support in the driver currently only supports the
Marvell PHY, which is the only one I've actually encountered to-date.
Could you rebuild the driver from your kernel sources with DEBUG
enabled (uncomment it at the top of asix.c)?


You can build the driver out-of-tree by creating a Makefile with these
contents:

obj-m   += asix.o

EXTRA_CFLAGS += -DDEBUG

all:
	make -C /lib/modules/`uname -r`/build SUBDIRS=`pwd`

clean:
	make -C /lib/modules/`uname -r`/build SUBDIRS=`pwd` clean


(You'll also need to copy usbnet.h into that directory)


After you build the module, load it with insmod ./asix.ko, plug in your
device and send me the dmesg output.  I'm particularly interested in the
PHYID=0x12345678 line.  That will tell me what PHY chip is being used in
that device and if I need to add support for it.

-- 
David Hollis <[EMAIL PROTECTED]>



Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-22 Thread Jarek Poplawski
On Fri, Jun 22, 2007 at 10:56:44AM +0200, Marcin Ślusarz wrote:
...
> When I disable on-board network card in BIOS (controlled by skge)
> ne2k-pci card is still locking up. So I think it's strictly ne2k-pci
> card bug. I made some tests and I know how to reproduce it fast (on my
> machine) - just make some heavy network traffic...
...

I'm no good at hardware, but I guess this log might not be enough.
So, if nobody finds something more sensible, maybe you can try
some of these suggestions:

- you've written it was OK with 2.6.20; it would be interesting
  to check if there were any changes in config (besides new options)
  or even retry 2.6.20 with the "current" config after make oldconfig;
- during such problems it's better to turn off as many
  unnecessary options/drivers as possible to find out if it's really
  about the network driver; e.g.: no SMP, tv cards, acpi - only
  basic, without options etc.;
- if possible try it with a newer kernel, e.g. 2.6.22-rc5;
- if possible try it with another, fresh distro (e.g. some live
  CD/DVD/USB bootable);
- there was a lockdep warning from tvtime/bttv;
- try to get some more debugging (help: modinfo ne2k-pci).

Regards,
Jarek P.

PS: for anybody interested - here is the beginning of this story:
http://marc.info/?l=linux-kernel&m=118202978609968&w=2


Re: [BUG] Sky2 driver in 2.6.22-rc5-git1-cfs-v17

2007-06-22 Thread Ian Kumlien
On tor, 2007-06-21 at 21:13 -0700, Stephen Hemminger wrote:
> On Fri, 22 Jun 2007 04:45:25 +0200
> Ian Kumlien <[EMAIL PROTECTED]> wrote:
> 
> > On tor, 2007-06-21 at 18:57 -0700, Stephen Hemminger wrote:
> > > Redirected of LKML,  netdev is the proper list.
> > 
> > Thanks =)
> > 
> > > On Thu, 21 Jun 2007 22:51:32 +0200
> > > Ian Kumlien <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hi, 
> > > > 
> > > > recently have started to see this in my dmesg:
> > > > 
> > > > NETDEV WATCHDOG: eth0: transmit timed out
> > > > sky2 eth0: tx timeout
> > > > sky2 eth0: transmit ring 449 .. 408 report=449 done=449
> > > > sky2 eth0: disabling interface
> > > > sky2 eth0: enabling interface
> > > > sky2 eth0: ram buffer 48K
> > > > sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
> > > > 
> > > > I'm not using MSI since it seems to have caused problems in the past.
> > > > 
> > > > I run with a 9k mtu
> > > > 
> > > > sky2 eth0: transmit ring 18 .. 489 report=18 done=18
> > > >  I assume ring max is 512 (ie 1-512) since:
> > > > Ring parameters for eth0:
> > > > Current hardware settings:
> > > > RX: 168
> > > > RX Mini:0
> > > > RX Jumbo:   0
> > > > TX: 511
> > > > 
> > > > And 489 + 41 - 18 = 512
> > > > 
> > > > sky2 eth0: transmit ring 197 .. 156 report=197 done=197
> > > > sky2 eth0: transmit ring 480 .. 439 report=480 done=480
> > > > sky2 eth0: transmit ring 413 .. 372 report=413 done=413
> > > > sky2 eth0: transmit ring 320 .. 279 report=320 done=320
> > > > 
> > > > Else, they are all off by 41.
> > > > 
> > > > Is this a known bug?
> > > no
> > 
> > Damn =P
> > 
> > > > Comments? ideas?
> > > >
> > > which chip version. probably Yukon EC that seems to be the only one
> > > that does gigabit with Ram buffer.
> > 
> > sky2 :02:00.0: v1.14 addr 0xdbffc000 irq 18 Yukon-EC (0xb6) rev 2
> > 
> > > Does it work alright if you set transmit ring size smaller with ethtool?
> > > There might be an off-by-one bug in the worst case calculations about
> > > list element usage.
> > 
> > I tried this... but not with a specific size, i think i did 480, and yes
> > it timed out... any ideas on a more educated value?
> > 
> > -- 
> > Ian Kumlien  -- http://pomac.netswarm.net
> 
> Also try setting the idle_timeout module parameter to something link 10 (ms).
> It will fix problems with lost interrupts.

I have changed it now, and I'm leaving it running...

One interesting bit is that when I lowered the TX ring size from 511 to
510, the magic number was 42, not 41.
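
The "off by 41" observation can be checked with a little modular arithmetic. The sketch below assumes, as in the report, a ring of 512 slots; what the two numbers in the "transmit ring A .. B" line mean inside the driver is not stated in this thread, so the helper names are hypothetical and only describe the arithmetic.

```c
/* Slots walked when going forward from 'from' to 'to' on a ring. */
unsigned demo_ring_distance(unsigned from, unsigned to, unsigned size)
{
	return (to + size - from) % size;
}

/* Slots left over, i.e. the part of the ring not covered by that walk. */
unsigned demo_ring_gap(unsigned from, unsigned to, unsigned size)
{
	return size - demo_ring_distance(from, to, size);
}
```

For every pair quoted in the report (449 .. 408, 18 .. 489, 197 .. 156, 480 .. 439, 413 .. 372, 320 .. 279) the forward distance on a 512-slot ring is 471 and the gap is 41, so the pairs are consistent with each other whether or not the index wrapped.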

-- 
Ian Kumlien  -- http://pomac.netswarm.net




Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-22 Thread Patrick McHardy
Urs Thuermann wrote:
> Patrick McHardy <[EMAIL PROTECTED]> writes:
> 
> 
>>Is there a reason why you're still doing the "allocate n devices
>>on init" thing instead of using the rtnl_link API?
> 
> 
> Sorry, it's simply a matter of time.  We have been extremely busy with
> other projects and two presentations (mgmt, customers, and press) the
> last two weeks and have worked on the other changes this week.  I'm
> sorry I haven't yet been able to look at your rtnl_link code close
> enough, but it's definitely on my todo list.  Starting on Sunday I'll
> be on a business trip to .jp for a week, and I hope I get to it in
> that week, otherwise on return.


Sorry, but being busy is no reason to merge code with deprecated
(at least by me :)) behaviour. Please change this before submitting
for inclusion.



Re: [patch 0/7] CAN: Add new PF_CAN protocol family, try #3

2007-06-22 Thread Patrick McHardy
Urs Thuermann wrote:
> * Use skb->iif instead of skb->cb to pass receiving interface from
>   raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().


skb->iif doesn't necessarily point to the incoming network device
as seen by netif_receive_skb; for layered devices it currently
always points to the first interface that received the packet.

So far it's also only used for traffic classification. Please explain
how you're using it and what values it is set to on which paths.


Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-22 Thread Urs Thuermann
Patrick McHardy <[EMAIL PROTECTED]> writes:

> Is there a reason why you're still doing the "allocate n devices
> on init" thing instead of using the rtnl_link API?

Sorry, it's simply a matter of time.  We have been extremely busy with
other projects and two presentations (mgmt, customers, and press) the
last two weeks and have worked on the other changes this week.  I'm
sorry I haven't yet been able to look at your rtnl_link code close
enough, but it's definitely on my todo list.  Starting on Sunday I'll
be on a business trip to .jp for a week, and I hope I get to it in
that week, otherwise on return.

urs


via-rhine: Transmit timed out problem

2007-06-22 Thread Kirill Kuvaldin
Hi,

I'm experiencing a strange problem with a VIA Rhine network card on Ubuntu
7.04 (2.6.20-16-generic #2 SMP). The hardware seems to have gotten into an
inconsistent state, since unloading the via-rhine driver with rmmod and
loading it again with modprobe didn't help.

After the problem had appeared, I could see the following in dmesg:

[ 8601.971189] irq 21: nobody cared (try booting with the "irqpoll" option)
[ 8601.971214]  [] __report_bad_irq+0x24/0x80
[ 8601.971229]  [] note_interrupt+0x25e/0x290
[ 8601.971238]  [] handle_IRQ_event+0x30/0x60
[ 8601.971245]  [] handle_fasteoi_irq+0xc1/0xf0
[ 8601.971252]  [] do_IRQ+0x40/0x80
[ 8601.971259]  [] common_interrupt+0x23/0x30
[ 8601.971269]  [] mwait_idle_with_hints+0x46/0x60
[ 8601.971276]  [] cpu_idle+0x49/0xd0
[ 8601.971289]  ===
[ 8601.971291] handlers:
[ 8601.971293] [] (usb_hcd_irq+0x0/0x60 [usbcore])
[ 8601.971311] [] (rhine_interrupt+0x0/0xb80 [via_rhine])
[ 8601.971324] Disabling IRQ #21
[ 8637.970985] NETDEV WATCHDOG: eth0: transmit timed out
[ 8637.971135] eth0: Transmit timed out, status 1003, PHY status 786d, resetting
[ 8637.971163] via-rhine: Reset not complete yet. Trying harder.
[ 8637.971754] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8640.749432] via-rhine: Reset not complete yet. Trying harder.
[ 8640.750018] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8644.746689] NETDEV WATCHDOG: eth0: transmit timed out
[ 8644.746838] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8644.747446] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8648.743327] NETDEV WATCHDOG: eth0: transmit timed out
[ 8648.743476] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8648.744083] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8651.070635] eth0: no IPv6 routers present
[ 8670.723818] NETDEV WATCHDOG: eth0: transmit timed out
[ 8670.723968] eth0: Transmit timed out, status 0003, PHY status 786d, resetting
[ 8670.723995] via-rhine: Reset not complete yet. Trying harder.
[ 8670.724578] eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
[ 8726.668036] NETDEV WATCHDOG: eth0: transmit timed out


The interrupt apparently went unhandled, so the kernel disabled it.
After that, transmissions timed out (probably because the hardware
got into an inconsistent state?).


Some related information:

[EMAIL PROTECTED]:~% lspci |grep -i rhine
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 7c)
[EMAIL PROTECTED]:~% uname -a
Linux coreduo 2.6.20-16-generic #2 SMP Thu Jun 7 20:19:32 UTC 2007 i686 GNU/Linux
[EMAIL PROTECTED]:~% dmesg|grep rhine
[2.982700] via-rhine.c:v1.10-LK1.4.2 Sept-11-2006 Written by Donald Becker

Is that information sufficient for debugging? Let me know if you need
any additional data.


  Kirill



[NET 04/05]: dev: secondary unicast address support

2007-06-22 Thread Patrick McHardy
[NET]: dev: secondary unicast address support

Add support for configuring secondary unicast addresses on network
devices. To support this, devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscuous
mode when secondary unicast addresses are present.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
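
The fallback described in the commit message can be modelled in user space
roughly as follows. This is only a sketch: the diff in this archive is cut
off before the relevant function, so the logic is inferred from the commit
message and from the uc_count, uc_promisc, promiscuity, set_rx_mode and
set_multicast_list fields that appear in the patch. struct model_dev and
model_set_rx_mode are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the few net_device fields involved. */
struct model_dev {
    int uc_count;     /* number of secondary unicast addresses */
    int uc_promisc;   /* 1 if promiscuity was raised for uc_list */
    int promiscuity;  /* promiscuity reference count */
    void (*set_rx_mode)(struct model_dev *dev);
    void (*set_multicast_list)(struct model_dev *dev);
};

void model_set_rx_mode(struct model_dev *dev)
{
    assert(dev != NULL);

    if (dev->set_rx_mode) {
        /* Driver programs unicast and multicast filters itself. */
        dev->set_rx_mode(dev);
        return;
    }

    /* No unicast filtering: stay promiscuous for as long as any
     * secondary unicast address is configured. */
    if (dev->uc_count > 0 && !dev->uc_promisc) {
        dev->promiscuity++;
        dev->uc_promisc = 1;
    } else if (dev->uc_count == 0 && dev->uc_promisc) {
        dev->promiscuity--;
        dev->uc_promisc = 0;
    }

    if (dev->set_multicast_list)
        dev->set_multicast_list(dev);
}
```

The uc_promisc flag keeps the reference count balanced: calling the function
repeatedly with addresses present raises promiscuity only once, and removing
the last address drops it again.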

---
commit 099e4ab74adb9418155132b093533f152a31b583
tree 7c8f52672f7b6e1323a479545225d88a2eb35670
parent 02536a101d6fd8b1924b1e05c44409c7b4568335
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:46 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:46 +0200

 include/linux/netdevice.h |   12 +++-
 net/core/dev.c|  144 -
 net/core/dev_mcast.c  |   37 +---
 3 files changed, 139 insertions(+), 54 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b2db124..46585dc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -393,6 +393,9 @@ struct net_device
    unsigned char           addr_len;   /* hardware address length  */
    unsigned short          dev_id;     /* for shared network cards */
 
+   struct dev_addr_list    *uc_list;   /* Secondary unicast mac addresses */
+   int                     uc_count;   /* Number of installed ucasts   */
+   int                     uc_promisc;
    struct dev_addr_list    *mc_list;   /* Multicast mac addresses  */
    int                     mc_count;   /* Number of installed mcasts   */
    int                     promiscuity;
@@ -498,6 +501,8 @@ struct net_device
void *saddr,
unsigned len);
int (*rebuild_header)(struct sk_buff *skb);
+#define HAVE_SET_RX_MODE
+   void(*set_rx_mode)(struct net_device *dev);
 #define HAVE_MULTICAST  
void(*set_multicast_list)(struct net_device *dev);
 #define HAVE_SET_MAC_ADDR   
@@ -1004,8 +1009,11 @@ extern struct net_device *alloc_netdev(int sizeof_priv, const char *name,
                                        void (*setup)(struct net_device *));
 extern int  register_netdev(struct net_device *dev);
 extern void unregister_netdev(struct net_device *dev);
-/* Functions used for multicast support */
-extern void dev_mc_upload(struct net_device *dev);
+/* Functions used for secondary unicast and multicast support */
+extern void dev_set_rx_mode(struct net_device *dev);
+extern void __dev_set_rx_mode(struct net_device *dev);
+extern int  dev_unicast_delete(struct net_device *dev, void *addr, int alen);
+extern int  dev_unicast_add(struct net_device *dev, void *addr, int alen);
 extern int  dev_mc_delete(struct net_device *dev, void *addr, int alen, int all);
 extern int  dev_mc_add(struct net_device *dev, void *addr, int alen, int newonly);
 extern void dev_mc_discard(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1496715..50a4e1e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -942,7 +942,7 @@ int dev_open(struct net_device *dev)
/*
 *  Initialize multicasting status
 */
-   dev_mc_upload(dev);
+   dev_set_rx_mode(dev);
 
/*
 *  Wakeup transmit queue engine
@@ -2496,17 +2496,7 @@ int netdev_set_master(struct net_device *slave, struct net_device *master)
return 0;
 }
 
-/**
- * dev_set_promiscuity - update promiscuity count on a device
- * @dev: device
- * @inc: modifier
- *
- * Add or remove promiscuity from a device. While the count in the device
- * remains above zero the interface remains promiscuous. Once it hits zero
- * the device reverts back to normal filtering operation. A negative inc
- * value is used to drop promiscuity on the device.
- */
-void dev_set_promiscuity(struct net_device *dev, int inc)
+static void __dev_set_promiscuity(struct net_device *dev, int inc)
 {
unsigned short old_flags = dev->flags;
 
@@ -2515,7 +2505,6 @@ void dev_set_promiscuity(struct net_device *dev, int inc)
else
dev->flags |= IFF_PROMISC;
if (dev->flags != old_flags) {
-   dev_mc_upload(dev);
printk(KERN_INFO "device %s %s promiscuous mode\n",
   dev->name, (dev->flags & IFF_PROMISC) ? "entered" :
   "left");
@@ -2529,6 +2518,25 @@ void dev_set_promiscuity(struct net_device *dev, int inc)
 }
 
 /

[E1000 05/05]: Secondary unicast address support

2007-06-22 Thread Patrick McHardy
[E1000]: Secondary unicast address support

Add support for configuring secondary unicast addresses. Unicast
addresses take precedence over multicast addresses when filling
the exact address filters to avoid going to promiscuous mode.
When more unicast addresses are present than filter slots,
unicast filtering is disabled and all slots can be used for
multicast addresses.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
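
A rough sketch of the slot allocation policy the commit message describes,
under the assumption that "slots" is the number of exact filters left after
RAR 0 is reserved for the station MAC address (rar_entries - 1 in the diff).
struct filter_plan and plan_exact_filters are invented for illustration and
are not part of the e1000 driver; the IFF_PROMISC case is ignored here.

```c
#include <assert.h>

struct filter_plan {
    int uc_slots;    /* exact filters given to secondary unicast addresses */
    int mc_slots;    /* exact filters given to multicast addresses */
    int uc_promisc;  /* 1 if unicast filtering has to be disabled */
};

struct filter_plan plan_exact_filters(int uc_count, int mc_count, int slots)
{
    struct filter_plan plan = { 0, 0, 0 };
    int free_slots = slots;

    assert(uc_count >= 0 && mc_count >= 0 && slots >= 0);

    if (uc_count > slots) {
        /* Too many unicast addresses: give up on unicast filtering
         * and leave every slot to multicast. */
        plan.uc_promisc = 1;
    } else {
        /* Unicast addresses are placed first. */
        plan.uc_slots = uc_count;
        free_slots -= uc_count;
    }
    plan.mc_slots = mc_count < free_slots ? mc_count : free_slots;
    return plan;
}
```

For example, with 14 slots, 2 unicast and 20 multicast addresses, the plan
gives 2 slots to unicast and 12 to multicast; with 15 unicast addresses,
unicast filtering is turned off and multicast may use all 14 slots.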

---
commit 9613e4e4017b8bb68fcdd28cf5f9ae00bff18e28
tree e19261eea046a0404af0b26e2b99725ee33ae3c2
parent 099e4ab74adb9418155132b093533f152a31b583
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:48 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 14:13:48 +0200

 drivers/net/e1000/e1000_main.c |   47 ++--
 1 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index cf8af92..716fc8f 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -149,7 +149,7 @@ static void e1000_clean_tx_ring(struct e1000_adapter *adapter,
 struct e1000_tx_ring *tx_ring);
 static void e1000_clean_rx_ring(struct e1000_adapter *adapter,
 struct e1000_rx_ring *rx_ring);
-static void e1000_set_multi(struct net_device *netdev);
+static void e1000_set_rx_mode(struct net_device *netdev);
 static void e1000_update_phy_info(unsigned long data);
 static void e1000_watchdog(unsigned long data);
 static void e1000_82547_tx_fifo_stall(unsigned long data);
@@ -513,7 +513,7 @@ static void e1000_configure(struct e1000_adapter *adapter)
struct net_device *netdev = adapter->netdev;
int i;
 
-   e1000_set_multi(netdev);
+   e1000_set_rx_mode(netdev);
 
e1000_restore_vlan(adapter);
e1000_init_manageability(adapter);
@@ -924,7 +924,7 @@ e1000_probe(struct pci_dev *pdev,
netdev->stop = &e1000_close;
netdev->hard_start_xmit = &e1000_xmit_frame;
netdev->get_stats = &e1000_get_stats;
-   netdev->set_multicast_list = &e1000_set_multi;
+   netdev->set_rx_mode = &e1000_set_rx_mode;
netdev->set_mac_address = &e1000_set_mac;
netdev->change_mtu = &e1000_change_mtu;
netdev->do_ioctl = &e1000_ioctl;
@@ -2412,21 +2412,22 @@ e1000_set_mac(struct net_device *netdev, void *p)
 }
 
 /**
- * e1000_set_multi - Multicast and Promiscuous mode set
+ * e1000_set_rx_mode - Secondary Unicast, Multicast and Promiscuous mode set
  * @netdev: network interface device structure
  *
- * The set_multi entry point is called whenever the multicast address
- * list or the network interface flags are updated.  This routine is
- * responsible for configuring the hardware for proper multicast,
+ * The set_rx_mode entry point is called whenever the unicast or multicast
+ * address lists or the network interface flags are updated. This routine is
+ * responsible for configuring the hardware for proper unicast, multicast,
  * promiscuous mode, and all-multi behavior.
  **/
 
 static void
-e1000_set_multi(struct net_device *netdev)
+e1000_set_rx_mode(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-   struct dev_mc_list *mc_ptr;
+   struct dev_addr_list *uc_ptr;
+   struct dev_addr_list *mc_ptr;
uint32_t rctl;
uint32_t hash_value;
int i, rar_entries = E1000_RAR_ENTRIES;
@@ -2449,9 +2450,16 @@ e1000_set_multi(struct net_device *netdev)
rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
} else if (netdev->flags & IFF_ALLMULTI) {
rctl |= E1000_RCTL_MPE;
-   rctl &= ~E1000_RCTL_UPE;
} else {
-   rctl &= ~(E1000_RCTL_UPE | E1000_RCTL_MPE);
+   rctl &= ~E1000_RCTL_MPE;
+   }
+
+   uc_ptr = NULL;
+   if (netdev->uc_count > rar_entries - 1) {
+   rctl |= E1000_RCTL_UPE;
+   } else if (!(netdev->flags & IFF_PROMISC)) {
+   rctl &= ~E1000_RCTL_UPE;
+   uc_ptr = netdev->uc_list;
}
 
E1000_WRITE_REG(hw, RCTL, rctl);
@@ -2461,7 +2469,10 @@ e1000_set_multi(struct net_device *netdev)
if (hw->mac_type == e1000_82542_rev2_0)
e1000_enter_82542_rst(adapter);
 
-   /* load the first 14 multicast address into the exact filters 1-14
+   /* load the first 14 addresses into the exact filters 1-14. Unicast
+* addresses take precedence to avoid disabling unicast filtering
+* when possible.
+*
 * RAR 0 is used for the station MAC adddress
 * if there are not 14 addresses, go ahead and clear the filters
 * -- with 82571 controllers only 0-13 entries are filled here
@@ -2469,8 +2480,11 @@ e1000_set_multi(struct net_device *netdev)
mc_ptr = netdev->mc_list;
 
for (i = 1; i < rar_entries; i++) {
-   if (m

[NET 03/05]: dev_mcast: switch to generic net_device address lists

2007-06-22 Thread Patrick McHardy
[NET]: dev_mcast: switch to generic net_device address lists

Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 02536a101d6fd8b1924b1e05c44409c7b4568335
tree 6624b4f7f6fb0b10bac091ca43b733dfd1609afc
parent 6d8fd140951de7cc8faab4922dba74dd1db3cae5
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:28 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:28 +0200

 include/linux/netdevice.h |   17 +++-
 net/core/dev_mcast.c  |   96 +++--
 2 files changed, 22 insertions(+), 91 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3785a8a..b2db124 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -189,15 +189,12 @@ struct dev_addr_list
 /*
  * We tag multicasts with these structures.
  */
- 
-struct dev_mc_list
-{  
-   struct dev_mc_list  *next;
-   __u8dmi_addr[MAX_ADDR_LEN];
-   unsigned char   dmi_addrlen;
-   int dmi_users;
-   int dmi_gusers;
-};
+
+#define dev_mc_list    dev_addr_list
+#define dmi_addr       da_addr
+#define dmi_addrlen    da_addrlen
+#define dmi_users      da_users
+#define dmi_gusers     da_gusers
 
 struct hh_cache
 {
@@ -396,7 +393,7 @@ struct net_device
    unsigned char           addr_len;   /* hardware address length  */
    unsigned short          dev_id;     /* for shared network cards */
 
-   struct dev_mc_list      *mc_list;   /* Multicast mac addresses  */
+   struct dev_addr_list    *mc_list;   /* Multicast mac addresses  */
    int                     mc_count;   /* Number of installed mcasts   */
    int                     promiscuity;
    int                     allmulti;
diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index 80bb2e3..7029074 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -102,47 +102,20 @@ void dev_mc_upload(struct net_device *dev)
 
 int dev_mc_delete(struct net_device *dev, void *addr, int alen, int glbl)
 {
-   int err = 0;
-   struct dev_mc_list *dmi, **dmip;
+   int err;
 
netif_tx_lock_bh(dev);
+   err = __dev_addr_delete(&dev->mc_list, addr, alen, glbl);
+   if (!err) {
+   dev->mc_count--;
 
-   for (dmip = &dev->mc_list; (dmi = *dmip) != NULL; dmip = &dmi->next) {
/*
-*  Find the entry we want to delete. The device could
-*  have variable length entries so check these too.
+*  We have altered the list, so the card
+*  loaded filter is now wrong. Fix it
 */
-   if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 &&
-   alen == dmi->dmi_addrlen) {
-   if (glbl) {
-   int old_glbl = dmi->dmi_gusers;
-   dmi->dmi_gusers = 0;
-   if (old_glbl == 0)
-   break;
-   }
-   if (--dmi->dmi_users)
-   goto done;
-
-   /*
-*  Last user. So delete the entry.
-*/
-   *dmip = dmi->next;
-   dev->mc_count--;
-
-   kfree(dmi);
-
-   /*
-*  We have altered the list, so the card
-*  loaded filter is now wrong. Fix it
-*/
-   __dev_mc_upload(dev);
-
-   netif_tx_unlock_bh(dev);
-   return 0;
-   }
+
+   __dev_mc_upload(dev);
}
-   err = -ENOENT;
-done:
netif_tx_unlock_bh(dev);
return err;
 }
@@ -153,46 +126,15 @@ done:
 
 int dev_mc_add(struct net_device *dev, void *addr, int alen, int glbl)
 {
-   int err = 0;
-   struct dev_mc_list *dmi, *dmi1;
-
-   dmi1 = kmalloc(sizeof(*dmi), GFP_ATOMIC);
+   int err;
 
netif_tx_lock_bh(dev);
-   for (dmi = dev->mc_list; dmi != NULL; dmi = dmi->next) {
-   if (memcmp(dmi->dmi_addr, addr, dmi->dmi_addrlen) == 0 &&
-   dmi->dmi_addrlen == alen) {
-   if (glbl) {
-   int old_glbl = dmi->dmi_gusers;
-   dmi->dmi_gusers = 1;
-   if (old_glbl)
-   goto done;
-   }
-   dmi->dmi_users++;
-   goto done;
-   }
-   }
-
-   if ((dmi = dmi1) == NULL) {
-   netif_tx_unlock_bh(dev);
-

[NET 01/05]: dev_mcast: unexport dev_mc_upload

2007-06-22 Thread Patrick McHardy
[NET]: dev_mcast: unexport dev_mc_upload

dev_mc_add/dev_mc_delete take care of uploading the list when
necessary, and that's the only interface other code should use.
Also remove two incorrect calls in DECnet.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit cdf660f0bd4cca9d2cbe86a31adc60d6fa8a60ec
tree 2f08c8240b7da9b17725896c3f7eb9c7a960c92c
parent 45da27ba265dba3c740c45d47f584c30d7066f82
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 00:56:00 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 00:56:00 +0200

 net/core/dev_mcast.c |1 -
 net/decnet/dn_dev.c  |3 ---
 2 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index 5a54053..80bb2e3 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -292,4 +292,3 @@ void __init dev_mcast_init(void)
 
 EXPORT_SYMBOL(dev_mc_add);
 EXPORT_SYMBOL(dev_mc_delete);
-EXPORT_SYMBOL(dev_mc_upload);
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index ab41c18..e31549e 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -461,7 +461,6 @@ static int dn_dev_insert_ifa(struct dn_dev *dn_db, struct dn_ifaddr *ifa)
if (ifa->ifa_local != dn_eth2dn(dev->dev_addr)) {
dn_dn2eth(mac_addr, ifa->ifa_local);
dev_mc_add(dev, mac_addr, ETH_ALEN, 0);
-   dev_mc_upload(dev);
}
}
 
@@ -1064,8 +1063,6 @@ static int dn_eth_up(struct net_device *dev)
else
dev_mc_add(dev, dn_rt_all_rt_mcast, ETH_ALEN, 0);
 
-   dev_mc_upload(dev);
-
dn_db->use_long = 1;
 
return 0;


[NET 02/05]: dev: introduce generic net_device address lists

2007-06-22 Thread Patrick McHardy
[NET]: dev: introduce generic net_device address lists

Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 6d8fd140951de7cc8faab4922dba74dd1db3cae5
tree b80412116a867d544808f140e76cdf22bbc8b248
parent cdf660f0bd4cca9d2cbe86a31adc60d6fa8a60ec
author Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:26 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Fri, 22 Jun 2007 03:25:26 +0200

 include/linux/netdevice.h |   11 +++
 net/core/dev.c|   69 +
 2 files changed, 80 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e7913ee..3785a8a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -177,6 +177,14 @@ struct netif_rx_stats
 
 DECLARE_PER_CPU(struct netif_rx_stats, netdev_rx_stat);
 
+struct dev_addr_list
+{
+   struct dev_addr_list *next;
+   u8  da_addr[MAX_ADDR_LEN];
+   u8  da_addrlen;
+   int da_users;
+   int da_gusers;
+};
 
 /*
  * We tag multicasts with these structures.
@@ -1004,6 +1012,9 @@ extern void dev_mc_upload(struct net_device *dev);
 extern int  dev_mc_delete(struct net_device *dev, void *addr, int alen, int all);
 extern int  dev_mc_add(struct net_device *dev, void *addr, int alen, int newonly);
 extern void dev_mc_discard(struct net_device *dev);
+extern int  __dev_addr_delete(struct dev_addr_list **list, void *addr, int alen, int all);
+extern int  __dev_addr_add(struct dev_addr_list **list, void *addr, int alen, int newonly);
+extern void __dev_addr_discard(struct dev_addr_list **list);
 extern void dev_set_promiscuity(struct net_device *dev, int inc);
 extern void dev_set_allmulti(struct net_device *dev, int inc);
 extern void netdev_state_change(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2609062..1496715 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2551,6 +2551,75 @@ void dev_set_allmulti(struct net_device *dev, int inc)
dev_mc_upload(dev);
 }
 
+int __dev_addr_delete(struct dev_addr_list **list, void *addr, int alen,
+ int glbl)
+{
+   struct dev_addr_list *da;
+
+   for (; (da = *list) != NULL; list = &da->next) {
+   if (memcmp(da->da_addr, addr, da->da_addrlen) == 0 &&
+   alen == da->da_addrlen) {
+   if (glbl) {
+   int old_glbl = da->da_gusers;
+   da->da_gusers = 0;
+   if (old_glbl == 0)
+   break;
+   }
+   if (--da->da_users)
+   return 0;
+
+   *list = da->next;
+   kfree(da);
+   return 0;
+   }
+   }
+   return -ENOENT;
+}
+
+int __dev_addr_add(struct dev_addr_list **list, void *addr, int alen, int glbl)
+{
+   struct dev_addr_list *da;
+
+   for (da = *list; da != NULL; da = da->next) {
+   if (memcmp(da->da_addr, addr, da->da_addrlen) == 0 &&
+   da->da_addrlen == alen) {
+   if (glbl) {
+   int old_glbl = da->da_gusers;
+   da->da_gusers = 1;
+   if (old_glbl)
+   return 0;
+   }
+   da->da_users++;
+   return 0;
+   }
+   }
+
+   da = kmalloc(sizeof(*da), GFP_ATOMIC);
+   if (da == NULL)
+   return -ENOMEM;
+   memcpy(da->da_addr, addr, alen);
+   da->da_addrlen = alen;
+   da->da_users = 1;
+   da->da_gusers = glbl ? 1 : 0;
+   da->next = *list;
+   *list = da;
+   return 0;
+}
+
+void __dev_addr_discard(struct dev_addr_list **list)
+{
+   struct dev_addr_list *tmp;
+
+   while (*list != NULL) {
+   tmp = *list;
+   *list = tmp->next;
+   if (tmp->da_users > tmp->da_gusers)
+   printk("__dev_addr_discard: address leakage! "
+  "da_users=%d\n", tmp->da_users);
+   kfree(tmp);
+   }
+}
+
 unsigned dev_get_flags(const struct net_device *dev)
 {
unsigned flags;


[NET 00/05]: Secondary unicast address support v2

2007-06-22 Thread Patrick McHardy
This is an updated version of the secondary unicast address patches. I've
introduced a common structure and helpers for both unicast and multicast
addresses to make it easier for virtual software devices that want to
synchronize addresses to a lower device to reuse code. Additionally, I
fixed a deadlock when putting the device into promiscuous mode, renamed
dev->set_address_list to dev->set_rx_mode and cleaned the code up a bit.

One remaining question is how to handle the case where too many unicast
addresses are configured and the device is put into promiscuous mode,
or unicast filtering is disabled by the driver. In that case we get
neither the message that is normally printed by dev_set_promiscuity
nor an audit log entry. I'm not sure whether that can already happen when
configuring multicast, but I thought it was worth mentioning.



 drivers/net/e1000/e1000_main.c |   47 ++---
 include/linux/netdevice.h  |   40 +---
 net/core/dev.c |  213 ---
 net/core/dev_mcast.c   |  128 +++-
 net/decnet/dn_dev.c|3 -
 5 files changed, 269 insertions(+), 162 deletions(-)

Patrick McHardy (5):
  [NET]: dev_mcast: unexport dev_mc_upload
  [NET]: dev: introduce generic net_device address lists
  [NET]: dev_mcast: switch to generic net_device address lists
  [NET]: dev: secondary unicast address support
  [E1000]: Secondary unicast address support


Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-22 Thread Ben Greear

Eric W. Biederman wrote:
> Ben Greear <[EMAIL PROTECTED]> writes:
>
>> Patrick McHardy wrote:
>>> Eric W. Biederman wrote:
>>>> For the macvlan code do we need to do anything special if we transmit
>>>> to a mac we would normally receive?  Another unicast mac of the same
>>>> nic for example.
>>>
>>> That doesn't happen under normal circumstances. I don't believe
>>> it would work.
>>
>> Assuming you mean you want to send between two mac-vlans on the same
>> physical nic...
>>
>> This can work if your mac-vlans are on different subnets and you are
>> routing between them (and if you have my send-to-self patch or have
>> another way to let a system send packets to itself).
>
> Ok.  I didn't know if you could trigger this case without
> having the endpoints in separate namespaces.  I was suspecting
> the routing code would realize what we were doing, realize the
> route is local and route through lo.

The routing code will short-circuit by default.  It takes quite
a bit of effort to make them _not_ short circuit..that is what I
was talking about.  Mac-vlans will be just like any
other ethernet nics as far as routing goes.

>> A normal ethernet switch will NOT turn a packet around on the same
>> interface it was received, so that is why you must have them on different
>> subnets and have a router in between.
>
> Yes.  That is essentially the configuration I was wondering about.
>
>> For sending directly to yourself, something like the 'veth' driver
>> is probably more useful.
>
> True.  And I think it has a place.  However the common case with
> the tunnel devices is to just hook them all up to an ethernet
> bridge as well as a real ethernet device.
>
> The far ends of the ethernet tunnels are dropped into different namespaces.
>
> Which gets a very similar effect to the mac vlan code.
>
> I'm just wondering if I can not setup an ethernet tunnel device
> when my primary purpose is to talk to the outside world, but occasionally
> want a little in the box traffic.

mac-vlans should work on veth devices just fine, and the veths will also
short-circuit route (at least if they are in the same namespace).

I'm not sure I understand what you are trying to do..but in general
both veth and mac-vlans should act like ethernet nics..so if you can
find some way that does _not_ hold, please let us know.

Thanks,
Ben

> Eric



--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-22 Thread Patrick McHardy
Ben Greear wrote:
> Patrick McHardy wrote:
> 
>> Eric W. Biederman wrote:
>>
>>> For the macvlan hash you just use an upper byte.  Is that just a
>>> simple starting place, or do we not need a more complex hash.
>>>   
>>
>>
>> That gave me an idea, since the default addresses are random
>> anyway I'm now using an incrementing counter for the upper byte.
> 
> 
> Is there not a (relatively) easy way to hash the entire 6 bytes?
> 
> I'd prefer to be able to set the MACs to anything I want, without
> worrying about trivially hitting a worst-case hash scenario.


That would only happen if all your addresses have the same high
byte. I can't see a reason why you would want to do this; even
with manually configured addresses it's still reasonable to
expect a uniform distribution.
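
For comparison, hashing the entire 6 bytes as Ben asks could look something
like the sketch below. This is purely illustrative and is not the macvlan
code: mac_hash_fold is a made-up name, and a plain XOR fold into 256 buckets
is just one simple choice that makes every byte of the address contribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* XOR-fold all six address bytes into one 8-bit bucket index. */
uint8_t mac_hash_fold(const uint8_t addr[MAC_LEN])
{
    uint8_t h = 0;
    size_t i;

    assert(addr != NULL);
    for (i = 0; i < MAC_LEN; i++)
        h ^= addr[i];
    return h;
}
```

Two addresses that differ in any single byte land in different buckets with
this fold, whichever byte that is. Collisions are of course still possible
when several bytes differ at once.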




Re: [patch] alpha: fix alignment problem in csum_ipv6_magic()

2007-06-22 Thread Ivan Kokshaysky
On Thu, Jun 21, 2007 at 04:35:01PM -0700, Andrew Morton wrote:
> In http://bugzilla.kernel.org/show_bug.cgi?id=8659, Dustin is reporting
> that this patch broke tcp-on-ipv6.

Oops. Two instructions operating on the 'len' arg ($18) got swapped...
This should fix the ev6 version; the ev5 one seems to be OK.

Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>

Ivan.

--- 2.6.22-rc4-mm2/arch/alpha/lib/ev6-csum_ipv6_magic.S Fri Jun 22 15:02:23 2007
+++ linux/arch/alpha/lib/ev6-csum_ipv6_magic.S  Fri Jun 22 15:05:38 2007
@@ -76,18 +76,18 @@ csum_ipv6_magic:
 
    cmoveq  $6,$31,$22      # E : src aligned?
    ldq_u   $23,15($17)     # L : Latency: 3
-   or      $18,$4,$18      # E : 00CCDDAABBCC
-   extql   $1,$6,$1        # U : U L L U :
+   inswl   $18,3,$18       # U : 00CCDD00
+   addl    $19,$7,$19      # E : U L U L : bbaabb00
 
    or      $0,$22,$0       # E : 1st src word complete
-   extqh   $5,$6,$5        # U :
-   addl    $19,$7,$19      # E : bbaabb00
-   and     $17,7,$6        # E : L U L U : dst misalignment
+   extql   $1,$6,$1        # U :
+   or      $18,$4,$18      # E : 00CCDDAABBCC
+   extqh   $5,$6,$5        # U : L U L U
 
-   inswl   $18,3,$18       # U : 00CCDD00
-   or      $1,$5,$1        # E : 2nd src word complete
+   and     $17,7,$6        # E : dst misalignment
    extql   $2,$6,$2        # U :
-   extqh   $3,$6,$22       # U : U L U U :
+   or      $1,$5,$1        # E : 2nd src word complete
+   extqh   $3,$6,$22       # U : L U L U :
 
    cmoveq  $6,$31,$22      # E : dst aligned?
    extql   $3,$6,$3        # U :


Re: [PATCH 1/3] NetXen: Fix MSI issues by using PCI function 0

2007-06-22 Thread Michael Buesch
On Friday 22 June 2007 10:40:41 Mithlesh Thukral wrote:
> NetXen: Fix issue of MSI not working correctly
> NetXen driver uses PCI function 0 to provide the functionality of MSI.
> The patch makes driver check the bus master bit for function 0 and
> enable it after the card initialization.
> 
> Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
> Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
> Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/net/netxen/netxen_nic_main.c |   13 ++---
>  1 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/netxen/netxen_nic_main.c 
> b/drivers/net/netxen/netxen_nic_main.c
> index 6167b58..e68356b 100644
> --- a/drivers/net/netxen/netxen_nic_main.c
> +++ b/drivers/net/netxen/netxen_nic_main.c
> @@ -355,13 +355,6 @@ #endif
>   /* initialize the adapter */
>   netxen_initialize_adapter_hw(adapter);
>  
> -#ifdef CONFIG_PPC
> - if ((adapter->ahw.boardcfg.board_type ==
> - NETXEN_BRDTYPE_P2_SB31_10G_IMEZ) &&
> - (pci_func_id == 2))
> - goto err_out_free_adapter;
> -#endif /* CONFIG_PPC */
> -
>   /*
>*  Adapter in our case is quad port so initialize it before
>*  initializing the ports
> @@ -509,6 +502,12 @@ #endif
>   NETXEN_CAM_RAM(0x1fc)));
>   if (val == 0x) {
>   /* This is the first boot after power up */
> + netxen_nic_read_w0(adapter, NETXEN_PCIE_REG(0x4), &val);
> + if (!(val & 0x4)) {
> + val |= 0x4;
> + netxen_nic_write_w0(adapter, NETXEN_PCIE_REG(0x4), val);
> + mdelay(100);
> + }

msleep()?
Or wait, what is this delay trying to do? Commit the register access?
The better way to commit a register write is usually to read the value back.
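
To illustrate the read-back idea, here is a minimal user-space sketch. It is not the driver's code: struct fake_dev, fake_writel() and fake_readl() are hypothetical stand-ins that model a bridge holding a posted write until the next read from the device. In the driver, the accessors would be the real register read and write helpers.

```c
#include <stdint.h>

/* Hypothetical model: a device behind a bridge that may post (buffer) writes. */
struct fake_dev {
	uint32_t reg;        /* value the device has actually latched */
	uint32_t posted;     /* write still held by the bridge */
	int has_posted;      /* nonzero while a write is in flight */
};

/* Stand-in for writel(): the write is posted and may not have landed yet. */
static void fake_writel(struct fake_dev *d, uint32_t v)
{
	d->posted = v;
	d->has_posted = 1;
}

/* Stand-in for readl(): a read cannot pass earlier posted writes,
 * so any pending write lands before the value is returned. */
static uint32_t fake_readl(struct fake_dev *d)
{
	if (d->has_posted) {
		d->reg = d->posted;
		d->has_posted = 0;
	}
	return d->reg;
}

/* Write, then read the same register back to push the write out. */
static uint32_t write_and_flush(struct fake_dev *d, uint32_t v)
{
	fake_writel(d, v);
	return fake_readl(d);
}
```

With this pattern no fixed delay is needed just to make sure the write has reached the device. A delay may still be needed if the hardware requires settling time after the write, which is a separate question.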

-- 
Greetings Michael.


Re: [patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-22 Thread Patrick McHardy
Urs Thuermann wrote:
> This patch adds the virtual CAN bus (vcan) network driver.
> The vcan device is just a loopback device for CAN frames, no
> real CAN hardware is involved.


Is there a reason why you're still doing the "allocate n devices
on init" thing instead of using the rtnl_link API?


Re: [PATCH 3/3] NetXen: Fix the rmmod error on PBlades due incorrect cleanup

2007-06-22 Thread Michael Buesch
On Friday 22 June 2007 10:42:38 Mithlesh Thukral wrote:
> diff --git a/drivers/net/netxen/netxen_nic_hw.c 
> b/drivers/net/netxen/netxen_nic_hw.c
> index 4e958c9..f0df6fb 100644
> --- a/drivers/net/netxen/netxen_nic_hw.c
> +++ b/drivers/net/netxen/netxen_nic_hw.c
> @@ -378,6 +378,7 @@ int netxen_nic_hw_resources(struct netxe
>  crb_rcvpeg_state));
>   while (state != PHAN_PEG_RCV_INITIALIZED && loops < 20) {
>   udelay(100);
> + schedule();
>   /* Window 1 call */
>   state = readl(NETXEN_CRB_NORMALIZE(adapter,
>  recv_crb_registers

Better do msleep(1); instead of udelay+schedule.

> @@ -700,7 +701,7 @@ void netxen_nic_pci_change_crbwindow(str
>   adapter->curr_window = 0;
>  }
>  
> -void netxen_load_firmware(struct netxen_adapter *adapter)
> +int netxen_load_firmware(struct netxen_adapter *adapter)
>  {
>   int i;
>   u32 data, size = 0;
> @@ -712,15 +713,25 @@ void netxen_load_firmware(struct netxen_
>   writel(1, NETXEN_CRB_NORMALIZE(adapter, NETXEN_ROMUSB_GLB_CAS_RST));
>  
>   for (i = 0; i < size; i++) {
> - if (netxen_rom_fast_read(adapter, flashaddr, (int *)&data) != 
> 0) {
> - DPRINTK(ERR,
> - "Error in netxen_rom_fast_read(). Will skip"
> - "loading flash image\n");
> - return;
> + while (netxen_rom_fast_read(adapter, flashaddr, (int *)&data) 
> != 0) {
> + long timeout = 2 * HZ;
> + while (timeout) {
> + if (signal_pending(current)) {
> + printk( "%s: Got a signal, exiting\n", 
> __FUNCTION__ );
> + return -1;
> + }
> + set_current_state(TASK_INTERRUPTIBLE);
> + timeout = schedule_timeout(timeout);
> + }

You're open-coding msleep_interruptible() here?
And this sleeps two seconds between each rom-read attempt. Is that
really your intention?
I'd say it is better to attempt the read more often and sleep less each time.
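
A rough user-space sketch of "read more often, sleep less" with an overall time budget. All names here are hypothetical. In the driver, try_once would wrap netxen_rom_fast_read() and nap would be msleep_interruptible(), which returns nonzero when the sleep was cut short by a signal.

```c
#include <stddef.h>

typedef int (*try_fn)(void *ctx);        /* returns 0 on success */
typedef int (*nap_fn)(unsigned int ms);  /* returns nonzero if interrupted */

/*
 * Retry try_once() with short naps in between.
 * Returns 0 on success, -1 when the time budget is used up,
 * -2 when a nap was interrupted.
 */
static int poll_with_short_sleeps(try_fn try_once, void *ctx, nap_fn nap,
				  unsigned int step_ms, unsigned int budget_ms)
{
	unsigned int waited = 0;

	for (;;) {
		if (try_once(ctx) == 0)
			return 0;
		if (waited >= budget_ms)
			return -1;
		if (nap(step_ms))
			return -2;
		waited += step_ms;
	}
}

/* Stand-ins for illustration only: an operation that fails a fixed
 * number of times, and a nap that returns immediately. */
struct countdown {
	int failures_left;
	int calls;
};

static int fail_n_times(void *ctx)
{
	struct countdown *c = ctx;

	c->calls++;
	if (c->failures_left > 0) {
		c->failures_left--;
		return -1;
	}
	return 0;
}

static int nap_stub(unsigned int ms)
{
	(void)ms;
	return 0;
}
```

For example, a 10 ms step with a 2000 ms budget keeps the same worst case as a single two second sleep, but returns as soon as the read succeeds.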

>   off = netxen_nic_pci_set_window(adapter, memaddr);
>   addr = pci_base_offset(adapter, off);
>   writel(data, addr);
> + while (readl(addr) != data) {
> + mdelay(100);
> + writel(data, addr);
> + }

Add a timeout. Otherwise this will hang the system if the hardware is
faulty.
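
A bounded version of such a write-and-verify loop could look like the sketch below. It runs in user space against a hypothetical struct fake_reg; fake_write(), fake_read() and write_verified() are illustrative names, not driver functions. In the driver the accessors would be writel() and readl(), with a short sleep between attempts.

```c
#include <stdint.h>

/* Hypothetical register model; "stuck" simulates faulty hardware. */
struct fake_reg {
	uint32_t value;
	int stuck;      /* nonzero: writes are silently dropped */
	int writes;     /* number of write attempts seen */
};

static void fake_write(struct fake_reg *r, uint32_t v)
{
	r->writes++;
	if (!r->stuck)
		r->value = v;
}

static uint32_t fake_read(const struct fake_reg *r)
{
	return r->value;
}

/* Write v and verify it, giving up after max_tries attempts.
 * Returns 0 on success, -1 if the value never sticks. */
static int write_verified(struct fake_reg *r, uint32_t v, int max_tries)
{
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		fake_write(r, v);
		if (fake_read(r) == v)
			return 0;
		/* the driver would sleep or delay here before retrying */
	}
	return -1;
}
```

The caller can then fail the firmware load with an error instead of spinning forever.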

> diff --git a/drivers/net/netxen/netxen_nic_init.c 
> b/drivers/net/netxen/netxen_nic_init.c
> index 15f6dc5..8f5f4f8 100644
> --- a/drivers/net/netxen/netxen_nic_init.c
> +++ b/drivers/net/netxen/netxen_nic_init.c
> @@ -408,8 +408,12 @@ static inline int
>  do_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp)
>  {
>   if (jiffies > (last_schedule_time + (8 * HZ))) {
> - last_schedule_time = jiffies;
> - schedule();
> + if (last_schedule_time) {
> + last_schedule_time = jiffies;
> + schedule();
> + } else {
> + last_schedule_time = jiffies;
> + }

Why this strange thing?
I'd simply call cond_resched() instead of all this custom
schedule timekeeping. That's best for system latency.

> -void netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val)
> +int netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val)
>  {
>   u32 val = 0;
> - int loops = 0;
>  
>   if (!pegtune_val) {
> - val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_CMDPEG_STATE));
> - while (val != PHAN_INITIALIZE_COMPLETE && 
> - val != PHAN_INITIALIZE_ACK && loops < 20) {
> - udelay(100);
> - schedule();
> - val =
> - readl(NETXEN_CRB_NORMALIZE
> + do {
> + long timeout = 10 * HZ;
> + while (timeout) {
> + if (signal_pending(current)) {
> + printk(KERN_INFO"%s: Got a signal, 
> exiting\n", __FUNCTION__ );
> + printk(KERN_INFO"%s: val=0x%x, 
> pegtune_val=0x%x\n", __FUNCTION__,
> + val, pegtune_val );
> + return -1;
> + }
> + set_current_state(TASK_INTERRUPTIBLE);
> + timeout = schedule_timeout(timeout);
> + }
> + val = readl(NETXEN_CRB_NORMALIZE
> (adapter, CRB_CMDPEG_STATE));

msleep_interruptible()?

> @@ -1278,11 +1

Re: Fwd: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-06-22 Thread Johannes Berg
On Thu, 2007-06-21 at 11:47 -0400, jamal wrote:
> On Wed, 2007-20-06 at 13:25 +0200, Johannes Berg wrote:
> 
> > Ok. That's definitely a bug in nl80211 as we have it in development
> > right now. 
> 
> Sorry, have never looked at that code.

No worries, I was just stating that.

> You can use setsockopt to set the multicast groups. What you can't do
> with that is subscribe to many groups in one shot.
> The call in iproute2 hasn't reflected this reality yet.

Ah, ok, I see now. I was under the impression that groups was always
just a u32.
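
For readers following along, a small sketch of the two ways to name a netlink multicast group. The helper names legacy_group_mask() and join_group() are made up for illustration. The mask helper only shows why the old bitmask style stops at 32 groups; the setsockopt() call uses the NETLINK_ADD_MEMBERSHIP option from <linux/netlink.h> and takes a group number, one group per call.

```c
#include <stdint.h>

/* Legacy style: sockaddr_nl.nl_groups is a 32-bit mask,
 * so only groups 1..32 can be expressed. */
static uint32_t legacy_group_mask(unsigned int group)
{
	if (group < 1 || group > 32)
		return 0;       /* not expressible in the bitmask */
	return (uint32_t)1 << (group - 1);
}

#ifdef __linux__
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270     /* fallback for older libc headers */
#endif

/* Newer style: join one group per call, by number rather than by bit. */
static int join_group(int fd, unsigned int group)
{
	return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
			  &group, sizeof(group));
}
#endif
```

Subscribing to several groups then means calling join_group() once per group, which matches the "not in one shot" limitation mentioned above.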

> > I'd really like to be able to reserve multicast groups with special
> > semantics too, especially I might want to permit/deny non-CAP_NET_ADMIN
> > users from binding specific multicast groups. That isn't actually
> > possible with netlink nor genetlink right now afaict.
> 
> This would be hard - but doable via SELinux interface. I think you
> should be able to extend your tool to make calls to that interface.

Why do you think that would be hard? It'd basically just mean replacing
the netlink_capable(sock, NL_NONROOT_RECV) calls with a call that
actually tests depending on the group(s) it wants.

> > If we register multiple IDs then we'll end up filling up the generic
> > netlink family space really soon. 
> 
> There's a huge number of these groups; and not just that, but considering
> that some genetlink users may not be interested in such multicast
> groups, it is quite usable to have many groups as long as we avoid
> conflict.

Yeah, never mind, I thought that the number of groups was limited to 32.

> The multicast issue wasn't well-attacked. We have a group magically
> assigned to a user based on their allocated id. It should be feasible
> to add an API to the kernel for registering for many groups and allow
> user space to discover these groups before registering. Maybe that's
> the path to proceed to.

Yeah, sounds reasonable, you could ask the controller for which groups
are attached to a family and then get the IDs for those groups by name.

johannes




Re: [WIP][PATCHES] Network xmit batching

2007-06-22 Thread Evgeniy Polyakov
On Thu, Jun 21, 2007 at 02:00:07PM -0700, Rick Jones ([EMAIL PROTECTED]) wrote:
> > Simple test included test -> desktop and vice versa traffic with 128 and
> > 4096 block size in netperf-2.4.3 setup.
> 
> Is that in conjunction with setting the test-specific -D to set
> TCP_NODELAY, or was Nagle left-on?  If the latter, perhaps timing issues
> could be why the confidence intervals weren't hit since the relative
> batching of 128byte sends into larger segments is something of a race.

I used these parameters:
netperf -l 60 -H kano -t TCP_STREAM -i 10,2 -I 99,5 -- -m 128 -s 128K -S 128K

so without nodelay.

With nodelay I've gotten:
batch-128: 128.91 mbit/sec
mainline-128: 140.57 mbit/sec

which is about 5 times less than without nodelay (~760 mbit/s),
although the nodelay results look more realistic.
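
For reference, "nodelay" here means the TCP_NODELAY socket option that Rick mentions. A minimal sketch of setting and checking it on a socket follows; the helper names disable_nagle() and nagle_disabled() are made up for illustration and are not taken from netperf.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle so small writes go out without waiting to be coalesced. */
static int disable_nagle(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
static int nagle_disabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

With Nagle left on, 128 byte sends can be merged into larger segments before they reach the driver, so the two sets of numbers measure rather different things.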


> rick jones

-- 
Evgeniy Polyakov


Re: [PATCH] ipv4: Only destroy inet devices when we receive an NETDEV_UNREGISTER event

2007-06-22 Thread Eric W. Biederman

Never mind.  I saw this and I thought it was an old obscure bug.
But it appears it is a new condition, that has already been
fixed.

Eric




Re: 2.6.20->2.6.21 - networking dies after random time

2007-06-22 Thread Marcin Ślusarz

2007/6/19, Jarek Poplawski <[EMAIL PROTECTED]>:

On Mon, Jun 18, 2007 at 08:10:00AM -0700, Stephen Hemminger wrote:
> On Mon, 18 Jun 2007 13:08:49 +0200
> Jarek Poplawski <[EMAIL PROTECTED]> wrote:
>
> > On 16-06-2007 23:35, Marcin Ślusarz wrote:
> > > hi
> > > > after upgrading the kernel from 2.6.20 to 2.6.21.3 I'm experiencing a really
> > > > strange problem - _both_ of my network cards die after a random uptime -
> > > > sometimes it's a few minutes, sometimes hours, sometimes it does not
> > > > happen for a couple of days...
> > > > today it happened for the first time without the nvidia module and almost
> > > > immediately after system start
> > >
> > > here is the output of some commands which might help debug this:
...
> > It looks like the skge driver enables a different device than the one probed.
> > Maybe you've something old/wrong about eth0/eth1 in /etc configs?
>
> More likely it is just user level device renaming. Most distro's
> rename devices (if needed) using udev.

On the other hand, it's interesting why it doesn't always happen, and why
it sometimes took so long.


I'm sorry for the delay, but I was offline for the last week and probably
will be for some time :|

When I disable the on-board network card (controlled by skge) in the BIOS,
the ne2k-pci card still locks up. So I think it's strictly a ne2k-pci
bug. I made some tests and I know how to reproduce it fast (on my
machine) - just generate some heavy network traffic...

As I'm offline right now I can't bisect it, but I turned on more
debugging, maybe you can deduce something...

[0.00] Linux version 2.6.21.3 ([EMAIL PROTECTED]) (gcc version 4.1.2 (Gentoo 4.1.2)) #4 PREEMPT Wed Jun 20 22:37:05 CEST 2007
[0.00] Command line: root=/dev/sda5 video=vesafb vga=794
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009fc00 (usable)
[0.00]  BIOS-e820: 0009fc00 - 000a (reserved)
[0.00]  BIOS-e820: 000e4000 - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 3ffb (usable)
[0.00]  BIOS-e820: 3ffb - 3ffc (ACPI data)
[0.00]  BIOS-e820: 3ffc - 3fff (ACPI NVS)
[0.00]  BIOS-e820: 3fff - 4000 (reserved)
[0.00]  BIOS-e820: ff78 - 0001 (reserved)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 262064) 1 entries of 256 used
[0.00] end_pfn_map = 1048576
[0.00] DMI 2.3 present.
[0.00] ACPI: RSDP 000FA810, 0021 (r2 ACPIAM)
[0.00] ACPI: XSDT 3FFB0100, 003C (r1 A M I  OEMXSDT  1427 MSFT   97)
[0.00] ACPI: FACP 3FFB0290, 00F4 (r3 A M I  OEMFACP  1427 MSFT   97)
[0.00] ACPI: DSDT 3FFB03E0, 38A1 (r1  A0036 A00360011 MSFT  10D)
[0.00] ACPI: FACS 3FFC, 0040
[0.00] ACPI: APIC 3FFB0390, 004A (r1 A M I  OEMAPIC  1427 MSFT   97)
[0.00] ACPI: OEMB 3FFC0040, 003F (r1 A M I  OEMBIOS  1427 MSFT   97)
[0.00] Entering add_active_range(0, 0, 159) 0 entries of 256 used
[0.00] Entering add_active_range(0, 256, 262064) 1 entries of 256 used
[0.00] Zone PFN ranges:
[0.00]   DMA             0 ->     4096
[0.00]   DMA32        4096 ->  1048576
[0.00]   Normal    1048576 ->  1048576
[0.00] early_node_map[2] active PFN ranges
[0.00]     0:        0 ->      159
[0.00]     0:      256 ->   262064
[0.00] On node 0 totalpages: 261967
[0.00]   DMA zone: 56 pages used for memmap
[0.00]   DMA zone: 2549 pages reserved
[0.00]   DMA zone: 1394 pages, LIFO batch:0
[0.00]   DMA32 zone: 3526 pages used for memmap
[0.00]   DMA32 zone: 254442 pages, LIFO batch:31
[0.00]   Normal zone: 0 pages used for memmap
[0.00] Looks like a VIA chipset. Disabling IOMMU. Override with iommu=allowed
[0.00] ACPI: PM-Timer IO Port: 0x808
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[0.00] Processor #0 (Bootup-CPU)
[0.00] ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 1, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: IRQ0 used by override.
[0.00] ACPI: IRQ2 used by override.
[0.00] ACPI: IRQ9 used by override.
[0.00] Setting APIC routing to flat
[0.00] Using ACPI (MADT) for SMP configuration information
[0.00] Nosave address range: 0009f000 - 000a
[0.00] Nosave address range: 000a - 000e4000
[0.00] Nosave address range: 000e4000 - 0010
[0.00] Allocating PCI resources starting at 5000 (gap: 4000:bf78)
[0.00] Built 1 zonelist

[PATCH 3/3] NetXen: Fix the rmmod error on PBlades due incorrect cleanup

2007-06-22 Thread Mithlesh Thukral
NetXen: Graceful unloading of the NetXen driver.
To handle NetXen module load/unload sequences gracefully, the modified
code lets the driver close routine be invoked via the
unregister_netdev() call in the driver remove routine, which frees the
command buffer list and flushes the queues. Next, the dummy DMA buffer
that the hardware uses is released after its functionality is disabled.
Finally, the other software resources are released and the hardware is
left in a reset state for future load/unload.
   
Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 drivers/net/netxen/netxen_nic.h  |   80 +
 drivers/net/netxen/netxen_nic_hdr.h  |2 
 drivers/net/netxen/netxen_nic_hw.c   |   25 -
 drivers/net/netxen/netxen_nic_init.c |   52 ++-
 drivers/net/netxen/netxen_nic_main.c |  112 +
 5 files changed, 206 insertions(+), 65 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 62aeab9..0e3be92 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -952,6 +952,26 @@ struct netxen_adapter {
int (*stop_port) (struct netxen_adapter *);
 }; /* netxen_adapter structure */
 
+/*
+ * NetXen dma watchdog control structure
+ *
+ * Bit 0   : enabled => R/O: 1 watchdog active, 0 inactive
+ * Bit 1   : disable_request => 1 req disable dma watchdog
+ * Bit 2   : enable_request =>  1 req enable dma watchdog
+ * Bit 3-31: unused
+ */
+
+typedef u32 dma_watchdog_ctrl_t;
+
+#define netxen_set_dma_watchdog_disable_req(config_word) \
+   _netxen_set_bits(config_word, 1, 1, 1)
+#define netxen_set_dma_watchdog_enable_req(config_word) \
+   _netxen_set_bits(config_word, 2, 1, 1)
+#define netxen_get_dma_watchdog_enabled(config_word) \
+   ((config_word) & 0x1)
+#define netxen_get_dma_watchdog_disabled(config_word) \
+   (((config_word) >> 1) & 0x1)
+
 /* Max number of xmit producer threads that can run simultaneously */
 #define MAX_XMIT_PRODUCERS  16
 
@@ -1031,8 +1051,8 @@ int netxen_nic_erase_pxe(struct netxen_a
 /* Functions from netxen_nic_init.c */
 void netxen_free_adapter_offload(struct netxen_adapter *adapter);
 int netxen_initialize_adapter_offload(struct netxen_adapter *adapter);
-void netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val);
-void netxen_load_firmware(struct netxen_adapter *adapter);
+int netxen_phantom_init(struct netxen_adapter *adapter, int pegtune_val);
+int netxen_load_firmware(struct netxen_adapter *adapter);
 int netxen_pinit_from_rom(struct netxen_adapter *adapter, int verbose);
 int netxen_rom_fast_read(struct netxen_adapter *adapter, int addr, int *valp);
 int netxen_rom_fast_read_words(struct netxen_adapter *adapter, int addr, 
@@ -1230,6 +1250,62 @@ static inline void get_brd_name_by_type(
name = "Unknown";
 }
 
+static inline int
+dma_watchdog_shutdown_request(struct netxen_adapter *adapter)
+{
+   dma_watchdog_ctrl_t ctrl;
+
+   /* check if already inactive */
+   if (netxen_nic_hw_read_wx(adapter,
+   NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+   printk(KERN_ERR "failed to read dma watchdog status\n");
+
+   if (netxen_get_dma_watchdog_enabled(ctrl) == 0)
+   return 1;
+
+   /* Send the disable request */
+   netxen_set_dma_watchdog_disable_req(ctrl);
+   netxen_crb_writelit_adapter(adapter,
+   NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), ctrl);
+
+   return 0;
+}
+
+static inline int
+dma_watchdog_shutdown_poll_result(struct netxen_adapter *adapter)
+{
+   dma_watchdog_ctrl_t ctrl;
+
+   if (netxen_nic_hw_read_wx(adapter,
+   NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+   printk(KERN_ERR "failed to read dma watchdog status\n");
+
+   return ((netxen_get_dma_watchdog_enabled(ctrl) == 0) && 
+   (netxen_get_dma_watchdog_disabled(ctrl) == 0));
+}
+
+static inline int
+dma_watchdog_wakeup(struct netxen_adapter *adapter)
+{
+   dma_watchdog_ctrl_t ctrl;
+
+   if (netxen_nic_hw_read_wx(adapter,
+   NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), &ctrl, 4))
+   printk(KERN_ERR "failed to read dma watchdog status\n");
+
+   if (netxen_get_dma_watchdog_enabled(ctrl))
+   return 1;
+
+   /* send the wakeup request */
+   netxen_set_dma_watchdog_enable_req(ctrl);
+
+   netxen_crb_writelit_adapter(adapter,
+   NETXEN_CAM_RAM(NETXEN_CAM_RAM_DMA_WATCHDOG_CTRL), ctrl);
+   
+   return 0;
+}
+
+
 int netxen_is_flash_supported(struct netxen_adapter *adapter);
 int netxen_get_flash_mac_addr(struct netxen_adapter *adapter, u64 mac[]);
 extern void netxen_change_ringparam(struct netxen_adapter *adapter);
diff --git a/d

[PATCH 2/3] NetXen: Make use of per port interrupt mask scheme

2007-06-22 Thread Mithlesh Thukral
NetXen: Make use of per port interrupt scheme.
This patch makes the driver inform the firmware that it can support the
per port interrupt mask scheme. The driver too needs to check whether
the firmware also supports the per port interrupt scheme. If yes, 
then interrupt for each port is enabled/disabled instead of disabling 
for the entire card as it was being done till now.

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 drivers/net/netxen/netxen_nic.h  |  104 +
 drivers/net/netxen/netxen_nic_hw.c   |5 -
 drivers/net/netxen/netxen_nic_init.c |2 
 drivers/net/netxen/netxen_nic_main.c |   28 +++--
 drivers/net/netxen/netxen_nic_phan_reg.h |   14 ++
 5 files changed, 121 insertions(+), 32 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 91f25e0..62aeab9 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -937,6 +937,7 @@ struct netxen_adapter {
struct netxen_ring_ctx *ctx_desc;
struct pci_dev *ctx_desc_pdev;
dma_addr_t ctx_desc_phys_addr;
+   int intr_scheme;
int (*enable_phy_interrupts) (struct netxen_adapter *);
int (*disable_phy_interrupts) (struct netxen_adapter *);
void (*handle_phy_intr) (struct netxen_adapter *);
@@ -1080,37 +1081,102 @@ struct net_device_stats *netxen_nic_get_
 
 static inline void netxen_nic_disable_int(struct netxen_adapter *adapter)
 {
-   /*
-* ISR_INT_MASK: Can be read from window 0 or 1.
-*/
-   writel(0x7ff, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+   uint32_t mask = 0x7ff;
+   int count = 0;
+
+   DPRINTK(1,INFO,"Entered ISR Disable \n");
+
+   switch(adapter->portnum) {
+   case 0:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+   break;
+   case 1:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+   break;
+   case 2:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+   break;
+   case 3:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_3));
+   break;
+   }
+
+   if (adapter->intr_scheme != -1 &&
+   adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+   writel(mask,
+   (void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+   }
 
+   /* Window = 0 or 1 */
+   if (!(adapter->flags & NETXEN_NIC_MSI_ENABLED)) {
+   do {
+   writel(0x, (void *)
+   (PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_TARGET_STATUS)));
+   mask = readl((void *)
+   (pci_base_offset(adapter, ISR_INT_VECTOR)));
+   } while (((mask & 0x80) != 0) && (++count < 32));
+
+   if ((mask & 0x80) != 0) {
+   printk(KERN_NOTICE "Could not disable interrupt completely\n");
+   }
+   }
+
+   DPRINTK(1,INFO,"Done with Disable Int\n");
+
+   return;
 }
 
 static inline void netxen_nic_enable_int(struct netxen_adapter *adapter)
 {
u32 mask;
 
-   switch (adapter->ahw.board_type) {
-   case NETXEN_NIC_GBE:
-   mask = 0x77b;
-   break;
-   case NETXEN_NIC_XGBE:
-   mask = 0x77f;
-   break;
-   default:
-   mask = 0x7ff;
-   break;
-   }
+   DPRINTK(1, INFO, "Entered ISR Enable \n");
+
+   if (adapter->intr_scheme != -1 &&
+   adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+   switch (adapter->ahw.board_type) {
+   case NETXEN_NIC_GBE:
+   mask  =  0x77b;
+   break;
+   case NETXEN_NIC_XGBE:
+   mask  =  0x77f;
+   break;
+   default:
+   mask  =  0x7ff;
+   break;
+   }
 
-   writel(mask, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+   writel(mask,
+   (void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+   }
+   switch (adapter->portnum) {
+   case 0:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+   break;
+   case 1:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+   break;
+   case 2:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+

[PATCH 1/3] NetXen: Fix MSI issues by using PCI function 0

2007-06-22 Thread Mithlesh Thukral
NetXen: Fix issue of MSI not working correctly
NetXen driver uses PCI function 0 to provide the functionality of MSI.
The patch makes driver check the bus master bit for function 0 and
enable it after the card initialization.

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 drivers/net/netxen/netxen_nic_main.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 6167b58..e68356b 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -355,13 +355,6 @@ #endif
/* initialize the adapter */
netxen_initialize_adapter_hw(adapter);
 
-#ifdef CONFIG_PPC
-   if ((adapter->ahw.boardcfg.board_type ==
-   NETXEN_BRDTYPE_P2_SB31_10G_IMEZ) &&
-   (pci_func_id == 2))
-   goto err_out_free_adapter;
-#endif /* CONFIG_PPC */
-
/*
 *  Adapter in our case is quad port so initialize it before
 *  initializing the ports
@@ -509,6 +502,12 @@ #endif
NETXEN_CAM_RAM(0x1fc)));
if (val == 0x) {
/* This is the first boot after power up */
+   netxen_nic_read_w0(adapter, NETXEN_PCIE_REG(0x4), &val);
+   if (!(val & 0x4)) {
+   val |= 0x4;
+   netxen_nic_write_w0(adapter, NETXEN_PCIE_REG(0x4), val);
+   mdelay(100);
+   }
val = readl(NETXEN_CRB_NORMALIZE(adapter,
NETXEN_ROMUSB_GLB_SW_RESET));
printk(KERN_INFO"NetXen: read 0x%08x for reset reg.\n",val);


[PATCH 0/3] NetXen: Updates and bug fixes for NetXen 1/10G driver

2007-06-22 Thread Mithlesh Thukral
Hi All,

I will be sending updates for the NetXen NIC 1/10G Ethernet driver
in the following emails. These are bug fixes and better interrupt
handling schemes. All these patches have been tested on x86 machines and
PowerPC blades.

Thanks,
Mithlesh Thukral


Re: [PATCH 0/2] NetXen: Updates and bug fixes for NetXen 1/10G driver

2007-06-22 Thread Mithlesh Thukral
All,

I am recalling these 2 patches. Please don't review them.
I will resend them along with a new patch which has come up.
Sorry for the inconvenience.

Thanks,
Mithlesh Thukral

On Thursday 21 June 2007 22:34, Mithlesh Thukral wrote:
> Hi All,
>
> I will be sending updates for NetXen NIC 1/10 G Ethernet driver
> in the following emails. These are bug fixes and better interrupt
> handling schemes. These have been tested on x86 machines and
> PowerPC blades.
>
> Thanks,
> Mithlesh Thukral


Re: [PATCH] CONFIG_INET depend on CONFIG_SYSCTL

2007-06-22 Thread Yoshinori Sato
At Tue, 12 Jun 2007 23:05:45 -0700 (PDT),
David Miller wrote:
> 
> From: Yoshinori Sato <[EMAIL PROTECTED]>
> Date: Wed, 13 Jun 2007 14:59:16 +0900
> 
> > At Tue, 12 Jun 2007 01:08:55 -0700 (PDT),
> > David Miller wrote:
> >  
> > > 2) It is much better to add the appropriate CONFIG_SYSCTL
> > >ifdefs to the INET code than to force it on for everyone.
> > 
> > I examined that, but many corrections would be necessary.
> 
> I understand, but embedded people will not be happy that
> SYSFS is a requirement for IPV4 networking.  Every little
> bit of space savings matters for them.

Sorry for the late reply.

I have not checked it in detail, but there seem to be a few parts
that depend on SYSFS.
I need to check whether the SYSFS-dependent parts can be separated.

It may take time, but I will try to check it.


-- 
Yoshinori Sato
<[EMAIL PROTECTED]>