Re: Doubt in kernel packet generation

2005-08-12 Thread Evgeniy Polyakov
On Fri, Aug 12, 2005 at 09:33:13AM +0530, varun ([EMAIL PROTECTED]) wrote:
 Hi all,
 
   I have a question about how to generate my own ICMP packet from
 kernel space. I am aware of raw sockets and packet sockets, but those
 are used from user space. I want one of my kernel modules to generate
 a packet using an skb and probably add it to the transmit queue. Can
 anyone help me with this? I am new to this group, so if the question
 is irrelevant please let me know where I can post it to get an answer.

net/core/pktgen.c has an excellent example of building a network packet in
kernel space.
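
As a rough illustration of what such code has to fill in, the sketch below
lays out an ICMP echo request (RFC 792) and computes its Internet checksum
(RFC 1071) in plain userspace C. It is not kernel code and is not taken from
pktgen: the names inet_checksum() and build_icmp_echo() are made up for this
example. Inside a module, the same bytes would be written into an skb
obtained from alloc_skb() and sized with skb_reserve()/skb_put(), much as
pktgen does for the UDP packets it generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_ECHO_REQUEST_TYPE	8	/* RFC 792 echo request */
#define ICMP_ECHO_HDR_LEN	8	/* type, code, checksum, id, sequence */

/* RFC 1071 Internet checksum over len bytes, treating data as big-endian words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)		/* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Hypothetical helper: write an ICMP echo request into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t build_icmp_echo(uint8_t *buf, size_t buflen,
			      uint16_t id, uint16_t seq,
			      const void *payload, size_t payload_len)
{
	size_t total = ICMP_ECHO_HDR_LEN + payload_len;
	uint16_t csum;

	assert(buf != NULL);
	if (buflen < total)
		return 0;

	buf[0] = ICMP_ECHO_REQUEST_TYPE;
	buf[1] = 0;			/* code */
	buf[2] = 0;			/* checksum is zero while summing */
	buf[3] = 0;
	buf[4] = (uint8_t)(id >> 8);
	buf[5] = (uint8_t)(id & 0xff);
	buf[6] = (uint8_t)(seq >> 8);
	buf[7] = (uint8_t)(seq & 0xff);
	if (payload_len)
		memcpy(buf + ICMP_ECHO_HDR_LEN, payload, payload_len);

	csum = inet_checksum(buf, total);
	buf[2] = (uint8_t)(csum >> 8);
	buf[3] = (uint8_t)(csum & 0xff);
	return total;
}
```

A quick sanity check on the result is that running inet_checksum() over the
finished packet, checksum field included, yields zero.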

Varun

-- 
Evgeniy Polyakov
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Scott Bardone

OPEN TOE submission from Chelsio Communications.

The following items have been addressed:
- cleaned up indentation.
- cleaned up comments.
- cleaned up c-styles.
- using EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL
- removed 2.4 compatibility.
- created TCP_OFFLOAD config option.
- moved #defines to appropriate files.
- removed obfuscating macros.
- included necessary definitions instead of struct.
- made IS_OFFLOADED an inline function instead of macro.

The following items are currently being worked on:
- use sysfs instead of procfs.
- addressing the use of semaphores in 'register_tom'.
- use RCU, need to look at this.
- use inline function instead of TOEDEV macro, requires some work.

Comments:
- static was removed from functions '__tcp_inherit_port'  '__tcp_v4_hash' 
because these are called outside of tcp_ipv4.c from the TOM driver.


Signed-off-by: Scott Bardone [EMAIL PROTECTED]

diff -Naur linux-2.6.13-rc6-git3/include/linux/netdevice.h linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h
--- linux-2.6.13-rc6-git3/include/linux/netdevice.h 2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/netdevice.h 2005-08-11 21:28:36.0 -0700
@@ -408,6 +408,9 @@
 #define NETIF_F_VLAN_CHALLENGED 1024    /* Device cannot handle VLAN packets */
 #define NETIF_F_TSO             2048    /* Can offload TCP/IP segmentation */
 #define NETIF_F_LLTX            4096    /* LockLess TX */
+#ifdef CONFIG_TCP_OFFLOAD
+#define NETIF_F_TCPIP_OFFLOAD   65536   /* Can offload TCP/IP */
+#endif
 
    /* Called after device is detached from network. */
    void    (*uninit)(struct net_device *dev);
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp_diag.h linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h
--- linux-2.6.13-rc6-git3/include/linux/tcp_diag.h  2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp_diag.h  2005-08-11 21:28:36.0 -0700
@@ -4,6 +4,11 @@
 /* Just some random number */
 #define TCPDIAG_GETSOCK 18
 
+/* TOE API */
+#ifdef CONFIG_TCP_OFFLOAD
+#define TCPDIAG_OFFLOAD 5
+#endif
+
 /* Socket identity */
 struct tcpdiag_sockid
 {
diff -Naur linux-2.6.13-rc6-git3/include/linux/tcp.h linux-2.6.13-rc6-git3.patched/include/linux/tcp.h
--- linux-2.6.13-rc6-git3/include/linux/tcp.h   2005-08-07 11:18:56.0 -0700
+++ linux-2.6.13-rc6-git3.patched/include/linux/tcp.h   2005-08-11 21:28:36.0 -0700
@@ -235,6 +235,10 @@
    return (struct tcp_request_sock *)req;
 }
 
+#ifdef CONFIG_TCP_OFFLOAD
+struct toe_funcs;
+#endif
+
 struct tcp_sock {
    /* inet_sock has to be the first member of tcp_sock */
    struct inet_sock    inet;
@@ -342,6 +346,10 @@
 
    struct tcp_func *af_specific;   /* Operations which are AF_INET{4,6} specific   */
 
+#ifdef CONFIG_TCP_OFFLOAD
+   struct toe_funcs    *toe_specific; /* Operations overridden by TOEs */
+#endif
+
    __u32   rcv_wnd;    /* Current receiver window  */
    __u32   rcv_wup;    /* rcv_nxt on last window update sent   */
    __u32   write_seq;  /* Tail(+1) of data held in tcp send buffer */
diff -Naur linux-2.6.13-rc6-git3/include/linux/toedev.h linux-2.6.13-rc6-git3.patched/include/linux/toedev.h
--- linux-2.6.13-rc6-git3/include/linux/toedev.h 1969-12-31 16:00:00.0 -0800
+++ linux-2.6.13-rc6-git3.patched/include/linux/toedev.h 2005-08-11 22:37:03.94780 -0700
@@ -0,0 +1,126 @@
+/*
+ *   *
+ * File: *
+ *  toedev.h *
+ *   *
+ * Description:  *
+ *  TOE device definitions.  *
+ *   *
+ * This program is free software; you can redistribute it and/or modify  *
+ * it under the terms of the GNU General Public License, version 2, as   *
+ * published by the Free Software Foundation.*
+ *   *
+ * You should have received a copy of the GNU General Public License along   *
+ * with this program; if not, write to the Free Software Foundation, Inc.,   *
+ * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA. *
+ *   *
+ * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED*
+ * WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF  *
+ * MERCHANTABILITY AND FITNESS 

Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Scott Bardone [EMAIL PROTECTED]
Date: Thu, 11 Aug 2005 23:16:14 -0700

 - static was removed from functions '__tcp_inherit_port'  '__tcp_v4_hash' 
 because these are called outside of tcp_ipv4.c from the TOM driver.

There is no way you're going to be allowed to call such deep TCP
internals from your driver.

This would mean that every time we wish to change the data structures
and interfaces for TCP socket lookup, your drivers would need to
change.

This is all looking exactly like the deep dark dungeon I feared TOE
support would be.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Mitchell Blank Jr
The networking gurus can comment on the internals of your patch better than
I can.  Just a few style notes though:

 +#ifdef CONFIG_TCP_OFFLOAD
 +#define NETIF_F_TCPIP_OFFLOAD  65536   /* Can offload TCP/IP */
 +#endif

No need to protect this inside a CONFIG_* option.

 +/* TOE API */
 +#ifdef CONFIG_TCP_OFFLOAD
 +#define TCPDIAG_OFFLOAD 5
 +#endif

Ditto

 +#ifdef CONFIG_TCP_OFFLOAD
 +struct toe_funcs;
 +#endif

Ditto

 +#ifdef CONFIG_TCP_OFFLOAD
 +#include <linux/toedev.h>
 +#endif

Include <linux/toedev.h> unconditionally.  Have it handle the !CONFIG_TCP_OFFLOAD
case itself by declaring no-op macros for things like toe_neigh_update().
This way you can remove a lot of the #ifdefs you've sprinkled all over the
.c files.

 +#define boot_phase 0

Some explanation here?  It looks like something left over from development.

 +#ifndef __raise_softirq_irqoff
 +#define __raise_softirq_irqoff(nr) __cpu_raise_softirq(smp_processor_id(), nr)
 +#endif

What is this needed for?

 +static int toedev_init(void);

This forward declaration seems to be only needed for the boot_phase thing
above, so if that goes this can go as well.

 +/*
 + * Allocate a unique index for a TOE device.  We keep the index within 30 bits

Maybe look at lib/idr.c to handle this?

 + struct toedev *dev = kmalloc(sizeof(struct toedev), GFP_KERNEL);
 +
 + if (dev) {
 + memset(dev, 0, sizeof(struct toedev));

Minor nitpick (that some might disagree with)... I usually prefer:

struct toedev *dev = kmalloc(sizeof(*dev), GFP_KERNEL);
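
For readers outside the kernel tree, the same idiom can be sketched in
userspace C. This is only an analogy: struct toedev_example and
toedev_example_alloc() are hypothetical names, and calloc() stands in for the
kmalloc() plus memset() pair in the patch. The point is that sizeof(*dev)
tracks the pointer's type, so the allocation stays correct if the type of
dev is ever changed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct toedev; the fields are illustrative only. */
struct toedev_example {
	unsigned int index;
	char name[16];
	void *priv;
};

/* Allocate a zero-filled device structure, or return NULL on failure. */
static struct toedev_example *toedev_example_alloc(void)
{
	/*
	 * sizeof(*dev) is derived from the variable, not from a repeated
	 * type name, so it cannot drift out of sync with the declaration.
	 * calloc() zero-fills, which replaces the separate memset().
	 */
	struct toedev_example *dev = calloc(1, sizeof(*dev));

	return dev;
}
```

The caller still has to check the result for NULL and release it with
free(), just as the kernel version must check kmalloc() and use kfree().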

 +int toe_receive_skb(struct toedev *dev, struct sk_buff **skb, int n)
 +{
 + int i;

n and i should probably be unsigned int

 +#ifdef CONFIG_TCP_OFFLOAD
 + tcp_listen_offload(sk);
 +#endif

Another example of something that could be an empty macro in a .h file for
the !CONFIG_TCP_OFFLOAD case.

 +#ifndef CONFIG_TCP_OFFLOAD
 +static
 +#endif

Don't do this... just make it non-static unconditionally.  It's not worth
the ugliness.  Same applies to other places.

 +#ifndef CONFIG_TCP_OFFLOAD
 +static
 +#endif
 +__inline__ void __tcp_inherit_port(struct sock *sk, struct sock *child)
  {
   struct tcp_bind_hashbucket *head =
   &tcp_bhash[tcp_bhashfn(inet_sk(child)->num)];
 @@ -351,7 +357,10 @@
   }
  }

Things that are inline and are now going to be shared really should just
remain static inline and probably move to a header file.

 +#ifdef CONFIG_TCP_OFFLOAD
 + if (tcp_connect_offload(sk))
 + return 0;
 +#endif

Just another example of the kind of #ifdef that doesn't belong in the .c
files.  If the !CONFIG_TCP_OFFLOAD case just had

#define tcp_connect_offload(sk) (0)

then you can skip the #ifdef
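
The pattern suggested above can be sketched as a small self-contained file.
Everything here is an assumption made for illustration: the prototypes are
guessed from the call sites quoted in this review, example_connect() and
software_connects are hypothetical, and static inline stubs are used in place
of the macros mentioned above, which is a substitution, not what the patch
does.

```c
#include <assert.h>
#include <stddef.h>

struct sock;	/* opaque here; the real definition lives in the kernel */

/* --- what a header such as <linux/toedev.h> could provide --- */
#ifdef CONFIG_TCP_OFFLOAD
int tcp_connect_offload(struct sock *sk);	/* assumed prototype */
void tcp_listen_offload(struct sock *sk);	/* assumed prototype */
#else
/* Stubs for the !CONFIG_TCP_OFFLOAD case: the compiler drops them entirely. */
static inline int tcp_connect_offload(struct sock *sk)
{
	(void)sk;
	return 0;	/* never offloaded */
}

static inline void tcp_listen_offload(struct sock *sk)
{
	(void)sk;
}
#endif

/* --- what the .c file looks like: no #ifdef at the call site --- */
static int software_connects;	/* counts trips through the software path */

static int example_connect(struct sock *sk)
{
	if (tcp_connect_offload(sk))
		return 0;	/* the offload device took the connection */

	software_connects++;	/* otherwise continue with the normal stack */
	return 0;
}
```

Compared with an empty macro, an inline stub keeps argument type checking in
both configurations, at the cost of needing the parameter types to be
declared wherever the header is included.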

 +#ifndef CONFIG_TCP_OFFLOAD
   LIMIT_NETDEBUG(printk(KERN_DEBUG "TCP: drop open "
 "request from %u.%u."
 "%u.%u/%u\n",
 NIPQUAD(saddr),
 ntohs(skb->h.th->source)));
 +#else
 + NETDEBUG(if (net_ratelimit()) \
 + printk(KERN_DEBUG "TCP: drop open "
 +"request from %u.%u."
 +"%u.%u/%u\n", \
 +NIPQUAD(saddr),
 +ntohs(skb->h.th->source)));
 +#endif

Huh?  What about TOE requires changes to printk ratelimiting?

-Mitch


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Jeff Garzik

David S. Miller wrote:

From: Scott Bardone [EMAIL PROTECTED]
Date: Thu, 11 Aug 2005 23:16:14 -0700


- static was removed from functions '__tcp_inherit_port'  '__tcp_v4_hash' 
because these are called outside of tcp_ipv4.c from the TOM driver.



There is no way you're going to be allowed to call such deep TCP
internals from your driver.

This would mean that every time we wish to change the data structures
and interfaces for TCP socket lookup, your drivers would need to
change.

This is all looking exactly like the deep dark dungeon I feared TOE
support would be.


Although I keep an open mind, I really don't see how any TOE solution 
will ever overcome my own conceptual merge objections:



1) RFC compliance differs based on whether you use a TOE NIC, or Linux 
software stack.  What Linux am I talking to, today?


Linux is consistently the most RFC-compliant net stack in existence, 
AFAIK.  TOE suddenly leaves all that open to question.



2) Security updates.  We can deploy a net stack security fix very 
rapidly, and know that we have solved the issue(s).  With TOE, security 
fixes no longer cover all users.  One has to either wait on multiple TOE 
vendors to deploy firmware fixes, or deploy the software fix and leave 
TOE users exposed.  Once again...  What Linux am I talking to, today?



3) Netfilter.  Either a TOE NIC (a) doesn't support netfilter, (b) needs 
far-reaching packet mangling hooks, or (c) includes its own custom 
netfilter [clone], with attendant bugs and maintenance issues.



4) Configuration.  Either a TOE NIC needs deep net stack hooks, or needs 
its own netlink/ifconfig configuration interfaces.



5) As we see in this thread -- upper layer (TCP, IP) changes in the net 
stack require touching a bunch of low-level drivers.  Brand new 
maintenance issue, which slows down upper layer development.



So far, I haven't seen a TOE NIC that satisfies even half of these 
objections.


About the only TOE situation I could imagine that -would- satisfy them would
be one where the TOE firmware source code is included in the Linux kernel
source code, but even then, all the hooks would be nasty.


Jeff




Re: atheros driver - desc

2005-08-12 Thread Chris Wedgwood
On Sun, Aug 07, 2005 at 05:01:34PM +0200, Harald Welte wrote:

 I will consult my legal counsel about this.  My current naive
 position on this is that only the actual process of the
 re-engineering matters, not the result.

Which countries is this advice valid for?  Does someone need to chase
this inside the US in parallel?


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread Mitchell Blank Jr
I'm fairly pessimistic about full TOE too; I just want to see the patch
cleaned up a bit so we can see the exact impact it would have.  The RX
optimization work presented in the Neterion and Intel papers at OLS sounds a
lot more interesting to me, though.

However, I do want to comment on one statement of yours:

Jeff Garzik wrote:
 3) Netfilter.  Either a TOE NIC (a) doesn't support netfilter, (b) needs
 far-reaching packet mangling hooks, or (c) includes its own custom
 netfilter [clone], with attendant bugs and maintenance issues.

I don't think netfilter is a big deal.  The kernel could still check the
TCP handshake packets (or, if needed, faked-up versions with the same data)
at accept()/connect() time.  If those pass muster it's a pretty good bet
that the other 100,000 packets making up that TCP connection would also.
Of course this limitation would need to be documented but I doubt most
netfilter users would mind too much.  There are obviously edge cases where
you can lose: for example, if you update the netfilter rules you ideally want
to revalidate all the currently open connections.

Since TOE hardware is designed to help the TCP end point you probably
don't have to worry about NAT or other fancy mangling on these interfaces.

-Mitch


[PATCH] Fix NET/ROM queue length

2005-08-12 Thread Ralf Baechle
NET/ROM uses virtual interfaces so setting a queue length is wrong.

Signed-off-by: Ralf Baechle DL5RB [EMAIL PROTECTED]

 net/netrom/nr_dev.c |1 -
 1 files changed, 1 deletion(-)

Index: linux-cvs/net/netrom/nr_dev.c
===
--- linux-cvs.orig/net/netrom/nr_dev.c
+++ linux-cvs/net/netrom/nr_dev.c
@@ -187,7 +187,6 @@ void nr_setup(struct net_device *dev)
    dev->hard_header_len    = NR_NETWORK_LEN + NR_TRANSPORT_LEN;
    dev->addr_len           = AX25_ADDR_LEN;
    dev->type               = ARPHRD_NETROM;
-   dev->tx_queue_len       = 40;
    dev->rebuild_header     = nr_rebuild_header;
    dev->set_mac_address    = nr_set_mac_address;
 


[PATCH 1/6][IPV6] Generalise the tcp_v6_lookup routines

2005-08-12 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/

This is based on the discussions we had on the [EMAIL PROTECTED]
about fully generalising tcp_diag. This series of changesets accomplishes
that without breaking the userspace ABI. It does break source code for users
who move from the previous tcp_diag.h to inet_diag.h, which is expected, but
the move is only required for those who want to support this generalised
infrastructure. The work required is basically a big sed; I'll do this later
today/tomorrow.

Best Regards,

- Arnaldo

tree 78f33e1b9c74aa4e1586326e0918db068a967676
parent ccd176a23975b634cbdd89ffa190fb9da107c34e
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123816755 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123816755 -0300

[IPV6] Generalise the tcp_v6_lookup routines

In the same way as was done with the v4 counterparts, this will be moved
to inet6_hashtables.c.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

--

 include/linux/ipv6.h   |5 +
 include/net/inet6_hashtables.h |   26 +++
 net/ipv4/Kconfig   |3 
 net/ipv4/tcp_diag.c|   40 +--
 net/ipv6/tcp_ipv6.c|  139 +
 5 files changed, 122 insertions(+), 91 deletions(-)

--

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -193,6 +193,11 @@ struct inet6_skb_parm {
 
 #define IP6CB(skb) ((struct inet6_skb_parm*)((skb)->cb))
 
+static inline int inet6_iif(const struct sk_buff *skb)
+{
+   return IP6CB(skb)->iif;
+}
+
 struct tcp6_request_sock {
struct tcp_request_sock req;
struct in6_addr loc_addr;
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
new file mode 100644
--- /dev/null
+++ b/include/net/inet6_hashtables.h
@@ -0,0 +1,26 @@
+/*
+ * INET     An implementation of the TCP/IP protocol suite for the LINUX
+ * operating system.  INET is implemented using the BSD Socket
+ * interface as the means of communication with the user level.
+ *
+ * Authors:Lotsa people, from code originally in tcp
+ *
+ * This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _INET6_HASHTABLES_H
+#define _INET6_HASHTABLES_H
+
+#include <linux/types.h>
+
+struct in6_addr;
+struct inet_hashinfo;
+
+extern struct sock *inet6_lookup(struct inet_hashinfo *hashinfo,
+const struct in6_addr *saddr, const u16 sport,
+const struct in6_addr *daddr, const u16 dport,
+const int dif);
+#endif /* _INET6_HASHTABLES_H */
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -425,9 +425,6 @@ config IP_TCPDIAG
  
  If unsure, say Y.
 
-config IP_TCPDIAG_IPV6
-   def_bool (IP_TCPDIAG=y && IPV6=y) || (IP_TCPDIAG=m && IPV6)
-
 config IP_TCPDIAG_DCCP
    def_bool (IP_TCPDIAG=y && IP_DCCP=y) || (IP_TCPDIAG=m && IP_DCCP)
 
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -24,6 +24,10 @@
 #include <net/tcp.h>
 #include <net/ipv6.h>
 #include <net/inet_common.h>
+#include <net/inet_connection_sock.h>
+#include <net/inet_hashtables.h>
+#include <net/inet_timewait_sock.h>
+#include <net/inet6_hashtables.h>
 
 #include <linux/inet.h>
 #include <linux/stddef.h>
@@ -102,7 +106,7 @@ static int tcpdiag_fill(struct sk_buff *
    r->tcpdiag_wqueue = 0;
    r->tcpdiag_uid = 0;
    r->tcpdiag_inode = 0;
-#ifdef CONFIG_IP_TCPDIAG_IPV6
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
    if (r->tcpdiag_family == AF_INET6) {
        const struct tcp6_timewait_sock *tcp6tw = tcp6_twsk(sk);
 
@@ -121,7 +125,7 @@ static int tcpdiag_fill(struct sk_buff *
    r->id.tcpdiag_src[0] = inet->rcv_saddr;
    r->id.tcpdiag_dst[0] = inet->daddr;
 
-#ifdef CONFIG_IP_TCPDIAG_IPV6
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
    if (r->tcpdiag_family == AF_INET6) {
        struct ipv6_pinfo *np = inet6_sk(sk);
 
@@ -196,19 +200,6 @@ nlmsg_failure:
return -1;
 }
 
-#ifdef CONFIG_IP_TCPDIAG_IPV6
-extern struct sock *tcp_v6_lookup(struct in6_addr *saddr, u16 sport,
- struct in6_addr *daddr, u16 dport,
- int dif);
-#else
-static inline struct sock *tcp_v6_lookup(struct in6_addr *saddr, u16 sport,
- 

[PATCH 2/6][INET6_HASHTABLES] Move inet6_lookup functions to net/ipv4/inet6_hashtables.c

2005-08-12 Thread Arnaldo Carvalho de Melo

tree 321162afae4bc318a868c1294d79be04cef31ad8
parent b0e1ef9a964a4d4ef3510d6820db759bc4821e44
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123817709 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123817709 -0300

[INET6_HASHTABLES] Move inet6_lookup functions to net/ipv4/inet6_hashtables.c

Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.

Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/net/inet6_hashtables.h |  106 +++-
 net/ipv4/Kconfig   |4 -
 net/ipv4/Makefile  |2 
 net/ipv4/inet6_hashtables.c|   81 +
 net/ipv6/tcp_ipv6.c|  154 -
 5 files changed, 190 insertions(+), 157 deletions(-)

--

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -14,13 +14,117 @@
 #ifndef _INET6_HASHTABLES_H
 #define _INET6_HASHTABLES_H
 
+#include <linux/config.h>
+
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+#include <linux/in6.h>
+#include <linux/ipv6.h>
 #include <linux/types.h>
 
-struct in6_addr;
+#include <net/ipv6.h>
+
 struct inet_hashinfo;
 
+/* I have no idea if this is a good hash for v6 or not. -DaveM */
+static inline int inet6_ehashfn(const struct in6_addr *laddr, const u16 lport,
+   const struct in6_addr *faddr, const u16 fport,
+   const int ehash_size)
+{
+   int hashent = (lport ^ fport);
+
+   hashent ^= (laddr->s6_addr32[3] ^ faddr->s6_addr32[3]);
+   hashent ^= hashent >> 16;
+   hashent ^= hashent >> 8;
+   return (hashent & (ehash_size - 1));
+}
+
+static inline int inet6_sk_ehashfn(const struct sock *sk, const int ehash_size)
+{
+   const struct inet_sock *inet = inet_sk(sk);
+   const struct ipv6_pinfo *np = inet6_sk(sk);
+   const struct in6_addr *laddr = &np->rcv_saddr;
+   const struct in6_addr *faddr = &np->daddr;
+   const __u16 lport = inet->num;
+   const __u16 fport = inet->dport;
+   return inet6_ehashfn(laddr, lport, faddr, fport, ehash_size);
+}
+}
+
+/*
+ * Sockets in TCP_CLOSE state are _always_ taken out of the hash, so
+ * we need not check it for TCP lookups anymore, thanks Alexey. -DaveM
+ *
+ * The sockhash lock must be held as a reader here.
+ */
+static inline struct sock *
+   __inet6_lookup_established(struct inet_hashinfo *hashinfo,
+  const struct in6_addr *saddr,
+  const u16 sport,
+  const struct in6_addr *daddr,
+  const u16 hnum,
+  const int dif)
+{
+   struct sock *sk;
+   const struct hlist_node *node;
+   const __u32 ports = INET_COMBINED_PORTS(sport, hnum);
+   /* Optimize here for direct hit, only listening connections can
+* have wildcards anyways.
+*/
+   const int hash = inet6_ehashfn(daddr, hnum, saddr, sport,
+                                  hashinfo->ehash_size);
+   struct inet_ehash_bucket *head = &hashinfo->ehash[hash];
+
+   read_lock(&head->lock);
+   sk_for_each(sk, node, &head->chain) {
+       /* For IPV6 do the cheaper port and family tests first. */
+       if (INET6_MATCH(sk, saddr, daddr, ports, dif))
+           goto hit; /* You sunk my battleship! */
+   }
+   /* Must check for a TIME_WAIT'er before going to listener hash. */
+   sk_for_each(sk, node, &(head + hashinfo->ehash_size)->chain) {
+       const struct inet_timewait_sock *tw = inet_twsk(sk);
+
+       if(*((__u32 *)&(tw->tw_dport))  == ports &&
+          sk->sk_family                == PF_INET6) {
+  
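
The bucket computation in inet6_ehashfn() above can be tried out in
userspace. The sketch below is an approximation under stated assumptions:
struct in6_addr_example and inet6_ehashfn_example() are hypothetical
stand-ins, the shift and mask operators are the ones the archive appears to
have dropped, and the accumulator is a uint32_t so the right shifts are well
defined in portable C. Because the patch uses a signed int, bucket numbers
from this sketch may differ from the kernel's for some inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for struct in6_addr: four 32-bit words. */
struct in6_addr_example {
	uint32_t s6_addr32[4];
};

/*
 * Fold the two ports and the last 32 bits of each address into a bucket
 * index. ehash_size is assumed to be a power of two, so masking with
 * (ehash_size - 1) keeps the result in [0, ehash_size).
 */
static inline uint32_t inet6_ehashfn_example(const struct in6_addr_example *laddr,
					      uint16_t lport,
					      const struct in6_addr_example *faddr,
					      uint16_t fport,
					      uint32_t ehash_size)
{
	uint32_t hashent = (uint32_t)(lport ^ fport);

	hashent ^= laddr->s6_addr32[3] ^ faddr->s6_addr32[3];
	hashent ^= hashent >> 16;	/* mix the high half into the low half */
	hashent ^= hashent >> 8;	/* then the second byte into the first */
	return hashent & (ehash_size - 1);
}
```

Only the low word of each address feeds the hash, so two connections whose
addresses differ only in the upper 96 bits land in the same bucket when
their ports match.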

[PATCH 3/6][TCPDIAG] Introduce inet_diag_{register,unregister}

2005-08-12 Thread Arnaldo Carvalho de Melo

tree 068b3f880dfe76b8bae940ede403d6b8f5dd5c8c
parent 790164673413c8cfebc910d6b99ab2ae6ae2c9bb
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123829138 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123829138 -0300

[TCPDIAG] Introduce inet_diag_{register,unregister}

Next changeset will rename tcp_diag to inet_diag and move the tcp_diag code out
of it and into a new tcp_diag.c, similar to the net/dccp/diag.c introduced in
this changeset, completing the transition to a generic inet_diag
infrastructure.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/linux/tcp_diag.h |   19 +
 net/dccp/Kconfig |5 +
 net/dccp/Makefile|4 +
 net/dccp/diag.c  |   47 ++
 net/ipv4/Kconfig |3 
 net/ipv4/tcp_diag.c  |  153 ++-
 6 files changed, 186 insertions(+), 45 deletions(-)

--

diff --git a/include/linux/tcp_diag.h b/include/linux/tcp_diag.h
--- a/include/linux/tcp_diag.h
+++ b/include/linux/tcp_diag.h
@@ -5,6 +5,8 @@
 #define TCPDIAG_GETSOCK 18
 #define DCCPDIAG_GETSOCK 19
 
+#define INET_DIAG_GETSOCK_MAX 24
+
 /* Socket identity */
 struct tcpdiag_sockid
 {
@@ -125,4 +127,21 @@ struct tcpvegas_info {
__u32   tcpv_minrtt;
 };
 
+#ifdef __KERNEL__
+struct sock;
+struct inet_hashinfo;
+
+struct inet_diag_handler {
+   struct inet_hashinfo    *idiag_hashinfo;
+   void                    (*idiag_get_info)(struct sock *sk,
+                                             struct tcpdiagmsg *r,
+                                             void *info);
+   __u16                   idiag_info_size;
+   __u16                   idiag_type;
+};
+
+extern int  inet_diag_register(const struct inet_diag_handler *handler);
+extern void inet_diag_unregister(const struct inet_diag_handler *handler);
+#endif /* __KERNEL__ */
+
 #endif /* _TCP_DIAG_H_ */
diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -19,6 +19,11 @@ config IP_DCCP
 
  If in doubt, say N.
 
+config IP_DCCP_DIAG
+   depends on IP_DCCP && IP_TCPDIAG
+   def_tristate y if (IP_DCCP = y && IP_TCPDIAG = y)
+   def_tristate m
+
 source "net/dccp/ccids/Kconfig"
 
 endmenu
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -3,4 +3,8 @@ obj-$(CONFIG_IP_DCCP) += dccp.o
 dccp-y := ccid.o input.o ipv4.o minisocks.o options.o output.o proto.o \
  timer.o packet_history.o
 
+obj-$(CONFIG_IP_DCCP_DIAG) += dccp_diag.o
+
 obj-y += ccids/
+
+dccp_diag-y := diag.o
diff --git a/net/dccp/diag.c b/net/dccp/diag.c
new file mode 100644
--- /dev/null
+++ b/net/dccp/diag.c
@@ -0,0 +1,47 @@
+/*
+ *  net/dccp/diag.c
+ *
+ *  An implementation of the DCCP protocol
+ *  Arnaldo Carvalho de Melo [EMAIL PROTECTED]
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/config.h>
+
+#include <linux/module.h>
+#include <linux/tcp_diag.h>
+
+#include "dccp.h"
+
+static void dccp_diag_get_info(struct sock *sk, struct tcpdiagmsg *r,
+                              void *_info)
+{
+   r->tcpdiag_rqueue = r->tcpdiag_wqueue = 0;
+}
+
+static struct inet_diag_handler dccp_diag_handler = {
+   .idiag_hashinfo  = &dccp_hashinfo,
+   .idiag_get_info  = dccp_diag_get_info,
+   .idiag_type      = DCCPDIAG_GETSOCK,
+   .idiag_info_size = 0,
+};
+
+static int __init dccp_diag_init(void)
+{
+   return inet_diag_register(&dccp_diag_handler);
+}
+
+static void __exit dccp_diag_fini(void)
+{
+   inet_diag_unregister(&dccp_diag_handler);
+}
+
+module_init(dccp_diag_init);
+module_exit(dccp_diag_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Arnaldo Carvalho de Melo [EMAIL PROTECTED]");
+MODULE_DESCRIPTION("DCCP inet_diag handler");
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -423,9 +423,6 @@ config IP_TCPDIAG
  
  If unsure, say Y.
 
-config IP_TCPDIAG_DCCP
-   def_bool (IP_TCPDIAG=y  
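
The body of inet_diag_register() is cut off in this archive, so the
following is only a guess at the general shape of such a registry, written
as userspace C. The names ending in _example, the table, and the error codes
are assumptions for illustration; the constants are the ones shown in
tcp_diag.h above. Locking is left out, although kernel code would need it
because handlers can come and go as modules load and unload.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TCPDIAG_GETSOCK		18
#define DCCPDIAG_GETSOCK	19
#define INET_DIAG_GETSOCK_MAX	24

/* Trimmed-down, hypothetical version of struct inet_diag_handler. */
struct inet_diag_handler_example {
	unsigned short idiag_info_size;
	unsigned short idiag_type;
};

/* One slot per netlink message type; NULL means no handler registered. */
static const struct inet_diag_handler_example *handler_table[INET_DIAG_GETSOCK_MAX];

static int inet_diag_register_example(const struct inet_diag_handler_example *h)
{
	if (h->idiag_type >= INET_DIAG_GETSOCK_MAX)
		return -EINVAL;		/* type does not fit in the table */
	if (handler_table[h->idiag_type] != NULL)
		return -EEXIST;		/* another protocol owns this type */
	handler_table[h->idiag_type] = h;
	return 0;
}

static void inet_diag_unregister_example(const struct inet_diag_handler_example *h)
{
	/* Only clear the slot if this handler is the one that owns it. */
	if (h->idiag_type < INET_DIAG_GETSOCK_MAX &&
	    handler_table[h->idiag_type] == h)
		handler_table[h->idiag_type] = NULL;
}

static const struct inet_diag_handler_example *inet_diag_lookup_example(unsigned short type)
{
	return type < INET_DIAG_GETSOCK_MAX ? handler_table[type] : NULL;
}
```

With a table like this, the generic dump code can look up the handler by the
request's message type and needs no protocol-specific #ifdef blocks, which
is what lets DCCP plug in from its own module.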

[PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread Arnaldo Carvalho de Melo

tree c121114a797d3c2a5b0fe3bfcc7e26a83a3c1c55
parent e708bc5b8898bc13af6daa55a022272c70e6a747
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123836219 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123836219 -0300

[TCPDIAG] Just rename everything to inet_diag

Next changeset will rename tcp_diag.[ch] to inet_diag.[ch].

I'm taking this longer route so as to ease review, making clear the changes
made all along the way.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/linux/netlink.h  |2 
 include/linux/tcp_diag.h |  135 -
 include/net/tcp.h|2 
 net/dccp/Kconfig |4 
 net/dccp/diag.c  |4 
 net/ipv4/Kconfig |   10 
 net/ipv4/Makefile|2 
 net/ipv4/tcp_diag.c  |  395 +--
 net/ipv4/tcp_vegas.c |4 
 net/ipv4/tcp_westwood.c  |4 
 security/selinux/hooks.c |4 
 security/selinux/include/av_inherit.h|2 
 security/selinux/include/av_perm_to_string.h |4 
 security/selinux/include/av_permissions.h|   48 +--
 security/selinux/include/class_to_string.h   |2 
 security/selinux/include/flask.h |2 
 security/selinux/nlmsgtab.c  |   11 
 17 files changed, 314 insertions(+), 321 deletions(-)

--

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -8,7 +8,7 @@
 #define NETLINK_W1             1   /* 1-wire subsystem */
 #define NETLINK_USERSOCK       2   /* Reserved for user mode socket protocols */
 #define NETLINK_FIREWALL       3   /* Firewalling hook */
-#define NETLINK_TCPDIAG        4   /* TCP socket monitoring */
+#define NETLINK_INET_DIAG      4   /* INET socket monitoring */
 #define NETLINK_NFLOG          5   /* netfilter/iptables ULOG */
 #define NETLINK_XFRM           6   /* ipsec */
 #define NETLINK_SELINUX        7   /* SELinux event notifications */
diff --git a/include/linux/tcp_diag.h b/include/linux/tcp_diag.h
--- a/include/linux/tcp_diag.h
+++ b/include/linux/tcp_diag.h
@@ -1,5 +1,5 @@
-#ifndef _TCP_DIAG_H_
-#define _TCP_DIAG_H_ 1
+#ifndef _INET_DIAG_H_
+#define _INET_DIAG_H_ 1
 
 /* Just some random number */
 #define TCPDIAG_GETSOCK 18
@@ -8,39 +8,36 @@
 #define INET_DIAG_GETSOCK_MAX 24
 
 /* Socket identity */
-struct tcpdiag_sockid
-{
-   __u16   tcpdiag_sport;
-   __u16   tcpdiag_dport;
-   __u32   tcpdiag_src[4];
-   __u32   tcpdiag_dst[4];
-   __u32   tcpdiag_if;
-   __u32   tcpdiag_cookie[2];
-#define TCPDIAG_NOCOOKIE (~0U)
+struct inet_diag_sockid {
+   __u16   idiag_sport;
+   __u16   idiag_dport;
+   __u32   idiag_src[4];
+   __u32   idiag_dst[4];
+   __u32   idiag_if;
+   __u32   idiag_cookie[2];
+#define INET_DIAG_NOCOOKIE (~0U)
 };
 
 /* Request structure */
 
-struct tcpdiagreq
-{
-   __u8    tcpdiag_family; /* Family of addresses. */
-   __u8    tcpdiag_src_len;
-   __u8    tcpdiag_dst_len;
-   __u8    tcpdiag_ext;    /* Query extended information */
+struct inet_diag_req {
+   __u8    idiag_family;   /* Family of addresses. */
+   __u8    idiag_src_len;
+   __u8    idiag_dst_len;
+   __u8    idiag_ext;      /* Query extended information */
 
-   struct tcpdiag_sockid id;
+   struct inet_diag_sockid id;
 
-   __u32   tcpdiag_states; /* States to dump */
-   __u32   tcpdiag_dbs;/* Tables to dump (NI) */
+   __u32   idiag_states;   /* States to dump */
+   __u32   idiag_dbs;  /* Tables to dump (NI) */
 };
 
-enum
-{
-   TCPDIAG_REQ_NONE,
-   TCPDIAG_REQ_BYTECODE,
+enum {
+   INET_DIAG_REQ_NONE,
+   INET_DIAG_REQ_BYTECODE,
 };
 
-#define TCPDIAG_REQ_MAX 
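
The claim that the rename leaves the userspace ABI intact can be checked
mechanically, since only identifiers change and not field types or order.
The sketch below copies the old and new socket-identity structures from the
diff above, using fixed-width userspace types in place of __u16/__u32, and
compares their layouts. The helper sockid_layouts_match() is hypothetical
and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the rename. */
struct tcpdiag_sockid {
	uint16_t tcpdiag_sport;
	uint16_t tcpdiag_dport;
	uint32_t tcpdiag_src[4];
	uint32_t tcpdiag_dst[4];
	uint32_t tcpdiag_if;
	uint32_t tcpdiag_cookie[2];
};

/* Layout after the rename. */
struct inet_diag_sockid {
	uint16_t idiag_sport;
	uint16_t idiag_dport;
	uint32_t idiag_src[4];
	uint32_t idiag_dst[4];
	uint32_t idiag_if;
	uint32_t idiag_cookie[2];
};

/* Compile-time checks: a mismatch stops the build instead of corrupting data. */
_Static_assert(sizeof(struct tcpdiag_sockid) == sizeof(struct inet_diag_sockid),
	       "rename must not change the structure size");
_Static_assert(offsetof(struct tcpdiag_sockid, tcpdiag_if) ==
	       offsetof(struct inet_diag_sockid, idiag_if),
	       "rename must not move the interface index");

/* Run-time version of the same comparison, field by field. */
static int sockid_layouts_match(void)
{
	return offsetof(struct tcpdiag_sockid, tcpdiag_sport) ==
		       offsetof(struct inet_diag_sockid, idiag_sport) &&
	       offsetof(struct tcpdiag_sockid, tcpdiag_dport) ==
		       offsetof(struct inet_diag_sockid, idiag_dport) &&
	       offsetof(struct tcpdiag_sockid, tcpdiag_src) ==
		       offsetof(struct inet_diag_sockid, idiag_src) &&
	       offsetof(struct tcpdiag_sockid, tcpdiag_dst) ==
		       offsetof(struct inet_diag_sockid, idiag_dst) &&
	       offsetof(struct tcpdiag_sockid, tcpdiag_cookie) ==
		       offsetof(struct inet_diag_sockid, idiag_cookie);
}
```

Old binaries keep working because the bytes on the netlink socket are
unchanged; only source that is recompiled against inet_diag.h has to adopt
the new names.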

[PATCH 5/6][INET_DIAG] Rename tcp_diag.[ch] to inet_diag.[ch]

2005-08-12 Thread Arnaldo Carvalho de Melo

tree 5ade244ad9d4220137112a4ea75325c652a66e03
parent 415f7316a38f275e121cc1962565cb7077cd188e
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123837525 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123837525 -0300

[INET_DIAG] Rename tcp_diag.[ch] to inet_diag.[ch]

Next changeset will introduce net/ipv4/tcp_diag.c, moving the code that was put
transitionally in inet_diag.c.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 b/include/linux/inet_diag.h   |  138 ++
 b/net/dccp/diag.c |2 
 b/net/ipv4/Makefile   |2 
 b/net/ipv4/inet_diag.c|  893 ++
 b/net/ipv4/tcp_vegas.c|2 
 b/net/ipv4/tcp_westwood.c |2 
 b/security/selinux/nlmsgtab.c |2 
 include/linux/tcp_diag.h  |  138 --
 net/ipv4/tcp_diag.c   |  892 -
 9 files changed, 1036 insertions(+), 1035 deletions(-)

--

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
new file mode 100644
--- /dev/null
+++ b/include/linux/inet_diag.h
@@ -0,0 +1,138 @@
+#ifndef _INET_DIAG_H_
+#define _INET_DIAG_H_ 1
+
+/* Just some random number */
+#define TCPDIAG_GETSOCK 18
+#define DCCPDIAG_GETSOCK 19
+
+#define INET_DIAG_GETSOCK_MAX 24
+
+/* Socket identity */
+struct inet_diag_sockid {
+   __u16   idiag_sport;
+   __u16   idiag_dport;
+   __u32   idiag_src[4];
+   __u32   idiag_dst[4];
+   __u32   idiag_if;
+   __u32   idiag_cookie[2];
+#define INET_DIAG_NOCOOKIE (~0U)
+};
+
+/* Request structure */
+
+struct inet_diag_req {
+   __u8    idiag_family;   /* Family of addresses. */
+   __u8    idiag_src_len;
+   __u8    idiag_dst_len;
+   __u8    idiag_ext;      /* Query extended information */
+
+   struct inet_diag_sockid id;
+
+   __u32   idiag_states;   /* States to dump */
+   __u32   idiag_dbs;  /* Tables to dump (NI) */
+};
+
+enum {
+   INET_DIAG_REQ_NONE,
+   INET_DIAG_REQ_BYTECODE,
+};
+
+#define INET_DIAG_REQ_MAX INET_DIAG_REQ_BYTECODE
+
+/* Bytecode is sequence of 4 byte commands followed by variable arguments.
+ * All the commands identified by code are conditional jumps forward:
+ * to offset cc+yes or to offset cc+no. yes is supposed to be
+ * length of the command and its arguments.
+ */
+ 
+struct inet_diag_bc_op {
+   unsigned char   code;
+   unsigned char   yes;
+   unsigned short  no;
+};
+
+enum {
+   INET_DIAG_BC_NOP,
+   INET_DIAG_BC_JMP,
+   INET_DIAG_BC_S_GE,
+   INET_DIAG_BC_S_LE,
+   INET_DIAG_BC_D_GE,
+   INET_DIAG_BC_D_LE,
+   INET_DIAG_BC_AUTO,
+   INET_DIAG_BC_S_COND,
+   INET_DIAG_BC_D_COND,
+};
+
+struct inet_diag_hostcond {
+   __u8    family;
+   __u8    prefix_len;
+   int port;
+   __u32   addr[0];
+};
+
+/* Base info structure. It contains socket identity (addrs/ports/cookie)
+ * and, alas, the information shown by netstat. */
+struct inet_diag_msg {
+   __u8    idiag_family;
+   __u8    idiag_state;
+   __u8    idiag_timer;
+   __u8    idiag_retrans;
+
+   struct inet_diag_sockid id;
+
+   __u32   idiag_expires;
+   __u32   idiag_rqueue;
+   __u32   idiag_wqueue;
+   __u32   idiag_uid;
+   __u32   idiag_inode;
+};
+
+/* Extensions */
+
+enum {
+   INET_DIAG_NONE,
+   INET_DIAG_MEMINFO,
+   INET_DIAG_INFO,
+   INET_DIAG_VEGASINFO,
+   INET_DIAG_CONG,
+};
+
+#define INET_DIAG_MAX INET_DIAG_CONG
+
+
+/* INET_DIAG_MEM */
+
+struct inet_diag_meminfo {
+   __u32   idiag_rmem;
+   __u32   idiag_wmem;
+   __u32   idiag_fmem;
+   __u32   idiag_tmem;
+};
+
+/* INET_DIAG_VEGASINFO */
+
+struct tcpvegas_info {
+   __u32   tcpv_enabled;
+   __u32   tcpv_rttcnt;
+   __u32   tcpv_rtt;
+   __u32   tcpv_minrtt;
+};
+
+#ifdef __KERNEL__
+struct sock;
+struct inet_hashinfo;
+
+struct inet_diag_handler {
+   struct inet_hashinfo    *idiag_hashinfo;
+   void                    (*idiag_get_info)(struct sock *sk,
+                                             struct inet_diag_msg *r,
+
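
The bytecode comment in inet_diag.h above says each 4-byte command is a conditional forward jump, by `yes` bytes when the test passes and by `no` bytes otherwise. What follows is only a minimal user-space sketch of how such a program could be walked, not the kernel's interpreter. The names `bc_op`, `bc_run_sketch` and `sport_at_least_1024` are hypothetical. The sketch assumes that a port operand is carried in the `no` field of the slot following the command, and that a program "accepts" when execution lands exactly on its end.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as struct inet_diag_bc_op in the header above. */
struct bc_op {
	unsigned char	code;
	unsigned char	yes;	/* forward jump in bytes when the test passes */
	unsigned short	no;	/* forward jump in bytes when the test fails */
};

/* Subset of the INET_DIAG_BC_* opcodes, renamed to mark this as a sketch. */
enum { BC_NOP, BC_JMP, BC_S_GE, BC_S_LE };

/*
 * Hypothetical interpreter: returns 1 when execution lands exactly on the
 * end of the program (accept), 0 when a jump overshoots it (reject).
 * Assumption: the port operand sits in the 'no' field of the next slot.
 */
static int bc_run_sketch(const struct bc_op *prog, int len,
			 unsigned short sport)
{
	const unsigned char *pc = (const unsigned char *)prog;

	while (len > 0) {
		const struct bc_op *op = (const struct bc_op *)pc;
		int yes = 1;
		int step;

		switch (op->code) {
		case BC_NOP:
			break;
		case BC_JMP:
			yes = 0;	/* unconditional: always take 'no' */
			break;
		case BC_S_GE:
		case BC_S_LE:
			if (len < (int)(2 * sizeof(*op)))
				return 0;	/* operand slot missing */
			yes = op->code == BC_S_GE ? sport >= op[1].no
						  : sport <= op[1].no;
			break;
		default:
			return 0;	/* opcode not handled by this sketch */
		}

		step = yes ? op->yes : op->no;
		if (step <= 0 || step % 4)
			return 0;	/* only aligned forward jumps are legal */
		pc += step;
		len -= step;
	}
	return len == 0;
}

/* One-command program: accept sockets whose source port is >= 1024. */
static int sport_at_least_1024(unsigned short sport)
{
	const struct bc_op prog[2] = {
		{ BC_S_GE, 8, 12 },	/* yes: skip cmd + operand; no: jump past end */
		{ 0, 0, 1024 },		/* operand slot carrying the port */
	};

	return bc_run_sketch(prog, (int)sizeof(prog), sport);
}
```

Calling `sport_at_least_1024(5001)` walks the single command, takes the `yes` jump of 8 bytes and ends exactly at the program boundary. A port below 1024 takes the `no` jump of 12 bytes, overshoots the end and is rejected.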

[PATCH 6/6][INET_DIAG] Move the tcp_diag interface to the proper place

2005-08-12 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/

This is based on the discussions we had on the [EMAIL PROTECTED]
about fully generalising tcp_diag. That is accomplished in this series of
changesets without breaking the userspace ABI. It does break source code if
users move from the previous tcp_diag.h to inet_diag.h, which is expected but
only required if they want to support this generalised infrastructure. The
work required is basically a big sed; I'll do this later today/tomorrow.

Best Regards,

- Arnaldo

tree 34a82c300ebcf262b22e607a303158c758967760
parent b2245293d4c6b27d66812567d524e7eea4e91c25
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123840162 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123840162 -0300

[INET_DIAG] Move the tcp_diag interface to the proper place

With this the previous setup is back, i.e. tcp_diag can be built as a module,
as can dccp_diag, and both share the infrastructure available in inet_diag.

If CONFIG_INET_DIAG is selected as a module, CONFIG_INET_TCP_DIAG will also be
built as a module, as will CONFIG_INET_DCCP_DIAG if CONFIG_IP_DCCP was
selected either statically or as a module. If CONFIG_INET_DIAG is y (statically
linked), CONFIG_INET_TCP_DIAG will follow suit and CONFIG_INET_DCCP_DIAG will
be built in the same manner as CONFIG_IP_DCCP.

Now to aim at UDP, converting it to use inet_hashinfo, so that we can use
iproute2 for UDP sockets as well.

Ah, just to show an example of this new infrastructure working for DCCP :-)

[EMAIL PROTECTED] ~]# ./ss -dane
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      0      *:5001              *:*                 ino:942 sk:cfd503a0
ESTAB      0      0      127.0.0.1:5001      127.0.0.1:32770     ino:943 sk:cfd50a60
ESTAB      0      0      127.0.0.1:32770     127.0.0.1:5001      ino:947 sk:cfd50700
TIME-WAIT  0      0      127.0.0.1:32769     127.0.0.1:5001      timer:(timewait,3.430ms,0) ino:0 sk:cf209620

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/net/tcp.h|2 -
 net/dccp/Kconfig |6 ++---
 net/dccp/Makefile|6 ++---
 net/ipv4/Kconfig |8 +--
 net/ipv4/Makefile|3 +-
 net/ipv4/inet_diag.c |   27 -
 net/ipv4/tcp_diag.c  |   54 +++
 7 files changed, 70 insertions(+), 36 deletions(-)

--

diff --git a/include/net/tcp.h b/include/net/tcp.h
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -479,7 +479,7 @@ static inline void tcp_clear_xmit_timers
 extern unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu);
 extern unsigned int tcp_current_mss(struct sock *sk, int large);
 
-/* tcp_diag.c */
+/* tcp.c */
 extern void tcp_get_info(struct sock *, struct tcp_info *);
 
 /* Read 'sendfile()'-style from a TCP socket */
diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -19,9 +19,9 @@ config IP_DCCP
 
  If in doubt, say N.
 
-config IP_DCCP_DIAG
-   depends on IP_DCCP && IP_INET_DIAG
-   def_tristate y if (IP_DCCP = y && IP_INET_DIAG = y)
+config INET_DCCP_DIAG
+   depends on IP_DCCP && INET_DIAG
+   def_tristate y if (IP_DCCP = y && INET_DIAG = y)
def_tristate m
 
 source net/dccp/ccids/Kconfig
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -3,8 +3,8 @@ obj-$(CONFIG_IP_DCCP) += dccp.o
 dccp-y := ccid.o input.o ipv4.o minisocks.o options.o output.o proto.o \
  timer.o packet_history.o
 
-obj-$(CONFIG_IP_DCCP_DIAG) += dccp_diag.o
-
-obj-y += ccids/
+obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
 
 dccp_diag-y := diag.o
+
+obj-y += ccids/
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -413,8 +413,8 @@ config INET_TUNNEL
  
  If unsure, say Y.
 
-config IP_INET_DIAG
-   tristate "IP: INET socket monitoring interface"
+config INET_DIAG
+   tristate "INET: socket monitoring interface"
default y
---help---
  Support for INET (TCP, DCCP, etc) socket monitoring interface used by
@@ -423,6 +423,10 @@ config IP_INET_DIAG
  
  If unsure, say Y.
 
+config INET_TCP_DIAG
+   depends on INET_DIAG
+   def_tristate INET_DIAG
+
 config TCP_CONG_ADVANCED
bool "TCP: advanced congestion control"
---help---
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -30,8 +30,9 @@ obj-$(CONFIG_IP_ROUTE_MULTIPATH_WRANDOM)
 obj-$(CONFIG_IP_ROUTE_MULTIPATH_DRR) += multipath_drr.o
 obj-$(CONFIG_NETFILTER)+= netfilter/
 obj-$(CONFIG_IP_VS) += ipvs/
-obj-$(CONFIG_IP_INET_DIAG) += 

Re: argh... ;/

2005-08-12 Thread John W. Linville
On Thu, Aug 11, 2005 at 10:36:34PM -0700, Chris Wedgwood wrote:
 On Fri, Aug 05, 2005 at 01:20:59PM -0400, John W. Linville wrote:
 
  Yes.  Opening attachments makes them harder to review.
 
 Lots of people can't inline patches because they are inflicted with
 crappy MUAs --- I would much prefer patches as attachments in those
 cases versus mangled patches.

Don't use crappy MUAs?

 Also, I would argue any sane MUA would make dealing with
 reading/opening patches for sensible mime types trivial.

Any sane MUA wouldn't mangle the patches...

John
-- 
John W. Linville
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread Patrick McHardy
Andi Kleen wrote:
 David S. Miller [EMAIL PROTECTED] writes:

Won't work in x86 -- x86_64 compat environments.
 
 Thanks for catching it.
 
 The aligned u64 trick probably will
 
 #define aligned_u64 unsigned long long __attribute__((aligned(8)))
 
 It just forces i386 to be aligned too.
 
 Then use aligned_u64 instead of u64/__u64/u_int64_t in all user visible 
 places. Similar for signed types.

Unfortunately one of the iptables structures which is needed to get the
ruleset in the kernel (ipt_replace) is differently sized when compiled
for 32/64 bit. IIRC it doesn't work at all currently.
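
The layout problem behind the aligned_u64 suggestion can be illustrated with a small sketch. The struct names `counter_plain` and `counter_aligned` are hypothetical and not taken from any kernel header. The sketch assumes GCC or Clang, which accept the `aligned` attribute; only the macro itself comes from the quoted mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The definition suggested in the thread. */
#define aligned_u64 unsigned long long __attribute__((aligned(8)))

/* Hypothetical user-visible structure with a plain 64-bit member. */
struct counter_plain {
	uint32_t		flags;
	unsigned long long	bytes;	/* offset 4 on i386, offset 8 on x86_64 */
};

/* Same structure with the member forced to 8-byte alignment. */
struct counter_aligned {
	uint32_t	flags;
	aligned_u64	bytes;		/* offset 8 on both ABIs */
};

/* Compile-time checks: the aligned layout does not depend on the ABI. */
_Static_assert(offsetof(struct counter_aligned, bytes) == 8,
	       "bytes must sit at offset 8");
_Static_assert(sizeof(struct counter_aligned) == 16,
	       "struct must be 16 bytes on both ABIs");

/* How far the plain layout deviates from the aligned one in this build. */
static size_t plain_vs_aligned_gap(void)
{
	return offsetof(struct counter_aligned, bytes) -
	       offsetof(struct counter_plain, bytes);
}
```

On a 64-bit build `plain_vs_aligned_gap()` should be 0. On an i386 build, where a plain `unsigned long long` member is only 4-byte aligned, it should be 4, which is the mismatch a 32-bit userspace would hit when talking to a 64-bit kernel.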


Re: atheros driver - desc

2005-08-12 Thread Harald Welte
On Fri, Aug 12, 2005 at 12:37:55AM -0700, Chris Wedgwood wrote:
 On Sun, Aug 07, 2005 at 05:01:34PM +0200, Harald Welte wrote:
 
  I will consult my legal counsel about this.  My current naive
  position on this is that only the actual process of the
  re-engineering matters, not the result.
 
 Which countries is this advice valid for?  Does someone need to chase
 this inside the US in parallel?

I'll see whether I can get Eben Moglen to comment on that matter.

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread Harald Welte
On Fri, Aug 12, 2005 at 04:52:49AM +0200, Patrick McHardy wrote:

 This function looks broken. 

I feared it...

 Divisor and dividend are mixed up, the
 shifted result variable is not used in the actual division, the
 first bit has to be  32 assumption is wrong and num_shift is
 calculated incorrectly. To find a 32-bit divisor consisting of the
 most-significant 32 bits we need to find the highest bit set and
 subtract 32 from this, then right-shift by that value if it is larger
 than 0. I can send a fixed patch tomorrow but I'm too tired now.
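
The shift described in the quoted paragraph can be sketched in plain C as follows. This is an illustration under assumptions, not the fixed patch mentioned above: `highest_bit` and `approx_div64` are hypothetical names, and the ordinary `/` operator stands in for a 64-by-32 division primitive such as the kernel's do_div(). The dividend is shifted by the same amount as the divisor so that the ratio is roughly preserved.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based position of the highest set bit; 0 when v == 0. */
static unsigned int highest_bit(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/*
 * Hypothetical helper: approximate dividend / divisor when the divisor
 * must fit in 32 bits. Returns 0 for a zero divisor instead of trapping.
 */
static uint64_t approx_div64(uint64_t dividend, uint64_t divisor)
{
	unsigned int bits = highest_bit(divisor);
	/* Shift only when the divisor has more than 32 significant bits. */
	unsigned int shift = bits > 32 ? bits - 32 : 0;
	uint32_t div32 = (uint32_t)(divisor >> shift);

	if (div32 == 0)
		return 0;
	/* Scale the dividend by the same amount before dividing. */
	return (dividend >> shift) / div32;
}
```

For divisors that already fit in 32 bits the shift is 0 and the result is exact. For larger divisors the low bits of both operands are discarded, so the quotient is only an approximation.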

Thanks.

 +case IPT_CONNBYTES_WHAT_PKTS:
 
 I would really prefer the name IPT_CONNBYTES_PKTS :)

I _think_ it's safe to change it, since we don't include ipt_connbytes.h
in the iptables package.

Just send two incremental patches to Dave.

Cheers,
Harald
-- 
- Harald Welte [EMAIL PROTECTED] http://netfilter.org/

  Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed.-- Paul Vixie




Re: [PATCH 2/6][INET6_HASHTABLES] Move inet6_lookup functions to net/ipv4/inet6_hashtables.c

2005-08-12 Thread Arnaldo Carvalho de Melo
On Fri, Aug 12, 2005 at 09:09:53PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
 In article [EMAIL PROTECTED] (at Fri, 12 Aug 2005 08:40:24 -0300),
 [EMAIL PROTECTED] (Arnaldo Carvalho de Melo) says:
 
  [INET6_HASHTABLES] Move inet6_lookup functions to
  net/ipv4/inet6_hashtables.c
 
  Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
  statically and IPV6 is compiled as a module, removing the previous
  restriction while not building any IPV6 code if it is not selected.
 
 Please put this into net/ipv6 and list it in obj-y in net/ipv6/Makefile,
 like for net/ipv6/exthdrs_core.c.
 
 --yoshfuji

Humm, was not aware of this, lemme test this and then recreate the tree...

- Arnaldo


Re: [PATCH 2/6][INET6_HASHTABLES] Move inet6_lookup functions to net/ipv4/inet6_hashtables.c

2005-08-12 Thread Arnaldo Carvalho de Melo
On Fri, Aug 12, 2005 at 09:12:36AM -0300, Arnaldo Carvalho de Melo wrote:
 On Fri, Aug 12, 2005 at 09:09:53PM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
  In article [EMAIL PROTECTED] (at Fri, 12 Aug 2005 08:40:24 -0300),
  [EMAIL PROTECTED] (Arnaldo Carvalho de Melo) says:
  
   [INET6_HASHTABLES] Move inet6_lookup functions to
   net/ipv4/inet6_hashtables.c
  
   Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
   statically and IPV6 is compiled as a module, removing the previous
   restriction while not building any IPV6 code if it is not selected.
  
  Please put this into net/ipv6 and list it in obj-y in net/ipv6/Makefile,
  like for net/ipv6/exthdrs_core.c.
  
  --yoshfuji
 
 Humm, was not aware of this, lemme test this and then recreate the tree...

Done, the mirrors should pick it from master.kernel.org shortly, thank you
Yoshifuji-san.

- Arnaldo


Re: Strange uses of netif_start_queue

2005-08-12 Thread Thomas Graf
* Ralf Baechle [EMAIL PROTECTED] 2005-08-12 14:39
 On Fri, Aug 12, 2005 at 02:27:59PM +0100, Ralf Baechle wrote:
 
   Something I noticed doing the tty work. the 6pack driver calls
   netif_start_queue() before it calls register_netdev. I'm curious if this
   is allowed ?
  
  As part of adding support for extended 6pack which is required by the
  PR 430 I've recently fixed that.  It was looking suspect enough that I
  fixed it though I don't see any way this could do harm.
 
 To answer the fundamental question, I think netif_start_queue /
 netif_stop_queue should be allowed in case the driver for some reason has
 the desire to stop queueing of packet immediately after register_netdev.

The statement simply has no effect because the queue cannot be woken up
at this point. If it could, that would be a bug anyway due to uninitialized
spinlocks, regardless of the prior call to netif_start_queue().


Re: argh... ;/

2005-08-12 Thread Chris Wedgwood
On Fri, Aug 12, 2005 at 07:44:28AM -0400, John W. Linville wrote:

 Don't use crappy MUAs?

Well, plenty of people do.  It's almost the norm, so "crappy" probably
isn't very fair.

It does seem that most of the GUI-based MUAs have problematic
default settings, though (Mozilla, Thunderbird, Evolution, Outlook all
have problems at times).

People also like to cut & paste patches from xterms or similar into
MUAs, which usually doesn't work very well either.


[OT] Re: [PATCH] xfrm: do not use large arrays in BSS

2005-08-12 Thread Harald Welte
On Thu, Aug 11, 2005 at 02:44:11PM +0200, Balazs Scheidler wrote:
 On Thu, 2005-08-11 at 22:31 +1000, Herbert Xu wrote:
  Balazs Scheidler [EMAIL PROTECTED] wrote:
  
   I've attached a revised patch, this time with complete error checking, 
   and 
   propagating the error code to the caller. Please apply.
  
  Sorry, but it seems that you've left out the bits that check the
  return value from xfrm_init()?
  
 
 Damn. I still have to get used to git. Thanks for the hint. Anyone
 know a git description that tells me how to follow a tree and maintain
 my own set of patches on top? 

I create one local branch (head) for every feature/patchset by doing
something like cp .git/refs/heads/master .git/refs/heads/foo.  

Then you can switch to the foo head by 
ln -sf refs/heads/foo .git/HEAD; cg-reset

I then apply the patch (cg-patch) and commit (cg-commit).

Whenever I want to sync the upstream tree, I
ln -sf refs/heads/master .git/HEAD; cg-update origin
ln -sf refs/heads/foo .git/HEAD; cg-reset; cg-merge master
(and iterate over all other heads and do the same).

To get a diff to your local master, you can then do cg-diff -r master:foo

It's not nice, but it has been working for me for the last few weeks.

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread James Morris
On Fri, 12 Aug 2005, Arnaldo Carvalho de Melo wrote:

Please do NOT apply these changes to the SELinux code.

These values are automatically generated and must be synchronized with 
userland policy.

 diff --git a/security/selinux/include/av_inherit.h 
 b/security/selinux/include/av_inherit.h
 --- a/security/selinux/include/av_inherit.h
 +++ b/security/selinux/include/av_inherit.h
 @@ -21,7 +21,7 @@
 S_(SECCLASS_SHM, ipc, 0x0200UL)
 S_(SECCLASS_NETLINK_ROUTE_SOCKET, socket, 0x0040UL)
 S_(SECCLASS_NETLINK_FIREWALL_SOCKET, socket, 0x0040UL)
 -   S_(SECCLASS_NETLINK_TCPDIAG_SOCKET, socket, 0x0040UL)
 +   S_(SECCLASS_NETLINK_INET_DIAG_SOCKET, socket, 0x0040UL)
 S_(SECCLASS_NETLINK_NFLOG_SOCKET, socket, 0x0040UL)
 S_(SECCLASS_NETLINK_XFRM_SOCKET, socket, 0x0040UL)
 S_(SECCLASS_NETLINK_SELINUX_SOCKET, socket, 0x0040UL)

etc.

At this stage, I suggest only updating the SELinux code so that it 
recognizes the DCCPDIAG_GETSOCK message.

We need to work out how to transition SELinux policy from a 
netlink_tcpdiag_socket class to netlink_inetdiag_socket.  i.e. whether 
to even bother changing the name of the class, or aliasing it somehow.



- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread Arnaldo Carvalho de Melo
On Fri, Aug 12, 2005 at 11:42:11AM -0400, James Morris wrote:
 On Fri, 12 Aug 2005, Arnaldo Carvalho de Melo wrote:
 
 Please do NOT apply these changes to the SELinux code.
 
 These values are automatically generated and must be synchronized with 
 userland policy.
 
  diff --git a/security/selinux/include/av_inherit.h 
  b/security/selinux/include/av_inherit.h
  --- a/security/selinux/include/av_inherit.h
  +++ b/security/selinux/include/av_inherit.h
  @@ -21,7 +21,7 @@
  S_(SECCLASS_SHM, ipc, 0x0200UL)
  S_(SECCLASS_NETLINK_ROUTE_SOCKET, socket, 0x0040UL)
  S_(SECCLASS_NETLINK_FIREWALL_SOCKET, socket, 0x0040UL)
  -   S_(SECCLASS_NETLINK_TCPDIAG_SOCKET, socket, 0x0040UL)
  +   S_(SECCLASS_NETLINK_INET_DIAG_SOCKET, socket, 0x0040UL)
  S_(SECCLASS_NETLINK_NFLOG_SOCKET, socket, 0x0040UL)
  S_(SECCLASS_NETLINK_XFRM_SOCKET, socket, 0x0040UL)
  S_(SECCLASS_NETLINK_SELINUX_SOCKET, socket, 0x0040UL)
 
 etc.
 
 At this stage, I suggest only updating the SELinux code so that it 
 recognizes the DCCPDIAG_GETSOCK message.
 
 We need to work out how to transition SELinux policy from a 
 netlink_tcpdiag_socket class to netlink_inetdiag_socket.  i.e. whether 
 to even bother changing the name of the class, or aliasing it somehow.

Here I go regenerating the tree, at least this one is closer to the
end of the series... I'll just remove _all_ of the selinux related bits,
OK? Lesson learned :-)

- Arnaldo


Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread James Morris
On Fri, 12 Aug 2005, Arnaldo Carvalho de Melo wrote:

 Here I go regenerating the tree, at least this one is closer to the
 end of the series... I'll just remove _all_ of the selinux related bits,
 OK? Lesson learned :-)

Ok, and I'll send a patch to make SELinux compile again :-)


- James
-- 
James Morris
[EMAIL PROTECTED]


Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread Harald Welte
On Fri, Aug 12, 2005 at 02:03:20PM +0200, Andi Kleen wrote:
  Unfortunately one of the iptables structures which is needed to get the
  ruleset in the kernel (ipt_replace) is differently sized when compiled
  for 32/64 bit. IIRC it doesn't work at all currently.
 
 Yes that's the old bug and cannot be fixed without breaking compatibility. 
 
 But we hope that ctnetlink will not repeat that mistake. That is why I'm
 suggesting to use aligned_u64 in all new interfaces

I'll soon push a patch for all nfnetlink_{conntrack,queue,log} stuff for
net-2.6.14.  Don't worry about that.

But getting back to the original connbytes issue.  Is it worth fixing
it, if the core iptables doesn't even work (the old bug)?

I don't think that we're ever going to fix that bug in the old
{get,set}sockopt interface, but rather introduce a netlink interface
when pkt_tables matures.

-- 
- Harald Welte [EMAIL PROTECTED] http://netfilter.org/

  Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed.-- Paul Vixie




Fw: [Bug 5050] New: KERNEL: assertion (cnt <= tp->packets_out) failed at net/ipv4/tcp_input.c (1476)

2005-08-12 Thread Stephen Hemminger


Begin forwarded message:

Date: Fri, 12 Aug 2005 06:14:57 -0700
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bug 5050] New: KERNEL: assertion (cnt <= tp->packets_out)
failed at net/ipv4/tcp_input.c (1476)


http://bugzilla.kernel.org/show_bug.cgi?id=5050

   Summary: KERNEL: assertion (cnt <= tp->packets_out) failed at
net/ipv4/tcp_input.c (1476)
Kernel Version: 2.6.13-rc6
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Distribution: Debian
Hardware Environment: P4 3.2 GHz 2048MB RAM 4xscsi disk
Software Environment: squid + netfilter
Problem Description: KERNEL: assertion (cnt <= tp->packets_out) failed
at net/ipv4/tcp_input.c (1476)

Steps to reproduce: 
The system runs for 2 days and then produces this message:

KERNEL: assertion (cnt <= tp->packets_out) failed at net/ipv4/tcp_input.c (1476)
KERNEL: assertion (cnt <= tp->packets_out) failed at net/ipv4/tcp_input.c (1476)

--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.


Re: [OT] Re: [PATCH] xfrm: do not use large arrays in BSS

2005-08-12 Thread Harald Welte
On Fri, Aug 12, 2005 at 06:14:39PM +0200, Balazs Scheidler wrote:

  Whenever I want to sync the upstream tree, I
  ln -sf refs/heads/master .git/HEAD; cg-update origin

Sorry, there is a cg-reset missing between the ln and the cg-update

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Dimitris Michailidis [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 10:22:47 -0700

 This is true.  There is nothing fundamentally preventing both passive
 and active opens to check netfilter before OKing a connection.  Once a
 connection is established, it's rather impractical to run each of its
 packets through netfilter, this is 10G after all.  You'd probably not
 lose much functionality that you could have otherwise used at these
 speeds.

People don't use netfilter just for state tracking and filtering,
they also use it to some extent for rate limiting, packet logging, and
similar things.  And as busses and cpus get faster, your "this is
10G after all" argument becomes null and void.

Note that this TOE mess also makes the packet scheduler, queueing
disciplines, and packet classifiers totally unusable as well.

Essentially, half of the Linux networking stack's features are turned
uncontrollably _OFF_ in the presence of TOE.

It is for this, along with many other reasons, that the Linux networking
community is, in general, so against TOE.


Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread Andi Kleen
 I don't think that we're ever going to fix that bug in the old
 {get,set}sockopt interface, but rather introduce a netlink interface
 when pkt_tables matures.

All new interfaces should be emulation clean, so that if the old interface
is replaced later it should eventually work. The best way to do that
is to use aligned_u64. Should probably put that into linux/types.h

-Andi



Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-12 Thread David S. Miller
From: Dimitris Michailidis [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 10:00:12 -0700

 On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
  This would mean that every time we wish to change the data structures
  and interfaces for TCP socket lookup, your drivers would need to
  change.
 
 I think using TCP's own functions was done exactly to avoid this
 problem.

That doesn't achieve the desired result.

I do plan to merge in IBM's move of the TCP hash tables over
to RCU style locking, and that will require knowledge of the
locking at the call sites to the functions you have exported
to the TOE drivers.  The TOE drivers would break as a result.

You are creating a maintenance headache for us as well.  Once this
stuff gets exported to drivers, it becomes nearly impossible to
change.  And I absolutely reserve the right to create restrictions of
use that increase the flexibility we have to change interfaces, data
structures, and locking strategies in the future.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Fri, 12 Aug 2005 13:17:36 -0300

 Just checked:
 
 rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/
 
 Has the reworked, not touching selinux tree

We might have to renege on changing things from tcpdiag
to inetdiag, James's conflict was one I did not anticipate.

Let me think about this over the weekend before we commit
to doing things one way or the other.


Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread Harald Welte
On Fri, Aug 12, 2005 at 08:23:55PM +0200, Andi Kleen wrote:
  I don't think that we're ever going to fix that bug in the old
  {get,set}sockopt interface, but rather introduce a netlink interface
  when pkt_tables matures.
 
 All new interfaces should be emulation clean, so that if the old interface
 is replaced later it should eventually work. The best way to do that
 is to use aligned_u64. Should probably put that into linux/types.h

Ok, I hope everyone is fine with this patch:

-- 
- Harald Welte [EMAIL PROTECTED] http://netfilter.org/

  Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed.-- Paul Vixie
[NETFILTER] introduce and use aligned_u64 data type

As proposed by Andi Kleen, this is required esp. for x86_64 architecture,
where 64bit code needs 8byte aligned 64bit data types, but 32bit userspace
apps will only align to 4bytes.

Signed-off-by: Harald Welte [EMAIL PROTECTED]

---
commit 30da9a3da187af74b2e2d00becf2d9cab3624ddd
tree 7666f6ce67e96beedc8884f1aba18ea80a20e2b1
parent 7c249f391a3b9bc86ec07d734959c532a3c7a3f6
author Harald Welte [EMAIL PROTECTED] Fr, 12 Aug 2005 21:00:28 +0200
committer Harald Welte [EMAIL PROTECTED] Fr, 12 Aug 2005 21:00:28 +0200

 include/linux/netfilter/nfnetlink_log.h  |5 +++--
 include/linux/netfilter/nfnetlink_queue.h|5 +++--
 include/linux/netfilter_ipv4/ipt_connbytes.h |4 ++--
 include/linux/types.h|3 +++
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink_log.h 
b/include/linux/netfilter/nfnetlink_log.h
--- a/include/linux/netfilter/nfnetlink_log.h
+++ b/include/linux/netfilter/nfnetlink_log.h
@@ -5,6 +5,7 @@
  * and not any kind of function definitions.  It is shared between kernel and
  * userspace.  Don't put kernel specific stuff in here */
 
+#include <linux/types.h>
 #include <linux/netfilter/nfnetlink.h>
 
 enum nfulnl_msg_types {
@@ -27,8 +28,8 @@ struct nfulnl_msg_packet_hw {
 } __attribute__ ((packed));
 
 struct nfulnl_msg_packet_timestamp {
-   u_int64_t   sec;
-   u_int64_t   usec;
+   aligned_u64 sec;
+   aligned_u64 usec;
 } __attribute__ ((packed));
 
 #define NFULNL_PREFIXLEN   30  /* just like old log target */
diff --git a/include/linux/netfilter/nfnetlink_queue.h 
b/include/linux/netfilter/nfnetlink_queue.h
--- a/include/linux/netfilter/nfnetlink_queue.h
+++ b/include/linux/netfilter/nfnetlink_queue.h
@@ -1,6 +1,7 @@
 #ifndef _NFNETLINK_QUEUE_H
 #define _NFNETLINK_QUEUE_H
 
+#include <linux/types.h>
 #include <linux/netfilter/nfnetlink.h>
 
 enum nfqnl_msg_types {
@@ -24,8 +25,8 @@ struct nfqnl_msg_packet_hw {
 } __attribute__ ((packed));
 
 struct nfqnl_msg_packet_timestamp {
-   u_int64_t   sec;
-   u_int64_t   usec;
+   aligned_u64 sec;
+   aligned_u64 usec;
 } __attribute__ ((packed));
 
 enum nfqnl_attr_type {
diff --git a/include/linux/netfilter_ipv4/ipt_connbytes.h 
b/include/linux/netfilter_ipv4/ipt_connbytes.h
--- a/include/linux/netfilter_ipv4/ipt_connbytes.h
+++ b/include/linux/netfilter_ipv4/ipt_connbytes.h
@@ -16,8 +16,8 @@ enum ipt_connbytes_direction {
 struct ipt_connbytes_info
 {
struct {
-   u_int64_t from; /* count to be matched */
-   u_int64_t to;   /* count to be matched */
+   aligned_u64 from;   /* count to be matched */
+   aligned_u64 to; /* count to be matched */
} count;
u_int8_t what;  /* ipt_connbytes_what */
u_int8_t direction; /* ipt_connbytes_direction */
diff --git a/include/linux/types.h b/include/linux/types.h
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -123,6 +123,9 @@ typedef __u64   u_int64_t;
 typedef __s64   int64_t;
 #endif
 
+/* this is a special 64bit data type that is 8-byte aligned */
+#define aligned_u64 unsigned long long __attribute__((aligned(8)))
+
 /*
  * The type used for indexing onto a disc or disc partition.
  * If required, asm/types.h can override it and define
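
As a rough illustration of what the aligned_u64 change is after, here is a
small user-space sketch. It assumes GCC or clang; demo_aligned_u64 and
struct demo_counter are made-up names, not part of the patch. On i386 a
plain 64-bit member is only 4-byte aligned inside a struct, while on x86-64
it is 8-byte aligned, so a structure shared between a 32-bit userspace and a
64-bit kernel can end up with two different layouts. Forcing 8-byte
alignment on the member gives the same offsets on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the aligned_u64 macro in the patch; the name is hypothetical. */
#define demo_aligned_u64 unsigned long long __attribute__((aligned(8)))

/* Hypothetical structure shared between kernel and userspace. */
struct demo_counter {
	uint8_t          what;   /* one byte, followed by padding */
	demo_aligned_u64 count;  /* always starts on an 8-byte boundary */
};

/* Offset of the 64-bit member inside the structure. */
static inline size_t demo_count_offset(void)
{
	return offsetof(struct demo_counter, count);
}

/* Total size of the structure, including padding. */
static inline size_t demo_counter_size(void)
{
	return sizeof(struct demo_counter);
}
```

With the attribute in place the offset and size no longer depend on whether
the code is built as 32-bit or 64-bit.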




Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread Arnaldo Carvalho de Melo
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
 From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
 Date: Fri, 12 Aug 2005 13:17:36 -0300
 
  Just checked:
 
  rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/
 
  Has the reworked, not touching selinux tree
 
 We might have to renege on changing things from tcpdiag
 to inetdiag, James's conflict was one I did not anticipate.
 
 Let me think about this over the weekend before we commit
 to doing things one way or the other.

Take your time but as far as I understood from talking to James it was just a
matter of rerunning some sort of userspace tool to regenerate those files,
something he said he would be doing after I submitted the non-touching
SELinux parts.

He seems to be interested in eventually reflecting the fact that inet_diag
uses the same netlink sock for several inet transport level protocols, but I'd
say that for a start he could as well apply the current, pre-inet_diag rules
for just TCP to all the protocols now using this kernel communication channel.

- Arnaldo


Re: [PATCH] add new iptables ipt_connbytes match

2005-08-12 Thread David S. Miller
From: Harald Welte [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 21:03:43 +0200

 Ok, I hope everyone is fine with this patch:

It is, but I did not add the connbytes patch into my tree so I can't
use this patch as-is.  That's why I replied "this is broken, fix u64
alignment" to the connbytes patch instead of "applied, thanks" :-)

Please untangle this stuff.  This is how we end up with a big mess of
noise changesets in the tree, due to how we have been putting
half-working changes in first then a bunch of fixup patches.  I'd
like to avoid that, because I then spend a lot of time redoing things
when I rebase the tree later.

So in this case, send me the aligned_u64 patch separately, which
doesn't assume connbytes is in the tree.  Then another patch which
adds connbytes with the proper usage of aligned_u64.

Thanks Harald.


[PATCH 2.6.13-rc6 1/2]netdevice ethtool: Add support for getting the permanent hardware address (resend)

2005-08-12 Thread Jon Wetzel
Adds a new field to net device to hold the permanent hardware address, and adds
a new generic ethtool_op function to get that address.

Signed-off-by: Jon Wetzel [EMAIL PROTECTED]
Signed-off-by: John W. Linville [EMAIL PROTECTED]

--- linux-2.6.13-rc6/include/linux/netdevice.h  2005-08-12 13:10:12.0 -0500
+++ linux-2.6.13-rc6-jw/include/linux/netdevice.h   2005-08-12 13:35:18.0 -0500
@@ -336,6 +336,7 @@
/* Interface address info. */
unsigned char   broadcast[MAX_ADDR_LEN];    /* hw bcast add */
unsigned char   dev_addr[MAX_ADDR_LEN];     /* hw address   */
+   unsigned char   perm_addr[MAX_ADDR_LEN];    /* permanent hw address */
unsigned char   addr_len;   /* hardware address length  */
unsigned short  dev_id; /* for shared network cards */
 
--- linux-2.6.13-rc6/include/linux/ethtool.h    2005-08-05 02:04:37.0 -0500
+++ linux-2.6.13-rc6-jw/include/linux/ethtool.h 2005-08-12 13:42:28.0 -0500
@@ -250,6 +250,12 @@
u64 data[0];
 };
 
+struct ethtool_perm_addr {
+   u32 cmd;    /* ETHTOOLGPERMADDR */
+   int size;
+   char    data[0];
+};
+
 struct net_device;
 
 /* Some generic methods drivers may use in their ethtool_ops */
@@ -261,6 +267,8 @@
 int ethtool_op_set_sg(struct net_device *dev, u32 data);
 u32 ethtool_op_get_tso(struct net_device *dev);
 int ethtool_op_set_tso(struct net_device *dev, u32 data);
+int ethtool_op_get_perm_addr(struct net_device *dev, int len,
+struct ethtool_perm_addr *addr);
 
 /**
  * ethtool_ops - Alter and report network device settings
--- linux-2.6.13-rc6/net/core/ethtool.c 2005-08-05 02:04:37.0 -0500
+++ linux-2.6.13-rc6-jw/net/core/ethtool.c  2005-08-12 13:43:35.0 -0500
@@ -81,6 +81,16 @@
return 0;
 }
 
+int ethtool_op_get_perm_addr(struct net_device *dev, int len, struct ethtool_perm_addr *addr)
+{
+   if ( len < MAX_ADDR_LEN )
+       return -ETOOSMALL;
+
+   memcpy(addr->data, dev->perm_addr, MAX_ADDR_LEN);
+   return 0;
+}
+ 
+
 /* Handlers for each ethtool command */
 
 static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
@@ -826,6 +836,7 @@
 
 EXPORT_SYMBOL(dev_ethtool);
 EXPORT_SYMBOL(ethtool_op_get_link);
+EXPORT_SYMBOL_GPL(ethtool_op_get_perm_addr);
 EXPORT_SYMBOL(ethtool_op_get_sg);
 EXPORT_SYMBOL(ethtool_op_get_tso);
 EXPORT_SYMBOL(ethtool_op_get_tx_csum);
@@ -833,3 +844,4 @@
 EXPORT_SYMBOL(ethtool_op_set_tso);
 EXPORT_SYMBOL(ethtool_op_set_tx_csum);
 EXPORT_SYMBOL(ethtool_op_set_tx_hw_csum);
+
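
For readers who want to see the intent outside the kernel, the following is
a user-space sketch only. struct demo_netdev, demo_get_perm_addr and
demo_set_mac are hypothetical stand-ins, the buffer size is an assumed
value, and -EINVAL is used in place of the kernel-internal -ETOOSMALL. It
shows the two properties the patch relies on: the copy is refused when the
caller's buffer is too small, and changing the active address leaves the
permanent one untouched.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_MAX_ADDR_LEN 32   /* stands in for MAX_ADDR_LEN; value assumed */

/* Hypothetical stand-in for the two address fields of struct net_device. */
struct demo_netdev {
	unsigned char dev_addr[DEMO_MAX_ADDR_LEN];   /* current, changeable */
	unsigned char perm_addr[DEMO_MAX_ADDR_LEN];  /* set once at probe time */
};

/* Copy the permanent address into a caller buffer of len bytes. */
static inline int demo_get_perm_addr(const struct demo_netdev *dev, int len,
				     unsigned char *out)
{
	if (len < DEMO_MAX_ADDR_LEN)
		return -EINVAL;   /* the patch returns -ETOOSMALL here */
	memcpy(out, dev->perm_addr, DEMO_MAX_ADDR_LEN);
	return 0;
}

/* Simulate an administrator overriding the MAC: only dev_addr changes. */
static inline void demo_set_mac(struct demo_netdev *dev,
				const unsigned char *mac, size_t n)
{
	assert(n <= DEMO_MAX_ADDR_LEN);
	memcpy(dev->dev_addr, mac, n);
}
```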


Re: [PATCH 2.6.13-rc6 1/2]netdevice ethtool: Add support for getting the permanent hardware address (resend)

2005-08-12 Thread David S. Miller
From: Jon Wetzel [EMAIL PROTECTED]
Date: Fri, 12 Aug 2005 15:52:28 -0500

 Adds a new field to net device to hold the permanent hardware
 address, and adds a new generic ethtool_op function to get that
 address.

 Signed-off-by: Jon Wetzel [EMAIL PROTECTED]
 Signed-off-by: John W. Linville [EMAIL PROTECTED]

I think I'll put this stuff in for 2.6.14, it's too late
in the devel cycle to stick it into 2.6.13.

Thanks.


[PATCH 2.6.13-rc6 2/2]e1000: Add support for getting the permanent hardware address (correction)

2005-08-12 Thread Jon Wetzel
Accidentally sent an old version of this patch.  This is the current one.

e1000 driver updated to fill in the new field in netdevice and use the new
ethtool, get_perm_addr.

Signed-off-by: Jon Wetzel [EMAIL PROTECTED]
Signed-off-by: John W. Linville [EMAIL PROTECTED]

--- linux-2.6.13-rc6/drivers/net/e1000/e1000_ethtool.c  2005-08-12 13:09:16.0 -0500
+++ linux-2.6.13-rc6-jw/drivers/net/e1000/e1000_ethtool.c   2005-08-12 13:36:09.0 -0500
@@ -1739,6 +1739,7 @@
.phys_id= e1000_phys_id,
.get_stats_count= e1000_get_stats_count,
.get_ethtool_stats  = e1000_get_ethtool_stats,
+   .get_perm_addr  = ethtool_op_get_perm_addr,
 };
 
 void e1000_set_ethtool_ops(struct net_device *netdev)
--- linux-2.6.13-rc6/drivers/net/e1000/e1000_main.c 2005-08-12 13:09:17.0 -0500
+++ linux-2.6.13-rc6-jw/drivers/net/e1000/e1000_main.c  2005-08-12 13:36:09.0 -0500
@@ -614,8 +614,9 @@
if(e1000_read_mac_addr(&adapter->hw))
DPRINTK(PROBE, ERR, "EEPROM Read Error\n");
memcpy(netdev->dev_addr, adapter->hw.mac_addr, netdev->addr_len);
+   memcpy(netdev->perm_addr, adapter->hw.mac_addr, netdev->addr_len);
 
-   if(!is_valid_ether_addr(netdev->dev_addr)) {
+   if(!is_valid_ether_addr(netdev->perm_addr)) {
DPRINTK(PROBE, ERR, "Invalid MAC Address\n");
err = -EIO;
goto err_eeprom;
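
The validity check that this hunk now applies to perm_addr can be pictured
with a user-space sketch. This is not the kernel's is_valid_ether_addr
itself; the demo_* helpers are hypothetical and simply assume the usual
rules for a station address: it must not be multicast (which also excludes
broadcast) and must not be all zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ETH_ALEN 6

/* Multicast and broadcast addresses have the low bit of the first octet set. */
static inline bool demo_is_multicast(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/* True when every octet is zero. */
static inline bool demo_is_zero(const uint8_t *addr)
{
	int i;

	for (i = 0; i < DEMO_ETH_ALEN; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

/* A usable station address is neither multicast nor all zeros. */
static inline bool demo_is_valid_ether_addr(const uint8_t *addr)
{
	return !demo_is_multicast(addr) && !demo_is_zero(addr);
}
```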


Re: [PATCH 4/6][TCPDIAG] Just rename everything to inet_diag

2005-08-12 Thread Arnaldo Carvalho de Melo
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
 From: James Morris [EMAIL PROTECTED]
 Date: Fri, 12 Aug 2005 15:00:49 -0400 (EDT)
  Just do what you think is right for the core networking and we'll adjust
  SELinux accordingly.
 
 Ok, I've pulled in Arnaldo's changes, as-is.

Thanks!

- Arnaldo


Re: skb->pkt_type

2005-08-12 Thread David S. Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Wed, 10 Aug 2005 02:18:46 +0200

 BTW, an idea to make room for ipvs_property would be to place the three
 nfctinfo bits in the lower three bits of the nfct pointer. I'm not sure
 if it guarantees 8 byte alignment, which would be required for this to
 work ..

It turns out that we need a two-bit state for the fast SKB
cloning patch I'm working on with Thomas Graf, which perfectly
combines with the now-3-bit pkt_type field.  So that's the
plan for the time being.
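
For reference, the low-bits idea quoted above (not the plan that was
chosen) would look roughly like this in plain C. It is a sketch under the
assumption that the pointed-to object is 8-byte aligned, which is exactly
the guarantee Patrick was unsure about; struct demo_conn and the demo_*
helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TAG_BITS 3u
#define DEMO_TAG_MASK ((uintptr_t)((1u << DEMO_TAG_BITS) - 1))

/* Hypothetical 8-byte aligned object, so its low three address bits are 0. */
struct demo_conn {
	uint64_t id;
} __attribute__((aligned(8)));

/* Store a 3-bit tag in the unused low bits of an aligned pointer. */
static inline uintptr_t demo_pack(struct demo_conn *ptr, unsigned int tag)
{
	uintptr_t raw = (uintptr_t)ptr;

	assert((raw & DEMO_TAG_MASK) == 0);   /* needs 8-byte alignment */
	assert(tag <= DEMO_TAG_MASK);         /* tag must fit in 3 bits */
	return raw | tag;
}

/* Recover the original pointer by masking the tag bits off. */
static inline struct demo_conn *demo_ptr(uintptr_t packed)
{
	return (struct demo_conn *)(packed & ~DEMO_TAG_MASK);
}

/* Recover the tag from the low bits. */
static inline unsigned int demo_tag(uintptr_t packed)
{
	return (unsigned int)(packed & DEMO_TAG_MASK);
}
```

The cost of the trick is that every dereference has to go through the
masking helper, which is one reason a separate bitfield can be preferable.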


Re: Fw: [Bug 5050] New: KERNEL: assertion (cnt <= tp->packets_out) failed at net/ipv4/tcp_input.c (1476)

2005-08-12 Thread Herbert Xu
On Fri, Aug 12, 2005 at 09:15:44AM -0700, Stephen Hemminger wrote:
 
 Steps to reproduce: 
 System is running 2 days and after that time produce this message
 
 KERNEL: assertion (cnt <= tp->packets_out) failed at
 net/ipv4/tcp_input.c (1476) KERNEL: assertion (cnt <= tp->packets_out)
 failed at net/ipv4/tcp_input.c (1476)

We believe that this bug may have been fixed by the following patch
which was applied after rc6.  Please apply only the debugging patch
and let us know what it prints out so that we can confirm that this
is indeed the problem.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1370,15 +1370,21 @@ int tcp_retransmit_skb(struct sock *sk, 
 
 	if (skb->len > cur_mss) {
 		int old_factor = tcp_skb_pcount(skb);
-		int new_factor;
+		int diff;
 
 		if (tcp_fragment(sk, skb, cur_mss, cur_mss))
 			return -ENOMEM; /* We'll try again later. */
 
 		/* New SKB created, account for it. */
-		new_factor = tcp_skb_pcount(skb);
-		tp->packets_out -= old_factor - new_factor;
-		tp->packets_out += tcp_skb_pcount(skb->next);
+		diff = old_factor - tcp_skb_pcount(skb) -
+		       tcp_skb_pcount(skb->next);
+		tp->packets_out -= diff;
+
+		if (diff > 0) {
+			tp->fackets_out -= diff;
+			if ((int)tp->fackets_out < 0)
+				tp->fackets_out = 0;
+		}
 	}
 
/* Collapse two adjacent packets if worthwhile and we can. */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1474,6 +1474,10 @@ static void tcp_mark_head_lost(struct so
 	int cnt = packets;
 
 	BUG_TRAP(cnt <= tp->packets_out);
+	if (unlikely(cnt >= tp->packets_out)) {
+		printk("packets_out = %d, fackets_out = %d, reordering = %d, sack_ok = 0x%x, mss_cache=%d\n", tp->packets_out, tp->fackets_out, tp->reordering, tp->rx_opt.sack_ok, tp->mss_cache);
+		dump_stack();
+	}
 
 	sk_stream_for_retrans_queue(skb, sk) {
 		cnt -= tcp_skb_pcount(skb);
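
The accounting in the tcp_output.c hunk is easier to follow with numbers.
Below is a stand-alone sketch of the same arithmetic; struct demo_tp and
demo_account_fragment are hypothetical, and the segment counts are passed
in directly instead of being read from real SKBs. diff is the number of
segments the original SKB was counted as, minus what the two halves count
as after fragmentation. A negative diff means more packets are now in
flight; a positive diff means fewer, and only then is fackets_out reduced
and clamped at zero.

```c
#include <assert.h>

/* Hypothetical subset of the TCP counters touched by the patch. */
struct demo_tp {
	unsigned int packets_out;
	unsigned int fackets_out;
};

/* Redo the accounting after one SKB was split into two. */
static inline void demo_account_fragment(struct demo_tp *tp, int old_factor,
					 int first_pcount, int second_pcount)
{
	int diff = old_factor - first_pcount - second_pcount;

	tp->packets_out -= diff;        /* a negative diff increases the count */

	if (diff > 0) {
		tp->fackets_out -= diff;
		if ((int)tp->fackets_out < 0)   /* went below zero: clamp */
			tp->fackets_out = 0;
	}
}
```

The old code adjusted packets_out in two steps and never touched
fackets_out, which is the gap the patch closes.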


[RFC NETLINK 0/8]: Support dynamic number of groups

2005-08-12 Thread Patrick McHardy

Hi,

besides a small bugfix, this patchset adds support for a dynamic number
of groups to netlink. To support an arbitrary number of groups a couple
of changes had to be made, I'll explain them below. The patches are
only sent to netdev to avoid spamming your inboxes.

The destination groups of a packet are currently stored in the cb as a
bitmask. To avoid being limited by the size of the cb, support for
broadcasting to multiple groups using a single call to netlink_broadcast
is removed and only a single destination group is supported, which is
stored as an integer in the cb. No users in the kernel used more than a
single destination group.

The subscribed groups bitmask in struct netlink_sock is only 32 bits
wide, so it is changed to be dynamically allocated. Currently binding to
a group is possible before a kernel socket for a protocol exists.
To avoid guessing the group number and dealing with reallocations this
is changed and sockets for a protocol can only be created when a
kernel socket exists. Herbert and Thomas agreed that pure userspace
communication is not a good idea with current netlink and the change
should be ok.
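
A user-space sketch of the dynamically allocated subscription bitmap
described above, for illustration only. The demo_* names are hypothetical;
demo_nlgrpsz mirrors the rounding that the NLGRPSZ macro in patch 6/8
performs, and the subscriptions counter plays the role of the new field of
the same name.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for n group bits, rounded up to whole unsigned longs. */
static inline size_t demo_nlgrpsz(unsigned int n)
{
	return (n + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG
		* sizeof(unsigned long);
}

struct demo_nlsock {
	unsigned long *groups;       /* one bit per group */
	unsigned int ngroups;        /* number of groups the family registered */
	unsigned int subscriptions;  /* number of bits currently set */
};

/* Allocate a zeroed bitmap sized for the family's group count. */
static inline int demo_nlsock_init(struct demo_nlsock *s, unsigned int ngroups)
{
	if (ngroups == 0)
		return -1;
	s->groups = calloc(1, demo_nlgrpsz(ngroups));
	if (s->groups == NULL)
		return -1;
	s->ngroups = ngroups;
	s->subscriptions = 0;
	return 0;
}

/* Group numbers start at 1; group g lives in bit g - 1. */
static inline int demo_nlsock_set(struct demo_nlsock *s, unsigned int group,
				  int on)
{
	unsigned long *word, bit;
	int was_on;

	if (group == 0 || group > s->ngroups)
		return -1;
	word = &s->groups[(group - 1) / DEMO_BITS_PER_LONG];
	bit = 1UL << ((group - 1) % DEMO_BITS_PER_LONG);
	was_on = (*word & bit) != 0;
	if (on && !was_on) {
		*word |= bit;
		s->subscriptions++;
	} else if (!on && was_on) {
		*word &= ~bit;
		s->subscriptions--;
	}
	return 0;
}

static inline void demo_nlsock_free(struct demo_nlsock *s)
{
	free(s->groups);
	s->groups = NULL;
}
```

Because the size is fixed when the socket is created, no reallocation is
needed later, which matches the reasoning in the paragraph above.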

For compatibility, userspace can still subscribe to the lower 32 groups
using bind and see which groups a socket is subscribed to using
getsockname. To subscribe/unsubscribe groups in the extended range, two
setsockopt options are provided. struct nl_addr can only contain up to
32 groups; to get the destination group of a packet for the extended
range, a nl_pktinfo control message can be enabled using another
setsockopt option.
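
To make the compatibility rule concrete: group number g (starting at 1)
corresponds to bit g - 1 of the legacy 32-bit mask, and anything above 32
has no legacy representation. A minimal sketch with hypothetical helper
names, not the code from the patches:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy bitmask for a group number; 0 when it cannot be expressed. */
static inline uint32_t demo_group_mask(unsigned int group)
{
	if (group == 0 || group > 32)
		return 0;
	return (uint32_t)1 << (group - 1);
}

/* Lowest group number set in a legacy bitmask; 0 for an empty mask. */
static inline unsigned int demo_mask_to_group(uint32_t mask)
{
	unsigned int group = 1;

	if (mask == 0)
		return 0;
	while (!(mask & 1)) {
		mask >>= 1;
		group++;
	}
	return group;
}
```

For example, the old RTMGRP_NEIGH value 4 is bit 2, so it maps to group
number 3, which is where RTNLGRP_NEIGH sits in the new enum in patch 4/8.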


[NETLINK]: Fix module refcounting problems
[NETLINK]: Remove unused groups member from struct netlink_skb_parms
[NETLINK]: Use group numbers instead of bitmasks internally
[NETLINK]: Convert netlink users to use group numbers instead of bitmasks
[NETLINK]: Return -EPROTONOSUPPORT in netlink_create() if no kernel socket is registered
[NETLINK]: Support dynamic number of multicast groups per netlink family
[NETLINK]: Add set/getsockopt options to support more than 32 groups
[NETLINK]: Add groups argument to netlink_kernel_create


[NETLINK 1/8]: Fix module refcounting problems

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Fix module refcounting problems

Use-after-free: the struct proto_ops containing the module pointer
is freed when a socket with pid=0 is released, which, besides kernel
sockets, is true for all unbound sockets.

Module refcount leak: when the kernel socket is closed before all user
sockets have been closed the proto_ops struct for this family is
replaced by the generic one and the module refcount can't be dropped.

The second problem can't be solved cleanly using module refcounting in the
generic socket code, so this patch adds explicit refcounting to
netlink_create/netlink_release.
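
The explicit refcounting can be sketched in user space as follows. All
names are hypothetical and the counter is a plain int rather than the
kernel's module refcount; the point is only the pairing: every socket that
successfully takes a reference in create drops exactly that reference in
release, independent of what happens to the kernel socket in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct module. */
struct demo_module {
	int refcnt;
	bool dying;    /* set once unloading has started */
};

struct demo_sock {
	struct demo_module *module;   /* reference held by this socket, or NULL */
};

/* Take a reference unless the module is going away; NULL always succeeds. */
static inline bool demo_try_get(struct demo_module *m)
{
	if (m == NULL)
		return true;
	if (m->dying)
		return false;
	m->refcnt++;
	return true;
}

static inline void demo_put(struct demo_module *m)
{
	if (m != NULL)
		m->refcnt--;
}

/* Like the patched netlink_create: remember which reference we hold. */
static inline void demo_create(struct demo_sock *sk, struct demo_module *owner)
{
	if (!demo_try_get(owner))
		owner = NULL;
	sk->module = owner;
}

/* Like the patched netlink_release: drop only what this socket took. */
static inline void demo_release(struct demo_sock *sk)
{
	demo_put(sk->module);
	sk->module = NULL;
}
```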

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 1f74632caaf6f2bf31cf02ac28c5087e4224b02e
tree c63e3fcfef8d10a928ac7a03fd2ba66ea12479cf
parent 036b419a397e294a5a8ca37845e3023f979976fc
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:21:23 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:21:23 +0200

 net/netlink/af_netlink.c |  100 --
 1 files changed, 35 insertions(+), 65 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -73,8 +73,12 @@ struct netlink_sock {
 	struct netlink_callback	*cb;
 	spinlock_t		cb_lock;
 	void			(*data_ready)(struct sock *sk, int bytes);
+	struct module		*module;
+	u32			flags;
 };
 
+#define NETLINK_KERNEL_SOCKET	0x1
+
 static inline struct netlink_sock *nlk_sk(struct sock *sk)
 {
 	return (struct netlink_sock *)sk;
@@ -97,7 +101,7 @@ struct netlink_table {
 	struct nl_pid_hash hash;
 	struct hlist_head mc_list;
 	unsigned int nl_nonroot;
-	struct proto_ops *p_ops;
+	struct module *module;
 };
 
 static struct netlink_table *nl_table;
@@ -338,6 +342,7 @@ static int netlink_create(struct socket 
 {
 	struct sock *sk;
 	struct netlink_sock *nlk;
+	struct module *module;
 
 	sock->state = SS_UNCONNECTED;
 
@@ -347,30 +352,36 @@ static int netlink_create(struct socket 
 	if (protocol<0 || protocol >= MAX_LINKS)
 		return -EPROTONOSUPPORT;
 
-	netlink_table_grab();
+	netlink_lock_table();
 	if (!nl_table[protocol].hash.entries) {
 #ifdef CONFIG_KMOD
 		/* We do 'best effort'.  If we find a matching module,
 		 * it is loaded.  If not, we don't return an error to
 		 * allow pure userspace-userspace communication. -HW
 		 */
-		netlink_table_ungrab();
+		netlink_unlock_table();
 		request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
-		netlink_table_grab();
+		netlink_lock_table();
 #endif
 	}
-	netlink_table_ungrab();
+	module = nl_table[protocol].module;
+	if (!try_module_get(module))
+		module = NULL;
+	netlink_unlock_table();
 
-	sock->ops = nl_table[protocol].p_ops;
+	sock->ops = &netlink_ops;
 
 	sk = sk_alloc(PF_NETLINK, GFP_KERNEL, &netlink_proto, 1);
-	if (!sk)
+	if (!sk) {
+		module_put(module);
 		return -ENOMEM;
+	}
 
 	sock_init_data(sock, sk);
 
 	nlk = nlk_sk(sk);
 
+	nlk->module = module;
 	spin_lock_init(&nlk->cb_lock);
 	init_waitqueue_head(&nlk->wait);
 	sk->sk_destruct = netlink_sock_destruct;
@@ -415,22 +426,15 @@ static int netlink_release(struct socket
 		notifier_call_chain(&netlink_chain, NETLINK_URELEASE, &n);
 	}	
 
-	/* When this is a kernel socket, we need to remove the owner pointer,
-	 * since we don't know whether the module will be dying at any given
-	 * point - HW
-	 */
-	if (!nlk->pid) {
-		struct proto_ops *p_tmp;
+	if (nlk->module)
+		module_put(nlk->module);
 
+	if (nlk->flags & NETLINK_KERNEL_SOCKET) {
 		netlink_table_grab();
-		p_tmp = nl_table[sk->sk_protocol].p_ops;
-		if (p_tmp != &netlink_ops) {
-			nl_table[sk->sk_protocol].p_ops = &netlink_ops;
-			kfree(p_tmp);
-		}
+		nl_table[sk->sk_protocol].module = NULL;
 		netlink_table_ungrab();
 	}
-	
+
 	sock_put(sk);
 	return 0;
 }
@@ -1061,9 +1065,9 @@ static void netlink_data_ready(struct so
 struct sock *
 netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len), struct module *module)
 {
-	struct proto_ops *p_ops;
 	struct socket *sock;
 	struct sock *sk;
+	struct netlink_sock *nlk;
 
 	if (!nl_table)
 		return NULL;
@@ -1071,64 +1075,32 @@ netlink_kernel_create(int unit, void (*i
 	if (unit<0 || unit>=MAX_LINKS)
 		return NULL;
 
-	/* Do a quick check, to make us not go down to netlink_insert()
-	 * if protocol already has kernel socket.
-	 */
-	sk = netlink_lookup(unit, 0);
-	if (unlikely(sk)) {
-		sock_put(sk);
-		return NULL;
-	}
-
 	if (sock_create_lite(PF_NETLINK, SOCK_DGRAM, unit, &sock))
 		return NULL;
 
-	sk = NULL;
-	if (module) {
-		/* Every registering protocol implemented in a module needs
-		 * it's own p_ops, since the socket code cannot deal with
-		 * module refcounting otherwise.  -HW
-		 */
-		p_ops = kmalloc(sizeof(*p_ops), GFP_KERNEL);
-		if (!p_ops)
-			goto out_sock_release;
-
-		memcpy(p_ops, &netlink_ops, sizeof(*p_ops));
-		p_ops->owner = module;
-	} else
-		p_ops = &netlink_ops;
-
-	netlink_table_grab();
-	nl_table[unit].p_ops = p_ops;
-	netlink_table_ungrab();
-
-	if (netlink_create(sock, unit) < 0) 

[NETLINK 4/8]: Convert netlink users to use group numbers instead of bitmasks

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Convert netlink users to use group numbers instead of bitmasks

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit a8a8c74ef1b37254f920103a6ce70237a6a55dab
tree c8decf70f15805fc7c23bee441b2ce8b14e7b264
parent 5c34a3fbc1e62fc90db80f148e07ea7817013dca
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:56:59 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:56:59 +0200

 include/linux/netfilter/nfnetlink.h   |   23 +++-
 include/linux/netfilter_decnet.h  |   14 ++
 include/linux/rtnetlink.h |   42 +++--
 include/linux/xfrm.h  |   18 
 net/bridge/netfilter/ebt_ulog.c   |4 +--
 net/core/neighbour.c  |8 +++---
 net/core/rtnetlink.c  |6 ++--
 net/core/wireless.c   |4 +--
 net/decnet/dn_dev.c   |8 +++---
 net/decnet/dn_table.c |4 +--
 net/decnet/netfilter/dn_rtmsg.c   |6 ++--
 net/ipv4/devinet.c|7 ++---
 net/ipv4/fib_frontend.c   |2 +
 net/ipv4/fib_semantics.c  |4 +--
 net/ipv4/netfilter/ip_conntrack_netlink.c |   12 
 net/ipv4/netfilter/ipt_ULOG.c |8 +++---
 net/ipv6/addrconf.c   |   24 -
 net/ipv6/route.c  |8 +++---
 net/netfilter/nfnetlink.c |2 +
 net/sched/act_api.c   |8 +++---
 net/sched/cls_api.c   |2 +
 net/sched/sch_api.c   |4 +--
 net/xfrm/xfrm_user.c  |   23 +++-
 23 files changed, 163 insertions(+), 78 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -2,13 +2,34 @@
 #define _NFNETLINK_H
 #include <linux/types.h>
 
-/* nfnetlink groups: Up to 32 maximum */
+#ifndef __KERNEL__
+/* nfnetlink groups: Up to 32 maximum - backwards compatibility for userspace */
 #define NF_NETLINK_CONNTRACK_NEW 		0x0001
 #define NF_NETLINK_CONNTRACK_UPDATE		0x0002
 #define NF_NETLINK_CONNTRACK_DESTROY		0x0004
 #define NF_NETLINK_CONNTRACK_EXP_NEW		0x0008
 #define NF_NETLINK_CONNTRACK_EXP_UPDATE		0x0010
 #define NF_NETLINK_CONNTRACK_EXP_DESTROY	0x0020
+#endif
+
+enum nfnetlink_groups {
+	NFNLGRP_NONE,
+#define NFNLGRP_NONE			NFNLGRP_NONE
+	NFNLGRP_CONNTRACK_NEW,
+#define NFNLGRP_CONNTRACK_NEW		NFNLGRP_CONNTRACK_NEW
+	NFNLGRP_CONNTRACK_UPDATE,
+#define NFNLGRP_CONNTRACK_UPDATE	NFNLGRP_CONNTRACK_UPDATE
+	NFNLGRP_CONNTRACK_DESTROY,
+#define NFNLGRP_CONNTRACK_DESTROY	NFNLGRP_CONNTRACK_DESTROY
+	NFNLGRP_CONNTRACK_EXP_NEW,
+#define	NFNLGRP_CONNTRACK_EXP_NEW	NFNLGRP_CONNTRACK_EXP_NEW
+	NFNLGRP_CONNTRACK_EXP_UPDATE,
+#define NFNLGRP_CONNTRACK_EXP_UPDATE	NFNLGRP_CONNTRACK_EXP_UPDATE
+	NFNLGRP_CONNTRACK_EXP_DESTROY,
+#define NFNLGRP_CONNTRACK_EXP_DESTROY	NFNLGRP_CONNTRACK_EXP_DESTROY
+	__NFNLGRP_MAX,
+};
+#define NFNLGRP_MAX	(__NFNLGRP_MAX - 1)
 
 /* Generic structure for encapsulation optional netfilter information.
  * It is reminiscent of sockaddr, but with sa_family replaced
diff --git a/include/linux/netfilter_decnet.h b/include/linux/netfilter_decnet.h
--- a/include/linux/netfilter_decnet.h
+++ b/include/linux/netfilter_decnet.h
@@ -56,7 +56,21 @@ struct nf_dn_rtmsg {
 
 #define NFDN_RTMSG(r) ((unsigned char *)(r) + NLMSG_ALIGN(sizeof(struct nf_dn_rtmsg)))
 
+#ifndef __KERNEL__
+/* backwards compatibility for userspace */
 #define DNRMG_L1_GROUP 0x01
 #define DNRMG_L2_GROUP 0x02
+#endif
+
+enum {
+	DNRNG_NLGRP_NONE,
+#define DNRNG_NLGRP_NONE	DNRNG_NLGRP_NONE
+	DNRNG_NLGRP_L1,
+#define DNRNG_NLGRP_L1		DNRNG_NLGRP_L1
+	DNRNG_NLGRP_L2,
+#define DNRNG_NLGRP_L2		DNRNG_NLGRP_L2
+	__DNRNG_NLGRP_MAX
+};
+#define DNRNG_NLGRP_MAX	(__DNRNG_NLGRP_MAX - 1)
 
 #endif /*__LINUX_DECNET_NETFILTER_H*/
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -826,9 +826,8 @@ enum
 #define TCA_RTA(r)  ((struct rtattr*)(((char*)(r)) + NLMSG_ALIGN(sizeof(struct tcmsg))))
 #define TCA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcmsg))
 
-
-/* RTnetlink multicast groups */
-
+#ifndef __KERNEL__
+/* RTnetlink multicast groups - backwards compatibility for userspace */
 #define RTMGRP_LINK		1
 #define RTMGRP_NOTIFY		2
 #define RTMGRP_NEIGH		4
@@ -847,6 +846,43 @@ enum
 #define RTMGRP_DECnet_ROUTE 0x4000
 
 #define RTMGRP_IPV6_PREFIX	0x2
+#endif
+
+/* RTnetlink multicast groups */
+enum rtnetlink_groups {
+	RTNLGRP_NONE,
+#define RTNLGRP_NONE		RTNLGRP_NONE
+	RTNLGRP_LINK,
+#define RTNLGRP_LINK		RTNLGRP_LINK
+	RTNLGRP_NOTIFY,
+#define RTNLGRP_NOTIFY		RTNLGRP_NOTIFY
+	RTNLGRP_NEIGH,
+#define RTNLGRP_NEIGH		RTNLGRP_NEIGH
+	RTNLGRP_TC,
+#define 

[NETLINK 2/8]: Remove unused groups member from struct netlink_skb_parms

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Remove unused groups member from struct netlink_skb_parms

Signed-off-by: Patrick McHardy [EMAIL PROTECTED],net

---
commit 910f9b156d87a1d9d013985ce3973b9a0d27dbd6
tree a430be569a7d7c79088d7b830a57e31c98f95060
parent 1f74632caaf6f2bf31cf02ac28c5087e4224b02e
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:30:12 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 00:30:12 +0200

 include/linux/netlink.h  |1 -
 net/ipv4/fib_frontend.c  |1 -
 net/netlink/af_netlink.c |1 -
 3 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -106,7 +106,6 @@ struct netlink_skb_parms
 {
 	struct ucred		creds;		/* Skb credentials	*/
 	__u32			pid;
-	__u32			groups;
 	__u32			dst_pid;
 	__u32			dst_groups;
 	kernel_cap_t		eff_cap;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -558,7 +558,6 @@ static void nl_fib_input(struct sock *sk
 	nl_fib_lookup(frn, tb);
 	
 	pid = nlh->nlmsg_pid;   /*pid of sending process */
-	NETLINK_CB(skb).groups = 0; /* not in mcast group */
 	NETLINK_CB(skb).pid = 0; /* from kernel */
 	NETLINK_CB(skb).dst_pid = pid;
 	NETLINK_CB(skb).dst_groups = 0;  /* unicast */
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -954,7 +954,6 @@ static int netlink_sendmsg(struct kiocb 
 		goto out;
 
 	NETLINK_CB(skb).pid	= nlk->pid;
-	NETLINK_CB(skb).groups	= nlk->groups;
 	NETLINK_CB(skb).dst_pid = dst_pid;
 	NETLINK_CB(skb).dst_groups = dst_groups;
 	NETLINK_CB(skb).loginuid = audit_get_loginuid(current->audit_context);


[NETLINK 6/8]: Support dynamic number of multicast groups per netlink family

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Support dynamic number of multicast groups per netlink family

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit a5314b2c777dc032b93f4f068ab1759f5610999f
tree 68571754baf232d5c76b15ec7e270b4af058867a
parent 2b1cc05d6484d70aae14d869730f8ce959ed7bdd
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:16:52 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:16:52 +0200

 net/netlink/af_netlink.c |   66 +-
 1 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -60,21 +60,24 @@
 #include <net/scm.h>
 
 #define Nprintk(a...)
+#define NLGRPSZ(x)	(ALIGN(x, sizeof(unsigned long) * 8) / 8)
 
 struct netlink_sock {
 	/* struct sock has to be the first member of netlink_sock */
 	struct sock		sk;
 	u32			pid;
-	unsigned int		groups;
 	u32			dst_pid;
 	u32			dst_group;
+	u32			flags;
+	u32			subscriptions;
+	u32			ngroups;
+	unsigned long		*groups;
 	unsigned long		state;
 	wait_queue_head_t	wait;
 	struct netlink_callback	*cb;
 	spinlock_t		cb_lock;
 	void			(*data_ready)(struct sock *sk, int bytes);
 	struct module		*module;
-	u32			flags;
 };
 
 #define NETLINK_KERNEL_SOCKET	0x1
@@ -101,6 +104,7 @@ struct netlink_table {
 	struct nl_pid_hash hash;
 	struct hlist_head mc_list;
 	unsigned int nl_nonroot;
+	unsigned int groups;
 	struct module *module;
 	int registered;
 };
@@ -138,6 +142,7 @@ static void netlink_sock_destruct(struct
 	BUG_TRAP(!atomic_read(&sk->sk_rmem_alloc));
 	BUG_TRAP(!atomic_read(&sk->sk_wmem_alloc));
 	BUG_TRAP(!nlk_sk(sk)->cb);
+	BUG_TRAP(!nlk_sk(sk)->groups);
 }
 
 /* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on SMP.
@@ -333,7 +338,7 @@ static void netlink_remove(struct sock *
 	netlink_table_grab();
 	if (sk_del_node_init(sk))
 		nl_table[sk->sk_protocol].hash.entries--;
-	if (nlk_sk(sk)->groups)
+	if (nlk_sk(sk)->subscriptions)
 		__sk_del_bind_node(sk);
 	netlink_table_ungrab();
 }
@@ -369,6 +374,8 @@ static int __netlink_create(struct socke
 static int netlink_create(struct socket *sock, int protocol)
 {
 	struct module *module = NULL;
+	struct netlink_sock *nlk;
+	unsigned int groups;
 	int err = 0;
 
 	sock->state = SS_UNCONNECTED;
@@ -392,15 +399,23 @@ static int netlink_create(struct socket 
 		module = nl_table[protocol].module;
 	else
 		err = -EPROTONOSUPPORT;
+	groups = nl_table[protocol].groups;
 	netlink_unlock_table();
 
-	if (err)
-		goto out;
+	if (err || (err = __netlink_create(sock, protocol) < 0))
+		goto out_module;
+
+	nlk = nlk_sk(sock->sk);
 
-	if ((err = __netlink_create(sock, protocol) < 0))
+	nlk->groups = kmalloc(NLGRPSZ(groups), GFP_KERNEL);
+	if (nlk->groups == NULL) {
+		err = -ENOMEM;
 		goto out_module;
+	}
+	memset(nlk->groups, 0, NLGRPSZ(groups));
+	nlk->ngroups = groups;
 
-	nlk_sk(sock->sk)->module = module;
+	nlk->module = module;
 out:
 	return err;
 
@@ -437,7 +452,7 @@ static int netlink_release(struct socket
 
 	skb_queue_purge(&sk->sk_write_queue);
 
-	if (nlk->pid && !nlk->groups) {
+	if (nlk->pid && !nlk->subscriptions) {
 		struct netlink_notify n = {
 		.protocol = sk->sk_protocol,
 		.pid = nlk->pid,
@@ -455,6 +470,7 @@ static int netlink_release(struct socket
 		netlink_table_ungrab();
 	}
 
+	kfree(nlk->groups);
 	sock_put(sk);
 	return 0;
 }
@@ -503,6 +519,18 @@ static inline int netlink_capable(struct
 	   capable(CAP_NET_ADMIN);
 } 
 
+static void
+netlink_update_subscriptions(struct sock *sk, unsigned int subscriptions)
+{
+	struct netlink_sock *nlk = nlk_sk(sk);
+
+	if (nlk->subscriptions && !subscriptions)
+		__sk_del_bind_node(sk);
+	else if (!nlk->subscriptions && subscriptions)
+		sk_add_bind_node(sk, &nl_table[sk->sk_protocol].mc_list);
+	nlk->subscriptions = subscriptions;
+}
+
 static int netlink_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 {
 	struct sock *sk = sock->sk;
@@ -528,15 +556,14 @@ static int netlink_bind(struct socket *s
 			return err;
 	}
 
-	if (!nladdr->nl_groups && !nlk->groups)
+	if (!nladdr->nl_groups && !(u32)nlk->groups[0])
 		return 0;
 
 	netlink_table_grab();
-	if (nlk->groups && !nladdr->nl_groups)
-		__sk_del_bind_node(sk);
-	else if (!nlk->groups && nladdr->nl_groups)
-		sk_add_bind_node(sk, &nl_table[sk->sk_protocol].mc_list);
-	nlk->groups = nladdr->nl_groups;
+	netlink_update_subscriptions(sk, nlk->subscriptions +
+	 hweight32(nladdr->nl_groups) -
+	 hweight32(nlk->groups[0]));
+	*(u32 *)nlk->groups = nladdr->nl_groups;
 	netlink_table_ungrab();
 
 	return 0;
@@ -590,7 +617,7 @@ static int netlink_getname(struct socket
 		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
 	} else {
 		nladdr->nl_pid = nlk->pid;
-		nladdr->nl_groups = nlk->groups; 
+		nladdr->nl_groups = nlk->groups[0];
 	}
 	return 0;
 }
@@ -791,7 +818,8 @@ static inline int do_one_broadcast(struc
 	if (p->exclude_sk == sk)
 		goto out;
 
-	if (nlk->pid == p->pid || 

[NETLINK 8/8]: Add groups argument to netlink_kernel_create

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Add groups argument to netlink_kernel_create

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 5719d60b114683e7c1bf1aa9a553efb641184e1b
tree c6a56c893ae404e6767f3cefbebd2a88a2981775
parent c366740a65d35924ee4efce970db8a738dd4b384
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:50:00 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:50:00 +0200

 drivers/w1/w1_int.c |2 +-
 include/linux/netlink.h |2 +-
 kernel/audit.c  |2 +-
 lib/kobject_uevent.c|2 +-
 net/bridge/netfilter/ebt_ulog.c |3 ++-
 net/core/rtnetlink.c|3 ++-
 net/decnet/netfilter/dn_rtmsg.c |4 ++--
 net/ipv4/fib_frontend.c |2 +-
 net/ipv4/netfilter/ip_queue.c   |2 +-
 net/ipv4/netfilter/ipt_ULOG.c   |3 ++-
 net/ipv4/tcp_diag.c |2 +-
 net/ipv6/netfilter/ip6_queue.c  |3 ++-
 net/netfilter/nfnetlink.c   |4 ++--
 net/netlink/af_netlink.c|6 --
 net/xfrm/xfrm_user.c|4 ++--
 15 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c
--- a/drivers/w1/w1_int.c
+++ b/drivers/w1/w1_int.c
@@ -88,7 +88,7 @@ static struct w1_master * w1_alloc_dev(u
 
 	dev->groups = 23;
 	dev->seq = 1;
-	dev->nls = netlink_kernel_create(NETLINK_W1, NULL, THIS_MODULE);
+	dev->nls = netlink_kernel_create(NETLINK_W1, 1, NULL, THIS_MODULE);
 	if (!dev->nls) {
 		printk(KERN_ERR "Failed to create new netlink socket(%u) for w1 master %s.\n",
 			NETLINK_NFLOG, dev->dev.bus_id);
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -125,7 +125,7 @@ struct netlink_skb_parms
 #define NETLINK_CREDS(skb)	(NETLINK_CB((skb)).creds)
 
 
-extern struct sock *netlink_kernel_create(int unit, void (*input)(struct sock *sk, int len), struct module *module);
+extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module);
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid,
diff --git a/kernel/audit.c b/kernel/audit.c
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -514,7 +514,7 @@ static int __init audit_init(void)
 {
 	printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
 	   audit_default ? "enabled" : "disabled");
-	audit_sock = netlink_kernel_create(NETLINK_AUDIT, audit_receive,
+	audit_sock = netlink_kernel_create(NETLINK_AUDIT, 0, audit_receive,
 	   THIS_MODULE);
 	if (!audit_sock)
 		audit_panic("cannot initialize netlink socket");
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -153,7 +153,7 @@ EXPORT_SYMBOL_GPL(kobject_uevent_atomic)
 
 static int __init kobject_uevent_init(void)
 {
-	uevent_sock = netlink_kernel_create(NETLINK_KOBJECT_UEVENT, NULL,
+	uevent_sock = netlink_kernel_create(NETLINK_KOBJECT_UEVENT, 1, NULL,
 	THIS_MODULE);
 
 	if (!uevent_sock) {
diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -258,7 +258,8 @@ static int __init init(void)
 		spin_lock_init(&ulog_buffers[i].lock);
 	}
 
-	ebtulognl = netlink_kernel_create(NETLINK_NFLOG, NULL, THIS_MODULE);
+	ebtulognl = netlink_kernel_create(NETLINK_NFLOG, EBT_ULOG_MAXNLGROUPS,
+	  NULL, THIS_MODULE);
 	if (!ebtulognl)
 		ret = -ENOMEM;
 	else if ((ret = ebt_register_watcher(&ulog)))
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -708,7 +708,8 @@ void __init rtnetlink_init(void)
 	if (!rta_buf)
 		panic("rtnetlink_init: cannot allocate rta_buf\n");
 
-	rtnl = netlink_kernel_create(NETLINK_ROUTE, rtnetlink_rcv, THIS_MODULE);
+	rtnl = netlink_kernel_create(NETLINK_ROUTE, RTNLGRP_MAX, rtnetlink_rcv,
+	 THIS_MODULE);
 	if (rtnl == NULL)
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -138,8 +138,8 @@ static int __init init(void)
 {
 	int rv = 0;
 
-	dnrmg = netlink_kernel_create(NETLINK_DNRTMSG, dnrmg_receive_user_sk,
-  THIS_MODULE);
+	dnrmg = netlink_kernel_create(NETLINK_DNRTMSG, DNRNG_NLGRP_MAX,
+	  dnrmg_receive_user_sk, THIS_MODULE);
 	if (dnrmg == NULL) {
 		printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket");
 		return -ENOMEM;
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ 

[NETLINK 5/8]: Return -EPROTONOSUPPORT in netlink_create() if no kernel socket is registered

2005-08-12 Thread Patrick McHardy
 [NETLINK]: Return -EPROTONOSUPPORT in netlink_create() if no kernel socket is registered

This is necessary for a dynamic number of netlink groups: it makes sure we know
the number of possible groups before bind() is called. With this change, pure
userspace communication using unused netlink protocols becomes impossible.
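
The effect of the new flag can be sketched in plain userspace C. This is only
a simplified model, not the kernel code: the names model_table,
model_kernel_create, model_kernel_release, model_netlink_create and
MODEL_MAX_LINKS are hypothetical stand-ins, and locking and module reference
counting are left out entirely.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_LINKS 32	/* hypothetical table size for this model */

/* Simplified stand-in for one slot of the kernel's nl_table. */
struct model_table_entry {
	int registered;	/* nonzero while a kernel socket owns the protocol */
};

static struct model_table_entry model_table[MODEL_MAX_LINKS];

/* Models netlink_kernel_create(): mark the protocol as registered. */
static void model_kernel_create(int unit)
{
	assert(unit >= 0 && unit < MODEL_MAX_LINKS);
	model_table[unit].registered = 1;
}

/* Models netlink_release() on a kernel socket: clear the flag again. */
static void model_kernel_release(int unit)
{
	assert(unit >= 0 && unit < MODEL_MAX_LINKS);
	model_table[unit].registered = 0;
}

/* Models the check in netlink_create(): returns 0 or -EPROTONOSUPPORT. */
static int model_netlink_create(int protocol)
{
	if (protocol < 0 || protocol >= MODEL_MAX_LINKS)
		return -EPROTONOSUPPORT;
	if (!model_table[protocol].registered)
		return -EPROTONOSUPPORT;
	return 0;
}
```

In this model, creating a socket for a protocol succeeds only between the
matching model_kernel_create() and model_kernel_release() calls, which is the
behaviour that rules out pure userspace use of unused protocols.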

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 2b1cc05d6484d70aae14d869730f8ce959ed7bdd
tree 3c1145ba97171ef0652f80d7e35a54de1b0be4bf
parent a8a8c74ef1b37254f920103a6ce70237a6a55dab
author Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:05:49 +0200
committer Patrick McHardy [EMAIL PROTECTED] Sat, 13 Aug 2005 01:05:49 +0200

 net/netlink/af_netlink.c |   72 --
 1 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -102,6 +102,7 @@ struct netlink_table {
 	struct hlist_head mc_list;
 	unsigned int nl_nonroot;
 	struct module *module;
+	int registered;
 };
 
 static struct netlink_table *nl_table;
@@ -343,11 +344,32 @@ static struct proto netlink_proto = {
 	.obj_size = sizeof(struct netlink_sock),
 };
 
-static int netlink_create(struct socket *sock, int protocol)
+static int __netlink_create(struct socket *sock, int protocol)
 {
 	struct sock *sk;
 	struct netlink_sock *nlk;
-	struct module *module;
+
+	sock->ops = &netlink_ops;
+
+	sk = sk_alloc(PF_NETLINK, GFP_KERNEL, &netlink_proto, 1);
+	if (!sk)
+		return -ENOMEM;
+
+	sock_init_data(sock, sk);
+
+	nlk = nlk_sk(sk);
+	spin_lock_init(&nlk->cb_lock);
+	init_waitqueue_head(&nlk->wait);
+
+	sk->sk_destruct = netlink_sock_destruct;
+	sk->sk_protocol = protocol;
+	return 0;
+}
+
+static int netlink_create(struct socket *sock, int protocol)
+{
+	struct module *module = NULL;
+	int err = 0;
 
 	sock->state = SS_UNCONNECTED;
 
@@ -358,41 +380,33 @@ static int netlink_create(struct socket 
 		return -EPROTONOSUPPORT;
 
 	netlink_lock_table();
-	if (!nl_table[protocol].hash.entries) {
 #ifdef CONFIG_KMOD
-		/* We do 'best effort'.  If we find a matching module,
-		 * it is loaded.  If not, we don't return an error to
-		 * allow pure userspace-userspace communication. -HW
-		 */
+	if (!nl_table[protocol].registered) {
 		netlink_unlock_table();
 		request_module("net-pf-%d-proto-%d", PF_NETLINK, protocol);
 		netlink_lock_table();
-#endif
 	}
-	module = nl_table[protocol].module;
-	if (!try_module_get(module))
-		module = NULL;
+#endif
+	if (nl_table[protocol].registered &&
+	    try_module_get(nl_table[protocol].module))
+		module = nl_table[protocol].module;
+	else
+		err = -EPROTONOSUPPORT;
 	netlink_unlock_table();
 
-	sock->ops = &netlink_ops;
-
-	sk = sk_alloc(PF_NETLINK, GFP_KERNEL, &netlink_proto, 1);
-	if (!sk) {
-		module_put(module);
-		return -ENOMEM;
-	}
-
-	sock_init_data(sock, sk);
+	if (err)
+		goto out;
 
-	nlk = nlk_sk(sk);
+	if ((err = __netlink_create(sock, protocol) < 0))
+		goto out_module;
 
-	nlk->module = module;
-	spin_lock_init(&nlk->cb_lock);
-	init_waitqueue_head(&nlk->wait);
-	sk->sk_destruct = netlink_sock_destruct;
+	nlk_sk(sock->sk)->module = module;
+out:
+	return err;
 
-	sk->sk_protocol = protocol;
-	return 0;
+out_module:
+	module_put(module);
+	goto out;
 }
 
 static int netlink_release(struct socket *sock)
@@ -437,6 +451,7 @@ static int netlink_release(struct socket
 	if (nlk->flags & NETLINK_KERNEL_SOCKET) {
 		netlink_table_grab();
 		nl_table[sk->sk_protocol].module = NULL;
+		nl_table[sk->sk_protocol].registered = 0;
 		netlink_table_ungrab();
 	}
 
@@ -1082,7 +1097,7 @@ netlink_kernel_create(int unit, void (*i
 	if (sock_create_lite(PF_NETLINK, SOCK_DGRAM, unit, &sock))
 		return NULL;
 
-	if (netlink_create(sock, unit) < 0)
+	if (__netlink_create(sock, unit) < 0)
 		goto out_sock_release;
 
 	sk = sock->sk;
@@ -1098,6 +1113,7 @@ netlink_kernel_create(int unit, void (*i
 
 	netlink_table_grab();
 	nl_table[unit].module = module;
+	nl_table[unit].registered = 1;
 	netlink_table_ungrab();
 
 	return sk;


skb->stamp conversion missing from latest net-2.6.14

2005-08-12 Thread Patrick McHardy

Hi Dave,

I was about to write the patch that breaks compilation of
unconverted code for the skb->stamp change, and noticed that
the conversion patch is missing from your latest net-2.6.14 tree.
Is this deliberate, or did it get lost?