Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Patrick McHardy

Evgeniy Polyakov wrote:
>> +	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups;
>> 	netlink_table_ungrab();
>
> I have some doubt about 64bit platforms.

We want to replace the lower 32 bits. What doubts are you having?

>> 	return 0;
>> @@ -590,7 +619,7 @@ static int netlink_getname(struct socket
>> 		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
>> 	} else {
>> 		nladdr->nl_pid = nlk->pid;
>> -		nladdr->nl_groups = nlk->groups;
>> +		nladdr->nl_groups = nlk->groups[0];
>
> And here too.
>
> nlk->groups[0] is an unsigned long, which is 64bit on 64bit platforms.

So it will be truncated to 32bit, which is exactly what is intended
here. The problem Dave was referring to was a cast of unsigned long *
to u32 *, which doesn't work because it will use the upper 4 bytes on
big-endian 64bit. But without pointer casts this should work well.
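
The distinction between converting the value and casting the pointer can
be reproduced in user space. What follows is a minimal sketch, not kernel
code: uint64_t stands in for a 64-bit unsigned long, and the helper names
(set_low32, low32_by_value, first4_bytes) are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Replace the low 32 bits of a 64-bit word and keep the high 32 bits,
 * mirroring (groups[0] & ~0xffffffffUL) | nl_groups on a 64-bit platform. */
static uint64_t set_low32(uint64_t word, uint32_t nl_groups)
{
	return (word & ~UINT64_C(0xffffffff)) | nl_groups;
}

/* Value conversion: always yields the low 32 bits, on any endianness. */
static uint32_t low32_by_value(uint64_t word)
{
	return (uint32_t)word;
}

/* What a read through a u32 pointer would see: the first four bytes in
 * memory. That is the low half on little-endian and the high half on
 * big-endian, which is the problem with the pointer cast. */
static uint32_t first4_bytes(uint64_t word)
{
	uint32_t out;

	memcpy(&out, &word, sizeof(out));
	return out;
}
```

Only first4_bytes() depends on byte order; the two value-based helpers
give the same result on every platform.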
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Evgeniy Polyakov
On Mon, Aug 15, 2005 at 10:16:19AM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
> Evgeniy Polyakov wrote:
> >> +	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups;
> >> 	netlink_table_ungrab();
> >
> > I have some doubt about 64bit platforms.
>
> We want to replace the lower 32 bits. What doubts are you having?
>
> >> 	return 0;
> >> @@ -590,7 +619,7 @@ static int netlink_getname(struct socket
> >> 		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
> >> 	} else {
> >> 		nladdr->nl_pid = nlk->pid;
> >> -		nladdr->nl_groups = nlk->groups;
> >> +		nladdr->nl_groups = nlk->groups[0];
> >
> > And here too.
> >
> > nlk->groups[0] is an unsigned long, which is 64bit on 64bit platforms.
>
> So it will be truncated to 32bit, which is exactly what is intended
> here. The problem Dave was referring to was a cast of unsigned long *
> to u32 *, which doesn't work because it will use the upper 4 bytes on
> big-endian 64bit. But without pointer casts this should work well.

It's not Dave's bug; all these changes force the compiler to scream, which
thrusts forward.

-- 
Evgeniy Polyakov


Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Patrick McHardy

Evgeniy Polyakov wrote:
> It's not Dave's bug; all these changes force the compiler to scream, which
> thrusts forward.


I don't get any compiler warnings with gcc-4.0.1 on x86 and amd64,
so could you please be more specific?




Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Evgeniy Polyakov
On Mon, Aug 15, 2005 at 11:06:27AM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
> Evgeniy Polyakov wrote:
> > It's not Dave's bug; all these changes force the compiler to scream, which
> > thrusts forward.
>
> I don't get any compiler warnings with gcc-4.0.1 on x86 and amd64,
> so could you please be more specific?

My fault, the compiler was warning about my own changes on top of yours :)
Sorry for that.
An unsigned long can be converted to any other integer type safely.

-- 
Evgeniy Polyakov


Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Thomas Graf
* Patrick McHardy [EMAIL PROTECTED] 2005-08-14 15:30
> Thomas Graf wrote:
> > * Patrick McHardy [EMAIL PROTECTED] 2005-08-13 02:36
> >
> > > [NETLINK]: Support dynamic number of multicast groups per netlink family
> > >
> > > Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
> > >
> > > -	if ((err = __netlink_create(sock, protocol) < 0))
> > > +	nlk->groups = kmalloc(NLGRPSZ(groups), GFP_KERNEL);
> > > +	if (nlk->groups == NULL) {
> > > +		err = -ENOMEM;
> > > 		goto out_module;
> > > +	}
> >
> > Intended to delegate the cleanup of __netlink_create to a
> > call to sock_release() by the caller?
>
> Sorry, I'm not sure I understand what you mean :)

You don't undo the allocations of __netlink_create if the
kmalloc for nlk->groups fails. So my question is whether this is
intended and you really want to rely on the caller to invoke
sock_release() to free the sk again, or whether it might be
worth following the rule of leaving things untouched in case
of an error.
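
The "leave things untouched in case of an error" rule can be sketched in
user space as follows. This is only an illustration under assumptions:
struct sock_model, sock_model_create() and the fail_second switch are
hypothetical stand-ins, and malloc-style allocation replaces the kernel
allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a socket with a separately allocated bitmap. */
struct sock_model {
	unsigned long *groups;
};

/* Returns 0 on success, -1 on failure. On failure *out is left untouched
 * and nothing allocated here is leaked: the function undoes its own work
 * instead of relying on the caller to release a half-built object. */
static int sock_model_create(struct sock_model **out, size_t nwords,
			     int fail_second)
{
	struct sock_model *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return -1;

	/* fail_second simulates the second allocation failing. */
	sk->groups = fail_second ? NULL : calloc(nwords, sizeof(*sk->groups));
	if (!sk->groups) {
		free(sk);	/* undo the first allocation */
		return -1;
	}

	*out = sk;
	return 0;
}

static void sock_model_destroy(struct sock_model *sk)
{
	free(sk->groups);
	free(sk);
}
```

The alternative discussed in the thread, relying on the caller to release
the object, would drop the free(sk) in the error branch and hand the
half-built object back instead.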

> > Given we remove the minimal group size of 32 introduced
> > in a later patch would it make sense to not allocate if
> > groups==0 at the cost of a few additional runtime checks?
> > I only see a real cost in do_one_broadcast() but the
> > check for group - 1 >= ngroups already ensures it to be
> > allocated so I don't see any problems performance wise.
>
> We could do that, the main reason why my patches enforce a minimum
> of 32 groups is for backwards compatibility so getsockname returns the
> same nl_groups mask that was specified in bind. I'm not sure if we
> really need this ..

Good point, I'll think about this.


Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread Patrick McHardy
Thomas Graf wrote:
> You don't undo the allocations of __netlink_create if the
> kmalloc for nlk->groups fails. So my question is if this is
> intended and you really want to rely on the caller to invoke
> sock_release() to free the sk again or whether it might be
> worth to follow the rule of leave things untouched in case
> of an error.

It was intentional, but I agree the other way around would be more
consistent. If you want to send a patch, go ahead, otherwise I'll
put it on my cleanup-list. Not allowing user sockets for unregistered
protocols allows a couple of other cleanups as well.


Re: [PATCH] TCP Offload (TOE) - Chelsio

2005-08-15 Thread Dimitris Michailidis
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
> From: Dimitris Michailidis [EMAIL PROTECTED]
> Date: Fri, 12 Aug 2005 10:00:12 -0700
>
> > On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
> > > This would mean that every time we wish to change the data structures
> > > and interfaces for TCP socket lookup, your drivers would need to
> > > change.
> >
> > I think using TCP's own functions was done exactly to avoid this
> > problem.
>
> That doesn't achieve the desired result.
>
> I do plan to merge in IBM's move of the TCP hash tables over
> to RCU style locking, and that will require knowledge of the
> locking at the call sites to the functions you have exported
> to the TOE drivers.  The TOE drivers would break as a result.

TOE uses the same locking strategies the host TCP uses (lock_sock and
the rest) so it should at least be familiar.  It doesn't use
ehash_lock or head->lock other than indirectly through functions such
as the above, and does its normal lookups in its own lockless table
that is based on flow ids rather than 4-tuples.  I haven't seen the
patches you mention recently; I recall seeing some RCU ehash
discussion several months ago, and it didn't seem it would have much
of an impact.  If you have something more recent I can take a look and
tell you if it would affect anything.

 
> You are creating a maintenance headache for us as well.  Once this
> stuff gets exported to drivers, it becomes nearly impossible to
> change.  And I absolutely reserve the right to create restrictions of
> use that increase the flexibility we have to change interfaces, data
> structures, and locking strategies in the future.
 
I think you have a fine attitude here.  There are and there will be a
lot more users of the SW TCP than of TOEs and I think you should feel
free to improve the former however you can.  The TOE code still works
with kernels going back to 2.4.22, tracking changes in mainline TCP
hasn't been an issue so far.  If you can give maintainers a heads up
before changes you think may be disruptive I think that would be
plenty on your part.


[-mm PATCH 05/32] net: fix-up schedule_timeout() usage

2005-08-15 Thread Nishanth Aravamudan
Description: Use schedule_timeout_{,un}interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size. Also use
human-time conversion functions instead of hard-coded division to avoid
rounding issues.
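
The rounding issue can be seen with a small user-space model. This is a
sketch under assumptions: model_msecs_to_jiffies() is a hypothetical
stand-in that rounds up to the next whole tick, and the tick rate is
passed as a parameter rather than being the build-time constant HZ.

```c
/* Model of converting milliseconds to ticks, rounding up so the sleep
 * is never shorter than requested. hz is ticks per second. */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* The hard-coded form: integer division truncates, so at tick rates
 * that are not a multiple of 8 the delay comes out short of 125 ms. */
static unsigned long eighth_of_second(unsigned long hz)
{
	return hz / 8;
}
```

At 100 ticks per second the hard-coded form gives 12 ticks (120 ms) while
the rounded conversion gives 13; at 1000 ticks per second both give 125.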

Signed-off-by: Nishanth Aravamudan [EMAIL PROTECTED]

---

 net/core/pktgen.c            |   13 +++++--------
 net/dccp/proto.c             |    2 +-
 net/ipv4/ipconfig.c          |    5 ++---
 net/irda/ircomm/ircomm_tty.c |    9 +++------
 net/sunrpc/svcsock.c         |    3 +--
 5 files changed, 12 insertions(+), 20 deletions(-)

--- 2.6.13-rc5-mm1/net/core/pktgen.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/core/pktgen.c	2005-08-14 13:32:59.0 -0700
@@ -1452,8 +1452,7 @@ static int proc_thread_write(struct file
 		thread_lock();
 		t->control |= T_REMDEV;
 		thread_unlock();
-		current->state = TASK_INTERRUPTIBLE;
-		schedule_timeout(HZ/8);  /* Propagate thread->control  */
+		schedule_timeout_interruptible(msecs_to_jiffies(125));  /* Propagate thread->control  */
 		ret = count;
 		sprintf(pg_result, "OK: rem_device_all");
 		goto out;
@@ -1716,10 +1715,9 @@ static void spin(struct pktgen_dev *pkt_
 	printk(KERN_INFO "sleeping for %d\n", (int)(spin_until_us - now));
 	while (now < spin_until_us) {
 		/* TODO: optimise sleeping behavior */
-		if (spin_until_us - now > (1000000/HZ)+1) {
-			current->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(1);
-		} else if (spin_until_us - now > 100) {
+		if (spin_until_us - now > jiffies_to_usecs(1)+1)
+			schedule_timeout_interruptible(1);
+		else if (spin_until_us - now > 100) {
 			do_softirq();
 			if (!pkt_dev->running)
 				return;
@@ -2449,8 +2447,7 @@ static void pktgen_run_all_threads(void)
 	}
 	thread_unlock();
 
-	current->state = TASK_INTERRUPTIBLE;
-	schedule_timeout(HZ/8);  /* Propagate thread->control  */
+	schedule_timeout_interruptible(msecs_to_jiffies(125));  /* Propagate thread->control  */
 
 	pktgen_wait_all_threads_run();
 }
diff -urpN 2.6.13-rc5-mm1/net/dccp/proto.c 2.6.13-rc5-mm1-dev/net/dccp/proto.c
--- 2.6.13-rc5-mm1/net/dccp/proto.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/dccp/proto.c	2005-08-10 16:10:55.0 -0700
@@ -225,7 +225,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
 			if (delay > timeo)
 				goto out_discard;
 			release_sock(sk);
-			delay = schedule_timeout(delay);
+			delay = schedule_timeout_interruptible(delay);
 			lock_sock(sk);
 			timeo -= delay;
 			if (signal_pending(current))
diff -urpN 2.6.13-rc5-mm1/net/ipv4/ipconfig.c 2.6.13-rc5-mm1-dev/net/ipv4/ipconfig.c
--- 2.6.13-rc5-mm1/net/ipv4/ipconfig.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/ipv4/ipconfig.c	2005-08-10 15:26:57.0 -0700
@@ -1102,10 +1102,9 @@ static int __init ic_dynamic(void)
 #endif
 
 	jiff = jiffies + (d->next ? CONF_INTER_TIMEOUT : timeout);
-	while (time_before(jiffies, jiff) && !ic_got_reply) {
+	while (time_before(jiffies, jiff) && !ic_got_reply)
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(1);
-	}
+		schedule_timeout_uninterruptible(1);
 #ifdef IPCONFIG_DHCP
 	/* DHCP isn't done until we get a DHCPACK. */
 	if ((ic_got_reply & IC_BOOTP)
diff -urpN 2.6.13-rc5-mm1/net/irda/ircomm/ircomm_tty.c 2.6.13-rc5-mm1-dev/net/irda/ircomm/ircomm_tty.c
--- 2.6.13-rc5-mm1/net/irda/ircomm/ircomm_tty.c	2005-08-07 09:57:38.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/irda/ircomm/ircomm_tty.c	2005-08-10 15:27:13.0 -0700
@@ -567,10 +567,8 @@ static void ircomm_tty_close(struct tty_
 	self->tty = NULL;
 
 	if (self->blocked_open) {
-		if (self->close_delay) {
-			current->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(self->close_delay);
-		}
+		if (self->close_delay)
+			schedule_timeout_interruptible(self->close_delay);
 		wake_up_interruptible(&self->open_wait);
 	}
 
@@ -863,8 +861,7 @@ static void ircomm_tty_wait_until_sent(s
 	spin_lock_irqsave(&self->spinlock, flags);
 	while (self->tx_skb && self->tx_skb->len) {
 		spin_unlock_irqrestore(&self->spinlock, flags);
-		current->state = TASK_INTERRUPTIBLE;
-		schedule_timeout(poll_time);
+   

[-mm PATCH 20/32] drivers/net: fix-up schedule_timeout() usage

2005-08-15 Thread Nishanth Aravamudan
Description: Use schedule_timeout_interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.

Signed-off-by: Nishanth Aravamudan [EMAIL PROTECTED]

---

 drivers/net/8139cp.c                      |    3 -
 drivers/net/hp100.c                       |   48 ++
 drivers/net/irda/stir4200.c               |    7 +---
 drivers/net/ixgb/ixgb_ethtool.c           |    7 +---
 drivers/net/ns83820.c                     |    3 -
 drivers/net/tokenring/ibmtr.c             |    9 ++---
 drivers/net/tokenring/olympic.c           |    2 -
 drivers/net/tokenring/tms380tr.c          |    3 -
 drivers/net/typhoon.c                     |    7 +---
 drivers/net/wan/cosa.c                    |    6 +--
 drivers/net/wan/cycx_drv.c                |    3 -
 drivers/net/wan/dscc4.c                   |    9 +
 drivers/net/wan/farsync.c                 |    3 -
 drivers/net/wireless/ipw2100.c            |   17 +++---
 drivers/net/wireless/prism54/islpci_dev.c |    6 +--
 drivers/net/wireless/prism54/islpci_mgt.c |    5 +--
 include/linux/ibmtr.h                     |    4 +-
 include/linux/netdevice.h                 |    6 +--
 18 files changed, 54 insertions(+), 94 deletions(-)

diff -urpN 2.6.13-rc5-mm1/drivers/net/8139cp.c 2.6.13-rc5-mm1-dev/drivers/net/8139cp.c
--- 2.6.13-rc5-mm1/drivers/net/8139cp.c	2005-08-07 09:58:00.0 -0700
+++ 2.6.13-rc5-mm1-dev/drivers/net/8139cp.c	2005-08-08 15:54:06.0 -0700
@@ -1029,8 +1029,7 @@ static void cp_reset_hw (struct cp_priva
 		if (!(cpr8(Cmd) & CmdReset))
 			return;
 
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(10);
+		schedule_timeout_uninterruptible(10);
 	}
 
 	printk(KERN_ERR "%s: hardware reset timeout\n", cp->dev->name);
diff -urpN 2.6.13-rc5-mm1/drivers/net/hp100.c 2.6.13-rc5-mm1-dev/drivers/net/hp100.c
--- 2.6.13-rc5-mm1/drivers/net/hp100.c	2005-08-07 09:58:01.0 -0700
+++ 2.6.13-rc5-mm1-dev/drivers/net/hp100.c	2005-08-08 15:55:41.0 -0700
@@ -2517,10 +2517,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if (hp100_inb(VG_LAN_CFG_1) & HP100_LINK_CABLE_ST)
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	if (time_after_eq(jiffies, time))	/* no signal->no logout */
@@ -2536,10 +2534,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if (!(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 #ifdef HP100_DEBUG
@@ -2577,10 +2573,8 @@ static int hp100_down_vg_link(struct net
 		do {
 			if (!(hp100_inb(MAC_CFG_4) & HP100_MAC_SEL_ST))
 				break;
-			if (!in_interrupt()) {
-				set_current_state(TASK_INTERRUPTIBLE);
-				schedule_timeout(1);
-			}
+			if (!in_interrupt())
+				schedule_timeout_interruptible(1);
 		} while (time_after(time, jiffies));
 
 		hp100_orb(HP100_AUTO_MODE, MAC_CFG_3);	/* Autosel back on */
@@ -2591,10 +2585,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if ((hp100_inb(VG_LAN_CFG_1) & HP100_LINK_CABLE_ST) == 0)
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	if (time_before_eq(time, jiffies)) {
@@ -2606,10 +2598,8 @@ static int hp100_down_vg_link(struct net
 
 	time = jiffies + (2 * HZ);	/* This seems to take a while */
 	do {
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	return 0;
@@ -2659,10 +2649,8 @@ static int hp100_login_to_vg_hub(struct 
 	do {
 		if (~(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
  

Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family

2005-08-15 Thread David S. Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 16:05:49 +0200

> It was intentional, but I agree the other way around would be more
> consistent. If you want to send a patch, go ahead, otherwise I'll
> put it on my cleanup-list. Not allowing user sockets for unregistered
> protocols allows a couple of other cleanups as well.

I would also like to recommend a few other potential cleanups:

1) inconsistent use of unsigned int vs. u32 for types used
   to hold the same data, for example the groups member
   of nl_table[] (which is unsigned int) vs. nlk->groups
   which is u32.

2) do_one_set_err() and do_one_broadcast() both make this identical
   test:

	if (nlk->pid == p->pid || p->group - 1 >= nlk->ngroups ||
	    !test_bit(p->group - 1, nlk->groups))
		goto out;

   so, consider making this into an inline function or similar.
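
One way the shared test in item 2 could be factored out is sketched
below as a user-space model. The struct, the helper names
(nl_skip_socket, model_test_bit) and the parameter layout are assumptions
made for illustration, not the kernel's definitions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG_MODEL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the per-socket netlink state. */
struct nl_sock_model {
	unsigned int pid;
	unsigned int ngroups;		/* number of valid bits in groups */
	const unsigned long *groups;	/* subscription bitmap */
};

static bool model_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG_MODEL] >> (nr % BITS_PER_LONG_MODEL)) & 1UL;
}

/* True when the socket must be skipped: it is the sender itself, the
 * group number is out of range for this socket, or the socket is not
 * subscribed. The range check runs before the bitmap is read, and
 * group 0 wraps to a huge value and is rejected by it as well. */
static bool nl_skip_socket(const struct nl_sock_model *nlk,
			   unsigned int sender_pid, unsigned int group)
{
	return nlk->pid == sender_pid ||
	       group - 1 >= nlk->ngroups ||
	       !model_test_bit(group - 1, nlk->groups);
}
```

Both callers would then reduce to a single call followed by the existing
goto out.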

But this is nit-picking, of course, your patches were very
well done Patrick :)


Re: udp source port randomization?

2005-08-15 Thread David S. Miller
From: bert hubert [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 22:16:49 +0200

> Currently socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) delivers the exact same
> source port each time I run it, 32776. The second invocation, without
> closing the first socket, generates 32777.
>
> This strikes me as being somewhat insecure and not in the spirit of TCP
> source port randomization.

UDP does not have the same kind of vulnerability from port
number guessing.  In fact, UDP is extremely vulnerable for
connected sockets no matter what we do in the port allocation
area.

UDP does not have sequence numbers, so there is nothing
preventing an attacker from injecting random crap into
a UDP connection.

Another factor influencing this is the fact that most UDP
usage is of the request/response type where the port
identity only exists for those two packets.

I really don't think it's worth the work to add UDP port
randomization at all.


Re: udp source port randomization?

2005-08-15 Thread Andi Kleen
> It does help 16 bits :-) Better than nothing.

16bits is so poor that any secure algorithms using it would
just give a false sense of security.

-Andi



Re: skb-pkt_type

2005-08-15 Thread David S. Miller
From: Thomas Graf [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 16:43:57 +0200

> Dave, I found another problem in the earlier patch of mine,
> when we free the clone portion and the parent is still alive
> we used to set UNAVAIL in the else branch but at this point
> the skb could have been gone already, I fixed this in this
> patch.

This patch hangs the machine on boot for me.
Probably this is occurring, once again, on the
first TCP usage which is the only spot which will
use fclones in this patch.

I thought firstly that it might be due to the child's fclone field not
being initialized at __alloc_skb() time.  So I fixed that up like so:

+	if (fclone) {
+		struct sk_buff *n = skb + 1;
+
+		skb->fclone = SKB_FCLONE_ORIG;
+		n->fclone = SKB_FCLONE_UNAVAILABLE;
+	}

That is a real bug, because we do not explicitly initialize the child
skb with a memset() here, only the parent SKB gets that.
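
The layout behind this fix can be modelled in user space. The sketch
below is an illustration under assumptions: struct buf_model and
alloc_fclone_pair() are hypothetical, and malloc stands in for the slab
cache, which likewise hands back memory that has not been cleared.

```c
#include <stdlib.h>
#include <string.h>

enum { FCLONE_UNAVAILABLE, FCLONE_ORIG, FCLONE_CLONE };

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct buf_model {
	unsigned char fclone;
	int users;
};

/* Allocate parent and child back to back; the child lives at parent + 1. */
static struct buf_model *alloc_fclone_pair(void)
{
	struct buf_model *parent = malloc(2 * sizeof(*parent));

	if (!parent)
		return NULL;

	/* Only the parent is cleared, as in the allocation path above. */
	memset(parent, 0, sizeof(*parent));
	parent->fclone = FCLONE_ORIG;

	/* The child was not touched by the memset, so its state has to be
	 * set explicitly or it holds whatever the allocator left behind. */
	parent[1].fclone = FCLONE_UNAVAILABLE;
	return parent;
}
```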

But things are still busted somehow.  Still looking.


Re: skb-pkt_type

2005-08-15 Thread David S. Miller

Ok, this scheme doesn't work as-is.

We never run the __kfree_skb() actions on the parent
SKB if the child drops the parent SKB users count to zero.

This means we don't release the DST and other objects
referenced in the parent SKB.  We also never release the
SKB memory in this case either.


Re: skb-pkt_type

2005-08-15 Thread David S. Miller
From: David S. Miller [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 15:45:00 -0700 (PDT)

> Ok, this scheme doesn't work as-is.

FWIW the fclone_ref version works perfectly fine, and
I'm running this right now.  I'm including it below
against current net-2.6.14 for reference.

So what do folks think we should do?  I'm inclined to put
this in first, as-is, then if we can get the skb->users
variant functional we can add that in as a follow-on
patch.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -162,6 +162,13 @@ struct skb_timeval {
 	u32	off_usec;
 };
 
+
+enum {
+	SKB_FCLONE_UNAVAILABLE,
+	SKB_FCLONE_ORIG,
+	SKB_FCLONE_CLONE,
+};
+
 /** 
  *	struct sk_buff - socket buffer
  *	@next: Next buffer in list
@@ -255,8 +262,10 @@ struct sk_buff {
 				ip_summed:2,
 				nohdr:1,
 				nfctinfo:3;
-	__u8			pkt_type;
+	__u8			pkt_type:3,
+				fclone:2;
 	__u16			protocol;
+	atomic_t		fclone_ref;
 
 	void			(*destructor)(struct sk_buff *skb);
 #ifdef CONFIG_NETFILTER
@@ -295,8 +304,20 @@ struct sk_buff {
 #include <asm/system.h>
 
 extern void	       __kfree_skb(struct sk_buff *skb);
-extern struct sk_buff *alloc_skb(unsigned int size,
-				 unsigned int __nocast priority);
+extern struct sk_buff *__alloc_skb(unsigned int size,
+				   unsigned int __nocast priority, int fclone);
+static inline struct sk_buff *alloc_skb(unsigned int size,
+					unsigned int __nocast priority)
+{
+	return __alloc_skb(size, priority, 0);
+}
+
+static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
+					       unsigned int __nocast priority)
+{
+	return __alloc_skb(size, priority, 1);
+}
+
 extern struct sk_buff *alloc_skb_from_cache(kmem_cache_t *cp,
 					    unsigned int size,
 					    unsigned int __nocast priority);
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1195,7 +1195,7 @@ static inline struct sk_buff *sk_stream_
 	int hdr_len;
 
 	hdr_len = SKB_DATA_ALIGN(sk->sk_prot->max_header);
-	skb = alloc_skb(size + hdr_len, gfp);
+	skb = alloc_skb_fclone(size + hdr_len, gfp);
 	if (skb) {
 		skb->truesize += mem;
 		if (sk->sk_forward_alloc >= (int)skb->truesize ||
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <asm/system.h>
 
 static kmem_cache_t *skbuff_head_cache;
+static kmem_cache_t *skbuff_fclone_cache;
 
 struct timeval __read_mostly skb_tv_base;
 
@@ -120,7 +121,7 @@ void skb_under_panic(struct sk_buff *skb
  */
 
 /**
- *	alloc_skb	-	allocate a network buffer
+ *	__alloc_skb	-	allocate a network buffer
  *	@size: size to allocate
  *	@gfp_mask: allocation mask
  *
@@ -131,14 +132,20 @@ void skb_under_panic(struct sk_buff *skb
  *	Buffers may only be allocated from interrupts using a @gfp_mask of
  *	%GFP_ATOMIC.
  */
-struct sk_buff *alloc_skb(unsigned int size, unsigned int __nocast gfp_mask)
+struct sk_buff *__alloc_skb(unsigned int size, unsigned int __nocast gfp_mask,
+			    int fclone)
 {
 	struct sk_buff *skb;
 	u8 *data;
 
 	/* Get the HEAD */
-	skb = kmem_cache_alloc(skbuff_head_cache,
-			       gfp_mask & ~__GFP_DMA);
+	if (fclone)
+		skb = kmem_cache_alloc(skbuff_fclone_cache,
+				       gfp_mask & ~__GFP_DMA);
+	else
+		skb = kmem_cache_alloc(skbuff_head_cache,
+				       gfp_mask & ~__GFP_DMA);
+
 	if (!skb)
 		goto out;
 
@@ -155,7 +162,14 @@ struct sk_buff *alloc_skb(unsigned int s
 	skb->data = data;
 	skb->tail = data;
 	skb->end  = data + size;
+	if (fclone) {
+		struct sk_buff *child = skb + 1;
 
+		skb->fclone = SKB_FCLONE_ORIG;
+		atomic_set(&skb->fclone_ref, 1);
+
+		child->fclone = SKB_FCLONE_UNAVAILABLE;
+	}
 	atomic_set(&(skb_shinfo(skb)->dataref), 1);
 	skb_shinfo(skb)->nr_frags  = 0;
 	skb_shinfo(skb)->tso_size = 0;
@@ -268,8 +282,31 @@ void skb_release_data(struct sk_buff *sk
  */
 void kfree_skbmem(struct sk_buff *skb)
 {
+	struct sk_buff *other;
+
 	skb_release_data(skb);
-	kmem_cache_free(skbuff_head_cache, skb);
+	switch (skb->fclone) {
+	case SKB_FCLONE_UNAVAILABLE:
+		kmem_cache_free(skbuff_head_cache, skb);
+		break;

[PATCH] iproute2 support for inet_diag

2005-08-15 Thread Arnaldo Carvalho de Melo
Hi Stephen,

Please consider applying.

One thing I think we should address is to show the name of the
protocol in listings where sockets of more than one protocol type are
being displayed, but this patch is a good start, I guess.

Best Regards,

- Arnaldo

diff -uNrp iproute2-ss050808.orig/include/linux/inet_diag.h iproute2-ss050808.dccp/include/linux/inet_diag.h
--- iproute2-ss050808.orig/include/linux/inet_diag.h	1969-12-31 21:00:00.0 -0300
+++ iproute2-ss050808.dccp/include/linux/inet_diag.h	2005-08-15 21:47:38.0 -0300
@@ -0,0 +1,121 @@
+#ifndef _INET_DIAG_H_
+#define _INET_DIAG_H_ 1
+
+/* Just some random number */
+#define TCPDIAG_GETSOCK 18
+#define DCCPDIAG_GETSOCK 19
+
+#define INET_DIAG_GETSOCK_MAX 24
+
+/* Socket identity */
+struct inet_diag_sockid {
+   __u16   idiag_sport;
+   __u16   idiag_dport;
+   __u32   idiag_src[4];
+   __u32   idiag_dst[4];
+   __u32   idiag_if;
+   __u32   idiag_cookie[2];
+#define INET_DIAG_NOCOOKIE (~0U)
+};
+
+/* Request structure */
+
+struct inet_diag_req {
+   __u8    idiag_family;   /* Family of addresses. */
+   __u8    idiag_src_len;
+   __u8    idiag_dst_len;
+   __u8    idiag_ext;      /* Query extended information */
+
+   struct inet_diag_sockid id;
+
+   __u32   idiag_states;   /* States to dump */
+   __u32   idiag_dbs;  /* Tables to dump (NI) */
+};
+
+enum {
+   INET_DIAG_REQ_NONE,
+   INET_DIAG_REQ_BYTECODE,
+};
+
+#define INET_DIAG_REQ_MAX INET_DIAG_REQ_BYTECODE
+
+/* Bytecode is sequence of 4 byte commands followed by variable arguments.
+ * All the commands identified by code are conditional jumps forward:
+ * to offset cc+yes or to offset cc+no. yes is supposed to be
+ * length of the command and its arguments.
+ */
+ 
+struct inet_diag_bc_op {
+   unsigned char   code;
+   unsigned char   yes;
+   unsigned short  no;
+};
+
+enum {
+   INET_DIAG_BC_NOP,
+   INET_DIAG_BC_JMP,
+   INET_DIAG_BC_S_GE,
+   INET_DIAG_BC_S_LE,
+   INET_DIAG_BC_D_GE,
+   INET_DIAG_BC_D_LE,
+   INET_DIAG_BC_AUTO,
+   INET_DIAG_BC_S_COND,
+   INET_DIAG_BC_D_COND,
+};
+
+struct inet_diag_hostcond {
+   __u8    family;
+   __u8    prefix_len;
+   int port;
+   __u32   addr[0];
+};
+
+/* Base info structure. It contains socket identity (addrs/ports/cookie)
+ * and, alas, the information shown by netstat. */
+struct inet_diag_msg {
+   __u8    idiag_family;
+   __u8    idiag_state;
+   __u8    idiag_timer;
+   __u8    idiag_retrans;
+
+   struct inet_diag_sockid id;
+
+   __u32   idiag_expires;
+   __u32   idiag_rqueue;
+   __u32   idiag_wqueue;
+   __u32   idiag_uid;
+   __u32   idiag_inode;
+};
+
+/* Extensions */
+
+enum {
+   INET_DIAG_NONE,
+   INET_DIAG_MEMINFO,
+   INET_DIAG_INFO,
+   INET_DIAG_VEGASINFO,
+   INET_DIAG_CONG,
+};
+
+#define INET_DIAG_MAX INET_DIAG_CONG
+
+
+/* INET_DIAG_MEM */
+
+struct inet_diag_meminfo {
+   __u32   idiag_rmem;
+   __u32   idiag_wmem;
+   __u32   idiag_fmem;
+   __u32   idiag_tmem;
+};
+
+/* INET_DIAG_VEGASINFO */
+
+struct tcpvegas_info {
+   __u32   tcpv_enabled;
+   __u32   tcpv_rttcnt;
+   __u32   tcpv_rtt;
+   __u32   tcpv_minrtt;
+};
+
+#endif /* _INET_DIAG_H_ */
diff -uNrp iproute2-ss050808.orig/include/linux/netlink.h iproute2-ss050808.dccp/include/linux/netlink.h
--- iproute2-ss050808.orig/include/linux/netlink.h	2005-08-08 17:24:41.0 -0300
+++ iproute2-ss050808.dccp/include/linux/netlink.h	2005-08-15 21:35:34.0 -0300
@@ -8,19 +8,17 @@
 #define NETLINK_W1		1	/* 1-wire subsystem */
 #define NETLINK_USERSOCK	2	/* Reserved for user mode socket protocols */
 #define NETLINK_FIREWALL	3	/* Firewalling hook */
-#define NETLINK_TCPDIAG		4	/* TCP socket monitoring */
+#define NETLINK_INET_DIAG	4	/* INET socket monitoring */
 #define NETLINK_NFLOG		5	/* netfilter/iptables ULOG */
 #define NETLINK_XFRM		6	/* ipsec */
 #define NETLINK_SELINUX		7	/* SELinux event notifications */
-#define NETLINK_ARPD		8
+#define NETLINK_ISCSI		8	/* Open-iSCSI */
 #define NETLINK_AUDIT		9	/* auditing */
 #define NETLINK_FIB_LOOKUP	10
-#define NETLINK_ROUTE6		11	/* af_inet6 route comm channel */
 #define NETLINK_NETFILTER	12	/* netfilter subsystem */
 #define NETLINK_IP6_FW		13
 #define NETLINK_DNRTMSG		14	/* DECnet routing messages */
 #define NETLINK_KOBJECT_UEVENT	15	/* Kernel messages to userspace */
-#define NETLINK_TAPBASE		16	/* 16 to 31 are ethertap */
 
 

Re: [PATCH] iproute2 support for inet_diag

2005-08-15 Thread Arnaldo Carvalho de Melo
On Mon, Aug 15, 2005 at 09:51:54PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Stephen,
>
>   Please consider applying.
>
>   One thing I think we should address is to show the name of the
> protocol in listings where sockets of more than one protocol type are
> being displayed, but this patch is a good start, I guess.

Or even better, use /etc/protocols and accept --proto as long as it is
listed in /etc/protocols. For that we would need a special case in the
kernel where 18 is mapped to TCP as well; there is no protocol mapped to
18 in my /etc/protocols right now, so this would be the only problem with
this scheme, I guess. This way we would not have to change iproute2 every
time we add support for inet_diag in some inet transport protocol (SCTP
and UDP being the next potential ones).
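
A lookup along those lines could use the standard getprotobyname() call,
which consults the protocols database (typically /etc/protocols). The
wrapper below is a minimal sketch; proto_number() is a hypothetical name,
and how the result would be wired into a --proto option is left open.

```c
#include <netdb.h>

/* Returns the protocol number for a name or alias listed in the
 * protocols database, or -1 if the name is not listed. */
static int proto_number(const char *name)
{
	struct protoent *ent = getprotobyname(name);

	return ent ? ent->p_proto : -1;
}
```

The tool would then accept any name for which this returns a
non-negative number.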

- Arnaldo


Re: skb-pkt_type

2005-08-15 Thread Herbert Xu
On Mon, Aug 15, 2005 at 05:10:41PM -0700, David S. Miller wrote:
 
> So what do folks think we should do?  I'm inclined to put
> this in first, as-is, then if we can get the skb->users
> variant functional we can add that in as a follow-on
> patch.

Fine by me.  I have a suggestion as to where fclone_ref
should be though.  I'd put it outside sk_buff.  So when you
allocate sk_buff * 2 for fast clones, make that
sk_buff * 2 + atomic_t.  Then you will only have to carry
it around for the fast clones.
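
The suggested layout can be sketched in user space as a block of two
buffers followed by the counter. Everything here is an assumption made
for illustration: the struct and function names are hypothetical, and
C11 atomic_int stands in for the kernel's atomic_t.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

enum { FCLONE_UNAVAILABLE, FCLONE_ORIG, FCLONE_CLONE };

/* Reduced stand-in for struct sk_buff: note it carries no refcount. */
struct buf_model {
	unsigned char fclone;
};

/* Fast-clone allocation unit: parent, child, then the shared counter. */
struct fclone_block {
	struct buf_model parent;
	struct buf_model child;
	atomic_int ref;
};

static struct buf_model *alloc_fclone_block(void)
{
	struct fclone_block *blk = malloc(sizeof(*blk));

	if (!blk)
		return NULL;
	blk->parent.fclone = FCLONE_ORIG;
	blk->child.fclone = FCLONE_UNAVAILABLE;
	atomic_init(&blk->ref, 1);
	return &blk->parent;	/* first member, same address as blk */
}

/* Locate the shared counter from either half of the pair. */
static atomic_int *ref_from_parent(struct buf_model *parent)
{
	return &((struct fclone_block *)parent)->ref;
}

static atomic_int *ref_from_child(struct buf_model *child)
{
	char *base = (char *)child - offsetof(struct fclone_block, child);

	return &((struct fclone_block *)base)->ref;
}
```

Ordinary buffers are allocated as a bare struct buf_model and never pay
for the counter, which is the point of the suggestion.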

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: skb-pkt_type

2005-08-15 Thread David S. Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Tue, 16 Aug 2005 12:02:31 +1000

> I have a suggestion as to where fclone_ref
> should be though.  I'd put it outside sk_buff.  So when you
> allocate sk_buff * 2 for fast clones, make that
> sk_buff * 2 + atomic_t.  Then you will only have to carry
> it around for the fast clones.

Excellent idea, I'll work on that change tonight
or tomorrow sometime.


[PATCH 1/1][NET] Fix sparse warnings

2005-08-15 Thread Arnaldo Carvalho de Melo
Hi David,

Please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/

Please let me know about anything you may want reworked.

Best Regards,

- Arnaldo

tree 7f65f8f8a8cf5b2f66089c9c039f2b032d964bab
parent f38354751f9c96203164bf9fcf3ec9ee91ef07e5
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1124169482 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1124169482 -0300

[NET] Fix sparse warnings

Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
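
For readers unfamiliar with this sparse warning, here is a minimal sketch of
the two usual ways to silence it. It is an illustration, not code from the
patch: the example_* names are hypothetical, and the extern declarations
would normally live in a shared header rather than in the .c file.

```c
#include <assert.h>

/*
 * Fix 1: a symbol used from other files gets a declaration that is visible
 * before its definition (normally via a shared header, hypothetical here).
 */
extern int example_netfilter_init(void);
extern void example_netfilter_fini(void);

/* Fix 2: a symbol used only in this file is made static instead. */
static int example_registered;

int example_netfilter_init(void)
{
	assert(!example_registered);   /* refuse a double init in this sketch */
	example_registered = 1;
	return 0;
}

void example_netfilter_fini(void)
{
	example_registered = 0;
}
```

The diff below follows the first pattern where it adds extern declarations
to headers such as include/linux/if_ether.h.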

--

 include/linux/if_ether.h        |    2 +
 include/linux/if_frad.h         |    6 +++--
 include/linux/if_tr.h           |    4 +++
 include/linux/igmp.h            |    3 ++
 include/linux/net.h             |    7 ++
 include/linux/netdevice.h       |   10 +
 include/linux/netfilter_ipv6.h  |    4 +--
 include/linux/security.h        |    6 +++--
 include/linux/skbuff.h          |    2 +
 include/linux/socket.h          |    7 ++
 include/net/addrconf.h          |    6 +
 include/net/af_unix.h           |   15 +
 include/net/icmp.h              |    7 ++
 include/net/ip.h                |   23 +
 include/net/ip_fib.h            |    5
 include/net/ipv6.h              |   35 ++--
 include/net/p8022.h             |    2 +
 include/net/raw.h               |    7 +-
 include/net/route.h             |    2 +
 include/net/sock.h              |   12 +++
 include/net/tcp.h               |   12 +++
 include/net/udp.h               |    5
 init/main.c                     |    2 -
 kernel/sysctl.c                 |    4 ---
 net/802/p8023.c                 |    1
 net/802/sysctl_net_802.c        |    3 +-
 net/core/dev.c                  |    6 -
 net/core/sysctl_net_core.c      |    9 +---
 net/core/utils.c                |    2 +
 net/core/wireless.c             |    4 ---
 net/ethernet/eth.c              |    2 -
 net/ethernet/sysctl_net_ether.c |    1
 net/ipv4/af_inet.c              |   14 -
 net/ipv4/datagram.c             |    1
 net/ipv4/inetpeer.c             |    1
 net/ipv4/ip_sockglue.c          |    2 -
 net/ipv4/proc.c                 |    3 --
 net/ipv4/syncookies.c           |    2 -
 net/ipv4/sysctl_net_ipv4.c      |   43 ++--
 net/ipv4/tcp_input.c            |    2 -
 net/ipv4/tcp_ipv4.c             |    2 -
 net/ipv6/addrconf.c             |    4 +--
 net/ipv6/af_inet6.c             |   24 --
 net/ipv6/ipv6_sockglue.c        |    8 ---
 net/ipv6/route.c                |    6 +
 net/ipv6/sit.c                  |    2 -
 net/ipv6/sysctl_net_ipv6.c      |    3 --
 net/ipv6/tcp_ipv6.c             |    4 ---
 net/ipv6/udp.c                  |    2 -
 net/ipx/af_ipx.c                |    2 -
 net/socket.c                    |   11 --
 net/sysctl_net.c                |    8 ++-
 net/unix/af_unix.c              |    8 ---
 net/unix/sysctl_net_unix.c      |    2 -
 54 files changed, 208 insertions(+), 162 deletions(-)

--

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -110,6 +110,8 @@ static inline struct ethhdr *eth_hdr(con
 {
 	return (struct ethhdr *)skb->mac.raw;
 }
+
+extern struct ctl_table ether_table[];
 #endif
 
 #endif /* _LINUX_IF_ETHER_H */
diff --git a/include/linux/if_frad.h b/include/linux/if_frad.h
--- a/include/linux/if_frad.h
+++ b/include/linux/if_frad.h
@@ -191,10 +191,12 @@ struct frad_local
 	int   buffer;   /* current buffer for S508 firmware */
 };
 
-extern void dlci_ioctl_set(int (*hook)(unsigned int, void __user *));
-
 #endif /* __KERNEL__ */
 
 #endif /* CONFIG_DLCI || CONFIG_DLCI_MODULE */
 
+#ifdef __KERNEL__
+extern void dlci_ioctl_set(int (*hook)(unsigned int, void __user *));
+#endif
+
 #endif
diff --git a/include/linux/if_tr.h b/include/linux/if_tr.h
--- a/include/linux/if_tr.h
+++ b/include/linux/if_tr.h
@@ -43,12 +43,16 @@ struct trh_hdr {
 };
 
 #ifdef __KERNEL__
+#include <linux/config.h>
 #include <linux/skbuff.h>
 
 static inline struct trh_hdr *tr_hdr(const struct sk_buff *skb)
 {
 	return (struct trh_hdr *)skb->mac.raw;
 }
+#ifdef CONFIG_SYSCTL
+extern struct ctl_table tr_table[];
+#endif
 #endif
 
 /* This is an Token-Ring LLC structure */
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -129,6 +129,9 @@ struct 

Re: [PATCH 1/1][NET] Fix sparse warnings

2005-08-15 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Tue, 16 Aug 2005 02:24:14 -0300

> Please consider pulling from:
> 
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/
> 
> Please let me know about anything you may want reworked.

Looks good, pulled.