Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family
Evgeniy Polyakov wrote:
>> +	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups;
>>  	netlink_table_ungrab();
>
> I have some doubt about 64bit platforms.

We want to replace the lower 32 bits. What are the doubts you're having?

>>  	return 0;
>> @@ -590,7 +619,7 @@ static int netlink_getname(struct socket
>>  		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
>>  	} else {
>>  		nladdr->nl_pid = nlk->pid;
>> -		nladdr->nl_groups = nlk->groups;
>> +		nladdr->nl_groups = nlk->groups[0];
>
> And here too.

nlk->groups[0] is an unsigned long, which is 64 bits wide on 64-bit platforms, so it will be truncated to 32 bits, which is exactly what is intended here. The problem Dave was referring to was a cast of unsigned long * to u32 *, which doesn't work because it will use the upper 4 bytes on big-endian 64-bit. But without pointer casts this should work well.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
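The difference between converting the value and casting the pointer can be shown with a small userspace sketch. This is not kernel code: `uint64_t` stands in for a 64-bit `unsigned long`, and the helper names (`set_low32`, `get_low32`, `first_four_bytes`, `host_is_little_endian`) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replace the lower 32 bits of a 64-bit bitmap word, keeping the upper half. */
static uint64_t set_low32(uint64_t word, uint32_t groups)
{
	return (word & ~UINT64_C(0xffffffff)) | groups;
}

/* Value conversion: always yields the low 32 bits, on any byte order. */
static uint32_t get_low32(uint64_t word)
{
	return (uint32_t)word;
}

/*
 * What reading through a 32-bit pointer to the same address would see:
 * the first four bytes in memory. On a big-endian host that is the
 * upper half of the word, not the lower half.
 */
static uint32_t first_four_bytes(uint64_t word)
{
	uint32_t out;

	memcpy(&out, &word, sizeof(out));
	return out;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

The value conversion gives the same answer everywhere, while the result of `first_four_bytes()` depends on the host byte order, which is why the plain assignment in the patch is fine and the pointer cast was not.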
Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family
On Mon, Aug 15, 2005 at 10:16:19AM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
> Evgeniy Polyakov wrote:
> >> +	nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | nladdr->nl_groups;
> >>  	netlink_table_ungrab();
> >
> > I have some doubt about 64bit platforms.
>
> We want to replace the lower 32 bits. What are the doubts you're having?
>
> >>  	return 0;
> >> @@ -590,7 +619,7 @@ static int netlink_getname(struct socket
> >>  		nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
> >>  	} else {
> >>  		nladdr->nl_pid = nlk->pid;
> >> -		nladdr->nl_groups = nlk->groups;
> >> +		nladdr->nl_groups = nlk->groups[0];
> >
> > And here too.
>
> nlk->groups[0] is an unsigned long, which is 64 bits wide on 64-bit platforms, so it will be truncated to 32 bits, which is exactly what is intended here. The problem Dave was referring to was a cast of unsigned long * to u32 *, which doesn't work because it will use the upper 4 bytes on big-endian 64-bit. But without pointer casts this should work well.

It's not Dave's bug; all these changes force the compiler to scream, which thrusts forward.

-- 
	Evgeniy Polyakov
Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family
Evgeniy Polyakov wrote:
> It's not Dave's bug; all these changes force the compiler to scream, which thrusts forward.

I don't get any compiler warnings with gcc-4.0.1 on x86 and amd64, so could you please be more specific?
Re: [NETLINK 8/10]: Support dynamic number of multicast groups per netlink family
On Mon, Aug 15, 2005 at 11:06:27AM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
> Evgeniy Polyakov wrote:
> > It's not Dave's bug; all these changes force the compiler to scream, which thrusts forward.
>
> I don't get any compiler warnings with gcc-4.0.1 on x86 and amd64, so could you please be more specific?

My fault, it was my changes on top of yours that the compiler warns about :) Sorry for that. unsigned long can be converted to any type safely.

-- 
	Evgeniy Polyakov
Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family
* Patrick McHardy [EMAIL PROTECTED] 2005-08-14 15:30
> Thomas Graf wrote:
> > * Patrick McHardy [EMAIL PROTECTED] 2005-08-13 02:36
> >> [NETLINK]: Support dynamic number of multicast groups per netlink family
> >>
> >> Signed-off-by: Patrick McHardy [EMAIL PROTECTED]
> >>
> >> -	if ((err = __netlink_create(sock, protocol) < 0))
> >> +	nlk->groups = kmalloc(NLGRPSZ(groups), GFP_KERNEL);
> >> +	if (nlk->groups == NULL) {
> >> +		err = -ENOMEM;
> >>  		goto out_module;
> >> +	}
> >
> > Intended to depute the cleanup of __netlink_create to a call to sock_release() by the caller?
>
> Sorry, I'm not sure I understand what you mean :)

You don't undo the allocations of __netlink_create if the kmalloc for nlk->groups fails. So my question is whether this is intended and you really want to rely on the caller to invoke sock_release() to free the sk again, or whether it might be worth following the rule of leaving things untouched in case of an error.

> > Given we remove the minimal group size of 32 introduced in a later patch, would it make sense to not allocate if groups==0, at the cost of a few additional runtime checks? I only see a real cost in do_one_broadcast(), but the check for group - 1 >= ngroups already ensures it is allocated, so I don't see any problems performance-wise.
>
> We could do that; the main reason why my patches enforce a minimum of 32 groups is for backwards compatibility, so getsockname returns the same nl_groups mask that was specified in bind. I'm not sure if we really need this ..

Good point, I'll think about this.
Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family
Thomas Graf wrote:
> You don't undo the allocations of __netlink_create if the kmalloc for nlk->groups fails. So my question is whether this is intended and you really want to rely on the caller to invoke sock_release() to free the sk again, or whether it might be worth following the rule of leaving things untouched in case of an error.

It was intentional, but I agree the other way around would be more consistent. If you want to send a patch, go ahead, otherwise I'll put it on my cleanup-list. Not allowing user sockets for unregistered protocols allows a couple of other cleanups as well.
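The "leave things untouched in case of an error" rule discussed above can be sketched in plain userspace C. This is only an illustration of the unwinding pattern, not the netlink code: `struct demo_sock`, `demo_create()`, `demo_destroy()` and the `demo_fail_groups_alloc` switch are all hypothetical names, and `calloc()`/`free()` stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_sock {
	unsigned long *groups;	/* multicast group bitmap */
	unsigned int ngroups;
};

/* Hypothetical switch so the failure path can be exercised. */
static int demo_fail_groups_alloc;

static void *demo_alloc_groups(size_t size)
{
	return demo_fail_groups_alloc ? NULL : calloc(1, size);
}

/*
 * Create a socket object with a group bitmap. On failure, everything
 * allocated here is freed again and *out is left untouched, so the
 * caller has nothing to clean up.
 */
static int demo_create(struct demo_sock **out, unsigned int ngroups)
{
	const size_t bits = 8 * sizeof(unsigned long);
	size_t words = (ngroups + bits - 1) / bits;
	struct demo_sock *sk;
	int err;

	if (words == 0)
		words = 1;	/* keep the sketch simple: always one word */

	sk = calloc(1, sizeof(*sk));
	if (sk == NULL)
		return -ENOMEM;

	sk->groups = demo_alloc_groups(words * sizeof(unsigned long));
	if (sk->groups == NULL) {
		err = -ENOMEM;
		goto out_free_sk;
	}

	sk->ngroups = ngroups;
	*out = sk;
	return 0;

out_free_sk:
	free(sk);		/* undo the first allocation ourselves */
	return err;
}

static void demo_destroy(struct demo_sock *sk)
{
	free(sk->groups);
	free(sk);
}
```

The alternative Patrick describes as intentional would return the error and rely on the caller's release path to free the half-built object; both work, but the version above keeps the cleanup next to the allocation.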
Re: [PATCH] TCP Offload (TOE) - Chelsio
On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
> From: Dimitris Michailidis [EMAIL PROTECTED]
> Date: Fri, 12 Aug 2005 10:00:12 -0700
>
> > On 8/12/05, David S. Miller [EMAIL PROTECTED] wrote:
> > > This would mean that every time we wish to change the data structures and interfaces for TCP socket lookup, your drivers would need to change.
> >
> > I think using TCP's own functions was done exactly to avoid this problem.
>
> That doesn't achieve the desired result.
>
> I do plan to merge in IBM's move of the TCP hash tables over to RCU style locking, and that will require knowledge of the locking at the call sites to the functions you have exported to the TOE drivers. The TOE drivers would break as a result.

TOE uses the same locking strategies the host TCP uses (lock_sock and the rest) so it should at least be familiar. It doesn't use ehash_lock or head->lock other than indirectly through functions such as the above, and does its normal lookups in its own lockless table that is based on flow ids rather than 4-tuples. I haven't seen the patches you mention recently; I recall seeing some RCU ehash discussion several months ago, and it didn't seem that it would have much of an impact. If you have something more recent I can take a look and tell you if it would affect anything.

> You are creating a maintenance headache for us as well. Once this stuff gets exported to drivers, it becomes nearly impossible to change. And I absolutely reserve the right to create restrictions of use that increase the flexibility we have to change interfaces, data structures, and locking strategies in the future.

I think you have a fine attitude here. There are and there will be a lot more users of the SW TCP than of TOEs and I think you should feel free to improve the former however you can. The TOE code still works with kernels going back to 2.4.22; tracking changes in mainline TCP hasn't been an issue so far. If you can give maintainers a heads up before changes you think may be disruptive I think that would be plenty on your part.
[-mm PATCH 05/32] net: fix-up schedule_timeout() usage
Description: Use schedule_timeout_{,un}interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size. Also use human-time conversion functions instead of hard-coded division to avoid rounding issues.

Signed-off-by: Nishanth Aravamudan [EMAIL PROTECTED]

---

 net/core/pktgen.c            |   13 +
 net/dccp/proto.c             |    2 +-
 net/ipv4/ipconfig.c          |    5 ++---
 net/irda/ircomm/ircomm_tty.c |    9 +++--
 net/sunrpc/svcsock.c         |    3 +--
 5 files changed, 12 insertions(+), 20 deletions(-)

--- 2.6.13-rc5-mm1/net/core/pktgen.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/core/pktgen.c	2005-08-14 13:32:59.0 -0700
@@ -1452,8 +1452,7 @@ static int proc_thread_write(struct file
 			thread_lock();
 			t->control |= T_REMDEV;
 			thread_unlock();
-			current->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(HZ/8);  /* Propagate thread->control  */
+			schedule_timeout_interruptible(msecs_to_jiffies(125));  /* Propagate thread->control  */
 			ret = count;
 			sprintf(pg_result, "OK: rem_device_all");
 			goto out;
@@ -1716,10 +1715,9 @@ static void spin(struct pktgen_dev *pkt_
 	printk(KERN_INFO "sleeping for %d\n", (int)(spin_until_us - now));
 	while (now < spin_until_us) {
 		/* TODO: optimise sleeping behavior */
-		if (spin_until_us - now > (1000000/HZ)+1) {
-			current->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(1);
-		} else if (spin_until_us - now > 100) {
+		if (spin_until_us - now > jiffies_to_usecs(1)+1)
+			schedule_timeout_interruptible(1);
+		else if (spin_until_us - now > 100) {
 			do_softirq();
 			if (!pkt_dev->running)
 				return;
@@ -2449,8 +2447,7 @@ static void pktgen_run_all_threads(void)
 	}
 	thread_unlock();
 
-	current->state = TASK_INTERRUPTIBLE;
-	schedule_timeout(HZ/8);  /* Propagate thread->control  */
+	schedule_timeout_interruptible(msecs_to_jiffies(125));  /* Propagate thread->control  */
 
 	pktgen_wait_all_threads_run();
 }
diff -urpN 2.6.13-rc5-mm1/net/dccp/proto.c 2.6.13-rc5-mm1-dev/net/dccp/proto.c
--- 2.6.13-rc5-mm1/net/dccp/proto.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/dccp/proto.c	2005-08-10 16:10:55.0 -0700
@@ -225,7 +225,7 @@ int dccp_sendmsg(struct kiocb *iocb, str
 			if (delay > timeo)
 				goto out_discard;
 			release_sock(sk);
-			delay = schedule_timeout(delay);
+			delay = schedule_timeout_interruptible(delay);
 			lock_sock(sk);
 			timeo -= delay;
 			if (signal_pending(current))
diff -urpN 2.6.13-rc5-mm1/net/ipv4/ipconfig.c 2.6.13-rc5-mm1-dev/net/ipv4/ipconfig.c
--- 2.6.13-rc5-mm1/net/ipv4/ipconfig.c	2005-08-07 10:05:22.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/ipv4/ipconfig.c	2005-08-10 15:26:57.0 -0700
@@ -1102,10 +1102,9 @@ static int __init ic_dynamic(void)
 #endif
 
 		jiff = jiffies + (d->next ? CONF_INTER_TIMEOUT : timeout);
-		while (time_before(jiffies, jiff) && !ic_got_reply) {
+		while (time_before(jiffies, jiff) && !ic_got_reply)
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+			schedule_timeout_uninterruptible(1);
 #ifdef IPCONFIG_DHCP
 		/* DHCP isn't done until we get a DHCPACK. */
 		if ((ic_got_reply & IC_BOOTP)
diff -urpN 2.6.13-rc5-mm1/net/irda/ircomm/ircomm_tty.c 2.6.13-rc5-mm1-dev/net/irda/ircomm/ircomm_tty.c
--- 2.6.13-rc5-mm1/net/irda/ircomm/ircomm_tty.c	2005-08-07 09:57:38.0 -0700
+++ 2.6.13-rc5-mm1-dev/net/irda/ircomm/ircomm_tty.c	2005-08-10 15:27:13.0 -0700
@@ -567,10 +567,8 @@ static void ircomm_tty_close(struct tty_
 	self->tty = NULL;
 
 	if (self->blocked_open) {
-		if (self->close_delay) {
-			current->state = TASK_INTERRUPTIBLE;
-			schedule_timeout(self->close_delay);
-		}
+		if (self->close_delay)
+			schedule_timeout_interruptible(self->close_delay);
 		wake_up_interruptible(&self->open_wait);
 	}
 
@@ -863,8 +861,7 @@ static void ircomm_tty_wait_until_sent(s
 	spin_lock_irqsave(&self->spinlock, flags);
 	while (self->tx_skb && self->tx_skb->len) {
 		spin_unlock_irqrestore(&self->spinlock, flags);
-		current->state = TASK_INTERRUPTIBLE;
-		schedule_timeout(poll_time);
+
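The "rounding issues" mentioned in the patch description can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `model_msecs_to_jiffies()` mimics the round-up behaviour of `msecs_to_jiffies()` only for tick rates that divide 1000 evenly, and the tick rate is passed as a parameter, whereas in the kernel `HZ` is a build-time constant. Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Simplified model of msecs_to_jiffies() for tick rates that divide
 * 1000 evenly: round the delay up to a whole number of ticks.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_tick = 1000 / hz;

	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* The hard-coded form the patch replaces: integer division truncates. */
static unsigned long model_hz_div_8(unsigned int hz)
{
	return hz / 8;
}
```

With a 1000 Hz tick both forms give 125 ticks. At 100 Hz, `HZ/8` truncates to 12 ticks (120 ms), while the conversion rounds 125 ms up to 13 ticks, so the sleep is never shorter than requested.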
[-mm PATCH 20/32] drivers/net: fix-up schedule_timeout() usage
Description: Use schedule_timeout_interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size.

Signed-off-by: Nishanth Aravamudan [EMAIL PROTECTED]

---

 drivers/net/8139cp.c                      |    3 -
 drivers/net/hp100.c                       |   48 ++
 drivers/net/irda/stir4200.c               |    7 +---
 drivers/net/ixgb/ixgb_ethtool.c           |    7 +---
 drivers/net/ns83820.c                     |    3 -
 drivers/net/tokenring/ibmtr.c             |    9 ++---
 drivers/net/tokenring/olympic.c           |    2 -
 drivers/net/tokenring/tms380tr.c          |    3 -
 drivers/net/typhoon.c                     |    7 +---
 drivers/net/wan/cosa.c                    |    6 +--
 drivers/net/wan/cycx_drv.c                |    3 -
 drivers/net/wan/dscc4.c                   |    9 +
 drivers/net/wan/farsync.c                 |    3 -
 drivers/net/wireless/ipw2100.c            |   17 +++---
 drivers/net/wireless/prism54/islpci_dev.c |    6 +--
 drivers/net/wireless/prism54/islpci_mgt.c |    5 +--
 include/linux/ibmtr.h                     |    4 +-
 include/linux/netdevice.h                 |    6 +--
 18 files changed, 54 insertions(+), 94 deletions(-)

diff -urpN 2.6.13-rc5-mm1/drivers/net/8139cp.c 2.6.13-rc5-mm1-dev/drivers/net/8139cp.c
--- 2.6.13-rc5-mm1/drivers/net/8139cp.c	2005-08-07 09:58:00.0 -0700
+++ 2.6.13-rc5-mm1-dev/drivers/net/8139cp.c	2005-08-08 15:54:06.0 -0700
@@ -1029,8 +1029,7 @@ static void cp_reset_hw (struct cp_priva
 		if (!(cpr8(Cmd) & CmdReset))
 			return;
 
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(10);
+		schedule_timeout_uninterruptible(10);
 	}
 
 	printk(KERN_ERR "%s: hardware reset timeout\n", cp->dev->name);
diff -urpN 2.6.13-rc5-mm1/drivers/net/hp100.c 2.6.13-rc5-mm1-dev/drivers/net/hp100.c
--- 2.6.13-rc5-mm1/drivers/net/hp100.c	2005-08-07 09:58:01.0 -0700
+++ 2.6.13-rc5-mm1-dev/drivers/net/hp100.c	2005-08-08 15:55:41.0 -0700
@@ -2517,10 +2517,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if (hp100_inb(VG_LAN_CFG_1) & HP100_LINK_CABLE_ST)
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	if (time_after_eq(jiffies, time))	/* no signal->no logout */
@@ -2536,10 +2534,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if (!(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 #ifdef HP100_DEBUG
@@ -2577,10 +2573,8 @@ static int hp100_down_vg_link(struct net
 		do {
 			if (!(hp100_inb(MAC_CFG_4) & HP100_MAC_SEL_ST))
 				break;
-			if (!in_interrupt()) {
-				set_current_state(TASK_INTERRUPTIBLE);
-				schedule_timeout(1);
-			}
+			if (!in_interrupt())
+				schedule_timeout_interruptible(1);
 		} while (time_after(time, jiffies));
 
 		hp100_orb(HP100_AUTO_MODE, MAC_CFG_3);	/* Autosel back on */
@@ -2591,10 +2585,8 @@ static int hp100_down_vg_link(struct net
 	do {
 		if ((hp100_inb(VG_LAN_CFG_1) & HP100_LINK_CABLE_ST) == 0)
 			break;
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	if (time_before_eq(time, jiffies)) {
@@ -2606,10 +2598,8 @@ static int hp100_down_vg_link(struct net
 
 	time = jiffies + (2 * HZ);	/* This seems to take a while */
 	do {
-		if (!in_interrupt()) {
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule_timeout(1);
-		}
+		if (!in_interrupt())
+			schedule_timeout_interruptible(1);
 	} while (time_after(time, jiffies));
 
 	return 0;
@@ -2659,10 +2649,8 @@ static int hp100_login_to_vg_hub(struct
 	do {
 		if (~(hp100_inb(VG_LAN_CFG_1) & HP100_LINK_UP_ST))
Re: [NETLINK 6/8]: Support dynamic number of multicast groups per netlink family
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 16:05:49 +0200

> It was intentional, but I agree the other way around would be more consistent. If you want to send a patch, go ahead, otherwise I'll put it on my cleanup-list. Not allowing user sockets for unregistered protocols allows a couple of other cleanups as well.

I would also like to recommend a few other potential cleanups:

1) inconsistent use of unsigned int vs. u32 for types used to hold the same data, for example the groups member of nl_table[] (which is unsigned int) vs. nlk->groups which is u32.

2) do_one_set_err() and do_one_broadcast() both make this identical test:

	if (nlk->pid == p->pid || p->group - 1 >= nlk->ngroups ||
	    !test_bit(p->group - 1, nlk->groups))
		goto out;

   so, consider making this into an inline function or similar.

But this is nit-picking, of course, your patches were very well done Patrick :)
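As a rough idea of what the second suggestion could look like, here is a userspace sketch that factors the shared test into one inline helper. The types and names (`struct demo_nlk`, `struct demo_req`, `demo_test_bit()`, `demo_skip_socket()`) are trimmed-down, hypothetical stand-ins, not the kernel's structures or its `test_bit()`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, trimmed-down stand-ins for the socket and the request. */
struct demo_nlk {
	unsigned int pid;
	unsigned int ngroups;
	const unsigned long *groups;	/* subscription bitmap */
};

struct demo_req {
	unsigned int pid;	/* sender's pid, skipped on delivery */
	unsigned int group;	/* 1-based group number */
};

static bool demo_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/*
 * True when this socket should be skipped: it is the sender itself,
 * the group is out of range, or the socket is not subscribed.
 * The range check runs first, so the bitmap is never read out of bounds.
 */
static inline bool demo_skip_socket(const struct demo_nlk *nlk,
				    const struct demo_req *p)
{
	return nlk->pid == p->pid ||
	       p->group - 1 >= nlk->ngroups ||
	       !demo_test_bit(p->group - 1, nlk->groups);
}
```

Both callers would then reduce to `if (demo_skip_socket(nlk, p)) goto out;`, and a group number of 0 is rejected for free because `p->group - 1` wraps around to a value larger than any `ngroups`.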
Re: udp source port randomization?
From: bert hubert [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 22:16:49 +0200

> Currently socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) delivers the exact same source port each time I run it, 32776. The second invocation, without closing the first socket, generates 32777.
>
> This strikes me as being somewhat insecure and not in the spirit of TCP source port randomization.

UDP does not have the same kind of vulnerability from port number guessing.

In fact, UDP is extremely vulnerable for connected sockets no matter what we do in the port allocation area. UDP does not have sequence numbers, so there is nothing protecting an attacker from injecting random crap into a UDP connection.

Another factor influencing this is the fact that most UDP usage is of the request/response type where the port identity only exists for those two packets.

I really don't think it's worth the work to add UDP port randomization at all.
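For anyone who wants to repeat the observation, a minimal sketch follows. It assumes the port was observed by letting the kernel pick one (binding to port 0) and reading it back with getsockname(); the original report does not say how the port was read, and `demo_udp_ephemeral_port()` is a made-up helper name. The value returned depends on the kernel version and its port allocation policy.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket, let the kernel choose the local port, and return
 * that port in host byte order. Returns -1 on any error.
 */
static int demo_udp_ephemeral_port(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int port = -1;
	int fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = 0;	/* 0 asks the kernel to pick a port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);

	close(fd);
	return port;
}
```

Calling the helper from several fresh processes and comparing the results shows whether the ports are handed out sequentially or not on a given system.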
Re: udp source port randomization?
> It does help 16 bits :-) Better than nothing.

16 bits is so poor that any secure algorithm using it would just give a false sense of security.

-Andi
Re: skb->pkt_type
From: Thomas Graf [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 16:43:57 +0200

> Dave, I found another problem in the earlier patch of mine: when we free the clone portion and the parent is still alive we used to set UNAVAIL in the else branch, but at this point the skb could have been gone already. I fixed this in this patch.

This patch hangs the machine on boot for me. Probably this is occurring, once again, on the first TCP usage which is the only spot which will use fclones in this patch.

I thought firstly that it might be due to the child's fclone field not being initialized at __alloc_skb() time. So I fixed that up like so:

+	if (fclone) {
+		struct sk_buff *n = skb + 1;
+
+		skb->fclone = SKB_FCLONE_ORIG;
+		n->fclone = SKB_FCLONE_UNAVAILABLE;
+	}

That is a real bug, because we do not explicitly initialize the child skb with a memset() here, only the parent SKB gets that.

But things are still busted somehow. Still looking.
Re: skb->pkt_type
Ok, this scheme doesn't work as-is.

We never run the __kfree_skb() actions on the parent SKB if the child drops the parent SKB users count to zero. This means we don't release the DST and other objects referenced in the parent SKB.

We never release the SKB memory in this case either.
Re: skb->pkt_type
From: David S. Miller [EMAIL PROTECTED]
Date: Mon, 15 Aug 2005 15:45:00 -0700 (PDT)

> Ok, this scheme doesn't work as-is.

FWIW the fclone_ref version works perfectly fine, and I'm running this right now. I'm including it below against current net-2.6.14 for reference.

So what do folks think we should do? I'm inclined to put this in first, as-is, then if we can get the skb->users variant functional we can add that in as a follow-on patch.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -162,6 +162,13 @@ struct skb_timeval {
 	u32	off_usec;
 };
 
+
+enum {
+	SKB_FCLONE_UNAVAILABLE,
+	SKB_FCLONE_ORIG,
+	SKB_FCLONE_CLONE,
+};
+
 /**
  *	struct sk_buff - socket buffer
  *	@next: Next buffer in list
@@ -255,8 +262,10 @@ struct sk_buff {
 				ip_summed:2,
 				nohdr:1,
 				nfctinfo:3;
-	__u8			pkt_type;
+	__u8			pkt_type:3,
+				fclone:2;
 	__u16			protocol;
+	atomic_t		fclone_ref;
 
 	void			(*destructor)(struct sk_buff *skb);
 #ifdef CONFIG_NETFILTER
@@ -295,8 +304,20 @@ struct sk_buff {
 #include <asm/system.h>
 
 extern void	       __kfree_skb(struct sk_buff *skb);
-extern struct sk_buff *alloc_skb(unsigned int size,
-				 unsigned int __nocast priority);
+extern struct sk_buff *__alloc_skb(unsigned int size,
+				   unsigned int __nocast priority, int fclone);
+static inline struct sk_buff *alloc_skb(unsigned int size,
+					unsigned int __nocast priority)
+{
+	return __alloc_skb(size, priority, 0);
+}
+
+static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
+					       unsigned int __nocast priority)
+{
+	return __alloc_skb(size, priority, 1);
+}
+
 extern struct sk_buff *alloc_skb_from_cache(kmem_cache_t *cp,
 					    unsigned int size,
 					    unsigned int __nocast priority);
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1195,7 +1195,7 @@ static inline struct sk_buff *sk_stream_
 	int hdr_len;
 
 	hdr_len = SKB_DATA_ALIGN(sk->sk_prot->max_header);
-	skb = alloc_skb(size + hdr_len, gfp);
+	skb = alloc_skb_fclone(size + hdr_len, gfp);
 	if (skb) {
 		skb->truesize += mem;
 		if (sk->sk_forward_alloc >= (int)skb->truesize ||
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <asm/system.h>
 
 static kmem_cache_t *skbuff_head_cache;
+static kmem_cache_t *skbuff_fclone_cache;
 
 struct timeval __read_mostly skb_tv_base;
 
@@ -120,7 +121,7 @@ void skb_under_panic(struct sk_buff *skb
  */
 
 /**
- *	alloc_skb	-	allocate a network buffer
+ *	__alloc_skb	-	allocate a network buffer
  *	@size: size to allocate
  *	@gfp_mask: allocation mask
  *
@@ -131,14 +132,20 @@ void skb_under_panic(struct sk_buff *skb
  *	Buffers may only be allocated from interrupts using a @gfp_mask of
  *	%GFP_ATOMIC.
  */
-struct sk_buff *alloc_skb(unsigned int size, unsigned int __nocast gfp_mask)
+struct sk_buff *__alloc_skb(unsigned int size, unsigned int __nocast gfp_mask,
+			    int fclone)
 {
 	struct sk_buff *skb;
 	u8 *data;
 
 	/* Get the HEAD */
-	skb = kmem_cache_alloc(skbuff_head_cache,
-			       gfp_mask & ~__GFP_DMA);
+	if (fclone)
+		skb = kmem_cache_alloc(skbuff_fclone_cache,
+				       gfp_mask & ~__GFP_DMA);
+	else
+		skb = kmem_cache_alloc(skbuff_head_cache,
+				       gfp_mask & ~__GFP_DMA);
+
 	if (!skb)
 		goto out;
 
@@ -155,7 +162,14 @@ struct sk_buff *alloc_skb(unsigned int s
 	skb->data = data;
 	skb->tail = data;
 	skb->end  = data + size;
+	if (fclone) {
+		struct sk_buff *child = skb + 1;
 
+		skb->fclone = SKB_FCLONE_ORIG;
+		atomic_set(&skb->fclone_ref, 1);
+
+		child->fclone = SKB_FCLONE_UNAVAILABLE;
+	}
 	atomic_set(&(skb_shinfo(skb)->dataref), 1);
 	skb_shinfo(skb)->nr_frags  = 0;
 	skb_shinfo(skb)->tso_size = 0;
@@ -268,8 +282,31 @@ void skb_release_data(struct sk_buff *sk
  */
 void kfree_skbmem(struct sk_buff *skb)
 {
+	struct sk_buff *other;
+
 	skb_release_data(skb);
-	kmem_cache_free(skbuff_head_cache, skb);
+	switch (skb->fclone) {
+	case SKB_FCLONE_UNAVAILABLE:
+		kmem_cache_free(skbuff_head_cache, skb);
+		break;
[PATCH] iproute2 support for inet_diag
Hi Stephen,

	Please consider applying. One thing I think we should address is to show the name of the protocol in listings where sockets of more than one protocol type are being displayed, but this patch is a good start, I guess.

Best Regards,

- Arnaldo

diff -uNrp iproute2-ss050808.orig/include/linux/inet_diag.h iproute2-ss050808.dccp/include/linux/inet_diag.h
--- iproute2-ss050808.orig/include/linux/inet_diag.h	1969-12-31 21:00:00.0 -0300
+++ iproute2-ss050808.dccp/include/linux/inet_diag.h	2005-08-15 21:47:38.0 -0300
@@ -0,0 +1,121 @@
+#ifndef _INET_DIAG_H_
+#define _INET_DIAG_H_ 1
+
+/* Just some random number */
+#define TCPDIAG_GETSOCK 18
+#define DCCPDIAG_GETSOCK 19
+
+#define INET_DIAG_GETSOCK_MAX 24
+
+/* Socket identity */
+struct inet_diag_sockid {
+	__u16	idiag_sport;
+	__u16	idiag_dport;
+	__u32	idiag_src[4];
+	__u32	idiag_dst[4];
+	__u32	idiag_if;
+	__u32	idiag_cookie[2];
+#define INET_DIAG_NOCOOKIE (~0U)
+};
+
+/* Request structure */
+
+struct inet_diag_req {
+	__u8	idiag_family;		/* Family of addresses. */
+	__u8	idiag_src_len;
+	__u8	idiag_dst_len;
+	__u8	idiag_ext;		/* Query extended information */
+
+	struct inet_diag_sockid id;
+
+	__u32	idiag_states;		/* States to dump */
+	__u32	idiag_dbs;		/* Tables to dump (NI) */
+};
+
+enum {
+	INET_DIAG_REQ_NONE,
+	INET_DIAG_REQ_BYTECODE,
+};
+
+#define INET_DIAG_REQ_MAX INET_DIAG_REQ_BYTECODE
+
+/* Bytecode is sequence of 4 byte commands followed by variable arguments.
+ * All the commands identified by "code" are conditional jumps forward:
+ * to offset cc+"yes" or to offset cc+"no". "yes" is supposed to be
+ * length of the command and its arguments.
+ */
+
+struct inet_diag_bc_op {
+	unsigned char	code;
+	unsigned char	yes;
+	unsigned short	no;
+};
+
+enum {
+	INET_DIAG_BC_NOP,
+	INET_DIAG_BC_JMP,
+	INET_DIAG_BC_S_GE,
+	INET_DIAG_BC_S_LE,
+	INET_DIAG_BC_D_GE,
+	INET_DIAG_BC_D_LE,
+	INET_DIAG_BC_AUTO,
+	INET_DIAG_BC_S_COND,
+	INET_DIAG_BC_D_COND,
+};
+
+struct inet_diag_hostcond {
+	__u8	family;
+	__u8	prefix_len;
+	int	port;
+	__u32	addr[0];
+};
+
+/* Base info structure. It contains socket identity (addrs/ports/cookie)
+ * and, alas, the information shown by netstat. */
+struct inet_diag_msg {
+	__u8	idiag_family;
+	__u8	idiag_state;
+	__u8	idiag_timer;
+	__u8	idiag_retrans;
+
+	struct inet_diag_sockid id;
+
+	__u32	idiag_expires;
+	__u32	idiag_rqueue;
+	__u32	idiag_wqueue;
+	__u32	idiag_uid;
+	__u32	idiag_inode;
+};
+
+/* Extensions */
+
+enum {
+	INET_DIAG_NONE,
+	INET_DIAG_MEMINFO,
+	INET_DIAG_INFO,
+	INET_DIAG_VEGASINFO,
+	INET_DIAG_CONG,
+};
+
+#define INET_DIAG_MAX INET_DIAG_CONG
+
+
+/* INET_DIAG_MEM */
+
+struct inet_diag_meminfo {
+	__u32	idiag_rmem;
+	__u32	idiag_wmem;
+	__u32	idiag_fmem;
+	__u32	idiag_tmem;
+};
+
+/* INET_DIAG_VEGASINFO */
+
+struct tcpvegas_info {
+	__u32	tcpv_enabled;
+	__u32	tcpv_rttcnt;
+	__u32	tcpv_rtt;
+	__u32	tcpv_minrtt;
+};
+
+#endif /* _INET_DIAG_H_ */
diff -uNrp iproute2-ss050808.orig/include/linux/netlink.h iproute2-ss050808.dccp/include/linux/netlink.h
--- iproute2-ss050808.orig/include/linux/netlink.h	2005-08-08 17:24:41.0 -0300
+++ iproute2-ss050808.dccp/include/linux/netlink.h	2005-08-15 21:35:34.0 -0300
@@ -8,19 +8,17 @@
 #define NETLINK_W1		1	/* 1-wire subsystem */
 #define NETLINK_USERSOCK	2	/* Reserved for user mode socket protocols */
 #define NETLINK_FIREWALL	3	/* Firewalling hook */
-#define NETLINK_TCPDIAG		4	/* TCP socket monitoring */
+#define NETLINK_INET_DIAG	4	/* INET socket monitoring */
 #define NETLINK_NFLOG		5	/* netfilter/iptables ULOG */
 #define NETLINK_XFRM		6	/* ipsec */
 #define NETLINK_SELINUX		7	/* SELinux event notifications */
-#define NETLINK_ARPD		8
+#define NETLINK_ISCSI		8	/* Open-iSCSI */
 #define NETLINK_AUDIT		9	/* auditing */
 #define NETLINK_FIB_LOOKUP	10
-#define NETLINK_ROUTE6		11	/* af_inet6 route comm channel */
 #define NETLINK_NETFILTER	12	/* netfilter subsystem */
 #define NETLINK_IP6_FW		13
 #define NETLINK_DNRTMSG		14	/* DECnet routing messages */
 #define NETLINK_KOBJECT_UEVENT	15	/* Kernel messages to userspace */
-#define NETLINK_TAPBASE		16	/* 16 to 31 are ethertap */
Re: [PATCH] iproute2 support for inet_diag
On Mon, Aug 15, 2005 at 09:51:54PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Stephen,
>
> 	Please consider applying. One thing I think we should address is to show the name of the protocol in listings where sockets of more than one protocol type are being displayed, but this patch is a good start, I guess.

Or even better, use /etc/protocols and accept --proto as long as the protocol is listed in /etc/protocols. For that we would need a special case in the kernel where 18 is mapped to TCP as well; there is no protocol mapped to 18 in my /etc/protocols right now, so this would be the only problem with this scheme, I guess.

This way we would not have to change iproute2 every time we add inet_diag support to some inet transport protocol (SCTP and UDP being the next potential ones).

- Arnaldo
Re: skb->pkt_type
On Mon, Aug 15, 2005 at 05:10:41PM -0700, David S. Miller wrote:
>
> So what do folks think we should do? I'm inclined to put this in first, as-is, then if we can get the skb->users variant functional we can add that in as a follow-on patch.

Fine by me.

I have a suggestion as to where fclone_ref should be though. I'd put it outside sk_buff. So when you allocate sk_buff * 2 for fast clones, make that sk_buff * 2 + atomic_t. Then you will only have to carry it around for the fast clones.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: skb->pkt_type
From: Herbert Xu [EMAIL PROTECTED]
Date: Tue, 16 Aug 2005 12:02:31 +1000

> I have a suggestion as to where fclone_ref should be though. I'd put it outside sk_buff. So when you allocate sk_buff * 2 for fast clones, make that sk_buff * 2 + atomic_t. Then you will only have to carry it around for the fast clones.

Excellent idea, I'll work on that change tonight or tomorrow sometime.
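The layout Herbert suggests (two buffers plus one counter in a single allocation) can be sketched in userspace C11. This is only a model of the idea and makes no claim about how the kernel change ended up looking: `struct demo_skb`, `struct demo_fclone_pair` and the `demo_*` functions are hypothetical, `calloc()` stands in for the slab cache, and `atomic_int` stands in for `atomic_t`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { DEMO_FCLONE_UNAVAILABLE, DEMO_FCLONE_ORIG, DEMO_FCLONE_CLONE };

/* Hypothetical stand-in for struct sk_buff; only the field we need. */
struct demo_skb {
	int fclone;	/* one of the DEMO_FCLONE_* states */
};

/* One allocation for the fast-clone case: parent, child, shared count. */
struct demo_fclone_pair {
	struct demo_skb parent;
	struct demo_skb child;
	atomic_int ref;		/* lives outside both buffers */
};

static struct demo_skb *demo_alloc_fclone(void)
{
	struct demo_fclone_pair *pair = calloc(1, sizeof(*pair));

	if (pair == NULL)
		return NULL;
	pair->parent.fclone = DEMO_FCLONE_ORIG;
	pair->child.fclone = DEMO_FCLONE_UNAVAILABLE;
	atomic_init(&pair->ref, 1);
	return &pair->parent;
}

/* The parent is the first member, so its address is the pair's address. */
static struct demo_fclone_pair *demo_pair_of(struct demo_skb *parent)
{
	return (struct demo_fclone_pair *)parent;
}

/* Hand out the preallocated child once; NULL if it is already in use. */
static struct demo_skb *demo_fast_clone(struct demo_skb *parent)
{
	struct demo_fclone_pair *pair = demo_pair_of(parent);

	if (parent->fclone != DEMO_FCLONE_ORIG ||
	    pair->child.fclone != DEMO_FCLONE_UNAVAILABLE)
		return NULL;
	pair->child.fclone = DEMO_FCLONE_CLONE;
	atomic_fetch_add(&pair->ref, 1);
	return &pair->child;
}

/* Drop one reference; free the whole pair on the last one. Returns 1 if freed. */
static int demo_put(struct demo_fclone_pair *pair)
{
	if (atomic_fetch_sub(&pair->ref, 1) == 1) {
		free(pair);
		return 1;
	}
	return 0;
}
```

Ordinary buffers allocated outside the pair carry no counter at all, which is the saving the suggestion is after.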
[PATCH 1/1][NET] Fix sparse warnings
Hi David,

	Please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/

	Please let me know about anything you may want reworked.

Best Regards,

- Arnaldo

tree 7f65f8f8a8cf5b2f66089c9c039f2b032d964bab
parent f38354751f9c96203164bf9fcf3ec9ee91ef07e5
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1124169482 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1124169482 -0300

[NET] Fix sparse warnings

Of this type, mostly:

CHECK   net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]

--

 include/linux/if_ether.h        |    2 +
 include/linux/if_frad.h         |    6 +++--
 include/linux/if_tr.h           |    4 +++
 include/linux/igmp.h            |    3 ++
 include/linux/net.h             |    7 ++
 include/linux/netdevice.h       |   10 +
 include/linux/netfilter_ipv6.h  |    4 +--
 include/linux/security.h        |    6 +++--
 include/linux/skbuff.h          |    2 +
 include/linux/socket.h          |    7 ++
 include/net/addrconf.h          |    6 +
 include/net/af_unix.h           |   15 +
 include/net/icmp.h              |    7 ++
 include/net/ip.h                |   23 +
 include/net/ip_fib.h            |    5
 include/net/ipv6.h              |   35 ++--
 include/net/p8022.h             |    2 +
 include/net/raw.h               |    7 +-
 include/net/route.h             |    2 +
 include/net/sock.h              |   12 +++
 include/net/tcp.h               |   12 +++
 include/net/udp.h               |    5
 init/main.c                     |    2 -
 kernel/sysctl.c                 |    4 ---
 net/802/p8023.c                 |    1
 net/802/sysctl_net_802.c        |    3 +-
 net/core/dev.c                  |    6 -
 net/core/sysctl_net_core.c      |    9 +---
 net/core/utils.c                |    2 +
 net/core/wireless.c             |    4 ---
 net/ethernet/eth.c              |    2 -
 net/ethernet/sysctl_net_ether.c |    1
 net/ipv4/af_inet.c              |   14 -
 net/ipv4/datagram.c             |    1
 net/ipv4/inetpeer.c             |    1
 net/ipv4/ip_sockglue.c          |    2 -
 net/ipv4/proc.c                 |    3 --
 net/ipv4/syncookies.c           |    2 -
 net/ipv4/sysctl_net_ipv4.c      |   43 ++--
 net/ipv4/tcp_input.c            |    2 -
 net/ipv4/tcp_ipv4.c             |    2 -
 net/ipv6/addrconf.c             |    4 +--
 net/ipv6/af_inet6.c             |   24 --
 net/ipv6/ipv6_sockglue.c        |    8 ---
 net/ipv6/route.c                |    6 +
 net/ipv6/sit.c                  |    2 -
 net/ipv6/sysctl_net_ipv6.c      |    3 --
 net/ipv6/tcp_ipv6.c             |    4 ---
 net/ipv6/udp.c                  |    2 -
 net/ipx/af_ipx.c                |    2 -
 net/socket.c                    |   11 --
 net/sysctl_net.c                |    8 ++-
 net/unix/af_unix.c              |    8 ---
 net/unix/sysctl_net_unix.c      |    2 -
 54 files changed, 208 insertions(+), 162 deletions(-)

--

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -110,6 +110,8 @@ static inline struct ethhdr *eth_hdr(con
 {
 	return (struct ethhdr *)skb->mac.raw;
 }
+
+extern struct ctl_table ether_table[];
 #endif
 
 #endif	/* _LINUX_IF_ETHER_H */
diff --git a/include/linux/if_frad.h b/include/linux/if_frad.h
--- a/include/linux/if_frad.h
+++ b/include/linux/if_frad.h
@@ -191,10 +191,12 @@ struct frad_local
 	int buffer;	/* current buffer for S508 firmware */
 };
 
-extern void dlci_ioctl_set(int (*hook)(unsigned int, void __user *));
-
 #endif /* __KERNEL__ */
 
 #endif /* CONFIG_DLCI || CONFIG_DLCI_MODULE */
 
+#ifdef __KERNEL__
+extern void dlci_ioctl_set(int (*hook)(unsigned int, void __user *));
+#endif
+
 #endif
diff --git a/include/linux/if_tr.h b/include/linux/if_tr.h
--- a/include/linux/if_tr.h
+++ b/include/linux/if_tr.h
@@ -43,12 +43,16 @@ struct trh_hdr {
 };
 
 #ifdef __KERNEL__
+#include <linux/config.h>
 #include <linux/skbuff.h>
 
 static inline struct trh_hdr *tr_hdr(const struct sk_buff *skb)
 {
 	return (struct trh_hdr *)skb->mac.raw;
 }
+#ifdef CONFIG_SYSCTL
+extern struct ctl_table tr_table[];
+#endif
 #endif
 
 /* This is an Token-Ring LLC structure */
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -129,6 +129,9 @@ struct
Re: [PATCH 1/1][NET] Fix sparse warnings
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Tue, 16 Aug 2005 02:24:14 -0300

> Please consider pulling from:
>
> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git/
>
> Please let me know about anything you may want reworked.

Looks good, pulled.
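For readers unfamiliar with this class of sparse warning ("symbol ... was not declared. Should it be static?"), the two usual fixes are sketched below in a single file. All names are hypothetical (`demo_netfilter_init` and friends are not the functions from the patch); in a real tree the two prototypes would sit in a shared header rather than at the top of the .c file.

```c
#include <assert.h>

/*
 * Fix 1: symbols used from other files get a prototype that both the
 * definition and the callers can see. In a real tree these two lines
 * would live in a header (a hypothetical demo_netfilter.h).
 */
int demo_netfilter_init(void);
void demo_netfilter_fini(void);

/* Fix 2: state and helpers used only in this file are marked static. */
static int demo_registered;

static int demo_register_hooks(void)
{
	demo_registered = 1;
	return 0;
}

int demo_netfilter_init(void)
{
	return demo_register_hooks();
}

void demo_netfilter_fini(void)
{
	demo_registered = 0;
}
```

Which fix applies depends only on whether the symbol is referenced outside its own file, which matches the mix of added `extern` declarations in headers and removed ad-hoc declarations in .c files that the diffstat above suggests.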