Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
> How is that argument special for kfree_skb? Both libc free and kfree
> ignore NULL arguments and do so for good reasons.

Well, with kfree there is actually a slight gain in that you are doing the check in one place. kfree_skb, on the other hand, is inlined, so you're actually adding bloat to many places that simply don't need it.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum
From: Wei Yongjun [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 16:03:18 -0500

> IPv4 UDP does not discard the datagram with invalid checksum. UDP can
> validate UDP checksums correctly only when socket filtering instructions
> is set. If socket filtering instructions is not set, datagram with
> invalid checksum will be passed to the application.

We check the checksum later, in parallel with the copy of the packet data into userspace. See udp_recvmsg(), where we do this:

	if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else if (msg->msg_flags&MSG_TRUNC) {
		if (__udp_checksum_complete(skb))
			goto csum_copy_err;
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else {
		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);

		if (err == -EINVAL)
			goto csum_copy_err;
	}
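
The idea of verifying a checksum in the same pass that copies the data can be sketched in plain userspace C. This is only an illustration, not the kernel's skb_copy_and_csum_datagram_iovec(): the function name `copy_and_verify` and the `seed` parameter (standing in for a precomputed pseudo-header sum) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes from src to dst while accumulating
 * the 16-bit one's-complement sum of the data in the same loop.
 * Returns 0 if the folded sum is 0xffff (checksum valid), -1 otherwise.
 * On failure dst has already been written, so the caller must discard it.
 */
int copy_and_verify(uint8_t *dst, const uint8_t *src, size_t len, uint32_t seed)
{
	uint32_t acc = seed;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		acc += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {
		/* odd trailing byte is padded with zero on the right */
		dst[i] = src[i];
		acc += (uint32_t)src[i] << 8;
	}
	/* fold the carries back into the low 16 bits */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);

	return acc == 0xffff ? 0 : -1;
}
```

The trade-off is the one visible in the quoted kernel code: the data is touched only once, but a bad checksum is discovered after bytes have already been copied, so the error path has to throw the datagram away.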
Re: [PATCH] prism54usb: compile fix
On Mon, 20 Feb 2006 20:39:16 +0100, Carlos Martin [EMAIL PROTECTED] wrote:
> diff --git a/drivers/net/wireless/prism54usb/isl_sm.h b/drivers/net/wireless/prism54usb/isl_sm.h
> index 9e41587..c39bb48 100644
> --- a/drivers/net/wireless/prism54usb/isl_sm.h
> +++ b/drivers/net/wireless/prism54usb/isl_sm.h
> @@ -249,7 +249,7 @@ extern int islsm_wait_timeo
>  /* now the helper functions, for sending packets */
>  int islsm_outofband_msg(struct net_device *netdev,
> -			void *buf, unsigned int size);
> +			void *buf, size_t size);

I have it in my tree already. Something is inconsistent somewhere. Weird.

-- Pete
Re: [RFC] Some infrastructure for interrupt-less TX
On Thu, Feb 23, 2006 at 08:00:32AM +0100, Jörn Engel wrote:
> > I am assuming the real goal is avoiding interrupts when transmit
> > completions can be reported without them on a reasonably periodic
> > basis.
>
> Not necessarily on a periodic basis. For some network driver I once
> worked on, the hardware simply had a ring buffer of n frames. Whenever
> a n+1th frame was transmitted, the first would be checked for
> completion. If it was completed, it was freed, else the new frame was
> dropped (and freed).

This breaks socket buffer accounting.
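
The reclaim-on-transmit scheme quoted above can be modelled in a few lines of userspace C. This is a sketch under assumptions: `RING_SIZE`, `struct frame`, `struct tx_ring` and `tx_ring_xmit` are hypothetical names, and the `completed` flag stands in for whatever status the hardware would report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4	/* hypothetical "n" from the description */

struct frame {
	bool completed;	/* would be set by the hardware once the frame is sent */
};

struct tx_ring {
	struct frame *slot[RING_SIZE];
	size_t head;	/* next slot to fill; once the ring is full, the oldest frame */
	size_t freed;	/* frames reclaimed on a later transmit */
	size_t dropped;	/* new frames dropped because the oldest was still pending */
};

/* Queue f for transmission. Returns true if queued, false if dropped (and freed). */
bool tx_ring_xmit(struct tx_ring *r, struct frame *f)
{
	struct frame *old = r->slot[r->head];

	if (old) {
		if (!old->completed) {
			/* oldest frame still in flight: drop the new one */
			free(f);
			r->dropped++;
			return false;
		}
		/* oldest frame is done: reclaim it only now, without an interrupt */
		free(old);
		r->freed++;
	}
	r->slot[r->head] = f;
	r->head = (r->head + 1) % RING_SIZE;
	return true;
}
```

The model also hints at the objection in the reply: a completed frame is only freed when some later transmit happens to reach its slot, so whatever the frame was charged against (in the kernel, presumably the sending socket's buffer accounting) stays charged for an unbounded time on an idle link.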
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, 23 February 2006 19:28:49 +1100, Herbert Xu wrote:
> On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
> > How is that argument special for kfree_skb? Both libc free and kfree
> > ignore NULL arguments and do so for good reasons.
>
> Well with kfree there is actually a slight gain in that you are doing
> the check in one place. kfree_skb on the other hand is inlined so
> you're actually adding bloat to many places that simply don't need it.

Wrt. the binary, you have a point. For source code, my patch does not add any new bloat and allows removal of the existing.

Lemme do a quick measurement for the kernel I run on my machine:

-rwxr-xr-x 1 joern src 4836592 Feb 23 10:43 vmlinux
-rwxr-xr-x 1 joern src 4836727 Feb 23 10:19 vmlinux.kfree_null

135 bytes added by my patch. Not that much.

Jörn

--
He who knows others is wise. He who knows himself is enlightened.
-- Lao Tsu
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 21:55:43 +1100

> Now there's a good idea. After all, the great majority of callers of
> kfree_skb expect to free the skb.
>
> Dave, what do you think?

Absolutely.
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, 23 February 2006 03:11:12 -0800, David S. Miller wrote:
> > Now there's a good idea. After all, the great majority of callers of
> > kfree_skb expect to free the skb.
> >
> > Dave, what do you think?
>
> Absolutely.

Should I merge the two patches into one and resend?

Jörn

--
If you're willing to restrict the flexibility of your approach, you can almost always do something better.
-- John Carmack
Re: [PATCH] Allow kfree_skb to be called with a NULL argument
On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:
> Should I merge the two patches into one and resend?

Sounds good.

--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] Uninline kfree_skb and allow NULL argument
On Thu, 23 February 2006 22:26:01 +1100, Herbert Xu wrote:
> On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:
> > Should I merge the two patches into one and resend?
>
> Sounds good.

Here it is.

Jörn

--
Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
-- Rob Pike

o Uninline kfree_skb, which saves some 15k of object code on my notebook.
o Allow kfree_skb to be called with a NULL argument. Subsequent patches can remove conditional from drivers and further reduce source and object size.

Signed-off-by: Jörn Engel [EMAIL PROTECTED]
---

 include/linux/skbuff.h | 17 +
 net/core/skbuff.c      | 18 ++
 2 files changed, 19 insertions(+), 16 deletions(-)

--- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null	2006-02-23 13:35:05.0 +0100
+++ kfree_skb/include/linux/skbuff.h	2006-02-23 13:36:23.0 +0100
@@ -306,6 +306,7 @@ struct sk_buff {
 #include <asm/system.h>
 
+void kfree_skb(struct sk_buff *skb);
 extern void __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
 				   gfp_t priority, int fclone);
@@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st
  */
 
 /**
- *	kfree_skb - free an sk_buff
- *	@skb: buffer to free
- *
- *	Drop a reference to the buffer and free it if the usage count has
- *	hit zero.
- */
-static inline void kfree_skb(struct sk_buff *skb)
-{
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
-		return;
-	__kfree_skb(skb);
-}
-
-/**
  *	skb_cloned - is the buffer a clone
  *	@skb: buffer to check
  *
--- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null	2006-02-23 13:35:05.0 +0100
+++ kfree_skb/net/core/skbuff.c	2006-02-23 13:37:01.0 +0100
@@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb)
 }
 
 /**
+ *	kfree_skb - free an sk_buff
+ *	@skb: buffer to free
+ *
+ *	Drop a reference to the buffer and free it if the usage count has
+ *	hit zero.
+ */
+void kfree_skb(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	__kfree_skb(skb);
+}
+
+/**
  *	skb_clone - duplicate an sk_buff
  *	@skb: buffer to clone
  *	@gfp_mask: allocation priority
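
For readers without a kernel tree at hand, the "drop a reference, free on the last one, ignore NULL" pattern of the patch can be approximated in userspace with C11 atomics. This is a sketch, not the kernel function: `struct buf`, `buf_alloc`, `buf_put` and the `buf_destroyed` counter are hypothetical, and the kernel's fast path (reading the count and skipping the atomic decrement when it is already 1) is deliberately left out.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct buf {
	atomic_int users;
};

/* Counts destroyed buffers so the behaviour can be observed from outside. */
static int buf_destroyed;

struct buf *buf_alloc(int users)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		atomic_init(&b->users, users);
	return b;
}

/* Drop one reference; free on the last one. NULL is silently ignored. */
void buf_put(struct buf *b)
{
	if (!b)
		return;
	/* atomic_fetch_sub returns the value held before the decrement */
	if (atomic_fetch_sub(&b->users, 1) != 1)
		return;
	buf_destroyed++;
	free(b);
}
```

With the NULL check inside the function, callers can write `buf_put(p);` unconditionally instead of `if (p) buf_put(p);`, which is the source-size argument made in the thread. Because the function is out of line, the check exists once in the binary rather than at every call site.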
Re: Problem with Ipsec transport mode over NAT
Patrick McHardy wrote:
> Chinh Nguyen wrote:
> > I discovered that the bug is in the function tcp_v4_rcv for kernel
> > 2.6.16-rc1.
> >
> > After the ESP packet is decapped and decrypted in
> > xfrm4_rcv_encap_finish, the unencrypted packet is pushed back through
> > ip_local_deliver. For a UDP packet, it goes (back) to function
> > udp_queue_rcv_skb. The first thing this function does is call
> > xfrm4_policy_check. As noted previously, in xfrm4_policy_check, if
> > skb->sp != NULL, the esp_post_input function is called. The post input
> > function sets skb->ip_summed to CHECKSUM_UNNECESSARY if we are in
> > transport mode. Therefore, further down in udp_queue_rcv_skb, we skip
> > the checksum check and the packet is passed up the stack.
> >
> > However, for a decrypted TCP packet, the packet goes to tcp_v4_rcv.
> > This function does the checksum check right away if skb->ip_summed !=
> > CHECKSUM_UNNECESSARY while xfrm4_policy_check is called a little later
> > in the function. Therefore, the esp post input has not yet set the
> > ip_summed to unnecessary. The decrypted packet fails the checksum and
> > is discarded.
> >
> > To confirm this, I added another call to xfrm4_policy_check before the
> > checksum check in tcp_v4_rcv (to call esp post input). Once patched, my
> > systems were able to initiate TCP connections using Transport Mode/NAT.
>
> What values does skb->ip_summed have before that?

The skb->ip_summed value before the checksum check in tcp_v4_rcv is CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because the checksum is with regards to the private IP but the NAT device has modified the source IP.

I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input (net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap (net/ipv4/xfrm4_input.c:101).
Re: RFC: fix first packet goes out with MAC 00:00:00:00:00:00
On Thu, 2006-23-02 at 17:41 +0300, Alexey Kuznetsov wrote:
> After some thinking I suspect the deletion of this chunk could change
> behaviour of some parts which do not use neighbour cache f.e. packet
> socket.

Thanks Alexey, this was what i was worried about ;-

> I think safer approach would be to move this chunk after if (daddr).
> And the possibility to remove this completely could be analyzed later.

Ok, patch attached. Dave this also is needed for 2.6.16-rcXX. Tested against a standard eth device (e1000) and tuntap.

cheers,
jamal

For ethernet-like netdevices, don't overwrite first packet's dst MAC address when it is already resolved

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 9890fd9..c971f14 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -95,6 +95,12 @@ int eth_header(struct sk_buff *skb, stru
 	saddr = dev->dev_addr;
 	memcpy(eth->h_source,saddr,dev->addr_len);
 
+	if(daddr)
+	{
+		memcpy(eth->h_dest,daddr,dev->addr_len);
+		return ETH_HLEN;
+	}
+
 	/*
 	 *	Anyway, the loopback-device should never use this function...
 	 */
@@ -105,12 +111,6 @@ int eth_header(struct sk_buff *skb, stru
 		return ETH_HLEN;
 	}
 
-	if(daddr)
-	{
-		memcpy(eth->h_dest,daddr,dev->addr_len);
-		return ETH_HLEN;
-	}
-
 	return -ETH_HLEN;
 }
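
The effect of the reordering is easier to see in a stripped-down model. The sketch below is not eth_header(): `struct hdr`, `FLAG_NOARP`, `build_header` and the constants are hypothetical stand-ins. The assumption is that the code between the two hunks zeroes the destination address for devices that do no address resolution. It shows only the order of the checks after the patch: a known destination address wins, and the all-zero fallback is reached only when no destination was supplied.

```c
#include <stdint.h>
#include <string.h>

#define ALEN 6			/* hardware address length in this model */
#define HLEN 14			/* header length in this model */
#define FLAG_NOARP 0x1		/* hypothetical: device does no address resolution */

struct hdr {
	uint8_t dest[ALEN];
	uint8_t source[ALEN];
	uint16_t proto;
};

/*
 * Returns HLEN when the header is complete, -HLEN when the destination
 * still has to be resolved by the caller.
 */
int build_header(struct hdr *h, unsigned flags,
		 const uint8_t *saddr, const uint8_t *daddr)
{
	memcpy(h->source, saddr, ALEN);

	/* checked first: an already resolved destination is never overwritten */
	if (daddr) {
		memcpy(h->dest, daddr, ALEN);
		return HLEN;
	}

	/* only without a destination does the all-zero fallback apply */
	if (flags & FLAG_NOARP) {
		memset(h->dest, 0, ALEN);
		return HLEN;
	}

	return -HLEN;
}
```

With the two checks in the opposite order, a device carrying the no-resolution flag would get a zeroed destination even when the caller passed a valid one, which matches the symptom in the subject line.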
ipw2200 tester needed
In reviewing the ieee80211 stack in order to add additional geographic support for wireless drivers, I have studied all the in-kernel wireless drivers for their interactions with the routines in ieee80211_geo.c. As clearly stated in the comments, ipw2200.c duplicates most of those routines, even though ieee80211 is required to use ipw2200. Obviously, this bloats both the source code and the binaries for any user of ipw2200.

I am planning to develop a patch to have ipw2200 use the ieee80211 code; however, I do not have the necessary hardware to test the result. Is anyone interested in testing this patch for me? Are there any comments regarding this change?

Thanks,
Larry
Re: Problem with Ipsec transport mode over NAT
Chinh Nguyen wrote:
> Patrick McHardy wrote:
> > What values does skb->ip_summed have before that?
>
> the skb->ip_summed value before the checksum check in tcp_v4_rcv is
> CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect
> because the checksum is with regards to the private IP but the NAT
> device has modified the source IP.

Netfilter recalculates the checksum when NATing it.

> I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
> (net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
> (net/ipv4/xfrm4_input.c:101).

The question is why the checksum is invalid. Please start by describing what you're trying to do.
[PATCH 2.6.16-rc4] e1000: revert to single descriptor for legacy receive path
A recent patch attempted to enable more efficient memory usage by using only 2kB descriptors for jumbo frames. The method used to implement this has since been commented upon as illegal and in recent kernels even causes a BUG when receiving ip fragments while using jumbo frames. This patch simply goes back to the way things were. We expect some complaints to reoccur due to order 3 allocations failing due to this change.

Signed-off-by: Jesse Brandeburg [EMAIL PROTECTED]
---

 drivers/net/e1000/e1000.h      |    3 -
 drivers/net/e1000/e1000_main.c |  117 +++-
 2 files changed, 45 insertions(+), 75 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index 27c7730..99baf0e 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -225,9 +225,6 @@ struct e1000_rx_ring {
 	struct e1000_ps_page *ps_page;
 	struct e1000_ps_page_dma *ps_page_dma;
 
-	struct sk_buff *rx_skb_top;
-	struct sk_buff *rx_skb_prev;
-
 	/* cpu for rx queue */
 	int cpu;
 
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 31e3329..5b7d0f4 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -103,7 +103,7 @@ static char e1000_driver_string[] = "Int
 #else
 #define DRIVERNAPI "-NAPI"
 #endif
-#define DRV_VERSION "6.3.9-k2"DRIVERNAPI
+#define DRV_VERSION "6.3.9-k4"DRIVERNAPI
 char e1000_driver_version[] = DRV_VERSION;
 static char e1000_copyright[] = "Copyright (c) 1999-2005 Intel Corporation.";
 
@@ -1635,8 +1635,6 @@ setup_rx_desc_die:
 
 	rxdr->next_to_clean = 0;
 	rxdr->next_to_use = 0;
-	rxdr->rx_skb_top = NULL;
-	rxdr->rx_skb_prev = NULL;
 
 	return 0;
 }
@@ -1713,8 +1711,23 @@ e1000_setup_rctl(struct e1000_adapter *a
 		rctl |= adapter->rx_buffer_len << 0x11;
 	} else {
 		rctl &= ~E1000_RCTL_SZ_4096;
-		rctl &= ~E1000_RCTL_BSEX;
-		rctl |= E1000_RCTL_SZ_2048;
+		rctl |= E1000_RCTL_BSEX;
+		switch (adapter->rx_buffer_len) {
+		case E1000_RXBUFFER_2048:
+		default:
+			rctl |= E1000_RCTL_SZ_2048;
+			rctl &= ~E1000_RCTL_BSEX;
+			break;
+		case E1000_RXBUFFER_4096:
+			rctl |= E1000_RCTL_SZ_4096;
+			break;
+		case E1000_RXBUFFER_8192:
+			rctl |= E1000_RCTL_SZ_8192;
+			break;
+		case E1000_RXBUFFER_16384:
+			rctl |= E1000_RCTL_SZ_16384;
+			break;
+		}
 	}
 
 #ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
@@ -2107,16 +2120,6 @@ e1000_clean_rx_ring(struct e1000_adapter
 		}
 	}
 
-	/* there also may be some cached data in our adapter */
-	if (rx_ring->rx_skb_top) {
-		dev_kfree_skb(rx_ring->rx_skb_top);
-
-		/* rx_skb_prev will be wiped out by rx_skb_top */
-		rx_ring->rx_skb_top = NULL;
-		rx_ring->rx_skb_prev = NULL;
-	}
-
-
 	size = sizeof(struct e1000_buffer) * rx_ring->count;
 	memset(rx_ring->buffer_info, 0, size);
 	size = sizeof(struct e1000_ps_page) * rx_ring->count;
@@ -3106,24 +3109,27 @@ e1000_change_mtu(struct net_device *netd
 		break;
 	}
 
-	/* since the driver code now supports splitting a packet across
-	 * multiple descriptors, most of the fifo related limitations on
-	 * jumbo frame traffic have gone away.
-	 * simply use 2k descriptors for everything.
-	 *
-	 * NOTE: dev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
-	 * means we reserve 2 more, this pushes us to allocate from the next
-	 * larger slab size
-	 * i.e. RXBUFFER_2048 --> size-4096 slab */
-
 	/* recent hardware supports 1KB granularity */
 	if (adapter->hw.mac_type > e1000_82547_rev_2) {
-		adapter->rx_buffer_len =
-		    ((max_frame < E1000_RXBUFFER_2048) ?
-		        max_frame : E1000_RXBUFFER_2048);
+		adapter->rx_buffer_len = max_frame;
 		E1000_ROUNDUP(adapter->rx_buffer_len, 1024);
-	} else
-		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+	} else {
+		if(unlikely((adapter->hw.mac_type < e1000_82543) &&
+		   (max_frame > MAXIMUM_ETHERNET_FRAME_SIZE))) {
+			DPRINTK(PROBE, ERR, "Jumbo Frames not supported "
+					    "on 82542\n");
+			return -EINVAL;
+		} else {
+			if(max_frame <= E1000_RXBUFFER_2048)
+				adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+			else if(max_frame <= E1000_RXBUFFER_4096)
+				adapter->rx_buffer_len = E1000_RXBUFFER_4096;
+			else
Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call
Begin forwarded message:

Date: Thu, 23 Feb 2006 07:26:28 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

http://bugzilla.kernel.org/show_bug.cgi?id=6121

Summary: TCP_DEFER_ACCEPT is reset on listen() call
Kernel Version: 2.6.14, 2.6.15
Status: NEW
Severity: normal
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]

Most recent kernel where this bug did not occur: 2.6.13
Distribution:
Hardware Environment:
Software Environment:

Problem Description:
Value of TCP_DEFER_ACCEPT socket option is reset to zero when listen() is called.

Steps to reproduce:
Following program shows the problem:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

main()
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int val = 1;
	int len = sizeof(val);
	setsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, len);
	listen(s, 1);
	getsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, &len);
	printf("get TCP_DEFER_ACCEPT = %d\n", val);
}

On <=2.6.13 output is get TCP_DEFER_ACCEPT = 3;
On >=2.6.14 output is get TCP_DEFER_ACCEPT = 0.

Starting from 2.6.14, defer_accept is moved to request_sock_queue structure, which is re-initialized in inet_csk_listen_start().

--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.
Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call
On 2/23/06, Andrew Morton [EMAIL PROTECTED] wrote:
> Starting from 2.6.14, defer_accept is moved to request_sock_queue
> structure, which is re-initialized in inet_csk_listen_start().

Oops, looking into it...

- Arnaldo
Re: [PATCH 02/02] add mask options to fwmark masking code
>>>>> "Patrick" == Patrick McHardy [EMAIL PROTECTED] writes:

    >> #define RTA_FWMARK RTA_PROTOINFO
    >> +#define RTA_FWMARK_MASK RTA_CACHEINFO

    Patrick> Please introduce a new attribute for this instead of
    Patrick> overloading RTA_CACHEINFO.

I would be happy to do that. Should I also un-overload FWMARK, with backwards compatibility?

    >> diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
    >> index de327b3..69eed89 100644
    >> --- a/net/ipv4/fib_rules.c
    >> +++ b/net/ipv4/fib_rules.c
    >> @@ -68,6 +68,7 @@ struct fib_rule
    >>  	u8 r_tos;
    >>  #ifdef CONFIG_IP_ROUTE_FWMARK
    >>  	u32 r_fwmark;
    >> +	u32 r_fwmark_mask;

    Patrick> Both patches have whitespace issues. You should also change

uhm. okay. I'm surprised, since I produced it with git-format-patch. Maybe there are tabs that emacs screwed up.

--
] ON HUMILITY: to err is human. To moo, bovine.             | firewalls [
] Michael Richardson, Xelerance Corporation, Ottawa, ON     |net architect[
] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
Re: [PATCH] iproute2 -- add fwmarkmask
>>>>> "Patrick" == Patrick McHardy [EMAIL PROTECTED] writes:

    Patrick> The normal way to display masks is with a /. Also I think
    Patrick> it shouldn't display the default mask to avoid breaking
    Patrick> scripts that parse the output.

I generally dislike the /VALUE, since I expect /PREFIX-LEN. I agree that it shouldn't show if it is default.

    Patrick> ip should be able to parse its own output, and it would
    Patrick> also look nicer if I could just say fwmark
    Patrick> 0x1/32. fwmarkmask is really an incredible ugly expression
    Patrick> :)

Sure. Is that a 32-bit long mask (0xfff), or is it a 0x0020? fwmark is not an address.

Or would you like /32 to be a prefix-based mask, and value and/or fwmarkmask to be a value?

--
] ON HUMILITY: to err is human. To moo, bovine.             | firewalls [
] Michael Richardson, Xelerance Corporation, Ottawa, ON     |net architect[
] [EMAIL PROTECTED] http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [
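
The ambiguity raised here (is the number after the slash a prefix length or a literal mask?) can be made concrete with a small sketch. Both helper names are hypothetical, and the matching rule `(packet mark & mask) == (rule mark & mask)` is an assumption about how a masked fwmark comparison would work, not something stated in the thread.

```c
#include <stdint.h>

/* Interpretation 1: "/len" is a prefix length counted from the top bit. */
uint32_t mask_from_prefix_len(unsigned len)
{
	if (len >= 32)
		return 0xffffffffu;
	/* len == 0 yields an empty mask, len == 8 yields 0xff000000 */
	return ~(0xffffffffu >> len);
}

/*
 * Assumed matching rule for a masked mark comparison: only the bits
 * selected by mask take part in the comparison.
 */
int mark_matches(uint32_t pkt_mark, uint32_t rule_mark, uint32_t mask)
{
	return (pkt_mark & mask) == (rule_mark & mask);
}
```

Under the prefix-length reading, `0x1/32` compares all 32 bits. Under the literal-value reading, the same text selects only bit 5 (0x20), so a rule mark of 0x1 is reduced to 0 and matches any packet whose bit 5 is clear. The two readings give different rules for the same input, which is the point of the question.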
Re: Problem with Ipsec transport mode over NAT
Patrick McHardy wrote:
> Chinh Nguyen wrote:
> > Patrick McHardy wrote:
> > > What values does skb->ip_summed have before that?
> >
> > the skb->ip_summed value before the checksum check in tcp_v4_rcv is
> > CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect
> > because the checksum is with regards to the private IP but the NAT
> > device has modified the source IP.
>
> Netfilter recalculates the checksum when NATing it.

The NATing is not done by netfilter but by the NAT device between the IPsec peers.

> > I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
> > (net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
> > (net/ipv4/xfrm4_input.c:101).
>
> The question is why the checksum is invalid. Please start by
> describing what you're trying to do.

[Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]

C negotiates an IPsec Transport Mode with S. Because of Transport Mode/NAT-T, 2 things happen to an IPsec packet.

1. It is UDP-encapsulated, typically on port 4500/udp.
2. Transport Mode traffic leaves the original IP header alone whereas tunnel mode wraps the entire traffic in a second IP header.

As such, when the packet passes through the NAT device, the source IP is N. However, the original unencrypted packet had source IP C. S rips off the UDP-encap header, decrypts the payload, and joins the content back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP checksum is now incorrect because the source IP is now N not C. (In tunnel mode, we would ignore the NAT-ted outer IP header because the decrypted content has an entire IP header + UDP/TCP etc)

This is a well-known problem with transport mode/NAT. One solution is to use NAT-OA and NAT-OR to recalculate the checksum. The linux kernel does the simpler thing of ignoring the UDP/TCP checksum altogether in this particular case:

function esp_post_input (net/ipv4/esp4.c)

290 /*
291  * 2) ignore UDP/TCP checksums in case
292  *    of NAT-T in Transport Mode, or
293  *    perform other post-processing fixes
294  *    as per * draft-ietf-ipsec-udp-encaps-06,
295  *    section 3.1.2
296  */
297 if (!x->props.mode)
298 	skb->ip_summed = CHECKSUM_UNNECESSARY;
299
300 break;

As noted, esp_post_input is called in xfrm4_policy_check.

Decrypted UDP traffic through transport mode/nat also has bad checksums. However, since it is passed through udp_queue_rcv_skb after decryption, and this function calls xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel ignores the bad checksum.

Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the TCP checksum before calling xfrm4_policy_check, the bad checksum means the TCP packet is dropped as a bad segment.

The end result is that UDP and other traffic (eg, ICMP) can pass through transport mode/nat but not TCP.

I don't know what the correct fix is. Adding an extra call to xfrm4_policy_check in tcp_v4_rcv before the checksum check fixes this problem and doesn't seem to break anything else. On the other hand, moving some of the code in esp_post_input into esp_input (especially line 298) will work, too.
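
Why a rewritten source address invalidates the transport checksum can be shown with a small standalone calculation. The sketch below computes the Internet checksum over an IPv4 pseudo-header plus segment, the way UDP and TCP define it. The function name `l4_checksum` and the layout of the sample segment are illustrative assumptions, and addresses are passed as host-order 32-bit values for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum (big-endian 16-bit words). */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
	while (len > 1) {
		acc += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		acc += (uint32_t)p[0] << 8;	/* odd byte, zero padded */
	return acc;
}

/*
 * Checksum over the IPv4 pseudo-header (source, destination, protocol,
 * length) followed by the transport segment. With the checksum field
 * zeroed this yields the value to store; with the field filled in, a
 * result of 0 means the segment verifies. len is assumed to fit in 16 bits.
 */
uint16_t l4_checksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
		     const uint8_t *seg, size_t len)
{
	uint32_t acc = 0;

	acc += (saddr >> 16) + (saddr & 0xffff);
	acc += (daddr >> 16) + (daddr & 0xffff);
	acc += proto;
	acc += (uint32_t)len;
	acc = sum16(seg, len, acc);

	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

A segment whose checksum was computed with source address C verifies to 0 when checked against C, but generally not when checked against the NAT address N. Since the NAT device cannot see inside the ESP payload to fix the checksum up, the receiver in transport mode is left with exactly the mismatch described above.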
Re: [PATCH] Uninline kfree_skb and allow NULL argument
From: Jörn Engel [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 13:52:59 +0100

> +void kfree_skb(struct sk_buff *skb);
>  extern void __kfree_skb(struct sk_buff *skb);

If you wish to contribute to a software project, you should adhere to the coding style and conventions of that project when submitting changes. It doesn't matter what the reasons are for those conventions, you should follow them until the project decides to change them.

If you wish to discuss the merits of putting extern there or not in function declarations, you can start a thread about that and make proposals on linux-kernel. Patch submissions are not the place to do that.

So please add extern here, thanks a lot.
Re: tg3 losing promisc rx_mode bit
On Thu, 2006-02-23 at 14:31 -0800, Jim Westfall wrote:
> I am seeing the following issue on only the first onboard nic on each
> of the servers. If the nic is put into promisc mode too soon after the
> nic is brought up, the promisc bit in the rx_mode register is somehow
> getting reset to 0;

This is a known problem caused by ASF or IPMI firmware overwriting the promiscuous mode bit. I will have someone contact you to get the firmware upgraded. Thanks.
Re: tg3 losing promisc rx_mode bit
On 2/24/06, Michael Chan [EMAIL PROTECTED] wrote:
> This is a known problem caused by ASF or IPMI firmware overwriting the
> promiscuous mode bit. I will have someone contact you to get the
> firmware upgraded. Thanks.

Thinking out loud here without reading source... - can you check the version of the firmware and make noise if they have a version like this one?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand
Re: [Patch 1/6] IPSEC: core updates
From: jamal [EMAIL PROTECTED]
Date: Tue, 21 Feb 2006 08:31:49 -0500

> Ok. Patch attached against net-2617
>
> Yoshfuji-san you should probably write a little doc that should be
> available in the Doc/ directory.

If we write this, please ask Andi Kleen to review it. His arch has the most problems in this area making him an expert on this topic :-)

> struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.
>
> Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

Applied, thanks everyone.
Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call
On 2/23/06, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote:
> On 2/23/06, Andrew Morton [EMAIL PROTECTED] wrote:
> > Starting from 2.6.14, defer_accept is moved to request_sock_queue
> > structure, which is re-initialized in inet_csk_listen_start().
>
> Oops, looking into it...

culprit: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9

Alexandra, can you please test by just removing the zeroing from reqsk_queue_alloc() in net/core/request_sock.c? Just remove this line:

	queue->rskq_defer_accept = 0;

icsk->icsk_accept_queue (that maps to the queue-> above) is zeroed at sk alloc time, so just removing this one should restore the previous behaviour.

Thanks,

- Arnaldo
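
The shape of the bug and of the proposed one-line fix can be reproduced in a toy model. Nothing below is kernel code: `struct accept_queue`, `struct sock_model` and the function names are hypothetical, chosen only to mirror the sequence "allocate socket, set option, start listening".

```c
#include <string.h>

/* Hypothetical stand-in for the per-socket accept queue. */
struct accept_queue {
	int defer_accept;	/* option set by the application before listen() */
	int max_backlog;	/* set when listening starts */
};

struct sock_model {
	struct accept_queue q;
};

/* Socket allocation zeroes everything once, including the queue. */
void sock_model_init(struct sock_model *s)
{
	memset(s, 0, sizeof(*s));
}

void sock_model_set_defer_accept(struct sock_model *s, int val)
{
	s->q.defer_accept = val;
}

/* Buggy variant: listen-time setup clears a field the user already set. */
void listen_start_buggy(struct sock_model *s, int backlog)
{
	s->q.defer_accept = 0;
	s->q.max_backlog = backlog;
}

/* Fixed variant: rely on the zeroing done at allocation time. */
void listen_start_fixed(struct sock_model *s, int backlog)
{
	s->q.max_backlog = backlog;
}
```

Dropping the second zeroing is safe in this model for the reason given in the message: the field already starts at zero when the socket is created, so the only thing the extra assignment can do is destroy a value the application set in between.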
ip6_tunnel keeping dst_cache after change of params
Hi,

ip6_tunnel keeps a cached dst (dst_cache in ip6_tnl) per tunnel instance. This cached dst is re-used while it's not marked obsolete.

A change of the tunnel's parameters (via SIOCCHGTUNNEL) does not invalidate the dst_cache directly, which results in it being used by ip6ip6_tnl_xmit after the tunnel is configured with new parameters.

Shouldn't ip6ip6_tnl_change dst_release() the cached dst and leave ip6ip6_tnl_xmit to pick a new one based on the new local/remote addresses etc?

I can provide a patch to fix this, meanwhile just wanted to confirm the expected behaviour.

Thanks,
Hugo
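
The behaviour being proposed (release the cached route when the parameters change, look it up again on the next transmit) can be sketched with a minimal model. All names here are hypothetical: `struct route`, `struct tunnel`, `route_lookup`, `route_release`, `tunnel_change` and `tunnel_xmit_route` only imitate the roles of the dst cache, dst_release(), ip6ip6_tnl_change and ip6ip6_tnl_xmit, and the reference counting is deliberately simplistic.

```c
#include <stdlib.h>

/* Hypothetical route entry: remembers which remote it was resolved for. */
struct route {
	int refcnt;
	unsigned remote;
};

struct tunnel {
	unsigned remote;	/* configured remote endpoint */
	struct route *cache;	/* cached route, NULL when invalid */
};

static struct route *route_lookup(unsigned remote)
{
	struct route *r = malloc(sizeof(*r));

	if (r) {
		r->refcnt = 1;
		r->remote = remote;
	}
	return r;
}

static void route_release(struct route *r)
{
	if (r && --r->refcnt == 0)
		free(r);
}

/* Parameter change: update the endpoint and drop the now stale cache. */
void tunnel_change(struct tunnel *t, unsigned new_remote)
{
	t->remote = new_remote;
	route_release(t->cache);
	t->cache = NULL;
}

/* Transmit path: reuse the cached route, or resolve one for the current remote. */
struct route *tunnel_xmit_route(struct tunnel *t)
{
	if (!t->cache)
		t->cache = route_lookup(t->remote);
	return t->cache;
}
```

Without the release in `tunnel_change`, the transmit path would keep returning the route resolved for the old remote, which is the behaviour the message reports.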
Re: (usagi-users 03614) Re: IPv6 setsockopt software MTU patch
From: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date: Fri, 24 Feb 2006 00:23:51 +0900 (JST)

> David, please apply. Thank you.

Can you please resend the patch with a full changelog entry and Signed-off-by lines for me? Thank you.

This is for net-2.6 right? Or net-2.6.17?

Thanks again.
Re: [PATCH] pktgen: fix races between control/worker threads
From: Robert Olsson [EMAIL PROTECTED]
Date: Wed, 22 Feb 2006 19:47:13 +0100

> Jesse Brandeburg writes:
> > I looked quickly at this on a couple different machines and wasn't
> > able to reproduce, so don't let me block the patch. I think its a
> > good patch FWIW
>
> OK! We ask Dave to apply it.

Applied to net-2.6.17, thanks.
Re: [PATCH 00/01] pktgen: Lindent run.
From: Luiz Fernando Capitulino [EMAIL PROTECTED] Date: Mon, 23 Jan 2006 13:44:19 -0200 This patch is not in-lined because it's 120K bytes long, you can find it at: http://www.cpu.eti.br/patches/pktgen_lindent_1.patch Not found: [EMAIL PROTECTED]:~/src/GIT/net-2.6.17$ wget http://www.cpu.eti.br/patches/pktgen_lindent_1.patch --17:16:50-- http://www.cpu.eti.br/patches/pktgen_lindent_1.patch => `pktgen_lindent_1.patch' Resolving www.cpu.eti.br... 209.59.143.183 Connecting to www.cpu.eti.br|209.59.143.183|:80... connected. HTTP request sent, awaiting response... 404 Not Found 17:16:50 ERROR 404: Not Found. Anyways, can you please regenerate these 4 patches against net-2.6.17, as I put in Arthur's race fix and it will certainly conflict with these. Sorry for taking so long to get to this :-(
Re: [PATCH 0/3] skge: patches for 2.6.16
Francois Romieu wrote: Stephen Hemminger [EMAIL PROTECTED] : Bug fix patches to skge driver that need to go in 2.6.16. Some of them are in -mm and some have already been sent (and ignored). #1..#3 Applied to branch 'for-jeff' at git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git Shortlog $ git rev-list --pretty master..HEAD | git shortlog Francois Romieu: r8169: fix broken ring index handling in suspend/resume r8169: enable wake on lan Stephen Hemminger: sky2: yukon-ec-u chipset initialization sky2: limit coalescing values to ring size sky2: poke coalescing timer to fix hang sky2: force early transmit status sky2: use device iomem to access PCI config sky2: close race on IRQ mask update. skge: NAPI/irq race fix skge: genesis phy initialzation skge: protect interrupt mask pulled, thanks. It definitely makes things easier, if the patches are rolled up like this. Jeff
[PATCH] Some state changes not be counted to TCP_MIB_ATTEMPTFAILS
Referring to RFC 2012, tcpAttemptFails is defined as follows:

  tcpAttemptFails OBJECT-TYPE
      SYNTAX      Counter32
      MAX-ACCESS  read-only
      STATUS      current
      DESCRIPTION
          The number of times TCP connections have made a direct
          transition to the CLOSED state from either the SYN-SENT
          state or the SYN-RCVD state, plus the number of times TCP
          connections have made a direct transition to the LISTEN
          state from the SYN-RCVD state.
      ::= { tcp 7 }

State changes from SYN-RCVD to CLOSED, SYN-SENT to CLOSED and SYN-RCVD to LISTEN should be counted in TCP_MIB_ATTEMPTFAILS. The following state changes are not counted in TCP_MIB_ATTEMPTFAILS by the kernel.

SYN-SENT state => CLOSED

      TCP A                                          TCP B
  1.  LISTEN                                         CLOSED
  2.         -- <SEQ=Z><CTL=SYN>                  -- SYN-SENT
  3.         -- <SEQ=X><ACK=Z+1><CTL=RST>         -- CLOSED

SYN-RECEIVED state (came from SYN-SENT state) => CLOSED

      TCP A                                          TCP B
  1.  LISTEN                                         CLOSED
  2.         -- <SEQ=Z><CTL=SYN>                  -- SYN-SENT
  3.         -- <SEQ=X><ACK=Z+1><CTL=SYN>            SYN-SENT
  4.         -- <SEQ=Z+1><ACK=X+1><CTL=ACK>       -- SYN-RECEIVED
  5.         -- <SEQ=X+1><ACK=Z+2><CTL=RST>       -- CLOSED

SYN-RECEIVED state (came from SYN-SENT state) => CLOSED

      TCP A                                          TCP B
  1.  LISTEN                                         CLOSED
  2.         -- <SEQ=Z><CTL=SYN>                  -- SYN-SENT
  3.         -- <SEQ=X><ACK=Z+1><CTL=SYN>            SYN-SENT
  4.         -- <SEQ=Z+1><ACK=X+1><CTL=ACK>       -- SYN-RECEIVED
  5.         -- <SEQ=X+1><ACK=Z+2><CTL=SYN>       -- CLOSED

SYN-RECEIVED state => LISTEN

      TCP A                                          TCP B
  1.  LISTEN                                         LISTEN
  2.         ... <SEQ=Z><CTL=SYN>                 -- SYN-RECEIVED
  3.  (??)   -- <SEQ=X><ACK=Z+1><CTL=SYN,ACK>     -- SYN-RECEIVED
  4.         -- <SEQ=Z+1><CTL=RST>                -- (return to LISTEN!)
  5.  LISTEN                                         LISTEN

SYN-RECEIVED state => LISTEN

      TCP A                                          TCP B
  1.  LISTEN                                         LISTEN
  2.         ... <SEQ=Z><CTL=SYN>                 -- SYN-RECEIVED
  3.  (??)   -- <SEQ=X><ACK=Z+1><CTL=SYN,ACK>     -- SYN-RECEIVED
  4.         -- <SEQ=Z+1><CTL=SYN>                -- (return to LISTEN!)
  5.  LISTEN                                         LISTEN

Patch against kernel 2.6.15.4 follows:

diff -Nur a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c 2006-02-23 09:20:24.659262056 +0900
+++ b/net/ipv4/tcp_input.c 2006-02-23 09:28:50.772321176 +0900
@@ -4003,6 +4003,7 @@
          */
         if (th->rst) {
+                TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
                 tcp_reset(sk);
                 goto discard;
         }
@@ -4290,6 +4291,8 @@
         /* step 2: check RST bit */
         if(th->rst) {
+                if(sk->sk_state == TCP_SYN_RECV)
+                        TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
                 tcp_reset(sk);
                 goto discard;
         }
@@ -4303,6 +4306,8 @@
          * Check for a SYN in window.
          */
         if (th->syn && !before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
+                if(sk->sk_state == TCP_SYN_RECV)
+                        TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
                 NET_INC_STATS_BH(LINUX_MIB_TCPABORTONSYN);
                 tcp_reset(sk);
                 return 1;
diff -Nur a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
--- a/net/ipv4/tcp_minisocks.c 2006-02-23 09:20:24.660261904 +0900
+++ b/net/ipv4/tcp_minisocks.c 2006-02-23 09:26:07.432152656 +0900
@@ -591,8 +591,10 @@
         /* RFC793: second check the RST bit and
          *         fourth, check the SYN bit
          */
-        if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN))
+        if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN)) {
+                TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
                 goto embryonic_reset;
+        }
         /* ACK sequence verified above, just make sure ACK is
          * set. If ACK not set, just silently drop the packet.
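The counting rule quoted from RFC 2012 above can be written down as a small predicate over state transitions. This is a user-space sketch only: enum tcp_state, struct tcp_mib, counts_as_attempt_fail() and tcp_set_state_counted() are hypothetical names chosen for illustration and are not the kernel's implementation.

```c
#include <assert.h>

/* Simplified, hypothetical set of TCP states for illustration. */
enum tcp_state {
    ST_CLOSED,
    ST_LISTEN,
    ST_SYN_SENT,
    ST_SYN_RCVD,
    ST_ESTABLISHED
};

/* Hypothetical counter block holding only tcpAttemptFails. */
struct tcp_mib {
    unsigned long attempt_fails;
};

/*
 * Return 1 if a direct transition from 'from' to 'to' counts towards
 * tcpAttemptFails as described in RFC 2012:
 *   SYN-SENT -> CLOSED, SYN-RCVD -> CLOSED, SYN-RCVD -> LISTEN.
 */
static int counts_as_attempt_fail(enum tcp_state from, enum tcp_state to)
{
    if (to == ST_CLOSED)
        return from == ST_SYN_SENT || from == ST_SYN_RCVD;
    if (to == ST_LISTEN)
        return from == ST_SYN_RCVD;
    return 0;
}

/* Change state and bump the counter when the transition qualifies. */
static void tcp_set_state_counted(struct tcp_mib *mib, enum tcp_state *cur,
                                  enum tcp_state next)
{
    assert(mib && cur);
    if (counts_as_attempt_fail(*cur, next))
        mib->attempt_fails++;
    *cur = next;
}
```

Keeping the rule in one predicate makes it easy to see that a transition such as ESTABLISHED to CLOSED is deliberately excluded.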
Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum
Under IPv4, when I send a UDP packet with an invalid checksum, the kernel uses udp_rcv() to pass the packet up to the UDP layer, and the application uses udp_recvmsg() to receive the message. So if one UDP packet with an invalid checksum arrives at the host, UDP_MIB_INDATAGRAMS will be incremented by 1 and UDP_MIB_INERRORS should be incremented by 1.

    int udp_rcv(struct sk_buff *skb)
    {
            ...
            udp_queue_rcv_skb();
            ...
    }

    static int udp_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
    {
            ...
            if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
                    if (__udp_checksum_complete(skb)) {
                            UDP_INC_STATS_BH(UDP_MIB_INERRORS);
                            kfree_skb(skb);
                            return -1;
                    }
                    skb->ip_summed = CHECKSUM_UNNECESSARY;
            }
            UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
            ...
    }

    static int udp_recvmsg(...)
    {
            ...
    csum_copy_err:
            UDP_INC_STATS_BH(UDP_MIB_INERRORS);
            ...
    }

In my test, I sent an IPv4 UDP packet with an invalid checksum to echo-udp, and found the following message in /var/log/messages:

    xinetd[4468]: service echo-dgram, recvfrom: Resource temporarily unavailable (errno = 11)

UDP_MIB_INDATAGRAMS increased by 1 and UDP_MIB_INERRORS increased by 0. Does xinetd use some function other than udp_recvmsg() to receive the message? The other question is: why is a packet with an invalid checksum discarded only when sk->sk_filter is set?

By the way, under IPv6, a packet with an invalid checksum is discarded in udpv6_rcv(), so if one UDP packet with an invalid checksum arrives at an IPv6 host, UDP_MIB_INDATAGRAMS will be incremented by 0 and UDP_MIB_INERRORS should be incremented by 1.

    static int udpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp)
    {
            ...
            udpv6_queue_rcv_skb();
            ...
    }

    static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
    {
            ...
            if (skb->ip_summed != CHECKSUM_UNNECESSARY) {
                    if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum))) {
                            UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
                            kfree_skb(skb);
                            return 0;
                    }
                    skb->ip_summed = CHECKSUM_UNNECESSARY;
            }
            ...
            UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
            ...
    }

When one packet with an invalid checksum arrives at an IPv4 host and at an IPv6 host, the UDP_MIB_INDATAGRAMS and UDP_MIB_INERRORS counters increase differently. Are the definitions of the two counters different between IPv4 and IPv6?

IPv4 UDP does not discard the datagram with invalid checksum. UDP can validate UDP checksums correctly only when socket filtering instructions is set. If socket filtering instructions is not set, datagram with invalid checksum will be passed to the application.

We check the checksum later, in parallel with the copy of the packet data into userspace. See udp_recvmsg(), where we do this:

    if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
            err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
                                          msg->msg_iov, copied);
    } else if (msg->msg_flags&MSG_TRUNC) {
            if (__udp_checksum_complete(skb))
                    goto csum_copy_err;
            err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
                                          msg->msg_iov, copied);
    } else {
            err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                                   msg->msg_iov);
            if (err == -EINVAL)
                    goto csum_copy_err;
    }
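The idea of checking the checksum "in parallel with the copy" can be illustrated with a single pass that both copies the bytes and accumulates the Internet checksum (RFC 1071). This is a user-space sketch under stated assumptions: copy_and_csum() and csum_fold32() are hypothetical helpers, not the kernel's skb_copy_and_csum_datagram_iovec(), and the UDP pseudo-header is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Copy len bytes from src to dst and return the Internet checksum
 * (RFC 1071) of the copied data, computed in the same pass.
 * Bytes are summed as big-endian 16-bit words; an odd trailing byte
 * is padded with a zero byte. len is assumed to fit a datagram
 * (at most 65535 bytes), so the 32-bit accumulator cannot overflow.
 */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(dst && src && len <= 0xffff);
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return (uint16_t)~csum_fold32(sum);
}
```

When the data already contains its checksum field, a return value of 0 means the data verified; any other value corresponds to the csum_copy_err case, which is only detected once the copy has been attempted.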
Re: Problem with Ipsec transport mode over NAT
Chinh Nguyen wrote: Patrick McHardy wrote: Netfilter recalculates the checksum when NATing it.

The NATing is not done by netfilter but by the NAT device between the IPsec peers.

I see, so the TCP checksum includes the wrong IPs.

[Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]

C negotiates an IPsec Transport Mode with S. Because of Transport Mode/NAT-T, 2 things happen to an IPsec packet.

1. It is UDP-encapsulated, typically on port 4500/udp.

2. Transport Mode traffic leaves the original IP header alone whereas tunnel mode wraps the entire traffic in a second IP header. As such, when the packet passes through the NAT device, the source IP is N. However, the original unencrypted packet had source IP C.

S rips off the UDP-encap header, decrypts the payload, and joins the content back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP checksum is now incorrect because the source IP is now N not C. (In tunnel mode, we would ignore the NAT-ted outer IP header because the decrypted content has an entire IP header + UDP/TCP etc)

This is a well-known problem with transport mode/NAT. One solution is to use NAT-OA and NAT-OR to recalculate the checksum. The linux kernel does the simpler thing of ignoring the UDP/TCP checksum altogether in this particular case:

function esp_post_input (net/ipv4/esp4.c)

    290         /*
    291          * 2) ignore UDP/TCP checksums in case
    292          *    of NAT-T in Transport Mode, or
    293          *    perform other post-processing fixes
    294          *    as per draft-ietf-ipsec-udp-encaps-06,
    295          *    section 3.1.2
    296          */
    297         if (!x->props.mode)
    298                 skb->ip_summed = CHECKSUM_UNNECESSARY;
    299
    300         break;

As noted, esp_post_input is called in xfrm4_policy_check.

Decrypted UDP traffic through transport mode/nat also has bad checksums. However, since it is passed through udp_queue_rcv_skb after decryption, and this function calls xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel ignores the bad checksum.

Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the TCP checksum before calling xfrm4_policy_check, the bad checksum means the TCP packet is dropped as a bad segment.

The end result is that UDP and other traffic (eg, ICMP) can pass through transport mode/nat but not TCP.

I don't know what the correct fix is. Adding an extra call to xfrm4_policy_check in tcp_v4_rcv before the checksum check fixes this problem and doesn't seem to break anything else. On the other hand, moving some of the code in esp_post_input into esp_input (especially line 298) will work, too.

So we could move checksum validation behind xfrm4_policy_check or already set ip_summed to CHECKSUM_UNNECESSARY in esp_input. Already setting ip_summed in esp4_input looks easier. But this still leaves one problem. With netfilter and local NAT, a decapsulated transport mode packet might be forwarded to another host. In that case the checksum contained in the packet is invalid. Any ideas how to fix this anyone?
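Why the checksum breaks after NAT follows from the TCP/UDP pseudo-header: the source and destination addresses are part of the checksummed data, so a segment computed with source C no longer verifies once the IP header says N. The sketch below shows this in user space; l4_csum() and l4_csum_ok() are hypothetical helpers, addresses are passed as host-order 32-bit values, and only IPv4 is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        sum += (uint32_t)p[i] << 8;   /* pad odd byte with zero */
    return sum;
}

/*
 * Ones' complement checksum over the IPv4 pseudo-header
 * (source, destination, protocol, length) followed by the segment.
 */
static uint16_t l4_csum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                        const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;

    assert(seg && len <= 0xffff);
    sum += (saddr >> 16) + (saddr & 0xffff);
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += proto;
    sum += (uint32_t)len;
    sum = sum_words(seg, len, sum);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * A segment verifies when the checksum over the pseudo-header and the
 * segment, with its checksum field filled in, comes out as zero.
 */
static int l4_csum_ok(uint32_t saddr, uint32_t daddr, uint8_t proto,
                      const uint8_t *seg, size_t len)
{
    return l4_csum(saddr, daddr, proto, seg, len) == 0;
}
```

A receiver that rebuilds the packet with the NAT-ted source address therefore has two options, matching the discussion above: recompute the checksum using the original address, or mark the checksum as already verified.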
Re: [PATCH 02/02] add mask options to fwmark masking code
Michael Richardson wrote: Patrick == Patrick McHardy [EMAIL PROTECTED] writes: #define RTA_FWMARK RTA_PROTOINFO +#define RTA_FWMARK_MASK RTA_CACHEINFO Patrick Please introduce a new attribute for this instead of Patrick overloading RTA_CACHEINFO. I would be happy to do that. Should I also un-overload FWMARK, with backwards compatibility? No, that one is fine since it doesn't already have a different meaning.
Re: [PATCH] iproute2 -- add fwmarkmask
Michael Richardson wrote: Patrick == Patrick McHardy [EMAIL PROTECTED] writes: Patrick The normal way to display masks is with a /. Also I think Patrick it shouldn't display the default mask to avoid breaking Patrick scripts that parse the output. I generally dislike the /VALUE, since I expect /PREFIX-LEN. I agree that it shouldn't show if it is default. Patrick ip should be able to parse its own output, and it would Patrick also look nicer if I could just say fwmark Patrick 0x1/32. fwmarkmask is really an incredible ugly expression Patrick :) Sure. Is that a 32-bit long mask (0xfff), or is it a 0x0020? fwmark is not an address. Or would you like /32 to be a prefix-based mask, and value and/or fwmarkmask to be a value? That was not the greatest example :) I think it should be a bitmask.
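The bitmask reading of "fwmark VALUE/MASK" that the thread settles on can be sketched as follows. This is an illustration only, not iproute2 code: struct fwmark_sel, fwmark_parse() and fwmark_match() are hypothetical names, and the assumption that an omitted mask means all bits set follows the remark that the default mask should not be displayed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical selector: compare only the bits set in mask. */
struct fwmark_sel {
    uint32_t value;
    uint32_t mask;
};

/*
 * Parse "VALUE" or "VALUE/MASK". MASK is a bitmask, not a prefix
 * length; when omitted it defaults to all ones (assumption).
 * Returns 0 on success, -1 on malformed input.
 */
static int fwmark_parse(const char *s, struct fwmark_sel *out)
{
    char *end;
    unsigned long v, m = 0xffffffffUL;

    assert(s && out);
    errno = 0;
    v = strtoul(s, &end, 0);
    if (end == s || errno || v > 0xffffffffUL)
        return -1;
    if (*end == '/') {
        const char *ms = end + 1;

        m = strtoul(ms, &end, 0);
        if (end == ms || errno || m > 0xffffffffUL)
            return -1;
    }
    if (*end != '\0')
        return -1;
    out->value = (uint32_t)v;
    out->mask = (uint32_t)m;
    return 0;
}

/* A mark matches when it agrees with value on every masked bit. */
static int fwmark_match(const struct fwmark_sel *sel, uint32_t mark)
{
    return ((mark ^ sel->value) & sel->mask) == 0;
}
```

With "0x1/0xff", a packet mark of 0x1201 matches because only the low eight bits are compared, while 0x1202 does not.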
[Patch 1/1] updated: TCP/UDP getpeersec
Hi, Updated as per Herbert's comment. Catherine --- From: [EMAIL PROTECTED] This patch implements an application of the LSM-IPSec networking controls whereby an application can determine the label of the security association its TCP or UDP sockets are currently connected to via getsockopt and the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of an IPSec security association a particular TCP or UDP socket is using. The application can then use this security context to determine the security context for processing on behalf of the peer at the other end of this connection. In the case of UDP, the security context is for each individual packet. An example application is the inetd daemon, which could be modified to start daemons running at security contexts dependent on the remote client. Patch design approach: - Design for TCP The patch enables the SELinux LSM to set the peer security context for a socket based on the security context of the IPSec security association. The application may retrieve this context using getsockopt. When called, the kernel determines if the socket is a connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry cache on the socket to retrieve the security associations. If a security association has a security context, the context string is returned, as for UNIX domain sockets. - Design for UDP Unlike TCP, UDP is connectionless. This requires a somewhat different API to retrieve the peer security context. With TCP, the peer security context stays the same throughout the connection, thus it can be retrieved at any time between when the connection is established and when it is torn down. With UDP, each read/write can have different peer and thus the security context might change every time. As a result the security context retrieval must be done TOGETHER with the packet retrieval. 
The solution is to build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message).

Patch implementation details:

- Implementation for TCP

The security context can be retrieved by applications using getsockopt with the existing SO_PEERSEC flag. As an example (ignoring error checking):

    getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
    printf("Socket peer context is: %s\n", optbuf);

The SELinux function, selinux_socket_getpeersec, is extended to check for labeled security associations for connected (TCP_ESTABLISHED == sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of struct dst_entry values that may refer to security associations. If these have security associations with security contexts, the security context is returned. getsockopt returns a buffer that contains a security context string or the buffer is unmodified.

- Implementation for UDP

To retrieve the security context, the application first indicates to the kernel such desire by setting the IP_PASSSEC option via setsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for UDP should look like this:

    toggle = 1;
    toggle_len = sizeof(toggle);
    setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
            cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
            if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
                cmsg_hdr->cmsg_level == SOL_IP &&
                cmsg_hdr->cmsg_type == SCM_SECURITY) {
                    memcpy(scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
            }
    }

ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow a server socket to receive security context of the peer. A new ancillary message type, SCM_SECURITY, is added.
When the packet is received we get the security context from the sec_path pointer which is contained in the sk_buff, and copy it to the ancillary message space. An additional LSM hook, selinux_socket_getpeersec_udp, is defined to retrieve the security context from the SELinux space. The existing function, selinux_socket_getpeersec does not suit our purpose, because the security context is copied directly to user space, rather than to kernel space. Testing: We have tested the patch by setting up TCP and UDP connections between applications on two machines using the IPSec policies that result in labeled security associations being built. For TCP, we can then extract the peer security context using getsockopt on either end. For UDP, the receiving end can retrieve the security context using the auxiliary data mechanism of recvmsg. --- include/linux/in.h |1 include/linux/security.h| 25 +++--- include/linux/socket.h |1 net/core/sock.c |2 - net/ipv4/ip_sockglue.c |
[git patches] net driver fixes
Please pull from 'upstream-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git to receive the following updates:

 drivers/net/r8169.c |  189
 drivers/net/skge.c  |   75
 drivers/net/skge.h  |    1
 drivers/net/sky2.c  |  173
 drivers/net/sky2.h  |   85
 drivers/net/tlan.c  |    2
 6 files changed, 371 insertions(+), 154 deletions(-)

Adrian Bunk:
      drivers/net/tlan.c: #ifdef CONFIG_PCI the PCI specific code

Francois Romieu:
      r8169: fix broken ring index handling in suspend/resume
      r8169: enable wake on lan

Stephen Hemminger:
      sky2: yukon-ec-u chipset initialization
      sky2: limit coalescing values to ring size
      sky2: poke coalescing timer to fix hang
      sky2: force early transmit status
      sky2: use device iomem to access PCI config
      sky2: close race on IRQ mask update.
      skge: NAPI/irq race fix
      skge: genesis phy initialzation
      skge: protect interrupt mask

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 6e10184..8cc0d0b 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -287,6 +287,20 @@ enum RTL8169_register_content {
         TxInterFrameGapShift = 24,
         TxDMAShift = 8, /* DMA burst value (0-7) is shift this many bits */
+        /* Config1 register p.24 */
+        PMEnable        = (1 << 0),     /* Power Management Enable */
+
+        /* Config3 register p.25 */
+        MagicPacket     = (1 << 5),     /* Wake up when receives a Magic Packet */
+        LinkUp          = (1 << 4),     /* Wake up when the cable connection is re-established */
+
+        /* Config5 register p.27 */
+        BWF             = (1 << 6),     /* Accept Broadcast wakeup frame */
+        MWF             = (1 << 5),     /* Accept Multicast wakeup frame */
+        UWF             = (1 << 4),     /* Accept Unicast wakeup frame */
+        LanWake         = (1 << 1),     /* LanWake enable/disable */
+        PMEStatus       = (1 << 0),     /* PME status can be reset by PCI RST# */
+
         /* TBICSR p.28 */
         TBIReset        = 0x8000,
         TBILoopback     = 0x4000,
@@ -433,6 +447,7 @@ struct rtl8169_private {
         unsigned int (*phy_reset_pending)(void __iomem *);
         unsigned int (*link_ok)(void __iomem *);
         struct work_struct task;
+        unsigned wol_enabled : 1;
 };
 MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
@@ -607,6 +622,80 @@ static void rtl8169_link_option(int idx,
         *duplex = p->duplex;
 }
+static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+        struct rtl8169_private *tp = netdev_priv(dev);
+        void __iomem *ioaddr = tp->mmio_addr;
+        u8 options;
+
+        wol->wolopts = 0;
+
+#define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
+        wol->supported = WAKE_ANY;
+
+        spin_lock_irq(&tp->lock);
+
+        options = RTL_R8(Config1);
+        if (!(options & PMEnable))
+                goto out_unlock;
+
+        options = RTL_R8(Config3);
+        if (options & LinkUp)
+                wol->wolopts |= WAKE_PHY;
+        if (options & MagicPacket)
+                wol->wolopts |= WAKE_MAGIC;
+
+        options = RTL_R8(Config5);
+        if (options & UWF)
+                wol->wolopts |= WAKE_UCAST;
+        if (options & BWF)
+                wol->wolopts |= WAKE_BCAST;
+        if (options & MWF)
+                wol->wolopts |= WAKE_MCAST;
+
+out_unlock:
+        spin_unlock_irq(&tp->lock);
+}
+
+static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+        struct rtl8169_private *tp = netdev_priv(dev);
+        void __iomem *ioaddr = tp->mmio_addr;
+        int i;
+        static struct {
+                u32 opt;
+                u16 reg;
+                u8  mask;
+        } cfg[] = {
+                { WAKE_ANY,   Config1, PMEnable },
+                { WAKE_PHY,   Config3, LinkUp },
+                { WAKE_MAGIC, Config3, MagicPacket },
+                { WAKE_UCAST, Config5, UWF },
+                { WAKE_BCAST, Config5, BWF },
+                { WAKE_MCAST, Config5, MWF },
+                { WAKE_ANY,   Config5, LanWake }
+        };
+
+        spin_lock_irq(&tp->lock);
+
+        RTL_W8(Cfg9346, Cfg9346_Unlock);
+
+        for (i = 0; i < ARRAY_SIZE(cfg); i++) {
+                u8 options = RTL_R8(cfg[i].reg) & ~cfg[i].mask;
+                if (wol->wolopts & cfg[i].opt)
+                        options |= cfg[i].mask;
+                RTL_W8(cfg[i].reg, options);
+        }
+
+        RTL_W8(Cfg9346, Cfg9346_Lock);
+
+        tp->wol_enabled = (wol->wolopts) ? 1 : 0;
+
+        spin_unlock_irq(&tp->lock);
+
+        return 0;
+}
+
 static void rtl8169_get_drvinfo(struct net_device *dev,
                                 struct ethtool_drvinfo *info)
 {
@@ -1025,6 +1114,8 @@ static struct ethtool_ops rtl8169_ethtoo
         .get_tso=
Re: [git patches] net driver fixes
On Friday 24 February 2006 06:22, Jeff Garzik wrote: Please pull from 'upstream-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git [...] Stephen Hemminger: sky2: yukon-ec-u chipset initialization sky2: limit coalescing values to ring size sky2: poke coalescing timer to fix hang sky2: force early transmit status sky2: use device iomem to access PCI config sky2: close race on IRQ mask update. [...] Thanks for the update. Still I'm seeing reproducible hangs with this version of sky2 (as reported in bugzilla 6084 and discussed on netdev). Stephen, if there is anything I can do to narrow down my hangs a bit more systematically, please let me know, I'd be happy to help. Wolfgang