Re: [GIT PATCH (TAKE 2)] [NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 16:08:21 +0900 (JST)

> Dave,
>
> In article <[EMAIL PROTECTED]> (at Wed, 07 Mar 2007 14:58:07 +0900 (JST)),
> YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]> says:
>
> > Please consider pulling following changesets from the
> > "net-2.6.22-20070307a-byteorder-20070307" branch at
> > .
>
> Argh, I found more places to convert in bluetooth.
>
> I've made a new branch "net-2.6.22-20070307a-byteorder-20070307a" at
> .

Pulled, thank you very much.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NETLINK: convert NLMSG_GOODSIZE to constant expression.
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 16:34:01 +0900 (JST)

> This fixes the following error:
> :
> |  CC [M]  net/ipv4/netfilter/ipt_ULOG.o
> |net/ipv4/netfilter/ipt_ULOG.c:82: error: braced-group within expression allowed only inside a function
> |net/ipv4/netfilter/ipt_ULOG.c:82: error: syntax error before "void"
> |make[1]: *** [net/ipv4/netfilter/ipt_ULOG.o] Error 1
> |make: *** [net/ipv4/netfilter/] Error 2
>
> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Sorry about this :-/

Thanks for the fix, applied.
[PATCH] NETLINK: convert NLMSG_GOODSIZE to constant expression.
This fixes the following error:
:
|  CC [M]  net/ipv4/netfilter/ipt_ULOG.o
|net/ipv4/netfilter/ipt_ULOG.c:82: error: braced-group within expression allowed only inside a function
|net/ipv4/netfilter/ipt_ULOG.c:82: error: syntax error before "void"
|make[1]: *** [net/ipv4/netfilter/ipt_ULOG.o] Error 1
|make: *** [net/ipv4/netfilter/] Error 2

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 54597e4..a9d3ad5 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -175,8 +175,12 @@ int netlink_sendskb(struct sock *sk, struct sk_buff *skb, int protocol);
  * use enormous buffer sizes on recvmsg() calls just to avoid
  * MSG_TRUNC when PAGE_SIZE is very large.
  */
-#define NLMSG_GOODSIZE \
-	SKB_WITH_OVERHEAD(min(PAGE_SIZE,8192UL))
+#if PAGE_SIZE < 8192UL
+#define NLMSG_GOODSIZE	SKB_WITH_OVERHEAD(PAGE_SIZE)
+#else
+#define NLMSG_GOODSIZE	SKB_WITH_OVERHEAD(8192UL)
+#endif
+
 #define NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE - NLMSG_HDRLEN)

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
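The compiler error quoted in the patch ("braced-group within expression allowed only inside a function") is what GCC reports when a statement expression is used at file scope. The following is a minimal userspace sketch of the idea behind the fix, not the kernel code itself: all SKETCH_* names and the page size and overhead values are assumptions made up for illustration. It shows that choosing the smaller value with #if keeps the macro usable wherever a constant expression is required.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the kernel takes these from its
 * architecture headers and from sizeof(struct skb_shared_info). */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_SKB_OVERHEAD	320UL
#define SKETCH_WITH_OVERHEAD(x)	((x) - SKETCH_SKB_OVERHEAD)

/*
 * A kernel-style min() is a GCC statement expression, roughly
 *   #define min(x, y) ({ typeof(x) _x = (x); typeof(y) _y = (y); _x < _y ? _x : _y; })
 * and that form is only legal inside a function body. Selecting the
 * smaller value in the preprocessor keeps the result an integer
 * constant expression.
 */
#if SKETCH_PAGE_SIZE < 8192UL
#define SKETCH_GOODSIZE	SKETCH_WITH_OVERHEAD(SKETCH_PAGE_SIZE)
#else
#define SKETCH_GOODSIZE	SKETCH_WITH_OVERHEAD(8192UL)
#endif

/* File-scope uses like these two need a constant expression. */
static const size_t sketch_default_bufsize = SKETCH_GOODSIZE;
static unsigned char sketch_buffer[SKETCH_GOODSIZE];

/* Returns the buffer size chosen at preprocessing time. */
size_t sketch_goodsize(void)
{
	assert(sizeof(sketch_buffer) == sketch_default_bufsize);
	return sketch_default_bufsize;
}
```

With the assumed 4096-byte page, sketch_goodsize() yields 4096 - 320; replacing the #if block by a statement-expression min() would make both file-scope definitions fail to compile.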
Re: Multipath routing in Linux 2.6
On 06-03-2007 21:36, Tore Anderson wrote:
>
>   Hello list,
>
>   I've been trying to figure out how to make equal-cost multipath
>  routing work, with no luck.  Asked on the LARTC list with no success,

It is probably one of the most often asked questions on the LARTC,
so I'd suggest looking at its archives.

>   I'm sending traffic from a relatively busy network to this table:
...
>   I've tried loading and unloading the multipath_{wrandom,rr,random,drr}

Multipath with caching doesn't work with forwarding.

...
>   I feel I'm missing something essential here but I have no idea what.
>  Google only tells me about others having roughly the same problem but
>  never any solution.  [...]

It must be this fake google.

Some wrandom suggestions:

CONFIG_IP_ROUTE_MULTIPATH = y
CONFIG_IP_ROUTE_MULTIPATH_CACHED = n
rp_filter turned off
iptables CONNMARK or Julian Anastasov's patch
more ip route ... & ip rule ...
go to the LARTC again with: why still doesn't work...

Regards,
Jarek P.
[GIT PATCH (TAKE 2)] [NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.
Dave,

In article <[EMAIL PROTECTED]> (at Wed, 07 Mar 2007 14:58:07 +0900 (JST)), YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]> says:

> Please consider pulling following changesets from the
> "net-2.6.22-20070307a-byteorder-20070307" branch at
> .

Argh, I found more places to convert in bluetooth.

I've made a new branch "net-2.6.22-20070307a-byteorder-20070307a" at
.

Thanks.

HEADLINES
---------

    [NET] 802: Use hton{s,l}() where appropriate.
    [NET] 8021Q: Use htons() where appropriate.
    [NET] ATM: Use htons() where appropriate.
    [NET] BLUETOOTH: Use cpu_to_le{16,32}() where appropriate.
    [NET] CORE: Use htons() where appropriate.
    [NET] ETHERNET: Use htons() where appropriate.
    [NET] IEEE80211: Use htons() where appropriate.
    [NET] IPV4: Use hton{s,l}() where appropriate.
    [NET] NETFILTER: Use htonl() where appropriate.
    [NET] SCHED: Use htons() where appropriate.
    [NET] TIPC: Use htons() where appropriate.

DIFFSTAT
--------

 net/802/fddi.c                              |    4 +-
 net/802/hippi.c                             |    4 +-
 net/8021q/vlan_dev.c                        |    6 +-
 net/atm/br2684.c                            |    4 +-
 net/bluetooth/hci_conn.c                    |   36 +++---
 net/bluetooth/hci_core.c                    |   20
 net/bluetooth/hci_event.c                   |    8 ++-
 net/bluetooth/l2cap.c                       |   70 ++-
 net/core/netpoll.c                          |    2 -
 net/ethernet/eth.c                          |    2 -
 net/ieee80211/ieee80211_tx.c                |    2 -
 net/ipv4/ipvs/ip_vs_core.c                  |   10 ++--
 net/ipv4/ipvs/ip_vs_proto_ah.c              |   16 +++---
 net/ipv4/ipvs/ip_vs_xmit.c                  |   16 +++---
 net/ipv4/netfilter/ip_conntrack_proto_tcp.c |    9 ++-
 net/netfilter/nf_conntrack_proto_tcp.c      |    9 ++-
 net/sched/cls_rsvp.h                        |    2 -
 net/sched/sch_api.c                         |    2 -
 net/tipc/eth_media.c                        |    2 -
 19 files changed, 111 insertions(+), 113 deletions(-)

CHANGESETS
----------

commit 9f72aaaed003b2dda0fc5fa28c2caccc4ae358e3
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:33 2007 +0900

    [NET] 802: Use hton{s,l}() where appropriate.
    Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/802/fddi.c b/net/802/fddi.c
index ace6386..8c86216 100644
--- a/net/802/fddi.c
+++ b/net/802/fddi.c
@@ -100,7 +100,7 @@ static int fddi_rebuild_header(struct sk_buff *skb)
 	struct fddihdr *fddi = (struct fddihdr *)skb->data;

 #ifdef CONFIG_INET
-	if (fddi->hdr.llc_snap.ethertype == __constant_htons(ETH_P_IP))
+	if (fddi->hdr.llc_snap.ethertype == htons(ETH_P_IP))
 		/* Try to get ARP to resolve the header and fill destination address */
 		return arp_find(fddi->daddr, skb);
 	else
@@ -135,7 +135,7 @@ __be16 fddi_type_trans(struct sk_buff *skb, struct net_device *dev)
 	if(fddi->hdr.llc_8022_1.dsap==0xe0)
 	{
 		skb_pull(skb, FDDI_K_8022_HLEN-3);
-		type = __constant_htons(ETH_P_802_2);
+		type = htons(ETH_P_802_2);
 	}
 	else
 	{
diff --git a/net/802/hippi.c b/net/802/hippi.c
index 578f2a3..35dd938 100644
--- a/net/802/hippi.c
+++ b/net/802/hippi.c
@@ -60,7 +60,7 @@ static int hippi_header(struct sk_buff *skb, struct net_device *dev,
 	 * Due to the stupidity of the little endian byte-order we
 	 * have to set the fp field this way.
 	 */
-	hip->fp.fixed		= __constant_htonl(0x04800018);
+	hip->fp.fixed		= htonl(0x04800018);
 	hip->fp.d2_size		= htonl(len + 8);
 	hip->le.fc		= 0;
 	hip->le.double_wide	= 0;	/* only HIPPI 800 for the time being */
@@ -104,7 +104,7 @@ static int hippi_rebuild_header(struct sk_buff *skb)
 	 * Only IP is currently supported
 	 */

-	if(hip->snap.ethertype != __constant_htons(ETH_P_IP))
+	if(hip->snap.ethertype != htons(ETH_P_IP))
 	{
 		printk(KERN_DEBUG "%s: unable to resolve type %X addresses.\n",skb->dev->name,ntohs(hip->snap.ethertype));
 		return 0;

---
commit d186c0a25b23079f5faac2500d3dcbd8b7d6e166
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:35 2007 +0900

    [NET] 8021Q: Use htons() where appropriate.
    Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 2fc8fe2..e961d59 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -258,7 +258,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		 * won't work for fault tolerant netware but does for the rest.
 		 */
 		if (*(unsigned short *)rawp == 0x) {
-			skb->protocol = __constant_htons(ETH_P_802_3);
+			skb->pr
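The changesets above replace __constant_htons()/__constant_htonl() with plain htons()/htonl() in ordinary runtime code. A plausible reading is that the plain helpers already pick a compile-time form when handed a constant, so the explicit variant is only needed where a true constant expression is required. The sketch below illustrates that pattern in userspace; every sketch_* and SKETCH_* name is hypothetical, __builtin_constant_p is a GCC/Clang extension, and the byte-order detection assumes the GCC/Clang predefined macros.

```c
#include <stdint.h>
#include <string.h>

/* Pure-expression swap: usable in static initializers, foldable by the compiler. */
#define SKETCH_CONST_SWAB16(x) \
	((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | (((uint16_t)(x) & 0xff00u) >> 8)))

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_CONST_HTONS(x)	((uint16_t)(x))
#else	/* assumption: anything else is little endian */
#define SKETCH_CONST_HTONS(x)	SKETCH_CONST_SWAB16(x)
#endif

/* Runtime path; a real implementation might use a byte-swap instruction. */
static inline uint16_t sketch_htons_runtime(uint16_t x)
{
	return SKETCH_CONST_HTONS(x);
}

/* Chooses the constant form on its own when the argument is a constant. */
#define sketch_htons(x) \
	(__builtin_constant_p((uint16_t)(x)) ? \
		SKETCH_CONST_HTONS(x) : sketch_htons_runtime(x))

/* A file-scope initializer is where the explicit constant form is still needed. */
static const uint16_t sketch_ethertype_ip = SKETCH_CONST_HTONS(0x0800);

/* First byte in memory, which is the high byte for a network-order value. */
int sketch_first_byte(uint16_t be_value)
{
	unsigned char bytes[2];

	memcpy(bytes, &be_value, sizeof(bytes));
	return bytes[0];
}

/* Runtime comparison written with the plain helper, as in the patch. */
int sketch_matches_ip(uint16_t wire_ethertype)
{
	return wire_ethertype == sketch_htons(0x0800);
}
```

In sketch_matches_ip() the compiler can fold sketch_htons(0x0800) to a constant, so nothing is lost by dropping the explicit constant variant there.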
Re: [RFC] ARP notify option
On Tue, 6 Mar 2007, Chris Friesen wrote:
> Stephen Hemminger wrote:
>
>> +arp_notify - BOOLEAN
>> +	Define mode for notification of address and device changes.
>> +	0 - (default): do nothing
>> +	1 - Generate gratuitous arp replies when device is brought up
>> +	    or hardware address changes.
>
> Did you consider using gratuitous arp requests instead?  I remember reading
> about some hardware that updated its arp cache on gratuitous requests but not
> gratuitous replies.

You might be interested in taking a look at:

http://tools.ietf.org/id/draft-cheshire-ipv4-acd

There has been some follow-up discussion on this in the thread
starting at:

http://www1.ietf.org/mail-archive/web/int-area/current/msg00611.html

In particular, you may be interested in this comment about ARP request
and ARP reply for gratuitous ARP:

http://www1.ietf.org/mail-archive/web/int-area/current/msg00669.html

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings
Update to cube root benchmark code
Hi Stephen,

Thanks for this code, it's easy to experiment with it.

Let me propose this simple update with a variation on your ncubic() function.
I noticed that all intermediate results were far below 32 bits, so I did a
new version which is 30% faster on my athlon with the same results. This is
because we only use x and a/x^2 in the function, with x very close to cbrt(a).
So a/x^2 is very close to cbrt(a) which is at most 22 bits. So we only use
the 32 lower bits of the result of div64_64(), and all intermediate
computations can be done on 32 bits (including multiplies and divides).

[EMAIL PROTECTED]:~$ ./bictcp
Calibrating
Function    clocks  mean(us)  max(us)  std(us)  Avg error
bictcp        1085      0.70    28.19     2.30     0.172%
ocubic         869      0.56    22.76     1.23     0.274%
ncubic         637      0.41    16.29     1.41     0.247%
ncubic32       435      0.28    11.18     1.03     0.247%
acbrt          824      0.53    21.03     0.85     0.275%
hcbrt          547      0.35    13.96     0.42     1.580%

I also tried to improve a bit by checking for early convergence and
returning before last divide, but it is worthless because it almost
never happens so it does not make the code any faster.

Here's the code. I think that it would be fine if we merged this
version since it's supposed to behave better on most 32 bits machines.

Best regards,
Willy

/* Here is a better version of the benchmark code.
   It has the original code used in 2.4 version of Cubic for comparison
   --- */
/* Test and measure perf of cube root algorithms.  */
#include
#include
#include
#include
#include

#ifdef __x86_64

#define rdtscll(val) do { \
	unsigned int __a,__d; \
	asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \
	(val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \
} while(0)

# define do_div(n,base) ({					\
	uint32_t __base = (base);				\
	uint32_t __rem;						\
	__rem = ((uint64_t)(n)) % __base;			\
	(n) = ((uint64_t)(n)) / __base;				\
	__rem;							\
 })

/**
 * __ffs - find first bit in word.
 * @word: The word to search
 *
 * Undefined if no bit exists, so code should check against 0 first.
 */
static __inline__ unsigned long __ffs(unsigned long word)
{
	__asm__("bsfq %1,%0"
		:"=r" (word)
		:"rm" (word));
	return word;
}

/*
 * __fls: find last bit set.
 * @word: The word to search
 *
 * Undefined if no zero exists, so code should check against ~0UL first.
 */
static inline unsigned long __fls(unsigned long word)
{
	__asm__("bsrq %1,%0"
		:"=r" (word)
		:"rm" (word));
	return word;
}

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz (man ffs).
 */
static __inline__ int ffs(int x)
{
	int r;

	__asm__("bsfl %1,%0\n\t"
		"cmovzl %2,%0"
		: "=r" (r) : "rm" (x), "r" (-1));
	return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 */
static inline int fls(int x)
{
	int r;

	__asm__("bsrl %1,%0\n\t"
		"cmovzl %2,%0"
		: "=&r" (r) : "rm" (x), "rm" (-1));
	return r+1;
}

/**
 * fls64 - find last bit set in 64 bit word
 * @x: the word to search
 *
 * This is defined the same way as fls.
 */
static inline int fls64(uint64_t x)
{
	if (x == 0)
		return 0;
	return __fls(x) + 1;
}

static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

#elif __i386

#define rdtscll(val) \
	__asm__ __volatile__("rdtsc" : "=A" (val))

/**
 * ffs - find first bit set
 * @x: the word to search
 *
 * This is defined the same way as
 * the libc and compiler builtin ffs routines, therefore
 * differs in spirit from the above ffz() (man ffs).
 */
static inline int ffs(int x)
{
	int r;

	__asm__("bsfl %1,%0\n\t"
		"jnz 1f\n\t"
		"movl $-1,%0\n"
		"1:" : "=r" (r) : "rm" (x));
	return r+1;
}

/**
 * fls - find last bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs().
 */
static inline int fls(int x)
{
	int r;

	__asm__("bsrl %1,%0\n\t"
		"jnz 1f\n\t"
		"movl $-1,%0\n"
		"1:" : "=r" (r) : "rm" (x));
	return r+1;
}

static inline int fls64(uint64_t x)
{
	uint32_t h = x >> 32;
	if (h)
		return fls(h) + 32;
	return fls(x);
}

#define do_div(n,base) ({ \
	unsigned long __upper, __low, __high, __mod, __base; \
	__base =
[NET]: Use {htons,htonl,cpu_to_le16}() where appropriate.
Dave,

Please consider pulling following changesets from the
"net-2.6.22-20070307a-byteorder-20070307" branch at
.

Thank you.

HEADLINES
---------

    [NET] 802: Use hton{s,l}() where appropriate.
    [NET] 8021Q: Use htons() where appropriate.
    [NET] ATM: Use htons() where appropriate.
    [NET] BLUETOOTH: Use cpu_to_le16() where appropriate.
    [NET] CORE: Use htons() where appropriate.
    [NET] ETHERNET: Use htons() where appropriate.
    [NET] IEEE80211: Use htons() where appropriate.
    [NET] IPV4: Use hton{s,l}() where appropriate.
    [NET] NETFILTER: Use htonl() where appropriate.
    [NET] SCHED: Use htons() where appropriate.
    [NET] TIPC: Use htons() where appropriate.

DIFFSTAT
--------

 net/802/fddi.c                              |    4 ++--
 net/802/hippi.c                             |    4 ++--
 net/8021q/vlan_dev.c                        |    6 +++---
 net/atm/br2684.c                            |    4 ++--
 net/bluetooth/hci_conn.c                    |   18 +-
 net/core/netpoll.c                          |    2 +-
 net/ethernet/eth.c                          |    2 +-
 net/ieee80211/ieee80211_tx.c                |    2 +-
 net/ipv4/ipvs/ip_vs_core.c                  |   10 +-
 net/ipv4/ipvs/ip_vs_proto_ah.c              |   16
 net/ipv4/ipvs/ip_vs_xmit.c                  |   16
 net/ipv4/netfilter/ip_conntrack_proto_tcp.c |    9 -
 net/netfilter/nf_conntrack_proto_tcp.c      |    9 -
 net/sched/cls_rsvp.h                        |    2 +-
 net/sched/sch_api.c                         |    2 +-
 net/tipc/eth_media.c                        |    2 +-
 16 files changed, 53 insertions(+), 55 deletions(-)

CHANGESETS
----------

commit 2893c534ffba19c31c0321f8221eaf844306e951
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:33 2007 +0900

    [NET] 802: Use hton{s,l}() where appropriate.
    Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/802/fddi.c b/net/802/fddi.c
index ace6386..8c86216 100644
--- a/net/802/fddi.c
+++ b/net/802/fddi.c
@@ -100,7 +100,7 @@ static int fddi_rebuild_header(struct sk_buff *skb)
 	struct fddihdr *fddi = (struct fddihdr *)skb->data;

 #ifdef CONFIG_INET
-	if (fddi->hdr.llc_snap.ethertype == __constant_htons(ETH_P_IP))
+	if (fddi->hdr.llc_snap.ethertype == htons(ETH_P_IP))
 		/* Try to get ARP to resolve the header and fill destination address */
 		return arp_find(fddi->daddr, skb);
 	else
@@ -135,7 +135,7 @@ __be16 fddi_type_trans(struct sk_buff *skb, struct net_device *dev)
 	if(fddi->hdr.llc_8022_1.dsap==0xe0)
 	{
 		skb_pull(skb, FDDI_K_8022_HLEN-3);
-		type = __constant_htons(ETH_P_802_2);
+		type = htons(ETH_P_802_2);
 	}
 	else
 	{
diff --git a/net/802/hippi.c b/net/802/hippi.c
index 578f2a3..35dd938 100644
--- a/net/802/hippi.c
+++ b/net/802/hippi.c
@@ -60,7 +60,7 @@ static int hippi_header(struct sk_buff *skb, struct net_device *dev,
 	 * Due to the stupidity of the little endian byte-order we
 	 * have to set the fp field this way.
 	 */
-	hip->fp.fixed		= __constant_htonl(0x04800018);
+	hip->fp.fixed		= htonl(0x04800018);
 	hip->fp.d2_size		= htonl(len + 8);
 	hip->le.fc		= 0;
 	hip->le.double_wide	= 0;	/* only HIPPI 800 for the time being */
@@ -104,7 +104,7 @@ static int hippi_rebuild_header(struct sk_buff *skb)
 	 * Only IP is currently supported
 	 */

-	if(hip->snap.ethertype != __constant_htons(ETH_P_IP))
+	if(hip->snap.ethertype != htons(ETH_P_IP))
 	{
 		printk(KERN_DEBUG "%s: unable to resolve type %X addresses.\n",skb->dev->name,ntohs(hip->snap.ethertype));
 		return 0;

---
commit 60e71cd6525fdf2b35c71158bb5441a252e62a0e
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Wed Mar 7 14:18:35 2007 +0900

    [NET] 8021Q: Use htons() where appropriate.
    Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 2fc8fe2..e961d59 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -258,7 +258,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 		 * won't work for fault tolerant netware but does for the rest.
 		 */
 		if (*(unsigned short *)rawp == 0x) {
-			skb->protocol = __constant_htons(ETH_P_802_3);
+			skb->protocol = htons(ETH_P_802_3);
 			/* place it back on the queue to be handled by true layer 3 protocols.
 			 */
@@ -281,7 +281,7 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
 			/*
 			 * Real 802.2 LLC
 			 */
-			skb->protocol = __constant_htons(ETH_P_802_2);
+			skb->protocol = htons(ETH_P_802_2);
 			/* place it back
Re: [ubuntu-marketing] Why should we teach students Linux??
Roel Bindels wrote:
> Hello listers,
>
> I'm tutor on the Faculty ICT, department NID. This is a bachelor degree
> and we are preparing our students to become something more than just
> System Administrators (such as manager, consulting, etc). Since this
> department is part of the Microsoft camp, the students are educated
> mostly in this direction, which I think is not a bad thing. A better
> thing would be if we could give our students the opportunity to meet
> both the systems on the same level, at least, that is my opinion.
>
> To change a curriculum of a study, I need a solid case. So if somebody
> knows a link/document about why we should educate our students in the
> Linux OS, please send it. Or article about the usage of Linux in companies.
>
> I hope you will all take some time to send me your best links/documents.
>
> with best regards
>
> Roel Bindels
>

Roel,

I recently interviewed Richard Weideman (who I am adding to the CC so he
can comment directly) for an article about Edubuntu. One of the comments
he made was:

"Kids who learn to use a computer from scratch are not afraid of Linux
or OpenOffice.org. They concentrate on the learning task at hand, and
they learn to use whatever the tool is put in front of them. If some of
those kids graduate to a work environment using Linux or OpenOffice.org,
they will have no problem. If the new work environment uses Windows,
they will adjust without any issues. Some of them will even propose
OpenOffice.org or Linux at work, and help their new company to migrate
and save money."

You can read the full article at
http://www.linux.com/article.pl?sid=07/02/20/197251

-- 
Sincerely
Melissa Draper

http://www.meldraweb.com
Phone: 0404 595 395 (intl): +61 404 595 395
P.O Box 1412 Lavington, NSW 2641
Re: [UDP]: Clean up UDP-Lite receive checksum
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:41:00 +1100

> Hi Dave:
>
> [UDP]: Clean up UDP-Lite receive checksum
>
> This patch eliminates some duplicate code for the verification of
> receive checksums between UDP-Lite and UDP.  It does this by
> introducing __skb_checksum_complete_head which is identical to
> __skb_checksum_complete apart from the fact that it takes
> a length parameter rather than computing the first skb->len bytes.
>
> As a result UDP-Lite will be able to use hardware checksum offload
> for packets which do not use partial coverage checksums.  It also
> means that UDP-Lite loopback no longer does unnecessary checksum
> verification.
>
> If any NICs start supporting UDP-Lite this would also start working
> automatically.
>
> This patch removes the assumption that msg_flags has MSG_TRUNC clear
> upon entry in recvmsg.
>
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Also applied, thanks a lot.
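The refactoring described above (a length-taking checksum helper, with the full-length check expressed through it) can be sketched in userspace roughly as follows. This is not the skb code: the sketch_* names are hypothetical, the data is a flat byte array rather than an skb, and the pseudo-header that a real UDP or UDP-Lite checksum covers is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement Internet checksum over the first len bytes of data. */
uint16_t sketch_checksum_head(const unsigned char *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Full-length variant, expressed through the length-taking helper. */
uint16_t sketch_checksum(const unsigned char *data, size_t total_len)
{
	return sketch_checksum_head(data, total_len);
}

/* Four sample bytes used to exercise both entry points. */
static const unsigned char sketch_sample[] = { 0x45, 0x00, 0x00, 0x1c };
```

A partial-coverage receiver would call sketch_checksum_head() with the coverage length, while the ordinary path passes the whole length, so only one summing loop has to exist.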
Re: [UDP6]: Restore sk_filter optimisation
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:20:10 +1100

> Hi Dave:
>
> [UDP6]: Restore sk_filter optimisation
>
> This reverts the changeset
>
>     [IPV6]: UDPv6 checksum.
>
>     We always need to check UDPv6 checksum because it is mandatory.
>
> The sk_filter optimisation has nothing to do with whether we verify the
> checksum.  It simply postpones it to the point when the user calls
> recv or poll.
>
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Applied, thanks.
Re: [UDP]: Reread uh pointer after pskb_trim
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 12:00:20 +1100

> Hi Dave:
>
> [UDP]: Reread uh pointer after pskb_trim
>
> The header may have moved when trimming.
>
> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

IPV6 got this case right :-)

Applied, and I'll push this to -stable too.
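The bug class fixed here (a cached header pointer going stale because the buffer behind it was relocated) can be reproduced in plain C. The sketch below is a userspace analogy only: sketch_buf, sketch_udphdr and sketch_trim() are hypothetical stand-ins, and sketch_trim() always relocates the data so that the effect is visible, whereas a real trim only may do so.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf {
	unsigned char *data;
	size_t len;
};

struct sketch_udphdr {
	uint16_t source, dest, len, check;
};

/* Hypothetical stand-in for a trim that relocates the data. */
static int sketch_trim(struct sketch_buf *b, size_t new_len)
{
	unsigned char *copy = malloc(new_len);

	if (!copy)
		return -1;
	memcpy(copy, b->data, new_len);
	free(b->data);		/* any pointer into the old block is now stale */
	b->data = copy;
	b->len = new_len;
	return 0;
}

/* Derives the header pointer from the buffer's current data pointer. */
static struct sketch_udphdr *sketch_hdr(struct sketch_buf *b)
{
	return (struct sketch_udphdr *)b->data;
}

/* Returns 1 if the header read after the trim is intact, -1 on allocation failure. */
int sketch_trim_and_reread(void)
{
	struct sketch_buf b = { calloc(1, 64), 64 };
	struct sketch_udphdr *uh;
	int ok;

	if (!b.data)
		return -1;
	uh = sketch_hdr(&b);
	uh->len = 16;			/* host byte order, for simplicity */
	if (sketch_trim(&b, uh->len) < 0) {
		free(b.data);
		return -1;
	}
	uh = sketch_hdr(&b);		/* reread: the old uh is dangling */
	ok = (uh->len == 16 && b.len == 16);
	free(b.data);
	return ok;
}
```

Dropping the second sketch_hdr() call would leave uh pointing into freed memory, which is the userspace equivalent of using the old uh after the trim.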
Re: [PATCH] NET : Optimizes inet_getpeer()
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 11:33:20 +0100

> [PATCH] NET : Optimizes inet_getpeer()
>
> 1) Some sysctl vars are declared __read_mostly
>
> 2) We can avoid updating stack[] when doing an AVL lookup only.
>
>     lookup() macro is extended to receive a second parameter, that may be
>     NULL in case of a pure lookup (no need to save the AVL path). This
>     removes unnecessary instructions, because compiler knows if this
>     _stack parameter is NULL or not.
>
>     text size of net/ipv4/inetpeer.o is 2063 bytes instead of 2107 on x86_64
>
> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

Applied, thanks Eric.
Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent
On Tue, Mar 06, 2007 at 07:39:21PM -0800, Michael K. Edwards wrote:

> On 3/6/07, Ralf Baechle <[EMAIL PROTECTED]> wrote:
> >This small change btw. delivers about ~ 3% extra performance on a very
> >slow test system.
>
> Has this change been tested / benchmarked under VMWare?  pcnet32 is
> the (default?) virtual device presented by VMWare Workstation, and
> that's probably a large fraction of its use in the field these days.
> But then Don probably already knows that.  :-)

Price question: why would this patch make a difference under VMware? :-)

  Ralf
Re: [PATCH] TCP Yeah: cleanup
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:56:05 -0800

>
> Eliminate need for full 64/64 divide to compute queue.
> Variable maxqueue was really a constant.
> Fix indentation.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.
Re: [PATCH] tcp_cubic: faster cube root
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:47:06 -0800

> The Newton-Raphson method is quadratically convergent so
> only a small fixed number of steps are necessary.
> Therefore it is faster to unroll the loop. Since div64_64 is no longer
> inline it won't cause code explosion.
>
> Also fixes a bug that can occur if x^2 was bigger than 32 bits.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

Applied, thanks Stephen.
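For readers following the cube root threads, here is a minimal sketch of an integer Newton-Raphson cube root using the same update step, x <- (2x + a/x^2) / 3. It is not the tcp_cubic code: the sketch_* names are hypothetical, it iterates until convergence instead of unrolling a fixed number of steps, and it computes the exact floor of the cube root. Starting from a power of two at or above the true root guarantees that the iterates decrease monotonically, and x never exceeds 2^22, so x*x stays within 64 bits.

```c
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static int sketch_bit_length(uint64_t v)
{
	int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Floor of the cube root of a, by Newton-Raphson on integers. */
uint32_t sketch_cbrt(uint64_t a)
{
	uint64_t x, y;

	if (a == 0)
		return 0;

	/* Initial estimate 2^ceil(bits/3), which is always >= cbrt(a). */
	x = 1ULL << ((sketch_bit_length(a) + 2) / 3);

	for (;;) {
		y = (2 * x + a / (x * x)) / 3;	/* Newton step for x^3 = a */
		if (y >= x)			/* no further decrease: converged */
			break;
		x = y;
	}
	return (uint32_t)x;
}
```

Because convergence is quadratic, only a handful of iterations run even for 64-bit inputs, which is why a small fixed number of unrolled steps is a reasonable trade when an approximate result is acceptable.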
Re: [ATM] ENI: Convert struct timeval to ktime_t.
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 11:31:39 +0900 (JST)

> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Applied, thanks a lot.
Re: IPv6 Davelopment Tree
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 11:30:30 +0900 (JST)

> Please pull from "net-2.6.22-20070307-FOR_DAVEM-20070307" branch at
> .

Pulled, thank you very much.
Re: TCP 2MSL on loopback
Rick Jones wrote:

The timeout is also to cover datagrams which just got "stuck" somewhere too (IIRC) and may not necessarily require a multiple path situation.

I guess that's a fair point. Originally, the only possible place for a packet to get "stuck" was in a router but I suppose that may no longer be true.

True. Thankfully, the web learned to use persistent connections so later versions of SPECweb benchmarking make use of persistent connections.

As a complete aside, I think it's about time for a SPECldap benchmark...

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent
On 3/6/07, Ralf Baechle <[EMAIL PROTECTED]> wrote:
> This small change btw. delivers about ~ 3% extra performance on a very
> slow test system.

Has this change been tested / benchmarked under VMWare?  pcnet32 is
the (default?) virtual device presented by VMWare Workstation, and
that's probably a large fraction of its use in the field these days.
But then Don probably already knows that.  :-)

Cheers,
- Michael
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
> >	Marking the master down would, I believe, issue notifiers that
> > the device has gone down.  Various things, network manager sort of
> > applications in particular, listen to those, so I'm not sure it's a good
> > idea.  I think there are other side effects as well, I'm thinking it
> > would flush routes associated with the interface as well.

[BTW, you can call ip_mc_down()/ip_mc_up() directly w/o getting there
from the notifiers -- then no side-effects.]

Andy Gospodarek wrote:
>
> I agree with Jay here.  I hate that bonding has to have so much
> knowledge about upper layer protocols, but for the ones that are
> stateful like IGMP we will need fixes like the one proposed.

I have no problem with bonding having knowledge of ULP's (I don't
like it, but I don't have to look at it :-) ), but the patch is
doing it the other way around. What I don't like about the proposed
patch is that it's adding knowledge of bonding to IGMP.

And IGMP does work fine in this case, w/o flooding or the proposed
patch. It just has the risk of losing multicast packets during one
query interval, and that only happens if you're using a switch that
does IGMP snooping.

I'd like the patch a lot better if it were basically this:

mc_bond_fudge(void)
{
	ip_mc_down(masterdev);
	/* do whatever you need to do to switch the slave */
	ip_mc_up(masterdev);
}

That doesn't go through the notifier chain, uses existing functions,
doesn't have any refcnt issues, and most importantly could/should reside
in a bonding source file and not in igmp.c. :-)

But RTNL is required whether you use up/down or roll your own variant,
so it sounds like you have other issues to resolve too.

					+-DLS
Re: [PATCH 1/2] pcnet32: only allocate init_block dma consistent
On Tue, Mar 06, 2007 at 10:45:23AM -0800, Don Fry wrote:

> The patch below moves the init_block out of the private struct and
> only allocates init block with pci_alloc_consistent.
>
> This has two effects:
>
> 1. Performance increase for non cache coherent machines, because the
>    CPU only data in the private struct are now cached

This small change btw. delivers about ~ 3% extra performance on a very
slow test system.

  Ralf
[ATM] ENI: Convert struct timeval to ktime_t.
Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

---
diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 8fccf01..0d3a38b 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -536,7 +536,7 @@ static int rx_aal0(struct atm_vcc *vcc)
 		return 0;
 	}
 	skb_put(skb,length);
-	skb_set_timestamp(skb, &eni_vcc->timestamp);
+	skb->tstamp = eni_vcc->timestamp;
 	DPRINTK("got len %ld\n",length);
 	if (do_rx_dma(vcc,skb,1,length >> 2,length >> 2)) return 1;
 	eni_vcc->rxing++;
@@ -701,7 +701,7 @@ static void get_service(struct atm_dev *dev)
 			DPRINTK("Grr, servicing VCC %ld twice\n",vci);
 			continue;
 		}
-		do_gettimeofday(&ENI_VCC(vcc)->timestamp);
+		ENI_VCC(vcc)->timestamp = ktime_get_real();
 		ENI_VCC(vcc)->next = NULL;
 		if (vcc->qos.rxtp.traffic_class == ATM_CBR) {
 			if (eni_dev->fast)
diff --git a/drivers/atm/eni.h b/drivers/atm/eni.h
index 385090c..d04fefb 100644
--- a/drivers/atm/eni.h
+++ b/drivers/atm/eni.h
@@ -59,7 +59,7 @@ struct eni_vcc {
 	int rxing;			/* number of pending PDUs */
 	int servicing;			/* number of waiting VCs (0 or 1) */
 	int txing;			/* number of pending TX bytes */
-	struct timeval timestamp;	/* for RX timing */
+	ktime_t timestamp;		/* for RX timing */
 	struct atm_vcc *next;		/* next pending RX */
 	struct sk_buff *last;		/* last PDU being DMAed (used to carry
 					   discard information) */

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: IPv6 Davelopment Tree
Dave,

In article <[EMAIL PROTECTED]> (at Tue, 06 Mar 2007 13:36:11 -0800 (PST)), David Miller <[EMAIL PROTECTED]> says:

> From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
> Date: Fri, 23 Feb 2007 12:53:01 +0900 (JST)
>
> > I have cooked up new git tree for IPv6 development.
> > It is available as branch named
> >   2.6.21-rc1-net-2.6-20070223-FOR_DAVEM-20070223
> > at
> >   .
> >
> > I will shift to new branch time to time (e.g. every -rc releases) in order
> > to chase the latest tree.
>
> What is the current branch name?  I'd like to pull whatever
> you have into my net-2.6.22 tree.

Please pull from "net-2.6.22-20070307-FOR_DAVEM-20070307" branch at
.

Thank you.

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: wireless extensions vs. 64-bit architectures
On Tue, Mar 06, 2007 at 07:43:06PM +0100, Michael Buesch wrote: > > Ok, it is wrapping the following ioctls: > > HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl) > > What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT > and some others that also use iw_point? Ok, please check the patch attached. I don't have a box to test that on, and on my 32 bit kernel it is not even compiled, but I believe I got everything all right. Please push that to the usual channels... > Greetings Michael. 
Thanks again, Jean Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]> -- diff -u -p linux/fs/compat_ioctl.j1.c linux/fs/compat_ioctl.c --- linux/fs/compat_ioctl.j1.c 2007-03-06 17:49:33.0 -0800 +++ linux/fs/compat_ioctl.c 2007-03-06 17:56:19.0 -0800 @@ -2553,11 +2553,15 @@ HANDLE_IOCTL(I2C_RDWR, do_i2c_rdwr_ioctl HANDLE_IOCTL(I2C_SMBUS, do_i2c_smbus_ioctl) /* wireless */ HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl) +HANDLE_IOCTL(SIOCGIWPRIV, do_wireless_ioctl) +HANDLE_IOCTL(SIOCGIWSTATS, do_wireless_ioctl) HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl) HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl) +HANDLE_IOCTL(SIOCSIWMLME, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl) +HANDLE_IOCTL(SIOCSIWSCAN, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl) HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl) @@ -2565,6 +2569,11 @@ HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_i HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl) HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl) HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl) +HANDLE_IOCTL(SIOCSIWGENIE, do_wireless_ioctl) +HANDLE_IOCTL(SIOCGIWGENIE, do_wireless_ioctl) +HANDLE_IOCTL(SIOCSIWENCODEEXT, do_wireless_ioctl) +HANDLE_IOCTL(SIOCGIWENCODEEXT, do_wireless_ioctl) +HANDLE_IOCTL(SIOCSIWPMKSA, do_wireless_ioctl) HANDLE_IOCTL(SIOCSIFBR, old_bridge_ioctl) HANDLE_IOCTL(SIOCGIFBR, old_bridge_ioctl) HANDLE_IOCTL(RTC_IRQP_READ32, rtc_ioctl) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
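For readers wondering why ioctls that carry an iw_point need the do_wireless_ioctl wrapper at all: the structure embeds a user pointer, so its size differs between a 32-bit process and a 64-bit kernel. The following is a minimal user-space sketch of that translation, not the kernel code. The struct and function names (iw_point_native, iw_point_compat32, iw_point_from_compat) are made up for illustration, and the field layout is assumed to be pointer, 16-bit length, 16-bit flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as seen by the running kernel: pointer width follows the ABI. */
struct iw_point_native {
	void *pointer;
	uint16_t length;
	uint16_t flags;
};

/* Layout as a 32-bit process passes it: the pointer is always 4 bytes. */
struct iw_point_compat32 {
	uint32_t pointer;
	uint16_t length;
	uint16_t flags;
};

/* Widen a 32-bit request into the native layout, field by field. */
void iw_point_from_compat(struct iw_point_native *dst,
			  const struct iw_point_compat32 *src)
{
	dst->pointer = (void *)(uintptr_t)src->pointer;
	dst->length = src->length;
	dst->flags = src->flags;
}
```

Any ioctl whose payload has this shape and is missing from the HANDLE_IOCTL table would hand the 32-bit layout straight to a handler expecting the native one, which is the gap the patch above closes.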
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
On Tue, Mar 06, 2007 at 03:15:41PM -0800, Jay Vosburgh wrote:
> David Stevens <[EMAIL PROTECTED]> wrote:
>
> >It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better
> >to call that than add a nearly identical function.
>
> Won't ip_mc_up() acquire an additional reference (via
> ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be
> released (in the case of bonding calling the function out of the blue)?
>
> In looking at it, the ip_mc_rejoin_group function (the new one
> added with the patch) is a lot more like igmp_group_added() than
> ip_mc_up(). I'm not sure if the extra bits in igmp_group_added() are
> worthy of concern; I'm thinking not, since im->loaded shouldn't be zero
> coming in for the bonding case.
>
> I think the meat that the "rejoin" wants is what's in
> igmpv3_send_cr(), which appears to do the actual sending stuff. I'm not
> sure if that's better to call directly (and risk locking adventures) or
> to just trip the timer via igmp_ifc_event().
>
> Anyway, it looks like all of this needs to be done under RTNL,
> which isn't the case, so I need to go off and look into reworking it
> again.
>
> Andy: do you have any work in progress on the sleep / rtnl stuff
> we've been discussing?

Jay,

I do, but unfortunately it's much closer to the code I'd proposed originally than the code you sent me. The more I audited your code, the more I like the design -- until I discovered that every time you pause the timers you need to flush the workqueue. This is bad since you are regularly stopping the timers in places where the rtnl lock is taken and the currently running work item may need that lock to complete. With a small enough monitor interval I could deadlock pretty quickly. Without the benefit of the full stop, I couldn't justify the major conversion just yet (plus it feels like keeping a list of the timers is re-implementing what workqueues are designed to do for you).
I've got a patch that seems decent so far, but it's really just a timer->workqueue conversion with some bits thrown in to correctly stop the queues when taking the interface down or when removing the module.

> >Also, real interfaces already do gratuitous IGMP advertisements when
> >they are bounced (the reason there is an ip_mc_up()). Could bonding,
> >when failing over, simply mark the master interface as down, switch, and
> >then mark the master as up again? In addition to doing the right
> >thing for both IPv4 and IPv6 multicasting w/o any code changes in those
> >layers, it may have similar benefits for ARP and neighbor discovery,
> >right?
>
> Marking the master down would, I believe, issue notifiers that
> the device has gone down. Various things, network manager sort of
> applications in particular, listen to those, so I'm not sure it's a good
> idea. I think there are other side effects as well, I'm thinking it
> would flush routes associated with the interface as well.

I agree with Jay here. I hate that bonding has to have so much knowledge about upper layer protocols, but for the ones that are stateful like IGMP we will need fixes like the one proposed.
Re: wireless extensions vs. 64-bit architectures
On Tue, Mar 06, 2007 at 07:43:06PM +0100, Michael Buesch wrote: > > > > Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-) > > Ok, it is wrapping the following ioctls: > > HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl) > HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl) > HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl) > > What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT > and some others that also use iw_point? Yep, good point. SIOCSIWSCAN is up there. I did not realise that all the WPA ioctls are missing. That's easy enough to fix, remember that you have the full description of the ioctls in wireless.c. I'll try to do a patch if I find 5 min, but feel free to forward something to John L. Thanks a lot ! > Greetings Michael. Jean - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] S2IO: Save/Restore unused buffer mappings in 2/3 buffer mode
- Save/Restore unused buffer mappings in 2/3 buffer mode to avoid frequent mapping - Save/Restore adapter reset count during adapter reset (Resending; forgot to cc netdev) Signed-off-by: Santosh Rastapur <[EMAIL PROTECTED]> --- diff -Nurp patch1/drivers/net/s2io.c patch2/drivers/net/s2io.c --- patch1/drivers/net/s2io.c 2007-03-06 03:29:18.0 -0800 +++ patch2/drivers/net/s2io.c 2007-03-06 04:00:41.0 -0800 @@ -84,7 +84,7 @@ #include "s2io.h" #include "s2io-regs.h" -#define DRV_VERSION "2.0.17.1" +#define DRV_VERSION "2.0.19.1" /* S2io Driver name & version. */ static char s2io_driver_name[] = "Neterion"; @@ -2242,6 +2242,7 @@ static int fill_rx_buffers(struct s2io_n struct buffAdd *ba; unsigned long flags; struct RxD_t *first_rxdp = NULL; + u64 Buffer0_ptr = 0, Buffer1_ptr = 0; mac_control = &nic->mac_control; config = &nic->config; @@ -2342,7 +2343,14 @@ static int fill_rx_buffers(struct s2io_n * payload */ + /* save the buffer pointers to avoid frequent dma mapping */ + Buffer0_ptr = ((struct RxD3*)rxdp)->Buffer0_ptr; + Buffer1_ptr = ((struct RxD3*)rxdp)->Buffer1_ptr; memset(rxdp, 0, sizeof(struct RxD3)); + /* restore the buffer pointers for dma sync*/ + ((struct RxD3*)rxdp)->Buffer0_ptr = Buffer0_ptr; + ((struct RxD3*)rxdp)->Buffer1_ptr = Buffer1_ptr; + ba = &mac_control->rings[ring_no].ba[block_no][off]; skb_reserve(skb, BUF0_LEN); tmp = (u64)(unsigned long) skb->data; @@ -3307,6 +3315,7 @@ static void s2io_reset(struct s2io_nic * u16 subid, pci_cmd; int i; u16 val16; + unsigned long long reset_cnt = 0; DBG_PRINT(INIT_DBG,"%s - Resetting XFrame card %s\n", __FUNCTION__, sp->dev->name); @@ -3372,6 +3381,11 @@ new_way: /* Reset device statistics maintained by OS */ memset(&sp->stats, 0, sizeof (struct net_device_stats)); + /* save reset count */ + reset_cnt = sp->mac_control.stats_info->sw_stat.soft_reset_cnt; + memset(sp->mac_control.stats_info, 0, sizeof(struct stat_block)); + /* restore reset count */ + sp->mac_control.stats_info->sw_stat.soft_reset_cnt = 
reset_cnt; /* SXE-002: Configure link and activity LED to turn it off */ subid = sp->pdev->subsystem_device; @@ -4279,9 +4293,7 @@ static void s2io_updt_stats(struct s2io_ if (cnt == 5) break; /* Updt failed */ } while(1); - } else { - memset(sp->mac_control.stats_info, 0, sizeof(struct stat_block)); - } + } } /** - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
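The save/restore idiom used in both hunks of this patch (copy out the fields that must survive, memset the whole structure, write the fields back) can be sketched in isolation as follows. This is an illustration only: stat_block_sketch and sw_stat_sketch are hypothetical stand-ins, not the driver's real struct stat_block layout.

```c
#include <string.h>

/* Hypothetical, much reduced stand-ins for the driver's statistics. */
struct sw_stat_sketch {
	unsigned long long soft_reset_cnt;
	unsigned long long ring_full_cnt;
};

struct stat_block_sketch {
	unsigned long long rx_frames;
	struct sw_stat_sketch sw_stat;
};

/* Zero every counter except the soft reset count. */
void clear_stats_keep_reset_cnt(struct stat_block_sketch *st)
{
	/* save reset count */
	unsigned long long reset_cnt = st->sw_stat.soft_reset_cnt;

	memset(st, 0, sizeof(*st));

	/* restore reset count */
	st->sw_stat.soft_reset_cnt = reset_cnt;
}
```

The same shape applies to the Buffer0_ptr/Buffer1_ptr case: the descriptor is cleared wholesale, and only the fields whose values are still needed afterwards are put back.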
[PATCH 1/2] S2IO: Remove unused variables
- Remove unused variables from s2io_nic structure - Changed the memory failure printk messages to print only in debug mode - Updated the copyright messages (Resending; forgot to cc netdev) Signed-off-by: Santosh Rastapur <[EMAIL PROTECTED]> --- diff -Nurp patch/drivers/net/s2io.c patch1/drivers/net/s2io.c --- patch/drivers/net/s2io.c2007-03-06 03:28:39.0 -0800 +++ patch1/drivers/net/s2io.c 2007-03-06 03:29:18.0 -0800 @@ -1,6 +1,6 @@ / * s2io.c: A Linux PCI-X Ethernet driver for Neterion 10GbE Server NIC - * Copyright(c) 2002-2005 Neterion Inc. + * Copyright(c) 2002-2007 Neterion Inc. * This software may be used and distributed according to the terms of * the GNU General Public License (GPL), incorporated herein by reference. @@ -516,7 +516,7 @@ static int init_shared_mem(struct s2io_n mac_control->fifos[i].list_info = kmalloc(list_holder_size, GFP_KERNEL); if (!mac_control->fifos[i].list_info) { - DBG_PRINT(ERR_DBG, + DBG_PRINT(INFO_DBG, "Malloc failed for list_info\n"); return -ENOMEM; } @@ -542,9 +542,9 @@ static int init_shared_mem(struct s2io_n tmp_v = pci_alloc_consistent(nic->pdev, PAGE_SIZE, &tmp_p); if (!tmp_v) { - DBG_PRINT(ERR_DBG, + DBG_PRINT(INFO_DBG, "pci_alloc_consistent "); - DBG_PRINT(ERR_DBG, "failed for TxDL\n"); + DBG_PRINT(INFO_DBG, "failed for TxDL\n"); return -ENOMEM; } /* If we got a zero DMA address(can happen on @@ -561,9 +561,9 @@ static int init_shared_mem(struct s2io_n tmp_v = pci_alloc_consistent(nic->pdev, PAGE_SIZE, &tmp_p); if (!tmp_v) { - DBG_PRINT(ERR_DBG, + DBG_PRINT(INFO_DBG, "pci_alloc_consistent "); - DBG_PRINT(ERR_DBG, "failed for TxDL\n"); + DBG_PRINT(INFO_DBG, "failed for TxDL\n"); return -ENOMEM; } } @@ -2187,7 +2187,7 @@ static int fill_rxd_3buf(struct s2io_nic /* skb_shinfo(skb)->frag_list will have L4 data payload */ skb_shinfo(skb)->frag_list = dev_alloc_skb(dev->mtu + ALIGN_SIZE); if (skb_shinfo(skb)->frag_list == NULL) { - DBG_PRINT(ERR_DBG, "%s: dev_alloc_skb failed\n ", dev->name); + DBG_PRINT(INFO_DBG, "%s: 
dev_alloc_skb failed\n ", dev->name); return -ENOMEM ; } frag_list = skb_shinfo(skb)->frag_list; @@ -2313,8 +2313,8 @@ static int fill_rx_buffers(struct s2io_n /* allocate skb */ skb = dev_alloc_skb(size); if(!skb) { - DBG_PRINT(ERR_DBG, "%s: Out of ", dev->name); - DBG_PRINT(ERR_DBG, "memory to allocate SKBs\n"); + DBG_PRINT(INFO_DBG, "%s: Out of ", dev->name); + DBG_PRINT(INFO_DBG, "memory to allocate SKBs\n"); if (first_rxdp) { wmb(); first_rxdp->Control_1 |= RXD_OWN_XENA; @@ -2573,8 +2573,8 @@ static int s2io_poll(struct net_device * for (i = 0; i < config->rx_ring_num; i++) { if (fill_rx_buffers(nic, i) == -ENOMEM) { - DBG_PRINT(ERR_DBG, "%s:Out of memory", dev->name); - DBG_PRINT(ERR_DBG, " in Rx Poll!!\n"); + DBG_PRINT(INFO_DBG, "%s:Out of memory", dev->name); + DBG_PRINT(INFO_DBG, " in Rx Poll!!\n"); break; } } @@ -2590,8 +2590,8 @@ no_rx: for (i = 0; i < config->rx_ring_num; i++) { if (fill_rx_buffers(nic, i) == -ENOMEM) { - DBG_PRINT(ERR_DBG, "%s:Out of memory", dev->name); - DBG_PRINT(ERR_DBG, " in Rx Poll!!\n"); + DBG_PRINT(INFO_DBG, "%s:Out of memory", dev->name); + DBG_PRINT(INFO_DBG, " in Rx Poll!!\n"); break; } } @@ -2640,8 +2640,8 @@ static void s2io_netpoll(struct net_devi for (i = 0; i < config->rx_ring_num; i++) { if (fill_rx_buffers(nic, i)
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
[EMAIL PROTECTED] wrote on 03/06/2007 03:15:41 PM: > > David Stevens <[EMAIL PROTECTED]> wrote: > > >It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better > >to call that than add a nearly identical function. > >Won't ip_mc_up() acquire an additional reference (via > ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be > released (in the case of bonding calling the function out of the blue)? Yes. I'm not sure that matters-- destroy_dev doesn't care how many references to a group, and IGMP_ALL_HOSTS isn't advertised (so wouldn't get a "leave" when you only down the interface, like other groups do). But since ip_mc_up() is *entirely* that join plus group_added() on all the existing groups, there really shouldn't be another. But the new device will need the all-hosts group in its hardware multicast filter, too, if it hasn't already been using multicasting. Your "reload" caller could just dec_group that group after calling ip_mc_up(). >In looking at it, the ip_mc_rejoin_group function (the new one > added with the patch) is a lot more like igmp_group_added() than > ip_mc_up(). No, group_added() is one group. mc_up() just calls group_added on all of them, which I think is what the rejoin was trying to do. >I'm not sure if the extra bits in igmp_group_added() are > worthy of concern; I'm thinking not, since im->loaded shouldn't be zero > coming in for the bonding case. "im->loaded" means the device has it in its multicast address filter. If you're switching devices, and didn't do all the multicast stuff on all the devices originally, then you want it to be 0 (and should make it so, like ip_mc_down() does). :-) >I think the meat that the "rejoin" wants is what's in > igmpv3_send_cr(), which appears to do the actual sending stuff. I'm not > sure if that's better to call directly (and risk locking adventures) or > to just trip the timer via igmp_ifc_event(). No, no, no -- please don't mess with those directly. 
It'd be a maintenance nightmare, and multicasting is device independent right now. :-) I'd hope there wouldn't be any bonding-specific code needed at this layer, which is why I hope something like using up/down would work out.

+-DLS
RE: [PATCH] s2io: add PCI error recovery support
Jeff, Please apply and forward this patch upstream. Ram > -Original Message- > From: Ramkrishna Vepa > Sent: Monday, March 05, 2007 2:34 PM > To: 'Linas Vepstas' > Cc: Wen Xiong; linux-kernel@vger.kernel.org; linux- > [EMAIL PROTECTED]; netdev@vger.kernel.org; Jeff Garzik; Andrew > Morton > Subject: RE: [PATCH] s2io: add PCI error recovery support > > Comments on this patch - > > 1. device_close_flag is unused and is not required. > > +static pci_ers_result_t s2io_io_error_detected(struct pci_dev *pdev, > > + pci_channel_state_t > state) > > +{ > ... > > + do_s2io_card_down(sp, 0); > > + sp->device_close_flag = TRUE; /* Device is shut down. */ > > 2. s2io_reset can fail to reset the device. Ideally s2io_reset should > return a failure in this case (return is void now) and in this case could > s2io_io_slot_reset() be called again, maybe try thrice, in total, before > failing to reset the slot? > > Ram > > -Original Message- > > From: Linas Vepstas [mailto:[EMAIL PROTECTED] > > Sent: Thursday, February 15, 2007 3:09 PM > > To: Ramkrishna Vepa; Raghavendra Koushik; Ananda Raju > > Cc: Wen Xiong; linux-kernel@vger.kernel.org; linux- > > [EMAIL PROTECTED]; netdev@vger.kernel.org; Jeff Garzik; > Andrew > > Morton > > Subject: [PATCH] s2io: add PCI error recovery support > > > > > > Koushik, Raju, > > > > Please review, comment, and if you find this acceptable, > > please forward upstream. This patch incorporates all of > > fixes resulting from the last set of discussions, circa > > November 2006. > > > > --linas > > > > This patch adds PCI error recovery support to the > > s2io 10-Gigabit ethernet device driver. Fourth revision, > > blocks interrupts and the watchdog. Adds a flag to > > s2io_down(), to avoid doing I/O when PCI bus is offline. > > > > Tested, seems to work well. 
> > > > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> > > Acked-by: Ramkrishna Vepa <[EMAIL PROTECTED]> > > Cc: Raghavendra Koushik <[EMAIL PROTECTED]> > > Cc: Ananda Raju <[EMAIL PROTECTED]> > > Cc: Wen Xiong <[EMAIL PROTECTED]> > > > > > > drivers/net/s2io.c | 116 > > ++--- > > drivers/net/s2io.h |5 ++ > > 2 files changed, 116 insertions(+), 5 deletions(-) > > > > Index: linux-2.6.20-git4/drivers/net/s2io.c > > === > > --- linux-2.6.20-git4.orig/drivers/net/s2io.c 2007-02-15 > > 15:39:35.0 -0600 > > +++ linux-2.6.20-git4/drivers/net/s2io.c2007-02-15 > 16:15:10.0 - > > 0600 > > @@ -435,11 +435,18 @@ static struct pci_device_id s2io_tbl[] _ > > > > MODULE_DEVICE_TABLE(pci, s2io_tbl); > > > > +static struct pci_error_handlers s2io_err_handler = { > > + .error_detected = s2io_io_error_detected, > > + .slot_reset = s2io_io_slot_reset, > > + .resume = s2io_io_resume, > > +}; > > + > > static struct pci_driver s2io_driver = { > >.name = "S2IO", > >.id_table = s2io_tbl, > >.probe = s2io_init_nic, > >.remove = __devexit_p(s2io_rem_nic), > > + .err_handler = &s2io_err_handler, > > }; > > > > /* A simplifier macro used both by init and free shared_mem Fns(). 
*/ > > @@ -2577,6 +2584,9 @@ static void s2io_netpoll(struct net_devi > > u64 val64 = 0xULL; > > int i; > > > > + if (pci_channel_offline(nic->pdev)) > > + return; > > + > > disable_irq(dev->irq); > > > > atomic_inc(&nic->isr_cnt); > > @@ -3079,6 +3089,8 @@ static void alarm_intr_handler(struct s2 > > int i; > > if (atomic_read(&nic->card_state) == CARD_DOWN) > > return; > > + if (pci_channel_offline(nic->pdev)) > > + return; > > nic->mac_control.stats_info->sw_stat.ring_full_cnt = 0; > > /* Handling the XPAK counters update */ > > if(nic->mac_control.stats_info->xpak_stat.xpak_timer_count < 72000) > > { > > @@ -4117,6 +4129,10 @@ static irqreturn_t s2io_isr(int irq, voi > > struct mac_info *mac_control; > > struct config_param *config; > > > > + /* Pretend we handled any irq's from a disconnected card */ > > + if (pci_channel_offline(sp->pdev)) > > + return IRQ_NONE; > > + > > atomic_inc(&sp->isr_cnt); > > mac_control = &sp->mac_control; > > config = &sp->config; > > @@ -6188,7 +6204,7 @@ static void s2io_rem_isr(struct s2io_nic > > } while(cnt < 5); > > } > > > > -static void s2io_card_down(struct s2io_nic * sp) > > +static void do_s2io_card_down(struct s2io_nic * sp, int do_io) > > { > > int cnt = 0; > > struct XENA_dev_config __iomem *bar0 = sp->bar0; > > @@ -6203,7 +6219,8 @@ static void s2io_card_down(struct s2io_n > > atomic_set(&sp->card_state, CARD_DOWN); > > > > /* disable Tx and Rx traffic on the NIC */ > > - stop_nic(sp); > > + if (do_io) > > + stop_nic(sp); > > > > s2io_rem_isr(sp); > > > > @@ -6211,7 +6228,7 @@ static voi
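Ram's second review comment (let the reset report failure, and retry the slot reset up to three times in total) could look roughly like the sketch below. Everything here is hypothetical: the callback type, reset_with_retries() and the fail_n_times() stub are illustrative names, and the real s2io_reset() returns void in the code under review.

```c
/* A reset routine that returns 0 on success, non-zero on failure. */
typedef int (*reset_fn)(void *ctx);

/* Call reset() up to max_tries times; stop at the first success. */
int reset_with_retries(reset_fn reset, void *ctx, int max_tries)
{
	int i;
	int err = -1;

	for (i = 0; i < max_tries; i++) {
		err = reset(ctx);
		if (err == 0)
			return 0;
	}
	return err;	/* last failure code, or -1 if max_tries <= 0 */
}

/* Demo stub: fails while the counter behind ctx is positive. */
int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

With max_tries set to 3, a device that comes back on the third attempt is reported as recovered, while one that fails three times in a row yields an error the slot_reset handler could turn into a disconnect result.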
Re: [patch 2/2] div64_64: common code
On Tue, 06 Mar 2007 15:21:40 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Tue, 6 Mar 2007 14:32:06 -0800
>
> > ho hum, I didn't know that, so we missed rc2-mm2.
> >
> > Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and
> > sparc-latest please?
>
> When we are in a merge window, I have only one tree for sparc
> and networking.
>
> But when we're in RC mode, I've got multiple trees, one each
> for bug fixes and one for stuff which will get submitted in
> the next merge window.
>
> When Linus pulls in the bug fixes, I rebase the merge window trees so
> that all the fixes get integrated to the merge tree and I can resolve
> any conflicts, if any.
>
> So, which one(s) do you want? :-)

The merge-window things, generally.

I assume from the above that the merge-window tree doesn't contain the material in the bugfixes tree? If so, I guess I'd need both. If not, the merge-window tree should contain everything? I dunno - you know your trees better than I.

The bottom line is I want everything you've got, and it'd be nice to fix this problem where I don't know that a new tree has been opened up, so I miss it - how can we do this?
Re: netlink recvmsg() and MSG_TRUNC
On Tue, Mar 06, 2007 at 04:05:57PM -0800, David Miller wrote:
>
> Actually, more accurately it's using PAGE_SIZE. :)

Aha it's you non-i386 people :)

> I see, so the better fix would be to make glibc's
> netlink_request() function start with a getpagesize()'d
> buffer.

Yes that's a good idea.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: netlink recvmsg() and MSG_TRUNC
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 7 Mar 2007 11:04:19 +1100

> On Tue, Mar 06, 2007 at 04:02:02PM -0800, David Miller wrote:
> >
> > Create a lot of interfaces, try to dump them :-)
>
> Dumps should be done using 4K (NLMSG_GOODSIZE) skb's, where is the problem?

Actually, more accurately it's using PAGE_SIZE. :)

> > GLIBC can even hit this via its ifaddrs.c code.
>
> Do you have a simple test case that I can run?

I see, so the better fix would be to make glibc's netlink_request() function start with a getpagesize()'d buffer.
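As a rough illustration of the suggestion to start from a page-sized receive buffer, a user-space helper might look like the sketch below. This is not glibc's netlink_request(); nl_initial_bufsize() and nl_alloc_recvbuf() are hypothetical names, and the 4096-byte fallback is an assumption for the case where the page size cannot be queried.

```c
#include <stdlib.h>
#include <unistd.h>

/* Initial receive buffer size for a netlink dump: one page. */
size_t nl_initial_bufsize(void)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0)
		page = 4096;	/* assumed fallback, not from the thread */
	return (size_t)page;
}

/* Allocate the initial buffer; the caller frees it with free(). */
char *nl_alloc_recvbuf(size_t *len)
{
	*len = nl_initial_bufsize();
	return malloc(*len);
}
```

Since the kernel sizes dump skbs from PAGE_SIZE (capped at 8192), a buffer of one page should hold any single dump skb on machines whose page size is at most 8K. Larger-page machines are covered as well, because the cap keeps the skb below the page size.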
Re: netlink recvmsg() and MSG_TRUNC
On Tue, Mar 06, 2007 at 03:57:50PM -0800, Stephen Hemminger wrote:
>
> I know some commands send down big blocks of configuration information.
> One example is netem statistical data, but there are others.

You mean dumps? Unless someone is coalescing them I don't see a problem there.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: netlink recvmsg() and MSG_TRUNC
On Tue, Mar 06, 2007 at 04:02:02PM -0800, David Miller wrote:
>
> Create a lot of interfaces, try to dump them :-)

Dumps should be done using 4K (NLMSG_GOODSIZE) skb's, where is the problem?

> GLIBC can even hit this via its ifaddrs.c code.

Do you have a simple test case that I can run?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: netlink recvmsg() and MSG_TRUNC
From: Herbert Xu <[EMAIL PROTECTED]>
Date: Wed, 07 Mar 2007 10:49:07 +1100

> Which netlink family generates (or needs to generate) unbounded
> messages to user-space? Or indeed which ones generate messages
> greater than 64K (or 4K for that matter)?

Create a lot of interfaces, try to dump them :-)

GLIBC can even hit this via its ifaddrs.c code.
Re: netlink recvmsg() and MSG_TRUNC
On Wed, 07 Mar 2007 10:49:07 +1100 Herbert Xu <[EMAIL PROTECTED]> wrote:

> David Miller <[EMAIL PROTECTED]> wrote:
> >
> > I guess one thing the user could do when it sees MSG_TRUNC
> > is keep calling recvmsg() until the receive queue is emptied
> > of packets, in order to get that pesky nlk->cb cleared to
> > NULL, then resubmit.
> >
> > But that's ridiculous and complicated.
> >
> > Any ideas?
>
> Which netlink family generates (or needs to generate) unbounded
> messages to user-space? Or indeed which ones generate messages
> greater than 64K (or 4K for that matter)?
>
> Cheers,

I know some commands send down big blocks of configuration information. One example is netem statistical data, but there are others.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: netlink recvmsg() and MSG_TRUNC
David Miller <[EMAIL PROTECTED]> wrote:
>
> I guess one thing the user could do when it sees MSG_TRUNC
> is keep calling recvmsg() until the receive queue is emptied
> of packets, in order to get that pesky nlk->cb cleared to
> NULL, then resubmit.
>
> But that's ridiculous and complicated.
>
> Any ideas?

Which netlink family generates (or needs to generate) unbounded messages to user-space? Or indeed which ones generate messages greater than 64K (or 4K for that matter)?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: netlink recvmsg() and MSG_TRUNC
On Tue, 6 Mar 2007, David Miller wrote:

> I guess one thing the user could do when it sees MSG_TRUNC
> is keep calling recvmsg() until the receive queue is emptied
> of packets, in order to get that pesky nlk->cb cleared to
> NULL, then resubmit.
>
> But that's ridiculous and complicated.
>
> Any ideas?

Only slightly less complicated: user calls recvmsg() once with a new flag MSG_FLUSH, which causes the queue to be flushed, then resubmits?

- James
-- 
James Morris <[EMAIL PROTECTED]>
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 23:53:49 +0200, Sami Farin wrote:
...
> And I found bug in gcc-4.1.2, it gave 0 for ncubic results
> when doing 1000 loops test... gcc-4.0.3 works.

Found it.

--- cbrt-test.c~	2007-03-07 00:20:54.735248105 +0200
+++ cbrt-test.c	2007-03-07 00:21:03.964864343 +0200
@@ -209,7 +209,7 @@
 	__asm__("bsrl %1,%0\n\t"
 		"cmovzl %2,%0"
-		: "=&r" (r) : "rm" (x), "rm" (-1));
+		: "=&r" (r) : "rm" (x), "rm" (-1) : "memory");
 	return r+1;
 }

Now Linux 2.6 does not have "memory" in fls, maybe it causes some gcc funnies some people are seeing.
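When chasing a suspected miscompile of the bsrl-based routine in the message above, it can help to compare its results against a plain C reference with the same contract: the 1-based position of the highest set bit, and 0 for an input of 0. The loop below is such a reference, offered as a sketch. It is not the kernel's fls(), and the name fls_portable is invented here.

```c
/* 1-based index of the most significant set bit; 0 when x == 0. */
int fls_portable(unsigned int x)
{
	int r = 0;

	/* Shift right until nothing is left, counting the steps. */
	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}
```

Running both versions over the same inputs inside the 1000-iteration test would show whether the asm variant, rather than the surrounding arithmetic, is what produces the zero results.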
Re: TCP 2MSL on loopback
Stephen Hemminger wrote:
> TCP can not assume anything about the path that a packet may take.
> We have declared a moratorium on loopback benchmark foolishness.
> Go optimize the idle loop instead ;-)

Sure - A delay loop with fewer instructions is a worthwhile optimization because it has less impact on a CPU's instruction cache...

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
Re: [patch 2/2] div64_64: common code
From: Andrew Morton <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:32:06 -0800

> ho hum, I didn't know that, so we missed rc2-mm2.
>
> Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and
> sparc-latest please?

When we are in a merge window, I have only one tree for sparc and networking.

But when we're in RC mode, I've got multiple trees, one each for bug fixes and one for stuff which will get submitted in the next merge window.

When Linus pulls in the bug fixes, I rebase the merge window trees so that all the fixes get integrated to the merge tree and I can resolve any conflicts, if any.

So, which one(s) do you want? :-)
netlink recvmsg() and MSG_TRUNC
So if you don't give a large enough buffer to recvmsg() for the netlink response a few things happen:

1) MSG_TRUNC is set

2) The length returned and the amount of data copied is the size given in the recvmsg() call

3) If enough other packets remain in the receive buffer, nlk->cb is left at non-NULL for a partial dump.

This means that you can't just immediately resubmit the original request, or else you'll get NLMSG_ERROR with error set to -EBUSY. This is what netlink_dump_start() does when it sees nlk->cb non-NULL.

Now, the user is basically stuck and there is no real way to recover from this besides doing something like opening up a new netlink socket and then doing the recvmsg() with a larger buffer, wash rinse repeat.

I looked at how some of our standard userspace code handles this and it's not pretty:

1) iproute2 basically just uses a 16K buffer, signals an error when it sees MSG_TRUNC, and that's it, whoopee

2) Thomas's libnl believes that recvmsg() will return the true length necessary to receive the whole message, he signals on this to double the buffer size and try the recvmsg() again. As mentioned recvmsg() never returns a length larger than the given buffer size, so this code never triggers, and if it did it would lose entries because netlink_recvmsg() drops the SKB even when it signals MSG_TRUNC.

The behavior of dropping the SKB matches what UDP does in the case of MSG_TRUNC.

I guess one thing the user could do when it sees MSG_TRUNC is keep calling recvmsg() until the receive queue is emptied of packets, in order to get that pesky nlk->cb cleared to NULL, then resubmit.

But that's ridiculous and complicated.

Any ideas?
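The truncation behaviour described in this message (MSG_TRUNC set, length capped at the buffer size, remainder of the datagram dropped) can be observed from user space with a small sketch like the one below. To keep it runnable without any netlink setup it uses an AF_UNIX datagram socketpair as a stand-in, on the assumption that datagram sockets on Linux behave the same way here. recv_check_trunc() and demo_truncation() are illustrative names, not part of any library.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Receive one datagram into buf. Returns the number of bytes copied
 * (or -1 on error) and sets *truncated when the kernel reported MSG_TRUNC.
 */
ssize_t recv_check_trunc(int fd, void *buf, size_t len, int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return -1;

	*truncated = (msg.msg_flags & MSG_TRUNC) != 0;
	return n;
}

/*
 * Send 64 bytes, read them into a 16-byte buffer. Returns 1 when the
 * read was capped at 16 bytes, flagged as truncated, and nothing of the
 * datagram was left behind for a second read.
 */
int demo_truncation(void)
{
	int sv[2];
	int truncated = 0;
	char big[64];
	char small[16];
	ssize_t n, again;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return 0;

	memset(big, 'x', sizeof(big));
	if (send(sv[0], big, sizeof(big), 0) != (ssize_t)sizeof(big)) {
		close(sv[0]);
		close(sv[1]);
		return 0;
	}

	n = recv_check_trunc(sv[1], small, sizeof(small), &truncated);

	/* The tail was dropped: a non-blocking retry finds an empty queue. */
	again = recv(sv[1], small, sizeof(small), MSG_DONTWAIT);

	close(sv[0]);
	close(sv[1]);
	return n == (ssize_t)sizeof(small) && truncated && again < 0;
}
```

A receiver built this way learns only that data was lost, not how much, which is why simply doubling the buffer and calling recvmsg() again cannot recover the dropped entries.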
Re: [Bonding-devel] [PATCH 3/3] bonding: Improve IGMP join processing
David Stevens <[EMAIL PROTECTED]> wrote: >It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better >to call that than add a nearly identical function. Won't ip_mc_up() acquire an additional reference (via ip_mc_inc_group) to the IGMP_ALL_HOSTS im->users that would never be released (in the case of bonding calling the function out of the blue)? In looking at it, the ip_mc_rejoin_group function (the new one added with the patch) is a lot more like igmp_group_added() than ip_mc_up(). I'm not sure if the extra bits in igmp_group_added() are worthy of concern; I'm thinking not, since im->loaded shouldn't be zero coming in for the bonding case. I think the meat that the "rejoin" wants is what's in igmpv3_send_cr(), which appears to do the actual sending stuff. I'm not sure if that's better to call directly (and risk locking adventures) or to just trip the timer via igmp_ifc_event(). Anyway, it looks like all of this needs to be done under RTNL, which isn't the case, so I need to go off and look into reworking it again. Andy: do you have any work in progress on the sleep / rtnl stuff we've been discussing? >Also, real interfaces already do gratuitous IGMP advertisements when >they are bounced (the reason there is an ip_mc_up()). Could bonding, >when failing over, simply mark the master interface as down, switch, and >then mark the master as up again? In addition to doing the right >thing for both IPv4 and IPv6 multicasting w/o any code changes in those >layers, it may have similar benefits for ARP and neighbor discovery, >right? Marking the master down would, I believe, issue notifiers that the device has gone down. Various things, network manager sort of applications in particular, listen to those, so I'm not sure it's a good idea. I think there are other side effects as well, I'm thinking it would flush routes associated with the interface as well. 
	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 10:29:41 -0800, Stephen Hemminger wrote:
> Don't count the existing Newton-Raphson out. It turns out that to get enough
> precision for 32 bits, only 4 iterations are needed. By unrolling those, it
> gets much better timing.
>
> Slightly gross test program (with original cubic wraparound bug fixed).
...
> {~0, 2097151},
       ^^^ this should be 2642245.

Without a serializing instruction before rdtsc and with one loop I do not get very accurate results (104 for ncubic, > 1000 for others).

#define rdtscll_serialize(val) \
	__asm__ __volatile__("movl $0, %%eax\n\tcpuid\n\trdtsc\n" : "=A" (val) : : "ebx", "ecx")

Here are Pentium D timings for 1000 loops.

~0, 2097151

Function     clocks  mean(us)   max(us)   std(us) total error
ocubic          912     0.306    20.317     0.730       545101
ncubic          777     0.261    14.799     0.486       576263
acbrt          1168     0.392    21.681     0.547       547562
hcbrt           827     0.278    15.244     0.387         2410

~0, 2642245

Function     clocks  mean(us)   max(us)   std(us) total error
ocubic          908     0.305    20.210     0.656            7
ncubic          775     0.260    14.792     0.550        31169
acbrt          1176     0.395    22.017     0.970         2468
hcbrt           826     0.278    15.326     0.670       547504

I also found a bug in gcc-4.1.2: it gave 0 for the ncubic results when doing the 1000 loops test... gcc-4.0.3 works.
cube root benchmark code
Here is a better version of the benchmark code. It has the original code used in 2.4 version of Cubic for comparison --- /* Test and measure perf of cube root algorithms. */ #include #include #include #include #include #ifdef __x86_64 #define rdtscll(val) do { \ unsigned int __a,__d; \ asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); \ (val) = ((unsigned long)__a) | (((unsigned long)__d)<<32); \ } while(0) # define do_div(n,base) ({ \ uint32_t __base = (base); \ uint32_t __rem; \ __rem = ((uint64_t)(n)) % __base; \ (n) = ((uint64_t)(n)) / __base; \ __rem; \ }) /** * __ffs - find first bit in word. * @word: The word to search * * Undefined if no bit exists, so code should check against 0 first. */ static __inline__ unsigned long __ffs(unsigned long word) { __asm__("bsfq %1,%0" :"=r" (word) :"rm" (word)); return word; } /* * __fls: find last bit set. * @word: The word to search * * Undefined if no zero exists, so code should check against ~0UL first. */ static inline unsigned long __fls(unsigned long word) { __asm__("bsrq %1,%0" :"=r" (word) :"rm" (word)); return word; } /** * ffs - find first bit set * @x: the word to search * * This is defined the same way as * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz (man ffs). */ static __inline__ int ffs(int x) { int r; __asm__("bsfl %1,%0\n\t" "cmovzl %2,%0" : "=r" (r) : "rm" (x), "r" (-1)); return r+1; } /** * fls - find last bit set * @x: the word to search * * This is defined the same way as ffs. */ static inline int fls(int x) { int r; __asm__("bsrl %1,%0\n\t" "cmovzl %2,%0" : "=&r" (r) : "rm" (x), "rm" (-1)); return r+1; } /** * fls64 - find last bit set in 64 bit word * @x: the word to search * * This is defined the same way as fls. 
*/ static inline int fls64(uint64_t x) { if (x == 0) return 0; return __fls(x) + 1; } static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor) { return dividend / divisor; } #elif __i386 #define rdtscll(val) \ __asm__ __volatile__("rdtsc" : "=A" (val)) /** * ffs - find first bit set * @x: the word to search * * This is defined the same way as * the libc and compiler builtin ffs routines, therefore * differs in spirit from the above ffz() (man ffs). */ static inline int ffs(int x) { int r; __asm__("bsfl %1,%0\n\t" "jnz 1f\n\t" "movl $-1,%0\n" "1:" : "=r" (r) : "rm" (x)); return r+1; } /** * fls - find last bit set * @x: the word to search * * This is defined the same way as ffs(). */ static inline int fls(int x) { int r; __asm__("bsrl %1,%0\n\t" "jnz 1f\n\t" "movl $-1,%0\n" "1:" : "=r" (r) : "rm" (x)); return r+1; } static inline int fls64(uint64_t x) { uint32_t h = x >> 32; if (h) return fls(h) + 32; return fls(x); } #define do_div(n,base) ({ \ unsigned long __upper, __low, __high, __mod, __base; \ __base = (base); \ asm("":"=a" (__low), "=d" (__high):"A" (n)); \ __upper = __high; \ if (__high) { \ __upper = __high % (__base); \ __high = __high / (__base); \ } \ asm("divl %2":"=a" (__low), "=d" (__mod):"rm" (__base), "0" (__low), "1" (__upper)); \ asm("":"=A" (n):"a" (__low),"d" (__high)); \ __mod; \ }) /* 64bit divisor, dividend and result. dynamic precision */ static uint64_t div64_64(uint64_t dividend, uint64_t divisor) { uint32_t d = divisor; if (divisor > 0xULL) { unsigned int shift = fls(divisor >> 32); d = divisor >> shift; dividend >>= shift; } /* avoid 64 bit division if possible */ if (dividend >> 32) do_div(dividend, d); else dividend = (uint32_t) dividend / d; return dividend; } #endif /* Andi Kleen's version */ uint32_t acbrt(uint64_t x) { uint32_t y = 0; int s; for (s = 63; s >= 0; s -= 3) { uint64_t b, bs; y = 2 * y; b = 3 * y * (y+1) + 1; bs = b << s; if (x >= bs && (b == (bs>>s))) { /* avoid overflow */ x -= bs; y++; } }
[PATCH] TCP Yeah: cleanup
Eliminate need for full 6/4/64 divide to compute queue. Variable maxqueue was really a constant. Fix indentation. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/ipv4/tcp_yeah.c | 42 +++--- 1 file changed, 23 insertions(+), 19 deletions(-) --- net-2.6.22.orig/net/ipv4/tcp_yeah.c 2007-03-06 11:46:34.0 -0800 +++ net-2.6.22/net/ipv4/tcp_yeah.c 2007-03-06 11:54:54.0 -0800 @@ -74,7 +74,7 @@ } static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, -u32 seq_rtt, u32 in_flight, int flag) + u32 seq_rtt, u32 in_flight, int flag) { struct tcp_sock *tp = tcp_sk(sk); struct yeah *yeah = inet_csk_ca(sk); @@ -142,8 +142,8 @@ */ if (yeah->cntRTT > 2) { - u32 rtt; - u32 queue, maxqueue; + u32 rtt, queue; + u64 bw; /* We have enough RTT samples, so, using the Vegas * algorithm, we determine if we should increase or @@ -158,32 +158,36 @@ */ rtt = yeah->minRTT; - queue = (u32)div64_64((u64)tp->snd_cwnd * (rtt - yeah->baseRTT), rtt); - - maxqueue = TCP_YEAH_ALPHA; - - if (queue > maxqueue || - rtt - yeah->baseRTT > (yeah->baseRTT / TCP_YEAH_PHY)) { - - if (queue > maxqueue && tp->snd_cwnd > yeah->reno_count) { - u32 reduction = min( queue / TCP_YEAH_GAMMA , -tp->snd_cwnd >> TCP_YEAH_EPSILON ); + /* Compute excess number of packets above bandwidth +* Avoid doing full 64 bit divide. 
+*/ + bw = tp->snd_cwnd; + bw *= rtt - yeah->baseRTT; + do_div(bw, rtt); + queue = bw; + + if (queue > TCP_YEAH_ALPHA || + rtt - yeah->baseRTT > (yeah->baseRTT / TCP_YEAH_PHY)) { + if (queue > TCP_YEAH_ALPHA + && tp->snd_cwnd > yeah->reno_count) { + u32 reduction = min(queue / TCP_YEAH_GAMMA , + tp->snd_cwnd >> TCP_YEAH_EPSILON); tp->snd_cwnd -= reduction; - tp->snd_cwnd = max( tp->snd_cwnd, yeah->reno_count); + tp->snd_cwnd = max(tp->snd_cwnd, + yeah->reno_count); tp->snd_ssthresh = tp->snd_cwnd; - } + } if (yeah->reno_count <= 2) - yeah->reno_count = max( tp->snd_cwnd>>1, 2U); + yeah->reno_count = max(tp->snd_cwnd>>1, 2U); else yeah->reno_count++; - yeah->doing_reno_now = - min_t( u32, yeah->doing_reno_now + 1 , 0xff); - + yeah->doing_reno_now = min(yeah->doing_reno_now + 1, + 0xffU); } else { yeah->fast_count++; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
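The point of the change above is that the queue estimate only needs a 64-bit dividend with a 32-bit divisor, which is the case do_div() covers. A rough userspace sketch of the same arithmetic follows; plain C division stands in for the kernel's do_div(), and the function name yeah_queue() is an assumption made for this example.

```c
#include <stdint.h>

/*
 * Estimate the number of packets queued in the network:
 *   queue = cwnd * (rtt - base_rtt) / rtt
 * The product is widened to 64 bits, but the divisor stays 32 bits wide.
 * Plain C division is used here as a userspace stand-in for do_div().
 * The caller must pass rtt != 0 and rtt >= base_rtt.
 */
static uint32_t yeah_queue(uint32_t cwnd, uint32_t rtt, uint32_t base_rtt)
{
	uint64_t bw = cwnd;

	bw *= rtt - base_rtt;	/* 64-bit product, cannot overflow */
	bw /= rtt;		/* 64-bit dividend, 32-bit divisor */
	return (uint32_t)bw;
}
```

For example, a window of 100 packets with rtt = 120 and base_rtt = 100 gives 100 * 20 / 120 = 16 queued packets.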
Re: TCP 2MSL on loopback
On Tue, 06 Mar 2007 14:07:09 -0800 Howard Chu <[EMAIL PROTECTED]> wrote: > David Miller wrote: > > From: Rick Jones <[EMAIL PROTECTED]> > > Date: Tue, 06 Mar 2007 13:25:35 -0800 > > > >>> On the other hand, being able to configure a small MSL for the loopback > >>> device is perfectly safe. Being able to configure a small MSL for other > >>> interfaces may be safe, depending on the rest of the network layout. > >> A peanut gallery question - I seem to recall prior discussions about how > >> one cannot assume that a packet destined for a given IP address will > >> remain detined for that given IP address as it could go through a module > >> that will rewrite headers etc. > > > > That's right, both netfilter and the packet scheduler actions > > can do that, that's why this whole idea about changing the MSL > > on loopback by default is wrong and pointless. > > If the headers get rewritten and the packet gets directed elsewhere, > then we're no longer talking about a loopback connection, so that's > outside the discussion. > > If the packet gets munged by multiple filters but still eventually gets > to the specified destination, OK. But regardless, if both endpoints of > the connection are on the loopback device, then there is nothing wrong > with the idea. Those filters can only do so much, they still have to > preserve the reliable in-order delivery semantics of TCP, otherwise the > system is broken. > > It may not have much use, sure, I admitted that much from the outset. > > So I'll leave it at this, thanks for the feedback. TCP can not assume anything about the path that a packet may take. We have declared a moratorium on loopback benchmark foolishness. Go optimize the idle loop instead ;-) -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] ARP notify option
On Tue, 06 Mar 2007 15:18:07 -0600 "Chris Friesen" <[EMAIL PROTECTED]> wrote: > Stephen Hemminger wrote: > > > +arp_notify - BOOLEAN > > + Define mode for notification of address and device changes. > > + 0 - (default): do nothing > > + 1 - Generate gratuitous arp replies when device is brought up > > + or hardware address changes. > > Did you consider using gratuitous arp requests instead? I remember > reading about some hardware that updated its arp cache on gratuitous > requests but not gratuitous replies. > > Chris I copied the ARP generation from other places that were doing gratuitous ARP already: Xen and irlan. Our local switch used REPLY's to do the same thing. One could imagine making it a ternary value and having 2 generate REQUEST's. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] tcp_cubic: faster cube root
The Newton-Raphson method is quadratically convergent so only a small fixed number of steps are necessary. Therefore it is faster to unroll the loop. Since div64_64 is no longer inline it won't cause code explosion.

Also fixes a bug that can occur if x^2 was bigger than 32 bits.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/ipv4/tcp_cubic.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

--- net-2.6.22.orig/net/ipv4/tcp_cubic.c	2007-03-06 12:24:34.0 -0800
+++ net-2.6.22/net/ipv4/tcp_cubic.c	2007-03-06 14:43:37.0 -0800
@@ -96,23 +96,17 @@
  */
 static u32 cubic_root(u64 a)
 {
-	u32 x, x1;
+	u64 x;
 
 	/* Initial estimate is based on:
 	 * cbrt(x) = exp(log(x) / 3)
 	 */
 	x = 1u << (fls64(a)/3);
 
-	/*
-	 * Iteration based on:
-	 *                         2
-	 * x    = ( 2 * x  +  a / x  ) / 3
-	 *  k+1          k         k
-	 */
-	do {
-		x1 = x;
-		x = (2 * x + (uint32_t) div64_64(a, x*x)) / 3;
-	} while (abs(x1 - x) > 1);
+	/* converges to 32 bits in 3 iterations */
+	x = (2 * x + div64_64(a, x*x)) / 3;
+	x = (2 * x + div64_64(a, x*x)) / 3;
+	x = (2 * x + div64_64(a, x*x)) / 3;
 
 	return x;
 }
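For readers who want to try the unrolled iteration outside the kernel, here is a small userspace sketch. It is not the patch itself: fls64_sketch() uses the GCC/Clang builtin __builtin_clzll as a stand-in for the kernel's fls64(), plain 64-bit division replaces div64_64(), and the a == 0 guard is an addition of this sketch.

```c
#include <stdint.h>

/* Position of the highest set bit, counting from 1; 0 when x == 0. */
static int fls64_sketch(uint64_t x)
{
	return x ? 64 - __builtin_clzll(x) : 0;
}

/*
 * Approximate integer cube root with three unrolled Newton-Raphson steps:
 *   x_{k+1} = (2 * x_k + a / x_k^2) / 3
 * The a == 0 guard is an addition of this sketch; without it the first
 * step yields x = 0 and the second step divides by zero.
 */
static uint32_t cubic_root_sketch(uint64_t a)
{
	uint64_t x;

	if (a == 0)
		return 0;

	/* Initial estimate: 2^(bits(a) / 3) */
	x = 1ull << (fls64_sketch(a) / 3);

	x = (2 * x + a / (x * x)) / 3;
	x = (2 * x + a / (x * x)) / 3;
	x = (2 * x + a / (x * x)) / 3;

	return (uint32_t)x;
}
```

Keeping x as a 64-bit value means x * x is computed without 32-bit wraparound, which matches the second point in the changelog. The result is an approximation; the benchmark figures elsewhere in this thread report a nonzero total error for the unrolled variant.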
Re: [patch 2/2] div64_64: common code
On Tue, 06 Mar 2007 10:11:40 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote: > From: [EMAIL PROTECTED] > Date: Tue, 06 Mar 2007 02:42:28 -0800 > > > From: Stephen Hemminger <[EMAIL PROTECTED]> > > > > Implement div64_64(): 64-bit by 64-bit division. Needed by networking (at > > least). > > This patch, with the types.h fixes of your's, is already in my > net-2.6.22 GIT tree if you'd like to start pulling from there > Andrew. ho hum, I didn't know that, so we missed rc2-mm2. Could I have symlinks in /pub/scm/linux/kernel/git/davem/ to net-latest and sparc-latest please? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] bonding: Improve IGMP join processing
It looks to me like "rejoin" is essentially ip_mc_up(), and it'd be better to call that than add a nearly identical function.

Also, real interfaces already do gratuitous IGMP advertisements when they are bounced (the reason there is an ip_mc_up()). Could bonding, when failing over, simply mark the master interface as down, switch, and then mark the master as up again? In addition to doing the right thing for both IPv4 and IPv6 multicasting w/o any code changes in those layers, it may have similar benefits for ARP and neighbor discovery, right? Maybe not-- haven't looked at it...

One down side for IPv6 (which apparently bonding doesn't support) is that static addresses are lost when the device goes down, but that's a difference from IPv4 that should be fixed.

	+-DLS
Re: [RFC PATCH]: Dynamically sized routing cache hash table.
From: Robert Olsson <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 14:26:04 +0100

> David Miller writes:
>
> > Actually, more accurately, the conflict exists in how this GC
> > logic is implemented. The core issue is that hash table size
> > guides the GC processing, and hash table growth therefore
> > modifies those GC goals. So with the patch below we'll just
> > keep growing the hash table instead of giving GC some time to
> > try to keep the working set in equilibrium before doing the
> > hash grow.
>
> AFIK the equilibrium is resizing function as well but using fixed
> hash table. So can we do without equilibrium resizing if tables
> are dynamic? I think so
>
> With the hash data structure we could monitor the average chain
> length or just size and resize hash after that.

I'm not so sure, it may be a mistake to eliminate the equilibrium logic. One error I think it does have is the usage of chain length.

Even a nearly perfect hash has small lumps in distribution, and we should not penalize entries which fall into these lumps.

Let us call T the threshold at which we would grow the routing hash table. As we approach T we start to GC. Let's assume the hash table has shift = 2, and T would (with the T=N+(N>>1) algorithm) therefore be 6.

TABLE:	[0] DST1, DST2
	[1] DST3, DST4, DST5

DST6 arrives, what should we do?

If we just accept it and don't GC some existing entries, we will grow the hash table. This is the wrong thing to do if our true working set is smaller than 6 entries and thus some of the existing entries are unlikely to be reused and thus could be purged to keep us from hitting T.

If they are all active, growing is the right thing to do.

This is the crux of the whole routing cache problem.

I am of the opinion that LRU, for routes not attached to sockets, is probably the best thing to do here.

Furthermore at high packet rates, the current rt_may_expire() logic probably is not very effective since its granularity is limited to jiffies. We can quite easily create 100,000 or more entries per jiffy when HZ=100 during rDOS, for example. So perhaps some global LRU algorithm using ktime is more appropriate.

Global LRU is not easy without touching a lot of memory. But I'm sure some clever trick can be discovered by someone :)

It is amusing, but it seems that for the rDOS workload the most optimal routing hash would be a tiny one like my example above. All packets essentially miss the routing cache and create a new entry. So keeping the working set as small as possible is what you want to do, since no matter how large you grow your hit rate will be zero :-)
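The threshold arithmetic in the example (shift = 2, so N = 4 and T = 6) can be written out as a tiny sketch. The function names are invented for illustration, and treating "reaching T" as the grow condition is an assumption read off the DST6 example, not something taken from the patch under discussion.

```c
#include <stdint.h>

/* Grow threshold T = N + (N >> 1) for a table of N = 2^shift buckets. */
static uint32_t rt_grow_threshold(unsigned int shift)
{
	uint32_t n = 1u << shift;

	return n + (n >> 1);
}

/*
 * Nonzero when accepting one more entry would reach T, i.e. when the
 * table would be grown unless GC purges something first.
 * Assumption: the table grows as soon as the entry count reaches T.
 */
static int rt_would_grow(uint32_t entries, unsigned int shift)
{
	return entries + 1 >= rt_grow_threshold(shift);
}
```

With the five entries DST1 to DST5 already cached, rt_would_grow(5, 2) is true, which is the moment at which the mail argues GC should first try to evict unused entries.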
Re: TCP 2MSL on loopback
David Miller wrote:
> From: Rick Jones <[EMAIL PROTECTED]>
> Date: Tue, 06 Mar 2007 13:25:35 -0800
>
>>> On the other hand, being able to configure a small MSL for the loopback
>>> device is perfectly safe. Being able to configure a small MSL for other
>>> interfaces may be safe, depending on the rest of the network layout.
>> A peanut gallery question - I seem to recall prior discussions about how
>> one cannot assume that a packet destined for a given IP address will
>> remain detined for that given IP address as it could go through a module
>> that will rewrite headers etc.
>
> That's right, both netfilter and the packet scheduler actions
> can do that, that's why this whole idea about changing the MSL
> on loopback by default is wrong and pointless.

If the headers get rewritten and the packet gets directed elsewhere, then we're no longer talking about a loopback connection, so that's outside the discussion.

If the packet gets munged by multiple filters but still eventually gets to the specified destination, OK. But regardless, if both endpoints of the connection are on the loopback device, then there is nothing wrong with the idea. Those filters can only do so much, they still have to preserve the reliable in-order delivery semantics of TCP, otherwise the system is broken.

It may not have much use, sure, I admitted that much from the outset.

So I'll leave it at this, thanks for the feedback.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
Re: [RFC] ARP notify option
Stephen Hemminger wrote: +arp_notify - BOOLEAN + Define mode for notification of address and device changes. + 0 - (default): do nothing + 1 - Generate gratuitous arp replies when device is brought up + or hardware address changes. Did you consider using gratuitous arp requests instead? I remember reading about some hardware that updated its arp cache on gratuitous requests but not gratuitous replies. Chris - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions
From: Baruch Even <[EMAIL PROTECTED]> Date: Wed, 7 Mar 2007 00:01:46 +0200 > * David Miller <[EMAIL PROTECTED]> [070306 23:47]: > > From: Baruch Even <[EMAIL PROTECTED]> > > Date: Tue, 6 Mar 2007 21:42:59 +0200 > > > > > * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]: > > > > + newtp->highest_sack = treq->snt_isn + 1; > > > > > > That's the only initialization that you have for highest_sack, I think > > > that you should initialize it when a loss is detected to the start_seq > > > of the first packet that wasn't acked. > > > > He also sets it in tcp_sacktag_write_queue() like this: > > > > + > > + if (after(TCP_SKB_CB(skb)->seq, > > + tp->highest_sack)) > > + tp->highest_sack = TCP_SKB_CB(skb)->seq; > > Yes, but that's still not enough if between the start of the connection > and the first sack block we already wrapped around to before the old > highest_sack. It might not be a common occurrence but it's still > something to take care of. Aha, I see, yes good point. That would need to be fixed. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions
* David Miller <[EMAIL PROTECTED]> [070306 23:47]: > From: Baruch Even <[EMAIL PROTECTED]> > Date: Tue, 6 Mar 2007 21:42:59 +0200 > > > * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]: > > > + newtp->highest_sack = treq->snt_isn + 1; > > > > That's the only initialization that you have for highest_sack, I think > > that you should initialize it when a loss is detected to the start_seq > > of the first packet that wasn't acked. > > He also sets it in tcp_sacktag_write_queue() like this: > > + > + if (after(TCP_SKB_CB(skb)->seq, > + tp->highest_sack)) > + tp->highest_sack = TCP_SKB_CB(skb)->seq; Yes, but that's still not enough if between the start of the connection and the first sack block we already wrapped around to before the old highest_sack. It might not be a common occurrence but it's still something to take care of. Baruch - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
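The wraparound concern is easier to see with the sequence comparison written out. Below is a minimal userspace sketch: seq_after() is written the way TCP sequence comparison is usually done (signed 32-bit difference) and is assumed to match what after() does in the quoted hunk, while update_highest_sack() is a made-up helper that mirrors the quoted update.

```c
#include <stdint.h>

/* Nonzero when seq1 is later than seq2 in 32-bit sequence space. */
static int seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/*
 * Mirror of the quoted update: advance highest_sack only when the newly
 * SACKed sequence compares as later. Hypothetical helper for illustration.
 */
static uint32_t update_highest_sack(uint32_t highest_sack, uint32_t seq)
{
	return seq_after(seq, highest_sack) ? seq : highest_sack;
}
```

The comparison is only meaningful while the two values are less than 2^31 apart. If highest_sack still holds the value from connection setup and the first SACK block arrives more than 2^31 bytes later, the newer sequence compares as "before" and the stale value is kept, which is the case raised above.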
Re: [RFC] div64_64 support
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 10:29:41 -0800

> /* calculate the cubic root of x using Newton-Raphson */
> static uint32_t ncubic(uint64_t a)
> {
> 	uint64_t x;
>
> 	/* Initial estimate is based on:
> 	 * cbrt(x) = exp(log(x) / 3)
> 	 */
> 	x = 1u << (fls64(a)/3);
>
> 	/* Converges in 3 iterations to > 32 bits */
>
> 	x = (2 * x + div64_64(a, x*x)) / 3;
> 	x = (2 * x + div64_64(a, x*x)) / 3;
> 	x = (2 * x + div64_64(a, x*x)) / 3;
>
> 	return x;
> }

Indeed that will be the fastest variant for cpus with hw integer division.

I did a quick sparc64 port, here is what I got:

Function     clocks  mean(us)   max(us)   std(us) total error
ocubic          529      0.35     15.16      0.66       545101
ncubic          498      0.33     12.83      0.36       576263
acbrt           427      0.28     11.04      0.33       547562
hcbrt           393      0.26     10.18      0.47         2410
Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions
From: Baruch Even <[EMAIL PROTECTED]> Date: Tue, 6 Mar 2007 21:42:59 +0200 > * Ilpo J?rvinen <[EMAIL PROTECTED]> [070306 14:52]: > > + newtp->highest_sack = treq->snt_isn + 1; > > That's the only initialization that you have for highest_sack, I think > that you should initialize it when a loss is detected to the start_seq > of the first packet that wasn't acked. He also sets it in tcp_sacktag_write_queue() like this: + + if (after(TCP_SKB_CB(skb)->seq, + tp->highest_sack)) + tp->highest_sack = TCP_SKB_CB(skb)->seq; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/2] div64_64: common code
From: Ralf Baechle <[EMAIL PROTECTED]> Date: Tue, 6 Mar 2007 19:46:32 + > On Tue, Mar 06, 2007 at 02:42:28AM -0800, [EMAIL PROTECTED] wrote: > > > Implement div64_64(): 64-bit by 64-bit division. Needed by networking (at > > least). > > Your patch only implements div64_64() for 32-bit MIPS. Below patch adds > the trivial 64-bit bits. > > Ralf > > Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]> Applied to net-2.6.22, thanks Ralf. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix compat_sock_common_getsockopt typo
From: James Morris <[EMAIL PROTECTED]>
Date: Tue, 6 Mar 2007 10:06:00 -0500 (EST)

> On Tue, 6 Mar 2007, Johannes Berg wrote:
>
> > This patch fixes a typo in compat_sock_common_getsockopt.
> >
> > Signed-off-by: Johannes Berg <[EMAIL PROTECTED]>
> >
> > --- wireless-dev.orig/net/core/sock.c	2007-03-06 15:44:15.618565674 +0100
> > +++ wireless-dev/net/core/sock.c	2007-03-06 15:44:25.948565674 +0100
> > @@ -1597,7 +1597,7 @@ int compat_sock_common_getsockopt(struct
> >  {
> >  	struct sock *sk = sock->sk;
> >
> > -	if (sk->sk_prot->compat_setsockopt != NULL)
> > +	if (sk->sk_prot->compat_getsockopt != NULL)
> >  		return sk->sk_prot->compat_getsockopt(sk, level, optname,
> >  						      optval, optlen);
> >  	return sk->sk_prot->getsockopt(sk, level, optname, optval, optlen);
> >
> Acked-by: James Morris <[EMAIL PROTECTED]>

Applied, thanks everyone.
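Why the one-word typo matters can be shown with a cut-down dispatch table. This is only an illustrative sketch: struct proto_sketch and its single-argument handlers are hypothetical simplifications, not the kernel's struct proto.

```c
#include <stddef.h>

/* Hypothetical, cut-down protocol ops table; not the real struct proto. */
struct proto_sketch {
	int (*getsockopt)(int optname);
	int (*compat_setsockopt)(int optname);
	int (*compat_getsockopt)(int optname);
};

/* Example handlers used to exercise the dispatch. */
static int plain_get(int optname)  { return optname + 100; }
static int compat_set(int optname) { (void)optname; return -1; }

/*
 * Dispatch as in the fixed code: test the same pointer that gets called.
 * Testing compat_setsockopt here instead would call a NULL
 * compat_getsockopt for any protocol that only provides the "set" hook.
 */
static int common_getsockopt_sketch(const struct proto_sketch *p, int optname)
{
	if (p->compat_getsockopt != NULL)
		return p->compat_getsockopt(optname);
	return p->getsockopt(optname);
}
```

A protocol that fills in compat_setsockopt but leaves compat_getsockopt as NULL falls through to the plain getsockopt handler, which is the behaviour the fix restores.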
Re: IPv6 Davelopment Tree
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Date: Fri, 23 Feb 2007 12:53:01 +0900 (JST) > I have cooked up new git tree for IPv6 development. > It is available as branch named > 2.6.21-rc1-net-2.6-20070223-FOR_DAVEM-20070223 > at > . > > I will shift to new branch time to time (e.g. every -rc releases) in order > to chase the latest tree. What is the current branch name? I'd like to pull whatever you have into my net-2.6.22 tree. Thank you. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP 2MSL on loopback
From: Rick Jones <[EMAIL PROTECTED]> Date: Tue, 06 Mar 2007 13:25:35 -0800 > > On the other hand, being able to configure a small MSL for the loopback > > device is perfectly safe. Being able to configure a small MSL for other > > interfaces may be safe, depending on the rest of the network layout. > > A peanut gallery question - I seem to recall prior discussions about how > one cannot assume that a packet destined for a given IP address will > remain detined for that given IP address as it could go through a module > that will rewrite headers etc. That's right, both netfilter and the packet scheduler actions can do that, that's why this whole idea about changing the MSL on loopback by default is wrong and pointless. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: many sockets, slow sendto
Hi,

Thanks a lot for the advice, I'll give it a try. And sorry for the stupid webmail, I will not use it again.

Zacco

Andi Kleen wrote:
> Zaccomer Lajos <[EMAIL PROTECTED]> writes:
>
>> I'm playing around with a simulation, in which many thousands of IP
>> addresses (on interface aliases) are used to send/receive TCP/UDP
>
> Something seems to be wrong with your emailer. It adds a empty line
> between each real line.
>
>> packets. I noticed that the time of send/sendto increased linearly with
>> the number of file descriptors, and I found it rather strange.
>
> Yes that is strange. I would suggest you use oprofile to identify which
> parts of the kernel use the CPU time with many descriptors.
>
> -Andi
Re: TCP 2MSL on loopback
> On the other hand, being able to configure a small MSL for the loopback
> device is perfectly safe. Being able to configure a small MSL for other
> interfaces may be safe, depending on the rest of the network layout.

A peanut gallery question - I seem to recall prior discussions about how one cannot assume that a packet destined for a given IP address will remain destined for that given IP address as it could go through a module that will rewrite headers etc.

Is traffic destined for 127.0.0.1 immune from that?

rick jones
Re: TCP 2MSL on loopback
Eric Dumazet wrote:
> Arf... dont tell me you forgot to do this...
>
> echo 1 >/proc/sys/net/ipv4/tcp_tw_recycle
> echo 1 >/proc/sys/net/ipv4/tcp_tw_reuse

That does not appear to me to be a safe thing to do on a production machine. Tweaks that are only good in a test environment really don't help the testing effort; they just mask a problem that will surface later at deployment time. We could run our benchmarks this way and get high rates but no one deploying the server for real use would ever get anything like that, which makes the benchmark figure rather pointless.

On the other hand, being able to configure a small MSL for the loopback device is perfectly safe. Being able to configure a small MSL for other interfaces may be safe, depending on the rest of the network layout.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  Chief Architect, OpenLDAP     http://www.openldap.org/project/
Multipath routing in Linux 2.6
Hello list, I've been trying to figure out how to make equal-cost multipath routing work, with no luck. Asked on the LARTC list with no success, and attempts to contact the two authors privately yielded one bounce while the other declined to answer in private and pointed me to this list. So I'll just include the rest of the mail I sent them here (after doing a search-and-replace on the first three octets of all addresses, I'm a bit paranoid), hopefully someone has some suggestions for me... I'm using 2.6.20 on an x86_64 machine. I'm adding my route thusly: ip route add table 100 default \ nexthop via 1.1.1.1 nexthop via 1.1.1.9 It shows correctly up in the routing table: [EMAIL PROTECTED]:~# ip route show table 100 default nexthop via 1.1.1.1 dev vlan11 weight 1 nexthop via 1.1.1.9 dev vlan12 weight 1 [...] I'm sending traffic from a relatively busy network to this table: [EMAIL PROTECTED]:~# ip rule [...] 21000: from 1.1.2.128/26 lookup 100 [...] I can verify with tcpdump that the rule works correctly and that the route is used. However, the traffic is without exception routed via 1.1.1.9, not a single packet is sent to 1.1.1.1. If I however swap the two nexthops while adding the route, all traffic is sent to 1.1.1.1, and nothing ends up at 1.1.1.9. I've tried loading and unloading the multipath_{wrandom,rr,random,drr} modules, removing and readding the route, and flushing the routing cache. Several times and in different order. Nothing affects the behaviour though, all of the traffic is sent to the router specified as the second nexthop on the "ip route add" command line. I feel I'm missing something essential here but I have no idea what. Google only tells me about others having roughly the same problem but never any solution. Do you have any suggestions for me? If I can make this work I will be happy to document how and try to have that included in the next kernel/iproute release and hopefully nobody will bother you about it again. Thanks for your time! 
Kind regards -- Tore Anderson
Re: [PATCH 3/3] bonding: Improve IGMP join processing
On 3/6/07, Jay Vosburgh <[EMAIL PROTECTED]> wrote: Brian Haley <[EMAIL PROTECTED]> wrote: >Andy Gospodarek wrote: >> If we are easily able to differentiate between the multicast addresses >> in the mc_list as to which are for ipv4 and which are for ipv6 then it >> would be easy to call-out to something in the ipv6 mcast code when >> needed instead of always calling out to ipv4 code. > >I've been unable to figure out exactly what you're referring to in the >code (bond_main.c), it seems to failover all multicast addresses, >regardless of what address family they are. I might have missed something >in 4K lines of code though? I believe Andy is talking about bond_resend_igmp_join_requests being only effective for IGMP v4 and not IGMP v6. The reason being that there is (a) no discrimination between v4 and v6 multicast addresses, and (b) for the v6 case, there's no "rejoin" type function as was created for IPv4 with the patch. /me nods
Re: TCP 2MSL on loopback
With transparent bridging, nobody knows how long the datagram may be out there. Admittedly, the chances of a datagram living for a full two minutes these days are probably nil, but just being in the same IP subnet doesn't really mean anything when it comes to physical locality. Bridging isn't necessarily a problem though. The 2MSL timeout is designed to prevent problems from delayed packets that got sent through multiple paths. In a bridging setup you don't allow multiple paths, that's what STP is designed to prevent. If you want to configure a network that allows multiple paths, you need to use a router, not a bridge. Well, there is trunking at the data link layer, and in theory there could be an active-standby where the standby took a somewhat different path. The timeout is also to cover datagrams which just got "stuck" somewhere too (IIRC) and may not necessarily require a multiple path situation. SPECweb benchmarking has had to deal with the issue of attempted TIME_WAIT reuse going back to 1997. It deals with it by not relying on the client's configured local/anonymous/ephemeral port number range and instead making explicit bind() calls in the (more or less) entire unpriv port range (actually it may just be from 5000 to 65535 but still). That still doesn't solve the problem, it only ~doubles the available port range. That means it takes 0.6 seconds to trigger the problem instead of only 0.3 seconds... True. Thankfully, the web learned to use persistent connections so later versions of SPECweb benchmarking make use of persistent connections. In an environment where connections are opened and closed very quickly with only a small amount of data carried per connection, it might make sense to remember the last sequence number used on a port and use that as the floor of the next randomly generated ISN. Monotonically increasing sequence numbers aren't a security risk if there's still a randomly determined gap from one connection to the next. 
But I don't think it's necessary to consider this at the moment. I thought that all the "security types" started squawking if the ISN wasn't completely random? I've not tried this, but if a client does want to cycle through thousands of connections per second, and if it is the one to initiate connection close, would it be sufficient to only use something like: socket() bind() loop: connect() request() response() shutdown(SHUT_RDWR) goto loop i.e. not call close on the FD, so there is still a direct link to the connection in TIME_WAIT and one could in theory initiate a new connection from TIME_WAIT? Then in theory the randomness could be _almost_ the entire sequence space, less the previous connection's window (IIRC). rick jones
Re: TCP 2MSL on loopback
Howard Chu wrote: Eric Dumazet wrote: Let me see, any chance you can try the prog on 2.6.20 ? Not any time soon. If not, please send : grep . /proc/sys/net/ipv4/* This is the output on the laptop: /proc/sys/net/ipv4/icmp_echo_ignore_all:0 /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1 /proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0 /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1 /proc/sys/net/ipv4/icmp_ratelimit:250 /proc/sys/net/ipv4/icmp_ratemask:6168 /proc/sys/net/ipv4/igmp_max_memberships:20 /proc/sys/net/ipv4/igmp_max_msf:10 /proc/sys/net/ipv4/inet_peer_gc_maxtime:120 /proc/sys/net/ipv4/inet_peer_gc_mintime:10 /proc/sys/net/ipv4/inet_peer_maxttl:600 /proc/sys/net/ipv4/inet_peer_minttl:120 /proc/sys/net/ipv4/inet_peer_threshold:65664 /proc/sys/net/ipv4/ip_default_ttl:64 /proc/sys/net/ipv4/ip_dynaddr:0 /proc/sys/net/ipv4/ip_forward:0 /proc/sys/net/ipv4/ipfrag_high_thresh:262144 /proc/sys/net/ipv4/ipfrag_low_thresh:196608 /proc/sys/net/ipv4/ipfrag_max_dist:64 /proc/sys/net/ipv4/ipfrag_secret_interval:600 /proc/sys/net/ipv4/ipfrag_time:30 /proc/sys/net/ipv4/ip_local_port_range:32768 61000 /proc/sys/net/ipv4/ip_nonlocal_bind:0 /proc/sys/net/ipv4/ip_no_pmtu_disc:0 /proc/sys/net/ipv4/tcp_abc:0 /proc/sys/net/ipv4/tcp_abort_on_overflow:0 /proc/sys/net/ipv4/tcp_adv_win_scale:2 /proc/sys/net/ipv4/tcp_app_win:31 /proc/sys/net/ipv4/tcp_base_mss:512 /proc/sys/net/ipv4/tcp_congestion_control:reno /proc/sys/net/ipv4/tcp_dma_copybreak:4096 /proc/sys/net/ipv4/tcp_dsack:1 /proc/sys/net/ipv4/tcp_ecn:0 /proc/sys/net/ipv4/tcp_fack:1 /proc/sys/net/ipv4/tcp_fin_timeout:60 /proc/sys/net/ipv4/tcp_frto:0 /proc/sys/net/ipv4/tcp_keepalive_intvl:75 /proc/sys/net/ipv4/tcp_keepalive_probes:9 /proc/sys/net/ipv4/tcp_keepalive_time:7200 /proc/sys/net/ipv4/tcp_low_latency:0 /proc/sys/net/ipv4/tcp_max_orphans:32768 /proc/sys/net/ipv4/tcp_max_syn_backlog:1024 /proc/sys/net/ipv4/tcp_max_tw_buckets:18 /proc/sys/net/ipv4/tcp_mem:98304 131072 196608 
/proc/sys/net/ipv4/tcp_moderate_rcvbuf:1 /proc/sys/net/ipv4/tcp_mtu_probing:0 /proc/sys/net/ipv4/tcp_no_metrics_save:0 /proc/sys/net/ipv4/tcp_orphan_retries:0 /proc/sys/net/ipv4/tcp_reordering:3 /proc/sys/net/ipv4/tcp_retrans_collapse:1 /proc/sys/net/ipv4/tcp_retries1:3 /proc/sys/net/ipv4/tcp_retries2:15 /proc/sys/net/ipv4/tcp_rfc1337:0 /proc/sys/net/ipv4/tcp_rmem:4096 87380 4194304 /proc/sys/net/ipv4/tcp_sack:1 /proc/sys/net/ipv4/tcp_slow_start_after_idle:1 /proc/sys/net/ipv4/tcp_stdurg:0 /proc/sys/net/ipv4/tcp_synack_retries:5 /proc/sys/net/ipv4/tcp_syncookies:1 /proc/sys/net/ipv4/tcp_syn_retries:5 /proc/sys/net/ipv4/tcp_timestamps:1 /proc/sys/net/ipv4/tcp_tso_win_divisor:3 /proc/sys/net/ipv4/tcp_tw_recycle:0 /proc/sys/net/ipv4/tcp_tw_reuse:0 /proc/sys/net/ipv4/tcp_window_scaling:1 /proc/sys/net/ipv4/tcp_wmem:4096 16384 4194304 /proc/sys/net/ipv4/tcp_workaround_signed_windows:0 Arf... don't tell me you forgot to do this... echo 1 >/proc/sys/net/ipv4/tcp_tw_recycle echo 1 >/proc/sys/net/ipv4/tcp_tw_reuse
Re: [PATCH 3/3] bonding: Improve IGMP join processing
Brian Haley <[EMAIL PROTECTED]> wrote: >Andy Gospodarek wrote: >> If we are easily able to differentiate between the multicast addresses >> in the mc_list as to which are for ipv4 and which are for ipv6 then it >> would be easy to call-out to something in the ipv6 mcast code when >> needed instead of always calling out to ipv4 code. > >I've been unable to figure out exactly what you're referring to in the >code (bond_main.c), it seems to failover all multicast addresses, >regardless of what address family they are. I might have missed something >in 4K lines of code though? I believe Andy is talking about bond_resend_igmp_join_requests being only effective for IGMP v4 and not IGMP v6. The reason being that there is (a) no discrimination between v4 and v6 multicast addresses, and (b) for the v6 case, there's no "rejoin" type function as was created for IPv4 with the patch. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
[NET] netxen: fix warnings
CC [M] drivers/net/netxen/netxen_nic_hw.o drivers/net/netxen/netxen_nic_hw.c: In function 'netxen_nic_hw_resources': drivers/net/netxen/netxen_nic_hw.c:231: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'dma_addr_t' drivers/net/netxen/netxen_nic_hw.c:250: warning: format '%llx' expects type 'long long unsigned int', but argument 2 has type 'dma_addr_t' u64 is unsigned long so the cast to u64 will result in a warning on the printf arguments for 64-bit builds. So cast to unsigned long long instead. Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]> diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c index a2877f3..1be5570 100644 --- a/drivers/net/netxen/netxen_nic_hw.c +++ b/drivers/net/netxen/netxen_nic_hw.c @@ -228,7 +228,7 @@ int netxen_nic_hw_resources(struct netxen_adapter *adapter) &adapter->ctx_desc_pdev); printk("ctx_desc_phys_addr: 0x%llx\n", - (u64) adapter->ctx_desc_phys_addr); + (unsigned long long) adapter->ctx_desc_phys_addr); if (addr == NULL) { DPRINTK(ERR, "bad return from pci_alloc_consistent\n"); err = -ENOMEM; @@ -247,7 +247,8 @@ int netxen_nic_hw_resources(struct netxen_adapter *adapter) adapter->max_tx_desc_count, (dma_addr_t *) & hw->cmd_desc_phys_addr, &adapter->ahw.cmd_desc_pdev); - printk("cmd_desc_phys_addr: 0x%llx\n", (u64) hw->cmd_desc_phys_addr); + printk("cmd_desc_phys_addr: 0x%llx\n", + (unsigned long long) hw->cmd_desc_phys_addr); if (addr == NULL) { DPRINTK(ERR, "bad return from pci_alloc_consistent\n"); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
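The warning fixed above comes from handing a type whose width depends on the build to a fixed `%llx` conversion. A minimal userspace sketch of the same cast follows; `fake_dma_addr_t` and `format_dma_addr()` are hypothetical stand-ins invented for illustration, since the real `dma_addr_t` only exists inside the kernel:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's dma_addr_t. On some 64-bit builds
 * the underlying type is unsigned long rather than unsigned long long. */
typedef uint64_t fake_dma_addr_t;

/* Format an address with %llx. The explicit cast to unsigned long long makes
 * the argument match the conversion however the typedef is defined. */
static int format_dma_addr(char *buf, size_t len, fake_dma_addr_t addr)
{
    return snprintf(buf, len, "0x%llx", (unsigned long long) addr);
}
```

Casting to `unsigned long long` rather than to a fixed-width typedef keeps the format string and the argument in agreement on both 32-bit and 64-bit builds.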
Re: TCP 2MSL on loopback
Eric Dumazet wrote: Let me see, any chance you can try the prog on 2.6.20 ? Not any time soon. If not, please send : grep . /proc/sys/net/ipv4/* This is the output on the laptop: /proc/sys/net/ipv4/icmp_echo_ignore_all:0 /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts:1 /proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr:0 /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses:1 /proc/sys/net/ipv4/icmp_ratelimit:250 /proc/sys/net/ipv4/icmp_ratemask:6168 /proc/sys/net/ipv4/igmp_max_memberships:20 /proc/sys/net/ipv4/igmp_max_msf:10 /proc/sys/net/ipv4/inet_peer_gc_maxtime:120 /proc/sys/net/ipv4/inet_peer_gc_mintime:10 /proc/sys/net/ipv4/inet_peer_maxttl:600 /proc/sys/net/ipv4/inet_peer_minttl:120 /proc/sys/net/ipv4/inet_peer_threshold:65664 /proc/sys/net/ipv4/ip_default_ttl:64 /proc/sys/net/ipv4/ip_dynaddr:0 /proc/sys/net/ipv4/ip_forward:0 /proc/sys/net/ipv4/ipfrag_high_thresh:262144 /proc/sys/net/ipv4/ipfrag_low_thresh:196608 /proc/sys/net/ipv4/ipfrag_max_dist:64 /proc/sys/net/ipv4/ipfrag_secret_interval:600 /proc/sys/net/ipv4/ipfrag_time:30 /proc/sys/net/ipv4/ip_local_port_range:32768 61000 /proc/sys/net/ipv4/ip_nonlocal_bind:0 /proc/sys/net/ipv4/ip_no_pmtu_disc:0 /proc/sys/net/ipv4/tcp_abc:0 /proc/sys/net/ipv4/tcp_abort_on_overflow:0 /proc/sys/net/ipv4/tcp_adv_win_scale:2 /proc/sys/net/ipv4/tcp_app_win:31 /proc/sys/net/ipv4/tcp_base_mss:512 /proc/sys/net/ipv4/tcp_congestion_control:reno /proc/sys/net/ipv4/tcp_dma_copybreak:4096 /proc/sys/net/ipv4/tcp_dsack:1 /proc/sys/net/ipv4/tcp_ecn:0 /proc/sys/net/ipv4/tcp_fack:1 /proc/sys/net/ipv4/tcp_fin_timeout:60 /proc/sys/net/ipv4/tcp_frto:0 /proc/sys/net/ipv4/tcp_keepalive_intvl:75 /proc/sys/net/ipv4/tcp_keepalive_probes:9 /proc/sys/net/ipv4/tcp_keepalive_time:7200 /proc/sys/net/ipv4/tcp_low_latency:0 /proc/sys/net/ipv4/tcp_max_orphans:32768 /proc/sys/net/ipv4/tcp_max_syn_backlog:1024 /proc/sys/net/ipv4/tcp_max_tw_buckets:18 /proc/sys/net/ipv4/tcp_mem:98304 131072 196608 /proc/sys/net/ipv4/tcp_moderate_rcvbuf:1 
/proc/sys/net/ipv4/tcp_mtu_probing:0 /proc/sys/net/ipv4/tcp_no_metrics_save:0 /proc/sys/net/ipv4/tcp_orphan_retries:0 /proc/sys/net/ipv4/tcp_reordering:3 /proc/sys/net/ipv4/tcp_retrans_collapse:1 /proc/sys/net/ipv4/tcp_retries1:3 /proc/sys/net/ipv4/tcp_retries2:15 /proc/sys/net/ipv4/tcp_rfc1337:0 /proc/sys/net/ipv4/tcp_rmem:4096 87380 4194304 /proc/sys/net/ipv4/tcp_sack:1 /proc/sys/net/ipv4/tcp_slow_start_after_idle:1 /proc/sys/net/ipv4/tcp_stdurg:0 /proc/sys/net/ipv4/tcp_synack_retries:5 /proc/sys/net/ipv4/tcp_syncookies:1 /proc/sys/net/ipv4/tcp_syn_retries:5 /proc/sys/net/ipv4/tcp_timestamps:1 /proc/sys/net/ipv4/tcp_tso_win_divisor:3 /proc/sys/net/ipv4/tcp_tw_recycle:0 /proc/sys/net/ipv4/tcp_tw_reuse:0 /proc/sys/net/ipv4/tcp_window_scaling:1 /proc/sys/net/ipv4/tcp_wmem:4096 16384 4194304 /proc/sys/net/ipv4/tcp_workaround_signed_windows:0 -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: [PATCH 3/3] bonding: Improve IGMP join processing
Andy Gospodarek wrote: If we are easily able to differentiate between the multicast addresses in the mc_list as to which are for ipv4 and which are for ipv6 then it would be easy to call-out to something in the ipv6 mcast code when needed instead of always calling out to ipv4 code. I've been unable to figure out exactly what you're referring to in the code (bond_main.c), it seems to failover all multicast addresses, regardless of what address family they are. I might have missed something in 4K lines of code though? -Brian
Re: [RFC] div64_64 support
On Tue, 6 Mar 2007 20:48:41 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: > On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: > > Don't count the existing Newton-Raphson out. It turns out that to get enough > > precision for 32 bits, only 4 iterations are needed. By unrolling those, it > > gets much better timing. > > But did you fix the >2^43 bug too? It was caused by not doing x^2 in 64 bit. > > SGI has already shipped 10TB Altixen, so it's not entirely theoretical. > > -Andi > -- Stephen Hemminger <[EMAIL PROTECTED]>
Re: TCP 2MSL on loopback
Howard Chu wrote: Eric Dumazet wrote: On Tuesday 06 March 2007 10:22, Howard Chu wrote: It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range - on my system the default port range is 32768-61000. That means if I use up 28232 ports in less than 2MSL then everything stops. netstat will show that all the available port numbers are in TIME_WAIT state. And this is particularly bad because while waiting for the timeout, I can't initiate any new outbound connections of any kind at all - telnet, ssh, whatever, you have to wait for at least one port to free up. (Interesting denial of service there) Granted, I was running my test on 2.6.18, perhaps 2.6.21 behaves differently. Could you try this attached program and tell me what happens ? $ gcc -O2 -o socktest socktest.c -lpthread $ time ./socktest -n 10 nb_conn=9 nb_accp=9 real 0m5.058s user 0m0.212s sys 0m4.844s (on my small machine, dell d610 :) ) On my Asus laptop (2GHz Pentium M) the first time I ran it it completed in about 51 seconds, with no errors. I then copied it to another machine and started it up there, and got connect errors right away. I then went back to my laptop and ran it again, and got errors that time. 
This is the laptop run with errors: viola:~/src> uname -a Linux viola 2.6.18.2-34-default #1 SMP Mon Nov 27 11:46:27 UTC 2006 i686 i686 i386 GNU/Linux viola:~/src> time ./socktest -n 100 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 nb_conn=993757 nb_accp=993757 1.408u 88.649s 1:42.76 87.6% 0+0k 0+0io 0pf+0w This is my other system, an AMD X2 3800+ (dual core) mandolin:~/src> uname -a Linux mandolin 2.6.18.3SMP #9 SMP Sat Nov 25 10:08:51 PST 2006 x86_64 x86_64 x86_64 GNU/Linux mandolin:~/src> gcc -O2 -o socktest socktest.c -lpthread mandolin:~/src> time ./socktest -n 100 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 connect error 99 nb_conn=957088 nb_accp=957088 1.012u 630.991s 5:18.05 198.7% 0+0k 0+0io 0pf+0w Let me see, any chance you can try the prog on 2.6.20 ? If not, please send : grep . /proc/sys/net/ipv4/* Thank you
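The port-exhaustion arithmetic from this thread can be written down as a rough sketch. The helper names are hypothetical, and the 60-second TIME_WAIT period used in the usage note is an assumption rather than a figure stated in the thread:

```c
#include <stdio.h>

/* Number of ephemeral ports between low and high, counted the same way as
 * the 28232 figure quoted for the 32768-61000 default range. */
static unsigned int ephemeral_ports(unsigned int low, unsigned int high)
{
    return high - low;
}

/* Highest sustained rate of new connections per second to one destination
 * address and port before every local port is parked in TIME_WAIT. */
static double max_conn_rate(unsigned int ports, double time_wait_secs)
{
    return (double) ports / time_wait_secs;
}
```

With the default range and an assumed 60-second TIME_WAIT, `max_conn_rate(ephemeral_ports(32768, 61000), 60.0)` comes out at roughly 470 connections per second, which a tight connect/close loop on loopback exceeds easily.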
Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED
On Tue, Mar 06, 2007 at 12:10:09AM -0600, Matt Mackall wrote: > On Mon, Mar 05, 2007 at 08:03:50PM -0800, Greg KH wrote: > > On Mon, Mar 05, 2007 at 09:39:47PM -0600, Matt Mackall wrote: > > > On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote: > > > > If so, can you disable the option and strace it to see what program is > > > > trying to access what? That will put the > > > > HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty > > > > quickly :) > > > > > > Ok, I've got straces of both good and bad (>5M each). Filtered out > > > random pointer values and the like, diffed, and filtered for /sys/, > > > and the result's still 1.5M. What should I be looking for? > > > > Failures when trying to read from /sys/class/net/ > > > > Or opening the directory and iterating over the subdirs in there. Or > > something like that. > > > > But the /sys/class/net/ stuff should hopefully help narrow it down. > > Works: > > 6857 open("/sys/class/net", > O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13 > 6857 fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 > 6857 fcntl64(13, F_SETFD, FD_CLOEXEC) = 0 > 6857 getdents64(13, /* 5 entries */, 4096) = 120 > 6857 readlink("/sys/class/net/eth1", 0x80a2450, 256) = -1 EINVAL > (Invalid argument) > 6857 readlink("/sys/class/net/eth1/device", > "../../../devices/pci:00/:00:1e.0/:02:02.0", 256) = 53 > 6857 readlink("/sys/class/net/lo", 0x80a2450, 256) = -1 EINVAL > (Invalid argument) > 6857 readlink("/sys/class/net/lo/device", 0x80a2450, 256) = -1 ENOENT > (No such > file or directory) > 6857 readlink("/sys/class/net/eth0", 0x80a2450, 256) = -1 EINVAL > (Invalid argument) > 6857 readlink("/sys/class/net/eth0/device", > "../../../devices/pci:00/:00:1e.0/:02:01.0", 256) = 53 > 6857 getdents64(13, /* 0 entries */, 4096) = 0 > 6857 close(13) = 0 > > Breaks: > > 3620 open("/sys/class/net", > O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13 > 3620 fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 > 3620 
fcntl64(13, F_SETFD, FD_CLOEXEC) = 0 > 3620 getdents64(13, /* 5 entries */, 4096) = 120 > 3620 readlink("/sys/class/net/eth1", > "../../devices/pci:00/:00:1e.0/0000:02:02.0/eth1", 256) = 55 > 3620 > readlink("/sys/devices/pci:00/:00:1e.0/:02:02.0/eth1/device", > 0x809e910, 256) = -1 ENOENT (No such file or directory) > 3620 readlink("/sys/class/net/lo", "../../devices/virtual/net/lo", > 256) = 28 > 3620 readlink("/sys/devices/virtual/net/lo/device", 0x809e960, 256) = > -1 ENOENT (No such file or directory) > 3620 readlink("/sys/class/net/eth0", > "../../devices/pci:00/:00:1e.0/0000:02:01.0/eth0", 256) = 55 > 3620 > readlink("/sys/devices/pci:00/:00:1e.0/:02:01.0/eth0/device", > 0x809e960, 256) = -1 ENOENT (No such file or directory) > 3620 getdents64(13, /* 0 entries */, 4096) = 0 > 3620 close(13) = 0 Can you try the patch below? And enable CONFIG_SYSFS_DEPRECATED. It should cause HAL to see the network devices again, as the symlink is now back (it shouldn't have gone away, that was my fault...) I tried this with HAL 0.5.7, which is pretty old, and hal-device-manager shows my network devices properly. 
thanks for your patience, greg k-h --- drivers/base/core.c | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) --- gregkh-2.6.orig/drivers/base/core.c +++ gregkh-2.6/drivers/base/core.c @@ -584,17 +584,17 @@ int device_add(struct device *dev) if (dev->kobj.parent != &dev->class->subsys.kset.kobj) sysfs_create_link(&dev->class->subsys.kset.kobj, &dev->kobj, dev->bus_id); -#ifdef CONFIG_SYSFS_DEPRECATED if (parent) { sysfs_create_link(&dev->kobj, &dev->parent->kobj, "device"); +#ifdef CONFIG_SYSFS_DEPRECATED class_name = make_class_name(dev->class->name, &dev->kobj); if (class_name) sysfs_create_link(&dev->parent->kobj, &dev->kobj, class_name); - } #endif + } } if ((error = device_add_attrs(dev))) @@ -651,17 +651,17 @@ int device_add(struct device *dev) if (dev->kobj.parent != &dev->class->subsys.kset.kobj) sysfs_remove_link(&dev->class->subsys.kset.kobj, dev->bus_id); -#ifdef CONFIG_SYSFS_DEPRECATED if (parent) { +#ifdef CONFIG_SYSFS_DEPRECATED char *class_name = make_class_name(dev->class->name, &dev->kobj); if (class_name) sysfs_remove_
Re: [patch 2/2] div64_64: common code
On Tue, Mar 06, 2007 at 02:42:28AM -0800, [EMAIL PROTECTED] wrote: > Implement div64_64(): 64-bit by 64-bit division. Needed by networking (at > least). Your patch only implements div64_64() for 32-bit MIPS. Below patch adds the trivial 64-bit bits. Ralf Signed-off-by: Ralf Baechle <[EMAIL PROTECTED]> include/asm-mips/div64.h |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) Index: linux-mips/include/asm-mips/div64.h === --- linux-mips.orig/include/asm-mips/div64.h +++ linux-mips/include/asm-mips/div64.h @@ -1,6 +1,6 @@ /* * Copyright (C) 2000, 2004 Maciej W. Rozycki - * Copyright (C) 2003 Ralf Baechle + * Copyright (C) 2003, 07 Ralf Baechle ([EMAIL PROTECTED]) * * This file is subject to the terms and conditions of the GNU General Public * License. See the file "COPYING" in the main directory of this archive @@ -105,6 +105,11 @@ extern uint64_t div64_64(uint64_t divide (n) = __quot; \ __mod; }) +static inline uint64_t div64_64(uint64_t dividend, uint64_t divisor) +{ + return dividend / divisor; +} + #endif /* (_MIPS_SZLONG == 64) */ #endif /* _ASM_DIV64_H */ - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] div64_64 support
On Tue, Mar 06, 2007 at 10:29:41AM -0800, Stephen Hemminger wrote: > Don't count the existing Newton-Raphson out. It turns out that to get enough > precision for 32 bits, only 4 iterations are needed. By unrolling those, it > gets much better timing. But did you fix the >2^43 bug too? SGI has already shipped 10TB Altixen, so it's not entirely theoretical. -Andi
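For readers following the Newton-Raphson discussion, here is a rough userspace sketch of an integer cube root in which the x^2 term is kept in 64 bits, the point raised about the >2^43 bug. It is not the kernel's implementation: it loops until convergence instead of unrolling four iterations, and the name `cube_root_u64` is made up for illustration.

```c
#include <stdint.h>

/* Integer cube root by Newton-Raphson: x' = (2x + a / x^2) / 3.
 * Returns floor(cbrt(a)). The square x * x is computed in 64 bits so that
 * it cannot overflow for any 64-bit input (x never exceeds 2^22 here). */
static uint32_t cube_root_u64(uint64_t a)
{
    uint64_t x, next, t;
    unsigned int bits = 0;

    if (a == 0)
        return 0;

    /* Count significant bits so the first guess is above the true root. */
    for (t = a; t != 0; t >>= 1)
        bits++;
    x = (uint64_t) 1 << ((bits + 2) / 3);

    /* From an overestimate the iterates decrease strictly until they reach
     * the floor of the root, at which point the next step stops shrinking. */
    for (;;) {
        next = (2 * x + a / (x * x)) / 3;
        if (next >= x)
            break;
        x = next;
    }
    return (uint32_t) x;
}
```

Starting from a power of two above the root keeps the sequence monotonic, so the loop's exit test is just a comparison with the previous estimate.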
Re: [ofa-general] [PATCH 2.6.21-rc2] iw_cxgb3: Don't use mm after its freed in iwch_mmap().
Thanks, applied.
Re: [RFC PATCH] [TCP]: Reworked recovery's TCPCB_LOST marking functions
* Ilpo Järvinen <[EMAIL PROTECTED]> [070306 14:52]: > Complete rewrite for update_scoreboard and mark_head_lost. Couple > of hints became unnecessary because of this change. Changes > !TCPCB_TAGBITS check from the original to !(S|L) but it shouldn't > make a difference, and if there ever is an R only skb TCP will > mark it as LOST too. The algorithm uses some ideas presented by > David Miller and Baruch Even. > > Seqno lookups require fast lookups that are provided using > RB-tree patch(+abstraction) from DaveM. > > Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]> > --- > > I'm sorry about poorly chunked diff, is it possible to force git to > produce better (large block) diffs when a complete function is rewritten > from scratch in the patch (manpage of git-diff-files hints -B but it did > not work, affects whole file rewrites only perhaps)? > > This probably conflicts with the other patches in the rbtree patchset of > DaveM (two first are required) because I tested this one (at least the > non-timedout part worked) and didn't want some random breakage > from the other patches (as such was reported). > > include/linux/tcp.h |6 - > include/net/tcp.h|6 + > net/ipv4/tcp_input.c | 194 > +- > net/ipv4/tcp_minisocks.c |1 > 4 files changed, 130 insertions(+), 77 deletions(-) > [snip] > + newtp->highest_sack = treq->snt_isn + 1; That's the only initialization that you have for highest_sack, I think that you should initialize it when a loss is detected to the start_seq of the first packet that wasn't acked. Didn't review the rest, still need to arrange a proper tree with preliminary patches to apply it on. Could you note the kernel you based it on and include all patches applied before it? Baruch
Re: when having to acquire an SA, ipsec drops the packet
On Tue, 6 Mar 2007, Joy Latten wrote: > > I saw something similar to this some time ago when testing various > > failure modes, and discused it with Herbert. > > > > IIRC, there's a larval SA which is not torn down properly by Racoon once > > the full SA is established, and the larval SA keeps resending until it > > times out. > > > Ok, good to know. > I thought a bit more about this last night but am not > sure best way to fix it. Perhaps a way to keep larval > SA around until all SAs resulting from xfrm_vec[xfrm_nr] > are established... oh well, just thinking out loud... :-) I think the solution, if this is actually the problem, is for the userland code to maintain the SAs. - James -- James Morris <[EMAIL PROTECTED]>
Re: TCP 2MSL on loopback
Rick Jones wrote: This is probably not something that happens in real world deployments. I But it's not 60,000 concurrent connections, it's 60,000 within a 2 minute span. Sounds like a case of Doctor! Doctor! It hurts when I do this. I guess. In the cases where it matters, we use LDAP over Unix Domain Sockets instead of TCP. Smarter clients that do connection pooling would help too, but the fact that this even came to our attention is because not all clients out there are smart enough. Since we have an alternative that works, I'm not really worried about it. I just thought it was worthwhile to raise the question. I'm not saying this is a high priority problem, I only encountered it in a test scenario where I was deliberately trying to max out the server. Ideally the 2MSL parameter would be dynamically adjusted based on the route to the destination and the weights associated with those routes. In the simplest case, connections between machines on the same subnet (i.e., no router hops involved) should have a much smaller default value than connections that traverse any routers. I'd settle for a two-level setting - with no router hops, use the small value; with any router hops use the large value. With transparant bridging, nobody knows how long the datagram may be out there. Admittedly, the chances of a datagram living for a full two minutes these days is probably nil, but just being in the same IP subnet doesn't really mean anything when it comes to physical locality. Bridging isn't necessarily a problem though. The 2MSL timeout is designed to prevent problems from delayed packets that got sent through multiple paths. In a bridging setup you don't allow multiple paths, that's what STP is designed to prevent. If you want to configure a network that allows multiple paths, you need to use a router, not a bridge. SPECweb benchmarking has had to deal with the issue of attempted TIME_WAIT reuse going back to 1997. 
It deals with it by not relying on the client's configured local/anonymous/ephemeral port number range and instead making explicit bind() calls in the (more or less) entire unpriv port range (actually it may just be from 5000 to 65535 but still). That still doesn't solve the problem, it only ~doubles the available port range. That means it takes 0.6 seconds to trigger the problem instead of only 0.3 seconds... Now, if it weren't necessary to fully randomize the ISNs, the chances of a successful transition from TIME_WAIT to ESTABLISHED might be greater, but going back to the good old days of more or less purely clock driven ISN's isn't likely. In an environment where connections are opened and closed very quickly with only a small amount of data carried per connection, it might make sense to remember the last sequence number used on a port and use that as the floor of the next randomly generated ISN. Monotonically increasing sequence numbers aren't a security risk if there's still a randomly determined gap from one connection to the next. But I don't think it's necessary to consider this at the moment. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc Chief Architect, OpenLDAP http://www.openldap.org/project/
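The "floor plus random gap" idea for ISNs floated above could look roughly like the sketch below. This is only an illustration of the suggestion, not something the kernel is known to do; `next_isn()`, its parameters and the caller-supplied random value are all hypothetical, and `max_gap` is assumed to be nonzero.

```c
#include <stdint.h>

/* Hypothetical helper: choose the next ISN as the last sequence number used
 * on the port plus a random gap in [1, max_gap]. The addition wraps modulo
 * 2^32, as TCP sequence numbers do. random_value is supplied by the caller
 * so the function stays deterministic; max_gap must be nonzero. */
static uint32_t next_isn(uint32_t last_seq, uint32_t random_value,
                         uint32_t max_gap)
{
    uint32_t gap = 1 + random_value % max_gap;

    return last_seq + gap;
}

/* Sequence-space comparison: true when a comes after b modulo 2^32. */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) > 0;
}
```

Because the gap is always at least 1, each new ISN is "after" the previous connection's last sequence number in the wrapped sequence space, while the size of the jump remains unpredictable to an outside observer.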
[PATCH]: Revert accept queue backlog change.
Wei, I have to revert your change, it is incorrect as pointed out by other people here on netdev. BSD sockets basically define the 'backlog' parameter to listen() to mean "allow backlog + 1" connections to be queued to the socket. This allows a backlog parameter of "0" to allow 1 connection, and there are real applications which do this. diff --git a/include/net/sock.h b/include/net/sock.h index 849c7df..2c7d60c 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -426,7 +426,7 @@ static inline void sk_acceptq_added(struct sock *sk) static inline int sk_acceptq_is_full(struct sock *sk) { - return sk->sk_ack_backlog >= sk->sk_max_ack_backlog; + return sk->sk_ack_backlog > sk->sk_max_ack_backlog; } /* diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 51ca438..6069716 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -934,7 +934,7 @@ static long unix_wait_for_peer(struct sock *other, long timeo) sched = !sock_flag(other, SOCK_DEAD) && !(other->sk_shutdown & RCV_SHUTDOWN) && - (skb_queue_len(&other->sk_receive_queue) >= + (skb_queue_len(&other->sk_receive_queue) > other->sk_max_ack_backlog); unix_state_runlock(other); @@ -1008,7 +1008,7 @@ restart: if (other->sk_state != TCP_LISTEN) goto out_unlock; - if (skb_queue_len(&other->sk_receive_queue) >= + if (skb_queue_len(&other->sk_receive_queue) > other->sk_max_ack_backlog) { err = -EAGAIN; if (!timeo) @@ -1381,7 +1381,7 @@ restart: } if (unix_peer(other) != sk && - (skb_queue_len(&other->sk_receive_queue) >= + (skb_queue_len(&other->sk_receive_queue) > other->sk_max_ack_backlog)) { if (!timeo) { err = -EAGAIN; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
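The difference between the two comparisons in this revert is easy to miss, so here is a small userspace sketch of it. `struct mini_sock` and both helpers are hypothetical miniatures for illustration, not the kernel's `struct sock` API:

```c
/* Hypothetical miniature of the fields used by the accept-queue check. */
struct mini_sock {
    unsigned int ack_backlog;     /* connections currently queued */
    unsigned int max_ack_backlog; /* backlog value passed to listen() */
};

/* Behaviour restored by the revert (">"): up to backlog + 1 connections
 * may be queued, so listen(fd, 0) still admits one pending connection. */
static int acceptq_is_full(const struct mini_sock *sk)
{
    return sk->ack_backlog > sk->max_ack_backlog;
}

/* Behaviour being reverted (">="): at most backlog connections may be
 * queued, so listen(fd, 0) admits none. */
static int acceptq_is_full_strict(const struct mini_sock *sk)
{
    return sk->ack_backlog >= sk->max_ack_backlog;
}
```

With an empty queue and a backlog of 0, the first helper reports "not full" while the strict one reports "full", which is the case that broke real applications.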
Re: [NET]: Please revert disallowing zero listen queues
From: Rick Jones <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 10:54:00 -0800

> > So we're not "disallowing" a backlog argument of zero to
> > listen(). We'll accept that just fine, the only thing that
> > happens is that you'll get what you ask for, that being
> > no connections :-)
>
> I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor
> am I sure there is official chapter and verse, but:
>
> backlog is limited to the range of 0 to SOMAXCONN, which is defined in
> . SOMAXCONN is currently set to 4096. If any other
> value is specified, the system automatically assigns the closest value
> within the range. A backlog of 0 specifies only 1 pending
> connection is allowed at any given time.
>
> I don't have a Solaris, BSD or AIX manpage for listen handy to check
> them but would not be surprised to see they are similar.

Ok, that seals the deal for me, I'll revert the change :)
[PATCH] ixgb: Use ARRAY_SIZE macro when appropriate.
From: Ahmed S. Darwish <[EMAIL PROTECTED]>

Signed-off-by: Ahmed S. Darwish <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
---
 drivers/net/ixgb/ixgb_param.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgb/ixgb_param.c b/drivers/net/ixgb/ixgb_param.c
index b27442a..c38ce73 100644
--- a/drivers/net/ixgb/ixgb_param.c
+++ b/drivers/net/ixgb/ixgb_param.c
@@ -245,8 +245,6 @@ ixgb_validate_option(int *value, struct ixgb_option *opt)
 	return -1;
 }
 
-#define LIST_LEN(l) (sizeof(l) / sizeof(l[0]))
-
 /**
  * ixgb_check_options - Range Checking for Command Line Parameters
  * @adapter: board private structure
@@ -335,7 +333,7 @@ ixgb_check_options(struct ixgb_adapter *adapter)
 			.name = "Flow Control",
 			.err = "reading default settings from EEPROM",
 			.def = ixgb_fc_tx_pause,
-			.arg = { .l = { .nr = LIST_LEN(fc_list),
+			.arg = { .l = { .nr = ARRAY_SIZE(fc_list),
 					 .p = fc_list }}
 		};
 
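For readers outside the kernel tree, a rough userspace approximation of what the shared macro buys over a private LIST_LEN. The ARRAY_SIZE definition below is a simplified stand-in (current kernels also reject pointer arguments at compile time), and struct opt_list_entry, fc_list_example and the two helpers are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical option-list entry, loosely modelled on a driver list. */
struct opt_list_entry {
	int value;
	const char *label;
};

static const struct opt_list_entry fc_list_example[] = {
	{ 0, "Flow Control Disabled" },
	{ 1, "Flow Control Receive Only" },
	{ 2, "Flow Control Transmit Only" },
	{ 3, "Flow Control Enabled" },
};

/* Bounds-checked access: the element count comes from the array itself. */
static const struct opt_list_entry *opt_at(size_t i)
{
	assert(i < ARRAY_SIZE(fc_list_example));
	return &fc_list_example[i];
}

/* Look up a label by value; returns NULL if the value is not listed. */
static const char *opt_label(int value)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(fc_list_example); i++)
		if (fc_list_example[i].value == value)
			return fc_list_example[i].label;
	return NULL;
}
```

Adding or removing an initializer changes the count automatically, so the .nr field in the patch can never drift from the list it describes.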
Re: [RFC] ARP notify option
* Stephen Hemminger ([EMAIL PROTECTED]) wrote:
> This adds another inet device option to enable gratuitous ARP
> when device is brought up or address change. This is handy for
> clusters or virtualization.

This looks good. I'll test with Xen. What about the source addr selection?

thanks,
-chris
Re: [RFC] ARP notify option
Stephen Hemminger wrote:
> This adds another inet device option to enable gratuitous ARP
> when device is brought up or address change. This is handy for
> clusters or virtualization.
>

Thanks Stephen. Haven't tested this yet, but it definitely cleans up a warty corner of netfront.

    J
Re: [2.6.21 patch] unconditionally enable SYSFS_DEPRECATED
On Tue, Mar 06, 2007 at 12:10:09AM -0600, Matt Mackall wrote:
> On Mon, Mar 05, 2007 at 08:03:50PM -0800, Greg KH wrote:
> > On Mon, Mar 05, 2007 at 09:39:47PM -0600, Matt Mackall wrote:
> > > On Mon, Mar 05, 2007 at 06:48:50PM -0800, Greg KH wrote:
> > > > If so, can you disable the option and strace it to see what program is
> > > > trying to access what? That will put the
> > > > HAL/NetworkManager/libsysfs/distro script finger pointing to rest pretty
> > > > quickly :)
> > >
> > > Ok, I've got straces of both good and bad (>5M each). Filtered out
> > > random pointer values and the like, diffed, and filtered for /sys/,
> > > and the result's still 1.5M. What should I be looking for?
> >
> > Failures when trying to read from /sys/class/net/
> >
> > Or opening the directory and iterating over the subdirs in there. Or
> > something like that.
> >
> > But the /sys/class/net/ stuff should hopefully help narrow it down.
>
> Works:
>
> 6857 open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 6857 fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 6857 fcntl64(13, F_SETFD, FD_CLOEXEC) = 0
> 6857 getdents64(13, /* 5 entries */, 4096) = 120
> 6857 readlink("/sys/class/net/eth1", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857 readlink("/sys/class/net/eth1/device", "../../../devices/pci:00/:00:1e.0/:02:02.0", 256) = 53
> 6857 readlink("/sys/class/net/lo", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857 readlink("/sys/class/net/lo/device", 0x80a2450, 256) = -1 ENOENT (No such file or directory)
> 6857 readlink("/sys/class/net/eth0", 0x80a2450, 256) = -1 EINVAL (Invalid argument)
> 6857 readlink("/sys/class/net/eth0/device", "../../../devices/pci:00/:00:1e.0/:02:01.0", 256) = 53
> 6857 getdents64(13, /* 0 entries */, 4096) = 0
> 6857 close(13) = 0
>
> Breaks:
>
> 3620 open("/sys/class/net", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 13
> 3620 fstat64(13, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> 3620 fcntl64(13, F_SETFD, FD_CLOEXEC) = 0
> 3620 getdents64(13, /* 5 entries */, 4096) = 120
> 3620 readlink("/sys/class/net/eth1", "../../devices/pci:00/:00:1e.0/0000:02:02.0/eth1", 256) = 55
> 3620 readlink("/sys/devices/pci:00/:00:1e.0/:02:02.0/eth1/device", 0x809e910, 256) = -1 ENOENT (No such file or directory)
> 3620 readlink("/sys/class/net/lo", "../../devices/virtual/net/lo", 256) = 28
> 3620 readlink("/sys/devices/virtual/net/lo/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620 readlink("/sys/class/net/eth0", "../../devices/pci:00/:00:1e.0/0000:02:01.0/eth0", 256) = 55
> 3620 readlink("/sys/devices/pci:00/:00:1e.0/:02:01.0/eth0/device", 0x809e960, 256) = -1 ENOENT (No such file or directory)
> 3620 getdents64(13, /* 0 entries */, 4096) = 0
> 3620 close(13) = 0

Ah, that should be simple to fix in the kernel, give me an hour or so...

thanks,

greg k-h
Re: [RFC PATCH]: Dynamically sized routing cache hash table.
Eric Dumazet writes:

> With 2^20 entries, your actual limit of 2^19 entries in root node will
> probably show us quite different numbers for order-1,2,3,4... tnodes

Yep, the trie will get deeper and lookup more costly, as will insert and delete. The 2^19 limit was there because I was getting a memory allocation problem that I never sorted out.

> Yes, numbers you gave us basically showed a big root node, and mainly leaves
> and very few tnodes.
>
> I was interested to see the distribution in case the root-node limit is hit,
> and we load into the table a *lot* of entries.

Maxlength etc... well, maybe the root restriction should be removed and we should just have maxsize instead.

Cheers
					--ro
Re: [NET]: Please revert disallowing zero listen queues
> So we're not "disallowing" a backlog argument of zero to
> listen(). We'll accept that just fine, the only thing that
> happens is that you'll get what you ask for, that being
> no connections :-)

I'm not sure where HP-UX inherited the 0 = 1 bit - perhaps from BSD, nor am I sure there is official chapter and verse, but:

    backlog is limited to the range of 0 to SOMAXCONN, which is defined in . SOMAXCONN is currently set to 4096. If any other value is specified, the system automatically assigns the closest value within the range. A backlog of 0 specifies only 1 pending connection is allowed at any given time.

I don't have a Solaris, BSD or AIX manpage for listen handy to check them but would not be surprised to see they are similar.

rick jones
[PATCH 2/2] pcnet32: change to use netdev_priv
use netdev_priv() instead of dev->priv Signed-off-by: Thomas Bogendoerfer <[EMAIL PROTECTED]> Signed-off-by: Don Fry <[EMAIL PROTECTED]> --- --- linux-2.6.21-rc2/drivers/net/one.pcnet32.c 2007-03-06 10:48:37.0 -0800 +++ linux-2.6.21-rc2/drivers/net/pcnet32.c 2007-03-05 18:03:32.0 -0800 @@ -653,7 +653,7 @@ static void pcnet32_realloc_rx_ring(stru static void pcnet32_purge_rx_ring(struct net_device *dev) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); int i; /* free all allocated skbuffs */ @@ -681,7 +681,7 @@ static void pcnet32_poll_controller(stru static int pcnet32_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); unsigned long flags; int r = -EOPNOTSUPP; @@ -696,7 +696,7 @@ static int pcnet32_get_settings(struct n static int pcnet32_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); unsigned long flags; int r = -EOPNOTSUPP; @@ -711,7 +711,7 @@ static int pcnet32_set_settings(struct n static void pcnet32_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); strcpy(info->driver, DRV_NAME); strcpy(info->version, DRV_VERSION); @@ -723,7 +723,7 @@ static void pcnet32_get_drvinfo(struct n static u32 pcnet32_get_link(struct net_device *dev) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); unsigned long flags; int r; @@ -743,19 +743,19 @@ static u32 pcnet32_get_link(struct net_d static u32 pcnet32_get_msglevel(struct net_device *dev) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); return lp->msg_enable; } static void pcnet32_set_msglevel(struct net_device *dev, u32 value) { - struct pcnet32_private *lp = dev->priv; + struct 
pcnet32_private *lp = netdev_priv(dev); lp->msg_enable = value; } static int pcnet32_nway_reset(struct net_device *dev) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); unsigned long flags; int r = -EOPNOTSUPP; @@ -770,7 +770,7 @@ static int pcnet32_nway_reset(struct net static void pcnet32_get_ringparam(struct net_device *dev, struct ethtool_ringparam *ering) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); ering->tx_max_pending = TX_MAX_RING_SIZE; ering->tx_pending = lp->tx_ring_size; @@ -781,7 +781,7 @@ static void pcnet32_get_ringparam(struct static int pcnet32_set_ringparam(struct net_device *dev, struct ethtool_ringparam *ering) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); unsigned long flags; unsigned int size; ulong ioaddr = dev->base_addr; @@ -847,7 +847,7 @@ static int pcnet32_self_test_count(struc static void pcnet32_ethtool_test(struct net_device *dev, struct ethtool_test *test, u64 * data) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); int rc; if (test->flags == ETH_TEST_FL_OFFLINE) { @@ -868,7 +868,7 @@ static void pcnet32_ethtool_test(struct static int pcnet32_loopback_test(struct net_device *dev, uint64_t * data1) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); struct pcnet32_access *a = &lp->a; /* access to registers */ ulong ioaddr = dev->base_addr; /* card base I/O address */ struct sk_buff *skb;/* sk buff */ @@ -1047,7 +1047,7 @@ static int pcnet32_loopback_test(struct static void pcnet32_led_blink_callback(struct net_device *dev) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); struct pcnet32_access *a = &lp->a; ulong ioaddr = dev->base_addr; unsigned long flags; @@ -1064,7 +1064,7 @@ static void pcnet32_led_blink_callback(s static int pcnet32_phys_id(struct net_device *dev, u32 
data) { - struct pcnet32_private *lp = dev->priv; + struct pcnet32_private *lp = netdev_priv(dev); struct pcnet32_access *a = &lp->a; ulong ioaddr = dev->base_addr; unsigned long flags; @@ -1109,7 +1109,7 @@ static int pcnet32_suspend(struct net_de int can_sleep) { int csr5; - struct pcnet32_private *lp = dev->priv; + stru
Re: [RFC] div64_64 support
Andi Kleen wrote:
> > Let me see... You throw code like that and expect someone to actually
> > understand it in one year, and be able to correct a bug ?
>
> To be honest I don't expect any bugs in this function.
>
> > Please add something, an URL or even better a nice explanation, per favor...
>
> It's straight out of Hacker's Delight which is referenced in the commit
> log.

Referencing it in a comment would have been a better idea.

	-hpa
[RFC] ARP notify option
This adds another inet device option to enable gratuitous ARP when device is brought up or address change. This is handy for clusters or virtualization.

Tested on a normal device (not Xen).

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 Documentation/networking/ip-sysctl.txt |    6 ++
 include/linux/inetdevice.h             |    2 ++
 include/linux/sysctl.h                 |    1 +
 net/ipv4/devinet.c                     |   16
 4 files changed, 25 insertions(+)

--- net-2.6.22.orig/Documentation/networking/ip-sysctl.txt	2007-03-05 14:35:31.0 -0800
+++ net-2.6.22/Documentation/networking/ip-sysctl.txt	2007-03-05 16:46:47.0 -0800
@@ -732,6 +732,12 @@
 	The max value from conf/{all,interface}/arp_ignore is used
 	when ARP request is received on the {interface}
 
+arp_notify - BOOLEAN
+	Define mode for notification of address and device changes.
+	0 - (default): do nothing
+	1 - Generate gratuitous arp replies when device is brought up
+	    or hardware address changes.
+
 arp_accept - BOOLEAN
 	Define behavior when gratuitous arp replies are received:
 	0 - drop gratuitous arp frames
--- net-2.6.22.orig/include/linux/inetdevice.h	2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/include/linux/inetdevice.h	2007-03-05 16:46:47.0 -0800
@@ -26,6 +26,7 @@
 	int	arp_announce;
 	int	arp_ignore;
 	int	arp_accept;
+	int	arp_notify;
 	int	medium_id;
 	int	no_xfrm;
 	int	no_policy;
@@ -84,6 +85,7 @@
 #define IN_DEV_ARPFILTER(in_dev)	(ipv4_devconf.arp_filter || (in_dev)->cnf.arp_filter)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	(max(ipv4_devconf.arp_announce, (in_dev)->cnf.arp_announce))
 #define IN_DEV_ARP_IGNORE(in_dev)	(max(ipv4_devconf.arp_ignore, (in_dev)->cnf.arp_ignore))
+#define IN_DEV_ARP_NOTIFY(in_dev)	(ipv4_devconf.arp_notify || (in_dev)->cnf.arp_notify)
 
 struct in_ifaddr
 {
--- net-2.6.22.orig/include/linux/sysctl.h	2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/include/linux/sysctl.h	2007-03-05 16:46:47.0 -0800
@@ -495,6 +495,7 @@
 	NET_IPV4_CONF_ARP_IGNORE=19,
 	NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
 	NET_IPV4_CONF_ARP_ACCEPT=21,
+	NET_IPV4_CONF_ARP_NOTIFY=22,
 	__NET_IPV4_CONF_MAX
 };
 
--- net-2.6.22.orig/net/ipv4/devinet.c	2007-03-05 14:35:34.0 -0800
+++ net-2.6.22/net/ipv4/devinet.c	2007-03-05 16:46:47.0 -0800
@@ -1089,6 +1089,14 @@
 			}
 		}
 		ip_mc_up(in_dev);
+		/* fall through */
+	case NETDEV_CHANGEADDR:
+		if (IN_DEV_ARP_NOTIFY(in_dev))
+			arp_send(ARPOP_REQUEST, ETH_P_ARP,
+				 in_dev->ifa_list->ifa_address,
+				 dev,
+				 in_dev->ifa_list->ifa_address,
+				 NULL, dev->dev_addr, NULL);
 		break;
 	case NETDEV_DOWN:
 		ip_mc_down(in_dev);
@@ -1495,6 +1503,14 @@
 			.proc_handler	= &proc_dointvec,
 		},
 		{
+			.ctl_name	= NET_IPV4_CONF_ARP_NOTIFY,
+			.procname	= "arp_notify",
+			.data		= &ipv4_devconf.arp_notify,
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= &proc_dointvec,
+		},
+		{
 			.ctl_name	= NET_IPV4_CONF_NOXFRM,
 			.procname	= "disable_xfrm",
 			.data		= &ipv4_devconf.no_xfrm,
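For illustration, a userspace sketch of the 28-byte ARP payload that a call like the arp_send() in the patch above would describe: an ARP request whose sender and target protocol addresses are the same IPv4 address. build_gratuitous_arp() and put_be16() are hypothetical helpers, and the zeroed target hardware address is an assumption of this sketch (the patch passes NULL for it); this is not the kernel's packet-building code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARP_PAYLOAD_LEN 28	/* IPv4-over-Ethernet ARP body */

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Fill buf with an ARP request announcing 'ip' (network byte order)
 * as owned by 'mac'. Sender and target IP are identical, which is what
 * makes the packet gratuitous.
 */
static void build_gratuitous_arp(uint8_t buf[ARP_PAYLOAD_LEN],
				 const uint8_t mac[6], const uint8_t ip[4])
{
	assert(buf != NULL && mac != NULL && ip != NULL);

	put_be16(buf + 0, 1);		/* hardware type: Ethernet */
	put_be16(buf + 2, 0x0800);	/* protocol type: IPv4 */
	buf[4] = 6;			/* hardware address length */
	buf[5] = 4;			/* protocol address length */
	put_be16(buf + 6, 1);		/* opcode 1: request */
	memcpy(buf + 8, mac, 6);	/* sender hardware address */
	memcpy(buf + 14, ip, 4);	/* sender protocol address */
	memset(buf + 18, 0, 6);		/* target hardware address (assumed zero) */
	memcpy(buf + 24, ip, 4);	/* target protocol address == sender */
}
```

Neighbours that already have an entry for the address refresh it with the new hardware address when they see such a packet, which is why this is useful after a device comes up or its address changes.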
Re: [RFC] div64_64 support II
Andi Kleen wrote:
> > The problem with these algorithms that trade off one or more multiplies in
> > order to avoid a divide is that they don't give anything and often lose
> > when both multiplies and divides are emulated in software.
>
> Actually on rereading this: is there really any Linux port
> that emulates multiplies in software? I thought that was only
> done on really small microcontrollers or smart cards; but anything
> 32bit+ that runs Linux should have hardware multiply, shouldn't it?

SPARC < v8 does multiplies using an MSTEP instruction.

	-hpa
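As a rough picture of what a multiply "emulated in software" costs, here is a generic shift-and-add sketch that performs one step per multiplier bit. soft_mul32() is a hypothetical name, and this is not meant to reproduce the exact semantics of any particular CPU's multiply-step instruction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 32x32 -> 64 bit multiply using only shifts, adds and bit tests.
 * Each loop iteration consumes one bit of the multiplier, so a full
 * multiply takes up to 32 steps.
 */
static uint64_t soft_mul32(uint32_t a, uint32_t b)
{
	uint64_t acc = 0;
	uint64_t addend = a;	/* a shifted left by the current bit index */
	int steps = 0;

	while (b != 0) {
		if (b & 1u)
			acc += addend;
		addend <<= 1;
		b >>= 1;
		steps++;
	}
	assert(steps <= 32);
	return acc;
}
```

When every multiply expands into a loop like this, replacing one divide with several multiplies is no longer an obvious win, which is the concern raised in the quoted text.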
Re: TCP 2MSL on loopback
> > This is probably not something that happens in real world deployments.
>
> But it's not 60,000 concurrent connections, it's 60,000 within a 2 minute
> span.

Sounds like a case of Doctor! Doctor! It hurts when I do this.

> I'm not saying this is a high priority problem, I only encountered it in a
> test scenario where I was deliberately trying to max out the server.
>
> Ideally the 2MSL parameter would be dynamically adjusted based on the
> route to the destination and the weights associated with those routes.
> In the simplest case, connections between machines on the same subnet
> (i.e., no router hops involved) should have a much smaller default value
> than connections that traverse any routers. I'd settle for a two-level
> setting - with no router hops, use the small value; with any router hops
> use the large value.

With transparent bridging, nobody knows how long the datagram may be out there. Admittedly, the chances of a datagram living for a full two minutes these days are probably nil, but just being in the same IP subnet doesn't really mean anything when it comes to physical locality.

> It's a combination of 2MSL and /proc/sys/net/ipv4/ip_local_port_range -
> on my system the default port range is 32768-61000. That means if I use
> up 28232 ports in less than 2MSL then everything stops. netstat will
> show that all the available port numbers are in TIME_WAIT state. And
> this is particularly bad because while waiting for the timeout, I can't
> initiate any new outbound connections of any kind at all - telnet, ssh,
> whatever, you have to wait for at least one port to free up.
> (Interesting denial of service there)

SPECweb benchmarking has had to deal with the issue of attempted TIME_WAIT reuse going back to 1997. It deals with it by not relying on the client's configured local/anonymous/ephemeral port number range and instead making explicit bind() calls in the (more or less) entire unpriv port range (actually it may just be from 5000 to 65535 but still)

Now, if it weren't necessary to fully randomize the ISNs, the chances of a successful transition from TIME_WAIT to ESTABLISHED might be greater, but going back to the good old days of more or less purely clock-driven ISNs isn't likely.

rick jones
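The port-range arithmetic in the quoted text can be worked through with a small sketch. The helper names are hypothetical; the 32768-61000 range and the 2 minute hold time are the figures given in the thread, and the connection rate used in the usage note is only an illustrative assumption.

```c
#include <assert.h>

/*
 * Number of ephemeral ports, counted the way the message counts them
 * (hi - lo); counting both endpoints would add one.
 */
static unsigned int port_count(unsigned int lo, unsigned int hi)
{
	assert(hi >= lo);
	return hi - lo;
}

/*
 * Highest steady connection rate (connections per second) that never
 * runs out of local ports if each closed connection holds its port
 * for hold_secs.
 */
static double max_sustained_rate(unsigned int ports, double hold_secs)
{
	assert(hold_secs > 0.0);
	return (double)ports / hold_secs;
}

/*
 * Seconds until every port is held, when opening and closing at
 * conns_per_sec and exhaustion happens before any port is released.
 */
static double secs_to_exhaust(unsigned int ports, double conns_per_sec)
{
	assert(conns_per_sec > 0.0);
	return (double)ports / conns_per_sec;
}
```

With the thread's numbers, port_count(32768, 61000) is 28232 and max_sustained_rate(28232, 120.0) is roughly 235 connections per second. At an assumed 94,000 connections per second, secs_to_exhaust() gives about 0.3 seconds, and doubling the port range only doubles that time.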
Re: [NET]: Please revert disallowing zero listen queues
From: David Miller <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 10:37:06 -0800 (PST)

> Everything I've ever seen clearly states that a backlog of
> zero means that zero connections are allowed.
>
> So we're not "disallowing" a backlog argument of zero to
> listen(). We'll accept that just fine, the only thing that
> happens is that you'll get what you ask for, that being
> no connections :-)

I'm not saying that a backlog of zero might mean allow one, in which case we do need to revert the change.

Rather, I'm trying to clarify what is the real issue here as Gerrit's email implied that listen() with a zero backlog returns an error now, which is not true.
Re: wireless extensions vs. 64-bit architectures
On Tuesday 06 March 2007 18:13, Jean Tourrilhes wrote:
> On Tue, Mar 06, 2007 at 02:27:26AM +0100, Johannes Berg wrote:
> > Hi,
> >
> > Wtf! After struggling with some strange problems with zd1211rw (see some
> > other mail) I decided to think again about what could possibly cause all
> > the other problems I'm having with it. The kernel seems fine, but iw*
> > userspace continually segfaults! And it also seems to be not
> > reproducible for most other people, I'd asked on IRC once a while.
> >
> > Well. Some thinking and stracing and thinking later it occurred to me...
> > Hell! wext is ioctls and includes this gem:
> >
> > struct iw_point
> > {
> >   void __user *pointer; /* Pointer to the data (in user space) */
> >   __u16 length; /* number of fields or size in bytes */
> >   __u16 flags; /* Optional params */
> > };
> >
> > Of course nobody ever tells you this, but it's used in a shitload of
> > places.
>
> Yep, and it's even in fs/compat_ioctl.c. Hint, hint ;-)

Ok, it is wrapping the following ioctls:

HANDLE_IOCTL(SIOCGIWRANGE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWTHRSPY, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWAPLIST, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWSCAN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWESSID, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWNICKN, do_wireless_ioctl)
HANDLE_IOCTL(SIOCSIWENCODE, do_wireless_ioctl)
HANDLE_IOCTL(SIOCGIWENCODE, do_wireless_ioctl)

What about SIOCSIWSCAN, SIOCSIWENCODEEXT, SIOCGIWENCODEEXT and some others that also use iw_point?

--
Greetings Michael.
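A small sketch of why the quoted struct iw_point is a problem for 32-bit userspace on a 64-bit kernel: the embedded pointer changes size, so the field offsets and the overall size differ between the two views. The struct names and widen() below are hypothetical and only mimic the kind of translation a compat handler has to perform; this is not the code in fs/compat_ioctl.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The layout as the running (native) ABI sees it. */
struct iw_point_native {
	void *pointer;		/* pointer to the data (in user space) */
	uint16_t length;	/* number of fields or size in bytes */
	uint16_t flags;		/* optional params */
};

/* The same structure as a 32-bit process lays it out: 4-byte pointer. */
struct iw_point_compat32 {
	uint32_t pointer;
	uint16_t length;
	uint16_t flags;
};

/* Widen the 32-bit layout into the native one, field by field. */
static struct iw_point_native widen(const struct iw_point_compat32 *in)
{
	struct iw_point_native out;

	assert(in != NULL);
	out.pointer = (void *)(uintptr_t)in->pointer;
	out.length = in->length;
	out.flags = in->flags;
	return out;
}
```

On a 64-bit build the native structure places length after an 8-byte pointer, while the 32-bit layout places it at offset 4, so passing the 32-bit bytes through unchanged makes the kernel read length and flags from the wrong place.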
Re: linux 2.6 Ipv4 routing enhancement (fwd)
Richard Kojedzinszky writes:

> Sorry for sending the tgz with .svn included. And i did not send
> instructions.
> To do a test with fib_trie, issue
> $ make clean all ROUTE_ALG=TRIE & ./try a
> with fib_radix:
> $ make clean all ROUTE_ALG=RADIX & ./try a
> with fib_lef:
> $ make clean all ROUTE_ALG=LEF SBBITS=4 & ./try a

Thanks. First, I usually do my testing in kernel context and in the forwarding path with full semantic match, so it's not that easy to compare. But I'll take a look. BTW, did you test that you do correct prefix matching?

FYI, some old fib work on robur.slu.se:

# Look with just hlist
/pub/Linux/net-development/fib_hlist/

# 24 bit hash lookup
/pub/Linux/net-development/fib_hash2/

And some hlist/hash2/trie comparisons in:
/pub/Linux/tmp/trie-talk-kth.pdf

Cheers
					--ro