Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 23:22, David Miller wrote: > Ok, ok, but don't we have queueing disciplines that need the timestamp > even on ingress? I grepped and I can't find any. The only non SIOCGTSTAMP users of the time stamp seem to be sunrpc and conntrack and I bet both can be converted over to jiffies without trouble. -Andi - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 23:03, Alexey Kuznetsov wrote: >> And do you have some other preferred way to solve this? Even if the timer was fast it would still be good to avoid it in the fast path when DHCPD is running. > No. The way which you suggested seems to be the best. Ok. I also checked my desktop and for some reason I got a timestamp counter of 7 (and it doesn't even run a DHCP client). I haven't investigated why yet, and I am still hoping it's not a leak. But it hints that trying to fix all of user space to not use the ioctl would probably have been too much work. > 1. It does not even rule out recording the timestamp inside the driver, which Alan was afraid of. The sequence is: if (!skb->tstamp.off_sec) net_timestamp(skb); > 2. Maybe netif_rx() should continue to take the timestamp itself. Hmm, there are still quite a lot of users, and even with netif_rx() you can have long delays from interrupt mitigation etc. (% grep -rw netif_rx drivers/net/* | wc -l prints 253.) > 3. NAPI already introduced almost the same inaccuracy. And it is really silly to waste time getting a timestamp in netif_receive_skb() a few moments before the packet is delivered to a socket. > 4. ...but the clock source, which shows up near the top of profiles, still has to be fixed. :-) It's being worked on, but it'll take some time. But even when the TSC can be used it's still a good idea not to call gtod unnecessarily, because it can still be relatively slow (e.g. on P4, RDTSC takes hundreds of cycles because it synchronizes the CPU). On some non-x86 platforms it is also relatively slow because they have to reach out to the chipset, and every time you do that things get slow. -Andi
Re: [PATCH] tcp: set congestion default through Kconfig
Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/ipv4/Kconfig | 39 +-- net/ipv4/sysctl_net_ipv4.c | 7 +++ net/ipv4/tcp_cong.c | 2 +- 3 files changed, 45 insertions(+), 3 deletions(-) Nice solution. Signed-off-by: Ian McDonald <[EMAIL PROTECTED]> -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand
Re: [PATCH] Add Broadcom PHY support
Amy Fong wrote: > [PATCH] Add Broadcom PHY support > This patch adds a driver to support the bcm5421s and bcm5461s PHYs. > Kernel version: linux-2.6.18-rc6 > Signed-off-by: Amy Fong <[EMAIL PROTECTED]> And... where are the users of this PHY driver? Jeff
[PATCH] tcp: set congestion default through Kconfig
Bert's attempt was noble
It showed your desire for the truth
A simple path exists

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/ipv4/Kconfig           | 39 +--
 net/ipv4/sysctl_net_ipv4.c |  7 +++
 net/ipv4/tcp_cong.c        |  2 +-
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 90f9136..e922c3a 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -573,12 +573,47 @@ config TCP_CONG_VENO
 	  loss packets.
 	  See http://www.ntu.edu.sg/home5/ZHOU0022/papers/CPFu03a.pdf
 
+choice
+	prompt "Default TCP congestion control"
+	default DEFAULT_BIC
+	help
+	  Select the TCP congestion control that will be used by default
+	  for all connections.
+
+	config DEFAULT_BIC
+		bool "Bic" if TCP_CONG_BIC=y
+
+	config DEFAULT_CUBIC
+		bool "Cubic" if TCP_CONG_CUBIC=y
+
+	config DEFAULT_HTCP
+		bool "Htcp" if TCP_CONG_HTCP=y
+
+	config DEFAULT_VEGAS
+		bool "Vegas" if TCP_CONG_VEGAS=y
+
+	config DEFAULT_WESTWOOD
+		bool "Westwood" if TCP_CONG_WESTWOOD=y
+
+	config DEFAULT_RENO
+		bool "Reno"
+
+endchoice
+
 endmenu
 
-config TCP_CONG_BIC
-	tristate
+config DEFAULT_BIC
 	depends on !TCP_CONG_ADVANCED
 	default y
 
+config DEFAULT_TCP_CONG
+	string
+	default "bic" if DEFAULT_BIC
+	default "cubic" if DEFAULT_CUBIC
+	default "htcp" if DEFAULT_HTCP
+	default "vegas" if DEFAULT_VEGAS
+	default "westwood" if DEFAULT_WESTWOOD
+	default "reno" if DEFAULT_RENO
+
 source "net/ipv4/ipvs/Kconfig"
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 19b2071..52b6481 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -129,6 +129,13 @@ static int sysctl_tcp_congestion_control
 	return ret;
 }
 
+static int __init tcp_congestion_default(void)
+{
+	return tcp_set_default_congestion_control(CONFIG_DEFAULT_TCP_CONG);
+}
+
+late_initcall(tcp_congestion_default);
+
 ctl_table ipv4_table[] = {
 	{
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7ff2e42..af0aca1 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -48,7 +48,7 @@ int tcp_register_congestion_control(stru
 		printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
 		ret = -EEXIST;
 	} else {
-		list_add_rcu(&ca->list, &tcp_cong_list);
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
 		printk(KERN_INFO "TCP %s registered\n", ca->name);
 	}
 	spin_unlock(&tcp_cong_list_lock);
--
1.4.1
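How the two halves of the patch fit together, assuming (as in the 2.6.18-era tcp_cong.c) that new connections take the entry at the head of tcp_cong_list and that setting a default by name moves that entry to the head: switching registration to list_add_tail_rcu() keeps the first-registered algorithm at the head, and the late_initcall, which runs after the initcalls of built-in modules, then moves the Kconfig-selected name to the front or fails if that algorithm was never registered. The sketch below models only that list logic in user space; ca_register(), ca_set_default(), ca_default() and the fixed-size array are hypothetical stand-ins, not kernel APIs, and locking and RCU are ignored.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model, not kernel code: the array stands in for
 * tcp_cong_list, and slot 0 plays the role of the list head, i.e. the
 * algorithm a new connection would get by default. */
#define MAX_CA 8

struct ca_list {
	const char *names[MAX_CA];
	size_t count;
};

/* Register at the tail, mirroring list_add_tail_rcu() in the patch. */
static int ca_register(struct ca_list *l, const char *name)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		if (strcmp(l->names[i], name) == 0)
			return -EEXIST;
	if (l->count == MAX_CA)
		return -ENOSPC;
	l->names[l->count++] = name;
	return 0;
}

/* Move a registered entry to the head; -ENOENT if it was never registered. */
static int ca_set_default(struct ca_list *l, const char *name)
{
	size_t i, j;

	for (i = 0; i < l->count; i++) {
		if (strcmp(l->names[i], name) == 0) {
			const char *hit = l->names[i];

			/* Shift earlier entries down by one, keeping their order. */
			for (j = i; j > 0; j--)
				l->names[j] = l->names[j - 1];
			l->names[0] = hit;
			return 0;
		}
	}
	return -ENOENT;
}

/* The current default is whatever sits at the head. */
static const char *ca_default(const struct ca_list *l)
{
	return l->count ? l->names[0] : NULL;
}
```

Registering reno, bic and cubic in that order leaves reno as the default; selecting "cubic" moves it to the front without disturbing the order of the others, and selecting an unregistered name returns -ENOENT.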
[PATCH] Remove powerpc specific parts of 3c509 driver
On powerpc and ppc, insl_ns and insl are identical, as are outsl_ns and outsl, so remove the conditional use of insl_ns and outsl_ns.

Signed-off-by: Stephen Rothwell <[EMAIL PROTECTED]>
---
 drivers/net/3c509.c | 9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

This is in anticipation of removing the insl_ns and outsl_ns definitions, which are powerpc specific.
--
Cheers,
Stephen Rothwell [EMAIL PROTECTED]

diff --git a/drivers/net/3c509.c b/drivers/net/3c509.c
index cbdae54..add6381 100644
--- a/drivers/net/3c509.c
+++ b/drivers/net/3c509.c
@@ -879,11 +879,7 @@ #endif
 	outw(skb->len, ioaddr + TX_FIFO);
 	outw(0x00, ioaddr + TX_FIFO);
 	/* ... and the packet rounded to a doubleword. */
-#ifdef __powerpc__
-	outsl_ns(ioaddr + TX_FIFO, skb->data, (skb->len + 3) >> 2);
-#else
 	outsl(ioaddr + TX_FIFO, skb->data, (skb->len + 3) >> 2);
-#endif
 
 	dev->trans_start = jiffies;
 	if (inw(ioaddr + TX_FREE) > 1536)
@@ -1103,13 +1099,8 @@ el3_rx(struct net_device *dev)
 	skb_reserve(skb, 2);	/* Align IP on 16 byte */
 
 	/* 'skb->data' points to the start of sk_buff data area. */
-#ifdef __powerpc__
-	insl_ns(ioaddr+RX_FIFO, skb_put(skb,pkt_len),
-		(pkt_len + 3) >> 2);
-#else
 	insl(ioaddr + RX_FIFO, skb_put(skb,pkt_len),
 		(pkt_len + 3) >> 2);
-#endif
 
 	outw(RxDiscard, ioaddr + EL3_CMD); /* Pop top Rx packet. */
 	skb->protocol = eth_type_trans(skb,dev);
--
1.4.2.1
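Both outsl() and insl() in these hunks take a count of 32-bit words rather than bytes, so the driver rounds the frame length up to whole doublewords, as the "rounded to a doubleword" comment says. A minimal C sketch of that arithmetic; fifo_dwords() and fifo_bytes_moved() are illustrative helpers, not functions from the driver.

```c
#include <stddef.h>

/* Number of 32-bit transfers needed to move len bytes through the FIFO:
 * (len + 3) >> 2 is ceil(len / 4) in integer arithmetic. */
static size_t fifo_dwords(size_t len)
{
	return (len + 3) >> 2;
}

/* Bytes actually moved, including up to 3 bytes of tail padding. */
static size_t fifo_bytes_moved(size_t len)
{
	return fifo_dwords(len) << 2;
}
```

For example, a 61-byte frame needs 16 transfers, i.e. 64 bytes through the FIFO, while a 60-byte frame fits exactly in 15.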
[PATCH] xt_policy: remove dups in .family
sparse "defined twice" warning

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 net/netfilter/xt_policy.c | 2 --
 1 file changed, 2 deletions(-)

--- a/net/netfilter/xt_policy.c
+++ b/net/netfilter/xt_policy.c
@@ -171,7 +171,6 @@ static struct xt_match policy_match = {
 	.match		= match,
 	.matchsize	= sizeof(struct xt_policy_info),
 	.checkentry	= checkentry,
-	.family		= AF_INET,
 	.me		= THIS_MODULE,
 };
 
@@ -181,7 +180,6 @@ static struct xt_match policy6_match = {
 	.match		= match,
 	.matchsize	= sizeof(struct xt_policy_info),
 	.checkentry	= checkentry,
-	.family		= AF_INET6,
 	.me		= THIS_MODULE,
 };
Re: [PATCH][RFC] Re: high latency with TCP connections
>> Regardless, kudos for running the test. The only thing missing is the -c and -C options to enable the CPU utilization measurements, which will then give the service demand on a CPU-time-per-transaction basis. Or was this a UP system that was taken to CPU saturation? > It is my notebook. :-) Of course, CPU consumption is 100%. (Actually, netperf shows 100.10 :-)) Gotta love the accuracy. :) > I will redo the test on a real network. What range of -b should I test? I suppose that depends on your patience :) In theory, as you increase (e.g. double) the -b setting you should reach a point of diminishing returns with respect to transaction rate. If you see that, and see the service demand flattening out, I'd say it is probably time to stop. I'm also not quite sure whether "abc" needs to be disabled or not. I do know that I left out one very important netperf option. The command line should be: netperf -t TCP_RR -H foo -- -b N -D where "-D" is added to set TCP_NODELAY. Otherwise, the ratio of transactions to data segments is fubar. That issue is also why I wonder about the setting of tcp_abc. [I have this quixotic pipe dream about being able to --enable-burst, set -D and say that the number of TCP segments exchanged on the network is 2X the transaction count when request and response size are < MSS. The raison d'etre for this pipe dream is maximizing PPS with TCP_RR tests without _having_ to have hundreds if not thousands of simultaneous netperfs/connections - say with just as many netperfs/connections as there are CPUs or threads/strands in the system. It was while trying to make this pipe dream a reality that I first noticed that HP-UX 11i, which normally has a very nice ACK avoidance heuristic, would send an immediate ACK if it received back-to-back sub-MSS segments - thus ruining my pipe dream when it came to HP-UX testing. Happily, I noticed that "linux" didn't seem to be doing the same thing. Hence my tweaking when seeing this patch come along...]
>> What I'm thinking about isn't so much about the latency > I understand. Actually, I did those tests ages ago for a pure throughput case, when nothing goes in the opposite direction. I did not find a difference that time. And nobody even noticed that Linux sends an ACK for _each_ small segment on unidirectional connections for all those years. :-) Not everyone looks very closely (alas, sometimes myself included). If all anyone does is look at throughput, they wouldn't notice until they saturate the CPU. Heck, before netperf and TCP_RR tests, and sadly even still today, most people just look at how fast a single-connection, unidirectional data transfer goes and leave it at that :( Thankfully, the set of "most people" and "netdev" aren't completely overlapping. rick jones
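A sketch of the service-demand arithmetic behind -c/-C, assuming the usual definition of CPU time consumed per transaction: utilization times the number of CPUs gives CPU-seconds used per second, and dividing by the transaction rate gives CPU time per transaction. service_demand_usec() is an illustrative helper, not part of netperf. It also shows why a single CPU pinned at 100% makes the raw transaction rate a stand-in for service demand: with utilization and CPU count fixed, the rate is the only variable left.

```c
/* Approximate per-transaction CPU cost ("service demand") in microseconds:
 * (utilization fraction * number of CPUs) CPU-seconds per second,
 * divided by transactions per second, scaled to microseconds. */
static double service_demand_usec(double cpu_util_pct, int ncpus,
				  double trans_per_sec)
{
	return (cpu_util_pct / 100.0) * (double)ncpus * 1e6 / trans_per_sec;
}
```

If the notebook used for the loopback test were a uniprocessor at 100%, the b=2 rate of 105874.83 trans/s reported in this thread would correspond to about 9.4 usec of CPU per transaction, covering both netperf and netserver since both ran on the same machine.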
Re: tcp congestion policy selection link order fragile
On Sun, 17 Sep 2006 16:51:50 +0200 bert hubert <[EMAIL PROTECTED]> wrote: > The original message Stephen reacts to below apparently never made it to the list; it can be found here: http://ds9a.nl/tmp/module-policy.txt >> Anybody who builds in random stuff without thinking is being foolish. But, if you can think of a better configuration method that isn't too grotty, then go for it. > The method I'm proposing is simple enough: > 1) reno is always built-in > 2) it is the default TCP congestion policy No, Reno is unstable at high BDP (bandwidth-delay product). > 3) loading/compiling-in additional TCP congestion policies only makes them available > 4) userspace is free to select a non-default TCP congestion policy at will > The implementation might be as simple as making the *first* registered congestion policy the default (instead of the last one), which would be reno, as it is in tcp_cong.o, which is probably always loaded first (as the other .o's need symbols that are in tcp_cong.o). > Despite what you allege about my foolishness, I maintain that a kernel that enables a *random policy* from the ones you compiled in is not a sane kernel. > The default kernel should be as sane as possible, allowing the userspace people (i.e., me) to mess things up to their heart's desire. > Bert
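Bert's "first registered wins" proposal comes down to where new entries are inserted, given that the default is taken from the head of the list. A minimal userspace model, assuming registration order follows link/load order; struct ca_node, add_head() and add_tail() are hypothetical and only mimic list_add_rcu() (insert at the head) and list_add_tail_rcu() (insert at the tail).

```c
#include <stddef.h>

/* Minimal singly linked model of a registration list; the head entry is
 * treated as the default. This is not the kernel's list_head API. */
struct ca_node {
	const char *name;
	struct ca_node *next;
};

/* Head insertion, like list_add_rcu(): the newest entry becomes the default. */
static void add_head(struct ca_node **head, struct ca_node *n)
{
	n->next = *head;
	*head = n;
}

/* Tail insertion, like list_add_tail_rcu(): the oldest entry stays the default. */
static void add_tail(struct ca_node **head, struct ca_node *n)
{
	struct ca_node **pp = head;

	while (*pp)
		pp = &(*pp)->next;
	n->next = NULL;
	*pp = n;
}
```

Registering reno, bic and cubic in that order leaves cubic at the head with head insertion but reno with tail insertion, which is the behaviour Bert asks for.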
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello! > There isn't any sort of clever short-circuiting in loopback, is there? No, not as far as I know. > I do like the convenience of testing things over loopback, but always fret about not including drivers and actual hardware interrupts etc. Well, if the test is right, it should show the cost of redundant ACKs. > Regardless, kudos for running the test. The only thing missing is the -c and -C options to enable the CPU utilization measurements, which will then give the service demand on a CPU-time-per-transaction basis. Or was this a UP system that was taken to CPU saturation? It is my notebook. :-) Of course, CPU consumption is 100%. (Actually, netperf shows 100.10 :-)) I will redo the test on a real network. What range of -b should I test? > What I'm thinking about isn't so much about the latency I understand. Actually, I did those tests ages ago for a pure throughput case, when nothing goes in the opposite direction. I did not find a difference that time. And nobody even noticed that Linux sends an ACK for _each_ small segment on unidirectional connections for all those years. :-) Alexey
Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux
On Friday 08 September 2006 12:50 pm, Venkat Yekkirala wrote: > This defines SELinux enforcement of the 2 new LSM hooks. {snip} > +static int selinux_skb_policy_check(struct sk_buff *skb, unsigned short family) > +{ > + u32 xfrm_sid, trans_sid; > + int err; > + > + if (selinux_compat_net) > + return 1; > + > + err = selinux_xfrm_decode_session(skb, &xfrm_sid, 0); > + BUG_ON(err); First, is there any reason against including the "struct sock *" in the LSM hook? At a quick glance it looks like it is available at each place security_skb_policy_check() is invoked. If there are no objections I would like to see it included in the hook. Second, I wonder if it would be better to do a NetLabel/CIPSO query here, using the xfrm_sid as the NetLabel "base_sid", instead of at the end of the function (see your comment). This way we wouldn't have to duplicate the avc_has_perm() and security_transition_sid() calls for both xfrm and NetLabel. It just seems to be more in line with the whole secid reconciliation concept. I don't feel too strongly either way; I just thought it was worth exploring. Thoughts? > + err = avc_has_perm(xfrm_sid, skb->secmark, SECCLASS_PACKET, PACKET__FLOW_IN, NULL); > + if (err) > + goto out; > + > + if (xfrm_sid) { > + err = security_transition_sid(xfrm_sid, skb->secmark, SECCLASS_PACKET, &trans_sid); > + if (err) > + goto out; > + > + skb->secmark = trans_sid; > + } > + > + /* See if CIPSO can flow in thru the current secmark here */ > + > +out: > + return err ? 0 : 1; > +}; -- paul moore linux security @ hp
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tue, 19 Sep 2006, Alexey Kuznetsov wrote: > Hello! >> Please think about it this way: suppose you have a heavily loaded router and some network problem is to be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by switching to timestamp-it-all mode > I am sorry. I cannot think that way. :-) Instead of attempts to scare, better resend the original report, where you said how much performance degraded; I cannot find it. * I do see get_offset_pmtmr() in top lines of profile. That's scary enough. * I do not understand what the hell dhcp needs timestamps for. * I will not listen to any suggestions to screw up tcpdump with a sysctl. The kernel already implements a much better thing than a sysctl. Do not want timestamps? Fix tcpdump, add an option, submit the patch to the tcpdump maintainers. Not a big deal. If firing up one program (however minor) can cause network performance to drop by >50% (based on the numbers reported earlier in this thread), that is a significant problem for sysadmins. Yes, tcpdump may be wrong in requesting timestamps (in most cases it probably is, but in some cases it's doing exactly what the sysadmin wants it to do), but I don't think that many sysadmins would expect this much of a performance hit. There should be some way to tell the system to ignore requests for timestamps so that a badly behaved program cannot cripple the system this way (and preferably something that doesn't require a full SELinux/capabilities implementation). David Lang
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Tue, Sep 19, 2006 at 02:00:38AM +0400, Alexey Kuznetsov wrote: > Hello! >> Please think about it this way: suppose you have a heavily loaded router and some network problem is to be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by switching to timestamp-it-all mode > I am sorry. I cannot think that way. :-) > Instead of attempts to scare, better resend the original report, where you said how much performance degraded; I cannot find it. > * I do see get_offset_pmtmr() in top lines of profile. That's scary enough. I had it at the very top line. > * I do not understand what the hell dhcp needs timestamps for. > * I will not listen to any suggestions to screw up tcpdump with a sysctl. The kernel already implements a much better thing than a sysctl. Do not want timestamps? Fix tcpdump, add an option, submit the patch to the tcpdump maintainers. Not a big deal. OK, point taken. It's better to patch tcpdump. > Alexey ~ :wq With best regards, Vladimir Savkin.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello! > Please think about it this way: suppose you have a heavily loaded router and some network problem is to be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by switching to timestamp-it-all mode I am sorry. I cannot think that way. :-) Instead of attempts to scare, better resend the original report, where you said how much performance degraded; I cannot find it. * I do see get_offset_pmtmr() in top lines of profile. That's scary enough. * I do not understand what the hell dhcp needs timestamps for. * I will not listen to any suggestions to screw up tcpdump with a sysctl. The kernel already implements a much better thing than a sysctl. Do not want timestamps? Fix tcpdump, add an option, submit the patch to the tcpdump maintainers. Not a big deal. Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello! > Ok, ok, but don't we have queueing disciplines that need the timestamp even on ingress? I cannot find any. ip_queue does, but it is just another user, no different from sockets. BTW, in any case, any user of the timestamp who sees 0 because the skb was received before timestamping was enabled has to calculate the timestamp itself, right in the place where Andi suggested. It seems preparing for the change makes sense even without the change. :-) Alexey
Re: [PATCH][RFC] Re: high latency with TCP connections
Alexey Kuznetsov wrote:
> Hello!
>
> Of course, the number of ACKs increases. That is the goal. :-)
>
>> unpleasant increase in service demands on something like a "burst
>> enabled" (./configure --enable-burst) netperf TCP_RR test:
>>
>> netperf -t TCP_RR -H foo -- -b N   # N > 1
>
> foo=localhost

There isn't any sort of clever short-circuiting in loopback, is there? I do like the convenience of testing things over loopback, but I always fret about not including drivers and actual hardware interrupts etc.

> b     patched (trans/s)   orig (trans/s)
> 2     105874.83           105143.71
> 3     114208.53           114023.07
> 4     120493.99           120851.27
> 5     128087.48           128573.33
> 10    151328.48           151056.00
>
> Probably the test is done wrong, but I see no difference.

Regardless, kudos for running the test. The only thing missing is the -c and -C options to enable the CPU utilization measurements, which will then give the service demand on a CPU-time-per-transaction basis. Or was this a UP system that was taken to CPU saturation?

>> to increase as a result. Pipelined HTTP would be like that, some NFS
>> over TCP stuff too, maybe X traffic,
>
> X will be excited about better latency.
>
> As for protocols not interested in latency, they will be a little happier if transactions are processed asynchronously.

What I'm thinking about isn't so much the latency as the aggregate throughput a system can do with lots of these protocols/connections going at the same time. Hence the concern about increases in service demand.

> But actually, it is not about increasing/decreasing the number of ACKs. It is about killing that pain in the ass which we used to have because we pretended to be too smart.

:)

rick jones
Re: [PATCH 0/7] secid reconciliation-v02: Repost patchset with updates
On Friday 08 September 2006 12:50 pm, Venkat Yekkirala wrote: > UPCOMING WORK: > The following per the discussion at: http://marc.theaimsgroup.com/?l=selinux&m=115755980516072&w=2 > - Create IPSec SAs to be acquired with the creating sock's context as opposed to that of the matching SPD rule, resulting in a simpler SPD as well as policy. > - Set peer_sid on tcp sockets to the reconciled secmark so trusted applications can retrieve and service the data at the appropriate context. Considering the discussions that have taken place on the SELinux list, I think doing the work to set the peer_sid value on TCP sockets is an important part of the secid work and should be included in this patchset. I don't believe it would be that difficult, and I think it would make some of the code much cleaner/simpler. -- paul moore linux security @ hp
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: Alexey Kuznetsov <[EMAIL PROTECTED]> Date: Tue, 19 Sep 2006 01:03:21 +0400 > 1. It does not even rule out recording the timestamp inside the driver, which Alan was afraid of. The sequence is: if (!skb->tstamp.off_sec) net_timestamp(skb); > 2. Maybe netif_rx() should continue to take the timestamp itself. > 3. NAPI already introduced almost the same inaccuracy. And it is really silly to waste time getting a timestamp in netif_receive_skb() a few moments before the packet is delivered to a socket. > 4. ...but the clock source, which shows up near the top of profiles, still has to be fixed. :-) Ok, ok, but don't we have queueing disciplines that need the timestamp even on ingress?
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 06:50:22PM +0200, Andi Kleen wrote: > I suppose in the worst case a sysctl like Vladimir asked for could be added, but it would seem somewhat lame. Please think about it this way: suppose you have a heavily loaded router and some network problem is to be diagnosed. You run tcpdump and suddenly the router becomes overloaded (by switching to timestamp-it-all mode), drops OSPF adjacencies etc. Users are angry, and you can't diagnose anything. But with imprecise timestamps, and maybe even with reordered packets, you still have some traces to analyze. So, in this particular corner case it's not that lame. Or maybe patching tcpdump would be better? ~ :wq With best regards, Vladimir Savkin.
Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error
It seems our mails crossed. On Mon, Sep 18, 2006 at 01:21:02PM -0700, Andrew Morton wrote: > hm. I don't have this patch queued, but I _do_ have an equivalent patch for e1000 queued; what's up with that? Nobody seems to have paid much attention to the e1000 fix. I spotted the e100 patch in your "broken-out" patches earlier today, as part of "git-netdev-all.patch" (where it had the right changelog and old acks). > If we can gather the appropriate acks quickly then I expect we can get both of these into 2.6.18. That would be great! Thanks. --linas
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 01:27:57PM +0200, Andi Kleen wrote: > The codebase for timing (and lots of other things) is quite different between 32bit and 64bit. You're really surprised it doesn't work if you do such things? It works, and after your remark above, I'm surprised. I don't know about slow TSC drift, though; not enough time has passed to detect it, and I hope we will have this problem solved in a better way before the drift becomes visible :) >> But the question is, why couldn't stock 2.6.18-rc7 use the TSC on its own? > x86-64 doesn't use the TSC when it deems it to not be reliable, which is the case on your system. Could it at least print something so that I know that using the TSC was considered, but rejected? >> What hardware, exactly? Doesn't it affect only the CPU? And CPUs are not known to fail before any other components. > All hardware. It's basic physics. Hm, what other hardware is affected by idle=poll? Does this option wear out HDDs? ~ :wq With best regards, Vladimir Savkin.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello! > But that never happens, right? Right. Well, not right: it does happen, simply because you get a packet with a newer timestamp after the previous handler saw this packet and did some actions. I just do not see any bad consequences. > And do you have some other preferred way to solve this? Even if the timer was fast it would still be good to avoid it in the fast path when DHCPD is running. No. The way which you suggested seems to be the best. 1. It does not even rule out recording the timestamp inside the driver, which Alan was afraid of. The sequence is: if (!skb->tstamp.off_sec) net_timestamp(skb); 2. Maybe netif_rx() should continue to take the timestamp itself. 3. NAPI already introduced almost the same inaccuracy. And it is really silly to waste time getting a timestamp in netif_receive_skb() a few moments before the packet is delivered to a socket. 4. ...but the clock source, which shows up near the top of profiles, still has to be fixed. :-) Alexey
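The "stamp only if not stamped yet" sequence quoted above can be modelled in user space to show where the clock reads go. In this sketch a zero stamp means "not stamped yet" (like skb->tstamp.off_sec == 0), an early path stamps only while some consumer has asked for timestamps, and delivery fills in a missing stamp only for a consumer that wants one. struct pkt, stamp_users_get()/stamp_users_put(), maybe_stamp_early() and deliver() are hypothetical names, not the kernel's sk_buff or net_timestamp() code.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical model, not kernel code: stamp_ns == 0 means "not stamped". */
struct pkt {
	uint64_t stamp_ns;
};

/* Number of consumers that currently want timestamps. The kernel would
 * keep this in an atomic counter; a plain int keeps the sketch simple. */
static int stamp_users;
/* Counts clock reads so the saving is observable. */
static unsigned long clock_reads;

static uint64_t read_clock_ns(void)
{
	struct timespec ts;

	clock_reads++;
	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static void stamp_users_get(void) { stamp_users++; }
static void stamp_users_put(void) { stamp_users--; }

/* Early path (e.g. a driver): stamp only if somebody is listening. */
static void maybe_stamp_early(struct pkt *p)
{
	if (stamp_users > 0 && p->stamp_ns == 0)
		p->stamp_ns = read_clock_ns();
}

/* Delivery: stamp lazily, and only for a consumer that asked for it.
 * This also covers packets that arrived before stamping was enabled. */
static void deliver(struct pkt *p, bool wants_stamp)
{
	if (wants_stamp && p->stamp_ns == 0)
		p->stamp_ns = read_clock_ns();
}
```

The clock_reads counter makes the benefit visible: packets nobody asked to timestamp never touch the (possibly slow) clock source, and a packet already stamped on the early path is not stamped again on delivery.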
Re: [PATCH] EtherIP tunnel driver (RFC 3378)
On Mon, Sep 11, 2006 at 10:41:29PM +0200, Joerg Roedel wrote: > This driver implements the tunneling of Ethernet packets over IPv4 networks for Linux. It uses the protocol defined in RFC 3378. Check out the thread "[PATCH][RFC] etherip: Ethernet-in-IPv4 tunneling" that was on netdev in January of 2005 -- a number of arguments against etherip (and for tunneling ethernet in GRE) were raised back then. One of the most significant ones, IMHO: > Another argument against etherip would be that OpenBSD apparently mis-implemented etherip by putting the etherip version nibble in the second nibble of the etherip header instead of the first, which would probably prevent the linux and OpenBSD versions from interoperating, negating the advantage of using etherip in the first place. cheers, Lennert
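The interoperability problem quoted above is a one-nibble difference on the wire. A small sketch of the RFC 3378 layout, assuming the header is the 16-bit field with VERSION = 3 in the high nibble of the first octet and the remaining 12 bits reserved and sent as zero; etherip_build() and etherip_version() are illustrative helpers, not an existing API.

```c
#include <stdint.h>

/* RFC 3378 EtherIP header: 16 bits, a 4-bit VERSION (= 3) in the high
 * nibble of the first octet, followed by 12 reserved bits sent as zero. */
#define ETHERIP_VERSION 3

/* Build a header in wire order as the RFC lays it out. */
static void etherip_build(uint8_t hdr[2])
{
	hdr[0] = (uint8_t)(ETHERIP_VERSION << 4);	/* 0x30 */
	hdr[1] = 0x00;
}

/* Version as a receiver following the RFC layout would read it. */
static unsigned etherip_version(const uint8_t hdr[2])
{
	return hdr[0] >> 4;
}
```

A header with the version placed in the second nibble (first octet 0x03, as the quoted message describes) reads back as version 0, so a receiver that checks the version field would drop it.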
Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error
Andrew Morton wrote: > On Mon, 18 Sep 2006 15:01:22 -0500 [EMAIL PROTECTED] (Linas Vepstas) wrote: >> Hi, Please apply the following one-liner patch to what will become the stable 2.6.18. This patch is low-risk because it affects only the PCI error recovery code, which doesn't run on most platforms (in particular, isn't invoked on current x86/ia64). This patch was originally sent on 29 June 2006 to fix a bug that showed up in an -mm build. The code from -mm made it into mainline, but this patch did not, and so we're unhappy. :-( Here's the original patch description: A recent patch in -mm3 titled gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch causes pci_enable_device() to be a no-op if the kernel thinks that the device is already enabled. This change breaks the PCI error recovery mechanism in the e100 device driver, since, after PCI slot reset, the card is no longer enabled. This is a trivial fix for this problem. Tested. Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Signed-off-by: Auke Kok <[EMAIL PROTECTED]> drivers/net/e100.c | 1 + 1 file changed, 1 insertion(+) Index: linux-2.6.18-rc7-git1/drivers/net/e100.c === --- linux-2.6.18-rc7-git1.orig/drivers/net/e100.c 2006-09-18 14:21:49.0 -0500 +++ linux-2.6.18-rc7-git1/drivers/net/e100.c 2006-09-18 14:24:50.0 -0500 @@ -2799,6 +2799,7 @@ static pci_ers_result_t e100_io_error_de /* Detach; put netif into state similar to hotplug unplug. */ netif_poll_enable(netdev); netif_device_detach(netdev); + pci_disable_device(pdev); /* Request a slot reset. */ return PCI_ERS_RESULT_NEED_RESET; > hm. I don't have this patch queued, but I _do_ have an equivalent patch for e1000 queued; what's up with that? Nobody seems to have paid much attention to the e1000 fix. If we can gather the appropriate acks quickly then I expect we can get both of these into 2.6.18. Ack! for both, of course. I'm unsure what happened here, as I have this patch in my local tree.
I suspect that it got merged into jeff's #upstream only somehow. Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello! Of course, the number of ACKs increases. That is the goal. :-) > unpleasant increase in service demands on something like a "burst > enabled" (./configure --enable-burst) netperf TCP_RR test: > > netperf -t TCP_RR -H foo -- -b N # N > 1

foo=localhost

b     patched      orig
2     105874.83    105143.71
3     114208.53    114023.07
4     120493.99    120851.27
5     128087.48    128573.33
10    151328.48    151056.00

Probably the test is done wrong, but I see no difference. > to increase as a result. Pipelined HTTP would be like that, some NFS > over TCP stuff too, maybe X traffic, X will be excited about better latency. As for protocols not interested in latency, they will be a little happier if transactions are processed asynchronously. But actually, it is not about increasing/decreasing the number of ACKs. It is about killing that pain in the ass which we used to have because we pretended to be too smart. Alexey
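The per-burst differences in the table can be checked with a little arithmetic. The sketch below is only a convenience for reading the numbers (the struct and function names are made up for this note); it computes the relative change of the patched kernel against the original for each -b value.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the TCP_RR burst results above (foo=localhost). */
struct rr_result {
    int burst;       /* N passed with -b N */
    double patched;  /* result with the patch */
    double orig;     /* result without it */
};

static const struct rr_result results[] = {
    { 2, 105874.83, 105143.71 },
    { 3, 114208.53, 114023.07 },
    { 4, 120493.99, 120851.27 },
    { 5, 128087.48, 128573.33 },
    { 10, 151328.48, 151056.00 },
};

#define NRESULTS (sizeof(results) / sizeof(results[0]))

/* Relative change of the patched result versus the original, in percent. */
static double pct_change(const struct rr_result *r)
{
    assert(r->orig != 0.0);
    return 100.0 * (r->patched - r->orig) / r->orig;
}

/* Largest absolute relative change over all burst sizes. */
static double max_abs_pct_change(void)
{
    double worst = 0.0;

    for (size_t i = 0; i < NRESULTS; i++) {
        double d = pct_change(&results[i]);

        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

On these five rows the largest change is roughly 0.7% (at b=2), and the sign flips between burst sizes (b=4 and b=5 come out slightly lower with the patch), which fits the reading that there is no measurable difference.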
Re: [PATCH] please include in 2.6.18: e100 disable device on PCI error
On Mon, 18 Sep 2006 15:01:22 -0500 [EMAIL PROTECTED] (Linas Vepstas) wrote: > > Hi, > > Please apply the following one-liner patch to > what will become the stable 2.6.18. This patch is > low-risk because it affects only the PCI error > recovery code, which doesn't run on most platforms > (in particular, isn't invoked on current x86/ia64). > > This patch was originally sent on 29 June 2006 > to fix a bug that showed up in an -mm build. > The code from -mm made it into mainline, but > this patch did not, and so we're unhappy. :-( > > Here's the original patch description: > > A recent patch in -mm3 titled > gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch > causes pci_enable_device() to be a no-op if the kernel thinks > that the device is already enabled. This change breaks the > PCI error recovery mechanism in the e100 device driver, since, > after PCI slot reset, the card is no longer enabled. This is > a trivial fix for this problem. Tested. > > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > Signed-off-by: Auke Kok <[EMAIL PROTECTED]> > > > drivers/net/e100.c | 1 + > 1 file changed, 1 insertion(+) > > Index: linux-2.6.18-rc7-git1/drivers/net/e100.c > === > --- linux-2.6.18-rc7-git1.orig/drivers/net/e100.c 2006-09-18 > 14:21:49.0 -0500 > +++ linux-2.6.18-rc7-git1/drivers/net/e100.c 2006-09-18 14:24:50.0 > -0500 > @@ -2799,6 +2799,7 @@ static pci_ers_result_t e100_io_error_de > /* Detach; put netif into state similar to hotplug unplug. */ > netif_poll_enable(netdev); > netif_device_detach(netdev); > + pci_disable_device(pdev); > > /* Request a slot reset. */ > return PCI_ERS_RESULT_NEED_RESET; hm. I don't have this patch queued, but I _do_ have an equivalent patch for e1000 queued; what's up with that? Nobody seems to have paid much attention to the e1000 fix. If we can gather the appropriate acks quickly then I expect we can get both of these into 2.6.18. 
[PATCH] please include in 2.6.18: e100 disable device on PCI error
Hi, Please apply the following one-liner patch to what will become the stable 2.6.18. This patch is low-risk because it affects only the PCI error recovery code, which doesn't run on most platforms (in particular, isn't invoked on current x86/ia64). This patch was originally sent on 29 June 2006 to fix a bug that showed up in an -mm build. The code from -mm made it into mainline, but this patch did not, and so we're unhappy. :-( Here's the original patch description: A recent patch in -mm3 titled gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch causes pci_enable_device() to be a no-op if the kernel thinks that the device is already enabled. This change breaks the PCI error recovery mechanism in the e100 device driver, since, after PCI slot reset, the card is no longer enabled. This is a trivial fix for this problem. Tested. Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Signed-off-by: Auke Kok <[EMAIL PROTECTED]>

 drivers/net/e100.c | 1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.18-rc7-git1/drivers/net/e100.c
===
--- linux-2.6.18-rc7-git1.orig/drivers/net/e100.c	2006-09-18 14:21:49.0 -0500
+++ linux-2.6.18-rc7-git1/drivers/net/e100.c	2006-09-18 14:24:50.0 -0500
@@ -2799,6 +2799,7 @@ static pci_ers_result_t e100_io_error_de
 	/* Detach; put netif into state similar to hotplug unplug. */
 	netif_poll_enable(netdev);
 	netif_device_detach(netdev);
+	pci_disable_device(pdev);
 
 	/* Request a slot reset. */
 	return PCI_ERS_RESULT_NEED_RESET;
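Why the missing pci_disable_device() breaks recovery can be seen in a small userspace model. Everything below is hypothetical (the toy_* names and the two flags); it only assumes what the description says, namely that pci_enable_device() becomes a no-op when the kernel believes the device is already enabled, plus the assumption that the recovery path re-enables the card after the slot reset.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code. sw_enabled is the kernel's idea of whether
 * the device is enabled; hw_enabled is the real state of the card.
 */
struct toy_pci_dev {
    bool sw_enabled;
    bool hw_enabled;
};

/* Like the -mm pci_enable_device(): a no-op if already marked enabled. */
static void toy_enable(struct toy_pci_dev *d)
{
    if (d->sw_enabled)
        return;
    d->hw_enabled = true;
    d->sw_enabled = true;
}

static void toy_disable(struct toy_pci_dev *d)
{
    d->hw_enabled = false;
    d->sw_enabled = false;
}

/* A slot reset clears the hardware state but not the kernel's bookkeeping. */
static void toy_slot_reset(struct toy_pci_dev *d)
{
    d->hw_enabled = false;
}

/* Error-detected handler; disable_first models the one-line fix. */
static void toy_error_detected(struct toy_pci_dev *d, bool disable_first)
{
    if (disable_first)
        toy_disable(d);
}

/* Run detect -> slot reset -> re-enable and report whether the card works. */
static bool toy_recover(bool disable_first)
{
    struct toy_pci_dev d = { false, false };

    toy_enable(&d);
    assert(d.hw_enabled);
    toy_error_detected(&d, disable_first);
    toy_slot_reset(&d);
    toy_enable(&d);  /* assumed re-enable step in the recovery path */
    return d.hw_enabled;
}
```

Without the disable, the second toy_enable() returns early because the bookkeeping still says "enabled", so the card stays dead after the reset; with it, the re-enable actually reaches the hardware.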
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 11:53:09AM -0700, David Miller wrote: > > What would the desired default be, 'BIC' in all cases? > > And if BIC is not enabled in the configuration, then what? As the source notes "/* we'll always have reno */ ". This would make the policy: the default is "bic" if available, otherwise it is "reno", which is *always* available. But it is all up to you. I'm willing to do the leg work. Bert -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services
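The policy Bert proposes, which does not depend on link order, could look roughly like the following. This is a hypothetical helper for illustration, not the kernel's congestion-control registration code; it takes the names of the algorithms that were built and returns "bic" if present, otherwise "reno".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: choose the default congestion control by name,
 * regardless of the order in which the algorithms were registered.
 */
static const char *pick_default_cong(const char *const built[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(built[i], "bic") == 0)
            return built[i];
    return "reno";  /* "we'll always have reno" */
}
```

The design point is that the result depends only on which algorithms exist, so reordering the link does not change the default.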
Re: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for outbound traffic
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > -static void secmark_restore(struct sk_buff *skb) > +static unsigned int secmark_restore(struct sk_buff *skb, unsigned int > hooknum, > +const struct xt_target *target) > { > - if (!skb->secmark) { > - u32 *connsecmark; > - enum ip_conntrack_info ctinfo; > + u32 *psecmark; > + u32 secmark = 0; > + enum ip_conntrack_info ctinfo; > > - connsecmark = nf_ct_get_secmark(skb, &ctinfo); > - if (connsecmark && *connsecmark) > - if (skb->secmark != *connsecmark) > - skb->secmark = *connsecmark; > - } > + psecmark = nf_ct_get_secmark(skb, &ctinfo); > + if (psecmark) > + secmark = *psecmark; > + > + if (!secmark) > + return XT_CONTINUE; > + > + /* Set secmark on inbound and filter it on outbound */ > + if (hooknum == NF_IP_POST_ROUTING || hooknum == NF_IP6_POST_ROUTING) { > + if (!security_skb_netfilter_check(skb, secmark)) > + return NF_DROP; > + } else > + if (skb->secmark != secmark) > + skb->secmark = secmark; > + > + return XT_CONTINUE; > } Quite a lot of logic has changed here. With the original code, we only restored a secmark once for the lifetime of a packet or connection (to make behavior deterministic and security marks immutable in the face of arbitrarily complex iptables rules). With your patch, secmarks are always writable. What about packets on the OUTPUT hook? Also, we did not restore a 'null' (zero) secmark to the skb (while this should never happen with the current SECMARK target, there may be non-SELinux extensions later which set a null marking). Why not just do something like: psecmark = nf_ct_get_secmark(skb, &ctinfo); if (psecmark && *psecmark) { ... core of function ... } return XT_CONTINUE; I don't think you need the new secmark variable. You've also changed the logic for the dummy case of security_skb_netfilter_check() +static inline int security_skb_netfilter_check(struct sk_buff *skb, + u32 nf_secid) +{ + return 1; +} + This code does not now behave as it did originally. 
Keep in mind that SELinux is not the only user of SECMARK. (The documentation of the hook in security.h doesn't match the behavior, either -- it's (re-)labeling, not just filtering). I really don't know if connection tracking is the right place to be doing policy enforcement, either. Perhaps you should just do the relabeling here and enforcement later. The xt_SECMARK.c case has similar issues to all of the above. - James -- James Morris <[EMAIL PROTECTED]>
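A userspace model of the restore semantics James asks to keep. The toy_* types and the verdict constant are stand-ins, not the netfilter API; the function captures only the two rules from the review: a zero connection mark is never copied to the packet, and a packet that already carries a mark keeps it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for the skb and the conntrack entry. */
struct toy_skb { u32 secmark; };
struct toy_conn { u32 secmark; };

#define TOY_XT_CONTINUE 0xFFFFFFFFu  /* placeholder "continue" verdict */

/*
 * Restore the connection's mark onto the packet only if the packet is
 * unlabeled and the connection mark is non-zero; otherwise leave it alone.
 */
static unsigned int toy_secmark_restore(struct toy_skb *skb,
                                        const struct toy_conn *ct)
{
    const u32 *psecmark = ct ? &ct->secmark : NULL;

    if (!skb->secmark && psecmark && *psecmark)
        skb->secmark = *psecmark;
    return TOY_XT_CONTINUE;
}
```

Because an existing mark is never overwritten, the outcome does not depend on how many rules or hooks the packet passes through, which is the determinism argument in the review.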
RE: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux
> On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > > > + if (selinux_compat_net) { > > + err = selinux_xfrm_decode_session(skb, &peersid, 0); > > + BUG_ON(err); > > I'm pretty sure this should not be a BUG_ON. IIUC, you want > to panic the > kernel because one of the nested SAs has a different security context. No, we are sending in 0 for the ckall param by which we are telling the function NOT to do any checks, but to simply set the return param peersid to the secid on the first xfrm if any and succeed by returning 0. Must not fail.
RE: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for outbound traffic
> On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > > > @@ -114,6 +128,9 @@ static struct xt_target xt_connsecmark_t > > .target = target, > > .targetsize = sizeof(struct > xt_connsecmark_target_info), > > .table = "mangle", > > + .hooks = (1 << NF_IP_LOCAL_IN) | > > + (1 << NF_IP_FORWARD) | > > + (1 << NF_IP_POST_ROUTING), > > Why have you added constraints on the hooks? > > This breaks a bunch of things. I was trying to restrict the module usage to these, but later realized I really needn't. Will take these out.
Re: [GIT PATCH] NET: Fixes for net-2.6.19
* YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]> 2006-09-19 00:08 > [NET]: Move netlink interface bits to linux/if_link.h. > > Moving netlink interface bits to linux/if.h is rather troublesome for > applications including both linux/if.h (which was changed to be included > from linux/rtnetlink.h automatically) and net/if.h. Agreed. > [NET]: Include new rtnetlink headers for userspace backward compatibility. > > Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> > > diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h > index 3a18add..8ec375c 100644 > --- a/include/linux/rtnetlink.h > +++ b/include/linux/rtnetlink.h > @@ -2,7 +2,12 @@ #ifndef __LINUX_RTNETLINK_H > #define __LINUX_RTNETLINK_H > > #include > +#ifndef __KERNEL__ > +/* Backward compatibility */ > #include > +#include > +#include > +#endif > > / > * Routing/neighbour discovery messages. Still acceptable but this gets ugly at some point. Applications using the interface should start making copies of the header version they use. > commit 55a08a9078b243a06223222735580df9e11a5fa6 > Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> > Date: Sun Sep 17 13:55:02 2006 +0900 > > [NET]: Put {IFLA,IFA,NDA,NDTA}_{RTA,PAYLOAD}() macro back. > > These macros are still used by userspace applications. Same here, it doesn't make sense to export macros only of functional value and used by userspace only. The same issue will pop up once all users have been converted to use the new netlink interface. Keeping the old interface around just so userspace doesn't have to make copies doesn't make sense. I think it's better to start fixing userspace than to try and keep headers source compatible.
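The reply's suggestion that applications keep private copies of the header bits they rely on might look like this for the IFLA helpers. The APP_ names and app_count_link_attrs() are hypothetical; the macro bodies follow the usual definitions of IFLA_RTA() and IFLA_PAYLOAD(), built on the NLMSG_*/RTA_* macros and struct ifinfomsg from the Linux userspace headers.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Application-private copies, so the program does not depend on whether
 * the installed kernel headers still export IFLA_RTA()/IFLA_PAYLOAD().
 * The APP_ prefix avoids clashing with headers that do define them.
 */
#define APP_IFLA_RTA(r) \
    ((struct rtattr *)(((char *)(r)) + NLMSG_ALIGN(sizeof(struct ifinfomsg))))
#define APP_IFLA_PAYLOAD(n) NLMSG_PAYLOAD(n, sizeof(struct ifinfomsg))

/* Count the attributes that follow the ifinfomsg header of a link message. */
static int app_count_link_attrs(struct nlmsghdr *nlh)
{
    struct ifinfomsg *ifi = NLMSG_DATA(nlh);
    struct rtattr *rta = APP_IFLA_RTA(ifi);
    int len = (int)APP_IFLA_PAYLOAD(nlh);
    int n = 0;

    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        n++;
    return n;
}
```

Once the copy lives in the application, later header cleanups in the kernel tree no longer break its build.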
Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > This defines SELinux enforcement of the 2 new LSM hooks. > I think this looks ok in general (I have a couple more technical issues), although I believe that Stephen has some questions about policy construction. Please rename these hooks: + * @skb_policy_check: + * @skb_netfilter_check to: skb_flow_in skb_flow_out - James -- James Morris <[EMAIL PROTECTED]>
Re: tcp congestion policy selection link order fragile
From: bert hubert <[EMAIL PROTECTED]> Date: Mon, 18 Sep 2006 17:40:48 +0200 > What would the desired default be, 'BIC' in all cases? And if BIC is not enabled in the configuration, then what?
Re: [PATCH 7/7] secid reconciliation-v02: Enforcement for SELinux
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > + if (selinux_compat_net) { > + err = selinux_xfrm_decode_session(skb, &peersid, 0); > + BUG_ON(err); I'm pretty sure this should not be a BUG_ON. IIUC, you want to panic the kernel because one of the nested SAs has a different security context. > + err = selinux_xfrm_decode_session(skb, &xfrm_sid, 0); > + BUG_ON(err); Same. -- James Morris <[EMAIL PROTECTED]>
Re: [PATCH 4/7] secid reconciliation-v02: Invoke LSM hook for outbound traffic
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > @@ -114,6 +128,9 @@ static struct xt_target xt_connsecmark_t > .target = target, > .targetsize = sizeof(struct xt_connsecmark_target_info), > .table = "mangle", > + .hooks = (1 << NF_IP_LOCAL_IN) | > + (1 << NF_IP_FORWARD) | > + (1 << NF_IP_POST_ROUTING), Why have you added constraints on the hooks? This breaks a bunch of things. > @@ -123,6 +140,9 @@ static struct xt_target xt_connsecmark_t > .target = target, > .targetsize = sizeof(struct xt_connsecmark_target_info), > .table = "mangle", > + .hooks = (1 << NF_IP6_LOCAL_IN) | > + (1 << NF_IP6_FORWARD) | > + (1 << NF_IP6_POST_ROUTING), > .me = THIS_MODULE, Ditto... > @@ -119,6 +129,9 @@ static struct xt_target xt_secmark_targe > .target = target, > .targetsize = sizeof(struct xt_secmark_target_info), > .table = "mangle", > + .hooks = (1 << NF_IP_LOCAL_IN) | > + (1 << NF_IP_FORWARD) | > + (1 << NF_IP_POST_ROUTING), > .me = THIS_MODULE, > }, > { > @@ -128,6 +141,9 @@ static struct xt_target xt_secmark_targe > .target = target, > .targetsize = sizeof(struct xt_secmark_target_info), > .table = "mangle", > + .hooks = (1 << NF_IP6_LOCAL_IN) | > + (1 << NF_IP6_FORWARD) | > + (1 << NF_IP6_POST_ROUTING), > .me = THIS_MODULE, > }, > }; > -- James Morris <[EMAIL PROTECTED]>
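The effect of the proposed .hooks masks can be checked with a small model. The enum follows the ordering of the 2.6-era IPv4 hook numbers (PRE_ROUTING=0 through POST_ROUTING=4), and the check is modeled on the way x_tables treats a target's hooks field, where 0 means "no restriction". All names are toy_* placeholders rather than the netfilter API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hook numbers in the order of the 2.6-era NF_IP_* constants. */
enum toy_hook {
    TOY_PRE_ROUTING = 0,
    TOY_LOCAL_IN,
    TOY_FORWARD,
    TOY_LOCAL_OUT,
    TOY_POST_ROUTING,
};

/* The mask the patch adds to the IPv4 targets. */
static const unsigned int proposed_hooks =
    (1u << TOY_LOCAL_IN) | (1u << TOY_FORWARD) | (1u << TOY_POST_ROUTING);

/* A target may be used from a hook only if its bit is set; 0 = unrestricted. */
static bool toy_target_allowed(unsigned int hooks, enum toy_hook h)
{
    return hooks == 0 || (hooks & (1u << h)) != 0;
}
```

With the proposed mask (value 22), rules using the targets from LOCAL_OUT (the OUTPUT chain) or PRE_ROUTING would be rejected, whereas leaving the field unset allows every hook; this is one concrete way the restriction "breaks a bunch of things".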
Re: [PATCH 6/7] secid reconciliation-v02: Label locally generated IPv4 traffic
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > diff --git a/include/net/ip.h b/include/net/ip.h > index 98f9084..4646c13 100644 > --- a/include/net/ip.h > +++ b/include/net/ip.h > @@ -48,6 +48,9 @@ struct ipcm_cookie > u32 addr; > int oif; > struct ip_options *opt; > +#ifdef CONFIG_SECURITY_NETWORK > + __u32 secid; > +#endif /* CONFIG_SECURITY_NETWORK */ > }; This field should be 'u32'. -- James Morris <[EMAIL PROTECTED]>
Re: [PATCH 3/7] secid reconciliation-v02: Invoke LSM hook for inbound traffic
On Fri, 8 Sep 2006, Venkat Yekkirala wrote: > -static inline int xfrm6_policy_check(struct sock *sk, int dir, struct sk_buff > *skb) > -{ > - return xfrm_policy_check(sk, dir, skb, AF_INET6); > + if (sk && sk->sk_policy[XFRM_POLICY_IN]) > + ret = __xfrm_policy_check(sk, dir, skb, family); > + else > + ret = (!xfrm_policy_count[dir] && !skb->sp) || > + (skb->dst->flags & DST_NOPOLICY) || > + __xfrm_policy_check(sk, dir, skb, family); > + > +#ifdef CONFIG_SECURITY_NETWORK > + if (ret) > + ret = security_skb_policy_check(skb, family); > +#endif /* CONFIG_SECURITY_NETWORK */ Why is this code ifdef'd when the function is conditionally compiled? > { > +#ifdef CONFIG_SECURITY_NETWORK > + return security_skb_policy_check(skb, family); > +#else > return 1; > +#endif /* CONFIG_SECURITY_NETWORK */ Ditto. -- James Morris <[EMAIL PROTECTED]>
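The objection to the #ifdef is that the hook is already compiled conditionally: when CONFIG_SECURITY_NETWORK is off, the header can supply an inline stub that returns 1, so callers invoke it unconditionally. A userspace sketch of that pattern follows; the struct is a stand-in, the hook name is taken from the patch, and the family parameter type is an assumption.

```c
#include <assert.h>

/* Stand-in for struct sk_buff; CONFIG_SECURITY_NETWORK is left undefined here. */
struct toy_skb { int dummy; };

#ifdef CONFIG_SECURITY_NETWORK
int security_skb_policy_check(struct toy_skb *skb, unsigned short family);
#else
/* Stub used when the LSM network hooks are compiled out: always allow. */
static inline int security_skb_policy_check(struct toy_skb *skb,
                                            unsigned short family)
{
    (void)skb;
    (void)family;
    return 1;
}
#endif

/* The caller needs no #ifdef: the header already chose the hook or the stub. */
static int toy_policy_check(struct toy_skb *skb, unsigned short family,
                            int xfrm_ok)
{
    int ret = xfrm_ok;

    if (ret)
        ret = security_skb_policy_check(skb, family);
    return ret;
}
```

Keeping the conditional in one header means the call sites read the same in both configurations and cannot drift apart.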
Re: [PATCH][RFC] Re: high latency with TCP connections
David Miller wrote: From: Rick Jones <[EMAIL PROTECTED]> Date: Tue, 05 Sep 2006 10:55:16 -0700 Is this really necessary? I thought that the problems with ABC were in trying to apply byte-based heuristics from the RFC(s) to a packet-oriented cwnd in the stack? This is receiver side, and helps a sender who does congestion control based upon packet counting like Linux does. It really is less related to ABC than Alexey implies; we've always had this kind of problem as I mentioned in previous talks in the past on this issue. For a connection receiving nothing but sub-MSS segments this is going to non-trivially increase the number of ACKs sent, no? I would expect an unpleasant increase in service demands on something like a "burst enabled" (./configure --enable-burst) netperf TCP_RR test: netperf -t TCP_RR -H foo -- -b N # N > 1 to increase as a result. Pipelined HTTP would be like that, some NFS over TCP stuff too, maybe X traffic, other "transactional" workloads as well - maybe Tuxedo. rick jones
Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
Philippe De Muyter <[EMAIL PROTECTED]> : [...] > On Mon, Sep 18, 2006 at 07:11:29PM +0800, Jesse Huang wrote: > > Dear Philippe: > > (1)Because this is a patent issue, we are not allowed to use it again, even if it > > is in the Data Sheet. > > I surmise this is only a concern for icplus as a hardware company. I'd rather avoid any Linux user of the old sundance driver with a new ip100a chipset instantly having some problem with the said patent. Who would be responsible for it? :o( -- Ueimor
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 18:28, Alexey Kuznetsov wrote: > Hello! > > > Hmm, not sure how that could happen. Also is it a real problem > > even if it could? > > As I said, the problem is _occasionally_ theoretical. > > This would happen f.e. if packet socket handler was installed > after IP handler. Then tcpdump would get packet after it is processed > (acked/replied/forwarded). This would be disastrous, the results > are unparsable. But that never happens, right? And do you have some other preferred way to solve this? Even if the timer was fast it would still be good to avoid it in the fast path when DHCPD is running. I suppose in the worst case a sysctl like Vladimir asked for could be added, but it would seem somewhat lame. -Andi
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello! > Hmm, not sure how that could happen. Also is it a real problem > even if it could? As I said, the problem is _occasionally_ theoretical. This would happen f.e. if packet socket handler was installed after IP handler. Then tcpdump would get packet after it is processed (acked/replied/forwarded). This would be disastrous, the results are unparsable. I recall the issue was discussed, and at that time it looked more reasonable to solve problems of this kind by taking the timestamp once, before the packet is seen by the rest of the stack. Who could have expected that the PIT nightmare would return? :-) > Then it has to use the ACPI pmtmr which is really really slow. > The overhead of that thing is so large that you can clearly see it in > the network benchmark. I see. Thank you. Alexey
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 17:38, Alexey Kuznetsov wrote: > Hello! > > > For netdev: I'm more and more thinking we should just avoid the problem > > completely and switch to "true end2end" timestamps. This means don't > > time stamp when a packet is received, but only when it is delivered > > to a socket. > > This will work. > > From viewpoint of existing uses of timestamp by packet socket > this time is not worse. The only danger is violation of causality > (when forwarded packet or reply packet gets timestamp earlier than > original packet). Hmm, not sure how that could happen. Also is it a real problem even if it could? > > handler runs. Then the problem above would completely disappear. > > Well, not completely. Too slow clock source remains too slow clock source. > If it is so slow, that it results in "performance degradation", it just > should not be used at all, even such pariah as tcpdump wants to be fast. > > Actually, I have a question. Why the subject is > "Network performance degradation from 2.6.11.12 to 2.6.16.20"? > I do not see beginning of the thread and cannot guess > why clock source degraded. :-) It's a long and sad story. Old kernels didn't disable the TSC on those boxes (multi core K8) and assumed they were synchronized for timing purposes. This initially mostly worked if you didn't use cpufreq, but over a longer uptime the TSCs would drift against each other and timing would jump more and more between CPUs. On older versions of K8 this drift happened much more slowly (more aggressive power saving in HLT in newer steppings made it worse; that is why idle=poll helps) and could often be ignored. But technically it was still a bug there because it could break timing after long uptimes. New multi socket K8 boxes are generally totally unusable with TSC because they use cpufreq and the TSCs can run at completely different frequencies, which obviously doesn't give very good timing information if you assume the TSC is globally synchronized. 
That is why later kernels default to TSC off. The original plan was to use HPET then, which is slower than TSC, but still not that bad. But while most modern systems have an HPET timer somewhere in the chipset, nearly all BIOS vendors "forgot" to describe it in the BIOS because Windows didn't use it, and Linux can't find it because of that. Then it has to use the ACPI pmtmr which is really really slow. The overhead of that thing is so large that you can clearly see it in the network benchmark. The real fix long term is to change the timer subsystem to keep all TSC state per CPU, then it'll work on the K8s too. Unfortunately it's a moderately hard problem to make the result still fully monotonic. But people are working on it. -Andi
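A toy illustration of the TSC problem described above, and of why keeping the conversion state per CPU helps. The two CPUs, their rates and offsets are invented numbers; real TSCs also drift over time, which this model ignores, and making a per-CPU scheme fully monotonic across CPU migrations is the hard part Andi mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Invented per-CPU TSC parameters; rates are in ticks per ns. */
struct toy_cpu_tsc {
    uint64_t ticks_per_ns;  /* e.g. changed by cpufreq */
    uint64_t offset;        /* counter value at true time 0 */
};

static const struct toy_cpu_tsc toy_cpus[2] = {
    { 2, 0 },    /* CPU0: 2 GHz, started at 0 */
    { 1, 500 },  /* CPU1: 1 GHz, started at 500 */
};

/* What a TSC read would return on @cpu at true time @t_ns. */
static uint64_t toy_rdtsc(int cpu, uint64_t t_ns)
{
    return toy_cpus[cpu].offset + toy_cpus[cpu].ticks_per_ns * t_ns;
}

/* Broken assumption: one global calibration (CPU0's) for every CPU. */
static uint64_t toy_ns_global(uint64_t tsc)
{
    return (tsc - toy_cpus[0].offset) / toy_cpus[0].ticks_per_ns;
}

/* Per-CPU state: convert with the calibration of the CPU that was read. */
static uint64_t toy_ns_percpu(int cpu, uint64_t tsc)
{
    return (tsc - toy_cpus[cpu].offset) / toy_cpus[cpu].ticks_per_ns;
}
```

Reading CPU0 at true time 1000 ns and then CPU1 at 1100 ns, the global conversion yields 1000 and then 800, so time appears to go backwards; the per-CPU conversion yields 1000 and 1100.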
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 07:06:00AM -0700, David Miller wrote: > Any ordering scheme is wrong or unexpected for _somebody_. Look how I agree violently. Would you agree that it would be best to have a mechanism that explicitly sets a sane default, and does not rely on ordering? My implementation indeed broke your intentions, but would you be open to revamping things so the default policy is not dependent on load order? What would the desired default be, 'BIC' in all cases? Thanks. -- http://www.PowerDNS.com Open source, database driven DNS Software http://netherlabs.nl Open and Closed source services
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
Hello! > For netdev: I'm more and more thinking we should just avoid the problem > completely and switch to "true end2end" timestamps. This means don't > time stamp when a packet is received, but only when it is delivered > to a socket. This will work. From the viewpoint of existing uses of timestamp by packet socket this time is not worse. The only danger is violation of causality (when forwarded packet or reply packet gets timestamp earlier than original packet). This pathology was the main reason why timestamp is recorded early, before packet is demultiplexed in netif_receive_skb(). But it is not a practical problem: delivery to packet/raw sockets is occasionally placed _before_ delivery to real protocol handlers. > handler runs. Then the problem above would completely disappear. Well, not completely. Too slow clock source remains too slow clock source. If it is so slow that it results in "performance degradation", it just should not be used at all, even such a pariah as tcpdump wants to be fast. Actually, I have a question. Why is the subject "Network performance degradation from 2.6.11.12 to 2.6.16.20"? I do not see the beginning of the thread and cannot guess why the clock source degraded. :-) Alexey
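A minimal model of the "true end2end" idea under discussion: nothing is stamped on receive; the clock is read only when a packet reaches a socket that wants a timestamp, and only if the packet has not been stamped already (for example by a driver). The fake clock and toy_* names are hypothetical; the point is how many clock-source reads happen.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pkt {
    long tstamp;  /* 0 means "not stamped yet" */
};

static long toy_clock_reads;  /* how often the (slow) clock source was read */
static long toy_now = 1000;   /* fake clock, advances on every read */

static long toy_read_clock(void)
{
    toy_clock_reads++;
    return toy_now++;
}

/* Stamp only if nobody (e.g. a driver) has stamped the packet already. */
static void toy_stamp_if_needed(struct toy_pkt *p)
{
    if (!p->tstamp)
        p->tstamp = toy_read_clock();
}

/* Delivery to one socket; wants_tstamp models a socket that asked for stamps. */
static void toy_deliver(struct toy_pkt *p, bool wants_tstamp)
{
    if (wants_tstamp)
        toy_stamp_if_needed(p);
}
```

Packets that only reach sockets without timestamp interest never touch the clock, and a second interested socket reuses the first stamp. The cost is the one Alexey names: because the stamp is taken later, a reply or forwarded packet can in principle end up with an earlier stamp than the packet that caused it.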
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Monday 18 September 2006 17:19, Alan Cox wrote: > On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote: > > The only delay this would add would be the queueing time from the NIC > > to the softirq. Do you really think that is that bad? > > If you are trying to do things like network record/playback then you > want the minimal delay. But it's not minimal. Maybe it was long ago when the code was designed on a 3c509 but not with modern hardware: Think interrupt mitigation and NAPI. And with NAPI we tend to process the packets directly after they are fetched out of the RX queue, so there is practically no delay between driver seeing the packet and softirq seeing it. All the queuing is done either at hardware level or later at socket level. > There's a reason the original timestamp code > supported the hardware setting the timestamp itself - we actually had a > separate set of logic on a board that was doing the timestamping by > watching the IRQ line of the NIC chip. That would be fine too (because it will likely be fast), but unfortunately I don't know of any driver that does that. -Andi
Re: [PATCH 4/9] network namespaces: socket hashes
Andrey Savochkin wrote: Socket hash lookups are made within namespace. Hash tables are common for all namespaces, with additional permutation of indexes. Hi Andrey, why is the hash table common and not instantiated multiple times for each namespace like the routes?
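One way to read "hash tables are common for all namespaces, with additional permutation of indexes" is sketched below: a single shared table, with a per-namespace value mixed into the bucket index. The mixing function and the hash_perturb field are assumptions for illustration, not Andrey's code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HASH_BITS 8
#define TOY_HASH_SIZE (1u << TOY_HASH_BITS)

/* Hypothetical per-namespace state: a value chosen when the namespace is made. */
struct toy_netns {
    uint32_t hash_perturb;
};

/* Simple integer mix of the connection 4-tuple (illustrative only). */
static uint32_t toy_sock_hash(uint32_t saddr, uint16_t sport,
                              uint32_t daddr, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);

    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

/* Bucket in the shared table: the namespace value permutes the index. */
static uint32_t toy_bucket(const struct toy_netns *ns, uint32_t saddr,
                           uint16_t sport, uint32_t daddr, uint16_t dport)
{
    return (toy_sock_hash(saddr, sport, daddr, dport) ^ ns->hash_perturb)
           & (TOY_HASH_SIZE - 1);
}
```

Identical 4-tuples in different namespaces then tend to land in different buckets, but since the buckets are shared, a real lookup would still have to compare each socket's namespace while walking the chain. The alternative raised in the question, one table per namespace, avoids that comparison at the cost of allocating tables per namespace.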
Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.
Hi David, > > although HIDP mouse movement doesn't seem to be appearing > > in /dev/input/mice on my G5, while the 'hcidump' output looks sane > > enough while I move it. > > Ew, that's because struct hidp_connadd_req is similarly buggered for > compat. Replacement HIDP patch to fix both at once... I didn't miss > anywhere where we actually change the hidp_connadd_req structure during > the call, did I? that looks ugly, but I assume there is no other way to solve this problem. I will go over all three patches and wrap them up nicely. Linus, will you accept these for inclusion before 2.6.18 final? Regards Marcel - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PATCH] NET: Fixes for net-2.6.19
Hello. Please pull the following changesets available at: git://git.skbuff.net/gitroot/yoshfuji/net-2.6.19-20060918-net/ HEADLINES - [XFRM]: Do not add a state whose SPI is zero to the SPI hash. [NET]: Move netlink interface bits to linux/if_link.h. [NET]: Include linux/if_link.h directly from the source file. [NET]: Include new rtnetlink headers for userspace backward compatibility. [NET]: Put {IFLA,IFA,NDA,NDTA}_{RTA,PAYLOAD}() macro back. [NET] KBUILD: Add missing entries for new net headers. DIFFSTAT include/linux/Kbuild | 10 ++- include/linux/if.h| 130 -- include/linux/if_addr.h |3 + include/linux/if_link.h | 139 + include/linux/neighbour.h |7 ++ include/linux/rtnetlink.h |7 ++ net/bridge/br_netlink.c |1 net/core/rtnetlink.c |1 net/core/wireless.c |1 net/ipv6/addrconf.c |1 net/xfrm/xfrm_state.c | 11 ++-- 11 files changed, 172 insertions(+), 139 deletions(-) CHANGESETS -- commit 04b3eac83cccb7da663bd11a2b569f197bb3170e Author: Masahide NAKAMURA <[EMAIL PROTECTED]> Date: Sun Sep 17 13:54:53 2006 +0900 [XFRM]: Do not add a state whose SPI is zero to the SPI hash. SPI=0 is used for acquired IPsec SA and MIPv6 RO state. Such state should not be added to the SPI hash because we do not care about it on deleting path. 
Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 9f63edd..5f4a50e 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -96,9 +96,12 @@ static void xfrm_hash_transfer(struct hl nhashmask); hlist_add_head(&x->bysrc, nsrctable+h); - h = __xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto, - x->props.family, nhashmask); - hlist_add_head(&x->byspi, nspitable+h); + if (x->id.spi) { + h = __xfrm_spi_hash(&x->id.daddr, x->id.spi, + x->id.proto, x->props.family, + nhashmask); + hlist_add_head(&x->byspi, nspitable+h); + } } } @@ -622,7 +625,7 @@ static void __xfrm_state_insert(struct x h = xfrm_src_hash(&x->props.saddr, x->props.family); hlist_add_head(&x->bysrc, xfrm_state_bysrc+h); - if (xfrm_id_proto_match(x->id.proto, IPSEC_PROTO_ANY)) { + if (x->id.spi) { h = xfrm_spi_hash(&x->id.daddr, x->id.spi, x->id.proto, x->props.family); --- commit 44ad787528719604896754d1d05895d2dcfff88b Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Date: Sun Sep 17 13:54:55 2006 +0900 [NET]: Move netlink interface bits to linux/if_link.h. Moving netlink interface bits to linux/if.h is rather troublesome for applications including both linux/if.h (which was changed to be included from linux/rtnetlink.h automatically) and net/if.h. 
Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> diff --git a/include/linux/if.h b/include/linux/if.h index cd080d7..ab85ed0 100644 --- a/include/linux/if.h +++ b/include/linux/if.h @@ -212,134 +212,4 @@ struct ifconf #defineifc_buf ifc_ifcu.ifcu_buf /* buffer address */ #defineifc_req ifc_ifcu.ifcu_req /* array of structures */ -/* The struct should be in sync with struct net_device_stats */ -struct rtnl_link_stats -{ - __u32 rx_packets; /* total packets received */ - __u32 tx_packets; /* total packets transmitted*/ - __u32 rx_bytes; /* total bytes received */ - __u32 tx_bytes; /* total bytes transmitted */ - __u32 rx_errors; /* bad packets received */ - __u32 tx_errors; /* packet transmit problems */ - __u32 rx_dropped; /* no space in linux buffers*/ - __u32 tx_dropped; /* no space available in linux */ - __u32 multicast; /* multicast packets received */ - __u32 collisions; - - /* detailed rx_errors: */ - __u32 rx_length_errors; - __u32 rx_over_errors; /* receiver ring buff overflow */ - __u32 rx_crc_errors; /* recved pkt with crc error*/ - __u32 rx_frame_errors;/* recv'd frame alignment error */ - __u32 rx_fifo_errors; /* recv'r fifo overrun */ - __u32 rx_miss
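The first changeset in this pull (do not add a state whose SPI is zero to the SPI hash) can be modeled as follows. The table size, hash and toy_* helpers are hypothetical; only the if (x->id.spi) condition comes from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SPI_BUCKETS 16

struct toy_state {
    uint32_t spi;
    struct toy_state *next_byspi;
};

static struct toy_state *toy_byspi[TOY_SPI_BUCKETS];

static unsigned int toy_spi_hash(uint32_t spi)
{
    return (spi ^ (spi >> 16)) % TOY_SPI_BUCKETS;
}

/* Insertion into the by-destination and by-source hashes is omitted here. */
static void toy_state_insert(struct toy_state *x)
{
    /* SPI 0 (acquired SA, MIPv6 RO state) stays out of the by-SPI hash. */
    if (x->spi) {
        unsigned int h = toy_spi_hash(x->spi);

        x->next_byspi = toy_byspi[h];
        toy_byspi[h] = x;
    }
}

static struct toy_state *toy_lookup_byspi(uint32_t spi)
{
    for (struct toy_state *x = toy_byspi[toy_spi_hash(spi)]; x; x = x->next_byspi)
        if (x->spi == spi)
            return x;
    return NULL;
}
```

Keeping SPI-0 states out of that table means the deletion path never has to find or unlink them there, which is the reasoning given in the commit message.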
[GIT PATCH] IPV6: Updates for net-2.6.19
Hello.

Please pull the following changesets available at:
git://git.skbuff.net/gitroot/yoshfuji/net-2.6.19-20060918-inet6/

HEADLINES

[IPV6] NDISC: Handle NDP messages to proxied addresses.
[IPV6]: Don't forward packets to proxied link-local address.
[IPV6] NDISC: Avoid updating neighbor cache for proxied address in receiving NA.
[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.
[IPV6] NDISC: Add proxy_ndp sysctl.
[IPV6] ADDRCONF: Convert addrconf_lock to RCU.

DIFFSTAT

Documentation/networking/ip-sysctl.txt |  3 ++
include/linux/ipv6.h                   |  2 +
include/linux/sysctl.h                 |  1 +
include/net/addrconf.h                 | 10 ++---
include/net/if_inet6.h                 |  1 +
include/net/neighbour.h                |  1 +
net/core/neighbour.c                   | 11 --
net/core/pktgen.c                      |  4 +-
net/ipv6/addrconf.c                    | 57 ++---
net/ipv6/anycast.c                     |  4 +-
net/ipv6/ip6_output.c                  | 62
net/ipv6/ipv6_syms.c                   |  1 -
net/ipv6/ndisc.c                       | 29 +--
net/sctp/ipv6.c                        |  6 ++-
14 files changed, 150 insertions(+), 42 deletions(-)

CHANGESETS

commit 9b06d4f4593cb15872e4351e3b1bdbf69c279f68
Author: Masahide NAKAMURA <[EMAIL PROTECTED]>
Date: Sun Sep 17 13:55:07 2006 +0900

[IPV6] NDISC: Handle NDP messages to proxied addresses.

It is required to respond to NDP messages sent directly to the "target" unicast address. Proxying node (router) is required to handle such messages. To achieve this, check if the packet in the forwarding path is an NDP message.

With this patch, the proxy neighbor entries are always looked up in forwarding path. We may want to optimize further.

Based on MIPL2 kernel patch.
Signed-off-by: Ville Nuorvala <[EMAIL PROTECTED]> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index c14ea1e..0f56e9e 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -308,6 +308,46 @@ static int ip6_call_ra_chain(struct sk_b return 0; } +static int ip6_forward_proxy_check(struct sk_buff *skb) +{ + struct ipv6hdr *hdr = skb->nh.ipv6h; + u8 nexthdr = hdr->nexthdr; + int offset; + + if (ipv6_ext_hdr(nexthdr)) { + offset = ipv6_skip_exthdr(skb, sizeof(*hdr), &nexthdr); + if (offset < 0) + return 0; + } else + offset = sizeof(struct ipv6hdr); + + if (nexthdr == IPPROTO_ICMPV6) { + struct icmp6hdr *icmp6; + + if (!pskb_may_pull(skb, skb->nh.raw + offset + 1 - skb->data)) + return 0; + + icmp6 = (struct icmp6hdr *)(skb->nh.raw + offset); + + switch (icmp6->icmp6_type) { + case NDISC_ROUTER_SOLICITATION: + case NDISC_ROUTER_ADVERTISEMENT: + case NDISC_NEIGHBOUR_SOLICITATION: + case NDISC_NEIGHBOUR_ADVERTISEMENT: + case NDISC_REDIRECT: + /* For reaction involving unicast neighbor discovery +* message destined to the proxied address, pass it to +* input function. +*/ + return 1; + default: + break; + } + } + + return 0; +} + static inline int ip6_forward_finish(struct sk_buff *skb) { return dst_output(skb); @@ -362,6 +402,11 @@ int ip6_forward(struct sk_buff *skb) return -ETIMEDOUT; } + if (pneigh_lookup(&nd_tbl, &hdr->daddr, skb->dev, 0)) { + if (ip6_forward_proxy_check(skb)) + return ip6_input(skb); + } + if (!xfrm6_route_forward(skb)) { IP6_INC_STATS(IPSTATS_MIB_INDISCARDS); goto drop; --- commit 6d57c4f060b4d327a40bed8cd5053ba812cb0cb6 Author: Masahide NAKAMURA <[EMAIL PROTECTED]> Date: Sun Sep 17 13:55:09 2006 +0900 [IPV6]: Don't forward packets to proxied link-local address. Proxying router can't forward traffic sent to link-local address, so signal the sender and discard the packet. 
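The core of `ip6_forward_proxy_check()` is a switch on the ICMPv6 type. The five Neighbor Discovery message types are diverted to local input when the destination is a proxied address. RFC 4861 numbers them Router Solicitation 133, Router Advertisement 134, Neighbor Solicitation 135, Neighbor Advertisement 136 and Redirect 137.

Below is a user-space sketch of just that classification, working on a raw ICMPv6 header buffer. `is_ndp_message()` is a hypothetical helper. The extension-header walk that `ipv6_skip_exthdr()` performs in the patch is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* ICMPv6 type codes for Neighbor Discovery (RFC 4861). */
enum {
    ND_RS = 133, ND_RA = 134, ND_NS = 135, ND_NA = 136, ND_REDIRECT = 137
};

/* Returns 1 if the ICMPv6 message at 'icmp6' is an NDP message that a
 * proxying router should hand to its local input path, 0 otherwise.
 * Like the pskb_may_pull() check in the patch, it first makes sure the
 * type byte is actually present. */
static int is_ndp_message(const uint8_t *icmp6, size_t len)
{
    if (len < 1)
        return 0;
    switch (icmp6[0]) {
    case ND_RS:
    case ND_RA:
    case ND_NS:
    case ND_NA:
    case ND_REDIRECT:
        return 1;
    default:
        return 0;
    }
}
```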
This behavior is clarified by the Mobile IPv6 specification (RFC3775) but might be required for all proxying routers. Based on MIPL2 kernel patch. Signed-off-by: Ville Nuorvala <[EMAIL PROTECTED]> Signed-off-by: Masahide NAKAMURA <[EMAIL PROTECTED]> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> diff --git a/n
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, 2006-09-18 at 16:29 +0200, Andi Kleen wrote: > The only delay this would add would be the queueing time from the NIC > to the softirq. Do you really think that is that bad? If you are trying to do things like network record/playback then you want the minimal delay. There's a reason the original timestamp code supported the hardware setting the timestamp itself - we actually had a separate set of logic on a board that was doing the timestamping by watching the IRQ line of the NIC chip. Alan
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
> > People who run tcpdump want "wire" timestamps as close as possible. > Yes, things get delayed with the IRQ path, DMA delays, IRQ > mitigation and whatnot, but it's an order of magnitude worse if > you delay to user read() since that introduces also the delay of > the packet copies to userspace which are significantly larger than > these hardware level delays. If tcpdump gets swapped out, the > timestamp delay can be on the order of several seconds making it > totally useless. My proposal wasn't to delay to user read, just to do the time stamp in socket context. This means taking the time stamp as soon as packet or RAW/UDP have looked up the socket and can check a per-socket flag. The only delay this would add would be the queueing time from the NIC to the softirq. Do you really think that is that bad? -Andi
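Andi's proposal can be combined with the `if (!skb->tstamp.off_sec) net_timestamp(skb);` sequence Alexey describes elsewhere in this thread. Together they come down to two rules: read the clock only once a socket that asked for stamps has been found, and never overwrite a stamp that was already recorded, for example by a driver.

Below is a toy model of that decision. `toy_skb`, `toy_sock`, `wants_tstamp` and `deliver()` are all hypothetical names. `clock_gettime()` stands in for the kernel's time source.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

struct toy_skb {
    struct timespec tstamp;   /* all-zero means "not stamped yet" */
};

struct toy_sock {
    bool wants_tstamp;        /* set when the application asked for stamps */
};

static bool skb_is_stamped(const struct toy_skb *skb)
{
    return skb->tstamp.tv_sec != 0 || skb->tstamp.tv_nsec != 0;
}

/* Called once the receive path has looked up the destination socket.
 * Only sockets that asked for time stamps pay for reading the clock,
 * and a stamp taken earlier (e.g. by a driver) is left untouched. */
static void deliver(struct toy_sock *sk, struct toy_skb *skb)
{
    if (sk->wants_tstamp && !skb_is_stamped(skb))
        clock_gettime(CLOCK_REALTIME, &skb->tstamp);
    /* ... queueing the packet to the socket would happen here ... */
}
```

David's objection still applies to this model. The stamp is taken after the softirq queueing delay, so it is later than a driver-level stamp would be.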
Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.
On Mon, 2006-09-18 at 12:38 +0200, Marcel Holtmann wrote: > it seems that HIDP and CMTP will have the same problem. Finally, the CMTP version... this one is untested. [CMTP] Fix compat CMTPGETCONNLIST ioctl Signed-off-by: David Woodhouse <[EMAIL PROTECTED]> diff --git a/net/bluetooth/cmtp/sock.c b/net/bluetooth/cmtp/sock.c index 10ad7fd..68e1290 100644 --- a/net/bluetooth/cmtp/sock.c +++ b/net/bluetooth/cmtp/sock.c @@ -34,6 +34,7 @@ #include #include #include #include +#include #include #include @@ -137,11 +138,44 @@ static int cmtp_sock_ioctl(struct socket return -EINVAL; } +#ifdef CONFIG_COMPAT +static int cmtp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + + if (cmd == CMTPGETCONNLIST) { + struct cmtp_connlist_req cl; + uint32_t uci; + int err; + + if (get_user(cl.cnum, (uint32_t __user *)arg) || + get_user(uci, (u32 __user *)(arg+4))) + return -EFAULT; + + cl.ci = compat_ptr(uci); + + if (cl.cnum <= 0) + return -EINVAL; + + err = cmtp_get_connlist(&cl); + + if (!err && put_user(cl.cnum, (uint32_t __user *)arg)) + err = -EFAULT; + + return err; + } + + return cmtp_sock_ioctl(sock, cmd, arg); +} +#endif + static const struct proto_ops cmtp_sock_ops = { .family = PF_BLUETOOTH, .owner = THIS_MODULE, .release= cmtp_sock_release, .ioctl = cmtp_sock_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = cmtp_sock_compat_ioctl, +#endif .bind = sock_no_bind, .getname= sock_no_getname, .sendmsg= sock_no_sendmsg, -- dwmw2
Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.
On Mon, 2006-09-18 at 14:25 +0100, David Woodhouse wrote: > although HIDP mouse movement doesn't seem to be appearing > in /dev/input/mice on my G5, while the 'hcidump' output looks sane > enough while I move it. Ew, that's because struct hidp_connadd_req is similarly buggered for compat. Replacement HIDP patch to fix both at once... I didn't miss anywhere where we actually change the hidp_connadd_req structure during the call, did I? - [HIDP] Fix compat HIDPGETCONNLIST and HIDPCONNADD ioctls. Signed-off-by: David Woodhouse <[EMAIL PROTECTED]> diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c index 099646e..e01fdc5 100644 --- a/net/bluetooth/hidp/sock.c +++ b/net/bluetooth/hidp/sock.c @@ -35,6 +35,7 @@ #include #include #include #include +#include #include #include "hidp.h" @@ -143,11 +144,87 @@ static int hidp_sock_ioctl(struct socket return -EINVAL; } +#ifdef CONFIG_COMPAT +struct compat_hidp_connadd_req { + int ctrl_sock;// Connected control socket + int intr_sock;// Connteted interrupt socket + __u16 parser; + __u16 rd_size; + compat_uptr_t rd_data; + __u8 country; + __u8 subclass; + __u16 vendor; + __u16 product; + __u16 version; + __u32 flags; + __u32 idle_to; + char name[128]; +}; + +static int hidp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + if (cmd == HIDPGETCONNLIST) { + struct hidp_connlist_req cl; + uint32_t uci; + int err; + + if (get_user(cl.cnum, (uint32_t __user *)arg) || + get_user(uci, (u32 __user *)(arg+4))) + return -EFAULT; + + cl.ci = compat_ptr(uci); + + if (cl.cnum <= 0) + return -EINVAL; + + err = hidp_get_connlist(&cl); + + if (!err && put_user(cl.cnum, (uint32_t __user *)arg)) + err = -EFAULT; + + return err; + } else if (cmd == HIDPCONNADD) { + struct compat_hidp_connadd_req ca; + struct hidp_connadd_req __user *uca; + + uca = compat_alloc_user_space(sizeof(*uca)); + + if (copy_from_user(&ca, (void *)arg, sizeof(ca))) + return -EFAULT; + + if (put_user(ca.ctrl_sock, &uca->ctrl_sock) + 
|| put_user(ca.intr_sock, &uca->intr_sock) + || put_user(ca.parser, &uca->parser) + || put_user(ca.rd_size, &uca->rd_size) + || put_user(compat_ptr(ca.rd_data), &uca->rd_data) + || put_user(ca.country, &uca->country) + || put_user(ca.subclass, &uca->subclass) + || put_user(ca.vendor, &uca->vendor) + || put_user(ca.product, &uca->product) + || put_user(ca.version, &uca->version) + || put_user(ca.flags, &uca->flags) + || put_user(ca.idle_to, &uca->idle_to) + || copy_to_user(&uca->name[0], &ca.name[0], 128)) + return -EFAULT; + + arg = (unsigned long)uca; + /* Fall through. We don't actually write back any _changes_ + to the structure anyway, so there's no need to copy back + into the original compat version */ + } + + return hidp_sock_ioctl(sock, cmd, arg); +} +#endif + static const struct proto_ops hidp_sock_ops = { .family = PF_BLUETOOTH, .owner = THIS_MODULE, .release= hidp_sock_release, .ioctl = hidp_sock_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = hidp_sock_compat_ioctl, +#endif .bind = sock_no_bind, .getname= sock_no_getname, .sendmsg= sock_no_sendmsg, -- dwmw2
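The HIDPCONNADD half of this patch copies the 32-bit `compat_hidp_connadd_req` field by field into a native structure. Along the way it widens `rd_data` from a 32-bit user address to a real pointer with `compat_ptr()`. Each field has to land in its own destination, because storing one field into another's slot silently corrupts the request.

Below is a user-space sketch of the same translation. The two structures are trimmed-down stand-ins with hypothetical names. The `uintptr_t` cast only stands in for `compat_ptr()`.

```c
#include <stdint.h>
#include <string.h>

/* Layout a 32-bit process passes in: the buffer pointer is 32 bits wide. */
struct toy_compat_connadd {
    int      ctrl_sock;
    int      intr_sock;
    uint16_t parser;
    uint16_t rd_size;
    uint32_t rd_data;        /* 32-bit user address */
    char     name[128];
};

/* Native layout: rd_data is a full-width pointer. */
struct toy_connadd {
    int      ctrl_sock;
    int      intr_sock;
    uint16_t parser;
    uint16_t rd_size;
    uint8_t *rd_data;
    char     name[128];
};

/* Translate field by field; every source field has exactly one destination. */
static void connadd_from_compat(struct toy_connadd *dst,
                                const struct toy_compat_connadd *src)
{
    dst->ctrl_sock = src->ctrl_sock;
    dst->intr_sock = src->intr_sock;
    dst->parser    = src->parser;
    dst->rd_size   = src->rd_size;
    dst->rd_data   = (uint8_t *)(uintptr_t)src->rd_data;  /* like compat_ptr() */
    memcpy(dst->name, src->name, sizeof(dst->name));
}
```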
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
From: Andi Kleen <[EMAIL PROTECTED]> Date: 18 Sep 2006 11:58:21 +0200 > For netdev: I'm more and more thinking we should just avoid the > problem completely and switch to "true end2end" timestamps. This > means don't time stamp when a packet is received, but only when it > is delivered to a socket. The timestamp at receiving is a lie > anyways because the network hardware can add an arbitrarily long delay > before the driver interrupt handler runs. Then the problem above > would completely disappear. I don't think this is wise. People who run tcpdump want "wire" timestamps as close as possible. Yes, things get delayed with the IRQ path, DMA delays, IRQ mitigation and whatnot, but it's an order of magnitude worse if you delay to user read() since that introduces also the delay of the packet copies to userspace which are significantly larger than these hardware level delays. If tcpdump gets swapped out, the timestamp delay can be on the order of several seconds making it totally useless. Andi, you will need to find another solution to this problem :-)
Re: tcp congestion policy selection link order fragile
From: bert hubert <[EMAIL PROTECTED]> Date: Mon, 18 Sep 2006 11:59:36 +0200 > I've tested this patch and it does the job for me, reno is now the default, > even when more advanced options are compiled in, but the rest is still > available. This breaks our intention that when TCP_CONG_ADVANCED is not set, BIC is the default since that is the default congestion control algorithm we want users to get. When TCP_CONG_ADVANCED is disabled, we turn on TCP_CONG_BIC, and your changes cause reno to be the default algorithm in that build case. That's not what we want. Any ordering scheme is wrong or unexpected for _somebody_. Look how easy it was for you to break the BIC default we had in place. To make things sensible for you, your patch causes everyone else to get the wrong default. Therefore any ordering scheme is by definition arbitrary and no ordering is better than any other one.
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Alexey Kuznetsov <[EMAIL PROTECTED]> Date: Mon, 18 Sep 2006 14:37:05 +0400 > > It looks perfectly fine to me, would you like me to apply it > > Alexey? > > Yes, I think it is safe. Ok, I'll put this into net-2.6.19 for now. Thanks.
Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.
On Mon, 2006-09-18 at 12:38 +0200, Marcel Holtmann wrote: > Hi David, > > > We were making no attempt to deal with the fact that a structure with a > > uint32_t followed by a pointer is going to be _different_ for 32-bit and > > 64-bit userspace. Any 32-bit process trying to use BNEPGETCONNLIST will > > be failing with -EFAULT if it's lucky; suffering from having the > > connection list dumped at a random address if it's not. > > it seems that HIDP and CMTP will have the same problem. Indeed they do. This patch fixes 'hidd -l'... although HIDP mouse movement doesn't seem to be appearing in /dev/input/mice on my G5, while the 'hcidump' output looks sane enough while I move it. - [HIDP] Fix compat HIDPGETCONNLIST ioctl. Signed-off-by: David Woodhouse <[EMAIL PROTECTED]> diff --git a/net/bluetooth/hidp/sock.c b/net/bluetooth/hidp/sock.c index 099646e..af5a21c 100644 --- a/net/bluetooth/hidp/sock.c +++ b/net/bluetooth/hidp/sock.c @@ -35,6 +35,7 @@ #include #include #include #include +#include #include #include "hidp.h" @@ -143,11 +144,42 @@ static int hidp_sock_ioctl(struct socket return -EINVAL; } +#ifdef CONFIG_COMPAT +static int hidp_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + if (cmd == HIDPGETCONNLIST) { + struct hidp_connlist_req cl; + uint32_t uci; + int err; + + if (get_user(cl.cnum, (uint32_t __user *)arg) || + get_user(uci, (u32 __user *)(arg+4))) + return -EFAULT; + + cl.ci = compat_ptr(uci); + + if (cl.cnum <= 0) + return -EINVAL; + + err = hidp_get_connlist(&cl); + + if (!err && put_user(cl.cnum, (uint32_t __user *)arg)) + err = -EFAULT; + + return err; + } + return hidp_sock_ioctl(sock, cmd, arg); +} +#endif + static const struct proto_ops hidp_sock_ops = { .family = PF_BLUETOOTH, .owner = THIS_MODULE, .release= hidp_sock_release, .ioctl = hidp_sock_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = hidp_sock_compat_ioctl, +#endif .bind = sock_no_bind, .getname= sock_no_getname, .sendmsg= sock_no_sendmsg, -- dwmw2
Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
Hi Jesse, On Mon, Sep 18, 2006 at 07:11:29PM +0800, Jesse Huang wrote: > Dear Philippe: > (1)Because this is a patent issue, we are not allow to use it again, even it > is in Data Sheet. I surmise this is only a concern for icplus as a hardware company. The sundance driver in Linux is meant to work also with the previous versions of the chip (Sundance, Kendin, D-Link). If you wish you can make it clear that those registers have disappeared or have no effect in the icplus 100A version. > > (2)Ok, sorry for this, I will add it back. Thanks > > Should I resent those 4 patches? Or generate this as a new patch? I do not know about the other patches, but for this one of course you should. Philippe > > Thanks very much! > > Best Regards, > Jesse Huang. > > - Original Message - > From: "Philippe De Muyter" <[EMAIL PROTECTED]> > To: "Jesse Huang" <[EMAIL PROTECTED]> > Cc: > Sent: Monday, September 18, 2006 5:41 PM > Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler) > > > On Mon, Sep 18, 2006 at 11:41:09AM +0800, Jesse Huang wrote: > > Dear Philippe: > > > > (1) We are not allow to support register TxStartThresh and, RxEarlyThresh, > > so > > we remove it. > > Could you develop ? > - What do you mean by `We are not allow' > - Is it specific to the IP100A chip ? > > Those registers are documented in the Sundance Technology ST201 Data Sheet > and when modified with fine-tuned values, they can have a real positive > effect on the overall throughput on a loaded system. > > > > > (2) Your consideration is right. But reset_tx is workaround for customer's > > embedded system, I don't have this > > enviroment now. I can't sure it will work fine if I removed this. > > On DFE-580TX boards, the reset_tx way did not work. The ports remained > blocked until a power-cycle. I do not know if the TxUnderrun problem ever > happened with earlier (one port) boards, so I doubt that the reset_tx way > ever worked. It was even commented as not being tested.
On DFE-580TX > boards, the current way has been verified by me and others to work, so > please do not break it. > > Best regards > > Philippe > > > > > Thanks you very mutch. > > > > Best Regards, > > Jesse Huang. > > > > - Original Message - > > From: "Philippe De Muyter" <[EMAIL PROTECTED]> > > To: "Jesse Huang" <[EMAIL PROTECTED]> > > Cc: > > Sent: Friday, September 15, 2006 7:44 PM > > Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler) > > > > > > On Thu, Sep 14, 2006 at 12:58:30AM +, Jesse Huang wrote: > > [...] > > > @@ -262,8 +262,6 @@ enum alta_offsets { > > > ASICCtrl = 0x30, > > > EEData = 0x34, > > > EECtrl = 0x36, > > > - TxStartThresh = 0x3c, > > > - RxEarlyThresh = 0x3e, > > > > Why ? > > > > > FlashAddr = 0x40, > > > FlashData = 0x44, > > > TxStatus = 0x46, > > [...] > > > @@ -1156,29 +1160,29 @@ static irqreturn_t intr_handler(int irq, > > > np->stats.tx_fifo_errors++; > > > if (tx_status & 0x02) > > > np->stats.tx_window_errors++; > > > - /* > > > - ** This reset has been verified on > > > - ** DFE-580TX boards ! [EMAIL PROTECTED] > > > - */ > > > - if (tx_status & 0x10) { /* TxUnderrun */ > > > - unsigned short txthreshold; > > > - > > > - txthreshold = ioread16 (ioaddr + TxStartThresh); > > > - /* Restart Tx FIFO and transmitter */ > > > - sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16); > > > - iowrite16 (txthreshold, ioaddr + TxStartThresh); > > > - /* No need to reset the Tx pointer here */ > > > + > > > + /* FIFO ERROR need to be reset tx */ > > > + if (tx_status & 0x10) { /* Reset the Tx. */ > > > + spin_lock(&np->lock); > > > + reset_tx(dev); > > > + spin_unlock(&np->lock); > > > + } > > > > Just as the comments say, on DFE-580TX 4 port boards, where it is easy to > > reproduce TxUnderrun problems, just resetting on the chip the Tx FIFO and > > transmitter is enough. > > There is no need to call reset_tx, which discards all pending messages and > > frees all the skb's. 
It is also not necessary to reload the Tx pointer. > > > > Is it different with newer versions of the chip ? > > > > Philippe
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: [you seem to send your emails in a strange way that doesn't keep me in cc. Please stop doing that.] > On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote: > > > > The x86-64 timer subsystems currently doesn't have clocksources > > > > at all, but it supports TSC and some other timers. > > > > > > > > until I hacked arch/i386/kernel/tsc.c > > > > Then you don't use x86-64. > > > Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64 > by hacking some Makefiles and headers. The codebase for timing (and lots of other things) is quite different between 32bit and 64bit. You're really surprised it doesn't work if you do such things? > But the question is, why stock 2.6.18-rc7 could not use TSC on its own? x86-64 doesn't use the TSC when it deems it to not be reliable, which is the case on your system. > > > > > I've also had experience of unsynchronized TSC on dual-core Athlon, > > > > > but it was cured by idle=poll. > > > > > > > > You can use that, but it will make your system run quite hot > > > > and cost you a lot of powe^wmoney. > > > > > > Here in Russia electric power is cheap compared with hardware upgrade. > > > > It's not just electrical power - the hardware is more stressed and will > > likely fail earlier too. As a rule of thumb the hotter your hardware runs > > the earlier it will fail. > > What hardware exactly? Doesn't it affect only CPU? And they are not > known to fail before any other components. All hardware. It's basic physics. -Andi
Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
Dear Philippe: (1)Because this is a patent issue, we are not allow to use it again, even it is in Data Sheet. (2)Ok, sorry for this, I will add it back. Should I resent those 4 patches? Or generate this as a new patch? Thanks very much! Best Regards, Jesse Huang. - Original Message - From: "Philippe De Muyter" <[EMAIL PROTECTED]> To: "Jesse Huang" <[EMAIL PROTECTED]> Cc: Sent: Monday, September 18, 2006 5:41 PM Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler) On Mon, Sep 18, 2006 at 11:41:09AM +0800, Jesse Huang wrote: > Dear Philippe: > > (1) We are not allow to support register TxStartThresh and, RxEarlyThresh, > so > we remove it. Could you develop ? - What do you mean by `We are not allow' - Is it specific to the IP100A chip ? Those registers are documented in the Sundance Technology ST201 Data Sheet and when modified with fine-tuned values, they can have a real positive effect on the overall throughput on a loaded system. > > (2) Your consideration is right. But reset_tx is workaround for customer's > embedded system, I don't have this > enviroment now. I can't sure it will work fine if I removed this. On DFE-580TX boards, the reset_tx way did not work. The ports remained blocked until a power-cycle. I do not know if the TxUnderrun problem ever happened with earlier (one port) boards, so I doubt that the reset_tx way ever worked. It was even commented as not being tested. On DFE-580TX boards, the current way has been verified by me and others to work, so please do not break it. Best regards Philippe > > Thanks you very mutch. > > Best Regards, > Jesse Huang. > > - Original Message - > From: "Philippe De Muyter" <[EMAIL PROTECTED]> > To: "Jesse Huang" <[EMAIL PROTECTED]> > Cc: > Sent: Friday, September 15, 2006 7:44 PM > Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler) > > > On Thu, Sep 14, 2006 at 12:58:30AM +, Jesse Huang wrote: > [...]
> > @@ -262,8 +262,6 @@ enum alta_offsets { > > ASICCtrl = 0x30, > > EEData = 0x34, > > EECtrl = 0x36, > > - TxStartThresh = 0x3c, > > - RxEarlyThresh = 0x3e, > > Why ? > > > FlashAddr = 0x40, > > FlashData = 0x44, > > TxStatus = 0x46, > [...] > > @@ -1156,29 +1160,29 @@ static irqreturn_t intr_handler(int irq, > > np->stats.tx_fifo_errors++; > > if (tx_status & 0x02) > > np->stats.tx_window_errors++; > > - /* > > - ** This reset has been verified on > > - ** DFE-580TX boards ! [EMAIL PROTECTED] > > - */ > > - if (tx_status & 0x10) { /* TxUnderrun */ > > - unsigned short txthreshold; > > - > > - txthreshold = ioread16 (ioaddr + TxStartThresh); > > - /* Restart Tx FIFO and transmitter */ > > - sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16); > > - iowrite16 (txthreshold, ioaddr + TxStartThresh); > > - /* No need to reset the Tx pointer here */ > > + > > + /* FIFO ERROR need to be reset tx */ > > + if (tx_status & 0x10) { /* Reset the Tx. */ > > + spin_lock(&np->lock); > > + reset_tx(dev); > > + spin_unlock(&np->lock); > > + } > > Just as the comments say, on DFE-580TX 4 port boards, where it is easy to > reproduce TxUnderrun problems, just resetting on the chip the Tx FIFO and > transmitter is enough. > There is no need to call reset_tx, which discards all pending messages and > frees all the skb's. It is also not necessary to reload the Tx pointer. > > Is it different with newer versions of the chip ? > > Philippe
Re: [PATCH][RFC] Re: high latency with TCP connections
Hello! > It looks perfectly fine to me, would you like me to apply it > Alexey? Yes, I think it is safe. Theoretically, there is one place where it can be not so good. Good nagling tcp connection, which makes lots of small write()s, will send MSS sized frames due to delayed ACKs. But if we ACK every other segment, more segments will come out incomplete, which could result in some decrease of throughput. But the trap for this case was set 6 years ago. For unidirectional sessions ACKs were sent not even each second segment, but each small segment. :-) This did not show any problems for those 6 years. I guess it means that the problem does not exist. Alexey
Re: [BNEP] Fix compat BNEPGETCONNLIST ioctl.
Hi David, > We were making no attempt to deal with the fact that a structure with a > uint32_t followed by a pointer is going to be _different_ for 32-bit and > 64-bit userspace. Any 32-bit process trying to use BNEPGETCONNLIST will > be failing with -EFAULT if it's lucky; suffering from having the > connection list dumped at a random address if it's not. it seems that HIDP and CMTP will have the same problem. Regards Marcel
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 11:58:21AM +0200, Andi Kleen wrote: > > > The x86-64 timer subsystems currently doesn't have clocksources > > > at all, but it supports TSC and some other timers. > > > > > until I hacked arch/i386/kernel/tsc.c > > Then you don't use x86-64. > Oh. I mean I made arch/i386/kernel/tsc.c compile on x86-64 by hacking some Makefiles and headers. But the question is, why stock 2.6.18-rc7 could not use TSC on its own? > > > > I've also had experience of unsynchronized TSC on dual-core Athlon, > > > > but it was cured by idle=poll. > > > > > > You can use that, but it will make your system run quite hot > > > and cost you a lot of powe^wmoney. > > > > Here in Russia electric power is cheap compared with hardware upgrade. > > It's not just electrical power - the hardware is more stressed and will > likely fail earlier too. As a rule of thumb the hotter your hardware runs > the earlier it will fail. What hardware exactly? Doesn't it affect only CPU? And they are not known to fail before any other components. > > > > > > It seems that dhcpd3 makes the box timestamping incoming packets, > > > > killing the performance. I think that combining router and DHCP server > > > > on the same box is a legitimate situation, isn't it? > > > > > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > > > send a bug report to the DHCP maintainers? > > > > > > iirc the problem used to be that RAW sockets didn't do something > > > they need them to do. Maybe we can fix that now. > > > > Will try some days later. > > > > Oh, and pppoe-server uses some kind of packet socket too, doesn't it? > > The problem is not really using a packet socket, but using the SIOCGSTAMP > ioctl on it. As soon as someone issues it the system will take accurate > time stamps for each incoming packet until the respective socket is closed. > > Quick fix is to change user space to use gettimeofday() when it reads > the packet instead. Ok, thank you, I now understand.
> > For netdev: I'm more and more thinking we should just avoid the problem > completely and switch to "true end2end" timestamps. This means don't > time stamp when a packet is received, but only when it is delivered > to a socket. The timestamp at receiving is a lie anyways because > the network hardware can add an arbitrarily long delay before the driver > interrupt > handler runs. Then the problem above would completely disappear. > Comments? Opinions? > > -Andi > ~ :wq With best regards, Vladimir Savkin.
[BNEP] Fix compat BNEPGETCONNLIST ioctl.
We were making no attempt to deal with the fact that a structure with a uint32_t followed by a pointer is going to be _different_ for 32-bit and 64-bit userspace. Any 32-bit process trying to use BNEPGETCONNLIST will be failing with -EFAULT if it's lucky; suffering from having the connection list dumped at a random address if it's not. Signed-off-by: David Woodhouse <[EMAIL PROTECTED]> diff --git a/net/bluetooth/bnep/sock.c b/net/bluetooth/bnep/sock.c index 28c5583..0ef2783 100644 --- a/net/bluetooth/bnep/sock.c +++ b/net/bluetooth/bnep/sock.c @@ -43,6 +43,7 @@ #include #include #include #include +#include #include #include @@ -146,11 +147,44 @@ static int bnep_sock_ioctl(struct socket return 0; } +#ifdef CONFIG_COMPAT +static int bnep_sock_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) +{ + + if (cmd == BNEPGETCONNLIST) { + struct bnep_connlist_req cl; + uint32_t uci; + int err; + + if (get_user(cl.cnum, (uint32_t __user *)arg) || + get_user(uci, (u32 __user *)(arg+4))) + return -EFAULT; + + cl.ci = compat_ptr(uci); + + if (cl.cnum <= 0) + return -EINVAL; + + err = bnep_get_connlist(&cl); + + if (!err && put_user(cl.cnum, (uint32_t __user *)arg)) + err = -EFAULT; + + return err; + } + + return bnep_sock_ioctl(sock, cmd, arg); +} +#endif + static const struct proto_ops bnep_sock_ops = { .family = PF_BLUETOOTH, .owner = THIS_MODULE, .release= bnep_sock_release, .ioctl = bnep_sock_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = bnep_sock_compat_ioctl, +#endif .bind = sock_no_bind, .getname= sock_no_getname, .sendmsg= sock_no_sendmsg, -- dwmw2
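The layout problem described above is easy to see with `offsetof()`. In a structure holding a `uint32_t` followed by a pointer, the pointer sits at offset 4 for a 32-bit process. On a typical 64-bit ABI it sits at offset 8, because the pointer is 8-byte aligned. That is why the compat handler reads `cnum` from `arg` and the 32-bit pointer from `arg+4`.

Below is a user-space sketch of that decoding. It assumes the 32-bit caller shares the kernel's byte order, as compat tasks on the same machine do. `native_req`, `compat_req` and `decode_compat_req()` are hypothetical names, not kernel ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What a native (possibly 64-bit) build of the request looks like. */
struct native_req {
    uint32_t cnum;
    void    *ci;
};

/* What a 32-bit process lays out in memory: a 32-bit pointer at offset 4. */
struct compat_req {
    uint32_t cnum;
    uint32_t ci;
};

/* Decode the 8-byte 32-bit layout from a raw user buffer, the way the
 * compat handler does with get_user() at arg and arg + 4. */
static void decode_compat_req(const unsigned char *buf, struct native_req *out)
{
    uint32_t cnum, uci;

    memcpy(&cnum, buf, sizeof(cnum));
    memcpy(&uci, buf + 4, sizeof(uci));
    out->cnum = cnum;
    out->ci = (void *)(uintptr_t)uci;   /* stands in for compat_ptr() */
}
```

Passing the raw 32-bit buffer straight to code that expects `struct native_req` would read the pointer from offset 8 on a 64-bit kernel. That lands past the end of the 8 bytes the 32-bit process supplied, which matches the -EFAULT or random-address symptoms described above.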
Re: tcp congestion policy selection link order fragile
On Mon, Sep 18, 2006 at 01:51:30AM -0700, David Miller wrote: > We created TCP_CONG_ADVANCED for a purpose. If you turn that > thing on, you get full control but if something breaks you get > to keep the pieces. But we should not try to break stuff on purpose, no matter how advanced. It makes zero sense. To reiterate, when compiling in multiple TCP policies, a *random* one gets enabled. This is not something we want to offer even advanced users. It is a kernel, not an adventure course. Please consider this near-oneliner patch which makes stuff behave more like people expect: loading a module, or compiling in a congestion avoidance policy only makes it available, but does not turn it on by default. It also cleans up two notices a bit. I've tested this patch and it does the job for me, reno is now the default, even when more advanced options are compiled in, but the rest is still available. When in doubt, consider that I discovered this because my kernel was crashing, and that this is bound to generate heaps of annoying email otherwise. Thanks. 
Signed-off-by: bert hubert <[EMAIL PROTECTED]>

--- linux-2.6.18-rc7/net/ipv4/tcp_cong.c.org	2006-09-18 11:42:25.0 +0200
+++ linux-2.6.18-rc7/net/ipv4/tcp_cong.c	2006-09-18 11:43:45.0 +0200
@@ -45,11 +45,11 @@
 	spin_lock(&tcp_cong_list_lock);
 	if (tcp_ca_find(ca->name)) {
-		printk(KERN_NOTICE "TCP %s already registered\n", ca->name);
+		printk(KERN_NOTICE "TCP congestion control '%s' already registered\n", ca->name);
 		ret = -EEXIST;
 	} else {
-		list_add_rcu(&ca->list, &tcp_cong_list);
-		printk(KERN_INFO "TCP %s registered\n", ca->name);
+		list_add_tail_rcu(&ca->list, &tcp_cong_list);
+		printk(KERN_INFO "TCP congestion control '%s' registered\n", ca->name);
 	}
 	spin_unlock(&tcp_cong_list_lock);

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://netherlabs.nl              Open and Closed source services
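The effect of switching list_add_rcu() to list_add_tail_rcu() can be sketched with a plain singly linked list. This assumes, as the thread implies, that new sockets pick the first entry on tcp_cong_list and that reno is registered before any compiled-in alternative. The types and functions below are hypothetical and ignore RCU, locking and module refcounts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct tcp_congestion_ops. */
struct cong_ops {
	const char *name;
	struct cong_ops *next;
};

struct cong_list {
	struct cong_ops *head;
};

/* Like list_add_rcu(): the newest registration goes to the front,
 * so whatever was linked in last becomes the default. */
static void register_head(struct cong_list *l, struct cong_ops *ca)
{
	ca->next = l->head;
	l->head = ca;
}

/* Like list_add_tail_rcu(): registrations keep their order, so the
 * first one registered stays the default. */
static void register_tail(struct cong_list *l, struct cong_ops *ca)
{
	struct cong_ops **pp = &l->head;

	while (*pp)
		pp = &(*pp)->next;
	ca->next = NULL;
	*pp = ca;
}

/* The default is whatever sits first in the list. */
static const char *default_name(const struct cong_list *l)
{
	return l->head ? l->head->name : NULL;
}
```

With head insertion, the default depends on link order; with tail insertion it stays on the first registered algorithm.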
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: > On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote: > > > I just found out that TSC clocksource is not implemented on x86-64. > > > Kernel version 2.6.18-rc7, is it true? > > > > The x86-64 timer subsystems currently doesn't have clocksources > > at all, but it supports TSC and some other timers. > > until I hacked arch/i386/kernel/tsc.c Then you don't use x86-64. > > > > I've also had experience of unsychronized TSC on dual-core Athlon, > > > but it was cured by idle=poll. > > > > You can use that, but it will make your system run quite hot > > and cost you a lot of powe^wmoney. > > Here in Russia electric power is cheap compared with hardware upgrade. It's not just electrical power - the hardware is more stressed and will likely fail earlier too. As a rule of thumb the hotter your hardware runs the earlier it will fail. > > > > It seems that dhcpd3 makes the box timestamping incoming packets, > > > killing the performance. I think that combining router and DHCP server > > > on a same box is a legitimate situation, isn't it? > > > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > > send a bug report to the DHCP maintainers? > > > > iirc the problem used to be that RAW sockets didn't do something > > they need them to do. Maybe we can fix that now. > > Will try some days later. > > Oh, and pppoe-server uses some kind of packet socket too, doesn't it? The problem is not really using a packet socket, but using the SIOCGSTAMP ioctl on it. As soon as someone issues it the system will take accurate time stamps for each incoming packet until the respective socket is closed. Quick fix is to change user space to use gettimeofday() when it reads the packet instead. For netdev: I'm more and more thinking we should just avoid the problem completely and switch to "true end2end" timestamps. This means don't time stamp when a packet is received, but only when it is delivered to a socket. 
The timestamp at receiving is a lie anyway, because the network hardware can add an arbitrarily long delay before the driver interrupt handler runs. Then the problem above would completely disappear. Comments? Opinions? -Andi
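A minimal model of the "true end2end" idea, combined with the lazy check quoted elsewhere in the thread (`if (!skb->tstamp.off_sec) net_timestamp(skb);`). Everything here is hypothetical: struct pkt stands in for an skb, rx_stamp_wanted for the global state that SIOCGSTAMP users cause to be raised, and read_clock() for an expensive gettimeofday-style clock read.

```c
#include <stdint.h>

/* Hypothetical stand-in for an skb: tstamp == 0 means "not stamped yet". */
struct pkt {
	uint64_t tstamp;
};

static int rx_stamp_wanted;     /* nonzero while some socket wants rx stamps */
static uint64_t fake_clock;     /* monotonic counter instead of real time */
static unsigned long clock_reads;

/* Stands in for the expensive clock read the thread wants to avoid. */
static uint64_t read_clock(void)
{
	clock_reads++;
	return ++fake_clock;
}

/* Receive path: only pay for the clock if somebody asked for it. */
static void on_receive(struct pkt *p)
{
	p->tstamp = rx_stamp_wanted ? read_clock() : 0;
}

/* Socket delivery: stamp late, only for a socket that wants a stamp and
 * only if the receive path did not already record one. */
static void on_deliver(struct pkt *p, int sock_wants_stamp)
{
	if (sock_wants_stamp && !p->tstamp)
		p->tstamp = read_clock();
}
```

A router that only forwards never reaches on_deliver() for most packets, so it never touches the clock unless some socket has asked for stamps.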
Re: kernel panic on T60 by e1000 driver
Only authors of proprietary modules you loaded can debug this. Please redirect this and all further oopses to them. On 9/18/06, Joe Jin <[EMAIL PROTECTED]> wrote: while I try to transmit 8k of data by send() on my laptop T60, a kernel panic occurred:
Modules linked in: rds cisco_ipsec parport_pc lp parport autofs4 pcmcia opw3945 ieee80211 ie80211_crypt ipt_REJECT xt_tcpudp x_tables vfat fat dm_mirror dm_mod ibm-acpi button battery ac yenta_socket rsrc_nonstatic pcmcia_core uhci_hcd ehci_hcd i2c_i801 i2c_core e1000 ext3 jbd ahci libata sd_mod scsi_mod
CPU:0
EIP:0060:[] Tainted:PF VLI
Re: [XFRM]: Fix wildcard as tunnel source
Patrick McHardy wrote:
> [XFRM]: Fix wildcard as tunnel source
>
> Hashing SAs by source address breaks templates with wildcards as tunnel
> source since the source address used for hashing/lookup is still 0/0.
> Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
> real address in the lookup.
>
>
>  static inline int
> +xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
> +{
> +	switch (family) {
> +	case AF_INET:
> +		return addr->a4 == 0;
> +	case AF_INET6:
> +		return ipv6_addr_any((struct in6_addr*)addr->a6);

D'oh. Fixed patch attached.

[XFRM]: Fix wildcard as tunnel source

Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit f3307c3183e50959247f28c773590b5d7902097f
tree 78ddff768dc25145110767f182408ed6993828c5
parent c2cb1937e1054380c49699188810b9c6e04c8e21
author Patrick McHardy <[EMAIL PROTECTED]> Mon, 18 Sep 2006 11:34:25 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Mon, 18 Sep 2006 11:34:25 +0200

 include/net/xfrm.h      |   13 +
 net/ipv4/xfrm4_policy.c |   20
 net/ipv4/xfrm4_state.c  |   15 ---
 net/ipv6/xfrm6_policy.c |   21 +
 net/ipv6/xfrm6_state.c  |   16
 net/xfrm/xfrm_policy.c  |   21 +
 6 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bf8e2df..c6fac69 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -223,6 +223,7 @@ struct xfrm_policy_afinfo {
 	struct dst_ops		*dst_ops;
 	void			(*garbage_collect)(void);
 	int			(*dst_lookup)(struct xfrm_dst **dst, struct flowi *fl);
+	int			(*get_saddr)(xfrm_address_t *saddr, xfrm_address_t *daddr);
 	struct dst_entry	*(*find_bundle)(struct flowi *fl, struct xfrm_policy *policy);
 	int			(*bundle_create)(struct xfrm_policy *policy,
 						 struct xfrm_state **xfrm,
@@ -632,6 +633,18 @@ #endif
 }
 
 static inline int
+xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
+{
+	switch (family) {
+	case AF_INET:
+		return addr->a4 == 0;
+	case AF_INET6:
+		return ipv6_addr_any((struct in6_addr *)&addr->a6);
+	}
+	return 0;
+}
+
+static inline int
 __xfrm4_state_addr_cmp(struct xfrm_tmpl *tmpl, struct xfrm_state *x)
 {
 	return	(tmpl->saddr.a4 &&
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4795985..eabcd27 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -21,6 +21,25 @@ static int xfrm4_dst_lookup(struct xfrm_
 	return __ip_route_output_key((struct rtable**)dst, fl);
 }
 
+static int xfrm4_get_saddr(xfrm_address_t *saddr, xfrm_address_t *daddr)
+{
+	struct rtable *rt;
+	struct flowi fl_tunnel = {
+		.nl_u = {
+			.ip4_u = {
+				.daddr = daddr->a4,
+			},
+		},
+	};
+
+	if (!xfrm4_dst_lookup((struct xfrm_dst **)&rt, &fl_tunnel)) {
+		saddr->a4 = rt->rt_src;
+		dst_release(&rt->u.dst);
+		return 0;
+	}
+	return -EHOSTUNREACH;
+}
+
 static struct dst_entry *
 __xfrm4_find_bundle(struct flowi *fl, struct xfrm_policy *policy)
 {
@@ -298,6 +317,7 @@ static struct xfrm_policy_afinfo xfrm4_p
 	.family = 		AF_INET,
 	.dst_ops =		&xfrm4_dst_ops,
 	.dst_lookup =		xfrm4_dst_lookup,
+	.get_saddr =		xfrm4_get_saddr,
 	.find_bundle =		__xfrm4_find_bundle,
 	.bundle_create =	__xfrm4_bundle_create,
 	.decode_session =	_decode_session4,
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 6a2a4ab..fe20344 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -42,21 +42,6 @@ __xfrm4_init_tempsel(struct xfrm_state *
 	x->props.saddr = tmpl->saddr;
 	if (x->props.saddr.a4 == 0)
 		x->props.saddr.a4 = saddr->a4;
-	if (tmpl->mode == XFRM_MODE_TUNNEL && x->props.saddr.a4 == 0) {
-		struct rtable *rt;
-		struct flowi fl_tunnel = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = x->id.daddr.a4,
-				}
-			}
-		};
-		if (!xfrm_dst_lookup((struct xfrm_dst **)&rt,
-				     &fl_tunnel, AF_INET)) {
-			x->props.saddr.a4 = rt->rt_src;
-			dst_
Re: [XFRM]: Fix wildcard as tunnel source
Patrick McHardy wrote: > David Miller wrote: > >>I really don't want to remove this as it's fairly critical performance >>wise for the scalability problems all my changes were meant to address. >>I hope I really don't have to do something like what was needed for >>the policy layer, having a linked list and a hash table to handle the >>two cases. > > > We could query the address before the SA lookup. It will cost an > additional route lookup in case a matching SA is already present, > but I guess thats still better than removing the source from the > hash. I'll try if it works and send a new patch. I've tested this patch and it works fine. I'm wondering if something else might be affected by the hash change though, xfrm_state_addr_check treated 0.0.0.0 as wildcard even before the introduction of wildcards in tunnel templates, but I can't see in which other case it would be zero. [XFRM]: Fix wildcard as tunnel source Hashing SAs by source address breaks templates with wildcards as tunnel source since the source address used for hashing/lookup is still 0/0. Move source address lookup to xfrm_tmpl_resolve_one() so we can use the real address in the lookup. 
Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit f3307c3183e50959247f28c773590b5d7902097f
tree 78ddff768dc25145110767f182408ed6993828c5
parent c2cb1937e1054380c49699188810b9c6e04c8e21
author Patrick McHardy <[EMAIL PROTECTED]> Mon, 18 Sep 2006 11:34:25 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Mon, 18 Sep 2006 11:34:25 +0200

 include/net/xfrm.h      |   13 +
 net/ipv4/xfrm4_policy.c |   20
 net/ipv4/xfrm4_state.c  |   15 ---
 net/ipv6/xfrm6_policy.c |   21 +
 net/ipv6/xfrm6_state.c  |   16
 net/xfrm/xfrm_policy.c  |   21 +
 6 files changed, 75 insertions(+), 31 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bf8e2df..c6fac69 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -223,6 +223,7 @@ struct xfrm_policy_afinfo {
 	struct dst_ops		*dst_ops;
 	void			(*garbage_collect)(void);
 	int			(*dst_lookup)(struct xfrm_dst **dst, struct flowi *fl);
+	int			(*get_saddr)(xfrm_address_t *saddr, xfrm_address_t *daddr);
 	struct dst_entry	*(*find_bundle)(struct flowi *fl, struct xfrm_policy *policy);
 	int			(*bundle_create)(struct xfrm_policy *policy,
 						 struct xfrm_state **xfrm,
@@ -632,6 +633,18 @@ #endif
 }
 
 static inline int
+xfrm_addr_any(xfrm_address_t *addr, unsigned short family)
+{
+	switch (family) {
+	case AF_INET:
+		return addr->a4 == 0;
+	case AF_INET6:
+		return ipv6_addr_any((struct in6_addr*)addr->a6);
+	}
+	return 0;
+}
+
+static inline int
 __xfrm4_state_addr_cmp(struct xfrm_tmpl *tmpl, struct xfrm_state *x)
 {
 	return	(tmpl->saddr.a4 &&
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4795985..eabcd27 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -21,6 +21,25 @@ static int xfrm4_dst_lookup(struct xfrm_
 	return __ip_route_output_key((struct rtable**)dst, fl);
 }
 
+static int xfrm4_get_saddr(xfrm_address_t *saddr, xfrm_address_t *daddr)
+{
+	struct rtable *rt;
+	struct flowi fl_tunnel = {
+		.nl_u = {
+			.ip4_u = {
+				.daddr = daddr->a4,
+			},
+		},
+	};
+
+	if (!xfrm4_dst_lookup((struct xfrm_dst **)&rt, &fl_tunnel)) {
+		saddr->a4 = rt->rt_src;
+		dst_release(&rt->u.dst);
+		return 0;
+	}
+	return -EHOSTUNREACH;
+}
+
 static struct dst_entry *
 __xfrm4_find_bundle(struct flowi *fl, struct xfrm_policy *policy)
 {
@@ -298,6 +317,7 @@ static struct xfrm_policy_afinfo xfrm4_p
 	.family = 		AF_INET,
 	.dst_ops =		&xfrm4_dst_ops,
 	.dst_lookup =		xfrm4_dst_lookup,
+	.get_saddr =		xfrm4_get_saddr,
 	.find_bundle =		__xfrm4_find_bundle,
 	.bundle_create =	__xfrm4_bundle_create,
 	.decode_session =	_decode_session4,
diff --git a/net/ipv4/xfrm4_state.c b/net/ipv4/xfrm4_state.c
index 6a2a4ab..fe20344 100644
--- a/net/ipv4/xfrm4_state.c
+++ b/net/ipv4/xfrm4_state.c
@@ -42,21 +42,6 @@ __xfrm4_init_tempsel(struct xfrm_state *
 	x->props.saddr = tmpl->saddr;
 	if (x->props.saddr.a4 == 0)
 		x->props.saddr.a4 = saddr->a4;
-	if (tmpl->mode == XFRM_MODE_TUNNEL && x->props.saddr.a4 == 0) {
-		struct rtable *rt;
-		struct flowi fl_tunnel = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = x->id.daddr.a4,
-
Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
On Mon, Sep 18, 2006 at 11:41:09AM +0800, Jesse Huang wrote:
> Dear Philippe:
>
> (1) We are not allow to support register TxStartThresh and, RxEarlyThresh,
> so
> we remove it.

Could you elaborate?
- What do you mean by `We are not allow'?
- Is it specific to the IP100A chip?

Those registers are documented in the Sundance Technology ST201 Data Sheet
and, when modified with fine-tuned values, they can have a real positive
effect on the overall throughput on a loaded system.

>
> (2) Your consideration is right. But reset_tx is workaround for customer's
> embedded system, I don't have this
> enviroment now. I can't sure it will work fine if I removed this.

On DFE-580TX boards, the reset_tx way did not work. The ports remained
blocked until a power-cycle. I do not know if the TxUnderrun problem ever
happened with earlier (one-port) boards, so I doubt that the reset_tx way
ever worked. It was even commented as not being tested.

On DFE-580TX boards, the current way has been verified by me and others to
work, so please do not break it.

Best regards

Philippe

>
> Thanks you very mutch.
>
> Best Regards,
> Jesse Huang.
>
> - Original Message -
> From: "Philippe De Muyter" <[EMAIL PROTECTED]>
> To: "Jesse Huang" <[EMAIL PROTECTED]>
> Cc:
> Sent: Friday, September 15, 2006 7:44 PM
> Subject: Re: [PATCH 1/4] IP100A: Fix TX Pause bug (reset_tx, intr_handler)
>
>
> On Thu, Sep 14, 2006 at 12:58:30AM +, Jesse Huang wrote:
> [...]
> > @@ -262,8 +262,6 @@ enum alta_offsets {
> >  	ASICCtrl = 0x30,
> >  	EEData = 0x34,
> >  	EECtrl = 0x36,
> > -	TxStartThresh = 0x3c,
> > -	RxEarlyThresh = 0x3e,
>
> Why ?
>
> >  	FlashAddr = 0x40,
> >  	FlashData = 0x44,
> >  	TxStatus = 0x46,
> [...]
> > @@ -1156,29 +1160,29 @@ static irqreturn_t intr_handler(int irq,
> >  				np->stats.tx_fifo_errors++;
> >  			if (tx_status & 0x02)
> >  				np->stats.tx_window_errors++;
> > -			/*
> > -			** This reset has been verified on
> > -			** DFE-580TX boards ! [EMAIL PROTECTED]
> > -			*/
> > -			if (tx_status & 0x10) {	/* TxUnderrun */
> > -				unsigned short txthreshold;
> > -
> > -				txthreshold = ioread16 (ioaddr + TxStartThresh);
> > -				/* Restart Tx FIFO and transmitter */
> > -				sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16);
> > -				iowrite16 (txthreshold, ioaddr + TxStartThresh);
> > -				/* No need to reset the Tx pointer here */
> > +
> > +			/* FIFO ERROR need to be reset tx */
> > +			if (tx_status & 0x10) {	/* Reset the Tx. */
> > +				spin_lock(&np->lock);
> > +				reset_tx(dev);
> > +				spin_unlock(&np->lock);
> > +			}
>
> Just as the comments say, on DFE-580TX 4 port boards, where it is easy to
> reproduce TxUnderrun problems, just resetting on the chip the Tx FIFO and
> transmitter is enough.
> There is no need to call reset_tx, which discards all pending messages and
> frees all the skb's. It is also not necessary to reload the Tx pointer.
>
> Is it different with newer versions of the chip ?
>
> Philippe
> --
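The difference Philippe describes between the two underrun strategies can be modelled as follows. This is a hypothetical model, not the sundance driver: the struct bears no relation to the real register map, the claim that the chip reset clobbers TxStartThresh is an assumption drawn from the save/restore in the existing code, and whatever else reset_tx() does to the chip is not modelled.

```c
#include <stdint.h>

/* Hypothetical model of the transmit side. */
struct tx_model {
	uint16_t tx_start_thresh;   /* value of the TxStartThresh register */
	unsigned int pending;       /* packets still queued for transmission */
};

/* Models sundance_reset(dev, (NetworkReset|FIFOReset|TxReset) << 16).
 * Assumption: the reset clobbers TxStartThresh. */
static void chip_reset_fifo_tx(struct tx_model *m)
{
	m->tx_start_thresh = 0;
}

/* Existing driver: reset FIFO and transmitter only and restore the
 * threshold; queued packets and the Tx pointer are left alone. */
static void underrun_reset_fifo_only(struct tx_model *m)
{
	uint16_t saved = m->tx_start_thresh;

	chip_reset_fifo_tx(m);
	m->tx_start_thresh = saved;
}

/* Proposed patch: reset_tx() discards every pending packet (and, in the
 * real driver, frees the skbs). Only that effect is modelled here. */
static void underrun_reset_tx(struct tx_model *m)
{
	m->pending = 0;
}
```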
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
On Mon, Sep 18, 2006 at 10:35:38AM +0200, Andi Kleen wrote: > > I just found out that TSC clocksource is not implemented on x86-64. > > Kernel version 2.6.18-rc7, is it true? > > The x86-64 timer subsystems currently doesn't have clocksources > at all, but it supports TSC and some other timers. Hm. On my box, TSC did not work until I hacked arch/i386/kernel/tsc.c. Neither clock=tsc nor clocksource=tsc had any effect. > > I've also had experience of unsychronized TSC on dual-core Athlon, > > but it was cured by idle=poll. > > You can use that, but it will make your system run quite hot > and cost you a lot of powe^wmoney. Here in Russia electric power is cheap compared with a hardware upgrade. > > It seems that dhcpd3 makes the box timestamping incoming packets, > > killing the performance. I think that combining router and DHCP server > > on a same box is a legitimate situation, isn't it? > > Yes. Good point. DHCP is broken and needs to be fixed. Can you > send a bug report to the DHCP maintainers? > > iirc the problem used to be that RAW sockets didn't do something > they need them to do. Maybe we can fix that now. Will try in a few days. Oh, and pppoe-server uses some kind of packet socket too, doesn't it? > > If that's not possible we can probably add a ioctl or similar > to disable time stamping for packet sockets (DHCP shouldn't really > need a fine grained time stamp). dhcpcd would need to use that then. I would very much like a sysctl, too. Let tcpdump show imprecise timestamps when forwarding performance is more important. After all, Ciscos don't have any tcpdump analog at all, and they are very popular :) > > Keep me updated what they say. > > -Andi With best regards, Vladimir Savkin.
Re: tcp congestion policy selection link order fragile
From: bert hubert <[EMAIL PROTECTED]> Date: Sun, 17 Sep 2006 14:21:53 +0200 > Operators, distributors and even people who've been doing kernel stuff for > more than a decade expect to be able to compile in (experimental) policies, > and not have a *random* one of them enabled by default! We created TCP_CONG_ADVANCED for a purpose. If you turn that thing on, you get full control, but if something breaks you get to keep the pieces. Quite frankly, just about everyone should not enable TCP_CONG_ADVANCED at all. And quite likely this applies even to distribution vendors.
Re: Network performance degradation from 2.6.11.12 to 2.6.16.20
"Vladimir B. Savkin" <[EMAIL PROTECTED]> writes: > On Mon, Jun 19, 2006 at 05:24:31PM +0200, Andi Kleen wrote: > > > > > If you use "pmtmr" try to reboot with kernel option "clock=tsc". > > > > That's dangerous advice - when the system choses not to use > > TSC it often has a reason. > > I just found out that TSC clocksource is not implemented on x86-64. > Kernel version 2.6.18-rc7, is it true? The x86-64 timer subsystem currently doesn't have clocksources at all, but it supports TSC and some other timers. > > I've also had experience of unsychronized TSC on dual-core Athlon, > but it was cured by idle=poll. You can use that, but it will make your system run quite hot and cost you a lot of powe^wmoney. > It seems that dhcpd3 makes the box timestamping incoming packets, > killing the performance. I think that combining router and DHCP server > on a same box is a legitimate situation, isn't it? Yes. Good point. DHCP is broken and needs to be fixed. Can you send a bug report to the DHCP maintainers? iirc the problem used to be that RAW sockets didn't do something they need them to do. Maybe we can fix that now. If that's not possible we can probably add an ioctl or similar to disable time stamping for packet sockets (DHCP shouldn't really need a fine grained time stamp). dhcpcd would need to use that then. Keep me updated on what they say. -Andi
Re: [XFRM]: Fix wildcard as tunnel source
David Miller wrote: > Unfortunately, this breaks scalability of the xfrm state layer when the > source is equally as varying as the destination. In such setups you > have an enormous number of entries with destination being the local > system and only the source address changing. > > BTW, how can the source be specified as wildcard? There is no prefix > component, it is simply an xfrm_address_t. And there are several > macros which check for x->props.saddr equality directly with no > special prefixing or wildcard logic. The tunnel endpoint in the template (either source or destination, depending on the direction) is set to 0.0.0.0. For outbound SAs, the address is compared using xfrm_state_addr_check(), which interprets 0.0.0.0 as a wildcard. When no matching SA is present, the address is resolved using routing and filled into the ACQ SA. The keying daemon will then install SAs with the proper source. For inbound SAs the tunnel destination from the template is ignored. > I really don't want to remove this as it's fairly critical performance > wise for the scalability problems all my changes were meant to address. > I hope I really don't have to do something like what was needed for > the policy layer, having a linked list and a hash table to handle the > two cases. We could query the address before the SA lookup. It will cost an additional route lookup in case a matching SA is already present, but I guess that's still better than removing the source from the hash. I'll check whether it works and send a new patch.
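A toy, IPv4-only model of the proposal above to resolve the source address before the SA lookup. The table, hash function and route_saddr() are hypothetical; in the eventual patch the resolution goes through the new get_saddr hook called from xfrm_tmpl_resolve_one(). The point is only that a hash keyed on (daddr, saddr) generally cannot find an SA when probed with the unresolved 0.0.0.0 source.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

/* Hypothetical SA keyed by (daddr, saddr). */
struct sa {
	uint32_t daddr, saddr;
	struct sa *next;
};

static struct sa *table[NBUCKETS];

static unsigned int sa_hash(uint32_t daddr, uint32_t saddr)
{
	return (daddr ^ saddr) % NBUCKETS;   /* toy hash, not the kernel's */
}

static void sa_insert(struct sa *x)
{
	unsigned int h = sa_hash(x->daddr, x->saddr);

	x->next = table[h];
	table[h] = x;
}

/* Stand-in for a route lookup: every destination is reached from one
 * fixed, hypothetical local address (10.0.0.1). */
static int route_saddr(uint32_t daddr, uint32_t *saddr)
{
	(void)daddr;
	*saddr = 0x0a000001u;
	return 0;
}

/* The problem: hashing the raw template address. The wildcard compare
 * would match, but the probe usually lands in the wrong bucket. */
static struct sa *sa_find_raw(uint32_t daddr, uint32_t tmpl_saddr)
{
	struct sa *x;

	for (x = table[sa_hash(daddr, tmpl_saddr)]; x; x = x->next)
		if (x->daddr == daddr &&
		    (tmpl_saddr == 0 || x->saddr == tmpl_saddr))
			return x;
	return NULL;
}

/* The fix: resolve a wildcard source first, then hash on the address
 * the installed SA actually carries. */
static struct sa *sa_find(uint32_t daddr, uint32_t tmpl_saddr)
{
	uint32_t saddr = tmpl_saddr;
	struct sa *x;

	if (saddr == 0 && route_saddr(daddr, &saddr) != 0)
		return NULL;
	for (x = table[sa_hash(daddr, saddr)]; x; x = x->next)
		if (x->daddr == daddr && x->saddr == saddr)
			return x;
	return NULL;
}
```

The extra route_saddr() call on every wildcard lookup corresponds to the additional route lookup mentioned above.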
Re: [2.6 patch] net/sctp/: cleanups
From: Sridhar Samudrala <[EMAIL PROTECTED]> Date: Tue, 05 Sep 2006 16:44:21 -0700 > On Tue, 2006-09-05 at 23:57 +0200, Adrian Bunk wrote: > > This patch contains the following cleanups: > > - make the following needlessly global function static: > > - socket.c: sctp_apply_peer_addr_params() > > - add proper prototypes for the several global functions in > > include/net/sctp/sctp.h > > > > Note that this fixes wrong prototypes for the following functions: > > - sctp_snmp_proc_exit() > > - sctp_eps_proc_exit() > > - sctp_assocs_proc_exit() > > > > The latter was spotted by the GNU C compiler and reported > > by David Woodhouse. > > > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> > > Acked-by: Sridhar Samudrala <[EMAIL PROTECTED]> Applied, thanks.
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Rick Jones <[EMAIL PROTECTED]> Date: Tue, 05 Sep 2006 10:55:16 -0700 > Is this really necessary? I thought that the problems with ABC were in > trying to apply byte-based heuristics from the RFC(s) to a > packet-oriented cwnd in the stack? This is receiver side, and helps a sender who does congestion control based upon packet counting like Linux does. It really is less related to ABC than Alexey implies; we've always had this kind of problem, as I have mentioned in previous talks on this issue.
Re: [PATCH][RFC] Re: high latency with TCP connections
From: Alexey Kuznetsov <[EMAIL PROTECTED]> Date: Mon, 4 Sep 2006 20:00:45 +0400 > Try enclosed patch. I have no idea why 9.997 sec is so magic, but I > get exactly this number on my notebook. :-) > > = > > This patch enables sending an ACK for every second received segment. > It does not affect either mss-sized connections (obviously) or connections > controlled by Nagle (because there is only one small segment in flight). > > The idea is to record the fact that a small segment arrives > on a connection where one small segment has already been received > and is still not ACKed. In this case an ACK is forced after tcp_recvmsg() > drains the receive buffer. > > In other words, it is a "soft" every-second-segment ACK, which is enough > to preserve the ACK clock even when ABC is enabled. > > Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> This looks exactly like the kind of patch I tried to formulate, very unsuccessfully, last time this topic came up a year or so ago. It looks perfectly fine to me; would you like me to apply it, Alexey?
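The rule in the patch description can be modelled as a two-flag state machine. This is a simplified, hypothetical sketch: the field and function names are invented, the delayed-ACK timer is not modelled, and the real patch lives in the kernel's receive and tcp_recvmsg() paths.

```c
#include <stdbool.h>

/* Hypothetical per-connection receive state. */
struct rx_state {
	bool small_unacked;   /* one small segment received, not yet ACKed */
	bool force_ack;       /* ACK as soon as the reader drains the queue */
};

/* Called for each received segment of length len, given the MSS. */
static void on_segment(struct rx_state *s, unsigned int len, unsigned int mss)
{
	if (len >= mss)
		return;                 /* full-sized: normal ACK rules apply */
	if (s->small_unacked)
		s->force_ack = true;    /* a second small segment is outstanding */
	else
		s->small_unacked = true;
}

/* Called after recvmsg() empties the receive queue; returns true if an
 * ACK should go out now instead of waiting for the delayed-ACK timer. */
static bool after_recvmsg_drain(struct rx_state *s)
{
	bool ack = s->force_ack;

	if (ack) {
		s->force_ack = false;
		s->small_unacked = false;
	}
	return ack;
}
```

A single small segment (the Nagle case) never sets force_ack, which matches the claim that such connections are unaffected.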
Re: 2.6.18-rc5 with GRE, iptables and Speedtouch ADSL, PPP over ATM
From: Herbert Xu <[EMAIL PROTECTED]> Date: Sun, 3 Sep 2006 21:15:07 +1000 > So here is a simple patch to remove the tx lock from dev_watchdog_up. > In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and > replace it with dev_watchdog_up. > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Applied, thanks Herbert.
Re: [XFRM]: Fix wildcard as tunnel source
From: Patrick McHardy <[EMAIL PROTECTED]> Date: Sat, 02 Sep 2006 16:46:44 +0200 > [XFRM]: Fix wildcard as tunnel source > > Hashing SAs by source address breaks templates with wildcards as tunnel > source. Remove saddr from the hash key. > > Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> Unfortunately, this breaks scalability of the xfrm state layer when the source is equally as varying as the destination. In such setups you have an enormous number of entries with destination being the local system and only the source address changing. BTW, how can the source be specified as wildcard? There is no prefix component, it is simply an xfrm_address_t. And there are several macros which check for x->props.saddr equality directly with no special prefixing or wildcard logic. I really don't want to remove this as it's fairly critical performance wise for the scalability problems all my changes were meant to address. I hope I really don't have to do something like what was needed for the policy layer, having a linked list and a hash table to handle the two cases.
Re: [PATCH]:[XFRM] BEET mode
On Sat, 16 Sep 2006, Diego Beltrami wrote: The patch which introduces the BEET mode, and which was previously sent to this mailing list, also applies to the http://www.kernel.org/git/?p=linux/kernel/git/davem/net-2.6.19.git;a=summary branch. However, there were probably some errors when attaching the patch inline to the mail, so I am reattaching it. In any case, if there are still errors, the same patch can be found at the following URL and it works just fine: .. For those who haven't been following this discussion, the patch introduces the BEET mode (Bound End-to-End Tunnel) as specified by the IETF draft at the following link: http://www.ietf.org/internet-drafts/draft-nikander-esp-beet-mode-06.txt Signed-off-by: Diego Beltrami <[EMAIL PROTECTED]> Signed-off-by: Miika Komu <[EMAIL PROTECTED]> Signed-off-by: Herbert Xu <[EMAIL PROTECTED]> Signed-off-by: Abhinav Pathak <[EMAIL PROTECTED]> Signed-off-by: Jeff Ahrenholz <[EMAIL PROTECTED]> Is the patch on the web fine? Diego said that the patch applies fine to Dave's branch, but the problem is the email formatting. The patch on the web is the same as the one forwarded to the mailing list. I put the patch into a more permanent location: http://infrahip.hiit.fi/beet/2.6.18/simple-beet-ph-patch-2.6.18 http://infrahip.hiit.fi/beet/2.6.18/simple-beet-ph-patch-2.6.18.md5sum 5cd131d2f15f04d3dc26e360ce3ae38e simple-beet-ph-patch-2.6.18 -- Miika Komu http://www.iki.fi/miika/
Re: [PATCH 8/8] address: Support NLM_F_EXCL when adding addresses
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:05 +0200 > iproute2 doesn't provide the NLM_F_CREATE flag when adding addresses, > it is assumed to be implied. The existing code issues a check on > said flag when the modify operation fails (likely due to ENOENT) > before continuing to create it, this leads to a hard to predict > result, therefore the NLM_F_CREATE check is removed. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> I hope this doesn't break any existing stuff, but it is certainly the logically correct thing to do. If things break I'm reverting this though. But for now, applied, thanks.
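One reading of the add-address semantics after this patch, written as a small decision function: NLM_F_EXCL turns an already existing address into -EEXIST; otherwise an existing address is modified and a missing one is created whether or not NLM_F_CREATE is set. The function and enum are hypothetical; only the NLM_F_EXCL value is taken from <linux/netlink.h>.

```c
#include <errno.h>
#include <stdbool.h>

/* Value as defined in <linux/netlink.h>. */
#ifndef NLM_F_EXCL
#define NLM_F_EXCL 0x200
#endif

/* Hypothetical outcome codes for this sketch. */
enum add_result { ADDR_CREATED, ADDR_MODIFIED };

/* Decide what an add-address request does, given whether the address
 * already exists. NLM_F_CREATE is deliberately not consulted: iproute2
 * does not set it and creation is implied. */
static int new_addr_action(bool exists, unsigned int nlmsg_flags,
			   enum add_result *out)
{
	if (exists) {
		if (nlmsg_flags & NLM_F_EXCL)
			return -EEXIST;
		*out = ADDR_MODIFIED;
		return 0;
	}
	*out = ADDR_CREATED;
	return 0;
}
```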
Re: [PATCH 7/8] address: Allow address changes while device is administrative down
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:04 +0200 > Same behaviour as IPv4, using IFF_UP is a no-no anyway. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
Re: [PATCH 6/8] address: Convert address dumping to new netlink api
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:03 +0200 > Replaces INET6_IFADDR_RTA_SPACE with a new function calculating > the total required message size for all address messages. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
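A sketch of the kind of size calculation that replaces a fixed macro like INET6_IFADDR_RTA_SPACE: add the aligned message header to the total size of each attribute that may be emitted. The alignment arithmetic matches netlink's 4-byte NLMSG_ALIGN/NLA_ALIGN; the local struct mirrors and the exact attribute set (one 16-byte IFA_ADDRESS, one IFA_CACHEINFO) are assumptions for illustration, not the function from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as NLMSG_ALIGN()/NLA_ALIGN(): round up to 4 bytes. */
#define NL_ALIGN4(len)  (((size_t)(len) + 3) & ~(size_t)3)
#define NLA_HDR_LEN     NL_ALIGN4(4)   /* struct nlattr is 4 bytes */

/* Local mirrors of struct ifaddrmsg and struct ifa_cacheinfo from
 * <linux/if_addr.h>, so the sketch needs no kernel headers. */
struct ifaddr_hdr {
	uint8_t family, prefixlen, flags, scope;
	uint32_t index;
};

struct ifaddr_cacheinfo {
	uint32_t prefered, valid, cstamp, tstamp;
};

/* Total space one attribute with the given payload occupies. */
static size_t nla_total(size_t payload)
{
	return NL_ALIGN4(NLA_HDR_LEN + payload);
}

/* Hypothetical worst-case payload size of one IPv6 address message. */
static size_t ifaddr_msgsize(void)
{
	return NL_ALIGN4(sizeof(struct ifaddr_hdr))
	       + nla_total(16)                               /* address */
	       + nla_total(sizeof(struct ifaddr_cacheinfo)); /* cacheinfo */
}
```

Computing the size from the attribute list keeps it correct when an attribute is added, which a hand-maintained constant does not.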
Re: [PATCH 5/8] address: Add put_ifaddrmsg() and rt_scope()
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:02 +0200 > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
Re: [PATCH 4/8] address: Add put_cacheinfo() to dump struct cacheinfo
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:01 +0200 > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied, thanks.
Re: [PATCH 3/8] address: Convert address lookup to new netlink api
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:40:00 +0200 > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
Re: [PATCH 2/8] address: Convert address deletion to new netlink api
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:39:59 +0200 > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
Re: [PATCH 1/8] address: Convert address addition to new netlink api
From: Thomas Graf <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 23:39:58 +0200 > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied.
Re: [PATCH] change netfilter tunables to __read_mostly
From: Brian Haley <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 11:32:06 -0400 > Change some netfilter tunables to __read_mostly. Also fixed some > incorrect file reference comments while I was in there. > > (this will be my last __read_mostly patch unless someone points out > something else that needs it) > > Signed-off-by: Brian Haley <[EMAIL PROTECTED]> Applied, thanks.
Re: [PATCH] change sctp globals to __read_mostly
From: Brian Haley <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 11:31:52 -0400 > Change sctp globals to __read_mostly. > > Signed-off-by: Brian Haley <[EMAIL PROTECTED]> Applied, thanks.
Re: [PATCH] change bridge sysctl tunables to __read_mostly
From: Brian Haley <[EMAIL PROTECTED]> Date: Fri, 01 Sep 2006 11:31:43 -0400 > Change some bridge sysctl tunables to __read_mostly. > > Signed-off-by: Brian Haley <[EMAIL PROTECTED]> Applied, thanks.
Re: [GENL]: Provide more information to userspace about registered genl families
From: Thomas Graf <[EMAIL PROTECTED]> Date: Thu, 31 Aug 2006 23:21:29 +0200 > Additionally exports the following information when providing > the list of registered generic netlink families: > - protocol version > - header size > - maximum number of attributes > - list of available operations including > - id > - flags > - availability of policy and doit/dumpit functions > > libnl HEAD provides a utility to read this new information: > > 0x0010 nlctrl version 1 > hdrsize 0 maxattr 6 > op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT] > 0x0011 NLBL_MGMT version 1 > hdrsize 0 maxattr 0 > op unknown (0x02) [DOIT] > op unknown (0x03) [DOIT] > > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Applied to net-2.6.19, thanks Thomas.
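To illustrate how the per-operation flags turn into the bracketed list in the libnl output above, here is a small userspace sketch. It is not libnl's code: `format_op`, `append_cap` and the `OP_CAP_*` macros are hypothetical names, and the bit values are assumed to correspond to the kernel's `GENL_CMD_CAP_*` capability bits (check include/linux/genetlink.h before relying on them).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local copies of the capability bits; assumed to mirror the
 * kernel's GENL_CMD_CAP_* values, not taken from this thread. */
#define OP_CAP_DO      0x02
#define OP_CAP_DUMP    0x04
#define OP_CAP_HASPOL  0x08

/* Append one capability name to buf, preceded by a comma unless it is the
 * first name written inside the brackets. */
static void append_cap(char *buf, size_t len, int *first, const char *name)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s%s", *first ? "" : ",", name);
    *first = 0;
}

/* Render one operation in the style of the libnl output quoted above,
 * e.g. "op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT]". A NULL name is shown
 * as "unknown", matching the NLBL_MGMT lines in that output. */
void format_op(char *buf, size_t len, const char *name,
               unsigned int id, unsigned int flags)
{
    int first = 1;
    size_t used;

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "op %s (0x%02x) [", name ? name : "unknown", id);
    if (flags & OP_CAP_HASPOL)
        append_cap(buf, len, &first, "POLICY");
    if (flags & OP_CAP_DO)
        append_cap(buf, len, &first, "DOIT");
    if (flags & OP_CAP_DUMP)
        append_cap(buf, len, &first, "DUMPIT");
    used = strlen(buf);
    snprintf(buf + used, len - used, "]");
}
```

Calling `format_op(line, sizeof(line), "GETFAMILY", 0x03, OP_CAP_HASPOL | OP_CAP_DO | OP_CAP_DUMP)` yields the nlctrl line shown above, and passing a NULL name with only `OP_CAP_DO` yields `op unknown (0x02) [DOIT]`.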
Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp
From: Patrick McHardy <[EMAIL PROTECTED]> Date: Fri, 15 Sep 2006 22:16:17 +0200 > bert hubert wrote: > >>It appears to be intentional, but I don't see a reason for it. > >>Can you try whether this patch makes it work as expected? > > > > > >>[PACKET]: Don't truncate non-linear skbs with mmaped IO > >> > >>Non-linear skbs are truncated to their linear part with mmaped IO. > >>Fix by using skb_copy_bits instead of memcpy. > > > > > > Works very well for me! I hope this can make it into 2.6.18. > > > That would be fine with me, let's see what Dave thinks. Applied to net-2.6, thanks a lot.
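Why memcpy truncates and skb_copy_bits does not: a non-linear skb keeps only part of its data in the linear head, with the rest in page fragments, so a plain memcpy from the head stops at the end of the linear part. The following is a minimal userspace model of that difference, assuming a simplified layout. `struct toy_skb`, `toy_copy_bits` and `toy_copy_linear_only` are hypothetical names, not kernel API, and the real skb_copy_bits additionally handles frag_list chains and page mapping, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* One paged fragment: a pointer and its size (hypothetical model). */
struct toy_frag {
    const unsigned char *data;
    size_t size;
};

/* Hypothetical model of a non-linear buffer: a linear head plus fragments. */
struct toy_skb {
    const unsigned char *head;
    size_t head_len;                     /* bytes in the linear part */
    struct toy_frag frags[TOY_MAX_FRAGS];
    int nr_frags;
    size_t len;                          /* head_len + all fragment sizes */
};

/* The truncating pattern: copy only from the linear head, silently dropping
 * bytes that live in fragments. Returns the number of bytes copied. */
size_t toy_copy_linear_only(const struct toy_skb *skb, void *to, size_t len)
{
    size_t n = len < skb->head_len ? len : skb->head_len;

    memcpy(to, skb->head, n);
    return n;
}

/* Copy len bytes starting at offset, walking the head and then each
 * fragment in the spirit of skb_copy_bits(). Returns 0 on success or
 * -1 if the requested range lies outside the buffer. */
int toy_copy_bits(const struct toy_skb *skb, size_t offset, void *to, size_t len)
{
    unsigned char *out = to;
    int i;

    assert(skb != NULL && to != NULL);
    if (offset > skb->len || len > skb->len - offset)
        return -1;

    if (offset < skb->head_len) {
        /* Part of the range is in the linear head. */
        size_t n = skb->head_len - offset;

        if (n > len)
            n = len;
        memcpy(out, skb->head + offset, n);
        out += n;
        len -= n;
        offset = 0;                      /* continue at fragment 0, byte 0 */
    } else {
        offset -= skb->head_len;         /* offset into the fragment data */
    }

    for (i = 0; i < skb->nr_frags && len > 0; i++) {
        const struct toy_frag *f = &skb->frags[i];
        size_t n;

        if (offset >= f->size) {         /* range starts past this fragment */
            offset -= f->size;
            continue;
        }
        n = f->size - offset;
        if (n > len)
            n = len;
        memcpy(out, f->data + offset, n);
        out += n;
        len -= n;
        offset = 0;
    }
    return 0;
}
```

With a 3-byte head and two fragments totalling 5 bytes, `toy_copy_linear_only` returns 3 even when 8 bytes are requested, while `toy_copy_bits` fills all 8, which is the behaviour the patch restores for mmapped packet sockets.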