Re: bad networking related lag in v2.6.22-rc2
From: Patrick McHardy [EMAIL PROTECTED] Date: Thu, 24 May 2007 07:41:00 +0200 David Miller wrote: * Herbert Xu [EMAIL PROTECTED] wrote: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty Applied, thanks everyone. Even though it didn't fix this problem, this patch I sent earlier is also needed. Thanks a lot for reminding me about this patch Patrick, applied. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
On Thursday 24 May 2007 03:00:56 David Miller wrote: From: Ingo Molnar [EMAIL PROTECTED] Date: Wed, 23 May 2007 13:40:21 +0200 * Herbert Xu [EMAIL PROTECTED] wrote: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Signed-off-by: Herbert Xu [EMAIL PROTECTED] also: Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED] Applied, thanks everyone. Networking lag I been seeing since 2.6.22-rc1, disappeared after applying this patch. Thanks to everyone who helped me run my system sane again. :) Reagards Ananitya -- Out of many thousands, one may endeavor for perfection, and of those who have achieved perfection, hardly one knows Me in truth. -- Gita Sutra Of Mysticism - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
Ingo Molnar wrote: if you feel inclined to try the git-bisection then by all means please do it (it will certainly be helpful and educative), but it's optional: i dont think you should 'need' to go through extra debugging chores, my analysis based on the excellent trace you provided still holds and whoever modified htb_dequeue()'s logic recently ought to be able to figure that out (or send you a debug patch to further narrow the problem down). The trace shows a _clearly_ anomalous loop: for example there's 56396 (!) calls to rb_first() in htb_dequeue() [without the kernel ever exiting that function]: earth4:~/s grep rb_first trace-to-ingo.txt | wc -l 56396 How is this trace to be understood? Is it simply a call trace in execution-order? If thats the case than we are exiting htb_dequeue, each call to qdisc_watchdog_schedule happens at the very end of that function, which would imply a bug in __qdisc_run. Looking at the recent changes to __qdisc_run, this indeed seems to be the case, when the qdisc is throttled and has packets queued we return a value != 0, causing __qdisc_run to loop until all packets have been sent, which may be a long time. Anant, can you please verify by testing the attached patch? Thanks. diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index f28bb2d..f536060 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -174,7 +174,7 @@ requeue: out: BUG_ON((int) q-q.qlen 0); - return q-q.qlen; + return skb ? q-q.qlen : 0; } void __qdisc_run(struct net_device *dev)
Re: bad networking related lag in v2.6.22-rc2
* Patrick McHardy [EMAIL PROTECTED] wrote: How is this trace to be understood? Is it simply a call trace in execution-order? [...] yeah. There's a help section at the top of the trace which explains the other fields too: _--= CPU# / _-= irqs-off | / _= need-resched || / _---= hardirq/softirq ||| / _--= preempt-depth / | delay cmd pid | time | caller \ /| \ | / privoxy-12926 1.Ns10us : ktime_get_ts (ktime_get) the function name in braces is the parent function. So in this case the trace entry means we called ktime_get_ts() from ktime_get(). Ingo - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote: Looking at the recent changes to __qdisc_run, this indeed seems to be the case, when the qdisc is throttled and has packets queued we return a value != 0, causing __qdisc_run to loop until all packets have been sent, which may be a long time. Good catch! I was obviously half awake at the time :) We could also fix it this way: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index f28bb2d..cbefe22 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -169,8 +169,8 @@ requeue: else q-ops-requeue(skb, q); netif_schedule(dev); - return 0; } + return 0; out: BUG_ON((int) q-q.qlen 0); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
Herbert Xu wrote: On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote: Looking at the recent changes to __qdisc_run, this indeed seems to be the case, when the qdisc is throttled and has packets queued we return a value != 0, causing __qdisc_run to loop until all packets have been sent, which may be a long time. Good catch! I was obviously half awake at the time :) We could also fix it this way: Yes, that looks better, thanks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
* Herbert Xu [EMAIL PROTECTED] wrote: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Signed-off-by: Herbert Xu [EMAIL PROTECTED] also: Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED] ... i gave your patch a quick test-boot and networking still works fine. Ingo - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
On Wed, 23 May 2007, Patrick McHardy wrote: Yes, that looks better, thanks. There appear to be other obvious problems in the recent cleanups in this area.. Look at psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound) { return min(tv1 - tv2, bound); } and compare it to the previous code: #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ min_t(long long, (tv1) - (tv2), bound) and ponder how that trivial cleanup totally broke the thing. Hint: psched_time_t is an u64. What does that mean for min(tv1 - tv2, bound); again, when tv2 is larger than tv1. It _used_ to return a negative value. Now it returns a positive bound upper bound, because tv1-tv2 will be used as a huge unsigned (and thus _positive_) integer. And was that accidental, or done on purpose? Sounds accidental to me, since you then want to return a psched_tdiff_t, which is typedeffed to be long. Doesn't sound very safe to me, especially since the commit message for this is [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function, and there's no indication that anybody realized that it changed semantics in the process. Hmm? What _should_ that thing do? Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
Linus Torvalds wrote: There appear to be other obvious problems in the recent cleanups in this area.. Look at psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, psched_time_t bound) { return min(tv1 - tv2, bound); } and compare it to the previous code: #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \ min_t(long long, (tv1) - (tv2), bound) and ponder how that trivial cleanup totally broke the thing. Hint: psched_time_t is an u64. What does that mean for min(tv1 - tv2, bound); again, when tv2 is larger than tv1. It _used_ to return a negative value. Now it returns a positive bound upper bound, because tv1-tv2 will be used as a huge unsigned (and thus _positive_) integer. And was that accidental, or done on purpose? Sounds accidental to me, since you then want to return a psched_tdiff_t, which is typedeffed to be long. Doesn't sound very safe to me, especially since the commit message for this is [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function, and there's no indication that anybody realized that it changed semantics in the process. I did realize it, but tv2 tv1 can't happen and makes no sense for the users of this function. I probably should have provided a more detailed changelog entry. Hmm? What _should_ that thing do? It is used to calculate the amount of tokens a tocken bucket has accumulated since the last refill, thus we always have tv1 = tv2 (modulo ktime wraps). In fact tv2 tv1 was never properly supported. This macro would have returned the negative long long value, but all users assign it to a psched_tdiff_t (long), so depending on the exact values, it might still be interpreted as a large positive value. Additionally there was a second implementation for the gettimeofday clocksource that didn't return the negative difference but the bound value. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
From: Ingo Molnar [EMAIL PROTECTED] Date: Wed, 23 May 2007 13:40:21 +0200 * Herbert Xu [EMAIL PROTECTED] wrote: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty My previous patch that changed the return value of qdisc_restart incorrectly made the case where dequeue returns empty continue processing packets. This patch is based on diagnosis and fix by Patrick McHardy. Signed-off-by: Herbert Xu [EMAIL PROTECTED] also: Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED] Applied, thanks everyone. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: bad networking related lag in v2.6.22-rc2
David Miller wrote: * Herbert Xu [EMAIL PROTECTED] wrote: [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty Applied, thanks everyone. Even though it didn't fix this problem, this patch I sent earlier is also needed. [NET_SCHED]: sch_htb: fix event cache time calculation The event cache time must be an absolute value, when no event exists it is incorrectly set to 1s instead of 1s in the future. Signed-off-by: Patrick McHardy [EMAIL PROTECTED] --- commit 49d1023ea0ea8377e740123d5954e88a00f78b7c tree 031c210f1b5e37ade5a4fa519f5808cd49225b89 parent 637fc540b0ad22bf7971929e906e704236af06cd author Patrick McHardy [EMAIL PROTECTED] Mon, 21 May 2007 23:24:16 +0200 committer Patrick McHardy [EMAIL PROTECTED] Mon, 21 May 2007 23:25:51 +0200 net/sched/sch_htb.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 99bcec8..035788c 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -976,8 +976,9 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch) if (q-now = q-near_ev_cache[level]) { event = htb_do_events(q, level); - q-near_ev_cache[level] = event ? event : - PSCHED_TICKS_PER_SEC; + if (!event) + event = q-now + PSCHED_TICKS_PER_SEC; + q-near_ev_cache[level] = event; } else event = q-near_ev_cache[level];