Re: bad networking related lag in v2.6.22-rc2

2007-05-24 Thread David Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 24 May 2007 07:41:00 +0200

 David Miller wrote:
 * Herbert Xu [EMAIL PROTECTED] wrote:
 
 [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
 
  Applied, thanks everyone.
 
 
 Even though it didn't fix this problem, this patch I sent earlier is
 also needed.

Thanks a lot for reminding me about this patch Patrick, applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-24 Thread Anant Nitya
On Thursday 24 May 2007 03:00:56 David Miller wrote:
 From: Ingo Molnar [EMAIL PROTECTED]
 Date: Wed, 23 May 2007 13:40:21 +0200

  * Herbert Xu [EMAIL PROTECTED] wrote:
   [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
  
   My previous patch that changed the return value of qdisc_restart
   incorrectly made the case where dequeue returns empty continue
   processing packets.
  
   This patch is based on diagnosis and fix by Patrick McHardy.
  
   Signed-off-by: Herbert Xu [EMAIL PROTECTED]
 
  also:
 
Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED]

 Applied, thanks everyone.

Networking lag I been seeing since 2.6.22-rc1, disappeared after applying this 
patch. Thanks to everyone who helped me run my system sane again. :)

Reagards
Ananitya

-- 
Out of many thousands, one may endeavor for perfection, and of
those who have achieved perfection, hardly one knows Me in truth.
-- Gita Sutra Of Mysticism
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Patrick McHardy
Ingo Molnar wrote:
 if you feel inclined to try the git-bisection then by all means please 
 do it (it will certainly be helpful and educative), but it's optional: i 
 dont think you should 'need' to go through extra debugging chores, my 
 analysis based on the excellent trace you provided still holds and 
 whoever modified htb_dequeue()'s logic recently ought to be able to 
 figure that out (or send you a debug patch to further narrow the problem 
 down).

 The trace shows a _clearly_ anomalous loop: for example there's 56396 
 (!) calls to rb_first() in htb_dequeue() [without the kernel ever 
 exiting that function]:
 
   earth4:~/s grep rb_first trace-to-ingo.txt  | wc -l
   56396


How is this trace to be understood? Is it simply a call trace in
execution-order? If thats the case than we are exiting htb_dequeue,
each call to qdisc_watchdog_schedule happens at the very end of
that function, which would imply a bug in __qdisc_run.

Looking at the recent changes to __qdisc_run, this indeed seems
to be the case, when the qdisc is throttled and has packets queued
we return a value != 0, causing __qdisc_run to loop until all
packets have been sent, which may be a long time.

Anant, can you please verify by testing the attached patch? Thanks.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f28bb2d..f536060 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -174,7 +174,7 @@ requeue:
 
 out:
BUG_ON((int) q-q.qlen  0);
-   return q-q.qlen;
+   return skb ? q-q.qlen : 0;
 }
 
 void __qdisc_run(struct net_device *dev)


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Ingo Molnar

* Patrick McHardy [EMAIL PROTECTED] wrote:

 How is this trace to be understood? Is it simply a call trace in 
 execution-order? [...]

yeah. There's a help section at the top of the trace which explains the 
other fields too:

 _--= CPU#
/ _-= irqs-off
   | / _= need-resched
   || / _---= hardirq/softirq
   ||| / _--= preempt-depth
    /
   | delay
   cmd pid | time  |   caller
  \   /|   \   |   /
 privoxy-12926 1.Ns10us : ktime_get_ts (ktime_get)

the function name in braces is the parent function. So in this case the 
trace entry means we called ktime_get_ts() from ktime_get().

Ingo
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Herbert Xu
On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote:
 
 Looking at the recent changes to __qdisc_run, this indeed seems
 to be the case, when the qdisc is throttled and has packets queued
 we return a value != 0, causing __qdisc_run to loop until all
 packets have been sent, which may be a long time.

Good catch! I was obviously half awake at the time :)

We could also fix it this way:

[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty

My previous patch that changed the return value of qdisc_restart
incorrectly made the case where dequeue returns empty continue
processing packets.

This patch is based on diagnosis and fix by Patrick McHardy.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index f28bb2d..cbefe22 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -169,8 +169,8 @@ requeue:
else
q-ops-requeue(skb, q);
netif_schedule(dev);
-   return 0;
}
+   return 0;
 
 out:
BUG_ON((int) q-q.qlen  0);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Patrick McHardy
Herbert Xu wrote:
 On Wed, May 23, 2007 at 12:56:04PM +0200, Patrick McHardy wrote:
 
Looking at the recent changes to __qdisc_run, this indeed seems
to be the case, when the qdisc is throttled and has packets queued
we return a value != 0, causing __qdisc_run to loop until all
packets have been sent, which may be a long time.
 
 
 Good catch! I was obviously half awake at the time :)
 
 We could also fix it this way:


Yes, that looks better, thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Ingo Molnar

* Herbert Xu [EMAIL PROTECTED] wrote:

 [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
 
 My previous patch that changed the return value of qdisc_restart 
 incorrectly made the case where dequeue returns empty continue 
 processing packets.
 
 This patch is based on diagnosis and fix by Patrick McHardy.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

also:

  Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED]

...

i gave your patch a quick test-boot and networking still works fine.

Ingo
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Linus Torvalds


On Wed, 23 May 2007, Patrick McHardy wrote:
 
 Yes, that looks better, thanks.

There appear to be other obvious problems in the recent cleanups in this 
area..

Look at

psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, 
psched_time_t bound)
{
return min(tv1 - tv2, bound);
}

and compare it to the previous code:

#define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
min_t(long long, (tv1) - (tv2), bound)

and ponder how that trivial cleanup totally broke the thing. 

Hint: psched_time_t is an u64. What does that mean for

min(tv1 - tv2, bound);

again, when tv2 is larger than tv1. It _used_ to return a negative 
value. Now it returns a positive bound upper bound, because tv1-tv2 
will be used as a huge unsigned (and thus _positive_) integer. And was 
that accidental, or done on purpose?

Sounds accidental to me, since you then want to return a psched_tdiff_t, 
which is typedeffed to be long.

Doesn't sound very safe to me, especially since the commit message for 
this is [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function, and 
there's no indication that anybody realized that it changed semantics in 
the process.

Hmm? What _should_ that thing do?

Linus
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Patrick McHardy
Linus Torvalds wrote:
 There appear to be other obvious problems in the recent cleanups in this 
 area..
 
 Look at
 
   psched_tdiff_bounded(psched_time_t tv1, psched_time_t tv2, 
 psched_time_t bound)
   {
   return min(tv1 - tv2, bound);
   }
 
 and compare it to the previous code:
 
   #define PSCHED_TDIFF_SAFE(tv1, tv2, bound) \
   min_t(long long, (tv1) - (tv2), bound)
 
 and ponder how that trivial cleanup totally broke the thing. 
 
 Hint: psched_time_t is an u64. What does that mean for
 
   min(tv1 - tv2, bound);
 
 again, when tv2 is larger than tv1. It _used_ to return a negative 
 value. Now it returns a positive bound upper bound, because tv1-tv2 
 will be used as a huge unsigned (and thus _positive_) integer. And was 
 that accidental, or done on purpose?
 
 Sounds accidental to me, since you then want to return a psched_tdiff_t, 
 which is typedeffed to be long.

 Doesn't sound very safe to me, especially since the commit message for 
 this is [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function, and 
 there's no indication that anybody realized that it changed semantics in 
 the process.


I did realize it, but tv2  tv1 can't happen and makes no sense for
the users of this function. I probably should have provided a more
detailed changelog entry.

 Hmm? What _should_ that thing do?


It is used to calculate the amount of tokens a tocken bucket has
accumulated since the last refill, thus we always have tv1 = tv2
(modulo ktime wraps). In fact tv2  tv1 was never properly
supported. This macro would have returned the negative long long
value, but all users assign it to a psched_tdiff_t (long), so
depending on the exact values, it might still be interpreted as a
large positive value. Additionally there was a second implementation
for the gettimeofday clocksource that didn't return the negative
difference but the bound value.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread David Miller
From: Ingo Molnar [EMAIL PROTECTED]
Date: Wed, 23 May 2007 13:40:21 +0200

 
 * Herbert Xu [EMAIL PROTECTED] wrote:
 
  [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
  
  My previous patch that changed the return value of qdisc_restart 
  incorrectly made the case where dequeue returns empty continue 
  processing packets.
  
  This patch is based on diagnosis and fix by Patrick McHardy.
  
  Signed-off-by: Herbert Xu [EMAIL PROTECTED]
 
 also:
 
   Reported-and-debugged-by: Anant Nitya [EMAIL PROTECTED]

Applied, thanks everyone.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad networking related lag in v2.6.22-rc2

2007-05-23 Thread Patrick McHardy
David Miller wrote:
* Herbert Xu [EMAIL PROTECTED] wrote:

[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty

 Applied, thanks everyone.


Even though it didn't fix this problem, this patch I sent earlier is
also needed.

[NET_SCHED]: sch_htb: fix event cache time calculation

The event cache time must be an absolute value, when no event exists it is
incorrectly set to 1s instead of 1s in the future.

Signed-off-by: Patrick McHardy [EMAIL PROTECTED]

---
commit 49d1023ea0ea8377e740123d5954e88a00f78b7c
tree 031c210f1b5e37ade5a4fa519f5808cd49225b89
parent 637fc540b0ad22bf7971929e906e704236af06cd
author Patrick McHardy [EMAIL PROTECTED] Mon, 21 May 2007 23:24:16 +0200
committer Patrick McHardy [EMAIL PROTECTED] Mon, 21 May 2007 23:25:51 +0200

 net/sched/sch_htb.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 99bcec8..035788c 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -976,8 +976,9 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
 
if (q-now = q-near_ev_cache[level]) {
event = htb_do_events(q, level);
-   q-near_ev_cache[level] = event ? event :
- PSCHED_TICKS_PER_SEC;
+   if (!event)
+   event = q-now + PSCHED_TICKS_PER_SEC;
+   q-near_ev_cache[level] = event;
} else
event = q-near_ev_cache[level];