Re: NOHZ: local_softirq_pending 08
On Tuesday 29 May 2007 02:34:22 Rafał Bilski wrote:
> Hi!
>
> A lot of "NOHZ: local_softirq_pending 08" messages (about 100), which then
> suddenly stopped appearing. I have 2.6.21.1. I checked the .2 and .3
> changelogs but I don't see anything about this message.
> What does it mean?
>
> Is "08" an IRQ number?
>   8:          2    XT-PIC-XT    rtc
>
> Please CC me.

Does this patch [ http://lkml.org/lkml/2007/5/22/35 ] help you fix it?

Regards
Ananitya
--
Out of many thousands, one may endeavor for perfection, and of those who
have achieved perfection, hardly one knows Me in truth.
-- Gita Sutra Of Mysticism
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
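To the question above: the number is not an IRQ number; it is the hex bitmask of softirqs still pending when the CPU tried to go tickless-idle. As a hedged illustration (assuming the softirq enum order from 2.6.21's include/linux/interrupt.h; check your own tree), a small decoder:

```python
# Hedged sketch: decode the mask printed by "NOHZ: local_softirq_pending NN".
# Bit positions are assumed to follow the 2.6.21 softirq enum order
# (HI=0, TIMER=1, NET_TX=2, NET_RX=3, BLOCK=4, TASKLET=5).
SOFTIRQS = ["HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "TASKLET"]

def decode_pending(mask_hex):
    """Return the names of the softirqs set in a pending mask (hex string)."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(SOFTIRQS) if mask & (1 << bit)]

print(decode_pending("08"))  # -> ['NET_RX']
print(decode_pending("22"))  # -> ['TIMER', 'TASKLET']
```

Under that assumption, "08" would mean network-receive processing was left pending, and the "22" masks seen elsewhere in this thread would be TIMER plus TASKLET, which is consistent with the `__tasklet_schedule` "last caller" in the debug output further down.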
Re: bad networking related lag in v2.6.22-rc2
On Thursday 24 May 2007 03:00:56 David Miller wrote:
> From: Ingo Molnar <[EMAIL PROTECTED]>
> Date: Wed, 23 May 2007 13:40:21 +0200
>
> > * Herbert Xu <[EMAIL PROTECTED]> wrote:
> > > [NET_SCHED]: Fix qdisc_restart return value when dequeue is empty
> > >
> > > My previous patch that changed the return value of qdisc_restart
> > > incorrectly made the case where dequeue returns empty continue
> > > processing packets.
> > >
> > > This patch is based on diagnosis and fix by Patrick McHardy.
> > >
> > > Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>
> >
> > also:
> >
> > Reported-and-debugged-by: Anant Nitya <[EMAIL PROTECTED]>
>
> Applied, thanks everyone.

The networking lag I had been seeing since 2.6.22-rc1 disappeared after
applying this patch. Thanks to everyone who helped me get my system
running sane again. :)

Regards
Ananitya
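To see why the return-value bug described in that changelog caused lag, here is a hedged toy model (plain Python, deliberately not the kernel's actual code; all names are illustrative) of a `__qdisc_run()`-style loop: if the restart step reports "more work" even when the dequeue came back empty, the loop spins instead of returning.

```python
# Hedged toy model of the loop behaviour described in the changelog;
# names and structure are illustrative, not the kernel's real code.
def run_qdisc(queue, restart, max_iters=1000):
    """Drain `queue` by calling restart() until it says stop; count iterations."""
    iters = 0
    while iters < max_iters:
        iters += 1
        if not restart(queue):
            break
    return iters

def buggy_restart(queue):
    # Bug: reports "continue" even when the dequeue found nothing.
    queue.pop() if queue else None
    return True

def fixed_restart(queue):
    # Fix: only continue while dequeue actually returned a packet.
    if not queue:
        return False
    queue.pop()
    return bool(queue)

print(run_qdisc([1, 2, 3], fixed_restart))  # -> 3: one pass per packet
print(run_qdisc([1, 2, 3], buggy_restart))  # -> 1000: hits the safety cap
```

In the toy model the buggy variant only burns iterations; in the real qdisc path the equivalent behaviour kept packet processing busy-looping, which matches the system-wide lag reported in this thread.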
Re: [BUG] local_softirq_pending storm
On Thursday 24 May 2007 00:08:40 Chuck Ebbert wrote:
> Chuck Ebbert wrote:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240982
>
> Another; these started to appear after the below patch was merged:
>
> > Index: linux/kernel/sched.c
> > ===
> > --- linux.orig/kernel/sched.c
> > +++ linux/kernel/sched.c
> > @@ -4212,9 +4212,7 @@ int __sched cond_resched_softirq(void)
> >  	BUG_ON(!in_softirq());
> >
> >  	if (need_resched() && system_state == SYSTEM_RUNNING) {
> > -		raw_local_irq_disable();
> > -		_local_bh_enable();
> > -		raw_local_irq_enable();
> > +		local_bh_enable();
> >  		__cond_resched();
> >  		local_bh_disable();
> >  		return 1;
>
> May 23 19:26:26 localhost kernel: BUG: warning at kernel/softirq.c:138/local_bh_enable() (Not tainted)
> May 23 19:26:26 localhost kernel: [] local_bh_enable+0x45/0x92
> May 23 19:26:26 localhost kernel: [] cond_resched_softirq+0x2c/0x42
> May 23 19:26:26 localhost kernel: [] release_sock+0x54/0xa3
> May 23 19:26:26 localhost kernel: [] prepare_to_wait+0x24/0x3f
> May 23 19:26:26 localhost kernel: [] inet_stream_connect+0x116/0x1ff
> May 23 19:26:26 localhost kernel: [] autoremove_wake_function+0x0/0x35
> May 23 19:26:26 localhost kernel: [] sys_connect+0x82/0xad
> May 23 19:26:26 localhost kernel: [] release_sock+0x13/0xa3
> May 23 19:26:26 localhost kernel: [] _spin_unlock_bh+0x5/0xd
> May 23 19:26:26 localhost kernel: [] sock_setsockopt+0x4a8/0x4b2
> May 23 19:26:26 localhost kernel: [] sock_attach_fd+0x70/0xd2
> May 23 19:26:26 localhost kernel: [] get_empty_filp+0xfc/0x170
> May 23 19:26:26 localhost kernel: [] sys_setsockopt+0x9b/0xa7
> May 23 19:26:26 localhost kernel: [] sys_socketcall+0xac/0x261
> May 23 19:26:26 localhost kernel: [] syscall_call+0x7/0xb

Strange: while applying the patch in question the first time, I was
hand-editing kernel/sched.c and stupidly typed _local_bh_enable() instead
of local_bh_enable(), and when I rebooted, as soon as I got inside X I was
welcomed with the following messages in the system log.

[  152.692609] BUG: at kernel/softirq.c:122 _local_bh_enable()
[  152.692637] [] show_trace_log_lvl+0x1a/0x2f
[  152.692658] [] show_trace+0x12/0x14
[  152.692668] [] dump_stack+0x16/0x18
[  152.692678] [] _local_bh_enable+0x8b/0xc3
[  152.692688] [] cond_resched_softirq+0x2b/0x40
[  152.692700] [] established_get_first+0x19/0xad
[  152.692712] [] tcp_seq_next+0x76/0x8c
[  152.692722] [] seq_read+0x17b/0x264
[  152.692733] [] vfs_read+0xad/0x161
[  152.692745] [] sys_read+0x3d/0x61
[  152.692755] [] syscall_call+0x7/0xb
[  152.692765] ===
[  152.692770] BUG: at kernel/lockdep.c:1937 trace_softirqs_on()
[  152.692777] [] show_trace_log_lvl+0x1a/0x2f
[  152.692789] [] show_trace+0x12/0x14
[  152.692800] [] dump_stack+0x16/0x18
[  152.692810] [] trace_softirqs_on+0x5f/0xa5
[  152.692822] [] _local_bh_enable+0xb3/0xc3
[  152.692831] [] cond_resched_softirq+0x2b/0x40
[  152.692842] [] established_get_first+0x19/0xad
[  152.692852] [] tcp_seq_next+0x76/0x8c
[  152.692862] [] seq_read+0x17b/0x264
[  152.692870] [] vfs_read+0xad/0x161
[  152.692879] [] sys_read+0x3d/0x61
[  152.692889] [] syscall_call+0x7/0xb
[  152.692899] ===
[  159.257890] NOHZ: local_softirq_pending 22
[  159.266009] NOHZ: local_softirq_pending 22
[  159.273965] NOHZ: local_softirq_pending 22
[  159.281884] NOHZ: local_softirq_pending 22
[  160.712828] NOHZ: local_softirq_pending 22
[  162.609377] NOHZ: local_softirq_pending 22
[  162.609804] NOHZ: local_softirq_pending 22
[  162.610054] NOHZ: local_softirq_pending 22
[  162.610279] NOHZ: local_softirq_pending 22
[  162.610502] NOHZ: local_softirq_pending 22

After realizing my mistake, I changed it to local_bh_enable() as in the
patch, and since then not a single BUG or local_softirq_pending message has
appeared in the system log; maybe my system is still waiting for that
condition to happen :).
Re: [patch] CFS scheduler, -v13
On Wednesday 23 May 2007 03:36:27 Bill Davidsen wrote:
> Anant Nitya wrote:
> > On Thursday 17 May 2007 23:15:33 Ingo Molnar wrote:
> >> i'm pleased to announce release -v13 of the CFS scheduler patchset.
> >>
> >> The CFS patch against v2.6.22-rc1, v2.6.21.1 or v2.6.20.10 can be
> >> downloaded from the usual place:
> >>
> >>    http://people.redhat.com/mingo/cfs-scheduler/
> >>
> >> -v13 is a fixes-only release. It fixes a smaller accounting bug, so if
> >> you saw small lags during desktop use under certain workloads then
> >> please re-check that workload under -v13 too. It also tweaks SMP
> >> load-balancing a bit. (Note: the load-balancing artifact reported by
> >> Peter Williams is not a CFS-specific problem and he reproduced it in
> >> v2.6.21 too. Nevertheless -v13 should be less prone to such artifacts.)
> >>
> >> I know about no open CFS regression at the moment, so please re-test
> >> -v13 and if you still see any problem please re-report it. Thanks!
> >>
> >> Changes since -v12:
> >>
> >>  - small tweak: made the "fork flow" of reniced tasks zero-sum
> >>
> >>  - debugging update: /proc/<pid>/sched is now seqfile based and echoing
> >>    0 to it clears the maximum-tracking counters.
> >>
> >>  - more debugging counters
> >>
> >>  - small rounding fix to make the statistical average of rounding
> >>    errors zero
> >>
> >>  - scale both the runtime limit and the granularity on SMP too, and
> >>    make it dependent on HZ
> >>
> >>  - misc cleanups
> >>
> >> As usual, any sort of feedback, bugreport, fix and suggestion is more
> >> than welcome,
> >>
> >>    Ingo
> >
> > Hi
> > I have been testing this version of CFS for the last hour or so and am
> > still facing the same lag problems while browsing sites with heavy JS
> > and/or flash usage. Mouse movement is pathetic and audio starts to
> > skip. I haven't faced this behavior with CFS up to v11.
>
> I'm not seeing this; do you have a site or two as examples?

Please disregard the above post; the lag problem I am experiencing was
introduced in 2.6.22-rcX, is network-QoS specific, and is not related to
CFS.

Regards
Ananitya
Re: bad networking related lag in v2.6.22-rc2
On Tuesday 22 May 2007 11:52:33 Ingo Molnar wrote:
> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> > > I think I already found the bug, please try if this patch helps.
> >
> > Sorry, but this patch is not helping here. I recompiled the kernel
> > with this patch but the same load pattern still makes the system
> > crawl.
> >
> > Here is the link for the script I use to shape traffic.
> >
> > http://cybertek.info/taitai/adslbwopt.sh
>
> could you also apply the fix for the softirq problem below, to make sure
> it does not interact?
>
> 	Ingo
>
> Index: linux/kernel/sched.c
> ===
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -4212,9 +4212,7 @@ int __sched cond_resched_softirq(void)
>  	BUG_ON(!in_softirq());
>
>  	if (need_resched() && system_state == SYSTEM_RUNNING) {
> -		raw_local_irq_disable();
> -		_local_bh_enable();
> -		raw_local_irq_enable();
> +		local_bh_enable();
>  		__cond_resched();
>  		local_bh_disable();
>  		return 1;

Hi Ingo

The above patch does solve the softirq_pending problem. I have been running
it on kernel 2.6.21.1 since yesterday, doing all kinds of things, and
haven't encountered a single "NOHZ: local_softirq_pending" message. But the
network lag I have been seeing since 2.6.22-rc1 is still there even with
this patch applied. If you need any more information please do ask.
Meanwhile I will do a git bisect, as suggested by Linus, to find the
specific commit that introduced this problem, and will let you know once I
find it. It's good to see the system running without any
local_softirq_pending messages :)

Regards
Ananitya
Re: bad networking related lag in v2.6.22-rc2
On Tuesday 22 May 2007 14:47:47 Patrick McHardy wrote:
> Anant Nitya wrote:
> >> Patrick McHardy wrote:
> >>
> >> I think I already found the bug, please try if this patch helps.
> >
> > Sorry, but this patch is not helping here. I recompiled the kernel
> > with this patch but the same load pattern still makes the system
> > crawl.
> >
> > Here is the link for the script I use to shape traffic.
> >
> > http://cybertek.info/taitai/adslbwopt.sh
>
> Thanks. Please also send the output of "tc -s -d qdisc show dev ppp0"
> and "tc -d -s class show dev ppp0" at the time the problem occurs.

Here it goes...

Regards
Ananitya

qdisc htb 1: r2q 1 default 50 direct_packets_stat 0 ver 3.17
 Sent 837184 bytes 3603 pkt (dropped 0, overlimits 60528154 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 4210: parent 1:10 limit 50p quantum 1492b flows 50/1024 perturb 10sec
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 4220: parent 1:20 limit 50p quantum 1492b flows 50/1024 perturb 10sec
 Sent 102922 bytes 2364 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 4230: parent 1:30 limit 64p quantum 1492b flows 64/1024 perturb 10sec
 Sent 12690 bytes 167 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 4240: parent 1:40 limit 128p quantum 1492b flows 128/1024 perturb 10sec
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 4250: parent 1:50 limit 64p quantum 1492b flows 64/1024 perturb 10sec
 Sent 714095 bytes 944 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 666: parent 1:666 limit 128p quantum 1492b flows 128/1024 perturb 10sec
 Sent 7477 bytes 128 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

class htb 1:10 parent 1:1 leaf 4210: prio 0 quantum 1000 rate 7000bit ceil 57000bit
 burst 1599b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 1785713 ctokens: 219297
class htb 1:1 root rate 64000bit ceil 64000bit
 burst 1599b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 7
 Sent 1533899 bytes 5926 pkt (dropped 0, overlimits 0 requeues 0)
 rate 74640bit 31pps backlog 0b 0p requeues 0
 lended: 1113 borrowed: 0 giants: 0
 tokens: -255221 ctokens: -255221
class htb 1:20 parent 1:1 leaf 4220: prio 1 quantum 3125 rate 25000bit ceil 57000bit
 burst 1599b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 171357 bytes 3931 pkt (dropped 0, overlimits 0 requeues 0)
 rate 7376bit 21pps backlog 0b 0p requeues 0
 lended: 3931 borrowed: 0 giants: 0
 tokens: 341211 ctokens: 150341
class htb 1:30 parent 1:1 leaf 4230: prio 4 quantum 1250 rate 1bit ceil 51000bit
 burst 1600b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 15578 bytes 205 pkt (dropped 0, overlimits 0 requeues 0)
 rate 336bit 0pps backlog 0b 0p requeues 0
 lended: 205 borrowed: 0 giants: 0
 tokens: 1137614 ctokens: 223153
class htb 1:40 parent 1:1 leaf 4240: prio 4 quantum 1000 rate 8000bit ceil 51000bit
 burst 1600b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 1562500 ctokens: 245097
class htb 1:50 parent 1:1 leaf 4250: prio 4 quantum 1000 rate 7000bit ceil 6bit
 burst 1599b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 1329025 bytes 1455 pkt (dropped 0, overlimits 0 requeues 0)
 rate 65384bit 6pps backlog 0b 0p requeues 0
 lended: 342 borrowed: 1113 giants: 0
 tokens: -2865704 ctokens: 1887
class htb 1:666 parent 1:1 leaf 666: prio 7 quantum 1492 rate 3000bit ceil 48000bit
 burst 1599b/8 mpu 0b overhead 0b cburst 1599b/8 mpu 0b overhead 0b level 0
 Sent 17939 bytes 335 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1104bit 2pps backlog 0b 0p requeues 0
 lended: 335 borrowed: 0 giants: 0
 tokens: 3937910 ctokens: 246505
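The lended/borrowed and tokens/ctokens counters in that output come from HTB's token-bucket accounting: a class earns tokens at its configured rate, and once it runs dry but is still under its ceil it borrows bandwidth from its parent. As a hedged, heavily simplified sketch (my own toy model, not the kernel's htb_dequeue logic; it ignores burst sizes, levels, and DRR quanta), this is roughly why class 1:50 shows borrowed: 1113 while the root class 1:1 shows lended: 1113:

```python
# Hedged toy model of HTB lend/borrow accounting (illustrative only; the
# real qdisc uses timed token buckets with burst, cburst, and class levels).
def send(cls, parent, nbytes):
    """Charge nbytes to cls; borrow from parent when cls is out of tokens."""
    if cls["tokens"] >= nbytes:
        cls["tokens"] -= nbytes       # within the class's own rate
        return "own-rate"
    if parent["tokens"] >= nbytes:    # still under ceil: borrow from parent
        parent["tokens"] -= nbytes
        cls["borrowed"] += 1
        parent["lended"] += 1
        return "borrowed"
    return "throttled"                # both buckets empty: wait for refill

parent = {"tokens": 3000, "lended": 0}
bulk = {"tokens": 1000, "borrowed": 0}
log = [send(bulk, parent, 1000) for _ in range(5)]
print(log)  # ['own-rate', 'borrowed', 'borrowed', 'borrowed', 'throttled']
```

In the real output the negative tokens on 1:1 and 1:50 indicate those buckets are currently overdrawn, i.e. the classes have been running at or above their configured rates.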
Re: bad networking related lag in v2.6.22-rc2
On Monday 21 May 2007 15:50:09 Ingo Molnar wrote:
> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> > Tcp:
> >     5 connections established
>
> hm, this does not explain the /proc/net/tcp overhead i think - although
> it could be a red herring. Will have a closer look at your new trace.
>
> if possible please try to generate the automatic softirq trace for
> Thomas, and then a separate trace for the firefox/net-lag thing, using
> trace-it-10sec.c. Btw., for the second trace, could you boot with
> maxcpus=1? That would make the second trace quite a bit more
> straightforward to analyze. You probably need both cpus to trigger the
> softirq problem.
>
> 	Ingo

Here is the link for the new trace with maxcpus=1.

http://cybertek.info/taitai/trace-it-10sec-to-ingo-with-maxcpus=1.bz2

Regards
Ananitya
Re: bad networking related lag in v2.6.22-rc2
On Tuesday 22 May 2007 03:00:31 Patrick McHardy wrote:
> Patrick McHardy wrote:
> > Ingo Molnar wrote:
> >> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> >>> I am posting links to the information you asked for. One more thing:
> >>> after digging a bit more I found it is the QoS shaping that is making
> >>> the box crawl. Once I disable the traffic shaping everything goes
> >>> back to smooth and normal. Shaping is being done on a very low speed
> >>> residential ADSL 256/64 Kbps connection. If you want me to post the
> >>> shaping rules, please feel free to ask. BTW it's a simple set of
> >>> HTB/SFQ rules.
> >>
> >> [...]
> >>
> >>> http://cybertek.info/taitai/trace-to-ingo.txt.bz2
> >>
> >> thanks! This trace indeed includes the smoking gun, htb_dequeue() and
> >> __qdisc_run():
> >>
> >> [..]
> >
> > This looks like fallout from the switch to hrtimers. Anant, please
> > send me your HTB script, I'll try to reproduce it.
>
> I think I already found the bug, please try if this patch helps.

Sorry, but this patch is not helping here. I recompiled the kernel with
this patch but the same load pattern still makes the system crawl.

Here is the link for the script I use to shape traffic.

http://cybertek.info/taitai/adslbwopt.sh

Regards
Ananitya
Re: bad networking related lag in v2.6.22-rc2
On Monday 21 May 2007 13:42:01 Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > > ouch! a nearly 1 second delay got observed by the scheduler -
> > > something is really killing your system!
> >
> > ah, you got the latency tracer from Thomas, as part of the -hrt
> > patchset - that makes it quite a bit easier to debug. [...]
>
> and ... you already did a trace for Thomas, for the softirq problem:
>
>    http://cybertek.info/taitai/trace.txt.bz2
>
> this trace shows really bad networking related kernel activities!
>
> gkrellm-5977 does this at timestamp 0:
>
>    gkrellm-5977  0..s.      0us : cond_resched_softirq (established_get_next)
>
> 2 milliseconds later it's still in established_get_next() (!):
>
>    gkrellm-5977  0..s.   2001us : cond_resched_softirq (established_get_next)
>
> and the whole thing takes ... 455 msecs:
>
>    gkrellm-5977  0..s. 455443us+: cond_resched_softirq (established_get_next)
>
> i think this suggests that you have tons of open sockets. What does
> "netstat -ts" say on your box?

I am posting links to the information you asked for. One more thing: after
digging a bit more I found it is the QoS shaping that is making the box
crawl. Once I disable the traffic shaping everything goes back to smooth
and normal. Shaping is being done on a very low speed residential ADSL
256/64 Kbps connection. If you want me to post the shaping rules, please
feel free to ask. BTW it's a simple set of HTB/SFQ rules.

http://cybertek.info/taitai/netstat-ts-before-crawl-normal-workload.txt
http://cybertek.info/taitai/netstat-ts-while-crawl-normal-workload.txt
http://cybertek.info/taitai/trace-to-ingo.txt.bz2

Regards
Ananitya

> 	Ingo
Re: [BUG] local_softirq_pending storm
On Monday 21 May 2007 14:31:57 Thomas Gleixner wrote:
> On Mon, 2007-05-21 at 11:52 +0530, Anant Nitya wrote:
> > > You should find something like:
> > >
> > >   ( swapper-0|#0): new 67173 us user-latency.
> > >
> > > along with the familiar "NOHZ .." message in your log file.
> > >
> > > Once that happened please do:
> > >
> > >   $ cat /proc/latency_trace >trace.txt
> > >
> > > compress it and send it to me along with the full dmesg output or
> > > put both up to some place, where I can download it.
> >
> > Hi Thomas
> >
> > Here are the links...
> > http://cybertek.info/taitai/dmesg-2.6.22.rc2.hrt2-1.SMP.DN.LINUX.txt
> > http://cybertek.info/taitai/trace.txt.bz2
>
> Thanks. Sorry, I need more info. I uploaded a new tracer.diff to
>
> http://www.tglx.de/private/tglx/ht-debug/tracer.diff
>
> Can you please revert the first one and retest with the new one ?
>
> Thanks,

Sorry for the delay; here is the link for the output from the new tracer.

http://cybertek.info/taitai/trace-new.txt.bz2

Regards
Ananitya

> 	tglx
Re: bad networking related lag in v2.6.22-rc2
On Monday 21 May 2007 13:42:01 Ingo Molnar wrote:
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> > > ouch! a nearly 1 second delay got observed by the scheduler -
> > > something is really killing your system!
> >
> > ah, you got the latency tracer from Thomas, as part of the -hrt
> > patchset - that makes it quite a bit easier to debug. [...]
>
> and ... you already did a trace for Thomas, for the softirq problem:
>
>    http://cybertek.info/taitai/trace.txt.bz2
>
> this trace shows really bad networking related kernel activities!
>
> gkrellm-5977 does this at timestamp 0:
>
>    gkrellm-5977  0..s.      0us : cond_resched_softirq (established_get_next)
>
> 2 milliseconds later it's still in established_get_next() (!):
>
>    gkrellm-5977  0..s.   2001us : cond_resched_softirq (established_get_next)
>
> and the whole thing takes ... 455 msecs:
>
>    gkrellm-5977  0..s. 455443us+: cond_resched_softirq (established_get_next)
>
> i think this suggests that you have tons of open sockets. What does
> "netstat -ts" say on your box?

On 2.6.21.1, doing normal work while seeding a few torrents, "netstat -ts"
produces the output below. I will send you the same information for
2.6.22-rc2 after a reboot.

Regards
Ananitya

> 	Ingo

Tcp:
    1233 active connections openings
    845 passive connection openings
    9 failed connection attempts
    164 connection resets received
    5 connections established
    44995 segments received
    43171 segments send out
    183 segments retransmited
    0 bad segments received.
    192 resets sent
UdpLite:
TcpExt:
    696 TCP sockets finished time wait in fast timer
    3273 delayed acks sent
    12 delayed acks further delayed because of locked socket
    Quick ack mode was activated 38 times
    4867 packets directly queued to recvmsg prequeue.
    31660 packets directly received from backlog
    6759887 packets directly received from prequeue
    12038 packets header predicted
    3228 packets header predicted and directly queued to user
    4795 acknowledgments not containing data received
    12680 predicted acknowledgments
    15 times recovered from packet loss due to SACK data
    4 congestion windows recovered after partial ack
    12 TCP data loss events
    5 timeouts after SACK recovery
    8 timeouts in loss state
    20 fast retransmits
    25 retransmits in slow start
    65 other TCP timeouts
    5 sack retransmits failed
    3 times receiver scheduled too late for direct processing
    16 DSACKs sent for old packets
    152 connections reset due to unexpected data
    5 connections reset due to early user close
Re: [BUG] local_softirq_pending storm
On Monday 21 May 2007 14:31:57 Thomas Gleixner wrote:
> On Mon, 2007-05-21 at 11:52 +0530, Anant Nitya wrote:
> > > You should find something like:
> > >
> > >   ( swapper-0|#0): new 67173 us user-latency.
> > >
> > > along with the familiar "NOHZ .." message in your log file.
> > >
> > > Once that happened please do:
> > >
> > >   $ cat /proc/latency_trace >trace.txt
> > >
> > > compress it and send it to me along with the full dmesg output or
> > > put both up to some place, where I can download it.
> >
> > Hi Thomas
> >
> > Here are the links...
> > http://cybertek.info/taitai/dmesg-2.6.22.rc2.hrt2-1.SMP.DN.LINUX.txt
> > http://cybertek.info/taitai/trace.txt.bz2
>
> Thanks. Sorry, I need more info. I uploaded a new tracer.diff to
>
> http://www.tglx.de/private/tglx/ht-debug/tracer.diff
>
> Can you please revert the first one and retest with the new one ?

Okay, sure, compiling now.

Regards
Ananitya

> Thanks,
>
> 	tglx
Re: [BUG] local_softirq_pending storm
On Monday 21 May 2007 03:13:08 Thomas Gleixner wrote:
> On Sun, 2007-05-20 at 02:53 +0530, Anant Nitya wrote:
> > > 1 == TASK_INTERRUPTIBLE, so we know that ksoftirqd was not woken up.
> > > At least it is not a scheduler problem.
> > >
> > > I work out a more complex debug patch and pester you to test once
> > > I'm done.
> >
> > No problem :)
>
> You asked for it :)
>
> Please patch 2.6.22-rc2 with
>
> http://tglx.de/projects/hrtimers/2.6.22-rc2/patch-2.6.22-rc2-hrt2.patch
> and
> http://www.tglx.de/private/tglx/ht-debug/tracer.diff
>
> Compile it with the config
>
> http://www.tglx.de/private/tglx/ht-debug/config.debug
>
> You should find something like:
>
> ( swapper-0|#0): new 67173 us user-latency.
>
> along with the familiar "NOHZ .." message in your log file.
>
> Once that happened please do:
>
> $ cat /proc/latency_trace >trace.txt
>
> compress it and send it to me along with the full dmesg output or put
> both up to some place, where I can download it.

Hi Thomas

Here are the links...

http://cybertek.info/taitai/dmesg-2.6.22.rc2.hrt2-1.SMP.DN.LINUX.txt
http://cybertek.info/taitai/trace.txt.bz2

Regards
Ananitya

> Michal,
>
> IIRC you encountered the same P4/HT related wreckage. Can you do the
> same ?
>
> Thanks,
>
> 	tglx
Re: [BUG] local_softirq_pending storm
On Sunday 20 May 2007 00:41:08 Thomas Gleixner wrote:
> On Sat, 2007-05-19 at 15:25 +0530, Anant Nitya wrote:
> > > No idea. I uploaded a debug patch against 2.6.22-rc1 to
> > >
> > > http://www.tglx.de/private/tglx/2.6.22-rc1-hrt-debug.patch
> > >
> > > Can you give it a try and report the output ?
> >
> > Hi
> > Here it goes
> > [  159.646196] NOHZ softirq pending 22 on CPU 0
> > [  159.646207] task state: 1
>
> 1 == TASK_INTERRUPTIBLE, so we know that ksoftirqd was not woken up. At
> least it is not a scheduler problem.
>
> I work out a more complex debug patch and pester you to test once I'm
> done.

No problem :)

Regards
Ananitya

> 	tglx
Re: [patch] CFS scheduler, -v13
On Friday 18 May 2007 15:56:07 Ingo Molnar wrote:
> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > I have been testing this version of CFS for the last hour or so and am
> > still facing the same lag problems while browsing sites with heavy JS
> > and/or flash usage. Mouse movement is pathetic and audio starts to
> > skip. I haven't faced this behavior with CFS up to v11.
>
> i have just tried 5 different versions of the Flash plugin and i cannot
> reproduce this (flash games are still smooth and acceptable even with
> the system significantly overloaded with 5 infinite loops or with a
> kernel build), so it would be nice if you could help me debug this
> problem.
>
> The last version that worked for you was v11, correct? The biggest v11
> -> v12 change was the yield workaround, and while testing your workload
> i also noticed that all Flash versions except the latest one (9.0 r31)
> use sys_sched_yield() quite frequently. So it would be nice to know
> which plugin version you are using (and which Firefox version): you can
> check that by typing about:plugins into firefox. Furthermore, could you
> also try the following tune:
>
>    echo 0 > /proc/sys/kernel/sched_yield_bug_workaround
>
> and this:
>
>    echo 2 > /proc/sys/kernel/sched_yield_bug_workaround
>
> if none of this changes behavior then please send me the output of the
> following:
>
>    strace -ttt -TTT -o strace.txt -f -p `pidof firefox-bin`
>    < reproduce the lag in firefox >
>    < Ctrl-C the strace >
>
> and send me the strace.txt file (off-line, it's going to be large).
> Thanks,

Hi Ingo,

Please ignore my last report about the lag problem while using CFS-v13: it
is working perfectly fine with 2.6.21.1, and the lag I used to see with v12
is not there with v13 anymore. After digging in a bit I found that the
problem only occurs on 2.6.22-rc1 and is triggered by network usage while
transmitting data upstream.

I don't have any evidence that CFS is involved in the lag problem, since
2.6.22-rc1 with the stock scheduler has the same lag. It seems directly
proportional to upstream speed, while downstream doesn't show any
misbehavior (at lower upstream speeds the lag is smaller, but at higher
upstream speeds the system starts crawling, with the system load hitting
70/75). Let's see how 2.6.22-rc2 does.

Regards
Ananitya

> 	Ingo
Re: [BUG] local_softirq_pending storm
On Friday 18 May 2007 18:31:17 Thomas Gleixner wrote:
> On Thu, 2007-05-17 at 12:11 +0530, Anant Nitya wrote:
> > On Friday 11 May 2007 03:28:46 Thomas Gleixner wrote:
> > > Ok, that's consistent with earlier reports. The problem surfaces
> > > when one of the SMT-"cpus" goes idle. The problem goes away when you
> > > disable hyperthreading.
> >
> > Yes, with HT disabled in the BIOS there are no local_softirq_pending
> > messages. BTW why does this problem persist only with X ?
>
> No idea. I uploaded a debug patch against 2.6.22-rc1 to
>
> http://www.tglx.de/private/tglx/2.6.22-rc1-hrt-debug.patch
>
> Can you give it a try and report the output ?

Hi

Here it goes:

[  159.646196] NOHZ softirq pending 22 on CPU 0
[  159.646207] task state: 1
[  159.646217] last caller: __tasklet_schedule
[  159.646997] NOHZ softirq pending 22 on CPU 0
[  159.647006] task state: 1
[  159.647013] last caller: __tasklet_schedule
[  159.647398] NOHZ softirq pending 22 on CPU 0
[  159.647405] task state: 1
[  159.647412] last caller: __tasklet_schedule
[  159.647768] NOHZ softirq pending 22 on CPU 0
[  159.647775] task state: 1
[  159.647781] last caller: __tasklet_schedule
[  166.285664] NOHZ softirq pending 22 on CPU 0
[  166.285675] task state: 1
[  166.285687] last caller: raise_softirq
[  166.286321] NOHZ softirq pending 22 on CPU 0
[  166.286329] task state: 1
[  166.286337] last caller: raise_softirq
[  166.286715] NOHZ softirq pending 22 on CPU 0
[  166.286722] task state: 1
[  166.286729] last caller: raise_softirq
[  166.287085] NOHZ softirq pending 22 on CPU 0
[  166.287092] task state: 1
[  166.287098] last caller: raise_softirq
[  171.512134] NOHZ softirq pending 22 on CPU 0
[  171.512144] task state: 1
[  171.512154] last caller: __tasklet_schedule
[  171.512712] NOHZ softirq pending 22 on CPU 0
[  171.512720] task state: 1
[  171.512727] last caller: __tasklet_schedule

Regards
Ananitya
Re: [BUG] local_softirq_pending storm
On Friday 18 May 2007 18:31:17 Thomas Gleixner wrote:
> On Thu, 2007-05-17 at 12:11 +0530, Anant Nitya wrote:
> > On Friday 11 May 2007 03:28:46 Thomas Gleixner wrote:
> > > Ok, that's consistent with earlier reports. The problem surfaces when
> > > one of the SMT-"cpus" goes idle. The problem goes away when you disable
> > > hyperthreading.
> >
> > Yes, with HT disabled in the BIOS there are no local_softirq_pending
> > messages. BTW, why does this problem persist only with X?
>
> No idea. I uploaded a debug patch against 2.6.22-rc1 to
>
>     http://www.tglx.de/private/tglx/2.6.22-rc1-hrt-debug.patch
>
> Can you give it a try and report the output ?

I am compiling the kernel with the above patch applied and will post the
results.

> > > When you apply the ratelimit patch, does the softlockup problem
> > > persist ?
> >
> > Yes, though the softlockup is rare and mostly hit when the system is
> > under high load. Apart from that, I am also getting the following
> > messages consistently across multiple boot cycles with NOHZ=y and the
> > ratelimit patch applied.
> >
> > May 15 11:51:22 rudra kernel: [ 2594.341068] Clocksource tsc unstable
> > (delta = 28111260302 ns)
> > May 15 11:51:22 rudra kernel: [ 2594.343194] Time: acpi_pm clocksource
> > has been installed.
>
> That's informational. The TSC is detected to be unstable and replaced by
> the pm timer. Nothing to worry about. It happens with NOHZ=n as well,
> right ?

No, it doesn't appear with nohz=off; so far, across many boot cycles, it
has only shown up with NOHZ enabled.

Regards
Ananitya
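For context on the quoted "Clocksource tsc unstable" line: the delta is the observed offset between the TSC and the watchdog clocksource, in nanoseconds. A quick back-of-the-envelope check (plain arithmetic, not kernel code) shows why the kernel gave up on the TSC:

```python
# The "Clocksource tsc unstable" message reports how far the TSC drifted
# from the watchdog clocksource before it was declared unusable.
delta_ns = 28_111_260_302  # value taken from the quoted log line

# Convert nanoseconds to seconds: ~28 s of drift is enormous for a
# clocksource, so the kernel falls back to acpi_pm.
delta_s = delta_ns / 1_000_000_000
print(f"TSC drift: {delta_s:.2f} s")  # → TSC drift: 28.11 s
```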
Re: [patch] CFS scheduler, -v13
On Friday 18 May 2007 15:56:07 Ingo Molnar wrote:
> * Anant Nitya <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > Been testing this version of CFS for the last hour or so and still
> > facing the same lag problems while browsing sites with heavy JS and/or
> > Flash usage. Mouse movement is pathetic and audio starts to skip. I
> > hadn't seen this behavior with CFS up to v11.
>
> i have just tried 5 different versions of the Flash plugin and i cannot
> reproduce this (flash games are still smooth and acceptable even with
> the system significantly overloaded with 5 infinite loops or with a
> kernel build), so it would be nice if you could help me debug this
> problem.
>
> The last version that worked for you was v11, correct? The biggest v11
> -> v12 change was the yield workaround, and while testing your workload
> i also noticed that all Flash versions except the latest one (9.0 r31)
> use sys_sched_yield() quite frequently. So it would be nice to know
> which plugin version you are using (and which Firefox version): you can
> check that by typing about:plugins into firefox. Furthermore, could you
> also try the following tune:

Hi,

I am using Konqueror, and about:plugins gives back this information
regarding the Flash player:

Shockwave Flash
Shockwave Flash 9.0 r31
libflashplayer.so
application/x-shockwave-flash - Shockwave Flash (swf)
application/futuresplash - FutureSplash Player (spl)

>    echo 0 > /proc/sys/kernel/sched_yield_bug_workaround
>
> and this:
>
>    echo 2 > /proc/sys/kernel/sched_yield_bug_workaround

These values do visibly make browsing smoother, but it still lags, though
the lag is shorter than with the original values.

> if none of this changes the behavior then please send me the output of
> the following:
>
>   strace -ttt -TTT -o strace.txt -f -p `pidof firefox-bin`
>   < reproduce the lag in firefox >
>   < Ctrl-C the strace >
>
> and send me the strace.txt file (off-line, it's going to be large).
> Thanks,

I am sending you all this information off-list.

Regards
Ananitya
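Ingo's observation about older Flash plugins hammering sys_sched_yield() can be illustrated with a small sketch of my own (not code from the thread): a busy-wait loop built on sched_yield issues an enormous number of syscalls while giving the scheduler almost no useful information, which is the pattern the sched_yield_bug_workaround knob was meant to paper over. This uses Python's os.sched_yield(), available on Linux:

```python
import os
import time

def yield_spin(duration_s: float) -> int:
    """Busy-wait for duration_s seconds, calling sched_yield() on every
    iteration, and return how many yields were issued."""
    calls = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        os.sched_yield()  # same syscall the old Flash plugins used heavily
        calls += 1
    return calls

if __name__ == "__main__":
    # Even in a fraction of a second a yield loop makes a large number of
    # syscalls -- each one forces a scheduling decision under load.
    print(yield_spin(0.05))
```

How expensive such a loop is depends entirely on how the scheduler treats yielding tasks, which is exactly why the v11 -> v12 yield-workaround change could alter interactive feel.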
Re: [patch] CFS scheduler, -v13
On Thursday 17 May 2007 23:15:33 Ingo Molnar wrote:
> i'm pleased to announce release -v13 of the CFS scheduler patchset.
>
> The CFS patch against v2.6.22-rc1, v2.6.21.1 or v2.6.20.10 can be
> downloaded from the usual place:
>
>   http://people.redhat.com/mingo/cfs-scheduler/
>
> -v13 is a fixes-only release. It fixes a smaller accounting bug, so if
> you saw small lags during desktop use under certain workloads then
> please re-check that workload under -v13 too. It also tweaks SMP
> load-balancing a bit. (Note: the load-balancing artifact reported by
> Peter Williams is not a CFS-specific problem and he reproduced it in
> v2.6.21 too. Nevertheless -v13 should be less prone to such artifacts.)
>
> I know about no open CFS regression at the moment, so please re-test
> -v13 and if you still see any problem please re-report it. Thanks!
>
> Changes since -v12:
>
>  - small tweak: made the "fork flow" of reniced tasks zero-sum
>
>  - debugging update: /proc/<pid>/sched is now seqfile based and echoing
>    0 to it clears the maximum-tracking counters.
>
>  - more debugging counters
>
>  - small rounding fix to make the statistical average of rounding errors
>    zero
>
>  - scale both the runtime limit and the granularity on SMP too, and make
>    it dependent on HZ
>
>  - misc cleanups
>
> As usual, any sort of feedback, bugreport, fix and suggestion is more
> than welcome,
>
>   Ingo

Hi,

Been testing this version of CFS for the last hour or so and still facing
the same lag problems while browsing sites with heavy JS and/or Flash
usage. Mouse movement is pathetic and audio starts to skip. I hadn't seen
this behavior with CFS up to v11.

Regards
Ananitya
Re: [BUG] local_softirq_pending storm
On Friday 11 May 2007 03:28:46 Thomas Gleixner wrote:
> Ok, that's consistent with earlier reports. The problem surfaces when
> one of the SMT-"cpus" goes idle. The problem goes away when you disable
> hyperthreading.

Yes, with HT disabled in the BIOS there are no local_softirq_pending
messages. BTW, why does this problem persist only with X?

> When you apply the ratelimit patch, does the softlockup problem
> persist ?

Yes, though the softlockup is rare and mostly hit when the system is under
high load. Apart from that, I am also getting the following messages
consistently across multiple boot cycles with NOHZ=y and the ratelimit
patch applied.

May 15 11:51:22 rudra kernel: [ 2594.341068] Clocksource tsc unstable (delta = 28111260302 ns)
May 15 11:51:22 rudra kernel: [ 2594.343194] Time: acpi_pm clocksource has been installed.
[BUG] local_softirq_pending storm
Hi,

Ever since I upgraded to 2.6.21/.1, the system log has been filled with the
following messages whenever I enable CONFIG_NO_HZ=y. Going through the
archives it seems Ingo posted a patch for this some time back and it is now
upstream, but it's not helping here. If I disable NOHZ on the kernel
command line with nohz=off, the problem disappears. The system is a
P4/2.40GHz/HT with SMP/SMT enabled in the kernel config. One more thing I
noticed is that the problem only arises while using X or the network; a
plain command line with no network access doesn't trigger it with nohz=on.
Please ask if more information about this setup is required.

[EMAIL PROTECTED] [~]$ >> grep NOHZ /var/log/messages
May 8 03:38:14 rudra kernel: [ 419.271195] NOHZ: local_softirq_pending 02
May 8 03:38:14 rudra kernel: [ 419.271663] NOHZ: local_softirq_pending 02
May 8 03:38:14 rudra kernel: [ 419.343948] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344236] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344397] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344545] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344691] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344842] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.344991] NOHZ: local_softirq_pending 22
May 8 03:38:14 rudra kernel: [ 419.345137] NOHZ: local_softirq_pending 22
May 8 03:38:18 rudra kernel: [ 423.065780] NOHZ: local_softirq_pending 22
May 8 03:38:18 rudra kernel: [ 423.066206] NOHZ: local_softirq_pending 22
May 8 03:38:18 rudra kernel: [ 423.066509] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.006549] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.006983] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.007239] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.007473] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.007706] NOHZ: local_softirq_pending 22
May 8 03:38:19 rudra kernel: [ 424.007941] NOHZ: local_softirq_pending 22
May 8 03:38:22 rudra kernel: [ 426.862456] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.331619] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.331991] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.332192] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.332378] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.332553] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.332740] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.332914] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.333097] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.333271] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.333443] NOHZ: local_softirq_pending 22
May 8 03:38:23 rudra kernel: [ 428.333619] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.261574] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.262024] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.262339] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.262610] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.262847] NOHZ: local_softirq_pending 22
May 8 03:38:24 rudra kernel: [ 429.263081] NOHZ: local_softirq_pending 22
May 8 03:38:35 rudra kernel: [ 440.182998] NOHZ: local_softirq_pending 22
May 8 03:38:35 rudra kernel: [ 440.183408] NOHZ: local_softirq_pending 22
May 8 03:38:35 rudra kernel: [ 440.183661] NOHZ: local_softirq_pending 22
May 8 03:38:43 rudra kernel: [ 448.272087] NOHZ: local_softirq_pending 22
May 8 03:38:43 rudra kernel: [ 448.272529] NOHZ: local_softirq_pending 22
May 8 03:38:44 rudra kernel: [ 449.223360] NOHZ: local_softirq_pending 22
May 8 03:38:44 rudra kernel: [ 449.223887] NOHZ: local_softirq_pending 22
May 8 03:38:44 rudra kernel: [ 449.224570] NOHZ: local_softirq_pending 22
May 8 03:38:44 rudra kernel: [ 449.225066] NOHZ: local_softirq_pending 22
May 8 03:38:44 rudra kernel: [ 449.232989] NOHZ: local_softirq_pending 22
May 8 03:38:47 rudra kernel: [ 452.178583] NOHZ: local_softirq_pending a2
May 8 03:38:47 rudra kernel: [ 452.179017] NOHZ: local_softirq_pending a2
May 8 03:38:47 rudra kernel: [ 452.179257] NOHZ: local_softirq_pending a2
May 8 03:38:51 rudra kernel: [ 455.957968] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.958462] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.958741] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.958984] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.959292] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.959540] NOHZ: local_softirq_pending 22
May 8 03:38:51 rudra kernel: [ 455.959774] NOHZ: local_softirq_pending 22
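A recurring question in this thread (also asked about the "08" value earlier) is what the hex number in these messages means: it is not an IRQ number but a bitmask of pending softirqs. The following decoder is my own helper sketch, not kernel code; the names for bits 0-5 follow the long-stable softirq numbering (HI=0, TIMER=1, NET_TX=2, NET_RX=3, BLOCK=4, TASKLET=5), while higher bits vary between kernel versions, so they are printed generically:

```python
# Decode the hex bitmask from "NOHZ: local_softirq_pending XX" messages.
# Bit positions 0-5 have been stable across kernel versions; higher bits
# differ between versions, so they are reported as plain bit numbers.
SOFTIRQ_NAMES = {
    0: "HI",
    1: "TIMER",
    2: "NET_TX",
    3: "NET_RX",
    4: "BLOCK",
    5: "TASKLET",
}

def decode_pending(mask: int) -> list[str]:
    """Return the names of the softirqs set in a pending bitmask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(SOFTIRQ_NAMES.get(bit, f"bit{bit}"))
        bit += 1
    return names

# The masks seen in this thread:
print(decode_pending(0x02))  # → ['TIMER']
print(decode_pending(0x22))  # → ['TIMER', 'TASKLET']
print(decode_pending(0x08))  # → ['NET_RX']
print(decode_pending(0xA2))  # → ['TIMER', 'TASKLET', 'bit7']
```

The 22 mask is consistent with the "last caller" lines in the debug output above: __tasklet_schedule raises the TASKLET softirq, and the timer softirq is raised on every tick.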