Re: CPU load
On Mon, 26 Feb 2007 13:42:50 +0300 (MSK) malc wrote:
> On Mon, 26 Feb 2007, Pavel Machek wrote:
>> Hi!
>>
>>>> [..snip..]
>>>> The current situation ought to be documented. Better yet some flag can
>>>
>>> It probably _is_ documented, somewhere :-). If you find a nice place
>>> where to document it (top manpage?) go ahead with the patch.
>>
>>> How about this:
>>
>> Looks okay to me. (You should probably add your name to it, and I do
>> not like html-like markup... plus please don't add extra spaces
>> between words)...
>
> Thanks. html-like markup was added to clearly mark the boundaries of
> the message and the text. Extra spaces courtesy of emacs' C-0 M-q.
>
>> You probably want to send it to akpm?
>
> Any pointers on how to do that and perhaps the preferred submission
> format?
>
> [..snip..]

Well, he wrote it up and posted it at
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: CPU load
On Mon, 26 Feb 2007, Pavel Machek wrote:
> Hi!
>
>>> [..snip..]
>>>> The current situation ought to be documented. Better yet some flag can
>>>
>>> It probably _is_ documented, somewhere :-). If you find a nice place
>>> where to document it (top manpage?) go ahead with the patch.
>>
>> How about this:
>
> Looks okay to me. (You should probably add your name to it, and I do
> not like html-like markup... plus please don't add extra spaces
> between words)...

Thanks. html-like markup was added to clearly mark the boundaries of
the message and the text. Extra spaces courtesy of emacs' C-0 M-q.

> You probably want to send it to akpm?

Any pointers on how to do that and perhaps the preferred submission
format?

[..snip..]

--
vale
Re: CPU load
Hi!

> [..snip..]
>>> The current situation ought to be documented. Better yet some flag
>>> can
>>
>> It probably _is_ documented, somewhere :-). If you find a nice place
>> where to document it (top manpage?) go ahead with the patch.
>
> How about this:

Looks okay to me. (You should probably add your name to it, and I do
not like html-like markup... plus please don't add extra spaces
between words)...

You probably want to send it to akpm?

								Pavel

> CPU load
> --------
>
> Linux exports various bits of information via `/proc/stat' and
> `/proc/uptime' that userland tools, such as top(1), use to calculate
> the average time the system spent in a particular state, for example:
>
>     $ iostat
>     Linux 2.6.18.3-exp (linmac)     02/20/2007
>
>     avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              10.01    0.00    2.92    5.44    0.00   81.63
>     ...
>
> Here the system thinks that over the default sampling period the
> system spent 10.01% of the time doing work in user space, 2.92% in
> the kernel, and was overall 81.63% of the time idle.
>
> In most cases the `/proc/stat' information reflects reality quite
> closely, however, due to the nature of how/when the kernel collects
> this data, sometimes it cannot be trusted at all.
>
> So how is this information collected? Whenever a timer interrupt is
> signalled, the kernel looks at what kind of task was running at that
> moment and increments the counter that corresponds to this task's
> kind/state. The problem with this is that the system could have
> switched between various states multiple times between two timer
> interrupts, yet the counter is incremented only for the last state.
> Example
> -------
>
> If we imagine the system with one task that periodically burns cycles
> in the following manner:
>
>     time line between two timer interrupts
>     |------------------------------------|
>      ^                                  ^
>      |_ something begins working        |
>                                         |_ something goes to sleep
>                                            (only to be awakened quite soon)
>
> In the above situation the system will be 0% loaded according to
> `/proc/stat' (since the timer interrupt will always happen when the
> system is executing the idle handler), but in reality the load is
> closer to 99%.
>
> One can imagine many more situations where this behavior of the
> kernel will lead to quite erratic information inside `/proc/stat'.
>
> /* gcc -o hog smallhog.c */
> /* Header names reconstructed; the archive stripped the <...> parts. */
> #include <limits.h>
> #include <signal.h>
> #include <stddef.h>
> #include <sys/time.h>
> #define HIST 10
>
> static volatile sig_atomic_t stop;
>
> static void sighandler (int signr)
> {
>     (void) signr;
>     stop = 1;
> }
>
> static unsigned long hog (unsigned long niters)
> {
>     stop = 0;
>     while (!stop && --niters);
>     return niters;
> }
>
> int main (void)
> {
>     int i;
>     struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
>                             .it_value = { .tv_sec = 0, .tv_usec = 1 } };
>     sigset_t set;
>     unsigned long v[HIST];
>     double tmp = 0.0;
>     unsigned long n;
>
>     signal (SIGALRM, &sighandler);
>     setitimer (ITIMER_REAL, &it, NULL);
>
>     hog (ULONG_MAX);
>     for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
>     for (i = 0; i < HIST; ++i) tmp += v[i];
>     tmp /= HIST;
>     n = tmp - (tmp / 3.0);
>
>     sigemptyset (&set);
>     sigaddset (&set, SIGALRM);
>
>     for (;;) {
>         hog (n);
>         sigwait (&set, &i);
>     }
>     return 0;
> }
>
> References
> ----------
>
> http://lkml.org/lkml/2007/2/12/6
> Documentation/filesystems/proc.txt (1.8)

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: CPU load
On Wed, 14 Feb 2007, Pavel Machek wrote:
> Hi!
>
>> [..snip..]
>>> The current situation ought to be documented. Better yet some flag can
>>
>> It probably _is_ documented, somewhere :-). If you find a nice place
>> where to document it (top manpage?) go ahead with the patch.

How about this:

CPU load
--------

Linux exports various bits of information via `/proc/stat' and
`/proc/uptime' that userland tools, such as top(1), use to calculate
the average time the system spent in a particular state, for example:

    $ iostat
    Linux 2.6.18.3-exp (linmac)     02/20/2007

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
             10.01    0.00    2.92    5.44    0.00   81.63
    ...

Here the system thinks that over the default sampling period the
system spent 10.01% of the time doing work in user space, 2.92% in the
kernel, and was overall 81.63% of the time idle.

In most cases the `/proc/stat' information reflects reality quite
closely, however, due to the nature of how/when the kernel collects
this data, sometimes it cannot be trusted at all.

So how is this information collected? Whenever a timer interrupt is
signalled, the kernel looks at what kind of task was running at that
moment and increments the counter that corresponds to this task's
kind/state. The problem with this is that the system could have
switched between various states multiple times between two timer
interrupts, yet the counter is incremented only for the last state.

Example
-------

If we imagine the system with one task that periodically burns cycles
in the following manner:

    time line between two timer interrupts
    |------------------------------------|
     ^                                  ^
     |_ something begins working        |
                                        |_ something goes to sleep
                                           (only to be awakened quite soon)

In the above situation the system will be 0% loaded according to
`/proc/stat' (since the timer interrupt will always happen when the
system is executing the idle handler), but in reality the load is
closer to 99%.

One can imagine many more situations where this behavior of the kernel
will lead to quite erratic information inside `/proc/stat'.
/* gcc -o hog smallhog.c */
/* Header names reconstructed; the archive stripped the <...> parts. */
#include <limits.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#define HIST 10

static volatile sig_atomic_t stop;

static void sighandler (int signr)
{
    (void) signr;
    stop = 1;
}

static unsigned long hog (unsigned long niters)
{
    stop = 0;
    while (!stop && --niters);
    return niters;
}

int main (void)
{
    int i;
    struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                            .it_value = { .tv_sec = 0, .tv_usec = 1 } };
    sigset_t set;
    unsigned long v[HIST];
    double tmp = 0.0;
    unsigned long n;

    signal (SIGALRM, &sighandler);
    setitimer (ITIMER_REAL, &it, NULL);

    hog (ULONG_MAX);
    for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
    for (i = 0; i < HIST; ++i) tmp += v[i];
    tmp /= HIST;
    n = tmp - (tmp / 3.0);

    sigemptyset (&set);
    sigaddset (&set, SIGALRM);

    for (;;) {
        hog (n);
        sigwait (&set, &i);
    }
    return 0;
}

References
----------

http://lkml.org/lkml/2007/2/12/6
Documentation/filesystems/proc.txt (1.8)

--
vale
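The 0%-vs-99% scenario in the Example section can be checked with a little arithmetic. Below is an illustrative simulation of tick-based sampling of a periodic task; it is not part of the proposed documentation, and all names in it are invented for this sketch:

```c
#include <stddef.h>

/* Counts how many of the sampling instants t = tick, 2*tick, ...,
 * nticks*tick fall inside the busy window of a periodic task that is
 * busy on [k*period + off, k*period + off + busy) for every integer k.
 * This models the kernel charging a whole tick to whatever it sees at
 * the tick instant. */
static size_t sampled_busy_ticks(double tick, size_t nticks,
                                 double period, double off, double busy)
{
    size_t hits = 0;
    for (size_t i = 1; i <= nticks; ++i) {
        double t = i * tick;
        /* position of the sampling instant within the task's period */
        double phase = t - (double)(long)(t / period) * period;
        if (phase >= off && phase < off + busy)
            ++hits;
    }
    return hits;
}
```

A task that is busy 98% of every tick-long period but sleeps across the tick instant (off = 0.01) is sampled as 0% busy, while the same work phase-shifted to cover the tick instants is sampled as 100% busy, exactly the erratic behaviour the text describes.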
Re: CPU load
Hi!

>>>> I have (had?) code that 'exploits' this. I believe I could eat 90%
>>>> of cpu without being noticed.
>>>
>>> Slightly changed version of hog (around 3 lines in total changed)
>>> does that easily on 2.6.18.3 on PPC.
>>>
>>> http://www.boblycat.org/~malc/apc/load-hog-ppc.png
>>
>> I guess it's worth mentioning this is _only_ about displaying the cpu
>> usage to userspace, as the cpu scheduler knows the accounting of each
>> task in different ways. This behaviour can not be used to exploit the
>> cpu scheduler into a starvation situation. Using the discrete per
>> process accounting to accumulate the displayed values to userspace
>> would fix this problem, but would be expensive.
>
> Guess you are right, but, once again, the problem is not so much about
> fooling the system to do something or other, but confusing the user:
>
> a. Everything is fine - the load is 0%, the fact that the system is
>    overheating and/or that some processes do not do as much as they
>    could is probably due to the bad hardware.
>
> b. The weird load pattern must be the result of bugs in my code.
>    (And then a whole lot of time/effort is poured into fixing the
>    problem which is simply not there)
>
> The current situation ought to be documented. Better yet some flag
> can

It probably _is_ documented, somewhere :-). If you find a nice place
where to document it (top manpage?) go ahead with the patch.

> be introduced somewhere in the system so that it exports real values
> to /proc, not the estimations that are inaccurate in some cases (like
> hog)

Patch would be welcome, but I do not think it will be easy.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: CPU load
On Wednesday 14 February 2007 18:28, malc wrote:
> On Wed, 14 Feb 2007, Con Kolivas wrote:
>> On Wednesday 14 February 2007 09:01, malc wrote:
>>> On Mon, 12 Feb 2007, Pavel Machek wrote:
>>>> Hi!
>
> [..snip..]
>
>>>> I have (had?) code that 'exploits' this. I believe I could eat 90%
>>>> of cpu without being noticed.
>>>
>>> Slightly changed version of hog (around 3 lines in total changed)
>>> does that easily on 2.6.18.3 on PPC.
>>>
>>> http://www.boblycat.org/~malc/apc/load-hog-ppc.png
>>
>> I guess it's worth mentioning this is _only_ about displaying the cpu
>> usage to userspace, as the cpu scheduler knows the accounting of each
>> task in different ways. This behaviour can not be used to exploit the
>> cpu scheduler into a starvation situation. Using the discrete per
>> process accounting to accumulate the displayed values to userspace
>> would fix this problem, but would be expensive.
>
> Guess you are right, but, once again, the problem is not so much about
> fooling the system to do something or other, but confusing the user:

Yes and I certainly am not arguing against that.

> a. Everything is fine - the load is 0%, the fact that the system is
>    overheating and/or that some processes do not do as much as they
>    could is probably due to the bad hardware.
>
> b. The weird load pattern must be the result of bugs in my code.
>    (And then a whole lot of time/effort is poured into fixing the
>    problem which is simply not there)
>
> The current situation ought to be documented. Better yet some flag can
> be introduced somewhere in the system so that it exports real values
> to /proc, not the estimations that are inaccurate in some cases (like
> hog)

I wouldn't argue against any of those either. schedstats with userspace
tools to understand the data will give better information, I believe.

--
-ck
Re: CPU load
On Wed, 14 Feb 2007, Con Kolivas wrote:
> On Wednesday 14 February 2007 09:01, malc wrote:
>> On Mon, 12 Feb 2007, Pavel Machek wrote:
>>> Hi!
>
> [..snip..]
>
>>> I have (had?) code that 'exploits' this. I believe I could eat 90%
>>> of cpu without being noticed.
>>
>> Slightly changed version of hog (around 3 lines in total changed)
>> does that easily on 2.6.18.3 on PPC.
>>
>> http://www.boblycat.org/~malc/apc/load-hog-ppc.png
>
> I guess it's worth mentioning this is _only_ about displaying the cpu
> usage to userspace, as the cpu scheduler knows the accounting of each
> task in different ways. This behaviour can not be used to exploit the
> cpu scheduler into a starvation situation. Using the discrete per
> process accounting to accumulate the displayed values to userspace
> would fix this problem, but would be expensive.

Guess you are right, but, once again, the problem is not so much about
fooling the system to do something or other, but confusing the user:

a. Everything is fine - the load is 0%, the fact that the system is
   overheating and/or that some processes do not do as much as they
   could is probably due to the bad hardware.

b. The weird load pattern must be the result of bugs in my code.
   (And then a whole lot of time/effort is poured into fixing the
   problem which is simply not there)

The current situation ought to be documented. Better yet some flag can
be introduced somewhere in the system so that it exports real values to
/proc, not the estimations that are inaccurate in some cases (like hog)

--
vale
Re: CPU load
On Wednesday 14 February 2007 09:01, malc wrote:
> On Mon, 12 Feb 2007, Pavel Machek wrote:
>> Hi!
>>
>>> The kernel looks at what is using cpu _only_ during the timer
>>> interrupt. Which means if your HZ is 1000 it looks at what is
>>> running at precisely the moment those 1000 timer ticks occur. It is
>>> theoretically possible using this measurement system to use >99%
>>> cpu and record 0 usage if you time your cpu usage properly. It gets
>>> even more inaccurate at lower HZ values for the same reason.
>>
>> I have (had?) code that 'exploits' this. I believe I could eat 90% of
>> cpu without being noticed.
>
> Slightly changed version of hog (around 3 lines in total changed) does
> that easily on 2.6.18.3 on PPC.
>
> http://www.boblycat.org/~malc/apc/load-hog-ppc.png

I guess it's worth mentioning this is _only_ about displaying the cpu
usage to userspace, as the cpu scheduler knows the accounting of each
task in different ways. This behaviour can not be used to exploit the
cpu scheduler into a starvation situation. Using the discrete per
process accounting to accumulate the displayed values to userspace
would fix this problem, but would be expensive.

--
-ck
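For completeness: the displaying to userspace discussed here is just delta-and-divide arithmetic over the tick counters. A hedged sketch of what a tool like top(1) does with two `/proc/stat` snapshots (only the first four fields are modelled here, and the function name is invented):

```c
/* Computes the idle percentage from two snapshots of the first four
 * per-cpu counters in /proc/stat (user, nice, system, idle, in jiffies).
 * Any activity the tick sampling never observed is simply absent from
 * the deltas, which is why the displayed value can be wildly wrong. */
static double idle_percent(const unsigned long before[4],
                           const unsigned long after[4])
{
    unsigned long total = 0;
    for (int i = 0; i < 4; ++i)
        total += after[i] - before[i];
    if (total == 0)
        return 0.0;
    return 100.0 * (double)(after[3] - before[3]) / (double)total;
}
```

Because the numerator and denominator are both built from sampled ticks, a hog that always sleeps across the tick instant contributes all of its runtime to the idle field, and this division dutifully reports a near-idle system.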
Re: CPU load
On Mon, 12 Feb 2007, Pavel Machek wrote:
> Hi!
>
>> The kernel looks at what is using cpu _only_ during the timer
>> interrupt. Which means if your HZ is 1000 it looks at what is running
>> at precisely the moment those 1000 timer ticks occur. It is
>> theoretically possible using this measurement system to use >99% cpu
>> and record 0 usage if you time your cpu usage properly. It gets even
>> more inaccurate at lower HZ values for the same reason.
>
> I have (had?) code that 'exploits' this. I believe I could eat 90% of
> cpu without being noticed.

Slightly changed version of hog (around 3 lines in total changed) does
that easily on 2.6.18.3 on PPC.

http://www.boblycat.org/~malc/apc/load-hog-ppc.png

--
vale
Re: CPU load
Hi!

> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

I have (had?) code that 'exploits' this. I believe I could eat 90% of
cpu without being noticed.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: CPU load
On Mon, 12 Feb 2007, Andrew Burgess wrote:
>> On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>>> How does the kernel calculate the value it places in `/proc/stat'
>>> at 4th position (i.e. "idle: twiddling thumbs")?
>>> ..
>>> Later a small kernel module was developed that tried to time how
>>> much time is spent in the idle handler inside the kernel and
>>> exported this information to the user-space. The results were
>>> consistent with our expectations and the output of the test
>>> utility.
>>> ..
>>> http://www.boblycat.org/~malc/apc
>
> Vassili
>
> Could you rewrite this code as a kernel patch for discussion/inclusion
> in mainline? I and maybe others would appreciate having idle
> statistics be more accurate.

I really don't know how to approach that; what i do in itc.c is ugly
to say the least (it's less ugly on PPC, but still). There's stuff
there that is very dangerous, i.e. entering the idle handler on SMP
and simultaneously rmmod'ing the module (which surprisingly never
actually caused any bad things on the kernels i had (starting with
2.6.17.3), but panicked on Debian's 2.6.8). Safety nets were added but
i don't know whether they are sufficient. All in all what i have is a
gross hack, but it works for my purposes.

Another thing that keeps bothering me (again discovered with this
Debian kernel) is the fact that PREEMPT preempts the idle handler;
this just doesn't add up in my head.

So to summarize: i don't know how to properly do that (so that it
works on all/most architectures, is less of a hack, has no negative
impact on performance, etc.)

But i guess what the innocent `smallhog.c' posted earlier demonstrated
is that something probably ought to be done about it, or at least the
current situation documented.

--
vale
Re: CPU load
On Mon, 12 Feb 2007, Con Kolivas wrote:
> On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>> Hello,
>
> [..snip..]
>
> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

And indeed it appears to be possible to do just that. Example:

/* gcc -o hog smallhog.c */
/* Header names reconstructed; the archive stripped the <...> parts. */
#include <limits.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>
#define HIST 10

static volatile sig_atomic_t stop; /* volatile: written from a signal
                                      handler */

static void sighandler (int signr)
{
    (void) signr;
    stop = 1;
}

static unsigned long hog (unsigned long niters)
{
    stop = 0;
    while (!stop && --niters);
    return niters;
}

int main (void)
{
    int i;
    struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
                            .it_value = { .tv_sec = 0, .tv_usec = 1 } };
    sigset_t set;
    unsigned long v[HIST];
    double tmp = 0.0;
    unsigned long n;

    signal (SIGALRM, &sighandler);
    setitimer (ITIMER_REAL, &it, NULL);

    for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
    for (i = 0; i < HIST; ++i) tmp += v[i];
    tmp /= HIST;
    n = tmp - (tmp / 3.0);

    sigemptyset (&set);
    sigaddset (&set, SIGALRM);

    for (;;) {
        hog (n);
        sigwait (&set, &i);
    }
    return 0;
}
/* end smallhog.c */

Might need some adjustment for a particular system but ran just fine
here on:

2.4.30 + Athlon tbird (1GHz)
2.6.19.2 + Athlon X2 3800+ (2GHz)

Showing next to zero load in top(1) and a whole lot more in APC:

http://www.boblycat.org/~malc/apc/load-tbird-hog.png
http://www.boblycat.org/~malc/apc/load-x2-hog.png

Not quite 99% but nevertheless scary.

--
vale
Re: CPU load
On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>
> How does the kernel calculate the value it places in `/proc/stat' at
> 4th position (i.e. "idle: twiddling thumbs")?
> ..
>
> Later a small kernel module was developed that tried to time how much
> time is spent in the idle handler inside the kernel and exported this
> information to the user-space. The results were consistent with our
> expectations and the output of the test utility.
> ..
> http://www.boblycat.org/~malc/apc

Vassili

Could you rewrite this code as a kernel patch for discussion/inclusion
in mainline? I and maybe others would appreciate having idle
statistics be more accurate.

Thanks for your work
Andrew
Re: CPU load
On Monday 12 February 2007 18:10, malc wrote:
> On Mon, 12 Feb 2007, Con Kolivas wrote:
>> Lots of confusion comes from this, and often people think their pc
>> suddenly uses a lot less cpu when they change from 1000HZ to 100HZ
>> and use this as an argument/reason for changing to 100HZ when in
>> fact the massive _reported_ difference is simply worse accounting.
>> Of course there is more overhead going from 100 to 1000 but it
>> doesn't suddenly make your apps use 10 times more cpu.
>
> Yep. This, i believe, is what made the mplayer developers incorrectly
> conclude that utilizing RTC suddenly made the code run slower; after
> all, /proc/stat now claims that CPU load is higher, while in reality
> it stayed the same - it's the accuracy that has improved (somewhat)
>
> But back to the original question, does it look at what's running on
> the timer interrupt only or any IRQ? (something which is more in line
> with my own observations)

During the timer interrupt only. However if you create any form of
timer, they will of course have some periodicity relationship with the
timer interrupt.

--
-ck
Re: CPU load
On Mon, 12 Feb 2007, Con Kolivas wrote:
> On Monday 12 February 2007 16:54, malc wrote:
>> On Mon, 12 Feb 2007, Con Kolivas wrote:
>>> On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>
> [..snip..]
>
>>> The kernel looks at what is using cpu _only_ during the timer
>>> interrupt. Which means if your HZ is 1000 it looks at what is
>>> running at precisely the moment those 1000 timer ticks occur. It is
>>> theoretically possible using this measurement system to use >99%
>>> cpu and record 0 usage if you time your cpu usage properly. It gets
>>> even more inaccurate at lower HZ values for the same reason.
>>
>> Thank you very much. This somewhat contradicts what i saw (and
>> outlined in the usenet article), namely the mplayer+/dev/rtc case.
>> Unless of course the /dev/rtc interrupt is considered to be the same
>> as the interrupt from the PIT (on X86 that is)
>>
>> P.S. Perhaps it's worth documenting this in the documentation? It
>> caused me, and perhaps quite a few other people, a great deal of
>> pain and frustration.
>
> Lots of confusion comes from this, and often people think their pc
> suddenly uses a lot less cpu when they change from 1000HZ to 100HZ
> and use this as an argument/reason for changing to 100HZ when in fact
> the massive _reported_ difference is simply worse accounting. Of
> course there is more overhead going from 100 to 1000 but it doesn't
> suddenly make your apps use 10 times more cpu.

Yep. This, i believe, is what made the mplayer developers incorrectly
conclude that utilizing RTC suddenly made the code run slower; after
all, /proc/stat now claims that CPU load is higher, while in reality it
stayed the same - it's the accuracy that has improved (somewhat)

But back to the original question, does it look at what's running on
the timer interrupt only or any IRQ? (something which is more in line
with my own observations)

--
vale
Re: CPU load
On Mon, 12 Feb 2007, Con Kolivas wrote:
> On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>
> [..snip..]
>
> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

Thank you very much. This somewhat contradicts what i saw (and
outlined in the usenet article), namely the mplayer+/dev/rtc case.
Unless of course the /dev/rtc interrupt is considered to be the same
as the interrupt from the PIT (on X86 that is)

P.S. Perhaps it's worth documenting this in the documentation? It
caused me, and perhaps quite a few other people, a great deal of pain
and frustration.

--
vale
Re: CPU load
On Monday 12 February 2007 16:54, malc wrote:
> On Mon, 12 Feb 2007, Con Kolivas wrote:
>> On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote:
>
> [..snip..]
>
>> The kernel looks at what is using cpu _only_ during the timer
>> interrupt. Which means if your HZ is 1000 it looks at what is running
>> at precisely the moment those 1000 timer ticks occur. It is
>> theoretically possible using this measurement system to use >99% cpu
>> and record 0 usage if you time your cpu usage properly. It gets even
>> more inaccurate at lower HZ values for the same reason.
>
> Thank you very much. This somewhat contradicts what i saw (and
> outlined in the usenet article), namely the mplayer+/dev/rtc case.
> Unless of course the /dev/rtc interrupt is considered to be the same
> as the interrupt from the PIT (on X86 that is)
>
> P.S. Perhaps it's worth documenting this in the documentation? It
> caused me, and perhaps quite a few other people, a great deal of pain
> and frustration.

Lots of confusion comes from this, and often people think their pc
suddenly uses a lot less cpu when they change from 1000HZ to 100HZ and
use this as an argument/reason for changing to 100HZ when in fact the
massive _reported_ difference is simply worse accounting. Of course
there is more overhead going from 100 to 1000 but it doesn't suddenly
make your apps use 10 times more cpu.

--
-ck
Re: CPU load
On Monday 12 February 2007 16:55, Stephen Rothwell wrote:
> On Mon, 12 Feb 2007 16:44:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
>> The kernel looks at what is using cpu _only_ during the timer
>> interrupt. Which means if your HZ is 1000 it looks at what is running
>> at precisely the moment those 1000 timer ticks occur. It is
>> theoretically possible using this measurement system to use >99% cpu
>> and record 0 usage if you time your cpu usage properly. It gets even
>> more inaccurate at lower HZ values for the same reason.
>
> That is not true on all architectures, some do more accurate
> accounting by recording the times at user/kernel/interrupt
> transitions ...

Indeed. It's certainly the way the common more boring pc architectures
do it though.

--
-ck
Re: CPU load
On Mon, 12 Feb 2007 16:44:22 +1100 "Con Kolivas" <[EMAIL PROTECTED]> wrote:
>
> The kernel looks at what is using cpu _only_ during the timer
> interrupt. Which means if your HZ is 1000 it looks at what is running
> at precisely the moment those 1000 timer ticks occur. It is
> theoretically possible using this measurement system to use >99% cpu
> and record 0 usage if you time your cpu usage properly. It gets even
> more inaccurate at lower HZ values for the same reason.

That is not true on all architectures, some do more accurate accounting
by recording the times at user/kernel/interrupt transitions ...

--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
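The transition-based accounting Stephen describes can be sketched as follows. This is an illustrative model only, not code from any actual architecture; the type and function names are invented:

```c
/* Per-cpu accounting that charges elapsed wall time to the outgoing
 * state at every user/kernel/idle transition, instead of sampling at
 * timer ticks. */
enum cpu_state { ST_USER, ST_KERNEL, ST_IDLE, ST_NSTATES };

struct state_acct {
    enum cpu_state cur;        /* state the cpu is currently in      */
    double last_switch;        /* timestamp of the last transition   */
    double time[ST_NSTATES];   /* accumulated seconds per state      */
};

static void acct_transition(struct state_acct *a,
                            enum cpu_state next, double now)
{
    a->time[a->cur] += now - a->last_switch; /* charge outgoing state */
    a->cur = next;
    a->last_switch = now;
}
```

Because every switch is timestamped, a burst that starts and ends between two timer ticks is still charged to the right bucket; the price is extra work on each transition, which is presumably why the "boring pc architectures" stuck with tick sampling.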
Re: CPU load
On 12/02/07, Vassili Karpov <[EMAIL PROTECTED]> wrote: Hello, How does the kernel calculates the value it places in `/proc/stat' at 4th position (i.e. "idle: twiddling thumbs")? For background information as to why this question arose in the first place read on. While writing the code dealing with video acquisition/processing at work noticed that what top(1) (and every other tool that uses `/proc/stat' or `/proc/uptime') shows some very strange results. Top claimed that the system running one version of the code[A] is idling more often than the code[B] doing the same thing but more cleverly. After some head scratching one of my colleagues suggested a simple test that was implemented in a few minutes. The test consisted of a counter that incremented in an endless loop also after certain period of time had elapsed it printed the value of the counter. Running this test (with priority set to the lowest possible level) with code[A] and code[B] confirmed that code[B] is indeed faster than code[A], in a sense that the test made more forward progress while code[B] is running. Hard-coding some things (i.e. the value of the counter after counting for the duration of one period on completely idle system) we extended the test to show the percentage of CPU that was utilized. This never matched the value that top presented us with. Later small kernel module was developed that tried to time how much time is spent in the idle handler inside the kernel and exported this information to the user-space. The results were consistent with our expectations and the output of the test utility. Two more points. a. In the past (again video processing context) i have witnessed `/proc/stat' claiming that CPU utilization is 0% for, say, 20 seconds followed by 5 seconds of 30% load, and then the cycle repeated. According to the methods outlined above the load is always at 30%. b. 
In my personal experience, the difference between /proc/stat and "reality" can easily reach 40% (I think I have seen even more than that).

The module, and a graphical application that uses it, along with a short README and a link to a Usenet article dealing with the same subject, is available at: http://www.boblycat.org/~malc/apc

The kernel looks at what is using cpu _only_ during the timer interrupt. Which means if your HZ is 1000 it looks at what is running at precisely the moment those 1000 timer ticks occur. It is theoretically possible using this measurement system to use >99% cpu and record 0 usage if you time your cpu usage properly. It gets even more inaccurate at lower HZ values for the same reason.

--
-ck

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: cpu load balancing problem on smp
On Thursday 08 February 2007 09:42, you wrote:
> On Wed, 7 Feb 2007, Arjan van de Ven wrote:
> > Marc Donner wrote:
> > > 501: 215717 209388 209430 202514 PCI-MSI-edge eth10
> > > 502:927 1019 1053888 PCI-MSI-edge eth11
> >
> > this is odd, this is not an irq distribution that irqbalance should give
> > you
> >
> > > NMI: 451 39 42 46
> > > LOC: 170899 170864 170846 170788
> > > ERR: 0
> > >
> > > top output:
> > >
> > > top - 01:45:32 up 16 min, 2 users, load average: 1.04, 0.92, 0.50
> > > Tasks: 81 total, 3 running, 78 sleeping, 0 stopped, 0 zombie
> > > Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 100.0% si
> >
> > and this doesn't match the irq output...
> > sounds as if something has a real bug; can you send an lsmod ? maybe some
> > driver keeps doing si's
>
> since this only happens when he adds more iptables rules, is it possible
> that there is some locking, or other data structure access, that's
> serializing things under load?
>
> Marc, since you don't use modules, send your .config.
>
> David Lang

i've inserted some more iptables rules, and got 'softlockup detected' messages.

BUG: soft lockup detected on CPU#1!
Call Trace:
 [] softlockup_tick+0xda/0xec
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x32/0x55
 [] ip_rcv+0x0/0x48e
 [] smp_apic_timer_interrupt+0x4f/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] ip_rcv+0x0/0x48e
 [] ipt_do_table+0x2cd/0x315
 [] nf_iterate+0x3f/0x7b
 [] ip_forward_finish+0x0/0x3e
 [] nf_hook_slow+0x5f/0xca
 [] ip_forward_finish+0x0/0x3e
 [] ip_forward+0x16e/0x212
 [] ip_rcv+0x447/0x48e
 [] e1000_clean_rx_irq+0x41e/0x4ea
 [] e1000_clean+0x2f7/0x4b0
 [] task_rq_lock+0x3d/0x6f
 [] net_rx_action+0x78/0x14a
 [] __do_softirq+0x56/0xd3
 [] ksoftirqd+0x0/0x8f
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] smp_apic_timer_interrupt+0x54/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] do_softirq+0x7b/0x7d
 [] ksoftirqd+0x4f/0x8f
 [] kthread+0xcb/0xf5
 [] child_rip+0xa/0x12
 [] kthread+0x0/0xf5
 [] child_rip+0x0/0x12

Marc

BUG: soft lockup detected on CPU#1!
Call Trace:
 [] softlockup_tick+0xda/0xec
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x32/0x55
 [] ip_rcv+0x0/0x48e
 [] smp_apic_timer_interrupt+0x4f/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] ip_rcv+0x0/0x48e
 [] ipt_do_table+0x2cd/0x315
 [] nf_iterate+0x3f/0x7b
 [] ip_forward_finish+0x0/0x3e
 [] nf_hook_slow+0x5f/0xca
 [] ip_forward_finish+0x0/0x3e
 [] ip_forward+0x16e/0x212
 [] ip_rcv+0x447/0x48e
 [] e1000_clean_rx_irq+0x41e/0x4ea
 [] e1000_clean+0x2f7/0x4b0
 [] task_rq_lock+0x3d/0x6f
 [] net_rx_action+0x78/0x14a
 [] __do_softirq+0x56/0xd3
 [] ksoftirqd+0x0/0x8f
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] smp_apic_timer_interrupt+0x54/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] do_softirq+0x7b/0x7d
 [] ksoftirqd+0x4f/0x8f
 [] kthread+0xcb/0xf5
 [] child_rip+0xa/0x12
 [] kthread+0x0/0xf5
 [] child_rip+0x0/0x12

BUG: soft lockup detected on CPU#1!
Call Trace:
 [] softlockup_tick+0xda/0xec
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x32/0x55
 [] ip_rcv+0x0/0x48e
 [] smp_apic_timer_interrupt+0x4f/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] ip_rcv+0x0/0x48e
 [] ipt_do_table+0xaf/0x315
 [] nf_iterate+0x3f/0x7b
 [] ip_forward_finish+0x0/0x3e
 [] nf_hook_slow+0x5f/0xca
 [] ip_forward_finish+0x0/0x3e
 [] ip_forward+0x16e/0x212
 [] ip_rcv+0x447/0x48e
 [] e1000_clean_rx_irq+0x41e/0x4ea
 [] e1000_clean+0x2f7/0x4b0
 [] lock_timer_base+0x1b/0x3c
 [] __mod_timer+0xa6/0xb4
 [] net_rx_action+0x78/0x14a
 [] __do_softirq+0x56/0xd3
 [] ksoftirqd+0x0/0x8f
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] smp_apic_timer_interrupt+0x54/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] do_softirq+0x7b/0x7d
 [] ksoftirqd+0x4f/0x8f
 [] kthread+0xcb/0xf5
 [] child_rip+0xa/0x12
 [] kthread+0x0/0xf5
 [] child_rip+0x0/0x12

BUG: soft lockup detected on CPU#1!
Call Trace:
 [] softlockup_tick+0xda/0xec
 [] update_process_times+0x42/0x68
 [] smp_local_timer_interrupt+0x32/0x55
 [] ip_rcv+0x0/0x48e
 [] smp_apic_timer_interrupt+0x4f/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] ip_rcv+0x0/0x48e
 [] ipt_do_table+0xbf/0x315
 [] nf_iterate+0x3f/0x7b
 [] ip_forward_finish+0x0/0x3e
 [] nf_hook_slow+0x5f/0xca
 [] ip_forward_finish+0x0/0x3e
 [] ip_forward+0x16e/0x212
 [] ip_rcv+0x447/0x48e
 [] e1000_clean_rx_irq+0x41e/0x4ea
 [] e1000_clean+0x2f7/0x4b0
 [] handle_edge_irq+0x106/0x12f
 [] handle_edge_irq+0x0/0x12f
 [] do_IRQ+0x137/0x159
 [] net_rx_action+0x78/0x14a
 [] __do_softirq+0x56/0xd3
 [] ksoftirqd+0x0/0x8f
 [] call_softirq+0x1c/0x28
 [] do_softirq+0x2c/0x7d
 [] smp_apic_timer_interrupt+0x54/0x67
 [] apic_timer_interrupt+0x66/0x70
 [] do_softirq+0x7b/0x7d
 [] ksoftirqd+0x4f/0x8f
 [] kthread+0xcb/0xf5
 [] child_rip+0xa/0x12
 [] kthread+0x0/0xf5
 [] child_rip+0x0/0x12

BUG: soft lockup detected on CPU#1!
Call
Re: cpu load balancing problem on smp
On Wed, 7 Feb 2007, Arjan van de Ven wrote:
> Marc Donner wrote:
> > 501: 215717 209388 209430 202514 PCI-MSI-edge eth10
> > 502:927 1019 1053888 PCI-MSI-edge eth11
>
> this is odd, this is not an irq distribution that irqbalance should give you
>
> > NMI: 451 39 42 46
> > LOC: 170899 170864 170846 170788
> > ERR: 0
> >
> > top output:
> >
> > top - 01:45:32 up 16 min, 2 users, load average: 1.04, 0.92, 0.50
> > Tasks: 81 total, 3 running, 78 sleeping, 0 stopped, 0 zombie
> > Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 100.0% si
>
> and this doesn't match the irq output...
> sounds as if something has a real bug; can you send an lsmod ? maybe some
> driver keeps doing si's

since this only happens when he adds more iptables rules, is it possible that there is some locking, or other data structure access, that's serializing things under load?

Marc, since you don't use modules, send your .config.

David Lang
Re: cpu load balancing problem on smp
On Wednesday 07 February 2007 06:59, you wrote:
> Marc Donner wrote:
> > 501: 215717 209388 209430 202514 PCI-MSI-edge eth10
> > 502:927 1019 1053888 PCI-MSI-edge eth11
>
> this is odd, this is not an irq distribution that irqbalance should
> give you

i think this is ok, because only eth10 is receiving packets. traffic is flowing from eth10 to eth11

> and this doesn't match the irq output...
> sounds as if something has a real bug; can you send an lsmod ? maybe
> some driver keeps doing si's

lsmod
Module                  Size  Used by
thermal                16780  0
fan                     6280  0
button                  9696  0
processor              29576  1 thermal
ac                      6664  0
battery                11720  0

drivers are build directly in the kernel. i have attached the config file. i can also give access to the test setup, if you want. i have also tested kernel 2.6.18.3 and 2.6.19.2 on other hardware. same effect.

regards
marc

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.20
# Tue Feb 6 00:17:31 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION="dell2950-router"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_CPUSETS=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
CONFIG_MCORE2=y
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_HT=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_NR_CPUS=8
# CONFIG_HOTPLUG_CPU is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x20
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_REORDER is not set
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
#
Re: cpu load balancing problem on smp
Marc Donner wrote:
> 501: 215717 209388 209430 202514 PCI-MSI-edge eth10
> 502:927 1019 1053888 PCI-MSI-edge eth11

this is odd, this is not an irq distribution that irqbalance should give you

> NMI: 451 39 42 46
> LOC: 170899 170864 170846 170788
> ERR: 0
>
> top output:
>
> top - 01:45:32 up 16 min, 2 users, load average: 1.04, 0.92, 0.50
> Tasks: 81 total, 3 running, 78 sleeping, 0 stopped, 0 zombie
> Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 100.0% si

and this doesn't match the irq output...
sounds as if something has a real bug; can you send an lsmod ? maybe some driver keeps doing si's
Re: cpu load balancing problem on smp
> can you send me the output of
>
> cat /proc/interrupts

here it is: irqbalance is running. network loaded with 600Mbit/s for about 5 minutes.

           CPU0       CPU1       CPU2       CPU3
  0:      37713      41667      41673      49914   IO-APIC-edge      timer
  1:          0          0          2          0   IO-APIC-edge      i8042
  8:          0          0          1          0   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          2          0          2          0   IO-APIC-edge      i8042
 14:         11          9          9          8   IO-APIC-edge      ide0
 20:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 21:         62         52         37         46   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
 78:665581344351   IO-APIC-fasteoi   megasas
501:     215717     209388     209430     202514   PCI-MSI-edge      eth10
502:927 1019 1053888   PCI-MSI-edge      eth11
NMI:        451         39         42         46
LOC:     170899     170864     170846     170788
ERR: 0

top output:

top - 01:45:32 up 16 min, 2 users, load average: 1.04, 0.92, 0.50
Tasks: 81 total, 3 running, 78 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 100.0% si
Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 99.0% id, 1.0% wa, 0.0% hi, 0.0% si
Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.3% si
Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.3% hi, 0.0% si

regards
Marc
Re: cpu load balancing problem on smp
Arjan van de Ven wrote:
> Pablo Sebastian Greco wrote:
> > 2296:427426436 134563009 PCI-MSI-edge eth1
> > 2297:252252 135926471257 PCI-MSI-edge eth0
>
> this suggests that cores would be busy rather than only one

Yes, but you are looking at mm kernel statistics; if you look at the standard kernel, you'll see that eth interrupts are on the same core according to the attached /proc/cpuinfo. OTOH, take a look at the timer interrupt distribution.

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 4
cpu MHz : 2656.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips : 5324.82
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 4
cpu MHz : 2656.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips : 5320.06
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 4
cpu MHz : 2656.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips : 5320.20
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 6
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 4
cpu MHz : 2656.000
cache size : 2048 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est cid cx16 xtpr lahf_lm
bogomips : 5320.16
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:
Re: cpu load balancing problem on smp
Pablo Sebastian Greco wrote:
> 2296:427426436 134563009 PCI-MSI-edge eth1
> 2297:252252 135926471257 PCI-MSI-edge eth0

this suggests that cores would be busy rather than only one
Re: cpu load balancing problem on smp
Arjan van de Ven wrote:
> Marc Donner wrote:
> > > see http://www.irqbalance.org to get irqbalance
> >
> > I now have tried irqbalance, but the same problem.
>
> can you send me the output of
>
> cat /proc/interrupts
>
> (taken when you are or have been loading the network) maybe there's
> something fishy going on

Please take a look at this, taken from the same machine running different vanilla kernels on fc6. Current 2.6.19 fedora kernel, looks like 2.6.20rc3 (non mm) in the attachment.

2.6.20-rc3

[EMAIL PROTECTED] ~]# rpm -q irqbalance
irqbalance-0.55-2.fc6
[EMAIL PROTECTED] ~]# uptime
 11:51:50 up 6 days, 30 min, 3 users, load average: 5.31, 5.08, 4.02
[EMAIL PROTECTED] ~]# service irqbalance status
irqbalance (pid 2310) is running...
[EMAIL PROTECTED] ~]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
   0:  520209517          0          0          0   IO-APIC-edge      timer
   1:         12          0          0          0   IO-APIC-edge      i8042
   8:          1          0          0          0   IO-APIC-edge      rtc
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:        103          0          0          0   IO-APIC-edge      i8042
  14:          0          0          0          0   IO-APIC-edge      libata
  15:          0          0          0          0   IO-APIC-edge      libata
  20:     138736  188194096          0    6797630   IO-APIC-fasteoi   libata
  22:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb4
  23:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb5
2296:       1367          0          0  849270653   PCI-MSI-edge      eth1
2297:       1022  835083968          0          0   PCI-MSI-edge      eth0
 NMI:      47756     146249      47617     146186
 LOC:  516828752  517331906  516828611  517331771
 ERR: 0

2.6.20-rc3-mm1

[EMAIL PROTECTED] kernel]# uptime
 12:17:54 up 1 day, 21:58, 2 users, load average: 9.47, 9.79, 10.28
[EMAIL PROTECTED] kernel]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
   0:   60031592   61350247   22273772   21780215   IO-APIC-edge      timer
   1:          0          6          1          1   IO-APIC-edge      i8042
   8:          0          0          1          0   IO-APIC-edge      rtc
   9:          0          0          0          0   IO-APIC-fasteoi   acpi
  12:148283104136   IO-APIC-edge      i8042
  14:          0          0          0          0   IO-APIC-edge      libata
  15:          0          0          0          0   IO-APIC-edge      libata
  20:104827951477821      93306     641628   IO-APIC-fasteoi   libata
  22:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb4
  23:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb3, ehci_hcd:usb5
2296:427426436  134563009   PCI-MSI-edge      eth1
2297:252252 135926471257   PCI-MSI-edge      eth0
 NMI:          0          0          0          0
 LOC:  164661140  165163503  164660992  165163305
 ERR: 0
Re: cpu load balancing problem on smp
Marc Donner wrote:
> > see http://www.irqbalance.org to get irqbalance
>
> I now have tried irqbalance, but the same problem.

can you send me the output of

cat /proc/interrupts

(taken when you are or have been loading the network) maybe there's something fishy going on
Re: cpu load balancing problem on smp
On Tuesday 06 February 2007 19:09, you wrote:
> On Tue, 2007-02-06 at 18:32 +0100, Marc Donner wrote:
> > Hi @all
> >
> > we have detected some problems on our live systems and so i have built a
> > test setup in our lab as follows:
> >
> > 3 Core 2 duo servers, each with 2 CPUs, with GE interfaces. 2 of them
> > are only for generating network traffic. the 3rd server is the one i want
> > to test. it is connected over two GE links to the other servers. the
> > testserver is configured as an ip router, running kernel 2.6.20.
> >
> > now if i let traffic flow over the box, about 600Mbit/s and about 120k
> > packets/s, all seems to be ok. the load is balanced over all cpus. if i
> > now insert some iptables rules, about 500, the softirq load increases,
> > but all seems to be ok. now i insert some more rules, and suddenly 1 CPU
> > is 100% loaded and the other ones are 99% idle. the load toggles now
> > between the cpus in intervals.
>
> I wonder if you are using irqbalance.. if not you probably want to...
> (this should at least split it over 2 cpus)
>
> see http://www.irqbalance.org to get irqbalance

I have now tried irqbalance, but the same problem remains.
Re: cpu load balancing problem on smp
On Tue, 2007-02-06 at 18:32 +0100, Marc Donner wrote:
> Hi @all
>
> we have detected some problems on our live systems and so i have built a
> test setup in our lab as follows:
>
> 3 Core 2 duo servers, each with 2 CPUs, with GE interfaces. 2 of them are
> only for generating network traffic. the 3rd server is the one i want to
> test. it is connected over two GE links to the other servers. the testserver
> is configured as an ip router, running kernel 2.6.20.
>
> now if i let traffic flow over the box, about 600Mbit/s and about 120k
> packets/s, all seems to be ok. the load is balanced over all cpus. if i now
> insert some iptables rules, about 500, the softirq load increases, but all
> seems to be ok. now i insert some more rules, and suddenly 1 CPU is 100%
> loaded and the other ones are 99% idle. the load toggles now between the
> cpus in intervals.

I wonder if you are using irqbalance.. if not you probably want to...
(this should at least split it over 2 cpus)

see http://www.irqbalance.org to get irqbalance