Re: [patch] sched: accurate user accounting

2007-06-16 Thread malc

On Sat, 16 Jun 2007, Ingo Molnar wrote:



* malc <[EMAIL PROTECTED]> wrote:


Interesting, the idle time accounting (done from
account_system_time()) has not changed. Has your .config changed?
Could you please send it across. I've downloaded apc and I am trying
to reproduce your problem.


http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the
diff against 2.6.21.4-cfs-v16.


hm. Could you add this to your kernel boot command line:

highres=off nohz=off

and retest, to inquire whether this problem is independent of any
timer-events effects?


It certainly makes a difference. Without dynticks, however, the scheduler
still moves the task (be it hog or mplayer) between CPUs for no good
reason; for hog the switching happens every few seconds (instead of more
or less all the time in the dynticks case). What's rather strange is that
while it hogs 7x% of CPU on core#1 it only hogs 3x% on core#2
(the percentage is obtained by timing the idle handler, not from /proc/stat;
according to /proc/stat either core is 0% loaded).

Live report... After a while it stabilized and was running on core#2
all the time; when the process was stopped and restarted it started to
run constantly on core#1 (with ~70% load).

Btw. i don't want to waste anyone's time here. All i originally wanted
was to know whether something had been done (as per the LWN article) about
increasing the accuracy of system-wide statistics (in /proc/stat). It turns
out that nothing really happened in this area, but the latest developments
(CFS/dynticks) have some peculiar effect on hog. Plus this constant
migration of hog/mplayer is somewhat strange.

Live report... again. Okay, now that hog stabilized running exclusively
on core#1 at 70% load, i switched to the machine where it runs, and
merely switching windows in IceWM made the system load drop to 30%.
Quite reproducible, too.

--
vale
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] sched: accurate user accounting

2007-06-16 Thread Ingo Molnar

* malc <[EMAIL PROTECTED]> wrote:

> > Interesting, the idle time accounting (done from 
> > account_system_time()) has not changed. Has your .config changed? 
> > Could you please send it across. I've downloaded apc and I am trying 
> > to reproduce your problem.
> 
> http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the 
> diff against 2.6.21.4-cfs-v16.

hm. Could you add this to your kernel boot command line:

 highres=off nohz=off

and retest, to inquire whether this problem is independent of any 
timer-events effects?

Ingo


Re: [patch] sched: accurate user accounting

2007-06-16 Thread malc

On Sat, 16 Jun 2007, Balbir Singh wrote:


malc wrote:

On Fri, 15 Jun 2007, Balbir Singh wrote:


malc wrote:

On Thu, 14 Jun 2007, Ingo Molnar wrote:



[..snip..]



Now integral load matches the one obtained via the "accurate" method.
However the reports for individual cores are off by around 20 percent.



I think I missed some of the context, is the accounting of individual tasks
or cpustat values off by 20%? I'll try and reproduce this problem.


Neither actually, the individual core idle times reported via `/proc/stat'
are off by 20 percent; one overestimates and the other underestimates,
and the sum is right on the mark.



Interesting, the idle time accounting (done from account_system_time())
has not changed. Has your .config changed? Could you please send
it across. I've downloaded apc and I am trying to reproduce your problem.


http://www.boblycat.org/~malc/apc/cfs/ has the config for 2.6.21 and the 
diff against 2.6.21.4-cfs-v16.


I updated hog (it can be found in the above directory) to call setitimer
with slightly saner values (apparently tickless has a profound effect on
the itimer interface). While fooling around with this version of hog
on 2.6.21.4-cfs-v16 i stumbled upon rather strange behavior (the
screenshot is also at the address above; note that the kernel was booted
with maxcpus=1).

[..snip..]

--
vale


Re: [patch] sched: accurate user accounting

2007-06-16 Thread Balbir Singh
malc wrote:
> On Fri, 15 Jun 2007, Balbir Singh wrote:
> 
>> malc wrote:
>>> On Thu, 14 Jun 2007, Ingo Molnar wrote:
>>>
> 
> [..snip..]
> 
>>>
>>> Now integral load matches the one obtained via the "accurate" method.
>>> However the reports for individual cores are off by around 20 percent.
>>>
>>
>> I think I missed some of the context, is the accounting of individual tasks
>> or cpustat values off by 20%? I'll try and reproduce this problem.
> 
> Neither actually, the individual core idle times reported via `/proc/stat'
> are off by 20 percent; one overestimates and the other underestimates,
> and the sum is right on the mark.
> 

Interesting, the idle time accounting (done from account_system_time())
has not changed. Has your .config changed? Could you please send
it across. I've downloaded apc and I am trying to reproduce your problem.

>>
>> Could you provide more details on the APC tool that you are using -- I
>> do not understand the orange and yellow lines, do they represent system
>> and user time?
> 
> It's somewhat documented on the page: http://www.boblycat.org/~malc/apc
> Anyway, the left bar is based on information from `/proc/stat'; the right
> one is derived from a kernel module that simply times how much time is
> spent in the idle handler. The graphs: red - `/proc/stat', yellow - module.
> 
>> NOTE: There is some inconsistency in the values reported by /usr/bin/time
>> (getrusage) and values reported in /proc or through delay accounting.
> 
> I don't really use `getrusage'.
> 

Tools like /usr/bin/time do.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [patch] sched: accurate user accounting

2007-06-14 Thread malc

On Fri, 15 Jun 2007, Balbir Singh wrote:


malc wrote:

On Thu, 14 Jun 2007, Ingo Molnar wrote:



[..snip..]



Now integral load matches the one obtained via the "accurate" method.
However the reports for individual cores are off by around 20 percent.



I think I missed some of the context, is the accounting of individual tasks
or cpustat values off by 20%? I'll try and reproduce this problem.


Neither actually, the individual core idle times reported via `/proc/stat'
are off by 20 percent; one overestimates and the other underestimates,
and the sum is right on the mark.



Could you provide more details on the APC tool that you are using -- I
do not understand the orange and yellow lines, do they represent system
and user time?


It's somewhat documented on the page: http://www.boblycat.org/~malc/apc
Anyway, the left bar is based on information from `/proc/stat'; the right
one is derived from a kernel module that simply times how much time is
spent in the idle handler. The graphs: red - `/proc/stat', yellow - module.


NOTE: There is some inconsistency in the values reported by /usr/bin/time
(getrusage) and values reported in /proc or through delay accounting.


I don't really use `getrusage'.

--
vale


Re: [patch] sched: accurate user accounting

2007-06-14 Thread Balbir Singh
malc wrote:
> On Thu, 14 Jun 2007, Ingo Molnar wrote:
> 
>>
>> * malc <[EMAIL PROTECTED]> wrote:
>>
>>>> the alternating balancing might be due to an uneven number of tasks
>>>> perhaps? If you have 3 tasks on 2 cores then there's no other
>>>> solution to achieve even performance of each task but to rotate them
>>>> amongst the cores.
>>>
>>> One task, one thread. I have also tried to watch fairly demanding
>>> video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves
>>> the only task between cores almost every second.
>>
>> hm, mplayer is not running alone when it does video playback: Xorg is
>> also pretty active. Furthermore, the task you are using to monitor
>> mplayer counts too. The Core2Duo has a shared L2 cache between cores, so
>> it is pretty cheap to move tasks between the cores.
>>
> 
> Well just to be sure i reran the test with `-vo null' (and fwiw i tried
> a few completely different output drivers); the behavior is the same. I'm
> not running a Core2Duo but an X2, but i guess that does not really matter here.
> 
> As for the task that monitors, i've written it myself (there are two
> monitoring methods; one (the accurate one) does not depend on the contents
> of `/proc/stat' at all), so it can be cheaply (for me) changed in any
> way one wants. Sources are available at the same place where the screenshot
> was found.
> 
>>>> well, precise/finegrained accounting patches have been available for
>>>> years, the thing with CFS is that there we get them 'for free',
>>>> because CFS needs those metrics for its own logic. That's why this
>>>> information is much closer to reality now. But note: right now what
>>>> is affected by the changes in the CFS patches is /proc/PID/stat
>>>> (i.e. the per-task information that 'top' and 'ps' displays, _not_
>>>> /proc/stat) - but more accurate /proc/stat could certainly come
>>>> later on too.
>>>
>>> Aha. I see, it's just that integral load for hog is vastly improved
>>> compared to vanilla 2.6.21 [...]
>>
>> hm, which ones are improved? Could this be due to some other property of
>> CFS? If your app relies on /proc/stat then there's no extra precision in
>> those cpustat values yet.
> 
> This is what it looked like before:
> http://www.boblycat.org/~malc/apc/load-x2-hog.png
> 
> Now integral load matches the one obtained via the "accurate" method.
> However the reports for individual cores are off by around 20 percent.
> 

I think I missed some of the context, is the accounting of individual tasks
or cpustat values off by 20%? I'll try and reproduce this problem.

Could you provide more details on the APC tool that you are using -- I
do not understand the orange and yellow lines, do they represent system
and user time?

NOTE: There is some inconsistency in the values reported by /usr/bin/time
(getrusage) and values reported in /proc or through delay accounting.


> Though i'm not quite sure what you mean by "which ones are improved".
> 
>> i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors
>> of the current precise accounting code in CFS. Maybe i missed some
>> detail :-)
> 
> Oh, the famous "With enough eyeballs, all bugs are shallow." in action.
> 


-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


Re: [patch] sched: accurate user accounting

2007-06-14 Thread malc

On Thu, 14 Jun 2007, Ingo Molnar wrote:



* malc <[EMAIL PROTECTED]> wrote:


the alternating balancing might be due to an uneven number of tasks
perhaps? If you have 3 tasks on 2 cores then there's no other
solution to achieve even performance of each task but to rotate them
amongst the cores.


One task, one thread. I have also tried to watch fairly demanding
video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves
the only task between cores almost every second.


hm, mplayer is not running alone when it does video playback: Xorg is
also pretty active. Furthermore, the task you are using to monitor
mplayer counts too. The Core2Duo has a shared L2 cache between cores, so
it is pretty cheap to move tasks between the cores.



Well just to be sure i reran the test with `-vo null' (and fwiw i tried
a few completely different output drivers); the behavior is the same. I'm
not running a Core2Duo but an X2, but i guess that does not really matter here.

As for the task that monitors, i've written it myself (there are two
monitoring methods; one (the accurate one) does not depend on the contents
of `/proc/stat' at all), so it can be cheaply (for me) changed in any
way one wants. Sources are available at the same place where the screenshot
was found.


well, precise/finegrained accounting patches have been available for
years, the thing with CFS is that there we get them 'for free',
because CFS needs those metrics for its own logic. That's why this
information is much closer to reality now. But note: right now what
is affected by the changes in the CFS patches is /proc/PID/stat
(i.e. the per-task information that 'top' and 'ps' displays, _not_
/proc/stat) - but more accurate /proc/stat could certainly come
later on too.


Aha. I see, it's just that integral load for hog is vastly improved
compared to vanilla 2.6.21 [...]


hm, which ones are improved? Could this be due to some other property of
CFS? If your app relies on /proc/stat then there's no extra precision in
those cpustat values yet.


This is what it looked like before: 
http://www.boblycat.org/~malc/apc/load-x2-hog.png


Now integral load matches the one obtained via the "accurate" method.
However the reports for individual cores are off by around 20 percent.

Though i'm not quite sure what you mean by "which ones are improved".


i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors
of the current precise accounting code in CFS. Maybe i missed some
detail :-)


Oh, the famous "With enough eyeballs, all bugs are shallow." in action.

--
vale


Re: [patch] sched: accurate user accounting

2007-06-14 Thread Ingo Molnar

* malc <[EMAIL PROTECTED]> wrote:

> > the alternating balancing might be due to an uneven number of tasks 
> > perhaps? If you have 3 tasks on 2 cores then there's no other 
> > solution to achieve even performance of each task but to rotate them 
> > amongst the cores.
> 
> One task, one thread. I have also tried to watch fairly demanding 
> video (Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves 
> the only task between cores almost every second.

hm, mplayer is not running alone when it does video playback: Xorg is 
also pretty active. Furthermore, the task you are using to monitor 
mplayer counts too. The Core2Duo has a shared L2 cache between cores, so 
it is pretty cheap to move tasks between the cores.

> > well, precise/finegrained accounting patches have been available for 
> > years, the thing with CFS is that there we get them 'for free', 
> > because CFS needs those metrics for its own logic. That's why this 
> > information is much closer to reality now. But note: right now what 
> > is affected by the changes in the CFS patches is /proc/PID/stat 
> > (i.e. the per-task information that 'top' and 'ps' displays, _not_ 
> > /proc/stat) - but more accurate /proc/stat could certainly come 
> > later on too.
> 
> Aha. I see, it's just that integral load for hog is vastly improved 
> compared to vanilla 2.6.21 [...]

hm, which ones are improved? Could this be due to some other property of 
CFS? If your app relies on /proc/stat then there's no extra precision in 
those cpustat values yet.

i've Cc:-ed Balbir Singh and Dmitry Adamushko who are the main authors 
of the current precise accounting code in CFS. Maybe i missed some 
detail :-)

Ingo


Re: [patch] sched: accurate user accounting

2007-06-14 Thread malc

On Thu, 14 Jun 2007, Ingo Molnar wrote:



* Vassili Karpov <[EMAIL PROTECTED]> wrote:


Hello Ingo and others,

After reading http://lwn.net/Articles/236485/ and noticing a few
references to accounting i decided to give CFS a try. With
sched-cfs-v2.6.21.4-16 i get pretty weird results; it seems like the
scheduler is dead set on trying to move processes to different
CPUs/cores all the time. And with hog (manually tweaking the number of
iterations) i get fairly strange results: first of all the process is
split between two cores; secondly, while the integral load provided by
the kernel looks correct, it's off by a good 20 percent on each
individual core.

(http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)

Thought this information might be of some interest.


hm - what does 'hog' do, can i download hog.c from somewhere?


http://www.boblycat.org/~malc/apc/hog.c and also in
Documentation/cpu-load.txt.



the alternating balancing might be due to an uneven number of tasks
perhaps? If you have 3 tasks on 2 cores then there's no other solution
to achieve even performance of each task but to rotate them amongst the
cores.


One task, one thread. I have also tried to watch fairly demanding video
(Elephants Dream in 1920x1080/MPEG4) with mplayer, and CFS moves the
only task between cores almost every second.


P.S. How come the /proc/stat information is much closer to reality
 now? Something like what Con Kolivas suggested was added to
 sched.c?


well, precise/finegrained accounting patches have been available for
years, the thing with CFS is that there we get them 'for free', because
CFS needs those metrics for its own logic. That's why this information
is much closer to reality now. But note: right now what is affected by
the changes in the CFS patches is /proc/PID/stat (i.e. the per-task
information that 'top' and 'ps' displays, _not_ /proc/stat) - but more
accurate /proc/stat could certainly come later on too.


Aha. I see, it's just that integral load for hog is vastly improved
compared to vanilla 2.6.21 (then again some other tests are off by a few
percent (at least), though they were fine with Con's patch (which was
announced at the beginning of this thread))

--
vale


Re: [patch] sched: accurate user accounting

2007-06-14 Thread Ingo Molnar

* Vassili Karpov <[EMAIL PROTECTED]> wrote:

> Hello Ingo and others,
> 
> After reading http://lwn.net/Articles/236485/ and noticing a few 
> references to accounting i decided to give CFS a try. With 
> sched-cfs-v2.6.21.4-16 i get pretty weird results; it seems like the 
> scheduler is dead set on trying to move processes to different 
> CPUs/cores all the time. And with hog (manually tweaking the number of 
> iterations) i get fairly strange results: first of all the process is 
> split between two cores; secondly, while the integral load provided by 
> the kernel looks correct, it's off by a good 20 percent on each 
> individual core.
> 
> (http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)
> 
> Thought this information might be of some interest.

hm - what does 'hog' do, can i download hog.c from somewhere?

the alternating balancing might be due to an uneven number of tasks 
perhaps? If you have 3 tasks on 2 cores then there's no other solution 
to achieve even performance of each task but to rotate them amongst the 
cores.

> P.S. How come the /proc/stat information is much closer to reality 
>  now? Something like what Con Kolivas suggested was added to 
>  sched.c?

well, precise/finegrained accounting patches have been available for 
years, the thing with CFS is that there we get them 'for free', because 
CFS needs those metrics for its own logic. That's why this information 
is much closer to reality now. But note: right now what is affected by 
the changes in the CFS patches is /proc/PID/stat (i.e. the per-task 
information that 'top' and 'ps' displays, _not_ /proc/stat) - but more 
accurate /proc/stat could certainly come later on too.

Ingo


Re: [patch] sched: accurate user accounting

2007-06-14 Thread Vassili Karpov

Hello Ingo and others,

After reading http://lwn.net/Articles/236485/ and noticing a few
references to accounting i decided to give CFS a try. With
sched-cfs-v2.6.21.4-16 i get pretty weird results; it seems like the
scheduler is dead set on trying to move processes to different
CPUs/cores all the time. And with hog (manually tweaking the number of
iterations) i get fairly strange results: first of all the process is
split between two cores; secondly, while the integral load provided by
the kernel looks correct, it's off by a good 20 percent on each
individual core.

(http://www.boblycat.org/~malc/apc/hog-cfs-v16.png)

Thought this information might be of some interest.

P.S. How come the /proc/stat information is much closer to reality now?
 Something like what Con Kolivas suggested was added to sched.c?

--
vale


Re: [patch] sched: accurate user accounting

2007-03-28 Thread Ingo Molnar

* malc <[EMAIL PROTECTED]> wrote:

> This situation is harder to write a hog-like testcase for. Anyhow it 
> seems the difference in percentage stems from the `intr' field of 
> `/proc/stat', which fits. And the following patch (which should be applied 
> on top of yours) seems to help. I wouldn't really know what to do with 
> softirq and the rest of the counts touched by this function, so i left 
> them alone.
> 
> Comments?

i like it. FYI, i've applied your patch to -rt.

Ingo


Re: [patch] sched: accurate user accounting

2007-03-26 Thread malc

On Mon, 26 Mar 2007, Con Kolivas wrote:


On Monday 26 March 2007 09:01, Con Kolivas wrote:

On Monday 26 March 2007 03:14, malc wrote:

On Mon, 26 Mar 2007, Con Kolivas wrote:

On Monday 26 March 2007 01:19, malc wrote:

Erm... i just looked at the code and suddenly it stopped making any sense
at all:

 p->last_ran = rq->most_recent_timestamp = now;
 /* Sanity check. It should never go backwards or ruin accounting */
 if (unlikely(now < p->last_ran))
 return;
 time_diff = now - p->last_ran;

First `now' is assigned to `p->last_ran' and the very next line
compares those two values, and then the difference is taken. I quite
frankly am either very tired or fail to see the point: time_diff is
either always zero or there's always a race here.


Bah major thinko error on my part! That will teach me to post patches
untested at 1:30 am. I'll try again shortly sorry.


Ok this one is heavily tested. Please try it when you find the time.


[..snip..]

Done, works. However there's a problem with accuracy coming from a
different angle.

I have this USB video grabber and also a quite efficient way of putting
the pixels on the screen. Video is grabbed using isochronous
transfers, i.e. lots of small (on the order of 1K) chunks of data are
transferred continuously instead of in one big burst, unlike in, for
instance, a PCI setup. With your accounting change, idle from
`/proc/stat' is accurate, but unfortunately top(1)/icewm's monitor/etc.
apparently use user+sys+nice+intr+softirq+iowait to show the system
load, so system tools claim that the load is 10-12% while in reality
it is ~3%.

This situation is harder to write a hog-like testcase for. Anyhow it
seems the difference in percentage stems from the `intr' field of
`/proc/stat', which fits. And the following patch (which should be applied
on top of yours) seems to help. I wouldn't really know what to do with
softirq and the rest of the counts touched by this function, so i left them
alone.

Comments?

diff -ru linux-2.6.21-rc4/include/linux/kernel_stat.h linux-2.6.21-rc4-load/include/linux/kernel_stat.h
--- linux-2.6.21-rc4/include/linux/kernel_stat.h	2007-03-26 14:33:19.0 +0400
+++ linux-2.6.21-rc4-load/include/linux/kernel_stat.h	2007-03-26 14:06:21.0 +0400
@@ -22,6 +22,7 @@
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
+   cputime64_t irq_ns;
cputime64_t idle;
cputime64_t idle_ns;
cputime64_t iowait;
diff -ru linux-2.6.21-rc4/kernel/sched.c linux-2.6.21-rc4-load/kernel/sched.c
--- linux-2.6.21-rc4/kernel/sched.c	2007-03-26 14:33:19.0 +0400
+++ linux-2.6.21-rc4-load/kernel/sched.c	2007-03-26 14:11:36.0 +0400
@@ -3148,9 +3148,14 @@

/* Add system time to cpustat. */
tmp = cputime_to_cputime64(cputime);
-   if (hardirq_count() - hardirq_offset)
-   cpustat->irq = cputime64_add(cpustat->irq, tmp);
-   else if (softirq_count())
+   if (hardirq_count() - hardirq_offset) {
+   cpustat->irq_ns = cputime64_add(cpustat->irq_ns, tmp);
+   if (cpustat->irq_ns > JIFFY_NS) {
+   cpustat->irq_ns = cputime64_sub(cpustat->irq_ns,
+   JIFFY_NS);
+   cpustat->irq = cputime64_add(cpustat->irq, 1);
+   }
+   } else if (softirq_count())
cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
else if (p != rq->idle)
cpustat->system = cputime64_add(cpustat->system, tmp);

--
vale


Re: [patch] sched: accurate user accounting

2007-03-26 Thread Con Kolivas
On Monday 26 March 2007 15:11, Al Boldi wrote:
> Con Kolivas wrote:
> > Ok this one is heavily tested. Please try it when you find the time.
>
> It's better, but still skewed.  Try two chew.c's; they account 80% each.
>
> > ---
> > Currently we only do cpu accounting to userspace based on what is
> > actually happening precisely on each tick. The accuracy of that
> > accounting gets progressively worse the lower HZ is. As we already keep
> > accounting of nanosecond resolution we can accurately track user cpu,
> > nice cpu and idle cpu if we move the accounting to update_cpu_clock with
> > a nanosecond cpu_usage_stat entry.
>
> That's great and much needed, but this is still probed; so what's wrong
> with doing it in-lined?
>
> > This increases overhead slightly but
> > avoids the problem of tick aliasing errors making accounting unreliable.
>
> Higher scheduling accuracy may actually offset any overhead incurred, so
> it's well worth it; and if it's in-lined it should mean even less overhead.
>
> > +   /* Sanity check. It should never go backwards or ruin accounting */
> > +   if (unlikely(now < p->last_ran))
> > +   goto out_set;
>
> If sched_clock() goes backwards, why not fix it, instead of hacking around
> it?
>
>
> Thanks!

Actually I'm going to give up this idea as not worth my effort given the 
sched_clock fsckage that seems to cause so much grief. If someone else wants 
to take up the challenge feel free to.

Andrew please drop this patch. It's still broken and I have too much on my 
plate to try and debug it sorry.

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Mike Galbraith
On Mon, 2007-03-26 at 08:11 +0300, Al Boldi wrote:

> > +   /* Sanity check. It should never go backwards or ruin accounting */
> > +   if (unlikely(now < p->last_ran))
> > +   goto out_set;
> 
> If sched_clock() goes backwards, why not fix it, instead of hacking around 
> it?

When tasks change cpu, timestamp correction is attempted, but it isn't
perfect (perfection: submit a patch to read the remote clock - people would
likely toast the poor submitter's buns very thoroughly). Timewarps happen,
and even on Intel processors there seems to be a very small difference
between TSCs in the same package. (that's what instrumentation here said
must be true, possible booboos aside)

-Mike



Re: [patch] sched: accurate user accounting

2007-03-25 Thread Al Boldi
Con Kolivas wrote:
>
> Ok this one is heavily tested. Please try it when you find the time.

It's better, but still skewed.  Try two chew.c's; they account 80% each.

> ---
> Currently we only do cpu accounting to userspace based on what is
> actually happening precisely on each tick. The accuracy of that
> accounting gets progressively worse the lower HZ is. As we already keep
> accounting of nanosecond resolution we can accurately track user cpu,
> nice cpu and idle cpu if we move the accounting to update_cpu_clock with
> a nanosecond cpu_usage_stat entry.

That's great and much needed, but this is still probed; so what's wrong with 
doing it in-lined?

> This increases overhead slightly but
> avoids the problem of tick aliasing errors making accounting unreliable.

Higher scheduling accuracy may actually offset any overhead incurred, so it's 
well worth it; and if it's in-lined it should mean even less overhead.

> +   /* Sanity check. It should never go backwards or ruin accounting */
> +   if (unlikely(now < p->last_ran))
> +   goto out_set;

If sched_clock() goes backwards, why not fix it, instead of hacking around 
it?


Thanks!

--
Al



Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 09:01, Con Kolivas wrote:
> On Monday 26 March 2007 03:14, malc wrote:
> > On Mon, 26 Mar 2007, Con Kolivas wrote:
> > > On Monday 26 March 2007 01:19, malc wrote:
> > Erm... i just looked at the code and suddenly it stopped making any sense
> > at all:
> >
> >  p->last_ran = rq->most_recent_timestamp = now;
> >  /* Sanity check. It should never go backwards or ruin accounting */
> >  if (unlikely(now < p->last_ran))
> >  return;
> >  time_diff = now - p->last_ran;
> >
> > First `now' is assigned to `p->last_ran' and the very next line
> > compares those two values, and then the difference is taken.. I quite
> > frankly am either very tired or fail to see the point.. time_diff is
> > either always zero or there's always a race here.
>
> Bah major thinko error on my part! That will teach me to post patches
> untested at 1:30 am. I'll try again shortly sorry.

Ok this one is heavily tested. Please try it when you find the time.

---
Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Remove the now defunct Documentation/cpu-load.txt file.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 Documentation/cpu-load.txt  |  113 
 include/linux/kernel_stat.h |3 +
 include/linux/sched.h   |2 
 kernel/sched.c  |   58 +-
 kernel/timer.c  |5 -
 5 files changed, 60 insertions(+), 121 deletions(-)

Index: linux-2.6.21-rc4-acct/include/linux/kernel_stat.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/kernel_stat.h  2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/kernel_stat.h   2007-03-26 
00:56:25.0 +1000
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux-2.6.21-rc4-acct/include/linux/sched.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/sched.h2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/sched.h 2007-03-26 00:57:01.0 
+1000
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux-2.6.21-rc4-acct/kernel/sched.c
===
--- linux-2.6.21-rc4-acct.orig/kernel/sched.c   2007-03-26 00:56:05.0 
+1000
+++ linux-2.6.21-rc4-acct/kernel/sched.c2007-03-26 09:38:50.0 
+1000
@@ -89,6 +89,7 @@ unsigned long long __attribute__((weak))
  */
 #define NS_TO_JIFFIES(TIME)	((TIME) / (1000000000 / HZ))
 #define JIFFIES_TO_NS(TIME)	((TIME) * (1000000000 / HZ))
+#define JIFFY_NS   JIFFIES_TO_NS(1)
 
 /*
  * These are the 'tuning knobs' of the scheduler:
@@ -3017,8 +3018,59 @@ EXPORT_PER_CPU_SYMBOL(kstat);
 static inline void
 update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
 {
-   p->sched_time += now - p->last_ran;
-   p->last_ran = rq->most_recent_timestamp = now;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff;
+
+   /* Sanity check. It should never go backwards or ruin accounting */
+   if (unlikely(now < p->last_ran))
+   goto out_set;
+   /* All the userspace visible cpu accounting is done here */
+   time_diff = now - p->last_ran;
+   p->sched_time += time_diff;
+   if (p != rq->idle) {
+   cputime_t utime_diff = time_diff;
+
+   if (TASK_NICE(p) > 0) {
+   cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
+time_diff);
+   if (cpustat->nice_ns > JIFFY_NS) {
+   cpustat->nice_ns =
+   cputime64_sub(cpustat->nice_ns,
+
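
The hunk is cut off mid-expression in the archive, but the technique it applies
to the nice/user/idle counters is visible above: accumulate nanoseconds on every
update and carve whole jiffies off into the tick-based counter once a full
jiffy's worth has built up. A user-space sketch of that fold (plain C stand-ins
for the cputime64 helpers; `struct cpu_stat` and `account_nice_ns` are
illustrative names, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

#define HZ       250
#define JIFFY_NS (1000000000ULL / HZ)

struct cpu_stat {
	uint64_t nice;		/* in jiffies, as /proc/stat reports it */
	uint64_t nice_ns;	/* sub-jiffy nanosecond remainder */
};

/* Add a nanosecond delta, then fold any whole jiffies accumulated in
 * nice_ns out into the jiffy-resolution nice counter. */
static void account_nice_ns(struct cpu_stat *s, uint64_t time_diff)
{
	s->nice_ns += time_diff;
	while (s->nice_ns >= JIFFY_NS) {
		s->nice_ns -= JIFFY_NS;
		s->nice++;
	}
}
```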

Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 03:14, malc wrote:
> On Mon, 26 Mar 2007, Con Kolivas wrote:
> > On Monday 26 March 2007 01:19, malc wrote:
> >> On Mon, 26 Mar 2007, Con Kolivas wrote:
> >>> So before we go any further with this patch, can you try the following
> >>> one and see if this simple sanity check is enough?
> >>
> >> Sure (compiling the kernel now); too bad the old axiom that testing
> >> cannot confirm the absence of bugs holds.
> >>
> >> I have one nit and one request for clarification. Question first (i
> >> admit i haven't looked at the surroundings of the patch; maybe things
> >> would have been self-evident if i did):
> >>
> >> What this patch amounts to is that the accounting logic is moved from
> >> timer interrupt to the place where scheduler switches task (or something
> >> to that effect)?
> >
> > Both the scheduler tick and context switch now. So yes it adds overhead
> > as I said, although we already do update_cpu_clock on context switch, but
> > it's not this complex.
> >
> >> [..snip..]
> >>
> >>>  * These are the 'tuning knobs' of the scheduler:
> >>> @@ -3017,8 +3018,53 @@ EXPORT_PER_CPU_SYMBOL(kstat);
> >>> static inline void
> >>> update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long
> >>> long now) {
> >>> - p->sched_time += now - p->last_ran;
> >>> + struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
> >>> + cputime64_t time_diff;
> >>> +
> >>>   p->last_ran = rq->most_recent_timestamp = now;
> >>> + /* Sanity check. It should never go backwards or ruin accounting */
> >>> + if (unlikely(now < p->last_ran))
> >>> + return;
> >>> + time_diff = now - p->last_ran;
> >>
> >> A nit. Anything wrong with:
> >>
> >> time_diff = now - p->last_ran;
> >> if (unlikely(LESS_THAN_ZERO(time_diff)))
> >>  return;
> >
> > Does LESS_THAN_ZERO work on a cputime64_t on all arches? I can't figure
> > that out just by looking myself which is why I did it the other way.
>
> I have no idea what type cputime64_t really is, so used this imaginary
> LESS_THAN_ZERO thing.
>
> Erm... i just looked at the code and suddenly it stopped making any sense
> at all:
>
>  p->last_ran = rq->most_recent_timestamp = now;
>  /* Sanity check. It should never go backwards or ruin accounting */
>  if (unlikely(now < p->last_ran))
>  return;
>  time_diff = now - p->last_ran;
>
> First `now' is assigned to `p->last_ran' and the very next line
> compares those two values, and then the difference is taken.. I quite
> frankly am either very tired or fail to see the point.. time_diff is
> either always zero or there's always a race here.

Bah major thinko error on my part! That will teach me to post patches untested 
at 1:30 am. I'll try again shortly sorry.

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Mon, 26 Mar 2007, Con Kolivas wrote:


On Monday 26 March 2007 01:19, malc wrote:

On Mon, 26 Mar 2007, Con Kolivas wrote:

So before we go any further with this patch, can you try the following
one and see if this simple sanity check is enough?


Sure (compiling the kernel now); too bad the old axiom that testing cannot
confirm the absence of bugs holds.

I have one nit and one request for clarification. Question first (i
admit i haven't looked at the surroundings of the patch; maybe things
would have been self-evident if i did):

What this patch amounts to is that the accounting logic is moved from
timer interrupt to the place where scheduler switches task (or something
to that effect)?


Both the scheduler tick and context switch now. So yes it adds overhead as I
said, although we already do update_cpu_clock on context switch, but it's not
this complex.


[..snip..]


 * These are the 'tuning knobs' of the scheduler:
@@ -3017,8 +3018,53 @@ EXPORT_PER_CPU_SYMBOL(kstat);
static inline void
update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long
now) {
-   p->sched_time += now - p->last_ran;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff;
+
p->last_ran = rq->most_recent_timestamp = now;
+   /* Sanity check. It should never go backwards or ruin accounting */
+   if (unlikely(now < p->last_ran))
+   return;
+   time_diff = now - p->last_ran;


A nit. Anything wrong with:

time_diff = now - p->last_ran;
if (unlikely(LESS_THAN_ZERO(time_diff)))
 return;


Does LESS_THAN_ZERO work on a cputime64_t on all arches? I can't figure that
out just by looking myself which is why I did it the other way.


I have no idea what type cputime64_t really is, so used this imaginary
LESS_THAN_ZERO thing.

Erm... i just looked at the code and suddenly it stopped making any sense
at all:

p->last_ran = rq->most_recent_timestamp = now;
/* Sanity check. It should never go backwards or ruin accounting */
if (unlikely(now < p->last_ran))
return;
time_diff = now - p->last_ran;

First `now' is assigned to `p->last_ran' and the very next line
compares those two values, and then the difference is taken.. I quite
frankly am either very tired or fail to see the point.. time_diff is
either always zero or there's always a race here.

--
vale


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 01:19, malc wrote:
> On Mon, 26 Mar 2007, Con Kolivas wrote:
> > So before we go any further with this patch, can you try the following
> > one and see if this simple sanity check is enough?
>
> Sure (compiling the kernel now); too bad the old axiom that testing cannot
> confirm the absence of bugs holds.
>
> I have one nit and one request for clarification. Question first (i
> admit i haven't looked at the surroundings of the patch; maybe things
> would have been self-evident if i did):
>
> What this patch amounts to is that the accounting logic is moved from
> timer interrupt to the place where scheduler switches task (or something
> to that effect)?

Both the scheduler tick and context switch now. So yes it adds overhead as I 
said, although we already do update_cpu_clock on context switch, but it's not 
this complex.

> [..snip..]
>
> >  * These are the 'tuning knobs' of the scheduler:
> > @@ -3017,8 +3018,53 @@ EXPORT_PER_CPU_SYMBOL(kstat);
> > static inline void
> > update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long
> > now) {
> > -   p->sched_time += now - p->last_ran;
> > +   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
> > +   cputime64_t time_diff;
> > +
> > p->last_ran = rq->most_recent_timestamp = now;
> > +   /* Sanity check. It should never go backwards or ruin accounting */
> > +   if (unlikely(now < p->last_ran))
> > +   return;
> > +   time_diff = now - p->last_ran;
>
> A nit. Anything wrong with:
>
> time_diff = now - p->last_ran;
> if (unlikely(LESS_THAN_ZERO(time_diff)))
>  return;

Does LESS_THAN_ZERO work on a cputime64_t on all arches? I can't figure that 
out just by looking myself which is why I did it the other way.
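
The underlying portability worry can be made concrete: if cputime64_t is an
unsigned 64-bit type (as it is on common configs), a "negative" difference
wraps to a huge positive value, so a sign test after the subtraction can never
fire. Comparing the operands before subtracting, as the patch does, is the
safe form. A small user-space illustration (function names are made up for the
example):

```c
#include <assert.h>
#include <stdint.h>

/* Subtract first, test sign later: on an unsigned type the difference
 * wraps instead of going negative, so the test is useless. */
static uint64_t naive_diff(uint64_t now, uint64_t last_ran)
{
	return now - last_ran;		/* wraps if now < last_ran */
}

/* The portable check: compare the operands before subtracting. */
static int went_backwards(uint64_t now, uint64_t last_ran)
{
	return now < last_ran;
}
```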

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Mon, 26 Mar 2007, Con Kolivas wrote:


On Monday 26 March 2007 00:57, malc wrote:

On Mon, 26 Mar 2007, Con Kolivas wrote:

On Sunday 25 March 2007 23:06, malc wrote:

On Sun, 25 Mar 2007, Con Kolivas wrote:

On Sunday 25 March 2007 21:46, Con Kolivas wrote:

On Sunday 25 March 2007 21:34, malc wrote:

On Sun, 25 Mar 2007, Ingo Molnar wrote:

* Con Kolivas <[EMAIL PROTECTED]> wrote:

For an rsdl 0.33 patched kernel. Comments? Overhead worth it?


[..snip..]


---
Currently we only do cpu accounting to userspace based on what is
actually happening precisely on each tick. The accuracy of that
accounting gets progressively worse the lower HZ is. As we already keep
accounting of nanosecond resolution we can accurately track user cpu,
nice cpu and idle cpu if we move the accounting to update_cpu_clock
with a nanosecond cpu_usage_stat entry. This increases overhead
slightly but avoids the problem of tick aliasing errors making
accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>


[..snip..]

Forgot to mention. Given that this goes into the kernel, shouldn't
Documentation/cpu-load.txt be amended/removed?


Yes that's a good idea. Also there should be a sanity check because
sometimes, for some reason no one's been able to explain to me, sched_clock
gives a value which doesn't make sense (time appears to have gone
backwards) and that will completely ruin the accounting from then on.


After running this new kernel for a while i guess i have hit this issue:
http://www.boblycat.org/~malc/apc/bad-load.png

Top and icewm's monitor do show incredibly huge load while in reality
nothing like that is really happening. Both ad-hoc and `/proc/stat' (idle)
show normal CPU utilization (7% since i'm doing some A/V stuff in the
background)


Yes I'd say you hit the problem I described earlier. When playing with
sched_clock() I found it gave some "interesting" results fairly infrequently.
They could lead to ridiculous accounting mistakes.

So before we go any further with this patch, can you try the following one and
see if this simple sanity check is enough?


Sure (compiling the kernel now); too bad the old axiom that testing cannot
confirm the absence of bugs holds.

I have one nit and one request for clarification. Question first (i
admit i haven't looked at the surroundings of the patch; maybe things
would have been self-evident if i did):

What this patch amounts to is that the accounting logic is moved from
timer interrupt to the place where scheduler switches task (or something
to that effect)?

[..snip..]


 * These are the 'tuning knobs' of the scheduler:
@@ -3017,8 +3018,53 @@ EXPORT_PER_CPU_SYMBOL(kstat);
static inline void
update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
{
-   p->sched_time += now - p->last_ran;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff;
+
p->last_ran = rq->most_recent_timestamp = now;
+   /* Sanity check. It should never go backwards or ruin accounting */
+   if (unlikely(now < p->last_ran))
+   return;
+   time_diff = now - p->last_ran;


A nit. Anything wrong with:

time_diff = now - p->last_ran;
if (unlikely(LESS_THAN_ZERO(time_diff)))
return;

[..snip..]

--
vale


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Monday 26 March 2007 00:57, malc wrote:
> On Mon, 26 Mar 2007, Con Kolivas wrote:
> > On Sunday 25 March 2007 23:06, malc wrote:
> >> On Sun, 25 Mar 2007, Con Kolivas wrote:
> >>> On Sunday 25 March 2007 21:46, Con Kolivas wrote:
>  On Sunday 25 March 2007 21:34, malc wrote:
> > On Sun, 25 Mar 2007, Ingo Molnar wrote:
> >> * Con Kolivas <[EMAIL PROTECTED]> wrote:
> >>> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
> >>
> >> [..snip..]
> >>
> >>> ---
> >>> Currently we only do cpu accounting to userspace based on what is
> >>> actually happening precisely on each tick. The accuracy of that
> >>> accounting gets progressively worse the lower HZ is. As we already keep
> >>> accounting of nanosecond resolution we can accurately track user cpu,
> >>> nice cpu and idle cpu if we move the accounting to update_cpu_clock
> >>> with a nanosecond cpu_usage_stat entry. This increases overhead
> >>> slightly but avoids the problem of tick aliasing errors making
> >>> accounting unreliable.
> >>>
> >>> Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
> >>> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> >>
> >> [..snip..]
> >>
> >> Forgot to mention. Given that this goes into the kernel, shouldn't
> >> Documentation/cpu-load.txt be amended/removed?
> >
> > Yes that's a good idea. Also there should be a sanity check because
> > sometimes, for some reason no one's been able to explain to me, sched_clock
> > gives a value which doesn't make sense (time appears to have gone
> > backwards) and that will completely ruin the accounting from then on.
>
> After running this new kernel for a while i guess i have hit this issue:
> http://www.boblycat.org/~malc/apc/bad-load.png
>
> Top and icewm's monitor do show incredibly huge load while in reality
> nothing like that is really happening. Both ad-hoc and `/proc/stat' (idle)
> show normal CPU utilization (7% since i'm doing some A/V stuff in the
> background)

Yes I'd say you hit the problem I described earlier. When playing with
sched_clock() I found it gave some "interesting" results fairly infrequently.
They could lead to ridiculous accounting mistakes.

So before we go any further with this patch, can you try the following one and 
see if this simple sanity check is enough?

Thanks!

---
Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Remove the now defunct Documentation/cpu-load.txt file.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
---
 Documentation/cpu-load.txt  |  113 
 include/linux/kernel_stat.h |3 +
 include/linux/sched.h   |2 
 kernel/sched.c  |   50 ++-
 kernel/timer.c  |5 -
 5 files changed, 53 insertions(+), 120 deletions(-)

Index: linux-2.6.21-rc4-acct/include/linux/kernel_stat.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/kernel_stat.h  2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/kernel_stat.h   2007-03-26 
00:56:25.0 +1000
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux-2.6.21-rc4-acct/include/linux/sched.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/sched.h2007-03-26 
00:56:05.0 +1000
+++ linux-2.6.21-rc4-acct/include/linux/sched.h 2007-03-26 00:57:01.0 
+1000
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux-2.6.21-rc4-acct/kernel/sched.c
===
--- linux-2.6.21-rc4-acct.orig/kernel/sched.c   2007-03-26 00:56:05.0 
+1000
+++ linux-2.6.21-rc4-acct/kernel/sched.c2007-03-26 01:01:22.0 
+1000
@@ -89,6 +89,7 @@ unsigned long long __attribute__((weak))
  */
 #define NS_TO_JIFFIES(TIME)	((TIME) / (1000000000 / HZ))

Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Mon, 26 Mar 2007, Con Kolivas wrote:


On Sunday 25 March 2007 23:06, malc wrote:

On Sun, 25 Mar 2007, Con Kolivas wrote:

On Sunday 25 March 2007 21:46, Con Kolivas wrote:

On Sunday 25 March 2007 21:34, malc wrote:

On Sun, 25 Mar 2007, Ingo Molnar wrote:

* Con Kolivas <[EMAIL PROTECTED]> wrote:

For an rsdl 0.33 patched kernel. Comments? Overhead worth it?


[..snip..]


---
Currently we only do cpu accounting to userspace based on what is
actually happening precisely on each tick. The accuracy of that
accounting gets progressively worse the lower HZ is. As we already keep
accounting of nanosecond resolution we can accurately track user cpu,
nice cpu and idle cpu if we move the accounting to update_cpu_clock with
a nanosecond cpu_usage_stat entry. This increases overhead slightly but
avoids the problem of tick aliasing errors making accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>


[..snip..]

Forgot to mention. Given that this goes into the kernel, shouldn't
Documentation/cpu-load.txt be amended/removed?


Yes that's a good idea. Also there should be a sanity check because sometimes,
for some reason no one's been able to explain to me, sched_clock gives a value
which doesn't make sense (time appears to have gone backwards) and that will
completely ruin the accounting from then on.


After running this new kernel for a while i guess i have hit this issue:
http://www.boblycat.org/~malc/apc/bad-load.png

Top and icewm's monitor do show incredibly huge load while in reality 
nothing like that is really happening. Both ad-hoc and `/proc/stat' (idle)
show normal CPU utilization (7% since i'm doing some A/V stuff in the
background)

--
vale


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Sunday 25 March 2007 23:06, malc wrote:
> On Sun, 25 Mar 2007, Con Kolivas wrote:
> > On Sunday 25 March 2007 21:46, Con Kolivas wrote:
> >> On Sunday 25 March 2007 21:34, malc wrote:
> >>> On Sun, 25 Mar 2007, Ingo Molnar wrote:
>  * Con Kolivas <[EMAIL PROTECTED]> wrote:
> > For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
>
> [..snip..]
>
> > ---
> > Currently we only do cpu accounting to userspace based on what is
> > actually happening precisely on each tick. The accuracy of that
> > accounting gets progressively worse the lower HZ is. As we already keep
> > accounting of nanosecond resolution we can accurately track user cpu,
> > nice cpu and idle cpu if we move the accounting to update_cpu_clock with
> > a nanosecond cpu_usage_stat entry. This increases overhead slightly but
> > avoids the problem of tick aliasing errors making accounting unreliable.
> >
> > Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
> > Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
>
> [..snip..]
>
> Forgot to mention. Given that this goes into the kernel, shouldn't
> Documentation/cpu-load.txt be amended/removed?

Yes that's a good idea. Also there should be a sanity check because sometimes, 
for some reason no one's been able to explain to me, sched_clock gives a value 
which doesn't make sense (time appears to have gone backwards) and that will 
completely ruin the accounting from then on. 

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Gene Heskett
On Sunday 25 March 2007, Con Kolivas wrote:
>On Sunday 25 March 2007 22:32, Gene Heskett wrote:
>> On Sunday 25 March 2007, Con Kolivas wrote:
>> >On Sunday 25 March 2007 21:46, Con Kolivas wrote:
>> >> On Sunday 25 March 2007 21:34, malc wrote:
>> >> > On Sun, 25 Mar 2007, Ingo Molnar wrote:
>> >> > > * Con Kolivas <[EMAIL PROTECTED]> wrote:
>> >> > >> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
>> >> > >
>> >> > > we want to do this - and we should do this to the vanilla
>> >> > > scheduler first and check the results. I've back-merged the
>> >> > > patch to before RSDL and have tested it - find the patch below.
>> >> > > Vale, could you try this patch against a 2.6.21-rc4-ish kernel
>> >> > > and re-test your testcase?
>> >> >
>> >> > [..snip..]
>> >> >
>> >> > Compilation failed with:
>> >> > kernel/built-in.o(.sched.text+0x564): more undefined references
>> >> > to `__udivdi3' follow
>> >> >
>> >> > $ gcc --version | head -1
>> >> > gcc (GCC) 3.4.6
>> >> >
>> >> > $ cat /proc/cpuinfo | grep cpu
>> >> > cpu : 7447A, altivec supported
>> >> >
>> >> > Can't say i really understand why 64bit arithmetics suddenly
>> >> > became an issue here.
>> >>
>> >> Probably due to use of:
>> >>
>> >> #define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
>> >> #define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
>> >>
>> >> Excuse our 64bit world while we strive to correct our 32bit
>> >> blindness and fix this bug.
>> >
>> >Please try this (akpm please don't include till we confirm it builds
>> > on ppc, sorry). For 2.6.21-rc4
>> >
>> >---
>> >Currently we only do cpu accounting to userspace based on what is
>> >actually happening precisely on each tick. The accuracy of that
>> >accounting gets progressively worse the lower HZ is. As we already
>> > keep accounting of nanosecond resolution we can accurately track
>> > user cpu, nice cpu and idle cpu if we move the accounting to
>> > update_cpu_clock with a nanosecond cpu_usage_stat entry. This
>> > increases overhead slightly but avoids the problem of tick aliasing
>> > errors making accounting unreliable.
>>
>> I'm playing again because the final 2.6.20.4 does NOT break amanda,
>> where 2.6.20.4-rc1 did.
>
>Yes only the original version I posted on this email thread was for an
> RSDL 0.33 patched kernel. That original patch should build fine on i386
> and x86_64 (where I tried it). This version I sent out following Ingo's
> lead has 2.6.21-rc4 in mind (without rsdl).

And since I have an amdump session running in the background using the 
first 2.6.20.4-rdsl-0.33 patch I got from your web page about an hour 
ago, I am very pleased with this; I still have a usable machine, as gzip is 
being restrained to a max of about 83% of the cpu.  And according to my 
grepping of the /tmp/amanda-dbg/client/Daily/sendsize* files, this patch 
does not break amanda.

Also, to Ingo Molnar, I've removed that sched-batch thing from my scripts 
as this feels considerably smoother in comparison.

Thank you very much, Con, and I hope you are improving.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Aberdeen was so small that when the family with the car went
on vacation, the gas station and drive-in theatre had to close.


Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Sun, 25 Mar 2007, Con Kolivas wrote:


On Sunday 25 March 2007 21:46, Con Kolivas wrote:

On Sunday 25 March 2007 21:34, malc wrote:

On Sun, 25 Mar 2007, Ingo Molnar wrote:

* Con Kolivas <[EMAIL PROTECTED]> wrote:

For an rsdl 0.33 patched kernel. Comments? Overhead worth it?


[..snip..]



---
Currently we only do cpu accounting to userspace based on what is
actually happening precisely on each tick. The accuracy of that
accounting gets progressively worse the lower HZ is. As we already keep
accounting of nanosecond resolution we can accurately track user cpu,
nice cpu and idle cpu if we move the accounting to update_cpu_clock with
a nanosecond cpu_usage_stat entry. This increases overhead slightly but
avoids the problem of tick aliasing errors making accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>


[..snip..]

Forgot to mention. Given that this goes into the kernel, shouldn't
Documentation/cpu-load.txt be amended/removed?

--
vale


Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Sun, 25 Mar 2007, Con Kolivas wrote:


On Sunday 25 March 2007 21:46, Con Kolivas wrote:

On Sunday 25 March 2007 21:34, malc wrote:

On Sun, 25 Mar 2007, Ingo Molnar wrote:

* Con Kolivas <[EMAIL PROTECTED]> wrote:

For an rsdl 0.33 patched kernel. Comments? Overhead worth it?


we want to do this - and we should do this to the vanilla scheduler
first and check the results. I've back-merged the patch to before RSDL
and have tested it - find the patch below. Vale, could you try this
patch against a 2.6.21-rc4-ish kernel and re-test your testcase?


[..snip..]

Compilation failed with:
kernel/built-in.o(.sched.text+0x564): more undefined references to
`__udivdi3' follow

$ gcc --version | head -1
gcc (GCC) 3.4.6

$ cat /proc/cpuinfo | grep cpu
cpu : 7447A, altivec supported

Can't say i really understand why 64bit arithmetics suddenly became an
issue here.


Probably due to use of:

#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
#define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))

Excuse our 64bit world while we strive to correct our 32bit blindness and
fix this bug.
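
The failure mode, for reference: on a 32-bit target gcc compiles a plain
64-bit `/` into a call to libgcc's `__udivdi3`, which the kernel does not link
against, hence the undefined references above. The kernel's usual workaround
is `do_div()`. A user-space stand-in sketch of that contract (`my_do_div` and
`ns_to_jiffies` are hypothetical names mimicking do_div's quotient-in-place,
remainder-returned behaviour; this is an illustration, not the actual fix):

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's do_div(): divide *n in place by
 * a 32-bit base and return the remainder, as do_div() does. */
static uint32_t my_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* NS_TO_JIFFIES expressed via the do_div-style helper rather than a
 * bare 64-bit '/' operator. */
static uint64_t ns_to_jiffies(uint64_t ns, uint32_t hz)
{
	uint64_t v = ns;
	my_do_div(&v, 1000000000u / hz);
	return v;
}
```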


Please try this (akpm please don't include till we confirm it builds on ppc,
sorry). For 2.6.21-rc4



Works. Accuracy is on par (most of the time within a fraction of a
percent) with the ad-hoc method my kernel module employs. However, there's
something strange happening w.r.t. iowait: it looks as if it doesn't
affect the idle field any more. Nothing big, but a departure from the
previous behavior. Thank you very much for this.

--
vale


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Sunday 25 March 2007 22:32, Gene Heskett wrote:
> On Sunday 25 March 2007, Con Kolivas wrote:
> >On Sunday 25 March 2007 21:46, Con Kolivas wrote:
> >> On Sunday 25 March 2007 21:34, malc wrote:
> >> > On Sun, 25 Mar 2007, Ingo Molnar wrote:
> >> > > * Con Kolivas <[EMAIL PROTECTED]> wrote:
> >> > >> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
> >> > >
> >> > > we want to do this - and we should do this to the vanilla
> >> > > scheduler first and check the results. I've back-merged the patch
> >> > > to before RSDL and have tested it - find the patch below. Vale,
> >> > > could you try this patch against a 2.6.21-rc4-ish kernel and
> >> > > re-test your testcase?
> >> >
> >> > [..snip..]
> >> >
> >> > Compilation failed with:
> >> > kernel/built-in.o(.sched.text+0x564): more undefined references to
> >> > `__udivdi3' follow
> >> >
> >> > $ gcc --version | head -1
> >> > gcc (GCC) 3.4.6
> >> >
> >> > $ cat /proc/cpuinfo | grep cpu
> >> > cpu : 7447A, altivec supported
> >> >
> >> > Can't say i really understand why 64bit arithmetics suddenly became
> >> > an issue here.
> >>
> >> Probably due to use of:
> >>
> >> #define NS_TO_JIFFIES(TIME)((TIME) / (1000000000 / HZ))
> >> #define JIFFIES_TO_NS(TIME)((TIME) * (1000000000 / HZ))
> >>
> >> Excuse our 64bit world while we strive to correct our 32bit blindness
> >> and fix this bug.
> >
> >Please try this (akpm please don't include till we confirm it builds on
> > ppc, sorry). For 2.6.21-rc4
> >
> >---
> >Currently we only do cpu accounting to userspace based on what is
> >actually happening precisely on each tick. The accuracy of that
> >accounting gets progressively worse the lower HZ is. As we already keep
> >accounting of nanosecond resolution we can accurately track user cpu,
> >nice cpu and idle cpu if we move the accounting to update_cpu_clock with
> >a nanosecond cpu_usage_stat entry. This increases overhead slightly but
> >avoids the problem of tick aliasing errors making accounting unreliable.

>
> I'm playing again because the final 2.6.20.4 does NOT break amanda, where
> 2.6.20.4-rc1 did.

Yes only the original version I posted on this email thread was for an RSDL 
0.33 patched kernel. That original patch should build fine on i386 and x86_64 
(where I tried it). This version I sent out following Ingo's lead has 
2.6.21-rc4 in mind (without rsdl).

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Gene Heskett
On Sunday 25 March 2007, Con Kolivas wrote:
>On Sunday 25 March 2007 21:46, Con Kolivas wrote:
>> On Sunday 25 March 2007 21:34, malc wrote:
>> > On Sun, 25 Mar 2007, Ingo Molnar wrote:
>> > > * Con Kolivas <[EMAIL PROTECTED]> wrote:
>> > >> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
>> > >
>> > > we want to do this - and we should do this to the vanilla
>> > > scheduler first and check the results. I've back-merged the patch
>> > > to before RSDL and have tested it - find the patch below. Vale,
>> > > could you try this patch against a 2.6.21-rc4-ish kernel and
>> > > re-test your testcase?
>> >
>> > [..snip..]
>> >
>> > Compilation failed with:
>> > kernel/built-in.o(.sched.text+0x564): more undefined references to
>> > `__udivdi3' follow
>> >
>> > $ gcc --version | head -1
>> > gcc (GCC) 3.4.6
>> >
>> > $ cat /proc/cpuinfo | grep cpu
>> > cpu : 7447A, altivec supported
>> >
>> > Can't say i really understand why 64bit arithmetics suddenly became
>> > an issue here.
>>
>> Probably due to use of:
>>
>> #define NS_TO_JIFFIES(TIME)  ((TIME) / (1000000000 / HZ))
>> #define JIFFIES_TO_NS(TIME)  ((TIME) * (1000000000 / HZ))
>>
>> Excuse our 64bit world while we strive to correct our 32bit blindness
>> and fix this bug.
>
>Please try this (akpm please don't include till we confirm it builds on
> ppc, sorry). For 2.6.21-rc4
>
>---
>Currently we only do cpu accounting to userspace based on what is
>actually happening precisely on each tick. The accuracy of that
>accounting gets progressively worse the lower HZ is. As we already keep
>accounting of nanosecond resolution we can accurately track user cpu,
>nice cpu and idle cpu if we move the accounting to update_cpu_clock with
>a nanosecond cpu_usage_stat entry. This increases overhead slightly but
>avoids the problem of tick aliasing errors making accounting unreliable.
>
>Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
>Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
>---
> include/linux/kernel_stat.h |3 ++
> include/linux/sched.h   |2 -
> kernel/sched.c  |   46 +---
> kernel/timer.c  |5 +---
> 4 files changed, 49 insertions(+), 7 deletions(-)
>
>Index: linux-2.6.21-rc4-acct/include/linux/kernel_stat.h
>===
>--- linux-2.6.21-rc4-acct.orig/include/linux/kernel_stat.h	2006-09-21 19:54:58.000000000 +1000
>+++ linux-2.6.21-rc4-acct/include/linux/kernel_stat.h	2007-03-25 21:51:49.000000000 +1000
>@@ -16,11 +16,14 @@
>
> struct cpu_usage_stat {
>   cputime64_t user;
>+  cputime64_t user_ns;
>   cputime64_t nice;
>+  cputime64_t nice_ns;
>   cputime64_t system;
>   cputime64_t softirq;
>   cputime64_t irq;
>   cputime64_t idle;
>+  cputime64_t idle_ns;
>   cputime64_t iowait;
>   cputime64_t steal;
> };
>Index: linux-2.6.21-rc4-acct/include/linux/sched.h
>===
>--- linux-2.6.21-rc4-acct.orig/include/linux/sched.h	2007-03-21 12:53:00.000000000 +1100
>+++ linux-2.6.21-rc4-acct/include/linux/sched.h	2007-03-25 21:51:49.000000000 +1000
>@@ -882,7 +882,7 @@ struct task_struct {
>   int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
>
>   unsigned long rt_priority;
>-  cputime_t utime, stime;
>+  cputime_t utime, utime_ns, stime;
>   unsigned long nvcsw, nivcsw; /* context switch counts */
>   struct timespec start_time;
> /* mm fault and swap info: this can arguably be seen as either
> mm-specific or thread-specific */
>Index: linux-2.6.21-rc4-acct/kernel/sched.c
>===
This, hunk 1, did not apply as the #defines aren't there after applying 
0.33 to a 2.6.20.4 kernel tree.

>--- linux-2.6.21-rc4-acct.orig/kernel/sched.c	2007-03-21 12:53:00.000000000 +1100
>+++ linux-2.6.21-rc4-acct/kernel/sched.c	2007-03-25 21:58:27.000000000 +1000
>@@ -89,6 +89,7 @@ unsigned long long __attribute__((weak))
>  */
> #define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
> #define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
>+#define JIFFY_NS  JIFFIES_TO_NS(1)
>
> /*
>  * These are the 'tuning knobs' of the scheduler:
>@@ -3017,8 +3018,49 @@ EXPORT_PER_CPU_SYMBOL(kstat);
> static inline void
> update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long
> long now) {
>-  p->sched_time += now - p->last_ran;
>+  struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
>+  cputime64_t time_diff = now - p->last_ran;
>+
>+  p->sched_time += time_diff;
>   p->last_ran = rq->most_recent_timestamp = now;
>+  if (p != rq->idle) {
>+  cputime_t utime_diff = time_diff;
>+
>+  if (TASK_NICE(p) > 0) {
>+  cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
>+

Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Sunday 25 March 2007 21:46, Con Kolivas wrote:
> On Sunday 25 March 2007 21:34, malc wrote:
> > On Sun, 25 Mar 2007, Ingo Molnar wrote:
> > > * Con Kolivas <[EMAIL PROTECTED]> wrote:
> > >> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
> > >
> > > we want to do this - and we should do this to the vanilla scheduler
> > > first and check the results. I've back-merged the patch to before RSDL
> > > and have tested it - find the patch below. Vale, could you try this
> > > patch against a 2.6.21-rc4-ish kernel and re-test your testcase?
> >
> > [..snip..]
> >
> > Compilation failed with:
> > kernel/built-in.o(.sched.text+0x564): more undefined references to
> > `__udivdi3' follow
> >
> > $ gcc --version | head -1
> > gcc (GCC) 3.4.6
> >
> > $ cat /proc/cpuinfo | grep cpu
> > cpu : 7447A, altivec supported
> >
> > Can't say i really understand why 64bit arithmetics suddenly became an
> > issue here.
>
> Probably due to use of:
>
> #define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
> #define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
>
> Excuse our 64bit world while we strive to correct our 32bit blindness and
> fix this bug.

Please try this (akpm please don't include till we confirm it builds on ppc,
sorry). For 2.6.21-rc4

---
Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/kernel_stat.h |3 ++
 include/linux/sched.h   |2 -
 kernel/sched.c  |   46 +---
 kernel/timer.c  |5 +---
 4 files changed, 49 insertions(+), 7 deletions(-)

Index: linux-2.6.21-rc4-acct/include/linux/kernel_stat.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/kernel_stat.h	2006-09-21 19:54:58.000000000 +1000
+++ linux-2.6.21-rc4-acct/include/linux/kernel_stat.h	2007-03-25 21:51:49.000000000 +1000
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux-2.6.21-rc4-acct/include/linux/sched.h
===
--- linux-2.6.21-rc4-acct.orig/include/linux/sched.h	2007-03-21 12:53:00.000000000 +1100
+++ linux-2.6.21-rc4-acct/include/linux/sched.h	2007-03-25 21:51:49.000000000 +1000
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux-2.6.21-rc4-acct/kernel/sched.c
===
--- linux-2.6.21-rc4-acct.orig/kernel/sched.c	2007-03-21 12:53:00.000000000 +1100
+++ linux-2.6.21-rc4-acct/kernel/sched.c	2007-03-25 21:58:27.000000000 +1000
@@ -89,6 +89,7 @@ unsigned long long __attribute__((weak))
  */
 #define NS_TO_JIFFIES(TIME)((TIME) / (1000000000 / HZ))
 #define JIFFIES_TO_NS(TIME)((TIME) * (1000000000 / HZ))
+#define JIFFY_NS   JIFFIES_TO_NS(1)
 
 /*
  * These are the 'tuning knobs' of the scheduler:
@@ -3017,8 +3018,49 @@ EXPORT_PER_CPU_SYMBOL(kstat);
 static inline void
 update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
 {
-   p->sched_time += now - p->last_ran;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff = now - p->last_ran;
+
+   p->sched_time += time_diff;
p->last_ran = rq->most_recent_timestamp = now;
+   if (p != rq->idle) {
+   cputime_t utime_diff = time_diff;
+
+   if (TASK_NICE(p) > 0) {
+   cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
+time_diff);
+   if (cpustat->nice_ns > JIFFY_NS) {
+   cpustat->nice_ns =
+   cputime64_sub(cpustat->nice_ns,
+   

Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Sunday 25 March 2007 21:34, malc wrote:
> On Sun, 25 Mar 2007, Ingo Molnar wrote:
> > * Con Kolivas <[EMAIL PROTECTED]> wrote:
> >> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
> >
> > we want to do this - and we should do this to the vanilla scheduler
> > first and check the results. I've back-merged the patch to before RSDL
> > and have tested it - find the patch below. Vale, could you try this
> > patch against a 2.6.21-rc4-ish kernel and re-test your testcase?
>
> [..snip..]
>
> Compilation failed with:
> kernel/built-in.o(.sched.text+0x564): more undefined references to
> `__udivdi3' follow
>
> $ gcc --version | head -1
> gcc (GCC) 3.4.6
>
> $ cat /proc/cpuinfo | grep cpu
> cpu : 7447A, altivec supported
>
> Can't say i really understand why 64bit arithmetics suddenly became an
> issue here.

Probably due to use of:

#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
#define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))

Excuse our 64bit world while we strive to correct our 32bit blindness and fix 
this bug.

>
> Am i supposed to run the testcase and see if numbers in `/proc/stat'
> now match the reality closer? To be really accurate `/proc/stat'
> should be left alone methinks, because no matter how good you try the
> fundamental fact that time (and consequently load percentage) is not
> really divided in USER_HZ intervals will interfere with one's quest for
> accurate statistics. (Wonder what this patch will do to slightly modified
> hog that produced this: http://www.boblycat.org/~malc/apc/load-c2d-hog.png
> but this will have to wait till i get to the PC at work)

It should far more accurately represent the cpu usage without any userspace 
changes.

-- 
-ck


Re: [patch] sched: accurate user accounting

2007-03-25 Thread malc

On Sun, 25 Mar 2007, Ingo Molnar wrote:



* Con Kolivas <[EMAIL PROTECTED]> wrote:


For an rsdl 0.33 patched kernel. Comments? Overhead worth it?


we want to do this - and we should do this to the vanilla scheduler
first and check the results. I've back-merged the patch to before RSDL
and have tested it - find the patch below. Vale, could you try this
patch against a 2.6.21-rc4-ish kernel and re-test your testcase?


[..snip..]

Compilation failed with:
kernel/built-in.o(.sched.text+0x564): more undefined references to `__udivdi3' 
follow

$ gcc --version | head -1
gcc (GCC) 3.4.6

$ cat /proc/cpuinfo | grep cpu
cpu : 7447A, altivec supported

Can't say i really understand why 64bit arithmetics suddenly became an
issue here.

Am i supposed to run the testcase and see if numbers in `/proc/stat'
now match the reality closer? To be really accurate `/proc/stat'
should be left alone methinks, because no matter how good you try the
fundamental fact that time (and consequently load percentage) is not
really divided in USER_HZ intervals will interfere with one's quest for
accurate statistics. (Wonder what this patch will do to slightly modified
hog that produced this: http://www.boblycat.org/~malc/apc/load-c2d-hog.png
but this will have to wait till i get to the PC at work)

--
vale (this is not name/nick/alias, this is latin for farewell)


Re: [patch] sched: accurate user accounting

2007-03-25 Thread Ingo Molnar

* Con Kolivas <[EMAIL PROTECTED]> wrote:

> > +/*
> > + * Some helpers for converting nanosecond timing to jiffy resolution
> > + */
> > +#define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
> > +#define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
> > +
> 
> This hunk is already in mainline so it will be double defined now.

yeah. Updated patch below.

    Ingo

-->
Subject: [patch] sched: accurate user accounting
From: Con Kolivas <[EMAIL PROTECTED]>

Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/kernel_stat.h |3 ++
 include/linux/sched.h   |2 -
 kernel/sched.c  |   46 +---
 kernel/timer.c  |5 +---
 4 files changed, 49 insertions(+), 7 deletions(-)

Index: linux/include/linux/kernel_stat.h
===
--- linux.orig/include/linux/kernel_stat.h
+++ linux/include/linux/kernel_stat.h
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3017,8 +3017,50 @@ EXPORT_PER_CPU_SYMBOL(kstat);
 static inline void
 update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
 {
-   p->sched_time += now - p->last_ran;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff = now - p->last_ran;
+
+   p->sched_time += time_diff;
p->last_ran = rq->most_recent_timestamp = now;
+   if (p != rq->idle) {
+   cputime_t utime_diff = time_diff;
+
+   if (TASK_NICE(p) > 0) {
+   cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
+time_diff);
+   if (NS_TO_JIFFIES(cpustat->nice_ns) > 1) {
+   cpustat->nice_ns =
+   cputime64_sub(cpustat->nice_ns,
+   JIFFIES_TO_NS(1));
+   cpustat->nice =
+   cputime64_add(cpustat->nice, 1);
+   }
+   } else {
+   cpustat->user_ns = cputime64_add(cpustat->user_ns,
+   time_diff);
+   if (NS_TO_JIFFIES(cpustat->user_ns) > 1) {
+   cpustat->user_ns =
+   cputime64_sub(cpustat->user_ns,
+   JIFFIES_TO_NS(1));
+   cpustat->user =
+   cputime64_add(cpustat->user, 1);
+   }
+   }
+   p->utime_ns = cputime_add(p->utime_ns, utime_diff);
+   if (NS_TO_JIFFIES(p->utime_ns) > 1) {
+   p->utime_ns = cputime_sub(p->utime_ns,
+ JIFFIES_TO_NS(1));
+   p->utime = cputime_add(p->utime,
+  jiffies_to_cputime(1));
+   }
+   } else {
+   cpustat->idle_ns = cputime64_add(cpustat->idle_ns, time_diff);
+   if (NS_TO_JIFFI

Re: [patch] sched: accurate user accounting

2007-03-25 Thread Con Kolivas
On Sunday 25 March 2007 17:51, Ingo Molnar wrote:
> * Con Kolivas <[EMAIL PROTECTED]> wrote:
> > For an rsdl 0.33 patched kernel. Comments? Overhead worth it?
>
> we want to do this - and we should do this to the vanilla scheduler
> first and check the results. I've back-merged the patch to before RSDL
> and have tested it - find the patch below. Vale, could you try this
> patch against a 2.6.21-rc4-ish kernel and re-test your testcase?

Great. That should fix a lot of misconceptions about cpu usage and HZ.

However-

> +/*
> + * Some helpers for converting nanosecond timing to jiffy resolution
> + */
> > +#define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
> > +#define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
> +

This hunk is already in mainline so it will be double defined now.

Thanks.

-- 
-ck


[patch] sched: accurate user accounting

2007-03-24 Thread Ingo Molnar

* Con Kolivas <[EMAIL PROTECTED]> wrote:

> For an rsdl 0.33 patched kernel. Comments? Overhead worth it?

we want to do this - and we should do this to the vanilla scheduler 
first and check the results. I've back-merged the patch to before RSDL 
and have tested it - find the patch below. Vale, could you try this 
patch against a 2.6.21-rc4-ish kernel and re-test your testcase?

Ingo

----->
Subject: [patch] sched: accurate user accounting
From: Con Kolivas <[EMAIL PROTECTED]>

Currently we only do cpu accounting to userspace based on what is 
actually happening precisely on each tick. The accuracy of that 
accounting gets progressively worse the lower HZ is. As we already keep 
accounting of nanosecond resolution we can accurately track user cpu, 
nice cpu and idle cpu if we move the accounting to update_cpu_clock with 
a nanosecond cpu_usage_stat entry. This increases overhead slightly but 
avoids the problem of tick aliasing errors making accounting unreliable.

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 include/linux/kernel_stat.h |3 ++
 include/linux/sched.h   |2 -
 kernel/sched.c  |   52 +---
 kernel/timer.c  |5 +---
 4 files changed, 55 insertions(+), 7 deletions(-)

Index: linux/include/linux/kernel_stat.h
===
--- linux.orig/include/linux/kernel_stat.h
+++ linux/include/linux/kernel_stat.h
@@ -16,11 +16,14 @@
 
 struct cpu_usage_stat {
cputime64_t user;
+   cputime64_t user_ns;
cputime64_t nice;
+   cputime64_t nice_ns;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
+   cputime64_t idle_ns;
cputime64_t iowait;
cputime64_t steal;
 };
Index: linux/include/linux/sched.h
===
--- linux.orig/include/linux/sched.h
+++ linux/include/linux/sched.h
@@ -882,7 +882,7 @@ struct task_struct {
int __user *clear_child_tid;/* CLONE_CHILD_CLEARTID */
 
unsigned long rt_priority;
-   cputime_t utime, stime;
+   cputime_t utime, utime_ns, stime;
unsigned long nvcsw, nivcsw; /* context switch counts */
struct timespec start_time;
 /* mm fault and swap info: this can arguably be seen as either mm-specific or 
thread-specific */
Index: linux/kernel/sched.c
===
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -167,6 +167,12 @@ unsigned long long __attribute__((weak))
(JIFFIES_TO_NS(MAX_SLEEP_AVG * \
(MAX_BONUS / 2 + DELTA((p)) + 1) / MAX_BONUS - 1))
 
+/*
+ * Some helpers for converting nanosecond timing to jiffy resolution
+ */
+#define NS_TO_JIFFIES(TIME)   ((TIME) / (1000000000 / HZ))
+#define JIFFIES_TO_NS(TIME)   ((TIME) * (1000000000 / HZ))
+
 #define TASK_PREEMPTS_CURR(p, rq) \
((p)->prio < (rq)->curr->prio)
 
@@ -3017,8 +3023,50 @@ EXPORT_PER_CPU_SYMBOL(kstat);
 static inline void
 update_cpu_clock(struct task_struct *p, struct rq *rq, unsigned long long now)
 {
-   p->sched_time += now - p->last_ran;
+   struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
+   cputime64_t time_diff = now - p->last_ran;
+
+   p->sched_time += time_diff;
p->last_ran = rq->most_recent_timestamp = now;
+   if (p != rq->idle) {
+   cputime_t utime_diff = time_diff;
+
+   if (TASK_NICE(p) > 0) {
+   cpustat->nice_ns = cputime64_add(cpustat->nice_ns,
+time_diff);
+   if (NS_TO_JIFFIES(cpustat->nice_ns) > 1) {
+   cpustat->nice_ns =
+   cputime64_sub(cpustat->nice_ns,
+   JIFFIES_TO_NS(1));
+   cpustat->nice =
+   cputime64_add(cpustat->nice, 1);
+   }
+   } else {
+   cpustat->user_ns = cputime64_add(cpustat->user_ns,
+   time_diff);
+   if (NS_TO_JIFFIES(cpustat->user_ns) > 1) {
+   cpustat->user_ns =
+   cputime64_sub(cpustat->user_ns,
+   JIFFIES_TO_NS(1));
+   cpustat->user =
+   cputime64_add(cpustat->user, 1);
+   }
+   }
+   p->utime_ns = cputime_add(p->utime_ns, utime_diff);
+   if (NS_TO_JIFFIES(p->utime_ns) >