Re: [ck] [REPORT] cfs-v5 vs sd-0.46
> > with cfs-v5 finally booting on my machine I have run my daily
> > numbercrunching jobs on both cfs-v5 and sd-0.46 (each on 2.6.21-rc7)
> > on top of a stock openSUSE 10.2 (X86_64).
>
> Thanks for testing.

I actually enjoyed it -- the more extensive test I had promised two days
ago is almost finished. There is just one test left to (re)run; I will
have a slot for it later today, so I'll mail out the results comparing

	2.6.21-rc7 (mainline)
	2.6.21-rc7-sd046
	2.6.21-rc7-cfs-v6-rc2 (X @ nice 0)
	2.6.21-rc7-cfs-v6-rc2 (X @ nice -10)

during the early afternoon (my time).

> You have 3 tasks and only 2 cpus. The %cpu is the percentage of the cpu
> the task is currently on that it is using; it is not the percentage of
> the "overall cpu available on the machine". Since you have 3 tasks and
> 2 cpus, the extra task will always be on one or the other cpu, taking
> half of that cpu but never on both cpus.

I had assumed that, given the interval of 3 sec, the three tasks would be
evenly distributed across the 2 CPUs, resulting in a %CPU of about 66
each, because that is what they get in the long run anyway. Apparently
3 sec is too short an interval to see this.

> What is important is that if all three tasks are fully cpu bound and
> started at the same time at the same nice level, they all receive close
> to the same total cpu time overall, showing that fairness is working as
> well. This should be the case no matter how many cpus you have.

They are started via 'make -j3', which means they start at the same time
(i.e. within a few msec). They initially load some data and then perform
extensive computations on that data.

Best,
Michael
Re: [ck] [REPORT] cfs-v5 vs sd-0.46
On Tuesday 24 April 2007 17:37, Michael Gerdau wrote:
> Hi list,
>
> with cfs-v5 finally booting on my machine I have run my daily
> numbercrunching jobs on both cfs-v5 and sd-0.46 (each on 2.6.21-rc7)
> on top of a stock openSUSE 10.2 (X86_64).

Thanks for testing.

> Both cfs and sd showed very similar behavior when monitored in top.
> I'll show more or less representative excerpts from a 10 minute log,
> delay 3 sec.
>
> [top output for sd-0.46 and cfs-v5 snipped -- see the original report]
>
> What did surprise me is that cpu utilization had been spread 100/50/50
> (round robin) most of the time. I did expect 66/66/66 or so.

You have 3 tasks and only 2 cpus. The %cpu is the percentage of the cpu
the task is currently on that it is using; it is not the percentage of
the "overall cpu available on the machine". Since you have 3 tasks and
2 cpus, the extra task will always be on one or the other cpu, taking
half of that cpu but never on both cpus.

> What I also don't understand is the difference in load average, sd
> constantly had higher values, the above figures are representative
> for the whole log. I don't know which is better though.

There isn't much useful to say about the load average in isolation. It
may or may not be meaningful, depending on whether it merely reflects the
timing of when the cpu load is sampled, or whether there really is more
time spent waiting in runqueues. Only throughput measurements can tell
them apart.

What is important is that if all three tasks are fully cpu bound and
started at the same time at the same nice level, they all receive close
to the same total cpu time overall, showing that fairness is working as
well. This should be the case no matter how many cpus you have.

-- 
-ck
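A practical way to check the total-cpu-time fairness described above is to
compare each task's cumulative utime+stime rather than the instantaneous
%CPU that top samples. Here is a minimal sketch (illustrative only, not
from the thread); it takes PIDs on the command line and reads fields 14
and 15 of /proc/<pid>/stat as documented in proc(5):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        long hz = sysconf(_SC_CLK_TCK);     /* clock ticks per second */
        int i;

        for (i = 1; i < argc; i++) {
                char path[64], comm[64];
                unsigned long utime, stime;
                FILE *f;

                snprintf(path, sizeof(path), "/proc/%s/stat", argv[i]);
                f = fopen(path, "r");
                if (!f) {
                        perror(path);
                        continue;
                }
                /* fields 2, 14 and 15 of /proc/<pid>/stat: comm, utime, stime */
                if (fscanf(f, "%*d (%63[^)]) %*c %*d %*d %*d %*d %*d "
                              "%*u %*u %*u %*u %*u %lu %lu",
                           comm, &utime, &stime) == 3)
                        printf("pid %s (%s): %.1f seconds of cpu time\n",
                               argv[i], comm, (double)(utime + stime) / hz);
                fclose(f);
        }
        return 0;
}

Run against the three perl PIDs (e.g. './cputime 6671 6669 6674'), the
reported totals should stay close to each other if the scheduler is fair,
which matches the roughly equal TIME+ values in the top excerpts.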
Re: [REPORT] cfs-v5 vs sd-0.46
> oh, you are writing the number-cruncher?

Yep.

> In general the 'best' performance metrics for scheduler validation are
> the ones where you have immediate feedback: i.e. some ops/sec (or ops
> per minute) value in some readily accessible place, or some
> "milliseconds-per-100,000 ops" type of metric - whichever lends itself
> better to the workload at hand.

I'll have to see whether that works out. I don't have an easily
available ops/sec value, but I guess I could create something similar.

> If you measure time then the best is to use long long and nanoseconds
> and the monotonic clocksource: [snip]

Thanks, I will implement that, for Linux anyway.

> Plus an absolute metric of "the whole workload took X.Y seconds" is
> useful too.

That's the easiest to come by and is already available.

Best,
Michael
Re: [REPORT] cfs-v5 vs sd-0.46
* Michael Gerdau <[EMAIL PROTECTED]> wrote:

> > Here i'm assuming that the vmstats are directly comparable: that
> > your number-crunchers behave the same during the full runtime - is
> > that correct?
>
> Yes, basically they do (disregarding small fluctuations)

ok, good.

> I'll see whether I can produce some type of absolute performance
> measure as well. Thinking about it I guess this should be fairly
> simple to implement.

oh, you are writing the number-cruncher? In general the 'best'
performance metrics for scheduler validation are the ones where you have
immediate feedback: i.e. some ops/sec (or ops per minute) value in some
readily accessible place, or some "milliseconds-per-100,000 ops" type of
metric - whichever lends itself better to the workload at hand.

If you measure time then the best is to use long long and nanoseconds
and the monotonic clocksource:

unsigned long long rdclock(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

(link against librt via -lrt to pick up clock_gettime())

The cost of a clock_gettime() (or of a gettimeofday()) can be a couple
of microseconds on some systems, so it shouldn't be done too frequently.

Plus an absolute metric of "the whole workload took X.Y seconds" is
useful too.

	Ingo
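To make that suggestion concrete, below is a minimal sketch of how a
compute loop could report such a metric with the rdclock() helper above,
reading the clock only once per batch of operations to amortize the
clock_gettime() cost. The do_one_op() function and the batch size are
placeholders, not part of Michael's actual number-cruncher; build with
something like 'gcc -O2 bench.c -lrt'.

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* monotonic clock in nanoseconds, as suggested above */
static unsigned long long rdclock(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* stand-in for one unit of number-crunching work */
static double do_one_op(double x)
{
        return x * 1.0000001 + 1e-9;
}

int main(void)
{
        const unsigned long long batch = 100000;  /* ops per timing sample */
        unsigned long long done, t0, t1, start;
        double x = 1.0;

        start = t0 = rdclock();
        for (done = 1; done <= 10 * batch; done++) {
                x = do_one_op(x);
                if (done % batch)
                        continue;
                /* read the clock once per batch to amortize its cost */
                t1 = rdclock();
                printf("%.0f ops/sec (%.2f msecs per %llu ops)\n",
                       batch * 1e9 / (t1 - t0), (t1 - t0) / 1e6, batch);
                t0 = t1;
        }
        t1 = rdclock();
        printf("whole workload took %.3f seconds (result %g)\n",
               (t1 - start) / 1e9, x);
        return 0;
}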
Re: [REPORT] cfs-v5 vs sd-0.46
> Here i'm assuming that the vmstats are directly comparable: that your
> number-crunchers behave the same during the full runtime - is that
> correct?

Yes, basically they do (disregarding small fluctuations).

I'll see whether I can produce some type of absolute performance measure
as well. Thinking about it, I guess this should be fairly simple to
implement.

Best,
Michael
Re: [REPORT] cfs-v5 vs sd-0.46
* Michael Gerdau <[EMAIL PROTECTED]> wrote:

> > so to be totally 'fair' and get the same rescheduling 'granularity'
> > you should probably lower CFS's sched_granularity_ns to 2 msecs.
>
> I'll change the default nice in cfs to -10.
>
> I'm also happy to adjust /proc/sys/kernel/sched_granularity_ns to
> 2 msec. However, checking /proc/sys/kernel/rr_interval reveals it is
> 16 (msec) on my system.

ah, yeah - that is due to the SMP rule in SD:

	rr_interval *= 1 + ilog2(num_online_cpus());

and you have a 2-CPU system, so you get 8 msecs * 2 == 16 msecs default
interval. I find this a neat solution; i have talked to Con about it
already and i'll adopt Con's idea in CFS too.

Nevertheless, despite the settings, SD seems to be rescheduling every
6-7 msecs, while CFS reschedules only every 13 msecs.

Here i'm assuming that the vmstats are directly comparable: that your
number-crunchers behave the same during the full runtime - is that
correct? (If not, then vmstat should be run at roughly the same "stage"
of the workload on all the schedulers.)

	Ingo
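For reference, a small stand-alone sketch (not from the thread) of how
that scaling rule plays out for different CPU counts, assuming SD's stock
8 msec base interval:

#include <stdio.h>

/* floor(log2(n)) for n >= 1, matching the kernel's ilog2() here */
static unsigned int ilog2(unsigned int n)
{
        unsigned int l = 0;

        while (n >>= 1)
                l++;
        return l;
}

int main(void)
{
        const unsigned int base = 8;    /* SD's default rr_interval, msecs */
        unsigned int cpus;

        for (cpus = 1; cpus <= 8; cpus *= 2)
                printf("%u cpus -> rr_interval = %u msecs\n",
                       cpus, base * (1 + ilog2(cpus)));
        return 0;
}

This prints 8, 16, 24 and 32 msecs for 1, 2, 4 and 8 cpus respectively;
the 2-CPU case gives the 16 msecs Michael sees in
/proc/sys/kernel/rr_interval.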
Re: [REPORT] cfs-v5 vs sd-0.46
> > What I also don't understand is the difference in load average, sd
> > constantly had higher values, the above figures are representative
> > for the whole log. I don't know which is better though.
>
> hm, it's hard to tell that from here. What load average does the
> vanilla kernel report? I'd take that as a reference.

I will redo this test with sd-0.46, cfs-v5 and mainline later today.

> interesting - CFS has half the context-switch rate of SD. That is
> probably because on your workload CFS defaults to longer 'timeslices'
> than SD. You can influence the 'timeslice length' under SD via
> /proc/sys/kernel/rr_interval (millisecond units) and under CFS via
> /proc/sys/kernel/sched_granularity_ns. On CFS the value is not
> necessarily the timeslice length you will observe - for example in
> your workload above the granularity is set to 5 msecs, but your
> rescheduling rate is 13 msecs. SD defaults to an rr_interval value of
> 8 msecs, which in your workload produces a timeslice length of 6-7
> msecs.
>
> so to be totally 'fair' and get the same rescheduling 'granularity'
> you should probably lower CFS's sched_granularity_ns to 2 msecs.

I'll change the default nice in cfs to -10.

I'm also happy to adjust /proc/sys/kernel/sched_granularity_ns to
2 msec. However, checking /proc/sys/kernel/rr_interval reveals it is
16 (msec) on my system.

Anyway, I have some urgent other work to do and won't be able to do much
testing until tonight (but then I will).

Best,
Michael
Re: [REPORT] cfs-v5 vs sd-0.46
* Michael Gerdau <[EMAIL PROTECTED]> wrote:

> I'm running three single threaded perl scripts that do double
> precision floating point math with little i/o after initially loading
> the data.

thanks for the testing!

> What I also don't understand is the difference in load average, sd
> constantly had higher values, the above figures are representative for
> the whole log. I don't know which is better though.

hm, it's hard to tell that from here. What load average does the vanilla
kernel report? I'd take that as a reference.

> Here are excerpts from a concurrently run vmstat 3 200:
>
> sd-0.46
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  5  0      0 1702928  63664 827876    0    0     0    67  458  1350 100  0  0  0
>  3  0      0 1702928  63684 827876    0    0     0    89  468  1362 100  0  0  0
>  5  0      0 1702680  63696 827876    0    0     0   132  461  1598  99  1  0  0
>  8  0      0 1702680  63712 827892    0    0     0    80  465  1180  99  1  0  0
>
> cfs-v5
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  6  0      0 2157728  31816 545236    0    0     0   103  543   748 100  0  0  0
>  4  0      0 2157780  31828 545256    0    0     0    63  435   752 100  0  0  0
>  4  0      0 2157928  31852 545256    0    0     0   105  424   770 100  0  0  0
>  4  0      0 2157928  31868 545268    0    0     0   261  457   763 100  0  0  0

interesting - CFS has half the context-switch rate of SD. That is
probably because on your workload CFS defaults to longer 'timeslices'
than SD. You can influence the 'timeslice length' under SD via
/proc/sys/kernel/rr_interval (millisecond units) and under CFS via
/proc/sys/kernel/sched_granularity_ns (nanosecond units). On CFS the
value is not necessarily the timeslice length you will observe - for
example in your workload above the granularity is set to 5 msecs, but
your rescheduling rate is 13 msecs. SD defaults to an rr_interval value
of 8 msecs, which in your workload produces a timeslice length of 6-7
msecs.

so to be totally 'fair' and get the same rescheduling 'granularity' you
should probably lower CFS's sched_granularity_ns to 2 msecs.

> Last not least I'd like to add that at least on my system having X
> niced to -19 does result in kind of "erratic" (for lack of a better
> word) desktop behavior. I will reevaluate this with -v6, but for now
> IMO nicing X to -19 is a regression, at least on my machine, despite
> the claim that cfs doesn't suffer from it.

indeed, with -19 the rescheduling limit is so high under CFS that it
does not throttle X's scheduling rate enough, and so it will make CFS
behave as badly as other schedulers. I retested this with -10 and it
should work better with that. In -v6 i changed the default to -10 too.

> PS: Only learning how to test these things, I'm happy to have the
> shortcomings of what I tested above pointed out. Of course suggestions
> for improvements are welcome.

your report was perfectly fine and useful. "no visible regressions" is
valuable feedback too. [ In fact, such feedback is the easiest kind to
resolve ;-) ]

Since you are running number-crunchers you might be able to give
performance feedback too: do you have any reliable 'performance metric'
available for your number-cruncher jobs (ops per minute, runtime, etc.)
so that it would be possible to compare the number-crunching performance
of mainline to SD and to CFS as well? That is, if such a value is easy
to get and reliable/stable enough to be meaningful. (It would also be
nice to establish a ballpark figure for how much noise there is in any
performance metric, so that we can see whether differences between
schedulers are systematic or not.)

	Ingo
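As a small aside, the tuning Ingo suggests can be scripted. A minimal
sketch (illustrative, not from the thread) follows; CFS's
sched_granularity_ns takes nanoseconds, so 2 msecs is 2000000, and the
program is simply the equivalent of
'echo 2000000 > /proc/sys/kernel/sched_granularity_ns' run as root:

#include <stdio.h>

/* write a single numeric value into a /proc tunable */
static int write_tunable(const char *path, const char *value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return -1;
        }
        fprintf(f, "%s\n", value);
        return fclose(f);
}

int main(void)
{
        /* CFS: granularity is in nanoseconds, so 2 msecs == 2000000 ns */
        write_tunable("/proc/sys/kernel/sched_granularity_ns", "2000000");

        /* SD: rr_interval is in milliseconds, if one wanted to change it too */
        /* write_tunable("/proc/sys/kernel/rr_interval", "8"); */
        return 0;
}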
[REPORT] cfs-v5 vs sd-0.46
Hi list,

with cfs-v5 finally booting on my machine I have run my daily
numbercrunching jobs on both cfs-v5 and sd-0.46 (each on 2.6.21-rc7) on
top of a stock openSUSE 10.2 (X86_64).

Config for both kernels is the same except for the X boost option in
cfs-v5, which on my system didn't work (X still was @ -19; I understand
this will be fixed in -v6). HZ is 250 in both.

System is a Dell XPS M1710, Intel Core2 2.33GHz, 4GB, NVIDIA GeForce Go
7950 GTX with proprietary driver 1.0-9755.

I'm running three single threaded perl scripts that do double precision
floating point math with little i/o after initially loading the data.

Both cfs and sd showed very similar behavior when monitored in top.
I'll show more or less representative excerpts from a 10 minute log,
delay 3 sec.

sd-0.46
top - 00:14:24 up  1:17,  9 users,  load average: 4.79, 4.95, 4.80
Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.8%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.0%si,  0.0%st
Mem:   3348628k total,  1648560k used,  1700068k free,    64392k buffers
Swap:  2097144k total,        0k used,  2097144k free,   828204k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6671 mgd       33   0 95508  22m 3652 R  100  0.7  44:28.11 perl
 6669 mgd       31   0 95176  22m 3652 R   50  0.7  43:50.02 perl
 6674 mgd       31   0 95368  22m 3652 R   50  0.7  47:55.29 perl

cfs-v5
top - 08:07:50 up 21 min,  9 users,  load average: 4.13, 4.16, 3.23
Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.5%us,  0.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   3348624k total,  1193500k used,  2155124k free,    32516k buffers
Swap:  2097144k total,        0k used,  2097144k free,   545568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6357 mgd       20   0 92024  19m 3652 R  100  0.6   8:54.21 perl
 6356 mgd       20   0 91652  18m 3652 R   50  0.6  10:35.52 perl
 6359 mgd       20   0 91700  18m 3652 R   50  0.6   8:47.32 perl

What did surprise me is that cpu utilization had been spread 100/50/50
(round robin) most of the time. I did expect 66/66/66 or so.

What I also don't understand is the difference in load average, sd
constantly had higher values; the above figures are representative for
the whole log. I don't know which is better though.

Here are excerpts from a concurrently run vmstat 3 200:

sd-0.46
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 5  0      0 1702928  63664 827876    0    0     0    67  458  1350 100  0  0  0
 3  0      0 1702928  63684 827876    0    0     0    89  468  1362 100  0  0  0
 5  0      0 1702680  63696 827876    0    0     0   132  461  1598  99  1  0  0
 8  0      0 1702680  63712 827892    0    0     0    80  465  1180  99  1  0  0
 3  0      0 1702712  63732 827884    0    0     0    67  453  1005 100  0  0  0
 4  0      0 1702792  63744 827920    0    0     0    41  461  1138 100  0  0  0
 3  0      0 1702792  63760 827916    0    0     0    57  456  1073 100  0  0  0
 3  0      0 1702808  63776 827928    0    0     0   111  473  1095 100  0  0  0
 3  0      0 1702808  63788 827928    0    0     0    81  461  1092  99  1  0  0
 3  0      0 1702188  63808 827928    0    0     0   160  463  1437  99  1  0  0
 3  0      0 1702064  63884 827900    0    0     0   229  479  1125  99  0  0  0
 4  0      0 1702064  63912 827972    0    0     1    77  460  1108 100  0  0  0
 7  0      0 1702032  63920 828000    0    0     0    40  463  1068 100  0  0  0
 4  0      0 1702048  63928 828008    0    0     0    68  454  1114 100  0  0  0
11  0      0 1702048  63928 828008    0    0     0     0  458  1001 100  0  0  0
 3  0      0 1701500  63960 828020    0    0     0