Re: SCHED_ULE should not be the default
My rule is "break it any way you can and see if you can figure out why." Don't be discouraged. You may find some of the folk at Yahoo are interested.

Adrian

On 24 December 2011 03:00, Daniel Kalchev wrote:
>
> On Dec 24, 2011, at 12:49 AM, Adrian Chadd wrote:
>
>> Do you not have access to anything with 8 CPUs in it? It'd be nice to
>> get clarification that this indeed was fixed.
>
> I offered to do tests on a 4x8-core Opteron system (32 cores total), but was
> discouraged that contention would be too much and results meaningless -- yet,
> such systems will be more and more popular.
>
>> Does ULE care (much) if the nodes are hyperthreading or real cores?
>> Would that play a part in what it tries to schedule/spread?
>
> I could also run the tests on a 2x4x2-core Xeon, which uses hyperthreading:
> 8 real or 16 virtual cores in total.
>
> I can torture both systems (actually two pairs) for a week or two. But I may
> not have enough time to prepare the core/setup, so any advice is greatly
> appreciated. Be more descriptive :)
>
> Daniel

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On 24.12.2011 00:02, Andriy Gapon wrote:
> on 24/12/2011 00:49 Adrian Chadd said the following:
>> Does ULE care (much) if the nodes are hyperthreading or real cores?
>> Would that play a part in what it tries to schedule/spread?
>
> An answer to this part from the theory.
> ULE does care about the physical topology of the (logical) CPUs.
> So, for example, four cores are not the same as two cores with two hw threads
> from ULE's perspective. Still, ULE tries to eliminate any imbalances between
> the CPU groups starting from the top level (e.g. CPU packages in a multi-socket
> system) and all the way down to the individual (logical) CPUs.
> Thus, given enough load (L >= N) there should not be an idle CPU in the system
> whatever the topology. Modulo bugs, of course, as always.

I tried to locate the old message where somebody explained why the topology led to a thread being selected for migration, re-assigned, and then, at another topology level, swapped back so that it ended up on just the core it had already been running on. The analysis was quite detailed, and it may well have been part of that discussion back in 2008 that Steve Kargl mentioned.

This problem could be fixed by adding a slight degree of randomness. But, IIRC, a deterministic solution might also be possible, which just takes care not to put a thread back on the core it had previously been running on, once it has been determined that the thread should be migrated to a different core. Sorry for not being able to point to the old message that contained the analysis of this problem.

Regards, Stefan
Re: SCHED_ULE should not be the default
On Dec 24, 2011, at 12:49 AM, Adrian Chadd wrote:
> Do you not have access to anything with 8 CPUs in it? It'd be nice to
> get clarification that this indeed was fixed.

I offered to do tests on a 4x8-core Opteron system (32 cores total), but was discouraged that contention would be too much and results meaningless -- yet, such systems will be more and more popular.

> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

I could also run the tests on a 2x4x2-core Xeon, which uses hyperthreading: 8 real or 16 virtual cores in total.

I can torture both systems (actually two pairs) for a week or two. But I may not have enough time to prepare the core/setup, so any advice is greatly appreciated. Be more descriptive :)

Daniel
Re: SCHED_ULE should not be the default
On Fri, Dec 23, 2011 at 02:49:51PM -0800, Adrian Chadd wrote:
> On 23 December 2011 11:11, Steve Kargl wrote:
>
> > One difference between the 2008 tests and today's tests is
> > the number of available cpus. In 2008, I ran the tests
> > on a node with 8 cpus, while today's test used a
> > node with only 4 cpus. If this behavior is a scaling
> > issue, I can't currently test it. But, today's tests
> > are certainly encouraging.
>
> Do you not have access to anything with 8 CPUs in it? It'd be nice to
> get clarification that this indeed was fixed.

I have a few nodes with 8 cpus, but those are running 4BSD kernels. I try to keep my kernel and world in sync, and by extension the kernel/world on each node is in sync with all other nodes. So, while I took the 4-cpu node off-line and updated it, at the moment I can't take another node off-line unless I do an update across the entire cluster. The update is planned for next year.

> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

I only have Opteron processors in the cluster; if you're referring to Intel's hyperthreading technology, I can't look into ULE's behavior with HTT.

-- Steve
Re: SCHED_ULE should not be the default
on 24/12/2011 00:49 Adrian Chadd said the following:
> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

An answer to this part from the theory.

ULE does care about the physical topology of the (logical) CPUs. So, for example, four cores are not the same as two cores with two hw threads from ULE's perspective. Still, ULE tries to eliminate any imbalances between the CPU groups starting from the top level (e.g. CPU packages in a multi-socket system) and all the way down to the individual (logical) CPUs. Thus, given enough load (L >= N) there should not be an idle CPU in the system whatever the topology. Modulo bugs, of course, as always.

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On 23 December 2011 11:11, Steve Kargl wrote:
> Ah, so good news! I cannot reproduce this problem that
> I saw 3+ years ago on the 4-cpu node, which is currently
> running a ULE kernel. When I killed the (N+1)th job,
> the N remaining jobs are spread across the N cpus.

Ah, good.

> One difference between the 2008 tests and today's tests is
> the number of available cpus. In 2008, I ran the tests
> on a node with 8 cpus, while today's test used a
> node with only 4 cpus. If this behavior is a scaling
> issue, I can't currently test it. But, today's tests
> are certainly encouraging.

Do you not have access to anything with 8 CPUs in it? It'd be nice to get clarification that this indeed was fixed.

Does ULE care (much) if the nodes are hyperthreading or real cores? Would that play a part in what it tries to schedule/spread?

Adrian
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
> On 22 December 2011 11:47, Steve Kargl wrote:
>
> > There is the additional observation in one of my 2008
> > emails (URLs have been posted) that if you have N+1
> > cpu-bound jobs with, say, job0 and job1 ping-ponging
> > on cpu0 (due to ULE's cpu-affinity feature) and if I
> > kill job2 running on cpu1, then neither job0 nor job1
> > will migrate to cpu1. So, one now has N cpu-bound
> > jobs running on N-1 cpus.
>
> .. and this sounds like a pretty serious regression. Have you ever
> filed a PR for it?

Ah, so good news! I cannot reproduce this problem that I saw 3+ years ago on the 4-cpu node, which is currently running a ULE kernel. When I killed the (N+1)th job, the N remaining jobs are spread across the N cpus.

One difference between the 2008 tests and today's tests is the number of available cpus. In 2008, I ran the tests on a node with 8 cpus, while today's test used a node with only 4 cpus. If this behavior is a scaling issue, I can't currently test it. But, today's tests are certainly encouraging.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
> On 22 December 2011 11:47, Steve Kargl wrote:
>
> > There is the additional observation in one of my 2008
> > emails (URLs have been posted) that if you have N+1
> > cpu-bound jobs with, say, job0 and job1 ping-ponging
> > on cpu0 (due to ULE's cpu-affinity feature) and if I
> > kill job2 running on cpu1, then neither job0 nor job1
> > will migrate to cpu1. So, one now has N cpu-bound
> > jobs running on N-1 cpus.
>
> .. and this sounds like a pretty serious regression. Have you ever
> filed a PR for it?

No. I was interacting directly with jeffr in 2008. I got as far as setting up root access on a node for jeffr. Unfortunately, both jeffr and I got busy with real life, and 4BSD allowed me to get my work done.

> > Finally, my initial post in this email thread was to
> > tell O. Hartman to quit beating his head against
> > a wall with ULE (in an HPC environment). Switch to
> > 4BSD. This was based on my 2008 observations and
> > I've now wasted 2 days gathering additional information
> > which only re-affirms my recommendation.
>
> I personally don't think this is time wasted. You've done something
> that no one else has actually done - provided actual results from
> real-life testing, rather than a hundred posts of "I remember seeing
> X, so I don't use ULE."
>
> If you can definitely and consistently reproduce that N-1 cpu-bound
> job bug, you're now in a great position to easily test and re-report
> KTR/schedtrace results to see what impact they have. Please don't
> underestimate exactly how valuable this is.

I'll try this tomorrow. I first need to modify the code I used in the 2008 test to disable IO, so that it is nearly completely cpu-bound.

> How often are those two jobs migrating between CPUs? How am I supposed
> to read "CPU load"? Why isn't it just sitting at 100% the whole time?

This is my first foray into ktr and schedgraph, so I may have done something incorrectly. In particular, it seems that schedgraph takes the cpu clock as a command-line argument, so there is probably some scaling that I'm missing.

> Would you mind repeating this with 4BSD (the N+1 jobs) so we can see
> how the jobs are scheduled/interleaved? Something tells me we'll see
> the jobs being scheduled evenly

Sure, I'll do this tomorrow as well.

-- Steve
Re: SCHED_ULE should not be the default
On 12/22/2011 16:23, Adrian Chadd wrote:
> You've done something
> that no one else has actually done - provided actual results from
> real-life testing, rather than a hundred posts of "I remember seeing
> X, so I don't use ULE."

Not to take away from Steve's excellent work on this, but I actually spent weeks following detailed instructions from various people using ktr, dtrace, etc. and was never able to produce any data that helped point anyone to something that could be fixed. I'm pretty sure that others have tried as well.

That said, I'm glad that Steve was able to produce useful results, and hopefully it will lead to improvements.

Doug

--
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: SCHED_ULE should not be the default
On 22 December 2011 11:47, Steve Kargl wrote:
[snip]

Thank you for posting some actual measurements!

> There is the additional observation in one of my 2008
> emails (URLs have been posted) that if you have N+1
> cpu-bound jobs with, say, job0 and job1 ping-ponging
> on cpu0 (due to ULE's cpu-affinity feature) and if I
> kill job2 running on cpu1, then neither job0 nor job1
> will migrate to cpu1. So, one now has N cpu-bound
> jobs running on N-1 cpus.

.. and this sounds like a pretty serious regression. Have you ever filed a PR for it?

> Finally, my initial post in this email thread was to
> tell O. Hartman to quit beating his head against
> a wall with ULE (in an HPC environment). Switch to
> 4BSD. This was based on my 2008 observations and
> I've now wasted 2 days gathering additional information
> which only re-affirms my recommendation.

I personally don't think this is time wasted. You've done something that no one else has actually done - provided actual results from real-life testing, rather than a hundred posts of "I remember seeing X, so I don't use ULE."

If you can definitely and consistently reproduce that N-1 cpu-bound job bug, you're now in a great position to easily test and re-report KTR/schedtrace results to see what impact they have. Please don't underestimate exactly how valuable this is.

How often are those two jobs migrating between CPUs? How am I supposed to read "CPU load"? Why isn't it just sitting at 100% the whole time?

Would you mind repeating this with 4BSD (the N+1 jobs) so we can see how the jobs are scheduled/interleaved? Something tells me we'll see the jobs being scheduled evenly

Adrian
Re: SCHED_ULE should not be the default
on 22/12/2011 21:47 Steve Kargl said the following:
> On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
>> on 22/12/2011 20:45 Steve Kargl said the following:
>>> I've used schedgraph to look at the ktrdump output. A jpg is
>>> available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
>>> This shows the ping-pong effect, where 3 processes appear to be
>>> using 2 cpus while the remaining 2 processes are pinned to their
>>> cpus.
>>
>> I'd recommend enabling CPU-specific background colors via the menu in
>> schedgraph for a better illustration of your findings.
>>
>> NB: I still don't understand the point of purposefully running N+1 CPU-bound
>> processes.
>
> The point is that this is a node in an HPC cluster with
> multiple users. Sure, I can start my job on this node
> with only N cpu-bound jobs. Now, when user John Doe
> wants to run his OpenMPI program, should he log into
> the 12 nodes in the cluster to see if someone is already
> running N cpu-bound jobs on a given node? 4BSD
> gives my jobs and John Doe's jobs a fair share of the
> available cpus. ULE does not give a fair share, and
> if you read the summary file I put up on the web,
> you see that it is fairly non-deterministic when an
> OpenMPI run will finish (see the mean absolute deviations
> in the table of 'real' times that I posted).

OK. I think I know why the uneven load occurs. I remember even trying to explain my observations. There are two things:

1. ULE has neither a runqueue common across CPUs nor any other kind of mechanism for enforcing true global fairness of CPU resource sharing.

2. ULE's rebalancing code is biased, and that leads to the situation where sub-groups of threads can share subsets of CPUs rather fairly, but there won't be global fairness.

I haven't really given any thought as to how to fix or work around these issues. One dumb idea is to add an element of randomness to the choice between equally loaded CPUs (and their subsets) instead of having a permanent bias.

> There is the additional observation in one of my 2008
> emails (URLs have been posted) that if you have N+1
> cpu-bound jobs with, say, job0 and job1 ping-ponging
> on cpu0 (due to ULE's cpu-affinity feature) and if I
> kill job2 running on cpu1, then neither job0 nor job1
> will migrate to cpu1. So, one now has N cpu-bound
> jobs running on N-1 cpus.

Have you checked recently that that is still the case? I would consider this a rather serious bug, as opposed to merely sub-optimal scheduling.

> Finally, my initial post in this email thread was to
> tell O. Hartman to quit beating his head against
> a wall with ULE (in an HPC environment). Switch to
> 4BSD. This was based on my 2008 observations and
> I've now wasted 2 days gathering additional information
> which only re-affirms my recommendation.

I think that any objective information has its value. So maybe the time is not really wasted. I think there is no argument that for your usage pattern 4BSD is better than ULE at the moment, because of the inherent design choices of both schedulers and their current implementations. But I think that ULE could be improved to produce more global fairness.

P.S. But, but, this thread has seen so many different problem reports about ULE heaped together that it's very easy to get confused about what is caused by what, and what is real and what is not. E.g. I don't think that there is a direct relation between this issue (N+1 CPU-bound tasks) and "my X is sluggish with ULE when I untar a large file".

P.P.S. About the subject line. Let's recall why ULE became the default. It happened because of many observations from users and developers that "things" were faster/"snappier" with ULE than with 4BSD, and a significant stream of requests to make it the default. So it's business as usual. The schedulers are different, so there are those for whom one scheduler works better, those for whom the other works better, those for whom both work reasonably well, those for whom neither is satisfactory, and those who don't really care/compare. There is a silent majority and there are vocal minorities. There are specific bugs and quirks, advantages and disadvantages, usage patterns, hardware configurations and what not. When everybody starts to talk at the same time, it's a huge mess. But silently triaging and debugging one problem at a time also doesn't always work. There, I've said it. Let me now try to recall why I felt a need to say all of this :-)

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
> on 22/12/2011 20:45 Steve Kargl said the following:
> > I've used schedgraph to look at the ktrdump output. A jpg is
> > available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
> > This shows the ping-pong effect, where 3 processes appear to be
> > using 2 cpus while the remaining 2 processes are pinned to their
> > cpus.
>
> I'd recommend enabling CPU-specific background colors via the menu in
> schedgraph for a better illustration of your findings.
>
> NB: I still don't understand the point of purposefully running N+1 CPU-bound
> processes.

The point is that this is a node in an HPC cluster with multiple users. Sure, I can start my job on this node with only N cpu-bound jobs. Now, when user John Doe wants to run his OpenMPI program, should he log into the 12 nodes in the cluster to see if someone is already running N cpu-bound jobs on a given node? 4BSD gives my jobs and John Doe's jobs a fair share of the available cpus. ULE does not give a fair share, and if you read the summary file I put up on the web, you see that it is fairly non-deterministic when an OpenMPI run will finish (see the mean absolute deviations in the table of 'real' times that I posted).

There is the additional observation in one of my 2008 emails (URLs have been posted) that if you have N+1 cpu-bound jobs with, say, job0 and job1 ping-ponging on cpu0 (due to ULE's cpu-affinity feature) and if I kill job2 running on cpu1, then neither job0 nor job1 will migrate to cpu1. So, one now has N cpu-bound jobs running on N-1 cpus.

Finally, my initial post in this email thread was to tell O. Hartman to quit beating his head against a wall with ULE (in an HPC environment). Switch to 4BSD. This was based on my 2008 observations, and I've now wasted 2 days gathering additional information which only re-affirms my recommendation.

-- Steve
Re: SCHED_ULE should not be the default
on 22/12/2011 20:45 Steve Kargl said the following:
> I've used schedgraph to look at the ktrdump output. A jpg is
> available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
> This shows the ping-pong effect, where 3 processes appear to be
> using 2 cpus while the remaining 2 processes are pinned to their
> cpus.

I'd recommend enabling CPU-specific background colors via the menu in schedgraph for a better illustration of your findings.

NB: I still don't understand the point of purposefully running N+1 CPU-bound processes.

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
> >
> > I have placed several files at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd
> >
> > dmesg.txt --> dmesg for ULE kernel
> > summary --> A summary that includes top(1) output of all runs.
> > sysctl.ule.txt --> sysctl -a for the ULE kernel
> > ktr-ule-problem-kargl.out.gz

I've replaced the original version of the ktr file with a new version. The old version was corrupt due to my failure to set 'sysctl debug.ktr.mask=0' prior to the dump.

> One explanation for taking 1.5-2x times is that with ULE the
> threads are not migrated properly, so you end up with idle cores
> and ready threads not running (the other possible explanation
> would be that there are migrations, but they are so frequent and
> expensive that they completely trash the caches. But this seems
> unlikely for this type of task).

I've used schedgraph to look at the ktrdump output. A jpg is available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg This shows the ping-pong effect, where 3 processes appear to be using 2 cpus while the remaining 2 processes are pinned to their cpus.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
>>
>> I have placed several files at
>>
>> http://troutmask.apl.washington.edu/~kargl/freebsd
>>
>> dmesg.txt --> dmesg for ULE kernel
>> summary --> A summary that includes top(1) output of all runs.
>> sysctl.ule.txt --> sysctl -a for the ULE kernel
>> ktr-ule-problem-kargl.out.gz
>>
>> Since time is executed on the master, only the 'real' time is of
>> interest (the summary file includes user and sys times). This
>> command is run 5 times for each N value, and up to 10 times for
>> some N values with the ULE kernel. The following table records
>> the average 'real' time; the number in (...) is the mean
>> absolute deviation.
>>
>> # N   ULE             4BSD
>> # --------------------------------
>> # 4   223.27 (0.502)  221.76 (0.551)
>> # 5   404.35 (73.82)  270.68 (0.866)
>> # 6   627.56 (173.0)  247.23 (1.442)
>> # 7   475.53 (84.07)  285.78 (1.421)
>> # 8   429.45 (134.9)  223.64 (1.316)
>
> One explanation for taking 1.5-2x times is that with ULE the
> threads are not migrated properly, so you end up with idle cores
> and ready threads not running

That's what I guessed back in 2008 when I first reported the behavior.

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URL shows 10 completely independent instances of the same numerically intensive application running on a circa-2008 ULE kernel. Look at the PRI column. The high-PRI jobs are not only pinned to a cpu, they are running at 100% WCPU. The low-PRI jobs seem to be pinned to a subset of the available cpus and simply ping-pong in and out of the same cpus. In this instance, there are 5 jobs competing for time on 3 cpus.

> Also, perhaps one could build a simple test process that replicates
> this workload (so one can run it as part of regression tests):
> 1. define a CPU-intensive function f(n) which issues no
>    system calls, optionally touching a lot of memory,
>    where n determines the number of iterations.
> 2. by trial and error (or let the program find it),
>    pick a value N1 so that the minimum execution time
>    of f(N1) is in the 10..100ms range
> 3. now run the function f() again from an outer loop so
>    that the total execution time is large (10..100s),
>    again with no intervening system calls.
> 4. use an external shell script that can rerun a process
>    when it terminates, and then run multiple instances
>    in parallel. Instead of the external script one could
>    fork new instances before terminating, but I am a bit
>    unclear how CPU inheritance works when a process forks.
>    Going through the shell possibly breaks the chain.

The tests at the above URL do essentially what you propose, except that in 2008 the kzk90 programs were doing some IO.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
> If someone else thinks he has a specific problem that is not
> characterized by one of the cases above please let me know and I will
> put this in the chart.

It seems I stumbled over another thing.

Setup: 2 servers providing devices via ggated, 1 server using ggatec for those devices. Each ZFS pool sits on a pair of disks provided by the two ggated servers. I use rsync to fill up the 6 zpools/zfs from an existing storage (2 TB zpools, about 500 to 700 GiB used per pool), with 2 rsyncs running in parallel to fill the partitions.

The main server (ggate client with ZFS and rsync) has an Intel Xeon X3450 2.66 GHz quad-core processor (+HTT or whatever it's called nowadays, which gives 8 "cpus" in FreeBSD).

With ULE, ZFS gets slower after some time and finally gets stuck after 1 to 3 days of continuous synchronisation (ggate works like a charm as far as I can tell); with 4BSD (online for 6 days now) the rsync seems to run a lot faster and I didn't get ZFS to stall. There's nearly no local I/O (the system is on a local SSD) and the load/CPU usage are not actually high. All of this is running a quite recent RELENG_9.

If anyone's interested I can get more detail and carry out some tests.

- Oliver

--
| Oliver Brandmueller http://sysadm.in/ o...@sysadm.in |
| I am the Internet. As sure as I help God. |
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 01:07:58AM -0800, Adrian Chadd wrote:
> Are you able to go through the emails here and grab out Attilio's
> example for generating KTR scheduler traces?

Did you read this part of my email?

> >
> > Attilio,
> >
> > I have placed several files at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd
> >
> > dmesg.txt --> dmesg for ULE kernel
> > summary --> A summary that includes top(1) output of all runs.
> > sysctl.ule.txt --> sysctl -a for the ULE kernel
> > ktr-ule-problem-kargl.out.gz

ktr-ule-problem-kargl.out is a 43 MB file. I don't think the freebsd.org email server would allow that file through.

-- Steve
Re: SCHED_ULE should not be the default
On 12/22/11 04:07, Adrian Chadd wrote:
> Are you able to go through the emails here and grab out Attilio's
> example for generating KTR scheduler traces?
>
> Adrian

[...]

I've put up two such files:

http://www.m5p.com/~george/ktr-ule-problem.out
http://www.m5p.com/~george/ktr-ule-interact.out

but I don't know how to analyze them myself. What do all of us do next?

-- George Mitchell
Re: SCHED_ULE should not be the default
On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote: > On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: > > 2011/12/15 Steve Kargl : > > > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: > > >> > > >> I basically went through all the e-mail you just sent and identified 4 > > >> real report on which we could work on and summarizied in the attached > > >> Excel file. > > >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the > > >> few datas there and add more, if they want, or make more important > > >> clarifications in particular about the Xorg presence (or rather not) > > >> in their workload. > > > > > > Your summary of my observations appears correct. > > > > > > I have grabbed an up-to-date /usr/src, built and > > > installed world, and built and installed a new > > > kernel on one of the nodes in my cluster. ??It > > > has > > > > > > > It seems a perfect environment, just please make sure you made a > > debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). > > > > The first thing is, can you try reproducing your case? As far as I got > > it, for you it was enough to run N + small_amount of CPU-bound threads > > to show performance penalty, so I'd ask you to start with using dnetc > > or just your preferred cpu-bound workload and verify you can reproduce > > the issue. > > As it happens, please monitor the threads bouncing and CPU utilization > > via 'top' (you don't need to be 100% precise, jut to get an idea, and > > keep an eye on things like excessive threads migration, thread binding > > obsessity, low throughput on CPU). > > One note: if your workloads need to do I/O please use a tempfs or > > memory storage to do so, in order to reduce I/O effects at all. > > Also, verify this doesn't happen with 4BSD scheduler, just in case. 
> > > > Finally, if the problem is still in place, please recompile your > > kernel by adding: > > options KTR > > options KTR_ENTRIES=262144 > > options KTR_COMPILE=(KTR_SCHED) > > options KTR_MASK=(KTR_SCHED) > > > > And reproduce the issue. > > When you are in the middle of the scheduling issue go with: > > # ktrdump -ctf > ktr-ule-problem-YOURNAME.out > > > > and send to the mailing list along with your dmesg and the > > informations on the CPU utilization you gathered by top(1). > > > > That should cover it all, but if you have further questions, please > > just go ahead. > > Attilio, > > I have placed several files at > > http://troutmask.apl.washington.edu/~kargl/freebsd > > dmesg.txt --> dmesg for ULE kernel > summary--> A summary that includes top(1) output of all runs. > sysctl.ule.txt --> sysctl -a for the ULE kernel > ktr-ule-problem-kargl.out.gz > > I performed a series of tests with both 4BSD and ULE kernels. > The 4BSD and ULE kernels are identical except of course for the > scheduler. Both witness and invariants are disabled, and malloc > has been compiled without debugging. > > Here's what I did. On the master node in my cluster, I ran an > OpenMPI code that sends N jobs off to the node with the kernel > of interest. There is communication between the master and > slaves to generate 16 independent chunks of data. Note, there > is no disk IO. So, for example, N=4 will start 4 essentially > identical numerically intensity jobs. At the start of a run, > the master node instructs each slave job to create a chunk of > data. After the data is created, the slave sends it back to the > master and the master sends instructions to create the next chunk > of data. This communication continues until the 16 chunks have > been assigned, computed, and returned to the master. > > Here is a rough measurement of the problem with ULE and numerical > intensity loads. 
This command is executed on the master > > time mpiexec -machinefile mf3 -np N sasmp sas.in > > Since time is executed on the master, only the 'real' time is of > interest (the summary file includes user and sys times). This > command is run 5 times for each N value and up to 10 times for > some N values with the ULE kernel. The following table records > the average 'real' time and the number in (...) is the mean > absolute deviation. >
> #  N   ULE              4BSD
> # --------------------------------
> #  4   223.27 (0.502)   221.76 (0.551)
> #  5   404.35 (73.82)   270.68 (0.866)
> #  6   627.56 (173.0)   247.23 (1.442)
> #  7   475.53 (84.07)   285.78 (1.421)
> #  8   429.45 (134.9)   223.64 (1.316)

One explanation for the runs taking 1.5-2x as long is that with ULE the threads are not migrated properly, so you end up with idle cores while ready threads sit waiting (the other possible explanation would be that migrations do happen, but are so frequent and expensive that they completely trash the caches; that seems unlikely for this type of task). Also, perhaps one could build a simple test process that replicates this workload (so one can run it as part of regression tests):
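For what it's worth, a throwaway stand-in for such a regression test could look like the sketch below: plain sh, no MPI, no I/O, just N CPU-bound busy loops kept in flight until 16 "chunks" have run. The knobs N, SPIN and CHUNKS are made up here, not values from Steve's report; the point is only to give the scheduler the same shape of load to place.

```shell
#!/bin/sh
# Hypothetical stand-in for the MPI test above (no MPI, no disk I/O):
# run CHUNKS cpu-bound "chunks", N at a time, so run time is dominated
# by how well the scheduler spreads N busy loops over the cores.
# N, SPIN and CHUNKS are made-up knobs, not values from the report.
N=${1:-4}
SPIN=${2:-20000}
CHUNKS=16

chunk() {                       # one numerically intensive "slave job"
    i=0
    while [ "$i" -lt "$SPIN" ]; do i=$((i + 1)); done
}

c=0
while [ "$c" -lt "$CHUNKS" ]; do
    n=0
    # the "master" hands out up to N chunks, then collects the batch
    while [ "$n" -lt "$N" ] && [ "$c" -lt "$CHUNKS" ]; do
        chunk &
        n=$((n + 1))
        c=$((c + 1))
    done
    wait
done
echo "completed $CHUNKS chunks, $N at a time"
```

Timing this with time(1) under both schedulers, for N around and above the core count, should reproduce the shape of the table above if the no-migration theory holds; watching it in top(1) would show whether cores go idle while chunks are still queued.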
Re: SCHED_ULE should not be the default
Are you able to go through the emails here and grab out Attilio's example for generating KTR scheduler traces? Adrian On 21 December 2011 16:52, Steve Kargl wrote: > On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: >> 2011/12/15 Steve Kargl : >> > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: >> >> >> >> I basically went through all the e-mail you just sent and identified 4 >> >> real report on which we could work on and summarizied in the attached >> >> Excel file. >> >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the >> >> few datas there and add more, if they want, or make more important >> >> clarifications in particular about the Xorg presence (or rather not) >> >> in their workload. >> > >> > Your summary of my observations appears correct. >> > >> > I have grabbed an up-to-date /usr/src, built and >> > installed world, and built and installed a new >> > kernel on one of the nodes in my cluster. ??It >> > has >> > >> >> It seems a perfect environment, just please make sure you made a >> debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). >> >> The first thing is, can you try reproducing your case? As far as I got >> it, for you it was enough to run N + small_amount of CPU-bound threads >> to show performance penalty, so I'd ask you to start with using dnetc >> or just your preferred cpu-bound workload and verify you can reproduce >> the issue. >> As it happens, please monitor the threads bouncing and CPU utilization >> via 'top' (you don't need to be 100% precise, jut to get an idea, and >> keep an eye on things like excessive threads migration, thread binding >> obsessity, low throughput on CPU). >> One note: if your workloads need to do I/O please use a tempfs or >> memory storage to do so, in order to reduce I/O effects at all. >> Also, verify this doesn't happen with 4BSD scheduler, just in case. 
>> >> Finally, if the problem is still in place, please recompile your >> kernel by adding: >> options KTR >> options KTR_ENTRIES=262144 >> options KTR_COMPILE=(KTR_SCHED) >> options KTR_MASK=(KTR_SCHED) >> >> And reproduce the issue. >> When you are in the middle of the scheduling issue go with: >> # ktrdump -ctf > ktr-ule-problem-YOURNAME.out >> >> and send to the mailing list along with your dmesg and the >> informations on the CPU utilization you gathered by top(1). >> >> That should cover it all, but if you have further questions, please >> just go ahead. > > Attilio, > > I have placed several files at > > http://troutmask.apl.washington.edu/~kargl/freebsd > > dmesg.txt --> dmesg for ULE kernel > summary --> A summary that includes top(1) output of all runs. > sysctl.ule.txt --> sysctl -a for the ULE kernel > ktr-ule-problem-kargl.out.gz > > I performed a series of tests with both 4BSD and ULE kernels. > The 4BSD and ULE kernels are identical except of course for the > scheduler. Both witness and invariants are disabled, and malloc > has been compiled without debugging. > > Here's what I did. On the master node in my cluster, I ran an > OpenMPI code that sends N jobs off to the node with the kernel > of interest. There is communication between the master and > slaves to generate 16 independent chunks of data. Note, there > is no disk IO. So, for example, N=4 will start 4 essentially > identical numerically intensity jobs. At the start of a run, > the master node instructs each slave job to create a chunk of > data. After the data is created, the slave sends it back to the > master and the master sends instructions to create the next chunk > of data. This communication continues until the 16 chunks have > been assigned, computed, and returned to the master. > > Here is a rough measurement of the problem with ULE and numerical > intensity loads. 
This command is executed on the master > > time mpiexec -machinefile mf3 -np N sasmp sas.in > > Since time is executed on the master, only the 'real' time is of > interest (the summary file includes user and sys times). This > command is run 5 times for each N value and up to 10 times for > some N values with the ULE kernel. The following table records > the average 'real' time and the number in (...) is the mean > absolute deviation. >
> #  N   ULE              4BSD
> # --------------------------------
> #  4   223.27 (0.502)   221.76 (0.551)
> #  5   404.35 (73.82)   270.68 (0.866)
> #  6   627.56 (173.0)   247.23 (1.442)
> #  7   475.53 (84.07)   285.78 (1.421)
> #  8   429.45 (134.9)   223.64 (1.316)
> > These numbers to me demonstrate that ULE is not a good choice > for an HPC workload. > > If you need more information, feel free to ask. If you would > like access to the node, I can probably arrange that. But, > we can discuss that off-line. > > -- > Steve > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: > 2011/12/15 Steve Kargl : > > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: > >> > >> I basically went through all the e-mail you just sent and identified 4 > >> real report on which we could work on and summarizied in the attached > >> Excel file. > >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the > >> few datas there and add more, if they want, or make more important > >> clarifications in particular about the Xorg presence (or rather not) > >> in their workload. > > > > Your summary of my observations appears correct. > > > > I have grabbed an up-to-date /usr/src, built and > > installed world, and built and installed a new > > kernel on one of the nodes in my cluster. ??It > > has > > > > It seems a perfect environment, just please make sure you made a > debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). > > The first thing is, can you try reproducing your case? As far as I got > it, for you it was enough to run N + small_amount of CPU-bound threads > to show performance penalty, so I'd ask you to start with using dnetc > or just your preferred cpu-bound workload and verify you can reproduce > the issue. > As it happens, please monitor the threads bouncing and CPU utilization > via 'top' (you don't need to be 100% precise, jut to get an idea, and > keep an eye on things like excessive threads migration, thread binding > obsessity, low throughput on CPU). > One note: if your workloads need to do I/O please use a tempfs or > memory storage to do so, in order to reduce I/O effects at all. > Also, verify this doesn't happen with 4BSD scheduler, just in case. > > Finally, if the problem is still in place, please recompile your > kernel by adding: > options KTR > options KTR_ENTRIES=262144 > options KTR_COMPILE=(KTR_SCHED) > options KTR_MASK=(KTR_SCHED) > > And reproduce the issue. 
> When you are in the middle of the scheduling issue go with: > # ktrdump -ctf > ktr-ule-problem-YOURNAME.out > > and send to the mailing list along with your dmesg and the > informations on the CPU utilization you gathered by top(1). > > That should cover it all, but if you have further questions, please > just go ahead. Attilio, I have placed several files at http://troutmask.apl.washington.edu/~kargl/freebsd dmesg.txt --> dmesg for ULE kernel summary--> A summary that includes top(1) output of all runs. sysctl.ule.txt --> sysctl -a for the ULE kernel ktr-ule-problem-kargl.out.gz I performed a series of tests with both 4BSD and ULE kernels. The 4BSD and ULE kernels are identical except of course for the scheduler. Both witness and invariants are disabled, and malloc has been compiled without debugging. Here's what I did. On the master node in my cluster, I ran an OpenMPI code that sends N jobs off to the node with the kernel of interest. There is communication between the master and slaves to generate 16 independent chunks of data. Note, there is no disk IO. So, for example, N=4 will start 4 essentially identical numerically intensity jobs. At the start of a run, the master node instructs each slave job to create a chunk of data. After the data is created, the slave sends it back to the master and the master sends instructions to create the next chunk of data. This communication continues until the 16 chunks have been assigned, computed, and returned to the master. Here is a rough measurement of the problem with ULE and numerical intensity loads. This command is executed on the master time mpiexec -machinefile mf3 -np N sasmp sas.in Since time is executed on the master, only the 'real' time is of interest (the summary file includes user and sys times). This command is run at 5 times for each N value and up to 10 time for some N values with the ULE kernel. The following table records the average 'real' time and the number in (...) is the mean absolute deviations. 
#  N   ULE              4BSD
# --------------------------------
#  4   223.27 (0.502)   221.76 (0.551)
#  5   404.35 (73.82)   270.68 (0.866)
#  6   627.56 (173.0)   247.23 (1.442)
#  7   475.53 (84.07)   285.78 (1.421)
#  8   429.45 (134.9)   223.64 (1.316)

These numbers to me demonstrate that ULE is not a good choice for an HPC workload. If you need more information, feel free to ask. If you would like access to the node, I can probably arrange that. But, we can discuss that off-line. -- Steve
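As a side note, the per-N averages and mean absolute deviations in the table can be recomputed with a short awk pipeline from one whitespace-separated list of 'real' times per run. The five sample values below are made up for illustration (the actual per-run numbers live in Steve's summary file), chosen only so that they average to the ULE N=5 figure:

```shell
# Mean and mean-absolute-deviation of a set of 'real' times, printed in
# the same "avg (mad)" shape as the table. The five input values are
# illustrative stand-ins, not the measured runs.
out=$(echo "404.35 310.02 512.77 350.10 444.51" | awk '{
    for (i = 1; i <= NF; i++) { t[i] = $i; sum += $i }
    mean = sum / NF
    for (i = 1; i <= NF; i++)
        dev += ((t[i] > mean) ? t[i] - mean : mean - t[i])
    printf "%.2f (%.2f)\n", mean, dev / NF
}')
echo "$out"
```

Feeding it the real per-run times from the summary should reproduce the table entries exactly, which is a quick sanity check on the reported deviations.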
Re: SCHED_ULE should not be the default
On Mon Dec 19 11, Nathan Whitehorn wrote: > On 12/18/11 04:34, Adrian Chadd wrote: > >The trouble is that there's lots of anecdotal evidence, but noone's > >really gone digging deep into _their_ example of why it's broken. The > >developers who know this stuff don't see anything wrong. That hints to > >me it may be something a little more creepy - as an example, the > >interplay between netisr/swi/taskqueue/callbacks and such. It may be > >that something is being starved that isn't obviously obvious. It's > >just a stab in the dark, but it sounds somewhat plausible based on > >what I've seen ULE do in my network throughput hacking. > > > >I applaud reppie for trying to make it as easy as possible for people > >to use KTR to provide scheduler traces for him to go digging with, so > >please, if you have these issues and you can absolutely reproduce > >them, please follow his instructions and work with him to get him what > >he needs. > > The thing I've seen is that ULE is substantially more enthusiastic about > migrating processes between cores than 4BSD. Often, this is a good > thing, but can increase the rate of cache misses, hurting performance > for cache-bound processes (I see this particularly in HPC-type > scientific workloads). It might be interesting to add some kind of > tunable here. does r228718 have any impact regarding this behaviour? cheers. alex > > Another more interesting and slightly longer-term possibility if someone > wants a project would be to integrate scheduling decisions with hwpmc > counters, to accumulate statistics on cache hits at each context switch > and preferentially keep processes with a high hits/misses ratio on the > same thread/cache domain relative to processes with a low one. > -Nathan > > P.S. The other thing that could be very interesting from a research and > scheduling standpoint would be to integrate heterogeneous SMP support > into the operating system, with a FreeBSD-4 "Application Processor" > syscall model. 
We seem to be going down the road where GPGPU computing > has MMUs, timer interrupts, IPIs, etc. (the next AMD Fusions, IBM Cell). > This is something that no operating system currently supports well, and > would be a place for BSD to shine. If anyone has a free graduate student...
Re: SCHED_ULE should not be the default
on 19/12/2011 19:46 Ivan Klymenko said the following: > On Sat, 17 Dec 2011 23:13:16 +0200 > Andriy Gapon wrote: > >> on 17/12/2011 19:33 George Mitchell said the following: >>> Summing up for the record, in my original test: >>> 1. It doesn't matter whether X is running or not. >>> 2. The problem is not limited to two or fewer CPUs. (It also >>> happens for me on a six-CPU system.) >>> 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU. >>> >>> With nCPU compute-bound processes running, with SCHED_ULE, any other >>> process that is interactive (which to me means frequently waiting >>> for I/O) gets ABYSMAL performance -- over an order of magnitude >>> worse than it gets with SCHED_4BSD under the same conditions. >> >> I definitely do not see anything like this. >> Specifically: >> - with X >> - with 2 CPUs >> - with nCPU and/or nCPU + 1 compute-bound processes >> - with SCHED_ULE obviously :-) >> I do not get "abysmal" performance for I/O active tasks. >> >> Perhaps there is something specific that you would want me to run and >> measure. >> > > Well, share your experience - what should one do so that others are > fine with SCHED_ULE too. ;) I didn't have to do anything special, so I am at a loss as to what to share. It just works (tm) for me. Sorry. -- Andriy Gapon
Re: SCHED_ULE should not be the default
On Sat, 17 Dec 2011 23:13:16 +0200 Andriy Gapon wrote: > on 17/12/2011 19:33 George Mitchell said the following: > > Summing up for the record, in my original test: > > 1. It doesn't matter whether X is running or not. > > 2. The problem is not limited to two or fewer CPUs. (It also > > happens for me on a six-CPU system.) > > 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU. > > > > With nCPU compute-bound processes running, with SCHED_ULE, any other > > process that is interactive (which to me means frequently waiting > > for I/O) gets ABYSMAL performance -- over an order of magnitude > > worse than it gets with SCHED_4BSD under the same conditions. > > I definitely do not see anything like this. > Specifically: > - with X > - with 2 CPUs > - with nCPU and/or nCPU + 1 compute-bound processes > - with SCHED_ULE obviously :-) > I do not get "abysmal" performance for I/O active tasks. > > Perhaps there is something specific that you would want me to run and > measure. > Well, share your experience - what should one do so that others are fine with SCHED_ULE too. ;)
Re: SCHED_ULE should not be the default
The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious. It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking. I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs. Adrian (wow, lots of personal pronouns packed into one sentence. It must be sleep time.)
Re: SCHED_ULE should not be the default
Hi, What Attilio and others need are KTR traces of the most stripped-down example of interactivity-busting workload you can find. Eg: if you're doing 32 concurrent buildworlds and trying to test interactivity - fine, but that's going to result in a lot of KTR stuff. If you can reproduce it using a dd from /dev/random to /dev/null (like another poster did) with nothing else running, then even better. If you can do it without X running, even better. I honestly suggest ignoring benchmarks for now and concentrating on interactivity. Adrian
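The capture recipe Attilio posted earlier in the thread (kernel rebuilt with options KTR, KTR_ENTRIES=262144, and KTR_COMPILE/KTR_MASK set to KTR_SCHED, then ktrdump while the problem is happening) could be wrapped in one small script so reporters all produce the same artifact. A sketch; the wrapper itself and its NAME argument are hypothetical, only the ktrdump -ctf invocation and the file-naming convention come from the thread:

```shell
#!/bin/sh
# One-shot wrapper around the KTR capture recipe from this thread.
# Prerequisite: a FreeBSD kernel built with options KTR,
# KTR_ENTRIES=262144, KTR_COMPILE=(KTR_SCHED), KTR_MASK=(KTR_SCHED).
# Run it as root *while* the scheduling problem is happening.
NAME=${1:-$(id -un)}            # tag for the output file
OUT="ktr-ule-problem-${NAME}.out"

if command -v ktrdump >/dev/null 2>&1; then
    # dump the scheduler trace buffer, as in Attilio's instructions
    ktrdump -ctf > "$OUT" &&
        echo "wrote $OUT -- send it along with dmesg and top(1) output"
else
    echo "ktrdump(8) not found: this needs a FreeBSD box with a KTR kernel" >&2
fi
```

That keeps the trace, the reporter's name, and the reminder to attach dmesg and top(1) output together in one step.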
Re: SCHED_ULE should not be the default
On 12/18/11 03:37, Bruce Cran wrote: > On 13/12/2011 09:00, Andrey Chernov wrote: >> I observe ULE interactivity slowness even on single core machine >> (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the >> middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is >> gone. > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon > machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make > -j16 buildworld" then logging into another console can take several > seconds. Sometimes even the "Password:" prompt can take a couple of > seconds to appear after typing my username. > I reported several problems ages ago using SCHED_ULE on FreeBSD 8/9 when doing heavy I/O, either disk or network bound (at the time I noticed the problem on servers doing heavy disk or net I/O). It was suspected that X could be the problem, but we also have a Dell PowerEdge 1950III running FreeBSD 8.2-STABLE (by next week 9.0-RC[2/3]/STABLE) without X with the same problems, though not so prominent as with X. The box has 8 cores, 4 per socket, 16 GB RAM, a SAS 6/iR controller and two PCI-X attached Broadcom NetXtreme NICs, so the hardware shouldn't be any kind of trouble. But at the time (over the past two years now), the problem was considered "a personal" problem. Bah! By the beginning of next year my working group expects new hardware. Since we use Linux for scientific work (due to OpenCL and CUDA on TESLA cards), I can't use the Blade system. The boxes I expect are one Dell Precision T7500, 96 GB RAM, two sockets with Westmere XEONs, 12 cores/24 threads in total. I'll start a dual-OS installation with FreeBSD 10 and the most recent Suse (since the development is mostly done by my colleagues on Suse for the C2075 TESLA board, I need Suse Linux). I will then be capable of performing some benchmarks on both boxes on the very same hardware.
The other box will be my desktop box, a brand new Sandy Bridge E CPU (i7-3960X) with 32 GB RAM. I'm also inclined to install a dual-boot box (I rejected this up to now since I do not like installing GRUB2 for multiboot when using GPT on FreeBSD). The box will run FreeBSD 9 and an Ubuntu or Gentoo Linux. I'm unsure about the choice of Linux, but I tend toward Gentoo, compiling everything myself. On this box I can also perform benchmarks with several setups. I look forward to getting some help and/or tips to pin down the issues we discussed here. Oliver
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Alexander Best wrote: > On Sun Dec 18 11, Andrey Chernov wrote: > > On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > > > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > > > I observe ULE interactivity slowness even on single core machine > > > (Pentium > > > > > 4) in very visible places, like 'ps ax' output stucks in the middle > > > by ~1 > > > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon > > > machine > > > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > > > buildworld" then logging into another console can take several seconds. > > > > Sometimes even the "Password:" prompt can take a couple of seconds to > > > appear > > > > after typing my username. > > > > > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > > > my single core 1133MHz PIII-M. As a reproducable data point, running > > > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > > > the CPU while testing my manual fan control script, hogs it up pretty > > > much while regularly running the script below in another konsole to > > > check values - which often gets stuck half way, occasionally pausing > > > _twice_ before finishing. Switching back to the first konsole (on > > > another desktop) to kill the dd can also take a couple/few seconds. > > > > This issue not about slow machine under load, because the same > > slow machine under exact the same load, but with SCHED_4BSD is very fast > > to response interactively. > > > > I think we should not misinterpret interactivity with speed. I see no big > > speed (i.e. compilation time) differences, switching schedulers, but see > > big _interactivity_ difference. ULE in general tends to underestimate > > interactive processes in favour of background ones. 
It perhaps helps to > > compilation, but looks like slowpoke OS from the interactive user > > experience. > > +1 > > i've also experienced issues with ULE and performed several tests to compare > it to the historical 4BSD scheduler. the difference between the two does *not* > seem to be speed (at least not a huge difference), but interactivity. > > one of the tests i performed was the following > > ttyv0: untar a *huge* (+10G) archive > ttyv1: after ~ 30 seconds of untaring do 'ls -la $direcory', where directory >contains a lot of files. i used "direcory = /var/db/portsnap", because s/portsnap/portsnap\/files/ >that directory contains 23117 files on my machine. > > measuring 'ls -la $direcory' via time(1) revealed that SCHED_ULE takes > 15 > seconds, whereas SCHED_4BSD only takes ~ 3-5 seconds. i think the issue is io. > io operations usually get a high priority, because statistics have shown that > - unlike computational tasks - io intensive tasks only run for a small > fraction > of time and then exit: read data -> change data -> writeback data. > > so SCHED_ULE might take these statistics too literaly and gives tasks like > bsdtar(1) (in my case) too many ressources, so other tasks which require io > are > struggling to get some ressources assigned to them (ls(1) in my case). > > of course SCHED_4BSD isn't perfect, too. try using it and run the stress2 > testsuite. your whole system will grind to a halt. mouse input drops below > 1 HZ. even after killing all the stress2 tests, it will take a few minutes > after the system becomes snappy again. > > cheers. > alex > > > > > -- > > http://ache.vniz.net/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Andrey Chernov wrote: > On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > > I observe ULE interactivity slowness even on single core machine > > (Pentium > > > > 4) in very visible places, like 'ps ax' output stucks in the middle by > > ~1 > > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > > buildworld" then logging into another console can take several seconds. > > > Sometimes even the "Password:" prompt can take a couple of seconds to > > appear > > > after typing my username. > > > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > > my single core 1133MHz PIII-M. As a reproducable data point, running > > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > > the CPU while testing my manual fan control script, hogs it up pretty > > much while regularly running the script below in another konsole to > > check values - which often gets stuck half way, occasionally pausing > > _twice_ before finishing. Switching back to the first konsole (on > > another desktop) to kill the dd can also take a couple/few seconds. > > This issue not about slow machine under load, because the same > slow machine under exact the same load, but with SCHED_4BSD is very fast > to response interactively. > > I think we should not misinterpret interactivity with speed. I see no big > speed (i.e. compilation time) differences, switching schedulers, but see > big _interactivity_ difference. ULE in general tends to underestimate > interactive processes in favour of background ones. It perhaps helps to > compilation, but looks like slowpoke OS from the interactive user > experience. 
+1 i've also experienced issues with ULE and performed several tests to compare it to the historical 4BSD scheduler. the difference between the two does *not* seem to be speed (at least not a huge difference), but interactivity. one of the tests i performed was the following

ttyv0: untar a *huge* (+10G) archive
ttyv1: after ~ 30 seconds of untaring do 'ls -la $directory', where directory contains a lot of files. i used "directory = /var/db/portsnap", because that directory contains 23117 files on my machine.

measuring 'ls -la $directory' via time(1) revealed that SCHED_ULE takes > 15 seconds, whereas SCHED_4BSD only takes ~ 3-5 seconds. i think the issue is io. io operations usually get a high priority, because statistics have shown that - unlike computational tasks - io intensive tasks only run for a small fraction of time and then exit: read data -> change data -> writeback data. so SCHED_ULE might take these statistics too literally and give tasks like bsdtar(1) (in my case) too many resources, so other tasks which require io are struggling to get resources assigned to them (ls(1) in my case). of course SCHED_4BSD isn't perfect either. try using it and run the stress2 testsuite. your whole system will grind to a halt. mouse input drops below 1 Hz. even after killing all the stress2 tests, it will take a few minutes before the system becomes snappy again. cheers. alex > > -- > http://ache.vniz.net/
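alex's untar-vs-ls probe can also be packaged as a self-contained script so others can repeat it and report numbers. This is only a sketch with made-up sizes: 500 empty files and a 64 MB background write stand in for the +10G archive and /var/db/portsnap; substitute the real archive and directory to reproduce the timings he describes.

```shell
#!/bin/sh
# Self-contained version of the probe: time 'ls -la' of a crowded
# directory while a background writer generates disk I/O.
# File count and write size are made-up stand-ins for alex's setup.
dir=$(mktemp -d) || exit 1

i=0
while [ "$i" -lt 500 ]; do      # populate the directory to be listed
    : > "$dir/f$i"
    i=$((i + 1))
done

# background writer standing in for the huge untar
dd if=/dev/zero of="$dir/hog" bs=65536 count=1024 2>/dev/null &
hog=$!

start=$(date +%s)
ls -la "$dir" > /dev/null       # the "interactive" task being measured
end=$(date +%s)
elapsed=$((end - start))
echo "ls -la of 500 files took ${elapsed}s under write load"

wait "$hog" 2>/dev/null
rm -rf "$dir"
```

Run once under each scheduler with the same load; a large gap in the reported time, as in the 15s vs 3-5s figures above, would point at the same starvation.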
Re: SCHED_ULE should not be the default
On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > I observe ULE interactivity slowness even on single core machine (Pentium > > > 4) in very visible places, like 'ps ax' output stucks in the middle by ~1 > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > buildworld" then logging into another console can take several seconds. > > Sometimes even the "Password:" prompt can take a couple of seconds to > appear > > after typing my username. > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > my single core 1133MHz PIII-M. As a reproducable data point, running > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > the CPU while testing my manual fan control script, hogs it up pretty > much while regularly running the script below in another konsole to > check values - which often gets stuck half way, occasionally pausing > _twice_ before finishing. Switching back to the first konsole (on > another desktop) to kill the dd can also take a couple/few seconds. This issue not about slow machine under load, because the same slow machine under exact the same load, but with SCHED_4BSD is very fast to response interactively. I think we should not misinterpret interactivity with speed. I see no big speed (i.e. compilation time) differences, switching schedulers, but see big _interactivity_ difference. ULE in general tends to underestimate interactive processes in favour of background ones. It perhaps helps to compilation, but looks like slowpoke OS from the interactive user experience. 
-- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On Sun, 18 Dec 2011 02:37:52 +0000, Bruce Cran wrote: > On 13/12/2011 09:00, Andrey Chernov wrote: > > I observe ULE interactivity slowness even on single core machine (Pentium > > 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 > > second. When I switch back to SCHED_4BSD, all slowness is gone. > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > buildworld" then logging into another console can take several seconds. > Sometimes even the "Password:" prompt can take a couple of seconds to appear > after typing my username. I'd resigned myself to expecting this sort of behaviour as 'normal' on my single core 1133MHz PIII-M. As a reproducible data point, running 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat the CPU while testing my manual fan control script, hogs it up pretty much while regularly running the script below in another konsole to check values - which often gets stuck half way, occasionally pausing _twice_ before finishing. Switching back to the first konsole (on another desktop) to kill the dd can also take a couple/few seconds.

t23# cat /root/bin/t23stat
#!/bin/sh
echo -n "`date` "
sysctl dev.cpu.0.freq dev.cpu.0.cx_usage
sysctl dev.acpi_ibm | egrep 'fan_|thermal'
sysctl hw.acpi.thermal.tz0.temperature
acpiconf -i0 | egrep 'State|Remain|Present|Volt'

Sure it's a slow machine, but it normally runs pretty smoothly. Anything with a bit of disk i/o, like buildworld, runs smooth as. This is on 8.2-R GENERIC, HZ=1000, 768MB with lots free, no swap in use. I'll definitely be trying SCHED_4BSD after updating to 8-stable unless a 'miracle cure' appears beforehand. cheers, Ian
Re: SCHED_ULE should not be the default
On 17 December 2011 14:00, Andriy Gapon wrote: > on 17/12/2011 23:20 Adrian Chadd said the following: >> This may -not- be a userland specific problem.. > That's an interesting idea. From the recent discussion about USB I can > conclude that USB threads run at higher priority than GEOM threads: PI_NET/PI_DISK vs > PRIBIO. The former is from the ithread range, the latter is from the regular > kernel range. Maybe it would make sense to give the GEOM threads a priority > from the ithread range too - given their role and importance. Ah, so I can just punt this to you? Sweet! *punt*. I haven't had time to dig into the network side of things but I do plan on doing this soon. Hopefully something really silly shows up. Adrian
Re: SCHED_ULE should not be the default
On 13/12/2011 09:00, Andrey Chernov wrote:
> I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.

I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 buildworld" then logging into another console can take several seconds. Sometimes even the "Password:" prompt can take a couple of seconds to appear after typing my username.

-- 
Bruce Cran
Re: SCHED_ULE should not be the default
on 17/12/2011 23:20 Adrian Chadd said the following:
> Erm, just as a random question - since device drivers (and GEOM) run as separate threads, has anyone looked into what kind of effects the scheduler has on these?
>
> I definitely have measurable throughput/responsiveness differences between ULE and 4BSD (and preempt/non-preempt on 4BSD) on my MIPS boards when they're bridging traffic. I wonder if there's something strange going on with the scheduling and preemption of driver netisrs, taskqueues, the fast interrupt handlers, etc.
>
> This may -not- be a userland specific problem..

That's an interesting idea. From the recent discussion about USB I can conclude that USB threads run at higher priority than GEOM threads: PI_NET/PI_DISK vs PRIBIO. The former is from the ithread range, the latter is from the regular kernel range. Maybe it would make sense to give the GEOM threads a priority from the ithread range too, given their role and importance.

-- 
Andriy Gapon
Re: SCHED_ULE should not be the default
Erm, just as a random question - since device drivers (and GEOM) run as separate threads, has anyone looked into what kind of effects the scheduler has on these?

I definitely have measurable throughput/responsiveness differences between ULE and 4BSD (and preempt/non-preempt on 4BSD) on my MIPS boards when they're bridging traffic. I wonder if there's something strange going on with the scheduling and preemption of driver netisrs, taskqueues, the fast interrupt handlers, etc.

This may -not- be a userland specific problem..

Adrian
Re: SCHED_ULE should not be the default
on 17/12/2011 19:33 George Mitchell said the following:
> Summing up for the record, in my original test:
> 1. It doesn't matter whether X is running or not.
> 2. The problem is not limited to two or fewer CPUs. (It also happens for me on a six-CPU system.)
> 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU.
>
> With nCPU compute-bound processes running, with SCHED_ULE, any other process that is interactive (which to me means frequently waiting for I/O) gets ABYSMAL performance -- over an order of magnitude worse than it gets with SCHED_4BSD under the same conditions.

I definitely do not see anything like this. Specifically:
- with X
- with 2 CPUs
- with nCPU and/or nCPU + 1 compute-bound processes
- with SCHED_ULE obviously :-)

I do not get "abysmal" performance for I/O active tasks. Perhaps there is something specific that you would want me to run and measure.

-- 
Andriy Gapon
Re: SCHED_ULE should not be the default
On 12/14/11 21:05, Oliver Pinter wrote: [...]
> Hi! Can you try with these settings:
>
> op@opn ~> sysctl kern.sched.
> kern.sched.cpusetsize: 8
> kern.sched.preemption: 0
> kern.sched.name: ULE
> kern.sched.slice: 13
> kern.sched.interact: 30
> kern.sched.preempt_thresh: 224
> kern.sched.static_boost: 152
> kern.sched.idlespins: 1
> kern.sched.idlespinthresh: 16
> kern.sched.affinity: 1
> kern.sched.balance: 1
> kern.sched.balance_interval: 133
> kern.sched.steal_htt: 1
> kern.sched.steal_idle: 1
> kern.sched.steal_thresh: 1
> kern.sched.topology_spec: 0, 1 0, 1
[...]

Sorry I didn't try this earlier, but I had time this morning. Apparently you can't change kern.sched.preemption without recompiling, so I did that. It didn't help, and subjectively it made interactive performance worse. I changed preempt_thresh and observed no difference. There were only a couple of small differences between your other settings and the 9.0-PRERELEASE defaults.

Summing up for the record, in my original test:
1. It doesn't matter whether X is running or not.
2. The problem is not limited to two or fewer CPUs. (It also happens for me on a six-CPU system.)
3. It doesn't require nCPU + 1 compute-bound processes, just nCPU.

With nCPU compute-bound processes running, with SCHED_ULE, any other process that is interactive (which to me means frequently waiting for I/O) gets ABYSMAL performance -- over an order of magnitude worse than it gets with SCHED_4BSD under the same conditions.

-- George
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 14:59, Luigi Rizzo wrote:
> It really looks much easier than i thought initially.

Awesome!

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 01:51:26PM -0800, Doug Barton wrote:
> On 12/16/2011 13:40, Michel Talon wrote:
>> Adrian Chadd said:
>>> Hi all,
>>>
>>> Can someone load a kernel module dynamically at boot-time?
>>>
>>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>>
>>> That may be an acceptable solution for now.
>>
>> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel; the problem is to migrate processes from one scheduler to the other.
>
> I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Runtime switching is a superset of loading as a module at boot time. In both cases you need to implement a generic interface between the scheduler and the rest of the system. The good thing, compared to 2002, is that the abstraction now exists: it is made up of all the functions and variables named sched_*() in sched_4bsd.c and sched_ule.c. I see there is a small number of #ifdef SCHED_ULE in a couple of files, but that can probably be fixed.

I believe all that is needed for dynamic scheduler loading is to create function pointers for all these names and initialize them when one of the scheduler modules is loaded. After that, runtime switching shouldn't require a lot of work either. The architecture and implementation i posted earlier (repeated below for convenience) should work, with just a bit of attention to locking the scheduler during a switch.

References:
http://kerneltrap.org/node/349
http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff

It really looks much easier than i thought initially.

cheers
luigi
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 14:16, Michel Talon wrote:
> Of course, you are perfectly right, and i had misunderstood Adrian's post.

Happens to the best of us. :)

> But if the problem is only to change scheduler by rebooting, i think it is no more expensive to compile a kernel with the other scheduler. Or is it that people never compile kernels nowadays?

That's part of it. For my money the other 2 big problems are, first, that we'd like to make it as easy on the 'make release' and installer processes as possible. I imagine (although I would not object to being proven wrong) that 1 kernel with knobs is easier to manage and less resource intensive than 2 kernels that differ only by this 1 feature. The other big problem is freebsd-update. While I assume that logic could be built into the system to handle this issue, if the guts can be built into the kernel itself why not do that instead?

Of lesser, but not insignificant, consideration is the possibility that at some point we'll have more than 2 scheduler options.

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 16 Dec 2011, at 22:51, Doug Barton wrote:
> On 12/16/2011 13:40, Michel Talon wrote:
>> Adrian Chadd said:
>>> Hi all,
>>>
>>> Can someone load a kernel module dynamically at boot-time?
>>>
>>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>>
>>> That may be an acceptable solution for now.
>>
>> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.
>
> I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Of course, you are perfectly right, and i had misunderstood Adrian's post. But if the problem is only to change scheduler by rebooting, i think it is no more expensive to compile a kernel with the other scheduler. Or is it that people never compile kernels nowadays? The ability to switch scheduler on a running machine would certainly be a more desirable way to test the best adaptation of the system to the load.

To come back to the problems in question about ULE, i must say i don't see obvious malfunctions in my own use (i had some problems of this sort long ago, but they disappeared with more recent FreeBSD).

-- 
Michel Talon
ta...@lpthe.jussieu.fr
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 13:40, Michel Talon wrote:
> Adrian Chadd said:
>> Hi all,
>>
>> Can someone load a kernel module dynamically at boot-time?
>>
>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>
>> That may be an acceptable solution for now.
>
> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.

I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
Adrian Chadd said:
> Hi all,
>
> Can someone load a kernel module dynamically at boot-time?
>
> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>
> That may be an acceptable solution for now.

As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.

-- 
Michel Talon
ta...@lpthe.jussieu.fr
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 12:53, Adrian Chadd wrote:
> Hi all,
>
> Can someone load a kernel module dynamically at boot-time?
>
> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>
> That may be an acceptable solution for now.

That, or a loader.conf tunable (which in the case of making them modules would basically amount to the same thing, right?). I've heard several really smart people with rather convincing explanations of why ULE is not the right choice of default for 2 cores or less. If we could ship one kernel with both schedulers available, it should be simple to modify the installer to choose the right one and put the right stuff in loader.conf.

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
Hi all,

Can someone load a kernel module dynamically at boot-time?

Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?

That may be an acceptable solution for now.

Adrian
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 11:46:35AM +0100, Stefan Esser wrote:
> Am 16.12.2011 09:11, schrieb Luigi Rizzo:
> > The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).
> >
> > The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.
> >
> > I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.
>
> I think this is OK for user processes (which will receive reasonable relative priorities after running for a fraction of a second, anyway).
>
> But I'm not sure whether it is possible to use static priorities for (real-time) kernel threads, where priority inversion may occur if the current dynamic (relative) thread priorities are not preserved.

The word "priority" is too overloaded in this context, as it mixes configuration information (which I called "static priority", and which would really be better characterized as the "service parameters" you specify when you start a new thread) and scheduler state ("dynamic priority" in a priority-based scheduler; other schedulers have different state info, such as tickets, virtual times, deadlines, CPU affinity and so on).

What I meant to say is that the way I implemented it (and I believe it is almost the only practical way), on a change of scheduler all processes are requeued as if they had just started. It is then up to the active scheduler to adjust that initial state according to the evolution of the system (changing priorities, tickets, virtual times, deadlines, etc.).

> But not only must the relative priorities of the existing processes be preserved; new kernel threads must be created with matching (relative) priorities. This means that the schedulers may be switched at any time, but the priority values should be portable between schedulers to prevent dead-lock (or illegal order of execution?) of threads (AFAICT).

This issue (I think you have in mind priority inheritance, priority inversion and related issues) is almost irrelevant in FreeBSD, and I am really sorry to see that it comes up so frequently in discussions and sometimes also in documentation related to process schedulers. Apart from bugs in the implementation (see Bruce Evans' email from a few days ago), our CPU schedulers are a collection of heuristics without formally proved properties. So, as much as we can trust developers to come up with effective solutions:
- we cannot rely on priorities for correctness (mutual exclusion or deadlock avoidance);
- we don't have any support for real-time guarantees;
- average performance (which is why some of our priority-based schedulers may decide to implement priority inheritance) is not affected by events as infrequent as changing schedulers.

cheers
luigi
Re: SCHED_ULE should not be the default
2011/12/15 Steve Kargl:
> On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
>> I basically went through all the e-mail you just sent and identified 4 real reports on which we could work, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the few data points there and add more, if they want, or make more important clarifications, in particular about the Xorg presence (or rather not) in their workload.
>
> Your summary of my observations appears correct.
>
> I have grabbed an up-to-date /usr/src, built and installed world, and built and installed a new kernel on one of the nodes in my cluster. It has
>
> CPU: Dual Core AMD Opteron(tm) Processor 280 (2392.65-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x20f12 Family = f Model = 21 Stepping = 2
> Features=0x178bfbff MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x1
> AMD Features=0xe2500800
> AMD Features2=0x3
> real memory = 17179869184 (16384 MB)
> avail memory = 16269832192 (15516 MB)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 2 package(s) x 2 core(s)
>
> I can perform new tests with both ULE and 4BSD, but you'll need to be precise in the information you want collected (and how to collect the data) due to the rather limited amount of time I currently have.

It seems a perfect environment; just please make sure you build a debug-free userland (basically, setting MALLOC_PRODUCTION in jemalloc).

The first thing is: can you try reproducing your case? As far as I understood it, for you it was enough to run N + a small number of CPU-bound threads to show the performance penalty, so I'd ask you to start with dnetc or just your preferred CPU-bound workload and verify you can reproduce the issue. While doing so, please monitor thread bouncing and CPU utilization via 'top' (you don't need to be 100% precise, just get an idea, and keep an eye on things like excessive thread migration, threads stuck on one CPU, low CPU throughput).

One note: if your workloads need to do I/O, please use tmpfs or memory storage, in order to reduce I/O effects. Also, verify this doesn't happen with the 4BSD scheduler, just in case.

Finally, if the problem is still in place, please recompile your kernel with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

and reproduce the issue. When you are in the middle of the scheduling issue, run:

# ktrdump -ctf > ktr-ule-problem-YOURNAME.out

and send it to the mailing list along with your dmesg and the information on CPU utilization you gathered from top(1).

That should cover it all, but if you have further questions, please just go ahead.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 16.12.2011 09:11, Luigi Rizzo wrote:
> The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).
>
> The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.
>
> I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.

I think this is OK for user processes (which will receive reasonable relative priorities after running for a fraction of a second, anyway).

But I'm not sure whether it is possible to use static priorities for (real-time) kernel threads, where priority inversion may occur if the current dynamic (relative) thread priorities are not preserved.

And not only must the relative priorities of the existing processes be preserved; new kernel threads must also be created with matching (relative) priorities. This means that the schedulers may be switched at any time, but the priority values should be portable between schedulers to prevent dead-lock (or illegal order of execution?) of threads (AFAICT).

Regards, STefan
switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 03:11:43AM +0100, C. P. Ghost wrote:
> On Thu, Dec 15, 2011 at 10:44 AM, Tom Evans wrote:
> > Real time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot time configuration. You don't have to recompile the kernel to change timecounter.
>
> Right.
>
> Switching the scheduler on the fly may be thinkable though. I could imagine a syscall that would suspend all scheduling, convert the bookkeeping data of one scheduler into the other scheduler's, and transfer control to the other scheduler. Of course, that would require some heavy hacking, as I would imagine that "cross-scheduler surgery" would result in a pretty hard to debug kernel (at least during development).

Since the subject has come up a few times: back in 2002 (boy, it's almost 10 years ago!) we did implement switchable schedulers on FreeBSD 4.x UP, and the diffs and a bit of documentation are still online; probably the architecture could be reused even now, or for the SMP case.

Announcement and brief description: http://kerneltrap.org/node/349
The patch referred to in there: http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff

The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).

The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.

I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.

At the time I did not address the SMP case for several reasons, but they are all gone now:
- I did not have a suitable test system;
- SMP support was still in a state of flux;
- I did not understand the KSE concept;
- I did not have an algorithm for proportional share scheduling (the actual goal of the project) in an SMP context.

cheers
luigi

> A more general solution could even be a separate userland scheduler process a la L4 [*], but since we don't have lightweight IPC in the kernel (yet, or never), it would require even heavier black wizardry. But nice and flexible it would be. ;-)
>
> [*] Refs:
> - https://github.com/l4ka/pistachio
> - http://www.systems.ethz.ch/education/past-courses/fall-2010/aos/lectures/wk13-scheduling-print.pdf
>
> Regards,
> -cpghost.
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 10:44 AM, Tom Evans wrote:
> Real time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot time configuration. You don't have to recompile the kernel to change timecounter.

Right.

Switching the scheduler on the fly may be thinkable though. I could imagine a syscall that would suspend all scheduling, convert the bookkeeping data of one scheduler into the other scheduler's, and transfer control to the other scheduler. Of course, that would require some heavy hacking, as I would imagine that "cross-scheduler surgery" would result in a pretty hard to debug kernel (at least during development).

A more general solution could even be a separate userland scheduler process a la L4 [*], but since we don't have lightweight IPC in the kernel (yet, or never), it would require even heavier black wizardry. But nice and flexible it would be. ;-)

[*] Refs:
- https://github.com/l4ka/pistachio
- http://www.systems.ethz.ch/education/past-courses/fall-2010/aos/lectures/wk13-scheduling-print.pdf

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
> I basically went through all the e-mail you just sent and identified 4 real reports on which we could work, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the few data points there and add more, if they want, or make more important clarifications, in particular about the Xorg presence (or rather not) in their workload.

Your summary of my observations appears correct.

I have grabbed an up-to-date /usr/src, built and installed world, and built and installed a new kernel on one of the nodes in my cluster. It has

CPU: Dual Core AMD Opteron(tm) Processor 280 (2392.65-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x20f12 Family = f Model = 21 Stepping = 2
Features=0x178bfbff
Features2=0x1
AMD Features=0xe2500800
AMD Features2=0x3
real memory = 17179869184 (16384 MB)
avail memory = 16269832192 (15516 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)

I can perform new tests with both ULE and 4BSD, but you'll need to be precise in the information you want collected (and how to collect the data) due to the rather limited amount of time I currently have.

To summarize my workload: on the master node of my cluster I start a job that sends N slave jobs to the node of interest. The slaves perform nearly identical cpu-bound floating point computations, so the expectation is that each slave should take nearly the same amount of cpu-time to complete its task. Communication occurs between only the master and a slave, at the start of the process and when it finishes. The communication is over a GigE ipv4 internal network. The slaves do not read or write to disk.

-- 
Steve
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa:
> On 12/15/2011 11:56 AM, Attilio Rao wrote:
>> So, as a very first thing, can you try the following:
>> - Same codebase, etc. etc.
>> - Make the test 4 times, discard the first and ministat the other 3
>> - Reboot
>> - Change the steal_thresh value
>> - Make the test 4 times, discard the first and ministat the other 3
>>
>> Then report the discarded values and the ministat'ed ones and we will have more information, I guess (also, I don't think devfs contention should play a role here, so never mind about it for now).
>
> Results and data at
> http://www.tancsa.com/ule-bsd.html

I'm not totally sure: what does burnP6 do? Is it a CPU-bound workload? Also, how many threads are spawned in your case for parallel bzip2?

Also, it would be very good if you could run these tests against newer -CURRENT (with userland and kernel debugging off).

Thanks a lot for your hard work,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:56 AM, Attilio Rao wrote:
> So, as very first thing, can you try the following:
> - Same codebase, etc. etc.
> - Make the test 4 times, discard the first and ministat for the other 3
> - Reboot
> - Change the steal_thresh value
> - Make the test 4 times, discard the first and ministat for the other 3
>
> Then report discarded values and the ministated one and we will have more informations I guess (also, I don't think devfs contention should play a role here, thus nevermind about it for now).

Results and data at

http://www.tancsa.com/ule-bsd.html

---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
On 12/15/11 15:20, Steven Hartland wrote: > With all the discussion I thought I'd give a buildworld > benchmark a go here on a spare 24 core machine. ULE > tested fine but with 4BSD it won't even boot, panicking > with the following:- > http://screensnapr.com/v/hwysGV.png > > This is on a clean 8.2-RELEASE-p4 > > Upgrading to RELENG_9 fixed this but it's a bit concerning > that just changing the scheduler would cause the machine > to panic on boot. > > It's only a single run so variance could be high but here's > the result of a buildworld on this machine running the > two different schedulers:- > 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys > ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys > > What really sticks out is that this is over double that > of an 8.2 buildworld on the same machine with the same > kernel > ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys > > This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking > on boot under 8.2. > > So for this use ULE vs 4BSD is neither here nor there > but 9.0 buildworld is very slow (x2 slower) compared > with 8.2 so that's a bigger question in my mind. > > Regards > Steve > All of our 8.2-STABLE boxes with ncpu >= 4 compile the OS in half the time a compilation of FreeBSD 9/10 needs. I guess this is due to the huge LLVM contribution which is now part of the source tree. Even if you allow for building the whole LLVM suite (and not just the pieces of it that FreeBSD builds by default for CLANG purposes), it takes another 10 to 20 minutes, depending on the architecture of the underlying host. Timing a kernel or world build and then presenting the inverse of that number isn't a good benchmark, in my opinion. Therefore I prefer "artificial" benchmarks: if compilation time is what matters, have a fixed set of programs to compile and time that. Well, your one-shot test does suggest that there is indeed a marginal advantage for SCHED_ULE, if the number of cores is big enough (said to be n > 2 in this thread).
But I'm a bit disappointed by the very small advantage on that 24-core hog. Oliver
Re: SCHED_ULE should not be the default
On Dec 15, 2011, at 6:26 PM, Attilio Rao wrote: > 2011/12/13 Daniel Kalchev : >> >> >> On 13.12.11 09:36, Jeremy Chadwick wrote: >>> >>> I personally would find it interesting if someone with a higher-end system >>> (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test >>> (changing -jX to -j{numofcores} of course). >> >> >> Is a 4-way 8-core Opteron ok? That is 32 cores, 64GB RAM. >> >> Testing with buildworld in my opinion is not adequate, as it involves way >> too much I/O. Any advice on proper testing methodology? > > I'm sure that I/O and pmap subsystem contention (because of > buildworld) and TLB shootdown overhead (because of 32 CPUs) will be so > overwhelming that you are not really going to benchmark the scheduler > activity at all. Can't pmap / TLB be tuned for 32 CPUs and 64GB of RAM? > > However I still don't get what you want to verify exactly? The obvious: is SCHED_ULE better or worse than SCHED_4BSD on such a platform? The problem is how to test "interactivity" -- that is a blade server and doesn't really have a display and keyboard, nor does it have X etc. I have a spare pair of those that might be put to crunch tests to see how things compare for different scenarios - but I need ideas what to test, really. Daniel
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote: > 2011/12/13 Jeremy Chadwick : > > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: > >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an > >> > issue. And yes, there are cases where SCHED_ULE shows much better > >> > performance than SCHED_4BSD. [...] > >> > >> Do we have any proof at hand for such cases where SCHED_ULE performs > >> much better than SCHED_4BSD? Whenever the subject comes up, it is > >> mentioned that SCHED_ULE has better performance on boxes with ncpu > > >> 2. But in the end I see here contradictory statements. People > >> complain about poor performance (especially in scientific environments), > >> and others counter that this is not the case. > >> > >> Within our department, we developed a highly scalable code for planetary > >> science purposes on imagery. It utilizes present GPUs via OpenCL if > >> present. Otherwise it grabs as many cores as it can. > >> By the end of this year I'll get a new desktop box based on Intel's new > >> Sandy Bridge-E architecture with plenty of memory. If the colleague who > >> developed the code is willing to perform some benchmarks on the same > >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most > >> recent Suse. For FreeBSD I intend also to look at performance with both > >> different schedulers available. > > > > This is in no way shape or form the same kind of benchmark as what > > you're planning to do, but I thought I'd throw it out there for folks to > > take in as they see fit. > > > > I know folks were focused mainly on buildworld. > > > > I personally would find it interesting if someone with a higher-end > > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the > > same test (changing -jX to -j{numofcores} of course). > > > > -- > > | Jeremy Chadwick jdc at parodius.com | > > | Parodius Networking http://www.parodius.com/ | > > | UNIX Systems Administrator Mountain View, CA, US | > > | Making life hard for others since 1977. PGP 4BD6C0CB | > > > > sched_ule > > === > > - time make -j2 buildworld > > 1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w > > - time make -j2 buildkernel > > 640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w > > > > sched_4bsd > > > > - time make -j2 buildworld > > 1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w > > - time make -j2 buildkernel > > 638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w > > > > software > > == > > * sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011 > > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011 > > Hi Jeremy, > thanks for the time you spent on this. > > However, I wanted to ask/let you note 3 things: > 1) Did you use 2 different code bases for the test? (one updated on > December 1 and another one on December 12) No; src-all (/usr/src on this system) was not updated between December 1st and December 12th PST. I do believe I updated it today (15th PST). I can/will obviously hold off so that we have a consistent code base for comparing numbers between schedulers during buildworld and/or buildkernel. > 2) Please note that you should have repeated this test several times > (basically until you get a standard deviation which is > acceptable with ministat) and report the ministat output This is the first time I have heard of ministat(1). I'm pretty sure I see what it's for and how it applies to this situation, but boy that man page could use some clarification (I have 3 people looking at this thing right now trying to figure out what means what in the graph :-) ).
Regarding multiple tests: yup, you're absolutely right, the only way to do it would be to run a sequence of tests repeatedly (probably 10 per scheduler). Reboots and rm -fr /usr/obj/* would be required after each test too, to guarantee empty kernel caches (of all types) consistently every time. What I posted was supposed to give people just a "general idea" if there was any gigantic difference between the two, and there really isn't. But, as others have stated (and you below), buildworld may not be an effective way to "benchmark" what we're trying to test. Hence me wondering exactly what would make for a good test. Example: 1. Run + background some program that "beats on things" (I really don't know what; creation/deletion of threads? CPU benchmark? bonnie++?), with output going to /dev/null. 2. Run + background "time make -j2 buildworld" with output going to /dev/null 3. Record/save output from "time". 4. rm -fr /usr/obj && shutdown -r now 5. Repeat all steps ~10 times 6. Adjust kernel configuration file to use other scheduler 7. Repeat steps 1-5.
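Jeremy's proposed steps 1-4 for a single iteration might look like the sketch below. The tight-loop "beater" is only a placeholder for whatever load generator gets picked in step 1, the log file name is made up, and the reboot in step 4 is left to the operator:

```sh
#!/bin/sh
# One iteration of the proposed test (sketch, FreeBSD):
# background CPU load + timed buildworld, results appended to a log.

ncpu=$(sysctl -n hw.ncpu)
pids=""
for n in $(jot "$ncpu"); do            # step 1: one CPU hog per core
    while :; do :; done &
    pids="$pids $!"
done

# steps 2-3: time the build; only the time(1) summary goes to the log
/usr/bin/time -a -o times.log make -C /usr/src -j2 buildworld > /dev/null 2>&1

kill $pids                             # stop the background load

rm -rf /usr/obj/*                      # step 4: clean, then reboot by hand
```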
Re: SCHED_ULE should not be the default
On 15/12/2011 14:20, Steven Hartland wrote: So for this use ULE vs 4BSD is neither here nor there but 9.0 buildworld is very slow (x2 slower) compared with 8.2 so that's a bigger question in my mind. clang is new in 9.0 and takes a long time to build. -- Bruce Cran
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa : > On 12/15/2011 11:42 AM, Attilio Rao wrote: >> >> I'm now thinking of a better test case for this: can you try that on a >> tmpfs volume? > > There is enough RAM in the box so that it should not touch the disk, and > I was sending the output to /dev/null, so it was not writing to the disk. > >> >> Also, what filesystem were you using? > > UFS > >> How many CPUs were in place? > > 4 > >> Did you reboot before changing the steal_thresh value? > > No. So, as a very first thing, can you try the following: - Same codebase, etc. etc. - Make the test 4 times, discard the first and ministat for the other 3 - Reboot - Change the steal_thresh value - Make the test 4 times, discard the first and ministat for the other 3 Then report the discarded values and the ministated ones and we will have more information, I guess (also, I don't think devfs contention should play a role here, so never mind about it for now). Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
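The procedure Attilio describes maps onto a small script. This is only a sketch: the benchmark command (buildworld here) and file names are placeholders, and ministat(1) is assumed from the FreeBSD base system:

```sh
#!/bin/sh
# Sketch of the test procedure above. Run once per setting, e.g.:
#   ./bench.sh before.txt        (stock kern.sched.steal_thresh)
#   ...reboot, sysctl kern.sched.steal_thresh=<new value>...
#   ./bench.sh after.txt
# then compare the two sets with:  ministat before.txt after.txt
log=$1
for i in 1 2 3 4; do
    rm -rf /usr/obj/*
    /usr/bin/time -a -o "$log.raw" \
        make -C /usr/src -j4 buildworld > /dev/null 2>&1
done
# keep only the wall-clock column and discard run 1 (warm-up)
awk '{print $1}' "$log.raw" | tail -n 3 > "$log"
```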
Re: SCHED_ULE should not be the default
On 12/15/2011 11:42 AM, Attilio Rao wrote: > > I'm now thinking of a better test case for this: can you try that on a > tmpfs volume? There is enough RAM in the box so that it should not touch the disk, and I was sending the output to /dev/null, so it was not writing to the disk. > > Also, what filesystem were you using? UFS > How many CPUs were in place? 4 > Did you reboot before changing the steal_thresh value? No. ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa : > On 12/15/2011 11:26 AM, Attilio Rao wrote: >> >> Hi Mike, >> was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? > > Hi Attilio, > It was the same codebase. > > >> Could you retry the bench checking CPU usage and possible thread >> migration around for both cases? > > I can, but how do I do that ? I'm now thinking of a better test case for this: can you try that on a tmpfs volume? Also, what filesystem were you using? How many CPUs were in place? Did you reboot before changing the steal_thresh value? Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:26 AM, Attilio Rao wrote: > > Hi Mike, > was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? Hi Attilio, It was the same codebase. > Could you retry the bench checking CPU usage and possible thread > migration around for both cases? I can, but how do I do that ? ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/13 Jeremy Chadwick : > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an >> > issue. And yes, there are cases where SCHED_ULE shows much better >> > performance than SCHED_4BSD. [...] >> >> Do we have any proof at hand for such cases where SCHED_ULE performs >> much better than SCHED_4BSD? Whenever the subject comes up, it is >> mentioned that SCHED_ULE has better performance on boxes with ncpu > >> 2. But in the end I see here contradictory statements. People >> complain about poor performance (especially in scientific environments), >> and others counter that this is not the case. >> >> Within our department, we developed a highly scalable code for planetary >> science purposes on imagery. It utilizes present GPUs via OpenCL if >> present. Otherwise it grabs as many cores as it can. >> By the end of this year I'll get a new desktop box based on Intel's new >> Sandy Bridge-E architecture with plenty of memory. If the colleague who >> developed the code is willing to perform some benchmarks on the same >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most >> recent Suse. For FreeBSD I intend also to look at performance with both >> different schedulers available. > > This is in no way shape or form the same kind of benchmark as what > you're planning to do, but I thought I'd throw it out there for folks to > take in as they see fit. > > I know folks were focused mainly on buildworld. > > I personally would find it interesting if someone with a higher-end > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the > same test (changing -jX to -j{numofcores} of course). > > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, US | > | Making life hard for others since 1977. PGP 4BD6C0CB | > > sched_ule > === > - time make -j2 buildworld > 1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w > - time make -j2 buildkernel > 640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w > > sched_4bsd > > - time make -j2 buildworld > 1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w > - time make -j2 buildkernel > 638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w > > software > == > * sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011 > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011 Hi Jeremy, thanks for the time you spent on this. However, I wanted to ask/let you note 3 things: 1) Did you use 2 different code bases for the test? (one updated on December 1 and another one on December 12) 2) Please note that you should have repeated this test several times (basically until you get a standard deviation which is acceptable with ministat) and report the ministat output 3) The difference is less than 2%, which I suspect is statistically meaningless/effectively the same. I'm not really even surprised ULE is not faster than 4BSD in this case because usually buildworld/buildkernel tests are driven for the vast majority by I/O overhead rather than scheduler capacity. It would be more interesting to analyze how buildworld does while another type of workload is going on. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
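For reference, the "less than 2%" figure can be checked against the user-time columns Jeremy posted (1689.831u for sched_ule vs 1662.793u for sched_4bsd):

```sh
# Percent difference in buildworld user time between the two schedulers,
# computed from the numbers quoted above. Pure awk, nothing FreeBSD-specific.
awk 'BEGIN {
    ule = 1689.831   # sched_ule buildworld user seconds
    bsd = 1662.793   # sched_4bsd buildworld user seconds
    printf "%.1f%%\n", (ule - bsd) / bsd * 100   # prints 1.6%
}'
```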
Re: SCHED_ULE should not be the default
2011/12/13 Daniel Kalchev : > > > On 13.12.11 09:36, Jeremy Chadwick wrote: >> >> I personally would find it interesting if someone with a higher-end system >> (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test >> (changing -jX to -j{numofcores} of course). > > > Is a 4-way 8-core Opteron ok? That is 32 cores, 64GB RAM. > > Testing with buildworld in my opinion is not adequate, as it involves way > too much I/O. Any advice on proper testing methodology? I'm sure that I/O and pmap subsystem contention (because of buildworld) and TLB shootdown overhead (because of 32 CPUs) will be so overwhelming that you are not really going to benchmark the scheduler activity at all. However, I still don't get what you want to verify, exactly. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
2011/12/14 Mike Tancsa : > On 12/13/2011 7:01 PM, m...@freebsd.org wrote: >> >> Has anyone experiencing problems tried to set sysctl >> kern.sched.steal_thresh=1 ? >> >> I don't remember what our specific problem at $WORK was, perhaps it >> was just interrupt threads not getting serviced fast enough, but we've >> hard-coded this to 1 and removed the code that sets it in >> sched_initticks(). The same effect should be had by setting the >> sysctl after a box is up. > > FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file > > pbzip2 -v -c big > /dev/null > > with burnP6 running in the background, > > sysctl kern.sched.steal_thresh=1 > vs > sysctl kern.sched.steal_thresh=3 > >
>     N           Min           Max        Median           Avg        Stddev
> x  10     38.005022      38.42238     38.194648     38.165052    0.15546188
> +   9     38.695417     40.595544     39.392127     39.435384    0.59814114
> Difference at 95.0% confidence
>         1.27033 +/- 0.412636
>         3.32852% +/- 1.08119%
> (Student's t, pooled s = 0.425627)
> > a value of 1 is *slightly* faster. Hi Mike, was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? Also, the results here should be in the 3% interval for the avg case, which is not yet at the 'alarm level' but could still be an indication. I still suspect I/O plays a big role here, however, thus it could be determined by other factors. Could you retry the bench checking CPU usage and possible thread migration around for both cases? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
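Mike's comparison can be reproduced with a loop along these lines. This is a sketch, not his exact script: file names and run counts are placeholders, and a CPU hog (burnP6 or similar) is assumed to be running already:

```sh
#!/bin/sh
# Time pbzip2 over a large file at two kern.sched.steal_thresh settings,
# then compare the wall-clock samples with ministat(1).
for thresh in 1 3; do
    sysctl kern.sched.steal_thresh=$thresh
    for i in 1 2 3 4 5 6 7 8 9 10; do
        /usr/bin/time -a -o "raw$thresh.txt" pbzip2 -v -c big > /dev/null
    done
    awk '{print $1}' "raw$thresh.txt" > "thresh$thresh.txt"  # real seconds
done
ministat thresh1.txt thresh3.txt
```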
Re: SCHED_ULE should not be the default
2011/12/9 George Mitchell : > dnetc is an open-source program from http://www.distributed.net/. It > tries a brute-force approach to cracking RC4 puzzles and also computes > optimal Golomb rulers. It starts up one process per CPU and runs at > nice 20 and is, for all intents and purposes, 100% compute bound. [Posting on the first message of the thread] I basically went through all the e-mails sent so far and identified 4 real reports we could work on, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the data there and add more, if they want, or make further important clarifications, in particular about the presence (or not) of Xorg in their workload. I've read a couple of messages in the thread pointing the finger at Xorg as being excessively CPU-intensive and I think they are right; we might try to find a solution for that at some point, but it is really a very edge case. George's and Steve's cases, instead, look very different from this and I want to analyze them in detail. George already provided schedgraph traces; for the others, if they cannot provide them directly, I'd really appreciate it if they would at least describe the workload in detail so that I get a chance to reproduce it. If someone else thinks he has a specific problem that is not characterized by one of the cases above, please let me know and I will put it in the chart. Thanks for the hard work you guys put in pointing out ULE's problems, I think we will get to the bottom of this if we keep sharing thoughts and reports. Attilio -- Peace can only be achieved by understanding - A. Einstein
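For anyone wanting to supply the schedgraph traces Attilio asks for, the usual recipe is roughly the following. The kernel option values here are from memory and should be checked against the SchedGraph notes in src/tools/sched/ before use:

```sh
# Kernel config additions needed for scheduler tracing (sketch):
#   options KTR
#   options KTR_ENTRIES=262144
#   options KTR_COMPILE=(KTR_SCHED)
#   options KTR_MASK=(KTR_SCHED)
# After booting the traced kernel and reproducing the problem:
sysctl debug.ktr.mask=0        # freeze the trace buffer
ktrdump -ct > ktr.out          # dump entries with timestamps
# Feed the dump to the viewer shipped in the source tree:
python /usr/src/tools/sched/schedgraph.py ktr.out
```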
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 10:32 AM, Steven Hartland wrote: > Lars Engels wrote: >> >> 9.0 ships with gcc and clang which both need to be compiled, 8.2 only >> has gcc. > > > Ahh, any reason we need both, and is it possible to disable clang? man src.conf add WITHOUT_CLANG=yes to /etc/src.conf -- Eitan Adler
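Following Eitan's pointer, the change is a one-liner (src.conf(5) documents the knob):

```sh
# Skip building clang during buildworld; see man 5 src.conf.
echo 'WITHOUT_CLANG=yes' >> /etc/src.conf
```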
Re: SCHED_ULE should not be the default
Lars Engels wrote: 9.0 ships with gcc and clang which both need to be compiled, 8.2 only has gcc. Ahh, any reason we need both, and is it possible to disable clang? Regards Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 02:20:04PM -, Steven Hartland wrote: > With all the discussion I thought I'd give a buildworld > benchmark a go here on a spare 24 core machine. ULE > tested fine but with 4BSD it won't even boot, panicking > with the following:- > http://screensnapr.com/v/hwysGV.png > > This is on a clean 8.2-RELEASE-p4 > > Upgrading to RELENG_9 fixed this but it's a bit concerning > that just changing the scheduler would cause the machine > to panic on boot. > > It's only a single run so variance could be high but here's > the result of a buildworld on this machine running the > two different schedulers:- > 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys > ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys > > What really sticks out is that this is over double that > of an 8.2 buildworld on the same machine with the same > kernel > ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys 9.0 ships with gcc and clang which both need to be compiled, 8.2 only has gcc. > > This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking > on boot under 8.2. > > So for this use ULE vs 4BSD is neither here nor there > but 9.0 buildworld is very slow (x2 slower) compared > with 8.2 so that's a bigger question in my mind. > > Regards > Steve
Re: SCHED_ULE should not be the default
With all the discussion I thought I'd give a buildworld benchmark a go here on a spare 24 core machine. ULE tested fine but with 4BSD it won't even boot, panicking with the following:- http://screensnapr.com/v/hwysGV.png This is on a clean 8.2-RELEASE-p4 Upgrading to RELENG_9 fixed this but it's a bit concerning that just changing the scheduler would cause the machine to panic on boot. It's only a single run so variance could be high but here's the result of a buildworld on this machine running the two different schedulers:- 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys What really sticks out is that this is over double that of an 8.2 buildworld on the same machine with the same kernel ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking on boot under 8.2. So for this use ULE vs 4BSD is neither here nor there but 9.0 buildworld is very slow (x2 slower) compared with 8.2 so that's a bigger question in my mind. Regards Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick wrote: > On Thu, Dec 15, 2011 at 12:39:50AM +0100, O. Hartmann wrote: >> On 12/14/11 18:54, Tom Evans wrote: >> > I believe the correct thing to do is to put some extra documentation >> > into the handbook about scheduler choice, noting the potential issues >> > with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> > switch scheduler would also help? > > Replying to Tom's comment here: > > It is already easy to switch schedulers. You change the option in your > kernel config, rebuild kernel (world isn't necessary as long as you > haven't csup'd between your last rebuild and now), make installkernel, > shutdown -r now, done.

Your definition of 'easy' differs wildly from mine. How is that in any way 'easy' to do across 200 servers?

> > If what you're proposing is to make the scheduler changeable in > real-time? I think that would require a **lot** of work for something > that very few people would benefit from (please stop for a moment and > think about the majority of the userbase, not just niche environments; I > say this politely, not with any condescension BTW). Sure, it'd be > "nice to have", but should be extremely low on the priority list (IMO).

Real-time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot-time configuration. You don't have to recompile the kernel to change timecounter.

Cheers
Tom
Re: SCHED_ULE should not be the default
On 15/12/2011 00:42, Jeremy Chadwick wrote: > It is already easy to switch schedulers. You change the option in your > kernel config, rebuild kernel (world isn't necessary as long as you > haven't csup'd between your last rebuild and now), make installkernel, > shutdown -r now, done. > > If what you're proposing is to make the scheduler changeable in > real-time? I think that would require a **lot** of work for something > that very few people would benefit from (please stop for a moment and > think about the majority of the userbase, not just niche environments; I > say this politely, not with any condescension BTW). Sure, it'd be > "nice to have", but should be extremely low on the priority list (IMO).

Somewhere in between might be a good idea, it seems to me: viz., change a setting in loader.conf and reboot to switch to a new scheduler. Having to juggle different kernels is no big deal for the likes of you and me, but it is quite a barrier in many environments.

Cheers,
Matthew

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard Flat 3 PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate JID: matt...@infracaninophile.co.uk Kent, CT11 9PW
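[Editorial sketch of the proposal above: nothing like this exists today. The tunable name below is invented purely for illustration, and it assumes a kernel built with both schedulers, which the current source does not support.]

```
# /boot/loader.conf -- HYPOTHETICAL sketch of the proposed knob.
# No such tunable exists; it assumes both SCHED_ULE and SCHED_4BSD
# were compiled into one kernel, which is not currently possible.
kern.sched.name="4BSD"   # or "ULE"; would take effect after reboot
```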
Re: SCHED_ULE should not be the default
Jeremy Chadwick wrote: > It is already easy to switch schedulers. You change the > option in your kernel config, rebuild kernel (world isn't > necessary as long as you haven't csup'd between your last > rebuild and now), make installkernel, shutdown -r now, > done.

and you have thereby shot freebsd-update in the foot, because you are no longer using a generic kernel.

> If what you're proposing is to make the scheduler changeable > in real-time? I think that would require a **lot** of work > for something that very few people would benefit from ...

Switching on the fly sounds frightfully difficult, as long as 4BSD and ULE are separate code bases. (It might not be so bad if a tunable or 3 could be added to ULE, so that it could be configured to behave like 4BSD.) However, the freebsd-update complication could in principle be relieved by building both schedulers into the generic kernel, with the choice being configurable in loader.conf. It would still take a reboot to switch, but not a kernel rebuild. Of course there may be practical issues, e.g. name collisions.
Re: SCHED_ULE should not be the default
On Thu, 15 Dec 2011 03:05:12 +0100 Oliver Pinter wrote: > On 12/15/11, O. Hartmann wrote: > > On 12/14/11 18:54, Tom Evans wrote: > >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > >> wrote: > >>> > >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether > >>> to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > >>> > >> > >> Please do not do this. This thread has shown that ULE performs > >> poorly in very specific scenarios where the server is loaded with > >> NCPU+1 CPU bound processes, and brought forward more complaints > >> about interactivity in X (I've never noticed this, and use a > >> FreeBSD desktop daily). > > > > I would highly appreciate a decission against SCHED_ULE as the > > default scheduler! SCHED_4BSD is considered a more mature entity > > and obviously it seems that SCHED_ULE needs some refinements to > > achieve a better level of quality. > > > >> > >> On the other hand, we have very many benchmarks showing how poorly > >> 4BSD scales on things like postgresql. We get much more load out of > >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > >> easy to look at what you do and say "well, what suits my > >> environment is clearly the best default", but I think there are > >> probably more users typically running IO bound processes than CPU > >> bound processes. > > > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? > > Shouldn't you compare SCHED_ULE and SCHED_4BSD on the very same > > platform? > > > > Development of SCHED_ULE has been focused very much on DB like > > PostgreSQL, no wonder the performance benefit. But this is also a > > very specific scneario where SCHED_ULE shows a real benefit > > compared to SCHED_4BSD. > > > >> > >> I believe the correct thing to do is to put some extra > >> documentation into the handbook about scheduler choice, noting the > >> potential issues with loading NCPU+1 CPU bound processes.
Perhaps > >> making it easier to switch scheduler would also help? > > > > Many people more experst in the issue than myself revealed some > > issues in the code of both SCHED_ULE and even SCHED_4BSD. It would > > be a pitty if all the discussions get flushed away like a > > "toilette-busisness" as it has been done all the way in the past. > > > > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > > performing benchmarks is questionable and do not reveal much of the > > issues, but it is better than nothing. I'm always surprised by the > > worse performance of FreeBSD when it comes to threaded I/O. The > > differences between Linux and FreeBSD of the same development > > maturity are tremendous and scaring! > > > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven > > HPC box. Most benchmark around for testing hardware are performed > > with Linux and Linux seems to make the race in nearly every > > scenario. It would be highly appreciable and interesting to see how > > Linux and FreeBSD would perform in SPEC on the same hardware > > platform. This is only an idea. Without a suitable benchmark with a > > codebase understood the discussion is in many aspects pointless > > -both ways. > > > > > >> > >> Cheers > >> > >> Tom > >> > >> References: > >> > >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade > >> ___ > > > > > > Hi! > > Can you try with this settings: > > op@opn ~> sysctl kern.sched. 
> kern.sched.cpusetsize: 8 > kern.sched.preemption: 0 > kern.sched.name: ULE > kern.sched.slice: 13 > kern.sched.interact: 30 > kern.sched.preempt_thresh: 224 > kern.sched.static_boost: 152 > kern.sched.idlespins: 1 > kern.sched.idlespinthresh: 16 > kern.sched.affinity: 1 > kern.sched.balance: 1 > kern.sched.balance_interval: 133 > kern.sched.steal_htt: 1 > kern.sched.steal_idle: 1 > kern.sched.steal_thresh: 1 > kern.sched.topology_spec: > > 0, 1 > > > 0, 1 > > > > > > Most of them from 7-STABLE settings, and with this, "works for me". > This an laptop with core2 duo cpu (with enabled powerd), and my kernel > config is here: > http://oliverp.teteny.bme.hu/freebsd/kernel_conf

And try doing what is shown here: http://www.youtube.com/watch?v=1CLCp-dqWu0 so that the mouse cursor and Xorg do NOT freeze for a split second or more... Then I'll see how good your ULE really is ;)
Re: SCHED_ULE should not be the default
On 15.12.11 01:39, O. Hartmann wrote: On 12/14/11 18:54, Tom Evans wrote: On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell wrote: Dear Secret Masters of FreeBSD: Can we have a decision on whether to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? Please do not do this. This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, and brought forward more complaints about interactivity in X (I've never noticed this, and use a FreeBSD desktop daily). I would highly appreciate a decision against SCHED_ULE as the default scheduler! SCHED_4BSD is considered a more mature entity and obviously it seems that SCHED_ULE needs some refinements to achieve a better level of quality.

My logic would be: if SCHED_ULE works better on multi-CPU systems, or if SCHED_4BSD works poorly on multi-CPU systems, then by all means keep SCHED_ULE as the default scheduler. We are at the end of 2011 and the number of single- or dual-core CPU systems is decreasing. Most people would just try the newest FreeBSD version on their newest hardware and on that basis make an "informed" decision whether it is worth it. If SCHED_ULE gives better performance on newer hardware, then again it should be the default.

Then, FreeBSD is used in an extremely wide set of different environments. A scheduler that might benefit a single-CPU, simple-architecture X workstation may be damaging to the performance of a multi-CPU, NUMA-based server with a large number of non-interactive processes running. Perhaps a knob should be provided, with sufficient documentation, for those who will not go so far as to recompile the kernel (the majority of users, I would guess).

I tried switching my RELENG8 desktop from SCHED_ULE to SCHED_4BSD yesterday and cannot see any measurable difference in responsiveness. My 'stress test' is typically a Flash game that gets Firefox into an almost unresponsive state and eats one of the CPU cores -- but no difference.
Well, Flash has its own set of problems on FreeBSD, but these are typical "desktop" uses. Running 100% compute-intensive processes in the background is not.

Daniel

PS: As to why Linux is "better" in these usages: they do not care much about doing things "right", but rather about achieving performance. In my opinion, most of us are with FreeBSD for the "do it right" attitude.
Re: SCHED_ULE should not be the default
On 12/15/11, Jeremy Chadwick wrote: > On Thu, Dec 15, 2011 at 03:05:12AM +0100, Oliver Pinter wrote: >> On 12/15/11, O. Hartmann wrote: >> > On 12/14/11 18:54, Tom Evans wrote: >> >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell >> >> wrote: >> >>> >> >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >> >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >> >>> >> >> >> >> Please do not do this. This thread has shown that ULE performs poorly >> >> in very specific scenarios where the server is loaded with NCPU+1 CPU >> >> bound processes, and brought forward more complaints about >> >> interactivity in X (I've never noticed this, and use a FreeBSD desktop >> >> daily). >> > >> > I would highly appreciate a decission against SCHED_ULE as the default >> > scheduler! SCHED_4BSD is considered a more mature entity and obviously >> > it seems that SCHED_ULE needs some refinements to achieve a better level >> > of quality. >> > >> >> >> >> On the other hand, we have very many benchmarks showing how poorly >> >> 4BSD scales on things like postgresql. We get much more load out of >> >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's >> >> easy to look at what you do and say "well, what suits my environment >> >> is clearly the best default", but I think there are probably more >> >> users typically running IO bound processes than CPU bound processes. >> > >> > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't >> > you compare SCHED_ULE and SCHED_4BSD on the very same platform? >> > >> > Development of SCHED_ULE has been focused very much on DB like >> > PostgreSQL, no wonder the performance benefit. But this is also a very >> > specific scneario where SCHED_ULE shows a real benefit compared to >> > SCHED_4BSD. 
>> > >> >> >> >> I believe the correct thing to do is to put some extra documentation >> >> into the handbook about scheduler choice, noting the potential issues >> >> with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> >> switch scheduler would also help? >> > >> > Many people more experst in the issue than myself revealed some issues >> > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty >> > if all the discussions get flushed away like a "toilette-busisness" as >> > it has been done all the way in the past. >> > >> > >> > Well, I'd like to see a kind of "standardized" benchmark. Like on >> > openbenchmark.org or at phoronix.com. I know that Phoronix' way of >> > performing benchmarks is questionable and do not reveal much of the >> > issues, but it is better than nothing. I'm always surprised by the worse >> > performance of FreeBSD when it comes to threaded I/O. The differences >> > between Linux and FreeBSD of the same development maturity are >> > tremendous and scaring! >> > >> > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC >> > box. Most benchmark around for testing hardware are performed with Linux >> > and Linux seems to make the race in nearly every scenario. It would be >> > highly appreciable and interesting to see how Linux and FreeBSD would >> > perform in SPEC on the same hardware platform. This is only an idea. >> > Without a suitable benchmark with a codebase understood the discussion >> > is in many aspects pointless -both ways. >> > >> > >> >> >> >> Cheers >> >> >> >> Tom >> >> >> >> References: >> >> >> >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png >> >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade >> >> ___ >> >> Hi! >> >> Can you try with this settings: >> op@opn ~> sysctl kern.sched. > > I'm replying with a list of each setting which differs compared to > RELENG_8 stock on our ULE systems. 
Note that our ULE systems are 1 > physical CPU with 4 cores. On the other system, which has 4 cores, I use 7-STABLE, because I haven't had enough time to upgrade it, and the system has some custom patches. The values I sent in the previous mail are mostly based on this 4-core system. > >> kern.sched.cpusetsize: 8 > > I see no such tunable/sysctl on any of our RELENG_8 and RELENG_7 > systems. Nor do I find any references to it in /usr/src (on any > system). Is this a RELENG_9 setting? Please explain where it comes > from. I hope it's not a custom kernel patch... Yes, this is 9-STABLE. > >> kern.sched.preemption: 0 > > This differs; default value is 1. PREEMPTION is disabled via kernel config. > >> kern.sched.name: ULE >> kern.sched.slice: 13 >> kern.sched.interact: 30 > >> kern.sched.preempt_thresh: 224 > > This differs; default value is 64. The "magic value" of 224 has been > discussed in the past, in this thread even. This magic value was discussed here a year or a year and a half ago, first for 8-STABLE. > >> kern.sched.static_boost: 152 > > This differs; on our systems it's 160. > >> kern.sched.idlespins: 1 > >> kern.sched.idlespinthresh: 16 > > This differs; on our sys
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 03:05:12AM +0100, Oliver Pinter wrote: > On 12/15/11, O. Hartmann wrote: > > On 12/14/11 18:54, Tom Evans wrote: > >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > >> wrote: > >>> > >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to > >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > >>> > >> > >> Please do not do this. This thread has shown that ULE performs poorly > >> in very specific scenarios where the server is loaded with NCPU+1 CPU > >> bound processes, and brought forward more complaints about > >> interactivity in X (I've never noticed this, and use a FreeBSD desktop > >> daily). > > > > I would highly appreciate a decission against SCHED_ULE as the default > > scheduler! SCHED_4BSD is considered a more mature entity and obviously > > it seems that SCHED_ULE needs some refinements to achieve a better level > > of quality. > > > >> > >> On the other hand, we have very many benchmarks showing how poorly > >> 4BSD scales on things like postgresql. We get much more load out of > >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > >> easy to look at what you do and say "well, what suits my environment > >> is clearly the best default", but I think there are probably more > >> users typically running IO bound processes than CPU bound processes. > > > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > > you compare SCHED_ULE and SCHED_4BSD on the very same platform? > > > > Development of SCHED_ULE has been focused very much on DB like > > PostgreSQL, no wonder the performance benefit. But this is also a very > > specific scneario where SCHED_ULE shows a real benefit compared to > > SCHED_4BSD. > > > >> > >> I believe the correct thing to do is to put some extra documentation > >> into the handbook about scheduler choice, noting the potential issues > >> with loading NCPU+1 CPU bound processes. 
Perhaps making it easier to > >> switch scheduler would also help? > > > > Many people more experst in the issue than myself revealed some issues > > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty > > if all the discussions get flushed away like a "toilette-busisness" as > > it has been done all the way in the past. > > > > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > > performing benchmarks is questionable and do not reveal much of the > > issues, but it is better than nothing. I'm always surprised by the worse > > performance of FreeBSD when it comes to threaded I/O. The differences > > between Linux and FreeBSD of the same development maturity are > > tremendous and scaring! > > > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC > > box. Most benchmark around for testing hardware are performed with Linux > > and Linux seems to make the race in nearly every scenario. It would be > > highly appreciable and interesting to see how Linux and FreeBSD would > > perform in SPEC on the same hardware platform. This is only an idea. > > Without a suitable benchmark with a codebase understood the discussion > > is in many aspects pointless -both ways. > > > > > >> > >> Cheers > >> > >> Tom > >> > >> References: > >> > >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade > >> ___ > > Hi! > > Can you try with this settings: > op@opn ~> sysctl kern.sched. I'm replying with a list of each setting which differs compared to RELENG_8 stock on our ULE systems. Note that our ULE systems are 1 physical CPU with 4 cores. > kern.sched.cpusetsize: 8 I see no such tunable/sysctl on any of our RELENG_8 and RELENG_7 systems. Nor do I find any references to it in /usr/src (on any system). Is this a RELENG_9 setting? Please explain where it comes from. 
I hope it's not a custom kernel patch... > kern.sched.preemption: 0 This differs; default value is 1. > kern.sched.name: ULE > kern.sched.slice: 13 > kern.sched.interact: 30 > kern.sched.preempt_thresh: 224 This differs; default value is 64. The "magic value" of 224 has been discussed in the past, in this thread even. > kern.sched.static_boost: 152 This differs; on our systems it's 160. > kern.sched.idlespins: 1 > kern.sched.idlespinthresh: 16 This differs; on our systems it's 4. > Most of them from 7-STABLE settings, and with this, "works for me". > This an laptop with core2 duo cpu (with enabled powerd), and my kernel > config is here: > http://oliverp.teteny.bme.hu/freebsd/kernel_conf -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: SCHED_ULE should not be the default
On 12/15/11, O. Hartmann wrote: > On 12/14/11 18:54, Tom Evans wrote: >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell >> wrote: >>> >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >>> >> >> Please do not do this. This thread has shown that ULE performs poorly >> in very specific scenarios where the server is loaded with NCPU+1 CPU >> bound processes, and brought forward more complaints about >> interactivity in X (I've never noticed this, and use a FreeBSD desktop >> daily). > > I would highly appreciate a decission against SCHED_ULE as the default > scheduler! SCHED_4BSD is considered a more mature entity and obviously > it seems that SCHED_ULE needs some refinements to achieve a better level > of quality. > >> >> On the other hand, we have very many benchmarks showing how poorly >> 4BSD scales on things like postgresql. We get much more load out of >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's >> easy to look at what you do and say "well, what suits my environment >> is clearly the best default", but I think there are probably more >> users typically running IO bound processes than CPU bound processes. > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > you compare SCHED_ULE and SCHED_4BSD on the very same platform? > > Development of SCHED_ULE has been focused very much on DB like > PostgreSQL, no wonder the performance benefit. But this is also a very > specific scneario where SCHED_ULE shows a real benefit compared to > SCHED_4BSD. > >> >> I believe the correct thing to do is to put some extra documentation >> into the handbook about scheduler choice, noting the potential issues >> with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> switch scheduler would also help? > > Many people more experst in the issue than myself revealed some issues > in the code of both SCHED_ULE and even SCHED_4BSD. 
It would be a pitty > if all the discussions get flushed away like a "toilette-busisness" as > it has been done all the way in the past. > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > performing benchmarks is questionable and do not reveal much of the > issues, but it is better than nothing. I'm always surprised by the worse > performance of FreeBSD when it comes to threaded I/O. The differences > between Linux and FreeBSD of the same development maturity are > tremendous and scaring! > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC > box. Most benchmark around for testing hardware are performed with Linux > and Linux seems to make the race in nearly every scenario. It would be > highly appreciable and interesting to see how Linux and FreeBSD would > perform in SPEC on the same hardware platform. This is only an idea. > Without a suitable benchmark with a codebase understood the discussion > is in many aspects pointless -both ways. > > >> >> Cheers >> >> Tom >> >> References: >> >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade >> ___ > > Hi! Can you try with this settings: op@opn ~> sysctl kern.sched. kern.sched.cpusetsize: 8 kern.sched.preemption: 0 kern.sched.name: ULE kern.sched.slice: 13 kern.sched.interact: 30 kern.sched.preempt_thresh: 224 kern.sched.static_boost: 152 kern.sched.idlespins: 1 kern.sched.idlespinthresh: 16 kern.sched.affinity: 1 kern.sched.balance: 1 kern.sched.balance_interval: 133 kern.sched.steal_htt: 1 kern.sched.steal_idle: 1 kern.sched.steal_thresh: 1 kern.sched.topology_spec: 0, 1 0, 1 Most of them from 7-STABLE settings, and with this, "works for me". 
This is a laptop with a Core 2 Duo CPU (with powerd enabled), and my kernel config is here: http://oliverp.teteny.bme.hu/freebsd/kernel_conf
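[Editorial note: the settings Oliver lists could be applied at boot via /etc/sysctl.conf rather than typed by hand. This is a sketch, not a recommendation: whether each knob is writable at runtime varies by branch, and kern.sched.preemption merely reflects the compile-time PREEMPTION option, so it is omitted here.]

```
# /etc/sysctl.conf -- ULE values as reported above (9-STABLE, Core 2 Duo).
# Verify each knob exists and is writable on your branch first, e.g.:
#   sysctl -d kern.sched.preempt_thresh
kern.sched.slice=13
kern.sched.interact=30
kern.sched.preempt_thresh=224
kern.sched.static_boost=152
kern.sched.idlespins=1
kern.sched.idlespinthresh=16
kern.sched.affinity=1
kern.sched.balance=1
kern.sched.balance_interval=133
kern.sched.steal_htt=1
kern.sched.steal_idle=1
kern.sched.steal_thresh=1
```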
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 12:39:50AM +0100, O. Hartmann wrote: > On 12/14/11 18:54, Tom Evans wrote: > > On the other hand, we have very many benchmarks showing how poorly > > 4BSD scales on things like postgresql. We get much more load out of > > our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > > easy to look at what you do and say "well, what suits my environment > > is clearly the best default", but I think there are probably more > > users typically running IO bound processes than CPU bound processes. > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > you compare SCHED_ULE and SCHED_4BSD on the very same platform?

Agreed -- this is a bad comparison. Again, I'm going to tell people to do the one thing that's painful and nobody likes to do: *look at commits* and pay close attention to the branches and any commits that involve "tagging" for a release (so you can determine what "version" of the code you might be running).

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_4bsd.c

I'm a bit busy today, otherwise I would offer to go over the SCHED_4BSD changes between 7.0-RELEASE and 8.1-RELEASE (I would need Tom to confirm those are the exact versions being used; I wish people would stop saying things like "FreeBSD x.y" because it's inaccurate). But the data is there at the above URLs, including the committers and those involved.

> > I believe the correct thing to do is to put some extra documentation > > into the handbook about scheduler choice, noting the potential issues > > with loading NCPU+1 CPU bound processes. Perhaps making it easier to > > switch scheduler would also help?

Replying to Tom's comment here:

It is already easy to switch schedulers. You change the option in your kernel config, rebuild kernel (world isn't necessary as long as you haven't csup'd between your last rebuild and now), make installkernel, shutdown -r now, done.
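[Editorial sketch: the procedure described above boils down to one changed option in the kernel config. Assuming a stock source tree; MYKERNEL is a placeholder config name, not anything from the thread.]

```
# /usr/src/sys/<arch>/conf/MYKERNEL -- placeholder name.
# Build and activate with:
#   make buildkernel installkernel KERNCONF=MYKERNEL && shutdown -r now
include   GENERIC
ident     MYKERNEL
nooptions SCHED_ULE      # drop GENERIC's default scheduler
options   SCHED_4BSD     # select 4BSD instead
```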
If what you're proposing is to make the scheduler changeable in real-time? I think that would require a **lot** of work for something that very few people would benefit from (please stop for a moment and think about the majority of the userbase, not just niche environments; I say this politely, not with any condescension BTW). Sure, it'd be "nice to have", but should be extremely low on the priority list (IMO). > Many people more experst in the issue than myself revealed some issues > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty > if all the discussions get flushed away like a "toilette-busisness" as > it has been done all the way in the past. Gut feeling says this is what will happen, and that's because the people who are (and have in the past) touching the scheduler bits are not involved in this conversation. We're not going to get anywhere unless those people are involved and are available to make adjustments/etc. I would love to start CC'ing them all, but I don't think that's necessarily effective. I will take the time to point out/remind folks that the number of people who *truly understand* the schedulers are few and far between. We're talking single-digit numbers, folks. And those people are already busy enough as-is. This makes solving this problem difficult. So, what I think WOULD be effective would be for someone to catalogue a list of their systems/specifications/benchmarks/software/etc. that show exactly where the problems are in their workspace when using ULE vs. 4BSD, or vice-versa. That may give the developers some leads as to how to progress. Let's also not forget about the compiler ordeal; gcc versions greatly differ (some folks overwrite the default base gcc with ones in ports), and then there's the clang stuff... Sigh. > Well, I'd like to see a kind of "standardized" benchmark. Like on > openbenchmark.org or at phoronix.com. 
I know that Phoronix' way of > performing benchmarks is questionable and do not reveal much of the > issues, but it is better than nothing.

I would love to run such benchmarks on all of our systems, but I have no idea what kind of benchmark suites/etc. would be beneficial for the developers who maintain/touch the schedulers. You understand what I'm saying? For example, some folks earlier in the thread said the best thing to do for this would be buildworld, but then further follow-ups from others said buildworld is not effective given the I/O demands. Furthermore, I want whatever benchmark/app suite thing to be minimal as hell. It should be standalone, no dependencies (or only 1 or 2).

Regarding threading: a colleague of mine, an ex-co-worker who now works at Apple as a developer, wrote a C program while he was at my current workplace which -- pardon my French -- "beat the shit out of our Solaris boxes, thread-wise". It was customisable via the command line. The thing got some of our Solaris machines up to load averages of nearly 42000 (yes, you read that right!), and s
Re: SCHED_ULE should not be the default
On 12/14/11 12:54, Tom Evans wrote: [...] This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, [...]

Minor correction: the problem occurs when there are nCPU compute-bound processes, not nCPU + 1.

-- George Mitchell
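[Editorial sketch: George's correction is easy to probe directly by starting exactly nCPU pure compute loops (no I/O) and watching interactivity and per-CPU load in top(1). The duration argument and worker structure below are illustrative, not from the thread.]

```shell
#!/bin/sh
# Spawn one compute-bound worker per CPU for $1 seconds (default 5).
# This is George's nCPU case; raise the count by one by hand to
# compare against the NCPU+1 case discussed earlier in the thread.
NCPU=$(sysctl -n hw.ncpu 2>/dev/null || nproc)
DURATION=${1:-5}
echo "starting $NCPU compute-bound workers for ${DURATION}s"
i=0
while [ "$i" -lt "$NCPU" ]; do
    (
        end=$(( $(date +%s) + DURATION ))
        # Pure busy loop: no I/O, no sleeps, just CPU.
        while [ "$(date +%s)" -lt "$end" ]; do :; done
    ) &
    i=$((i + 1))
done
wait
echo "all $NCPU workers exited"
```

While it runs, watch `top -P` (per-CPU display) in another terminal; on the ULE versions complained about in this thread, the report is that some cores sit partly idle even though there is exactly one runnable worker per core.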
Re: SCHED_ULE should not be the default
On 12/14/11 18:54, Tom Evans wrote: > On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > wrote: >> >> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >> > > Please do not do this. This thread has shown that ULE performs poorly > in very specific scenarios where the server is loaded with NCPU+1 CPU > bound processes, and brought forward more complaints about > interactivity in X (I've never noticed this, and use a FreeBSD desktop > daily).

I would highly appreciate a decision against SCHED_ULE as the default scheduler! SCHED_4BSD is considered a more mature entity, and obviously it seems that SCHED_ULE needs some refinements to achieve a better level of quality.

> > On the other hand, we have very many benchmarks showing how poorly > 4BSD scales on things like postgresql. We get much more load out of > our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > easy to look at what you do and say "well, what suits my environment > is clearly the best default", but I think there are probably more > users typically running IO bound processes than CPU bound processes.

You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't you compare SCHED_ULE and SCHED_4BSD on the very same platform?

Development of SCHED_ULE has been focused very much on DBs like PostgreSQL, no wonder the performance benefit. But this is also a very specific scenario where SCHED_ULE shows a real benefit compared to SCHED_4BSD.

> > I believe the correct thing to do is to put some extra documentation > into the handbook about scheduler choice, noting the potential issues > with loading NCPU+1 CPU bound processes. Perhaps making it easier to > switch scheduler would also help?

Many people more expert in the issue than myself revealed some issues in the code of both SCHED_ULE and even SCHED_4BSD.
It would be a pity if all these discussions get flushed away like "toilet business", as has happened in the past. Well, I'd like to see a kind of "standardized" benchmark, like on openbenchmark.org or at phoronix.com. I know that Phoronix' way of performing benchmarks is questionable and does not reveal much of the issues, but it is better than nothing. I'm always surprised by the poor performance of FreeBSD when it comes to threaded I/O. The differences between Linux and FreeBSD of the same development maturity are tremendous and scary! It is a long time since I saw a SPEC benchmark on a FreeBSD-driven HPC box. Most benchmarks around for testing hardware are performed with Linux, and Linux seems to win the race in nearly every scenario. It would be highly appreciable and interesting to see how Linux and FreeBSD would perform in SPEC on the same hardware platform. This is only an idea. Without a suitable benchmark whose codebase is understood, the discussion is in many respects pointless -- both ways. > > Cheers > > Tom > > References: > > http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade
Re: SCHED_ULE should not be the default
On Wed, Dec 14, 2011 at 05:54:15PM +0000, Tom Evans wrote: > brought forward more complaints about interactivity in X (I've never > noticed this, and use a FreeBSD desktop daily). ... that was me, but I forgot to add that it almost never happens, and it can only be triggered when there are processes that want to take up 100% of the CPU running on the system along with X and friends. Don't want to spread FUD, I've been happily using FreeBSD on the desktop for a decade and ULE seems to work great. Marcus
Re: SCHED_ULE should not be the default
I'm not on the Release Engineering Team, and in fact don't have a src commit bit ... but this close to a major release, no, it's too late to change the default. mcl
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011 21:34:35 +0400, Andrey Chernov wrote: > On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: > > On 13 December 2011 01:00, Andrey Chernov wrote: > > > > >> If the algorithm ULE does not contain problems - it means the > > >> problem has Core2Duo, or in a piece of code that uses the ULE > > >> scheduler. > > > > > > I observe ULE interactivity slowness even on single core machine > > > (Pentium 4) in very visible places, like 'ps ax' output getting stuck in > > > the middle for ~1 second. When I switch back to SCHED_4BSD, all > > > slowness is gone. > > > > Are you able to provide KTR traces of the scheduler results? > > Something that can be fed to schedgraph? > > Sorry, this machine is not mine anymore. I try SCHED_ULE on Core 2 > Duo instead and don't notice this effect, but it is overall pretty > fast comparing to that Pentium 4. > Please give me detailed instructions on how to do it - I'll do it ... It would be a shame if this theme once again ends in nothing but discussion ... :(
Re: SCHED_ULE should not be the default
On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell wrote: > > Dear Secret Masters of FreeBSD: Can we have a decision on whether to > change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > Please do not do this. This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, and brought forward more complaints about interactivity in X (I've never noticed this, and use a FreeBSD desktop daily). On the other hand, we have very many benchmarks showing how poorly 4BSD scales on things like postgresql. We get much more load out of our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's easy to look at what you do and say "well, what suits my environment is clearly the best default", but I think there are probably more users typically running IO bound processes than CPU bound processes. I believe the correct thing to do is to put some extra documentation into the handbook about scheduler choice, noting the potential issues with loading NCPU+1 CPU bound processes. Perhaps making it easier to switch scheduler would also help? Cheers Tom References: http://people.freebsd.org/~kris/scaling/mysql-freebsd.png http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: > On 13 December 2011 01:00, Andrey Chernov wrote: > > >> If the algorithm ULE does not contain problems - it means the problem > >> has Core2Duo, or in a piece of code that uses the ULE scheduler. > > > > I observe ULE interactivity slowness even on single core machine (Pentium > > 4) in very visible places, like 'ps ax' output gets stuck in the middle for ~1 > > second. When I switch back to SCHED_4BSD, all slowness is gone. > > Are you able to provide KTR traces of the scheduler results? Something > that can be fed to schedgraph? Sorry, this machine is not mine anymore. I try SCHED_ULE on Core 2 Duo instead and don't notice this effect, but it is overall pretty fast comparing to that Pentium 4. -- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On 12/13/2011 7:01 PM, m...@freebsd.org wrote: > > Has anyone experiencing problems tried to set sysctl > kern.sched.steal_thresh=1 ? > > I don't remember what our specific problem at $WORK was, perhaps it > was just interrupt threads not getting serviced fast enough, but we've > hard-coded this to 1 and removed the code that sets it in > sched_initticks(). The same effect should be had by setting the > sysctl after a box is up. FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file:

pbzip2 -v -c big > /dev/null

with burnP6 running in the background, sysctl kern.sched.steal_thresh=1 (x) vs sysctl kern.sched.steal_thresh=3 (+):

    N        Min        Max        Median     Avg        Stddev
x  10  38.005022  38.42238   38.194648  38.165052  0.15546188
+   9  38.695417  40.595544  39.392127  39.435384  0.59814114
Difference at 95.0% confidence
        1.27033 +/- 0.412636
        3.32852% +/- 1.08119%
        (Student's t, pooled s = 0.425627)

A value of 1 is *slightly* faster. -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
On 12/09/11 19:57, George Mitchell wrote: On 12/09/11 10:17, Attilio Rao wrote: [...] More precisely I'd be interested in KTR traces. To be even more precise: With a completely stable GENERIC configuration (or otherwise please post your kernel config) please add the following:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

While you are in the middle of the slow-down (so once it is well established) please do:

# sysctl debug.ktr.cpumask=""

wonderland# sysctl debug.ktr.cpumask=""
debug.ktr.cpumask: sysctl: debug.ktr.cpumask: Invalid argument

In the end go with:

# ktrdump -ctf > ktr-ule-problem.out

It's 44MB, so it's at http://www.m5p.com/~george/ktr-ule-problem.out There have been 22 downloads of this file so far; does anyone who looked at it have any results to report? Dear Secret Masters of FreeBSD: Can we have a decision on whether to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? -- George Mitchell and send the file to this mailing list. Thanks, Attilio I hope this helps. -- George Mitchell
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011, Ivan Klymenko wrote: ?? Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker ??: On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: If the algorithm ULE does not contain problems - it means the problem has Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote in a mailing list that specifically in my case (Core2Duo) partially helps the following patch: --- sched_ule.c.orig2011-11-24 18:11:48.0 +0200 +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 ... @@ -2118,13 +2119,21 @@ struct td_sched *ts; THREAD_LOCK_ASSERT(td, MA_OWNED); + if (td->td_pri_class & PRI_FIFO_BIT) + return; + ts = td->td_sched; + /* +* We used up one time slice. +*/ + if (--ts->ts_slice > 0) + return; This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload. Yes, I did it for as long as possible to delay the execution of the code in section: I don't understand what you are doing here, but recently noticed that the timeslicing in SCHED_4BSD is completely broken. This bug may be a feature. SCHED_4BSD doesn't have its own timeslice counter like ts_slice above. It uses `switchticks' instead. But switchticks hasn't been usable for this purpose since long before SCHED_4BSD started using it for this purpose. switchticks is reset on every context switch, so it is useless for almost all purposes -- any interrupt activity on a non-fast interrupt clobbers it. Removing the check of ts_slice in the above and always returning might give a similar bug to the SCHED_4BSD one. I noticed this while looking for bugs in realtime scheduling. In the above, returning early for PRI_FIFO_BIT also skips most of the periodic functionality. 
In SCHED_4BSD, returning early is the usual case, so the PRI_FIFO_BIT might as well not be checked, and it is the unusual fifo scheduling case (which is supposed to only apply to realtime priority threads) which has a chance of working as intended, while the usual roundrobin case degenerates to an impure form of fifo scheduling (iit is impure since priority decay still works so it is only fifo among threads of the same priority). ... @@ -2144,9 +2153,6 @@ if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) tdq->tdq_ridx = tdq->tdq_idx; } - ts = td->td_sched; - if (td->td_pri_class & PRI_FIFO_BIT) - return; if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { /* * We used a tick; charge it to the thread so @@ -2157,11 +2163,6 @@ sched_priority(td); } /* -* We used up one time slice. -*/ - if (--ts->ts_slice > 0) - return; - /* * We're out of time, force a requeue at userret(). */ ts->ts_slice = sched_slice; With the ts_slice check here before you moved it, removing it might give buggy behaviour closer to SCHED_4BSD. and refusal to use options FULL_PREEMPTION 4-5 years ago, I found that any form of PREMPTION was a pessimization for at least makeworld (since it caused too many context switches). PREEMPTION was needed for the !SMP case, at least partly because of the broken switchticks (switchticks, when it works, gives voluntary yielding by some CPU hogs in the kernel. PREEMPTION, if it works, should do this better). So I used PREEMPTION in the !SMP case and not for the SMP case. I didn't worry about the CPU hogs in the SMP case since it is rare to have more than 1 of them and 1 will use at most 1/2 of a multi-CPU system. But no one has unsubscribed to my letter, my patch helps or not in the case of Core2Duo... There is a suspicion that the problems stem from the sections of code associated with the SMP... Maybe I'm in something wrong, but I want to help in solving this problem ... The main point of SCHED_ULE is to give better affinity for multi-CPU systems. 
But the `multi' apparently needs to be strictly more than 2 for it to break even. Bruce
Re: SCHED_ULE should not be the default
On 12/13/11 18:02, Marcus Reid wrote: [...] The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an "interactive" process. X then becomes the bottleneck for other processes that would otherwise be "interactive". Try 'renice -20 ' and see if that makes your problems go away. Marcus [...] renice on X has no effect. Stopping my compute-bound dnetc process immediately speeds everything up; restarting it slows it back down. On 12/13/11 19:01, m...@freebsd.org wrote: > [...] > Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? > [...] 1 appears to be the default value for kern.sched.steal_thresh. -- George Mitchell
Re: SCHED_ULE should not be the default
В Tue, 13 Dec 2011 16:01:56 -0800 m...@freebsd.org пишет: > On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko wrote: > > В Wed, 14 Dec 2011 00:04:42 +0100 > > Jilles Tjoelker пишет: > > > >> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > >> > If the algorithm ULE does not contain problems - it means the > >> > problem has Core2Duo, or in a piece of code that uses the ULE > >> > scheduler. I already wrote in a mailing list that specifically in > >> > my case (Core2Duo) partially helps the following patch: > >> > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 > >> > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > >> > @@ -794,7 +794,8 @@ > >> > * 1.5 * balance_interval. > >> > */ > >> > balance_ticks = max(balance_interval / 2, 1); > >> > - balance_ticks += random() % balance_interval; > >> > +// balance_ticks += random() % balance_interval; > >> > + balance_ticks += ((int)random()) % balance_interval; > >> > if (smp_started == 0 || rebalance == 0) > >> > return; > >> > tdq = TDQ_SELF(); > >> > >> This avoids a 64-bit division on 64-bit platforms but seems to > >> have no effect otherwise. Because this function is not called very > >> often, the change seems unlikely to help. > > > > Yes, this section does not apply to this problem :) > > Just I posted the latest patch which i using now... > > > >> > >> > @@ -2118,13 +2119,21 @@ > >> > struct td_sched *ts; > >> > > >> > THREAD_LOCK_ASSERT(td, MA_OWNED); > >> > + if (td->td_pri_class & PRI_FIFO_BIT) > >> > + return; > >> > + ts = td->td_sched; > >> > + /* > >> > + * We used up one time slice. > >> > + */ > >> > + if (--ts->ts_slice > 0) > >> > + return; > >> > >> This skips most of the periodic functionality (long term load > >> balancer, saving switch count (?), insert index (?), interactivity > >> score update for long running thread) if the thread is not going to > >> be rescheduled right now. > >> > >> It looks wrong but it is a data point if it helps your workload. 
> > > > Yes, I did it for as long as possible to delay the execution of the > > code in section: ... > > #ifdef SMP > > /* > > * We run the long term load balancer infrequently on the > > first cpu. */ > > if (balance_tdq == tdq) { > > if (balance_ticks && --balance_ticks == 0) > > sched_balance(); > > } > > #endif > > ... > > > >> > >> > tdq = TDQ_SELF(); > >> > #ifdef SMP > >> > /* > >> > * We run the long term load balancer infrequently on the > >> > first cpu. */ > >> > - if (balance_tdq == tdq) { > >> > - if (balance_ticks && --balance_ticks == 0) > >> > + if (balance_ticks && --balance_ticks == 0) { > >> > + if (balance_tdq == tdq) > >> > sched_balance(); > >> > } > >> > #endif > >> > >> The main effect of this appears to be to disable the long term load > >> balancer completely after some time. At some point, a CPU other > >> than the first CPU (which uses balance_tdq) will set balance_ticks > >> = 0, and sched_balance() will never be called again. > >> > > > > That is, for the same reason as above in the text... > > > >> It also introduces a hypothetical race condition because the > >> access to balance_ticks is no longer restricted to one CPU under a > >> spinlock. > >> > >> If the long term load balancer may be causing trouble, try setting > >> kern.sched.balance_interval to a higher value with unpatched code. > > > > I checked it in the first place - but it did not help fix the > > situation... > > > > The impression of malfunction rebalancing... > > It seems that the thread is passed on to the same core that is > > loaded and so... Perhaps this is a consequence of an incorrect > > definition of the topology CPU? 
> > > >> > >> > @@ -2144,9 +2153,6 @@ > >> > if > >> > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > >> > tdq->tdq_ridx = tdq->tdq_idx; } > >> > - ts = td->td_sched; > >> > - if (td->td_pri_class & PRI_FIFO_BIT) > >> > - return; > >> > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > >> > /* > >> > * We used a tick; charge it to the thread so > >> > @@ -2157,11 +2163,6 @@ > >> > sched_priority(td); > >> > } > >> > /* > >> > - * We used up one time slice. > >> > - */ > >> > - if (--ts->ts_slice > 0) > >> > - return; > >> > - /* > >> > * We're out of time, force a requeue at userret(). > >> > */ > >> > ts->ts_slice = sched_slice; > >> > >> > and refusal to use options FULL_PREEMPTION > >> > But no one has unsubscribed to my letter, my patch helps or not > >> > in the case of Core2Duo... > >> > There is a suspicion that the problems stem from the sections of > >> > code associated with the SMP... > >> > Maybe I'm in something wrong, but I want to help in solving this > >> > problem ... > > > Has an
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko wrote: > В Wed, 14 Dec 2011 00:04:42 +0100 > Jilles Tjoelker пишет: > >> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: >> > If the algorithm ULE does not contain problems - it means the >> > problem has Core2Duo, or in a piece of code that uses the ULE >> > scheduler. I already wrote in a mailing list that specifically in >> > my case (Core2Duo) partially helps the following patch: >> > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 >> > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 >> > @@ -794,7 +794,8 @@ >> > * 1.5 * balance_interval. >> > */ >> > balance_ticks = max(balance_interval / 2, 1); >> > - balance_ticks += random() % balance_interval; >> > +// balance_ticks += random() % balance_interval; >> > + balance_ticks += ((int)random()) % balance_interval; >> > if (smp_started == 0 || rebalance == 0) >> > return; >> > tdq = TDQ_SELF(); >> >> This avoids a 64-bit division on 64-bit platforms but seems to have no >> effect otherwise. Because this function is not called very often, the >> change seems unlikely to help. > > Yes, this section does not apply to this problem :) > Just I posted the latest patch which i using now... > >> >> > @@ -2118,13 +2119,21 @@ >> > struct td_sched *ts; >> > >> > THREAD_LOCK_ASSERT(td, MA_OWNED); >> > + if (td->td_pri_class & PRI_FIFO_BIT) >> > + return; >> > + ts = td->td_sched; >> > + /* >> > + * We used up one time slice. >> > + */ >> > + if (--ts->ts_slice > 0) >> > + return; >> >> This skips most of the periodic functionality (long term load >> balancer, saving switch count (?), insert index (?), interactivity >> score update for long running thread) if the thread is not going to >> be rescheduled right now. >> >> It looks wrong but it is a data point if it helps your workload. > > Yes, I did it for as long as possible to delay the execution of the code in > section: > ... > #ifdef SMP > /* > * We run the long term load balancer infrequently on the first cpu. 
> */ > if (balance_tdq == tdq) { > if (balance_ticks && --balance_ticks == 0) > sched_balance(); > } > #endif > ... > >> >> > tdq = TDQ_SELF(); >> > #ifdef SMP >> > /* >> > * We run the long term load balancer infrequently on the >> > first cpu. */ >> > - if (balance_tdq == tdq) { >> > - if (balance_ticks && --balance_ticks == 0) >> > + if (balance_ticks && --balance_ticks == 0) { >> > + if (balance_tdq == tdq) >> > sched_balance(); >> > } >> > #endif >> >> The main effect of this appears to be to disable the long term load >> balancer completely after some time. At some point, a CPU other than >> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and >> sched_balance() will never be called again. >> > > That is, for the same reason as above in the text... > >> It also introduces a hypothetical race condition because the access to >> balance_ticks is no longer restricted to one CPU under a spinlock. >> >> If the long term load balancer may be causing trouble, try setting >> kern.sched.balance_interval to a higher value with unpatched code. > > I checked it in the first place - but it did not help fix the situation... > > The impression of malfunction rebalancing... > It seems that the thread is passed on to the same core that is loaded and > so... > Perhaps this is a consequence of an incorrect definition of the topology CPU? > >> >> > @@ -2144,9 +2153,6 @@ >> > if >> > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) >> > tdq->tdq_ridx = tdq->tdq_idx; } >> > - ts = td->td_sched; >> > - if (td->td_pri_class & PRI_FIFO_BIT) >> > - return; >> > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { >> > /* >> > * We used a tick; charge it to the thread so >> > @@ -2157,11 +2163,6 @@ >> > sched_priority(td); >> > } >> > /* >> > - * We used up one time slice. >> > - */ >> > - if (--ts->ts_slice > 0) >> > - return; >> > - /* >> > * We're out of time, force a requeue at userret(). 
>> > */ >> > ts->ts_slice = sched_slice; >> >> > and refusal to use options FULL_PREEMPTION >> > But no one has unsubscribed to my letter, my patch helps or not in >> > the case of Core2Duo... >> > There is a suspicion that the problems stem from the sections of >> > code associated with the SMP... >> > Maybe I'm in something wrong, but I want to help in solving this >> > problem ... Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be h
Re: SCHED_ULE should not be the default
On Tue, 13 Dec 2011 23:02:15 +0000, Marcus Reid wrote: > On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: > > On 12/12/2011 05:47, O. Hartmann wrote: > > > Do we have any proof at hand for such cases where SCHED_ULE > > > performs much better than SCHED_4BSD? > > > > I complained about poor interactive performance of ULE in a desktop > > environment for years. I had numerous people try to help, including > > Jeff, with various tunables, dtrace'ing, etc. The cause of the > > problem was never found. > > The issues that I've seen with ULE on the desktop seem to be caused > by X taking up a steady amount of CPU, and being demoted from being an > "interactive" process. X then becomes the bottleneck for other > processes that would otherwise be "interactive". Try 'renice -20 > ' and see if that makes your problems go away. Why, then, is X not a bottleneck when using 4BSD? > Marcus
Re: SCHED_ULE should not be the default
В Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker пишет: > On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > > If the algorithm ULE does not contain problems - it means the > > problem has Core2Duo, or in a piece of code that uses the ULE > > scheduler. I already wrote in a mailing list that specifically in > > my case (Core2Duo) partially helps the following patch: > > --- sched_ule.c.orig2011-11-24 18:11:48.0 +0200 > > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > > @@ -794,7 +794,8 @@ > > * 1.5 * balance_interval. > > */ > > balance_ticks = max(balance_interval / 2, 1); > > - balance_ticks += random() % balance_interval; > > +// balance_ticks += random() % balance_interval; > > + balance_ticks += ((int)random()) % balance_interval; > > if (smp_started == 0 || rebalance == 0) > > return; > > tdq = TDQ_SELF(); > > This avoids a 64-bit division on 64-bit platforms but seems to have no > effect otherwise. Because this function is not called very often, the > change seems unlikely to help. Yes, this section does not apply to this problem :) Just I posted the latest patch which i using now... > > > @@ -2118,13 +2119,21 @@ > > struct td_sched *ts; > > > > THREAD_LOCK_ASSERT(td, MA_OWNED); > > + if (td->td_pri_class & PRI_FIFO_BIT) > > + return; > > + ts = td->td_sched; > > + /* > > +* We used up one time slice. > > +*/ > > + if (--ts->ts_slice > 0) > > + return; > > This skips most of the periodic functionality (long term load > balancer, saving switch count (?), insert index (?), interactivity > score update for long running thread) if the thread is not going to > be rescheduled right now. > > It looks wrong but it is a data point if it helps your workload. Yes, I did it for as long as possible to delay the execution of the code in section: ... #ifdef SMP /* * We run the long term load balancer infrequently on the first cpu. */ if (balance_tdq == tdq) { if (balance_ticks && --balance_ticks == 0) sched_balance(); } #endif ... 
> > > tdq = TDQ_SELF(); > > #ifdef SMP > > /* > > * We run the long term load balancer infrequently on the > > first cpu. */ > > - if (balance_tdq == tdq) { > > - if (balance_ticks && --balance_ticks == 0) > > + if (balance_ticks && --balance_ticks == 0) { > > + if (balance_tdq == tdq) > > sched_balance(); > > } > > #endif > > The main effect of this appears to be to disable the long term load > balancer completely after some time. At some point, a CPU other than > the first CPU (which uses balance_tdq) will set balance_ticks = 0, and > sched_balance() will never be called again. > That is, for the same reason as above in the text... > It also introduces a hypothetical race condition because the access to > balance_ticks is no longer restricted to one CPU under a spinlock. > > If the long term load balancer may be causing trouble, try setting > kern.sched.balance_interval to a higher value with unpatched code. I checked it in the first place - but it did not help fix the situation... The impression of malfunction rebalancing... It seems that the thread is passed on to the same core that is loaded and so... Perhaps this is a consequence of an incorrect definition of the topology CPU? > > > @@ -2144,9 +2153,6 @@ > > if > > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > > tdq->tdq_ridx = tdq->tdq_idx; } > > - ts = td->td_sched; > > - if (td->td_pri_class & PRI_FIFO_BIT) > > - return; > > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > > /* > > * We used a tick; charge it to the thread so > > @@ -2157,11 +2163,6 @@ > > sched_priority(td); > > } > > /* > > -* We used up one time slice. > > -*/ > > - if (--ts->ts_slice > 0) > > - return; > > - /* > > * We're out of time, force a requeue at userret(). > > */ > > ts->ts_slice = sched_slice; > > > and refusal to use options FULL_PREEMPTION > > But no one has unsubscribed to my letter, my patch helps or not in > > the case of Core2Duo... 
> > There is a suspicion that the problems stem from the sections of > > code associated with the SMP... > > Maybe I'm in something wrong, but I want to help in solving this > > problem ... >
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: > On 12/12/2011 05:47, O. Hartmann wrote: > > Do we have any proof at hand for such cases where SCHED_ULE performs > > much better than SCHED_4BSD? > > I complained about poor interactive performance of ULE in a desktop > environment for years. I had numerous people try to help, including > Jeff, with various tunables, dtrace'ing, etc. The cause of the problem > was never found. The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an "interactive" process. X then becomes the bottleneck for other processes that would otherwise be "interactive". Try 'renice -20 ' and see if that makes your problems go away. Marcus
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > If the algorithm ULE does not contain problems - it means the problem > has Core2Duo, or in a piece of code that uses the ULE scheduler. > I already wrote in a mailing list that specifically in my case (Core2Duo) > partially helps the following patch: > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > @@ -794,7 +794,8 @@ >* 1.5 * balance_interval. >*/ > balance_ticks = max(balance_interval / 2, 1); > - balance_ticks += random() % balance_interval; > +// balance_ticks += random() % balance_interval; > + balance_ticks += ((int)random()) % balance_interval; > if (smp_started == 0 || rebalance == 0) > return; > tdq = TDQ_SELF(); This avoids a 64-bit division on 64-bit platforms but seems to have no effect otherwise. Because this function is not called very often, the change seems unlikely to help. > @@ -2118,13 +2119,21 @@ > struct td_sched *ts; > > THREAD_LOCK_ASSERT(td, MA_OWNED); > + if (td->td_pri_class & PRI_FIFO_BIT) > + return; > + ts = td->td_sched; > + /* > + * We used up one time slice. > + */ > + if (--ts->ts_slice > 0) > + return; This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload. > tdq = TDQ_SELF(); > #ifdef SMP > /* >* We run the long term load balancer infrequently on the first cpu. >*/ > - if (balance_tdq == tdq) { > - if (balance_ticks && --balance_ticks == 0) > + if (balance_ticks && --balance_ticks == 0) { > + if (balance_tdq == tdq) > sched_balance(); > } > #endif The main effect of this appears to be to disable the long term load balancer completely after some time. 
At some point, a CPU other than the first CPU (which uses balance_tdq) will set balance_ticks = 0, and sched_balance() will never be called again. It also introduces a hypothetical race condition because the access to balance_ticks is no longer restricted to one CPU under a spinlock. If the long term load balancer may be causing trouble, try setting kern.sched.balance_interval to a higher value with unpatched code. > @@ -2144,9 +2153,6 @@ > if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > tdq->tdq_ridx = tdq->tdq_idx; > } > - ts = td->td_sched; > - if (td->td_pri_class & PRI_FIFO_BIT) > - return; > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > /* >* We used a tick; charge it to the thread so > @@ -2157,11 +2163,6 @@ > sched_priority(td); > } > /* > - * We used up one time slice. > - */ > - if (--ts->ts_slice > 0) > - return; > - /* >* We're out of time, force a requeue at userret(). >*/ > ts->ts_slice = sched_slice; > and refusal to use options FULL_PREEMPTION > But no one has unsubscribed to my letter, my patch helps or not in the > case of Core2Duo... > There is a suspicion that the problems stem from the sections of code > associated with the SMP... > Maybe I'm in something wrong, but I want to help in solving this > problem ... -- Jilles Tjoelker ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On 12/13/2011 13:31, Malin Randstrom wrote:
> stop sending me spam mail ... you never stop despite me having
> unsubscribeb several times. stop this!

If you had actually unsubscribed, the mail would have stopped. :) You
can see the instructions you need to follow below.

> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

-- 
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/
Re: SCHED_ULE should not be the default
stop sending me spam mail ... you never stop despite me having
unsubscribeb several times. stop this!

On Dec 13, 2011 8:12 PM, "Steve Kargl" wrote:
> On Tue, Dec 13, 2011 at 02:23:46PM +0100, O. Hartmann wrote:
> > On 12/12/11 16:51, Steve Kargl wrote:
> > > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> > >>
> > >>> Not fully right, boinc defaults to run on idprio 31 so this isn't an
> > >>> issue. And yes, there are cases where SCHED_ULE shows much better
> > >>> performance than SCHED_4BSD. [...]
> > >>
> > >> Do we have any proof at hand for such cases where SCHED_ULE performs
> > >> much better than SCHED_4BSD? Whenever the subject comes up, it is
> > >> mentioned that SCHED_ULE has better performance on boxes with
> > >> ncpu > 2. But in the end I see contradictory statements here. People
> > >> complain about poor performance (especially in scientific
> > >> environments), and others counter that this is not the case.
> > >>
> > >> Within our department, we developed a highly scalable code for
> > >> planetary science purposes on imagery. It utilizes present GPUs via
> > >> OpenCL if present. Otherwise it grabs as many cores as it can.
> > >> By the end of this year I'll get a new desktop box based on Intel's
> > >> new Sandy Bridge-E architecture with plenty of memory. If the
> > >> colleague who developed the code is willing to perform some
> > >> benchmarks on the same hardware platform, we'll benchmark both
> > >> FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend
> > >> also to look at performance with both schedulers available.
> > >>
> > >
> > > This comes up every 9 months or so, and must be approaching
> > > FAQ status.
> > >
> > > In a HPC environment, I recommend 4BSD. Depending on
> > > the workload, ULE can cause a severe increase in turn
> > > around time when doing already long computations.
> > > If you have an MPI application, simply launching greater
> > > than ncpu+1 jobs can show the problem.
> >
> > Well, those recommendations should be based on "WHY". As the mostly
> > negative experiences with SCHED_ULE in highly computative workloads
> > always get contradicted by "...but there are workloads that show the
> > opposite ...", this should be shown by more recent benchmarks and
> > explanations than legacy benchmarks from years ago.
> >
>
> I have given the WHY in previous discussions of ULE, based
> on what you call legacy benchmarks. I have not seen any
> commit to sched_ule.c that would lead me to believe that
> the performance issues with ULE and cpu-bound numerical
> codes have been addressed. Repeating the benchmark would
> be a waste of time.
>
> -- 
> Steve
Re: SCHED_ULE should not be the default
On 12/13/2011 10:54 AM, Steve Kargl wrote:
>
> I have given the WHY in previous discussions of ULE, based
> on what you call legacy benchmarks. I have not seen any
> commit to sched_ule.c that would lead me to believe that
> the performance issues with ULE and cpu-bound numerical
> codes have been addressed. Repeating the benchmark would
> be a waste of time.

Trying a simple pbzip2 on a large file, the results are pretty
consistent through iterations. pbzip2 with 4BSD is barely faster on a
file that's 322MB in size.

After a reboot, I did a

	strings bigfile > /dev/null

then ran

	pbzip2 -v xaa -c > /dev/null

7 times. If I do a burnP6 (from sysutils/cpuburn) in the background,
they perform about the same.

eg

pbzip2 -v xaa -c > /dev/null
Parallel BZIP2 v1.1.6 - by: Jeff Gilchrist [http://compression.ca]
[Oct. 30, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov

         # CPUs: 4
 BWT Block Size: 900 KB
File Block Size: 900 KB
 Maximum Memory: 100 MB
---
        File #: 1 of 1
    Input Name: xaa
   Output Name:

    Input Size: 352404831 bytes
Compressing data...
   Output Size: 50630745 bytes
---
    Wall Clock: 18.139342 seconds

ULE
18.113204
18.116896
18.123400
18.105894
18.163332
18.139342
18.082888

ULE with burnP6
23.076085
22.003666
21.162987
21.682445
21.935568
23.595781
21.601277

4BSD
17.983395
17.986218
18.009254
18.004312
18.001494
17.997032

4BSD with burnP6
22.215508
21.886459
21.595179
21.361830
21.325351
21.244793

# ministat uleP6 bsdP6
x uleP6
+ bsdP6
[ministat dot-plot omitted]
    N           Min           Max        Median           Avg        Stddev
x   6     21.162987     23.595781     22.003666     22.242755    0.91175566
+   6     21.244793     22.215508     21.595179     21.604853     0.3792413
No difference proven at 95.0% confidence

x ule
+ bsd
[ministat dot-plot omitted]
    N           Min           Max        Median           Avg        Stddev
x   7     18.082888     18.163332     18.116896     18.120708   0.025468695
+   6     17.983395     18.009254     18.001494     17.996951   0.010248473
Difference at 95.0% confidence
	-0.123757 +/- 0.024538
	-0.68296% +/- 0.135414%
	(Student's t, pooled s = 0.0200388)

hardware is X3450 with 8G of memory. RELENG8

	---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/