Re: SCHED_ULE should not be the default
My rule is "break it any way you can and see if you can figure out why." Don't be discouraged. You may find some of the folk at Yahoo are interested.

Adrian

On 24 December 2011 03:00, Daniel Kalchev wrote:
>
> On Dec 24, 2011, at 12:49 AM, Adrian Chadd wrote:
>
>> Do you not have access to anything with 8 CPUs in it? It'd be nice to
>> get clarification that this indeed was fixed.
>
> I offered to do tests on a 4x8-core Opteron system (32 cores total), but was
> discouraged that contention would be too much and results meaningless -- yet,
> such systems will be more and more popular.
>
>> Does ULE care (much) if the nodes are hyperthreading or real cores?
>> Would that play a part in what it tries to schedule/spread?
>
> I could also run the tests on a 2x4x2-core Xeon, which uses hyperthreading:
> 8 real or 16 virtual cores in total.
>
> I can torture both systems (actually two pairs) for a week or two. But I may
> not have enough time to prepare the core/setup, so any advice is greatly
> appreciated. Be more descriptive :)
>
> Daniel

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On 24.12.2011 00:02, Andriy Gapon wrote:
> on 24/12/2011 00:49 Adrian Chadd said the following:
>> Does ULE care (much) if the nodes are hyperthreading or real cores?
>> Would that play a part in what it tries to schedule/spread?
>
> An answer to this part from the theory.
> ULE does care about the physical topology of the (logical) CPUs.
> So, for example, four cores are not the same as two cores with two hw threads
> from ULE's perspective. Still, ULE tries to eliminate any imbalances between
> the CPU groups starting from the top level (e.g. CPU packages in a multi-socket
> system) and all the way down to the individual (logical) CPUs.
> Thus, given enough load (L >= N) there should not be an idle CPU in the system
> whatever the topology. Modulo bugs, of course, as always.

I tried to locate the old message where somebody explained why the topology led to a thread being selected for migration, re-assigned, and then, at another topology level, swapped back so that it ended up on just the core it had already been running on. The analysis was quite detailed, and it may well have been part of that discussion back in 2008 that Steve Kargl mentioned.

This problem could be fixed by adding a slight degree of randomness. But, IIRC, a deterministic solution might also be possible, which just takes care not to put a thread back on the core it had previously been running on, once it has been determined that the thread should be migrated to a different core. Sorry for not being able to point to the old message that contained the analysis of this problem.

Regards, Stefan
Re: SCHED_ULE should not be the default
On Dec 24, 2011, at 12:49 AM, Adrian Chadd wrote:
> Do you not have access to anything with 8 CPUs in it? It'd be nice to
> get clarification that this indeed was fixed.

I offered to do tests on a 4x8-core Opteron system (32 cores total), but was discouraged that contention would be too much and results meaningless -- yet, such systems will be more and more popular.

> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

I could also run the tests on a 2x4x2-core Xeon, which uses hyperthreading: 8 real or 16 virtual cores in total.

I can torture both systems (actually two pairs) for a week or two. But I may not have enough time to prepare the core/setup, so any advice is greatly appreciated. Be more descriptive :)

Daniel
Re: SCHED_ULE should not be the default
On Fri, Dec 23, 2011 at 02:49:51PM -0800, Adrian Chadd wrote:
> On 23 December 2011 11:11, Steve Kargl wrote:
>
> > One difference between the 2008 tests and today's tests is
> > the number of available cpus. In 2008, I ran the tests
> > on a node with 8 cpus, while today's test used a
> > node with only 4 cpus. If this behavior is a scaling
> > issue, I can't currently test it. But, today's tests
> > are certainly encouraging.
>
> Do you not have access to anything with 8 CPUs in it? It'd be nice to
> get clarification that this indeed was fixed.

I have a few nodes with 8 cpus, but those are running 4BSD kernels. I try to keep my kernel and world in sync, and by extension the kernel/world on each node is in sync with all other nodes. So, while I took the 4-cpu node off-line and updated it, at the moment I can't take another node off-line unless I do an update across the entire cluster. The update is planned for next year.

> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

I only have Opteron processors in the cluster; if you're referring to Intel's hyperthreading technology, I can't look into ULE's behavior with HTT.

-- Steve
Re: SCHED_ULE should not be the default
on 24/12/2011 00:49 Adrian Chadd said the following:
> Does ULE care (much) if the nodes are hyperthreading or real cores?
> Would that play a part in what it tries to schedule/spread?

An answer to this part from the theory.

ULE does care about the physical topology of the (logical) CPUs. So, for example, four cores are not the same as two cores with two hw threads from ULE's perspective. Still, ULE tries to eliminate any imbalances between the CPU groups starting from the top level (e.g. CPU packages in a multi-socket system) and all the way down to the individual (logical) CPUs. Thus, given enough load (L >= N) there should not be an idle CPU in the system whatever the topology. Modulo bugs, of course, as always.

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On 23 December 2011 11:11, Steve Kargl wrote:
> Ah, so good news! I cannot reproduce this problem that
> I saw 3+ years ago on the 4-cpu node, which is currently
> running a ULE kernel. When I killed the (N+1)th job,
> the N remaining jobs are spread across the N cpus.

Ah, good.

> One difference between the 2008 tests and today's tests is
> the number of available cpus. In 2008, I ran the tests
> on a node with 8 cpus, while today's test used a
> node with only 4 cpus. If this behavior is a scaling
> issue, I can't currently test it. But, today's tests
> are certainly encouraging.

Do you not have access to anything with 8 CPUs in it? It'd be nice to get clarification that this indeed was fixed.

Does ULE care (much) if the nodes are hyperthreading or real cores? Would that play a part in what it tries to schedule/spread?

Adrian
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
> On 22 December 2011 11:47, Steve Kargl wrote:
>
> > There is the additional observation in one of my 2008
> > emails (URLs have been posted) that if you have N+1
> > cpu-bound jobs with, say, job0 and job1 ping-ponging
> > on cpu0 (due to ULE's cpu-affinity feature) and if I
> > kill job2 running on cpu1, then neither job0 nor job1
> > will migrate to cpu1. So, one now has N cpu-bound
> > jobs running on N-1 cpus.
>
> .. and this sounds like a pretty serious regression. Have you ever
> filed a PR for it?

Ah, so good news! I cannot reproduce this problem that I saw 3+ years ago on the 4-cpu node, which is currently running a ULE kernel. When I killed the (N+1)th job, the N remaining jobs are spread across the N cpus.

One difference between the 2008 tests and today's tests is the number of available cpus. In 2008, I ran the tests on a node with 8 cpus, while today's test used a node with only 4 cpus. If this behavior is a scaling issue, I can't currently test it. But, today's tests are certainly encouraging.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
> On 22 December 2011 11:47, Steve Kargl wrote:
>
> > There is the additional observation in one of my 2008
> > emails (URLs have been posted) that if you have N+1
> > cpu-bound jobs with, say, job0 and job1 ping-ponging
> > on cpu0 (due to ULE's cpu-affinity feature) and if I
> > kill job2 running on cpu1, then neither job0 nor job1
> > will migrate to cpu1. So, one now has N cpu-bound
> > jobs running on N-1 cpus.
>
> .. and this sounds like a pretty serious regression. Have you ever
> filed a PR for it?

No. I was interacting directly with jeffr in 2008. I got as far as setting up root access on a node for jeffr. Unfortunately, both jeffr and I got busy with real life, and 4BSD allowed me to get my work done.

> > Finally, my initial post in this email thread was to
> > tell O. Hartman to quit beating his head against
> > a wall with ULE (in an HPC environment). Switch to
> > 4BSD. This was based on my 2008 observations and
> > I've now wasted 2 days gathering additional information
> > which only re-affirms my recommendation.
>
> I personally don't think this is time wasted. You've done something
> that no one else has actually done - provided actual results from
> real-life testing, rather than a hundred posts of "I remember seeing
> X, so I don't use ULE."
>
> If you can definitely and consistently reproduce that N-1 cpu-bound
> job bug, you're now in a great position to easily test and re-report
> KTR/schedtrace results to see what impact they have. Please don't
> underestimate exactly how valuable this is.

I'll try this tomorrow. I first need to modify the code I used in the 2008 test to disable IO, so that it is nearly completely cpu-bound.

> How often are those two jobs migrating between CPUs? How am I supposed
> to read "CPU load"? Why isn't it just sitting at 100% the whole time?

This is my first foray into ktr and schedgraph, so I may have done something incorrectly. In particular, it seems that schedgraph takes the cpu clock as a command-line argument, so there is probably some scaling that I'm missing.

> Would you mind repeating this with 4BSD (the N+1 jobs) so we can see
> how the jobs are scheduled/interleaved? Something tells me we'll see
> the jobs being scheduled evenly

Sure, I'll do this tomorrow as well.

-- Steve
Re: SCHED_ULE should not be the default
On 12/22/2011 16:23, Adrian Chadd wrote:
> You've done something
> that no one else has actually done - provided actual results from
> real-life testing, rather than a hundred posts of "I remember seeing
> X, so I don't use ULE."

Not to take away from Steve's excellent work on this, but I actually spent weeks following detailed instructions from various people using ktr, dtrace, etc. and was never able to produce any data that helped point anyone to something that could be fixed. I'm pretty sure that others have tried as well.

That said, I'm glad that Steve was able to produce useful results, and hopefully it will lead to improvements.

Doug

--
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: SCHED_ULE should not be the default
On 22 December 2011 11:47, Steve Kargl wrote:
[snip]

Thank you for posting some actual measurements!

> There is the additional observation in one of my 2008
> emails (URLs have been posted) that if you have N+1
> cpu-bound jobs with, say, job0 and job1 ping-ponging
> on cpu0 (due to ULE's cpu-affinity feature) and if I
> kill job2 running on cpu1, then neither job0 nor job1
> will migrate to cpu1. So, one now has N cpu-bound
> jobs running on N-1 cpus.

.. and this sounds like a pretty serious regression. Have you ever filed a PR for it?

> Finally, my initial post in this email thread was to
> tell O. Hartman to quit beating his head against
> a wall with ULE (in an HPC environment). Switch to
> 4BSD. This was based on my 2008 observations and
> I've now wasted 2 days gathering additional information
> which only re-affirms my recommendation.

I personally don't think this is time wasted. You've done something that no one else has actually done - provided actual results from real-life testing, rather than a hundred posts of "I remember seeing X, so I don't use ULE."

If you can definitely and consistently reproduce that N-1 cpu-bound job bug, you're now in a great position to easily test and re-report KTR/schedtrace results to see what impact they have. Please don't underestimate exactly how valuable this is.

How often are those two jobs migrating between CPUs? How am I supposed to read "CPU load"? Why isn't it just sitting at 100% the whole time?

Would you mind repeating this with 4BSD (the N+1 jobs) so we can see how the jobs are scheduled/interleaved? Something tells me we'll see the jobs being scheduled evenly

Adrian
Re: SCHED_ULE should not be the default
on 22/12/2011 21:47 Steve Kargl said the following:
> On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
>> on 22/12/2011 20:45 Steve Kargl said the following:
>>> I've used schedgraph to look at the ktrdump output. A jpg is
>>> available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
>>> This shows the ping-pong effect, where 3 processes appear to be
>>> using 2 cpus while the remaining 2 processes are pinned to their
>>> cpus.
>>
>> I'd recommend enabling CPU-specific background colors via the menu in
>> schedgraph for a better illustration of your findings.
>>
>> NB: I still don't understand the point of purposefully running N+1 CPU-bound
>> processes.
>
> The point is that this is a node in an HPC cluster with
> multiple users. Sure, I can start my job on this node
> with only N cpu-bound jobs. Now, when user John Doe
> wants to run his OpenMPI program, should he log into
> the 12 nodes in the cluster to see if someone is already
> running N cpu-bound jobs on a given node? 4BSD
> gives my jobs and John Doe's jobs a fair share of the
> available cpus. ULE does not give a fair share, and
> if you read the summary file I put up on the web,
> you see that it is fairly non-deterministic when an
> OpenMPI run will finish (see the mean absolute deviations
> in the table of 'real' times that I posted).

OK. I think I know why the uneven load occurs. I remember even trying to explain my observations. There are two things:

1. ULE has neither a runqueue common across CPUs nor any other kind of mechanism for enforcing true global fairness of CPU resource sharing.

2. ULE's rebalancing code is biased, and that leads to the situation where sub-groups of threads can share subsets of CPUs rather fairly, but there won't be global fairness.

I haven't really given any thought as to how to fix or work around these issues. One dumb idea is to add an element of randomness to the choice between equally loaded CPUs (and their subsets) instead of having a permanent bias.

> There is the additional observation in one of my 2008
> emails (URLs have been posted) that if you have N+1
> cpu-bound jobs with, say, job0 and job1 ping-ponging
> on cpu0 (due to ULE's cpu-affinity feature) and if I
> kill job2 running on cpu1, then neither job0 nor job1
> will migrate to cpu1. So, one now has N cpu-bound
> jobs running on N-1 cpus.

Have you checked recently that that is still the case? I would consider this a rather serious bug, as opposed to merely sub-optimal scheduling.

> Finally, my initial post in this email thread was to
> tell O. Hartman to quit beating his head against
> a wall with ULE (in an HPC environment). Switch to
> 4BSD. This was based on my 2008 observations and
> I've now wasted 2 days gathering additional information
> which only re-affirms my recommendation.

I think that any objective information has its value. So maybe the time is not really wasted. I think there is no argument that for your usage pattern 4BSD is better than ULE at the moment, because of the inherent design choices of both schedulers and their current implementations. But I think that ULE could be improved to produce more global fairness.

P.S. But, but, this thread has seen so many different problem reports about ULE heaped together that it's very easy to get confused about what is caused by what, and what is real and what is not. E.g. I don't think that there is a direct relation between this issue (N+1 CPU-bound tasks) and "my X is sluggish with ULE when I untar a large file".

P.P.S. About the subject line. Let's recall why ULE became the default. It happened because of many observations from users and developers that "things" were faster/"snappier" with ULE than with 4BSD, and a significant stream of requests to make it the default. So it's business as usual. The schedulers are different, so there are those for whom one scheduler works better, those for whom the other works better, those for whom both work reasonably well, those for whom neither is satisfactory, and those who don't really care/compare. There is a silent majority and there are vocal minorities. There are specific bugs and quirks, advantages and disadvantages, usage patterns, hardware configurations and what not. When everybody starts to talk at the same time, it's a huge mess. But silently triaging and debugging one problem at a time also doesn't always work. There, I've said it. Let me now try to recall why I felt a need to say all of this :-)

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
> on 22/12/2011 20:45 Steve Kargl said the following:
> > I've used schedgraph to look at the ktrdump output. A jpg is
> > available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
> > This shows the ping-pong effect, where 3 processes appear to be
> > using 2 cpus while the remaining 2 processes are pinned to their
> > cpus.
>
> I'd recommend enabling CPU-specific background colors via the menu in
> schedgraph for a better illustration of your findings.
>
> NB: I still don't understand the point of purposefully running N+1 CPU-bound
> processes.

The point is that this is a node in an HPC cluster with multiple users. Sure, I can start my job on this node with only N cpu-bound jobs. Now, when user John Doe wants to run his OpenMPI program, should he log into the 12 nodes in the cluster to see if someone is already running N cpu-bound jobs on a given node? 4BSD gives my jobs and John Doe's jobs a fair share of the available cpus. ULE does not give a fair share, and if you read the summary file I put up on the web, you see that it is fairly non-deterministic when an OpenMPI run will finish (see the mean absolute deviations in the table of 'real' times that I posted).

There is the additional observation in one of my 2008 emails (URLs have been posted) that if you have N+1 cpu-bound jobs with, say, job0 and job1 ping-ponging on cpu0 (due to ULE's cpu-affinity feature) and if I kill job2 running on cpu1, then neither job0 nor job1 will migrate to cpu1. So, one now has N cpu-bound jobs running on N-1 cpus.

Finally, my initial post in this email thread was to tell O. Hartman to quit beating his head against a wall with ULE (in an HPC environment). Switch to 4BSD. This was based on my 2008 observations, and I've now wasted 2 days gathering additional information which only re-affirms my recommendation.

-- Steve
Re: SCHED_ULE should not be the default
on 22/12/2011 20:45 Steve Kargl said the following:
> I've used schedgraph to look at the ktrdump output. A jpg is
> available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
> This shows the ping-pong effect, where 3 processes appear to be
> using 2 cpus while the remaining 2 processes are pinned to their
> cpus.

I'd recommend enabling CPU-specific background colors via the menu in schedgraph for a better illustration of your findings.

NB: I still don't understand the point of purposefully running N+1 CPU-bound processes.

-- Andriy Gapon
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
> >
> > I have placed several files at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd
> >
> > dmesg.txt --> dmesg for ULE kernel
> > summary --> A summary that includes top(1) output of all runs.
> > sysctl.ule.txt --> sysctl -a for the ULE kernel
> > ktr-ule-problem-kargl.out.gz

I've replaced the original version of the ktr file with a new version. The old version was corrupt due to my failure to set 'sysctl debug.ktr.mask=0' prior to the dump.

> One explanation for taking 1.5-2x times is that with ULE the
> threads are not migrated properly, so you end up with idle cores
> and ready threads not running (the other possible explanation
> would be that there are migrations, but they are so frequent and
> expensive that they completely trash the caches. But this seems
> unlikely for this type of task).

I've used schedgraph to look at the ktrdump output. A jpg is available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg This shows the ping-pong effect, where 3 processes appear to be using 2 cpus while the remaining 2 processes are pinned to their cpus.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
> On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
>>
>> I have placed several files at
>>
>> http://troutmask.apl.washington.edu/~kargl/freebsd
>>
>> dmesg.txt --> dmesg for ULE kernel
>> summary --> A summary that includes top(1) output of all runs.
>> sysctl.ule.txt --> sysctl -a for the ULE kernel
>> ktr-ule-problem-kargl.out.gz
>>
>> Since time is executed on the master, only the 'real' time is of
>> interest (the summary file includes user and sys times). This
>> command is run 5 times for each N value, and up to 10 times for
>> some N values with the ULE kernel. The following table records
>> the average 'real' time; the number in (...) is the mean
>> absolute deviation.
>>
>> # N   ULE             4BSD
>> # --------------------------------
>> # 4   223.27 (0.502)  221.76 (0.551)
>> # 5   404.35 (73.82)  270.68 (0.866)
>> # 6   627.56 (173.0)  247.23 (1.442)
>> # 7   475.53 (84.07)  285.78 (1.421)
>> # 8   429.45 (134.9)  223.64 (1.316)
>
> One explanation for taking 1.5-2x times is that with ULE the
> threads are not migrated properly, so you end up with idle cores
> and ready threads not running

That's what I guessed back in 2008 when I first reported the behavior.

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URL shows 10 completely independent instances of the same numerically intensive application running on a circa-2008 ULE kernel. Look at the PRI column. The high-PRI jobs are not only pinned to a cpu, they are running at 100% WCPU. The low-PRI jobs seem to be pinned to a subset of the available cpus and simply ping-pong in and out of the same cpus. In this instance, there are 5 jobs competing for time on 3 cpus.

> Also, perhaps one could build a simple test process that replicates
> this workload (so one can run it as part of regression tests):
> 1. define a CPU-intensive function f(n) which issues no
>    system calls, optionally touching a lot of memory,
>    where n determines the number of iterations.
> 2. by trial and error (or let the program find it),
>    pick a value N1 so that the minimum execution time
>    of f(N1) is in the 10..100ms range
> 3. now run the function f() again from an outer loop so
>    that the total execution time is large (10..100s),
>    again with no intervening system calls.
> 4. use an external shell script that can rerun a process
>    when it terminates, and then run multiple instances
>    in parallel. Instead of the external script one could
>    fork new instances before terminating, but I am a bit
>    unclear how CPU inheritance works when a process forks.
>    Going through the shell possibly breaks the chain.

The tests at the above URL do essentially what you propose, except that in 2008 the kzk90 programs were doing some IO.

-- Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
> If someone else thinks he has a specific problem that is not
> characterized by one of the cases above please let me know and I will
> put this in the chart.

It seems I stumbled over another thing.

Setup: 2 servers providing devices via ggated, 1 server using ggatec for those devices. Each ZFS pool sits on a pair of disks provided by the two ggated servers. I use rsync to fill up the 6 zpools/zfs from an existing storage (2 TB zpools, about 500 to 700 GiB used per pool), with 2 rsyncs running in parallel to fill the partitions.

The main server (ggate client with ZFS and rsync) has an Intel Xeon X3450 2.66 GHz quad-core processor (+HTT or whatever it's called nowadays, which gives 8 "cpus" in FreeBSD).

With ULE, ZFS gets slower after some time and finally gets stuck after 1 to 3 days of continuous synchronisation (ggate works like a charm as far as I can tell); with 4BSD (online for 6 days now) the rsync seems to run a lot faster and I didn't get ZFS to stall. There's nearly no local I/O (the system is on a local SSD) and the load/CPU usage are not actually high. All of this is running a quite recent RELENG_9.

If anyone's interested I can get more detail and carry out some tests.

- Oliver

--
| Oliver Brandmueller http://sysadm.in/ o...@sysadm.in |
| I am the Internet. As sure as I help God. |
Re: SCHED_ULE should not be the default
On Thu, Dec 22, 2011 at 01:07:58AM -0800, Adrian Chadd wrote:
> Are you able to go through the emails here and grab out Attilio's
> example for generating KTR scheduler traces?

Did you read this part of my email?

> >
> > Attilio,
> >
> > I have placed several files at
> >
> > http://troutmask.apl.washington.edu/~kargl/freebsd
> >
> > dmesg.txt --> dmesg for ULE kernel
> > summary --> A summary that includes top(1) output of all runs.
> > sysctl.ule.txt --> sysctl -a for the ULE kernel
> > ktr-ule-problem-kargl.out.gz

ktr-ule-problem-kargl.out is a 43 MB file. I don't think the freebsd.org email server would allow that file through.

-- Steve
Re: SCHED_ULE should not be the default
On 12/22/11 04:07, Adrian Chadd wrote:
> Are you able to go through the emails here and grab out Attilio's
> example for generating KTR scheduler traces?
>
> Adrian

[...]

I've put up two such files:

http://www.m5p.com/~george/ktr-ule-problem.out
http://www.m5p.com/~george/ktr-ule-interact.out

but I don't know how to analyze them myself. What do all of us do next?

-- George Mitchell
Re: SCHED_ULE should not be the default
On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote: > On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: > > 2011/12/15 Steve Kargl : > > > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: > > >> > > >> I basically went through all the e-mail you just sent and identified 4 > > >> real report on which we could work on and summarizied in the attached > > >> Excel file. > > >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the > > >> few datas there and add more, if they want, or make more important > > >> clarifications in particular about the Xorg presence (or rather not) > > >> in their workload. > > > > > > Your summary of my observations appears correct. > > > > > > I have grabbed an up-to-date /usr/src, built and > > > installed world, and built and installed a new > > > kernel on one of the nodes in my cluster. ??It > > > has > > > > > > > It seems a perfect environment, just please make sure you made a > > debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). > > > > The first thing is, can you try reproducing your case? As far as I got > > it, for you it was enough to run N + small_amount of CPU-bound threads > > to show performance penalty, so I'd ask you to start with using dnetc > > or just your preferred cpu-bound workload and verify you can reproduce > > the issue. > > As it happens, please monitor the threads bouncing and CPU utilization > > via 'top' (you don't need to be 100% precise, jut to get an idea, and > > keep an eye on things like excessive threads migration, thread binding > > obsessity, low throughput on CPU). > > One note: if your workloads need to do I/O please use a tempfs or > > memory storage to do so, in order to reduce I/O effects at all. > > Also, verify this doesn't happen with 4BSD scheduler, just in case. 
> > > > Finally, if the problem is still in place, please recompile your > > kernel by adding: > > options KTR > > options KTR_ENTRIES=262144 > > options KTR_COMPILE=(KTR_SCHED) > > options KTR_MASK=(KTR_SCHED) > > > > And reproduce the issue. > > When you are in the middle of the scheduling issue go with: > > # ktrdump -ctf > ktr-ule-problem-YOURNAME.out > > > > and send to the mailing list along with your dmesg and the > > informations on the CPU utilization you gathered by top(1). > > > > That should cover it all, but if you have further questions, please > > just go ahead. > > Attilio, > > I have placed several files at > > http://troutmask.apl.washington.edu/~kargl/freebsd > > dmesg.txt --> dmesg for ULE kernel > summary--> A summary that includes top(1) output of all runs. > sysctl.ule.txt --> sysctl -a for the ULE kernel > ktr-ule-problem-kargl.out.gz > > I performed a series of tests with both 4BSD and ULE kernels. > The 4BSD and ULE kernels are identical except of course for the > scheduler. Both witness and invariants are disabled, and malloc > has been compiled without debugging. > > Here's what I did. On the master node in my cluster, I ran an > OpenMPI code that sends N jobs off to the node with the kernel > of interest. There is communication between the master and > slaves to generate 16 independent chunks of data. Note, there > is no disk IO. So, for example, N=4 will start 4 essentially > identical numerically intensity jobs. At the start of a run, > the master node instructs each slave job to create a chunk of > data. After the data is created, the slave sends it back to the > master and the master sends instructions to create the next chunk > of data. This communication continues until the 16 chunks have > been assigned, computed, and returned to the master. > > Here is a rough measurement of the problem with ULE and numerical > intensity loads. 
This command is executed on the master > > time mpiexec -machinefile mf3 -np N sasmp sas.in > > Since time is executed on the master, only the 'real' time is of > interest (the summary file includes user and sys times). This > command is run 5 times for each N value and up to 10 times for > some N values with the ULE kernel. The following table records > the average 'real' time and the number in (...) is the mean > absolute deviation. >
> #  N   ULE              4BSD
> # --------------------------------
> #  4   223.27 (0.502)   221.76 (0.551)
> #  5   404.35 (73.82)   270.68 (0.866)
> #  6   627.56 (173.0)   247.23 (1.442)
> #  7   475.53 (84.07)   285.78 (1.421)
> #  8   429.45 (134.9)   223.64 (1.316)

One explanation for the runs taking 1.5-2x as long is that with ULE the threads are not migrated properly, so you end up with idle cores while ready threads sit waiting (the other possible explanation would be that migrations do happen, but are so frequent and expensive that they completely trash the caches; that seems unlikely for this type of task). Also, perhaps one could build a simple test process that replicates this workload (so one can run it as part of regression tests):
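For what it's worth, a throwaway stand-in for such a regression test could look like the sketch below: plain sh, no MPI, no I/O, just N CPU-bound busy loops kept in flight until 16 "chunks" have run. The knobs N, SPIN and CHUNKS are made up here, not values from Steve's report; the point is only to give the scheduler the same shape of load to place.

```shell
#!/bin/sh
# Hypothetical stand-in for the MPI test above (no MPI, no disk I/O):
# run CHUNKS cpu-bound "chunks", N at a time, so run time is dominated
# by how well the scheduler spreads N busy loops over the cores.
# N, SPIN and CHUNKS are made-up knobs, not values from the report.
N=${1:-4}
SPIN=${2:-20000}
CHUNKS=16

chunk() {                       # one numerically intensive "slave job"
    i=0
    while [ "$i" -lt "$SPIN" ]; do i=$((i + 1)); done
}

c=0
while [ "$c" -lt "$CHUNKS" ]; do
    n=0
    # the "master" hands out up to N chunks, then collects the batch
    while [ "$n" -lt "$N" ] && [ "$c" -lt "$CHUNKS" ]; do
        chunk &
        n=$((n + 1))
        c=$((c + 1))
    done
    wait
done
echo "completed $CHUNKS chunks, $N at a time"
```

Timing this with time(1) under both schedulers, for N around and above the core count, should reproduce the shape of the table above if the no-migration theory holds; watching it in top(1) would show whether cores go idle while chunks are still queued.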
Re: SCHED_ULE should not be the default
Are you able to go through the emails here and grab out Attilio's example for generating KTR scheduler traces? Adrian On 21 December 2011 16:52, Steve Kargl wrote: > On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: >> 2011/12/15 Steve Kargl : >> > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: >> >> >> >> I basically went through all the e-mail you just sent and identified 4 >> >> real report on which we could work on and summarizied in the attached >> >> Excel file. >> >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the >> >> few datas there and add more, if they want, or make more important >> >> clarifications in particular about the Xorg presence (or rather not) >> >> in their workload. >> > >> > Your summary of my observations appears correct. >> > >> > I have grabbed an up-to-date /usr/src, built and >> > installed world, and built and installed a new >> > kernel on one of the nodes in my cluster. ??It >> > has >> > >> >> It seems a perfect environment, just please make sure you made a >> debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). >> >> The first thing is, can you try reproducing your case? As far as I got >> it, for you it was enough to run N + small_amount of CPU-bound threads >> to show performance penalty, so I'd ask you to start with using dnetc >> or just your preferred cpu-bound workload and verify you can reproduce >> the issue. >> As it happens, please monitor the threads bouncing and CPU utilization >> via 'top' (you don't need to be 100% precise, jut to get an idea, and >> keep an eye on things like excessive threads migration, thread binding >> obsessity, low throughput on CPU). >> One note: if your workloads need to do I/O please use a tempfs or >> memory storage to do so, in order to reduce I/O effects at all. >> Also, verify this doesn't happen with 4BSD scheduler, just in case. 
>> >> Finally, if the problem is still in place, please recompile your >> kernel by adding: >> options KTR >> options KTR_ENTRIES=262144 >> options KTR_COMPILE=(KTR_SCHED) >> options KTR_MASK=(KTR_SCHED) >> >> And reproduce the issue. >> When you are in the middle of the scheduling issue go with: >> # ktrdump -ctf > ktr-ule-problem-YOURNAME.out >> >> and send to the mailing list along with your dmesg and the >> informations on the CPU utilization you gathered by top(1). >> >> That should cover it all, but if you have further questions, please >> just go ahead. > > Attilio, > > I have placed several files at > > http://troutmask.apl.washington.edu/~kargl/freebsd > > dmesg.txt --> dmesg for ULE kernel > summary --> A summary that includes top(1) output of all runs. > sysctl.ule.txt --> sysctl -a for the ULE kernel > ktr-ule-problem-kargl.out.gz > > I performed a series of tests with both 4BSD and ULE kernels. > The 4BSD and ULE kernels are identical except of course for the > scheduler. Both witness and invariants are disabled, and malloc > has been compiled without debugging. > > Here's what I did. On the master node in my cluster, I ran an > OpenMPI code that sends N jobs off to the node with the kernel > of interest. There is communication between the master and > slaves to generate 16 independent chunks of data. Note, there > is no disk IO. So, for example, N=4 will start 4 essentially > identical numerically intensity jobs. At the start of a run, > the master node instructs each slave job to create a chunk of > data. After the data is created, the slave sends it back to the > master and the master sends instructions to create the next chunk > of data. This communication continues until the 16 chunks have > been assigned, computed, and returned to the master. > > Here is a rough measurement of the problem with ULE and numerical > intensity loads. 
This command is executed on the master > > time mpiexec -machinefile mf3 -np N sasmp sas.in > > Since time is executed on the master, only the 'real' time is of > interest (the summary file includes user and sys times). This > command is run 5 times for each N value and up to 10 times for > some N values with the ULE kernel. The following table records > the average 'real' time and the number in (...) is the mean > absolute deviation. >
> #  N   ULE              4BSD
> # --------------------------------
> #  4   223.27 (0.502)   221.76 (0.551)
> #  5   404.35 (73.82)   270.68 (0.866)
> #  6   627.56 (173.0)   247.23 (1.442)
> #  7   475.53 (84.07)   285.78 (1.421)
> #  8   429.45 (134.9)   223.64 (1.316)
> > These numbers to me demonstrate that ULE is not a good choice > for an HPC workload. > > If you need more information, feel free to ask. If you would > like access to the node, I can probably arrange that. But, > we can discuss that off-line. > > -- > Steve > ___ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote: > 2011/12/15 Steve Kargl : > > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote: > >> > >> I basically went through all the e-mail you just sent and identified 4 > >> real report on which we could work on and summarizied in the attached > >> Excel file. > >> I'd like that George, Steve, Doug, Andrey and Mike possibly review the > >> few datas there and add more, if they want, or make more important > >> clarifications in particular about the Xorg presence (or rather not) > >> in their workload. > > > > Your summary of my observations appears correct. > > > > I have grabbed an up-to-date /usr/src, built and > > installed world, and built and installed a new > > kernel on one of the nodes in my cluster. ??It > > has > > > > It seems a perfect environment, just please make sure you made a > debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically). > > The first thing is, can you try reproducing your case? As far as I got > it, for you it was enough to run N + small_amount of CPU-bound threads > to show performance penalty, so I'd ask you to start with using dnetc > or just your preferred cpu-bound workload and verify you can reproduce > the issue. > As it happens, please monitor the threads bouncing and CPU utilization > via 'top' (you don't need to be 100% precise, jut to get an idea, and > keep an eye on things like excessive threads migration, thread binding > obsessity, low throughput on CPU). > One note: if your workloads need to do I/O please use a tempfs or > memory storage to do so, in order to reduce I/O effects at all. > Also, verify this doesn't happen with 4BSD scheduler, just in case. > > Finally, if the problem is still in place, please recompile your > kernel by adding: > options KTR > options KTR_ENTRIES=262144 > options KTR_COMPILE=(KTR_SCHED) > options KTR_MASK=(KTR_SCHED) > > And reproduce the issue. 
> When you are in the middle of the scheduling issue go with: > # ktrdump -ctf > ktr-ule-problem-YOURNAME.out > > and send to the mailing list along with your dmesg and the > informations on the CPU utilization you gathered by top(1). > > That should cover it all, but if you have further questions, please > just go ahead. Attilio, I have placed several files at http://troutmask.apl.washington.edu/~kargl/freebsd dmesg.txt --> dmesg for ULE kernel summary--> A summary that includes top(1) output of all runs. sysctl.ule.txt --> sysctl -a for the ULE kernel ktr-ule-problem-kargl.out.gz I performed a series of tests with both 4BSD and ULE kernels. The 4BSD and ULE kernels are identical except of course for the scheduler. Both witness and invariants are disabled, and malloc has been compiled without debugging. Here's what I did. On the master node in my cluster, I ran an OpenMPI code that sends N jobs off to the node with the kernel of interest. There is communication between the master and slaves to generate 16 independent chunks of data. Note, there is no disk IO. So, for example, N=4 will start 4 essentially identical numerically intensity jobs. At the start of a run, the master node instructs each slave job to create a chunk of data. After the data is created, the slave sends it back to the master and the master sends instructions to create the next chunk of data. This communication continues until the 16 chunks have been assigned, computed, and returned to the master. Here is a rough measurement of the problem with ULE and numerical intensity loads. This command is executed on the master time mpiexec -machinefile mf3 -np N sasmp sas.in Since time is executed on the master, only the 'real' time is of interest (the summary file includes user and sys times). This command is run at 5 times for each N value and up to 10 time for some N values with the ULE kernel. The following table records the average 'real' time and the number in (...) is the mean absolute deviations. 
#  N   ULE              4BSD
# --------------------------------
#  4   223.27 (0.502)   221.76 (0.551)
#  5   404.35 (73.82)   270.68 (0.866)
#  6   627.56 (173.0)   247.23 (1.442)
#  7   475.53 (84.07)   285.78 (1.421)
#  8   429.45 (134.9)   223.64 (1.316)

These numbers to me demonstrate that ULE is not a good choice for an HPC workload. If you need more information, feel free to ask. If you would like access to the node, I can probably arrange that. But, we can discuss that off-line. -- Steve
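As a side note, the per-N averages and mean absolute deviations in the table can be recomputed with a short awk pipeline from one whitespace-separated list of 'real' times per run. The five sample values below are made up for illustration (the actual per-run numbers live in Steve's summary file), chosen only so that they average to the ULE N=5 figure:

```shell
# Mean and mean-absolute-deviation of a set of 'real' times, printed in
# the same "avg (mad)" shape as the table. The five input values are
# illustrative stand-ins, not the measured runs.
out=$(echo "404.35 310.02 512.77 350.10 444.51" | awk '{
    for (i = 1; i <= NF; i++) { t[i] = $i; sum += $i }
    mean = sum / NF
    for (i = 1; i <= NF; i++)
        dev += ((t[i] > mean) ? t[i] - mean : mean - t[i])
    printf "%.2f (%.2f)\n", mean, dev / NF
}')
echo "$out"
```

Feeding it the real per-run times from the summary should reproduce the table entries exactly, which is a quick sanity check on the reported deviations.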
Re: SCHED_ULE should not be the default
On Mon Dec 19 11, Nathan Whitehorn wrote: > On 12/18/11 04:34, Adrian Chadd wrote: > >The trouble is that there's lots of anecdotal evidence, but noone's > >really gone digging deep into _their_ example of why it's broken. The > >developers who know this stuff don't see anything wrong. That hints to > >me it may be something a little more creepy - as an example, the > >interplay between netisr/swi/taskqueue/callbacks and such. It may be > >that something is being starved that isn't obviously obvious. It's > >just a stab in the dark, but it sounds somewhat plausible based on > >what I've seen ULE do in my network throughput hacking. > > > >I applaud reppie for trying to make it as easy as possible for people > >to use KTR to provide scheduler traces for him to go digging with, so > >please, if you have these issues and you can absolutely reproduce > >them, please follow his instructions and work with him to get him what > >he needs. > > The thing I've seen is that ULE is substantially more enthusiastic about > migrating processes between cores than 4BSD. Often, this is a good > thing, but can increase the rate of cache misses, hurting performance > for cache-bound processes (I see this particularly in HPC-type > scientific workloads). It might be interesting to add some kind of > tunable here. does r228718 have any impact regarding this behaviour? cheers. alex > > Another more interesting and slightly longer-term possibility if someone > wants a project would be to integrate scheduling decisions with hwpmc > counters, to accumulate statistics on cache hits at each context switch > and preferentially keep processes with a high hits/misses ratio on the > same thread/cache domain relative to processes with a low one. > -Nathan > > P.S. The other thing that could be very interesting from a research and > scheduling standpoint would be to integrate heterogeneous SMP support > into the operating system, with a FreeBSD-4 "Application Processor" > syscall model. 
We seem to be going down the road where GPGPU computing > has MMUs, timer interrupts, IPIs, etc. (the next AMD Fusions, IBM Cell). > This is something that no operating system currently supports well, and > would be a place for BSD to shine. If anyone has a free graduate student...
Re: SCHED_ULE should not be the default
on 19/12/2011 19:46 Ivan Klymenko said the following: > On Sat, 17 Dec 2011 23:13:16 +0200 > Andriy Gapon wrote: > >> on 17/12/2011 19:33 George Mitchell said the following: >>> Summing up for the record, in my original test: >>> 1. It doesn't matter whether X is running or not. >>> 2. The problem is not limited to two or fewer CPUs. (It also >>> happens for me on a six-CPU system.) >>> 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU. >>> >>> With nCPU compute-bound processes running, with SCHED_ULE, any other >>> process that is interactive (which to me means frequently waiting >>> for I/O) gets ABYSMAL performance -- over an order of magnitude >>> worse than it gets with SCHED_4BSD under the same conditions. >> >> I definitely do not see anything like this. >> Specifically: >> - with X >> - with 2 CPUs >> - with nCPU and/or nCPU + 1 compute-bound processes >> - with SCHED_ULE obviously :-) >> I do not get "abysmal" performance for I/O active tasks. >> >> Perhaps there is something specific that you would want me to run and >> measure. >> > > Well, share your experience - what should one do so that others are > fine with SCHED_ULE too. ;) I didn't have to do anything special, so I am at a loss as to what to share. It just works (tm) for me. Sorry. -- Andriy Gapon
Re: SCHED_ULE should not be the default
On Sat, 17 Dec 2011 23:13:16 +0200 Andriy Gapon wrote: > on 17/12/2011 19:33 George Mitchell said the following: > > Summing up for the record, in my original test: > > 1. It doesn't matter whether X is running or not. > > 2. The problem is not limited to two or fewer CPUs. (It also > > happens for me on a six-CPU system.) > > 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU. > > > > With nCPU compute-bound processes running, with SCHED_ULE, any other > > process that is interactive (which to me means frequently waiting > > for I/O) gets ABYSMAL performance -- over an order of magnitude > > worse than it gets with SCHED_4BSD under the same conditions. > > I definitely do not see anything like this. > Specifically: > - with X > - with 2 CPUs > - with nCPU and/or nCPU + 1 compute-bound processes > - with SCHED_ULE obviously :-) > I do not get "abysmal" performance for I/O active tasks. > > Perhaps there is something specific that you would want me to run and > measure. > Well, share your experience - what should one do so that others are fine with SCHED_ULE too. ;)
Re: SCHED_ULE should not be the default
The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious. It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking. I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs. Adrian (wow, lots of personal pronouns packed into one sentence. It must be sleep time.)
Re: SCHED_ULE should not be the default
Hi, What Attilio and others need are KTR traces of the most stripped-down example of interactivity-busting workload you can find. Eg: if you're doing 32 concurrent buildworlds and trying to test interactivity - fine, but that's going to result in a lot of KTR stuff. If you can reproduce it using a dd from /dev/random to /dev/null (like another poster did) with nothing else running, then even better. If you can do it without X running, even better. I honestly suggest ignoring benchmarks for now and concentrating on interactivity. Adrian
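The capture recipe Attilio posted earlier in the thread (kernel rebuilt with options KTR, KTR_ENTRIES=262144, and KTR_COMPILE/KTR_MASK set to KTR_SCHED, then ktrdump while the problem is happening) could be wrapped in one small script so reporters all produce the same artifact. A sketch; the wrapper itself and its NAME argument are hypothetical, only the ktrdump -ctf invocation and the file-naming convention come from the thread:

```shell
#!/bin/sh
# One-shot wrapper around the KTR capture recipe from this thread.
# Prerequisite: a FreeBSD kernel built with options KTR,
# KTR_ENTRIES=262144, KTR_COMPILE=(KTR_SCHED), KTR_MASK=(KTR_SCHED).
# Run it as root *while* the scheduling problem is happening.
NAME=${1:-$(id -un)}            # tag for the output file
OUT="ktr-ule-problem-${NAME}.out"

if command -v ktrdump >/dev/null 2>&1; then
    # dump the scheduler trace buffer, as in Attilio's instructions
    ktrdump -ctf > "$OUT" &&
        echo "wrote $OUT -- send it along with dmesg and top(1) output"
else
    echo "ktrdump(8) not found: this needs a FreeBSD box with a KTR kernel" >&2
fi
```

That keeps the trace, the reporter's name, and the reminder to attach dmesg and top(1) output together in one step.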
Re: SCHED_ULE should not be the default
On 12/18/11 03:37, Bruce Cran wrote: > On 13/12/2011 09:00, Andrey Chernov wrote: >> I observe ULE interactivity slowness even on single core machine >> (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the >> middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is >> gone. > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon > machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make > -j16 buildworld" then logging into another console can take several > seconds. Sometimes even the "Password:" prompt can take a couple of > seconds to appear after typing my username. > I reported several problems ages ago using SCHED_ULE on FreeBSD 8/9 when doing heavy I/O, either disk or network bound (at the time I noticed the problem on servers doing heavy disk or net I/O). It was suspected that X could be the problem, but we also have a Dell PowerEdge 1950III running FreeBSD 8.2-STABLE (by next week 9.0-RC[2/3]/STABLE) without X with the same problems, though not so prominent as with X. The box has 8 cores, 4 per socket, 16 GB RAM, a SAS 6/iR controller and two PCI-X attached Broadcom NetXtreme NICs, so the hardware shouldn't be any kind of trouble. But at the time (over the past two years now), the problem was considered "a personal" problem. Bah! By the beginning of next year my working group expects new hardware. Since we use Linux for scientific work (due to OpenCL and CUDA on TESLA cards), I can't use the Blade system. The boxes I expect are one Dell Precision T7500, 96 GB RAM, two sockets with Westmere XEONs, 12 cores/24 threads in total. I'll start a dual-OS installation with FreeBSD 10 and the most recent Suse (since the development is mostly done by my colleagues on Suse for the C2075 TESLA board, I need Suse Linux). I will then be capable of performing some benchmarks on both boxes on the very same hardware.
The other box will be my desktop box, a brand new Sandy Bridge E CPU (i7-3960X) with 32 GB RAM. I'm also inclined to install a dual-boot box (I rejected this up to now since I do not like installing GRUB2 for multiboot when using GPT on FreeBSD). The box will run FreeBSD 9 and an Ubuntu or Gentoo Linux. I'm unsure about the choice of Linux, but I tend toward Gentoo, compiling everything myself. On this box I can also perform benchmarks with several setups. I look forward to getting some help and/or tips to pin down the issues we discussed here. Oliver
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Alexander Best wrote: > On Sun Dec 18 11, Andrey Chernov wrote: > > On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > > > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > > > I observe ULE interactivity slowness even on single core machine > > > (Pentium > > > > > 4) in very visible places, like 'ps ax' output stucks in the middle > > > by ~1 > > > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon > > > machine > > > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > > > buildworld" then logging into another console can take several seconds. > > > > Sometimes even the "Password:" prompt can take a couple of seconds to > > > appear > > > > after typing my username. > > > > > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > > > my single core 1133MHz PIII-M. As a reproducable data point, running > > > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > > > the CPU while testing my manual fan control script, hogs it up pretty > > > much while regularly running the script below in another konsole to > > > check values - which often gets stuck half way, occasionally pausing > > > _twice_ before finishing. Switching back to the first konsole (on > > > another desktop) to kill the dd can also take a couple/few seconds. > > > > This issue not about slow machine under load, because the same > > slow machine under exact the same load, but with SCHED_4BSD is very fast > > to response interactively. > > > > I think we should not misinterpret interactivity with speed. I see no big > > speed (i.e. compilation time) differences, switching schedulers, but see > > big _interactivity_ difference. ULE in general tends to underestimate > > interactive processes in favour of background ones. 
It perhaps helps to > > compilation, but looks like slowpoke OS from the interactive user > > experience. > > +1 > > i've also experienced issues with ULE and performed several tests to compare > it to the historical 4BSD scheduler. the difference between the two does *not* > seem to be speed (at least not a huge difference), but interactivity. > > one of the tests i performed was the following > > ttyv0: untar a *huge* (+10G) archive > ttyv1: after ~ 30 seconds of untaring do 'ls -la $direcory', where directory >contains a lot of files. i used "direcory = /var/db/portsnap", because s/portsnap/portsnap\/files/ >that directory contains 23117 files on my machine. > > measuring 'ls -la $direcory' via time(1) revealed that SCHED_ULE takes > 15 > seconds, whereas SCHED_4BSD only takes ~ 3-5 seconds. i think the issue is io. > io operations usually get a high priority, because statistics have shown that > - unlike computational tasks - io intensive tasks only run for a small > fraction > of time and then exit: read data -> change data -> writeback data. > > so SCHED_ULE might take these statistics too literaly and gives tasks like > bsdtar(1) (in my case) too many ressources, so other tasks which require io > are > struggling to get some ressources assigned to them (ls(1) in my case). > > of course SCHED_4BSD isn't perfect, too. try using it and run the stress2 > testsuite. your whole system will grind to a halt. mouse input drops below > 1 HZ. even after killing all the stress2 tests, it will take a few minutes > after the system becomes snappy again. > > cheers. > alex > > > > > -- > > http://ache.vniz.net/ ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Sun Dec 18 11, Andrey Chernov wrote: > On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > > I observe ULE interactivity slowness even on single core machine > > (Pentium > > > > 4) in very visible places, like 'ps ax' output stucks in the middle by > > ~1 > > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > > buildworld" then logging into another console can take several seconds. > > > Sometimes even the "Password:" prompt can take a couple of seconds to > > appear > > > after typing my username. > > > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > > my single core 1133MHz PIII-M. As a reproducable data point, running > > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > > the CPU while testing my manual fan control script, hogs it up pretty > > much while regularly running the script below in another konsole to > > check values - which often gets stuck half way, occasionally pausing > > _twice_ before finishing. Switching back to the first konsole (on > > another desktop) to kill the dd can also take a couple/few seconds. > > This issue not about slow machine under load, because the same > slow machine under exact the same load, but with SCHED_4BSD is very fast > to response interactively. > > I think we should not misinterpret interactivity with speed. I see no big > speed (i.e. compilation time) differences, switching schedulers, but see > big _interactivity_ difference. ULE in general tends to underestimate > interactive processes in favour of background ones. It perhaps helps to > compilation, but looks like slowpoke OS from the interactive user > experience. 
+1 i've also experienced issues with ULE and performed several tests to compare it to the historical 4BSD scheduler. the difference between the two does *not* seem to be speed (at least not a huge difference), but interactivity. one of the tests i performed was the following

ttyv0: untar a *huge* (+10G) archive
ttyv1: after ~ 30 seconds of untaring do 'ls -la $directory', where directory contains a lot of files. i used "directory = /var/db/portsnap", because that directory contains 23117 files on my machine.

measuring 'ls -la $directory' via time(1) revealed that SCHED_ULE takes > 15 seconds, whereas SCHED_4BSD only takes ~ 3-5 seconds. i think the issue is io. io operations usually get a high priority, because statistics have shown that - unlike computational tasks - io intensive tasks only run for a small fraction of time and then exit: read data -> change data -> writeback data. so SCHED_ULE might take these statistics too literally and give tasks like bsdtar(1) (in my case) too many resources, so other tasks which require io are struggling to get resources assigned to them (ls(1) in my case). of course SCHED_4BSD isn't perfect either. try using it and run the stress2 testsuite. your whole system will grind to a halt. mouse input drops below 1 Hz. even after killing all the stress2 tests, it will take a few minutes before the system becomes snappy again. cheers. alex > > -- > http://ache.vniz.net/
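alex's untar-vs-ls probe can also be packaged as a self-contained script so others can repeat it and report numbers. This is only a sketch with made-up sizes: 500 empty files and a 64 MB background write stand in for the +10G archive and /var/db/portsnap; substitute the real archive and directory to reproduce the timings he describes.

```shell
#!/bin/sh
# Self-contained version of the probe: time 'ls -la' of a crowded
# directory while a background writer generates disk I/O.
# File count and write size are made-up stand-ins for alex's setup.
dir=$(mktemp -d) || exit 1

i=0
while [ "$i" -lt 500 ]; do      # populate the directory to be listed
    : > "$dir/f$i"
    i=$((i + 1))
done

# background writer standing in for the huge untar
dd if=/dev/zero of="$dir/hog" bs=65536 count=1024 2>/dev/null &
hog=$!

start=$(date +%s)
ls -la "$dir" > /dev/null       # the "interactive" task being measured
end=$(date +%s)
elapsed=$((end - start))
echo "ls -la of 500 files took ${elapsed}s under write load"

wait "$hog" 2>/dev/null
rm -rf "$dir"
```

Run once under each scheduler with the same load; a large gap in the reported time, as in the 15s vs 3-5s figures above, would point at the same starvation.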
Re: SCHED_ULE should not be the default
On Sun, Dec 18, 2011 at 05:51:47PM +1100, Ian Smith wrote: > On Sun, 18 Dec 2011 02:37:52 +, Bruce Cran wrote: > > On 13/12/2011 09:00, Andrey Chernov wrote: > > > I observe ULE interactivity slowness even on single core machine (Pentium > > > 4) in very visible places, like 'ps ax' output stucks in the middle by ~1 > > > second. When I switch back to SHED_4BSD, all slowness is gone. > > > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > > buildworld" then logging into another console can take several seconds. > > Sometimes even the "Password:" prompt can take a couple of seconds to > appear > > after typing my username. > > I'd resigned myself to expecting this sort of behaviour as 'normal' on > my single core 1133MHz PIII-M. As a reproducable data point, running > 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat > the CPU while testing my manual fan control script, hogs it up pretty > much while regularly running the script below in another konsole to > check values - which often gets stuck half way, occasionally pausing > _twice_ before finishing. Switching back to the first konsole (on > another desktop) to kill the dd can also take a couple/few seconds. This issue not about slow machine under load, because the same slow machine under exact the same load, but with SCHED_4BSD is very fast to response interactively. I think we should not misinterpret interactivity with speed. I see no big speed (i.e. compilation time) differences, switching schedulers, but see big _interactivity_ difference. ULE in general tends to underestimate interactive processes in favour of background ones. It perhaps helps to compilation, but looks like slowpoke OS from the interactive user experience. 
-- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On Sun, 18 Dec 2011 02:37:52 +0000, Bruce Cran wrote: > On 13/12/2011 09:00, Andrey Chernov wrote: > > I observe ULE interactivity slowness even on single core machine (Pentium > > 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 > > second. When I switch back to SCHED_4BSD, all slowness is gone. > > I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine > with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 > buildworld" then logging into another console can take several seconds. > Sometimes even the "Password:" prompt can take a couple of seconds to appear > after typing my username. I'd resigned myself to expecting this sort of behaviour as 'normal' on my single core 1133MHz PIII-M. As a reproducible data point, running 'dd if=/dev/random of=/dev/null' in one konsole, specifically to heat the CPU while testing my manual fan control script, hogs it up pretty much while regularly running the script below in another konsole to check values - which often gets stuck half way, occasionally pausing _twice_ before finishing. Switching back to the first konsole (on another desktop) to kill the dd can also take a couple/few seconds.

t23# cat /root/bin/t23stat
#!/bin/sh
echo -n "`date` "
sysctl dev.cpu.0.freq dev.cpu.0.cx_usage
sysctl dev.acpi_ibm | egrep 'fan_|thermal'
sysctl hw.acpi.thermal.tz0.temperature
acpiconf -i0 | egrep 'State|Remain|Present|Volt'

Sure it's a slow machine, but it normally runs pretty smoothly. Anything with a bit of disk i/o, like buildworld, runs smooth as. This is on 8.2-R GENERIC, HZ=1000, 768MB with lots free, no swap in use. I'll definitely be trying SCHED_4BSD after updating to 8-stable unless a 'miracle cure' appears beforehand. cheers, Ian
Re: SCHED_ULE should not be the default
On 17 December 2011 14:00, Andriy Gapon wrote: > on 17/12/2011 23:20 Adrian Chadd said the following: >> This may -not- be a userland specific problem.. > That's an interesting idea. From the recent discussion about USB I can > conclude that USB threads run at higher priority than GEOM threads: PI_NET/PI_DISK vs > PRIBIO. The former is from the ithread range, the latter is from the regular > kernel range. Maybe it would make sense to give the GEOM threads a priority > from the ithread range too - given their role and importance. Ah, so I can just punt this to you? Sweet! *punt*. I haven't had time to dig into the network side of things but I do plan on doing this soon. Hopefully something really silly shows up. Adrian
Re: SCHED_ULE should not be the default
On 13/12/2011 09:00, Andrey Chernov wrote:
> I observe ULE interactivity slowness even on a single-core machine (Pentium 4) in very visible places, like 'ps ax' output getting stuck in the middle for ~1 second. When I switch back to SCHED_4BSD, all slowness is gone.

I'm also seeing problems with ULE on a dual-socket quad-core Xeon machine with 16 logical CPUs. If I run "tar xf somefile.tar" and "make -j16 buildworld" then logging into another console can take several seconds. Sometimes even the "Password:" prompt can take a couple of seconds to appear after typing my username.

-- 
Bruce Cran
Re: SCHED_ULE should not be the default
on 17/12/2011 23:20 Adrian Chadd said the following:
> Erm, just as a random question - since device drivers (and GEOM) run as separate threads, has anyone looked into what kind of effects the scheduler has on these?
>
> I definitely have measurable throughput/responsiveness differences between ULE and 4BSD (and preempt/non-preempt on 4BSD) on my MIPS boards when they're bridging traffic. I wonder if there's something strange going on with the scheduling and preemption of driver netisrs, taskqueues, the fast interrupt handlers, etc.
>
> This may -not- be a userland specific problem..

That's an interesting idea. From the recent discussion about USB I can conclude that USB threads run at higher priority than GEOM threads: PI_NET/PI_DISK vs PRIBIO. The former is from the ithread range, the latter is from the regular kernel range. Maybe it would make sense to give the GEOM threads a priority from the ithread range too, given their role and importance.

-- 
Andriy Gapon
Re: SCHED_ULE should not be the default
Erm, just as a random question - since device drivers (and GEOM) run as separate threads, has anyone looked into what kind of effects the scheduler has on these?

I definitely have measurable throughput/responsiveness differences between ULE and 4BSD (and preempt/non-preempt on 4BSD) on my MIPS boards when they're bridging traffic. I wonder if there's something strange going on with the scheduling and preemption of driver netisrs, taskqueues, the fast interrupt handlers, etc.

This may -not- be a userland specific problem..

Adrian
Re: SCHED_ULE should not be the default
on 17/12/2011 19:33 George Mitchell said the following:
> Summing up for the record, in my original test:
> 1. It doesn't matter whether X is running or not.
> 2. The problem is not limited to two or fewer CPUs. (It also happens for me on a six-CPU system.)
> 3. It doesn't require nCPU + 1 compute-bound processes, just nCPU.
>
> With nCPU compute-bound processes running, with SCHED_ULE, any other process that is interactive (which to me means frequently waiting for I/O) gets ABYSMAL performance -- over an order of magnitude worse than it gets with SCHED_4BSD under the same conditions.

I definitely do not see anything like this. Specifically:
- with X
- with 2 CPUs
- with nCPU and/or nCPU + 1 compute-bound processes
- with SCHED_ULE obviously :-)

I do not get "abysmal" performance for I/O active tasks. Perhaps there is something specific that you would want me to run and measure.

-- 
Andriy Gapon
Re: SCHED_ULE should not be the default
On 12/14/11 21:05, Oliver Pinter wrote: [...]
> Hi! Can you try with these settings:
>
> op@opn ~> sysctl kern.sched.
> kern.sched.cpusetsize: 8
> kern.sched.preemption: 0
> kern.sched.name: ULE
> kern.sched.slice: 13
> kern.sched.interact: 30
> kern.sched.preempt_thresh: 224
> kern.sched.static_boost: 152
> kern.sched.idlespins: 1
> kern.sched.idlespinthresh: 16
> kern.sched.affinity: 1
> kern.sched.balance: 1
> kern.sched.balance_interval: 133
> kern.sched.steal_htt: 1
> kern.sched.steal_idle: 1
> kern.sched.steal_thresh: 1
> kern.sched.topology_spec: 0, 1 0, 1
[...]

Sorry I didn't try this earlier, but I had time this morning. Apparently you can't change kern.sched.preemption without recompiling, so I did that. It didn't help, and subjectively it made interactive performance worse. I changed preempt_thresh and observed no difference. There were only a couple of small differences between your other settings and the 9.0-PRERELEASE defaults.

Summing up for the record, in my original test:
1. It doesn't matter whether X is running or not.
2. The problem is not limited to two or fewer CPUs. (It also happens for me on a six-CPU system.)
3. It doesn't require nCPU + 1 compute-bound processes, just nCPU.

With nCPU compute-bound processes running, with SCHED_ULE, any other process that is interactive (which to me means frequently waiting for I/O) gets ABYSMAL performance -- over an order of magnitude worse than it gets with SCHED_4BSD under the same conditions.

-- George
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 14:59, Luigi Rizzo wrote:
> It really looks much easier than i thought initially.

Awesome!

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 01:51:26PM -0800, Doug Barton wrote:
> On 12/16/2011 13:40, Michel Talon wrote:
>> Adrian Chadd said:
>>> Hi all,
>>>
>>> Can someone load a kernel module dynamically at boot-time?
>>>
>>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>>
>>> That may be an acceptable solution for now.
>>
>> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel; the problem is to migrate processes from one scheduler to the other.
>
> I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Runtime switching is a superset of loading as a module at boot time. In both cases you need to implement a generic interface between the scheduler and the rest of the system. The good thing, compared to 2002, is that the abstraction now exists: it is made up of all the functions and variables named sched_*() in sched_4bsd.c and sched_ule.c. I see there is a small number of #ifdef SCHED_ULE in a couple of files, but that can probably be fixed.

I believe all that is needed for dynamic scheduler loading is to create function pointers for all these names and initialize them when one of the scheduler modules is loaded. After that, runtime switching shouldn't require a lot of work either. The architecture and implementation i posted earlier (repeated below for convenience) should work, with just a bit of attention to locking the scheduler during a switch.

References:
http://kerneltrap.org/node/349
http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff

It really looks much easier than i thought initially.

cheers
luigi
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 14:16, Michel Talon wrote:
> Of course, you are perfectly right, and i had misunderstood Adrian's post.

Happens to the best of us. :)

> But if the problem is only to change scheduler by rebooting, i think it is no more expensive to compile a kernel with the other scheduler. Or is it that people never compile kernels nowadays?

That's part of it. For my money the other 2 big problems are, first, that we'd like to make it as easy on the 'make release' and installer processes as possible. I imagine (although I would not object to being proven wrong) that 1 kernel with knobs is easier to manage and less resource intensive than 2 kernels that differ only by this 1 feature. The other big problem is freebsd-update. While I assume that logic could be built into the system to handle this issue, if the guts can be built into the kernel itself why not do that instead?

Of lesser, but not insignificant, consideration is the possibility that at some point we'll have more than 2 scheduler options.

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 16 Dec 2011, at 22:51, Doug Barton wrote:
> On 12/16/2011 13:40, Michel Talon wrote:
>> Adrian Chadd said:
>>> Hi all,
>>>
>>> Can someone load a kernel module dynamically at boot-time?
>>>
>>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>>
>>> That may be an acceptable solution for now.
>>
>> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.
>
> I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Of course, you are perfectly right, and i had misunderstood Adrian's post. But if the problem is only to change scheduler by rebooting, i think it is no more expensive to compile a kernel with the other scheduler. Or is it that people never compile kernels nowadays? The ability to switch scheduler on a running machine would certainly be a more desirable way to test the best adaptation of the system to the load.

To come back to the problems in question about ULE, i must say i don't see obvious malfunctions in my own use (i had some problems of this sort long ago, but they disappeared with more recent FreeBSD).

-- 
Michel Talon
ta...@lpthe.jussieu.fr
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 13:40, Michel Talon wrote:
> Adrian Chadd said:
>> Hi all,
>>
>> Can someone load a kernel module dynamically at boot-time?
>>
>> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>>
>> That may be an acceptable solution for now.
>
> As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.

I think dynamically switching schedulers on a running system and loading one or the other at boot time are different problems, are they not?

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
Adrian Chadd said:
> Hi all,
>
> Can someone load a kernel module dynamically at boot-time?
>
> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>
> That may be an acceptable solution for now.

As Luigi explained, the problem is not to have code for both schedulers residing in the kernel, the problem is to migrate processes from one scheduler to the other.

-- 
Michel Talon
ta...@lpthe.jussieu.fr
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 12/16/2011 12:53, Adrian Chadd wrote:
> Hi all,
>
> Can someone load a kernel module dynamically at boot-time?
>
> Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?
>
> That may be an acceptable solution for now.

That, or a loader.conf tunable (which in the case of making them modules would basically amount to the same thing, right?). I've heard several really smart people with rather convincing explanations of why ULE is not the right choice of default for 2 cores or less. If we could ship one kernel with both schedulers available, it should be simple to modify the installer to choose the right one and put the right stuff in loader.conf.

Doug

-- 
[^L]
Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/
Re: switching schedulers (Re: SCHED_ULE should not be the default)
Hi all,

Can someone load a kernel module dynamically at boot-time?

Ie, instead of compiling it in, can 4bsd/ule be loaded as a KLD at boot-time, so the user can just change by rebooting?

That may be an acceptable solution for now.

Adrian
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 11:46:35AM +0100, Stefan Esser wrote:
> Am 16.12.2011 09:11, schrieb Luigi Rizzo:
> > The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).
> >
> > The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.
> >
> > I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.
>
> I think this is OK for user processes (which will receive reasonable relative priorities after running for a fraction of a second, anyway).
>
> But I'm not sure whether it is possible to use static priorities for (real-time) kernel threads, where priority inversion may occur if the current dynamic (relative) thread priorities are not preserved.

The word "priority" is too overloaded in this context, as it mixes configuration information (which I called "static priority", and which would really be better characterized as the "service parameters" you specify when you start a new thread) and scheduler state ("dynamic priority" in a priority-based scheduler; other schedulers have different state info, such as tickets, virtual times, deadlines, CPU affinity and so on).

What I meant to say is that the way I implemented it (and I believe it is almost the only practical way), on a change of scheduler all processes are requeued as if they had just started. It is then up to the active scheduler to adjust that initial state according to the evolution of the system (changing priorities, tickets, virtual times, deadlines, etc.).

> But not only must the relative priorities of the existing processes be preserved; new kernel threads must be created with matching (relative) priorities. This means that the schedulers may be switched at any time, but the priority values should be portable between schedulers to prevent dead-lock (or illegal order of execution?) of threads (AFAICT).

This issue (I think you have in mind priority inheritance, priority inversion and related issues) is almost irrelevant in FreeBSD, and I am really sorry to see that it comes up so frequently in discussions and sometimes also in documentation related to process schedulers. Apart from bugs in the implementation (see Bruce Evans' email from a few days ago), our CPU schedulers are a collection of heuristics without formally proved properties. So, as much as we can trust developers to come up with effective solutions:
- we cannot rely on priorities for correctness (mutual exclusion or deadlock avoidance);
- we don't have any support for real-time guarantees;
- average performance (which is why some of our priority-based schedulers may decide to implement priority inheritance) is not affected by events as infrequent as changing schedulers.

cheers
luigi
Re: SCHED_ULE should not be the default
2011/12/15 Steve Kargl:
> On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
>> I basically went through all the e-mail you just sent and identified 4 real reports on which we could work, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the few data points there and add more, if they want, or make more important clarifications, in particular about the Xorg presence (or rather not) in their workload.
>
> Your summary of my observations appears correct.
>
> I have grabbed an up-to-date /usr/src, built and installed world, and built and installed a new kernel on one of the nodes in my cluster. It has
>
> CPU: Dual Core AMD Opteron(tm) Processor 280 (2392.65-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x20f12 Family = f Model = 21 Stepping = 2
> Features=0x178bfbff MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> Features2=0x1
> AMD Features=0xe2500800
> AMD Features2=0x3
> real memory = 17179869184 (16384 MB)
> avail memory = 16269832192 (15516 MB)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 2 package(s) x 2 core(s)
>
> I can perform new tests with both ULE and 4BSD, but you'll need to be precise in the information you want collected (and how to collect the data) due to the rather limited amount of time I currently have.

It seems a perfect environment; just please make sure you build a debug-free userland (basically, setting MALLOC_PRODUCTION in jemalloc).

The first thing is: can you try reproducing your case? As far as I understood it, for you it was enough to run N + a small number of CPU-bound threads to show the performance penalty, so I'd ask you to start with dnetc or just your preferred CPU-bound workload and verify you can reproduce the issue. While doing so, please monitor thread bouncing and CPU utilization via 'top' (you don't need to be 100% precise, just get an idea, and keep an eye on things like excessive thread migration, threads stuck on one CPU, low CPU throughput).

One note: if your workloads need to do I/O, please use tmpfs or memory storage, in order to reduce I/O effects. Also, verify this doesn't happen with the 4BSD scheduler, just in case.

Finally, if the problem is still in place, please recompile your kernel with:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

and reproduce the issue. When you are in the middle of the scheduling issue, run:

# ktrdump -ctf > ktr-ule-problem-YOURNAME.out

and send it to the mailing list along with your dmesg and the information on CPU utilization you gathered from top(1).

That should cover it all, but if you have further questions, please just go ahead.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: switching schedulers (Re: SCHED_ULE should not be the default)
On 16.12.2011 09:11, Luigi Rizzo wrote:
> The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).
>
> The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.
>
> I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.

I think this is OK for user processes (which will receive reasonable relative priorities after running for a fraction of a second, anyway).

But I'm not sure whether it is possible to use static priorities for (real-time) kernel threads, where priority inversion may occur if the current dynamic (relative) thread priorities are not preserved.

And not only must the relative priorities of the existing processes be preserved; new kernel threads must also be created with matching (relative) priorities. This means that the schedulers may be switched at any time, but the priority values should be portable between schedulers to prevent dead-lock (or illegal order of execution?) of threads (AFAICT).

Regards, STefan
switching schedulers (Re: SCHED_ULE should not be the default)
On Fri, Dec 16, 2011 at 03:11:43AM +0100, C. P. Ghost wrote:
> On Thu, Dec 15, 2011 at 10:44 AM, Tom Evans wrote:
> > Real time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot time configuration. You don't have to recompile the kernel to change timecounter.
>
> Right.
>
> Switching the scheduler on the fly may be thinkable though. I could imagine a syscall that would suspend all scheduling, convert the bookkeeping data of one scheduler into the other scheduler's, and transfer control to the other scheduler. Of course, that would require some heavy hacking, as I would imagine that "cross-scheduler surgery" would result in a pretty hard to debug kernel (at least during development).

Since the subject has come up a few times: back in 2002 (boy, it's almost 10 years ago!) we did implement switchable schedulers on FreeBSD 4.x UP, and the diffs and a bit of documentation are still online; probably the architecture could be reused even now, or for the SMP case.

Announcement and brief description: http://kerneltrap.org/node/349
The patch referred to in there: http://info.iet.unipi.it/~luigi/ps_sched.20020719a.diff

The interesting part is probably the definition of the methods that schedulers should implement (see struct _sched_interface).

The switch from one scheduler to another was implemented with a sysctl. This calls the sched_move() method of the current (i.e. old) scheduler, which extracts all ready processes from its own "queues" (however they are implemented) and reinserts them onto the new scheduler's "queues" using its (new) setrunqueue() method. You don't need to bother with blocked processes, as the scheduler doesn't know much about them.

I am not preserving the thread's dynamic "priority" (think of accumulated work, affinity etc.) when switching schedulers, as that is expected to be an infrequent event, and so in the end it doesn't really matter -- at a switch, threads are inserted in the scheduler as newly created ones, using only the static priority as a parameter.

At the time I did not address the SMP case for several reasons, but they are all gone now:
- I did not have a suitable test system;
- SMP support was still in a state of flux;
- I did not understand the KSE concept;
- I did not have an algorithm for proportional share scheduling (the actual goal of the project) in an SMP context.

cheers
luigi

> A more general solution could even be a separate userland scheduler process a la L4 [*], but since we don't have lightweight IPC in the kernel (yet, or never), it would require even heavier black wizardry. But nice and flexible it would be. ;-)
>
> [*] Refs:
> - https://github.com/l4ka/pistachio
> - http://www.systems.ethz.ch/education/past-courses/fall-2010/aos/lectures/wk13-scheduling-print.pdf
>
> Regards,
> -cpghost.
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 10:44 AM, Tom Evans wrote:
> Real time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot time configuration. You don't have to recompile the kernel to change timecounter.

Right.

Switching the scheduler on the fly may be thinkable though. I could imagine a syscall that would suspend all scheduling, convert the bookkeeping data of one scheduler into the other scheduler's, and transfer control to the other scheduler. Of course, that would require some heavy hacking, as I would imagine that "cross-scheduler surgery" would result in a pretty hard to debug kernel (at least during development).

A more general solution could even be a separate userland scheduler process a la L4 [*], but since we don't have lightweight IPC in the kernel (yet, or never), it would require even heavier black wizardry. But nice and flexible it would be. ;-)

[*] Refs:
- https://github.com/l4ka/pistachio
- http://www.systems.ethz.ch/education/past-courses/fall-2010/aos/lectures/wk13-scheduling-print.pdf

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
> I basically went through all the e-mail you just sent and identified 4 real reports on which we could work, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the few data points there and add more, if they want, or make more important clarifications, in particular about the Xorg presence (or rather not) in their workload.

Your summary of my observations appears correct.

I have grabbed an up-to-date /usr/src, built and installed world, and built and installed a new kernel on one of the nodes in my cluster. It has

CPU: Dual Core AMD Opteron(tm) Processor 280 (2392.65-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x20f12 Family = f Model = 21 Stepping = 2
Features=0x178bfbff
Features2=0x1
AMD Features=0xe2500800
AMD Features2=0x3
real memory = 17179869184 (16384 MB)
avail memory = 16269832192 (15516 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)

I can perform new tests with both ULE and 4BSD, but you'll need to be precise in the information you want collected (and how to collect the data) due to the rather limited amount of time I currently have.

To summarize my workload: on the master node of my cluster I start a job that sends N slave jobs to the node of interest. The slaves perform nearly identical cpu-bound floating point computations, so the expectation is that each slave should take nearly the same amount of cpu-time to complete its task. Communication occurs between only the master and a slave, at the start of the process and when it finishes. The communication is over a GigE ipv4 internal network. The slaves do not read or write to disk.

-- 
Steve
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa:
> On 12/15/2011 11:56 AM, Attilio Rao wrote:
>> So, as a very first thing, can you try the following:
>> - Same codebase, etc. etc.
>> - Make the test 4 times, discard the first and ministat the other 3
>> - Reboot
>> - Change the steal_thresh value
>> - Make the test 4 times, discard the first and ministat the other 3
>>
>> Then report the discarded values and the ministat'ed ones and we will have more information, I guess (also, I don't think devfs contention should play a role here, so never mind about it for now).
>
> Results and data at
> http://www.tancsa.com/ule-bsd.html

I'm not totally sure: what does burnP6 do? Is it a CPU-bound workload? Also, how many threads are spawned in your case for parallel bzip2?

Also, it would be very good if you could run these tests against newer -CURRENT (with userland and kernel debugging off).

Thanks a lot for your hard work,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:56 AM, Attilio Rao wrote:
> So, as very first thing, can you try the following:
> - Same codebase, etc. etc.
> - Make the test 4 times, discard the first and ministat for the other 3
> - Reboot
> - Change the steal_thresh value
> - Make the test 4 times, discard the first and ministat for the other 3
>
> Then report discarded values and the ministated one and we will have more informations I guess (also, I don't think devfs contention should play a role here, thus nevermind about it for now).

Results and data at

http://www.tancsa.com/ule-bsd.html

---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
On 12/15/11 15:20, Steven Hartland wrote: > With all the discussion I thought I'd give a buildworld > benchmark a go here on a spare 24 core machine. ULE > tested fine but with 4BSD it won't even boot, panicking > with the following:- > http://screensnapr.com/v/hwysGV.png > > This is on a clean 8.2-RELEASE-p4 > > Upgrading to RELENG_9 fixed this but it's a bit concerning > that just changing the scheduler would cause the machine > to panic on boot. > > It's only a single run so variance could be high but here's > the result of a buildworld on this machine running the > two different schedulers:- > 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys > ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys > > What really sticks out is that this is over double that > of an 8.2 buildworld on the same machine with the same > kernel > ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys > > This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking > on boot under 8.2. > > So for this use ULE vs 4BSD is neither here nor there > but 9.0 buildworld is very slow (x2 slower) compared > with 8.2 so that's a bigger question in my mind. > > Regards > Steve > All of our 8.2-STABLE boxes with ncpu >= 4 compile the OS in half the time a compilation of FreeBSD 9/10 needs. I guess this is due to the huge LLVM contribution which is now part of the source tree. Even if you allow for building the whole LLVM suite (and not just the pieces of it that FreeBSD builds by default for CLANG purposes), it takes another 10 to 20 minutes, depending on the architecture of the underlying host. Timing a kernel or world build and then presenting the inverse of that number isn't a good benchmark, in my opinion. Therefore I prefer "artificial" benchmarks: if compilation time is what matters, have a fixed set of programs to compile and time that. Well, your one-shot test does suggest that there is indeed a marginal advantage for SCHED_ULE, if the number of cores is big enough (said to be n > 2 in this thread).
But I'm a bit disappointed by the very small advantage on that 24-core hog. Oliver
Re: SCHED_ULE should not be the default
On Dec 15, 2011, at 6:26 PM, Attilio Rao wrote: > 2011/12/13 Daniel Kalchev : >> >> >> On 13.12.11 09:36, Jeremy Chadwick wrote: >>> >>> I personally would find it interesting if someone with a higher-end system >>> (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test >>> (changing -jX to -j{numofcores} of course). >> >> >> Is a 4-way 8-core Opteron ok? That is 32 cores, 64GB RAM. >> >> Testing with buildworld in my opinion is not adequate, as it involves way >> too much I/O. Any advice on proper testing methodology? > > I'm sure that I/O and pmap subsystem contention (because of > buildworld) and TLB shootdown overhead (because of 32 CPUs) will be so > overwhelming that you are not really going to benchmark the scheduler > activity at all. Can't pmap / TLB be tuned for 32 CPUs and 64GB of RAM? > > However I still don't get what you want to verify exactly? The obvious: is SCHED_ULE better or worse than SCHED_4BSD on such a platform? The problem is how to test "interactivity" -- that is a blade server and doesn't really have a display and keyboard, nor does it have X etc. I have a spare pair of those that might be put to crunch tests to see how things compare for different scenarios - but I need ideas what to test, really. Daniel
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 05:26:27PM +0100, Attilio Rao wrote: > 2011/12/13 Jeremy Chadwick : > > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: > >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an > >> > issue. And yes, there are cases where SCHED_ULE shows much better > >> > performance than SCHED_4BSD. [...] > >> > >> Do we have any proof at hand for such cases where SCHED_ULE performs > >> much better than SCHED_4BSD? Whenever the subject comes up, it is > >> mentioned that SCHED_ULE has better performance on boxes with ncpu > > >> 2. But in the end I see here contradictory statements. People > >> complain about poor performance (especially in scientific environments), > >> and others counter that this is not the case. > >> > >> Within our department, we developed a highly scalable code for planetary > >> science purposes on imagery. It utilizes present GPUs via OpenCL if > >> present. Otherwise it grabs as many cores as it can. > >> By the end of this year I'll get a new desktop box based on Intel's new > >> Sandy Bridge-E architecture with plenty of memory. If the colleague who > >> developed the code is willing to perform some benchmarks on the same > >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most > >> recent Suse. For FreeBSD I intend also to look at performance with both > >> different schedulers available. > > > > This is in no way shape or form the same kind of benchmark as what > > you're planning to do, but I thought I'd throw it out there for folks to > > take in as they see fit. > > > > I know folks were focused mainly on buildworld. > > > > I personally would find it interesting if someone with a higher-end > > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the > > same test (changing -jX to -j{numofcores} of course). > > > > -- > > | Jeremy Chadwick jdc at parodius.com | > > | Parodius Networking http://www.parodius.com/ | > > | UNIX Systems Administrator Mountain View, CA, US | > > | Making life hard for others since 1977. PGP 4BD6C0CB | > > > > sched_ule > > === > > - time make -j2 buildworld > > 1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w > > - time make -j2 buildkernel > > 640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w > > > > sched_4bsd > > > > - time make -j2 buildworld > > 1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w > > - time make -j2 buildkernel > > 638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w > > > > software > > == > > * sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011 > > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011 > > Hi Jeremy, > thanks for the time you spent on this. > > However, I wanted to ask/let you note 3 things: > 1) Did you use 2 different code bases for the test? (one updated on > December 1 and another one on December 12) No; src-all (/usr/src on this system) was not updated between December 1st and December 12th PST. I do believe I updated it today (15th PST). I can/will obviously hold off so that we have a consistent code base for comparing numbers between schedulers during buildworld and/or buildkernel. > 2) Please note that you should have repeated this test several times > (basically until you get a standard deviation which is > acceptable with ministat) and report the ministat output This is the first time I have heard of ministat(1). I'm pretty sure I see what it's for and how it applies to this situation, but boy that man page could use some clarification (I have 3 people looking at this thing right now trying to figure out what means what in the graph :-) ).
Regarding multiple tests: yup, you're absolutely right, the only way to do it would be to run a sequence of tests repeatedly (probably 10 per scheduler). Reboots and rm -fr /usr/obj/* would be required after each test too, to guarantee empty kernel caches (of all types) consistently every time. What I posted was supposed to give people just a "general idea" if there was any gigantic difference between the two, and there really isn't. But, as others have stated (and you below), buildworld may not be an effective way to "benchmark" what we're trying to test. Hence me wondering exactly what would make for a good test. Example: 1. Run + background some program that "beats on things" (I really don't know what; creation/deletion of threads? CPU benchmark? bonnie++?), with output going to /dev/null. 2. Run + background "time make -j2 buildworld" with output going to /dev/null 3. Record/save output from "time". 4. rm -fr /usr/obj && shutdown -r now 5. Repeat all steps ~10 times 6. Adjust kernel configuration file to use other scheduler 7. Repeat steps 1-5.
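Jeremy's proposed steps 1-4 for a single iteration might look like the sketch below. The tight-loop "beater" is only a placeholder for whatever load generator gets picked in step 1, the log file name is made up, and the reboot in step 4 is left to the operator:

```sh
#!/bin/sh
# One iteration of the proposed test (sketch, FreeBSD):
# background CPU load + timed buildworld, results appended to a log.

ncpu=$(sysctl -n hw.ncpu)
pids=""
for n in $(jot "$ncpu"); do            # step 1: one CPU hog per core
    while :; do :; done &
    pids="$pids $!"
done

# steps 2-3: time the build; only the time(1) summary goes to the log
/usr/bin/time -a -o times.log make -C /usr/src -j2 buildworld > /dev/null 2>&1

kill $pids                             # stop the background load

rm -rf /usr/obj/*                      # step 4: clean, then reboot by hand
```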
Re: SCHED_ULE should not be the default
On 15/12/2011 14:20, Steven Hartland wrote: So for this use ULE vs 4BSD is neither here nor there but 9.0 buildworld is very slow (x2 slower) compared with 8.2 so that's a bigger question in my mind. clang is new in 9.0 and takes a long time to build. -- Bruce Cran
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa : > On 12/15/2011 11:42 AM, Attilio Rao wrote: >> >> I'm now thinking of a better test case for this: can you try that on a >> tmpfs volume? > > There is enough RAM in the box so that it should not touch the disk, and > I was sending the output to /dev/null, so it was not writing to the disk. > >> >> Also, what filesystem were you using? > > UFS > >> How many CPUs were in place? > > 4 > >> Did you reboot before changing the steal_thresh value? > > No. So, as a very first thing, can you try the following: - Same codebase, etc. etc. - Make the test 4 times, discard the first and ministat for the other 3 - Reboot - Change the steal_thresh value - Make the test 4 times, discard the first and ministat for the other 3 Then report the discarded values and the ministated ones and we will have more information, I guess (also, I don't think devfs contention should play a role here, so never mind about it for now). Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
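The procedure Attilio describes maps onto a small script. This is only a sketch: the benchmark command (buildworld here) and file names are placeholders, and ministat(1) is assumed from the FreeBSD base system:

```sh
#!/bin/sh
# Sketch of the test procedure above. Run once per setting, e.g.:
#   ./bench.sh before.txt        (stock kern.sched.steal_thresh)
#   ...reboot, sysctl kern.sched.steal_thresh=<new value>...
#   ./bench.sh after.txt
# then compare the two sets with:  ministat before.txt after.txt
log=$1
for i in 1 2 3 4; do
    rm -rf /usr/obj/*
    /usr/bin/time -a -o "$log.raw" \
        make -C /usr/src -j4 buildworld > /dev/null 2>&1
done
# keep only the wall-clock column and discard run 1 (warm-up)
awk '{print $1}' "$log.raw" | tail -n 3 > "$log"
```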
Re: SCHED_ULE should not be the default
On 12/15/2011 11:42 AM, Attilio Rao wrote: > > I'm now thinking of a better test case for this: can you try that on a > tmpfs volume? There is enough RAM in the box so that it should not touch the disk, and I was sending the output to /dev/null, so it was not writing to the disk. > > Also, what filesystem were you using? UFS > How many CPUs were in place? 4 > Did you reboot before changing the steal_thresh value? No. ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/15 Mike Tancsa : > On 12/15/2011 11:26 AM, Attilio Rao wrote: >> >> Hi Mike, >> was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? > > Hi Attilio, > It was the same codebase. > > >> Could you retry the bench checking CPU usage and possible thread >> migration around for both cases? > > I can, but how do I do that ? I'm now thinking of a better test case for this: can you try that on a tmpfs volume? Also, what filesystem were you using? How many CPUs were in place? Did you reboot before changing the steal_thresh value? Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
On 12/15/2011 11:26 AM, Attilio Rao wrote: > > Hi Mike, > was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? Hi Attilio, It was the same codebase. > Could you retry the bench checking CPU usage and possible thread > migration around for both cases? I can, but how do I do that ? ---Mike -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
2011/12/13 Jeremy Chadwick : > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote: >> > Not fully right, boinc defaults to run on idprio 31 so this isn't an >> > issue. And yes, there are cases where SCHED_ULE shows much better >> > performance than SCHED_4BSD. [...] >> >> Do we have any proof at hand for such cases where SCHED_ULE performs >> much better than SCHED_4BSD? Whenever the subject comes up, it is >> mentioned that SCHED_ULE has better performance on boxes with ncpu > >> 2. But in the end I see here contradictory statements. People >> complain about poor performance (especially in scientific environments), >> and others counter that this is not the case. >> >> Within our department, we developed a highly scalable code for planetary >> science purposes on imagery. It utilizes present GPUs via OpenCL if >> present. Otherwise it grabs as many cores as it can. >> By the end of this year I'll get a new desktop box based on Intel's new >> Sandy Bridge-E architecture with plenty of memory. If the colleague who >> developed the code is willing to perform some benchmarks on the same >> hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most >> recent Suse. For FreeBSD I intend also to look at performance with both >> different schedulers available. > > This is in no way shape or form the same kind of benchmark as what > you're planning to do, but I thought I'd throw it out there for folks to > take in as they see fit. > > I know folks were focused mainly on buildworld. > > I personally would find it interesting if someone with a higher-end > system (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the > same test (changing -jX to -j{numofcores} of course). > > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, US | > | Making life hard for others since 1977. PGP 4BD6C0CB | > > sched_ule > === > - time make -j2 buildworld > 1689.831u 229.328s 18:46.20 170.4% 6566+2051k 432+4264io 4565pf+0w > - time make -j2 buildkernel > 640.542u 87.737s 9:01.38 134.5% 6490+1920k 134+5968io 0pf+0w > > sched_4bsd > > - time make -j2 buildworld > 1662.793u 206.908s 17:12.02 181.1% 6578+2054k 23750+4271io 6451pf+0w > - time make -j2 buildkernel > 638.717u 76.146s 8:34.90 138.8% 6530+1927k 6415+5903io 0pf+0w > > software > == > * sched_ule test: FreeBSD 8.2-STABLE, Thu Dec 1 04:37:29 PST 2011 > * sched_4bsd test: FreeBSD 8.2-STABLE, Mon Dec 12 22:42:54 PST 2011 Hi Jeremy, thanks for the time you spent on this. However, I wanted to ask/let you note 3 things: 1) Did you use 2 different code bases for the test? (one updated on December 1 and another one on December 12) 2) Please note that you should have repeated this test several times (basically until you get a standard deviation which is acceptable with ministat) and report the ministat output 3) The difference is less than 2%, which I suspect is statistically meaningless/effectively the same. I'm not really even surprised ULE is not faster than 4BSD in this case because usually buildworld/buildkernel tests are driven for the vast majority by I/O overhead rather than scheduler capacity. It would be more interesting to analyze how buildworld does while another type of workload is going on. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
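For reference, the "less than 2%" figure can be checked against the user-time columns Jeremy posted (1689.831u for sched_ule vs 1662.793u for sched_4bsd):

```sh
# Percent difference in buildworld user time between the two schedulers,
# computed from the numbers quoted above. Pure awk, nothing FreeBSD-specific.
awk 'BEGIN {
    ule = 1689.831   # sched_ule buildworld user seconds
    bsd = 1662.793   # sched_4bsd buildworld user seconds
    printf "%.1f%%\n", (ule - bsd) / bsd * 100   # prints 1.6%
}'
```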
Re: SCHED_ULE should not be the default
2011/12/13 Daniel Kalchev : > > > On 13.12.11 09:36, Jeremy Chadwick wrote: >> >> I personally would find it interesting if someone with a higher-end system >> (e.g. 2 physical CPUs, with 6 or 8 cores per CPU) was to do the same test >> (changing -jX to -j{numofcores} of course). > > > Is a 4-way 8-core Opteron ok? That is 32 cores, 64GB RAM. > > Testing with buildworld in my opinion is not adequate, as it involves way > too much I/O. Any advice on proper testing methodology? I'm sure that I/O and pmap subsystem contention (because of buildworld) and TLB shootdown overhead (because of 32 CPUs) will be so overwhelming that you are not really going to benchmark the scheduler activity at all. However, I still don't get what you want to verify, exactly. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
Re: SCHED_ULE should not be the default
2011/12/14 Mike Tancsa : > On 12/13/2011 7:01 PM, m...@freebsd.org wrote: >> >> Has anyone experiencing problems tried to set sysctl >> kern.sched.steal_thresh=1 ? >> >> I don't remember what our specific problem at $WORK was, perhaps it >> was just interrupt threads not getting serviced fast enough, but we've >> hard-coded this to 1 and removed the code that sets it in >> sched_initticks(). The same effect should be had by setting the >> sysctl after a box is up. > > FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file > > pbzip2 -v -c big > /dev/null > > with burnP6 running in the background, > > sysctl kern.sched.steal_thresh=1 > vs > sysctl kern.sched.steal_thresh=3 > >
>     N           Min           Max        Median           Avg        Stddev
> x  10     38.005022      38.42238     38.194648     38.165052    0.15546188
> +   9     38.695417     40.595544     39.392127     39.435384    0.59814114
> Difference at 95.0% confidence
>         1.27033 +/- 0.412636
>         3.32852% +/- 1.08119%
> (Student's t, pooled s = 0.425627)
> > a value of 1 is *slightly* faster. Hi Mike, was that just the same codebase with the switch SCHED_4BSD/SCHED_ULE? Also, the results here should be in the 3% interval for the avg case, which is not yet at the 'alarm level' but could still be an indication. I still suspect I/O plays a big role here, however, thus it could be determined by other factors. Could you retry the bench checking CPU usage and possible thread migration around for both cases? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein
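Mike's comparison can be reproduced with a loop along these lines. This is a sketch, not his exact script: file names and run counts are placeholders, and a CPU hog (burnP6 or similar) is assumed to be running already:

```sh
#!/bin/sh
# Time pbzip2 over a large file at two kern.sched.steal_thresh settings,
# then compare the wall-clock samples with ministat(1).
for thresh in 1 3; do
    sysctl kern.sched.steal_thresh=$thresh
    for i in 1 2 3 4 5 6 7 8 9 10; do
        /usr/bin/time -a -o "raw$thresh.txt" pbzip2 -v -c big > /dev/null
    done
    awk '{print $1}' "raw$thresh.txt" > "thresh$thresh.txt"  # real seconds
done
ministat thresh1.txt thresh3.txt
```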
Re: SCHED_ULE should not be the default
2011/12/9 George Mitchell : > dnetc is an open-source program from http://www.distributed.net/. It > tries a brute-force approach to cracking RC4 puzzles and also computes > optimal Golomb rulers. It starts up one process per CPU and runs at > nice 20 and is, for all intents and purposes, 100% compute bound. [Posting on the first message of the thread] I basically went through all the e-mails sent so far and identified 4 real reports we could work on, summarized in the attached Excel file. I'd like George, Steve, Doug, Andrey and Mike to review the data there and add more, if they want, or make further important clarifications, in particular about the presence (or not) of Xorg in their workload. I've read a couple of messages in the thread pointing the finger at Xorg as being excessively CPU-intensive and I think they are right; we might try to find a solution for that at some point, but it is really a very edge case. George's and Steve's cases, instead, look very different from this and I want to analyze them in detail. George already provided schedgraph traces; for the others, if they cannot provide them directly, I'd really appreciate it if they would at least describe the workload in detail so that I get a chance to reproduce it. If someone else thinks he has a specific problem that is not characterized by one of the cases above, please let me know and I will put it in the chart. Thanks for the hard work you guys put in pointing out ULE's problems, I think we will get to the bottom of this if we keep sharing thoughts and reports. Attilio -- Peace can only be achieved by understanding - A. Einstein
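For anyone wanting to supply the schedgraph traces Attilio asks for, the usual recipe is roughly the following. The kernel option values here are from memory and should be checked against the SchedGraph notes in src/tools/sched/ before use:

```sh
# Kernel config additions needed for scheduler tracing (sketch):
#   options KTR
#   options KTR_ENTRIES=262144
#   options KTR_COMPILE=(KTR_SCHED)
#   options KTR_MASK=(KTR_SCHED)
# After booting the traced kernel and reproducing the problem:
sysctl debug.ktr.mask=0        # freeze the trace buffer
ktrdump -ct > ktr.out          # dump entries with timestamps
# Feed the dump to the viewer shipped in the source tree:
python /usr/src/tools/sched/schedgraph.py ktr.out
```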
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 10:32 AM, Steven Hartland wrote: > Lars Engels wrote: >> >> 9.0 ships with gcc and clang which both need to be compiled, 8.2 only >> has gcc. > > > Ahh, any reason we need both, and is it possible to disable clang? man src.conf add WITHOUT_CLANG=yes to /etc/src.conf -- Eitan Adler
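Following Eitan's pointer, the change is a one-liner (src.conf(5) documents the knob):

```sh
# Skip building clang during buildworld; see man 5 src.conf.
echo 'WITHOUT_CLANG=yes' >> /etc/src.conf
```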
Re: SCHED_ULE should not be the default
Lars Engels wrote: 9.0 ships with gcc and clang which both need to be compiled, 8.2 only has gcc. Ahh, any reason we need both, and is it possible to disable clang? Regards Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 02:20:04PM -, Steven Hartland wrote: > With all the discussion I thought I'd give a buildworld > benchmark a go here on a spare 24 core machine. ULE > tested fine but with 4BSD it won't even boot, panicking > with the following:- > http://screensnapr.com/v/hwysGV.png > > This is on a clean 8.2-RELEASE-p4 > > Upgrading to RELENG_9 fixed this but it's a bit concerning > that just changing the scheduler would cause the machine > to panic on boot. > > It's only a single run so variance could be high but here's > the result of a buildworld on this machine running the > two different schedulers:- > 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys > ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys > > What really sticks out is that this is over double that > of an 8.2 buildworld on the same machine with the same > kernel > ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys 9.0 ships with gcc and clang which both need to be compiled, 8.2 only has gcc. > > This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking > on boot under 8.2. > > So for this use ULE vs 4BSD is neither here nor there > but 9.0 buildworld is very slow (x2 slower) compared > with 8.2 so that's a bigger question in my mind. > > Regards > Steve
Re: SCHED_ULE should not be the default
With all the discussion I thought I'd give a buildworld benchmark a go here on a spare 24 core machine. ULE tested fine but with 4BSD it won't even boot, panicking with the following:- http://screensnapr.com/v/hwysGV.png This is on a clean 8.2-RELEASE-p4 Upgrading to RELENG_9 fixed this but it's a bit concerning that just changing the scheduler would cause the machine to panic on boot. It's only a single run so variance could be high but here's the result of a buildworld on this machine running the two different schedulers:- 4BSD: 24m54.10s real 2h43m12.42s user 56m20.07s sys ULE: 23m54.68s real 2h34m59.04s user 50m59.91s sys What really sticks out is that this is over double that of an 8.2 buildworld on the same machine with the same kernel ULE: 11m12.76s real 1h27m59.39s user 28m59.57s sys This was run on a 9.0-PRERELEASE kernel due to 4BSD panicking on boot under 8.2. So for this use ULE vs 4BSD is neither here nor there but 9.0 buildworld is very slow (x2 slower) compared with 8.2 so that's a bigger question in my mind. Regards Steve
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick wrote: > On Thu, Dec 15, 2011 at 12:39:50AM +0100, O. Hartmann wrote: >> On 12/14/11 18:54, Tom Evans wrote: >> > I believe the correct thing to do is to put some extra documentation >> > into the handbook about scheduler choice, noting the potential issues >> > with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> > switch scheduler would also help? > > Replying to Tom's comment here: > > It is already easy to switch schedulers. You change the option in your > kernel config, rebuild kernel (world isn't necessary as long as you > haven't csup'd between your last rebuild and now), make installkernel, > shutdown -r now, done.

Your definition of 'easy' differs wildly from mine. How is that in any way 'easy' to do across 200 servers?

> > If what you're proposing is to make the scheduler changeable in > real-time? I think that would require a **lot** of work for something > that very few people would benefit from (please stop for a moment and > think about the majority of the userbase, not just niche environments; I > say this politely, not with any condescension BTW). Sure, it'd be > "nice to have", but should be extremely low on the priority list (IMO).

Real-time scheduler changing would be insane! I was thinking that both/any/all schedulers could be compiled into the kernel, and the choice of which one to use becomes a boot-time configuration. You don't have to recompile the kernel to change timecounter.

Cheers
Tom
Re: SCHED_ULE should not be the default
On 15/12/2011 00:42, Jeremy Chadwick wrote: > It is already easy to switch schedulers. You change the option in your > kernel config, rebuild kernel (world isn't necessary as long as you > haven't csup'd between your last rebuild and now), make installkernel, > shutdown -r now, done. > > If what you're proposing is to make the scheduler changeable in > real-time? I think that would require a **lot** of work for something > that very few people would benefit from (please stop for a moment and > think about the majority of the userbase, not just niche environments; I > say this politely, not with any condescension BTW). Sure, it'd be > "nice to have", but should be extremely low on the priority list (IMO).

Somewhere in between might be a good idea, it seems to me: viz., change a setting in loader.conf and reboot to switch to a new scheduler. Having to juggle different kernels is no big deal for the likes of you and me, but it is quite a barrier in many environments.

Cheers,
Matthew

--
Dr Matthew J Seaman MA, D.Phil. 7 Priory Courtyard Flat 3 PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate JID: matt...@infracaninophile.co.uk Kent, CT11 9PW
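[Editorial sketch of the proposal above: nothing like this exists today. The tunable name below is invented purely for illustration, and it assumes a kernel built with both schedulers, which the current source does not support.]

```
# /boot/loader.conf -- HYPOTHETICAL sketch of the proposed knob.
# No such tunable exists; it assumes both SCHED_ULE and SCHED_4BSD
# were compiled into one kernel, which is not currently possible.
kern.sched.name="4BSD"   # or "ULE"; would take effect after reboot
```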
Re: SCHED_ULE should not be the default
Jeremy Chadwick wrote: > It is already easy to switch schedulers. You change the > option in your kernel config, rebuild kernel (world isn't > necessary as long as you haven't csup'd between your last > rebuild and now), make installkernel, shutdown -r now, > done.

and you have thereby shot freebsd-update in the foot, because you are no longer using a generic kernel.

> If what you're proposing is to make the scheduler changeable > in real-time? I think that would require a **lot** of work > for something that very few people would benefit from ...

Switching on the fly sounds frightfully difficult, as long as 4BSD and ULE are separate code bases. (It might not be so bad if a tunable or 3 could be added to ULE, so that it could be configured to behave like 4BSD.) However, the freebsd-update complication could in principle be relieved by building both schedulers into the generic kernel, with the choice being configurable in loader.conf. It would still take a reboot to switch, but not a kernel rebuild. Of course there may be practical issues, e.g. name collisions.
Re: SCHED_ULE should not be the default
On Thu, 15 Dec 2011 03:05:12 +0100 Oliver Pinter wrote: > On 12/15/11, O. Hartmann wrote: > > On 12/14/11 18:54, Tom Evans wrote: > >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > >> wrote: > >>> > >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether > >>> to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > >>> > >> > >> Please do not do this. This thread has shown that ULE performs > >> poorly in very specific scenarios where the server is loaded with > >> NCPU+1 CPU bound processes, and brought forward more complaints > >> about interactivity in X (I've never noticed this, and use a > >> FreeBSD desktop daily). > > > > I would highly appreciate a decission against SCHED_ULE as the > > default scheduler! SCHED_4BSD is considered a more mature entity > > and obviously it seems that SCHED_ULE needs some refinements to > > achieve a better level of quality. > > > >> > >> On the other hand, we have very many benchmarks showing how poorly > >> 4BSD scales on things like postgresql. We get much more load out of > >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > >> easy to look at what you do and say "well, what suits my > >> environment is clearly the best default", but I think there are > >> probably more users typically running IO bound processes than CPU > >> bound processes. > > > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? > > Shouldn't you compare SCHED_ULE and SCHED_4BSD on the very same > > platform? > > > > Development of SCHED_ULE has been focused very much on DB like > > PostgreSQL, no wonder the performance benefit. But this is also a > > very specific scneario where SCHED_ULE shows a real benefit > > compared to SCHED_4BSD. > > > >> > >> I believe the correct thing to do is to put some extra > >> documentation into the handbook about scheduler choice, noting the > >> potential issues with loading NCPU+1 CPU bound processes.
Perhaps > >> making it easier to switch scheduler would also help? > > > > Many people more experst in the issue than myself revealed some > > issues in the code of both SCHED_ULE and even SCHED_4BSD. It would > > be a pitty if all the discussions get flushed away like a > > "toilette-busisness" as it has been done all the way in the past. > > > > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > > performing benchmarks is questionable and do not reveal much of the > > issues, but it is better than nothing. I'm always surprised by the > > worse performance of FreeBSD when it comes to threaded I/O. The > > differences between Linux and FreeBSD of the same development > > maturity are tremendous and scaring! > > > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven > > HPC box. Most benchmark around for testing hardware are performed > > with Linux and Linux seems to make the race in nearly every > > scenario. It would be highly appreciable and interesting to see how > > Linux and FreeBSD would perform in SPEC on the same hardware > > platform. This is only an idea. Without a suitable benchmark with a > > codebase understood the discussion is in many aspects pointless > > -both ways. > > > > > >> > >> Cheers > >> > >> Tom > >> > >> References: > >> > >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade > >> ___ > > > > > > Hi! > > Can you try with this settings: > > op@opn ~> sysctl kern.sched. 
> kern.sched.cpusetsize: 8 > kern.sched.preemption: 0 > kern.sched.name: ULE > kern.sched.slice: 13 > kern.sched.interact: 30 > kern.sched.preempt_thresh: 224 > kern.sched.static_boost: 152 > kern.sched.idlespins: 1 > kern.sched.idlespinthresh: 16 > kern.sched.affinity: 1 > kern.sched.balance: 1 > kern.sched.balance_interval: 133 > kern.sched.steal_htt: 1 > kern.sched.steal_idle: 1 > kern.sched.steal_thresh: 1 > kern.sched.topology_spec: > > 0, 1 > > > 0, 1 > > > > > > Most of them from 7-STABLE settings, and with this, "works for me". > This an laptop with core2 duo cpu (with enabled powerd), and my kernel > config is here: > http://oliverp.teteny.bme.hu/freebsd/kernel_conf

And try doing what is shown here: http://www.youtube.com/watch?v=1CLCp-dqWu0 so that the mouse cursor and Xorg do NOT freeze for a split second or more... Then I'll see how good your ULE really is ;)
Re: SCHED_ULE should not be the default
On 15.12.11 01:39, O. Hartmann wrote: On 12/14/11 18:54, Tom Evans wrote: On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell wrote: Dear Secret Masters of FreeBSD: Can we have a decision on whether to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? Please do not do this. This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, and brought forward more complaints about interactivity in X (I've never noticed this, and use a FreeBSD desktop daily). I would highly appreciate a decision against SCHED_ULE as the default scheduler! SCHED_4BSD is considered a more mature entity and obviously it seems that SCHED_ULE needs some refinements to achieve a better level of quality.

My logic would be: if SCHED_ULE works better on multi-CPU systems, or if SCHED_4BSD works poorly on multi-CPU systems, then by all means keep SCHED_ULE as the default scheduler. We are at the end of 2011 and the number of single- or dual-core CPU systems is decreasing. Most people would just try the newest FreeBSD version on their newest hardware and on that basis make an "informed" decision whether it is worth it. If SCHED_ULE gives better performance on newer hardware, then again it should be the default.

Then, FreeBSD is used in an extremely wide set of different environments. A scheduler that might benefit a single-CPU, simple-architecture X workstation may be damaging to the performance of a multi-CPU, NUMA-based server with a large number of non-interactive processes running. Perhaps a knob should be provided, with sufficient documentation, for those who will not go so far as to recompile the kernel (the majority of users, I would guess).

I tried switching my RELENG8 desktop from SCHED_ULE to SCHED_4BSD yesterday and cannot see any measurable difference in responsiveness. My 'stress test' is typically a Flash game that gets Firefox into an almost unresponsive state and eats one of the CPU cores -- but no difference.
Well, Flash has its own set of problems on FreeBSD, but these are typical "desktop" uses. Running 100% compute-intensive processes in the background is not.

Daniel

PS: As to why Linux is "better" in these usages: they do not care much about doing things "right", but rather about achieving performance. In my opinion, most of us are with FreeBSD for the "do it right" attitude.
Re: SCHED_ULE should not be the default
On 12/15/11, Jeremy Chadwick wrote: > On Thu, Dec 15, 2011 at 03:05:12AM +0100, Oliver Pinter wrote: >> On 12/15/11, O. Hartmann wrote: >> > On 12/14/11 18:54, Tom Evans wrote: >> >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell >> >> wrote: >> >>> >> >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >> >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >> >>> >> >> >> >> Please do not do this. This thread has shown that ULE performs poorly >> >> in very specific scenarios where the server is loaded with NCPU+1 CPU >> >> bound processes, and brought forward more complaints about >> >> interactivity in X (I've never noticed this, and use a FreeBSD desktop >> >> daily). >> > >> > I would highly appreciate a decission against SCHED_ULE as the default >> > scheduler! SCHED_4BSD is considered a more mature entity and obviously >> > it seems that SCHED_ULE needs some refinements to achieve a better level >> > of quality. >> > >> >> >> >> On the other hand, we have very many benchmarks showing how poorly >> >> 4BSD scales on things like postgresql. We get much more load out of >> >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's >> >> easy to look at what you do and say "well, what suits my environment >> >> is clearly the best default", but I think there are probably more >> >> users typically running IO bound processes than CPU bound processes. >> > >> > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't >> > you compare SCHED_ULE and SCHED_4BSD on the very same platform? >> > >> > Development of SCHED_ULE has been focused very much on DB like >> > PostgreSQL, no wonder the performance benefit. But this is also a very >> > specific scneario where SCHED_ULE shows a real benefit compared to >> > SCHED_4BSD. 
>> > >> >> >> >> I believe the correct thing to do is to put some extra documentation >> >> into the handbook about scheduler choice, noting the potential issues >> >> with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> >> switch scheduler would also help? >> > >> > Many people more experst in the issue than myself revealed some issues >> > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty >> > if all the discussions get flushed away like a "toilette-busisness" as >> > it has been done all the way in the past. >> > >> > >> > Well, I'd like to see a kind of "standardized" benchmark. Like on >> > openbenchmark.org or at phoronix.com. I know that Phoronix' way of >> > performing benchmarks is questionable and do not reveal much of the >> > issues, but it is better than nothing. I'm always surprised by the worse >> > performance of FreeBSD when it comes to threaded I/O. The differences >> > between Linux and FreeBSD of the same development maturity are >> > tremendous and scaring! >> > >> > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC >> > box. Most benchmark around for testing hardware are performed with Linux >> > and Linux seems to make the race in nearly every scenario. It would be >> > highly appreciable and interesting to see how Linux and FreeBSD would >> > perform in SPEC on the same hardware platform. This is only an idea. >> > Without a suitable benchmark with a codebase understood the discussion >> > is in many aspects pointless -both ways. >> > >> > >> >> >> >> Cheers >> >> >> >> Tom >> >> >> >> References: >> >> >> >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png >> >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade >> >> ___ >> >> Hi! >> >> Can you try with this settings: >> op@opn ~> sysctl kern.sched. > > I'm replying with a list of each setting which differs compared to > RELENG_8 stock on our ULE systems. 
Note that our ULE systems are 1 > physical CPU with 4 cores. On the other system, which has 4 cores, I use 7-STABLE, because I haven't had enough time to upgrade it, and the system has some custom patches. The values I sent in the previous mail are mostly based on this 4-core system. > >> kern.sched.cpusetsize: 8 > > I see no such tunable/sysctl on any of our RELENG_8 and RELENG_7 > systems. Nor do I find any references to it in /usr/src (on any > system). Is this a RELENG_9 setting? Please explain where it comes > from. I hope it's not a custom kernel patch... Yes, this is 9-STABLE. > >> kern.sched.preemption: 0 > > This differs; default value is 1. PREEMPTION is disabled via kernel config. > >> kern.sched.name: ULE >> kern.sched.slice: 13 >> kern.sched.interact: 30 > >> kern.sched.preempt_thresh: 224 > > This differs; default value is 64. The "magic value" of 224 has been > discussed in the past, in this thread even. This magic value was discussed here a year or a year and a half ago, first for 8-STABLE. > >> kern.sched.static_boost: 152 > > This differs; on our systems it's 160. > >> kern.sched.idlespins: 1 > >> kern.sched.idlespinthresh: 16 > > This differs; on our sys
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 03:05:12AM +0100, Oliver Pinter wrote: > On 12/15/11, O. Hartmann wrote: > > On 12/14/11 18:54, Tom Evans wrote: > >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > >> wrote: > >>> > >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to > >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > >>> > >> > >> Please do not do this. This thread has shown that ULE performs poorly > >> in very specific scenarios where the server is loaded with NCPU+1 CPU > >> bound processes, and brought forward more complaints about > >> interactivity in X (I've never noticed this, and use a FreeBSD desktop > >> daily). > > > > I would highly appreciate a decission against SCHED_ULE as the default > > scheduler! SCHED_4BSD is considered a more mature entity and obviously > > it seems that SCHED_ULE needs some refinements to achieve a better level > > of quality. > > > >> > >> On the other hand, we have very many benchmarks showing how poorly > >> 4BSD scales on things like postgresql. We get much more load out of > >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > >> easy to look at what you do and say "well, what suits my environment > >> is clearly the best default", but I think there are probably more > >> users typically running IO bound processes than CPU bound processes. > > > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > > you compare SCHED_ULE and SCHED_4BSD on the very same platform? > > > > Development of SCHED_ULE has been focused very much on DB like > > PostgreSQL, no wonder the performance benefit. But this is also a very > > specific scneario where SCHED_ULE shows a real benefit compared to > > SCHED_4BSD. > > > >> > >> I believe the correct thing to do is to put some extra documentation > >> into the handbook about scheduler choice, noting the potential issues > >> with loading NCPU+1 CPU bound processes. 
Perhaps making it easier to > >> switch scheduler would also help? > > > > Many people more experst in the issue than myself revealed some issues > > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty > > if all the discussions get flushed away like a "toilette-busisness" as > > it has been done all the way in the past. > > > > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > > performing benchmarks is questionable and do not reveal much of the > > issues, but it is better than nothing. I'm always surprised by the worse > > performance of FreeBSD when it comes to threaded I/O. The differences > > between Linux and FreeBSD of the same development maturity are > > tremendous and scaring! > > > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC > > box. Most benchmark around for testing hardware are performed with Linux > > and Linux seems to make the race in nearly every scenario. It would be > > highly appreciable and interesting to see how Linux and FreeBSD would > > perform in SPEC on the same hardware platform. This is only an idea. > > Without a suitable benchmark with a codebase understood the discussion > > is in many aspects pointless -both ways. > > > > > >> > >> Cheers > >> > >> Tom > >> > >> References: > >> > >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade > >> ___ > > Hi! > > Can you try with this settings: > op@opn ~> sysctl kern.sched. I'm replying with a list of each setting which differs compared to RELENG_8 stock on our ULE systems. Note that our ULE systems are 1 physical CPU with 4 cores. > kern.sched.cpusetsize: 8 I see no such tunable/sysctl on any of our RELENG_8 and RELENG_7 systems. Nor do I find any references to it in /usr/src (on any system). Is this a RELENG_9 setting? Please explain where it comes from. 
I hope it's not a custom kernel patch... > kern.sched.preemption: 0 This differs; default value is 1. > kern.sched.name: ULE > kern.sched.slice: 13 > kern.sched.interact: 30 > kern.sched.preempt_thresh: 224 This differs; default value is 64. The "magic value" of 224 has been discussed in the past, in this thread even. > kern.sched.static_boost: 152 This differs; on our systems it's 160. > kern.sched.idlespins: 1 > kern.sched.idlespinthresh: 16 This differs; on our systems it's 4. > Most of them from 7-STABLE settings, and with this, "works for me". > This an laptop with core2 duo cpu (with enabled powerd), and my kernel > config is here: > http://oliverp.teteny.bme.hu/freebsd/kernel_conf -- | Jeremy Chadwickjdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
Re: SCHED_ULE should not be the default
On 12/15/11, O. Hartmann wrote: > On 12/14/11 18:54, Tom Evans wrote: >> On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell >> wrote: >>> >>> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >>> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >>> >> >> Please do not do this. This thread has shown that ULE performs poorly >> in very specific scenarios where the server is loaded with NCPU+1 CPU >> bound processes, and brought forward more complaints about >> interactivity in X (I've never noticed this, and use a FreeBSD desktop >> daily). > > I would highly appreciate a decission against SCHED_ULE as the default > scheduler! SCHED_4BSD is considered a more mature entity and obviously > it seems that SCHED_ULE needs some refinements to achieve a better level > of quality. > >> >> On the other hand, we have very many benchmarks showing how poorly >> 4BSD scales on things like postgresql. We get much more load out of >> our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's >> easy to look at what you do and say "well, what suits my environment >> is clearly the best default", but I think there are probably more >> users typically running IO bound processes than CPU bound processes. > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > you compare SCHED_ULE and SCHED_4BSD on the very same platform? > > Development of SCHED_ULE has been focused very much on DB like > PostgreSQL, no wonder the performance benefit. But this is also a very > specific scneario where SCHED_ULE shows a real benefit compared to > SCHED_4BSD. > >> >> I believe the correct thing to do is to put some extra documentation >> into the handbook about scheduler choice, noting the potential issues >> with loading NCPU+1 CPU bound processes. Perhaps making it easier to >> switch scheduler would also help? > > Many people more experst in the issue than myself revealed some issues > in the code of both SCHED_ULE and even SCHED_4BSD. 
It would be a pitty > if all the discussions get flushed away like a "toilette-busisness" as > it has been done all the way in the past. > > > Well, I'd like to see a kind of "standardized" benchmark. Like on > openbenchmark.org or at phoronix.com. I know that Phoronix' way of > performing benchmarks is questionable and do not reveal much of the > issues, but it is better than nothing. I'm always surprised by the worse > performance of FreeBSD when it comes to threaded I/O. The differences > between Linux and FreeBSD of the same development maturity are > tremendous and scaring! > > It is a long time since I saw a SPEC benchmark on a FreeBSD driven HPC > box. Most benchmark around for testing hardware are performed with Linux > and Linux seems to make the race in nearly every scenario. It would be > highly appreciable and interesting to see how Linux and FreeBSD would > perform in SPEC on the same hardware platform. This is only an idea. > Without a suitable benchmark with a codebase understood the discussion > is in many aspects pointless -both ways. > > >> >> Cheers >> >> Tom >> >> References: >> >> http://people.freebsd.org/~kris/scaling/mysql-freebsd.png >> http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade >> ___ > > Hi! Can you try with this settings: op@opn ~> sysctl kern.sched. kern.sched.cpusetsize: 8 kern.sched.preemption: 0 kern.sched.name: ULE kern.sched.slice: 13 kern.sched.interact: 30 kern.sched.preempt_thresh: 224 kern.sched.static_boost: 152 kern.sched.idlespins: 1 kern.sched.idlespinthresh: 16 kern.sched.affinity: 1 kern.sched.balance: 1 kern.sched.balance_interval: 133 kern.sched.steal_htt: 1 kern.sched.steal_idle: 1 kern.sched.steal_thresh: 1 kern.sched.topology_spec: 0, 1 0, 1 Most of them from 7-STABLE settings, and with this, "works for me". 
This is a laptop with a Core 2 Duo CPU (with powerd enabled), and my kernel config is here: http://oliverp.teteny.bme.hu/freebsd/kernel_conf
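[Editorial note: the settings Oliver lists could be applied at boot via /etc/sysctl.conf rather than typed by hand. This is a sketch, not a recommendation: whether each knob is writable at runtime varies by branch, and kern.sched.preemption merely reflects the compile-time PREEMPTION option, so it is omitted here.]

```
# /etc/sysctl.conf -- ULE values as reported above (9-STABLE, Core 2 Duo).
# Verify each knob exists and is writable on your branch first, e.g.:
#   sysctl -d kern.sched.preempt_thresh
kern.sched.slice=13
kern.sched.interact=30
kern.sched.preempt_thresh=224
kern.sched.static_boost=152
kern.sched.idlespins=1
kern.sched.idlespinthresh=16
kern.sched.affinity=1
kern.sched.balance=1
kern.sched.balance_interval=133
kern.sched.steal_htt=1
kern.sched.steal_idle=1
kern.sched.steal_thresh=1
```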
Re: SCHED_ULE should not be the default
On Thu, Dec 15, 2011 at 12:39:50AM +0100, O. Hartmann wrote: > On 12/14/11 18:54, Tom Evans wrote: > > On the other hand, we have very many benchmarks showing how poorly > > 4BSD scales on things like postgresql. We get much more load out of > > our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > > easy to look at what you do and say "well, what suits my environment > > is clearly the best default", but I think there are probably more > > users typically running IO bound processes than CPU bound processes. > > You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't > you compare SCHED_ULE and SCHED_4BSD on the very same platform?

Agreed -- this is a bad comparison. Again, I'm going to tell people to do the one thing that's painful and nobody likes to do: *look at commits* and pay close attention to the branches and any commits that involve "tagging" for a release (so you can determine what "version" of the code you might be running).

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_ule.c
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/sched_4bsd.c

I'm a bit busy today, otherwise I would offer to go over the SCHED_4BSD changes between 7.0-RELEASE and 8.1-RELEASE (I would need Tom to confirm those are the exact versions being used; I wish people would stop saying things like "FreeBSD x.y" because it's inaccurate). But the data is there at the above URLs, including the committers and those involved.

> > I believe the correct thing to do is to put some extra documentation > > into the handbook about scheduler choice, noting the potential issues > > with loading NCPU+1 CPU bound processes. Perhaps making it easier to > > switch scheduler would also help?

Replying to Tom's comment here:

It is already easy to switch schedulers. You change the option in your kernel config, rebuild kernel (world isn't necessary as long as you haven't csup'd between your last rebuild and now), make installkernel, shutdown -r now, done.
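[Editorial sketch: the procedure described above boils down to one changed option in the kernel config. Assuming a stock source tree; MYKERNEL is a placeholder config name, not anything from the thread.]

```
# /usr/src/sys/<arch>/conf/MYKERNEL -- placeholder name.
# Build and activate with:
#   make buildkernel installkernel KERNCONF=MYKERNEL && shutdown -r now
include   GENERIC
ident     MYKERNEL
nooptions SCHED_ULE      # drop GENERIC's default scheduler
options   SCHED_4BSD     # select 4BSD instead
```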
If what you're proposing is to make the scheduler changeable in real-time? I think that would require a **lot** of work for something that very few people would benefit from (please stop for a moment and think about the majority of the userbase, not just niche environments; I say this politely, not with any condescension BTW). Sure, it'd be "nice to have", but should be extremely low on the priority list (IMO). > Many people more experst in the issue than myself revealed some issues > in the code of both SCHED_ULE and even SCHED_4BSD. It would be a pitty > if all the discussions get flushed away like a "toilette-busisness" as > it has been done all the way in the past. Gut feeling says this is what will happen, and that's because the people who are (and have in the past) touching the scheduler bits are not involved in this conversation. We're not going to get anywhere unless those people are involved and are available to make adjustments/etc. I would love to start CC'ing them all, but I don't think that's necessarily effective. I will take the time to point out/remind folks that the number of people who *truly understand* the schedulers are few and far between. We're talking single-digit numbers, folks. And those people are already busy enough as-is. This makes solving this problem difficult. So, what I think WOULD be effective would be for someone to catalogue a list of their systems/specifications/benchmarks/software/etc. that show exactly where the problems are in their workspace when using ULE vs. 4BSD, or vice-versa. That may give the developers some leads as to how to progress. Let's also not forget about the compiler ordeal; gcc versions greatly differ (some folks overwrite the default base gcc with ones in ports), and then there's the clang stuff... Sigh. > Well, I'd like to see a kind of "standardized" benchmark. Like on > openbenchmark.org or at phoronix.com. 
I know that Phoronix' way of > performing benchmarks is questionable and do not reveal much of the > issues, but it is better than nothing.

I would love to run such benchmarks on all of our systems, but I have no idea what kind of benchmark suites/etc. would be beneficial for the developers who maintain/touch the schedulers. You understand what I'm saying? For example, some folks earlier in the thread said the best thing to do for this would be buildworld, but then further follow-ups from others said buildworld is not effective given the I/O demands. Furthermore, I want whatever benchmark/app suite thing to be minimal as hell. It should be standalone, no dependencies (or only 1 or 2).

Regarding threading: a colleague of mine, an ex-co-worker who now works at Apple as a developer, wrote a C program while he was at my current workplace which -- pardon my French -- "beat the shit out of our Solaris boxes, thread-wise". It was customisable via the command line. The thing got some of our Solaris machines up to load averages of nearly 42000 (yes, you read that right!), and s
Re: SCHED_ULE should not be the default
On 12/14/11 12:54, Tom Evans wrote: [...] This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, [...]

Minor correction: the problem occurs when there are nCPU compute-bound processes, not nCPU + 1.

-- George Mitchell
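[Editorial sketch: George's correction is easy to probe directly by starting exactly nCPU pure compute loops (no I/O) and watching interactivity and per-CPU load in top(1). The duration argument and worker structure below are illustrative, not from the thread.]

```shell
#!/bin/sh
# Spawn one compute-bound worker per CPU for $1 seconds (default 5).
# This is George's nCPU case; raise the count by one by hand to
# compare against the NCPU+1 case discussed earlier in the thread.
NCPU=$(sysctl -n hw.ncpu 2>/dev/null || nproc)
DURATION=${1:-5}
echo "starting $NCPU compute-bound workers for ${DURATION}s"
i=0
while [ "$i" -lt "$NCPU" ]; do
    (
        end=$(( $(date +%s) + DURATION ))
        # Pure busy loop: no I/O, no sleeps, just CPU.
        while [ "$(date +%s)" -lt "$end" ]; do :; done
    ) &
    i=$((i + 1))
done
wait
echo "all $NCPU workers exited"
```

While it runs, watch `top -P` (per-CPU display) in another terminal; on the ULE versions complained about in this thread, the report is that some cores sit partly idle even though there is exactly one runnable worker per core.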
Re: SCHED_ULE should not be the default
On 12/14/11 18:54, Tom Evans wrote: > On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell > wrote: >> >> Dear Secret Masters of FreeBSD: Can we have a decision on whether to >> change back to SCHED_4BSD while SCHED_ULE gets properly fixed? >> > > Please do not do this. This thread has shown that ULE performs poorly > in very specific scenarios where the server is loaded with NCPU+1 CPU > bound processes, and brought forward more complaints about > interactivity in X (I've never noticed this, and use a FreeBSD desktop > daily).

I would highly appreciate a decision against SCHED_ULE as the default scheduler! SCHED_4BSD is considered a more mature entity, and obviously it seems that SCHED_ULE needs some refinements to achieve a better level of quality.

> > On the other hand, we have very many benchmarks showing how poorly > 4BSD scales on things like postgresql. We get much more load out of > our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's > easy to look at what you do and say "well, what suits my environment > is clearly the best default", but I think there are probably more > users typically running IO bound processes than CPU bound processes.

You compare SCHED_ULE on FBSD 8.1 with SCHED_4BSD on FBSD 7.0? Shouldn't you compare SCHED_ULE and SCHED_4BSD on the very same platform?

Development of SCHED_ULE has been focused very much on DBs like PostgreSQL, no wonder the performance benefit. But this is also a very specific scenario where SCHED_ULE shows a real benefit compared to SCHED_4BSD.

> > I believe the correct thing to do is to put some extra documentation > into the handbook about scheduler choice, noting the potential issues > with loading NCPU+1 CPU bound processes. Perhaps making it easier to > switch scheduler would also help?

Many people more expert in the issue than myself revealed some issues in the code of both SCHED_ULE and even SCHED_4BSD.
It would be a pity if all these discussions get flushed away like "toilet business", as has happened in the past. Well, I'd like to see a kind of "standardized" benchmark, like on openbenchmark.org or at phoronix.com. I know that Phoronix' way of performing benchmarks is questionable and does not reveal much of the issues, but it is better than nothing. I'm always surprised by the poor performance of FreeBSD when it comes to threaded I/O. The differences between Linux and FreeBSD of the same development maturity are tremendous and scary! It is a long time since I saw a SPEC benchmark on a FreeBSD-driven HPC box. Most benchmarks around for testing hardware are performed with Linux, and Linux seems to win the race in nearly every scenario. It would be highly appreciable and interesting to see how Linux and FreeBSD would perform in SPEC on the same hardware platform. This is only an idea. Without a suitable benchmark whose codebase is understood, the discussion is in many respects pointless -- both ways. > > Cheers > > Tom > > References: > > http://people.freebsd.org/~kris/scaling/mysql-freebsd.png > http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade
Re: SCHED_ULE should not be the default
On Wed, Dec 14, 2011 at 05:54:15PM +0000, Tom Evans wrote: > brought forward more complaints about interactivity in X (I've never > noticed this, and use a FreeBSD desktop daily). ... that was me, but I forgot to add that it almost never happens, and it can only be triggered when there are processes that want to take up 100% of the CPU running on the system along with X and friends. Don't want to spread FUD, I've been happily using FreeBSD on the desktop for a decade and ULE seems to work great. Marcus
Re: SCHED_ULE should not be the default
I'm not on the Release Engineering Team, and in fact don't have a src commit bit ... but this close to a major release, no, it's too late to change the default. mcl
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011 21:34:35 +0400, Andrey Chernov wrote: > On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: > > On 13 December 2011 01:00, Andrey Chernov wrote: > > > > >> If the algorithm ULE does not contain problems - it means the > > >> problem has Core2Duo, or in a piece of code that uses the ULE > > >> scheduler. > > > > > > I observe ULE interactivity slowness even on single core machine > > > (Pentium 4) in very visible places, like 'ps ax' output getting stuck in > > > the middle for ~1 second. When I switch back to SCHED_4BSD, all > > > slowness is gone. > > > > Are you able to provide KTR traces of the scheduler results? > > Something that can be fed to schedgraph? > > Sorry, this machine is not mine anymore. I try SCHED_ULE on Core 2 > Duo instead and don't notice this effect, but it is overall pretty > fast comparing to that Pentium 4. > Please give me detailed instructions on how to do it - I'll do it ... It would be a shame if this theme once again ends in nothing but discussion ... :(
Re: SCHED_ULE should not be the default
On Wed, Dec 14, 2011 at 11:06 AM, George Mitchell wrote: > > Dear Secret Masters of FreeBSD: Can we have a decision on whether to > change back to SCHED_4BSD while SCHED_ULE gets properly fixed? > Please do not do this. This thread has shown that ULE performs poorly in very specific scenarios where the server is loaded with NCPU+1 CPU bound processes, and brought forward more complaints about interactivity in X (I've never noticed this, and use a FreeBSD desktop daily). On the other hand, we have very many benchmarks showing how poorly 4BSD scales on things like postgresql. We get much more load out of our 8.1 ULE DB and web servers than we do out of our 7.0 ones. It's easy to look at what you do and say "well, what suits my environment is clearly the best default", but I think there are probably more users typically running IO bound processes than CPU bound processes. I believe the correct thing to do is to put some extra documentation into the handbook about scheduler choice, noting the potential issues with loading NCPU+1 CPU bound processes. Perhaps making it easier to switch scheduler would also help? Cheers Tom References: http://people.freebsd.org/~kris/scaling/mysql-freebsd.png http://suckit.blog.hu/2009/10/05/freebsd_8_is_it_worth_to_upgrade ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 02:22:48AM -0800, Adrian Chadd wrote: > On 13 December 2011 01:00, Andrey Chernov wrote: > > >> If the algorithm ULE does not contain problems - it means the problem > >> has Core2Duo, or in a piece of code that uses the ULE scheduler. > > > > I observe ULE interactivity slowness even on single core machine (Pentium > > 4) in very visible places, like 'ps ax' output gets stuck in the middle for ~1 > > second. When I switch back to SCHED_4BSD, all slowness is gone. > > Are you able to provide KTR traces of the scheduler results? Something > that can be fed to schedgraph? Sorry, this machine is not mine anymore. I try SCHED_ULE on Core 2 Duo instead and don't notice this effect, but it is overall pretty fast comparing to that Pentium 4. -- http://ache.vniz.net/
Re: SCHED_ULE should not be the default
On 12/13/2011 7:01 PM, m...@freebsd.org wrote: > > Has anyone experiencing problems tried to set sysctl > kern.sched.steal_thresh=1 ? > > I don't remember what our specific problem at $WORK was, perhaps it > was just interrupt threads not getting serviced fast enough, but we've > hard-coded this to 1 and removed the code that sets it in > sched_initticks(). The same effect should be had by setting the > sysctl after a box is up. FWIW, this does impact the performance of pbzip2 on an i7. Using a 1.1G file:

pbzip2 -v -c big > /dev/null

with burnP6 running in the background, sysctl kern.sched.steal_thresh=1 (x) vs sysctl kern.sched.steal_thresh=3 (+):

    N        Min        Max        Median     Avg        Stddev
x  10  38.005022  38.42238   38.194648  38.165052  0.15546188
+   9  38.695417  40.595544  39.392127  39.435384  0.59814114
Difference at 95.0% confidence
        1.27033 +/- 0.412636
        3.32852% +/- 1.08119%
        (Student's t, pooled s = 0.425627)

A value of 1 is *slightly* faster. -- --- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, m...@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/
Re: SCHED_ULE should not be the default
On 12/09/11 19:57, George Mitchell wrote: On 12/09/11 10:17, Attilio Rao wrote: [...] More precisely I'd be interested in KTR traces. To be even more precise: With a completely stable GENERIC configuration (or otherwise please post your kernel config) please add the following:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

While you are in the middle of the slow-down (so once it is well established) please do:

# sysctl debug.ktr.cpumask=""

wonderland# sysctl debug.ktr.cpumask=""
debug.ktr.cpumask: sysctl: debug.ktr.cpumask: Invalid argument

In the end go with:

# ktrdump -ctf > ktr-ule-problem.out

It's 44MB, so it's at http://www.m5p.com/~george/ktr-ule-problem.out There have been 22 downloads of this file so far; does anyone who looked at it have any results to report? Dear Secret Masters of FreeBSD: Can we have a decision on whether to change back to SCHED_4BSD while SCHED_ULE gets properly fixed? -- George Mitchell and send the file to this mailing list. Thanks, Attilio I hope this helps. -- George Mitchell
Re: SCHED_ULE should not be the default
On Wed, 14 Dec 2011, Ivan Klymenko wrote: ?? Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker ??: On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: If the algorithm ULE does not contain problems - it means the problem has Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote in a mailing list that specifically in my case (Core2Duo) partially helps the following patch: --- sched_ule.c.orig2011-11-24 18:11:48.0 +0200 +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 ... @@ -2118,13 +2119,21 @@ struct td_sched *ts; THREAD_LOCK_ASSERT(td, MA_OWNED); + if (td->td_pri_class & PRI_FIFO_BIT) + return; + ts = td->td_sched; + /* +* We used up one time slice. +*/ + if (--ts->ts_slice > 0) + return; This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload. Yes, I did it for as long as possible to delay the execution of the code in section: I don't understand what you are doing here, but recently noticed that the timeslicing in SCHED_4BSD is completely broken. This bug may be a feature. SCHED_4BSD doesn't have its own timeslice counter like ts_slice above. It uses `switchticks' instead. But switchticks hasn't been usable for this purpose since long before SCHED_4BSD started using it for this purpose. switchticks is reset on every context switch, so it is useless for almost all purposes -- any interrupt activity on a non-fast interrupt clobbers it. Removing the check of ts_slice in the above and always returning might give a similar bug to the SCHED_4BSD one. I noticed this while looking for bugs in realtime scheduling. In the above, returning early for PRI_FIFO_BIT also skips most of the periodic functionality. 
In SCHED_4BSD, returning early is the usual case, so the PRI_FIFO_BIT might as well not be checked, and it is the unusual fifo scheduling case (which is supposed to only apply to realtime priority threads) which has a chance of working as intended, while the usual roundrobin case degenerates to an impure form of fifo scheduling (iit is impure since priority decay still works so it is only fifo among threads of the same priority). ... @@ -2144,9 +2153,6 @@ if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) tdq->tdq_ridx = tdq->tdq_idx; } - ts = td->td_sched; - if (td->td_pri_class & PRI_FIFO_BIT) - return; if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { /* * We used a tick; charge it to the thread so @@ -2157,11 +2163,6 @@ sched_priority(td); } /* -* We used up one time slice. -*/ - if (--ts->ts_slice > 0) - return; - /* * We're out of time, force a requeue at userret(). */ ts->ts_slice = sched_slice; With the ts_slice check here before you moved it, removing it might give buggy behaviour closer to SCHED_4BSD. and refusal to use options FULL_PREEMPTION 4-5 years ago, I found that any form of PREMPTION was a pessimization for at least makeworld (since it caused too many context switches). PREEMPTION was needed for the !SMP case, at least partly because of the broken switchticks (switchticks, when it works, gives voluntary yielding by some CPU hogs in the kernel. PREEMPTION, if it works, should do this better). So I used PREEMPTION in the !SMP case and not for the SMP case. I didn't worry about the CPU hogs in the SMP case since it is rare to have more than 1 of them and 1 will use at most 1/2 of a multi-CPU system. But no one has unsubscribed to my letter, my patch helps or not in the case of Core2Duo... There is a suspicion that the problems stem from the sections of code associated with the SMP... Maybe I'm in something wrong, but I want to help in solving this problem ... The main point of SCHED_ULE is to give better affinity for multi-CPU systems. 
But the `multi' apparently needs to be strictly more than 2 for it to break even. Bruce
Re: SCHED_ULE should not be the default
On 12/13/11 18:02, Marcus Reid wrote: [...] The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an "interactive" process. X then becomes the bottleneck for other processes that would otherwise be "interactive". Try 'renice -20 ' and see if that makes your problems go away. Marcus [...] renice on X has no effect. Stopping my compute-bound dnetc process immediately speeds everything up; restarting it slows it back down. On 12/13/11 19:01, m...@freebsd.org wrote: > [...] > Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? > [...] 1 appears to be the default value for kern.sched.steal_thresh. -- George Mitchell
Re: SCHED_ULE should not be the default
В Tue, 13 Dec 2011 16:01:56 -0800 m...@freebsd.org пишет: > On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko wrote: > > В Wed, 14 Dec 2011 00:04:42 +0100 > > Jilles Tjoelker пишет: > > > >> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > >> > If the algorithm ULE does not contain problems - it means the > >> > problem has Core2Duo, or in a piece of code that uses the ULE > >> > scheduler. I already wrote in a mailing list that specifically in > >> > my case (Core2Duo) partially helps the following patch: > >> > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 > >> > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > >> > @@ -794,7 +794,8 @@ > >> > * 1.5 * balance_interval. > >> > */ > >> > balance_ticks = max(balance_interval / 2, 1); > >> > - balance_ticks += random() % balance_interval; > >> > +// balance_ticks += random() % balance_interval; > >> > + balance_ticks += ((int)random()) % balance_interval; > >> > if (smp_started == 0 || rebalance == 0) > >> > return; > >> > tdq = TDQ_SELF(); > >> > >> This avoids a 64-bit division on 64-bit platforms but seems to > >> have no effect otherwise. Because this function is not called very > >> often, the change seems unlikely to help. > > > > Yes, this section does not apply to this problem :) > > Just I posted the latest patch which i using now... > > > >> > >> > @@ -2118,13 +2119,21 @@ > >> > struct td_sched *ts; > >> > > >> > THREAD_LOCK_ASSERT(td, MA_OWNED); > >> > + if (td->td_pri_class & PRI_FIFO_BIT) > >> > + return; > >> > + ts = td->td_sched; > >> > + /* > >> > + * We used up one time slice. > >> > + */ > >> > + if (--ts->ts_slice > 0) > >> > + return; > >> > >> This skips most of the periodic functionality (long term load > >> balancer, saving switch count (?), insert index (?), interactivity > >> score update for long running thread) if the thread is not going to > >> be rescheduled right now. > >> > >> It looks wrong but it is a data point if it helps your workload. 
> > > > Yes, I did it for as long as possible to delay the execution of the > > code in section: ... > > #ifdef SMP > > /* > > * We run the long term load balancer infrequently on the > > first cpu. */ > > if (balance_tdq == tdq) { > > if (balance_ticks && --balance_ticks == 0) > > sched_balance(); > > } > > #endif > > ... > > > >> > >> > tdq = TDQ_SELF(); > >> > #ifdef SMP > >> > /* > >> > * We run the long term load balancer infrequently on the > >> > first cpu. */ > >> > - if (balance_tdq == tdq) { > >> > - if (balance_ticks && --balance_ticks == 0) > >> > + if (balance_ticks && --balance_ticks == 0) { > >> > + if (balance_tdq == tdq) > >> > sched_balance(); > >> > } > >> > #endif > >> > >> The main effect of this appears to be to disable the long term load > >> balancer completely after some time. At some point, a CPU other > >> than the first CPU (which uses balance_tdq) will set balance_ticks > >> = 0, and sched_balance() will never be called again. > >> > > > > That is, for the same reason as above in the text... > > > >> It also introduces a hypothetical race condition because the > >> access to balance_ticks is no longer restricted to one CPU under a > >> spinlock. > >> > >> If the long term load balancer may be causing trouble, try setting > >> kern.sched.balance_interval to a higher value with unpatched code. > > > > I checked it in the first place - but it did not help fix the > > situation... > > > > The impression of malfunction rebalancing... > > It seems that the thread is passed on to the same core that is > > loaded and so... Perhaps this is a consequence of an incorrect > > definition of the topology CPU? 
> > > >> > >> > @@ -2144,9 +2153,6 @@ > >> > if > >> > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > >> > tdq->tdq_ridx = tdq->tdq_idx; } > >> > - ts = td->td_sched; > >> > - if (td->td_pri_class & PRI_FIFO_BIT) > >> > - return; > >> > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > >> > /* > >> > * We used a tick; charge it to the thread so > >> > @@ -2157,11 +2163,6 @@ > >> > sched_priority(td); > >> > } > >> > /* > >> > - * We used up one time slice. > >> > - */ > >> > - if (--ts->ts_slice > 0) > >> > - return; > >> > - /* > >> > * We're out of time, force a requeue at userret(). > >> > */ > >> > ts->ts_slice = sched_slice; > >> > >> > and refusal to use options FULL_PREEMPTION > >> > But no one has unsubscribed to my letter, my patch helps or not > >> > in the case of Core2Duo... > >> > There is a suspicion that the problems stem from the sections of > >> > code associated with the SMP... > >> > Maybe I'm in something wrong, but I want to help in solving this > >> > problem ... > > > Has an
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko wrote: > В Wed, 14 Dec 2011 00:04:42 +0100 > Jilles Tjoelker пишет: > >> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: >> > If the algorithm ULE does not contain problems - it means the >> > problem has Core2Duo, or in a piece of code that uses the ULE >> > scheduler. I already wrote in a mailing list that specifically in >> > my case (Core2Duo) partially helps the following patch: >> > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 >> > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 >> > @@ -794,7 +794,8 @@ >> > * 1.5 * balance_interval. >> > */ >> > balance_ticks = max(balance_interval / 2, 1); >> > - balance_ticks += random() % balance_interval; >> > +// balance_ticks += random() % balance_interval; >> > + balance_ticks += ((int)random()) % balance_interval; >> > if (smp_started == 0 || rebalance == 0) >> > return; >> > tdq = TDQ_SELF(); >> >> This avoids a 64-bit division on 64-bit platforms but seems to have no >> effect otherwise. Because this function is not called very often, the >> change seems unlikely to help. > > Yes, this section does not apply to this problem :) > Just I posted the latest patch which i using now... > >> >> > @@ -2118,13 +2119,21 @@ >> > struct td_sched *ts; >> > >> > THREAD_LOCK_ASSERT(td, MA_OWNED); >> > + if (td->td_pri_class & PRI_FIFO_BIT) >> > + return; >> > + ts = td->td_sched; >> > + /* >> > + * We used up one time slice. >> > + */ >> > + if (--ts->ts_slice > 0) >> > + return; >> >> This skips most of the periodic functionality (long term load >> balancer, saving switch count (?), insert index (?), interactivity >> score update for long running thread) if the thread is not going to >> be rescheduled right now. >> >> It looks wrong but it is a data point if it helps your workload. > > Yes, I did it for as long as possible to delay the execution of the code in > section: > ... > #ifdef SMP > /* > * We run the long term load balancer infrequently on the first cpu. 
> */ > if (balance_tdq == tdq) { > if (balance_ticks && --balance_ticks == 0) > sched_balance(); > } > #endif > ... > >> >> > tdq = TDQ_SELF(); >> > #ifdef SMP >> > /* >> > * We run the long term load balancer infrequently on the >> > first cpu. */ >> > - if (balance_tdq == tdq) { >> > - if (balance_ticks && --balance_ticks == 0) >> > + if (balance_ticks && --balance_ticks == 0) { >> > + if (balance_tdq == tdq) >> > sched_balance(); >> > } >> > #endif >> >> The main effect of this appears to be to disable the long term load >> balancer completely after some time. At some point, a CPU other than >> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and >> sched_balance() will never be called again. >> > > That is, for the same reason as above in the text... > >> It also introduces a hypothetical race condition because the access to >> balance_ticks is no longer restricted to one CPU under a spinlock. >> >> If the long term load balancer may be causing trouble, try setting >> kern.sched.balance_interval to a higher value with unpatched code. > > I checked it in the first place - but it did not help fix the situation... > > The impression of malfunction rebalancing... > It seems that the thread is passed on to the same core that is loaded and > so... > Perhaps this is a consequence of an incorrect definition of the topology CPU? > >> >> > @@ -2144,9 +2153,6 @@ >> > if >> > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) >> > tdq->tdq_ridx = tdq->tdq_idx; } >> > - ts = td->td_sched; >> > - if (td->td_pri_class & PRI_FIFO_BIT) >> > - return; >> > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { >> > /* >> > * We used a tick; charge it to the thread so >> > @@ -2157,11 +2163,6 @@ >> > sched_priority(td); >> > } >> > /* >> > - * We used up one time slice. >> > - */ >> > - if (--ts->ts_slice > 0) >> > - return; >> > - /* >> > * We're out of time, force a requeue at userret(). 
>> > */ >> > ts->ts_slice = sched_slice; >> >> > and refusal to use options FULL_PREEMPTION >> > But no one has unsubscribed to my letter, my patch helps or not in >> > the case of Core2Duo... >> > There is a suspicion that the problems stem from the sections of >> > code associated with the SMP... >> > Maybe I'm in something wrong, but I want to help in solving this >> > problem ... Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be h
Re: SCHED_ULE should not be the default
On Tue, 13 Dec 2011 23:02:15 +0000, Marcus Reid wrote: > On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: > > On 12/12/2011 05:47, O. Hartmann wrote: > > > Do we have any proof at hand for such cases where SCHED_ULE > > > performs much better than SCHED_4BSD? > > > > I complained about poor interactive performance of ULE in a desktop > > environment for years. I had numerous people try to help, including > > Jeff, with various tunables, dtrace'ing, etc. The cause of the > > problem was never found. > > The issues that I've seen with ULE on the desktop seem to be caused > by X taking up a steady amount of CPU, and being demoted from being an > "interactive" process. X then becomes the bottleneck for other > processes that would otherwise be "interactive". Try 'renice -20 > ' and see if that makes your problems go away. Why, then, is X not a bottleneck when using 4BSD? > Marcus
Re: SCHED_ULE should not be the default
В Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker пишет: > On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > > If the algorithm ULE does not contain problems - it means the > > problem has Core2Duo, or in a piece of code that uses the ULE > > scheduler. I already wrote in a mailing list that specifically in > > my case (Core2Duo) partially helps the following patch: > > --- sched_ule.c.orig2011-11-24 18:11:48.0 +0200 > > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > > @@ -794,7 +794,8 @@ > > * 1.5 * balance_interval. > > */ > > balance_ticks = max(balance_interval / 2, 1); > > - balance_ticks += random() % balance_interval; > > +// balance_ticks += random() % balance_interval; > > + balance_ticks += ((int)random()) % balance_interval; > > if (smp_started == 0 || rebalance == 0) > > return; > > tdq = TDQ_SELF(); > > This avoids a 64-bit division on 64-bit platforms but seems to have no > effect otherwise. Because this function is not called very often, the > change seems unlikely to help. Yes, this section does not apply to this problem :) Just I posted the latest patch which i using now... > > > @@ -2118,13 +2119,21 @@ > > struct td_sched *ts; > > > > THREAD_LOCK_ASSERT(td, MA_OWNED); > > + if (td->td_pri_class & PRI_FIFO_BIT) > > + return; > > + ts = td->td_sched; > > + /* > > +* We used up one time slice. > > +*/ > > + if (--ts->ts_slice > 0) > > + return; > > This skips most of the periodic functionality (long term load > balancer, saving switch count (?), insert index (?), interactivity > score update for long running thread) if the thread is not going to > be rescheduled right now. > > It looks wrong but it is a data point if it helps your workload. Yes, I did it for as long as possible to delay the execution of the code in section: ... #ifdef SMP /* * We run the long term load balancer infrequently on the first cpu. */ if (balance_tdq == tdq) { if (balance_ticks && --balance_ticks == 0) sched_balance(); } #endif ... 
> > > tdq = TDQ_SELF(); > > #ifdef SMP > > /* > > * We run the long term load balancer infrequently on the > > first cpu. */ > > - if (balance_tdq == tdq) { > > - if (balance_ticks && --balance_ticks == 0) > > + if (balance_ticks && --balance_ticks == 0) { > > + if (balance_tdq == tdq) > > sched_balance(); > > } > > #endif > > The main effect of this appears to be to disable the long term load > balancer completely after some time. At some point, a CPU other than > the first CPU (which uses balance_tdq) will set balance_ticks = 0, and > sched_balance() will never be called again. > That is, for the same reason as above in the text... > It also introduces a hypothetical race condition because the access to > balance_ticks is no longer restricted to one CPU under a spinlock. > > If the long term load balancer may be causing trouble, try setting > kern.sched.balance_interval to a higher value with unpatched code. I checked it in the first place - but it did not help fix the situation... The impression of malfunction rebalancing... It seems that the thread is passed on to the same core that is loaded and so... Perhaps this is a consequence of an incorrect definition of the topology CPU? > > > @@ -2144,9 +2153,6 @@ > > if > > (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > > tdq->tdq_ridx = tdq->tdq_idx; } > > - ts = td->td_sched; > > - if (td->td_pri_class & PRI_FIFO_BIT) > > - return; > > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > > /* > > * We used a tick; charge it to the thread so > > @@ -2157,11 +2163,6 @@ > > sched_priority(td); > > } > > /* > > -* We used up one time slice. > > -*/ > > - if (--ts->ts_slice > 0) > > - return; > > - /* > > * We're out of time, force a requeue at userret(). > > */ > > ts->ts_slice = sched_slice; > > > and refusal to use options FULL_PREEMPTION > > But no one has unsubscribed to my letter, my patch helps or not in > > the case of Core2Duo... 
> > There is a suspicion that the problems stem from the sections of > > code associated with the SMP... > > Maybe I'm in something wrong, but I want to help in solving this > > problem ... >
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 04:29:14PM -0800, Doug Barton wrote: > On 12/12/2011 05:47, O. Hartmann wrote: > > Do we have any proof at hand for such cases where SCHED_ULE performs > > much better than SCHED_4BSD? > > I complained about poor interactive performance of ULE in a desktop > environment for years. I had numerous people try to help, including > Jeff, with various tunables, dtrace'ing, etc. The cause of the problem > was never found. The issues that I've seen with ULE on the desktop seem to be caused by X taking up a steady amount of CPU, and being demoted from being an "interactive" process. X then becomes the bottleneck for other processes that would otherwise be "interactive". Try 'renice -20 ' and see if that makes your problems go away. Marcus
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: > If the algorithm ULE does not contain problems - it means the problem > has Core2Duo, or in a piece of code that uses the ULE scheduler. > I already wrote in a mailing list that specifically in my case (Core2Duo) > partially helps the following patch: > --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 > +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 > @@ -794,7 +794,8 @@ >* 1.5 * balance_interval. >*/ > balance_ticks = max(balance_interval / 2, 1); > - balance_ticks += random() % balance_interval; > +// balance_ticks += random() % balance_interval; > + balance_ticks += ((int)random()) % balance_interval; > if (smp_started == 0 || rebalance == 0) > return; > tdq = TDQ_SELF(); This avoids a 64-bit division on 64-bit platforms but seems to have no effect otherwise. Because this function is not called very often, the change seems unlikely to help. > @@ -2118,13 +2119,21 @@ > struct td_sched *ts; > > THREAD_LOCK_ASSERT(td, MA_OWNED); > + if (td->td_pri_class & PRI_FIFO_BIT) > + return; > + ts = td->td_sched; > + /* > + * We used up one time slice. > + */ > + if (--ts->ts_slice > 0) > + return; This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload. > tdq = TDQ_SELF(); > #ifdef SMP > /* >* We run the long term load balancer infrequently on the first cpu. >*/ > - if (balance_tdq == tdq) { > - if (balance_ticks && --balance_ticks == 0) > + if (balance_ticks && --balance_ticks == 0) { > + if (balance_tdq == tdq) > sched_balance(); > } > #endif The main effect of this appears to be to disable the long term load balancer completely after some time. 
At some point, a CPU other than the first CPU (which uses balance_tdq) will set balance_ticks = 0, and sched_balance() will never be called again. It also introduces a hypothetical race condition because the access to balance_ticks is no longer restricted to one CPU under a spinlock. If the long term load balancer may be causing trouble, try setting kern.sched.balance_interval to a higher value with unpatched code. > @@ -2144,9 +2153,6 @@ > if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx])) > tdq->tdq_ridx = tdq->tdq_idx; > } > - ts = td->td_sched; > - if (td->td_pri_class & PRI_FIFO_BIT) > - return; > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) { > /* >* We used a tick; charge it to the thread so > @@ -2157,11 +2163,6 @@ > sched_priority(td); > } > /* > - * We used up one time slice. > - */ > - if (--ts->ts_slice > 0) > - return; > - /* >* We're out of time, force a requeue at userret(). >*/ > ts->ts_slice = sched_slice; > and refusal to use options FULL_PREEMPTION > But no one has unsubscribed to my letter, my patch helps or not in the > case of Core2Duo... > There is a suspicion that the problems stem from the sections of code > associated with the SMP... > Maybe I'm in something wrong, but I want to help in solving this > problem ... -- Jilles Tjoelker ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: SCHED_ULE should not be the default
On 12/13/2011 13:31, Malin Randstrom wrote:
> stop sending me spam mail ... you never stop despite me having
> unsubscribeb several times. stop this!

If you had actually unsubscribed, the mail would have stopped. :) You
can see the instructions you need to follow below.

> ___
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

-- 
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/
Re: SCHED_ULE should not be the default
stop sending me spam mail ... you never stop despite me having
unsubscribeb several times. stop this!

On Dec 13, 2011 8:12 PM, "Steve Kargl" wrote:
> On Tue, Dec 13, 2011 at 02:23:46PM +0100, O. Hartmann wrote:
> > On 12/12/11 16:51, Steve Kargl wrote:
> > > On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
> > >>
> > >>> Not fully right, boinc defaults to run on idprio 31 so this isn't an
> > >>> issue. And yes, there are cases where SCHED_ULE shows much better
> > >>> performance than SCHED_4BSD. [...]
> > >>
> > >> Do we have any proof at hand for such cases where SCHED_ULE performs
> > >> much better than SCHED_4BSD? Whenever the subject comes up, it is
> > >> mentioned that SCHED_ULE has better performance on boxes with
> > >> ncpu > 2. But in the end I see contradictory statements here. People
> > >> complain about poor performance (especially in scientific
> > >> environments), and others counter that this is not the case.
> > >>
> > >> Within our department, we developed a highly scalable code for
> > >> planetary science purposes on imagery. It utilizes present GPUs via
> > >> OpenCL if present. Otherwise it grabs as many cores as it can.
> > >> By the end of this year I'll get a new desktop box based on Intel's
> > >> new Sandy Bridge-E architecture with plenty of memory. If the
> > >> colleague who developed the code is willing to perform some
> > >> benchmarks on the same hardware platform, we'll benchmark both
> > >> FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intend
> > >> also to look at performance with both schedulers available.
> > >>
> > >
> > > This comes up every 9 months or so, and must be approaching
> > > FAQ status.
> > >
> > > In a HPC environment, I recommend 4BSD. Depending on
> > > the workload, ULE can cause a severe increase in turn
> > > around time when doing already long computations.
> > > If you have an MPI application, simply launching greater
> > > than ncpu+1 jobs can show the problem.
> >
> > Well, those recommendations should be based on "WHY". As the mostly
> > negative experiences with SCHED_ULE in highly computative workloads
> > always get contradicted by "...but there are workloads that show the
> > opposite ...", this should be shown by more recent benchmarks and
> > explanations than legacy benchmarks from years ago.
> >
>
> I have given the WHY in previous discussions of ULE, based
> on what you call legacy benchmarks. I have not seen any
> commit to sched_ule.c that would lead me to believe that
> the performance issues with ULE and cpu-bound numerical
> codes have been addressed. Repeating the benchmark would
> be a waste of time.
>
> -- 
> Steve
Re: SCHED_ULE should not be the default
On 12/13/2011 10:54 AM, Steve Kargl wrote:
>
> I have given the WHY in previous discussions of ULE, based
> on what you call legacy benchmarks. I have not seen any
> commit to sched_ule.c that would lead me to believe that
> the performance issues with ULE and cpu-bound numerical
> codes have been addressed. Repeating the benchmark would
> be a waste of time.

Trying a simple pbzip2 on a large file, the results are pretty
consistent through iterations. pbzip2 with 4BSD is barely faster on a
file that's 322MB in size.

After a reboot, I did a

	strings bigfile > /dev/null

then ran

	pbzip2 -v xaa -c > /dev/null

7 times. If I do a burnP6 (from sysutils/cpuburn) in the background,
they perform about the same.

eg

pbzip2 -v xaa -c > /dev/null
Parallel BZIP2 v1.1.6 - by: Jeff Gilchrist [http://compression.ca]
[Oct. 30, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov

         # CPUs: 4
 BWT Block Size: 900 KB
File Block Size: 900 KB
 Maximum Memory: 100 MB
---
        File #: 1 of 1
    Input Name: xaa
   Output Name:

    Input Size: 352404831 bytes
Compressing data...
   Output Size: 50630745 bytes
---
    Wall Clock: 18.139342 seconds

ULE
18.113204
18.116896
18.123400
18.105894
18.163332
18.139342
18.082888

ULE with burnP6
23.076085
22.003666
21.162987
21.682445
21.935568
23.595781
21.601277

4BSD
17.983395
17.986218
18.009254
18.004312
18.001494
17.997032

4BSD with burnP6
22.215508
21.886459
21.595179
21.361830
21.325351
21.244793

# ministat uleP6 bsdP6
x uleP6
+ bsdP6
[ministat dot-plot omitted]
    N           Min           Max        Median           Avg        Stddev
x   6     21.162987     23.595781     22.003666     22.242755    0.91175566
+   6     21.244793     22.215508     21.595179     21.604853     0.3792413
No difference proven at 95.0% confidence

x ule
+ bsd
[ministat dot-plot omitted]
    N           Min           Max        Median           Avg        Stddev
x   7     18.082888     18.163332     18.116896     18.120708   0.025468695
+   6     17.983395     18.009254     18.001494     17.996951   0.010248473
Difference at 95.0% confidence
	-0.123757 +/- 0.024538
	-0.68296% +/- 0.135414%
	(Student's t, pooled s = 0.0200388)

hardware is X3450 with 8G of memory. RELENG8

	---Mike

-- 
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/