Re: idprio processes slowing down system

2010-12-06 Thread Peter Jeremy
On 2010-Nov-28 02:24:21 -0600, Adam Vande More amvandem...@gmail.com wrote:
On Sun, Nov 28, 2010 at 1:26 AM, Peter Jeremy peterjer...@acm.org wrote:
 Since all the boinc processes are running at i31, why are they impacting
 a buildkernel that runs with 0 nicety?

With the setup you presented you're going to have a lot of context switches
as the buildworld is going to give plenty of opportunities for boinc
processes to get some time.

Agreed.

  When it does switch out, the CPU cache is
invalidated, then invalidated again when the buildworld preempts back.

Not quite.  The amd64 uses physically addressed caches (see [1] 7.6.1)
so there's no need to flush the caches on a context switch.  (Though
the TLB _will_ need to be flushed since it does virtual-to-physical
mapping (see [1] 5.5)).  OTOH, whilst the boinc code is running, it
will occupy space in the caches, thus reducing the effective cache
size and presumably reducing the effective cache hit rate.
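
The effective-cache-size effect is easy to see from userland: time a
sweep over a small working set, evict it with a larger sweep, and time
the small sweep again.  A contrived sketch (sizes and the 64-byte line
assumption are arbitrary; absolute numbers will vary with the CPU):

#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WSET    (64 * 1024)             /* working set: fits in cache */
#define POLLUTE (8 * 1024 * 1024)       /* bigger than L2: evicts it */

static volatile unsigned char sink;

static double
sweep(unsigned char *buf, size_t len)
{
    struct timespec t0, t1;
    size_t i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < len; i += 64)       /* one read per (assumed) line */
        sink = buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int
main(void)
{
    unsigned char *ws, *big;

    if ((ws = malloc(WSET)) == NULL || (big = malloc(POLLUTE)) == NULL)
        err(1, "malloc");
    sweep(ws, WSET);                    /* fault in pages, warm the cache */
    printf("warm: %.9fs\n", sweep(ws, WSET));
    sweep(big, POLLUTE);                /* evict the working set */
    printf("cold: %.9fs\n", sweep(ws, WSET));
    return (0);
}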

  This is what makes it slow.

Unfortunately, I don't think this explains the difference.  My system
doesn't have hyperthreading so any memory stalls will block the
affected core and the stall time will be added to the currently
running process.  My timing figures show that the user and system time
is unaffected by boinc - which is inconsistent with the slowdown being
due to the impact of boinc on caching.

I've done some further investigation following a suggestion from a
friend.  In particular, an idprio process should only consume
otherwise-idle CPU time, so the time used by boinc plus the system
idle task whilst boinc is running should equal the system idle time
whilst boinc is stopped.  Re-running the tests and additionally monitoring
process times gives me the following idle time stats:

x /tmp/boinc_running
+ /tmp/boinc_stopped
(ministat distribution plot elided)
    N           Min           Max        Median           Avg        Stddev
x   4         493.3        507.78        501.69       499.765     6.3722759
+   4        332.35        392.08        361.84       356.885     26.514364
Difference at 95.0% confidence
-142.88 +/- 33.364
-28.5894% +/- 6.67595%
(Student's t, pooled s = 19.2823)

The numbers represent seconds of CPU time charged to [idle] alone
whilst boinc is stopped (+), or to [idle] plus all the boinc processes
whilst boinc is running (x).  The two sets should match but don't:
when boinc is running, it is using time that would not otherwise be
idle - which isn't what idprio processes should be doing.
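
(For reference, one way to sample these figures is the kern.cp_time
sysctl, which accumulates stathz ticks per CPU state.  A minimal
sketch with an arbitrary 10-second interval - not necessarily the
tooling used for the numbers above:)

#include <sys/types.h>
#include <sys/resource.h>       /* CPUSTATES, CP_IDLE */
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    long before[CPUSTATES], after[CPUSTATES];
    size_t len;

    len = sizeof(before);
    if (sysctlbyname("kern.cp_time", before, &len, NULL, 0) == -1)
        err(1, "kern.cp_time");
    sleep(10);                          /* arbitrary sampling interval */
    len = sizeof(after);
    if (sysctlbyname("kern.cp_time", after, &len, NULL, 0) == -1)
        err(1, "kern.cp_time");
    /* ticks are summed across CPUs */
    printf("idle ticks in interval: %ld\n",
        after[CP_IDLE] - before[CP_IDLE]);
    return (0);
}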

My suspicion is that idprio processes are not being preempted as soon
as a higher-priority process becomes ready, but are instead allowed to
continue running for a short period (possibly until their current
timeslice expires).  Unfortunately, I haven't yet worked out
how to prove or disprove this.
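
One possible experiment: pin a spinner at i31 and a normal-priority
sleeper onto the same core with cpuset(1), then watch how far the
sleeper overshoots its wakeups.  A rough sketch (the 1ms period and
iteration count are arbitrary); if idle-priority threads really do run
out their slice, the oversleep should approach a timeslice rather than
the usual few microseconds plus timer granularity:

#include <sys/types.h>
#include <sys/rtprio.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    struct rtprio rtp = { RTP_PRIO_IDLE, RTP_PRIO_MAX };    /* i31 */
    struct timespec req = { 0, 1000000 };       /* 1ms - arbitrary */
    struct timespec t0, t1;
    pid_t child;
    int i;

    if ((child = fork()) == -1)
        err(1, "fork");
    if (child == 0) {
        /* child: burn CPU at idle priority, as "idprio 31" would */
        if (rtprio(RTP_SET, 0, &rtp) == -1)
            err(1, "rtprio");
        for (;;)
            ;
    }
    /* parent: at normal priority, measure wakeup latency */
    for (i = 0; i < 100; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("oversleep: %ld ns\n",
            (t1.tv_sec - t0.tv_sec) * 1000000000L +
            t1.tv_nsec - t0.tv_nsec - req.tv_nsec);
    }
    kill(child, SIGTERM);
    return (0);
}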

I was hoping that someone more familiar with the scheduler behaviour
would comment.

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
http://support.amd.com/us/Processor_TechDocs/24593.pdf

-- 
Peter Jeremy




Re: idprio processes slowing down system

2010-11-28 Thread Adam Vande More
On Sun, Nov 28, 2010 at 1:26 AM, Peter Jeremy peterjer...@acm.org wrote:

 Since all the boinc processes are running at i31, why are they impacting
 a buildkernel that runs with 0 nicety?


Someone please enlighten me if I'm wrong, but I'll take a stab at it.

With the setup you presented you're going to have a lot of context switches
as the buildworld is going to give plenty of opportunities for boinc
processes to get some time.  When it does switch out, the CPU cache is
invalidated, then invalidated again when the buildworld preempts back.  This
is what makes it slow.  If gcc were building one massive binary at that
priority, boinc wouldn't get much, if any, time.  Since the buildworld is
much more modular and consists of a large number of small operations, some
CPU intensive, some I/O intensive, boinc can interrupt and impact overall
performance even if the initial process was started at a much higher
priority.

I'm not sure how well ULE handles CPU affinity.  Some other stuff I ran into
earlier suggested there's room for improvement, but in your particular use
case I'm not sure even ideal CPU affinity would improve things much.
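
For what it's worth, affinity can at least be steered by hand:
cpuset(1) from the command line, or cpuset_setaffinity(2) from code.
A minimal sketch that pins the calling process to core 0 (an arbitrary
choice) before starting the workload:

#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>

int
main(void)
{
    cpuset_t mask;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);          /* core 0 - arbitrary */
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
        sizeof(mask), &mask) == -1)
        err(1, "cpuset_setaffinity");
    /* exec or run the workload here, now bound to that core */
    return (0);
}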

-- 
Adam Vande More


idprio processes slowing down system

2010-11-27 Thread Peter Jeremy
Since scheduler issues have been popular lately, I thought I'd
investigate a ULE issue I've been aware of for a while...

I normally have some boinc (ports/astro/boinc) applications running
and I'd noticed that my nightly builds appear to end much sooner when
there are no boinc work units (lately common for setiathome).
This morning, I timed 4 runs of make -j3 KERNCONF=GENERIC buildkernel
on 8-stable with the following results:
boinc running:
1167.839u 287.055s 18:45.69 129.2% 6140+1975k 1+0io 114pf+0w
1166.431u 288.265s 18:00.16 134.6% 6139+1975k 0+0io 106pf+0w
1168.490u 287.599s 17:52.24 135.7% 6137+1975k 0+0io 106pf+0w
1165.747u 287.641s 17:10.38 141.0% 6138+1975k 0+0io 106pf+0w

boinc stopped:
1165.052u 291.492s 15:54.72 152.5% 6125+1972k 0+0io 106pf+0w
1166.101u 290.305s 15:42.54 154.5% 6132+1973k 0+0io 106pf+0w
1165.248u 290.335s 15:35.93 155.5% 6132+1974k 0+0io 106pf+0w
1166.100u 289.749s 15:26.35 157.1% 6137+1974k 0+0io 106pf+0w

Since the results all showed monotonically decreasing wallclock
times, I decided to do a further 4 buildkernels with boinc running:

1168.242u 284.693s 17:33.05 137.9% 6140+1975k 0+0io 106pf+0w
1167.191u 285.332s 17:19.27 139.7% 6140+1976k 0+0io 106pf+0w
1224.813u 291.963s 20:14.90 124.8% 6121+1966k 0+0io 106pf+0w
1213.132u 294.564s 19:48.98 126.8% 6116+1967k 0+0io 106pf+0w

ministat(1) reports no statistically significant difference in the
user or system time:

User time:
x boinc_running
+ boinc_stopped
(ministat distribution plot elided)
    N           Min           Max        Median           Avg        Stddev
x   8      1165.747      1224.813      1168.242     1180.2356      24.12896
+   4      1165.052      1166.101        1166.1     1165.6253    0.55457454
No difference proven at 95.0% confidence

System time:
x boinc_running
+ boinc_stopped
(ministat distribution plot elided)
    N           Min           Max        Median           Avg        Stddev
x   8       284.693       294.564       287.641       288.389     3.3142183
+   4       289.749       291.492       290.335     290.47025    0.73252412
No difference proven at 95.0% confidence

But there is a significant difference in the wallclock time:
x boinc_running
+ boinc_stopped
(ministat distribution plot elided)
    N           Min           Max        Median           Avg        Stddev
x   8       1030.38        1214.9       1080.16     1100.5838     69.364795
+   4        926.35        954.72        942.54       939.885     11.915879
Difference at 95.0% confidence
-160.699 +/- 79.6798
-14.6012% +/- 7.23977%
(Student's t, pooled s = 58.4006)
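
For anyone checking ministat's arithmetic, the reported interval is
the standard pooled-variance Student's t confidence interval for the
difference of the two means (in LaTeX notation):

    \Delta = \bar{x}_2 - \bar{x}_1 \pm
        t_{\alpha/2,\,n_1+n_2-2} \; s_p \sqrt{1/n_1 + 1/n_2}

Here n_1 = 8, n_2 = 4, t_{0.025,10} \approx 2.228 and s_p = 58.4006,
so 939.885 - 1100.5838 = -160.699 and
2.228 * 58.4006 * sqrt(1/8 + 1/4) \approx 79.68, matching the bounds
reported above.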

Since all the boinc processes are running at i31, why are they impacting
a buildkernel that runs with 0 nicety?

System information: AMD Athlon(tm) Dual Core Processor 4850e
(2511.45-MHz K8-class CPU) running FreeBSD/amd64 from just before
8.1-RELEASE with WITNESS and WITNESS_SKIPSPIN, 8GB RAM, src and obj
are both on ZFS.

-- 
Peter Jeremy

