Re: svn commit: r208585 - head/sys/mips/mips

2010-05-27 Thread Bruce Evans

On Thu, 27 May 2010, Alexander Motin wrote:

> Neel Natu wrote:
>> However it is not immediately obvious why we prefer to run the
>> statistics timer at (or very close to) 128Hz. Any pointers?
>
> I haven't looked myself, but sources report that some legacy code depends
> on it:
> http://lists.freebsd.org/pipermail/freebsd-arch/2009-December/009731.html


That's a good reference for newer scheduler problems.  The following from
cvs history is a better explanation of the 128:

% RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v

History in sched_4bsd.c was broken by not repo-copying.

% Working file: kern_synch.c
% head: 1.249
% ...
% 
% revision 1.83
% date: 1999/11/28 12:12:13;  author: bde;  state: Exp;  lines: +11 -13
% Scheduler fixes equivalent to the ones logged in the following NetBSD
% commit to kern_synch.c:
% 
%
%   revision 1.55
%   date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
%   Scheduler bug fixes and reorganization
%   * fix the ancient nice(1) bug, where nice +20 processes incorrectly
%     steal 10 - 20% of the CPU, (or even more depending on load average)
%   * provide a new schedclk() mechanism at a new clock at schedhz, so high
%     platform hz values don't cause nice +0 processes to look like they are
%     niced
%   * change the algorithm slightly, and reorganize the code a lot
%   * fix percent-CPU calculation bugs, and eliminate some no-op code
% 
%   === nice bug === Correctly divide the scheduler queues between niced and
%   compute-bound processes. The current nice weight of two (sort of, see

2 or 4 was the historical value.

%   `algorithm change' below) neatly divides the USRPRI queues in half; this
%   should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
%   being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
%   and it was done after decay_cpu() which can only _reduce_ the value.  It
%   has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
%   scheduler-penalize themselves onto the same queue as nice +20 processes.
%   (Or even a higher one.)
% 
%   === New schedclk() mechanism === Some platforms should be cutting down
%   stathz before hitting the scheduler, since the scheduler algorithm only
%   works right in the vicinity of 64 Hz. Rather than prescale hz, then scale

The historical value was probably 60.

%   back and forth by 4 every time p_estcpu is touched (each occurrence an
%   abstraction violation), use p_estcpu without scaling and require schedhz
%   to be generated directly at the right frequency. Use a default stathz (well,
%   actually, profhz) / 4, so nothing changes unless a platform defines schedhz
%   and a new clock.  Define these for alpha, where hz==1024, and nice was
%   totally broke.
% 
%   === Algorithm change === The nice value used to be added to the
%   exponentially-decayed scheduler history value p_estcpu, in _addition_ to
%   being incorporated directly (with greater weight) into the priority
%   calculation.
%   At first glance, it appears to be a pointless increase of 1/8 the nice

Perhaps I am confused by where the above factor of 2 or 4 was, and the 8
came directly from this 1/8.  Anyway, the final version attempts to fold
the factors together if possible.

%   effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
%   because it will ramp up linearly but be decayed only exponentially, thus
%   converging to an additional .75 nice for a loadaverage of one. I killed
%   this, it makes the behavior hard to control, almost impossible to analyze,
%   and the effect (~~nothing at all for the first second, then somewhat increased
%   niceness after three seconds or more, depending on load average) pointless.
% 
%   === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
%   Collect scheduler functionality. Try to put each abstraction in just one
%   place.
%   
% 
% The details are a little different in FreeBSD:
% 
% === nice bug ===   Fixing this is the main point of this commit.  We use
% essentially the same clipping rule as NetBSD (our limit on p_estcpu
% differs by a scale factor).  However, clipping at all is fundamentally
% bad.  It gives free CPU to the hoggiest hogs once they reach the limit, and
% reaching the limit is normal for long-running hogs.  This will be fixed
% later.
% 
% === New schedclk() mechanism ===  We don't use the NetBSD schedclk()
% (now schedclock()) mechanism.  We require (real)stathz to be about 128
% and scale by an extra factor of 2 compared with NetBSD's statclock().

Later another factor of two was added, giving a factor of 8.

Later still, another factor of smp_ncpus was added.

These factors reduce overflow/clamping.

% We scale p_estcpu instead of scaling the clock.  This is more accurate
% and flexible.
% 
% === Algorithm change ===  Same change.
% 
% === Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
% hard to abstra

Re: svn commit: r208585 - head/sys/mips/mips

2010-05-27 Thread Bruce Evans

On Wed, 26 May 2010, Neel Natu wrote:

> On Wed, May 26, 2010 at 8:20 PM, Alexander Motin  wrote:
>> Neel Natu wrote:
>>
>> Also, as soon as you run timer1 on frequency higher than hz - it is
>> strange to see
>>        stathz = hz;
>>        profhz = hz;
>> there. It is just useless. Better would be to do same as for x86:
>>        profhz = timer1hz;
>>        if (timer1hz < 128)
>>                stathz = timer1hz;
>>        else
>>                stathz = timer1hz / (timer1hz / 128);

This is almost unreadable due to \xa0 (non-breaking space) characters.

stathz = timer1hz / (timer1hz / 128);

only works right if timer1hz is a multiple of 128, or at least a
multiple of the final stathz.  Otherwise, there may be significant
rounding error in the calculation, and if the final stathz is not an
exact divisor of timer1hz it is impossible to generate stathz from
timer1hz by dividing it.  (This has always been broken for the
lapic timer on amd64 and i386.  stathz = 133 is only nearly a divisor
of 1000 or 2000, and 128 is even further from being a divisor of any
timer frequency that can generate hz 1000.  The effects of this can
be seen in systat(1) -v 1 output -- the reported lapic timer interrupt
frequencies jump every ~(lapic_timer_hz / stathz) seconds when the divider
compensates for the multiple not being exact.  Another bug visible in systat
-v and vmstat -i output on ref9-amd64 right now is that the lapic timer
interrupt frequencies are all reported as 960.  hz is reported to be 1000,
but it is impossible to generate 1000 from 960.  Another bug in the lapic
timer code on amd64 and i386 is that it doesn't change the lapic timer
frequency to generate a high enough profhz.  profhz = 8192, which is
generated by the RTC on amd64 and i386, was adequate in 1990, and it
needs to be 100-1000 times larger now, but the lapic doesn't even generate
that; it claims to generate 1024, and this is even more impossible to
divide down from 960 than is 1000.)


> I see your point with the profiling timer. I'll fix that to be like x86.
>
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?


At least SCHED_4BSD requires stathz to be almost 128.  More precisely,
it requires a clock of frequency about 16 Hz and divides stathz
internally by INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus) to get this.  It
gets some extra resolution by accumulating ticks at stathz but has to
divide the result by 8 before feeding it to the priority adjustment,
else the adjustment would be too sensitive to recent activity, and/or
would overflow (overflow is avoided by clamping to the limit, but this
is bad too).  Dividing by smp_ncpus is a hack to avoid the overflow
at a cost of reducing sensitivity.  The requirement for stathz to
be almost 128 is pushed to the clock generator(s) to avoid having
dividers (other than the simple/historical division by 8) in both
the clock generator(s) and the scheduler(s).

When using lapic timers, I normally use lapic_timer_hz = hz = stathz =
profhz = 100, and don't worry about the completely broken profhz or the
scheduling problems from having stathz = hz.  The scheduling problems
are mostly caused by the hardware clocks behind stathz and hz being
identical.  When they are identical, having stathz != hz doesn't
help much, at least without the changes that I suggested a few months
ago (statclock() and hardclock() should never be called from the same
hardware interrupt).  There are 2 types of scheduling/statistics problems:
- malicious applications may hide from scheduling/statistics interrupts
  by arranging that they don't run across the interrupts.  This is easy
  to do while running for most of the time if hz is much larger than
  stathz (now the default :-().
- even non-malicious applications may hide from scheduling/statistics
  interrupts if the statclock and hardclock interrupts are too synchronous.
  This is a problem with the lapic timer interrupts in practice.  I think
  it takes almost perfect synchronization for there to be a problem in
  practice, and I can't see how the synchronization was perfect enough.
  For hz = 1000, lapic_timer_hz was 2000 and hardclock was called every
  second interrupt, while statclock was called every 2000/(stathz=133) =
  15th or 16th interrupt.  Since 15 is not a multiple of 2, statclock was
  normally called for the same lapic timer interrupt only every second
  interrupt.  This should be asynchronous enough.  I don't know the details
  of the current or previous implementation (where lapic_timer_hz is not
  2000) but IIRC the dividers don't know anything about the synchronicity
  problem so they could easily make it worse.

Bruce
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Re: svn commit: r208585 - head/sys/mips/mips

2010-05-26 Thread Alexander Motin
Neel Natu wrote:
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?

I haven't looked myself, but sources report that some legacy code depends
on it:
http://lists.freebsd.org/pipermail/freebsd-arch/2009-December/009731.html

In any case it should not be equal to hz whenever possible.

-- 
Alexander Motin


Re: svn commit: r208585 - head/sys/mips/mips

2010-05-26 Thread Neel Natu
Hi Alexander,

On Wed, May 26, 2010 at 8:20 PM, Alexander Motin  wrote:
> Neel Natu wrote:
>> Author: neel
>> Date: Thu May 27 01:27:25 2010
>> New Revision: 208585
>> URL: http://svn.freebsd.org/changeset/base/208585
>>
>> Log:
>>   Simplify clock interrupt handling on mips by using the new KPI - timer1clock()
>>   and timer2clock().
>>
>>   Dynamically adjust the tick frequency depending on the value of 'hz'. Tested
>>   with hz values of 100, 1000 and 2000.
>>
>> Modified:
>>   head/sys/mips/mips/tick.c
>
>> -             if (profprocs != 0)
>> -                     profclock(TRAPF_USERMODE(tf), tf->pc);
>> -     }
>> +     timer1clock(TRAPF_USERMODE(tf), tf->pc);
>> +     timer2clock(TRAPF_USERMODE(tf), tf->pc);
>>       critical_exit();
>> -#if 0 /* TARGET_OCTEON */
>
> You are not setting timer2hz, so timer2clock() will be emulated
> automatically. It should not be called explicitly, or statclock() will
> be called twice.
>

I'll fix this.

> Also, as soon as you run timer1 on frequency higher than hz - it is
> strange to see
>        stathz = hz;
>        profhz = hz;
> there. It is just useless. Better would be to do same as for x86:
>        profhz = timer1hz;
>        if (timer1hz < 128)
>                stathz = timer1hz;
>        else
>                stathz = timer1hz / (timer1hz / 128);
>

I see your point with the profiling timer. I'll fix that to be like x86.

However it is not immediately obvious why we prefer to run the
statistics timer at (or very close to) 128Hz. Any pointers?

best
Neel

> --
> Alexander Motin
>


Re: svn commit: r208585 - head/sys/mips/mips

2010-05-26 Thread Alexander Motin
Neel Natu wrote:
> Author: neel
> Date: Thu May 27 01:27:25 2010
> New Revision: 208585
> URL: http://svn.freebsd.org/changeset/base/208585
> 
> Log:
>   Simplify clock interrupt handling on mips by using the new KPI - timer1clock()
>   and timer2clock().
>   
>   Dynamically adjust the tick frequency depending on the value of 'hz'. Tested
>   with hz values of 100, 1000 and 2000.
> 
> Modified:
>   head/sys/mips/mips/tick.c

> -             if (profprocs != 0)
> -                     profclock(TRAPF_USERMODE(tf), tf->pc);
> -     }
> +     timer1clock(TRAPF_USERMODE(tf), tf->pc);
> +     timer2clock(TRAPF_USERMODE(tf), tf->pc);
>       critical_exit();
> -#if 0 /* TARGET_OCTEON */

You are not setting timer2hz, so timer2clock() will be emulated
automatically. It should not be called explicitly, or statclock() will
be called twice.

Also, as soon as you run timer1 on frequency higher than hz - it is
strange to see
       stathz = hz;
       profhz = hz;
there. It is just useless. Better would be to do same as for x86:
       profhz = timer1hz;
       if (timer1hz < 128)
               stathz = timer1hz;
       else
               stathz = timer1hz / (timer1hz / 128);

-- 
Alexander Motin