Re: svn commit: r208585 - head/sys/mips/mips
On Thu, 27 May 2010, Alexander Motin wrote:

> Neel Natu wrote:
>> However it is not immediately obvious why we prefer to run the
>> statistics timer at (or very close to) 128Hz. Any pointers?
>
> I haven't looked myself, but sources report that some legacy code depends
> on it:
> http://lists.freebsd.org/pipermail/freebsd-arch/2009-December/009731.html

That's a good reference for newer scheduler problems. The following from
cvs history is better for the 128:

% RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v

History in sched_4bsd.c was broken by not repo-copying.

% Working file: kern_synch.c
% head: 1.249
% ...
%
% revision 1.83
% date: 1999/11/28 12:12:13; author: bde; state: Exp; lines: +11 -13
% Scheduler fixes equivalent to the ones logged in the following NetBSD
% commit to kern_synch.c:
%
%
% revision 1.55
% date: 1999/02/23 02:56:03; author: ross; state: Exp; lines: +39 -10
% Scheduler bug fixes and reorganization
% * fix the ancient nice(1) bug, where nice +20 processes incorrectly
%   steal 10 - 20% of the CPU, (or even more depending on load average)
% * provide a new schedclk() mechanism at a new clock at schedhz, so high
%   platform hz values don't cause nice +0 processes to look like they are
%   niced
% * change the algorithm slightly, and reorganize the code a lot
% * fix percent-CPU calculation bugs, and eliminate some no-op code
%
% === nice bug === Correctly divide the scheduler queues between niced and
% compute-bound processes. The current nice weight of two (sort of, see

2 or 4 was the historical value.

% `algorithm change' below) neatly divides the USRPRI queues in half; this
% should have been used to clip p_estcpu, instead of UCHAR_MAX. Besides
% being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
% and it was done after decay_cpu() which can only _reduce_ the value. It
% has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
% scheduler-penalize themselves onto the same queue as nice +20 processes.
% (Or even a higher one.)
%
% === New schedclk() mechanism === Some platforms should be cutting down
% stathz before hitting the scheduler, since the scheduler algorithm only
% works right in the vicinity of 64 Hz. Rather than prescale hz, then scale

The historical value was probably 60.

% back and forth by 4 every time p_estcpu is touched (each occurrence an
% abstraction violation), use p_estcpu without scaling and require schedhz
% to be generated directly at the right frequency. Use a default stathz (well,
% actually, profhz) / 4, so nothing changes unless a platform defines schedhz
% and a new clock. Define these for alpha, where hz==1024, and nice was
% totally broke.
%
% === Algorithm change === The nice value used to be added to the
% exponentially-decayed scheduler history value p_estcpu, in _addition_ to
% be incorporated directly (with greater weight) into the priority calculation.
% At first glance, it appears to be a pointless increase of 1/8 the nice

Perhaps I am confused by where the above factor of 2 or 4 was, and the 8
came directly from this 1/8. Anyway, the final version attempts to fold
the factors together if possible.

% effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
% because it will ramp up linearly but be decayed only exponentially, thus
% converging to an additional .75 nice for a loadaverage of one. I killed
% this, it makes the behavior hard to control, almost impossible to analyze,
% and the effect (~~nothing at for the first second, then somewhat increased
% niceness after three seconds or more, depending on load average) pointless.
%
% === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
% Collect scheduler functionality. Try to put each abstraction in just one
% place.
%
%
% The details are a little different in FreeBSD:
%
% === nice bug === Fixing this is the main point of this commit. We use
% essentially the same clipping rule as NetBSD (our limit on p_estcpu
% differs by a scale factor).
% However, clipping at all is fundamentally
% bad. It gives free CPU to the hoggiest hogs once they reach the limit, and
% reaching the limit is normal for long-running hogs. This will be fixed
% later.
%
% === New schedclk() mechanism === We don't use the NetBSD schedclk()
% (now schedclock()) mechanism. We require (real)stathz to be about 128
% and scale by an extra factor of 2 compared with NetBSD's statclock().

Later another factor of two was added, giving a factor of 8. Later still,
another factor of smp_cpus was added. These factors reduce
overflow/clamping.

% We scale p_estcpu instead of scaling the clock. This is more accurate
% and flexible.
%
% === Algorithm change === Same change.
%
% === Other bugs === The p_pctcpu bug was fixed long ago. We don't try as
% hard to abstra
Re: svn commit: r208585 - head/sys/mips/mips
On Wed, 26 May 2010, Neel Natu wrote:

> On Wed, May 26, 2010 at 8:20 PM, Alexander Motin wrote:
>> Neel Natu wrote:
>> Also, as soon as you run timer1 on frequency higher than hz - it is
>> strange to see
>> ? ? ? ?stathz = hz;
>> ? ? ? ?profhz = hz;
>> there. It is just useless. Better would be to do same as for x86:
>> ? ? ? ?profhz = timer1hz;
>> ? ? ? ?if (timer1hz < 128)
>> ? ? ? ? ? ? ? ?stathz = timer1hz;
>> ? ? ? ?else
>> ? ? ? ? ? ? ? ?stathz = timer1hz / (timer1hz / 128);

This is almost unreadable due to \xa0.

stathz = timer1hz / (timer1hz / 128); only works right if timer1hz is a
multiple of 128, or at least a multiple of the final stathz. Otherwise,
there may be significant rounding error in the calculation, and if the
final stathz is not an exact divisor of timer1hz it is impossible to
generate stathz from timer1hz by dividing it.

(This has always been broken for the lapic timer on amd64 and i386.
stathz = 133 is only nearly a divisor of 1000 or 2000, and 128 is even
further from being a divisor of any timer frequency that can generate
hz 1000. The effects of this can be seen in systat(1) -v 1 output -- the
reported lapic timer interrupt frequencies jump every
~(lapic_timer_hz / stathz) seconds when the divider compensates for the
multiple not being exact.

Another bug visible in systat -v and vmstat -i output on ref9-amd64 right
now is that the lapic timer interrupt frequencies are all reported as 960.
hz is reported to be 1000, but it is impossible to generate 1000 from 960.

Another bug in the lapic timer code on amd64 and i386 is that it doesn't
change the lapic timer frequency to generate a high enough profhz.
profhz = 8192, which is generated by the RTC on amd64 and i386, was
adequate in 1990, and it needs to be 100-1000 times larger now, but the
lapic doesn't even generate that; it claims to generate 1024, and this is
even more impossible to divide down from 960 than is 1000.)

> I see your point with the profiling timer. I'll fix that to be like x86.
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?

At least SCHED_4BSD requires stathz to be almost 128. More precisely, it
requires a clock of frequency about 16 Hz and divides stathz internally by
INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus) to get this. It gets some extra
resolution by accumulating ticks at stathz but has to divide the result by
8 before feeding it to the priority adjustment, else the adjustment would
be too sensitive to recent activity, and/or would overflow (overflow is
avoided by clamping to the limit, but this is bad too). Dividing by
smp_cpus is a hack to avoid the overflow at a cost of reducing
sensitivity. The requirement for stathz to be almost 128 is pushed to the
clock generator(s) to avoid having dividers (other than the
simple/historical division by 8) in both the clock generator(s) and the
scheduler(s).

When using lapic timers, I normally use lapic_timer_hz = hz = stathz =
profhz = 100, and don't worry about the completely broken profhz or the
scheduling problems from having stathz = hz. The scheduling problems are
mostly caused by the hardware clocks behind stathz and hz being identical.
When they are identical, having stathz != hz doesn't help much, at least
without the changes that I suggested a few months ago (statclock() and
hardclock() should never be called from the same hardware interrupt).

There are 2 types of scheduling/statistics problems:
- malicious applications may hide from scheduling/statistics interrupts by
  arranging that they don't run across the interrupts. This is easy to do
  while running for most of the time if hz is much larger than stathz (now
  the default :-().
- even non-malicious applications may hide from scheduling/statistics
  interrupts if the statclock and hardclock interrupts are too
  synchronous. This is a problem with the lapic timer interrupts in
  practice.
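As a rough illustration of the SCHED_4BSD arithmetic described in this message (a sketch only, not the kernel's code): the divisor 8 * smp_cpus comes from the text above, while the function names, the ncpus parameter and the simple saturating tick are assumptions made for the example.

```c
#include <assert.h>

/* Mirrors the INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus) factor named above. */
int
inverse_estcpu_weight(int ncpus)
{
	assert(ncpus > 0);
	return (8 * ncpus);
}

/*
 * Hypothetical helper: the rate, in Hz, that the priority adjustment
 * effectively sees once ticks accumulated at stathz are divided down.
 */
double
effective_sched_hz(int stathz, int ncpus)
{
	assert(stathz > 0);
	return ((double)stathz / inverse_estcpu_weight(ncpus));
}

/*
 * Hypothetical saturating tick: add one stathz tick to the accumulated
 * value, clamping at limit instead of overflowing.
 */
unsigned int
estcpu_tick(unsigned int estcpu, unsigned int limit)
{
	return (estcpu < limit ? estcpu + 1 : limit);
}
```

With stathz = 128 on one CPU, effective_sched_hz() gives the 16 Hz figure mentioned above; a larger ncpus lowers the effective rate, which is the loss of sensitivity the message refers to.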
I think it takes almost perfect synchronization for there to be a problem
in practice, and I can't see how the synchronization was perfect enough.
For hz = 1000, lapic_timer_hz was 2000 and hardclock was called every
second interrupt, while statclock was called every 2000/(stathz=133) =
15th or 16th interrupt. Since 15 is not a multiple of 2, statclock was
normally called for the same lapic timer interrupt only every second
interrupt. This should be asynchronous enough. I don't know the details of
the current or previous implementation (where lapic_timer_hz is not 2000)
but IIRC the dividers don't know anything about the synchronicity problem
so they could easily make it worse.

Bruce
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
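The 2000 Hz case in Bruce's message can be checked with a small simulation. This is only a sketch: it assumes each clock is derived from the lapic timer by an accumulator that adds its target rate on every interrupt and fires when the sum reaches the timer frequency. The struct and function names are made up for the example.

```c
#include <assert.h>

struct clock_counts {
	int hard;	/* simulated hardclock() calls */
	int stat;	/* simulated statclock() calls */
	int both;	/* interrupts on which both were called */
};

/*
 * Assumed divider model: every timer interrupt adds the target rate to
 * an accumulator; the clock fires when the accumulator reaches timer_hz.
 * Simulates timer_hz interrupts, i.e. one second.
 */
struct clock_counts
simulate_one_second(int timer_hz, int hz, int stathz)
{
	struct clock_counts c = { 0, 0, 0 };
	int hard_acc = 0, stat_acc = 0;
	int hard_fired, stat_fired, i;

	assert(timer_hz > 0);
	assert(hz > 0 && hz <= timer_hz);
	assert(stathz > 0 && stathz <= timer_hz);

	for (i = 0; i < timer_hz; i++) {
		hard_fired = stat_fired = 0;

		hard_acc += hz;
		if (hard_acc >= timer_hz) {
			hard_acc -= timer_hz;
			hard_fired = 1;
		}
		stat_acc += stathz;
		if (stat_acc >= timer_hz) {
			stat_acc -= timer_hz;
			stat_fired = 1;
		}

		c.hard += hard_fired;
		c.stat += stat_fired;
		c.both += hard_fired && stat_fired;
	}
	return (c);
}
```

Calling simulate_one_second(2000, 1000, 133) and comparing both with stat shows how often statclock shares an interrupt with hardclock under this model. With timer_hz = hz = stathz = 100, every statclock call coincides with a hardclock call.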
Re: svn commit: r208585 - head/sys/mips/mips
Neel Natu wrote:
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?

I haven't looked myself, but sources report that some legacy code depends
on it:
http://lists.freebsd.org/pipermail/freebsd-arch/2009-December/009731.html

In any case it should not be equal to hz whenever possible.

--
Alexander Motin
Re: svn commit: r208585 - head/sys/mips/mips
Hi Alexander,

On Wed, May 26, 2010 at 8:20 PM, Alexander Motin wrote:
> Neel Natu wrote:
>> Author: neel
>> Date: Thu May 27 01:27:25 2010
>> New Revision: 208585
>> URL: http://svn.freebsd.org/changeset/base/208585
>>
>> Log:
>>   Simplify clock interrupt handling on mips by using the new KPI -
>>   timer1clock() and timer2clock().
>>
>>   Dynamically adjust the tick frequency depending on the value of 'hz'.
>>   Tested with hz values of 100, 1000 and 2000.
>>
>> Modified:
>>   head/sys/mips/mips/tick.c
>
>> -		if (profprocs != 0)
>> -			profclock(TRAPF_USERMODE(tf), tf->pc);
>> -	}
>> +	timer1clock(TRAPF_USERMODE(tf), tf->pc);
>> +	timer2clock(TRAPF_USERMODE(tf), tf->pc);
>>  	critical_exit();
>> -#if 0 /* TARGET_OCTEON */
>
> You are not setting timer2hz, so timer2clock() will be emulated
> automatically. It should not be called explicitly, or statclock() will
> be called twice.
>

I'll fix this.

> Also, as soon as you run timer1 on frequency higher than hz - it is
> strange to see
>	stathz = hz;
>	profhz = hz;
> there. It is just useless. Better would be to do same as for x86:
>	profhz = timer1hz;
>	if (timer1hz < 128)
>		stathz = timer1hz;
>	else
>		stathz = timer1hz / (timer1hz / 128);
>

I see your point with the profiling timer. I'll fix that to be like x86.

However it is not immediately obvious why we prefer to run the statistics
timer at (or very close to) 128Hz. Any pointers?

best
Neel

> --
> Alexander Motin
Re: svn commit: r208585 - head/sys/mips/mips
Neel Natu wrote:
> Author: neel
> Date: Thu May 27 01:27:25 2010
> New Revision: 208585
> URL: http://svn.freebsd.org/changeset/base/208585
>
> Log:
>   Simplify clock interrupt handling on mips by using the new KPI -
>   timer1clock() and timer2clock().
>
>   Dynamically adjust the tick frequency depending on the value of 'hz'.
>   Tested with hz values of 100, 1000 and 2000.
>
> Modified:
>   head/sys/mips/mips/tick.c

> -		if (profprocs != 0)
> -			profclock(TRAPF_USERMODE(tf), tf->pc);
> -	}
> +	timer1clock(TRAPF_USERMODE(tf), tf->pc);
> +	timer2clock(TRAPF_USERMODE(tf), tf->pc);
>  	critical_exit();
> -#if 0 /* TARGET_OCTEON */

You are not setting timer2hz, so timer2clock() will be emulated
automatically. It should not be called explicitly, or statclock() will
be called twice.

Also, as soon as you run timer1 on frequency higher than hz - it is
strange to see
	stathz = hz;
	profhz = hz;
there. It is just useless. Better would be to do same as for x86:
	profhz = timer1hz;
	if (timer1hz < 128)
		stathz = timer1hz;
	else
		stathz = timer1hz / (timer1hz / 128);

--
Alexander Motin
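To see what the suggested rule yields for a given timer frequency, it can be wrapped in a small standalone function. This is a sketch for experimenting outside the kernel; pick_stathz(), stathz_remainder() and STATHZ_TARGET are hypothetical names, and only the if/else arithmetic is taken from the message above.

```c
#include <assert.h>

#define	STATHZ_TARGET	128	/* target rate used by the quoted rule */

/* Hypothetical wrapper around the stathz selection quoted above. */
int
pick_stathz(int timer1hz)
{
	assert(timer1hz > 0);
	if (timer1hz < STATHZ_TARGET)
		return (timer1hz);
	/* Integer divider first, then the rate that divider produces. */
	return (timer1hz / (timer1hz / STATHZ_TARGET));
}

/*
 * Nonzero when the chosen stathz is not an exact divisor of timer1hz,
 * so statclock cannot be generated by a fixed integer division.
 */
int
stathz_remainder(int timer1hz)
{
	return (timer1hz % pick_stathz(timer1hz));
}
```

For example, pick_stathz(2000) is 133 and stathz_remainder(2000) is 5, so a 2000 Hz timer cannot be divided down to 133 Hz exactly, whereas pick_stathz(1024) is 128 with no remainder.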