Re: Please test: HZ bump
On Tue, Dec 25, 2018 at 06:37:03PM -0200, Martin Pieuchot wrote: > On 24/12/18(Mon) 20:07, Scott Cheloha wrote: > > On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote: > > > On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot wrote: > > > > > > > > I'd like to improve the fairness of the scheduler, with the goal of > > > > mitigating userland starvations. For that the kernel needs to have > > > > a better understanding of the amount of executed time per task. > > > > > > > > The smallest interval currently usable on all our architectures for > > > > such accounting is a tick. With the current HZ value of 100, this > > > > smallest interval is 10ms. I'd like to bump this value to 1000. > > > > > > > > The diff below intentionally bump other `hz' value to keep current > > > > ratios. We certainly want to call schedclock(), or a similar time > > > > accounting function, at a higher frequency than 16 Hz. However this > > > > will be part of a later diff. > > > > > > > > I'd be really interested in test reports. mlarkin@ raised a good > > > > question: is your battery lifetime shorter with this diff? > > > > > > > [...] > > > I'd like to see more folks test and other devs to share their > > > thoughts: What are the risks associated with bumping HZ globally? > > > Drawbacks? Reasons for hesitation? > > > > In general I'd like to reduce wakeup latency as well. Raising HZ is an > > obvious route to achieving that. But I think there are a couple things > > that need to be addressed before it would be reasonable. The things that > > come to mind for me are: > > > > - A tick is a 32-bit signed integer on all platforms. If HZ=100, we > >can represent at most ~248 days in ticks. This is plenty. If HZ=1000, > >we now only have ~24.8 days. Some may disagree, but I don't think this > >is enough. > > Why do you think it isn't enough? > > >One possible solution is to make ticks 64-bit. 
This addresses the > >timeout length issue at a cost to 32-bit platforms that I cannot > >quantify without lots of testing: what is the overhead of using 64-bit > >arithmetic on a 32-bit machine for all timeouts? > > > >A compromise is to make ticks a long. kettenis mentioned this > >possibility in a commit [1] some time back. This would allow 64-bit > >platforms to raise HZ without crippling timeout ranges. But then you > >have ticks of different sizes on different platforms, which could be a > >headache, I imagine. > > Note that we had, and certainly still have, tick-wrapping bugs in the > kernel :) > > >(maybe there are other solutions?) > > Solution to what? > > > - How does an OpenBSD guest on vmd(8) behave when HZ=1000? Multiple such > >guests on vmd(8)? Such guests on other hypervisors? > > > > - The replies in this thread don't indicate any effect on battery life or > >power consumption but I find it hard to believe that raising HZ has no > >impact on such things. Bumping HZ like this *must* increase CPU > > utilization. > >What is the cost in watt-hours? > > It depends on the machine. But that's one of the reasons I dropped the > bump. > > > - Can smaller machines even handle HZ=1000? Linux experimented with this > >over a decade ago and settled on a default HZ=250 for i386 [2]. I don't > >know how it all shook out, but my guess is that they didn't revert from > >1000 -> 250 for no reason at all. Of course, FreeBSD went ahead with > > 1000 > >on i386, so opinions differ. > > Indeed, we still support architectures that can't handle an HZ of 1000. > > > - How does this effect e.g. packet throughput on smaller machines? I think > >bigger boxes on amd64 would be fine, but I wonder if throughput would > > take > >a noticeable hit on a smaller router. > > Some measurements indicated a drop of 10% in packet forwarding on some > machines and no difference on others. > > > And then... can we reduce wakeup latency in general without raising HZ? 
> > Other systems (e.g. DFly) have better wakeup latencies and still have
> > HZ=100. What are they doing? Can we borrow it?
>
> I haven't looked at other systems like DragonFly, but since you seem
> interested to improve that area, here's my story. I didn't look at
> wakeup latencies. I don't know why you're after that. Instead I
> focused on `schedhz' and schedclock(). I landed there after observing
> that with a high number of threads in "running" state (an active
> browser while making a build), work was badly distributed amongst CPUs.
> Some per-CPU queues were growing and others stayed empty.
>
> CPUs have runqueues that are selected based on per-thread `p_priority'.
> What this field represents today is confusing. Many changes since
> the original scheduler design, including hardware improvements, side
> effects and developer mistakes make it more confusing. However bumping
> HZ improves the placement of "running" threads
Re: Please test: HZ bump
On 24/12/18(Mon) 20:07, Scott Cheloha wrote: > On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote: > > On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot wrote: > > > > > > I'd like to improve the fairness of the scheduler, with the goal of > > > mitigating userland starvations. For that the kernel needs to have > > > a better understanding of the amount of executed time per task. > > > > > > The smallest interval currently usable on all our architectures for > > > such accounting is a tick. With the current HZ value of 100, this > > > smallest interval is 10ms. I'd like to bump this value to 1000. > > > > > > The diff below intentionally bump other `hz' value to keep current > > > ratios. We certainly want to call schedclock(), or a similar time > > > accounting function, at a higher frequency than 16 Hz. However this > > > will be part of a later diff. > > > > > > I'd be really interested in test reports. mlarkin@ raised a good > > > question: is your battery lifetime shorter with this diff? > > > > > [...] > > I'd like to see more folks test and other devs to share their > > thoughts: What are the risks associated with bumping HZ globally? > > Drawbacks? Reasons for hesitation? > > In general I'd like to reduce wakeup latency as well. Raising HZ is an > obvious route to achieving that. But I think there are a couple things > that need to be addressed before it would be reasonable. The things that > come to mind for me are: > > - A tick is a 32-bit signed integer on all platforms. If HZ=100, we >can represent at most ~248 days in ticks. This is plenty. If HZ=1000, >we now only have ~24.8 days. Some may disagree, but I don't think this >is enough. Why do you think it isn't enough? >One possible solution is to make ticks 64-bit. This addresses the >timeout length issue at a cost to 32-bit platforms that I cannot >quantify without lots of testing: what is the overhead of using 64-bit >arithmetic on a 32-bit machine for all timeouts? 
> >A compromise is to make ticks a long. kettenis mentioned this >possibility in a commit [1] some time back. This would allow 64-bit >platforms to raise HZ without crippling timeout ranges. But then you >have ticks of different sizes on different platforms, which could be a >headache, I imagine. Note that we had, and certainly still have, tick-wrapping bugs in the kernel :) >(maybe there are other solutions?) Solution to what? > - How does an OpenBSD guest on vmd(8) behave when HZ=1000? Multiple such >guests on vmd(8)? Such guests on other hypervisors? > > - The replies in this thread don't indicate any effect on battery life or >power consumption but I find it hard to believe that raising HZ has no >impact on such things. Bumping HZ like this *must* increase CPU > utilization. >What is the cost in watt-hours? It depends on the machine. But that's one of the reasons I dropped the bump. > - Can smaller machines even handle HZ=1000? Linux experimented with this >over a decade ago and settled on a default HZ=250 for i386 [2]. I don't >know how it all shook out, but my guess is that they didn't revert from >1000 -> 250 for no reason at all. Of course, FreeBSD went ahead with 1000 >on i386, so opinions differ. Indeed, we still support architectures that can't handle an HZ of 1000. > - How does this effect e.g. packet throughput on smaller machines? I think >bigger boxes on amd64 would be fine, but I wonder if throughput would take >a noticeable hit on a smaller router. Some measurements indicated a drop of 10% in packet forwarding on some machines and no difference on others. > And then... can we reduce wakeup latency in general without raising HZ? Other > systems (e.g. DFly) have better wakeup latencies and still have HZ=100. What > are they doing? Can we borrow it? I haven't looked at other systems like DragonFly, but since you seem interested to improve that area, here's my story. I didn't look at wakeup latencies. I don't know why you're after that. 
Instead I focused on `schedhz' and schedclock(). I landed there after
observing that with a high number of threads in "running" state (an
active browser while making a build), work was badly distributed amongst
CPUs. Some per-CPU queues were growing and others stayed empty.

CPUs have runqueues that are selected based on per-thread `p_priority'.
What this field represents today is confusing. Many changes since the
original scheduler design, including hardware improvements, side effects
and developer mistakes make it more confusing. However bumping HZ
improves the placement of "running" threads in per-CPU runqueues. I
spent a lot of time trying to observe and understand why. I don't
remember the details but came to the conclusion that `p_priority' was
fresher. In other words the kernel has more up-to-date information to
make choices. However it became clear to me that t
Re: Please test: HZ bump
> And then... can we reduce wakeup latency in general without raising HZ?
> Other systems (e.g. DFly) have better wakeup latencies and still have
> HZ=100. What are they doing? Can we borrow it?

https://frenchfries.net/paul/dfly/nanosleep.html

OpenBSD is still adding that one tick, which results in a (typical)
sleep duration of no less than about 20ms.
Re: Please test: HZ bump
Scott Cheloha wrote: > - A tick is a 32-bit signed integer on all platforms. If HZ=100, we >can represent at most ~248 days in ticks. This is plenty. If HZ=1000, >we now only have ~24.8 days. Some may disagree, but I don't think this >is enough. So the question is what happens when a timeout fires early? Kernel code we control, and we can fix that. By inspection, it seems nanosleep is already non-compliant since posix says it may not return early. We'd need to add a loop or something. And review other userland interfaces to timeouts. >A compromise is to make ticks a long. kettenis mentioned this >possibility in a commit [1] some time back. This would allow 64-bit >platforms to raise HZ without crippling timeout ranges. But then you >have ticks of different sizes on different platforms, which could be a >headache, I imagine. If we have learned anything from off_t and time_t, it's that such splits cause a lot of ongoing difficulty. > - How does an OpenBSD guest on vmd(8) behave when HZ=1000? Multiple such >guests on vmd(8)? Such guests on other hypervisors? If the host is HZ=1000 and the guest is HZ=100, time keeping works much better. :) > And then... can we reduce wakeup latency in general without raising HZ? Other > systems (e.g. DFly) have better wakeup latencies and still have HZ=100. What > are they doing? Can we borrow it? Ideally, yes. The lapic can be programmed to fire one shot with a much shorter duration, but this quickly gets complicated with coalescing, etc.
Re: Please test: HZ bump
On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote:
> On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot wrote:
> >
> > I'd like to improve the fairness of the scheduler, with the goal of
> > mitigating userland starvations. For that the kernel needs to have
> > a better understanding of the amount of executed time per task.
> >
> > The smallest interval currently usable on all our architectures for
> > such accounting is a tick. With the current HZ value of 100, this
> > smallest interval is 10ms. I'd like to bump this value to 1000.
> >
> > The diff below intentionally bump other `hz' value to keep current
> > ratios. We certainly want to call schedclock(), or a similar time
> > accounting function, at a higher frequency than 16 Hz. However this
> > will be part of a later diff.
> >
> > I'd be really interested in test reports. mlarkin@ raised a good
> > question: is your battery lifetime shorter with this diff?
> >
> > Comments, oks?
>
> I'd like to revisit this patch. It makes our armv7 platform more
> usable for what it is meant to do, i.e. be a microcontroller. I
> imagine on other platforms it would accrue similar benefits as well.
>
> I've tested this patch and found delightfully proportional results.
> Currently, at HZ = 100, the minimum latency for a sleep call from
> userspace is about 10ms:
>
> https://ce.gl/baseline.jpg
>
> After the patch, which bumps HZ from 100 --> 1000, we see a tenfold
> decrease in this latency:
>
> https://ce.gl/with-mpi-hz-patch.jpg
>
> This signal is generated with gpio(4) ioctl calls from userspace,
> e.g.: for(;;) { HI(pin); usleep(1); LO(pin); usleep(1); }
>
> I'd like to see more folks test and other devs to share their
> thoughts: What are the risks associated with bumping HZ globally?
> Drawbacks? Reasons for hesitation?

In general I'd like to reduce wakeup latency as well. Raising HZ is an
obvious route to achieving that.
But I think there are a couple things that need to be addressed before
it would be reasonable. The things that come to mind for me are:

- A tick is a 32-bit signed integer on all platforms. If HZ=100, we can
  represent at most ~248 days in ticks. This is plenty. If HZ=1000, we
  now only have ~24.8 days. Some may disagree, but I don't think this
  is enough.

  One possible solution is to make ticks 64-bit. This addresses the
  timeout length issue at a cost to 32-bit platforms that I cannot
  quantify without lots of testing: what is the overhead of using
  64-bit arithmetic on a 32-bit machine for all timeouts?

  A compromise is to make ticks a long. kettenis mentioned this
  possibility in a commit [1] some time back. This would allow 64-bit
  platforms to raise HZ without crippling timeout ranges. But then you
  have ticks of different sizes on different platforms, which could be
  a headache, I imagine.

  (maybe there are other solutions?)

- How does an OpenBSD guest on vmd(8) behave when HZ=1000? Multiple
  such guests on vmd(8)? Such guests on other hypervisors?

- The replies in this thread don't indicate any effect on battery life
  or power consumption but I find it hard to believe that raising HZ
  has no impact on such things. Bumping HZ like this *must* increase
  CPU utilization. What is the cost in watt-hours?

- Can smaller machines even handle HZ=1000? Linux experimented with
  this over a decade ago and settled on a default HZ=250 for i386 [2].
  I don't know how it all shook out, but my guess is that they didn't
  revert from 1000 -> 250 for no reason at all. Of course, FreeBSD went
  ahead with 1000 on i386, so opinions differ.

- How does this affect e.g. packet throughput on smaller machines? I
  think bigger boxes on amd64 would be fine, but I wonder if throughput
  would take a noticeable hit on a smaller router.

And then... can we reduce wakeup latency in general without raising HZ?
Other systems (e.g. DFly) have better wakeup latencies and still have
HZ=100. What are they doing?
Can we borrow it?

--
Sorry for the length. In short, you should be fine compiling custom
kernels for your controllers with HZ=1000; you shouldn't see any ill
effects for that use case. But making it the default, even for select
platforms, needs more planning.

-Scott

[1] http://cvsweb.openbsd.org/src/sys/kern/kern_clock.c?rev=1.93&content-type=text/x-cvsweb-markup
[2] http://man7.org/linux/man-pages/man7/time.7.html

> > Index: conf/param.c
> > ===================================================================
> > RCS file: /cvs/src/sys/conf/param.c,v
> > retrieving revision 1.37
> > diff -u -p -r1.37 param.c
> > --- conf/param.c	6 May 2016 19:45:35 -0000	1.37
> > +++ conf/param.c	14 Aug 2017 17:03:23 -0000
> > @@ -76,7 +76,7 @@
> >  # define DST 0
> >  #endif
> >  #ifndef HZ
> > -#define	HZ	100
> > +#define	HZ	1000
> >  #endif
> >  int	hz = HZ;
> >  int	tick = 1000000 / HZ;
> > Index: kern/kern_clock.c
> > =
Re: Please test: HZ bump
On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot wrote:
>
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations. For that the kernel needs to have
> a better understanding of the amount of executed time per task.
>
> The smallest interval currently usable on all our architectures for
> such accounting is a tick. With the current HZ value of 100, this
> smallest interval is 10ms. I'd like to bump this value to 1000.
>
> The diff below intentionally bump other `hz' value to keep current
> ratios. We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz. However this
> will be part of a later diff.
>
> I'd be really interested in test reports. mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
>
> Comments, oks?

I'd like to revisit this patch. It makes our armv7 platform more
usable for what it is meant to do, i.e. be a microcontroller. I
imagine on other platforms it would accrue similar benefits as well.

I've tested this patch and found delightfully proportional results.
Currently, at HZ = 100, the minimum latency for a sleep call from
userspace is about 10ms:

https://ce.gl/baseline.jpg

After the patch, which bumps HZ from 100 --> 1000, we see a tenfold
decrease in this latency:

https://ce.gl/with-mpi-hz-patch.jpg

This signal is generated with gpio(4) ioctl calls from userspace,
e.g.: for(;;) { HI(pin); usleep(1); LO(pin); usleep(1); }

I'd like to see more folks test and other devs to share their
thoughts: What are the risks associated with bumping HZ globally?
Drawbacks? Reasons for hesitation?
Thanks, Ian Sutton > Index: conf/param.c > === > RCS file: /cvs/src/sys/conf/param.c,v > retrieving revision 1.37 > diff -u -p -r1.37 param.c > --- conf/param.c6 May 2016 19:45:35 - 1.37 > +++ conf/param.c14 Aug 2017 17:03:23 - > @@ -76,7 +76,7 @@ > # define DST 0 > #endif > #ifndef HZ > -#defineHZ 100 > +#defineHZ 1000 > #endif > inthz = HZ; > inttick = 100 / HZ; > Index: kern/kern_clock.c > === > RCS file: /cvs/src/sys/kern/kern_clock.c,v > retrieving revision 1.93 > diff -u -p -r1.93 kern_clock.c > --- kern/kern_clock.c 22 Jul 2017 14:33:45 - 1.93 > +++ kern/kern_clock.c 14 Aug 2017 19:50:49 - > @@ -406,12 +406,11 @@ statclock(struct clockframe *frame) > if (p != NULL) { > p->p_cpticks++; > /* > -* If no schedclock is provided, call it here at ~~12-25 Hz; > +* If no schedclock is provided, call it here; > * ~~16 Hz is best > */ > if (schedhz == 0) { > - if ((++curcpu()->ci_schedstate.spc_schedticks & 3) == > - 0) > + if ((spc->spc_schedticks & 0x3f) == 0) > schedclock(p); > } > } > Index: arch/amd64/isa/clock.c > === > RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v > retrieving revision 1.25 > diff -u -p -r1.25 clock.c > --- arch/amd64/isa/clock.c 11 Aug 2017 21:18:11 - 1.25 > +++ arch/amd64/isa/clock.c 14 Aug 2017 17:19:35 - > @@ -303,8 +303,8 @@ rtcdrain(void *v) > void > i8254_initclocks(void) > { > - stathz = 128; > - profhz = 1024; > + stathz = 1024; > + profhz = 8192; > > isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr, > 0, "clock"); > @@ -321,7 +321,7 @@ rtcstart(void) > { > static struct timeout rtcdrain_timeout; > > - mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz); > + mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz); > mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE); > > /* > @@ -577,10 +577,10 @@ setstatclockrate(int arg) > if (initclock_func == i8254_initclocks) { > if (arg == stathz) > mc146818_write(NULL, MC_REGA, > - MC_BASE_32_KHz | MC_RATE_128_Hz); > + MC_BASE_32_KHz | MC_RATE_1024_Hz); > 
else > mc146818_write(NULL, MC_REGA, > - MC_BASE_32_KHz | MC_RATE_1024_Hz); > + MC_BASE_32_KHz | MC_RATE_8192_Hz); > } > } > > Index: arch/armv7/omap/dmtimer.c > === > RCS file: /cvs/src/sys/arch/armv7/omap/dmtimer.c,v > retrieving revision 1.6 > diff -u -p -r1.6 dmtimer.c > --- arch/armv7/omap/dmtimer.c 22 Jan 2015 14:33:01 - 1.6 > +++ arch/armv7/omap/dmtimer.c 14 Aug 2017 17:16:01 - > @@ -296,8 +296,8 @@ d
Re: Please test: HZ bump
I've been testing the second version of this diff in a number of areas (servers, desktop, laptop, routers) and I haven't noticed anything interesting with power usage, run time on the laptops nor anything else, anywhere. That's probably a good thing...
Re: Please test: HZ bump
On Mon, Aug 14, 2017 at 04:06:51PM -0400, Martin Pieuchot wrote:
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations. For that the kernel needs to have
> a better understanding of the amount of executed time per task.
>
> The smallest interval currently usable on all our architectures for
> such accounting is a tick. With the current HZ value of 100, this
> smallest interval is 10ms. I'd like to bump this value to 1000.
>
> The diff below intentionally bump other `hz' value to keep current
> ratios. We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz. However this
> will be part of a later diff.
>
> I'd be really interested in test reports. mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
>
> Comments, oks?

Slightly off-topic, but FYI since around ~2003 I run my everyday
desktop/music machines (and various laptops) with HZ=1024. These were
first i386's and now mostly amd64's (this is needed by my MIDI stuff).
Battery lifetime doesn't seem affected. This didn't cause any problems,
despite the fact that 1024 is a multiple of the rtc tick rate, which in
theory would cause aliasing.

my 2 cents.
Re: Please test: HZ bump
On 14/08/17(Mon) 22:32, Mark Kettenis wrote: > > Date: Mon, 14 Aug 2017 16:06:51 -0400 > > From: Martin Pieuchot > > > > I'd like to improve the fairness of the scheduler, with the goal of > > mitigating userland starvations. For that the kernel needs to have > > a better understanding of the amount of executed time per task. > > > > The smallest interval currently usable on all our architectures for > > such accounting is a tick. With the current HZ value of 100, this > > smallest interval is 10ms. I'd like to bump this value to 1000. > > > > The diff below intentionally bump other `hz' value to keep current > > ratios. We certainly want to call schedclock(), or a similar time > > accounting function, at a higher frequency than 16 Hz. However this > > will be part of a later diff. > > > > I'd be really interested in test reports. mlarkin@ raised a good > > question: is your battery lifetime shorter with this diff? > > > > Comments, oks? > > Need to look at this a bit more carefully but: > > > Index: conf/param.c > > === > > RCS file: /cvs/src/sys/conf/param.c,v > > retrieving revision 1.37 > > diff -u -p -r1.37 param.c > > --- conf/param.c6 May 2016 19:45:35 - 1.37 > > +++ conf/param.c14 Aug 2017 17:03:23 - > > @@ -76,7 +76,7 @@ > > # define DST 0 > > #endif > > #ifndef HZ > > -#defineHZ 100 > > +#defineHZ 1000 > > #endif > > inthz = HZ; > > inttick = 100 / HZ; > > Index: kern/kern_clock.c > > === > > RCS file: /cvs/src/sys/kern/kern_clock.c,v > > retrieving revision 1.93 > > diff -u -p -r1.93 kern_clock.c > > --- kern/kern_clock.c 22 Jul 2017 14:33:45 - 1.93 > > +++ kern/kern_clock.c 14 Aug 2017 19:50:49 - > > @@ -406,12 +406,11 @@ statclock(struct clockframe *frame) > > if (p != NULL) { > > p->p_cpticks++; > > /* > > -* If no schedclock is provided, call it here at ~~12-25 Hz; > > +* If no schedclock is provided, call it here; > > * ~~16 Hz is best > > */ > > if (schedhz == 0) { > > - if ((++curcpu()->ci_schedstate.spc_schedticks & 3) == > > - 0) > > + if 
((spc->spc_schedticks & 0x3f) == 0)
>
> That ++ should not be dropped sould it?

Indeed!

Index: conf/param.c
===================================================================
RCS file: /cvs/src/sys/conf/param.c,v
retrieving revision 1.37
diff -u -p -r1.37 param.c
--- conf/param.c	6 May 2016 19:45:35 -0000	1.37
+++ conf/param.c	14 Aug 2017 17:03:23 -0000
@@ -76,7 +76,7 @@
 # define DST 0
 #endif
 #ifndef HZ
-#define	HZ	100
+#define	HZ	1000
 #endif
 int	hz = HZ;
 int	tick = 1000000 / HZ;
Index: kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.93
diff -u -p -r1.93 kern_clock.c
--- kern/kern_clock.c	22 Jul 2017 14:33:45 -0000	1.93
+++ kern/kern_clock.c	14 Aug 2017 21:03:54 -0000
@@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
 	if (p != NULL) {
 		p->p_cpticks++;
 		/*
-		 * If no schedclock is provided, call it here at ~~12-25 Hz;
+		 * If no schedclock is provided, call it here;
 		 * ~~16 Hz is best
 		 */
 		if (schedhz == 0) {
-			if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-			    0)
+			if ((++spc->spc_schedticks & 0x3f) == 0)
 				schedclock(p);
 		}
 	}
Index: arch/amd64/isa/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.25
diff -u -p -r1.25 clock.c
--- arch/amd64/isa/clock.c	11 Aug 2017 21:18:11 -0000	1.25
+++ arch/amd64/isa/clock.c	14 Aug 2017 17:19:35 -0000
@@ -303,8 +303,8 @@ rtcdrain(void *v)
 void
 i8254_initclocks(void)
 {
-	stathz = 128;
-	profhz = 1024;
+	stathz = 1024;
+	profhz = 8192;
 
 	isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
 	    0, "clock");
@@ -321,7 +321,7 @@ rtcstart(void)
 {
 	static struct timeout rtcdrain_timeout;
 
-	mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
+	mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
 	mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
 
 	/*
@@ -577,10 +577,10 @@ setstatclockrate(int arg)
 	if (initclock_func == i8254_initclocks) {
 		if (arg == stathz)
 			mc146818_write(NULL, MC_REGA,
-			    MC_BASE_32_KHz | MC_RATE_128_Hz);
+			    MC_BASE
Re: Please test: HZ bump
Ted Unangst wrote: > Martin Pieuchot wrote: > > I'd like to improve the fairness of the scheduler, with the goal of > > mitigating userland starvations. For that the kernel needs to have > > a better understanding of the amount of executed time per task. > > > > The smallest interval currently usable on all our architectures for > > such accounting is a tick. With the current HZ value of 100, this > > smallest interval is 10ms. I'd like to bump this value to 1000. > > Maybe we want this too, for sh? This looks like accidental netbsd copying. Or > are we intentionally resetting hz on sh for some reason? apparently yes because the clock only works at 64hz. is the conf file a better place for that, instead of having two separate ifndef initializers with different values? that troubles me, even if it seems to work. just define HZ=64 in the right place. Index: arch/landisk/conf/GENERIC === RCS file: /cvs/src/sys/arch/landisk/conf/GENERIC,v retrieving revision 1.51 diff -u -p -r1.51 GENERIC --- arch/landisk/conf/GENERIC 28 Jun 2016 04:41:37 - 1.51 +++ arch/landisk/conf/GENERIC 14 Aug 2017 20:56:29 - @@ -21,6 +21,8 @@ optionPCLOCK= # 33.33MHz clo option DONT_INIT_BSC #optionDONT_INIT_PCIBSC +option HZ=64 + option PCIVERBOSE option USER_PCICONF# user-space PCI configuration option USBVERBOSE > > > Index: arch/sh/sh/clock.c > === > RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v > retrieving revision 1.9 > diff -u -p -r1.9 clock.c > --- arch/sh/sh/clock.c5 Mar 2016 17:16:33 - 1.9 > +++ arch/sh/sh/clock.c14 Aug 2017 20:49:31 - > @@ -47,9 +47,6 @@ > > #define NWDOG 0 > > -#ifndef HZ > -#define HZ 64 > -#endif > #define MINYEAR 2002/* "today" */ > #define SH_RTC_CLOCK16384 /* Hz */ > > @@ -231,10 +228,6 @@ cpu_initclocks(void) > { > if (sh_clock.pclock == 0) > panic("No PCLOCK information."); > - > - /* Set global variables. */ > - hz = HZ; > - tick = 100 / hz; > > /* >* Use TMU channel 0 as hard clock >
Re: Please test: HZ bump
Martin Pieuchot wrote:
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations. For that the kernel needs to have
> a better understanding of the amount of executed time per task.
>
> The smallest interval currently usable on all our architectures for
> such accounting is a tick. With the current HZ value of 100, this
> smallest interval is 10ms. I'd like to bump this value to 1000.

Maybe we want this too, for sh? This looks like accidental netbsd
copying. Or are we intentionally resetting hz on sh for some reason?

Index: arch/sh/sh/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
retrieving revision 1.9
diff -u -p -r1.9 clock.c
--- arch/sh/sh/clock.c	5 Mar 2016 17:16:33 -0000	1.9
+++ arch/sh/sh/clock.c	14 Aug 2017 20:49:31 -0000
@@ -47,9 +47,6 @@
 
 #define	NWDOG	0
 
-#ifndef HZ
-#define	HZ	64
-#endif
 #define	MINYEAR	2002	/* "today" */
 #define	SH_RTC_CLOCK	16384	/* Hz */
 
@@ -231,10 +228,6 @@ cpu_initclocks(void)
 {
 	if (sh_clock.pclock == 0)
 		panic("No PCLOCK information.");
-
-	/* Set global variables. */
-	hz = HZ;
-	tick = 1000000 / hz;
 
 	/*
 	 * Use TMU channel 0 as hard clock
Re: Please test: HZ bump
> Date: Mon, 14 Aug 2017 16:06:51 -0400
> From: Martin Pieuchot
>
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations. For that the kernel needs to have
> a better understanding of the amount of executed time per task.
>
> The smallest interval currently usable on all our architectures for
> such accounting is a tick. With the current HZ value of 100, this
> smallest interval is 10ms. I'd like to bump this value to 1000.
>
> The diff below intentionally bump other `hz' value to keep current
> ratios. We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz. However this
> will be part of a later diff.
>
> I'd be really interested in test reports. mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
>
> Comments, oks?

Need to look at this a bit more carefully but:

> Index: conf/param.c
> ===================================================================
> RCS file: /cvs/src/sys/conf/param.c,v
> retrieving revision 1.37
> diff -u -p -r1.37 param.c
> --- conf/param.c	6 May 2016 19:45:35 -0000	1.37
> +++ conf/param.c	14 Aug 2017 17:03:23 -0000
> @@ -76,7 +76,7 @@
>  # define DST 0
>  #endif
>  #ifndef HZ
> -#define	HZ	100
> +#define	HZ	1000
>  #endif
>  int	hz = HZ;
>  int	tick = 1000000 / HZ;
> Index: kern/kern_clock.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 kern_clock.c
> --- kern/kern_clock.c	22 Jul 2017 14:33:45 -0000	1.93
> +++ kern/kern_clock.c	14 Aug 2017 19:50:49 -0000
> @@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
>  	if (p != NULL) {
>  		p->p_cpticks++;
>  		/*
> -		 * If no schedclock is provided, call it here at ~~12-25 Hz;
> +		 * If no schedclock is provided, call it here;
>  		 * ~~16 Hz is best
>  		 */
>  		if (schedhz == 0) {
> -			if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> -			    0)
> +			if ((spc->spc_schedticks & 0x3f) == 0)

That ++ should not be dropped, should it?
Please test: HZ bump
I'd like to improve the fairness of the scheduler, with the goal of
mitigating userland starvations. For that the kernel needs to have
a better understanding of the amount of executed time per task.

The smallest interval currently usable on all our architectures for
such accounting is a tick. With the current HZ value of 100, this
smallest interval is 10ms. I'd like to bump this value to 1000.

The diff below intentionally bumps other `hz' values to keep current
ratios. We certainly want to call schedclock(), or a similar time
accounting function, at a higher frequency than 16 Hz. However this
will be part of a later diff.

I'd be really interested in test reports. mlarkin@ raised a good
question: is your battery lifetime shorter with this diff?

Comments, oks?

Index: conf/param.c
===================================================================
RCS file: /cvs/src/sys/conf/param.c,v
retrieving revision 1.37
diff -u -p -r1.37 param.c
--- conf/param.c	6 May 2016 19:45:35 -0000	1.37
+++ conf/param.c	14 Aug 2017 17:03:23 -0000
@@ -76,7 +76,7 @@
 # define DST 0
 #endif
 #ifndef HZ
-#define	HZ	100
+#define	HZ	1000
 #endif
 int	hz = HZ;
 int	tick = 1000000 / HZ;
Index: kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.93
diff -u -p -r1.93 kern_clock.c
--- kern/kern_clock.c	22 Jul 2017 14:33:45 -0000	1.93
+++ kern/kern_clock.c	14 Aug 2017 19:50:49 -0000
@@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
 	if (p != NULL) {
 		p->p_cpticks++;
 		/*
-		 * If no schedclock is provided, call it here at ~~12-25 Hz;
+		 * If no schedclock is provided, call it here;
 		 * ~~16 Hz is best
 		 */
 		if (schedhz == 0) {
-			if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-			    0)
+			if ((spc->spc_schedticks & 0x3f) == 0)
 				schedclock(p);
 		}
 	}
Index: arch/amd64/isa/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.25
diff -u -p -r1.25 clock.c
--- arch/amd64/isa/clock.c	11 Aug 2017 21:18:11 -0000	1.25
+++ arch/amd64/isa/clock.c	14 Aug 2017 17:19:35 -0000
@@ -303,8 +303,8 @@ rtcdrain(void *v)
 void
 i8254_initclocks(void)
 {
-	stathz = 128;
-	profhz = 1024;
+	stathz = 1024;
+	profhz = 8192;
 
 	isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
 	    0, "clock");
@@ -321,7 +321,7 @@ rtcstart(void)
 {
 	static struct timeout rtcdrain_timeout;
 
-	mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
+	mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
 	mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
 
 	/*
@@ -577,10 +577,10 @@ setstatclockrate(int arg)
 	if (initclock_func == i8254_initclocks) {
 		if (arg == stathz)
 			mc146818_write(NULL, MC_REGA,
-			    MC_BASE_32_KHz | MC_RATE_128_Hz);
+			    MC_BASE_32_KHz | MC_RATE_1024_Hz);
 		else
 			mc146818_write(NULL, MC_REGA,
-			    MC_BASE_32_KHz | MC_RATE_1024_Hz);
+			    MC_BASE_32_KHz | MC_RATE_8192_Hz);
 	}
 }
 
Index: arch/armv7/omap/dmtimer.c
===================================================================
RCS file: /cvs/src/sys/arch/armv7/omap/dmtimer.c,v
retrieving revision 1.6
diff -u -p -r1.6 dmtimer.c
--- arch/armv7/omap/dmtimer.c	22 Jan 2015 14:33:01 -0000	1.6
+++ arch/armv7/omap/dmtimer.c	14 Aug 2017 17:16:01 -0000
@@ -296,8 +296,8 @@ dmtimer_cpu_initclocks()
 {
 	struct dmtimer_softc	*sc = dmtimer_cd.cd_devs[1];
 
-	stathz = 128;
-	profhz = 1024;
+	stathz = 1024;
+	profhz = 8192;
 
 	sc->sc_ticks_per_second = TIMER_FREQUENCY; /* 32768 */
Index: arch/armv7/omap/gptimer.c
===================================================================
RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
retrieving revision 1.4
diff -u -p -r1.4 gptimer.c
--- arch/armv7/omap/gptimer.c	20 Jun 2014 14:08:11 -0000	1.4
+++ arch/armv7/omap/gptimer.c	14 Aug 2017 17:15:44 -0000
@@ -283,8 +283,8 @@ void
 gptimer_cpu_initclocks()
 {
 //	u_int32_t now;
-	stathz = 128;
-	profhz = 1024;
+	stathz = 1024;
+	profhz = 8192;
 
 	ticks_per_second = TIMER_FREQUENCY;
Index: arch/armv7/sunxi/sxitimer.c
===================================================================
RCS file: /cvs/src/sys/arch/armv7/sunxi/sxitimer.c,v
retrieving revision 1.10
diff -u -p -r1.10 sxitimer.c
--- arch/armv7/sunxi/sxitimer.c	21 Jan 2017 08:26:49 -0000	1.10
+++ arch/arm