Re: Please test: HZ bump

2018-12-25 Thread Mike Larkin
On Tue, Dec 25, 2018 at 06:37:03PM -0200, Martin Pieuchot wrote:
> On 24/12/18(Mon) 20:07, Scott Cheloha wrote:
> > On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote:
> > > On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot  wrote:
> > > >
> > > > I'd like to improve the fairness of the scheduler, with the goal of
> > > > mitigating userland starvations.  For that the kernel needs to have
> > > > a better understanding of the amount of executed time per task.
> > > >
> > > > The smallest interval currently usable on all our architectures for
> > > > such accounting is a tick.  With the current HZ value of 100, this
> > > > smallest interval is 10ms.  I'd like to bump this value to 1000.
> > > >
> > > > The diff below intentionally bumps other `hz' values to keep current
> > > > ratios.  We certainly want to call schedclock(), or a similar time
> > > > accounting function, at a higher frequency than 16 Hz.  However this
> > > > will be part of a later diff.
> > > >
> > > > I'd be really interested in test reports.  mlarkin@ raised a good
> > > > question: is your battery lifetime shorter with this diff?
> > > >
> > > [...] 
> > > I'd like to see more folks test and other devs to share their
> > > thoughts: What are the risks associated with bumping HZ globally?
> > > Drawbacks? Reasons for hesitation?
> > 
> > In general I'd like to reduce wakeup latency as well.  Raising HZ is an
> > obvious route to achieving that.  But I think there are a couple things
> > that need to be addressed before it would be reasonable.  The things that
> > come to mind for me are:
> > 
> >  - A tick is a 32-bit signed integer on all platforms.  If HZ=100, we
> >can represent at most ~248 days in ticks.  This is plenty.  If HZ=1000,
> >we now only have ~24.8 days.  Some may disagree, but I don't think this
> >is enough.
> 
> Why do you think it isn't enough?
> 
> >One possible solution is to make ticks 64-bit.  This addresses the
> >timeout length issue at a cost to 32-bit platforms that I cannot
> >quantify without lots of testing: what is the overhead of using 64-bit
> >arithmetic on a 32-bit machine for all timeouts?
> > 
> >A compromise is to make ticks a long.  kettenis mentioned this
> >possibility in a commit [1] some time back.  This would allow 64-bit
> >platforms to raise HZ without crippling timeout ranges.  But then you
> >have ticks of different sizes on different platforms, which could be a
> >headache, I imagine.
> 
> Note that we had, and certainly still have, tick-wrapping bugs in the
> kernel :)  
> 
> >(maybe there are other solutions?)
> 
> Solution to what?
> 
> >  - How does an OpenBSD guest on vmd(8) behave when HZ=1000?  Multiple such
> >guests on vmd(8)?  Such guests on other hypervisors?
> > 
> >  - The replies in this thread don't indicate any effect on battery life or
> >power consumption but I find it hard to believe that raising HZ has no
> >impact on such things.  Bumping HZ like this *must* increase CPU 
> > utilization.
> >What is the cost in watt-hours?
> 
> It depends on the machine.  But that's one of the reasons I dropped the
> bump.
> 
> >  - Can smaller machines even handle HZ=1000?  Linux experimented with this
> >over a decade ago and settled on a default HZ=250 for i386 [2].  I don't
> >know how it all shook out, but my guess is that they didn't revert from
> >1000 -> 250 for no reason at all.  Of course, FreeBSD went ahead with 
> > 1000
> >on i386, so opinions differ.
> 
> Indeed, we still support architectures that can't handle an HZ of 1000.
> 
> >  - How does this affect e.g. packet throughput on smaller machines?  I think
> >bigger boxes on amd64 would be fine, but I wonder if throughput would 
> > take
> >a noticeable hit on a smaller router.
> 
> Some measurements indicated a drop of 10% in packet forwarding on some
> machines and no difference on others. 
> 
> > And then... can we reduce wakeup latency in general without raising HZ?  
> > Other
> > systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  
> > What
> > are they doing?  Can we borrow it?
> 
> I haven't looked at other systems like DragonFly, but since you seem
> interested in improving that area, here's my story.  I didn't look at
> wakeup latencies.  I don't know why you're after that.  Instead I
> focused on `schedhz' and schedclock().  I landed there after observing
> that with a high number of threads in "running" state (an active
> browser while making a build), work was badly distributed amongst CPUs.
> Some per-CPU queues were growing and others stayed empty.
> 
> CPUs have runqueues that are selected based on per-thread `p_priority'.
> What this field represents today is confusing.  Many changes since
> the original scheduler design, including hardware improvements, side-
> effects and developer mistakes, make it more confusing.  However bumping
> HZ improves the placement of "running" threads in per-CPU runqueues.

Re: Please test: HZ bump

2018-12-25 Thread Martin Pieuchot
On 24/12/18(Mon) 20:07, Scott Cheloha wrote:
> On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote:
> > On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot  wrote:
> > >
> > > I'd like to improve the fairness of the scheduler, with the goal of
> > > mitigating userland starvations.  For that the kernel needs to have
> > > a better understanding of the amount of executed time per task.
> > >
> > > The smallest interval currently usable on all our architectures for
> > > such accounting is a tick.  With the current HZ value of 100, this
> > > smallest interval is 10ms.  I'd like to bump this value to 1000.
> > >
> > > The diff below intentionally bumps other `hz' values to keep current
> > > ratios.  We certainly want to call schedclock(), or a similar time
> > > accounting function, at a higher frequency than 16 Hz.  However this
> > > will be part of a later diff.
> > >
> > > I'd be really interested in test reports.  mlarkin@ raised a good
> > > question: is your battery lifetime shorter with this diff?
> > >
> > [...] 
> > I'd like to see more folks test and other devs to share their
> > thoughts: What are the risks associated with bumping HZ globally?
> > Drawbacks? Reasons for hesitation?
> 
> In general I'd like to reduce wakeup latency as well.  Raising HZ is an
> obvious route to achieving that.  But I think there are a couple things
> that need to be addressed before it would be reasonable.  The things that
> come to mind for me are:
> 
>  - A tick is a 32-bit signed integer on all platforms.  If HZ=100, we
>can represent at most ~248 days in ticks.  This is plenty.  If HZ=1000,
>we now only have ~24.8 days.  Some may disagree, but I don't think this
>is enough.

Why do you think it isn't enough?

>One possible solution is to make ticks 64-bit.  This addresses the
>timeout length issue at a cost to 32-bit platforms that I cannot
>quantify without lots of testing: what is the overhead of using 64-bit
>arithmetic on a 32-bit machine for all timeouts?
> 
>A compromise is to make ticks a long.  kettenis mentioned this
>possibility in a commit [1] some time back.  This would allow 64-bit
>platforms to raise HZ without crippling timeout ranges.  But then you
>have ticks of different sizes on different platforms, which could be a
>headache, I imagine.

Note that we had, and certainly still have, tick-wrapping bugs in the
kernel :)  

>(maybe there are other solutions?)

Solution to what?

>  - How does an OpenBSD guest on vmd(8) behave when HZ=1000?  Multiple such
>guests on vmd(8)?  Such guests on other hypervisors?
> 
>  - The replies in this thread don't indicate any effect on battery life or
>power consumption but I find it hard to believe that raising HZ has no
>impact on such things.  Bumping HZ like this *must* increase CPU 
> utilization.
>What is the cost in watt-hours?

It depends on the machine.  But that's one of the reasons I dropped the
bump.

>  - Can smaller machines even handle HZ=1000?  Linux experimented with this
>over a decade ago and settled on a default HZ=250 for i386 [2].  I don't
>know how it all shook out, but my guess is that they didn't revert from
>1000 -> 250 for no reason at all.  Of course, FreeBSD went ahead with 1000
>on i386, so opinions differ.

Indeed, we still support architectures that can't handle an HZ of 1000.

>  - How does this affect e.g. packet throughput on smaller machines?  I think
>bigger boxes on amd64 would be fine, but I wonder if throughput would take
>a noticeable hit on a smaller router.

Some measurements indicated a drop of 10% in packet forwarding on some
machines and no difference on others. 

> And then... can we reduce wakeup latency in general without raising HZ?  Other
> systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  What
> are they doing?  Can we borrow it?

I haven't looked at other systems like DragonFly, but since you seem
interested in improving that area, here's my story.  I didn't look at
wakeup latencies.  I don't know why you're after that.  Instead I
focused on `schedhz' and schedclock().  I landed there after observing
that with a high number of threads in "running" state (an active
browser while making a build), work was badly distributed amongst CPUs.
Some per-CPU queues were growing and others stayed empty.

CPUs have runqueues that are selected based on per-thread `p_priority'.
What this field represents today is confusing.  Many changes since
the original scheduler design, including hardware improvements, side-
effects and developer mistakes, make it more confusing.  However bumping
HZ improves the placement of "running" threads in per-CPU runqueues.

I spent a lot of time trying to observe and understand why.  I don't
remember the details but came to the conclusion that `p_priority' was
fresher.  In other words the kernel has more up-to-date information to
make choices.

However it became clear to me that t

Re: Please test: HZ bump

2018-12-25 Thread Henri Kemppainen
> And then... can we reduce wakeup latency in general without raising HZ?  Other
> systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  What
> are they doing?  Can we borrow it?

https://frenchfries.net/paul/dfly/nanosleep.html

OpenBSD is still adding that one tick which results in a (typical) sleep
duration of no less than about 20ms.
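
An easy way to see that floor from userland is to time a tiny sleep.
Rough, untested sketch, plain POSIX, nothing OpenBSD-specific assumed;
with HZ=100 and the extra tick above it prints roughly 20ms:

/*
 * Rough sketch: time how long a nominal 1 microsecond sleep really
 * takes.  The result is bounded below by the tick-based rounding
 * (plus the extra tick) described above.
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
    struct timespec a, b;
    int i;

    for (i = 0; i < 5; i++) {
        clock_gettime(CLOCK_MONOTONIC, &a);
        usleep(1);                      /* ask for 1us */
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("slept %.3f ms\n",
            (b.tv_sec - a.tv_sec) * 1e3 +
            (b.tv_nsec - a.tv_nsec) / 1e6);
    }
    return 0;
}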



Re: Please test: HZ bump

2018-12-24 Thread Ted Unangst
Scott Cheloha wrote:
>  - A tick is a 32-bit signed integer on all platforms.  If HZ=100, we
>can represent at most ~248 days in ticks.  This is plenty.  If HZ=1000,
>we now only have ~24.8 days.  Some may disagree, but I don't think this
>is enough.

So the question is what happens when a timeout fires early?

Kernel code we control, and we can fix that. By inspection, it seems nanosleep
is already non-compliant since posix says it may not return early. We'd need
to add a loop or something. And review other userland interfaces to timeouts.
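
The loop could look roughly like this (untested sketch, plain POSIX):
re-check a monotonic clock and sleep again for whatever remains, so an
early firing never shortens the requested interval.

/*
 * Sketch of the "add a loop" idea: if the timeout can fire a tick
 * early, re-check the monotonic clock and sleep again for the
 * remainder until the requested interval has really elapsed.
 */
#include <errno.h>
#include <time.h>

int
sleep_at_least(const struct timespec *req)
{
    struct timespec now, end, left;

    clock_gettime(CLOCK_MONOTONIC, &end);
    end.tv_sec += req->tv_sec;
    end.tv_nsec += req->tv_nsec;
    if (end.tv_nsec >= 1000000000L) {
        end.tv_sec++;
        end.tv_nsec -= 1000000000L;
    }
    for (;;) {
        clock_gettime(CLOCK_MONOTONIC, &now);
        left.tv_sec = end.tv_sec - now.tv_sec;
        left.tv_nsec = end.tv_nsec - now.tv_nsec;
        if (left.tv_nsec < 0) {
            left.tv_sec--;
            left.tv_nsec += 1000000000L;
        }
        if (left.tv_sec < 0)
            return 0;                   /* interval has elapsed */
        if (nanosleep(&left, NULL) == -1 && errno != EINTR)
            return -1;                  /* real error */
    }
}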

>A compromise is to make ticks a long.  kettenis mentioned this
>possibility in a commit [1] some time back.  This would allow 64-bit
>platforms to raise HZ without crippling timeout ranges.  But then you
>have ticks of different sizes on different platforms, which could be a
>headache, I imagine.

If we have learned anything from off_t and time_t, it's that such splits cause
a lot of ongoing difficulty.

>  - How does an OpenBSD guest on vmd(8) behave when HZ=1000?  Multiple such
>guests on vmd(8)?  Such guests on other hypervisors?

If the host is HZ=1000 and the guest is HZ=100, time keeping works much
better. :)

> And then... can we reduce wakeup latency in general without raising HZ?  Other
> systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  What
> are they doing?  Can we borrow it?

Ideally, yes. The lapic can be programmed to fire one shot with a much shorter
duration, but this quickly gets complicated with coalescing, etc.
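
The coalescing part is roughly this (untested sketch; arm_oneshot() is a
hypothetical stand-in for whatever would actually program the lapic):
round every deadline up to a shared quantum and only reprogram when the
new deadline is earlier than the one already armed.

/*
 * Sketch: coalesce one-shot deadlines by rounding them up to a common
 * quantum, so timeouts due close together share one interrupt.
 * arm_oneshot() is hypothetical, standing in for the code that would
 * actually program the lapic (or any other one-shot timer).
 */
#include <stdint.h>

#define COALESCE_NS     1000000ULL      /* group wakeups into 1ms slots */

void arm_oneshot(uint64_t deadline_ns); /* hypothetical hardware hook */

static uint64_t armed_ns;               /* currently programmed deadline */

void
schedule_wakeup(uint64_t deadline_ns)
{
    /* Round the deadline up to the next coalescing boundary. */
    uint64_t slot = (deadline_ns + COALESCE_NS - 1) / COALESCE_NS;
    uint64_t rounded = slot * COALESCE_NS;

    /* Reprogram only if this deadline is earlier than the armed one. */
    if (armed_ns == 0 || rounded < armed_ns) {
        armed_ns = rounded;
        arm_oneshot(rounded);
    }
}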



Re: Please test: HZ bump

2018-12-24 Thread Scott Cheloha
On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote:
> On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot  wrote:
> >
> > I'd like to improve the fairness of the scheduler, with the goal of
> > mitigating userland starvations.  For that the kernel needs to have
> > a better understanding of the amount of executed time per task.
> >
> > The smallest interval currently usable on all our architectures for
> > such accounting is a tick.  With the current HZ value of 100, this
> > smallest interval is 10ms.  I'd like to bump this value to 1000.
> >
> > > The diff below intentionally bumps other `hz' values to keep current
> > ratios.  We certainly want to call schedclock(), or a similar time
> > accounting function, at a higher frequency than 16 Hz.  However this
> > will be part of a later diff.
> >
> > I'd be really interested in test reports.  mlarkin@ raised a good
> > question: is your battery lifetime shorter with this diff?
> >
> > Comments, oks?
> 
> I'd like to revisit this patch. It makes our armv7 platform more
> usable for what it is meant to do, i.e. be a microcontroller. I
> imagine on other platforms it would accrue similar benefits as well.
> 
> I've tested this patch and found delightfully proportional results.
> Currently, at HZ = 100, the minimum latency for a sleep call from
> userspace is about 10ms:
> 
> https://ce.gl/baseline.jpg
> 
> After the patch, which bumps HZ from 100 --> 1000, we see a tenfold
> decrease in this latency:
> 
> https://ce.gl/with-mpi-hz-patch.jpg
> 
> This signal is generated with gpio(4) ioctl calls from userspace,
> e.g.: for(;;) { HI(pin); usleep(1); LO(pin); usleep(1); }
> 
> I'd like to see more folks test and other devs to share their
> thoughts: What are the risks associated with bumping HZ globally?
> Drawbacks? Reasons for hesitation?

In general I'd like to reduce wakeup latency as well.  Raising HZ is an
obvious route to achieving that.  But I think there are a couple things
that need to be addressed before it would be reasonable.  The things that
come to mind for me are:

 - A tick is a 32-bit signed integer on all platforms.  If HZ=100, we
   can represent at most ~248 days in ticks.  This is plenty.  If HZ=1000,
   we now only have ~24.8 days.  Some may disagree, but I don't think this
   is enough.

   One possible solution is to make ticks 64-bit.  This addresses the
   timeout length issue at a cost to 32-bit platforms that I cannot
   quantify without lots of testing: what is the overhead of using 64-bit
   arithmetic on a 32-bit machine for all timeouts?

   A compromise is to make ticks a long.  kettenis mentioned this
   possibility in a commit [1] some time back.  This would allow 64-bit
   platforms to raise HZ without crippling timeout ranges.  But then you
   have ticks of different sizes on different platforms, which could be a
   headache, I imagine.

   (maybe there are other solutions?  The arithmetic behind these
   numbers is sketched below, after this list.)

 - How does an OpenBSD guest on vmd(8) behave when HZ=1000?  Multiple such
   guests on vmd(8)?  Such guests on other hypervisors?

 - The replies in this thread don't indicate any effect on battery life or
   power consumption but I find it hard to believe that raising HZ has no
   impact on such things.  Bumping HZ like this *must* increase CPU utilization.
   What is the cost in watt-hours?

 - Can smaller machines even handle HZ=1000?  Linux experimented with this
   over a decade ago and settled on a default HZ=250 for i386 [2].  I don't
   know how it all shook out, but my guess is that they didn't revert from
   1000 -> 250 for no reason at all.  Of course, FreeBSD went ahead with 1000
   on i386, so opinions differ.

 - How does this affect e.g. packet throughput on smaller machines?  I think
   bigger boxes on amd64 would be fine, but I wonder if throughput would take
   a noticeable hit on a smaller router.
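
To put numbers on the first point above, here's the throwaway arithmetic
(INT32_MAX ticks divided by HZ, shown in days):

/*
 * Throwaway sketch: how long until a signed 32-bit tick counter wraps,
 * for a few HZ values (INT32_MAX ticks / HZ seconds, printed in days).
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    const int hzvals[] = { 100, 250, 1000 };
    size_t i;

    for (i = 0; i < sizeof(hzvals) / sizeof(hzvals[0]); i++)
        printf("HZ=%4d -> wraps after %.1f days\n", hzvals[i],
            (double)INT32_MAX / hzvals[i] / 86400.0);
    return 0;
}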

And then... can we reduce wakeup latency in general without raising HZ?  Other
systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  What
are they doing?  Can we borrow it?

--

Sorry for the length.  In short, you should be fine compiling custom kernels
for your controllers with HZ=1000; you shouldn't see any ill effects for that
use case.  But making it the default, even for select platforms, needs more
planning.

-Scott

[1] 
http://cvsweb.openbsd.org/src/sys/kern/kern_clock.c?rev=1.93&content-type=text/x-cvsweb-markup

[2] http://man7.org/linux/man-pages/man7/time.7.html

> > Index: conf/param.c
> > ===
> > RCS file: /cvs/src/sys/conf/param.c,v
> > retrieving revision 1.37
> > diff -u -p -r1.37 param.c
> > --- conf/param.c6 May 2016 19:45:35 -   1.37
> > +++ conf/param.c14 Aug 2017 17:03:23 -
> > @@ -76,7 +76,7 @@
> >  # define DST 0
> >  #endif
> >  #ifndef HZ
> > -#define HZ 100
> > +#define HZ 1000
> >  #endif
> >  int hz = HZ;
> >  int tick = 1000000 / HZ;
> > Index: kern/kern_clock.c
> > =

Re: Please test: HZ bump

2018-12-18 Thread Ian Sutton
On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot  wrote:
>
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations.  For that the kernel needs to have
> a better understanding of the amount of executed time per task.
>
> The smallest interval currently usable on all our architectures for
> such accounting is a tick.  With the current HZ value of 100, this
> smallest interval is 10ms.  I'd like to bump this value to 1000.
>
> > The diff below intentionally bumps other `hz' values to keep current
> ratios.  We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz.  However this
> will be part of a later diff.
>
> I'd be really interested in test reports.  mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
>
> Comments, oks?
>

I'd like to revisit this patch. It makes our armv7 platform more
usable for what it is meant to do, i.e. be a microcontroller. I
imagine on other platforms it would accrue similar benefits as well.

I've tested this patch and found delightfully proportional results.
Currently, at HZ = 100, the minimum latency for a sleep call from
userspace is about 10ms:

https://ce.gl/baseline.jpg

After the patch, which bumps HZ from 100 --> 1000, we see a tenfold
decrease in this latency:

https://ce.gl/with-mpi-hz-patch.jpg

This signal is generated with gpio(4) ioctl calls from userspace,
e.g.: for(;;) { HI(pin); usleep(1); LO(pin); usleep(1); }
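
Spelled out a bit, the test is roughly the following (untested sketch;
it assumes the gpio(4) GPIOPINWRITE ioctl and struct gpio_pin_op from
the manpage, the device path and pin number are placeholders, and the
pin has to be configured first, e.g. with gpioctl(8)):

/*
 * Untested sketch of the square-wave test described above: toggle a
 * gpio(4) pin as fast as usleep(1) allows.  The period visible on a
 * scope is bounded below by the kernel's tick-based sleep granularity,
 * so it shrinks roughly tenfold when HZ goes from 100 to 1000.
 * "/dev/gpio0" and pin 4 are placeholders.
 */
#include <sys/types.h>
#include <sys/gpio.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    struct gpio_pin_op op;
    int fd = open("/dev/gpio0", O_RDWR);

    if (fd == -1)
        return 1;
    memset(&op, 0, sizeof(op));
    op.gp_pin = 4;                      /* placeholder pin number */
    for (;;) {
        op.gp_value = GPIO_PIN_HIGH;
        ioctl(fd, GPIOPINWRITE, &op);
        usleep(1);                      /* actual floor is tick-based */
        op.gp_value = GPIO_PIN_LOW;
        ioctl(fd, GPIOPINWRITE, &op);
        usleep(1);
    }
}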

I'd like to see more folks test and other devs to share their
thoughts: What are the risks associated with bumping HZ globally?
Drawbacks? Reasons for hesitation?

Thanks,
Ian Sutton



> Index: conf/param.c
> ===
> RCS file: /cvs/src/sys/conf/param.c,v
> retrieving revision 1.37
> diff -u -p -r1.37 param.c
> --- conf/param.c6 May 2016 19:45:35 -   1.37
> +++ conf/param.c14 Aug 2017 17:03:23 -
> @@ -76,7 +76,7 @@
>  # define DST 0
>  #endif
>  #ifndef HZ
> -#define HZ 100
> +#define HZ 1000
>  #endif
>  int hz = HZ;
>  int tick = 1000000 / HZ;
> Index: kern/kern_clock.c
> ===
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 kern_clock.c
> --- kern/kern_clock.c   22 Jul 2017 14:33:45 -  1.93
> +++ kern/kern_clock.c   14 Aug 2017 19:50:49 -
> @@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
> if (p != NULL) {
> p->p_cpticks++;
> /*
> -* If no schedclock is provided, call it here at ~~12-25 Hz;
> +* If no schedclock is provided, call it here;
>  * ~~16 Hz is best
>  */
> if (schedhz == 0) {
> -   if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> -   0)
> +   if ((spc->spc_schedticks & 0x3f) == 0)
> schedclock(p);
> }
> }
> Index: arch/amd64/isa/clock.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 clock.c
> --- arch/amd64/isa/clock.c  11 Aug 2017 21:18:11 -  1.25
> +++ arch/amd64/isa/clock.c  14 Aug 2017 17:19:35 -
> @@ -303,8 +303,8 @@ rtcdrain(void *v)
>  void
>  i8254_initclocks(void)
>  {
> -   stathz = 128;
> -   profhz = 1024;
> +   stathz = 1024;
> +   profhz = 8192;
>
> isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
> 0, "clock");
> @@ -321,7 +321,7 @@ rtcstart(void)
>  {
> static struct timeout rtcdrain_timeout;
>
> -   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
> +   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
> mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
>
> /*
> @@ -577,10 +577,10 @@ setstatclockrate(int arg)
> if (initclock_func == i8254_initclocks) {
> if (arg == stathz)
> mc146818_write(NULL, MC_REGA,
> -   MC_BASE_32_KHz | MC_RATE_128_Hz);
> +   MC_BASE_32_KHz | MC_RATE_1024_Hz);
> else
> mc146818_write(NULL, MC_REGA,
> -   MC_BASE_32_KHz | MC_RATE_1024_Hz);
> +   MC_BASE_32_KHz | MC_RATE_8192_Hz);
> }
>  }
>
> Index: arch/armv7/omap/dmtimer.c
> ===
> RCS file: /cvs/src/sys/arch/armv7/omap/dmtimer.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 dmtimer.c
> --- arch/armv7/omap/dmtimer.c   22 Jan 2015 14:33:01 -  1.6
> +++ arch/armv7/omap/dmtimer.c   14 Aug 2017 17:16:01 -
> @@ -296,8 +296,8 @@ d

Re: Please test: HZ bump

2017-08-21 Thread Chris Cappuccio
I've been testing the second version of this diff in a number of areas
(servers, desktop, laptop, routers) and I haven't noticed anything interesting
with power usage, run time on the laptops, or anything else, anywhere.  That's
probably a good thing...



Re: Please test: HZ bump

2017-08-18 Thread Alexandre Ratchov
On Mon, Aug 14, 2017 at 04:06:51PM -0400, Martin Pieuchot wrote:
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations.  For that the kernel needs to have
> a better understanding of the amount of executed time per task. 
> 
> The smallest interval currently usable on all our architectures for
> such accounting is a tick.  With the current HZ value of 100, this
> smallest interval is 10ms.  I'd like to bump this value to 1000.
> 
> The diff below intentionally bumps other `hz' values to keep current
> ratios.  We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz.  However this
> will be part of a later diff.
> 
> I'd be really interested in test reports.  mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
> 
> Comments, oks?
> 

Slightly off-topic, but FYI: since around 2003 I've been running my everyday
desktop/music machines (and various laptops) with HZ=1024.  These
were first i386's and now mostly amd64's (this is needed by my MIDI
stuff).  Battery lifetime doesn't seem affected.

This didn't cause any problems, despite the fact that 1024 is a
multiple of the rtc tick rate, which in theory would cause
aliasing.

my 2 cents.



Re: Please test: HZ bump

2017-08-14 Thread Martin Pieuchot
On 14/08/17(Mon) 22:32, Mark Kettenis wrote:
> > Date: Mon, 14 Aug 2017 16:06:51 -0400
> > From: Martin Pieuchot 
> > 
> > I'd like to improve the fairness of the scheduler, with the goal of
> > mitigating userland starvations.  For that the kernel needs to have
> > a better understanding of the amount of executed time per task. 
> > 
> > The smallest interval currently usable on all our architectures for
> > such accounting is a tick.  With the current HZ value of 100, this
> > smallest interval is 10ms.  I'd like to bump this value to 1000.
> > 
> > > The diff below intentionally bumps other `hz' values to keep current
> > ratios.  We certainly want to call schedclock(), or a similar time
> > accounting function, at a higher frequency than 16 Hz.  However this
> > will be part of a later diff.
> > 
> > I'd be really interested in test reports.  mlarkin@ raised a good
> > question: is your battery lifetime shorter with this diff?
> > 
> > Comments, oks?
> 
> Need to look at this a bit more carefully but:
> 
> > Index: conf/param.c
> > ===
> > RCS file: /cvs/src/sys/conf/param.c,v
> > retrieving revision 1.37
> > diff -u -p -r1.37 param.c
> > --- conf/param.c6 May 2016 19:45:35 -   1.37
> > +++ conf/param.c14 Aug 2017 17:03:23 -
> > @@ -76,7 +76,7 @@
> >  # define DST 0
> >  #endif
> >  #ifndef HZ
> > -#define HZ 100
> > +#define HZ 1000
> >  #endif
> >  int hz = HZ;
> >  int tick = 1000000 / HZ;
> > Index: kern/kern_clock.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_clock.c,v
> > retrieving revision 1.93
> > diff -u -p -r1.93 kern_clock.c
> > --- kern/kern_clock.c   22 Jul 2017 14:33:45 -  1.93
> > +++ kern/kern_clock.c   14 Aug 2017 19:50:49 -
> > @@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
> > if (p != NULL) {
> > p->p_cpticks++;
> > /*
> > -* If no schedclock is provided, call it here at ~~12-25 Hz;
> > +* If no schedclock is provided, call it here;
> >  * ~~16 Hz is best
> >  */
> > if (schedhz == 0) {
> > -   if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> > -   0)
> > +   if ((spc->spc_schedticks & 0x3f) == 0)
> 
> That ++ should not be dropped, should it?

Indeed!


Index: conf/param.c
===
RCS file: /cvs/src/sys/conf/param.c,v
retrieving revision 1.37
diff -u -p -r1.37 param.c
--- conf/param.c6 May 2016 19:45:35 -   1.37
+++ conf/param.c14 Aug 2017 17:03:23 -
@@ -76,7 +76,7 @@
 # define DST 0
 #endif
 #ifndef HZ
-#define HZ 100
+#define HZ 1000
 #endif
 int hz = HZ;
 int tick = 1000000 / HZ;
Index: kern/kern_clock.c
===
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.93
diff -u -p -r1.93 kern_clock.c
--- kern/kern_clock.c   22 Jul 2017 14:33:45 -  1.93
+++ kern/kern_clock.c   14 Aug 2017 21:03:54 -
@@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
if (p != NULL) {
p->p_cpticks++;
/*
-* If no schedclock is provided, call it here at ~~12-25 Hz;
+* If no schedclock is provided, call it here;
 * ~~16 Hz is best
 */
if (schedhz == 0) {
-   if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-   0)
+   if ((++spc->spc_schedticks & 0x3f) == 0)
schedclock(p);
}
}
Index: arch/amd64/isa/clock.c
===
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.25
diff -u -p -r1.25 clock.c
--- arch/amd64/isa/clock.c  11 Aug 2017 21:18:11 -  1.25
+++ arch/amd64/isa/clock.c  14 Aug 2017 17:19:35 -
@@ -303,8 +303,8 @@ rtcdrain(void *v)
 void
 i8254_initclocks(void)
 {
-   stathz = 128;
-   profhz = 1024;
+   stathz = 1024;
+   profhz = 8192;
 
isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
0, "clock");
@@ -321,7 +321,7 @@ rtcstart(void)
 {
static struct timeout rtcdrain_timeout;
 
-   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
+   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
 
/*
@@ -577,10 +577,10 @@ setstatclockrate(int arg)
if (initclock_func == i8254_initclocks) {
if (arg == stathz)
mc146818_write(NULL, MC_REGA,
-   MC_BASE_32_KHz | MC_RATE_128_Hz);
+   MC_BASE

Re: Please test: HZ bump

2017-08-14 Thread Ted Unangst
Ted Unangst wrote:
> Martin Pieuchot wrote:
> > I'd like to improve the fairness of the scheduler, with the goal of
> > mitigating userland starvations.  For that the kernel needs to have
> > a better understanding of the amount of executed time per task. 
> > 
> > The smallest interval currently usable on all our architectures for
> > such accounting is a tick.  With the current HZ value of 100, this
> > smallest interval is 10ms.  I'd like to bump this value to 1000.
> 
> Maybe we want this too, for sh? This looks like accidental netbsd copying. Or
> are we intentionally resetting hz on sh for some reason?

apparently yes because the clock only works at 64hz. is the conf file a better
place for that, instead of having two separate ifndef initializers with
different values? that troubles me, even if it seems to work.

just define HZ=64 in the right place.

Index: arch/landisk/conf/GENERIC
===
RCS file: /cvs/src/sys/arch/landisk/conf/GENERIC,v
retrieving revision 1.51
diff -u -p -r1.51 GENERIC
--- arch/landisk/conf/GENERIC   28 Jun 2016 04:41:37 -  1.51
+++ arch/landisk/conf/GENERIC   14 Aug 2017 20:56:29 -
@@ -21,6 +21,8 @@ optionPCLOCK= # 33.33MHz clo
 option DONT_INIT_BSC
 #optionDONT_INIT_PCIBSC
 
+option HZ=64
+
 option PCIVERBOSE
 option USER_PCICONF    # user-space PCI configuration
 option USBVERBOSE


> 
> 
> Index: arch/sh/sh/clock.c
> ===
> RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
> retrieving revision 1.9
> diff -u -p -r1.9 clock.c
> --- arch/sh/sh/clock.c5 Mar 2016 17:16:33 -   1.9
> +++ arch/sh/sh/clock.c14 Aug 2017 20:49:31 -
> @@ -47,9 +47,6 @@
>  
>  #define  NWDOG 0
>  
> -#ifndef HZ
> -#define  HZ  64
> -#endif
>  #define  MINYEAR 2002   /* "today" */
>  #define  SH_RTC_CLOCK 16384     /* Hz */
>  
> @@ -231,10 +228,6 @@ cpu_initclocks(void)
>  {
>   if (sh_clock.pclock == 0)
>   panic("No PCLOCK information.");
> -
> - /* Set global variables. */
> - hz = HZ;
> - tick = 1000000 / hz;
>  
>   /*
>* Use TMU channel 0 as hard clock
> 



Re: Please test: HZ bump

2017-08-14 Thread Ted Unangst
Martin Pieuchot wrote:
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations.  For that the kernel needs to have
> a better understanding of the amount of executed time per task. 
> 
> The smallest interval currently usable on all our architectures for
> such accounting is a tick.  With the current HZ value of 100, this
> smallest interval is 10ms.  I'd like to bump this value to 1000.

Maybe we want this too, for sh? This looks like accidental netbsd copying. Or
are we intentionally resetting hz on sh for some reason?


Index: arch/sh/sh/clock.c
===
RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
retrieving revision 1.9
diff -u -p -r1.9 clock.c
--- arch/sh/sh/clock.c  5 Mar 2016 17:16:33 -   1.9
+++ arch/sh/sh/clock.c  14 Aug 2017 20:49:31 -
@@ -47,9 +47,6 @@
 
 #define NWDOG 0
 
-#ifndef HZ
-#define HZ  64
-#endif
 #define MINYEAR 2002    /* "today" */
 #define SH_RTC_CLOCK 16384      /* Hz */
 
@@ -231,10 +228,6 @@ cpu_initclocks(void)
 {
if (sh_clock.pclock == 0)
panic("No PCLOCK information.");
-
-   /* Set global variables. */
-   hz = HZ;
-   tick = 1000000 / hz;
 
/*
 * Use TMU channel 0 as hard clock



Re: Please test: HZ bump

2017-08-14 Thread Mark Kettenis
> Date: Mon, 14 Aug 2017 16:06:51 -0400
> From: Martin Pieuchot 
> 
> I'd like to improve the fairness of the scheduler, with the goal of
> mitigating userland starvations.  For that the kernel needs to have
> a better understanding of the amount of executed time per task. 
> 
> The smallest interval currently usable on all our architectures for
> such accounting is a tick.  With the current HZ value of 100, this
> smallest interval is 10ms.  I'd like to bump this value to 1000.
> 
> The diff below intentionally bumps other `hz' values to keep current
> ratios.  We certainly want to call schedclock(), or a similar time
> accounting function, at a higher frequency than 16 Hz.  However this
> will be part of a later diff.
> 
> I'd be really interested in test reports.  mlarkin@ raised a good
> question: is your battery lifetime shorter with this diff?
> 
> Comments, oks?

Need to look at this a bit more carefully but:

> Index: conf/param.c
> ===
> RCS file: /cvs/src/sys/conf/param.c,v
> retrieving revision 1.37
> diff -u -p -r1.37 param.c
> --- conf/param.c  6 May 2016 19:45:35 -   1.37
> +++ conf/param.c  14 Aug 2017 17:03:23 -
> @@ -76,7 +76,7 @@
>  # define DST 0
>  #endif
>  #ifndef HZ
> -#define  HZ 100
> +#define  HZ 1000
>  #endif
>  int  hz = HZ;
>  int  tick = 1000000 / HZ;
> Index: kern/kern_clock.c
> ===
> RCS file: /cvs/src/sys/kern/kern_clock.c,v
> retrieving revision 1.93
> diff -u -p -r1.93 kern_clock.c
> --- kern/kern_clock.c 22 Jul 2017 14:33:45 -  1.93
> +++ kern/kern_clock.c 14 Aug 2017 19:50:49 -
> @@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
>   if (p != NULL) {
>   p->p_cpticks++;
>   /*
> -  * If no schedclock is provided, call it here at ~~12-25 Hz;
> +  * If no schedclock is provided, call it here;
>* ~~16 Hz is best
>*/
>   if (schedhz == 0) {
> - if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
> - 0)
> + if ((spc->spc_schedticks & 0x3f) == 0)

That ++ should not be dropped, should it?



Please test: HZ bump

2017-08-14 Thread Martin Pieuchot
I'd like to improve the fairness of the scheduler, with the goal of
mitigating userland starvations.  For that the kernel needs to have
a better understanding of the amount of executed time per task. 

The smallest interval currently usable on all our architectures for
such accounting is a tick.  With the current HZ value of 100, this
smallest interval is 10ms.  I'd like to bump this value to 1000.

The diff below intentionally bumps other `hz' values to keep current
ratios.  We certainly want to call schedclock(), or a similar time
accounting function, at a higher frequency than 16 Hz.  However this
will be part of a later diff.

I'd be really interested in test reports.  mlarkin@ raised a good
question: is your battery lifetime shorter with this diff?

Comments, oks?

Index: conf/param.c
===
RCS file: /cvs/src/sys/conf/param.c,v
retrieving revision 1.37
diff -u -p -r1.37 param.c
--- conf/param.c6 May 2016 19:45:35 -   1.37
+++ conf/param.c14 Aug 2017 17:03:23 -
@@ -76,7 +76,7 @@
 # define DST 0
 #endif
 #ifndef HZ
-#define HZ 100
+#define HZ 1000
 #endif
 int hz = HZ;
 int tick = 1000000 / HZ;
Index: kern/kern_clock.c
===
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.93
diff -u -p -r1.93 kern_clock.c
--- kern/kern_clock.c   22 Jul 2017 14:33:45 -  1.93
+++ kern/kern_clock.c   14 Aug 2017 19:50:49 -
@@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
if (p != NULL) {
p->p_cpticks++;
/*
-* If no schedclock is provided, call it here at ~~12-25 Hz;
+* If no schedclock is provided, call it here;
 * ~~16 Hz is best
 */
if (schedhz == 0) {
-   if ((++curcpu()->ci_schedstate.spc_schedticks & 3) ==
-   0)
+   if ((spc->spc_schedticks & 0x3f) == 0)
schedclock(p);
}
}
Index: arch/amd64/isa/clock.c
===
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.25
diff -u -p -r1.25 clock.c
--- arch/amd64/isa/clock.c  11 Aug 2017 21:18:11 -  1.25
+++ arch/amd64/isa/clock.c  14 Aug 2017 17:19:35 -
@@ -303,8 +303,8 @@ rtcdrain(void *v)
 void
 i8254_initclocks(void)
 {
-   stathz = 128;
-   profhz = 1024;
+   stathz = 1024;
+   profhz = 8192;
 
isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
0, "clock");
@@ -321,7 +321,7 @@ rtcstart(void)
 {
static struct timeout rtcdrain_timeout;
 
-   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
+   mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
 
/*
@@ -577,10 +577,10 @@ setstatclockrate(int arg)
if (initclock_func == i8254_initclocks) {
if (arg == stathz)
mc146818_write(NULL, MC_REGA,
-   MC_BASE_32_KHz | MC_RATE_128_Hz);
+   MC_BASE_32_KHz | MC_RATE_1024_Hz);
else
mc146818_write(NULL, MC_REGA,
-   MC_BASE_32_KHz | MC_RATE_1024_Hz);
+   MC_BASE_32_KHz | MC_RATE_8192_Hz);
}
 }
 
Index: arch/armv7/omap/dmtimer.c
===
RCS file: /cvs/src/sys/arch/armv7/omap/dmtimer.c,v
retrieving revision 1.6
diff -u -p -r1.6 dmtimer.c
--- arch/armv7/omap/dmtimer.c   22 Jan 2015 14:33:01 -  1.6
+++ arch/armv7/omap/dmtimer.c   14 Aug 2017 17:16:01 -
@@ -296,8 +296,8 @@ dmtimer_cpu_initclocks()
 {
struct dmtimer_softc*sc = dmtimer_cd.cd_devs[1];
 
-   stathz = 128;
-   profhz = 1024;
+   stathz = 1024;
+   profhz = 8192;
 
sc->sc_ticks_per_second = TIMER_FREQUENCY; /* 32768 */
 
Index: arch/armv7/omap/gptimer.c
===
RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
retrieving revision 1.4
diff -u -p -r1.4 gptimer.c
--- arch/armv7/omap/gptimer.c   20 Jun 2014 14:08:11 -  1.4
+++ arch/armv7/omap/gptimer.c   14 Aug 2017 17:15:44 -
@@ -283,8 +283,8 @@ void
 gptimer_cpu_initclocks()
 {
 // u_int32_t now;
-   stathz = 128;
-   profhz = 1024;
+   stathz = 1024;
+   profhz = 8192;
 
ticks_per_second = TIMER_FREQUENCY;
 
Index: arch/armv7/sunxi/sxitimer.c
===
RCS file: /cvs/src/sys/arch/armv7/sunxi/sxitimer.c,v
retrieving revision 1.10
diff -u -p -r1.10 sxitimer.c
--- arch/armv7/sunxi/sxitimer.c 21 Jan 2017 08:26:49 -  1.10
+++ arch/arm