On Wed, Jun 21, 2023 at 04:40:59PM -0500, Scott Cheloha wrote:
> On Wed, Jun 21, 2023 at 01:03:03PM -0600, Chris Waddey wrote:
> > Sorry for breaking the thread; I wasn't subscribed to misc, but I found
> > this in the archives.
> > 
> > After some testing, it looks like the recent uvm_meter() commit is what
> > did this (to my machine at least).
> > 
> > The git commit for that is 71d823ace2523fb9fee2d1ab9b4d92a18d3f5714.
> > 
> > I compiled the commit right before it in the logs and booted with no
> > problems using a GENERIC.MP kernel config, but the uvm_meter() commit broke it.
> > 
> > I'm not as familiar with CVS, so apologies for not having the commit
> > from there.
> > 
> > Here is the commit message if that helps, though I think those on tech will
> > know it regardless:
> > 
> >     schedcpu, uvm_meter(9): make uvm_meter() an independent timeout
> >     
> >     uvm_meter(9) should not base its periodic uvm_loadav() call on the UTC
> >     clock.  It also no longer needs to periodically wake up proc0 because
> >     proc0 doesn't do any work.  schedcpu() itself may change or go away,
> >     but as kettenis@ notes we probably can't completely remove the concept
> >     of a "load average" from OpenBSD, given its long Unix heritage.
> >     
> >     So, (1) remove the uvm_meter() call from schedcpu(), (2) make
> >     uvm_meter() an independent timeout started alongside schedcpu() during
> >     scheduler_start(), and (3) delete the vestigial periodic proc0 wakeup.
> >     
> >     With input from deraadt@, kettenis@, and claudio@.  deraadt@ cautions
> >     that this change may confuse administrators who hold the load average
> >     in high regard.
> >     
> >     Thread: https://marc.info/?l=openbsd-tech&m=168710929409153&w=2
> >     
> >     general agreement with this direction from kettenis@
> >     ok claudio@
> > 
> > If I should repost on tech, let me know.
> 
> Thank you for bisecting the issue.
> 
> Could you try this tweaked version of that patch?  I am trying to
> narrow down the chunk responsible for the hang.  I cannot trigger the
> hang on any of my machines.  Clearly you have something in your setup
> that I don't.
> 
> There aren't many possibilities.
> 
> 1. Separating uvm_meter() from schedcpu().
> 2. Not using the UTC clock to decide when to update load averages.
> 3. Not periodically awakening proc0.
> 
> Let's try 3.
> 
> Index: kern/sched_bsd.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/sched_bsd.c,v
> retrieving revision 1.76
> diff -u -p -r1.76 sched_bsd.c
> --- kern/sched_bsd.c  21 Jun 2023 21:16:21 -0000      1.76
> +++ kern/sched_bsd.c  21 Jun 2023 21:40:17 -0000
> @@ -234,7 +234,6 @@ schedcpu(void *arg)
>               }
>               SCHED_UNLOCK(s);
>       }
> -     uvm_meter();
>       wakeup(&lbolt);
>       timeout_add_sec(to, 1);
>  }
> @@ -669,6 +668,7 @@ scheduler_start(void)
>  
>       rrticks_init = hz / 10;
>       schedcpu(&schedcpu_to);
> +     uvm_meter(NULL);
>  
>  #ifndef SMALL_KERNEL
>       if (perfpolicy == PERFPOL_AUTO)
> Index: uvm/uvm_meter.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_meter.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 uvm_meter.c
> --- uvm/uvm_meter.c   21 Jun 2023 21:16:21 -0000      1.44
> +++ uvm/uvm_meter.c   21 Jun 2023 21:40:17 -0000
> @@ -65,6 +65,9 @@
>  int maxslp = MAXSLP; /* patchable ... */
>  struct loadavg averunnable;
>  
> +#define UVM_METER_INTVL      5
> +struct timeout uvm_meter_to = TIMEOUT_INITIALIZER(uvm_meter, NULL);
> +
>  /*
>   * constants for averages over 1, 5, and 15 minutes when sampling at
>   * 5 second intervals.
> @@ -85,10 +88,10 @@ void uvmexp_read(struct uvmexp *);
>   * uvm_meter: calculate load average and wake up the swapper (if needed)
>   */
>  void
> -uvm_meter(void)
> +uvm_meter(void *unused)
>  {
> -     if ((gettime() % 5) == 0)
> -             uvm_loadav(&averunnable);
> +     timeout_add_sec(&uvm_meter_to, UVM_METER_INTVL);
> +     uvm_loadav(&averunnable);
>       if (proc0.p_slptime > (maxslp / 2))
>               wakeup(&proc0);
>  }
> Index: uvm/uvm_extern.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
> retrieving revision 1.170
> diff -u -p -r1.170 uvm_extern.h
> --- uvm/uvm_extern.h  21 Jun 2023 21:16:21 -0000      1.170
> +++ uvm/uvm_extern.h  21 Jun 2023 21:40:17 -0000
> @@ -414,7 +414,7 @@ void                      uvmspace_free(struct vmspace *);
>  struct vmspace               *uvmspace_share(struct process *);
>  int                  uvm_share(vm_map_t, vaddr_t, vm_prot_t,
>                           vm_map_t, vaddr_t, vsize_t);
> -void                 uvm_meter(void);
> +void                 uvm_meter(void *);
>  int                  uvm_sysctl(int *, u_int, void *, size_t *, 
>                           void *, size_t, struct proc *);
>  struct vm_page               *uvm_pagealloc(struct uvm_object *,

That worked. Applied it to the last commit that worked for me, built GENERIC.MP,
rebooted and everything seems normal.
