On Tue, Aug 29, 2023 at 07:15:14PM +0200, Claudio Jeker wrote:
> On Tue, Aug 29, 2023 at 01:01:10AM +0000, Eric Wong wrote:
> > >Synopsis: RLIMIT_CPU doesn't work reliably on mostly idle systems
> > >Category: system
> > >Environment:
> >     System      : OpenBSD 7.3
> >     Details     : OpenBSD 7.3 (GENERIC.MP) #1242: Sat Mar 25 18:04:31 MDT 
> > 2023
> >                      
> > dera...@octeon.openbsd.org:/usr/src/sys/arch/octeon/compile/GENERIC.MP
> > 
> >     Architecture: OpenBSD.octeon
> >     Machine     : octeon
> > >Description:
> > 
> > RLIMIT_CPU doesn't work reliably when few/no syscalls are made on an
> > otherwise idle system (aside from the test process using up CPU).
> > It can take 20-50s to fire SIGKILL with rlim_max=9 (and the SIGXCPU
> > from rlim_cur=1 won't even fire).
> > 
> > I can reproduce this on a private amd64 VM and also on gcc231
> > on GCC compiler farm <https://cfarm.tetaneutral.net/>.
> > I can't reproduce this on a busy system like gcc220 on cfarm,
> > however.
> 
> Thanks for the report. There is indeed an issue in how the CPU time is
> accounted on an idle system. The below diff is a possible fix.
> 
> In roundrobin() force a resched and therefor mi_switch() when
> SPCF_SHOULDYIELD is set. On an idle CPU mi_switch() will just do all
> accounting bits but skip the expensive cpu_switchto() since the proc
> remains the same.

This is an elegant solution.  I'm a little reluctant to add
mi_switch() overhead to a thread if *nothing* else wants to run, but
the roundrobin() quantum is massive at the moment, so it's not all
that much.  This change also fixes a couple other things.

- Runaway user threads (like a synthetic spinloop) can no longer
  keep the system from suspending.  If such threads are forced into
  mi_switch() after two ticks, sched_chooseproc() will then see
  SPCF_SHOULDHALT, etc.

- Sibling theads in a process using ITIMER_VIRTUAL or ITIMER_PROF
  are now forced to enable the itimer_update() interrupt after
  two ticks.

- Sibling threads in a process running profil(2) are now forced
  to enable the profclock() interrupt after two ticks.

For sake of discussion, an alternative approach is below.  Basically,
tweak rucheck() to account for any un-accumulated ONPROC time.  It
seems to work.

I would lean toward your change, though.  Fixing those other things is
a nice bonus.

Index: kern_resource.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_resource.c,v
retrieving revision 1.78
diff -u -p -r1.78 kern_resource.c
--- kern_resource.c     29 Aug 2023 16:19:34 -0000      1.78
+++ kern_resource.c     29 Aug 2023 22:21:06 -0000
@@ -531,21 +531,32 @@ ruadd(struct rusage *ru, struct rusage *
 void
 rucheck(void *arg)
 {
+       struct timespec elapsed, now, total;
        struct rlimit rlim;
        struct process *pr = arg;
        time_t runtime;
+       struct proc *p;
        int s;
 
        KERNEL_ASSERT_LOCKED();
 
        SCHED_LOCK(s);
-       runtime = pr->ps_tu.tu_runtime.tv_sec;
+       nanouptime(&now);
+       total = pr->ps_tu.tu_runtime;
+       TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
+               if (p->p_stat == SONPROC) {
+                       timespecsub(&now, &p->p_cpu->ci_schedstate.spc_runtime,
+                           &elapsed);
+                       timespecadd(&total, &elapsed, &total);
+               }
+       }
        SCHED_UNLOCK(s);
 
        mtx_enter(&pr->ps_mtx);
        rlim = pr->ps_limit->pl_rlimit[RLIMIT_CPU];
        mtx_leave(&pr->ps_mtx);
 
+       runtime = total.tv_sec;
        if ((rlim_t)runtime >= rlim.rlim_cur) {
                if ((rlim_t)runtime >= rlim.rlim_max) {
                        prsignal(pr, SIGKILL);

Reply via email to