On Sat, May 22, 2021 at 02:33:47PM +0200, Mark Kettenis wrote:
> > Date: Sat, 22 May 2021 11:11:38 +0000
> > From: Visa Hankala <v...@hankala.org>
> > 
> > On Wed, May 19, 2021 at 05:11:09PM -0500, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > visa@ says I need to unlock softintr_dispatch() before I can
> > > unlock softclock(), so let's do that.
> > > 
> > > Additionally, when we call softintr_disestablish() we want to wait for
> > > the underlying softintr handle to finish running if it is running.
> > > 
> > > We can start with amd64.
> > > 
> > > I think this approach will work:
> > > 
> > > - Keep a pointer to the running softintr, if any, in the queue.  NULL
> > >   the pointer when we return from sih_func().
> > > 
> > > - Take/release the kernel lock if the SI_MPSAFE flag is present when
> > >   we enter/leave sih_func().
> > > 
> > > - If the handle is running when you call softintr_disestablish(), spin
> > >   until the handle isn't running anymore and retry.
> > > 
> > > There is no softintr manpage but I think it is understood that
> > > softintr_disestablish() is only safe to call from a process context,
> > > otherwise you may deadlock.  Maybe we should do splassert(IPL_NONE)?
> > > 
> > > We could probably sleep here instead of spinning.  We'd have to change
> > > every softintr_disestablish() implementation to do that, though.
> > > Otherwise you'd have different behavior on different platforms.
> > 
> > I think your diff does not pay enough attention to the fact that soft
> > interrupts are handled by all CPUs. I think the diff that I posted
> > a while ago [1] is better in that respect.
> > 
> > Two biggest things that I do not like in my original diff are
> > synchronization of handler execution, and use of the SMR barrier.
> > 
> > [1] https://marc.info/?l=openbsd-tech&m=162092714911609
> > 
> > The kernel lock has guaranteed that at most one CPU is able to run
> > a given soft interrupt handler at a time. My diff used a mutex to
> > prevent concurrent execution. However, it is wasteful to spin. It would
> > be more economical to let the current runner of the handler re-execute
> > the code.
> > 
> > The SMR barrier in softintr_disestablish() was a trick to drain any
> > pending activity. However, it made me feel uneasy because I have not
> > checked every caller of softintr_disestablish(). My main worry is not
> > the latency but unexpected side effects.
> > 
> > Below is a revised diff that improves the above two points.
> > 
> > When a soft interrupt handler is scheduled, it is assigned to a CPU.
> > That CPU will keep running the handler as long as there are pending
> > requests. Once all pending requests have been drained, the CPU
> > relinquishes its hold of the handler. This provides natural
> > serialization.
> > 
> > Now softintr_disestablish() uses spinning for draining activity.
> > I still have slight qualms about this, though, because the feature
> > has not been so explicit before. Integration with witness(4) might be
> > in order.
> > 
> > softintr_disestablish() uses READ_ONCE() to enforce reloading of the
> > value in the busy-wait loop. This way the variable does not need to be
> > volatile. (As yet another option, CPU_BUSY_CYCLE() could always
> > imply memory clobbering, which should make an optimizing compiler
> > redo the load.) For consistency with this READ_ONCE(), WRITE_ONCE() is
> > used whenever the variable is written, excluding the initialization.
> > 
> > The patch uses a single mutex for access serialization. The old code
> > has used one mutex per each soft IPL level, but I am not sure how
> > useful that has been. I think it would be better to have a separate
> > mutex for each CPU. However, the increased code complexity might not
> > be worthwhile at the moment. Even having the per-CPU queues has
> > a whiff of being somewhat overkill.
> 
> A few comments:
> 
> * Looking at amd64 in isolation does not make sense.  Like a lot of MD
>   code in OpenBSD the softintr code was copied from whatever
>   Net/FreeBSD had at the time, with no attempt at unification (it
>   works, check it in, don't go back to clean it up).  However, with
>   powerpc64 and riscv64 we try to do things a little bit better in
>   that area.  So arm64, powerpc64 and riscv64 share the same softintr
>   implementation that already implements softintr_establish_flags()
>   with SOFTINTR_ESTABLISH_MPSAFE.  Now we haven't used that flag
>   anywhere in our tree yet, so the code might be completely busted.
>   But it may make a lot of sense to migrate other architectures to the
>   same codebase.

I know that the implementations should be unified, or made machine
independent as far as possible. I have focused on amd64's code to get
the MP-safe processing working sooner on that platform - at the cost
of increased code debt.

> * The softintr_disestablish() function isn't used a lot in our tree.
>   It may make sense to postpone worrying about safely disestablishing
>   mpsafe soft interrupts for now and simply panic if someone tries to
>   do this.

Thinking about this I realized that the disestablishing is tricky even
in the non-MP-safe case. A CPU might be trying to tear down a handler
while another CPU is about to enter the handler but spinning for the
kernel lock, which would cause a deadlock. Until now this has not been
an issue with the implementations where the whole dispatch loop runs
under the kernel lock.

Users of soft interrupts should ensure that no further scheduling of the
soft interrupt happens when softintr_disestablish() is called. Typically
this would be accomplished by disabling and draining the hard interrupt
that uses the soft interrupt. In the usual case the hard interrupt is
processed by a specific CPU. Consequently, the soft interrupt is run
by that CPU as well, immediately after the hard interrupt. Once
intr_barrier() returns, the soft interrupt should have quiesced.
However, this looks quite fragile, and SMR barrier has gained a tiny bit
of appeal because it releases the lock temporarily.

> * Wouldn't it make sense for an mpsafe soft interrupt to protect
>   itself from running simultaniously on multiple CPUs?  It probably
>   already needs some sort of lock to protect the handler and other
>   code running in process context on other CPUs.  And some handlers
>   may be safe to run simultaniously anyway.

OpenBSD's soft interrupt code actually combines two things into one:
implementation of soft interrupt vectors, and tasklet-like execution
of handlers. In general, the former is part MD and part MI, whereas
the latter is essentially fully MI.

Timeout processing could be among users of the semi-fixed set of soft
interrupt vectors. Most device drivers needing soft interrupts could use
the higher-level tasklet interface.

I have already thought about implementing the above. However, there are
things that I should finish first.

> * I think we should avoid MAXCPU arrays if we can; adding stuff to
>   struct cpu_info is probably a better approach here.

I agree. I used MAXCPUS to simplify the initialization.

What would be the right place to call per-CPU initialization of the
subsystem? cpu_init() on amd64?

Reply via email to