On Sat, May 14, 2022 at 12:41:00AM +0200, Alexander Bluhm wrote:
> On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> > at this point we hold a NET_LOCK(). So basically if there won't
> > be enough memory we might start sleeping waiting for memory
> > while we will be holding a NET_LOCK.
> >
> > This is something we should try to avoid, however this can be
> > sorted out later. At this point I just want to point out
> > this problem, which can be certainly solved in a follow-up
> > commit. pf(4) also has its homework to be done around
> > sleeping mallocs.
>
> I think sleeping with netlock is not a problem in general.
>
> In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
> or copyout(). This results in a lock order reversal due to a hack
> in uvn_io(). In my opinion we should not sleep within pf lock, so
> we can convert it to a mutex or something better later.
>
> In veb configuration we are holding the netlock and sleep in
> smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
> is fine here.
Are you sure about this? smr_barrier() on busy systems with many CPUs
can take 100ms or more. Isn't all processing stopped during that time?
(IIRC the code takes the write-exclusive netlock.)

Also, by the time smr_barrier() is called, ownership of the object
should already have been moved exclusively to the calling thread, so
it can call smr_barrier() without holding any locks. I feel the same
is true for refcnt_finalize(). Sleeping with a global rwlock held is
not a good design.

> Holding the netlock and sleeping in m_get() is worse. There is no
> recovery after reaching the mbuf limit. Sleeping rules are
> inconsistent and depend on the area of the stack. Different people
> have multiple ideas how it should be done.

-- 
:wq Claudio