On Sat, 16 Dec 2017 03:13:29 +0000 matthew green <m...@netbsd.org> wrote:
> Module Name:    src
> Committed By:   mrg
> Date:           Sat Dec 16 03:13:29 UTC 2017
>
> Modified Files:
>         src/sys/kern: subr_pool.c
>         src/sys/sys: pool.h
>
> Log Message:
> hopefully workaround the irregularly "fork fails in init" problem.
>
> if a pool is growing, and the grower is PR_NOWAIT, mark this.
> if another caller wants to grow the pool and is also PR_NOWAIT,
> busy-wait for the original caller, which should either succeed
> or hard-fail fairly quickly.
>
> implement the busy-wait by unlocking and relocking this pools
> mutex and returning ERESTART.  other methods (such as having
> the caller do this) were significantly more code and this hack
> is fairly localised.
>
> ok chs@ riastradh@

Hi!

I have an easily reproducible system hang that I believe originates
from this change.  It can be triggered by doing lots of block and
network I/O (for example, three rsyncs running at once) on a
uniprocessor system running under Linux KVM.

Basically, what happens is that, for unknown reasons, the PR_NOWAIT
grower blocks forever when it tries to reacquire the pool lock to do
pool_prime_page() in pool_grow().  Meanwhile, another process, waiting
for the grower to finish, spins forever at 100% CPU doing the
mutex_exit/mutex_enter/ERESTART thing on the same pool.  It looks to me
like the grower never actually gets scheduled to run.

Also, although it doesn't fix the issue, this pr_flags modification
looks like it should be moved to after the mutex is acquired:

        pp->pr_flags &= ~(PR_GROWING|PR_GROWINGNOWAIT);
        mutex_enter(&pp->pr_lock);

Kind regards,
-Tobias
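
P.S. To illustrate the reordering suggested above, here is a minimal
userland sketch.  It is not the subr_pool.c code: struct demo_pool and
demo_grow_done() are hypothetical names, the flag values are
placeholders, and a pthread mutex stands in for the kernel mutex.  It
only shows the flags being cleared once the lock is held.

```c
#include <assert.h>
#include <pthread.h>

/* Placeholder flag values for this sketch; not taken from sys/pool.h. */
#define PR_GROWING        0x1u
#define PR_GROWINGNOWAIT  0x2u

/* Hypothetical stand-in for struct pool, reduced to the two fields used. */
struct demo_pool {
	pthread_mutex_t pr_lock;
	unsigned int    pr_flags;
};

/*
 * Clear the "growing" markers only after the lock is held, so that no
 * other thread can read or modify pr_flags in the middle of the update.
 */
static void
demo_grow_done(struct demo_pool *pp)
{
	int rc;

	rc = pthread_mutex_lock(&pp->pr_lock);
	assert(rc == 0);
	(void)rc;

	/* The flag update now happens under pr_lock. */
	pp->pr_flags &= ~(PR_GROWING | PR_GROWINGNOWAIT);

	/* The sketch releases the lock here only to stay self-contained. */
	pthread_mutex_unlock(&pp->pr_lock);
}
```

In pool_grow() itself the lock would presumably stay held afterwards,
since the code goes on to call pool_prime_page(); the unlock above is
only there so the sketch can be run on its own.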