On Tue, Jul 02, 2019 at 09:41:40PM +0200, Florian Westphal wrote:
> Thomas and Juliana report a deadlock when running:
> 
> (rmmod nf_conntrack_netlink/xfrm_user)
> 
>   conntrack -e NEW -E &
>   modprobe -v xfrm_user
> 
> They provided the following analysis:
> 
> conntrack -e NEW -E
>     netlink_bind()
>         netlink_lock_table() -> increases "nl_table_users"
>             nfnetlink_bind()
>             # does not unlock the table as it's locked by netlink_bind()
>                 __request_module()
>                     call_usermodehelper_exec()
> 
> This triggers "modprobe nf_conntrack_netlink" from the kernel;
> netlink_bind() won't return until the modprobe process is done.
> 
> "modprobe xfrm_user":
>     xfrm_user_init()
>         register_pernet_subsys()
>             -> grab pernet_ops_rwsem
>                 ..
>                 netlink_table_grab()
>                     calls schedule() as "nl_table_users" is non-zero
> 
> so modprobe is blocked because netlink_bind() increased
> nl_table_users, while modprobe itself still holds pernet_ops_rwsem.
> 
> "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
>     ctnetlink_init()
>         register_pernet_subsys()
>             -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
> 
> both modprobe processes wait on one another -- neither can make
> progress.
> 
> Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
> table lock, which then allows both modprobe instances to complete.
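
For readers following the analysis, the dependency chain can be sketched as a
wait-for graph. This is a userspace toy model, not kernel code: the task names
and the has_cycle() helper are made up for illustration, and the edges are only
what the quoted trace describes. Read this way, the cycle has three parties
(netlink_bind() and the two modprobe instances), and dropping the edge from
netlink_bind() to its modprobe, which is the effect a "nowait" request would
have, leaves no cycle.

```python
def has_cycle(waits_for):
    """Return True if the wait-for graph contains a cycle (a deadlock)."""
    visiting, done = set(), set()

    def visit(task):
        if task in done:
            return False
        if task in visiting:
            # We came back to a task that is still on the current path.
            return True
        visiting.add(task)
        for blocker in waits_for.get(task, ()):
            if visit(blocker):
                return True
        visiting.discard(task)
        done.add(task)
        return False

    return any(visit(task) for task in waits_for)


# Hypothetical task names; each edge means "waits for".
SYNC = {
    # netlink_bind() waits for the modprobe it spawned to finish
    "netlink_bind": ["modprobe_ctnetlink"],
    # ctnetlink_init() blocks on pernet_ops_rwsem, held by the xfrm_user modprobe
    "modprobe_ctnetlink": ["modprobe_xfrm_user"],
    # netlink_table_grab() sleeps while nl_table_users is non-zero
    "modprobe_xfrm_user": ["netlink_bind"],
}

# Same graph, except netlink_bind() no longer waits for its modprobe.
NOWAIT = dict(SYNC, netlink_bind=[])

print("sync deadlocks:  ", has_cycle(SYNC))
print("nowait deadlocks:", has_cycle(NOWAIT))
```

The model ignores timing entirely; it only shows that the synchronous wait is
the edge that closes the loop.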

Applied, thanks.
