On Tue, Jul 02, 2019 at 09:41:40PM +0200, Florian Westphal wrote:
> Thomas and Juliana report a deadlock when running:
>
> (rmmod nf_conntrack_netlink/xfrm_user)
>
>   conntrack -e NEW -E &
>   modprobe -v xfrm_user
>
> They provided following analysis:
>
> conntrack -e NEW -E
>     netlink_bind()
>         netlink_lock_table() -> increases "nl_table_users"
>             nfnetlink_bind()
>             # does not unlock the table as it's locked by netlink_bind()
>                 __request_module()
>                     call_usermodehelper_exec()
>
> This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
> won't return until modprobe process is done.
>
> "modprobe xfrm_user":
>     xfrm_user_init()
>         register_pernet_subsys()
>             -> grab pernet_ops_rwsem
>                 ..
>                 netlink_table_grab()
>                     calls schedule() as "nl_table_users" is non-zero
>
> so modprobe is blocked because netlink_bind() increased
> nl_table_users while also holding pernet_ops_rwsem.
>
> "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
>     ctnetlink_init()
>         register_pernet_subsys()
>             -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
>
> both modprobe processes wait on one another -- neither can make
> progress.
>
> Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
> table lock, which then allows both modprobe instances to complete.

Applied, thanks.
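
For anyone following the analysis quoted above, the three-way wait can be
sketched as a toy wait-for graph. This is only an illustration in Python, not
kernel code: find_cycle() and build_waits() are hypothetical helper names, and
the model assumes each task waits on at most one other task, as in the report.

```python
def find_cycle(waits_for):
    """Return the tasks forming a wait cycle, or None if there is none.

    waits_for maps each blocked task to the single task it is waiting on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to the start: a cycle
            if node in seen:
                break                # a cycle that does not include start
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


def build_waits(nowait):
    """Model the situation from the report (hypothetical simplification)."""
    waits = {
        # netlink_table_grab() sleeps until nl_table_users drops to zero,
        # and the count was raised by conntrack's netlink_bind().
        "modprobe xfrm_user": "conntrack",
        # register_pernet_subsys() blocks on pernet_ops_rwsem, which the
        # xfrm_user modprobe holds.
        "modprobe nf_conntrack_netlink": "modprobe xfrm_user",
    }
    if not nowait:
        # Synchronous request: netlink_bind() does not return until the
        # modprobe it triggered has finished.
        waits["conntrack"] = "modprobe nf_conntrack_netlink"
    return waits


if __name__ == "__main__":
    print("synchronous modprobe:", find_cycle(build_waits(nowait=False)))
    print("nowait modprobe:     ", find_cycle(build_waits(nowait=True)))
```

In this model the synchronous case yields a cycle through all three tasks,
while the "nowait" case has none, since conntrack no longer waits on the
modprobe it triggered.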