Stephen Hemminger <step...@networkplumber.org> writes:

> With new transparent VF support, it is possible to get a deadlock
> when some of the deferred work is running and the unregister_vf
> is trying to cancel the work element. The solution is to use
> trylock and reschedule (similar to bonding and team device).
>
> Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
> Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index c71728d82049..e75c0f852a63 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
>       struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>       struct net_device *vf_netdev;
>
> -     rtnl_lock();
> +     if (!rtnl_trylock()) {
> +             schedule_work(w);
> +             return;
> +     }
> +
>       vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>       if (vf_netdev)
>               __netvsc_vf_setup(ndev, vf_netdev);
> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
>       struct net_device *vf_netdev;
>       bool vf_is_up;
>
> -     rtnl_lock();
> +     if (!rtnl_trylock()) {
> +             schedule_work(w);
> +             return;
> +     }
> +

So in the situation when we're currently in netvsc_unregister_vf() and
trying to do
        cancel_work_sync(&net_device_ctx->vf_takeover);
        cancel_work_sync(&net_device_ctx->vf_notify);

we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
create an issue as nobody is switching the datapath back to netvsc?

>       vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>       if (!vf_netdev)
>               goto unlock;

-- 
  Vitaly

Reply via email to