Dexuan Cui <de...@microsoft.com> writes:

> If the daemon is NOT running at all, when we disable the util device from
> Hyper-V Manager (or sometimes the host can rescind a util device and then
> re-offer it), we'll hang in util_remove -> hv_kvp_deinit ->
> wait_for_completion(&release_event), because this code path doesn't run:
> hvt_op_release -> ... -> kvp_on_reset -> complete(&release_event).
>
> Due to this, we even can't reboot the VM properly.
>
> The patch tracks if the dev file is opened or not, and we only need to
> wait if it's opened.
>
> Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: "K. Y. Srinivasan" <k...@microsoft.com>
> Cc: Haiyang Zhang <haiya...@microsoft.com>
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> ---
>  drivers/hv/hv_fcopy.c           | 5 ++++-
>  drivers/hv/hv_kvp.c             | 6 +++++-
>  drivers/hv/hv_snapshot.c        | 5 ++++-
>  drivers/hv/hv_utils_transport.c | 2 ++
>  drivers/hv/hv_utils_transport.h | 1 +
>  5 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index 9aee601..545cf43 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -358,8 +358,11 @@ int hv_fcopy_init(struct hv_util_service *srv)
>
>  void hv_fcopy_deinit(void)
>  {
> +     bool wait = hvt->dev_opened;
> +
>       fcopy_transaction.state = HVUTIL_DEVICE_DYING;
>       cancel_delayed_work_sync(&fcopy_timeout_work);
>       hvutil_transport_destroy(hvt);
> -     wait_for_completion(&release_event);
> +     if (wait)
> +             wait_for_completion(&release_event);

This is racy I think. We need to prevent openning the device first and
then query its state:

        bool wait;

        fcopy_transaction.state = HVUTIL_DEVICE_DYING;
        /* make sure state is set */
        mb();
        wait = hvt->dev_opened;
        cancel_delayed_work_sync(&fcopy_timeout_work);
        hvutil_transport_destroy(hvt);
        if (wait)
                wait_for_completion(&release_event);

otherwise someone could open the device before we manage to update its
state.

>  }
> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> index de26371..15c7873 100644
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -742,10 +742,14 @@ hv_kvp_init(struct hv_util_service *srv)
>
>  void hv_kvp_deinit(void)
>  {
> +     bool wait = hvt->dev_opened;
> +
>       kvp_transaction.state = HVUTIL_DEVICE_DYING;
>       cancel_delayed_work_sync(&kvp_host_handshake_work);
>       cancel_delayed_work_sync(&kvp_timeout_work);
>       cancel_work_sync(&kvp_sendkey_work);
>       hvutil_transport_destroy(hvt);
> -     wait_for_completion(&release_event);
> +
> +     if (wait)
> +             wait_for_completion(&release_event);
>  }
> diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
> index bcc03f0..3847f19 100644
> --- a/drivers/hv/hv_snapshot.c
> +++ b/drivers/hv/hv_snapshot.c
> @@ -396,9 +396,12 @@ hv_vss_init(struct hv_util_service *srv)
>
>  void hv_vss_deinit(void)
>  {
> +     bool wait = hvt->dev_opened;
> +
>       vss_transaction.state = HVUTIL_DEVICE_DYING;
>       cancel_delayed_work_sync(&vss_timeout_work);
>       cancel_work_sync(&vss_handle_request_work);
>       hvutil_transport_destroy(hvt);
> -     wait_for_completion(&release_event);
> +     if (wait)
> +             wait_for_completion(&release_event);
>  }
> diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
> index c235a95..05e0648 100644
> --- a/drivers/hv/hv_utils_transport.c
> +++ b/drivers/hv/hv_utils_transport.c
> @@ -153,6 +153,7 @@ static int hvt_op_open(struct inode *inode, struct file 
> *file)
>
>       if (issue_reset)
>               hvt_reset(hvt);
> +     hvt->dev_opened = (hvt->mode == HVUTIL_TRANSPORT_CHARDEV) && !ret;
>
>       mutex_unlock(&hvt->lock);
>
> @@ -182,6 +183,7 @@ static int hvt_op_release(struct inode *inode, struct 
> file *file)
>        * connects back.
>        */
>       hvt_reset(hvt);
> +     hvt->dev_opened = false;
>       mutex_unlock(&hvt->lock);
>

Not sure but it seems this may also be racy (what if we query the state
just before we reset it?).

>       if (mode_old == HVUTIL_TRANSPORT_DESTROY)
> diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
> index d98f522..9871283 100644
> --- a/drivers/hv/hv_utils_transport.h
> +++ b/drivers/hv/hv_utils_transport.h
> @@ -32,6 +32,7 @@ struct hvutil_transport {
>       int mode;                           /* hvutil_transport_mode */
>       struct file_operations fops;        /* file operations */
>       struct miscdevice mdev;             /* misc device */
> +     bool   dev_opened;                  /* Is the device opened? */
>       struct cb_id cn_id;                 /* CN_*_IDX/CN_*_VAL */
>       struct list_head list;              /* hvt_list */
>       int (*on_msg)(void *, int);         /* callback on new user message */

I think we can get away without introducing this new flag, e.g. if we
replace release_event with an atomic which will hold the state
(open/closed). This will also elimenate possible races above. I can try
prototyping a patch if you want me to.

Thanks,

-- 
  Vitaly

Reply via email to