Dexuan Cui <de...@microsoft.com> writes: > If the daemon is NOT running at all, when we disable the util device from > Hyper-V Manager (or sometimes the host can rescind a util device and then > re-offer it), we'll hang in util_remove -> hv_kvp_deinit -> > wait_for_completion(&release_event), because this code path doesn't run: > hvt_op_release -> ... -> kvp_on_reset -> complete(&release_event). > > Due to this, we even can't reboot the VM properly. > > The patch tracks if the dev file is opened or not, and we only need to > wait if it's opened. > > Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue") > Signed-off-by: Dexuan Cui <de...@microsoft.com> > Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> > Cc: "K. Y. Srinivasan" <k...@microsoft.com> > Cc: Haiyang Zhang <haiya...@microsoft.com> > Cc: Stephen Hemminger <sthem...@microsoft.com> > --- > drivers/hv/hv_fcopy.c | 5 ++++- > drivers/hv/hv_kvp.c | 6 +++++- > drivers/hv/hv_snapshot.c | 5 ++++- > drivers/hv/hv_utils_transport.c | 2 ++ > drivers/hv/hv_utils_transport.h | 1 + > 5 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c > index 9aee601..545cf43 100644 > --- a/drivers/hv/hv_fcopy.c > +++ b/drivers/hv/hv_fcopy.c > @@ -358,8 +358,11 @@ int hv_fcopy_init(struct hv_util_service *srv) > > void hv_fcopy_deinit(void) > { > + bool wait = hvt->dev_opened; > + > fcopy_transaction.state = HVUTIL_DEVICE_DYING; > cancel_delayed_work_sync(&fcopy_timeout_work); > hvutil_transport_destroy(hvt); > - wait_for_completion(&release_event); > + if (wait) > + wait_for_completion(&release_event);
This is racy I think. We need to prevent openning the device first and then query its state: bool wait; fcopy_transaction.state = HVUTIL_DEVICE_DYING; /* make sure state is set */ mb(); wait = hvt->dev_opened; cancel_delayed_work_sync(&fcopy_timeout_work); hvutil_transport_destroy(hvt); if (wait) wait_for_completion(&release_event); otherwise someone could open the device before we manage to update its state. > } > diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c > index de26371..15c7873 100644 > --- a/drivers/hv/hv_kvp.c > +++ b/drivers/hv/hv_kvp.c > @@ -742,10 +742,14 @@ hv_kvp_init(struct hv_util_service *srv) > > void hv_kvp_deinit(void) > { > + bool wait = hvt->dev_opened; > + > kvp_transaction.state = HVUTIL_DEVICE_DYING; > cancel_delayed_work_sync(&kvp_host_handshake_work); > cancel_delayed_work_sync(&kvp_timeout_work); > cancel_work_sync(&kvp_sendkey_work); > hvutil_transport_destroy(hvt); > - wait_for_completion(&release_event); > + > + if (wait) > + wait_for_completion(&release_event); > } > diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c > index bcc03f0..3847f19 100644 > --- a/drivers/hv/hv_snapshot.c > +++ b/drivers/hv/hv_snapshot.c > @@ -396,9 +396,12 @@ hv_vss_init(struct hv_util_service *srv) > > void hv_vss_deinit(void) > { > + bool wait = hvt->dev_opened; > + > vss_transaction.state = HVUTIL_DEVICE_DYING; > cancel_delayed_work_sync(&vss_timeout_work); > cancel_work_sync(&vss_handle_request_work); > hvutil_transport_destroy(hvt); > - wait_for_completion(&release_event); > + if (wait) > + wait_for_completion(&release_event); > } > diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c > index c235a95..05e0648 100644 > --- a/drivers/hv/hv_utils_transport.c > +++ b/drivers/hv/hv_utils_transport.c > @@ -153,6 +153,7 @@ static int hvt_op_open(struct inode *inode, struct file > *file) > > if (issue_reset) > hvt_reset(hvt); > + hvt->dev_opened = (hvt->mode == HVUTIL_TRANSPORT_CHARDEV) && !ret; > > mutex_unlock(&hvt->lock); > > @@ -182,6 +183,7 @@ static int hvt_op_release(struct inode *inode, struct > file *file) > * connects back. > */ > hvt_reset(hvt); > + hvt->dev_opened = false; > mutex_unlock(&hvt->lock); > Not sure but it seems this may also be racy (what if we query the state just before we reset it?). > if (mode_old == HVUTIL_TRANSPORT_DESTROY) > diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h > index d98f522..9871283 100644 > --- a/drivers/hv/hv_utils_transport.h > +++ b/drivers/hv/hv_utils_transport.h > @@ -32,6 +32,7 @@ struct hvutil_transport { > int mode; /* hvutil_transport_mode */ > struct file_operations fops; /* file operations */ > struct miscdevice mdev; /* misc device */ > + bool dev_opened; /* Is the device opened? */ > struct cb_id cn_id; /* CN_*_IDX/CN_*_VAL */ > struct list_head list; /* hvt_list */ > int (*on_msg)(void *, int); /* callback on new user message */ I think we can get away without introducing this new flag, e.g. if we replace release_event with an atomic which will hold the state (open/closed). This will also elimenate possible races above. I can try prototyping a patch if you want me to. Thanks, -- Vitaly