Hey Markus,

On Wed, Jun 15, 2016 at 12:00 PM, Markus Pargmann <m...@pengutronix.de> wrote:
> Hi Pranay,
>
> On Tuesday 14 June 2016 15:03:40 Pranay Srivastava wrote:
>> Hi Markus,
>>
>> On Tue, Jun 14, 2016 at 2:29 PM, Markus Pargmann <m...@pengutronix.de> wrote:
>> >
>> > On Thursday 02 June 2016 13:25:00 Pranay Kr. Srivastava wrote:
>> > > When a timeout occurs or a recv fails, then
>> > > instead of abruplty killing nbd block device
>> > > wait for it's users to finish.
>> > >
>> > > This is more required when filesystem(s) like
>> > > ext2 or ext3 don't expect their buffer heads to
>> > > disappear while the filesystem is mounted.
>> > >
>> > > Each open of a nbd device is refcounted, while
>> > > the userland program [nbd-client] doing the
>> > > NBD_DO_IT ioctl would now wait for any other users
>> > > of this device before invalidating the nbd device.
>> > >
>> > > Signed-off-by: Pranay Kr. Srivastava <pran...@gmail.com>
>> > > ---
>> > > drivers/block/nbd.c | 58
>> > > +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > > 1 file changed, 58 insertions(+)
>> > >
>> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> > > index d1d898d..4da40dc 100644
>> > > --- a/drivers/block/nbd.c
>> > > +++ b/drivers/block/nbd.c
>> > > @@ -70,10 +70,13 @@ struct nbd_device {
>> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
>> > >         struct dentry *dbg_dir;
>> > >  #endif
>> > > +       atomic_t inuse;
>> > >         /*
>> > >          *This is specifically for calling sock_shutdown, for now.
>> > >          */
>> > >         struct work_struct ws_shutdown;
>> > > +       struct kref users;
>> > > +       struct completion user_completion;
>> > >  };
>> > >
>> > >  #if IS_ENABLED(CONFIG_DEBUG_FS)
>> > > @@ -104,6 +107,7 @@ static DEFINE_SPINLOCK(nbd_lock);
>> > >   * Shutdown function for nbd_dev work struct.
>> > >   */
>> > >  static void nbd_ws_func_shutdown(struct work_struct *);
>> > > +static void nbd_kref_release(struct kref *);
>> > >
>> > >  static inline struct device *nbd_to_dev(struct nbd_device *nbd)
>> > >  {
>> > > @@ -682,6 +686,8 @@ static void nbd_reset(struct nbd_device *nbd)
>> > >         nbd->flags = 0;
>> > >         nbd->xmit_timeout = 0;
>> > >         INIT_WORK(&nbd->ws_shutdown, nbd_ws_func_shutdown);
>> > > +       init_completion(&nbd->user_completion);
>> > > +       kref_init(&nbd->users);
>> > >         queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
>> > >         del_timer_sync(&nbd->timeout_timer);
>> > >  }
>> > > @@ -815,6 +821,14 @@ static int __nbd_ioctl(struct block_device *bdev,
>> > > struct nbd_device *nbd,
>> > >                 kthread_stop(thread);
>> > >
>> > >                 sock_shutdown(nbd);
>> > > +               /*
>> > > +                * kref_init initializes with ref count as 1,
>> > > +                * nbd_client, or the user-land program executing
>> > > +                * this ioctl will make the refcount to 2[at least]
>> > > +                * so subtracting 2 from refcount.
>> > > +                */
>> > > +               kref_sub(&nbd->users, 2, nbd_kref_release);
>> >
>> > Why don't you use a kref_put?
>>
>> Ok, so I'll try to explain as I've understood the problem.
>>
>> When the module is loaded the kref is initialized to 1.
>>
>> Suppose now, someone has started nbd-client [nbdC-1] , then this
>> nbd-client will increase the ref count to 2. So far so good...
>>
>> Now let's say this device is being shutdown via nbd-client[nbdC-2].
>>
>> nbdC-1 will subtract the refcount by two, it has to do in NBD_DO_IT
>> since device file will not
>> be closed until after ioctl is over, and it'll wait_for_completion.
>>
>> nbdC-2 now closes it's use of device file, this makes the refcount as
>> zero and completion
>> is triggered with nbdC-1 completed.
>>
>> Now we don't want to trigger kref_put when nbdC-1 closes the device
>> file so kref_put needs
>> to be conditional in this regard so for that in_use is used.
>>
>>
>> >
>> > > +               wait_for_completion(&nbd->user_completion);
>> > >                 mutex_lock(&nbd->tx_lock);
>> > >                 nbd_clear_que(nbd);
>> > >                 kill_bdev(bdev);
>> > > @@ -865,13 +879,56 @@ static int nbd_ioctl(struct block_device *bdev,
>> > > fmode_t mode,
>> > >
>> > >         return error;
>> > >  }
>> > > +static void nbd_kref_release(struct kref *kref_users)
>> > > +{
>> > > +       struct nbd_device *nbd = container_of(kref_users, struct
>> > > nbd_device,
>> > > +                       users);
>> >
>> > Not indented to opening bracket.
>> >
>> > > +       pr_debug("Releasing kref [%s]\n", __func__);
>> > > +       atomic_set(&nbd->inuse, 0);
>> > > +       complete(&nbd->user_completion);
>> > > +
>> > > +}
>> > > +
>> > > +static int nbd_open(struct block_device *bdev, fmode_t mode)
>> > > +{
>> > > +       struct nbd_device *nbd_dev = bdev->bd_disk->private_data;
>> > > +
>> > > +       if (kref_get_unless_zero(&nbd_dev->users))
>> > > +               atomic_set(&nbd_dev->inuse, 1);
>> > > +
>> > > +       pr_debug("Opening nbd_dev %s. Active users = %u\n",
>> > > +                       bdev->bd_disk->disk_name,
>> > > +                       atomic_read(&nbd_dev->users.refcount) - 1);
>> >
>> > Indent to opening bracket.
>> >
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +static void nbd_release(struct gendisk *disk, fmode_t mode)
>> > > +{
>> > > +       struct nbd_device *nbd_dev = disk->private_data;
>> > > +       /*
>> > > +        *kref_init initializes ref count to 1, so we
>> > > +        *we check for refcount to be 2 for a final put.
>> > > +        *
>> > > +        *kref needs to be re-initialized just here as the
>> > > +        *other process holding it must see the ref count as 2.
>> > > +        */
>> > > +       if (atomic_read(&nbd_dev->inuse))
>> > > +               kref_put(&nbd_dev->users, nbd_kref_release);
>> >
>>
>> > What is this inuse atomic for? Everyone that releases the nbd device
>> > will need to execute a kref_put().
>>
>> To do away with inuse, perhaps we can do
>>
>> kref_get just before leaving the NBD_DO_IT? so that when device file
>> is closed everyone
>> would do a kref_put? However there's a small race window while the
>> kref is being initialized,
>> and another process [not just nbd-client] is trying to open the device.
>>
>> Do you think it's better to do this by introducing a spin_lock instead
>> of atomic?
>>
>> Let me know if my understanding is correct.
>
> Thanks for the explanations. I think my understanding was off by one ;).
> I didn't realize that the DO_IT thread from the userspace has the block
> device open as well.
>
> I thought a bit about this, does it make sense to delay the essential
> cleanup steps until really all open file handles were closed? So that
> even if the DO_IT thread exits, the block device is still there. Only if
> the file is closed everything is cleaned up. Maybe this makes the code
> simpler and we can directly use krefs without any strange constructs.
> What do you think?
>
I chose open/close because that is the common interface for every
process that uses an nbd device, not just nbd-client.

Take a mount as an example: some user has just mounted this device and
it is currently idle. Then someone issues an NBD_DISCONNECT and wham...
The idea is to propagate errors to user space correctly.

So the solution I've proposed says: if anyone other than nbd-client
(which is currently waiting for NBD_DO_IT to complete) is using this
device, then nbd-client should honor that and not allow the device to
be reused until all such processes have left it.

> This would also allow the client to setup a new socket as long as it
> does not close the nbd file handle.

I think that is still possible, right? That is why there's a kref_sub
of 2: the wait is only for the "other processes" using this device and
_not_ for this nbd-client, whose NBD_DO_IT has just finished.

Now I'm just concerned about processes that try to open this nbd device
before the re-initialization code at the end of NBD_DO_IT has run. Such
a process may skip getting a kref because of kref_get_unless_zero.

Actually I wanted to put this kref under bd_mutex to avoid such races,
so that there is always a kref_get in open and a kref_put in close.
kref_sub(2) would still be required :-)

>
> Could this behavior be potentially problematic for any client
> implementation? Does it solve our other issue with setting up a new
> sockets for an existing nbd blockdevice?

I don't think that would be a problem. sock_shutdown is before the
wait, right? So in the error case the worker thread would close the
socket and set it to NULL, while in the normal case sock_shutdown in
NBD_DO_IT would do that. So it should be OK.

>
> Cc Wouter
>
> Best Regards,
>
> Markus
>
>>
>> >
>> > Best Regards,
>> >
>> > Markus
>> >
>> > > +
>> > > +       pr_debug("Closing nbd_dev %s. Active users = %u\n",
>> > > +                       disk->disk_name,
>> > > +                       atomic_read(&nbd_dev->users.refcount) - 1);
>> > > +}
>> > >
>> > >  static const struct block_device_operations nbd_fops = {
>> > >         .owner = THIS_MODULE,
>> > >         .ioctl = nbd_ioctl,
>> > >         .compat_ioctl = nbd_ioctl,
>> > > +       .open = nbd_open,
>> > > +       .release = nbd_release
>> > >  };
>> > >
>> > > +
>> > >  static void nbd_ws_func_shutdown(struct work_struct *ws_nbd)
>> > >  {
>> > >         struct nbd_device *nbd_dev = container_of(ws_nbd, struct
>> > > nbd_device,
>> > > @@ -1107,6 +1164,7 @@ static int __init nbd_init(void)
>> > >                 disk->fops = &nbd_fops;
>> > >                 disk->private_data = &nbd_dev[i];
>> > >                 sprintf(disk->disk_name, "nbd%d", i);
>> > > +               atomic_set(&nbd_dev[i].inuse, 0);
>> > >                 nbd_reset(&nbd_dev[i]);
>> > >                 add_disk(disk);
>> > >         }
>> > >
>> >
>> > --
>> > Pengutronix e.K. | |
>> > Industrial Linux Solutions | http://www.pengutronix.de/ |
>> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>> > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>>
>

--
---P.K.S
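
P.S. To make the refcount arithmetic discussed in this thread easier to follow, here is a minimal userspace sketch of it. It is only a model under stated assumptions, not the driver code: the model_* names are hypothetical, C11 atomics stand in for struct kref, and a plain bool stands in for the completion. It only tries to show why kref_sub(2) waits for the "other" openers, and where the kref_get_unless_zero window mentioned above appears.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct model_ref {
	atomic_int count;
};

/* Hypothetical, reduced stand-in for struct nbd_device. */
struct model_nbd {
	struct model_ref users;
	bool completed;	/* models complete(&nbd->user_completion) */
};

/* Like kref_init(): the count starts at 1. */
static void model_ref_init(struct model_ref *r)
{
	atomic_init(&r->count, 1);
}

/* Like kref_get_unless_zero(): take a reference only if count != 0. */
static bool model_ref_get_unless_zero(struct model_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Like kref_sub(): drop n references; true if the count reached zero. */
static bool model_ref_sub(struct model_ref *r, int n)
{
	assert(n > 0);
	return atomic_fetch_sub(&r->count, n) == n;
}

static int model_count(struct model_nbd *nbd)
{
	return atomic_load(&nbd->users.count);
}

/* Models nbd_open() from the patch. */
static bool model_open(struct model_nbd *nbd)
{
	return model_ref_get_unless_zero(&nbd->users);
}

/* Models nbd_release() for an opener that still holds a reference. */
static void model_release(struct model_nbd *nbd)
{
	if (model_ref_sub(&nbd->users, 1))
		nbd->completed = true;
}

/* Models the kref_sub(&nbd->users, 2, ...) at the end of NBD_DO_IT. */
static void model_do_it_exit(struct model_nbd *nbd)
{
	if (model_ref_sub(&nbd->users, 2))
		nbd->completed = true;
}

/* Returns 0 if the sequence behaves as described in the thread. */
static int model_scenario(void)
{
	struct model_nbd nbd;

	nbd.completed = false;
	model_ref_init(&nbd.users);		/* count = 1 */

	if (!model_open(&nbd))			/* nbd-client: count = 2 */
		return -1;
	if (!model_open(&nbd))			/* e.g. a mount: count = 3 */
		return -1;

	model_do_it_exit(&nbd);			/* count = 1, must keep waiting */
	if (nbd.completed || model_count(&nbd) != 1)
		return -1;

	model_release(&nbd);			/* mount closes: count = 0 */
	if (!nbd.completed)
		return -1;

	/* Before re-initialization, a late opener gets no reference. */
	if (model_open(&nbd))
		return -1;

	return 0;
}
```

Calling model_scenario() from a small main() should return 0. The last step is the window I am worried about: until the count is re-initialized, an open succeeds in the driver (nbd_open returns 0) but takes no reference.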