Hi Markus, On Thu, May 12, 2016 at 11:53:01AM +0200, Markus Pargmann wrote: > On Thursday 28 April 2016 18:27:34 Wouter Verhelst wrote: > > However, at some point I agreed with Paul (your predecessor) that when > > this happens due to an error condition (as opposed to it being due to an > > explicit disconnect), the kernel would block all reads from or writes to > > the device, and the client may try to reconnect *from the same > > PID* (i.e., it may not fork()). If that succeeds, the next NBD_DO_IT is > > assumed to be connected to the same server; if instead the process > > exits, then the block device is assumed to be dead, will be reset, and > > all pending reads or writes would error. > > > > In principle, this allows for a proper reconnect from userspace if it > > can be done. However, I'm not sure whether this ever worked well or > > whether it was documented, so it's probably fine if you think it should > > be replaced with something else. > > At least I was not aware of this possibility. As far as I know the > previous code even had issues with the signals used to kill on timeouts > which also killed the userspace program sometimes. > > Currently I can't see a code path that supports reconnects. But I may > have removed that accidently in the past.
Right. Like I said, I'm not sure if it ever worked well. The user space client has a -persist option that tries to implement it, but I've been getting some bug reports from people who've tried it (although that may have been my fault rather than the kernel's). > > (obviously, userspace reconnecting the device to a different device is > > wrong and should not be done, but that's a case of "if you break it, you > > get to keep both pieces) > > > > At any rate, I think it makes sense for userspace to be given a chance > > to *attempt* to reconnect a device when the connection drops > > unexpectedly. > > Perhaps it would be better to setup the kernel driver explicitly for > that. Perhaps some flag to let the kernel driver know that the client > would like to keep the block device open? In that case the client could > excplicitly use NBD_CLEAR_SOCK to cleanup everything. I'm not sure what you mean by this. Can you clarify? > Or perhaps a completely new ioctl that can transmit back some more > information about what failures were seen and whether the blockdevice > was closed or not? The intent was that ioctl(NBD_DO_IT) would return an error when the disconnect was not requested, and would return 0 when the connection dropped due to userspace doing ioctl(NBD_DISCONNECT), since dropping the connection when userspace explicitly asks for it is not an error. drivers/block/nbd.c contains the following: static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd, unsigned int cmd, unsigned long arg) { [...] case NBD_DO_IT: { [...] if (nbd->disconnect) /* user requested, ignore socket errors */ return 0; return error; } [...] so the signalling part of it is at least still there. Whether it works, I haven't tested. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12