Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth wrote:
>
> A customer reported that running
>
>   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>
> fails for them with the following error message when the images are
> stored on a GPFS file system:
>
>   qemu-img: error while writing sector 0: Invalid argument
>
> After analyzing the strace output, it seems like the problem is in
> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
> returns EINVAL, which can apparently happen if the file system has
> a different idea of the granularity of the operation. It's arguably
> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
> according to the man page of fallocate(), but the file system is out
> there in production, so we have to deal with it. In commit 294682cc3a
> ("block: workaround for unaligned byte range in fallocate()") we
> already applied a workaround for the same problem to the earlier
> fallocate(FALLOC_FL_ZERO_RANGE) call, so do the same now for the
> PUNCH_HOLE call.
>
> Signed-off-by: Thomas Huth
> ---
>  block/file-posix.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 20e14f8e96..7a40428d52 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>          }
>          s->has_fallocate = false;
>      } else if (ret != -ENOTSUP) {
> +        if (ret == -EINVAL) {
> +            /*
> +             * File systems like GPFS do not like unaligned byte ranges,
> +             * treat it like unsupported (so caller falls back to pwrite)
> +             */
> +            return -ENOTSUP;

This skips the next fallback, using plain fallocate(0), if we write after the end of the file. Is this intended? We can treat the buggy EINVAL return value as "the filesystem is buggy, let's not try other options", or as "let's try the next option". Since falling back to actually writing zeroes is so much slower, I think it is better to try the next option.
> +        }
>          return ret;
>      } else {
>          s->has_discard = false;
> --
> 2.27.0

This issue also affects libnbd (the nbdcopy file backend). Do we have a bug report for GPFS?

Nir
[PATCH v3] hw/block/nvme: fix lbaf formats initialization
Currently LBAF formats are initialized based on metadata size only if the nvme-ns "ms" parameter has a non-zero value. Since the Format NVM command is supported, the device parameter "ms" should not be the sole criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu
---
-v3: Remove the "mset" constraint check for ms < 8; "mset" can be set even when ms is < 8 and non-zero.
-v2: Addressing review comments (Klaus). Change the current "pi" and "ms" constraint check so that it throws an error if ms < 8 and the namespace protection info, location or metadata settings are set. Split this off from the compare fix patch series.

 hw/block/nvme-ns.c | 58 --
 1 file changed, 25 insertions(+), 33 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..594b0003cf 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     ds = 31 - clz32(ns->blkconf.logical_block_size);
     ms = ns->params.ms;

-    if (ns->params.ms) {
-        id_ns->mc = 0x3;
+    id_ns->mc = 0x3;

-        if (ns->params.mset) {
-            id_ns->flbas |= 0x10;
-        }
+    if (ms && ns->params.mset) {
+        id_ns->flbas |= 0x10;
+    }

-        id_ns->dpc = 0x1f;
-        id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 9, .ms = 8 },
-            [2] = { .ds = 9, .ms = 16 },
-            [3] = { .ds = 9, .ms = 64 },
-            [4] = { .ds = 12 },
-            [5] = { .ds = 12, .ms = 8 },
-            [6] = { .ds = 12, .ms = 16 },
-            [7] = { .ds = 12, .ms = 64 },
-        };
-
-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 7;
-    } else {
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 12 },
-        };
+    id_ns->dpc = 0x1f;
+    id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;

-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 1;
-    }
+    NvmeLBAF lbaf[16] = {
+        [0] = { .ds = 9 },
+        [1] = { .ds = 9, .ms = 8 },
+        [2] = { .ds = 9, .ms = 16 },
+        [3] = { .ds = 9, .ms = 64 },
+        [4] = { .ds = 12 },
+        [5] = { .ds = 12, .ms = 8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12,
.ms = 64 },
+    };
+
+    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+    id_ns->nlbaf = 7;

     for (i = 0; i <= id_ns->nlbaf; i++) {
         NvmeLBAF *lbaf = &id_ns->lbaf[i];
@@ -395,10 +385,12 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
         return -1;
     }

-    if (ns->params.pi && ns->params.ms < 8) {
-        error_setg(errp, "at least 8 bytes of metadata required to enable "
-                   "protection information");
-        return -1;
+    if (ns->params.ms < 8) {
+        if (ns->params.pi || ns->params.pil) {
+            error_setg(errp, "at least 8 bytes of metadata required to enable "
+                       "protection information, protection information location");
+            return -1;
+        }
     }

     if (ns->params.nsid > NVME_MAX_NAMESPACES) {
--
2.17.1
[PATCH] hw/ide: Fix crash when plugging a piix3-ide device into the x-remote machine
QEMU currently crashes when the user tries to do something like: qemu-system-x86_64 -M x-remote -device piix3-ide This happens because the "isabus" variable is not initialized with the x-remote machine yet. Add a proper check for this condition and propagate the error to the caller, so we can fail there gracefully. Signed-off-by: Thomas Huth --- hw/ide/ioport.c | 16 ++-- hw/ide/piix.c | 22 +- hw/isa/isa-bus.c | 14 ++ include/hw/ide/internal.h | 2 +- include/hw/isa/isa.h | 13 - 5 files changed, 46 insertions(+), 21 deletions(-) diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c index b613ff3bba..e6caa537fa 100644 --- a/hw/ide/ioport.c +++ b/hw/ide/ioport.c @@ -50,15 +50,19 @@ static const MemoryRegionPortio ide_portio2_list[] = { PORTIO_END_OF_LIST(), }; -void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2) +int ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2) { +int ret; + /* ??? Assume only ISA and PCI configurations, and that the PCI-ISA bridge has been setup properly to always register with ISA. 
*/ -isa_register_portio_list(dev, >portio_list, - iobase, ide_portio_list, bus, "ide"); +ret = isa_register_portio_list(dev, >portio_list, + iobase, ide_portio_list, bus, "ide"); -if (iobase2) { -isa_register_portio_list(dev, >portio2_list, - iobase2, ide_portio2_list, bus, "ide"); +if (ret == 0 && iobase2) { +ret = isa_register_portio_list(dev, >portio2_list, + iobase2, ide_portio2_list, bus, "ide"); } + +return ret; } diff --git a/hw/ide/piix.c b/hw/ide/piix.c index b9860e35a5..d3e738320b 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -26,6 +26,7 @@ #include "qemu/osdep.h" #include "hw/pci/pci.h" #include "migration/vmstate.h" +#include "qapi/error.h" #include "qemu/module.h" #include "sysemu/block-backend.h" #include "sysemu/blockdev.h" @@ -123,7 +124,8 @@ static void piix_ide_reset(DeviceState *dev) pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */ } -static void pci_piix_init_ports(PCIIDEState *d) { +static int pci_piix_init_ports(PCIIDEState *d) +{ static const struct { int iobase; int iobase2; @@ -132,24 +134,30 @@ static void pci_piix_init_ports(PCIIDEState *d) { {0x1f0, 0x3f6, 14}, {0x170, 0x376, 15}, }; -int i; +int i, ret; for (i = 0; i < 2; i++) { ide_bus_new(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2); -ide_init_ioport(>bus[i], NULL, port_info[i].iobase, -port_info[i].iobase2); +ret = ide_init_ioport(>bus[i], NULL, port_info[i].iobase, + port_info[i].iobase2); +if (ret) { +return ret; +} ide_init2(>bus[i], isa_get_irq(NULL, port_info[i].isairq)); bmdma_init(>bus[i], >bmdma[i], d); d->bmdma[i].bus = >bus[i]; ide_register_restart_cb(>bus[i]); } + +return 0; } static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) { PCIIDEState *d = PCI_IDE(dev); uint8_t *pci_conf = dev->config; +int rc; pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode @@ -158,7 +166,11 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp) vmstate_register(VMSTATE_IF(dev), 0, _ide_pci, d); -pci_piix_init_ports(d); +rc = pci_piix_init_ports(d); +if (rc) { 
+error_setg_errno(errp, -rc, "Failed to realize %s", + object_get_typename(OBJECT(dev))); +} } int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux) diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c index 7820068e6e..cffaa35e9c 100644 --- a/hw/isa/isa-bus.c +++ b/hw/isa/isa-bus.c @@ -131,13 +131,17 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start) isa_init_ioport(dev, start); } -void isa_register_portio_list(ISADevice *dev, - PortioList *piolist, uint16_t start, - const MemoryRegionPortio *pio_start, - void *opaque, const char *name) +int isa_register_portio_list(ISADevice *dev, + PortioList *piolist, uint16_t start, + const MemoryRegionPortio *pio_start, + void *opaque, const char *name) { assert(piolist && !piolist->owner); +if (!isabus) { +return -ENODEV; +} + /* START is how we should treat DEV, regardless of the actual contents of the portio array. This is how the old code actually handled e.g. the FDC device. */ @@ -145,6 +149,8 @@ void isa_register_portio_list(ISADevice *dev, portio_list_init(piolist, OBJECT(dev), pio_start, opaque, name);
Re: [PATCH 1/2] hw/block/nvme: consider metadata read aio return value in compare
On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently, in the compare command, the return value of the metadata aio read (blk_aio_preadv) is ignored; take it into account and complete the block accounting.

Signed-off-by: Gollu Appalanaidu
---
 hw/block/nvme.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 624a1431d0..c2727540f1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2369,10 +2369,19 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
     uint32_t reftag = le32_to_cpu(rw->reftag);
     struct nvme_compare_ctx *ctx = req->opaque;
     g_autofree uint8_t *buf = NULL;
+    BlockBackend *blk = ns->blkconf.blk;
+    BlockAcctCookie *acct = &req->acct;
+    BlockAcctStats *stats = blk_get_stats(blk);
     uint16_t status = NVME_SUCCESS;

     trace_pci_nvme_compare_mdata_cb(nvme_cid(req));

+    if (ret) {
+        block_acct_failed(stats, acct);
+        nvme_aio_err(req, ret);
+        goto out;
+    }
+
     buf = g_malloc(ctx->mdata.iov.size);

     status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size,
@@ -2421,6 +2430,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
         goto out;
     }

+    block_acct_done(stats, acct);
+
 out:
     qemu_iovec_destroy(&ctx->data.iov);
     g_free(ctx->data.bounce);
--
2.17.1

Good fix, thanks! Since there is no crash, data corruption or other "bad" behavior, this isn't critical for v6.0. Might consider it for a potential stable release though, so I'll add a Fixes: tag and queue it up.

Reviewed-by: Klaus Jensen

signature.asc Description: PGP signature
[PATCH v2] hw/block/nvme: fix lbaf formats initialization
Currently LBAF formats are initialized based on metadata size only if the nvme-ns "ms" parameter has a non-zero value. Since the Format NVM command is supported, the device parameter "ms" should not be the sole criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu
---
-v2: Addressing review comments (Klaus). Change the current "pi" and "ms" constraint check so that it throws an error if ms < 8 and the namespace protection info, location or metadata settings are set. Split this off from the compare fix patch series.

 hw/block/nvme-ns.c | 59 --
 1 file changed, 26 insertions(+), 33 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..160873a8d4 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     ds = 31 - clz32(ns->blkconf.logical_block_size);
     ms = ns->params.ms;

-    if (ns->params.ms) {
-        id_ns->mc = 0x3;
+    id_ns->mc = 0x3;

-        if (ns->params.mset) {
-            id_ns->flbas |= 0x10;
-        }
+    if (ns->params.mset) {
+        id_ns->flbas |= 0x10;
+    }

-        id_ns->dpc = 0x1f;
-        id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 9, .ms = 8 },
-            [2] = { .ds = 9, .ms = 16 },
-            [3] = { .ds = 9, .ms = 64 },
-            [4] = { .ds = 12 },
-            [5] = { .ds = 12, .ms = 8 },
-            [6] = { .ds = 12, .ms = 16 },
-            [7] = { .ds = 12, .ms = 64 },
-        };
-
-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 7;
-    } else {
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 12 },
-        };
+    id_ns->dpc = 0x1f;
+    id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;

-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 1;
-    }
+    NvmeLBAF lbaf[16] = {
+        [0] = { .ds = 9 },
+        [1] = { .ds = 9, .ms = 8 },
+        [2] = { .ds = 9, .ms = 16 },
+        [3] = { .ds = 9, .ms = 64 },
+        [4] = { .ds = 12 },
+        [5] = { .ds = 12, .ms = 8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12, .ms = 64 },
+    };
+
+    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+    id_ns->nlbaf = 7;

     for (i = 0; i <= id_ns->nlbaf;
i++) {
         NvmeLBAF *lbaf = &id_ns->lbaf[i];
@@ -395,10 +385,13 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
         return -1;
     }

-    if (ns->params.pi && ns->params.ms < 8) {
-        error_setg(errp, "at least 8 bytes of metadata required to enable "
-                   "protection information");
-        return -1;
+    if (ns->params.ms < 8) {
+        if (ns->params.pi || ns->params.pil || ns->params.mset) {
+            error_setg(errp, "at least 8 bytes of metadata required to enable "
+                       "protection information, protection information location and "
+                       "metadata settings");
+            return -1;
+        }
     }

     if (ns->params.nsid > NVME_MAX_NAMESPACES) {
--
2.17.1
Re: [PATCH 2/2] hw/block/nvme: fix lbaf formats initialization
On Fri, Apr 16, 2021 at 10:48:16AM +0200, Klaus Jensen wrote:
On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently LBAF formats are initialized based on metadata size only if the nvme-ns "ms" parameter has a non-zero value. Since the Format NVM command is supported, the device parameter "ms" should not be the sole criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu
---
 hw/block/nvme-ns.c | 48 ++
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..573dbb5a9d 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     ds = 31 - clz32(ns->blkconf.logical_block_size);
     ms = ns->params.ms;

-    if (ns->params.ms) {
-        id_ns->mc = 0x3;
+    id_ns->mc = 0x3;

-        if (ns->params.mset) {
-            id_ns->flbas |= 0x10;
-        }
+    if (ms && ns->params.mset) {
+        id_ns->flbas |= 0x10;
+    }

-        id_ns->dpc = 0x1f;
-        id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 9, .ms = 8 },
-            [2] = { .ds = 9, .ms = 16 },
-            [3] = { .ds = 9, .ms = 64 },
-            [4] = { .ds = 12 },
-            [5] = { .ds = 12, .ms = 8 },
-            [6] = { .ds = 12, .ms = 16 },
-            [7] = { .ds = 12, .ms = 64 },
-        };
-
-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 7;
-    } else {
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 12 },
-        };
+    id_ns->dpc = 0x1f;
+    id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;

While nvme_ns_check_constraints() will error out if pi is set and the metadata bytes are insufficient, I don't think this should set bit 3 unless both metadata and pi are enabled. It's not against the spec, but it's just slightly weird.

Okay, will modify the current "ms" and "pi" constraint check like this:

    if (ns->params.ms < 8) {
        if (ns->params.pi || ns->params.pil || ns->params.mset) {
            error_setg(errp, "at least 8 bytes of metadata required to enable "
                       "protection information, protection information location and "
                       "metadata settings");
            return -1;
        }
    }

and when "mset" is set, it will be set directly without checking "ms" in nvme_ns_init().

-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 1;
-    }
+    NvmeLBAF lbaf[16] = {
+        [0] = { .ds = 9 },
+        [1] = { .ds = 9, .ms = 8 },
+        [2] = { .ds = 9, .ms = 16 },
+        [3] = { .ds = 9, .ms = 64 },
+        [4] = { .ds = 12 },
+        [5] = { .ds = 12, .ms = 8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12, .ms = 64 },
+    };
+
+    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+    id_ns->nlbaf = 7;

     for (i = 0; i <= id_ns->nlbaf; i++) {
         NvmeLBAF *lbaf = &id_ns->lbaf[i];
--
2.17.1
Re: [PATCH 2/2] hw/block/nvme: fix lbaf formats initialization
On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently LBAF formats are initialized based on metadata size only if the nvme-ns "ms" parameter has a non-zero value. Since the Format NVM command is supported, the device parameter "ms" should not be the sole criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu
---
 hw/block/nvme-ns.c | 48 ++
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..573dbb5a9d 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     ds = 31 - clz32(ns->blkconf.logical_block_size);
     ms = ns->params.ms;

-    if (ns->params.ms) {
-        id_ns->mc = 0x3;
+    id_ns->mc = 0x3;

-        if (ns->params.mset) {
-            id_ns->flbas |= 0x10;
-        }
+    if (ms && ns->params.mset) {
+        id_ns->flbas |= 0x10;
+    }

-        id_ns->dpc = 0x1f;
-        id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 9, .ms = 8 },
-            [2] = { .ds = 9, .ms = 16 },
-            [3] = { .ds = 9, .ms = 64 },
-            [4] = { .ds = 12 },
-            [5] = { .ds = 12, .ms = 8 },
-            [6] = { .ds = 12, .ms = 16 },
-            [7] = { .ds = 12, .ms = 64 },
-        };
-
-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 7;
-    } else {
-        NvmeLBAF lbaf[16] = {
-            [0] = { .ds = 9 },
-            [1] = { .ds = 12 },
-        };
+    id_ns->dpc = 0x1f;
+    id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;

While nvme_ns_check_constraints() will error out if pi is set and the metadata bytes are insufficient, I don't think this should set bit 3 unless both metadata and pi are enabled. It's not against the spec, but it's just slightly weird.
-        memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-        id_ns->nlbaf = 1;
-    }
+    NvmeLBAF lbaf[16] = {
+        [0] = { .ds = 9 },
+        [1] = { .ds = 9, .ms = 8 },
+        [2] = { .ds = 9, .ms = 16 },
+        [3] = { .ds = 9, .ms = 64 },
+        [4] = { .ds = 12 },
+        [5] = { .ds = 12, .ms = 8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12, .ms = 64 },
+    };
+
+    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+    id_ns->nlbaf = 7;

     for (i = 0; i <= id_ns->nlbaf; i++) {
         NvmeLBAF *lbaf = &id_ns->lbaf[i];
--
2.17.1

signature.asc Description: PGP signature
[PATCH v3 26/33] block-coroutine-wrapper: allow non bdrv_ prefix
We are going to reuse the script to generate a qcow2_ function in a further commit. Prepare the script now.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 scripts/block-coroutine-wrapper.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index 0461fd1c45..85dbeb9ecf 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -98,12 +98,13 @@ def snake_to_camel(func_name: str) -> str:

 def gen_wrapper(func: FuncDecl) -> str:
-    assert func.name.startswith('bdrv_')
-    assert not func.name.startswith('bdrv_co_')
+    assert not '_co_' in func.name
     assert func.return_type == 'int'
     assert func.args[0].type in ['BlockDriverState *', 'BdrvChild *']

-    name = 'bdrv_co_' + func.name[5:]
+    subsystem, subname = func.name.split('_', 1)
+
+    name = f'{subsystem}_co_{subname}'

     bs = 'bs' if func.args[0].type == 'BlockDriverState *' else 'child->bs'
     struct_name = snake_to_camel(name)
--
2.29.2
[PATCH v3 29/33] nbd/client-connection: add option for non-blocking connection attempt
We'll need a non-blocking mode of nbd_co_establish_connection(), so that it returns immediately, and returns success only if a connection was previously established in the background.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 include/block/nbd.h     | 2 +-
 block/nbd.c             | 2 +-
 nbd/client-connection.c | 8 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 00bf08bade..6d5a807482 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -421,7 +421,7 @@ void nbd_client_connection_release(NBDClientConnection *conn);

 QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-                            Error **errp);
+                            bool blocking, Error **errp);

 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn);

diff --git a/block/nbd.c b/block/nbd.c
index 59971bfba8..863d950abd 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -368,7 +368,7 @@ static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp)

     assert(!s->ioc);

-    s->ioc = nbd_co_establish_connection(s->conn, &s->info, errp);
+    s->ioc = nbd_co_establish_connection(s->conn, &s->info, true, errp);
     if (!s->ioc) {
         return -ECONNREFUSED;
     }

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 00efff293f..8914de7b94 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -251,6 +251,8 @@ void nbd_client_connection_release(NBDClientConnection *conn)
  * result, just return it now
  * otherwise if thread is not running, start a thread and wait for completion
  *
+ * If @blocking is false, don't wait for the thread, return immediately.
+ *
  * If @info is not NULL, also do nbd-negotiation after successful connection.
  * In this case info is used only as out parameter, and is fully initialized by
  * nbd_co_establish_connection().
"IN" fields of info as well as related only to
@@ -259,7 +261,7 @@ void nbd_client_connection_release(NBDClientConnection *conn)
  */
 QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-                            Error **errp)
+                            bool blocking, Error **errp)
 {
     QIOChannel *ioc;
     QemuThread thread;
@@ -299,6 +301,10 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
                            connect_thread_func, conn, QEMU_THREAD_DETACHED);
     }

+    if (!blocking) {
+        return NULL;
+    }
+
     conn->wait_co = qemu_coroutine_self();
 }
--
2.29.2
[PATCH v3 25/33] nbd/client-connection: return only one io channel
block/nbd doesn't need the underlying sioc channel anymore. So we can update the nbd/client-connection interface to return only one top-most io channel, which is more straightforward.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 include/block/nbd.h     | 4 ++--
 block/nbd.c             | 13 ++---
 nbd/client-connection.c | 33 +
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 10756d2544..00bf08bade 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -419,9 +419,9 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
                                                Monitor *mon);
 void nbd_client_connection_release(NBDClientConnection *conn);

-QIOChannelSocket *coroutine_fn
+QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-                            QIOChannel **ioc, Error **errp);
+                            Error **errp);

 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn);

diff --git a/block/nbd.c b/block/nbd.c
index f9b56c57b4..15b5921725 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -365,7 +365,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
     AioContext *aio_context = bdrv_get_aio_context(s->bs);
-    QIOChannelSocket *sioc;

     if (!nbd_client_connecting(s)) {
         return;
@@ -404,20 +403,12 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
         s->ioc = NULL;
     }

-    sioc = nbd_co_establish_connection(s->conn, &s->info, &s->ioc, NULL);
-    if (!sioc) {
+    s->ioc = nbd_co_establish_connection(s->conn, &s->info, NULL);
+    if (!s->ioc) {
         ret = -ECONNREFUSED;
         goto out;
     }

-    if (s->ioc) {
-        /* sioc is referenced by s->ioc */
-        object_unref(OBJECT(sioc));
-    } else {
-        s->ioc = QIO_CHANNEL(sioc);
-    }
-    sioc = NULL;
-
     qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
     qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index c26cd59464..36d2c7c693 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -255,15
+255,15 @@ void nbd_client_connection_release(NBDClientConnection *conn) * nbd_receive_export_list() would be zero (see description of NBDExportInfo in * include/block/nbd.h). */ -QIOChannelSocket *coroutine_fn +QIOChannel *coroutine_fn nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info, -QIOChannel **ioc, Error **errp) +Error **errp) { +QIOChannel *ioc; QemuThread thread; if (conn->do_negotiation) { assert(info); -assert(ioc); } WITH_QEMU_LOCK_GUARD(>mutex) { @@ -277,10 +277,17 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info, if (conn->sioc) { /* Previous attempt finally succeeded in background */ if (conn->do_negotiation) { -*ioc = g_steal_pointer(>ioc); +ioc = g_steal_pointer(>ioc); memcpy(info, >updated_info, sizeof(*info)); } -return g_steal_pointer(>sioc); +if (ioc) { +/* TLS channel now has own reference to parent */ +object_unref(OBJECT(conn->sioc)); +} else { +ioc = QIO_CHANNEL(conn->sioc); +} +conn->sioc = NULL; +return ioc; } conn->running = true; @@ -311,11 +318,21 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info, } else { error_propagate(errp, conn->err); conn->err = NULL; -if (conn->sioc && conn->do_negotiation) { -*ioc = g_steal_pointer(>ioc); +if (!conn->sioc) { +return NULL; +} +if (conn->do_negotiation) { +ioc = g_steal_pointer(>ioc); memcpy(info, >updated_info, sizeof(*info)); } -return g_steal_pointer(>sioc); +if (ioc) { +/* TLS channel now has own reference to parent */ +object_unref(OBJECT(conn->sioc)); +} else { +ioc = QIO_CHANNEL(conn->sioc); +} +conn->sioc = NULL; +return ioc; } } -- 2.29.2
[PATCH v3 28/33] nbd/client-connection: do qio_channel_set_delay(false)
nbd_open() does it (through nbd_establish_connection()). We actually lost that call on the reconnect path in commit 1dc4718d849e1a1fe "block/nbd: use non-blocking connect: fix vm hang on connect()", when we introduced the reconnect thread.

Fixes: 1dc4718d849e1a1fe665ce5241ed79048cfa2cfc
Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 nbd/client-connection.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 36d2c7c693..00efff293f 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -126,6 +126,8 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
         return ret;
     }

+    qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+
     if (!info) {
         return 0;
     }
--
2.29.2
[PATCH v3 33/33] block/nbd: drop connection_co
OK, that's a big rewrite of the logic.

Pre-patch we have an always-running coroutine - connection_co. It does reply receiving and reconnecting, and it leads to a lot of difficult and unobvious code around drained sections and context switching. We also abuse the bs->in_flight counter, which is increased for connection_co and temporarily decreased at the points where we want to allow a drained section to begin. One of these places is even in another file: in nbd_read_eof() in nbd/client.c. We also cancel reconnect and requests waiting for reconnect on drained begin, which is not correct.

Let's finally drop this always-running coroutine and go another way:

1. reconnect_attempt() moves to nbd_co_send_request() and is called under send_mutex
2. We receive headers in the request coroutine. But we should also dispatch replies for other pending requests. So nbd_connection_entry() is turned into nbd_receive_replies(), which dispatches replies until it receives the header of the requested reply, and then returns.
3. All the old stuff around drained sections and context switching is dropped.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/nbd.c  | 376 ---
 nbd/client.c | 2 -
 2 files changed, 119 insertions(+), 259 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 03391bb231..3a7b532790 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -59,7 +59,7 @@ typedef struct {
     Coroutine *coroutine;
     uint64_t offset;        /* original offset of the request */
-    bool receiving;         /* waiting for connection_co?
*/ +bool receiving; /* waiting in first yield of nbd_receive_replies() */ } NBDClientRequest; typedef enum NBDClientState { @@ -75,14 +75,10 @@ typedef struct BDRVNBDState { CoMutex send_mutex; CoQueue free_sema; -Coroutine *connection_co; -Coroutine *teardown_co; -QemuCoSleepState *connection_co_sleep_ns_state; -bool drained; -bool wait_drained_end; +Coroutine *receive_co; +Coroutine *in_flight_waiter; int in_flight; NBDClientState state; -bool wait_in_flight; QEMUTimer *reconnect_delay_timer; @@ -131,33 +127,20 @@ static bool nbd_client_connected(BDRVNBDState *s) static void nbd_channel_error(BDRVNBDState *s, int ret) { +if (nbd_client_connected(s)) { +qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); +} + if (ret == -EIO) { if (nbd_client_connected(s)) { s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (nbd_client_connected(s)) { -qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); -} s->state = NBD_CLIENT_QUIT; } } -static void nbd_recv_coroutines_wake_all(BDRVNBDState *s) -{ -int i; - -for (i = 0; i < MAX_NBD_REQUESTS; i++) { -NBDClientRequest *req = >requests[i]; - -if (req->coroutine && req->receiving) { -req->receiving = false; -aio_co_wake(req->coroutine); -} -} -} - static void reconnect_delay_timer_del(BDRVNBDState *s) { if (s->reconnect_delay_timer) { @@ -194,117 +177,23 @@ static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns) timer_mod(s->reconnect_delay_timer, expire_time_ns); } -static void nbd_client_detach_aio_context(BlockDriverState *bs) -{ -BDRVNBDState *s = (BDRVNBDState *)bs->opaque; - -/* Timer is deleted in nbd_client_co_drain_begin() */ -assert(!s->reconnect_delay_timer); -/* - * If reconnect is in progress we may have no ->ioc. It will be - * re-instantiated in the proper aio context once the connection is - * reestablished. 
- */ -if (s->ioc) { -qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc)); -} -} - -static void nbd_client_attach_aio_context_bh(void *opaque) -{ -BlockDriverState *bs = opaque; -BDRVNBDState *s = (BDRVNBDState *)bs->opaque; - -if (s->connection_co) { -/* - * The node is still drained, so we know the coroutine has yielded in - * nbd_read_eof(), the only place where bs->in_flight can reach 0, or - * it is entered for the first time. Both places are safe for entering - * the coroutine. - */ -qemu_aio_coroutine_enter(bs->aio_context, s->connection_co); -} -bdrv_dec_in_flight(bs); -} - -static void nbd_client_attach_aio_context(BlockDriverState *bs, - AioContext *new_context) -{ -BDRVNBDState *s = (BDRVNBDState *)bs->opaque; - -/* - * s->connection_co is either yielded from nbd_receive_reply or from - * nbd_co_reconnect_loop() - */ -if (nbd_client_connected(s)) { -qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); -} - -bdrv_inc_in_flight(bs); - -/* - * Need to
Re: [PATCH v3 33/33] block/nbd: drop connection_co
16.04.2021 11:14, Vladimir Sementsov-Ogievskiy wrote:
16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote:

OK, that's a big rewrite of the logic.

Pre-patch we have an always-running coroutine - connection_co. It does reply receiving and reconnecting, and it leads to a lot of difficult and unobvious code around drained sections and context switching. We also abuse the bs->in_flight counter, which is increased for connection_co and temporarily decreased at the points where we want to allow a drained section to begin. One of these places is even in another file: in nbd_read_eof() in nbd/client.c. We also cancel reconnect and requests waiting for reconnect on drained begin, which is not correct.

Let's finally drop this always-running coroutine and go another way:

1. reconnect_attempt() moves to nbd_co_send_request() and is called under send_mutex
2. We receive headers in the request coroutine. But we should also dispatch replies for other pending requests. So nbd_connection_entry() is turned into nbd_receive_replies(), which dispatches replies until it receives the header of the requested reply, and then returns.
3. All the old stuff around drained sections and context switching is dropped.

Signed-off-by: Vladimir Sementsov-Ogievskiy

Please consider this last patch as RFC for now:

1. It is complicated and doesn't have good documentation. Please look through it and ask about everything that is not obvious; I'll explain. Don't waste your time trying to understand what is not clear.

2. I also failed to imagine how to split the patch into smaller, simpler patches. Ideas are welcome.

3. It actually reverts what was done in

    commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace
    Author: Vladimir Sementsov-Ogievskiy
    Date: Thu Sep 3 22:02:58 2020 +0300

        block/nbd: fix drain dead-lock because of nbd reconnect-delay

and I didn't check yet whether this deadlock is still here or not. Even if it is still here, I believe that the nbd driver is the wrong place to work around this bug, but I should check it first at least.

4.
As Roman said, there is a problem in the new architecture: when the guest is idle, we will not detect a disconnect immediately but only on the next request from the guest. That may be considered a degradation. Still, we can implement a kind of keep-alive on top of this series; some ideas: - add an idle-timeout, and issue a simple NBD request on timeout, which will result in some expected error reply from the server. - or add an idle coroutine, which will do an endless "read" when there are no requests. It will be a kind of the old connection_co, but it will have only one function and will be extremely simple. And we may just cancel it on drained-begin and restart it on drained-end. -- Best regards, Vladimir
[PATCH v3 17/33] nbd/client-connection: implement connection retry
Add an option for thread to retry connection until success. We'll use nbd/client-connection both for reconnect and for initial connection in nbd_open(), so we need a possibility to use same NBDClientConnection instance to connect once in nbd_open() and then use retry semantics for reconnect. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 2 ++ nbd/client-connection.c | 55 + 2 files changed, 41 insertions(+), 16 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 5d86e6a393..5bb54d831c 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -409,6 +409,8 @@ const char *nbd_err_lookup(int err); /* nbd/client-connection.c */ typedef struct NBDClientConnection NBDClientConnection; +void nbd_client_connection_enable_retry(NBDClientConnection *conn); + NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, bool do_negotiation, const char *export_name, diff --git a/nbd/client-connection.c b/nbd/client-connection.c index ae4a77f826..002bd91f42 100644 --- a/nbd/client-connection.c +++ b/nbd/client-connection.c @@ -36,6 +36,8 @@ struct NBDClientConnection { NBDExportInfo initial_info; bool do_negotiation; +bool do_retry; + /* * Result of last attempt. Valid in FAIL and SUCCESS states. * If you want to steal error, don't forget to set pointer to NULL. @@ -52,6 +54,15 @@ struct NBDClientConnection { Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ }; +/* + * The function isn't protected by any mutex, so call it when thread is not + * running. 
+ */ +void nbd_client_connection_enable_retry(NBDClientConnection *conn) +{ +conn->do_retry = true; +} + NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, bool do_negotiation, const char *export_name, @@ -144,24 +155,37 @@ static void *connect_thread_func(void *opaque) NBDClientConnection *conn = opaque; bool do_free; int ret; +uint64_t timeout = 1; +uint64_t max_timeout = 16; + +while (true) { +conn->sioc = qio_channel_socket_new(); + +error_free(conn->err); +conn->err = NULL; +conn->updated_info = conn->initial_info; + +ret = nbd_connect(conn->sioc, conn->saddr, + conn->do_negotiation ? &conn->updated_info : NULL, + conn->tlscreds, &conn->ioc, &conn->err); +conn->updated_info.x_dirty_bitmap = NULL; +conn->updated_info.name = NULL; + +if (ret < 0) { +object_unref(OBJECT(conn->sioc)); +conn->sioc = NULL; +if (conn->do_retry) { +sleep(timeout); +if (timeout < max_timeout) { +timeout *= 2; +} +continue; +} +} -conn->sioc = qio_channel_socket_new(); - -error_free(conn->err); -conn->err = NULL; -conn->updated_info = conn->initial_info; - -ret = nbd_connect(conn->sioc, conn->saddr, - conn->do_negotiation ? &conn->updated_info : NULL, - conn->tlscreds, &conn->ioc, &conn->err); -if (ret < 0) { -object_unref(OBJECT(conn->sioc)); -conn->sioc = NULL; +break; } -conn->updated_info.x_dirty_bitmap = NULL; -conn->updated_info.name = NULL; - WITH_QEMU_LOCK_GUARD(&conn->mutex) { assert(conn->running); conn->running = false; @@ -172,7 +196,6 @@ static void *connect_thread_func(void *opaque) do_free = conn->detached; } - if (do_free) { nbd_client_connection_do_free(conn); } -- 2.29.2
[PATCH v3 16/33] nbd/client-connection: add possibility of negotiation
Add arguments and logic to support nbd negotiation in the same thread after successful connection. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 9 +++- block/nbd.c | 4 +- nbd/client-connection.c | 105 ++-- 3 files changed, 109 insertions(+), 9 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 57381be76f..5d86e6a393 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -409,11 +409,16 @@ const char *nbd_err_lookup(int err); /* nbd/client-connection.c */ typedef struct NBDClientConnection NBDClientConnection; -NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr); +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, + bool do_negotiation, + const char *export_name, + const char *x_dirty_bitmap, + QCryptoTLSCreds *tlscreds); void nbd_client_connection_release(NBDClientConnection *conn); QIOChannelSocket *coroutine_fn -nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); +nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info, +QIOChannel **ioc, Error **errp); void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn); diff --git a/block/nbd.c b/block/nbd.c index 9bd68dcf10..5e63caaf4b 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -361,7 +361,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -s->sioc = nbd_co_establish_connection(s->conn, NULL); +s->sioc = nbd_co_establish_connection(s->conn, NULL, NULL, NULL); if (!s->sioc) { ret = -ECONNREFUSED; goto out; @@ -2033,7 +2033,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } -s->conn = nbd_client_connection_new(s->saddr); +s->conn = nbd_client_connection_new(s->saddr, false, NULL, NULL, NULL); /* * establish TCP connection, return error if it fails diff --git a/nbd/client-connection.c b/nbd/client-connection.c index b45a0bd5f6..ae4a77f826 100644 --- a/nbd/client-connection.c +++ 
b/nbd/client-connection.c @@ -30,14 +30,19 @@ #include "qapi/clone-visitor.h" struct NBDClientConnection { -/* Initialization constants */ +/* Initialization constants, never change */ SocketAddress *saddr; /* address to connect to */ +QCryptoTLSCreds *tlscreds; +NBDExportInfo initial_info; +bool do_negotiation; /* * Result of last attempt. Valid in FAIL and SUCCESS states. * If you want to steal error, don't forget to set pointer to NULL. */ +NBDExportInfo updated_info; QIOChannelSocket *sioc; +QIOChannel *ioc; Error *err; QemuMutex mutex; @@ -47,12 +52,25 @@ struct NBDClientConnection { Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ }; -NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr) +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, + bool do_negotiation, + const char *export_name, + const char *x_dirty_bitmap, + QCryptoTLSCreds *tlscreds) { NBDClientConnection *conn = g_new(NBDClientConnection, 1); +object_ref(OBJECT(tlscreds)); *conn = (NBDClientConnection) { .saddr = QAPI_CLONE(SocketAddress, saddr), +.tlscreds = tlscreds, +.do_negotiation = do_negotiation, + +.initial_info.request_sizes = true, +.initial_info.structured_reply = true, +.initial_info.base_allocation = true, +.initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap), +.initial_info.name = g_strdup(export_name ?: "") }; qemu_mutex_init(>mutex); @@ -68,9 +86,59 @@ static void nbd_client_connection_do_free(NBDClientConnection *conn) } error_free(conn->err); qapi_free_SocketAddress(conn->saddr); +object_unref(OBJECT(conn->tlscreds)); +g_free(conn->initial_info.x_dirty_bitmap); +g_free(conn->initial_info.name); g_free(conn); } +/* + * Connect to @addr and do NBD negotiation if @info is not null. If @tlscreds + * given @outioc is provided. @outioc is provided only on success. The call may + * be cancelled in parallel by simply qio_channel_shutdown(sioc). 
+ */ +static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr, + NBDExportInfo *info, QCryptoTLSCreds *tlscreds, + QIOChannel **outioc, Error **errp) +{ +int ret; + +if (outioc) { +*outioc = NULL; +} + +ret =
[PATCH v3 32/33] block/nbd: safer transition to receiving request
req->receiving is a flag marking that the request coroutine is parked at one concrete yield point in nbd_co_do_receive_one_chunk(). Such a boolean flag is always better unset before scheduling the coroutine, to avoid double scheduling. So, let's be more careful. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 6cc563e13d..03391bb231 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -152,6 +152,7 @@ static void nbd_recv_coroutines_wake_all(BDRVNBDState *s) NBDClientRequest *req = &s->requests[i]; if (req->coroutine && req->receiving) { +req->receiving = false; aio_co_wake(req->coroutine); } } @@ -552,6 +553,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque) * connection_co happens through a bottom half, which can only * run after we yield. */ +s->requests[i].receiving = false; aio_co_wake(s->requests[i].coroutine); qemu_coroutine_yield(); } @@ -614,7 +616,7 @@ static int nbd_co_send_request(BlockDriverState *bs, if (qiov) { qio_channel_set_cork(s->ioc, true); rc = nbd_send_request(s->ioc, request); -if (nbd_clinet_connected(s) && rc >= 0) { +if (nbd_client_connected(s) && rc >= 0) { if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov, NULL) < 0) { rc = -EIO; @@ -938,7 +940,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk( /* Wait until we're woken up by nbd_connection_entry. */ s->requests[i].receiving = true; qemu_coroutine_yield(); -s->requests[i].receiving = false; +assert(!s->requests[i].receiving); if (!nbd_client_connected(s)) { error_setg(errp, "Connection closed"); return -EIO; -- 2.29.2
[PATCH v3 30/33] block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
The only last step we need to reuse the function is coroutine-wrapper. nbd_open() may be called from non-coroutine context. So, generate the wrapper and use it. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/coroutines.h | 6 +++ block/nbd.c| 101 ++--- 2 files changed, 10 insertions(+), 97 deletions(-) diff --git a/block/coroutines.h b/block/coroutines.h index 4cfb4946e6..514d169d23 100644 --- a/block/coroutines.h +++ b/block/coroutines.h @@ -66,4 +66,10 @@ int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs, int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos); +int generated_co_wrapper +nbd_do_establish_connection(BlockDriverState *bs, Error **errp); +int coroutine_fn +nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp); + + #endif /* BLOCK_COROUTINES_INT_H */ diff --git a/block/nbd.c b/block/nbd.c index 863d950abd..3b31941a83 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -44,6 +44,7 @@ #include "block/qdict.h" #include "block/nbd.h" #include "block/block_int.h" +#include "block/coroutines.h" #include "monitor/monitor.h" @@ -101,11 +102,6 @@ typedef struct BDRVNBDState { NBDClientConnection *conn; } BDRVNBDState; -static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs, - SocketAddress *saddr, - Error **errp); -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp); static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BlockDriverState *bs) @@ -361,7 +357,7 @@ static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp) return 0; } -static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp) +int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp) { BDRVNBDState *s = (BDRVNBDState *)bs->opaque; int ret; @@ -1577,83 +1573,6 @@ static void nbd_client_close(BlockDriverState *bs) nbd_teardown_connection(bs); } -static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs, - 
SocketAddress *saddr, - Error **errp) -{ -ERRP_GUARD(); -QIOChannelSocket *sioc; - -sioc = qio_channel_socket_new(); -qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client"); - -qio_channel_socket_connect_sync(sioc, saddr, errp); -if (*errp) { -object_unref(OBJECT(sioc)); -return NULL; -} - -yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), nbd_yank, bs); -qio_channel_set_delay(QIO_CHANNEL(sioc), false); - -return sioc; -} - -/* nbd_client_handshake takes ownership on sioc. */ -static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, -Error **errp) -{ -BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -AioContext *aio_context = bdrv_get_aio_context(bs); -int ret; - -trace_nbd_client_handshake(s->export); -qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL); -qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context); - -s->info.request_sizes = true; -s->info.structured_reply = true; -s->info.base_allocation = true; -s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap); -s->info.name = g_strdup(s->export ?: ""); -ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds, -s->hostname, >ioc, >info, errp); -g_free(s->info.x_dirty_bitmap); -g_free(s->info.name); -if (ret < 0) { -yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -object_unref(OBJECT(sioc)); -return ret; -} - -if (s->ioc) { -/* sioc is referenced by s->ioc */ -object_unref(OBJECT(sioc)); -} else { -s->ioc = QIO_CHANNEL(sioc); -} -sioc = NULL; - -ret = nbd_handle_updated_info(bs, errp); -if (ret < 0) { -/* - * We have connected, but must fail for other reasons. - * Send NBD_CMD_DISC as a courtesy to the server. 
- */ -NBDRequest request = { .type = NBD_CMD_DISC }; - -nbd_send_request(s->ioc, &request); - -yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -object_unref(OBJECT(s->ioc)); -s->ioc = NULL; -return ret; -} - -return 0; -} /* * Parse nbd_open options */ @@ -2037,7 +1956,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, { int ret; BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -QIOChannelSocket *sioc; s->bs = bs; qemu_co_mutex_init(&s->send_mutex); @@ -2056,22
Re: [PATCH v3 33/33] block/nbd: drop connection_co
16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote: OK, that's a big rewrite of the logic. Pre-patch we have an always-running coroutine - connection_co. It does reply receiving and reconnecting, and it leads to a lot of difficult and unobvious code around drained sections and context switching. We also abuse the bs->in_flight counter, which is increased for connection_co and temporarily decreased at points where we want to allow a drained section to begin. One of these places is in another file: in nbd_read_eof() in nbd/client.c. We also cancel reconnect and requests waiting for reconnect on drained-begin, which is not correct. Let's finally drop this always-running coroutine and go another way: 1. reconnect_attempt() moves to nbd_co_send_request() and is called under send_mutex. 2. We receive headers in the request coroutine. But we should also dispatch replies for other pending requests. So, nbd_connection_entry() is turned into nbd_receive_replies(), which dispatches replies until it receives the requested header, and then returns. 3. All the old stuff around drained sections and context switching is dropped. Signed-off-by: Vladimir Sementsov-Ogievskiy Please consider this last patch as RFC for now: 1. It is complicated and doesn't have good documentation. Please look through it and ask about everything that is not obvious, I'll explain. Don't waste your time trying to understand what is not clean. 2. I also failed to imagine how to split the patch into smaller, simpler patches. Ideas are welcome. 3. It actually reverts what was done in commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace Author: Vladimir Sementsov-Ogievskiy Date: Thu Sep 3 22:02:58 2020 +0300 block/nbd: fix drain dead-lock because of nbd reconnect-delay and I didn't check yet whether this dead-lock is still here or not. Even if it is still here, I believe the nbd driver is the wrong place to work around this bug, but I should check it first at least. -- Best regards, Vladimir
[PATCH v3 22/33] block/nbd: pass monitor directly to connection thread
monitor_cur() is used by socket_get_fd, but it doesn't work in the connection thread. Let's pass the monitor directly to cover this. We are going to unify the connection-establishing path in nbd_open() and reconnect, so we should support fd-passing. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 3 ++- block/nbd.c | 5 ++++- nbd/client-connection.c | 11 +++++++---- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/include/block/nbd.h b/include/block/nbd.h index 5bb54d831c..10756d2544 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -415,7 +415,8 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, bool do_negotiation, const char *export_name, const char *x_dirty_bitmap, - QCryptoTLSCreds *tlscreds); + QCryptoTLSCreds *tlscreds, + Monitor *mon); void nbd_client_connection_release(NBDClientConnection *conn); QIOChannelSocket *coroutine_fn diff --git a/block/nbd.c b/block/nbd.c index c1e61a2a52..ec69a4ad65 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -45,6 +45,8 @@ #include "block/nbd.h" #include "block/block_int.h" +#include "monitor/monitor.h" + #include "qemu/yank.h" #define EN_OPTSTR ":exportname=" @@ -2064,7 +2066,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, } s->conn = nbd_client_connection_new(s->saddr, true, s->export, -s->x_dirty_bitmap, s->tlscreds); +s->x_dirty_bitmap, s->tlscreds, +monitor_cur()); /* * establish TCP connection, return error if it fails diff --git a/nbd/client-connection.c b/nbd/client-connection.c index 54f73c6c24..c26cd59464 100644 --- a/nbd/client-connection.c +++ b/nbd/client-connection.c @@ -37,6 +37,7 @@ struct NBDClientConnection { bool do_negotiation; bool do_retry; +Monitor *mon; /* * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -67,7 +68,8 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, bool do_negotiation, const char *export_name, const char *x_dirty_bitmap, - QCryptoTLSCreds *tlscreds) + QCryptoTLSCreds *tlscreds, + Monitor *mon) { NBDClientConnection *conn = g_new(NBDClientConnection, 1); @@ -76,6 +78,7 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr, .saddr = QAPI_CLONE(SocketAddress, saddr), .tlscreds = tlscreds, .do_negotiation = do_negotiation, +.mon = mon, .initial_info.request_sizes = true, .initial_info.structured_reply = true, @@ -110,7 +113,7 @@ static void nbd_client_connection_do_free(NBDClientConnection *conn) */ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr, NBDExportInfo *info, QCryptoTLSCreds *tlscreds, - QIOChannel **outioc, Error **errp) + QIOChannel **outioc, Monitor *mon, Error **errp) { int ret; @@ -118,7 +121,7 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr, *outioc = NULL; } -ret = qio_channel_socket_connect_sync(sioc, addr, errp); +ret = qio_channel_socket_connect_sync_mon(sioc, addr, mon, errp); if (ret < 0) { return ret; } @@ -171,7 +174,7 @@ static void *connect_thread_func(void *opaque) ret = nbd_connect(conn->sioc, conn->saddr, conn->do_negotiation ? >updated_info : NULL, - conn->tlscreds, >ioc, >err); + conn->tlscreds, >ioc, conn->mon, >err); conn->updated_info.x_dirty_bitmap = NULL; conn->updated_info.name = NULL; -- 2.29.2
[PATCH v3 20/33] block/nbd: use negotiation of NBDClientConnection
Use the new negotiation capability of the connect thread. We are now on the way to simplifying connection_co: we want to move the whole reconnect code to NBDClientConnection. NBDClientConnection has already been updated to support negotiation and retry; use the first of these now. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 44 ++++++++++++++++++++++++++++++-------------- 1 file changed, 30 insertions(+), 14 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 03ffe95231..c1e61a2a52 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -365,6 +365,7 @@ static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp) static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) { int ret; +AioContext *aio_context = bdrv_get_aio_context(s->bs); if (!nbd_client_connecting(s)) { return; @@ -405,30 +406,44 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -s->sioc = nbd_co_establish_connection(s->conn, NULL, NULL, NULL); +s->sioc = nbd_co_establish_connection(s->conn, &s->info, &s->ioc, NULL); if (!s->sioc) { ret = -ECONNREFUSED; goto out; } +qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL); +qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context); +if (s->ioc) { +qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL); +qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context); +} else { +s->ioc = QIO_CHANNEL(s->sioc); +object_ref(OBJECT(s->ioc)); +} + yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, s->bs); -bdrv_dec_in_flight(s->bs); +ret = nbd_handle_updated_info(s->bs, NULL); +if (ret < 0) { +/* + * We have connected, but must fail for other reasons. + * Send NBD_CMD_DISC as a courtesy to the server. + */ +NBDRequest request = { .type = NBD_CMD_DISC }; -ret = nbd_client_handshake(s->bs, NULL); +nbd_send_request(s->ioc, &request); -if (s->drained) { -s->wait_drained_end = true; -while (s->drained) { -/* - * We may be entered once from nbd_client_attach_aio_context_bh - * and then from nbd_client_co_drain_end.
So here is a loop. - */ -qemu_coroutine_yield(); -} +yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), + nbd_yank, s->bs); +object_unref(OBJECT(s->sioc)); +s->sioc = NULL; +object_unref(OBJECT(s->ioc)); +s->ioc = NULL; + +return; } -bdrv_inc_in_flight(s->bs); out: if (ret >= 0) { @@ -2048,7 +2063,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } -s->conn = nbd_client_connection_new(s->saddr, false, NULL, NULL, NULL); +s->conn = nbd_client_connection_new(s->saddr, true, s->export, +s->x_dirty_bitmap, s->tlscreds); /* * establish TCP connection, return error if it fails -- 2.29.2
[PATCH v3 11/33] block/nbd: rename NBDConnectThread to NBDClientConnection
We are going to move connection code to own file and want clear names and APIs. The structure is shared between user and (possibly) several runs of connect-thread. So it's wrong to call it "thread". Let's rename to something more generic. Appropriately rename connect_thread and thr variables to conn. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 137 ++-- 1 file changed, 68 insertions(+), 69 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index dab73bdf3b..9ce6a323eb 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,7 +66,7 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef struct NBDConnectThread { +typedef struct NBDClientConnection { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ @@ -82,7 +82,7 @@ typedef struct NBDConnectThread { bool running; /* thread is running now */ bool detached; /* thread is detached and should cleanup the state */ Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ -} NBDConnectThread; +} NBDClientConnection; typedef struct BDRVNBDState { QIOChannelSocket *sioc; /* The master data channel */ @@ -115,34 +115,34 @@ typedef struct BDRVNBDState { char *x_dirty_bitmap; bool alloc_depth; -NBDConnectThread *connect_thread; +NBDClientConnection *conn; } BDRVNBDState; -static void nbd_free_connect_thread(NBDConnectThread *thr); +static void nbd_free_connect_thread(NBDClientConnection *conn); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static coroutine_fn QIOChannelSocket * -nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); -static void nbd_co_establish_connection_cancel(NBDConnectThread *thr); +nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); +static void nbd_co_establish_connection_cancel(NBDClientConnection *conn); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); static void 
nbd_clear_bdrvstate(BlockDriverState *bs) { BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -NBDConnectThread *thr = s->connect_thread; +NBDClientConnection *conn = s->conn; bool do_free; -qemu_mutex_lock(>mutex); -if (thr->running) { -thr->detached = true; +qemu_mutex_lock(>mutex); +if (conn->running) { +conn->detached = true; } -do_free = !thr->running && !thr->detached; -qemu_mutex_unlock(>mutex); +do_free = !conn->running && !conn->detached; +qemu_mutex_unlock(>mutex); /* the runaway thread will clean it up itself */ if (do_free) { -nbd_free_connect_thread(thr); +nbd_free_connect_thread(conn); } yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); @@ -286,7 +286,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(s->connect_thread); +nbd_co_establish_connection_cancel(s->conn); reconnect_delay_timer_del(s); @@ -326,7 +326,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(s->connect_thread); +nbd_co_establish_connection_cancel(s->conn); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -353,101 +353,101 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) static void nbd_init_connect_thread(BDRVNBDState *s) { -s->connect_thread = g_new(NBDConnectThread, 1); +s->conn = g_new(NBDClientConnection, 1); -*s->connect_thread = (NBDConnectThread) { +*s->conn = (NBDClientConnection) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), }; -qemu_mutex_init(>connect_thread->mutex); +qemu_mutex_init(>conn->mutex); } -static void nbd_free_connect_thread(NBDConnectThread *thr) +static void nbd_free_connect_thread(NBDClientConnection *conn) { -if (thr->sioc) { -qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); -object_unref(OBJECT(thr->sioc)); +if (conn->sioc) { 
+qio_channel_close(QIO_CHANNEL(conn->sioc), NULL); +object_unref(OBJECT(conn->sioc)); } -error_free(thr->err); -qapi_free_SocketAddress(thr->saddr); -g_free(thr); +error_free(conn->err); +qapi_free_SocketAddress(conn->saddr); +g_free(conn); } static void *connect_thread_func(void *opaque) { -NBDConnectThread *thr = opaque; +NBDClientConnection *conn = opaque; int ret; bool do_free = false; -thr->sioc = qio_channel_socket_new(); +conn->sioc = qio_channel_socket_new(); -error_free(thr->err); -thr->err = NULL; -
[PATCH v3 19/33] block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()
To be reused in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 99 ++--- 1 file changed, 57 insertions(+), 42 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 5e63caaf4b..03ffe95231 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -318,6 +318,50 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } +/* + * Check s->info updated by negotiation process. + * Update @bs correspondingly to new options. + */ +static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp) +{ +BDRVNBDState *s = (BDRVNBDState *)bs->opaque; +int ret; + +if (s->x_dirty_bitmap) { +if (!s->info.base_allocation) { +error_setg(errp, "requested x-dirty-bitmap %s not found", + s->x_dirty_bitmap); +return -EINVAL; +} +if (strcmp(s->x_dirty_bitmap, "qemu:allocation-depth") == 0) { +s->alloc_depth = true; +} +} + +if (s->info.flags & NBD_FLAG_READ_ONLY) { +ret = bdrv_apply_auto_read_only(bs, "NBD export is read-only", errp); +if (ret < 0) { +return ret; +} +} + +if (s->info.flags & NBD_FLAG_SEND_FUA) { +bs->supported_write_flags = BDRV_REQ_FUA; +bs->supported_zero_flags |= BDRV_REQ_FUA; +} + +if (s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) { +bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP; +if (s->info.flags & NBD_FLAG_SEND_FAST_ZERO) { +bs->supported_zero_flags |= BDRV_REQ_NO_FALLBACK; +} +} + +trace_nbd_client_handshake_success(s->export); + +return 0; +} + static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) { int ret; @@ -1579,49 +1623,13 @@ static int nbd_client_handshake(BlockDriverState *bs, Error **errp) s->sioc = NULL; return ret; } -if (s->x_dirty_bitmap) { -if (!s->info.base_allocation) { -error_setg(errp, "requested x-dirty-bitmap %s not found", - s->x_dirty_bitmap); -ret = -EINVAL; -goto fail; -} -if (strcmp(s->x_dirty_bitmap, "qemu:allocation-depth") == 0) { -s->alloc_depth = true; -} -} -if (s->info.flags & NBD_FLAG_READ_ONLY) { -ret = 
bdrv_apply_auto_read_only(bs, "NBD export is read-only", errp); -if (ret < 0) { -goto fail; -} -} -if (s->info.flags & NBD_FLAG_SEND_FUA) { -bs->supported_write_flags = BDRV_REQ_FUA; -bs->supported_zero_flags |= BDRV_REQ_FUA; -} -if (s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) { -bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP; -if (s->info.flags & NBD_FLAG_SEND_FAST_ZERO) { -bs->supported_zero_flags |= BDRV_REQ_NO_FALLBACK; -} -} -if (!s->ioc) { -s->ioc = QIO_CHANNEL(s->sioc); -object_ref(OBJECT(s->ioc)); -} - -trace_nbd_client_handshake_success(s->export); - -return 0; - - fail: -/* - * We have connected, but must fail for other reasons. - * Send NBD_CMD_DISC as a courtesy to the server. - */ -{ +ret = nbd_handle_updated_info(bs, errp); +if (ret < 0) { +/* + * We have connected, but must fail for other reasons. + * Send NBD_CMD_DISC as a courtesy to the server. + */ NBDRequest request = { .type = NBD_CMD_DISC }; nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request); @@ -1635,6 +1643,13 @@ static int nbd_client_handshake(BlockDriverState *bs, Error **errp) return ret; } + +if (!s->ioc) { +s->ioc = QIO_CHANNEL(s->sioc); +object_ref(OBJECT(s->ioc)); +} + +return 0; } /* -- 2.29.2
[PATCH v3 31/33] block/nbd: add nbd_client_connected() helper
We already have two similar helpers for other state. Let's add another one for convenience. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 25 ++--- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 3b31941a83..6cc563e13d 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -124,15 +124,20 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs) s->x_dirty_bitmap = NULL; } +static bool nbd_client_connected(BDRVNBDState *s) +{ +return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED; +} + static void nbd_channel_error(BDRVNBDState *s, int ret) { if (ret == -EIO) { -if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { +if (nbd_client_connected(s)) { s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT : NBD_CLIENT_CONNECTING_NOWAIT; } } else { -if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { +if (nbd_client_connected(s)) { qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } s->state = NBD_CLIENT_QUIT; @@ -230,7 +235,7 @@ static void nbd_client_attach_aio_context(BlockDriverState *bs, * s->connection_co is either yielded from nbd_receive_reply or from * nbd_co_reconnect_loop() */ -if (qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTED) { +if (nbd_client_connected(s)) { qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context); } @@ -503,7 +508,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque) nbd_co_reconnect_loop(s); } -if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTED) { +if (!nbd_client_connected(s)) { continue; } @@ -582,7 +587,7 @@ static int nbd_co_send_request(BlockDriverState *bs, qemu_co_queue_wait(>free_sema, >send_mutex); } -if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTED) { +if (!nbd_client_connected(s)) { rc = -EIO; goto err; } @@ -609,8 +614,7 @@ static int nbd_co_send_request(BlockDriverState *bs, if (qiov) { qio_channel_set_cork(s->ioc, true); rc = nbd_send_request(s->ioc, request); -if (qatomic_load_acquire(>state) == 
NBD_CLIENT_CONNECTED && -rc >= 0) { +if (nbd_clinet_connected(s) && rc >= 0) { if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov, NULL) < 0) { rc = -EIO; @@ -935,7 +939,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk( s->requests[i].receiving = true; qemu_coroutine_yield(); s->requests[i].receiving = false; -if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTED) { +if (!nbd_client_connected(s)) { error_setg(errp, "Connection closed"); return -EIO; } @@ -1094,7 +1098,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s, NBDReply local_reply; NBDStructuredReplyChunk *chunk; Error *local_err = NULL; -if (qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTED) { +if (!nbd_client_connected(s)) { error_setg(_err, "Connection closed"); nbd_iter_channel_error(iter, -EIO, _err); goto break_loop; @@ -1119,8 +1123,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s, } /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */ -if (nbd_reply_is_simple(reply) || -qatomic_load_acquire(>state) != NBD_CLIENT_CONNECTED) { +if (nbd_reply_is_simple(reply) || !nbd_client_connected(s)) { goto break_loop; } -- 2.29.2
[PATCH v3 27/33] block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt
Split out part, that we want to reuse for nbd_open(). Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 79 +++-- 1 file changed, 41 insertions(+), 38 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 15b5921725..59971bfba8 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -361,11 +361,49 @@ static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp) return 0; } -static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) +static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp) { +BDRVNBDState *s = (BDRVNBDState *)bs->opaque; int ret; -AioContext *aio_context = bdrv_get_aio_context(s->bs); +assert(!s->ioc); + +s->ioc = nbd_co_establish_connection(s->conn, >info, errp); +if (!s->ioc) { +return -ECONNREFUSED; +} + +ret = nbd_handle_updated_info(s->bs, NULL); +if (ret < 0) { +/* + * We have connected, but must fail for other reasons. + * Send NBD_CMD_DISC as a courtesy to the server. + */ +NBDRequest request = { .type = NBD_CMD_DISC }; + +nbd_send_request(s->ioc, ); + +object_unref(OBJECT(s->ioc)); +s->ioc = NULL; + +return ret; +} + +qio_channel_set_blocking(s->ioc, false, NULL); +qio_channel_attach_aio_context(s->ioc, bdrv_get_aio_context(bs)); + +yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, + bs); + +/* successfully connected */ +s->state = NBD_CLIENT_CONNECTED; +qemu_co_queue_restart_all(>free_sema); + +return 0; +} + +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) +{ if (!nbd_client_connecting(s)) { return; } @@ -403,42 +441,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -s->ioc = nbd_co_establish_connection(s->conn, >info, NULL); -if (!s->ioc) { -ret = -ECONNREFUSED; -goto out; -} - -qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL); -qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context); - -yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, - s->bs); - -ret = 
nbd_handle_updated_info(s->bs, NULL); -if (ret < 0) { -/* - * We have connected, but must fail for other reasons. - * Send NBD_CMD_DISC as a courtesy to the server. - */ -NBDRequest request = { .type = NBD_CMD_DISC }; - -nbd_send_request(s->ioc, ); - -yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), - nbd_yank, s->bs); -object_unref(OBJECT(s->ioc)); -s->ioc = NULL; - -return; -} - -out: -if (ret >= 0) { -/* successfully connected */ -s->state = NBD_CLIENT_CONNECTED; -qemu_co_queue_restart_all(>free_sema); -} +nbd_co_do_establish_connection(s->bs, NULL); } static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s) -- 2.29.2
[PATCH v3 24/33] block/nbd: drop BDRVNBDState::sioc
Currently sioc pointer is used just to pass from socket-connection to nbd negotiation. Drop the field, and use local variables instead. With next commit we'll update nbd/client-connection.c to behave appropriately (return only top-most ioc, not two channels). Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 98 ++--- 1 file changed, 48 insertions(+), 50 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index cece53313c..f9b56c57b4 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -69,8 +69,7 @@ typedef enum NBDClientState { } NBDClientState; typedef struct BDRVNBDState { -QIOChannelSocket *sioc; /* The master data channel */ -QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */ +QIOChannel *ioc; /* The current I/O channel */ NBDExportInfo info; CoMutex send_mutex; @@ -102,9 +101,11 @@ typedef struct BDRVNBDState { NBDClientConnection *conn; } BDRVNBDState; -static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, -Error **errp); -static int nbd_client_handshake(BlockDriverState *bs, Error **errp); +static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs, + SocketAddress *saddr, + Error **errp); +static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc, +Error **errp); static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BlockDriverState *bs) @@ -364,6 +365,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) { int ret; AioContext *aio_context = bdrv_get_aio_context(s->bs); +QIOChannelSocket *sioc; if (!nbd_client_connecting(s)) { return; @@ -398,27 +400,26 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc)); yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, s->bs); -object_unref(OBJECT(s->sioc)); -s->sioc = NULL; object_unref(OBJECT(s->ioc)); s->ioc = NULL; } -s->sioc = nbd_co_establish_connection(s->conn, >info, >ioc, NULL); -if (!s->sioc) { +sioc = 
nbd_co_establish_connection(s->conn, >info, >ioc, NULL); +if (!sioc) { ret = -ECONNREFUSED; goto out; } -qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL); -qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context); if (s->ioc) { -qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL); -qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context); +/* sioc is referenced by s->ioc */ +object_unref(OBJECT(sioc)); } else { -s->ioc = QIO_CHANNEL(s->sioc); -object_ref(OBJECT(s->ioc)); +s->ioc = QIO_CHANNEL(sioc); } +sioc = NULL; + +qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL); +qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context); yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, s->bs); @@ -435,8 +436,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, s->bs); -object_unref(OBJECT(s->sioc)); -s->sioc = NULL; object_unref(OBJECT(s->ioc)); s->ioc = NULL; @@ -571,8 +570,6 @@ static coroutine_fn void nbd_connection_entry(void *opaque) qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc)); yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, s->bs); -object_unref(OBJECT(s->sioc)); -s->sioc = NULL; object_unref(OBJECT(s->ioc)); s->ioc = NULL; } @@ -1571,7 +1568,7 @@ static void nbd_yank(void *opaque) BDRVNBDState *s = (BDRVNBDState *)bs->opaque; qatomic_store_release(>state, NBD_CLIENT_QUIT); -qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL); +qio_channel_shutdown(QIO_CHANNEL(s->ioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } static void nbd_client_close(BlockDriverState *bs) @@ -1586,57 +1583,64 @@ static void nbd_client_close(BlockDriverState *bs) nbd_teardown_connection(bs); } -static int nbd_establish_connection(BlockDriverState *bs, -SocketAddress *saddr, -Error **errp) +static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs, + SocketAddress 
*saddr, + Error **errp) { ERRP_GUARD(); -BDRVNBDState *s =
[PATCH v3 13/33] block/nbd: introduce nbd_client_connection_release()
Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 43 ++- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 21a4039359..8531d019b2 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -118,7 +118,7 @@ typedef struct BDRVNBDState { NBDClientConnection *conn; } BDRVNBDState; -static void nbd_free_connect_thread(NBDClientConnection *conn); +static void nbd_client_connection_release(NBDClientConnection *conn); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static coroutine_fn QIOChannelSocket * @@ -130,20 +130,9 @@ static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BlockDriverState *bs) { BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -NBDClientConnection *conn = s->conn; -bool do_free; - -qemu_mutex_lock(>mutex); -if (conn->running) { -conn->detached = true; -} -do_free = !conn->running && !conn->detached; -qemu_mutex_unlock(>mutex); -/* the runaway thread will clean it up itself */ -if (do_free) { -nbd_free_connect_thread(conn); -} +nbd_client_connection_release(s->conn); +s->conn = NULL; yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); @@ -365,7 +354,7 @@ nbd_client_connection_new(const SocketAddress *saddr) return conn; } -static void nbd_free_connect_thread(NBDClientConnection *conn) +static void nbd_client_connection_do_free(NBDClientConnection *conn) { if (conn->sioc) { qio_channel_close(QIO_CHANNEL(conn->sioc), NULL); @@ -379,8 +368,8 @@ static void nbd_free_connect_thread(NBDClientConnection *conn) static void *connect_thread_func(void *opaque) { NBDClientConnection *conn = opaque; +bool do_free; int ret; -bool do_free = false; conn->sioc = qio_channel_socket_new(); @@ -405,12 +394,32 @@ static void *connect_thread_func(void *opaque) qemu_mutex_unlock(>mutex); if (do_free) { -nbd_free_connect_thread(conn); +nbd_client_connection_do_free(conn); } return NULL; } +static void nbd_client_connection_release(NBDClientConnection *conn) +{ 
+bool do_free; + +if (!conn) { +return; +} + +qemu_mutex_lock(>mutex); +if (conn->running) { +conn->detached = true; +} +do_free = !conn->running && !conn->detached; +qemu_mutex_unlock(>mutex); + +if (do_free) { +nbd_client_connection_do_free(conn); +} +} + /* * Get a new connection in context of @conn: * if thread is running, wait for completion -- 2.29.2
[PATCH v3 21/33] qemu-socket: pass monitor link to socket_get_fd directly
Detecting monitor by current coroutine works bad when we are not in coroutine context. And that's exactly so in nbd reconnect code, where qio_channel_socket_connect_sync() is called from thread. Add a possibility to pass monitor by hand, to be used in the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/io/channel-socket.h| 20 include/qemu/sockets.h | 2 +- io/channel-socket.c| 17 + tests/unit/test-util-sockets.c | 16 util/qemu-sockets.c| 10 +- 5 files changed, 47 insertions(+), 18 deletions(-) diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h index e747e63514..6d0915420d 100644 --- a/include/io/channel-socket.h +++ b/include/io/channel-socket.h @@ -78,6 +78,23 @@ qio_channel_socket_new_fd(int fd, Error **errp); +/** + * qio_channel_socket_connect_sync_mon: + * @ioc: the socket channel object + * @addr: the address to connect to + * @mon: current monitor. If NULL, it will be detected by + * current coroutine. + * @errp: pointer to a NULL-initialized error object + * + * Attempt to connect to the address @addr. This method + * will run in the foreground so the caller will not regain + * execution control until the connection is established or + * an error occurs. + */ +int qio_channel_socket_connect_sync_mon(QIOChannelSocket *ioc, +SocketAddress *addr, +Monitor *mon, +Error **errp); /** * qio_channel_socket_connect_sync: * @ioc: the socket channel object @@ -88,6 +105,9 @@ qio_channel_socket_new_fd(int fd, * will run in the foreground so the caller will not regain * execution control until the connection is established or * an error occurs. + * + * This a wrapper, calling qio_channel_socket_connect_sync_mon() + * with @mon=NULL. 
*/ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, SocketAddress *addr, diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h index 7d1f813576..cdf6f2413b 100644 --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -41,7 +41,7 @@ int unix_listen(const char *path, Error **errp); int unix_connect(const char *path, Error **errp); SocketAddress *socket_parse(const char *str, Error **errp); -int socket_connect(SocketAddress *addr, Error **errp); +int socket_connect(SocketAddress *addr, Monitor *mon, Error **errp); int socket_listen(SocketAddress *addr, int num, Error **errp); void socket_listen_cleanup(int fd, Error **errp); int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp); diff --git a/io/channel-socket.c b/io/channel-socket.c index de259f7eed..9dc42ca29d 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -135,14 +135,15 @@ qio_channel_socket_new_fd(int fd, } -int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, -SocketAddress *addr, -Error **errp) +int qio_channel_socket_connect_sync_mon(QIOChannelSocket *ioc, +SocketAddress *addr, +Monitor *mon, +Error **errp) { int fd; trace_qio_channel_socket_connect_sync(ioc, addr); -fd = socket_connect(addr, errp); +fd = socket_connect(addr, mon, errp); if (fd < 0) { trace_qio_channel_socket_connect_fail(ioc); return -1; @@ -158,6 +159,14 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, } +int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, +SocketAddress *addr, +Error **errp) +{ +return qio_channel_socket_connect_sync_mon(ioc, addr, NULL, errp); +} + + static void qio_channel_socket_connect_worker(QIOTask *task, gpointer opaque) { diff --git a/tests/unit/test-util-sockets.c b/tests/unit/test-util-sockets.c index 72b9246529..d902ecede7 100644 --- a/tests/unit/test-util-sockets.c +++ b/tests/unit/test-util-sockets.c @@ -90,7 +90,7 @@ static void test_socket_fd_pass_name_good(void) addr.type = SOCKET_ADDRESS_TYPE_FD; addr.u.fd.str = 
g_strdup(mon_fdname); -fd = socket_connect(, _abort); +fd = socket_connect(, NULL, _abort); g_assert_cmpint(fd, !=, -1); g_assert_cmpint(fd, !=, mon_fd); close(fd); @@ -122,7 +122,7 @@ static void test_socket_fd_pass_name_bad(void) addr.type = SOCKET_ADDRESS_TYPE_FD; addr.u.fd.str = g_strdup(mon_fdname); -fd = socket_connect(, ); +fd = socket_connect(, NULL, ); g_assert_cmpint(fd, ==, -1); error_free_or_abort(); @@ -149,7 +149,7 @@ static void test_socket_fd_pass_name_nomon(void) addr.type = SOCKET_ADDRESS_TYPE_FD; addr.u.fd.str =
[PATCH v3 18/33] nbd/client-connection: shutdown connection on release
Now that the thread can do negotiation and retry, it may run for a relatively long time. We need a mechanism to stop it when the user is no longer interested in the result. So, on nbd_client_connection_release(), let's shut down the socket, and do not retry the connection if the thread is detached. Signed-off-by: Vladimir Sementsov-Ogievskiy --- nbd/client-connection.c | 36 ++++++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/nbd/client-connection.c b/nbd/client-connection.c index 002bd91f42..54f73c6c24 100644 --- a/nbd/client-connection.c +++ b/nbd/client-connection.c @@ -158,9 +158,13 @@ static void *connect_thread_func(void *opaque) uint64_t timeout = 1; uint64_t max_timeout = 16; -while (true) { +qemu_mutex_lock(&conn->mutex); +while (!conn->detached) { +assert(!conn->sioc); conn->sioc = qio_channel_socket_new(); +qemu_mutex_unlock(&conn->mutex); + error_free(conn->err); conn->err = NULL; conn->updated_info = conn->initial_info; @@ -171,14 +175,20 @@ conn->updated_info.x_dirty_bitmap = NULL; conn->updated_info.name = NULL; +qemu_mutex_lock(&conn->mutex); + if (ret < 0) { object_unref(OBJECT(conn->sioc)); conn->sioc = NULL; -if (conn->do_retry) { +if (conn->do_retry && !conn->detached) { +qemu_mutex_unlock(&conn->mutex); + sleep(timeout); if (timeout < max_timeout) { timeout *= 2; } + +qemu_mutex_lock(&conn->mutex); continue; } } @@ -186,15 +196,17 @@ break; } -WITH_QEMU_LOCK_GUARD(&conn->mutex) { -assert(conn->running); -conn->running = false; -if (conn->wait_co) { -aio_co_schedule(NULL, conn->wait_co); -conn->wait_co = NULL; -} -do_free = conn->detached; +/* mutex is locked */ + +assert(conn->running); +conn->running = false; +if (conn->wait_co) { +aio_co_schedule(NULL, conn->wait_co); +conn->wait_co = NULL; } +do_free = conn->detached; + +qemu_mutex_unlock(&conn->mutex); if (do_free) { nbd_client_connection_do_free(conn); @@ -215,6 +227,10 @@ void nbd_client_connection_release(NBDClientConnection *conn) if (conn->running) { conn->detached = true; } +if (conn->sioc) { +qio_channel_shutdown(QIO_CHANNEL(conn->sioc), + QIO_CHANNEL_SHUTDOWN_BOTH, NULL); } do_free = !conn->running && !conn->detached; qemu_mutex_unlock(&conn->mutex); -- 2.29.2
[PATCH v3 14/33] nbd: move connection code from block/nbd to nbd/client-connection
We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_unref() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/nbd.h | 11 +++ block/nbd.c | 187 --- nbd/client-connection.c | 212 nbd/meson.build | 1 + 4 files changed, 224 insertions(+), 187 deletions(-) create mode 100644 nbd/client-connection.c diff --git a/include/block/nbd.h b/include/block/nbd.h index 5f34d23bb0..57381be76f 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -406,4 +406,15 @@ const char *nbd_info_lookup(uint16_t info); const char *nbd_cmd_lookup(uint16_t info); const char *nbd_err_lookup(int err); +/* nbd/client-connection.c */ +typedef struct NBDClientConnection NBDClientConnection; + +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr); +void nbd_client_connection_release(NBDClientConnection *conn); + +QIOChannelSocket *coroutine_fn +nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); + +void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn); + #endif diff --git a/block/nbd.c b/block/nbd.c index 8531d019b2..9bd68dcf10 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,24 +66,6 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef struct NBDClientConnection { -/* Initialization constants */ -SocketAddress *saddr; /* address to connect to */ - -/* - * Result of last attempt. Valid in FAIL and SUCCESS states. - * If you want to steal error, don't forget to set pointer to NULL. 
- */ -QIOChannelSocket *sioc; -Error *err; - -QemuMutex mutex; -/* All further fields are protected by mutex */ -bool running; /* thread is running now */ -bool detached; /* thread is detached and should cleanup the state */ -Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ -} NBDClientConnection; - typedef struct BDRVNBDState { QIOChannelSocket *sioc; /* The master data channel */ QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */ @@ -118,12 +100,8 @@ typedef struct BDRVNBDState { NBDClientConnection *conn; } BDRVNBDState; -static void nbd_client_connection_release(NBDClientConnection *conn); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); -static coroutine_fn QIOChannelSocket * -nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); -static void nbd_co_establish_connection_cancel(NBDClientConnection *conn); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -340,171 +318,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } -static NBDClientConnection * -nbd_client_connection_new(const SocketAddress *saddr) -{ -NBDClientConnection *conn = g_new(NBDClientConnection, 1); - -*conn = (NBDClientConnection) { -.saddr = QAPI_CLONE(SocketAddress, saddr), -}; - -qemu_mutex_init(>mutex); - -return conn; -} - -static void nbd_client_connection_do_free(NBDClientConnection *conn) -{ -if (conn->sioc) { -qio_channel_close(QIO_CHANNEL(conn->sioc), NULL); -object_unref(OBJECT(conn->sioc)); -} -error_free(conn->err); -qapi_free_SocketAddress(conn->saddr); -g_free(conn); -} - -static void *connect_thread_func(void *opaque) -{ -NBDClientConnection *conn = opaque; -bool do_free; -int ret; - -conn->sioc = qio_channel_socket_new(); - -error_free(conn->err); -conn->err = NULL; -ret = qio_channel_socket_connect_sync(conn->sioc, conn->saddr, >err); -if 
(ret < 0) { -object_unref(OBJECT(conn->sioc)); -conn->sioc = NULL; -} - -qemu_mutex_lock(>mutex); - -assert(conn->running); -conn->running = false; -if (conn->wait_co) { -aio_co_schedule(NULL, conn->wait_co); -conn->wait_co = NULL; -} -do_free = conn->detached; - -qemu_mutex_unlock(>mutex); - -if (do_free) { -nbd_client_connection_do_free(conn); -} - -return NULL; -} - -static void nbd_client_connection_release(NBDClientConnection *conn) -{ -bool do_free; - -if (!conn) { -return; -} - -qemu_mutex_lock(>mutex); -if (conn->running) { -conn->detached = true; -} -do_free = !conn->running && !conn->detached; -qemu_mutex_unlock(>mutex); - -if (do_free) { -nbd_client_connection_do_free(conn); -} -} - -/* - * Get a new connection in context of @conn: - * if thread is
[PATCH v3 15/33] nbd/client-connection: use QEMU_LOCK_GUARD
Signed-off-by: Vladimir Sementsov-Ogievskiy --- nbd/client-connection.c | 94 ++--- 1 file changed, 42 insertions(+), 52 deletions(-) diff --git a/nbd/client-connection.c b/nbd/client-connection.c index 4e39a5b1af..b45a0bd5f6 100644 --- a/nbd/client-connection.c +++ b/nbd/client-connection.c @@ -87,17 +87,16 @@ static void *connect_thread_func(void *opaque) conn->sioc = NULL; } -qemu_mutex_lock(>mutex); - -assert(conn->running); -conn->running = false; -if (conn->wait_co) { -aio_co_schedule(NULL, conn->wait_co); -conn->wait_co = NULL; +WITH_QEMU_LOCK_GUARD(>mutex) { +assert(conn->running); +conn->running = false; +if (conn->wait_co) { +aio_co_schedule(NULL, conn->wait_co); +conn->wait_co = NULL; +} } do_free = conn->detached; -qemu_mutex_unlock(>mutex); if (do_free) { nbd_client_connection_do_free(conn); @@ -136,61 +135,54 @@ void nbd_client_connection_release(NBDClientConnection *conn) QIOChannelSocket *coroutine_fn nbd_co_establish_connection(NBDClientConnection *conn, Error **errp) { -QIOChannelSocket *sioc = NULL; QemuThread thread; -qemu_mutex_lock(>mutex); - -/* - * Don't call nbd_co_establish_connection() in several coroutines in - * parallel. Only one call at once is supported. - */ -assert(!conn->wait_co); - -if (!conn->running) { -if (conn->sioc) { -/* Previous attempt finally succeeded in background */ -sioc = g_steal_pointer(>sioc); -qemu_mutex_unlock(>mutex); - -return sioc; +WITH_QEMU_LOCK_GUARD(>mutex) { +/* + * Don't call nbd_co_establish_connection() in several coroutines in + * parallel. Only one call at once is supported. 
+ */ +assert(!conn->wait_co); + +if (!conn->running) { +if (conn->sioc) { +/* Previous attempt finally succeeded in background */ +return g_steal_pointer(>sioc); +} + +conn->running = true; +error_free(conn->err); +conn->err = NULL; +qemu_thread_create(, "nbd-connect", + connect_thread_func, conn, QEMU_THREAD_DETACHED); } -conn->running = true; -error_free(conn->err); -conn->err = NULL; -qemu_thread_create(, "nbd-connect", - connect_thread_func, conn, QEMU_THREAD_DETACHED); +conn->wait_co = qemu_coroutine_self(); } -conn->wait_co = qemu_coroutine_self(); - -qemu_mutex_unlock(>mutex); - /* * We are going to wait for connect-thread finish, but * nbd_co_establish_connection_cancel() can interrupt. */ qemu_coroutine_yield(); -qemu_mutex_lock(>mutex); - -if (conn->running) { -/* - * Obviously, drained section wants to start. Report the attempt as - * failed. Still connect thread is executing in background, and its - * result may be used for next connection attempt. - */ -error_setg(errp, "Connection attempt cancelled by other operation"); -} else { -error_propagate(errp, conn->err); -conn->err = NULL; -sioc = g_steal_pointer(>sioc); +WITH_QEMU_LOCK_GUARD(>mutex) { +if (conn->running) { +/* + * Obviously, drained section wants to start. Report the attempt as + * failed. Still connect thread is executing in background, and its + * result may be used for next connection attempt. 
+ */ +error_setg(errp, "Connection attempt cancelled by other operation"); +return NULL; +} else { +error_propagate(errp, conn->err); +conn->err = NULL; +return g_steal_pointer(>sioc); +} } -qemu_mutex_unlock(>mutex); - -return sioc; +abort(); /* unreachable */ } /* @@ -201,12 +193,10 @@ nbd_co_establish_connection(NBDClientConnection *conn, Error **errp) */ void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn) { -qemu_mutex_lock(>mutex); +QEMU_LOCK_GUARD(>mutex); if (conn->wait_co) { aio_co_schedule(NULL, conn->wait_co); conn->wait_co = NULL; } - -qemu_mutex_unlock(>mutex); } -- 2.29.2
[PATCH v3 23/33] block/nbd: nbd_teardown_connection() don't touch s->sioc
Negotiation during reconnect is now done in a thread, and s->sioc is not available during negotiation. Negotiation in the thread will be cancelled by nbd_client_connection_release(), called from nbd_clear_bdrvstate(). So, we don't need this code chunk anymore. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index ec69a4ad65..cece53313c 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -284,10 +284,6 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->ioc) { /* finish any pending coroutines */ qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); -} else if (s->sioc) { -/* abort negotiation */ -qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, - NULL); } s->state = NBD_CLIENT_QUIT; -- 2.29.2
[PATCH v3 12/33] block/nbd: introduce nbd_client_connection_new()
This is the last step in creating a bs-independent nbd connection interface. With the next commit we can finally move it to a separate file. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Roman Kagan --- block/nbd.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 9ce6a323eb..21a4039359 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -351,15 +351,18 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT; } -static void nbd_init_connect_thread(BDRVNBDState *s) +static NBDClientConnection * +nbd_client_connection_new(const SocketAddress *saddr) { -s->conn = g_new(NBDClientConnection, 1); +NBDClientConnection *conn = g_new(NBDClientConnection, 1); -*s->conn = (NBDClientConnection) { -.saddr = QAPI_CLONE(SocketAddress, s->saddr), +*conn = (NBDClientConnection) { +.saddr = QAPI_CLONE(SocketAddress, saddr), }; -qemu_mutex_init(&s->conn->mutex); +qemu_mutex_init(&conn->mutex); + +return conn; } static void nbd_free_connect_thread(NBDClientConnection *conn) @@ -2208,7 +2211,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } -nbd_init_connect_thread(s); +s->conn = nbd_client_connection_new(s->saddr); /* * establish TCP connection, return error if it fails -- 2.29.2
[PATCH v3 10/33] block/nbd: make nbd_co_establish_connection_cancel() bs-independent
nbd_co_establish_connection_cancel() actually needs only a pointer to NBDConnectThread, so make the interface clean. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Roman Kagan --- block/nbd.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index dd97ea0916..dab73bdf3b 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -123,7 +123,7 @@ static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static coroutine_fn QIOChannelSocket * nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); -static void nbd_co_establish_connection_cancel(BlockDriverState *bs); +static void nbd_co_establish_connection_cancel(NBDConnectThread *thr); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -286,7 +286,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs); +nbd_co_establish_connection_cancel(s->connect_thread); reconnect_delay_timer_del(s); @@ -326,7 +326,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs); +nbd_co_establish_connection_cancel(s->connect_thread); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -477,14 +477,12 @@ nbd_co_establish_connection(NBDConnectThread *thr, Error **errp) /* * nbd_co_establish_connection_cancel - * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to - * allow drained section to begin. + * Cancel nbd_co_establish_connection() asynchronously. Note that it neither + * stops the thread itself nor closes the socket. It just safely wakes + * nbd_co_establish_connection() sleeping in the yield(). */ -static void nbd_co_establish_connection_cancel(BlockDriverState *bs) +static void nbd_co_establish_connection_cancel(NBDConnectThread *thr) { -BDRVNBDState *s = bs->opaque; -NBDConnectThread *thr = s->connect_thread; - qemu_mutex_lock(&thr->mutex); if (thr->wait_co) { -- 2.29.2
[PATCH v3 09/33] block/nbd: bs-independent interface for nbd_co_establish_connection()
We are going to split connection code to separate file. Now we are ready to give nbd_co_establish_connection() clean and bs-independent interface. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Roman Kagan --- block/nbd.c | 49 +++-- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 2b26a033a4..dd97ea0916 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -121,7 +121,8 @@ typedef struct BDRVNBDState { static void nbd_free_connect_thread(NBDConnectThread *thr); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); -static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); +static coroutine_fn QIOChannelSocket * +nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); static void nbd_co_establish_connection_cancel(BlockDriverState *bs); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -407,22 +408,36 @@ static void *connect_thread_func(void *opaque) return NULL; } -static int coroutine_fn -nbd_co_establish_connection(BlockDriverState *bs, Error **errp) +/* + * Get a new connection in context of @thr: + * if thread is running, wait for completion + * if thread is already succeeded in background, and user didn't get the + * result, just return it now + * otherwise if thread is not running, start a thread and wait for completion + */ +static coroutine_fn QIOChannelSocket * +nbd_co_establish_connection(NBDConnectThread *thr, Error **errp) { +QIOChannelSocket *sioc = NULL; QemuThread thread; -BDRVNBDState *s = bs->opaque; -NBDConnectThread *thr = s->connect_thread; - -assert(!s->sioc); qemu_mutex_lock(>mutex); +/* + * Don't call nbd_co_establish_connection() in several coroutines in + * parallel. Only one call at once is supported. 
+ */ +assert(!thr->wait_co); + if (!thr->running) { if (thr->sioc) { /* Previous attempt finally succeeded in background */ -goto out; +sioc = g_steal_pointer(>sioc); +qemu_mutex_unlock(>mutex); + +return sioc; } + thr->running = true; error_free(thr->err); thr->err = NULL; @@ -436,13 +451,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) /* * We are going to wait for connect-thread finish, but - * nbd_client_co_drain_begin() can interrupt. + * nbd_co_establish_connection_cancel() can interrupt. */ qemu_coroutine_yield(); qemu_mutex_lock(>mutex); -out: if (thr->running) { /* * Obviously, drained section wants to start. Report the attempt as @@ -453,17 +467,12 @@ out: } else { error_propagate(errp, thr->err); thr->err = NULL; -s->sioc = thr->sioc; -thr->sioc = NULL; -if (s->sioc) { -yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -} +sioc = g_steal_pointer(>sioc); } qemu_mutex_unlock(>mutex); -return s->sioc ? 0 : -1; +return sioc; } /* @@ -530,11 +539,15 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -if (nbd_co_establish_connection(s->bs, NULL) < 0) { +s->sioc = nbd_co_establish_connection(s->connect_thread, NULL); +if (!s->sioc) { ret = -ECONNREFUSED; goto out; } +yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, + s->bs); + bdrv_dec_in_flight(s->bs); ret = nbd_client_handshake(s->bs, NULL); -- 2.29.2
[PATCH v3 08/33] block/nbd: drop thr->state
We don't need all these states. The code refactored to use two boolean variables looks simpler. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 125 ++-- 1 file changed, 34 insertions(+), 91 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index e1f39eda6c..2b26a033a4 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,24 +66,6 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef enum NBDConnectThreadState { -/* No thread, no pending results */ -CONNECT_THREAD_NONE, - -/* Thread is running, no results for now */ -CONNECT_THREAD_RUNNING, - -/* - * Thread is running, but requestor exited. Thread should close - * the new socket and free the connect state on exit. - */ -CONNECT_THREAD_RUNNING_DETACHED, - -/* Thread finished, results are stored in a state */ -CONNECT_THREAD_FAIL, -CONNECT_THREAD_SUCCESS -} NBDConnectThreadState; - typedef struct NBDConnectThread { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ @@ -97,7 +79,8 @@ typedef struct NBDConnectThread { QemuMutex mutex; /* All further fields are protected by mutex */ -NBDConnectThreadState state; /* current state of the thread */ +bool running; /* thread is running now */ +bool detached; /* thread is detached and should cleanup the state */ Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ } NBDConnectThread; @@ -147,17 +130,17 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs) { BDRVNBDState *s = (BDRVNBDState *)bs->opaque; NBDConnectThread *thr = s->connect_thread; -bool thr_running; +bool do_free; qemu_mutex_lock(&thr->mutex); -thr_running = thr->state == CONNECT_THREAD_RUNNING; -if (thr_running) { -thr->state = CONNECT_THREAD_RUNNING_DETACHED; +if (thr->running) { +thr->detached = true; } +do_free = !thr->running && !thr->detached; qemu_mutex_unlock(&thr->mutex); /* the runaway thread will clean it up itself */ -if (!thr_running) { +if (do_free) { nbd_free_connect_thread(thr); } @@ -373,7 +356,6 @@ static void
nbd_init_connect_thread(BDRVNBDState *s) *s->connect_thread = (NBDConnectThread) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), -.state = CONNECT_THREAD_NONE, }; qemu_mutex_init(&s->connect_thread->mutex); @@ -408,20 +390,13 @@ static void *connect_thread_func(void *opaque) qemu_mutex_lock(&thr->mutex); -switch (thr->state) { -case CONNECT_THREAD_RUNNING: -thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS; -if (thr->wait_co) { -aio_co_schedule(NULL, thr->wait_co); -thr->wait_co = NULL; -} -break; -case CONNECT_THREAD_RUNNING_DETACHED: -do_free = true; -break; -default: -abort(); +assert(thr->running); +thr->running = false; +if (thr->wait_co) { +aio_co_schedule(NULL, thr->wait_co); +thr->wait_co = NULL; } +do_free = thr->detached; qemu_mutex_unlock(&thr->mutex); @@ -435,36 +410,24 @@ static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Error **errp) { -int ret; QemuThread thread; BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; +assert(!s->sioc); + qemu_mutex_lock(&thr->mutex); -switch (thr->state) { -case CONNECT_THREAD_FAIL: -case CONNECT_THREAD_NONE: +if (!thr->running) { +if (thr->sioc) { +/* Previous attempt finally succeeded in background */ +goto out; +} +thr->running = true; error_free(thr->err); thr->err = NULL; -thr->state = CONNECT_THREAD_RUNNING; qemu_thread_create(&thread, "nbd-connect", connect_thread_func, thr, QEMU_THREAD_DETACHED); -break; -case CONNECT_THREAD_SUCCESS: -/* Previous attempt finally succeeded in background */ -thr->state = CONNECT_THREAD_NONE; -s->sioc = thr->sioc; -thr->sioc = NULL; -yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -qemu_mutex_unlock(&thr->mutex); -return 0; -case CONNECT_THREAD_RUNNING: -/* Already running, will wait */ -break; -default: -abort(); } thr->wait_co = qemu_coroutine_self(); @@ -479,10 +442,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) qemu_mutex_lock(&thr->mutex); -switch
(thr->state) { -case CONNECT_THREAD_SUCCESS: -case CONNECT_THREAD_FAIL: -thr->state = CONNECT_THREAD_NONE; +out: +if (thr->running) { +/* + * Obviously, drained section wants to start. Report the attempt as + *
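The running/detached pair introduced by this patch is a compact ownership-handoff protocol: both flags live under the mutex, and whichever side (owner or worker thread) lets go last observes that it must free the shared state. A minimal pthread-based sketch of the same idea follows; it is an analogy with invented names (ConnState, owner_release, run_once), not the QEMU code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Miniature of the running/detached handoff: the state is freed by
 * whichever side -- owner or worker -- observes that the other has
 * already let go. Both flags are only touched under the mutex. */
typedef struct ConnState {
    pthread_mutex_t mutex;
    bool running;   /* worker thread has not finished yet */
    bool detached;  /* owner went away; worker must free the state */
    int result;
} ConnState;

static void conn_state_free(ConnState *st)
{
    pthread_mutex_destroy(&st->mutex);
    free(st);
}

static void *worker(void *opaque)
{
    ConnState *st = opaque;
    bool do_free;

    pthread_mutex_lock(&st->mutex);
    st->result = 42;           /* pretend the connection succeeded */
    st->running = false;
    do_free = st->detached;    /* owner already left: clean up ourselves */
    pthread_mutex_unlock(&st->mutex);

    if (do_free) {
        conn_state_free(st);
    }
    return NULL;
}

/* Owner side: detach if the worker is still running, free otherwise. */
static void owner_release(ConnState *st)
{
    bool do_free;

    pthread_mutex_lock(&st->mutex);
    if (st->running) {
        st->detached = true;
    }
    do_free = !st->running && !st->detached;
    pthread_mutex_unlock(&st->mutex);

    if (do_free) {
        conn_state_free(st);
    }
}

static int run_once(void)
{
    ConnState *st = malloc(sizeof(*st));
    pthread_t t;
    int r;

    pthread_mutex_init(&st->mutex, NULL);
    st->running = true;
    st->detached = false;
    st->result = 0;

    pthread_create(&t, NULL, worker, st);
    pthread_join(t, NULL);     /* worker done: running is false now */

    pthread_mutex_lock(&st->mutex);
    r = st->result;
    pthread_mutex_unlock(&st->mutex);

    owner_release(st);         /* frees, since the worker already finished */
    return r;
}
```

The key invariant, matching the patch, is that exactly one side ever sees do_free == true, so there is neither a leak nor a double free.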
[PATCH v3 06/33] util/async: aio_co_schedule(): support reschedule in same ctx
With the following patch we want to wake a coroutine from a thread, and that doesn't work with aio_co_wake: Assume we have no iothreads. Assume we have a coroutine A, which waits in a yield point for an external aio_co_wake(), and no progress can be made until it happens. The main thread is in a blocking aio_poll() (for example, in blk_read()). Now, in a separate thread we do aio_co_wake(). It calls aio_co_enter(), which goes through the last "else" branch and does aio_context_acquire(ctx). Now we have a deadlock, as aio_poll() will not release the context lock until some progress is made, and progress can't be made until aio_co_wake() wakes coroutine A. And it can't, because it waits for aio_context_acquire(). Still, aio_co_schedule() works well in parallel with a blocking aio_poll(). So we want to use it. Let's add the possibility of rescheduling a coroutine in the same ctx where it yielded. Fetch co->ctx the same way as it is done in aio_co_wake(). Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/block/aio.h | 2 +- util/async.c | 8 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/include/block/aio.h b/include/block/aio.h index 5f342267d5..744b695525 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -643,7 +643,7 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external) /** * aio_co_schedule: - * @ctx: the aio context + * @ctx: the aio context, if NULL, the current ctx of @co will be used. * @co: the coroutine * * Start a coroutine on a remote AioContext. diff --git a/util/async.c b/util/async.c index 674dbefb7c..750be555c6 100644 --- a/util/async.c +++ b/util/async.c @@ -545,6 +545,14 @@ fail: void aio_co_schedule(AioContext *ctx, Coroutine *co) { +if (!ctx) { +/* + * Read coroutine before co->ctx. Matches smp_wmb in + * qemu_coroutine_enter. + */ +smp_read_barrier_depends(); +ctx = qatomic_read(&co->ctx); +} trace_aio_co_schedule(ctx, co); const char *scheduled = qatomic_cmpxchg(&co->scheduled, NULL, __func__); -- 2.29.2
[PATCH v3 07/33] block/nbd: simplify waking of nbd_co_establish_connection()
Instead of the connect_bh, bh_ctx and wait_connect fields we can live with only one link to the waiting coroutine, protected by the mutex. So the new logic is: nbd_co_establish_connection() sets wait_co under the mutex, releases the mutex and yields. Note that wait_co may be scheduled by the thread immediately after unlocking the mutex. Still, in the main thread (or iothread) we will not reach the code that enters the coroutine until the yield(), so we are safe. Both connect_thread_func() and nbd_co_establish_connection_cancel() do the following to handle wait_co: under the mutex, if thr->wait_co is not NULL, call aio_co_wake() (which never tries to acquire the aio context since the previous commit, so we are safe to do it under thr->mutex) and set thr->wait_co to NULL. This way we protect ourselves from scheduling it twice. Also, this commit makes nbd_co_establish_connection() less tied to bs (we keep a generic pointer to the coroutine instead of using s->connection_co directly). So, we are on the way to splitting the connection API out of nbd.c (which is overcomplicated now). Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 49 + 1 file changed, 9 insertions(+), 40 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index d67556c7ee..e1f39eda6c 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -87,12 +87,6 @@ typedef enum NBDConnectThreadState { typedef struct NBDConnectThread { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ -/* - * Bottom half to schedule on completion. Scheduled only if bh_ctx is not - * NULL - */ -QEMUBHFunc *bh_func; -void *bh_opaque; /* * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -101,10 +95,10 @@ typedef struct NBDConnectThread { QIOChannelSocket *sioc; Error *err; -/* state and bh_ctx are protected by mutex */ QemuMutex mutex; +/* All further fields are protected by mutex */ NBDConnectThreadState state; /* current state of the thread */ -AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */ +Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ } NBDConnectThread; typedef struct BDRVNBDState { @@ -138,7 +132,6 @@ typedef struct BDRVNBDState { char *x_dirty_bitmap; bool alloc_depth; -bool wait_connect; NBDConnectThread *connect_thread; } BDRVNBDState; @@ -374,15 +367,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT; } -static void connect_bh(void *opaque) -{ -BDRVNBDState *state = opaque; - -assert(state->wait_connect); -state->wait_connect = false; -aio_co_wake(state->connection_co); -} - static void nbd_init_connect_thread(BDRVNBDState *s) { s->connect_thread = g_new(NBDConnectThread, 1); @@ -390,8 +374,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s) *s->connect_thread = (NBDConnectThread) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), .state = CONNECT_THREAD_NONE, -.bh_func = connect_bh, -.bh_opaque = s, }; qemu_mutex_init(&s->connect_thread->mutex); @@ -429,11 +411,9 @@ static void *connect_thread_func(void *opaque) switch (thr->state) { case CONNECT_THREAD_RUNNING: thr->state = ret < 0 ?
CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS; -if (thr->bh_ctx) { -aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque); - -/* play safe, don't reuse bh_ctx on further connection attempts */ -thr->bh_ctx = NULL; +if (thr->wait_co) { +aio_co_schedule(NULL, thr->wait_co); +thr->wait_co = NULL; } break; case CONNECT_THREAD_RUNNING_DETACHED: @@ -487,20 +467,14 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) abort(); } -thr->bh_ctx = qemu_get_current_aio_context(); +thr->wait_co = qemu_coroutine_self(); qemu_mutex_unlock(&thr->mutex); - /* * We are going to wait for connect-thread finish, but * nbd_client_co_drain_begin() can interrupt. - * - * Note that wait_connect variable is not visible for connect-thread. It - * doesn't need mutex protection, it used only inside home aio context of - * bs. */ -s->wait_connect = true; qemu_coroutine_yield(); qemu_mutex_lock(&thr->mutex); @@ -555,24 +529,19 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs) { BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; -bool wake = false; qemu_mutex_lock(&thr->mutex); if (thr->state == CONNECT_THREAD_RUNNING) { /* We can cancel only in running state, when bh is not yet scheduled */ -thr->bh_ctx = NULL; -if (s->wait_connect) { -s->wait_connect = false; -wake = true; +if (thr->wait_co) { +
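The wait_co handling in this patch boils down to a "single waiter slot" protocol: both the connect thread and the cancel path may try to deliver the wakeup, but only the caller that sees a non-NULL slot under the mutex actually delivers it, and it clears the slot so the waiter can never be woken twice. A rough pthread sketch of that protocol follows; it is an analogy with invented names (Waiter, waiter_wake), not QEMU's coroutine machinery.

```c
#include <pthread.h>
#include <stddef.h>

/* A single waiter slot protected by a mutex; whoever wakes the waiter
 * clears the slot, so the wakeup is delivered at most once even if
 * several parties (worker, cancel path) race to deliver it. */
typedef struct Waiter {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int *slot;          /* non-NULL while someone is waiting */
    int delivered;      /* how many wakeups were actually delivered */
} Waiter;

void waiter_init(Waiter *w)
{
    pthread_mutex_init(&w->mutex, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->slot = NULL;
    w->delivered = 0;
}

/* Called by the worker *and* by the cancel path: only the first caller
 * that sees a non-NULL slot delivers the wakeup. */
void waiter_wake(Waiter *w)
{
    pthread_mutex_lock(&w->mutex);
    if (w->slot) {
        *w->slot = 1;
        w->slot = NULL;     /* protect against waking twice */
        w->delivered++;
        pthread_cond_signal(&w->cond);
    }
    pthread_mutex_unlock(&w->mutex);
}

/* Waiter side: publish the slot under the mutex, then sleep. */
void waiter_wait(Waiter *w)
{
    int woken = 0;

    pthread_mutex_lock(&w->mutex);
    w->slot = &woken;
    while (!woken) {
        pthread_cond_wait(&w->cond, &w->mutex);
    }
    pthread_mutex_unlock(&w->mutex);
}
```

As in the patch, a wakeup with no registered waiter is a harmless no-op, and a second wakeup after the first finds the slot already cleared.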
[PATCH v3 02/33] block/nbd: fix how state is cleared on nbd_open() failure paths
We have two "return error" paths in nbd_open() after nbd_process_options(), and we actually need to call nbd_clear_bdrvstate() on these paths. Interestingly, nbd_process_options() calls nbd_clear_bdrvstate() by itself. Let's fix the leaks and refactor things to be more obvious: - initialize yank at the top of nbd_open() - move yank cleanup to nbd_clear_bdrvstate() - refactor nbd_open() so that all failure paths except for yank-register go through nbd_clear_bdrvstate() Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 739ae2941f..a407a3814b 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -152,8 +152,12 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs, static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); -static void nbd_clear_bdrvstate(BDRVNBDState *s) +static void nbd_clear_bdrvstate(BlockDriverState *bs) { +BDRVNBDState *s = (BDRVNBDState *)bs->opaque; + +yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); + object_unref(OBJECT(s->tlscreds)); qapi_free_SocketAddress(s->saddr); s->saddr = NULL; @@ -2279,9 +2283,6 @@ static int nbd_process_options(BlockDriverState *bs, QDict *options, ret = 0; error: -if (ret < 0) { -nbd_clear_bdrvstate(s); -} qemu_opts_del(opts); return ret; } @@ -2292,11 +2293,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, int ret; BDRVNBDState *s = (BDRVNBDState *)bs->opaque; -ret = nbd_process_options(bs, options, errp); -if (ret < 0) { -return ret; -} - s->bs = bs; qemu_co_mutex_init(&s->send_mutex); qemu_co_queue_init(&s->free_sema); @@ -2305,20 +2301,23 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, return -EEXIST; } +ret = nbd_process_options(bs, options, errp); +if (ret < 0) { +goto fail; +} + /* * establish TCP connection, return error if it fails * TODO: Configurable retry-until-timeout behaviour.
*/ if (nbd_establish_connection(bs, s->saddr, errp) < 0) { -yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); -return -ECONNREFUSED; +ret = -ECONNREFUSED; +goto fail; } ret = nbd_client_handshake(bs, errp); if (ret < 0) { -yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); -nbd_clear_bdrvstate(s); -return ret; +goto fail; } /* successfully connected */ s->state = NBD_CLIENT_CONNECTED; @@ -2330,6 +2329,10 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co); return 0; + +fail: +nbd_clear_bdrvstate(bs); +return ret; } static int nbd_co_flush(BlockDriverState *bs) @@ -2373,11 +2376,8 @@ static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) static void nbd_close(BlockDriverState *bs) { -BDRVNBDState *s = bs->opaque; - nbd_client_close(bs); -yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); -nbd_clear_bdrvstate(s); +nbd_clear_bdrvstate(bs); } /* -- 2.29.2
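The refactoring in this patch is the classic "goto fail" idiom: once the first resource is acquired, every error path funnels through a single label that runs the common cleanup, instead of repeating per-error teardown. A small self-contained illustration of the pattern follows; all names (Dev, dev_open, dev_clear) are made up for the example, not QEMU APIs.

```c
#include <stdlib.h>

/* Two resources acquired in order; any failure after the first
 * acquisition jumps to the shared "fail:" cleanup. */
typedef struct Dev {
    int *buf_a;
    int *buf_b;
} Dev;

/* Idempotent cleanup, safe to call with partially acquired state. */
static void dev_clear(Dev *d)
{
    free(d->buf_a);
    free(d->buf_b);
    d->buf_a = NULL;
    d->buf_b = NULL;
}

/* fail_at lets the example simulate a failure at step 1 or 2. */
static int dev_open(Dev *d, int fail_at)
{
    int ret;

    d->buf_a = calloc(16, sizeof(int));
    if (fail_at == 1) {
        ret = -1;
        goto fail;
    }

    d->buf_b = calloc(16, sizeof(int));
    if (fail_at == 2) {
        ret = -2;
        goto fail;
    }

    return 0;

fail:
    dev_clear(d);   /* one cleanup path instead of per-error free()s */
    return ret;
}
```

The benefit, as in the nbd_open() change above, is that adding a new failure point only requires setting ret and jumping to fail, with no risk of forgetting one of the cleanup steps.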
[PATCH v3 04/33] block/nbd: nbd_client_handshake(): fix leak of s->ioc
Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block/nbd.c b/block/nbd.c index 272af60b44..6efa11a185 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -1891,6 +1891,8 @@ static int nbd_client_handshake(BlockDriverState *bs, Error **errp) nbd_yank, bs); object_unref(OBJECT(s->sioc)); s->sioc = NULL; +object_unref(OBJECT(s->ioc)); +s->ioc = NULL; return ret; } -- 2.29.2
[PATCH v3 03/33] block/nbd: ensure ->connection_thread is always valid
From: Roman Kagan Simplify lifetime management of BDRVNBDState->connect_thread by delaying the possible cleanup of it until the BDRVNBDState itself goes away. This also reverts 0267101af6 "block/nbd: fix possible use after free of s->connect_thread" as now s->connect_thread can't be cleared until the very end. Signed-off-by: Roman Kagan [vsementsov: rebase, revert 0267101af6 changes] Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 56 - 1 file changed, 21 insertions(+), 35 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index a407a3814b..272af60b44 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -144,17 +144,31 @@ typedef struct BDRVNBDState { NBDConnectThread *connect_thread; } BDRVNBDState; +static void nbd_free_connect_thread(NBDConnectThread *thr); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); -static void nbd_co_establish_connection_cancel(BlockDriverState *bs, - bool detach); +static void nbd_co_establish_connection_cancel(BlockDriverState *bs); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BlockDriverState *bs) { BDRVNBDState *s = (BDRVNBDState *)bs->opaque; +NBDConnectThread *thr = s->connect_thread; +bool thr_running; + +qemu_mutex_lock(&thr->mutex); +thr_running = thr->state == CONNECT_THREAD_RUNNING; +if (thr_running) { +thr->state = CONNECT_THREAD_RUNNING_DETACHED; +} +qemu_mutex_unlock(&thr->mutex); + +/* the runaway thread will clean it up itself */ +if (!thr_running) { +nbd_free_connect_thread(thr); +} yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name)); @@ -297,7 +311,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs, false); +nbd_co_establish_connection_cancel(bs); reconnect_delay_timer_del(s); @@
-337,7 +351,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs, true); +nbd_co_establish_connection_cancel(bs); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -448,11 +462,6 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; -if (!thr) { -/* detached */ -return -1; -} - qemu_mutex_lock(&thr->mutex); switch (thr->state) { @@ -496,12 +505,6 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) s->wait_connect = true; qemu_coroutine_yield(); -if (!s->connect_thread) { -/* detached */ -return -1; -} -assert(thr == s->connect_thread); - qemu_mutex_lock(&thr->mutex); switch (thr->state) { @@ -549,18 +552,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) * nbd_co_establish_connection_cancel * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to * allow drained section to begin. - * - * If detach is true, also cleanup the state (or if thread is running, move it - * to CONNECT_THREAD_RUNNING_DETACHED state). s->connect_thread becomes NULL if - * detach is true.
*/ -static void nbd_co_establish_connection_cancel(BlockDriverState *bs, - bool detach) +static void nbd_co_establish_connection_cancel(BlockDriverState *bs) { BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; bool wake = false; -bool do_free = false; qemu_mutex_lock(&thr->mutex); @@ -571,21 +568,10 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs, s->wait_connect = false; wake = true; } -if (detach) { -thr->state = CONNECT_THREAD_RUNNING_DETACHED; -s->connect_thread = NULL; -} -} else if (detach) { -do_free = true; } qemu_mutex_unlock(&thr->mutex); -if (do_free) { -nbd_free_connect_thread(thr); -s->connect_thread = NULL; -} - if (wake) { aio_co_wake(s->connection_co); } @@ -2306,6 +2292,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags, goto fail; } +nbd_init_connect_thread(s); + /* * establish TCP connection, return error if it fails * TODO: Configurable retry-until-timeout behaviour. @@ -2322,8 +2310,6 @@ static int nbd_open(BlockDriverState *bs, QDict
[PATCH v3 05/33] block/nbd: BDRVNBDState: drop unused connect_err and connect_status
These fields are write-only. Drop them. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Roman Kagan --- block/nbd.c | 12 ++-- 1 file changed, 2 insertions(+), 10 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 6efa11a185..d67556c7ee 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -121,8 +121,6 @@ typedef struct BDRVNBDState { bool wait_drained_end; int in_flight; NBDClientState state; -int connect_status; -Error *connect_err; bool wait_in_flight; QEMUTimer *reconnect_delay_timer; @@ -580,7 +578,6 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs) static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) { int ret; -Error *local_err = NULL; if (!nbd_client_connecting(s)) { return; @@ -621,14 +618,14 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -if (nbd_co_establish_connection(s->bs, &local_err) < 0) { +if (nbd_co_establish_connection(s->bs, NULL) < 0) { ret = -ECONNREFUSED; goto out; } bdrv_dec_in_flight(s->bs); -ret = nbd_client_handshake(s->bs, &local_err); +ret = nbd_client_handshake(s->bs, NULL); if (s->drained) { s->wait_drained_end = true; @@ -643,11 +640,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) bdrv_inc_in_flight(s->bs); out: -s->connect_status = ret; -error_free(s->connect_err); -s->connect_err = NULL; -error_propagate(&s->connect_err, local_err); - if (ret >= 0) { /* successfully connected */ s->state = NBD_CLIENT_CONNECTED; -- 2.29.2
[PATCH v3 01/33] block/nbd: fix channel object leak
From: Roman Kagan nbd_free_connect_thread leaks the channel object if it hasn't been stolen. Unref it and fix the leak. Signed-off-by: Roman Kagan --- block/nbd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/nbd.c b/block/nbd.c index 1d4668d42d..739ae2941f 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -385,6 +385,7 @@ static void nbd_free_connect_thread(NBDConnectThread *thr) { if (thr->sioc) { qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); +object_unref(OBJECT(thr->sioc)); } error_free(thr->err); qapi_free_SocketAddress(thr->saddr); -- 2.29.2
[PATCH v3 00/33] block/nbd: rework client connection
The series substitutes "[PATCH v2 00/10] block/nbd: move connection code to separate file" Supersedes: <20210408140827.332915-1-vsement...@virtuozzo.com> so it's called v3. block/nbd.c is overcomplicated. This series is a big refactoring, which finally drops all the complications around drained sections and context switching, including the abuse of the bs->in_flight counter. Also, at the end of the series we don't cancel reconnect on drained sections (and don't cancel requests waiting for reconnect on drained section begin), which fixes a problem reported by Roman. The series is also available at tag up-nbd-client-connection-v3 in git https://src.openvz.org/scm/~vsementsov/qemu.git v3: Changes in the first part of the series (the main thing is not using refcnt, but instead (modified) Roman's patch): 01-04: new 05: add Roman's r-b 06: new 07: now, new aio_co_schedule(NULL, thr->wait_co) is used 08: reworked, we now need also bool detached, as we don't have refcnt 09,10: add Roman's r-b 11: rebased, don't modify nbd_free_connect_thread() name at this point 12: add Roman's r-b 13: new 14: rebased Other patches are new. 
Roman Kagan (2): block/nbd: fix channel object leak block/nbd: ensure ->connection_thread is always valid Vladimir Sementsov-Ogievskiy (31): block/nbd: fix how state is cleared on nbd_open() failure paths block/nbd: nbd_client_handshake(): fix leak of s->ioc block/nbd: BDRVNBDState: drop unused connect_err and connect_status util/async: aio_co_schedule(): support reschedule in same ctx block/nbd: simplify waking of nbd_co_establish_connection() block/nbd: drop thr->state block/nbd: bs-independent interface for nbd_co_establish_connection() block/nbd: make nbd_co_establish_connection_cancel() bs-independent block/nbd: rename NBDConnectThread to NBDClientConnection block/nbd: introduce nbd_client_connection_new() block/nbd: introduce nbd_client_connection_release() nbd: move connection code from block/nbd to nbd/client-connection nbd/client-connection: use QEMU_LOCK_GUARD nbd/client-connection: add possibility of negotiation nbd/client-connection: implement connection retry nbd/client-connection: shutdown connection on release block/nbd: split nbd_handle_updated_info out of nbd_client_handshake() block/nbd: use negotiation of NBDClientConnection qemu-socket: pass monitor link to socket_get_fd directly block/nbd: pass monitor directly to connection thread block/nbd: nbd_teardown_connection() don't touch s->sioc block/nbd: drop BDRVNBDState::sioc nbd/client-connection: return only one io channel block-coroutine-wrapper: allow non bdrv_ prefix block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt nbd/client-connection: do qio_channel_set_delay(false) nbd/client-connection: add option for non-blocking connection attempt block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() block/nbd: add nbd_clinent_connected() helper block/nbd: safer transition to receiving request block/nbd: drop connection_co block/coroutines.h | 6 + include/block/aio.h| 2 +- include/block/nbd.h| 19 + include/io/channel-socket.h| 20 + include/qemu/sockets.h | 2 +- 
block/nbd.c| 908 +++-- io/channel-socket.c| 17 +- nbd/client-connection.c| 364 nbd/client.c | 2 - tests/unit/test-util-sockets.c | 16 +- util/async.c | 8 + util/qemu-sockets.c| 10 +- nbd/meson.build| 1 + scripts/block-coroutine-wrapper.py | 7 +- 14 files changed, 666 insertions(+), 716 deletions(-) create mode 100644 nbd/client-connection.c -- 2.29.2
Re: about mirror cancel
16.04.2021 10:11, Max Reitz wrote: On 16.04.21 09:05, Max Reitz wrote: On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote: [...] Note, that if cancelling all in-flight requests on target is wrong on mirror cancel, we still don't have real bug, as the only implementation of .bdrv_cancel_in_flight is stopping reconnect waiting in nbd driver. So, we'll cancel requests only if connection is already lost anyway. But that probably means, that correct name of the handler would be .bdrv_cancel_in_fligth_requests_that_will_most_probably_fail_anyway().. It’s documentation states that it should cancel all in-flight requests, so it’s more likely it just isn’t implemented where it could be. Oh, I now see you added it in the same series. Well, then I suppose you’re free to change the semantics as you see fit. But be aware that even cancelling those requests means that you abandon the target. So it must then fail instead of emitting the COMPLETED event (AFAIR the mirror job emits COMPLETED when cancelled in READY with force=false). Note that BLOCK_JOB_COMPLETED can indicate failure, it has an error field.. Interesting, is libvirt aware of it.. If the user wants the mirror job to create a consistent copy and so cancels it after READY (with force=false), I don’t know whether cancelling those hanging requests is what we want. If the cancel hangs and the user sees this, they are still free to decide to cancel again with force=true, no? Right. All right, I'll remake the feature to cancel requests only with force=true. Thanks for the explanations, and great that I decided to ask. -- Best regards, Vladimir
[PATCH 1/2] hw/block/nvme: consider metadata read aio return value in compare
Currently, in the Compare command, the return value of the metadata aio read (blk_aio_preadv) is ignored; consider it and complete the block accounting. Signed-off-by: Gollu Appalanaidu --- hw/block/nvme.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 624a1431d0..c2727540f1 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -2369,10 +2369,19 @@ static void nvme_compare_mdata_cb(void *opaque, int ret) uint32_t reftag = le32_to_cpu(rw->reftag); struct nvme_compare_ctx *ctx = req->opaque; g_autofree uint8_t *buf = NULL; +BlockBackend *blk = ns->blkconf.blk; +BlockAcctCookie *acct = &req->acct; +BlockAcctStats *stats = blk_get_stats(blk); uint16_t status = NVME_SUCCESS; trace_pci_nvme_compare_mdata_cb(nvme_cid(req)); +if (ret) { +block_acct_failed(stats, acct); +nvme_aio_err(req, ret); +goto out; +} + buf = g_malloc(ctx->mdata.iov.size); status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size, @@ -2421,6 +2430,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret) goto out; } +block_acct_done(stats, acct); + out: qemu_iovec_destroy(&ctx->data.iov); g_free(ctx->data.bounce); -- 2.17.1
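The fix above follows a common completion-callback pattern: check the aio return value first, record a failed account (instead of silently dropping it) on error, and record "done" only on the success path, with both paths sharing the tail cleanup. A stripped-down sketch of that control flow, with counters standing in for the real block accounting and all names (AcctStats, compare_cb) invented for the example:

```c
#include <assert.h>

/* Counters standing in for block_acct_done()/block_acct_failed(). */
typedef struct AcctStats {
    int done;
    int failed;
} AcctStats;

static AcctStats stats;

/* Completion callback: ret is the aio result (0 on success,
 * negative errno on failure), as with blk_aio_preadv's callback. */
static void compare_cb(int ret)
{
    if (ret) {
        stats.failed++;     /* complete the accounting on error, too */
        goto out;
    }

    /* ... the actual compare of data vs. metadata would happen here ... */

    stats.done++;
out:
    /* common cleanup (bounce buffers, iovecs) would go here */
    ;
}
```

The point of the patch is exactly the error branch: before it, a failed metadata read fell through as if it had succeeded and the accounting was never completed.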
[PATCH 2/2] hw/block/nvme: fix lbaf formats initialization
Currently the LBAF formats are being initialized based on the metadata size if and only if the nvme-ns "ms" parameter is a non-zero value. Since the Format NVM command is supported, the device parameter "ms" may not be the right criterion for initializing the supported LBAFs. Signed-off-by: Gollu Appalanaidu --- hw/block/nvme-ns.c | 48 ++ 1 file changed, 19 insertions(+), 29 deletions(-) diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 7bb618f182..573dbb5a9d 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) ds = 31 - clz32(ns->blkconf.logical_block_size); ms = ns->params.ms; -if (ns->params.ms) { -id_ns->mc = 0x3; +id_ns->mc = 0x3; -if (ns->params.mset) { -id_ns->flbas |= 0x10; -} +if (ms && ns->params.mset) { +id_ns->flbas |= 0x10; +} -id_ns->dpc = 0x1f; -id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi; - -NvmeLBAF lbaf[16] = { -[0] = { .ds = 9 }, -[1] = { .ds = 9, .ms = 8 }, -[2] = { .ds = 9, .ms = 16 }, -[3] = { .ds = 9, .ms = 64 }, -[4] = { .ds = 12 }, -[5] = { .ds = 12, .ms = 8 }, -[6] = { .ds = 12, .ms = 16 }, -[7] = { .ds = 12, .ms = 64 }, -}; - -memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf)); -id_ns->nlbaf = 7; -} else { -NvmeLBAF lbaf[16] = { -[0] = { .ds = 9 }, -[1] = { .ds = 12 }, -}; +id_ns->dpc = 0x1f; +id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi; -memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf)); -id_ns->nlbaf = 1; -} +NvmeLBAF lbaf[16] = { +[0] = { .ds = 9 }, +[1] = { .ds = 9, .ms = 8 }, +[2] = { .ds = 9, .ms = 16 }, +[3] = { .ds = 9, .ms = 64 }, +[4] = { .ds = 12 }, +[5] = { .ds = 12, .ms = 8 }, +[6] = { .ds = 12, .ms = 16 }, +[7] = { .ds = 12, .ms = 64 }, +}; + +memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf)); +id_ns->nlbaf = 7; for (i = 0; i <= id_ns->lbaf ? i <= id_ns->nlbaf : 0; i++) { NvmeLBAF *lbaf = &id_ns->lbaf[i]; -- 2.17.1
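The core of the patch is advertising one fixed LBA-format table unconditionally, built with C99 designated initializers. A self-contained sketch of that table setup follows; the struct definitions are simplified stand-ins for the real NvmeLBAF/NvmeIdNs types, and the field values mirror the diff (512- and 4096-byte blocks with 0/8/16/64 bytes of metadata).

```c
#include <string.h>

/* Simplified stand-ins for the NVMe identify-namespace structures. */
typedef struct LbaFormat {
    unsigned ms;    /* metadata bytes per block */
    unsigned ds;    /* log2 of the data block size */
} LbaFormat;

typedef struct IdNs {
    LbaFormat lbaf[16];
    unsigned nlbaf; /* index of the last valid format (0-based) */
} IdNs;

static void init_lbaf(IdNs *id_ns)
{
    /* Designated initializers: unnamed entries are zero-filled. */
    static const LbaFormat lbaf[16] = {
        [0] = { .ds =  9 },
        [1] = { .ds =  9, .ms =  8 },
        [2] = { .ds =  9, .ms = 16 },
        [3] = { .ds =  9, .ms = 64 },
        [4] = { .ds = 12 },
        [5] = { .ds = 12, .ms =  8 },
        [6] = { .ds = 12, .ms = 16 },
        [7] = { .ds = 12, .ms = 64 },
    };

    /* Always advertise the full table, independent of the "ms" param. */
    memcpy(id_ns->lbaf, lbaf, sizeof(lbaf));
    id_ns->nlbaf = 7;
}
```

Keeping the table unconditional means a later Format NVM command can switch to any of the eight formats without the namespace having been created with a non-zero "ms".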
Re: about mirror cancel
On 16.04.21 09:05, Max Reitz wrote: On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote: [...] Note, that if cancelling all in-flight requests on target is wrong on mirror cancel, we still don't have real bug, as the only implementation of .bdrv_cancel_in_flight is stopping reconnect waiting in nbd driver. So, we'll cancel requests only if connection is already lost anyway. But that probably means, that correct name of the handler would be .bdrv_cancel_in_fligth_requests_that_will_most_probably_fail_anyway().. It’s documentation states that it should cancel all in-flight requests, so it’s more likely it just isn’t implemented where it could be. Oh, I now see you added it in the same series. Well, then I suppose you’re free to change the semantics as you see fit. But be aware that even cancelling those requests means that you abandon the target. So it must then fail instead of emitting the COMPLETED event (AFAIR the mirror job emits COMPLETED when cancelled in READY with force=false). If the user wants the mirror job to create a consistent copy and so cancels it after READY (with force=false), I don’t know whether cancelling those hanging requests is what we want. If the cancel hangs and the user sees this, they are still free to decide to cancel again with force=true, no? Max
Re: about mirror cancel
On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote: Hi all! Recently I've implemented fast-cancelling of mirror job: do bdrv_cancel_in_flight() in mirror_cancel(). Now I'm in doubt: is it a correct thing? I heard, that mirror-cancel is a kind of valid mirror completion.. Looking at documentation: # Note that if you issue 'block-job-cancel' after 'drive-mirror' has indicated # (via the event BLOCK_JOB_READY) that the source and destination are # synchronized, then the event triggered by this command changes to # BLOCK_JOB_COMPLETED, to indicate that the mirroring has ended and the # destination now has a point-in-time copy tied to the time of the cancellation. So, in other words, do we guarantee something to the user, if it calls mirror-cancel in ready state? Does this abuse exist in libvirt? How is it abuse if it's documented? AFAIR it does exist, because libvirt's blockcopy always uses mirror (with --pivot it's allowed to complete, without it is cancelled). (And the point of course is that if you want mirror to create a copy without pivoting, you need this behavior, because otherwise the target may be in an inconsistent state.) Note, that if cancelling all in-flight requests on target is wrong on mirror cancel, we still don't have real bug, as the only implementation of .bdrv_cancel_in_flight is stopping reconnect waiting in nbd driver. So, we'll cancel requests only if connection is already lost anyway. But that probably means, that correct name of the handler would be .bdrv_cancel_in_fligth_requests_that_will_most_probably_fail_anyway().. Its documentation states that it should cancel all in-flight requests, so it's more likely it just isn't implemented where it could be. And it also means, that abuse of mirror-cancel as valid completion is a bad idea, as we can't distinguish the cases when user calls job-cancel to have a kind of point-in-time copy, or just to cancel job (and being not interested in the final state of target). 
So, probably we need an option boolean argument for blockjob-cancel, like "hard", that will cancel in-flight writes on target node.. There is @force. See commit b76e4458b1eb3c3. Max