Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS

2021-04-16 Thread Nir Soffer
On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth  wrote:
>
> A customer reported that running
>
>  qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>
> fails for them with the following error message when the images are
> stored on a GPFS file system:
>
>  qemu-img: error while writing sector 0: Invalid argument
>
> After analyzing the strace output, it seems like the problem is in
> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
> returns EINVAL, which can apparently happen if the file system has
> a different idea of the granularity of the operation. It's arguably
> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
> according to the man-page of fallocate(), but the file system is out
> there in production and so we have to deal with it. In commit 294682cc3a
> ("block: workaround for unaligned byte range in fallocate()") we
> already applied a work-around for the same problem to the earlier
> fallocate(FALLOC_FL_ZERO_RANGE) call, so apply a similar fix to the
> PUNCH_HOLE call now.
>
> Signed-off-by: Thomas Huth 
> ---
>  block/file-posix.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 20e14f8e96..7a40428d52 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>  }
>  s->has_fallocate = false;
>  } else if (ret != -ENOTSUP) {
> +if (ret == -EINVAL) {
> +/*
> + * File systems like GPFS do not like unaligned byte ranges,
> + * treat it like unsupported (so caller falls back to pwrite)
> + */
> +return -ENOTSUP;

This skips the next fallback, using plain fallocate(0) if we write
after the end of the file. Is this intended?

We can treat the buggy EINVAL return value as "filesystem is buggy,
let's not try other options", or "let's try the next option". Since falling
back to actually writing zeroes is so much slower, I think it is better to
try the next option.
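The two options described here can be sketched as a minimal fallback chain (hypothetical helper names, not the QEMU code): treating the buggy -EINVAL like -ENOTSUP lets the next, still-fast strategy run instead of failing the request or falling straight back to a slow pwrite of zeroes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for fallocate(FALLOC_FL_PUNCH_HOLE...) on a filesystem that
 * wrongly rejects an unaligned range with EINVAL. */
static int try_punch_hole(void)
{
    return -EINVAL;
}

/* Stand-in for the plain fallocate(0) fallback; assume it succeeds. */
static int try_plain_fallocate(void)
{
    return 0;
}

/* Treat the buggy -EINVAL like -ENOTSUP: fall through to the next
 * option in the chain instead of returning the error to the caller. */
static int write_zeroes_with_fallback(bool *used_fallback)
{
    int ret = try_punch_hole();

    if (ret == -EINVAL || ret == -ENOTSUP) {
        *used_fallback = true;
        return try_plain_fallocate();
    }
    return ret;
}
```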

This issue affects also libnbd (nbdcopy file backend).

Do we have a bug report for GPFS?

Nir

> +}
>  return ret;
>  } else {
>  s->has_discard = false;
> --
> 2.27.0
>
>




[PATCH v3] hw/block/nvme: fix lbaf formats initialization

2021-04-16 Thread Gollu Appalanaidu
Currently the LBAF formats are initialized based on the metadata
size if and only if the nvme-ns "ms" parameter has a non-zero value.
Since the Format NVM command is supported, the device parameter "ms"
may not be the right criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu 
---
-v3: Remove the "mset" constraint check for ms < 8; "mset" can be
 set even when ms is non-zero and less than 8.

-v2: Addressing review comments (Klaus)
 Change the current "pi" and "ms" constraint check such that it
 will throw the error if ms < 8 and if namespace protection info,
 location and metadata settings are set.
 Splitting this from compare fix patch series.

 hw/block/nvme-ns.c | 58 --
 1 file changed, 25 insertions(+), 33 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..594b0003cf 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 ds = 31 - clz32(ns->blkconf.logical_block_size);
 ms = ns->params.ms;
 
-if (ns->params.ms) {
-id_ns->mc = 0x3;
+id_ns->mc = 0x3;
 
-if (ns->params.mset) {
-id_ns->flbas |= 0x10;
-}
+if (ms && ns->params.mset) {
+id_ns->flbas |= 0x10;
+}
 
-id_ns->dpc = 0x1f;
-id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9   },
-[1] = { .ds =  9, .ms =  8 },
-[2] = { .ds =  9, .ms = 16 },
-[3] = { .ds =  9, .ms = 64 },
-[4] = { .ds = 12   },
-[5] = { .ds = 12, .ms =  8 },
-[6] = { .ds = 12, .ms = 16 },
-[7] = { .ds = 12, .ms = 64 },
-};
-
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
-} else {
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9 },
-[1] = { .ds = 12 },
-};
+id_ns->dpc = 0x1f;
+id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
 
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 1;
-}
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+id_ns->nlbaf = 7;
 
 for (i = 0; i <= id_ns->nlbaf; i++) {
 NvmeLBAF *lbaf = &id_ns->lbaf[i];
@@ -395,10 +385,12 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
 return -1;
 }
 
-if (ns->params.pi && ns->params.ms < 8) {
-error_setg(errp, "at least 8 bytes of metadata required to enable "
-   "protection information");
-return -1;
+if (ns->params.ms < 8) {
+if (ns->params.pi || ns->params.pil) {
+error_setg(errp, "at least 8 bytes of metadata required to enable "
+"protection information, protection information location");
+return -1;
+}
 }
 
 if (ns->params.nsid > NVME_MAX_NAMESPACES) {
-- 
2.17.1
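The DPS/FLBAS bit packing used in the hunks above can be sketched in isolation (illustrative helpers, not the QEMU code): DPS bit 3 carries the protection information location (pil) and bits 2:0 the PI type, while FLBAS bit 4 indicates extended metadata (mset).

```c
#include <assert.h>
#include <stdint.h>

/* Pack the Identify Namespace DPS field: bit 3 = PI location,
 * bits 2:0 = PI type. Mirrors the expression in the hunk above. */
static uint8_t pack_dps(uint8_t pil, uint8_t pi)
{
    return (uint8_t)(((pil & 0x1) << 3) | (pi & 0x7));
}

/* Pack the FLBAS field: bits 3:0 = selected LBA format index,
 * bit 4 = metadata transferred as part of extended data LBA (mset). */
static uint8_t pack_flbas(uint8_t lbaf_index, uint8_t mset)
{
    return (uint8_t)((lbaf_index & 0xf) | (mset ? 0x10 : 0));
}
```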




[PATCH] hw/ide: Fix crash when plugging a piix3-ide device into the x-remote machine

2021-04-16 Thread Thomas Huth
QEMU currently crashes when the user tries to do something like:

 qemu-system-x86_64 -M x-remote -device piix3-ide

This happens because the "isabus" variable is not initialized with
the x-remote machine yet. Add a proper check for this condition
and propagate the error to the caller, so we can fail there gracefully.

Signed-off-by: Thomas Huth 
---
 hw/ide/ioport.c   | 16 ++--
 hw/ide/piix.c | 22 +-
 hw/isa/isa-bus.c  | 14 ++
 include/hw/ide/internal.h |  2 +-
 include/hw/isa/isa.h  | 13 -
 5 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/hw/ide/ioport.c b/hw/ide/ioport.c
index b613ff3bba..e6caa537fa 100644
--- a/hw/ide/ioport.c
+++ b/hw/ide/ioport.c
@@ -50,15 +50,19 @@ static const MemoryRegionPortio ide_portio2_list[] = {
 PORTIO_END_OF_LIST(),
 };
 
-void ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
+int ide_init_ioport(IDEBus *bus, ISADevice *dev, int iobase, int iobase2)
 {
+int ret;
+
 /* ??? Assume only ISA and PCI configurations, and that the PCI-ISA
bridge has been setup properly to always register with ISA.  */
-isa_register_portio_list(dev, &bus->portio_list,
- iobase, ide_portio_list, bus, "ide");
+ret = isa_register_portio_list(dev, &bus->portio_list,
+   iobase, ide_portio_list, bus, "ide");
 
-if (iobase2) {
-isa_register_portio_list(dev, &bus->portio2_list,
- iobase2, ide_portio2_list, bus, "ide");
+if (ret == 0 && iobase2) {
+ret = isa_register_portio_list(dev, &bus->portio2_list,
+   iobase2, ide_portio2_list, bus, "ide");
 }
+
+return ret;
 }
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index b9860e35a5..d3e738320b 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -26,6 +26,7 @@
 #include "qemu/osdep.h"
 #include "hw/pci/pci.h"
 #include "migration/vmstate.h"
+#include "qapi/error.h"
 #include "qemu/module.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
@@ -123,7 +124,8 @@ static void piix_ide_reset(DeviceState *dev)
 pci_conf[0x20] = 0x01; /* BMIBA: 20-23h */
 }
 
-static void pci_piix_init_ports(PCIIDEState *d) {
+static int pci_piix_init_ports(PCIIDEState *d)
+{
 static const struct {
 int iobase;
 int iobase2;
@@ -132,24 +134,30 @@ static void pci_piix_init_ports(PCIIDEState *d) {
 {0x1f0, 0x3f6, 14},
 {0x170, 0x376, 15},
 };
-int i;
+int i, ret;
 
 for (i = 0; i < 2; i++) {
 ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
-port_info[i].iobase2);
+ret = ide_init_ioport(&d->bus[i], NULL, port_info[i].iobase,
+  port_info[i].iobase2);
+if (ret) {
+return ret;
+}
 ide_init2(&d->bus[i], isa_get_irq(NULL, port_info[i].isairq));

 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 d->bmdma[i].bus = &d->bus[i];
 ide_register_restart_cb(&d->bus[i]);
 }
+
+return 0;
 }
 
 static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
 uint8_t *pci_conf = dev->config;
+int rc;
 
 pci_conf[PCI_CLASS_PROG] = 0x80; // legacy ATA mode
 
@@ -158,7 +166,11 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
 
 vmstate_register(VMSTATE_IF(dev), 0, &vmstate_ide_pci, d);
 
-pci_piix_init_ports(d);
+rc = pci_piix_init_ports(d);
+if (rc) {
+error_setg_errno(errp, -rc, "Failed to realize %s",
+ object_get_typename(OBJECT(dev)));
+}
 }
 
 int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 7820068e6e..cffaa35e9c 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -131,13 +131,17 @@ void isa_register_ioport(ISADevice *dev, MemoryRegion *io, uint16_t start)
 isa_init_ioport(dev, start);
 }
 
-void isa_register_portio_list(ISADevice *dev,
-  PortioList *piolist, uint16_t start,
-  const MemoryRegionPortio *pio_start,
-  void *opaque, const char *name)
+int isa_register_portio_list(ISADevice *dev,
+ PortioList *piolist, uint16_t start,
+ const MemoryRegionPortio *pio_start,
+ void *opaque, const char *name)
 {
 assert(piolist && !piolist->owner);
 
+if (!isabus) {
+return -ENODEV;
+}
+
 /* START is how we should treat DEV, regardless of the actual
contents of the portio array.  This is how the old code
actually handled e.g. the FDC device.  */
@@ -145,6 +149,8 @@ void isa_register_portio_list(ISADevice *dev,
 
 portio_list_init(piolist, OBJECT(dev), pio_start, opaque, name);
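The error-propagation chain this patch introduces (bus check in the low-level registration helper, propagated up through port init to realize) can be sketched with hypothetical names, independently of the QEMU APIs:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the global isabus pointer being NULL on x-remote. */
static bool bus_present;

/* Low-level helper: report -ENODEV instead of dereferencing a NULL bus. */
static int register_ioports(void)
{
    if (!bus_present) {
        return -ENODEV;
    }
    return 0;
}

/* Mid-level init: stop at the first failure and propagate it. */
static int init_ports(void)
{
    for (int i = 0; i < 2; i++) {
        int ret = register_ioports();
        if (ret) {
            return ret;   /* propagate instead of crashing */
        }
    }
    return 0;
}

/* Realize step: turn the errno into a human-readable error message. */
static int realize(char *err, size_t errlen)
{
    int rc = init_ports();

    if (rc) {
        snprintf(err, errlen, "Failed to realize piix3-ide: %s",
                 strerror(-rc));
    }
    return rc;
}
```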
 

Re: [PATCH 1/2] hw/block/nvme: consider metadata read aio return value in compare

2021-04-16 Thread Klaus Jensen

On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently the return value of the metadata blk_aio_preadv() call in the
Compare command is ignored; consider it and complete the block accounting.

Signed-off-by: Gollu Appalanaidu 
---
hw/block/nvme.c | 11 +++
1 file changed, 11 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 624a1431d0..c2727540f1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2369,10 +2369,19 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
uint32_t reftag = le32_to_cpu(rw->reftag);
struct nvme_compare_ctx *ctx = req->opaque;
g_autofree uint8_t *buf = NULL;
+BlockBackend *blk = ns->blkconf.blk;
+BlockAcctCookie *acct = &req->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
uint16_t status = NVME_SUCCESS;

trace_pci_nvme_compare_mdata_cb(nvme_cid(req));

+if (ret) {
+block_acct_failed(stats, acct);
+nvme_aio_err(req, ret);
+goto out;
+}
+
buf = g_malloc(ctx->mdata.iov.size);

status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size,
@@ -2421,6 +2430,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
goto out;
}

+block_acct_done(stats, acct);
+
out:
qemu_iovec_destroy(&req->data.iov);
g_free(ctx->data.bounce);
--
2.17.1




Good fix, thanks! Since there is no crash, data corruption or other 
"bad" behavior, this isn't critical for v6.0.


Might consider it for a potential stable release though, so I'll add a 
Fixes: tag and queue it up.


Reviewed-by: Klaus Jensen 
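The accounting rule this fix enforces — every completion path must end in exactly one of block_acct_failed()/block_acct_done() — can be sketched as follows (counters stand in for QEMU's BlockAcctStats; names are illustrative):

```c
#include <assert.h>

/* Counters standing in for the block accounting statistics. */
static int acct_failed_n, acct_done_n;

/* Sketch of a compare-metadata completion callback: the error path
 * (previously untracked) now records a failed operation before
 * returning, and the success path records a completed one. */
static void compare_mdata_cb(int ret)
{
    if (ret) {
        acct_failed_n++;   /* was missing: aio error left acct open */
        return;
    }
    /* ... compare metadata buffers here ... */
    acct_done_n++;
}
```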




[PATCH v2] hw/block/nvme: fix lbaf formats initialization

2021-04-16 Thread Gollu Appalanaidu
Currently the LBAF formats are initialized based on the metadata
size if and only if the nvme-ns "ms" parameter has a non-zero value.
Since the Format NVM command is supported, the device parameter "ms"
may not be the right criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu 
---

-v2: Addressing review comments (Klaus)
 Change the current "pi" and "ms" constraint check such that it
 will throw the error if ms < 8 and if namespace protection info,
 location and metadata settings are set.
 Splitting this from compare fix patch series.

 hw/block/nvme-ns.c | 59 --
 1 file changed, 26 insertions(+), 33 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..160873a8d4 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 ds = 31 - clz32(ns->blkconf.logical_block_size);
 ms = ns->params.ms;
 
-if (ns->params.ms) {
-id_ns->mc = 0x3;
+id_ns->mc = 0x3;
 
-if (ns->params.mset) {
-id_ns->flbas |= 0x10;
-}
+if (ns->params.mset) {
+id_ns->flbas |= 0x10;
+}
 
-id_ns->dpc = 0x1f;
-id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9   },
-[1] = { .ds =  9, .ms =  8 },
-[2] = { .ds =  9, .ms = 16 },
-[3] = { .ds =  9, .ms = 64 },
-[4] = { .ds = 12   },
-[5] = { .ds = 12, .ms =  8 },
-[6] = { .ds = 12, .ms = 16 },
-[7] = { .ds = 12, .ms = 64 },
-};
-
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
-} else {
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9 },
-[1] = { .ds = 12 },
-};
+id_ns->dpc = 0x1f;
+id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
 
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 1;
-}
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+id_ns->nlbaf = 7;
 
 for (i = 0; i <= id_ns->nlbaf; i++) {
 NvmeLBAF *lbaf = &id_ns->lbaf[i];
@@ -395,10 +385,13 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
 return -1;
 }
 
-if (ns->params.pi && ns->params.ms < 8) {
-error_setg(errp, "at least 8 bytes of metadata required to enable "
-   "protection information");
-return -1;
+if (ns->params.ms < 8) {
+if (ns->params.pi || ns->params.pil || ns->params.mset) {
+error_setg(errp, "at least 8 bytes of metadata required to enable "
+"protection information, protection information location and "
+"metadata settings");
+return -1;
+}
 }
 
 if (ns->params.nsid > NVME_MAX_NAMESPACES) {
-- 
2.17.1




Re: [PATCH 2/2] hw/block/nvme: fix lbaf formats initialization

2021-04-16 Thread Gollu Appalanaidu

On Fri, Apr 16, 2021 at 10:48:16AM +0200, Klaus Jensen wrote:

On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently the LBAF formats are initialized based on the metadata
size if and only if the nvme-ns "ms" parameter has a non-zero value.
Since the Format NVM command is supported, the device parameter "ms"
may not be the right criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu 
---
hw/block/nvme-ns.c | 48 ++
1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..573dbb5a9d 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
   ds = 31 - clz32(ns->blkconf.logical_block_size);
   ms = ns->params.ms;

-if (ns->params.ms) {
-id_ns->mc = 0x3;
+id_ns->mc = 0x3;

-if (ns->params.mset) {
-id_ns->flbas |= 0x10;
-}
+if (ms && ns->params.mset) {
+id_ns->flbas |= 0x10;
+}

-id_ns->dpc = 0x1f;
-id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9   },
-[1] = { .ds =  9, .ms =  8 },
-[2] = { .ds =  9, .ms = 16 },
-[3] = { .ds =  9, .ms = 64 },
-[4] = { .ds = 12   },
-[5] = { .ds = 12, .ms =  8 },
-[6] = { .ds = 12, .ms = 16 },
-[7] = { .ds = 12, .ms = 64 },
-};
-
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
-} else {
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9 },
-[1] = { .ds = 12 },
-};
+id_ns->dpc = 0x1f;
+id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;


While nvme_ns_check_constraints() will error out if pi is set and the
metadata bytes are insufficient, I don't think this should set bit 3
unless both metadata and pi are enabled. It's not against the spec, but
it's just slightly weird.


Okay, I will modify the current "ms" and "pi" constraint check like this:
if (ns->params.ms < 8) {
   if (ns->params.pi || ns->params.pil || ns->params.mset) {
   error_setg(errp, "at least 8 bytes of metadata required to enable "
   "protection information, protection information location and "
   "metadata settings");
   return -1;
   }
}
and if "mset" is set, it will be applied directly without checking "ms" in nvme_ns_init().
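The proposed constraint can be expressed as a small predicate (an illustrative function, not the QEMU code): with fewer than 8 bytes of metadata, none of pi, pil or mset may be enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the constraint check discussed above. Parameter names
 * mirror the nvme-ns device properties; a return of -1 corresponds to
 * the error_setg() + return -1 path in nvme_ns_check_constraints(). */
static int check_ms_constraints(uint8_t ms, uint8_t pi, uint8_t pil,
                                uint8_t mset)
{
    if (ms < 8 && (pi || pil || mset)) {
        return -1;   /* insufficient metadata for the requested features */
    }
    return 0;
}
```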




-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 1;
-}
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+id_ns->nlbaf = 7;

   for (i = 0; i <= id_ns->nlbaf; i++) {
   NvmeLBAF *lbaf = &id_ns->lbaf[i];
--
2.17.1









Re: [PATCH 2/2] hw/block/nvme: fix lbaf formats initialization

2021-04-16 Thread Klaus Jensen

On Apr 16 12:52, Gollu Appalanaidu wrote:

Currently the LBAF formats are initialized based on the metadata
size if and only if the nvme-ns "ms" parameter has a non-zero value.
Since the Format NVM command is supported, the device parameter "ms"
may not be the right criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu 
---
hw/block/nvme-ns.c | 48 ++
1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..573dbb5a9d 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
ds = 31 - clz32(ns->blkconf.logical_block_size);
ms = ns->params.ms;

-if (ns->params.ms) {
-id_ns->mc = 0x3;
+id_ns->mc = 0x3;

-if (ns->params.mset) {
-id_ns->flbas |= 0x10;
-}
+if (ms && ns->params.mset) {
+id_ns->flbas |= 0x10;
+}

-id_ns->dpc = 0x1f;
-id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9   },
-[1] = { .ds =  9, .ms =  8 },
-[2] = { .ds =  9, .ms = 16 },
-[3] = { .ds =  9, .ms = 64 },
-[4] = { .ds = 12   },
-[5] = { .ds = 12, .ms =  8 },
-[6] = { .ds = 12, .ms = 16 },
-[7] = { .ds = 12, .ms = 64 },
-};
-
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
-} else {
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9 },
-[1] = { .ds = 12 },
-};
+id_ns->dpc = 0x1f;
+id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;


While nvme_ns_check_constraints() will error out if pi is set and the
metadata bytes are insufficient, I don't think this should set bit 3
unless both metadata and pi are enabled. It's not against the spec, but
it's just slightly weird.




-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 1;
-}
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+id_ns->nlbaf = 7;

for (i = 0; i <= id_ns->nlbaf; i++) {
 NvmeLBAF *lbaf = &id_ns->lbaf[i];
--
2.17.1








[PATCH v3 26/33] block-coroutine-wrapper: allow non bdrv_ prefix

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We are going to reuse the script to generate a qcow2_ function in a
further commit. Prepare the script now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/block-coroutine-wrapper.py | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index 0461fd1c45..85dbeb9ecf 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -98,12 +98,13 @@ def snake_to_camel(func_name: str) -> str:
 
 
 def gen_wrapper(func: FuncDecl) -> str:
-assert func.name.startswith('bdrv_')
-assert not func.name.startswith('bdrv_co_')
+assert not '_co_' in func.name
 assert func.return_type == 'int'
 assert func.args[0].type in ['BlockDriverState *', 'BdrvChild *']
 
-name = 'bdrv_co_' + func.name[5:]
+subsystem, subname = func.name.split('_', 1)
+
+name = f'{subsystem}_co_{subname}'
 bs = 'bs' if func.args[0].type == 'BlockDriverState *' else 'child->bs'
 struct_name = snake_to_camel(name)
 
-- 
2.29.2
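The generalized name rewriting the patch performs — split the function name at its first underscore and insert "co_" after the subsystem part — can be sketched in C (the script itself is Python; this is just the string transformation, and it assumes the name contains an underscore, as the script's asserts guarantee):

```c
#include <stdio.h>
#include <string.h>

/* Build the coroutine-variant name: "qcow2_open" -> "qcow2_co_open",
 * "bdrv_pwrite" -> "bdrv_co_pwrite". Assumes func contains '_'. */
static void make_co_name(const char *func, char *out, size_t outlen)
{
    const char *us = strchr(func, '_');
    int sublen = (int)(us - func);

    snprintf(out, outlen, "%.*s_co_%s", sublen, func, us + 1);
}
```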




[PATCH v3 29/33] nbd/client-connection: add option for non-blocking connection attempt

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We'll need the possibility of a non-blocking nbd_co_establish_connection(),
so that it returns immediately, reporting success only if a connection
was previously established in the background.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h | 2 +-
 block/nbd.c | 2 +-
 nbd/client-connection.c | 8 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 00bf08bade..6d5a807482 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -421,7 +421,7 @@ void nbd_client_connection_release(NBDClientConnection *conn);
 
 QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-Error **errp);
+bool blocking, Error **errp);
 
 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 
diff --git a/block/nbd.c b/block/nbd.c
index 59971bfba8..863d950abd 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -368,7 +368,7 @@ static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp)
 
 assert(!s->ioc);
 
-s->ioc = nbd_co_establish_connection(s->conn, &s->info, errp);
+s->ioc = nbd_co_establish_connection(s->conn, &s->info, true, errp);
 if (!s->ioc) {
 return -ECONNREFUSED;
 }
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 00efff293f..8914de7b94 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -251,6 +251,8 @@ void nbd_client_connection_release(NBDClientConnection 
*conn)
  * result, just return it now
  *   otherwise if thread is not running, start a thread and wait for completion
  *
+ * If @blocking is false, don't wait for the thread, return immediately.
+ *
  * If @info is not NULL, also do nbd-negotiation after successful connection.
  * In this case info is used only as out parameter, and is fully initialized by
 * nbd_co_establish_connection(). "IN" fields of info as well as related only to
@@ -259,7 +261,7 @@ void nbd_client_connection_release(NBDClientConnection *conn)
  */
 QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-Error **errp)
+bool blocking, Error **errp)
 {
 QIOChannel *ioc;
 QemuThread thread;
@@ -299,6 +301,10 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 connect_thread_func, conn, QEMU_THREAD_DETACHED);
 }
 
+if (!blocking) {
+return NULL;
+}
+
 conn->wait_co = qemu_coroutine_self();
 }
 
-- 
2.29.2
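The blocking/non-blocking contract added by this patch can be sketched with a stand-in struct (not the QEMU types): with blocking=false the call returns immediately, yielding a channel only if a background attempt has already succeeded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for NBDClientConnection. */
struct conn {
    bool ready;    /* background connection attempt finished successfully */
    int channel;   /* stands in for the QIOChannel */
};

/* Sketch of nbd_co_establish_connection() semantics: non-blocking mode
 * never waits, it only hands out an already-established connection. */
static int *establish(struct conn *c, bool blocking)
{
    if (c->ready) {
        return &c->channel;   /* previous attempt succeeded in background */
    }
    if (!blocking) {
        return NULL;          /* return immediately, don't wait */
    }
    /* blocking path: the real code yields until the thread finishes;
     * simulate a successful wait here */
    c->ready = true;
    return &c->channel;
}
```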




[PATCH v3 25/33] nbd/client-connection: return only one io channel

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
block/nbd doesn't need the underlying sioc channel anymore, so we can
update the nbd/client-connection interface to return only one top-most io
channel, which is more straightforward.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  4 ++--
 block/nbd.c | 13 ++---
 nbd/client-connection.c | 33 +
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 10756d2544..00bf08bade 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -419,9 +419,9 @@ NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
Monitor *mon);
 void nbd_client_connection_release(NBDClientConnection *conn);
 
-QIOChannelSocket *coroutine_fn
+QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-QIOChannel **ioc, Error **errp);
+Error **errp);
 
 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 
diff --git a/block/nbd.c b/block/nbd.c
index f9b56c57b4..15b5921725 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -365,7 +365,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
 AioContext *aio_context = bdrv_get_aio_context(s->bs);
-QIOChannelSocket *sioc;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -404,20 +403,12 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-sioc = nbd_co_establish_connection(s->conn, &s->info, &s->ioc, NULL);
-if (!sioc) {
+s->ioc = nbd_co_establish_connection(s->conn, &s->info, NULL);
+if (!s->ioc) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
-if (s->ioc) {
-/* sioc is referenced by s->ioc */
-object_unref(OBJECT(sioc));
-} else {
-s->ioc = QIO_CHANNEL(sioc);
-}
-sioc = NULL;
-
 qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
 qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);
 
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index c26cd59464..36d2c7c693 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -255,15 +255,15 @@ void nbd_client_connection_release(NBDClientConnection *conn)
  * nbd_receive_export_list() would be zero (see description of NBDExportInfo in
  * include/block/nbd.h).
  */
-QIOChannelSocket *coroutine_fn
+QIOChannel *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
-QIOChannel **ioc, Error **errp)
+Error **errp)
 {
+QIOChannel *ioc;
 QemuThread thread;
 
 if (conn->do_negotiation) {
 assert(info);
-assert(ioc);
 }
 
 WITH_QEMU_LOCK_GUARD(&conn->mutex) {
@@ -277,10 +277,17 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 if (conn->sioc) {
 /* Previous attempt finally succeeded in background */
 if (conn->do_negotiation) {
-*ioc = g_steal_pointer(&conn->ioc);
+ioc = g_steal_pointer(&conn->ioc);
 memcpy(info, &conn->updated_info, sizeof(*info));
 }
-return g_steal_pointer(&conn->sioc);
+if (ioc) {
+/* TLS channel now has own reference to parent */
+object_unref(OBJECT(conn->sioc));
+} else {
+ioc = QIO_CHANNEL(conn->sioc);
+}
+conn->sioc = NULL;
+return ioc;
 }
 
 conn->running = true;
@@ -311,11 +318,21 @@ nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
 } else {
 error_propagate(errp, conn->err);
 conn->err = NULL;
-if (conn->sioc && conn->do_negotiation) {
-*ioc = g_steal_pointer(&conn->ioc);
+if (!conn->sioc) {
+return NULL;
+}
+if (conn->do_negotiation) {
+ioc = g_steal_pointer(&conn->ioc);
 memcpy(info, &conn->updated_info, sizeof(*info));
 }
-return g_steal_pointer(&conn->sioc);
+if (ioc) {
+/* TLS channel now has own reference to parent */
+object_unref(OBJECT(conn->sioc));
+} else {
+ioc = QIO_CHANNEL(conn->sioc);
+}
+conn->sioc = NULL;
+return ioc;
 }
 }
 
-- 
2.29.2
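The hand-off logic introduced by this patch can be sketched as follows (strings stand in for the channel objects; the reference-count comment mirrors the /* TLS channel now has own reference to parent */ note in the hunks above):

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for NBDClientConnection's channel fields. */
struct conn {
    const char *sioc;   /* raw socket channel */
    const char *ioc;    /* TLS channel from negotiation, or NULL */
};

/* Sketch of returning a single top-most channel: if negotiation wrapped
 * the socket in a TLS channel, hand that out (in the real code the extra
 * socket reference is dropped here, since the TLS channel keeps its
 * own); otherwise hand out the raw socket itself. */
static const char *take_channel(struct conn *c)
{
    const char *ioc = c->ioc;

    c->ioc = NULL;          /* g_steal_pointer() equivalent */
    if (!ioc) {
        ioc = c->sioc;      /* no TLS: return the socket channel */
    }
    c->sioc = NULL;
    return ioc;
}
```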




[PATCH v3 28/33] nbd/client-connection: do qio_channel_set_delay(false)

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
nbd_open() does it (through nbd_establish_connection()).
We actually lost that call on the reconnect path in commit 1dc4718d849e1a1fe
("block/nbd: use non-blocking connect: fix vm hang on connect()")
when we introduced the reconnect thread.

Fixes: 1dc4718d849e1a1fe665ce5241ed79048cfa2cfc
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 36d2c7c693..00efff293f 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -126,6 +126,8 @@ static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
 return ret;
 }
 
+qio_channel_set_delay(QIO_CHANNEL(sioc), false);
+
 if (!info) {
 return 0;
 }
-- 
2.29.2
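qio_channel_set_delay(..., false) ultimately disables Nagle's algorithm on the socket. The underlying socket-level operation looks roughly like this (plain POSIX, not the QEMU API):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Disable Nagle's algorithm so small requests (e.g. NBD headers) are
 * sent immediately instead of being coalesced. */
static int set_nodelay(int fd)
{
    int one = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```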




[PATCH v3 33/33] block/nbd: drop connection_co

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
OK, that's a big rewrite of the logic.

Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switch. We also
abuse bs->in_flight counter which is increased for connection_co and
temporarily decreased at points where we want to allow a drained section to
begin. One of these places is in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct.

Let's finally drop this always running coroutine and go another way:

1. reconnect_attempt() goes to nbd_co_send_request and called under
   send_mutex

2. We do receive headers in the request coroutine. But we should also
   dispatch replies for other pending requests. So,
   nbd_connection_entry() is turned into nbd_receive_replies(), which
   dispatches replies until it receives the header of the requested
   reply, and then returns.

3. All the old stuff around drained sections and context switching is
   dropped.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c  | 376 ---
 nbd/client.c |   2 -
 2 files changed, 119 insertions(+), 259 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 03391bb231..3a7b532790 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -59,7 +59,7 @@
 typedef struct {
 Coroutine *coroutine;
 uint64_t offset;/* original offset of the request */
-bool receiving; /* waiting for connection_co? */
+bool receiving; /* waiting in first yield of nbd_receive_replies() */
 } NBDClientRequest;
 
 typedef enum NBDClientState {
@@ -75,14 +75,10 @@ typedef struct BDRVNBDState {
 
 CoMutex send_mutex;
 CoQueue free_sema;
-Coroutine *connection_co;
-Coroutine *teardown_co;
-QemuCoSleepState *connection_co_sleep_ns_state;
-bool drained;
-bool wait_drained_end;
+Coroutine *receive_co;
+Coroutine *in_flight_waiter;
 int in_flight;
 NBDClientState state;
-bool wait_in_flight;
 
 QEMUTimer *reconnect_delay_timer;
 
@@ -131,33 +127,20 @@ static bool nbd_client_connected(BDRVNBDState *s)
 
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
+if (nbd_client_connected(s)) {
+qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
+
 if (ret == -EIO) {
 if (nbd_client_connected(s)) {
 s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
 NBD_CLIENT_CONNECTING_NOWAIT;
 }
 } else {
-if (nbd_client_connected(s)) {
-qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
-}
 s->state = NBD_CLIENT_QUIT;
 }
 }
 
-static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
-{
-int i;
-
-for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-NBDClientRequest *req = &s->requests[i];
-
-if (req->coroutine && req->receiving) {
-req->receiving = false;
-aio_co_wake(req->coroutine);
-}
-}
-}
-
 static void reconnect_delay_timer_del(BDRVNBDState *s)
 {
 if (s->reconnect_delay_timer) {
@@ -194,117 +177,23 @@ static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
 timer_mod(s->reconnect_delay_timer, expire_time_ns);
 }
 
-static void nbd_client_detach_aio_context(BlockDriverState *bs)
-{
-BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-
-/* Timer is deleted in nbd_client_co_drain_begin() */
-assert(!s->reconnect_delay_timer);
-/*
- * If reconnect is in progress we may have no ->ioc.  It will be
- * re-instantiated in the proper aio context once the connection is
- * reestablished.
- */
-if (s->ioc) {
-qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
-}
-}
-
-static void nbd_client_attach_aio_context_bh(void *opaque)
-{
-BlockDriverState *bs = opaque;
-BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-
-if (s->connection_co) {
-/*
- * The node is still drained, so we know the coroutine has yielded in
- * nbd_read_eof(), the only place where bs->in_flight can reach 0, or
- * it is entered for the first time. Both places are safe for entering
- * the coroutine.
- */
-qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
-}
-bdrv_dec_in_flight(bs);
-}
-
-static void nbd_client_attach_aio_context(BlockDriverState *bs,
-  AioContext *new_context)
-{
-BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-
-/*
- * s->connection_co is either yielded from nbd_receive_reply or from
- * nbd_co_reconnect_loop()
- */
-if (nbd_client_connected(s)) {
-qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
-}
-
-bdrv_inc_in_flight(bs);
-
-/*
- * Need to 

Re: [PATCH v3 33/33] block/nbd: drop connection_co

2021-04-16 Thread Vladimir Sementsov-Ogievskiy

16.04.2021 11:14, Vladimir Sementsov-Ogievskiy wrote:

16.04.2021 11:09, Vladimir Sementsov-Ogievskiy wrote:

OK, that's a big rewrite of the logic.

Pre-patch we have an always-running coroutine - connection_co. It does
reply receiving and reconnecting, and it leads to a lot of difficult
and unobvious code around drained sections and context switching. We also
abuse the bs->in_flight counter, which is increased for connection_co and
temporarily decreased at points where we want to allow a drained section to
begin. One of these places is even in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin, which is not correct.

Let's finally drop this always running coroutine and go another way:

1. reconnect_attempt() moves to nbd_co_send_request() and is called under
    send_mutex.

2. We receive headers in the request coroutine. But we should also
    dispatch replies for other pending requests. So,
    nbd_connection_entry() is turned into nbd_receive_replies(), which
    does reply dispatching until it receives another request's header, and
    returns when it receives the requested header.

3. All the old stuff around drained sections and context switching is dropped.

Signed-off-by: Vladimir Sementsov-Ogievskiy


Please consider this last patch as RFC for now:

1. It is complicated, and doesn't have good documentation. Please look through 
and ask about everything that is not obvious, I'll explain. Don't waste your time 
trying to understand what is not clear.

2. I have also failed to imagine how to split the patch into smaller, simpler 
patches. Ideas are welcome.

3. It actually reverts what was done in

commit 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace
Author: Vladimir Sementsov-Ogievskiy 
Date:   Thu Sep 3 22:02:58 2020 +0300

    block/nbd: fix drain dead-lock because of nbd reconnect-delay

and I haven't checked yet whether this dead-lock is still here or not. Even if it 
is still here, I believe the nbd driver is the wrong place to work around this bug, 
but I should check it first at least.



4. As Roman said, there is a problem in the new architecture: when the guest is 
idle, we will not detect a disconnect immediately, but only on the next request from 
the guest. That may be considered a degradation.
Still, let's implement a kind of keep-alive on top of this series; some ideas:

  - add an idle-timeout, and do a simple NBD request on timeout, which will 
result in some expected error reply from the server.
  - or add an idle coroutine, which will do an endless "read" when there are no 
requests. It would be a kind of the old connection_co, but with only one function 
and extremely simple. And we may just cancel it on drained-begin and restart it on 
drained-end.

--
Best regards,
Vladimir



[PATCH v3 17/33] nbd/client-connection: implement connection retry

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Add an option for the connection thread to retry connecting until it
succeeds. We'll use nbd/client-connection both for reconnect and for the
initial connection in nbd_open(), so we need the possibility to use the
same NBDClientConnection instance to connect once in nbd_open() and then
use retry semantics for reconnect.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  2 ++
 nbd/client-connection.c | 55 +
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5d86e6a393..5bb54d831c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -409,6 +409,8 @@ const char *nbd_err_lookup(int err);
 /* nbd/client-connection.c */
 typedef struct NBDClientConnection NBDClientConnection;
 
+void nbd_client_connection_enable_retry(NBDClientConnection *conn);
+
 NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
bool do_negotiation,
const char *export_name,
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index ae4a77f826..002bd91f42 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -36,6 +36,8 @@ struct NBDClientConnection {
 NBDExportInfo initial_info;
 bool do_negotiation;
 
+bool do_retry;
+
 /*
  * Result of last attempt. Valid in FAIL and SUCCESS states.
  * If you want to steal error, don't forget to set pointer to NULL.
@@ -52,6 +54,15 @@ struct NBDClientConnection {
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 };
 
+/*
+ * The function isn't protected by any mutex, so only call it while the
+ * thread is not running.
+ */
+void nbd_client_connection_enable_retry(NBDClientConnection *conn)
+{
+conn->do_retry = true;
+}
+
 NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
bool do_negotiation,
const char *export_name,
@@ -144,24 +155,37 @@ static void *connect_thread_func(void *opaque)
 NBDClientConnection *conn = opaque;
 bool do_free;
 int ret;
+uint64_t timeout = 1;
+uint64_t max_timeout = 16;
+
+while (true) {
+conn->sioc = qio_channel_socket_new();
+
+error_free(conn->err);
+conn->err = NULL;
+conn->updated_info = conn->initial_info;
+
+ret = nbd_connect(conn->sioc, conn->saddr,
+  conn->do_negotiation ? >updated_info : NULL,
+  conn->tlscreds, >ioc, >err);
+conn->updated_info.x_dirty_bitmap = NULL;
+conn->updated_info.name = NULL;
+
+if (ret < 0) {
+object_unref(OBJECT(conn->sioc));
+conn->sioc = NULL;
+if (conn->do_retry) {
+sleep(timeout);
+if (timeout < max_timeout) {
+timeout *= 2;
+}
+continue;
+}
+}
 
-conn->sioc = qio_channel_socket_new();
-
-error_free(conn->err);
-conn->err = NULL;
-conn->updated_info = conn->initial_info;
-
-ret = nbd_connect(conn->sioc, conn->saddr,
-  conn->do_negotiation ? >updated_info : NULL,
-  conn->tlscreds, >ioc, >err);
-if (ret < 0) {
-object_unref(OBJECT(conn->sioc));
-conn->sioc = NULL;
+break;
 }
 
-conn->updated_info.x_dirty_bitmap = NULL;
-conn->updated_info.name = NULL;
-
 WITH_QEMU_LOCK_GUARD(>mutex) {
 assert(conn->running);
 conn->running = false;
@@ -172,7 +196,6 @@ static void *connect_thread_func(void *opaque)
 do_free = conn->detached;
 }
 
-
 if (do_free) {
 nbd_client_connection_do_free(conn);
 }
-- 
2.29.2




[PATCH v3 16/33] nbd/client-connection: add possibility of negotiation

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Add arguments and logic to support NBD negotiation in the same thread
after a successful connection.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |   9 +++-
 block/nbd.c |   4 +-
 nbd/client-connection.c | 105 ++--
 3 files changed, 109 insertions(+), 9 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 57381be76f..5d86e6a393 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -409,11 +409,16 @@ const char *nbd_err_lookup(int err);
 /* nbd/client-connection.c */
 typedef struct NBDClientConnection NBDClientConnection;
 
-NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr);
+NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
+   bool do_negotiation,
+   const char *export_name,
+   const char *x_dirty_bitmap,
+   QCryptoTLSCreds *tlscreds);
 void nbd_client_connection_release(NBDClientConnection *conn);
 
 QIOChannelSocket *coroutine_fn
-nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
+nbd_co_establish_connection(NBDClientConnection *conn, NBDExportInfo *info,
+QIOChannel **ioc, Error **errp);
 
 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection 
*conn);
 
diff --git a/block/nbd.c b/block/nbd.c
index 9bd68dcf10..5e63caaf4b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -361,7 +361,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 s->ioc = NULL;
 }
 
-s->sioc = nbd_co_establish_connection(s->conn, NULL);
+s->sioc = nbd_co_establish_connection(s->conn, NULL, NULL, NULL);
 if (!s->sioc) {
 ret = -ECONNREFUSED;
 goto out;
@@ -2033,7 +2033,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto fail;
 }
 
-s->conn = nbd_client_connection_new(s->saddr);
+s->conn = nbd_client_connection_new(s->saddr, false, NULL, NULL, NULL);
 
 /*
  * establish TCP connection, return error if it fails
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index b45a0bd5f6..ae4a77f826 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -30,14 +30,19 @@
 #include "qapi/clone-visitor.h"
 
 struct NBDClientConnection {
-/* Initialization constants */
+/* Initialization constants, never change */
 SocketAddress *saddr; /* address to connect to */
+QCryptoTLSCreds *tlscreds;
+NBDExportInfo initial_info;
+bool do_negotiation;
 
 /*
  * Result of last attempt. Valid in FAIL and SUCCESS states.
  * If you want to steal error, don't forget to set pointer to NULL.
  */
+NBDExportInfo updated_info;
 QIOChannelSocket *sioc;
+QIOChannel *ioc;
 Error *err;
 
 QemuMutex mutex;
@@ -47,12 +52,25 @@ struct NBDClientConnection {
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 };
 
-NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr)
+NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr,
+   bool do_negotiation,
+   const char *export_name,
+   const char *x_dirty_bitmap,
+   QCryptoTLSCreds *tlscreds)
 {
 NBDClientConnection *conn = g_new(NBDClientConnection, 1);
 
+object_ref(OBJECT(tlscreds));
 *conn = (NBDClientConnection) {
 .saddr = QAPI_CLONE(SocketAddress, saddr),
+.tlscreds = tlscreds,
+.do_negotiation = do_negotiation,
+
+.initial_info.request_sizes = true,
+.initial_info.structured_reply = true,
+.initial_info.base_allocation = true,
+.initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
+.initial_info.name = g_strdup(export_name ?: "")
 };
 
 qemu_mutex_init(>mutex);
@@ -68,9 +86,59 @@ static void 
nbd_client_connection_do_free(NBDClientConnection *conn)
 }
 error_free(conn->err);
 qapi_free_SocketAddress(conn->saddr);
+object_unref(OBJECT(conn->tlscreds));
+g_free(conn->initial_info.x_dirty_bitmap);
+g_free(conn->initial_info.name);
 g_free(conn);
 }
 
+/*
+ * Connect to @addr and do NBD negotiation if @info is not null. If @tlscreds
+ * are given, @outioc is provided. @outioc is provided only on success. The
+ * call may be cancelled in parallel by simply qio_channel_shutdown(sioc).
+ */
+static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
+   NBDExportInfo *info, QCryptoTLSCreds *tlscreds,
+   QIOChannel **outioc, Error **errp)
+{
+int ret;
+
+if (outioc) {
+*outioc = NULL;
+}
+
+ret = 

[PATCH v3 32/33] block/nbd: safer transition to receiving request

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
req->receiving is a flag indicating that a request is at one concrete yield
point in nbd_co_do_receive_one_chunk().

Such a boolean flag is always better unset before scheduling the
coroutine, to avoid double scheduling. So, let's be more careful.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6cc563e13d..03391bb231 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -152,6 +152,7 @@ static void nbd_recv_coroutines_wake_all(BDRVNBDState *s)
 NBDClientRequest *req = >requests[i];
 
 if (req->coroutine && req->receiving) {
+req->receiving = false;
 aio_co_wake(req->coroutine);
 }
 }
@@ -552,6 +553,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
  *   connection_co happens through a bottom half, which can only
  *   run after we yield.
  */
+s->requests[i].receiving = false;
 aio_co_wake(s->requests[i].coroutine);
 qemu_coroutine_yield();
 }
@@ -614,7 +616,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 if (qiov) {
 qio_channel_set_cork(s->ioc, true);
 rc = nbd_send_request(s->ioc, request);
-if (nbd_clinet_connected(s) && rc >= 0) {
+if (nbd_client_connected(s) && rc >= 0) {
 if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
NULL) < 0) {
 rc = -EIO;
@@ -938,7 +940,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 /* Wait until we're woken up by nbd_connection_entry.  */
 s->requests[i].receiving = true;
 qemu_coroutine_yield();
-s->requests[i].receiving = false;
+assert(!s->requests[i].receiving);
 if (!nbd_client_connected(s)) {
 error_setg(errp, "Connection closed");
 return -EIO;
-- 
2.29.2




[PATCH v3 30/33] block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
The only remaining step needed to reuse the function is a coroutine
wrapper: nbd_open() may be called from a non-coroutine context. So, generate
the wrapper and use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/coroutines.h |   6 +++
 block/nbd.c| 101 ++---
 2 files changed, 10 insertions(+), 97 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 4cfb4946e6..514d169d23 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -66,4 +66,10 @@ int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
 int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
 QEMUIOVector *qiov, int64_t pos);
 
+int generated_co_wrapper
+nbd_do_establish_connection(BlockDriverState *bs, Error **errp);
+int coroutine_fn
+nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp);
+
+
 #endif /* BLOCK_COROUTINES_INT_H */
diff --git a/block/nbd.c b/block/nbd.c
index 863d950abd..3b31941a83 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,7 @@
 #include "block/qdict.h"
 #include "block/nbd.h"
 #include "block/block_int.h"
+#include "block/coroutines.h"
 
 #include "monitor/monitor.h"
 
@@ -101,11 +102,6 @@ typedef struct BDRVNBDState {
 NBDClientConnection *conn;
 } BDRVNBDState;
 
-static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs,
-  SocketAddress *saddr,
-  Error **errp);
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
@@ -361,7 +357,7 @@ static int nbd_handle_updated_info(BlockDriverState *bs, 
Error **errp)
 return 0;
 }
 
-static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp)
+int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int ret;
@@ -1577,83 +1573,6 @@ static void nbd_client_close(BlockDriverState *bs)
 nbd_teardown_connection(bs);
 }
 
-static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs,
-  SocketAddress *saddr,
-  Error **errp)
-{
-ERRP_GUARD();
-QIOChannelSocket *sioc;
-
-sioc = qio_channel_socket_new();
-qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
-
-qio_channel_socket_connect_sync(sioc, saddr, errp);
-if (*errp) {
-object_unref(OBJECT(sioc));
-return NULL;
-}
-
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), nbd_yank, 
bs);
-qio_channel_set_delay(QIO_CHANNEL(sioc), false);
-
-return sioc;
-}
-
-/* nbd_client_handshake takes ownership on sioc. */
-static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
-Error **errp)
-{
-BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-AioContext *aio_context = bdrv_get_aio_context(bs);
-int ret;
-
-trace_nbd_client_handshake(s->export);
-qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-qio_channel_attach_aio_context(QIO_CHANNEL(sioc), aio_context);
-
-s->info.request_sizes = true;
-s->info.structured_reply = true;
-s->info.base_allocation = true;
-s->info.x_dirty_bitmap = g_strdup(s->x_dirty_bitmap);
-s->info.name = g_strdup(s->export ?: "");
-ret = nbd_receive_negotiate(aio_context, QIO_CHANNEL(sioc), s->tlscreds,
-s->hostname, >ioc, >info, errp);
-g_free(s->info.x_dirty_bitmap);
-g_free(s->info.name);
-if (ret < 0) {
-yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
- nbd_yank, bs);
-object_unref(OBJECT(sioc));
-return ret;
-}
-
-if (s->ioc) {
-/* sioc is referenced by s->ioc */
-object_unref(OBJECT(sioc));
-} else {
-s->ioc = QIO_CHANNEL(sioc);
-}
-sioc = NULL;
-
-ret = nbd_handle_updated_info(bs, errp);
-if (ret < 0) {
-/*
- * We have connected, but must fail for other reasons.
- * Send NBD_CMD_DISC as a courtesy to the server.
- */
-NBDRequest request = { .type = NBD_CMD_DISC };
-
-nbd_send_request(s->ioc, );
-
-yank_unregister_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
- nbd_yank, bs);
-object_unref(OBJECT(s->ioc));
-s->ioc = NULL;
-return ret;
-}
-
-return 0;
-}
 
 /*
  * Parse nbd_open options
@@ -2037,7 +1956,6 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 {
 int ret;
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-QIOChannelSocket *sioc;
 
 s->bs = bs;
 qemu_co_mutex_init(>send_mutex);
@@ -2056,22 



[PATCH v3 22/33] block/nbd: pass monitor directly to connection thread

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
monitor_cur() is used by socket_get_fd(), but it doesn't work in the
connection thread. Let's pass the monitor directly to cover this. We are
going to unify the connection-establishing paths of nbd_open() and reconnect,
so we should support fd-passing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  3 ++-
 block/nbd.c |  5 -
 nbd/client-connection.c | 11 +++
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5bb54d831c..10756d2544 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -415,7 +415,8 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
bool do_negotiation,
const char *export_name,
const char *x_dirty_bitmap,
-   QCryptoTLSCreds *tlscreds);
+   QCryptoTLSCreds *tlscreds,
+   Monitor *mon);
 void nbd_client_connection_release(NBDClientConnection *conn);
 
 QIOChannelSocket *coroutine_fn
diff --git a/block/nbd.c b/block/nbd.c
index c1e61a2a52..ec69a4ad65 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -45,6 +45,8 @@
 #include "block/nbd.h"
 #include "block/block_int.h"
 
+#include "monitor/monitor.h"
+
 #include "qemu/yank.h"
 
 #define EN_OPTSTR ":exportname="
@@ -2064,7 +2066,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 }
 
 s->conn = nbd_client_connection_new(s->saddr, true, s->export,
-s->x_dirty_bitmap, s->tlscreds);
+s->x_dirty_bitmap, s->tlscreds,
+monitor_cur());
 
 /*
  * establish TCP connection, return error if it fails
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 54f73c6c24..c26cd59464 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -37,6 +37,7 @@ struct NBDClientConnection {
 bool do_negotiation;
 
 bool do_retry;
+Monitor *mon;
 
 /*
  * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -67,7 +68,8 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
bool do_negotiation,
const char *export_name,
const char *x_dirty_bitmap,
-   QCryptoTLSCreds *tlscreds)
+   QCryptoTLSCreds *tlscreds,
+   Monitor *mon)
 {
 NBDClientConnection *conn = g_new(NBDClientConnection, 1);
 
@@ -76,6 +78,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .saddr = QAPI_CLONE(SocketAddress, saddr),
 .tlscreds = tlscreds,
 .do_negotiation = do_negotiation,
+.mon = mon,
 
 .initial_info.request_sizes = true,
 .initial_info.structured_reply = true,
@@ -110,7 +113,7 @@ static void 
nbd_client_connection_do_free(NBDClientConnection *conn)
  */
 static int nbd_connect(QIOChannelSocket *sioc, SocketAddress *addr,
NBDExportInfo *info, QCryptoTLSCreds *tlscreds,
-   QIOChannel **outioc, Error **errp)
+   QIOChannel **outioc, Monitor *mon, Error **errp)
 {
 int ret;
 
@@ -118,7 +121,7 @@ static int nbd_connect(QIOChannelSocket *sioc, 
SocketAddress *addr,
 *outioc = NULL;
 }
 
-ret = qio_channel_socket_connect_sync(sioc, addr, errp);
+ret = qio_channel_socket_connect_sync_mon(sioc, addr, mon, errp);
 if (ret < 0) {
 return ret;
 }
@@ -171,7 +174,7 @@ static void *connect_thread_func(void *opaque)
 
 ret = nbd_connect(conn->sioc, conn->saddr,
   conn->do_negotiation ? >updated_info : NULL,
-  conn->tlscreds, >ioc, >err);
+  conn->tlscreds, >ioc, conn->mon, >err);
 conn->updated_info.x_dirty_bitmap = NULL;
 conn->updated_info.name = NULL;
 
-- 
2.29.2




[PATCH v3 20/33] block/nbd: use negotiation of NBDClientConnection

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Use the new possibility of negotiation in the connect thread. We are now on
the way to simplifying connection_co: we want to move the whole
reconnect code to NBDClientConnection. NBDClientConnection is already
updated to support negotiation and retry; use the first of these now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 44 ++--
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 03ffe95231..c1e61a2a52 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -365,6 +365,7 @@ static int nbd_handle_updated_info(BlockDriverState *bs, 
Error **errp)
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
+AioContext *aio_context = bdrv_get_aio_context(s->bs);
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -405,30 +406,44 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-s->sioc = nbd_co_establish_connection(s->conn, NULL, NULL, NULL);
+s->sioc = nbd_co_establish_connection(s->conn, >info, >ioc, NULL);
 if (!s->sioc) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
+qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
+qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);
+if (s->ioc) {
+qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
+qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);
+} else {
+s->ioc = QIO_CHANNEL(s->sioc);
+object_ref(OBJECT(s->ioc));
+}
+
 yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
s->bs);
 
-bdrv_dec_in_flight(s->bs);
+ret = nbd_handle_updated_info(s->bs, NULL);
+if (ret < 0) {
+/*
+ * We have connected, but must fail for other reasons.
+ * Send NBD_CMD_DISC as a courtesy to the server.
+ */
+NBDRequest request = { .type = NBD_CMD_DISC };
 
-ret = nbd_client_handshake(s->bs, NULL);
+nbd_send_request(s->ioc, );
 
-if (s->drained) {
-s->wait_drained_end = true;
-while (s->drained) {
-/*
- * We may be entered once from nbd_client_attach_aio_context_bh
- * and then from nbd_client_co_drain_end. So here is a loop.
- */
-qemu_coroutine_yield();
-}
+yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
+ nbd_yank, s->bs);
+object_unref(OBJECT(s->sioc));
+s->sioc = NULL;
+object_unref(OBJECT(s->ioc));
+s->ioc = NULL;
+
+return;
 }
-bdrv_inc_in_flight(s->bs);
 
 out:
 if (ret >= 0) {
@@ -2048,7 +2063,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto fail;
 }
 
-s->conn = nbd_client_connection_new(s->saddr, false, NULL, NULL, NULL);
+s->conn = nbd_client_connection_new(s->saddr, true, s->export,
+s->x_dirty_bitmap, s->tlscreds);
 
 /*
  * establish TCP connection, return error if it fails
-- 
2.29.2




[PATCH v3 11/33] block/nbd: rename NBDConnectThread to NBDClientConnection

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We are going to move the connection code to its own file and want clear names
and APIs.

The structure is shared between the user and (possibly) several runs of the
connect thread, so it's wrong to call it a "thread". Let's rename it to
something more generic.

Rename the connect_thread and thr variables to conn accordingly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 137 ++--
 1 file changed, 68 insertions(+), 69 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index dab73bdf3b..9ce6a323eb 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,7 +66,7 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef struct NBDConnectThread {
+typedef struct NBDClientConnection {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
 
@@ -82,7 +82,7 @@ typedef struct NBDConnectThread {
 bool running; /* thread is running now */
 bool detached; /* thread is detached and should cleanup the state */
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
-} NBDConnectThread;
+} NBDClientConnection;
 
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
@@ -115,34 +115,34 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-NBDConnectThread *connect_thread;
+NBDClientConnection *conn;
 } BDRVNBDState;
 
-static void nbd_free_connect_thread(NBDConnectThread *thr);
+static void nbd_free_connect_thread(NBDClientConnection *conn);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
 static coroutine_fn QIOChannelSocket *
-nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
-static void nbd_co_establish_connection_cancel(NBDConnectThread *thr);
+nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
+static void nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
+NBDClientConnection *conn = s->conn;
 bool do_free;
 
-qemu_mutex_lock(>mutex);
-if (thr->running) {
-thr->detached = true;
+qemu_mutex_lock(>mutex);
+if (conn->running) {
+conn->detached = true;
 }
-do_free = !thr->running && !thr->detached;
-qemu_mutex_unlock(>mutex);
+do_free = !conn->running && !conn->detached;
+qemu_mutex_unlock(>mutex);
 
 /* the runaway thread will clean it up itself */
 if (do_free) {
-nbd_free_connect_thread(thr);
+nbd_free_connect_thread(conn);
 }
 
 yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
@@ -286,7 +286,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(s->connect_thread);
+nbd_co_establish_connection_cancel(s->conn);
 
 reconnect_delay_timer_del(s);
 
@@ -326,7 +326,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(s->connect_thread);
+nbd_co_establish_connection_cancel(s->conn);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -353,101 +353,101 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 
 static void nbd_init_connect_thread(BDRVNBDState *s)
 {
-s->connect_thread = g_new(NBDConnectThread, 1);
+s->conn = g_new(NBDClientConnection, 1);
 
-*s->connect_thread = (NBDConnectThread) {
+*s->conn = (NBDClientConnection) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
 };
 
-qemu_mutex_init(>connect_thread->mutex);
+qemu_mutex_init(>conn->mutex);
 }
 
-static void nbd_free_connect_thread(NBDConnectThread *thr)
+static void nbd_free_connect_thread(NBDClientConnection *conn)
 {
-if (thr->sioc) {
-qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
-object_unref(OBJECT(thr->sioc));
+if (conn->sioc) {
+qio_channel_close(QIO_CHANNEL(conn->sioc), NULL);
+object_unref(OBJECT(conn->sioc));
 }
-error_free(thr->err);
-qapi_free_SocketAddress(thr->saddr);
-g_free(thr);
+error_free(conn->err);
+qapi_free_SocketAddress(conn->saddr);
+g_free(conn);
 }
 
 static void *connect_thread_func(void *opaque)
 {
-NBDConnectThread *thr = opaque;
+NBDClientConnection *conn = opaque;
 int ret;
 bool do_free = false;
 
-thr->sioc = qio_channel_socket_new();
+conn->sioc = qio_channel_socket_new();
 
-error_free(thr->err);
-thr->err = NULL;
-

[PATCH v3 19/33] block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
To be reused in the following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 99 ++---
 1 file changed, 57 insertions(+), 42 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5e63caaf4b..03ffe95231 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -318,6 +318,50 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
+/*
+ * Check s->info updated by negotiation process.
+ * Update @bs correspondingly to new options.
+ */
+static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp)
+{
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+int ret;
+
+if (s->x_dirty_bitmap) {
+if (!s->info.base_allocation) {
+error_setg(errp, "requested x-dirty-bitmap %s not found",
+   s->x_dirty_bitmap);
+return -EINVAL;
+}
+if (strcmp(s->x_dirty_bitmap, "qemu:allocation-depth") == 0) {
+s->alloc_depth = true;
+}
+}
+
+if (s->info.flags & NBD_FLAG_READ_ONLY) {
+ret = bdrv_apply_auto_read_only(bs, "NBD export is read-only", errp);
+if (ret < 0) {
+return ret;
+}
+}
+
+if (s->info.flags & NBD_FLAG_SEND_FUA) {
+bs->supported_write_flags = BDRV_REQ_FUA;
+bs->supported_zero_flags |= BDRV_REQ_FUA;
+}
+
+if (s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
+bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
+if (s->info.flags & NBD_FLAG_SEND_FAST_ZERO) {
+bs->supported_zero_flags |= BDRV_REQ_NO_FALLBACK;
+}
+}
+
+trace_nbd_client_handshake_success(s->export);
+
+return 0;
+}
+
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
@@ -1579,49 +1623,13 @@ static int nbd_client_handshake(BlockDriverState *bs, 
Error **errp)
 s->sioc = NULL;
 return ret;
 }
-if (s->x_dirty_bitmap) {
-if (!s->info.base_allocation) {
-error_setg(errp, "requested x-dirty-bitmap %s not found",
-   s->x_dirty_bitmap);
-ret = -EINVAL;
-goto fail;
-}
-if (strcmp(s->x_dirty_bitmap, "qemu:allocation-depth") == 0) {
-s->alloc_depth = true;
-}
-}
-if (s->info.flags & NBD_FLAG_READ_ONLY) {
-ret = bdrv_apply_auto_read_only(bs, "NBD export is read-only", errp);
-if (ret < 0) {
-goto fail;
-}
-}
-if (s->info.flags & NBD_FLAG_SEND_FUA) {
-bs->supported_write_flags = BDRV_REQ_FUA;
-bs->supported_zero_flags |= BDRV_REQ_FUA;
-}
-if (s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
-bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
-if (s->info.flags & NBD_FLAG_SEND_FAST_ZERO) {
-bs->supported_zero_flags |= BDRV_REQ_NO_FALLBACK;
-}
-}
 
-if (!s->ioc) {
-s->ioc = QIO_CHANNEL(s->sioc);
-object_ref(OBJECT(s->ioc));
-}
-
-trace_nbd_client_handshake_success(s->export);
-
-return 0;
-
- fail:
-/*
- * We have connected, but must fail for other reasons.
- * Send NBD_CMD_DISC as a courtesy to the server.
- */
-{
+ret = nbd_handle_updated_info(bs, errp);
+if (ret < 0) {
+/*
+ * We have connected, but must fail for other reasons.
+ * Send NBD_CMD_DISC as a courtesy to the server.
+ */
 NBDRequest request = { .type = NBD_CMD_DISC };
 
nbd_send_request(s->ioc ?: QIO_CHANNEL(s->sioc), &request);
@@ -1635,6 +1643,13 @@ static int nbd_client_handshake(BlockDriverState *bs, 
Error **errp)
 
 return ret;
 }
+
+if (!s->ioc) {
+s->ioc = QIO_CHANNEL(s->sioc);
+object_ref(OBJECT(s->ioc));
+}
+
+return 0;
 }
 
 /*
-- 
2.29.2




[PATCH v3 31/33] block/nbd: add nbd_client_connected() helper

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We already have two similar helpers for other state. Let's add another
one for convenience.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 3b31941a83..6cc563e13d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -124,15 +124,20 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 s->x_dirty_bitmap = NULL;
 }
 
+static bool nbd_client_connected(BDRVNBDState *s)
+{
+return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
+}
+
 static void nbd_channel_error(BDRVNBDState *s, int ret)
 {
 if (ret == -EIO) {
-if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
+if (nbd_client_connected(s)) {
 s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
 NBD_CLIENT_CONNECTING_NOWAIT;
 }
 } else {
-if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
+if (nbd_client_connected(s)) {
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 s->state = NBD_CLIENT_QUIT;
@@ -230,7 +235,7 @@ static void nbd_client_attach_aio_context(BlockDriverState 
*bs,
  * s->connection_co is either yielded from nbd_receive_reply or from
  * nbd_co_reconnect_loop()
  */
-if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED) {
+if (nbd_client_connected(s)) {
 qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), new_context);
 }
 
@@ -503,7 +508,7 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 nbd_co_reconnect_loop(s);
 }
 
-if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
+if (!nbd_client_connected(s)) {
 continue;
 }
 
@@ -582,7 +587,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
 }
 
-if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
+if (!nbd_client_connected(s)) {
 rc = -EIO;
 goto err;
 }
@@ -609,8 +614,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
 if (qiov) {
 qio_channel_set_cork(s->ioc, true);
 rc = nbd_send_request(s->ioc, request);
-if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED &&
-rc >= 0) {
+if (nbd_client_connected(s) && rc >= 0) {
 if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
NULL) < 0) {
 rc = -EIO;
@@ -935,7 +939,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 s->requests[i].receiving = true;
 qemu_coroutine_yield();
 s->requests[i].receiving = false;
-if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
+if (!nbd_client_connected(s)) {
 error_setg(errp, "Connection closed");
 return -EIO;
 }
@@ -1094,7 +1098,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
 NBDReply local_reply;
 NBDStructuredReplyChunk *chunk;
 Error *local_err = NULL;
-if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
+if (!nbd_client_connected(s)) {
 error_setg(_err, "Connection closed");
 nbd_iter_channel_error(iter, -EIO, _err);
 goto break_loop;
@@ -1119,8 +1123,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
 }
 
 /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-if (nbd_reply_is_simple(reply) ||
-qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTED) {
+if (nbd_reply_is_simple(reply) || !nbd_client_connected(s)) {
 goto break_loop;
 }
 
-- 
2.29.2




[PATCH v3 27/33] block/nbd: split nbd_co_do_establish_connection out of nbd_reconnect_attempt

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Split out the part that we want to reuse for nbd_open().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 79 +++--
 1 file changed, 41 insertions(+), 38 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 15b5921725..59971bfba8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -361,11 +361,49 @@ static int nbd_handle_updated_info(BlockDriverState *bs, 
Error **errp)
 return 0;
 }
 
-static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
+static int nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp)
 {
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int ret;
-AioContext *aio_context = bdrv_get_aio_context(s->bs);
 
+assert(!s->ioc);
+
+s->ioc = nbd_co_establish_connection(s->conn, &s->info, errp);
+if (!s->ioc) {
+return -ECONNREFUSED;
+}
+
+ret = nbd_handle_updated_info(s->bs, NULL);
+if (ret < 0) {
+/*
+ * We have connected, but must fail for other reasons.
+ * Send NBD_CMD_DISC as a courtesy to the server.
+ */
+NBDRequest request = { .type = NBD_CMD_DISC };
+
+nbd_send_request(s->ioc, &request);
+
+object_unref(OBJECT(s->ioc));
+s->ioc = NULL;
+
+return ret;
+}
+
+qio_channel_set_blocking(s->ioc, false, NULL);
+qio_channel_attach_aio_context(s->ioc, bdrv_get_aio_context(bs));
+
+yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
+   bs);
+
+/* successfully connected */
+s->state = NBD_CLIENT_CONNECTED;
+qemu_co_queue_restart_all(&s->free_sema);
+
+return 0;
+}
+
+static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
+{
 if (!nbd_client_connecting(s)) {
 return;
 }
@@ -403,42 +441,7 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-s->ioc = nbd_co_establish_connection(s->conn, &s->info, NULL);
-if (!s->ioc) {
-ret = -ECONNREFUSED;
-goto out;
-}
-
-qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
-qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);
-
-yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
-   s->bs);
-
-ret = nbd_handle_updated_info(s->bs, NULL);
-if (ret < 0) {
-/*
- * We have connected, but must fail for other reasons.
- * Send NBD_CMD_DISC as a courtesy to the server.
- */
-NBDRequest request = { .type = NBD_CMD_DISC };
-
-nbd_send_request(s->ioc, &request);
-
-yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
- nbd_yank, s->bs);
-object_unref(OBJECT(s->ioc));
-s->ioc = NULL;
-
-return;
-}
-
-out:
-if (ret >= 0) {
-/* successfully connected */
-s->state = NBD_CLIENT_CONNECTED;
-qemu_co_queue_restart_all(&s->free_sema);
-}
+nbd_co_do_establish_connection(s->bs, NULL);
 }
 
 static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState *s)
-- 
2.29.2




[PATCH v3 24/33] block/nbd: drop BDRVNBDState::sioc

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Currently the sioc pointer is used just to pass the socket connection to
NBD negotiation. Drop the field and use local variables instead. With the
next commit we'll update nbd/client-connection.c to behave
appropriately (return only the top-most ioc, not two channels).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 98 ++---
 1 file changed, 48 insertions(+), 50 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index cece53313c..f9b56c57b4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -69,8 +69,7 @@ typedef enum NBDClientState {
 } NBDClientState;
 
 typedef struct BDRVNBDState {
-QIOChannelSocket *sioc; /* The master data channel */
-QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
+QIOChannel *ioc; /* The current I/O channel */
 NBDExportInfo info;
 
 CoMutex send_mutex;
@@ -102,9 +101,11 @@ typedef struct BDRVNBDState {
 NBDClientConnection *conn;
 } BDRVNBDState;
 
-static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
-Error **errp);
-static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
+static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs,
+  SocketAddress *saddr,
+  Error **errp);
+static int nbd_client_handshake(BlockDriverState *bs, QIOChannelSocket *sioc,
+Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
@@ -364,6 +365,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 {
 int ret;
 AioContext *aio_context = bdrv_get_aio_context(s->bs);
+QIOChannelSocket *sioc;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -398,27 +400,26 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
 yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
  nbd_yank, s->bs);
-object_unref(OBJECT(s->sioc));
-s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
 s->ioc = NULL;
 }
 
-s->sioc = nbd_co_establish_connection(s->conn, &s->info, &s->ioc, NULL);
-if (!s->sioc) {
+sioc = nbd_co_establish_connection(s->conn, &s->info, &s->ioc, NULL);
+if (!sioc) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
-qio_channel_set_blocking(QIO_CHANNEL(s->sioc), false, NULL);
-qio_channel_attach_aio_context(QIO_CHANNEL(s->sioc), aio_context);
 if (s->ioc) {
-qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
-qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);
+/* sioc is referenced by s->ioc */
+object_unref(OBJECT(sioc));
 } else {
-s->ioc = QIO_CHANNEL(s->sioc);
-object_ref(OBJECT(s->ioc));
+s->ioc = QIO_CHANNEL(sioc);
 }
+sioc = NULL;
+
+qio_channel_set_blocking(QIO_CHANNEL(s->ioc), false, NULL);
+qio_channel_attach_aio_context(QIO_CHANNEL(s->ioc), aio_context);
 
 yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
s->bs);
@@ -435,8 +436,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 
 yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
  nbd_yank, s->bs);
-object_unref(OBJECT(s->sioc));
-s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
 s->ioc = NULL;
 
@@ -571,8 +570,6 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
 qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
 yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
  nbd_yank, s->bs);
-object_unref(OBJECT(s->sioc));
-s->sioc = NULL;
 object_unref(OBJECT(s->ioc));
 s->ioc = NULL;
 }
@@ -1571,7 +1568,7 @@ static void nbd_yank(void *opaque)
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
qatomic_store_release(&s->state, NBD_CLIENT_QUIT);
-qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH, 
NULL);
+qio_channel_shutdown(QIO_CHANNEL(s->ioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 
 static void nbd_client_close(BlockDriverState *bs)
@@ -1586,57 +1583,64 @@ static void nbd_client_close(BlockDriverState *bs)
 nbd_teardown_connection(bs);
 }
 
-static int nbd_establish_connection(BlockDriverState *bs,
-SocketAddress *saddr,
-Error **errp)
+static QIOChannelSocket *nbd_establish_connection(BlockDriverState *bs,
+  SocketAddress *saddr,
+  Error **errp)
 {
 ERRP_GUARD();
-BDRVNBDState *s = 

[PATCH v3 13/33] block/nbd: introduce nbd_client_connection_release()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 43 ++-
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 21a4039359..8531d019b2 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -118,7 +118,7 @@ typedef struct BDRVNBDState {
 NBDClientConnection *conn;
 } BDRVNBDState;
 
-static void nbd_free_connect_thread(NBDClientConnection *conn);
+static void nbd_client_connection_release(NBDClientConnection *conn);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
 static coroutine_fn QIOChannelSocket *
@@ -130,20 +130,9 @@ static void nbd_yank(void *opaque);
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDClientConnection *conn = s->conn;
-bool do_free;
-
-qemu_mutex_lock(&conn->mutex);
-if (conn->running) {
-conn->detached = true;
-}
-do_free = !conn->running && !conn->detached;
-qemu_mutex_unlock(&conn->mutex);
 
-/* the runaway thread will clean it up itself */
-if (do_free) {
-nbd_free_connect_thread(conn);
-}
+nbd_client_connection_release(s->conn);
+s->conn = NULL;
 
 yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
@@ -365,7 +354,7 @@ nbd_client_connection_new(const SocketAddress *saddr)
 return conn;
 }
 
-static void nbd_free_connect_thread(NBDClientConnection *conn)
+static void nbd_client_connection_do_free(NBDClientConnection *conn)
 {
 if (conn->sioc) {
 qio_channel_close(QIO_CHANNEL(conn->sioc), NULL);
@@ -379,8 +368,8 @@ static void nbd_free_connect_thread(NBDClientConnection 
*conn)
 static void *connect_thread_func(void *opaque)
 {
 NBDClientConnection *conn = opaque;
+bool do_free;
 int ret;
-bool do_free = false;
 
 conn->sioc = qio_channel_socket_new();
 
@@ -405,12 +394,32 @@ static void *connect_thread_func(void *opaque)
qemu_mutex_unlock(&conn->mutex);
 
 if (do_free) {
-nbd_free_connect_thread(conn);
+nbd_client_connection_do_free(conn);
 }
 
 return NULL;
 }
 
+static void nbd_client_connection_release(NBDClientConnection *conn)
+{
+bool do_free;
+
+if (!conn) {
+return;
+}
+
+qemu_mutex_lock(&conn->mutex);
+if (conn->running) {
+conn->detached = true;
+}
+do_free = !conn->running && !conn->detached;
+qemu_mutex_unlock(&conn->mutex);
+
+if (do_free) {
+nbd_client_connection_do_free(conn);
+}
+}
+
 /*
  * Get a new connection in context of @conn:
  *   if thread is running, wait for completion
-- 
2.29.2




[PATCH v3 21/33] qemu-socket: pass monitor link to socket_get_fd directly

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Detecting the monitor via the current coroutine works badly when we are not
in coroutine context, and that is exactly the case in the NBD reconnect
code, where qio_channel_socket_connect_sync() is called from a thread.

Add the possibility to pass the monitor by hand, to be used in the
following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/io/channel-socket.h| 20 
 include/qemu/sockets.h |  2 +-
 io/channel-socket.c| 17 +
 tests/unit/test-util-sockets.c | 16 
 util/qemu-sockets.c| 10 +-
 5 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..6d0915420d 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -78,6 +78,23 @@ qio_channel_socket_new_fd(int fd,
   Error **errp);
 
 
+/**
+ * qio_channel_socket_connect_sync_mon:
+ * @ioc: the socket channel object
+ * @addr: the address to connect to
+ * @mon: current monitor. If NULL, it will be detected by
+ *   current coroutine.
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Attempt to connect to the address @addr. This method
+ * will run in the foreground so the caller will not regain
+ * execution control until the connection is established or
+ * an error occurs.
+ */
+int qio_channel_socket_connect_sync_mon(QIOChannelSocket *ioc,
+SocketAddress *addr,
+Monitor *mon,
+Error **errp);
 /**
  * qio_channel_socket_connect_sync:
  * @ioc: the socket channel object
@@ -88,6 +105,9 @@ qio_channel_socket_new_fd(int fd,
  * will run in the foreground so the caller will not regain
  * execution control until the connection is established or
  * an error occurs.
+ *
+ * This is a wrapper, calling qio_channel_socket_connect_sync_mon()
+ * with @mon=NULL.
  */
 int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
 SocketAddress *addr,
diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 7d1f813576..cdf6f2413b 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -41,7 +41,7 @@ int unix_listen(const char *path, Error **errp);
 int unix_connect(const char *path, Error **errp);
 
 SocketAddress *socket_parse(const char *str, Error **errp);
-int socket_connect(SocketAddress *addr, Error **errp);
+int socket_connect(SocketAddress *addr, Monitor *mon, Error **errp);
 int socket_listen(SocketAddress *addr, int num, Error **errp);
 void socket_listen_cleanup(int fd, Error **errp);
 int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index de259f7eed..9dc42ca29d 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -135,14 +135,15 @@ qio_channel_socket_new_fd(int fd,
 }
 
 
-int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
-SocketAddress *addr,
-Error **errp)
+int qio_channel_socket_connect_sync_mon(QIOChannelSocket *ioc,
+SocketAddress *addr,
+Monitor *mon,
+Error **errp)
 {
 int fd;
 
 trace_qio_channel_socket_connect_sync(ioc, addr);
-fd = socket_connect(addr, errp);
+fd = socket_connect(addr, mon, errp);
 if (fd < 0) {
 trace_qio_channel_socket_connect_fail(ioc);
 return -1;
@@ -158,6 +159,14 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
 }
 
 
+int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
+SocketAddress *addr,
+Error **errp)
+{
+return qio_channel_socket_connect_sync_mon(ioc, addr, NULL, errp);
+}
+
+
 static void qio_channel_socket_connect_worker(QIOTask *task,
   gpointer opaque)
 {
diff --git a/tests/unit/test-util-sockets.c b/tests/unit/test-util-sockets.c
index 72b9246529..d902ecede7 100644
--- a/tests/unit/test-util-sockets.c
+++ b/tests/unit/test-util-sockets.c
@@ -90,7 +90,7 @@ static void test_socket_fd_pass_name_good(void)
 addr.type = SOCKET_ADDRESS_TYPE_FD;
 addr.u.fd.str = g_strdup(mon_fdname);
 
-fd = socket_connect(&addr, &error_abort);
+fd = socket_connect(&addr, NULL, &error_abort);
 g_assert_cmpint(fd, !=, -1);
 g_assert_cmpint(fd, !=, mon_fd);
 close(fd);
@@ -122,7 +122,7 @@ static void test_socket_fd_pass_name_bad(void)
 addr.type = SOCKET_ADDRESS_TYPE_FD;
 addr.u.fd.str = g_strdup(mon_fdname);
 
-fd = socket_connect(&addr, &err);
+fd = socket_connect(&addr, NULL, &err);
 g_assert_cmpint(fd, ==, -1);
error_free_or_abort(&err);
 
@@ -149,7 +149,7 @@ static void test_socket_fd_pass_name_nomon(void)
 addr.type = SOCKET_ADDRESS_TYPE_FD;
 addr.u.fd.str = 

[PATCH v3 18/33] nbd/client-connection: shutdown connection on release

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Now that the thread can do negotiation and retry, it may run for a
relatively long time. We need a mechanism to stop it when the user is no
longer interested in the result. So, on nbd_client_connection_release(),
shut down the socket, and do not retry the connection if the thread is
detached.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 002bd91f42..54f73c6c24 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -158,9 +158,13 @@ static void *connect_thread_func(void *opaque)
 uint64_t timeout = 1;
 uint64_t max_timeout = 16;
 
-while (true) {
+qemu_mutex_lock(&conn->mutex);
+while (!conn->detached) {
+assert(!conn->sioc);
 conn->sioc = qio_channel_socket_new();
 
+qemu_mutex_unlock(&conn->mutex);
+
 error_free(conn->err);
 conn->err = NULL;
 conn->updated_info = conn->initial_info;
@@ -171,14 +175,20 @@ static void *connect_thread_func(void *opaque)
 conn->updated_info.x_dirty_bitmap = NULL;
 conn->updated_info.name = NULL;
 
+qemu_mutex_lock(&conn->mutex);
+
 if (ret < 0) {
 object_unref(OBJECT(conn->sioc));
 conn->sioc = NULL;
-if (conn->do_retry) {
+if (conn->do_retry && !conn->detached) {
+qemu_mutex_unlock(&conn->mutex);
+
 sleep(timeout);
 if (timeout < max_timeout) {
 timeout *= 2;
 }
+
+qemu_mutex_lock(&conn->mutex);
 continue;
 }
 }
@@ -186,15 +196,17 @@ static void *connect_thread_func(void *opaque)
 break;
 }
 
-WITH_QEMU_LOCK_GUARD(&conn->mutex) {
-assert(conn->running);
-conn->running = false;
-if (conn->wait_co) {
-aio_co_schedule(NULL, conn->wait_co);
-conn->wait_co = NULL;
-}
-do_free = conn->detached;
+/* mutex is locked */
+
+assert(conn->running);
+conn->running = false;
+if (conn->wait_co) {
+aio_co_schedule(NULL, conn->wait_co);
+conn->wait_co = NULL;
 }
+do_free = conn->detached;
+
+qemu_mutex_unlock(&conn->mutex);
 
 if (do_free) {
 nbd_client_connection_do_free(conn);
@@ -215,6 +227,10 @@ void nbd_client_connection_release(NBDClientConnection 
*conn)
 if (conn->running) {
 conn->detached = true;
 }
+if (conn->sioc) {
+qio_channel_shutdown(QIO_CHANNEL(conn->sioc),
+ QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+}
 do_free = !conn->running && !conn->detached;
qemu_mutex_unlock(&conn->mutex);
 
-- 
2.29.2




[PATCH v3 14/33] nbd: move connection code from block/nbd to nbd/client-connection

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We now have a bs-independent connection API, which consists of four
functions:

  nbd_client_connection_new()
  nbd_client_connection_release()
  nbd_co_establish_connection()
  nbd_co_establish_connection_cancel()

Move them to a separate file together with NBDClientConnection
structure which becomes private to the new API.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h |  11 +++
 block/nbd.c | 187 ---
 nbd/client-connection.c | 212 
 nbd/meson.build |   1 +
 4 files changed, 224 insertions(+), 187 deletions(-)
 create mode 100644 nbd/client-connection.c

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5f34d23bb0..57381be76f 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -406,4 +406,15 @@ const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 const char *nbd_err_lookup(int err);
 
+/* nbd/client-connection.c */
+typedef struct NBDClientConnection NBDClientConnection;
+
+NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr);
+void nbd_client_connection_release(NBDClientConnection *conn);
+
+QIOChannelSocket *coroutine_fn
+nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
+
+void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection 
*conn);
+
 #endif
diff --git a/block/nbd.c b/block/nbd.c
index 8531d019b2..9bd68dcf10 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,24 +66,6 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef struct NBDClientConnection {
-/* Initialization constants */
-SocketAddress *saddr; /* address to connect to */
-
-/*
- * Result of last attempt. Valid in FAIL and SUCCESS states.
- * If you want to steal error, don't forget to set pointer to NULL.
- */
-QIOChannelSocket *sioc;
-Error *err;
-
-QemuMutex mutex;
-/* All further fields are protected by mutex */
-bool running; /* thread is running now */
-bool detached; /* thread is detached and should cleanup the state */
-Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
-} NBDClientConnection;
-
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -118,12 +100,8 @@ typedef struct BDRVNBDState {
 NBDClientConnection *conn;
 } BDRVNBDState;
 
-static void nbd_client_connection_release(NBDClientConnection *conn);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
-static coroutine_fn QIOChannelSocket *
-nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
-static void nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
@@ -340,171 +318,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static NBDClientConnection *
-nbd_client_connection_new(const SocketAddress *saddr)
-{
-NBDClientConnection *conn = g_new(NBDClientConnection, 1);
-
-*conn = (NBDClientConnection) {
-.saddr = QAPI_CLONE(SocketAddress, saddr),
-};
-
-qemu_mutex_init(&conn->mutex);
-
-return conn;
-}
-
-static void nbd_client_connection_do_free(NBDClientConnection *conn)
-{
-if (conn->sioc) {
-qio_channel_close(QIO_CHANNEL(conn->sioc), NULL);
-object_unref(OBJECT(conn->sioc));
-}
-error_free(conn->err);
-qapi_free_SocketAddress(conn->saddr);
-g_free(conn);
-}
-
-static void *connect_thread_func(void *opaque)
-{
-NBDClientConnection *conn = opaque;
-bool do_free;
-int ret;
-
-conn->sioc = qio_channel_socket_new();
-
-error_free(conn->err);
-conn->err = NULL;
-ret = qio_channel_socket_connect_sync(conn->sioc, conn->saddr, &conn->err);
-if (ret < 0) {
-object_unref(OBJECT(conn->sioc));
-conn->sioc = NULL;
-}
-
-qemu_mutex_lock(&conn->mutex);
-
-assert(conn->running);
-conn->running = false;
-if (conn->wait_co) {
-aio_co_schedule(NULL, conn->wait_co);
-conn->wait_co = NULL;
-}
-do_free = conn->detached;
-
-qemu_mutex_unlock(&conn->mutex);
-
-if (do_free) {
-nbd_client_connection_do_free(conn);
-}
-
-return NULL;
-}
-
-static void nbd_client_connection_release(NBDClientConnection *conn)
-{
-bool do_free;
-
-if (!conn) {
-return;
-}
-
-qemu_mutex_lock(&conn->mutex);
-if (conn->running) {
-conn->detached = true;
-}
-do_free = !conn->running && !conn->detached;
-qemu_mutex_unlock(&conn->mutex);
-
-if (do_free) {
-nbd_client_connection_do_free(conn);
-}
-}
-
-/*
- * Get a new connection in context of @conn:
- *   if thread is 

[PATCH v3 15/33] nbd/client-connection: use QEMU_LOCK_GUARD

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 nbd/client-connection.c | 94 ++---
 1 file changed, 42 insertions(+), 52 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 4e39a5b1af..b45a0bd5f6 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -87,17 +87,16 @@ static void *connect_thread_func(void *opaque)
 conn->sioc = NULL;
 }
 
-qemu_mutex_lock(&conn->mutex);
-
-assert(conn->running);
-conn->running = false;
-if (conn->wait_co) {
-aio_co_schedule(NULL, conn->wait_co);
-conn->wait_co = NULL;
+WITH_QEMU_LOCK_GUARD(&conn->mutex) {
+assert(conn->running);
+conn->running = false;
+if (conn->wait_co) {
+aio_co_schedule(NULL, conn->wait_co);
+conn->wait_co = NULL;
+}
 }
 do_free = conn->detached;
 
-qemu_mutex_unlock(&conn->mutex);
 
 if (do_free) {
 nbd_client_connection_do_free(conn);
@@ -136,61 +135,54 @@ void nbd_client_connection_release(NBDClientConnection 
*conn)
 QIOChannelSocket *coroutine_fn
 nbd_co_establish_connection(NBDClientConnection *conn, Error **errp)
 {
-QIOChannelSocket *sioc = NULL;
 QemuThread thread;
 
-qemu_mutex_lock(&conn->mutex);
-
-/*
- * Don't call nbd_co_establish_connection() in several coroutines in
- * parallel. Only one call at once is supported.
- */
-assert(!conn->wait_co);
-
-if (!conn->running) {
-if (conn->sioc) {
-/* Previous attempt finally succeeded in background */
-sioc = g_steal_pointer(&conn->sioc);
-qemu_mutex_unlock(&conn->mutex);
-
-return sioc;
+WITH_QEMU_LOCK_GUARD(&conn->mutex) {
+/*
+ * Don't call nbd_co_establish_connection() in several coroutines in
+ * parallel. Only one call at once is supported.
+ */
+assert(!conn->wait_co);
+
+if (!conn->running) {
+if (conn->sioc) {
+/* Previous attempt finally succeeded in background */
+return g_steal_pointer(&conn->sioc);
+}
+
+conn->running = true;
+error_free(conn->err);
+conn->err = NULL;
+qemu_thread_create(&thread, "nbd-connect",
+   connect_thread_func, conn, 
QEMU_THREAD_DETACHED);
 }
 
-conn->running = true;
-error_free(conn->err);
-conn->err = NULL;
-qemu_thread_create(&thread, "nbd-connect",
-   connect_thread_func, conn, QEMU_THREAD_DETACHED);
+conn->wait_co = qemu_coroutine_self();
 }
 
-conn->wait_co = qemu_coroutine_self();
-
-qemu_mutex_unlock(&conn->mutex);
-
 /*
  * We are going to wait for connect-thread finish, but
  * nbd_co_establish_connection_cancel() can interrupt.
  */
 qemu_coroutine_yield();
 
-qemu_mutex_lock(&conn->mutex);
-
-if (conn->running) {
-/*
- * Obviously, drained section wants to start. Report the attempt as
- * failed. Still connect thread is executing in background, and its
- * result may be used for next connection attempt.
- */
-error_setg(errp, "Connection attempt cancelled by other operation");
-} else {
-error_propagate(errp, conn->err);
-conn->err = NULL;
-sioc = g_steal_pointer(&conn->sioc);
+WITH_QEMU_LOCK_GUARD(&conn->mutex) {
+if (conn->running) {
+/*
+ * Obviously, drained section wants to start. Report the attempt as
+ * failed. Still connect thread is executing in background, and its
+ * result may be used for next connection attempt.
+ */
+error_setg(errp, "Connection attempt cancelled by other 
operation");
+return NULL;
+} else {
+error_propagate(errp, conn->err);
+conn->err = NULL;
+return g_steal_pointer(&conn->sioc);
+}
 }
 
-qemu_mutex_unlock(&conn->mutex);
-
-return sioc;
+abort(); /* unreachable */
 }
 
 /*
@@ -201,12 +193,10 @@ nbd_co_establish_connection(NBDClientConnection *conn, 
Error **errp)
  */
 void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn)
 {
-qemu_mutex_lock(&conn->mutex);
+QEMU_LOCK_GUARD(&conn->mutex);
 
 if (conn->wait_co) {
 aio_co_schedule(NULL, conn->wait_co);
 conn->wait_co = NULL;
 }
-
-qemu_mutex_unlock(&conn->mutex);
 }
-- 
2.29.2




[PATCH v3 23/33] block/nbd: nbd_teardown_connection() don't touch s->sioc

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Negotiation during reconnect is now done in a thread, and s->sioc is not
available during negotiation. Negotiation in the thread will be cancelled
by nbd_client_connection_release(), called from nbd_clear_bdrvstate().
So we don't need this code chunk anymore.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ec69a4ad65..cece53313c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -284,10 +284,6 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->ioc) {
 /* finish any pending coroutines */
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
-} else if (s->sioc) {
-/* abort negotiation */
-qio_channel_shutdown(QIO_CHANNEL(s->sioc), QIO_CHANNEL_SHUTDOWN_BOTH,
- NULL);
 }
 
 s->state = NBD_CLIENT_QUIT;
-- 
2.29.2




[PATCH v3 12/33] block/nbd: introduce nbd_client_connection_new()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
This is the last step in creating a bs-independent NBD connection
interface. With the next commit we can finally move it to a separate file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Roman Kagan 
---
 block/nbd.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 9ce6a323eb..21a4039359 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -351,15 +351,18 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void nbd_init_connect_thread(BDRVNBDState *s)
+static NBDClientConnection *
+nbd_client_connection_new(const SocketAddress *saddr)
 {
-s->conn = g_new(NBDClientConnection, 1);
+NBDClientConnection *conn = g_new(NBDClientConnection, 1);
 
-*s->conn = (NBDClientConnection) {
-.saddr = QAPI_CLONE(SocketAddress, s->saddr),
+*conn = (NBDClientConnection) {
+.saddr = QAPI_CLONE(SocketAddress, saddr),
 };
 
-qemu_mutex_init(&s->conn->mutex);
+qemu_mutex_init(&conn->mutex);
+
+return conn;
 }
 
 static void nbd_free_connect_thread(NBDClientConnection *conn)
@@ -2208,7 +2211,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto fail;
 }
 
-nbd_init_connect_thread(s);
+s->conn = nbd_client_connection_new(s->saddr);
 
 /*
  * establish TCP connection, return error if it fails
-- 
2.29.2




[PATCH v3 10/33] block/nbd: make nbd_co_establish_connection_cancel() bs-independent

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
nbd_co_establish_connection_cancel() actually needs only a pointer to
NBDConnectThread. So, make it take one.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Roman Kagan 
---
 block/nbd.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index dd97ea0916..dab73bdf3b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -123,7 +123,7 @@ static int nbd_establish_connection(BlockDriverState *bs, 
SocketAddress *saddr,
 Error **errp);
 static coroutine_fn QIOChannelSocket *
 nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
+static void nbd_co_establish_connection_cancel(NBDConnectThread *thr);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
@@ -286,7 +286,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(bs);
+nbd_co_establish_connection_cancel(s->connect_thread);
 
 reconnect_delay_timer_del(s);
 
@@ -326,7 +326,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(bs);
+nbd_co_establish_connection_cancel(s->connect_thread);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -477,14 +477,12 @@ nbd_co_establish_connection(NBDConnectThread *thr, Error 
**errp)
 
 /*
  * nbd_co_establish_connection_cancel
- * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to
- * allow drained section to begin.
+ * Cancel nbd_co_establish_connection() asynchronously. Note that it neither
+ * stops the thread itself nor closes the socket. It just safely wakes
+ * nbd_co_establish_connection() sleeping in the yield().
  */
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs)
+static void nbd_co_establish_connection_cancel(NBDConnectThread *thr)
 {
-BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
-
 qemu_mutex_lock(&thr->mutex);
 
 if (thr->wait_co) {
-- 
2.29.2




[PATCH v3 09/33] block/nbd: bs-independent interface for nbd_co_establish_connection()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We are going to split the connection code into a separate file. Now we
are ready to give nbd_co_establish_connection() a clean and
bs-independent interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Roman Kagan 
---
 block/nbd.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2b26a033a4..dd97ea0916 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -121,7 +121,8 @@ typedef struct BDRVNBDState {
 static void nbd_free_connect_thread(NBDConnectThread *thr);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
-static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
+static coroutine_fn QIOChannelSocket *
+nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
@@ -407,22 +408,36 @@ static void *connect_thread_func(void *opaque)
 return NULL;
 }
 
-static int coroutine_fn
-nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
+/*
+ * Get a new connection in the context of @thr:
+ *   if the thread is running, wait for its completion
+ *   if the thread has already succeeded in the background and the user has
+ * not taken the result yet, just return it now
+ *   otherwise (the thread is not running), start a thread and wait for
+ * its completion
+ */
+static coroutine_fn QIOChannelSocket *
+nbd_co_establish_connection(NBDConnectThread *thr, Error **errp)
 {
+QIOChannelSocket *sioc = NULL;
 QemuThread thread;
-BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
-
-assert(!s->sioc);
 
 qemu_mutex_lock(&thr->mutex);
 
+/*
+ * Don't call nbd_co_establish_connection() in several coroutines in
+ * parallel. Only one call at once is supported.
+ */
+assert(!thr->wait_co);
+
 if (!thr->running) {
 if (thr->sioc) {
 /* Previous attempt finally succeeded in background */
-goto out;
+sioc = g_steal_pointer(&thr->sioc);
+qemu_mutex_unlock(&thr->mutex);
+
+return sioc;
 }
+
 thr->running = true;
 error_free(thr->err);
 thr->err = NULL;
@@ -436,13 +451,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 
 /*
  * We are going to wait for connect-thread finish, but
- * nbd_client_co_drain_begin() can interrupt.
+ * nbd_co_establish_connection_cancel() can interrupt.
  */
 qemu_coroutine_yield();
 
 qemu_mutex_lock(&thr->mutex);
 
-out:
 if (thr->running) {
 /*
  * Obviously, drained section wants to start. Report the attempt as
@@ -453,17 +467,12 @@ out:
 } else {
 error_propagate(errp, thr->err);
 thr->err = NULL;
-s->sioc = thr->sioc;
-thr->sioc = NULL;
-if (s->sioc) {
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-}
+sioc = g_steal_pointer(&thr->sioc);
 }
 
 qemu_mutex_unlock(&thr->mutex);
 
-return s->sioc ? 0 : -1;
+return sioc;
 }
 
 /*
@@ -530,11 +539,15 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, NULL) < 0) {
+s->sioc = nbd_co_establish_connection(s->connect_thread, NULL);
+if (!s->sioc) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
+yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
+   s->bs);
+
 bdrv_dec_in_flight(s->bs);
 
 ret = nbd_client_handshake(s->bs, NULL);
-- 
2.29.2




[PATCH v3 08/33] block/nbd: drop thr->state

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We don't need all these states. The code, refactored to use two boolean
variables, looks simpler.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 125 ++--
 1 file changed, 34 insertions(+), 91 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index e1f39eda6c..2b26a033a4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,24 +66,6 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef enum NBDConnectThreadState {
-/* No thread, no pending results */
-CONNECT_THREAD_NONE,
-
-/* Thread is running, no results for now */
-CONNECT_THREAD_RUNNING,
-
-/*
- * Thread is running, but requestor exited. Thread should close
- * the new socket and free the connect state on exit.
- */
-CONNECT_THREAD_RUNNING_DETACHED,
-
-/* Thread finished, results are stored in a state */
-CONNECT_THREAD_FAIL,
-CONNECT_THREAD_SUCCESS
-} NBDConnectThreadState;
-
 typedef struct NBDConnectThread {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
@@ -97,7 +79,8 @@ typedef struct NBDConnectThread {
 
 QemuMutex mutex;
 /* All further fields are protected by mutex */
-NBDConnectThreadState state; /* current state of the thread */
+bool running; /* thread is running now */
+bool detached; /* thread is detached and should cleanup the state */
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 } NBDConnectThread;
 
@@ -147,17 +130,17 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
-bool thr_running;
+bool do_free;
 
 qemu_mutex_lock(&thr->mutex);
-thr_running = thr->state == CONNECT_THREAD_RUNNING;
-if (thr_running) {
-thr->state = CONNECT_THREAD_RUNNING_DETACHED;
+if (thr->running) {
+thr->detached = true;
 }
+do_free = !thr->running && !thr->detached;
 qemu_mutex_unlock(&thr->mutex);
 
 /* the runaway thread will clean it up itself */
-if (!thr_running) {
+if (do_free) {
 nbd_free_connect_thread(thr);
 }
 
@@ -373,7 +356,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 
 *s->connect_thread = (NBDConnectThread) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
-.state = CONNECT_THREAD_NONE,
 };
 
 qemu_mutex_init(&s->connect_thread->mutex);
@@ -408,20 +390,13 @@ static void *connect_thread_func(void *opaque)
 
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_RUNNING:
-thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-if (thr->wait_co) {
-aio_co_schedule(NULL, thr->wait_co);
-thr->wait_co = NULL;
-}
-break;
-case CONNECT_THREAD_RUNNING_DETACHED:
-do_free = true;
-break;
-default:
-abort();
+assert(thr->running);
+thr->running = false;
+if (thr->wait_co) {
+aio_co_schedule(NULL, thr->wait_co);
+thr->wait_co = NULL;
 }
+do_free = thr->detached;
 
 qemu_mutex_unlock(&thr->mutex);
 
@@ -435,36 +410,24 @@ static void *connect_thread_func(void *opaque)
 static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
-int ret;
 QemuThread thread;
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 
+assert(!s->sioc);
+
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_FAIL:
-case CONNECT_THREAD_NONE:
+if (!thr->running) {
+if (thr->sioc) {
+/* Previous attempt finally succeeded in background */
+goto out;
+}
+thr->running = true;
 error_free(thr->err);
 thr->err = NULL;
-thr->state = CONNECT_THREAD_RUNNING;
 qemu_thread_create(&thread, "nbd-connect",
connect_thread_func, thr, QEMU_THREAD_DETACHED);
-break;
-case CONNECT_THREAD_SUCCESS:
-/* Previous attempt finally succeeded in background */
-thr->state = CONNECT_THREAD_NONE;
-s->sioc = thr->sioc;
-thr->sioc = NULL;
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-qemu_mutex_unlock(&thr->mutex);
-return 0;
-case CONNECT_THREAD_RUNNING:
-/* Already running, will wait */
-break;
-default:
-abort();
 }
 
 thr->wait_co = qemu_coroutine_self();
@@ -479,10 +442,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_SUCCESS:
-case CONNECT_THREAD_FAIL:
-thr->state = CONNECT_THREAD_NONE;
+out:
+if (thr->running) {
+/*
+ * Obviously, drained section wants to start. Report the attempt as
+ * 

[PATCH v3 06/33] util/async: aio_co_schedule(): support reschedule in same ctx

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
With the following patch we want to wake a coroutine from a thread, and
that doesn't work with aio_co_wake():
Assume we have no iothreads.
Assume we have a coroutine A, which waits in a yield point for an
external aio_co_wake(), and no progress can be made until that happens.
The main thread is in a blocking aio_poll() (for example, in blk_read()).

Now, in a separate thread we call aio_co_wake(). It calls aio_co_enter(),
which goes through the last "else" branch and does aio_context_acquire(ctx).

Now we have a deadlock: aio_poll() will not release the context lock
until some progress is made, and no progress can be made until
aio_co_wake() wakes coroutine A. And it can't wake it, because it is
itself waiting in aio_context_acquire().

Still, aio_co_schedule() works well in parallel with a blocking
aio_poll(), so we want to use it. Let's add the possibility of
rescheduling a coroutine in the same ctx where it yielded.

Fetch co->ctx in the same way as it is done in aio_co_wake().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/aio.h | 2 +-
 util/async.c| 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 5f342267d5..744b695525 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -643,7 +643,7 @@ static inline bool aio_node_check(AioContext *ctx, bool 
is_external)
 
 /**
  * aio_co_schedule:
- * @ctx: the aio context
+ * @ctx: the aio context; if NULL, the current ctx of @co will be used.
  * @co: the coroutine
  *
  * Start a coroutine on a remote AioContext.
diff --git a/util/async.c b/util/async.c
index 674dbefb7c..750be555c6 100644
--- a/util/async.c
+++ b/util/async.c
@@ -545,6 +545,14 @@ fail:
 
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
+if (!ctx) {
+/*
+ * Read coroutine before co->ctx.  Matches smp_wmb in
+ * qemu_coroutine_enter.
+ */
+smp_read_barrier_depends();
+ctx = qatomic_read(&co->ctx);
+}
 trace_aio_co_schedule(ctx, co);
 const char *scheduled = qatomic_cmpxchg(&co->scheduled, NULL,
__func__);
-- 
2.29.2




[PATCH v3 07/33] block/nbd: simplify waking of nbd_co_establish_connection()

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Instead of the connect_bh, bh_ctx and wait_connect fields we can live with
only one link to the waiting coroutine, protected by the mutex.

So the new logic is:

nbd_co_establish_connection() sets wait_co under the mutex, releases the
mutex and yields. Note that wait_co may be scheduled by the thread
immediately after unlocking the mutex. Still, in the main thread (or
iothread) we will not reach the code for entering the coroutine until the
yield(), so we are safe.

Both connect_thread_func() and nbd_co_establish_connection_cancel() do
the following to handle wait_co:

Under the mutex, if thr->wait_co is not NULL, call aio_co_wake() (which
never tries to acquire the aio context since the previous commit, so we
are safe to do it under thr->mutex) and set thr->wait_co to NULL.
This way we protect ourselves against scheduling it twice.

Also, this commit makes nbd_co_establish_connection() less tied to
bs (we keep a generic pointer to the coroutine instead of using
s->connection_co directly). So, we are on the way to splitting the
connection API out of nbd.c (which is overcomplicated now).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 49 +
 1 file changed, 9 insertions(+), 40 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index d67556c7ee..e1f39eda6c 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -87,12 +87,6 @@ typedef enum NBDConnectThreadState {
 typedef struct NBDConnectThread {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
-/*
- * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
- * NULL
- */
-QEMUBHFunc *bh_func;
-void *bh_opaque;
 
 /*
  * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -101,10 +95,10 @@ typedef struct NBDConnectThread {
 QIOChannelSocket *sioc;
 Error *err;
 
-/* state and bh_ctx are protected by mutex */
 QemuMutex mutex;
+/* All further fields are protected by mutex */
 NBDConnectThreadState state; /* current state of the thread */
-AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
+Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 } NBDConnectThread;
 
 typedef struct BDRVNBDState {
@@ -138,7 +132,6 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-bool wait_connect;
 NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
@@ -374,15 +367,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void connect_bh(void *opaque)
-{
-BDRVNBDState *state = opaque;
-
-assert(state->wait_connect);
-state->wait_connect = false;
-aio_co_wake(state->connection_co);
-}
-
 static void nbd_init_connect_thread(BDRVNBDState *s)
 {
 s->connect_thread = g_new(NBDConnectThread, 1);
@@ -390,8 +374,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 *s->connect_thread = (NBDConnectThread) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
 .state = CONNECT_THREAD_NONE,
-.bh_func = connect_bh,
-.bh_opaque = s,
 };
 
 qemu_mutex_init(&s->connect_thread->mutex);
@@ -429,11 +411,9 @@ static void *connect_thread_func(void *opaque)
 switch (thr->state) {
 case CONNECT_THREAD_RUNNING:
 thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-if (thr->bh_ctx) {
-aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque);
-
-/* play safe, don't reuse bh_ctx on further connection attempts */
-thr->bh_ctx = NULL;
+if (thr->wait_co) {
+aio_co_schedule(NULL, thr->wait_co);
+thr->wait_co = NULL;
 }
 break;
 case CONNECT_THREAD_RUNNING_DETACHED:
@@ -487,20 +467,14 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 abort();
 }
 
-thr->bh_ctx = qemu_get_current_aio_context();
+thr->wait_co = qemu_coroutine_self();
 
 qemu_mutex_unlock(&thr->mutex);
 
-
 /*
  * We are going to wait for connect-thread finish, but
  * nbd_client_co_drain_begin() can interrupt.
- *
- * Note that wait_connect variable is not visible for connect-thread. It
- * doesn't need mutex protection, it used only inside home aio context of
- * bs.
  */
-s->wait_connect = true;
 qemu_coroutine_yield();
 
 qemu_mutex_lock(&thr->mutex);
@@ -555,24 +529,19 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
-bool wake = false;
 
 qemu_mutex_lock(&thr->mutex);
 
 if (thr->state == CONNECT_THREAD_RUNNING) {
 /* We can cancel only in running state, when bh is not yet scheduled */
-thr->bh_ctx = NULL;
-if (s->wait_connect) {
-s->wait_connect = false;
-wake = true;
+if (thr->wait_co) {
+ 

[PATCH v3 02/33] block/nbd: fix how state is cleared on nbd_open() failure paths

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
We have two "return error" paths in nbd_open() after
nbd_process_options(), and we should actually call nbd_clear_bdrvstate()
on both of them. Interestingly, nbd_process_options() calls
nbd_clear_bdrvstate() by itself.

Let's fix leaks and refactor things to be more obvious:

- initialize yank at the top of nbd_open()
- move yank cleanup to nbd_clear_bdrvstate()
- refactor nbd_open() so that all failure paths except for
  yank-register go through nbd_clear_bdrvstate()

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 739ae2941f..a407a3814b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -152,8 +152,12 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
-static void nbd_clear_bdrvstate(BDRVNBDState *s)
+static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
+BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+
+yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
+
 object_unref(OBJECT(s->tlscreds));
 qapi_free_SocketAddress(s->saddr);
 s->saddr = NULL;
@@ -2279,9 +2283,6 @@ static int nbd_process_options(BlockDriverState *bs, 
QDict *options,
 ret = 0;
 
  error:
-if (ret < 0) {
-nbd_clear_bdrvstate(s);
-}
 qemu_opts_del(opts);
 return ret;
 }
@@ -2292,11 +2293,6 @@ static int nbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret;
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
-ret = nbd_process_options(bs, options, errp);
-if (ret < 0) {
-return ret;
-}
-
 s->bs = bs;
 qemu_co_mutex_init(>send_mutex);
 qemu_co_queue_init(>free_sema);
@@ -2305,20 +2301,23 @@ static int nbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 return -EEXIST;
 }
 
+ret = nbd_process_options(bs, options, errp);
+if (ret < 0) {
+goto fail;
+}
+
 /*
  * establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
  */
 if (nbd_establish_connection(bs, s->saddr, errp) < 0) {
-yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
-return -ECONNREFUSED;
+ret = -ECONNREFUSED;
+goto fail;
 }
 
 ret = nbd_client_handshake(bs, errp);
 if (ret < 0) {
-yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
-nbd_clear_bdrvstate(s);
-return ret;
+goto fail;
 }
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
@@ -2330,6 +2329,10 @@ static int nbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 aio_co_schedule(bdrv_get_aio_context(bs), s->connection_co);
 
 return 0;
+
+fail:
+nbd_clear_bdrvstate(bs);
+return ret;
 }
 
 static int nbd_co_flush(BlockDriverState *bs)
@@ -2373,11 +2376,8 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 
 static void nbd_close(BlockDriverState *bs)
 {
-BDRVNBDState *s = bs->opaque;
-
 nbd_client_close(bs);
-yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
-nbd_clear_bdrvstate(s);
+nbd_clear_bdrvstate(bs);
 }
 
 /*
-- 
2.29.2




[PATCH v3 04/33] block/nbd: nbd_client_handshake(): fix leak of s->ioc

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 272af60b44..6efa11a185 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1891,6 +1891,8 @@ static int nbd_client_handshake(BlockDriverState *bs, 
Error **errp)
  nbd_yank, bs);
 object_unref(OBJECT(s->sioc));
 s->sioc = NULL;
+object_unref(OBJECT(s->ioc));
+s->ioc = NULL;
 
 return ret;
 }
-- 
2.29.2




[PATCH v3 03/33] block/nbd: ensure ->connection_thread is always valid

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
From: Roman Kagan 

Simplify lifetime management of BDRVNBDState->connect_thread by
delaying the possible cleanup of it until the BDRVNBDState itself goes
away.

This also reverts
 0267101af6 "block/nbd: fix possible use after free of s->connect_thread"
as now s->connect_thread can't be cleared until the very end.

Signed-off-by: Roman Kagan 
 [vsementsov: rebase, revert 0267101af6 changes]
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 56 -
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a407a3814b..272af60b44 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -144,17 +144,31 @@ typedef struct BDRVNBDState {
 NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
+static void nbd_free_connect_thread(NBDConnectThread *thr);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
 static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
-   bool detach);
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
+NBDConnectThread *thr = s->connect_thread;
+bool thr_running;
+
+qemu_mutex_lock(&thr->mutex);
+thr_running = thr->state == CONNECT_THREAD_RUNNING;
+if (thr_running) {
+thr->state = CONNECT_THREAD_RUNNING_DETACHED;
+}
+qemu_mutex_unlock(&thr->mutex);
+
+/* the runaway thread will clean it up itself */
+if (!thr_running) {
+nbd_free_connect_thread(thr);
+}
 
 yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
 
@@ -297,7 +311,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(bs, false);
+nbd_co_establish_connection_cancel(bs);
 
 reconnect_delay_timer_del(s);
 
@@ -337,7 +351,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(bs, true);
+nbd_co_establish_connection_cancel(bs);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -448,11 +462,6 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 
-if (!thr) {
-/* detached */
-return -1;
-}
-
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
@@ -496,12 +505,6 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 s->wait_connect = true;
 qemu_coroutine_yield();
 
-if (!s->connect_thread) {
-/* detached */
-return -1;
-}
-assert(thr == s->connect_thread);
-
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
@@ -549,18 +552,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
  * nbd_co_establish_connection_cancel
  * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to
  * allow drained section to begin.
- *
- * If detach is true, also cleanup the state (or if thread is running, move it
- * to CONNECT_THREAD_RUNNING_DETACHED state). s->connect_thread becomes NULL if
- * detach is true.
  */
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
-   bool detach)
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 bool wake = false;
-bool do_free = false;
 
 qemu_mutex_lock(&thr->mutex);
 
@@ -571,21 +568,10 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 s->wait_connect = false;
 wake = true;
 }
-if (detach) {
-thr->state = CONNECT_THREAD_RUNNING_DETACHED;
-s->connect_thread = NULL;
-}
-} else if (detach) {
-do_free = true;
 }
 
 qemu_mutex_unlock(&thr->mutex);
 
-if (do_free) {
-nbd_free_connect_thread(thr);
-s->connect_thread = NULL;
-}
-
 if (wake) {
 aio_co_wake(s->connection_co);
 }
@@ -2306,6 +2292,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 goto fail;
 }
 
+nbd_init_connect_thread(s);
+
 /*
  * establish TCP connection, return error if it fails
  * TODO: Configurable retry-until-timeout behaviour.
@@ -2322,8 +2310,6 @@ static int nbd_open(BlockDriverState *bs, QDict 

[PATCH v3 05/33] block/nbd: BDRVNBDState: drop unused connect_err and connect_status

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
These fields are write-only. Drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Roman Kagan 
---
 block/nbd.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6efa11a185..d67556c7ee 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -121,8 +121,6 @@ typedef struct BDRVNBDState {
 bool wait_drained_end;
 int in_flight;
 NBDClientState state;
-int connect_status;
-Error *connect_err;
 bool wait_in_flight;
 
 QEMUTimer *reconnect_delay_timer;
@@ -580,7 +578,6 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs)
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
-Error *local_err = NULL;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -621,14 +618,14 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
+if (nbd_co_establish_connection(s->bs, NULL) < 0) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
 bdrv_dec_in_flight(s->bs);
 
-ret = nbd_client_handshake(s->bs, &local_err);
+ret = nbd_client_handshake(s->bs, NULL);
 
 if (s->drained) {
 s->wait_drained_end = true;
@@ -643,11 +640,6 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 bdrv_inc_in_flight(s->bs);
 
 out:
-s->connect_status = ret;
-error_free(s->connect_err);
-s->connect_err = NULL;
-error_propagate(&s->connect_err, local_err);
-
 if (ret >= 0) {
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
-- 
2.29.2




[PATCH v3 01/33] block/nbd: fix channel object leak

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
From: Roman Kagan 

nbd_free_connect_thread leaks the channel object if it hasn't been
stolen.

Unref it and fix the leak.

Signed-off-by: Roman Kagan 
---
 block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd.c b/block/nbd.c
index 1d4668d42d..739ae2941f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -385,6 +385,7 @@ static void nbd_free_connect_thread(NBDConnectThread *thr)
 {
 if (thr->sioc) {
 qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
+object_unref(OBJECT(thr->sioc));
 }
 error_free(thr->err);
 qapi_free_SocketAddress(thr->saddr);
-- 
2.29.2




[PATCH v3 00/33] block/nbd: rework client connection

2021-04-16 Thread Vladimir Sementsov-Ogievskiy
The series substitutes "[PATCH v2 00/10] block/nbd: move connection code to 
separate file"
Supersedes: <20210408140827.332915-1-vsement...@virtuozzo.com>
so it's called v3

block/nbd.c is overcomplicated. This series is a big refactoring, which
finally drops all the complications around drained sections and context
switching, including the abuse of the bs->in_flight counter.

Also, at the end of the series we don't cancel reconnect on drained
sections (and don't cancel requests waiting for reconnect on drained
section begin), which fixes a problem reported by Roman.

The series is also available at tag up-nbd-client-connection-v3 in
git https://src.openvz.org/scm/~vsementsov/qemu.git 

v3:
Changes in first part of the series (main thing is not using refcnt, but 
instead (modified) Roman's patch):

01-04: new
05: add Roman's r-b
06: new
07: now, new aio_co_schedule(NULL, thr->wait_co) is used
08: reworked, we now need also bool detached, as we don't have refcnt
09,10: add Roman's r-b
11: rebased, don't modify nbd_free_connect_thread() name at this point
12: add Roman's r-b
13: new
14: rebased

Other patches are new.

Roman Kagan (2):
  block/nbd: fix channel object leak
  block/nbd: ensure ->connection_thread is always valid

Vladimir Sementsov-Ogievskiy (31):
  block/nbd: fix how state is cleared on nbd_open() failure paths
  block/nbd: nbd_client_handshake(): fix leak of s->ioc
  block/nbd: BDRVNBDState: drop unused connect_err and connect_status
  util/async: aio_co_schedule(): support reschedule in same ctx
  block/nbd: simplify waking of nbd_co_establish_connection()
  block/nbd: drop thr->state
  block/nbd: bs-independent interface for nbd_co_establish_connection()
  block/nbd: make nbd_co_establish_connection_cancel() bs-independent
  block/nbd: rename NBDConnectThread to NBDClientConnection
  block/nbd: introduce nbd_client_connection_new()
  block/nbd: introduce nbd_client_connection_release()
  nbd: move connection code from block/nbd to nbd/client-connection
  nbd/client-connection: use QEMU_LOCK_GUARD
  nbd/client-connection: add possibility of negotiation
  nbd/client-connection: implement connection retry
  nbd/client-connection: shutdown connection on release
  block/nbd: split nbd_handle_updated_info out of nbd_client_handshake()
  block/nbd: use negotiation of NBDClientConnection
  qemu-socket: pass monitor link to socket_get_fd directly
  block/nbd: pass monitor directly to connection thread
  block/nbd: nbd_teardown_connection() don't touch s->sioc
  block/nbd: drop BDRVNBDState::sioc
  nbd/client-connection: return only one io channel
  block-coroutine-wrapper: allow non bdrv_ prefix
  block/nbd: split nbd_co_do_establish_connection out of
nbd_reconnect_attempt
  nbd/client-connection: do qio_channel_set_delay(false)
  nbd/client-connection: add option for non-blocking connection attempt
  block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()
  block/nbd: add nbd_clinent_connected() helper
  block/nbd: safer transition to receiving request
  block/nbd: drop connection_co

 block/coroutines.h |   6 +
 include/block/aio.h|   2 +-
 include/block/nbd.h|  19 +
 include/io/channel-socket.h|  20 +
 include/qemu/sockets.h |   2 +-
 block/nbd.c| 908 +++--
 io/channel-socket.c|  17 +-
 nbd/client-connection.c| 364 
 nbd/client.c   |   2 -
 tests/unit/test-util-sockets.c |  16 +-
 util/async.c   |   8 +
 util/qemu-sockets.c|  10 +-
 nbd/meson.build|   1 +
 scripts/block-coroutine-wrapper.py |   7 +-
 14 files changed, 666 insertions(+), 716 deletions(-)
 create mode 100644 nbd/client-connection.c

-- 
2.29.2




Re: about mirror cancel

2021-04-16 Thread Vladimir Sementsov-Ogievskiy

16.04.2021 10:11, Max Reitz wrote:

On 16.04.21 09:05, Max Reitz wrote:

On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote:


[...]


Note that even if cancelling all in-flight requests on the target is wrong on
mirror cancel, we still don't have a real bug, as the only implementation of
.bdrv_cancel_in_flight stops waiting for reconnect in the nbd driver. So we'll
cancel requests only if the connection is already lost anyway.

But that probably means that the correct name for the handler would be
.bdrv_cancel_in_flight_requests_that_will_most_probably_fail_anyway()..


Its documentation states that it should cancel all in-flight requests, so it’s 
more likely it just isn’t implemented where it could be.


Oh, I now see you added it in the same series.  Well, then I suppose you’re 
free to change the semantics as you see fit.

But be aware that even cancelling those requests means that you abandon the 
target.  So it must then fail instead of emitting the COMPLETED event (AFAIR 
the mirror job emits COMPLETED when cancelled in READY with force=false).


Note that BLOCK_JOB_COMPLETED can indicate failure; it has an error field. 
Interesting, is libvirt aware of it?



If the user wants the mirror job to create a consistent copy and so cancels it 
after READY (with force=false), I don’t know whether cancelling those hanging 
requests is what we want.  If the cancel hangs and the user sees this, they are 
still free to decide to cancel again with force=true, no?


Right. All right, I'll rework the feature to cancel requests only with 
force=true. Thanks for the explanations, and I'm glad I decided to ask.

--
Best regards,
Vladimir



[PATCH 1/2] hw/block/nvme: consider metadata read aio return value in compare

2021-04-16 Thread Gollu Appalanaidu
Currently, in the Compare command, the return value of the metadata AIO
read (blk_aio_preadv) is ignored. Check it and complete the block
accounting.

Signed-off-by: Gollu Appalanaidu 
---
 hw/block/nvme.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 624a1431d0..c2727540f1 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2369,10 +2369,19 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 uint32_t reftag = le32_to_cpu(rw->reftag);
 struct nvme_compare_ctx *ctx = req->opaque;
 g_autofree uint8_t *buf = NULL;
+BlockBackend *blk = ns->blkconf.blk;
+BlockAcctCookie *acct = &req->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
 uint16_t status = NVME_SUCCESS;
 
 trace_pci_nvme_compare_mdata_cb(nvme_cid(req));
 
+if (ret) {
+block_acct_failed(stats, acct);
+nvme_aio_err(req, ret);
+goto out;
+}
+
 buf = g_malloc(ctx->mdata.iov.size);
 
 status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size,
@@ -2421,6 +2430,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 goto out;
 }
 
+block_acct_done(stats, acct);
+
 out:
qemu_iovec_destroy(&ctx->data.iov);
 g_free(ctx->data.bounce);
-- 
2.17.1




[PATCH 2/2] hw/block/nvme: fix lbaf formats initialization

2021-04-16 Thread Gollu Appalanaidu
Currently, the LBA formats are initialized based on the metadata size,
but only if the nvme-ns "ms" parameter has a non-zero value. Since the
Format NVM command is supported, the "ms" device parameter should not
be the criterion for initializing the supported LBAFs.

Signed-off-by: Gollu Appalanaidu 
---
 hw/block/nvme-ns.c | 48 ++
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f182..573dbb5a9d 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -85,38 +85,28 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 ds = 31 - clz32(ns->blkconf.logical_block_size);
 ms = ns->params.ms;
 
-if (ns->params.ms) {
-id_ns->mc = 0x3;
+id_ns->mc = 0x3;
 
-if (ns->params.mset) {
-id_ns->flbas |= 0x10;
-}
+if (ms && ns->params.mset) {
+id_ns->flbas |= 0x10;
+}
 
-id_ns->dpc = 0x1f;
-id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
-
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9   },
-[1] = { .ds =  9, .ms =  8 },
-[2] = { .ds =  9, .ms = 16 },
-[3] = { .ds =  9, .ms = 64 },
-[4] = { .ds = 12   },
-[5] = { .ds = 12, .ms =  8 },
-[6] = { .ds = 12, .ms = 16 },
-[7] = { .ds = 12, .ms = 64 },
-};
-
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 7;
-} else {
-NvmeLBAF lbaf[16] = {
-[0] = { .ds =  9 },
-[1] = { .ds = 12 },
-};
+id_ns->dpc = 0x1f;
+id_ns->dps = ((ns->params.pil & 0x1) << 3) | ns->params.pi;
 
-memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-id_ns->nlbaf = 1;
-}
+NvmeLBAF lbaf[16] = {
+[0] = { .ds =  9   },
+[1] = { .ds =  9, .ms =  8 },
+[2] = { .ds =  9, .ms = 16 },
+[3] = { .ds =  9, .ms = 64 },
+[4] = { .ds = 12   },
+[5] = { .ds = 12, .ms =  8 },
+[6] = { .ds = 12, .ms = 16 },
+[7] = { .ds = 12, .ms = 64 },
+};
+
+memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
+id_ns->nlbaf = 7;
 
 for (i = 0; i <= id_ns->nlbaf; i++) {
NvmeLBAF *lbaf = &id_ns->lbaf[i];
-- 
2.17.1




Re: about mirror cancel

2021-04-16 Thread Max Reitz

On 16.04.21 09:05, Max Reitz wrote:

On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote:


[...]

Note that even if cancelling all in-flight requests on the target is wrong 
on mirror cancel, we still don't have a real bug, as the only 
implementation of .bdrv_cancel_in_flight is stopping the reconnect wait 
in the NBD driver. So we'll cancel requests only if the connection is 
already lost anyway.


But that probably means that the correct name of the handler would be 
.bdrv_cancel_in_flight_requests_that_will_most_probably_fail_anyway()...


Its documentation states that it should cancel all in-flight requests, 
so it’s more likely it just isn’t implemented where it could be.


Oh, I now see you added it in the same series.  Well, then I suppose 
you’re free to change the semantics as you see fit.


But be aware that even cancelling those requests means that you abandon 
the target.  So it must then fail instead of emitting the COMPLETED 
event (AFAIR the mirror job emits COMPLETED when cancelled in READY with 
force=false).


If the user wants the mirror job to create a consistent copy and so 
cancels it after READY (with force=false), I don’t know whether 
cancelling those hanging requests is what we want.  If the cancel hangs 
and the user sees this, they are still free to decide to cancel again 
with force=true, no?


Max




Re: about mirror cancel

2021-04-16 Thread Max Reitz

On 15.04.21 20:46, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Recently I implemented fast cancelling of the mirror job: do 
bdrv_cancel_in_flight() in mirror_cancel().


Now I'm in doubt: is it the correct thing to do? I've heard that 
mirror-cancel is a kind of valid mirror completion...


Looking at the documentation:

# Note that if you issue 'block-job-cancel' after 'drive-mirror' has indicated
# (via the event BLOCK_JOB_READY) that the source and destination are
# synchronized, then the event triggered by this command changes to
# BLOCK_JOB_COMPLETED, to indicate that the mirroring has ended and the
# destination now has a point-in-time copy tied to the time of the
# cancellation.


So, in other words, do we guarantee something to the user if they call 
mirror-cancel in the ready state? Does this abuse exist in libvirt?


How is it abuse if it’s documented?  AFAIR it does exist, because 
libvirt’s blockcopy always uses mirror (with --pivot it’s allowed to 
complete; without it, it is cancelled).


(And the point of course is that if you want mirror to create a copy 
without pivoting, you need this behavior, because otherwise the target 
may be in an inconsistent state.)





Note that even if cancelling all in-flight requests on the target is wrong 
on mirror cancel, we still don't have a real bug, as the only implementation 
of .bdrv_cancel_in_flight is stopping the reconnect wait in the NBD driver. 
So we'll cancel requests only if the connection is already lost anyway.


But that probably means that the correct name of the handler would be 
.bdrv_cancel_in_flight_requests_that_will_most_probably_fail_anyway()...


Its documentation states that it should cancel all in-flight requests, 
so it’s more likely it just isn’t implemented where it could be.


And it also means that abusing mirror-cancel as valid completion is a 
bad idea, as we can't distinguish the cases where the user calls 
job-cancel to get a kind of point-in-time copy, and where they just want 
to cancel the job (not being interested in the final state of the target).


So we probably need an optional boolean argument for block-job-cancel, 
like "hard", that will cancel in-flight writes on the target node...


There is @force.  See commit b76e4458b1eb3c3.
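For reference, a force-cancel is requested over QMP roughly like this (the 
device name is illustrative):

```
{ "execute": "block-job-cancel",
  "arguments": { "device": "drive0", "force": true } }
```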

Max