date:20200930

Re: [PATCH v3 11/11] qapi: Restrict code generated for user-mode

2020-09-30 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> A lot of QAPI generated code is never used by user-mode.
>
> Split out qapi_system_modules and qapi_system_or_tools_modules
> from the qapi_all_modules array. We now have 3 groups:
> - always used
> - used by system-mode or tools (usually by the block layer)
> - only used by system-mode
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Resetting due to Meson update:
> Reviewed-by: Richard Henderson 
> ---
>  qapi/meson.build | 51 ++--
>  1 file changed, 36 insertions(+), 15 deletions(-)
>
> diff --git a/qapi/meson.build b/qapi/meson.build
> index 7c4a89a882..ba9677ba97 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -14,39 +14,60 @@ util_ss.add(files(
>  ))
>  
>  qapi_all_modules = [
> +  'common',
> +  'introspect',
> +  'misc',
> +]
> +
> +qapi_system_modules = [
>    'acpi',
>    'audio',
> +  'dump',
> +  'machine-target',
> +  'machine',
> +  'migration',
> +  'misc-target',
> +  'net',
> +  'pci',
> +  'qdev',
> +  'rdma',
> +  'rocker',
> +  'tpm',
> +  'trace',
> +]
> +
> +# system or tools
> +qapi_block_modules = [
>    'authz',
>    'block-core',
>    'block',
>    'char',
> -  'common',
>    'control',
>    'crypto',
> -  'dump',
>    'error',
> -  'introspect',
>    'job',
> -  'machine',
> -  'machine-target',
> -  'migration',
> -  'misc',
> -  'misc-target',
> -  'net',
>    'pragma',
> -  'qdev',
> -  'pci',
>    'qom',
> -  'rdma',
> -  'rocker',
>    'run-state',
>    'sockets',
> -  'tpm',
> -  'trace',
>    'transaction',
>    'ui',
>  ]

Most of these aren't "block modules".  Name the thing
qapi_system_or_tools_modules?

> +if have_system
> +  qapi_all_modules += qapi_system_modules
> +elif have_user
> +  # Temporary kludge because X86CPUFeatureWordInfo is not
> +  # restricted to system-mode. This should be removed (along
> +  # with target/i386/feature-stub.c) once target/i386/cpu.c
> +  # has been cleaned.
> +  qapi_all_modules += ['machine-target']
> +endif
> +
> +if have_block

Aha, precedent for using "block" as an abbreviation of "system or
tools".  I find that confusing.

> +  qapi_all_modules += qapi_block_modules
> +endif
> +
>  qapi_storage_daemon_modules = [
>    'block-core',
>    'char',

Re: [PATCH v3 01/11] qapi: Restrict query-uuid command to block code

2020-09-30 Thread Markus Armbruster

Philippe Mathieu-Daudé  writes:

> In commit f68c01470b we restricted the query-uuid command to
> machine code, but it is incorrect, as it is also used by the
> tools.  Therefore move this command again, but to block.json,
> which is shared by machine code and tools.
>
> Fixes: f68c01470b ("qapi: Restrict query-uuid command to machine code")
>
> Signed-off-by: Philippe Mathieu-Daudé 

UUIDs are not really a block-specific thing.

QMP query-uuid and HMP info uuid are about the VM, like query-name.
That's why they used to be next to query-name in misc.json.

There's one additional use in block/iscsi.c's get_initiator_name().  I
figure that's what pulls it into tools via qemu-img.

Which other QAPI modules are shared by all the executables that use it?

What about reverting the commit?  How bad would that be for user mode?

Re: [PATCH 4/9] hw/block/nvme: validate command set selected

2020-09-30 Thread Klaus Jensen

On Sep 30 15:04, Keith Busch wrote:
> Fail to start the controller if the user requests a command set that the
> controller does not support.
> 
> Signed-off-by: Keith Busch 

Reviewed-by: Klaus Jensen 

> ---
>  hw/block/nvme.c   | 6 +-
>  hw/block/trace-events | 1 +
>  include/block/nvme.h  | 4 
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 41389b2b09..6c582e6874 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2049,6 +2049,10 @@ static int nvme_start_ctrl(NvmeCtrl *n)
>  trace_pci_nvme_err_startfail_acq_misaligned(n->bar.acq);
>  return -1;
>  }
> +if (unlikely(!(NVME_CAP_CSS(n->bar.cap) & (1 << NVME_CC_CSS(n->bar.cc))))) {
> +trace_pci_nvme_err_startfail_css(NVME_CC_CSS(n->bar.cc));
> +return -1;
> +}
>  if (unlikely(NVME_CC_MPS(n->bar.cc) <
>   NVME_CAP_MPSMIN(n->bar.cap))) {
>  trace_pci_nvme_err_startfail_page_too_small(
> @@ -2750,7 +2754,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
>  NVME_CAP_SET_CQR(n->bar.cap, 1);
>  NVME_CAP_SET_TO(n->bar.cap, 0xf);
> -NVME_CAP_SET_CSS(n->bar.cap, 1);
> +NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
>  NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
>  
>  n->bar.vs = NVME_SPEC_VER;
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 446cca08e9..7720e1b4d9 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -133,6 +133,7 @@ pci_nvme_err_startfail_cqent_too_small(uint8_t log2ps, 
> uint8_t maxlog2ps) "nvme_
>  pci_nvme_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) 
> "nvme_start_ctrl failed because the completion queue entry size is too large: 
> log2size=%u, max=%u"
>  pci_nvme_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) 
> "nvme_start_ctrl failed because the submission queue entry size is too small: 
> log2size=%u, min=%u"
>  pci_nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) 
> "nvme_start_ctrl failed because the submission queue entry size is too large: 
> log2size=%u, max=%u"
> +pci_nvme_err_startfail_css(uint8_t css) "nvme_start_ctrl failed because 
> invalid command set selected:%u"
>  pci_nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because 
> the admin submission queue size is zero"
>  pci_nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because 
> the admin completion queue size is zero"
>  pci_nvme_err_startfail(void) "setting controller enable bit failed"
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 868cf53f0b..bc20a2ba5e 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -82,6 +82,10 @@ enum NvmeCapMask {
>  #define NVME_CAP_SET_PMRS(cap, val) (cap |= (uint64_t)(val & CAP_PMR_MASK)\
>  << CAP_PMR_SHIFT)
>  
> +enum NvmeCapCss {
> +NVME_CAP_CSS_NVM = 1 << 0,
> +};
> +
>  enum NvmeCcShift {
>  CC_EN_SHIFT = 0,
>  CC_CSS_SHIFT= 4,
> -- 
> 2.24.1
> 
> 
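The start-up check in the patch above tests whether the bit for the
command set selected in CC.CSS is set in the CAP.CSS bitmap.  A minimal
standalone sketch of that test follows.  The helper names and the CAP.CSS
field position and masks are assumptions made for illustration (only
CC_CSS_SHIFT = 4 appears in the quoted header); this is not the QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout, for illustration only. */
#define CAP_CSS_SHIFT 37
#define CAP_CSS_MASK  0xffULL
#define CC_CSS_SHIFT  4
#define CC_CSS_MASK   0x7U

/* Extract the "command sets supported" bitmap from CAP. */
static inline uint8_t cap_css(uint64_t cap)
{
    return (uint8_t)((cap >> CAP_CSS_SHIFT) & CAP_CSS_MASK);
}

/* Extract the "command set selected" value from CC. */
static inline uint8_t cc_css(uint32_t cc)
{
    return (uint8_t)((cc >> CC_CSS_SHIFT) & CC_CSS_MASK);
}

/* True when the command set selected in CC.CSS is advertised in CAP.CSS. */
static bool css_supported(uint64_t cap, uint32_t cc)
{
    return (cap_css(cap) & (1u << cc_css(cc))) != 0;
}
```

With only bit 0 advertised, selecting 111b fails the check; once bit 7 is
advertised as well, the same selection passes.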

-- 
One of us - No more doubt, silence or taboo about mental illness.

Re: [PATCH 5/9] hw/block/nvme: support for admin-only command set

2020-09-30 Thread Klaus Jensen

On Sep 30 15:04, Keith Busch wrote:
> Signed-off-by: Keith Busch 
> ---

Reviewed-by: Klaus Jensen 

>  hw/block/nvme.c  | 1 +
>  include/block/nvme.h | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 6c582e6874..ec7363ea40 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2755,6 +2755,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  NVME_CAP_SET_CQR(n->bar.cap, 1);
>  NVME_CAP_SET_TO(n->bar.cap, 0xf);
>  NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
> +NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_ADMIN_ONLY);
>  NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
>  
>  n->bar.vs = NVME_SPEC_VER;
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index bc20a2ba5e..521533fd2a 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -83,7 +83,8 @@ enum NvmeCapMask {
>  << CAP_PMR_SHIFT)
>  
>  enum NvmeCapCss {
> -NVME_CAP_CSS_NVM = 1 << 0,
> +NVME_CAP_CSS_NVM        = 1 << 0,
> +NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
>  };
>  
>  enum NvmeCcShift {
> -- 
> 2.24.1
> 
> 


Re: [PATCH 3/9] hw/block/nvme: support per-namespace smart log

2020-09-30 Thread Klaus Jensen

On Sep 30 15:04, Keith Busch wrote:
> Let the user specify a specific namespace if they want to get access
> stats for a specific namespace.
> 

I don't think this makes sense for v1.3+.

NVM Express v1.3d, Section 5.14.1.2: "There is no namespace specific
information defined in the SMART / Health log page in this revision of
the specification, therefore the controller log page and namespace
specific log page contain identical information".

I have no idea why the TWG decided this, but that's the way it is ;)

> Signed-off-by: Keith Busch 
> ---
>  hw/block/nvme.c  | 66 +++-
>  include/block/nvme.h |  1 +
>  2 files changed, 41 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 8d2b5be567..41389b2b09 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1164,48 +1164,62 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, 
> NvmeRequest *req)
>  return NVME_SUCCESS;
>  }
>  
> +struct nvme_stats {
> +uint64_t units_read;
> +uint64_t units_written;
> +uint64_t read_commands;
> +uint64_t write_commands;
> +};
> +
> +static void nvme_set_blk_stats(NvmeNamespace *ns, struct nvme_stats *stats)
> +{
> +BlockAcctStats *s = blk_get_stats(ns->blkconf.blk);
> +
> +stats->units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> +stats->units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> 
> BDRV_SECTOR_BITS;
> +stats->read_commands += s->nr_ops[BLOCK_ACCT_READ];
> +stats->write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
> +}
> +
>  static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
>  uint64_t off, NvmeRequest *req)
>  {
>  uint32_t nsid = le32_to_cpu(req->cmd.nsid);
> -
> +struct nvme_stats stats = { 0 };
> +NvmeSmartLog smart = { 0 };
>  uint32_t trans_len;
> +NvmeNamespace *ns;
>  time_t current_ms;
> -uint64_t units_read = 0, units_written = 0;
> -uint64_t read_commands = 0, write_commands = 0;
> -NvmeSmartLog smart;
> -
> -if (nsid && nsid != 0xffffffff) {
> -return NVME_INVALID_FIELD | NVME_DNR;
> -}
>  
>  if (off >= sizeof(smart)) {
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> -for (int i = 1; i <= n->num_namespaces; i++) {
> -NvmeNamespace *ns = nvme_ns(n, i);
> -if (!ns) {
> -continue;
> -}
> -
> -BlockAcctStats *s = blk_get_stats(ns->blkconf.blk);
> +if (nsid != 0xffffffff) {
> +ns = nvme_ns(n, nsid);
> +if (!ns)
> +return NVME_INVALID_NSID | NVME_DNR;
> +nvme_set_blk_stats(ns, &stats);
> +} else {
> +int i;
>  
> -units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
> -units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
> -read_commands += s->nr_ops[BLOCK_ACCT_READ];
> -write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
> +for (i = 1; i <= n->num_namespaces; i++) {
> +ns = nvme_ns(n, i);
> +if (!ns) {
> +continue;
> +}
> +nvme_set_blk_stats(ns, &stats);
> +}
>  }
>  
>  trans_len = MIN(sizeof(smart) - off, buf_len);
>  
> -memset(&smart, 0x0, sizeof(smart));
> -
> -smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(units_read, 1000));
> -smart.data_units_written[0] = cpu_to_le64(DIV_ROUND_UP(units_written,
> +smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
> +1000));
> +smart.data_units_written[0] = 
> cpu_to_le64(DIV_ROUND_UP(stats.units_written,
> 1000));
> -smart.host_read_commands[0] = cpu_to_le64(read_commands);
> -smart.host_write_commands[0] = cpu_to_le64(write_commands);
> +smart.host_read_commands[0] = cpu_to_le64(stats.read_commands);
> +smart.host_write_commands[0] = cpu_to_le64(stats.write_commands);
>  
>  smart.temperature = cpu_to_le16(n->temperature);
>  
> @@ -2708,7 +2722,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  id->acl = 3;
>  id->aerl = n->params.aerl;
>  id->frmw = (NVME_NUM_FW_SLOTS << 1) | NVME_FRMW_SLOT1_RO;
> -id->lpa = NVME_LPA_EXTENDED;
> +id->lpa = NVME_LPA_NS_SMART | NVME_LPA_EXTENDED;
>  
>  /* recommended default value (~70 C) */
>  id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 58647bcdad..868cf53f0b 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -849,6 +849,7 @@ enum NvmeIdCtrlFrmw {
>  };
>  
>  enum NvmeIdCtrlLpa {
> +NVME_LPA_NS_SMART = 1 << 0,
>  NVME_LPA_EXTENDED = 1 << 2,
>  };
>  
> -- 
> 2.24.1
> 
> 
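For reference, the counters in the patch above are derived by converting
byte counts to 512-byte units (the shift by BDRV_SECTOR_BITS) and then
reporting them in thousands, rounded up.  A small sketch of that
arithmetic, summing over several namespaces the way the controller-wide
log does, is given here.  The struct and function names are hypothetical
and only illustrate the calculation.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9 /* 512-byte units, mirroring BDRV_SECTOR_BITS */

/* Hypothetical per-namespace byte counter, for illustration only. */
struct ns_bytes {
    uint64_t bytes_read;
};

/* Integer division rounding up, like DIV_ROUND_UP in the patch. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Bytes -> 512-byte units -> thousands of units, rounded up. */
static uint64_t smart_data_units(uint64_t bytes)
{
    return div_round_up(bytes >> SECTOR_BITS, 1000);
}

/* Controller-wide value: accumulate units over all namespaces first. */
static uint64_t smart_units_read_total(const struct ns_bytes *ns, int count)
{
    uint64_t units = 0;

    for (int i = 0; i < count; i++) {
        units += ns[i].bytes_read >> SECTOR_BITS;
    }
    return div_round_up(units, 1000);
}
```

Note that rounding happens once, after summation, so two namespaces with
1000 and 1 units read report 2, not 1 + 1.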


Re: [PATCH 2/9] hw/block/nvme: fix log page offset check

2020-09-30 Thread Klaus Jensen

On Sep 30 15:04, Keith Busch wrote:
> Return an error if the requested offset starts at or beyond the end of
> the log being returned. Also, move the check earlier in the function so
> we're not doing unnecessary calculations.
> 
> Signed-off-by: Keith Busch 

Reviewed-by: Klaus Jensen 

> ---
>  hw/block/nvme.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index db52ea0db9..8d2b5be567 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1179,6 +1179,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
> rae, uint32_t buf_len,
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> +if (off >= sizeof(smart)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
>  for (int i = 1; i <= n->num_namespaces; i++) {
>  NvmeNamespace *ns = nvme_ns(n, i);
>  if (!ns) {
> @@ -1193,10 +1197,6 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
> rae, uint32_t buf_len,
>  write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
>  }
>  
> -if (off > sizeof(smart)) {
> -return NVME_INVALID_FIELD | NVME_DNR;
> -}
> -
>  trans_len = MIN(sizeof(smart) - off, buf_len);
>  
>  memset(&smart, 0x0, sizeof(smart));
> @@ -1234,12 +1234,11 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, 
> uint32_t buf_len, uint64_t off,
>  .afi = 0x1,
>  };
>  
> -strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
> -
> -if (off > sizeof(fw_log)) {
> +if (off >= sizeof(fw_log)) {
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> +strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
>  trans_len = MIN(sizeof(fw_log) - off, buf_len);
>  
>  return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
> @@ -1252,16 +1251,15 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t 
> rae, uint32_t buf_len,
>  uint32_t trans_len;
>  NvmeErrorLog errlog;
>  
> -if (!rae) {
> -nvme_clear_events(n, NVME_AER_TYPE_ERROR);
> +if (off >= sizeof(errlog)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> -if (off > sizeof(errlog)) {
> -return NVME_INVALID_FIELD | NVME_DNR;
> +if (!rae) {
> +nvme_clear_events(n, NVME_AER_TYPE_ERROR);
>  }
>  
>  memset(&errlog, 0x0, sizeof(errlog));
> -
>  trans_len = MIN(sizeof(errlog) - off, buf_len);
>  
>  return nvme_dma(n, (uint8_t *)&errlog, trans_len,
> -- 
> 2.24.1
> 
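The effect of changing "off > sizeof(log)" to "off >= sizeof(log)" can be
seen in isolation with a small helper.  The function name and return
convention are invented for this sketch; it only mirrors the bounds check
and the MIN() clamp from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute how many bytes of a log page to transfer.
 * Returns 0 and stores the length in *trans_len on success, or -1 when
 * the offset is at or past the end of the log (the ">=" check).
 */
static int log_trans_len(size_t log_size, uint64_t off, uint32_t buf_len,
                         uint32_t *trans_len)
{
    uint64_t remaining;

    if (off >= log_size) {
        return -1; /* nothing left to read at this offset */
    }

    remaining = log_size - off;
    *trans_len = remaining < buf_len ? (uint32_t)remaining : buf_len;
    return 0;
}
```

With the old ">" comparison, an offset exactly equal to the log size
would pass the check and yield a zero-length transfer instead of an
error.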
> 


Re: [PATCH 1/9] hw/block/nvme: remove pointless rw indirection

2020-09-30 Thread Klaus Jensen

On Sep 30 15:04, Keith Busch wrote:
> The code switches on the opcode to invoke a function specific to that
> opcode. There's no point in consolidating back to a common function that
> just switches on that same opcode without any actual common code.
> Restore the opcode specific behavior without going back through another
> level of switches.
> 
> Signed-off-by: Keith Busch 

Reviewed-by: Klaus Jensen 

Point taken. I could've sworn I had a better reason for this.

> ---
>  hw/block/nvme.c | 91 -
>  1 file changed, 29 insertions(+), 62 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index da8344f196..db52ea0db9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -927,68 +927,12 @@ static void nvme_rw_cb(void *opaque, int ret)
>  nvme_enqueue_req_completion(nvme_cq(req), req);
>  }
>  
> -static uint16_t nvme_do_aio(BlockBackend *blk, int64_t offset, size_t len,
> -NvmeRequest *req)
> -{
> -BlockAcctCookie *acct = &req->acct;
> -BlockAcctStats *stats = blk_get_stats(blk);
> -
> -bool is_write = false;
> -
> -trace_pci_nvme_do_aio(nvme_cid(req), req->cmd.opcode,
> -  nvme_io_opc_str(req->cmd.opcode), blk_name(blk),
> -  offset, len);
> -
> -switch (req->cmd.opcode) {
> -case NVME_CMD_FLUSH:
> -block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
> -req->aiocb = blk_aio_flush(blk, nvme_rw_cb, req);
> -break;
> -
> -case NVME_CMD_WRITE_ZEROES:
> -block_acct_start(stats, acct, len, BLOCK_ACCT_WRITE);
> -req->aiocb = blk_aio_pwrite_zeroes(blk, offset, len,
> -   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
> -   req);
> -break;
> -
> -case NVME_CMD_WRITE:
> -is_write = true;
> -
> -/* fallthrough */
> -
> -case NVME_CMD_READ:
> -block_acct_start(stats, acct, len,
> - is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
> -
> -if (req->qsg.sg) {
> -if (is_write) {
> -req->aiocb = dma_blk_write(blk, &req->qsg, offset,
> -   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
> -} else {
> -req->aiocb = dma_blk_read(blk, &req->qsg, offset,
> -  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
> -}
> -} else {
> -if (is_write) {
> -req->aiocb = blk_aio_pwritev(blk, offset, &req->iov, 0,
> - nvme_rw_cb, req);
> -} else {
> -req->aiocb = blk_aio_preadv(blk, offset, &req->iov, 0,
> -nvme_rw_cb, req);
> -}
> -}
> -
> -break;
> -}
> -
> -return NVME_NO_COMPLETE;
> -}
> -
>  static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
>  {
> -NvmeNamespace *ns = req->ns;
> -return nvme_do_aio(ns->blkconf.blk, 0, 0, req);
> +block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> + BLOCK_ACCT_FLUSH);
> +req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
> +return NVME_NO_COMPLETE;
>  }
>  
>  static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
> @@ -1009,7 +953,11 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, 
> NvmeRequest *req)
>  return status;
>  }
>  
> -return nvme_do_aio(ns->blkconf.blk, offset, count, req);
> +block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
> + BLOCK_ACCT_WRITE);
> +req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
> +   BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
> +return NVME_NO_COMPLETE;
>  }
>  
>  static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
> @@ -1023,6 +971,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
>  uint64_t data_offset = nvme_l2b(ns, slba);
>  enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ?
>  BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
> +BlockBackend *blk = ns->blkconf.blk;
>  uint16_t status;
>  
>  trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode),
> @@ -1045,7 +994,25 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
>  goto invalid;
>  }
>  
> -return nvme_do_aio(ns->blkconf.blk, data_offset, data_size, req);
> +block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct);
> +if (req->qsg.sg) {
> +if (acct == BLOCK_ACCT_WRITE) {
> +req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
> +   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
> +} else {
> +req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
> +  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
> +}
> +} else {
> +if (acct ==

RE: [PATCH 5/9] hw/block/nvme: support for admin-only command set

2020-09-30 Thread Dmitry Fomichev

> -----Original Message-----
> From: Keith Busch 
> Sent: Wednesday, September 30, 2020 6:04 PM
> To: qemu-block@nongnu.org; qemu-de...@nongnu.org; Klaus Jensen
> 
> Cc: Niklas Cassel ; Dmitry Fomichev
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Keith Busch 
> Subject: [PATCH 5/9] hw/block/nvme: support for admin-only command set
> 
> Signed-off-by: Keith Busch 
> ---
>  hw/block/nvme.c  | 1 +
>  include/block/nvme.h | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 6c582e6874..ec7363ea40 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2755,6 +2755,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice
> *pci_dev)
>  NVME_CAP_SET_CQR(n->bar.cap, 1);
>  NVME_CAP_SET_TO(n->bar.cap, 0xf);
>  NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
> +NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_ADMIN_ONLY);

This could be

-NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
+NVME_CAP_SET_CSS(n->bar.cap, (NVME_CAP_CSS_NVM | NVME_CAP_CSS_ADMIN_ONLY));

Unfortunately, the parentheses are needed above because the NVME_CAP_SET_CSS
macro and other similar macros use "val" instead of "(val)" in their
expansion. A possible cleanup topic...
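To illustrate the general hazard with an unparenthesized macro argument,
here is a standalone sketch.  The macros, mask and shift are hypothetical
and deliberately chosen so that the difference is visible; they are not
the QEMU definitions, and whether a given call is actually affected
depends on the mask and the operands involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit field at bit 8, for illustration only. */
#define FIELD_MASK  0xfULL
#define FIELD_SHIFT 8

/* Argument used bare: "a | b" expands to "a | b & MASK". */
#define SET_FIELD_UNSAFE(reg, val) \
    ((reg) |= (uint64_t)(val & FIELD_MASK) << FIELD_SHIFT)

/* Argument parenthesized: "a | b" expands to "(a | b) & MASK". */
#define SET_FIELD_SAFE(reg, val) \
    ((reg) |= (uint64_t)((val) & FIELD_MASK) << FIELD_SHIFT)

/* '&' binds tighter than '|', so 0x10 escapes the mask here. */
static uint64_t set_unsafe(void)
{
    uint64_t reg = 0;

    SET_FIELD_UNSAFE(reg, 0x10 | 0x1);
    return reg;
}

/* The whole expression is masked before it is shifted into place. */
static uint64_t set_safe(void)
{
    uint64_t reg = 0;

    SET_FIELD_SAFE(reg, 0x10 | 0x1);
    return reg;
}
```

In the unsafe variant the 0x10 term bypasses the mask and spills into the
neighbouring bits; wrapping the argument at the call site, as suggested
above, or parenthesizing "val" in the macro avoids that.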

>  NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
> 
>  n->bar.vs = NVME_SPEC_VER;
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index bc20a2ba5e..521533fd2a 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -83,7 +83,8 @@ enum NvmeCapMask {
>  << CAP_PMR_SHIFT)
> 
>  enum NvmeCapCss {
> -NVME_CAP_CSS_NVM = 1 << 0,
> +NVME_CAP_CSS_NVM        = 1 << 0,
> +NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
>  };
> 
>  enum NvmeCcShift {
> --
> 2.24.1

RE: [PATCH 8/9] hw/block/nvme: add trace event for requests with non-zero status code

2020-09-30 Thread Dmitry Fomichev

> -----Original Message-----
> From: Keith Busch 
> Sent: Wednesday, September 30, 2020 6:04 PM
> To: qemu-block@nongnu.org; qemu-de...@nongnu.org; Klaus Jensen
> 
> Cc: Niklas Cassel ; Dmitry Fomichev
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Keith Busch 
> Subject: [PATCH 8/9] hw/block/nvme: add trace event for requests with
> non-zero status code
> 
> From: Klaus Jensen 
> 
> If a command results in a non-zero status code, trace it.
> 
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Keith Busch 
> ---
>  hw/block/nvme.c   | 6 ++
>  hw/block/trace-events | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index dc971c9653..16804d0278 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -777,6 +777,12 @@ static void
> nvme_enqueue_req_completion(NvmeCQueue *cq, NvmeRequest *req)
>  assert(cq->cqid == req->sq->cqid);
>  trace_pci_nvme_enqueue_req_completion(nvme_cid(req), cq->cqid,
>req->status);
> +
> +if (req->status) {
> +trace_pci_nvme_err_req_status(nvme_cid(req), nvme_nsid(req->ns),
> +  req->status, req->cmd.opcode);
> +}
> +

Very useful.
Reviewed-by: Dmitry Fomichev 

>  QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
>  QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
>  timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
> 500);
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 180c43d258..ff3ca4bbf6 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -89,6 +89,7 @@ pci_nvme_mmio_shutdown_cleared(void) "shutdown
> bit cleared"
> 
>  # nvme traces for error conditions
>  pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu"
> +pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status,
> uint8_t opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc
> 0x%"PRIx8""
>  pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
>  pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
>  pci_nvme_err_cfs(void) "controller fatal status"
> --
> 2.24.1

RE: [PATCH 2/9] hw/block/nvme: fix log page offset check

2020-09-30 Thread Dmitry Fomichev

> -----Original Message-----
> From: Keith Busch 
> Sent: Wednesday, September 30, 2020 6:04 PM
> To: qemu-block@nongnu.org; qemu-de...@nongnu.org; Klaus Jensen
> 
> Cc: Niklas Cassel ; Dmitry Fomichev
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Keith Busch 
> Subject: [PATCH 2/9] hw/block/nvme: fix log page offset check
> 
> Return an error if the requested offset starts at or beyond the end of
> the log being returned. Also, move the check earlier in the function so
> we're not doing unnecessary calculations.
> 
> Signed-off-by: Keith Busch 

Reviewed-by: Dmitry Fomichev

> ---
>  hw/block/nvme.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index db52ea0db9..8d2b5be567 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1179,6 +1179,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n,
> uint8_t rae, uint32_t buf_len,
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
> 
> +if (off >= sizeof(smart)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
> +}
> +
>  for (int i = 1; i <= n->num_namespaces; i++) {
>  NvmeNamespace *ns = nvme_ns(n, i);
>  if (!ns) {
> @@ -1193,10 +1197,6 @@ static uint16_t nvme_smart_info(NvmeCtrl *n,
> uint8_t rae, uint32_t buf_len,
>  write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
>  }
> 
> -if (off > sizeof(smart)) {
> -return NVME_INVALID_FIELD | NVME_DNR;
> -}
> -
>  trans_len = MIN(sizeof(smart) - off, buf_len);
> 
>  memset(&smart, 0x0, sizeof(smart));
> @@ -1234,12 +1234,11 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n,
> uint32_t buf_len, uint64_t off,
>  .afi = 0x1,
>  };
> 
> -strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
> -
> -if (off > sizeof(fw_log)) {
> +if (off >= sizeof(fw_log)) {
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
> 
> +strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
>  trans_len = MIN(sizeof(fw_log) - off, buf_len);
> 
>  return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
> @@ -1252,16 +1251,15 @@ static uint16_t nvme_error_info(NvmeCtrl *n,
> uint8_t rae, uint32_t buf_len,
>  uint32_t trans_len;
>  NvmeErrorLog errlog;
> 
> -if (!rae) {
> -nvme_clear_events(n, NVME_AER_TYPE_ERROR);
> +if (off >= sizeof(errlog)) {
> +return NVME_INVALID_FIELD | NVME_DNR;
>  }
> 
> -if (off > sizeof(errlog)) {
> -return NVME_INVALID_FIELD | NVME_DNR;
> +if (!rae) {
> +nvme_clear_events(n, NVME_AER_TYPE_ERROR);
>  }
> 
>  memset(&errlog, 0x0, sizeof(errlog));
> -
>  trans_len = MIN(sizeof(errlog) - off, buf_len);
> 
>  return nvme_dma(n, (uint8_t *)&errlog, trans_len,
> --
> 2.24.1

RE: [PATCH 6/9] hw/block/nvme: reject io commands if only admin command set selected

2020-09-30 Thread Dmitry Fomichev

> -----Original Message-----
> From: Keith Busch 
> Sent: Wednesday, September 30, 2020 6:04 PM
> To: qemu-block@nongnu.org; qemu-de...@nongnu.org; Klaus Jensen
> 
> Cc: Niklas Cassel ; Dmitry Fomichev
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Keith Busch 
> Subject: [PATCH 6/9] hw/block/nvme: reject io commands if only admin
> command set selected
> 
> From: Klaus Jensen 
> 
> If the host sets CC.CSS to 111b, all commands submitted to I/O queues
> should be completed with status Invalid Command Opcode.
> 
> Note that this is technically a v1.4 feature, but it does not hurt to
> implement before we finally bump the reported version implemented.
> 
> Signed-off-by: Klaus Jensen 
> Signed-off-by: Keith Busch 

Reviewed-by: Dmitry Fomichev 

> ---
>  hw/block/nvme.c  | 4 
>  include/block/nvme.h | 5 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ec7363ea40..80730e1c03 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1026,6 +1026,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n,
> NvmeRequest *req)
>  trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
>req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
> 
> +if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
> +return NVME_INVALID_OPCODE | NVME_DNR;
> +}
> +
>  if (!nvme_nsid_valid(n, nsid)) {
>  return NVME_INVALID_NSID | NVME_DNR;
>  }
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 521533fd2a..6de2d5aa75 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -115,6 +115,11 @@ enum NvmeCcMask {
>  #define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) &
> CC_IOSQES_MASK)
>  #define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) &
> CC_IOCQES_MASK)
> 
> +enum NvmeCcCss {
> +NVME_CC_CSS_NVM        = 0x0,
> +NVME_CC_CSS_ADMIN_ONLY = 0x7,
> +};
> +
>  enum NvmeCstsShift {
>  CSTS_RDY_SHIFT  = 0,
>  CSTS_CFS_SHIFT  = 1,
> --
> 2.24.1

[PATCH 9/9] hw/block/nvme: report actual LBA data shift in LBAF

2020-09-30 Thread Keith Busch

From: Dmitry Fomichev 

Calculate the data shift value to report based on the set value of
logical_block_size device property.

In the process, use a local variable to calculate the LBA format
index instead of the hardcoded value 0. This makes the code more
readable and it will make it easier to add support for multiple LBA
formats in the future.

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Klaus Jensen 
Signed-off-by: Keith Busch 
---
 hw/block/nvme-ns.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 2ba0263dda..a85e5fdb42 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -47,6 +47,8 @@ static void nvme_ns_init(NvmeNamespace *ns)
 
 static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
 {
+int lba_index;
+
+if (!blkconf_blocksizes(&ns->blkconf, errp)) {
 return -1;
 }
@@ -67,6 +69,9 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, 
Error **errp)
 n->features.vwc = 0x1;
 }
 
+lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
+ns->id_ns.lbaf[lba_index].ds = 31 - clz32(ns->blkconf.logical_block_size);
+
 return 0;
 }
 
-- 
2.24.1
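The expression "31 - clz32(size)" in the patch above is the base-2
logarithm of a power-of-two block size.  A portable sketch of the same
calculation follows; clz32_portable() is a stand-in written for this
example rather than QEMU's clz32(), and lba_data_shift() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of a 32-bit value; returns 32 for zero. */
static int clz32_portable(uint32_t v)
{
    int n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Data shift (log2 of the block size) for a power-of-two block size. */
static uint8_t lba_data_shift(uint32_t logical_block_size)
{
    return (uint8_t)(31 - clz32_portable(logical_block_size));
}
```

A 512-byte logical block gives a shift of 9 and a 4096-byte block gives
12.  The sketch assumes the block size is a non-zero power of two.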

[PATCH 6/9] hw/block/nvme: reject io commands if only admin command set selected

2020-09-30 Thread Keith Busch

From: Klaus Jensen 

If the host sets CC.CSS to 111b, all commands submitted to I/O queues
should be completed with status Invalid Command Opcode.

Note that this is technically a v1.4 feature, but it does not hurt to
implement before we finally bump the reported version implemented.

Signed-off-by: Klaus Jensen 
Signed-off-by: Keith Busch 
---
 hw/block/nvme.c  | 4 
 include/block/nvme.h | 5 +
 2 files changed, 9 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ec7363ea40..80730e1c03 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1026,6 +1026,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
 
+if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
+return NVME_INVALID_OPCODE | NVME_DNR;
+}
+
 if (!nvme_nsid_valid(n, nsid)) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 521533fd2a..6de2d5aa75 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -115,6 +115,11 @@ enum NvmeCcMask {
 #define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK)
 #define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK)
 
+enum NvmeCcCss {
+NVME_CC_CSS_NVM        = 0x0,
+NVME_CC_CSS_ADMIN_ONLY = 0x7,
+};
+
 enum NvmeCstsShift {
 CSTS_RDY_SHIFT  = 0,
 CSTS_CFS_SHIFT  = 1,
-- 
2.24.1
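The gate added by the patch above can be sketched outside QEMU as a
function that decodes CC.CSS and refuses I/O commands when the host has
selected 111b.  The status constants and function names are placeholders
invented for this example, and the 3-bit mask is an assumption; only the
shift of 4 and the values 0x0 and 0x7 come from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

#define CC_CSS_SHIFT 4
#define CC_CSS_MASK  0x7u /* assumed 3-bit field */

enum {
    CC_CSS_NVM        = 0x0,
    CC_CSS_ADMIN_ONLY = 0x7,
};

/* Placeholder status codes, not the QEMU definitions. */
enum {
    ST_SUCCESS        = 0,
    ST_INVALID_OPCODE = 1,
};

/* Extract the selected command set from the CC register value. */
static unsigned cc_css(uint32_t cc)
{
    return (cc >> CC_CSS_SHIFT) & CC_CSS_MASK;
}

/* Status for any command arriving on an I/O queue under this CC value. */
static int io_cmd_gate(uint32_t cc)
{
    if (cc_css(cc) == CC_CSS_ADMIN_ONLY) {
        return ST_INVALID_OPCODE; /* admin-only: no I/O commands at all */
    }
    return ST_SUCCESS;
}
```

The check runs before any per-opcode dispatch, so every I/O opcode is
rejected alike while the admin-only command set is selected.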

[PATCH 7/9] hw/block/nvme: add nsid to get/setfeat trace events

2020-09-30 Thread Keith Busch

From: Klaus Jensen 

Include the namespace id in the pci_nvme_{get,set}feat trace events.

Signed-off-by: Klaus Jensen 
Signed-off-by: Keith Busch 
---
 hw/block/nvme.c   | 4 ++--
 hw/block/trace-events | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 80730e1c03..dc971c9653 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1643,7 +1643,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest 
*req)
 [NVME_ARBITRATION] = NVME_ARB_AB_NOLIMIT,
 };
 
-trace_pci_nvme_getfeat(nvme_cid(req), fid, sel, dw11);
+trace_pci_nvme_getfeat(nvme_cid(req), nsid, fid, sel, dw11);
 
 if (!nvme_feature_support[fid]) {
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -1781,7 +1781,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 uint8_t fid = NVME_GETSETFEAT_FID(dw10);
 uint8_t save = NVME_SETFEAT_SAVE(dw10);
 
-trace_pci_nvme_setfeat(nvme_cid(req), fid, save, dw11);
+trace_pci_nvme_setfeat(nvme_cid(req), nsid, fid, save, dw11);
 
 if (save) {
 return NVME_FID_NOT_SAVEABLE | NVME_DNR;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 7720e1b4d9..180c43d258 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -53,8 +53,8 @@ pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_ns_descr_list(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_get_log(uint16_t cid, uint8_t lid, uint8_t lsp, uint8_t rae, uint32_t 
len, uint64_t off) "cid %"PRIu16" lid 0x%"PRIx8" lsp 0x%"PRIx8" rae 0x%"PRIx8" 
len %"PRIu32" off %"PRIu64""
-pci_nvme_getfeat(uint16_t cid, uint8_t fid, uint8_t sel, uint32_t cdw11) "cid 
%"PRIu16" fid 0x%"PRIx8" sel 0x%"PRIx8" cdw11 0x%"PRIx32""
-pci_nvme_setfeat(uint16_t cid, uint8_t fid, uint8_t save, uint32_t cdw11) "cid 
%"PRIu16" fid 0x%"PRIx8" save 0x%"PRIx8" cdw11 0x%"PRIx32""
+pci_nvme_getfeat(uint16_t cid, uint32_t nsid, uint8_t fid, uint8_t sel, 
uint32_t cdw11) "cid %"PRIu16" nsid 0x%"PRIx32" fid 0x%"PRIx8" sel 0x%"PRIx8" 
cdw11 0x%"PRIx32""
+pci_nvme_setfeat(uint16_t cid, uint32_t nsid, uint8_t fid, uint8_t save, 
uint32_t cdw11) "cid %"PRIu16" nsid 0x%"PRIx32" fid 0x%"PRIx8" save 0x%"PRIx8" 
cdw11 0x%"PRIx32""
 pci_nvme_getfeat_vwcache(const char* result) "get feature volatile write 
cache, result=%s"
 pci_nvme_getfeat_numq(int result) "get feature number of queues, result=%d"
 pci_nvme_setfeat_numq(int reqcq, int reqsq, int gotcq, int gotsq) "requested 
cq_count=%d sq_count=%d, responding with cq_count=%d sq_count=%d"
-- 
2.24.1

[PATCH 8/9] hw/block/nvme: add trace event for requests with non-zero status code

2020-09-30 Thread Keith Busch

From: Klaus Jensen 

If a command results in a non-zero status code, trace it.

Signed-off-by: Klaus Jensen 
Signed-off-by: Keith Busch 
---
 hw/block/nvme.c   | 6 ++
 hw/block/trace-events | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index dc971c9653..16804d0278 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -777,6 +777,12 @@ static void nvme_enqueue_req_completion(NvmeCQueue *cq, 
NvmeRequest *req)
 assert(cq->cqid == req->sq->cqid);
 trace_pci_nvme_enqueue_req_completion(nvme_cid(req), cq->cqid,
   req->status);
+
+if (req->status) {
+trace_pci_nvme_err_req_status(nvme_cid(req), nvme_nsid(req->ns),
+  req->status, req->cmd.opcode);
+}
+
 QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
 QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
 timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 180c43d258..ff3ca4bbf6 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -89,6 +89,7 @@ pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
 
 # nvme traces for error conditions
 pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu"
+pci_nvme_err_req_status(uint16_t cid, uint32_t nsid, uint16_t status, uint8_t 
opc) "cid %"PRIu16" nsid %"PRIu32" status 0x%"PRIx16" opc 0x%"PRIx8""
 pci_nvme_err_addr_read(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_addr_write(uint64_t addr) "addr 0x%"PRIx64""
 pci_nvme_err_cfs(void) "controller fatal status"
-- 
2.24.1

[PATCH 2/9] hw/block/nvme: fix log page offset check

2020-09-30 Thread Keith Busch

Return error if the requested offset starts after the size of the log
being returned. Also, move the check earlier in the function so
we're not doing unnecessary calculations.

Signed-off-by: Keith Busch 
---
 hw/block/nvme.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index db52ea0db9..8d2b5be567 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1179,6 +1179,10 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+if (off >= sizeof(smart)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
 for (int i = 1; i <= n->num_namespaces; i++) {
 NvmeNamespace *ns = nvme_ns(n, i);
 if (!ns) {
@@ -1193,10 +1197,6 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
 }
 
-if (off > sizeof(smart)) {
-return NVME_INVALID_FIELD | NVME_DNR;
-}
-
 trans_len = MIN(sizeof(smart) - off, buf_len);
 
 memset(&smart, 0x0, sizeof(smart));
@@ -1234,12 +1234,11 @@ static uint16_t nvme_fw_log_info(NvmeCtrl *n, uint32_t 
buf_len, uint64_t off,
 .afi = 0x1,
 };
 
-strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
-
-if (off > sizeof(fw_log)) {
+if (off >= sizeof(fw_log)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+strpadcpy((char *)&fw_log.frs1, sizeof(fw_log.frs1), "1.0", ' ');
 trans_len = MIN(sizeof(fw_log) - off, buf_len);
 
 return nvme_dma(n, (uint8_t *) &fw_log + off, trans_len,
@@ -1252,16 +1251,15 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 uint32_t trans_len;
 NvmeErrorLog errlog;
 
-if (!rae) {
-nvme_clear_events(n, NVME_AER_TYPE_ERROR);
+if (off >= sizeof(errlog)) {
+return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-if (off > sizeof(errlog)) {
-return NVME_INVALID_FIELD | NVME_DNR;
+if (!rae) {
+nvme_clear_events(n, NVME_AER_TYPE_ERROR);
 }
 
 memset(&errlog, 0x0, sizeof(errlog));
-
 trans_len = MIN(sizeof(errlog) - off, buf_len);
 
 return nvme_dma(n, (uint8_t *)&errlog, trans_len,
-- 
2.24.1
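
To make the boundary case concrete, here is a minimal standalone sketch of the check this patch moves and tightens. The function name `log_page_xfer_len` and the `LOG_*` status values are hypothetical and not part of QEMU; only the `>=` comparison and the clamp to the remaining bytes mirror the diff above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for illustration; not QEMU's NVME_* values. */
enum { LOG_OK = 0, LOG_INVALID_FIELD = 1 };

/*
 * Validate a log page offset and compute how many bytes to transfer.
 * An offset equal to the log size leaves nothing to return, so it is
 * rejected together with anything larger (">=" rather than ">").
 */
static int log_page_xfer_len(size_t log_size, uint64_t off, uint32_t buf_len,
                             uint32_t *trans_len)
{
    if (off >= log_size) {
        return LOG_INVALID_FIELD;
    }

    /* Safe: off < log_size here, so the subtraction cannot wrap. */
    uint64_t remaining = log_size - off;

    *trans_len = remaining < buf_len ? (uint32_t)remaining : buf_len;
    return LOG_OK;
}
```

With the old `>` comparison, an offset exactly equal to the log size passed the check and produced a zero-length transfer; in this sketch it is rejected instead.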

[PATCH 3/9] hw/block/nvme: support per-namespace smart log

2020-09-30 Thread Keith Busch

Let the user specify a namespace if they want access stats for that
namespace only, rather than the sum over all namespaces.

Signed-off-by: Keith Busch 
---
 hw/block/nvme.c  | 66 +++-
 include/block/nvme.h |  1 +
 2 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8d2b5be567..41389b2b09 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1164,48 +1164,62 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_SUCCESS;
 }
 
+struct nvme_stats {
+uint64_t units_read;
+uint64_t units_written;
+uint64_t read_commands;
+uint64_t write_commands;
+};
+
+static void nvme_set_blk_stats(NvmeNamespace *ns, struct nvme_stats *stats)
+{
+BlockAcctStats *s = blk_get_stats(ns->blkconf.blk);
+
+stats->units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
+stats->units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
+stats->read_commands += s->nr_ops[BLOCK_ACCT_READ];
+stats->write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
+}
+
 static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
 uint64_t off, NvmeRequest *req)
 {
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
-
+struct nvme_stats stats = { 0 };
+NvmeSmartLog smart = { 0 };
 uint32_t trans_len;
+NvmeNamespace *ns;
 time_t current_ms;
-uint64_t units_read = 0, units_written = 0;
-uint64_t read_commands = 0, write_commands = 0;
-NvmeSmartLog smart;
-
-if (nsid && nsid != 0xffffffff) {
-return NVME_INVALID_FIELD | NVME_DNR;
-}
 
 if (off >= sizeof(smart)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-for (int i = 1; i <= n->num_namespaces; i++) {
-NvmeNamespace *ns = nvme_ns(n, i);
-if (!ns) {
-continue;
-}
-
-BlockAcctStats *s = blk_get_stats(ns->blkconf.blk);
+if (nsid != 0xffffffff) {
+ns = nvme_ns(n, nsid);
+if (!ns)
+return NVME_INVALID_NSID | NVME_DNR;
+nvme_set_blk_stats(ns, &stats);
+} else {
+int i;
 
-units_read += s->nr_bytes[BLOCK_ACCT_READ] >> BDRV_SECTOR_BITS;
-units_written += s->nr_bytes[BLOCK_ACCT_WRITE] >> BDRV_SECTOR_BITS;
-read_commands += s->nr_ops[BLOCK_ACCT_READ];
-write_commands += s->nr_ops[BLOCK_ACCT_WRITE];
+for (i = 1; i <= n->num_namespaces; i++) {
+ns = nvme_ns(n, i);
+if (!ns) {
+continue;
+}
+nvme_set_blk_stats(ns, &stats);
+}
 }
 
 trans_len = MIN(sizeof(smart) - off, buf_len);
 
-memset(&smart, 0x0, sizeof(smart));
-
-smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(units_read, 1000));
-smart.data_units_written[0] = cpu_to_le64(DIV_ROUND_UP(units_written,
+smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
+1000));
+smart.data_units_written[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_written,
1000));
-smart.host_read_commands[0] = cpu_to_le64(read_commands);
-smart.host_write_commands[0] = cpu_to_le64(write_commands);
+smart.host_read_commands[0] = cpu_to_le64(stats.read_commands);
+smart.host_write_commands[0] = cpu_to_le64(stats.write_commands);
 
 smart.temperature = cpu_to_le16(n->temperature);
 
@@ -2708,7 +2722,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 id->acl = 3;
 id->aerl = n->params.aerl;
 id->frmw = (NVME_NUM_FW_SLOTS << 1) | NVME_FRMW_SLOT1_RO;
-id->lpa = NVME_LPA_EXTENDED;
+id->lpa = NVME_LPA_NS_SMART | NVME_LPA_EXTENDED;
 
 /* recommended default value (~70 C) */
 id->wctemp = cpu_to_le16(NVME_TEMPERATURE_WARNING);
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 58647bcdad..868cf53f0b 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -849,6 +849,7 @@ enum NvmeIdCtrlFrmw {
 };
 
 enum NvmeIdCtrlLpa {
+NVME_LPA_NS_SMART = 1 << 0,
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
-- 
2.24.1
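
The selection logic in this patch can be sketched outside QEMU as follows. All types and names here (`struct ns_acct`, `collect_smart_stats`, `BROADCAST_NSID`) are hypothetical stand-ins for `NvmeNamespace`, `BlockAcctStats` and the code in `nvme_smart_info()`; the sketch also assumes 512-byte sectors (a shift of 9, matching `BDRV_SECTOR_BITS`) and a simple array of namespaces indexed by `nsid - 1`.

```c
#include <stddef.h>
#include <stdint.h>

#define BROADCAST_NSID 0xffffffffu  /* "all namespaces" */
#define SECTOR_BITS    9            /* assumed 512-byte sectors */

/* Hypothetical stand-in for one namespace's block accounting data. */
struct ns_acct {
    int present;
    uint64_t bytes_read, bytes_written;
    uint64_t read_ops, write_ops;
};

struct smart_stats {
    uint64_t units_read, units_written;
    uint64_t read_commands, write_commands;
};

/* Accumulate one namespace's counters into the running totals. */
static void add_ns_stats(const struct ns_acct *ns, struct smart_stats *st)
{
    st->units_read += ns->bytes_read >> SECTOR_BITS;
    st->units_written += ns->bytes_written >> SECTOR_BITS;
    st->read_commands += ns->read_ops;
    st->write_commands += ns->write_ops;
}

/* Returns 0 on success, -1 if nsid does not name an existing namespace. */
static int collect_smart_stats(const struct ns_acct *nss, uint32_t num_ns,
                               uint32_t nsid, struct smart_stats *st)
{
    if (nsid != BROADCAST_NSID) {
        if (nsid == 0 || nsid > num_ns || !nss[nsid - 1].present) {
            return -1;
        }
        add_ns_stats(&nss[nsid - 1], st);
        return 0;
    }

    /* Broadcast NSID: sum over every namespace that exists. */
    for (uint32_t i = 0; i < num_ns; i++) {
        if (nss[i].present) {
            add_ns_stats(&nss[i], st);
        }
    }
    return 0;
}
```

Factoring the accumulation into one helper, as the patch does with `nvme_set_blk_stats()`, keeps the single-namespace and broadcast paths from drifting apart.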

[PATCH 4/9] hw/block/nvme: validate command set selected

2020-09-30 Thread Keith Busch

Fail to start the controller if the user requests a command set that the
controller does not support.

Signed-off-by: Keith Busch 
---
 hw/block/nvme.c   | 6 +-
 hw/block/trace-events | 1 +
 include/block/nvme.h  | 4 
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 41389b2b09..6c582e6874 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2049,6 +2049,10 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 trace_pci_nvme_err_startfail_acq_misaligned(n->bar.acq);
 return -1;
 }
+if (unlikely(!(NVME_CAP_CSS(n->bar.cap) & (1 << NVME_CC_CSS(n->bar.cc))))) {
+trace_pci_nvme_err_startfail_css(NVME_CC_CSS(n->bar.cc));
+return -1;
+}
 if (unlikely(NVME_CC_MPS(n->bar.cc) <
  NVME_CAP_MPSMIN(n->bar.cap))) {
 trace_pci_nvme_err_startfail_page_too_small(
@@ -2750,7 +2754,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
-NVME_CAP_SET_CSS(n->bar.cap, 1);
+NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
 NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
 n->bar.vs = NVME_SPEC_VER;
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 446cca08e9..7720e1b4d9 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -133,6 +133,7 @@ pci_nvme_err_startfail_cqent_too_small(uint8_t log2ps, 
uint8_t maxlog2ps) "nvme_
 pci_nvme_err_startfail_cqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) 
"nvme_start_ctrl failed because the completion queue entry size is too large: 
log2size=%u, max=%u"
 pci_nvme_err_startfail_sqent_too_small(uint8_t log2ps, uint8_t maxlog2ps) 
"nvme_start_ctrl failed because the submission queue entry size is too small: 
log2size=%u, min=%u"
 pci_nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) 
"nvme_start_ctrl failed because the submission queue entry size is too large: 
log2size=%u, max=%u"
+pci_nvme_err_startfail_css(uint8_t css) "nvme_start_ctrl failed because 
invalid command set selected:%u"
 pci_nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because 
the admin submission queue size is zero"
 pci_nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because 
the admin completion queue size is zero"
 pci_nvme_err_startfail(void) "setting controller enable bit failed"
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 868cf53f0b..bc20a2ba5e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -82,6 +82,10 @@ enum NvmeCapMask {
 #define NVME_CAP_SET_PMRS(cap, val) (cap |= (uint64_t)(val & CAP_PMR_MASK)\
 << CAP_PMR_SHIFT)
 
+enum NvmeCapCss {
+NVME_CAP_CSS_NVM = 1 << 0,
+};
+
 enum NvmeCcShift {
 CC_EN_SHIFT = 0,
 CC_CSS_SHIFT= 4,
-- 
2.24.1
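
The check added to nvme_start_ctrl() relies on CAP.CSS being a bitmask of supported command sets while CC.CSS holds the index of the one the host selected. A minimal sketch of that relationship, assuming the 3-bit CC.CSS field from the NVMe specification and using a hypothetical helper name `css_selection_valid`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit within the CAP.CSS field, matching the enum added by this patch. */
enum { CAP_CSS_NVM = 1 << 0 };

/*
 * Return true if the command set selected in CC.CSS is advertised in
 * CAP.CSS. CC.CSS is treated as a 3-bit index (0..7), so the shift
 * always stays within the 8-bit CAP.CSS mask.
 */
static bool css_selection_valid(uint8_t cap_css, uint8_t cc_css)
{
    return (cap_css & (1u << (cc_css & 0x7))) != 0;
}
```

With only `CAP_CSS_NVM` advertised, a host writing 0 to CC.CSS passes the check and any other value fails it. Advertising an additional bit, such as bit 7, makes the matching CC.CSS value acceptable as well.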

[PATCH 0/9] nvme qemu cleanups and fixes

2020-09-30 Thread Keith Busch

After going through the zns enabling, I noticed the controller enabling
is not correct. Then I just continued making more stuff. The series, I
think, contains some of the less controversial patches from the two
conflicting zns series, preceded by some cleanups and fixes from me.

If this is all fine, I took the liberty of porting the zns enabling to
it and made a public branch for consideration here:

 http://git.infradead.org/qemu-nvme.git/shortlog/refs/heads/kb-zns 

Dmitry Fomichev (1):
  hw/block/nvme: report actual LBA data shift in LBAF

Keith Busch (5):
  hw/block/nvme: remove pointless rw indirection
  hw/block/nvme: fix log page offset check
  hw/block/nvme: support per-namespace smart log
  hw/block/nvme: validate command set selected
  hw/block/nvme: support for admin-only command set

Klaus Jensen (3):
  hw/block/nvme: reject io commands if only admin command set selected
  hw/block/nvme: add nsid to get/setfeat trace events
  hw/block/nvme: add trace event for requests with non-zero status code

 hw/block/nvme-ns.c|   5 ++
 hw/block/nvme.c   | 194 --
 hw/block/trace-events |   6 +-
 include/block/nvme.h  |  11 +++
 4 files changed, 114 insertions(+), 102 deletions(-)

-- 
2.24.1

[PATCH 5/9] hw/block/nvme: support for admin-only command set

2020-09-30 Thread Keith Busch

Signed-off-by: Keith Busch 
---
 hw/block/nvme.c  | 1 +
 include/block/nvme.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6c582e6874..ec7363ea40 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2755,6 +2755,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 NVME_CAP_SET_CQR(n->bar.cap, 1);
 NVME_CAP_SET_TO(n->bar.cap, 0xf);
 NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_NVM);
+NVME_CAP_SET_CSS(n->bar.cap, NVME_CAP_CSS_ADMIN_ONLY);
 NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 
 n->bar.vs = NVME_SPEC_VER;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index bc20a2ba5e..521533fd2a 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -83,7 +83,8 @@ enum NvmeCapMask {
 << CAP_PMR_SHIFT)
 
 enum NvmeCapCss {
-NVME_CAP_CSS_NVM = 1 << 0,
+NVME_CAP_CSS_NVM= 1 << 0,
+NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
 };
 
 enum NvmeCcShift {
-- 
2.24.1

[PATCH 1/9] hw/block/nvme: remove pointless rw indirection

2020-09-30 Thread Keith Busch

The code switches on the opcode to invoke a function specific to that
opcode. There's no point in consolidating back to a common function that
just switches on that same opcode without any actual common code.
Restore the opcode specific behavior without going back through another
level of switches.

Signed-off-by: Keith Busch 
---
 hw/block/nvme.c | 91 -
 1 file changed, 29 insertions(+), 62 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index da8344f196..db52ea0db9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -927,68 +927,12 @@ static void nvme_rw_cb(void *opaque, int ret)
 nvme_enqueue_req_completion(nvme_cq(req), req);
 }
 
-static uint16_t nvme_do_aio(BlockBackend *blk, int64_t offset, size_t len,
-NvmeRequest *req)
-{
-BlockAcctCookie *acct = &req->acct;
-BlockAcctStats *stats = blk_get_stats(blk);
-
-bool is_write = false;
-
-trace_pci_nvme_do_aio(nvme_cid(req), req->cmd.opcode,
-  nvme_io_opc_str(req->cmd.opcode), blk_name(blk),
-  offset, len);
-
-switch (req->cmd.opcode) {
-case NVME_CMD_FLUSH:
-block_acct_start(stats, acct, 0, BLOCK_ACCT_FLUSH);
-req->aiocb = blk_aio_flush(blk, nvme_rw_cb, req);
-break;
-
-case NVME_CMD_WRITE_ZEROES:
-block_acct_start(stats, acct, len, BLOCK_ACCT_WRITE);
-req->aiocb = blk_aio_pwrite_zeroes(blk, offset, len,
-   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
-   req);
-break;
-
-case NVME_CMD_WRITE:
-is_write = true;
-
-/* fallthrough */
-
-case NVME_CMD_READ:
-block_acct_start(stats, acct, len,
- is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
-
-if (req->qsg.sg) {
-if (is_write) {
-req->aiocb = dma_blk_write(blk, &req->qsg, offset,
-   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
-} else {
-req->aiocb = dma_blk_read(blk, &req->qsg, offset,
-  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
-}
-} else {
-if (is_write) {
-req->aiocb = blk_aio_pwritev(blk, offset, &req->iov, 0,
- nvme_rw_cb, req);
-} else {
-req->aiocb = blk_aio_preadv(blk, offset, &req->iov, 0,
-nvme_rw_cb, req);
-}
-}
-
-break;
-}
-
-return NVME_NO_COMPLETE;
-}
-
 static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 {
-NvmeNamespace *ns = req->ns;
-return nvme_do_aio(ns->blkconf.blk, 0, 0, req);
+block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
+ BLOCK_ACCT_FLUSH);
+req->aiocb = blk_aio_flush(n->conf.blk, nvme_rw_cb, req);
+return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
@@ -1009,7 +953,11 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, 
NvmeRequest *req)
 return status;
 }
 
-return nvme_do_aio(ns->blkconf.blk, offset, count, req);
+block_acct_start(blk_get_stats(n->conf.blk), &req->acct, 0,
+ BLOCK_ACCT_WRITE);
+req->aiocb = blk_aio_pwrite_zeroes(n->conf.blk, offset, count,
+   BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
+return NVME_NO_COMPLETE;
 }
 
 static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
@@ -1023,6 +971,7 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 uint64_t data_offset = nvme_l2b(ns, slba);
 enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ?
 BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
+BlockBackend *blk = ns->blkconf.blk;
 uint16_t status;
 
 trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode),
@@ -1045,7 +994,25 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 goto invalid;
 }
 
-return nvme_do_aio(ns->blkconf.blk, data_offset, data_size, req);
+block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct);
+if (req->qsg.sg) {
+if (acct == BLOCK_ACCT_WRITE) {
+req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
+   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
+  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+}
+} else {
+if (acct == BLOCK_ACCT_WRITE) {
+req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
+ nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0,
+nvme_rw_cb, req);
+}
+}
+return NVME_NO_COMPLETE;
 
 invalid:

Re: [PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Paolo Bonzini

On Wed, 30 Sep 2020 at 20:57, Alex Bennée  wrote:

> > 1-8 is fine, but I think 9-11 is too much complication (especially not
> > really future-proof) for the benefit.
>
> Isn't qdev considered an internal API for our object and device lifetime
> handling (which should be shared) versus QAPI which only exists for
> system emulation and tool integration?
>

qdev is nothing more than a bunch of QOM classes, and QAPI is an integral
part of QOM (through properties, which are used when setting up CPUs in
user-mode emulation).

Therefore, even though most of the QAPI schema is specific to system
emulation and tools, a small part is used by common code.

Paolo

>

Re: [PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Alex Bennée

Paolo Bonzini  writes:

> On 30/09/20 19:15, Eduardo Habkost wrote:
>> On Wed, Sep 30, 2020 at 06:49:38PM +0200, Philippe Mathieu-Daudé wrote:
>>> This is the third part of a series reducing user-mode
>>> dependencies. By stripping out unused code, the build
>>> and testing time is reduced (as is space used by objects).
>> I'm queueing patches 2-9 on machine-next.  Thanks!
>> 
>> Markus, Eric: I can merge the QAPI patches (1, 11) if I get an
>> Acked-by.
>> 
>> I'll send separate comments on patch 10.
>> 
>
> 1-8 is fine, but I think 9-11 is too much complication (especially not
> really future-proof) for the benefit.

Isn't qdev considered an internal API for our object and device lifetime
handling (which should be shared) versus QAPI which only exists for
system emulation and tool integration? That is of course assuming
libvirt is never going to want to know about linux-user emulation?

>
> Paolo

-- 
Alex Bennée

Re: [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace Command Set

2020-09-30 Thread Klaus Jensen

On Sep 30 14:50, Niklas Cassel wrote:
> On Mon, Sep 28, 2020 at 11:35:23AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> > 
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> > 
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> > 
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> > 
> > The code to support Zone Descriptor Extensions is not included in
> > this commit and ZDES 0 is always reported. A later commit in this
> > series will add ZDE support.
> > 
> > This commit doesn't yet include checks for active and open zone
> > limits. It is assumed that there are no limits on either active or
> > open zones.
> > 
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > ---
> >  block/nvme.c |   2 +-
> >  hw/block/nvme-ns.c   | 185 -
> >  hw/block/nvme-ns.h   |   6 +-
> >  hw/block/nvme.c  | 872 +--
> >  include/block/nvme.h |   6 +-
> >  5 files changed, 1033 insertions(+), 38 deletions(-)
> > 
> > diff --git a/block/nvme.c b/block/nvme.c
> > index 05485fdd11..7a513c9a17 100644
> > --- a/block/nvme.c
> > +++ b/block/nvme.c
> > @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
> >  {
> >  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
> >  if (status) {
> > -trace_nvme_error(le32_to_cpu(c->result),
> > +trace_nvme_error(le32_to_cpu(c->result32),
> >   le16_to_cpu(c->sq_head),
> >   le16_to_cpu(c->sq_id),
> >   le16_to_cpu(c->cid),
> > diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> > index 31b7f986c3..6d9dc9205b 100644
> > --- a/hw/block/nvme-ns.c
> > +++ b/hw/block/nvme-ns.c
> > @@ -33,14 +33,14 @@ static void nvme_ns_init(NvmeNamespace *ns)
> >  NvmeIdNs *id_ns = &ns->id_ns;
> >  
> >  if (blk_get_flags(ns->blkconf.blk) & BDRV_O_UNMAP) {
> > -ns->id_ns.dlfeat = 0x9;
> > +ns->id_ns.dlfeat = 0x8;
> 
> You seem to change something that is NVM namespace specific here, why?
> If it is indeed needed, I assume that this change should be in a separate
> patch.
> 

Stood out to me as well - and I thought it was sound enough, but now I'm
not so sure.

DLFEAT is set to 0x8, which only signifies that Deallocate in Write
Zeroes is supported. Previously, it would also signify that returned
values would be 0x00 (DLFEAT 0x8 | 0x1). But since Dmitry added the
fill_pattern parameter...


> > +static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int 
> > lba_index,
> > +  Error **errp)
> > +{
> > +NvmeIdNsZoned *id_ns_z;
> > +
> > +if (n->params.fill_pattern == 0) {
> > +ns->id_ns.dlfeat |= 0x01;
> > +} else if (n->params.fill_pattern == 0xff) {
> > +ns->id_ns.dlfeat |= 0x02;
> > +}

... then, when initialized, we look at the fill_pattern and set DLFEAT
accordingly instead.

But since fill_pattern only works for ZNS namespaces, I think dlfeat
should still be 0x9 for NVM namespaces. For NVM namespaces, since
neither DULBE nor DSM is supported, there is really only Write Zeroes
that can explicitly "deallocate" a block, and since that *will* write
zeroes no matter if DEAC is set or not, 0x00 pattern is guaranteed.
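
As a rough illustration of the suggestion above (not code from either series), the DLFEAT value could be derived as below. The helper `dlfeat_for` and the `DLFEAT_*` names are hypothetical; the bit values are the ones discussed in this thread (0x8 for Deallocate in Write Zeroes, 0x1 for deallocated blocks reading as 0x00, 0x2 for 0xff). The sketch assumes the backend has unmap enabled, which is the only case where the original code sets dlfeat at all.

```c
#include <stdint.h>

/* Hypothetical names for the DLFEAT bits mentioned in the discussion. */
enum {
    DLFEAT_READ_ZEROES          = 0x01, /* deallocated blocks read as 0x00 */
    DLFEAT_READ_ONES            = 0x02, /* deallocated blocks read as 0xff */
    DLFEAT_WRITE_ZEROES_DEALLOC = 0x08, /* Deallocate in Write Zeroes */
};

/*
 * NVM namespaces keep the previous 0x9. Zoned namespaces report the
 * read pattern according to fill_pattern, or none if it is neither
 * 0x00 nor 0xff.
 */
static uint8_t dlfeat_for(int zoned, uint8_t fill_pattern)
{
    uint8_t dlfeat = DLFEAT_WRITE_ZEROES_DEALLOC;

    if (!zoned) {
        return dlfeat | DLFEAT_READ_ZEROES;
    }
    if (fill_pattern == 0x00) {
        dlfeat |= DLFEAT_READ_ZEROES;
    } else if (fill_pattern == 0xff) {
        dlfeat |= DLFEAT_READ_ONES;
    }
    return dlfeat;
}
```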



Re: [PATCH 0/4] assorted gcc 10/fedora32 compile warning fixes

2020-09-30 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20200930155859.303148-1-borntrae...@de.ibm.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20200930155859.303148-1-borntrae...@de.ibm.com
Subject: [PATCH 0/4] assorted gcc 10/fedora32 compile warning fixes

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20200930043150.1454766-1-js...@redhat.com -> 
patchew/20200930043150.1454766-1-js...@redhat.com
 - [tag update]  patchew/20200930095321.2006-1-zhaolich...@huawei.com -> 
patchew/20200930095321.2006-1-zhaolich...@huawei.com
 - [tag update]  patchew/20200930151616.3588165-1-mky...@tachyum.com -> 
patchew/20200930151616.3588165-1-mky...@tachyum.com
 - [tag update]  patchew/20200930155859.303148-1-borntrae...@de.ibm.com -> 
patchew/20200930155859.303148-1-borntrae...@de.ibm.com
 * [new tag] patchew/20200930164949.1425294-1-phi...@redhat.com -> 
patchew/20200930164949.1425294-1-phi...@redhat.com
fatal: failed to write ref-pack file
fatal: The remote end hung up unexpectedly
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 53, in git_clone_repo
subprocess.check_call(clone_cmd, stderr=logf, stdout=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, 
in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'clone', '-q', 
'/home/patchew/.cache/patchew-git-cache/httpsgithubcompatchewprojectqemu-3c8cf5a9c21ff8782164d1def7f44bd888713384',
 '/var/tmp/patchew-tester-tmp-529h9bha/src']' returned non-zero exit status 128.



The full log is available at
http://patchew.org/logs/20200930155859.303148-1-borntrae...@de.ibm.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Eduardo Habkost

On Wed, Sep 30, 2020 at 07:24:24PM +0200, Paolo Bonzini wrote:
> On 30/09/20 19:15, Eduardo Habkost wrote:
> > On Wed, Sep 30, 2020 at 06:49:38PM +0200, Philippe Mathieu-Daudé wrote:
> >> This is the third part of a series reducing user-mode
> >> dependencies. By stripping out unused code, the build
> >> and testing time is reduced (as is space used by objects).
> > I'm queueing patches 2-9 on machine-next.  Thanks!
> > 
> > Markus, Eric: I can merge the QAPI patches (1, 11) if I get an
> > Acked-by.
> > 
> > I'll send separate comments on patch 10.
> > 
> 
> 1-8 is fine, but I think 9-11 is too much complication (especially not
> really future-proof) for the benefit.

I'll dequeue patch 9 while this is discussed.

-- 
Eduardo

Re: [PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Paolo Bonzini

On 30/09/20 19:15, Eduardo Habkost wrote:
> On Wed, Sep 30, 2020 at 06:49:38PM +0200, Philippe Mathieu-Daudé wrote:
>> This is the third part of a series reducing user-mode
>> dependencies. By stripping out unused code, the build
>> and testing time is reduced (as is space used by objects).
> I'm queueing patches 2-9 on machine-next.  Thanks!
> 
> Markus, Eric: I can merge the QAPI patches (1, 11) if I get an
> Acked-by.
> 
> I'll send separate comments on patch 10.
> 

1-8 is fine, but I think 9-11 is too much complication (especially not
really future-proof) for the benefit.

Paolo

Re: [PATCH 2/4] nbd: silence maybe-uninitialized warnings

2020-09-30 Thread Eric Blake

On 9/30/20 10:58 AM, Christian Borntraeger wrote:
> gcc 10 from Fedora 32 gives me:
> 
> Compiling C object libblock.fa.p/nbd_server.c.o
> ../nbd/server.c: In function ‘nbd_co_client_start’:
> ../nbd/server.c:625:14: error: ‘namelen’ may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>   625 | rc = nbd_negotiate_send_info(client, NBD_INFO_NAME, namelen, 
> name,
>   |  
> ^
>   626 |  errp);
>   |  ~
> ../nbd/server.c:564:14: note: ‘namelen’ was declared here
>   564 | uint32_t namelen;
>   |  ^~~
> cc1: all warnings being treated as errors
> 
> As I cannot see how this can happen, let us silence the warning.

gcc is smart enough to see that nbd_opt_read_name(... &namelen), which
is the only use of namelen between declaration and use, does not always
initialize namelen; but fails to see we also exit this function early in
the same conditions when nbd_opt_read_name left namelen uninit.  The
workaround is fine.

Reviewed-by: Eric Blake 

I'm happy for this to go in through the trivial tree, but I'll also
queue it on my NBD tree if that is ready first.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v7 06/13] qmp: Call monitor_set_cur() only in qmp_dispatch()

2020-09-30 Thread Dr. David Alan Gilbert

* Kevin Wolf (kw...@redhat.com) wrote:
> On 30.09.2020 at 15:14, Markus Armbruster wrote:
> > Kevin Wolf  writes:
> > 
> > > On 30.09.2020 at 11:26, Markus Armbruster wrote:
> > >> Kevin Wolf  writes:
> > >> 
> > >> > On 28.09.2020 at 13:42, Markus Armbruster wrote:
> > >> >> Kevin Wolf  writes:
> > >> >> 
> > >> >> > On 14.09.2020 at 17:10, Markus Armbruster wrote:
> > >> >> >> Kevin Wolf  writes:
> > [...]
> > >> >> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > >> >> >> > index 8469970c69..922fdb5541 100644
> > >> >> >> > --- a/monitor/qmp.c
> > >> >> >> > +++ b/monitor/qmp.c
> > >> >> >> > @@ -135,16 +135,10 @@ static void monitor_qmp_respond(MonitorQMP 
> > >> >> >> > *mon, QDict *rsp)
> > >> >> >> >  
> > >> >> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
> > >> >> >> >  {
> > >> >> >> > -Monitor *old_mon;
> > >> >> >> >  QDict *rsp;
> > >> >> >> >  QDict *error;
> > >> >> >> >  
> > >> >> >> > -old_mon = monitor_set_cur(&mon->common);
> > >> >> >> > -assert(old_mon == NULL);
> > >> >> >> > -
> > >> >> >> > -rsp = qmp_dispatch(mon->commands, req, 
> > >> >> >> > qmp_oob_enabled(mon));
> > >> >> >> > -
> > >> >> >> > -monitor_set_cur(NULL);
> > >> >> >> > +rsp = qmp_dispatch(mon->commands, req, 
> > >> >> >> > qmp_oob_enabled(mon), &mon->common);
> > >> >> >> 
> > >> >> >> Long line.  Happy to wrap it in my tree.  A few more in PATCH 
> > >> >> >> 08-11.
> > >> >> >
> > >> >> > It's 79 characters. Should be fine even with your local deviation 
> > >> >> > from
> > >> >> > the coding style to require less than that for comments?
> > >> >> 
> > >> >> Let me rephrase my remark.
> > >> >> 
> > >> >> For me,
> > >> >> 
> > >> >> rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
> > >> >>                    &mon->common);
> > >> >> 
> > >> >> is significantly easier to read than
> > >> >> 
> > >> >> rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), 
> > >> >> &mon->common);
> > >> >
> > >> > I guess this is highly subjective. I find wrapped lines harder to read.
> > >> > For answering subjective questions like this, we generally use the
> > >> > coding style document.
> > >> >
> > >> > Anyway, I guess following an idiosyncratic coding style that is
> > >> > different from every other subsystem in QEMU is possible (if
> > >> > inconvenient) if I know what it is.
> > >> 
> > >> The applicable coding style document is PEP 8.
> > >
> > > I'll happily apply PEP 8 to Python code, but this is C. I don't think
> > > PEP 8 applies very well to C code. (In fact, PEP 7 exists as a C style
> > > guide, but we're not writing C code for the Python project here...)
> > 
> > I got confused (too much Python code review), my apologies.
> > 
> > >> > My problem is more that I don't know what the exact rules are. Can they
> > >> > only be figured out experimentally by submitting patches and seeing
> > >> > whether you like them or not?
> > >> 
> > >> PEP 8:
> > >> 
> > >> A style guide is about consistency.  Consistency with this style
> > >> guide is important.  Consistency within a project is more important.
> > >> Consistency within one module or function is the most important.
> > >> 
> > >> In other words, you should make a reasonable effort to blend in.
> > >
> > > The project style guide for C is defined in CODING_STYLE.rst. Missing
> > > consistency with it is what I'm complaining about.
> > >
> > > I also agree that consistency within one module or function is most
> > > important, which is why I allow you to reformat my code. But I don't
> > > think it means that local coding style rules shouldn't be documented,
> > > especially if you can't just look at the code and see immediately how
> > > it's supposed to be.
> > >
> > >> >> Would you mind me wrapping this line in my tree?
> > >> >
> > >> > I have no say in this subsystem and I take it that you want all code to
> > >> > look as if you had written it yourself, so do as you wish.
> > >> 
> > >> I'm refusing the bait.
> > >> 
> > >> > But I understand that I'll have to respin anyway, so if you could
> > >> > explain what you're after, I might be able to apply the rules for the
> > >> > next version of the series.
> > >> 
> > >> First, PEP 8 again:
> > >> 
> > >> Limit all lines to a maximum of 79 characters.
> > >> 
> > >> For flowing long blocks of text with fewer structural restrictions
> > >> (docstrings or comments), the line length should be limited to 72
> > >> characters.
> > >
> > > Ok, that's finally clear limits at least.
> > >
> > > Any other rules from PEP 8 that you want to see applied to C code?
> > 
> > PEP 8 does not apply to C.
> > 
> > > Would you mind documenting this somewhere?
> > >
> > >> Second, an argument we two had on this list, during review of a prior
> > >> version of this patch series, talking about C:
> > >> 
> > >> Legibility.  Humans tend to have trouble following long lines with
> > >>

Re: [PATCH v3 10/11] target/i386: Restrict X86CPUFeatureWord to X86 targets

2020-09-30 Thread Eduardo Habkost

On Wed, Sep 30, 2020 at 06:49:48PM +0200, Philippe Mathieu-Daudé wrote:
> Only qemu-system-FOO and qemu-storage-daemon provide QMP
> monitors, therefore such declarations and definitions are
> irrelevant for user-mode emulation.
> 
> Restricting the x86-specific commands to machine-target.json
> pulls less QAPI-generated code into user-mode.
> 
> Add a stub to satisfy linking in user-mode:
> 
>   /usr/bin/ld: libqemu-i386-linux-user.fa.p/target_i386_cpu.c.o: in function `x86_cpu_get_feature_words':
>   target/i386/cpu.c:4643: undefined reference to `visit_type_X86CPUFeatureWordInfoList'
>   collect2: error: ld returned 1 exit status
>   make: *** [Makefile.ninja:1125: qemu-i386] Error 1
> 

If you don't want the QAPI definitions in user mode, there's no
reason to register the properties in user mode.  Wrapping #ifdef
around "feature-words" and "filtered-features" registration would
be simpler than adding a stub.
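
Roughly what I have in mind, as a self-contained sketch rather than a
patch against the tree.  register_prop(), has_prop() and
x86_cpu_register_props_sketch() are hypothetical stand-ins for the real
property registration code in target/i386/cpu.c, and "family" is only
there as an example of a property that stays in both builds.
CONFIG_USER_ONLY is the macro QEMU already defines for user-mode
builds.

```c
#include <string.h>

/* Build with -DCONFIG_USER_ONLY to mimic a user-mode build. */

#define MAX_PROPS 8

/* Hypothetical stand-in for the object's property table. */
static const char *props[MAX_PROPS];
static int nb_props;

/* Hypothetical stand-in for the real property registration call. */
static void register_prop(const char *name)
{
    if (nb_props < MAX_PROPS) {
        props[nb_props++] = name;
    }
}

/* Return 1 if a property with this name has been registered. */
static int has_prop(const char *name)
{
    for (int i = 0; i < nb_props; i++) {
        if (strcmp(props[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

static void x86_cpu_register_props_sketch(void)
{
    nb_props = 0;

    /* Registered in every build. */
    register_prop("family");

#ifndef CONFIG_USER_ONLY
    /*
     * Only the getter behind these two properties needs
     * visit_type_X86CPUFeatureWordInfoList(), so leaving them out of
     * user-mode builds removes the reference and the need for a stub.
     */
    register_prop("feature-words");
    register_prop("filtered-features");
#endif
}
```

In the real code the getter function would need the same #ifdef around
it, otherwise the compiler will complain about an unused static
function.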

> Acked-by: Richard Henderson 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v3: Reworded + Meson rebase
> ---
>  qapi/machine-target.json   | 45 ++
>  qapi/machine.json  | 42 ---
>  target/i386/cpu.c  |  2 +-
>  target/i386/feature-stub.c | 23 +++
>  target/i386/meson.build|  1 +
>  5 files changed, 70 insertions(+), 43 deletions(-)
>  create mode 100644 target/i386/feature-stub.c
> 
> diff --git a/qapi/machine-target.json b/qapi/machine-target.json
> index 698850cc78..b4d769a53b 100644
> --- a/qapi/machine-target.json
> +++ b/qapi/machine-target.json
> @@ -4,6 +4,51 @@
>  # This work is licensed under the terms of the GNU GPL, version 2 or later.
>  # See the COPYING file in the top-level directory.
>  
> +##
> +# @X86CPURegister32:
> +#
> +# A X86 32-bit register
> +#
> +# Since: 1.5
> +##
> +{ 'enum': 'X86CPURegister32',
> +  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ],
> +  'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @X86CPUFeatureWordInfo:
> +#
> +# Information about a X86 CPU feature word
> +#
> +# @cpuid-input-eax: Input EAX value for CPUID instruction for that feature word
> +#
> +# @cpuid-input-ecx: Input ECX value for CPUID instruction for that
> +#   feature word
> +#
> +# @cpuid-register: Output register containing the feature bits
> +#
> +# @features: value of output register, containing the feature bits
> +#
> +# Since: 1.5
> +##
> +{ 'struct': 'X86CPUFeatureWordInfo',
> +  'data': { 'cpuid-input-eax': 'int',
> +'*cpuid-input-ecx': 'int',
> +'cpuid-register': 'X86CPURegister32',
> +'features': 'int' },
> +  'if': 'defined(TARGET_I386)' }
> +
> +##
> +# @DummyForceArrays:
> +#
> +# Not used by QMP; hack to let us use X86CPUFeatureWordInfoList internally
> +#
> +# Since: 2.5
> +##
> +{ 'struct': 'DummyForceArrays',
> +  'data': { 'unused': ['X86CPUFeatureWordInfo'] },
> +  'if': 'defined(TARGET_I386)' }
> +
>  ##
>  # @CpuModelInfo:
>  #
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 72f014bb5b..cb878acdac 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -544,48 +544,6 @@
> 'dst': 'uint16',
> 'val': 'uint8' }}
>  
> -##
> -# @X86CPURegister32:
> -#
> -# A X86 32-bit register
> -#
> -# Since: 1.5
> -##
> -{ 'enum': 'X86CPURegister32',
> -  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ] }
> -
> -##
> -# @X86CPUFeatureWordInfo:
> -#
> -# Information about a X86 CPU feature word
> -#
> -# @cpuid-input-eax: Input EAX value for CPUID instruction for that feature word
> -#
> -# @cpuid-input-ecx: Input ECX value for CPUID instruction for that
> -#   feature word
> -#
> -# @cpuid-register: Output register containing the feature bits
> -#
> -# @features: value of output register, containing the feature bits
> -#
> -# Since: 1.5
> -##
> -{ 'struct': 'X86CPUFeatureWordInfo',
> -  'data': { 'cpuid-input-eax': 'int',
> -'*cpuid-input-ecx': 'int',
> -'cpuid-register': 'X86CPURegister32',
> -'features': 'int' } }
> -
> -##
> -# @DummyForceArrays:
> -#
> -# Not used by QMP; hack to let us use X86CPUFeatureWordInfoList internally
> -#
> -# Since: 2.5
> -##
> -{ 'struct': 'DummyForceArrays',
> -  'data': { 'unused': ['X86CPUFeatureWordInfo'] } }
> -
>  ##
>  # @NumaCpuOptions:
>  #
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 3ffd877dd5..d45fa217cc 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -38,7 +38,7 @@
>  #include "qemu/option.h"
>  #include "qemu/config-file.h"
>  #include "qapi/error.h"
> -#include "qapi/qapi-visit-machine.h"
> +#include "qapi/qapi-visit-machine-target.h"
>  #include "qapi/qapi-visit-run-state.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qerror.h"
> diff --git a/target/i386/feature-stub.c b/target/i386/feature-stub.c
> new file mode 100644
> index 00..787c3c7fa1
> --- /dev/null
> +++ b/target/i386/feature-stub.c
> @@ -0,0

Re: [PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Eduardo Habkost

On Wed, Sep 30, 2020 at 06:49:38PM +0200, Philippe Mathieu-Daudé wrote:
> This is the third part of a series reducing user-mode
> dependencies. By stripping out unused code, the build
> and testing time is reduced (as is space used by objects).

I'm queueing patches 2-9 on machine-next.  Thanks!

Markus, Eric: I can merge the QAPI patches (1, 11) if I get an
Acked-by.

I'll send separate comments on patch 10.

> 
> Part 3:
> - Extract code not related to user-mode from hw/core/qdev-properties.c
> - Reduce user-mode QAPI generated files
> 
> Since v2:
> - Fixed UuidInfo placed in incorrect json
> - Rebased on Meson
> - Include X86CPUFeatureWord unmerged from part 2
> 
> Since v1:
> - Addressed Richard and Paolo review comments
> 
> Patches missing review: QAPI ones :)
> - #1  'qapi: Restrict query-uuid command to block code'
> - #11 'qapi: Restrict code generated for user-mode'
> 
> Green CI: https://gitlab.com/philmd/qemu/-/pipelines/196505787
> 
> v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg688879.html
> v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg688486.html
> 
> Philippe Mathieu-Daudé (11):
>   qapi: Restrict query-uuid command to block code
>   hw/core/qdev-properties: Use qemu_strtol() in set_mac() handler
>   hw/core/qdev-properties: Use qemu_strtoul() in set_pci_host_devaddr()
>   hw/core/qdev-properties: Fix code style
>   hw/core/qdev-properties: Export enum-related functions
>   hw/core/qdev-properties: Export qdev_prop_enum
>   hw/core/qdev-properties: Export some integer-related functions
>   hw/core/qdev-properties: Extract system-mode specific properties
>   hw/core: Add qdev stub for user-mode
>   target/i386: Restrict X86CPUFeatureWord to X86 targets
>   qapi: Restrict code generated for user-mode
> 
>  qapi/block.json  |  30 ++
>  qapi/machine-target.json |  45 ++
>  qapi/machine.json|  72 ---
>  hw/core/qdev-prop-internal.h |  30 ++
>  include/hw/qdev-properties.h |   1 +
>  block/iscsi.c|   2 +-
>  hw/core/qdev-properties-system.c | 687 -
>  hw/core/qdev-properties.c| 735 ++-
>  stubs/qdev-system.c  |  24 +
>  stubs/uuid.c |   2 +-
>  target/i386/cpu.c|   2 +-
>  target/i386/feature-stub.c   |  23 +
>  qapi/meson.build |  51 ++-
>  stubs/meson.build|   5 +-
>  target/i386/meson.build  |   1 +
>  15 files changed, 915 insertions(+), 795 deletions(-)
>  create mode 100644 hw/core/qdev-prop-internal.h
>  create mode 100644 stubs/qdev-system.c
>  create mode 100644 target/i386/feature-stub.c
> 
> -- 
> 2.26.2
> 

-- 
Eduardo

[PATCH v3 07/11] hw/core/qdev-properties: Export some integer-related functions

2020-09-30 Thread Philippe Mathieu-Daudé

We are going to split this file and reuse these static functions.
Declare them in the local "qdev-prop-internal.h" header.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3:
Also export qdev_propinfo_get_size32 introduced in commits
914e74cda9 ("qdev-properties: add size32 property type") and
031ffd9a61 ("qdev-properties: add getter for size32 and blocksize").
---
 hw/core/qdev-prop-internal.h | 11 +
 hw/core/qdev-properties.c| 46 +++-
 2 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/hw/core/qdev-prop-internal.h b/hw/core/qdev-prop-internal.h
index 2a8c9a306a..9cf5cc1d51 100644
--- a/hw/core/qdev-prop-internal.h
+++ b/hw/core/qdev-prop-internal.h
@@ -15,5 +15,16 @@ void qdev_propinfo_set_enum(Object *obj, Visitor *v, const char *name,
 
 void qdev_propinfo_set_default_value_enum(ObjectProperty *op,
   const Property *prop);
+void qdev_propinfo_set_default_value_int(ObjectProperty *op,
+ const Property *prop);
+void qdev_propinfo_set_default_value_uint(ObjectProperty *op,
+  const Property *prop);
+
+void qdev_propinfo_get_uint16(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp);
+void qdev_propinfo_get_int32(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp);
+void qdev_propinfo_get_size32(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp);
 
 #endif
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 31dfe441e2..37e309077a 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -271,12 +271,14 @@ static void set_uint8(Object *obj, Visitor *v, const char *name, void *opaque,
 visit_type_uint8(v, name, ptr, errp);
 }
 
-static void set_default_value_int(ObjectProperty *op, const Property *prop)
+void qdev_propinfo_set_default_value_int(ObjectProperty *op,
+ const Property *prop)
 {
 object_property_set_default_int(op, prop->defval.i);
 }
 
-static void set_default_value_uint(ObjectProperty *op, const Property *prop)
+void qdev_propinfo_set_default_value_uint(ObjectProperty *op,
+  const Property *prop)
 {
 object_property_set_default_uint(op, prop->defval.u);
 }
@@ -285,13 +287,13 @@ const PropertyInfo qdev_prop_uint8 = {
 .name  = "uint8",
 .get   = get_uint8,
 .set   = set_uint8,
-.set_default_value = set_default_value_uint,
+.set_default_value = qdev_propinfo_set_default_value_uint,
 };
 
 /* --- 16bit integer --- */
 
-static void get_uint16(Object *obj, Visitor *v, const char *name,
-   void *opaque, Error **errp)
+void qdev_propinfo_get_uint16(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
@@ -317,9 +319,9 @@ static void set_uint16(Object *obj, Visitor *v, const char *name,
 
 const PropertyInfo qdev_prop_uint16 = {
 .name  = "uint16",
-.get   = get_uint16,
+.get   = qdev_propinfo_get_uint16,
 .set   = set_uint16,
-.set_default_value = set_default_value_uint,
+.set_default_value = qdev_propinfo_set_default_value_uint,
 };
 
 /* --- 32bit integer --- */
@@ -349,8 +351,8 @@ static void set_uint32(Object *obj, Visitor *v, const char *name,
 visit_type_uint32(v, name, ptr, errp);
 }
 
-static void get_int32(Object *obj, Visitor *v, const char *name, void *opaque,
-  Error **errp)
+void qdev_propinfo_get_int32(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
@@ -378,14 +380,14 @@ const PropertyInfo qdev_prop_uint32 = {
 .name  = "uint32",
 .get   = get_uint32,
 .set   = set_uint32,
-.set_default_value = set_default_value_uint,
+.set_default_value = qdev_propinfo_set_default_value_uint,
 };
 
 const PropertyInfo qdev_prop_int32 = {
 .name  = "int32",
-.get   = get_int32,
+.get   = qdev_propinfo_get_int32,
 .set   = set_int32,
-.set_default_value = set_default_value_int,
+.set_default_value = qdev_propinfo_set_default_value_int,
 };
 
 /* --- 64bit integer --- */
@@ -444,14 +446,14 @@ const PropertyInfo qdev_prop_uint64 = {
 .name  = "uint64",
 .get   = get_uint64,
 .set   = set_uint64,
-.set_default_value = set_default_value_uint,
+.set_default_value = qdev_propinfo_set_default_value_uint,
 };
 
 const PropertyInfo qdev_prop_int64 = {
 .name  = "int64",
 .get   = get_int64,
 .set   = set_int64,
-.set_default_value = set_default_value_int,
+.set_default_value = qdev_propinfo_set_default_value_int,
 };
 
 /*

[PATCH v3 11/11] qapi: Restrict code generated for user-mode

2020-09-30 Thread Philippe Mathieu-Daudé

A lot of QAPI generated code is never used by user-mode.

Split out qapi_system_modules and qapi_system_or_tools_modules
from the qapi_all_modules array. We now have 3 groups:
- always used
- used by system-mode or tools (usually by the block layer)
- only used by system-mode

Signed-off-by: Philippe Mathieu-Daudé 
---
Resetting due to Meson update:
Reviewed-by: Richard Henderson 
---
 qapi/meson.build | 51 ++--
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/qapi/meson.build b/qapi/meson.build
index 7c4a89a882..ba9677ba97 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -14,39 +14,60 @@ util_ss.add(files(
 ))
 
 qapi_all_modules = [
+  'common',
+  'introspect',
+  'misc',
+]
+
+qapi_system_modules = [
   'acpi',
   'audio',
+  'dump',
+  'machine-target',
+  'machine',
+  'migration',
+  'misc-target',
+  'net',
+  'pci',
+  'qdev',
+  'rdma',
+  'rocker',
+  'tpm',
+  'trace',
+]
+
+# system or tools
+qapi_block_modules = [
   'authz',
   'block-core',
   'block',
   'char',
-  'common',
   'control',
   'crypto',
-  'dump',
   'error',
-  'introspect',
   'job',
-  'machine',
-  'machine-target',
-  'migration',
-  'misc',
-  'misc-target',
-  'net',
   'pragma',
-  'qdev',
-  'pci',
   'qom',
-  'rdma',
-  'rocker',
   'run-state',
   'sockets',
-  'tpm',
-  'trace',
   'transaction',
   'ui',
 ]
 
+if have_system
+  qapi_all_modules += qapi_system_modules
+elif have_user
+  # Temporary kludge because X86CPUFeatureWordInfo is not
+  # restricted to system-mode. This should be removed (along
+  # with target/i386/feature-stub.c) once target/i386/cpu.c
+  # has been cleaned.
+  qapi_all_modules += ['machine-target']
+endif
+
+if have_block
+  qapi_all_modules += qapi_block_modules
+endif
+
 qapi_storage_daemon_modules = [
   'block-core',
   'char',
-- 
2.26.2

[PATCH v3 05/11] hw/core/qdev-properties: Export enum-related functions

2020-09-30 Thread Philippe Mathieu-Daudé

We are going to split this file and reuse these static functions.
Add the local "qdev-prop-internal.h" header declaring them.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/qdev-prop-internal.h | 19 
 hw/core/qdev-properties.c| 58 +++-
 2 files changed, 49 insertions(+), 28 deletions(-)
 create mode 100644 hw/core/qdev-prop-internal.h

diff --git a/hw/core/qdev-prop-internal.h b/hw/core/qdev-prop-internal.h
new file mode 100644
index 00..2a8c9a306a
--- /dev/null
+++ b/hw/core/qdev-prop-internal.h
@@ -0,0 +1,19 @@
+/*
+ * qdev property parsing
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_CORE_QDEV_PROP_INTERNAL_H
+#define HW_CORE_QDEV_PROP_INTERNAL_H
+
+void qdev_propinfo_get_enum(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp);
+void qdev_propinfo_set_enum(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp);
+
+void qdev_propinfo_set_default_value_enum(ObjectProperty *op,
+  const Property *prop);
+
+#endif
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 071fd5864a..76417d0936 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -18,6 +18,7 @@
 #include "qemu/uuid.h"
 #include "qemu/units.h"
 #include "qemu/cutils.h"
+#include "qdev-prop-internal.h"
 
 void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
   Error **errp)
@@ -53,8 +54,8 @@ void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
 return ptr;
 }
 
-static void get_enum(Object *obj, Visitor *v, const char *name, void *opaque,
- Error **errp)
+void qdev_propinfo_get_enum(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
@@ -63,8 +64,8 @@ static void get_enum(Object *obj, Visitor *v, const char *name, void *opaque,
 visit_type_enum(v, prop->name, ptr, prop->info->enum_table, errp);
 }
 
-static void set_enum(Object *obj, Visitor *v, const char *name, void *opaque,
- Error **errp)
+void qdev_propinfo_set_enum(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
 {
 DeviceState *dev = DEVICE(obj);
 Property *prop = opaque;
@@ -78,7 +79,8 @@ static void set_enum(Object *obj, Visitor *v, const char *name, void *opaque,
 visit_type_enum(v, prop->name, ptr, prop->info->enum_table, errp);
 }
 
-static void set_default_value_enum(ObjectProperty *op, const Property *prop)
+void qdev_propinfo_set_default_value_enum(ObjectProperty *op,
+  const Property *prop)
 {
 object_property_set_default_str(op,
 qapi_enum_lookup(prop->info->enum_table, prop->defval.i));
@@ -669,9 +671,9 @@ const PropertyInfo qdev_prop_on_off_auto = {
 .name = "OnOffAuto",
 .description = "on/off/auto",
 .enum_table = &OnOffAuto_lookup,
-.get = get_enum,
-.set = set_enum,
-.set_default_value = set_default_value_enum,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
 /* --- lost tick policy --- */
@@ -681,9 +683,9 @@ QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
 const PropertyInfo qdev_prop_losttickpolicy = {
 .name  = "LostTickPolicy",
 .enum_table  = &LostTickPolicy_lookup,
-.get   = get_enum,
-.set   = set_enum,
-.set_default_value = set_default_value_enum,
+.get   = qdev_propinfo_get_enum,
+.set   = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
 /* --- Block device error handling policy --- */
@@ -695,9 +697,9 @@ const PropertyInfo qdev_prop_blockdev_on_error = {
 .description = "Error handling policy, "
"report/ignore/enospc/stop/auto",
 .enum_table = &BlockdevOnError_lookup,
-.get = get_enum,
-.set = set_enum,
-.set_default_value = set_default_value_enum,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
 /* --- BIOS CHS translation */
@@ -709,9 +711,9 @@ const PropertyInfo qdev_prop_bios_chs_trans = {
 .description = "Logical CHS translation algorithm, "
"auto/none/lba/large/rechs",
 .enum_table = &BiosAtaTranslation_lookup,
-.get = get_enum,
-.set = set_enum,
-.set_default_value = set_default_value_enum,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
 /* --- FDC default drive types */
@@ -721,9 +723,9 @@ const PropertyInfo qdev_prop_fdc_drive_type = {

[PATCH v3 09/11] hw/core: Add qdev stub for user-mode

2020-09-30 Thread Philippe Mathieu-Daudé

While user-mode does not use peripherals (devices), it uses a
CPU which is a device.
In the next commit we will reduce the QAPI generated code for
user-mode. Since qdev.c calls qapi_event_send_device_deleted()
in device_finalize, let's add a stub for it.

Suggested-by: Paolo Bonzini 
Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Meson rebase
---
 stubs/qdev-system.c | 24 
 stubs/meson.build   |  1 +
 2 files changed, 25 insertions(+)
 create mode 100644 stubs/qdev-system.c

diff --git a/stubs/qdev-system.c b/stubs/qdev-system.c
new file mode 100644
index 00..2b4b54f621
--- /dev/null
+++ b/stubs/qdev-system.c
@@ -0,0 +1,24 @@
+/*
+ * QAPI qdev stubs
+ *
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Philippe Mathieu-Daudé 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/qapi-events-qdev.h"
+
+void qapi_event_send_device_deleted(bool has_device,
+const char *device, const char *path)
+{
+/*
+ * Called in user-mode in fork() when a CPUState is qdev::finalize()'d.
+ * Simply ignore the QAPI event there.
+ */
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index 2e231590e1..71d42c34d6 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -25,6 +25,7 @@ stub_ss.add(files('monitor.c'))
 stub_ss.add(files('monitor-core.c'))
 stub_ss.add(files('pci-bus.c'))
 stub_ss.add(files('pci-host-piix.c'))
+stub_ss.add(files('qdev-system.c'))
 stub_ss.add(files('qemu-timer-notify-cb.c'))
 stub_ss.add(files('qmp_memory_device.c'))
 stub_ss.add(files('qtest.c'))
-- 
2.26.2

[PATCH v3 10/11] target/i386: Restrict X86CPUFeatureWord to X86 targets

2020-09-30 Thread Philippe Mathieu-Daudé

Only qemu-system-FOO and qemu-storage-daemon provide QMP
monitors, therefore such declarations and definitions are
irrelevant for user-mode emulation.

Restricting the x86-specific commands to machine-target.json
pulls less QAPI-generated code into user-mode.

Add a stub to satisfy linking in user-mode:

  /usr/bin/ld: libqemu-i386-linux-user.fa.p/target_i386_cpu.c.o: in function `x86_cpu_get_feature_words':
  target/i386/cpu.c:4643: undefined reference to `visit_type_X86CPUFeatureWordInfoList'
  collect2: error: ld returned 1 exit status
  make: *** [Makefile.ninja:1125: qemu-i386] Error 1

Acked-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Reworded + Meson rebase
---
 qapi/machine-target.json   | 45 ++
 qapi/machine.json  | 42 ---
 target/i386/cpu.c  |  2 +-
 target/i386/feature-stub.c | 23 +++
 target/i386/meson.build|  1 +
 5 files changed, 70 insertions(+), 43 deletions(-)
 create mode 100644 target/i386/feature-stub.c

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 698850cc78..b4d769a53b 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -4,6 +4,51 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
 
+##
+# @X86CPURegister32:
+#
+# A X86 32-bit register
+#
+# Since: 1.5
+##
+{ 'enum': 'X86CPURegister32',
+  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ],
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @X86CPUFeatureWordInfo:
+#
+# Information about a X86 CPU feature word
+#
+# @cpuid-input-eax: Input EAX value for CPUID instruction for that feature word
+#
+# @cpuid-input-ecx: Input ECX value for CPUID instruction for that
+#   feature word
+#
+# @cpuid-register: Output register containing the feature bits
+#
+# @features: value of output register, containing the feature bits
+#
+# Since: 1.5
+##
+{ 'struct': 'X86CPUFeatureWordInfo',
+  'data': { 'cpuid-input-eax': 'int',
+'*cpuid-input-ecx': 'int',
+'cpuid-register': 'X86CPURegister32',
+'features': 'int' },
+  'if': 'defined(TARGET_I386)' }
+
+##
+# @DummyForceArrays:
+#
+# Not used by QMP; hack to let us use X86CPUFeatureWordInfoList internally
+#
+# Since: 2.5
+##
+{ 'struct': 'DummyForceArrays',
+  'data': { 'unused': ['X86CPUFeatureWordInfo'] },
+  'if': 'defined(TARGET_I386)' }
+
 ##
 # @CpuModelInfo:
 #
diff --git a/qapi/machine.json b/qapi/machine.json
index 72f014bb5b..cb878acdac 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -544,48 +544,6 @@
'dst': 'uint16',
'val': 'uint8' }}
 
-##
-# @X86CPURegister32:
-#
-# A X86 32-bit register
-#
-# Since: 1.5
-##
-{ 'enum': 'X86CPURegister32',
-  'data': [ 'EAX', 'EBX', 'ECX', 'EDX', 'ESP', 'EBP', 'ESI', 'EDI' ] }
-
-##
-# @X86CPUFeatureWordInfo:
-#
-# Information about a X86 CPU feature word
-#
-# @cpuid-input-eax: Input EAX value for CPUID instruction for that feature word
-#
-# @cpuid-input-ecx: Input ECX value for CPUID instruction for that
-#   feature word
-#
-# @cpuid-register: Output register containing the feature bits
-#
-# @features: value of output register, containing the feature bits
-#
-# Since: 1.5
-##
-{ 'struct': 'X86CPUFeatureWordInfo',
-  'data': { 'cpuid-input-eax': 'int',
-'*cpuid-input-ecx': 'int',
-'cpuid-register': 'X86CPURegister32',
-'features': 'int' } }
-
-##
-# @DummyForceArrays:
-#
-# Not used by QMP; hack to let us use X86CPUFeatureWordInfoList internally
-#
-# Since: 2.5
-##
-{ 'struct': 'DummyForceArrays',
-  'data': { 'unused': ['X86CPUFeatureWordInfo'] } }
-
 ##
 # @NumaCpuOptions:
 #
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3ffd877dd5..d45fa217cc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -38,7 +38,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qapi/error.h"
-#include "qapi/qapi-visit-machine.h"
+#include "qapi/qapi-visit-machine-target.h"
 #include "qapi/qapi-visit-run-state.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qerror.h"
diff --git a/target/i386/feature-stub.c b/target/i386/feature-stub.c
new file mode 100644
index 00..787c3c7fa1
--- /dev/null
+++ b/target/i386/feature-stub.c
@@ -0,0 +1,23 @@
+/*
+ * QAPI x86 CPU features stub
+ *
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Author:
+ *   Philippe Mathieu-Daudé 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-machine-target.h"
+
+bool visit_type_X86CPUFeatureWordInfoList(Visitor *v, const char *name,
+  X86CPUFeatureWordInfoList **obj,
+  Error **errp)
+{

[PATCH v3 08/11] hw/core/qdev-properties: Extract system-mode specific properties

2020-09-30 Thread Philippe Mathieu-Daudé

Move properties specific to machines into a separate file.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Also move Reserved Region introduced in commit f78069253c
("qdev: Introduce DEFINE_PROP_RESERVED_REGION").
---
 hw/core/qdev-properties-system.c | 687 ++-
 hw/core/qdev-properties.c| 674 --
 2 files changed, 679 insertions(+), 682 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index b29daf4fb5..49bdd12581 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -11,19 +11,25 @@
  */
 
 #include "qemu/osdep.h"
-#include "audio/audio.h"
-#include "net/net.h"
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qapi/qapi-types-block.h"
+#include "qapi/qapi-types-machine.h"
+#include "qapi/qapi-types-migration.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/ctype.h"
+#include "qemu/cutils.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "qdev-prop-internal.h"
+
+#include "audio/audio.h"
+#include "chardev/char-fe.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/blockdev.h"
-#include "hw/block/block.h"
-#include "net/hub.h"
-#include "qapi/visitor.h"
-#include "chardev/char-fe.h"
-#include "sysemu/iothread.h"
-#include "sysemu/tpm_backend.h"
+#include "net/net.h"
+#include "hw/pci/pci.h"
 
 static bool check_prop_still_unset(DeviceState *dev, const char *name,
const void *old_val, const char *new_val,
@@ -280,6 +286,96 @@ const PropertyInfo qdev_prop_chr = {
 .release = release_chr,
 };
 
+/* --- mac address --- */
+
+/*
+ * accepted syntax versions:
+ *   01:02:03:04:05:06
+ *   01-02-03-04-05-06
+ */
+static void get_mac(Object *obj, Visitor *v, const char *name, void *opaque,
+Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+MACAddr *mac = qdev_get_prop_ptr(dev, prop);
+char buffer[2 * 6 + 5 + 1];
+char *p = buffer;
+
+snprintf(buffer, sizeof(buffer), "%02x:%02x:%02x:%02x:%02x:%02x",
+ mac->a[0], mac->a[1], mac->a[2],
+ mac->a[3], mac->a[4], mac->a[5]);
+
+visit_type_str(v, name, &p, errp);
+}
+
+static void set_mac(Object *obj, Visitor *v, const char *name, void *opaque,
+Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+MACAddr *mac = qdev_get_prop_ptr(dev, prop);
+int i, pos;
+char *str;
+const char *p;
+
+if (dev->realized) {
+qdev_prop_set_after_realize(dev, name, errp);
+return;
+}
+
+if (!visit_type_str(v, name, &str, errp)) {
+return;
+}
+
+for (i = 0, pos = 0; i < 6; i++, pos += 3) {
+long val;
+
+if (!qemu_isxdigit(str[pos])) {
+goto inval;
+}
+if (!qemu_isxdigit(str[pos + 1])) {
+goto inval;
+}
+if (i == 5) {
+if (str[pos + 2] != '\0') {
+goto inval;
+}
+} else {
+if (str[pos + 2] != ':' && str[pos + 2] != '-') {
+goto inval;
+}
+}
+if (qemu_strtol(str + pos, &p, 16, &val) < 0 || val > 0xff) {
+goto inval;
+}
+mac->a[i] = val;
+}
+g_free(str);
+return;
+
+inval:
+error_set_from_qdev_prop_error(errp, EINVAL, dev, prop, str);
+g_free(str);
+}
+
+const PropertyInfo qdev_prop_macaddr = {
+.name  = "str",
+.description = "Ethernet 6-byte MAC Address, example: 52:54:00:12:34:56",
+.get   = get_mac,
+.set   = set_mac,
+};
+
+void qdev_prop_set_macaddr(DeviceState *dev, const char *name,
+   const uint8_t *value)
+{
+char str[2 * 6 + 5 + 1];
+snprintf(str, sizeof(str), "%02x:%02x:%02x:%02x:%02x:%02x",
+ value[0], value[1], value[2], value[3], value[4], value[5]);
+
+object_property_set_str(OBJECT(dev), name, str, &error_abort);
+}
+
 /* --- netdev device --- */
 static void get_netdev(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
@@ -465,3 +561,578 @@ void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd)
 }
 nd->instantiated = 1;
 }
+
+/* --- lost tick policy --- */
+
+QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
+
+const PropertyInfo qdev_prop_losttickpolicy = {
+.name  = "LostTickPolicy",
+.enum_table  = &LostTickPolicy_lookup,
+.get   = qdev_propinfo_get_enum,
+.set   = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
+/* --- blocksize --- */
+
+/* lower limit is sector size */
+#define MIN_BLOCK_SIZE  512
+#define MIN_BLOCK_SIZE_STR  "512 B"
+/*
+ * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and
+ * matches qcow2 cluster size limit
+ */
+#define MAX_BLOCK_SIZE  (2 *

[PATCH v3 01/11] qapi: Restrict query-uuid command to block code

2020-09-30 Thread Philippe Mathieu-Daudé

In commit f68c01470b we restricted the query-uuid command to
machine code, but it is incorrect, as it is also used by the
tools.  Therefore move this command again, but to block.json,
which is shared by machine code and tools.

Fixes: f68c01470b ("qapi: Restrict query-uuid command to machine code")
Signed-off-by: Philippe Mathieu-Daudé 
---
 qapi/block.json   | 30 ++
 qapi/machine.json | 30 --
 block/iscsi.c |  2 +-
 stubs/uuid.c  |  2 +-
 stubs/meson.build |  4 +++-
 5 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/qapi/block.json b/qapi/block.json
index a009f7d3a2..4ae1716b56 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -11,6 +11,36 @@
 # == Additional block stuff (VM related)
 ##
 
+##
+# @UuidInfo:
+#
+# Guest UUID information (Universally Unique Identifier).
+#
+# @UUID: the UUID of the guest
+#
+# Since: 0.14.0
+#
+# Notes: If no UUID was specified for the guest, a null UUID is returned.
+##
+{ 'struct': 'UuidInfo', 'data': {'UUID': 'str'} }
+
+##
+# @query-uuid:
+#
+# Query the guest UUID information.
+#
+# Returns: The @UuidInfo for the guest
+#
+# Since: 0.14.0
+#
+# Example:
+#
+# -> { "execute": "query-uuid" }
+# <- { "return": { "UUID": "550e8400-e29b-41d4-a716-44665544" } }
+#
+##
+{ 'command': 'query-uuid', 'returns': 'UuidInfo', 'allow-preconfig': true }
+
 ##
 # @BiosAtaTranslation:
 #
diff --git a/qapi/machine.json b/qapi/machine.json
index 756dacb06f..72f014bb5b 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -402,36 +402,6 @@
 ##
 { 'command': 'query-target', 'returns': 'TargetInfo' }
 
-##
-# @UuidInfo:
-#
-# Guest UUID information (Universally Unique Identifier).
-#
-# @UUID: the UUID of the guest
-#
-# Since: 0.14.0
-#
-# Notes: If no UUID was specified for the guest, a null UUID is returned.
-##
-{ 'struct': 'UuidInfo', 'data': {'UUID': 'str'} }
-
-##
-# @query-uuid:
-#
-# Query the guest UUID information.
-#
-# Returns: The @UuidInfo for the guest
-#
-# Since: 0.14.0
-#
-# Example:
-#
-# -> { "execute": "query-uuid" }
-# <- { "return": { "UUID": "550e8400-e29b-41d4-a716-44665544" } }
-#
-##
-{ 'command': 'query-uuid', 'returns': 'UuidInfo', 'allow-preconfig': true }
-
 ##
 # @GuidInfo:
 #
diff --git a/block/iscsi.c b/block/iscsi.c
index e30a7e3606..1effea25ed 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -42,7 +42,7 @@
 #include "qemu/uuid.h"
 #include "sysemu/replay.h"
 #include "qapi/error.h"
-#include "qapi/qapi-commands-machine.h"
+#include "qapi/qapi-commands-block.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 #include "crypto/secret.h"
diff --git a/stubs/uuid.c b/stubs/uuid.c
index e5112eb3f6..d6bfb442e0 100644
--- a/stubs/uuid.c
+++ b/stubs/uuid.c
@@ -1,5 +1,5 @@
 #include "qemu/osdep.h"
-#include "qapi/qapi-commands-machine.h"
+#include "qapi/qapi-commands-block.h"
 #include "qemu/uuid.h"
 
 UuidInfo *qmp_query_uuid(Error **errp)
diff --git a/stubs/meson.build b/stubs/meson.build
index e0b322bc28..2e231590e1 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -39,7 +39,9 @@ stub_ss.add(files('target-get-monitor-def.c'))
 stub_ss.add(files('target-monitor-defs.c'))
 stub_ss.add(files('tpm.c'))
 stub_ss.add(files('trace-control.c'))
-stub_ss.add(files('uuid.c'))
+if have_block
+  stub_ss.add(files('uuid.c'))
+endif
 stub_ss.add(files('vmgenid.c'))
 stub_ss.add(files('vmstate.c'))
 stub_ss.add(files('vm-stop.c'))
-- 
2.26.2

[PATCH v3 06/11] hw/core/qdev-properties: Export qdev_prop_enum

2020-09-30 Thread Philippe Mathieu-Daudé

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/qdev-properties.h | 1 +
 hw/core/qdev-properties.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 528310bb22..4437450065 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -8,6 +8,7 @@
 extern const PropertyInfo qdev_prop_bit;
 extern const PropertyInfo qdev_prop_bit64;
 extern const PropertyInfo qdev_prop_bool;
+extern const PropertyInfo qdev_prop_enum;
 extern const PropertyInfo qdev_prop_uint8;
 extern const PropertyInfo qdev_prop_uint16;
 extern const PropertyInfo qdev_prop_uint32;
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 76417d0936..31dfe441e2 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -86,6 +86,13 @@ void qdev_propinfo_set_default_value_enum(ObjectProperty *op,
 qapi_enum_lookup(prop->info->enum_table, prop->defval.i));
 }
 
+const PropertyInfo qdev_prop_enum = {
+.name  = "enum",
+.get   = qdev_propinfo_get_enum,
+.set   = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* Bit */
 
 static uint32_t qdev_get_prop_mask(Property *prop)
-- 
2.26.2

[PATCH v3 02/11] hw/core/qdev-properties: Use qemu_strtol() in set_mac() handler

2020-09-30 Thread Philippe Mathieu-Daudé

The MACAddr structure contains an array of uint8_t. Previously,
if a value was outside the [0..255] range, it was silently cast
and no input validation was done.
Replace strtol() with qemu_strtol() -- so checkpatch.pl won't
complain if we move this code later -- and return EINVAL if the
input is invalid.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/qdev-properties.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 343c824da0..080ba319a1 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -1,4 +1,5 @@
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "net/net.h"
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
@@ -524,7 +525,8 @@ static void set_mac(Object *obj, Visitor *v, const char 
*name, void *opaque,
 Property *prop = opaque;
 MACAddr *mac = qdev_get_prop_ptr(dev, prop);
 int i, pos;
-char *str, *p;
+char *str;
+const char *p;
 
 if (dev->realized) {
 qdev_prop_set_after_realize(dev, name, errp);
@@ -536,6 +538,8 @@ static void set_mac(Object *obj, Visitor *v, const char 
*name, void *opaque,
 }
 
 for (i = 0, pos = 0; i < 6; i++, pos += 3) {
+long val;
+
 if (!qemu_isxdigit(str[pos])) {
 goto inval;
 }
@@ -551,7 +555,10 @@ static void set_mac(Object *obj, Visitor *v, const char 
*name, void *opaque,
 goto inval;
 }
 }
-mac->a[i] = strtol(str+pos, &p, 16);
+if (qemu_strtol(str + pos, &p, 16, &val) < 0 || val > 0xff) {
+goto inval;
+}
+mac->a[i] = val;
 }
 g_free(str);
 return;
-- 
2.26.2
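
The range check this patch describes can be illustrated outside QEMU. The following is only a sketch under stated assumptions: it uses plain strtol() from the C library rather than qemu_strtol(), and parse_mac() is a hypothetical stand-alone helper, not the set_mac() handler itself.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-alone helper, not QEMU code: parse "aa:bb:cc:dd:ee:ff"
 * (':' or '-' separators) into six bytes. Returns 0 on success, -EINVAL on
 * malformed input.
 */
static int parse_mac(const char *str, uint8_t mac[6])
{
    assert(str != NULL && mac != NULL);

    for (int i = 0, pos = 0; i < 6; i++, pos += 3) {
        char *end;
        long val;

        /* Exactly two hex digits per octet. */
        if (!isxdigit((unsigned char)str[pos]) ||
            !isxdigit((unsigned char)str[pos + 1])) {
            return -EINVAL;
        }
        /* The last octet must end the string; the others need a separator. */
        if (i == 5) {
            if (str[pos + 2] != '\0') {
                return -EINVAL;
            }
        } else if (str[pos + 2] != ':' && str[pos + 2] != '-') {
            return -EINVAL;
        }

        errno = 0;
        val = strtol(str + pos, &end, 16);
        /* Reject conversion errors and values that do not fit in a byte. */
        if (errno != 0 || end != str + pos + 2 || val < 0 || val > 0xff) {
            return -EINVAL;
        }
        mac[i] = (uint8_t)val;
    }
    return 0;
}
```

The explicit "val > 0xff" test mirrors the check added in the patch; the assignment to a uint8_t would otherwise truncate silently.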

[PATCH v3 04/11] hw/core/qdev-properties: Fix code style

2020-09-30 Thread Philippe Mathieu-Daudé

We will soon move this code; fix its style so that checkpatch.pl
does not complain.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/qdev-properties.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index a1190a5db9..071fd5864a 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -543,15 +543,15 @@ static void set_mac(Object *obj, Visitor *v, const char 
*name, void *opaque,
 if (!qemu_isxdigit(str[pos])) {
 goto inval;
 }
-if (!qemu_isxdigit(str[pos+1])) {
+if (!qemu_isxdigit(str[pos + 1])) {
 goto inval;
 }
 if (i == 5) {
-if (str[pos+2] != '\0') {
+if (str[pos + 2] != '\0') {
 goto inval;
 }
 } else {
-if (str[pos+2] != ':' && str[pos+2] != '-') {
+if (str[pos + 2] != ':' && str[pos + 2] != '-') {
 goto inval;
 }
 }
@@ -898,8 +898,8 @@ static void set_blocksize(Object *obj, Visitor *v, const 
char *name,
 /* We rely on power-of-2 blocksizes for bitmasks */
 if ((value & (value - 1)) != 0) {
 error_setg(errp,
-  "Property %s.%s doesn't take value '%" PRId64 "', it's not a 
power of 2",
-  dev->id ?: "", name, (int64_t)value);
+  "Property %s.%s doesn't take value '%" PRId64 "', "
+  "it's not a power of 2", dev->id ?: "", name, 
(int64_t)value);
 return;
 }
 
-- 
2.26.2

[PATCH v3 00/11] user-mode: Prune build dependencies (part 3)

2020-09-30 Thread Philippe Mathieu-Daudé

This is the third part of a series reducing user-mode
dependencies. By stripping out unused code, the build
and testing time is reduced (as is space used by objects).

Part 3:
- Extract code not related to user-mode from hw/core/qdev-properties.c
- Reduce user-mode QAPI generated files

Since v2:
- Fixed UuidInfo placed in incorrect json
- Rebased on Meson
- Include X86CPUFeatureWord unmerged from part 2

Since v1:
- Addressed Richard and Paolo review comments

Patches missing review: QAPI ones :)
- #1  'qapi: Restrict query-uuid command to block code'
- #11 'qapi: Restrict code generated for user-mode'

Green CI: https://gitlab.com/philmd/qemu/-/pipelines/196505787

v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg688879.html
v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg688486.html

Philippe Mathieu-Daudé (11):
  qapi: Restrict query-uuid command to block code
  hw/core/qdev-properties: Use qemu_strtol() in set_mac() handler
  hw/core/qdev-properties: Use qemu_strtoul() in set_pci_host_devaddr()
  hw/core/qdev-properties: Fix code style
  hw/core/qdev-properties: Export enum-related functions
  hw/core/qdev-properties: Export qdev_prop_enum
  hw/core/qdev-properties: Export some integer-related functions
  hw/core/qdev-properties: Extract system-mode specific properties
  hw/core: Add qdev stub for user-mode
  target/i386: Restrict X86CPUFeatureWord to X86 targets
  qapi: Restrict code generated for user-mode

 qapi/block.json  |  30 ++
 qapi/machine-target.json |  45 ++
 qapi/machine.json|  72 ---
 hw/core/qdev-prop-internal.h |  30 ++
 include/hw/qdev-properties.h |   1 +
 block/iscsi.c|   2 +-
 hw/core/qdev-properties-system.c | 687 -
 hw/core/qdev-properties.c| 735 ++-
 stubs/qdev-system.c  |  24 +
 stubs/uuid.c |   2 +-
 target/i386/cpu.c|   2 +-
 target/i386/feature-stub.c   |  23 +
 qapi/meson.build |  51 ++-
 stubs/meson.build|   5 +-
 target/i386/meson.build  |   1 +
 15 files changed, 915 insertions(+), 795 deletions(-)
 create mode 100644 hw/core/qdev-prop-internal.h
 create mode 100644 stubs/qdev-system.c
 create mode 100644 target/i386/feature-stub.c

-- 
2.26.2

[PATCH v3 03/11] hw/core/qdev-properties: Use qemu_strtoul() in set_pci_host_devaddr()

2020-09-30 Thread Philippe Mathieu-Daudé

Replace strtoul() by qemu_strtoul() so checkpatch.pl won't complain
if we move this code later.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/core/qdev-properties.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 080ba319a1..a1190a5db9 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -951,7 +951,7 @@ static void set_pci_host_devaddr(Object *obj, Visitor *v, 
const char *name,
 Property *prop = opaque;
 PCIHostDeviceAddress *addr = qdev_get_prop_ptr(dev, prop);
 char *str, *p;
-char *e;
+const char *e;
 unsigned long val;
 unsigned long dom = 0, bus = 0;
 unsigned int slot = 0, func = 0;
@@ -966,23 +966,23 @@ static void set_pci_host_devaddr(Object *obj, Visitor *v, 
const char *name,
 }
 
 p = str;
-val = strtoul(p, , 16);
-if (e == p || *e != ':') {
+if (qemu_strtoul(p, , 16, ) < 0 || val > 0x || e == p) {
+goto inval;
+}
+if (*e != ':') {
 goto inval;
 }
 bus = val;
 
-p = e + 1;
-val = strtoul(p, , 16);
-if (e == p) {
+p = (char *)e + 1;
+if (qemu_strtoul(p, , 16, ) < 0 || val > 0x1f || e == p) {
 goto inval;
 }
 if (*e == ':') {
 dom = bus;
 bus = val;
-p = e + 1;
-val = strtoul(p, , 16);
-if (e == p) {
+p = (char *)e + 1;
+if (qemu_strtoul(p, , 16, ) < 0 || val > 0x1f || e == p) {
 goto inval;
 }
 }
@@ -991,14 +991,13 @@ static void set_pci_host_devaddr(Object *obj, Visitor *v, 
const char *name,
 if (*e != '.') {
 goto inval;
 }
-p = e + 1;
-val = strtoul(p, , 10);
-if (e == p) {
+p = (char *)e + 1;
+if (qemu_strtoul(p, , 10, ) < 0 || val > 7 || e == p) {
 goto inval;
 }
 func = val;
 
-if (dom > 0x || bus > 0xff || slot > 0x1f || func > 7) {
+if (bus > 0xff) {
 goto inval;
 }
 
-- 
2.26.2

Re: [PATCH 1/4] vmdk: fix maybe uninitialized warnings

2020-09-30 Thread Fam Zheng

On Wed, 2020-09-30 at 17:58 +0200, Christian Borntraeger wrote:
> Fedora 32 gcc 10 seems to give false positives:
> 
> Compiling C object libblock.fa.p/block_vmdk.c.o
> ../block/vmdk.c: In function ‘vmdk_parse_extents’:
> ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   587 | g_free(extent->l1_table);
>   | ^~~~
> ../block/vmdk.c:754:17: note: ‘extent’ was declared here
>   754 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   620 | ret = vmdk_init_tables(bs, extent, errp);
>   |   ^~
> ../block/vmdk.c:598:17: note: ‘extent’ was declared here
>   598 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>  1178 | extent->flat_start_offset = flat_offset << 9;
>   | ~~^~
> ../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
> ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   581 | extent->l2_cache =
>   | ~^
>   582 | g_malloc(extent->entry_size * extent->l2_size *
> L2_CACHE_SIZE);
>   | ~
> ~
> ../block/vmdk.c:872:17: note: ‘extent’ was declared here
>   872 | VmdkExtent *extent;
>   | ^~
> ../block/vmdk.c: In function ‘vmdk_open’:
> ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in
> this function [-Werror=maybe-uninitialized]
>   620 | ret = vmdk_init_tables(bs, extent, errp);
>   |   ^~
> ../block/vmdk.c:598:17: note: ‘extent’ was declared here
>   598 | VmdkExtent *extent;
>   | ^~
> cc1: all warnings being treated as errors
> make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1
> 
> fix them by assigning a default value.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  block/vmdk.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 8ec62c7ab798..a00dc00eb47a 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -595,7 +595,7 @@ static int vmdk_open_vmfs_sparse(BlockDriverState
> *bs,
>  int ret;
>  uint32_t magic;
>  VMDK3Header header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  
>  ret = bdrv_pread(file, sizeof(magic), , sizeof(header));
>  if (ret < 0) {
> @@ -751,7 +751,7 @@ static int vmdk_open_se_sparse(BlockDriverState
> *bs,
>  int ret;
>  VMDKSESparseConstHeader const_header;
>  VMDKSESparseVolatileHeader volatile_header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  
>  ret = bdrv_apply_auto_read_only(bs,
>  "No write support for seSparse images available", errp);
> @@ -869,7 +869,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
>  uint32_t magic;
>  uint32_t l1_size, l1_entry_sectors;
>  VMDK4Header header;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  BDRVVmdkState *s = bs->opaque;
>  int64_t l1_backup_offset = 0;
>  bool compressed;
> @@ -1088,7 +1088,7 @@ static int vmdk_parse_extents(const char *desc,
> BlockDriverState *bs,
>  BdrvChild *extent_file;
>  BdrvChildRole extent_role;
>  BDRVVmdkState *s = bs->opaque;
> -VmdkExtent *extent;
> +VmdkExtent *extent = NULL;
>  char extent_opt_prefix[32];
>  Error *local_err = NULL;
>  

Looks trivial, and correct.

Reviewed-by: Fam Zheng

Re: [PATCH 4/4] virtiofsd: avoid false positive compiler warning

2020-09-30 Thread Dr. David Alan Gilbert

* Christian Borntraeger (borntrae...@de.ibm.com) wrote:
> make: *** [Makefile:121: config-host.mak] Error 1
> [cborntra@m83lp52 qemu]$ make -C build/
> make: Entering directory '/home/cborntra/REPOS/qemu/build'
> Generating qemu-version.h with a meson_exe.py custom command
> Compiling C object tools/virtiofsd/virtiofsd.p/passthrough_ll.c.o
> ../tools/virtiofsd/passthrough_ll.c: In function ‘lo_setattr’:
> ../tools/virtiofsd/passthrough_ll.c:702:19: error: ‘fd’ may be used 
> uninitialized in this function [-Werror=maybe-uninitialized]
>   702 | res = futimens(fd, tv);
>   |   ^~~~
> cc1: all warnings being treated as errors
> make: *** [Makefile.ninja:1438: 
> tools/virtiofsd/virtiofsd.p/passthrough_ll.c.o] Error 1
> make: Leaving directory '/home/cborntra/REPOS/
> 
> as far as I can see this can not happen. Let us silence the warning by
> giving fd a default value.
> 
> Signed-off-by: Christian Borntraeger 

Yeh, I'd posted 
https://www.mail-archive.com/qemu-devel@nongnu.org/msg738783.html
but not yet merged it; only difference is I'd used -1 since it seemed
safer to use -1 even if it couldn't happen :-)

Dave

> ---
>  tools/virtiofsd/passthrough_ll.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c 
> b/tools/virtiofsd/passthrough_ll.c
> index 0b229ebd5786..da06aa6e9264 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -620,7 +620,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
> struct stat *attr,
>  struct lo_inode *inode;
>  int ifd;
>  int res;
> -int fd;
> +int fd = 0;
>  
>  inode = lo_inode(req, ino);
>  if (!inode) {
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
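
The preference for -1 over 0 as a placeholder can be shown with a short sketch. It assumes only POSIX fcntl(); check_fd() and demo() are hypothetical names for illustration and are not part of virtiofsd. Descriptor 0 is normally stdin, so an accidental use would act on a real file, whereas -1 can never name an open file and fails with EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* Returns 0 if fd refers to an open descriptor, otherwise -errno. */
static int check_fd(int fd)
{
    if (fcntl(fd, F_GETFD) == -1) {
        return -errno;
    }
    return 0;
}

static void demo(void)
{
    int fd = -1;    /* placeholder that cannot name an open file */

    /* Any accidental use of the placeholder is reported, not acted on. */
    assert(check_fd(fd) == -EBADF);
}
```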

Re: [PATCH 3/4] qemu-io-cmds: avoid gcc 10 warning

2020-09-30 Thread Dr. David Alan Gilbert

* Christian Borntraeger (borntrae...@de.ibm.com) wrote:
> With gcc 10 on Fedora32 I do get:
> 
> Compiling C object libblock.fa.p/qemu-io-cmds.c.o
> In file included from /usr/include/stdio.h:867,
>  from /home/cborntra/REPOS/qemu/include/qemu/osdep.h:85,
>  from ../qemu-io-cmds.c:11:
> In function ‘printf’,
> inlined from ‘help_oneline’ at ../qemu-io-cmds.c:2389:9,
> inlined from ‘help_all’ at ../qemu-io-cmds.c:2414:9,
> inlined from ‘help_f’ at ../qemu-io-cmds.c:2424:9:
> /usr/include/bits/stdio2.h:107:10: error: ‘%s’ directive argument is null 
> [-Werror=format-overflow=]
>   107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack 
> ());
>   |  
> ^~~
> cc1: all warnings being treated as errors
> 
> Let us check for null.
> 
> Signed-off-by: Christian Borntraeger 

I'd already posted
'qemu-io-cmds: Simplify help_oneline' that simplifies
this function much more; Kevin picked that up for the block branch a
couple of days ago.

Dave


> ---
>  qemu-io-cmds.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index baeae86d8c85..c2080aa398a9 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -2386,7 +2386,9 @@ static void help_oneline(const char *cmd, const 
> cmdinfo_t *ct)
>  if (cmd) {
>  printf("%s ", cmd);
>  } else {
> -printf("%s ", ct->name);
> +if (ct->name) {
> +printf("%s ", ct->name);
> +}
>  if (ct->altname) {
>  printf("(or %s) ", ct->altname);
>  }
> -- 
> 2.26.2
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 3/4] qemu-io-cmds: avoid gcc 10 warning

2020-09-30 Thread Philippe Mathieu-Daudé

On 9/30/20 5:58 PM, Christian Borntraeger wrote:
> With gcc 10 on Fedora32 I do get:
> 
> Compiling C object libblock.fa.p/qemu-io-cmds.c.o
> In file included from /usr/include/stdio.h:867,
>  from /home/cborntra/REPOS/qemu/include/qemu/osdep.h:85,
>  from ../qemu-io-cmds.c:11:
> In function ‘printf’,
> inlined from ‘help_oneline’ at ../qemu-io-cmds.c:2389:9,
> inlined from ‘help_all’ at ../qemu-io-cmds.c:2414:9,
> inlined from ‘help_f’ at ../qemu-io-cmds.c:2424:9:
> /usr/include/bits/stdio2.h:107:10: error: ‘%s’ directive argument is null 
> [-Werror=format-overflow=]
>   107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack 
> ());
>   |  
> ^~~
> cc1: all warnings being treated as errors
> 
> Let us check for null.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  qemu-io-cmds.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index baeae86d8c85..c2080aa398a9 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -2386,7 +2386,9 @@ static void help_oneline(const char *cmd, const 
> cmdinfo_t *ct)
>  if (cmd) {
>  printf("%s ", cmd);
>  } else {
> -printf("%s ", ct->name);
> +if (ct->name) {
> +printf("%s ", ct->name);
> +}
>  if (ct->altname) {
>  printf("(or %s) ", ct->altname);
>  }
> 

This one has been fixed last month:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg732728.html

Then queued recently:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg745394.html

[PATCH 2/4] nbd: silence maybe-uninitialized warnings

2020-09-30 Thread Christian Borntraeger

gcc 10 from Fedora 32 gives me:

Compiling C object libblock.fa.p/nbd_server.c.o
../nbd/server.c: In function ‘nbd_co_client_start’:
../nbd/server.c:625:14: error: ‘namelen’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  625 | rc = nbd_negotiate_send_info(client, NBD_INFO_NAME, namelen, 
name,
  |  
^
  626 |  errp);
  |  ~
../nbd/server.c:564:14: note: ‘namelen’ was declared here
  564 | uint32_t namelen;
  |  ^~~
cc1: all warnings being treated as errors

As I cannot see how this can happen, let us silence the warning.

Signed-off-by: Christian Borntraeger 
---
 nbd/server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index 982de67816a7..2ff04ee7533d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -561,7 +561,7 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 NBDExport *exp;
 uint16_t requests;
 uint16_t request;
-uint32_t namelen;
+uint32_t namelen = 0;
 bool sendname = false;
 bool blocksize = false;
 uint32_t sizes[3];
-- 
2.26.2

[PATCH 4/4] virtiofsd: avoid false positive compiler warning

2020-09-30 Thread Christian Borntraeger

make: *** [Makefile:121: config-host.mak] Error 1
[cborntra@m83lp52 qemu]$ make -C build/
make: Entering directory '/home/cborntra/REPOS/qemu/build'
Generating qemu-version.h with a meson_exe.py custom command
Compiling C object tools/virtiofsd/virtiofsd.p/passthrough_ll.c.o
../tools/virtiofsd/passthrough_ll.c: In function ‘lo_setattr’:
../tools/virtiofsd/passthrough_ll.c:702:19: error: ‘fd’ may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
  702 | res = futimens(fd, tv);
  |   ^~~~
cc1: all warnings being treated as errors
make: *** [Makefile.ninja:1438: tools/virtiofsd/virtiofsd.p/passthrough_ll.c.o] 
Error 1
make: Leaving directory '/home/cborntra/REPOS/

as far as I can see this can not happen. Let us silence the warning by
giving fd a default value.

Signed-off-by: Christian Borntraeger 
---
 tools/virtiofsd/passthrough_ll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 0b229ebd5786..da06aa6e9264 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -620,7 +620,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 struct lo_inode *inode;
 int ifd;
 int res;
-int fd;
+int fd = 0;
 
 inode = lo_inode(req, ino);
 if (!inode) {
-- 
2.26.2

[PATCH 3/4] qemu-io-cmds: avoid gcc 10 warning

2020-09-30 Thread Christian Borntraeger

With gcc 10 on Fedora32 I do get:

Compiling C object libblock.fa.p/qemu-io-cmds.c.o
In file included from /usr/include/stdio.h:867,
 from /home/cborntra/REPOS/qemu/include/qemu/osdep.h:85,
 from ../qemu-io-cmds.c:11:
In function ‘printf’,
inlined from ‘help_oneline’ at ../qemu-io-cmds.c:2389:9,
inlined from ‘help_all’ at ../qemu-io-cmds.c:2414:9,
inlined from ‘help_f’ at ../qemu-io-cmds.c:2424:9:
/usr/include/bits/stdio2.h:107:10: error: ‘%s’ directive argument is null 
[-Werror=format-overflow=]
  107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack 
());
  |  ^~~
cc1: all warnings being treated as errors

Let us check for null.

Signed-off-by: Christian Borntraeger 
---
 qemu-io-cmds.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index baeae86d8c85..c2080aa398a9 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -2386,7 +2386,9 @@ static void help_oneline(const char *cmd, const cmdinfo_t 
*ct)
 if (cmd) {
 printf("%s ", cmd);
 } else {
-printf("%s ", ct->name);
+if (ct->name) {
+printf("%s ", ct->name);
+}
 if (ct->altname) {
 printf("(or %s) ", ct->altname);
 }
-- 
2.26.2
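
The guard this patch adds can be sketched in isolation. The struct and function below are hypothetical stand-ins (not qemu-io's cmdinfo_t or help_oneline()), and the sketch writes into a buffer instead of stdout so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a command table entry; only two fields. */
struct cmdinfo {
    const char *name;
    const char *altname;
};

/* Build the start of a one-line help entry, skipping NULL strings. */
static void help_prefix(char *buf, size_t len, const char *cmd,
                        const struct cmdinfo *ct)
{
    size_t used = 0;

    assert(buf != NULL && len > 0 && ct != NULL);
    buf[0] = '\0';

    if (cmd) {
        snprintf(buf, len, "%s ", cmd);
        return;
    }
    if (ct->name) {
        /* Passing NULL to "%s" is undefined behavior, hence the guard. */
        snprintf(buf, len, "%s ", ct->name);
        used = strlen(buf);
    }
    if (ct->altname) {
        snprintf(buf + used, len - used, "(or %s) ", ct->altname);
    }
}
```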

[PATCH 1/4] vmdk: fix maybe uninitialized warnings

2020-09-30 Thread Christian Borntraeger

Fedora 32 gcc 10 seems to give false positives:

Compiling C object libblock.fa.p/block_vmdk.c.o
../block/vmdk.c: In function ‘vmdk_parse_extents’:
../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  587 | g_free(extent->l1_table);
  | ^~~~
../block/vmdk.c:754:17: note: ‘extent’ was declared here
  754 | VmdkExtent *extent;
  | ^~
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  620 | ret = vmdk_init_tables(bs, extent, errp);
  |   ^~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 | VmdkExtent *extent;
  | ^~
../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
 1178 | extent->flat_start_offset = flat_offset << 9;
  | ~~^~
../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  581 | extent->l2_cache =
  | ~^
  582 | g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
  | ~~
../block/vmdk.c:872:17: note: ‘extent’ was declared here
  872 | VmdkExtent *extent;
  | ^~
../block/vmdk.c: In function ‘vmdk_open’:
../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  620 | ret = vmdk_init_tables(bs, extent, errp);
  |   ^~
../block/vmdk.c:598:17: note: ‘extent’ was declared here
  598 | VmdkExtent *extent;
  | ^~
cc1: all warnings being treated as errors
make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1

fix them by assigning a default value.

Signed-off-by: Christian Borntraeger 
---
 block/vmdk.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 8ec62c7ab798..a00dc00eb47a 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -595,7 +595,7 @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
 int ret;
 uint32_t magic;
 VMDK3Header header;
-VmdkExtent *extent;
+VmdkExtent *extent = NULL;
 
 ret = bdrv_pread(file, sizeof(magic), , sizeof(header));
 if (ret < 0) {
@@ -751,7 +751,7 @@ static int vmdk_open_se_sparse(BlockDriverState *bs,
 int ret;
 VMDKSESparseConstHeader const_header;
 VMDKSESparseVolatileHeader volatile_header;
-VmdkExtent *extent;
+VmdkExtent *extent = NULL;
 
 ret = bdrv_apply_auto_read_only(bs,
 "No write support for seSparse images available", errp);
@@ -869,7 +869,7 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
 uint32_t magic;
 uint32_t l1_size, l1_entry_sectors;
 VMDK4Header header;
-VmdkExtent *extent;
+VmdkExtent *extent = NULL;
 BDRVVmdkState *s = bs->opaque;
 int64_t l1_backup_offset = 0;
 bool compressed;
@@ -1088,7 +1088,7 @@ static int vmdk_parse_extents(const char *desc, 
BlockDriverState *bs,
 BdrvChild *extent_file;
 BdrvChildRole extent_role;
 BDRVVmdkState *s = bs->opaque;
-VmdkExtent *extent;
+VmdkExtent *extent = NULL;
 char extent_opt_prefix[32];
 Error *local_err = NULL;
 
-- 
2.26.2
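
The pattern behind these warnings can be sketched as follows. This is an illustration only, with hypothetical names (struct extent, add_extent(), open_image()); it is not the vmdk code. A pointer is filled in by a helper only on success, and the compiler cannot always prove that later reads happen only on that path.

```c
#include <assert.h>
#include <stdlib.h>

struct extent {
    int *l1_table;
};

/* Hypothetical helper: sets *out only on success. */
static int add_extent(int want_fail, struct extent **out)
{
    if (want_fail) {
        return -1;          /* *out is left untouched on failure */
    }
    *out = calloc(1, sizeof(**out));
    return *out ? 0 : -1;
}

static int open_image(int want_fail)
{
    /*
     * Without "= NULL" the compiler may be unable to prove that the
     * code below reads 'extent' only after add_extent() succeeded and
     * can emit -Wmaybe-uninitialized. The initializer silences the
     * warning and keeps the cleanup path well defined.
     */
    struct extent *extent = NULL;
    int ret = add_extent(want_fail, &extent);

    if (ret == 0) {
        assert(extent != NULL);
        extent->l1_table = NULL;
    }
    if (extent) {
        free(extent->l1_table);
        free(extent);
    }
    return ret;
}
```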

[PATCH 0/4] assorted gcc 10/fedora32 compile warning fixes

2020-09-30 Thread Christian Borntraeger

I might be wrong and some of these warnings could be correct, so some
review from the subject matter experts would be good.

Christian Borntraeger (4):
  vmdk: fix maybe uninitialized warnings
  nbd: silence maybe-uninitialized warnings
  qemu-io-cmds: avoid gcc 10 warning
  virtiofsd: avoid false positive compiler warning

 block/vmdk.c | 8 
 nbd/server.c | 2 +-
 qemu-io-cmds.c   | 4 +++-
 tools/virtiofsd/passthrough_ll.c | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)

-- 
2.26.2

Re: [PULL 13/13] qemu/atomic.h: rename atomic_ to qatomic_

2020-09-30 Thread Paolo Bonzini

On 30/09/20 17:28, Andrew Jones wrote:
> pvrdma_ring.h is an update-linux-headers.sh file. When running the
> script again we lose the atomic_ to qatomic_ renaming. I've hacked
> the script by adding
> 
>  -e 's/\batomic_read/qatomic_read/g;s/\batomic_set/qatomic_set/g'
> 
> to the cp_portable() sed command, but only considering the two
> qatomic_ functions currently used is obviously not a complete
> solution.
> 
> Any ideas?

My first thought was that it's strange that there are atomics in a uapi/
file, and in fact the file is not uapi/ but rather part of the driver.
I think we should just copy the file into QEMU sources, and remove it
from update-linux-headers.sh.  The "s/atomic_t/int/" can go as well.

Paolo

Re: [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace Command Set

2020-09-30 Thread Niklas Cassel

On Mon, Sep 28, 2020 at 11:35:23AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> The code to support Zone Descriptor Extensions is not included in
> this commit and ZDES 0 is always reported. A later commit in this
> series will add ZDE support.
> 
> This commit doesn't yet include checks for active and open zone
> limits. It is assumed that there are no limits on either active or
> open zones.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> ---
>  block/nvme.c |   2 +-
>  hw/block/nvme-ns.c   | 185 -
>  hw/block/nvme-ns.h   |   6 +-
>  hw/block/nvme.c  | 872 +--
>  include/block/nvme.h |   6 +-
>  5 files changed, 1033 insertions(+), 38 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c

(snip)

> @@ -1326,11 +2060,20 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, 
> uint32_t buf_len,
>  acs[NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFFECTS_CSUPP;
>  acs[NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFFECTS_CSUPP;
>  
> -iocs[NVME_CMD_FLUSH] = NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC;
> -iocs[NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFFECTS_CSUPP |
> -  NVME_CMD_EFFECTS_LBCC;
> -iocs[NVME_CMD_WRITE] = NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC;
> -iocs[NVME_CMD_READ] = NVME_CMD_EFFECTS_CSUPP;
> +if (NVME_CC_CSS(n->bar.cc) != CSS_ADMIN_ONLY) {
> +iocs[NVME_CMD_FLUSH] = NVME_CMD_EFFECTS_CSUPP | 
> NVME_CMD_EFFECTS_LBCC;
> +iocs[NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFFECTS_CSUPP |
> +  NVME_CMD_EFFECTS_LBCC;
> +iocs[NVME_CMD_WRITE] = NVME_CMD_EFFECTS_CSUPP | 
> NVME_CMD_EFFECTS_LBCC;
> +iocs[NVME_CMD_READ] = NVME_CMD_EFFECTS_CSUPP;
> +}
> +
> +if (csi == NVME_CSI_ZONED && NVME_CC_CSS(n->bar.cc) == CSS_CSI) {

Actually, instead of naming the helper function ctrl_has_zns_namespaces(),
a better name might be ctrl_has_zns_support(), since this is what is used
to set the bit in nvme_identify_cmd_set().

Then, I think that this should be:

if (ctrl_has_zns_support() && csi == NVME_CSI_ZONED && NVME_CC_CSS(n->bar.cc) 
== CSS_CSI) {


> +iocs[NVME_CMD_ZONE_APPEND] = NVME_CMD_EFFECTS_CSUPP |
> + NVME_CMD_EFFECTS_LBCC;
> +iocs[NVME_CMD_ZONE_MGMT_SEND] = NVME_CMD_EFFECTS_CSUPP;
> +iocs[NVME_CMD_ZONE_MGMT_RECV] = NVME_CMD_EFFECTS_CSUPP;
> +}
>  
>  trans_len = MIN(sizeof(cmd_eff_log) - off, buf_len);
>

Re: [PULL 13/13] qemu/atomic.h: rename atomic_ to qatomic_

2020-09-30 Thread Andrew Jones

On Wed, Sep 23, 2020 at 05:10:31PM +0100, Stefan Hajnoczi wrote:
> clang's C11 atomic_fetch_*() functions only take a C11 atomic type
> pointer argument. QEMU uses direct types (int, etc) and this causes a
> compiler error when QEMU code calls these functions in a source file
> that also included <stdatomic.h> via a system header file:
> 
>   $ CC=clang CXX=clang++ ./configure ... && make
>   ../util/async.c:79:17: error: address argument to atomic operation must be 
> a pointer to _Atomic type ('unsigned int *' invalid)
> 
> Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
> used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
> and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
> searched GitHub for existing "qatomic_" users but there seem to be none.
> 
> This patch was generated using:
> 
>   $ git grep -h -o '\ sort -u >/tmp/changed_identifiers
>   $ for identifier in $( sed -i "s%\<$identifier\>%q$identifier%g" \
> $(git grep -I -l "\<$identifier\>")
> done
> 
> I manually fixed line-wrap issues and misaligned rST tables.
> 
> Signed-off-by: Stefan Hajnoczi 
> Reviewed-by: Philippe Mathieu-Daudé 
> Acked-by: Paolo Bonzini 
> Message-Id: <20200923105646.47864-1-stefa...@redhat.com>
> ---
>  include/qemu/atomic.h | 248 +-
>  docs/devel/lockcnt.txt|   8 +-
>  docs/devel/rcu.txt|  34 +--
>  accel/tcg/atomic_template.h   |  20 +-
>  include/block/aio-wait.h  |   4 +-
>  include/block/aio.h   |   8 +-
>  include/exec/cpu_ldst.h   |   2 +-
>  include/exec/exec-all.h   |   6 +-
>  include/exec/log.h|   6 +-
>  include/exec/memory.h |   2 +-
>  include/exec/ram_addr.h   |  26 +-
>  include/exec/ramlist.h|   2 +-
>  include/exec/tb-lookup.h  |   4 +-
>  include/hw/core/cpu.h |   2 +-
>  include/qemu/atomic128.h  |   6 +-
>  include/qemu/bitops.h |   2 +-
>  include/qemu/coroutine.h  |   2 +-
>  include/qemu/log.h|   6 +-
>  include/qemu/queue.h  |   7 +-
>  include/qemu/rcu.h|  10 +-
>  include/qemu/rcu_queue.h  | 100 +++
>  include/qemu/seqlock.h|   8 +-
>  include/qemu/stats64.h|  28 +-
>  include/qemu/thread.h |  24 +-
>  .../infiniband/hw/vmw_pvrdma/pvrdma_ring.h|  14 +-

Hi Stefan,

pvrdma_ring.h is an update-linux-headers.sh file. When running the
script again we lose the atomic_ to qatomic_ renaming. I've hacked
the script by adding

 -e 's/\batomic_read/qatomic_read/g;s/\batomic_set/qatomic_set/g'

to the cp_portable() sed command, but only considering the two
qatomic_ functions currently used is obviously not a complete
solution.
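
To make the scope of the problem concrete, here is a minimal sketch of the two
accessors that the sed hack covers. It is not QEMU's atomic.h: the macro bodies
are an assumption (GCC/Clang __atomic builtins with relaxed ordering), and
ring_head and advance_head() are made-up names for illustration. The point is
only that the q-prefixed names coexist with <stdatomic.h> while operating on
plain integer types.

```c
#include <stdatomic.h>  /* owns the atomic_*() namespace */

/* Hypothetical stand-ins for the renamed helpers (relaxed ordering assumed). */
#define qatomic_read(ptr)    __atomic_load_n(ptr, __ATOMIC_RELAXED)
#define qatomic_set(ptr, i)  __atomic_store_n(ptr, i, __ATOMIC_RELAXED)

/* A plain (non-_Atomic) index, as found in QEMU-style ring code. */
static unsigned int ring_head;

/*
 * Read the index, add a step and store it back. This is NOT an atomic
 * read-modify-write; it only exercises the two accessors.
 */
static unsigned int advance_head(unsigned int step)
{
    unsigned int next = qatomic_read(&ring_head) + step;

    qatomic_set(&ring_head, next);
    return next;
}
```

Any other atomic_*() helper that a future header update pulls in would need the
same treatment, which is why matching only these two names is fragile.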

Any ideas?

Thanks,
drew

Re: [PATCH v2 1/4] keyval: Parse help options

2020-09-30 Thread Kevin Wolf

Am 30.09.2020 um 15:35 hat Eric Blake geschrieben:
> On 9/30/20 7:45 AM, Kevin Wolf wrote:
> > This adds a new parameter 'help' to keyval_parse() that enables parsing
> > of help options. If NULL is passed, the function behaves the same as
> > before. But if a bool pointer is given, it contains the information
> > whether an option "help" without value was given (which would otherwise
> > either result in an error or be interpreted as the value for an implied
> > key).
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> 
> > +
> > +/* "help" is only a help option if it has no value */
> > +qdict = keyval_parse("help=on", NULL, &help, &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "help"), ==, "on");
> > +g_assert_false(help);
> > +qobject_unref(qdict);
> > +
> > +/* Double comma after "help" in an implied key is not a help option */
> > +qdict = keyval_parse("help,,abc", "implied", &help, &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> > +g_assert_false(help);
> > +qobject_unref(qdict);
> 
> Worth checking qdict_get_try_str(qdict, "implied") for "help,abc"?

Yes, this makes sense.

> > +
> > +/* Without implied key and without value, it's an error */
> > +qdict = keyval_parse("help,,abc", NULL, , );
> > +error_free_or_abort();
> > +g_assert(!qdict);
> > +
> > +/* "help" as the only option */
> > +qdict = keyval_parse("help", NULL, &help, &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 0);
> > +g_assert_true(help);
> > +qobject_unref(qdict);
> > +
> > +/* "help" as the first part of the key */
> > +qdict = keyval_parse("help.abc", NULL, , );
> > +error_free_or_abort();
> > +g_assert(!qdict);
> 
> Worth checking qdict_get_try_str(qdict, "help.abc") for "on"? (at least,
> that's my guess as what happened)

The keyval parser doesn't support boolean options like this (I think
this is intentional because it would be ambiguous with implied keys).

This is an error case, and I don't think the QDict has defined content
then. But if anything, we would want to check that it's empty.
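
To illustrate the distinction under discussion, here is a toy classifier. It is
not the keyval parser: it knows nothing about implied keys, error reporting or
QDicts, and toy_has_help() is a made-up name. It only shows that a bare "help"
token counts as a help request, while "help=on", "help.abc" and the escaped
"help,,abc" do not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the comma-separated option string contains a token that is
 * exactly "help". A doubled comma ",," is treated as an escaped comma that
 * stays inside the current token. Over-long tokens are truncated, which is
 * harmless here because a truncated token can never equal "help".
 */
static bool toy_has_help(const char *params)
{
    char tok[64];
    size_t len = 0;
    bool found = false;
    const char *p = params;

    for (;;) {
        if (p[0] == ',' && p[1] == ',') {
            /* Escaped comma: keep one ',' in the token. */
            if (len < sizeof(tok) - 1) {
                tok[len++] = ',';
            }
            p += 2;
        } else if (*p == ',' || *p == '\0') {
            /* End of token: compare it against the bare word. */
            tok[len] = '\0';
            if (strcmp(tok, "help") == 0) {
                found = true;
            }
            len = 0;
            if (*p == '\0') {
                break;
            }
            p++;
        } else {
            if (len < sizeof(tok) - 1) {
                tok[len++] = *p;
            }
            p++;
        }
    }
    return found;
}
```
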

> > +
> > +/* "help" as the last part of the key */
> > +qdict = keyval_parse("abc.help", NULL, , );
> > +error_free_or_abort();
> > +g_assert(!qdict);
> 
> [1]
> 
> > +
> > +/* "help" is not a value for the implied key if &help is given */
> > +qdict = keyval_parse("help", "implied", &help, &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 0);
> > +g_assert_true(help);
> > +qobject_unref(qdict);
> 
> Worth checking that the qdict does not contain "implied"?  Perhaps by
> checking qdict_size() == 0?

Already there, the first line after keyval_parse(). :-)

> > +
> > +/* "help" is a value for the implied key when passing NULL for help */
> > +qdict = keyval_parse("help", "implied", NULL, &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "help");
> > +qobject_unref(qdict);
> > +
> > +/* "help.abc" is a value for the implied key */
> > +qdict = keyval_parse("help.abc", "implied", , );
> > +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "help.abc");
> > +g_assert_false(help);
> > +qobject_unref(qdict);
> > +
> > +/* "abc.help" is a value for the implied key */
> > +qdict = keyval_parse("abc.help", "implied", , );
> > +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "abc.help");
> > +g_assert_false(help);
> > +qobject_unref(qdict);
> > +
> > +/* "help" as the last part of the key */
> > +qdict = keyval_parse("abc.help", NULL, , );
> > +error_free_or_abort();
> > +g_assert(!qdict);
> 
> duplicates [1]

So we can be extra sure!

Somehow I suspect I wanted to test another case here, but I'm not sure
which one. Maybe the same with an implied key? Or I can just remove it.

> > +
> > +/* Other keys before and after help are still parsed normally */
> > +qdict = keyval_parse("number=42,help,foo=bar", NULL, &help,
> > +                     &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 2);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "number"), ==, "42");
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "foo"), ==, "bar");
> > +g_assert_true(help);
> > +qobject_unref(qdict);
> > +
> > +/* ...even with an implied key */
> > +qdict = keyval_parse("val,help,foo=bar", "implied", &help,
> > +                     &error_abort);
> > +g_assert_cmpuint(qdict_size(qdict), ==, 2);
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "val");
> > +g_assert_cmpstr(qdict_get_try_str(qdict, "foo"), ==, "bar");
> > +g_assert_true(help);
> > +qobject_unref(qdict);
> >  }
> >  
> 
> Overall a nice set of additions.  You could tweak it further, but I'm no
> longer seeing a hole like last time.
> 
> > +++

Re: [PATCH v5 03/14] hw/block/nvme: Introduce the Namespace Types definitions

2020-09-30 Thread Keith Busch

On Mon, Sep 28, 2020 at 11:35:17AM +0900, Dmitry Fomichev wrote:
> From: Niklas Cassel 
> 
> Define the structures and constants required to implement
> Namespace Types support.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Dmitry Fomichev 
> ---
>  hw/block/nvme-ns.h   |  2 ++
>  hw/block/nvme.c  |  2 +-
>  include/block/nvme.h | 74 +++-
>  3 files changed, 63 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index 83734f4606..cca23bc0b3 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -21,6 +21,8 @@
>  
>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
> +uint8_t  csi;
> +QemuUUID uuid;

Neither of these new params are used anywhere in this patch.

Re: [PATCH 1/4] keyval: Parse help options

2020-09-30 Thread Kevin Wolf

Am 30.09.2020 um 15:42 hat Eric Blake geschrieben:
> On 9/30/20 8:04 AM, Kevin Wolf wrote:
> > Am 29.09.2020 um 19:46 hat Eric Blake geschrieben:
> >> On 9/29/20 12:26 PM, Kevin Wolf wrote:
> >>> This adds a new parameter 'help' to keyval_parse() that enables parsing
> >>> of help options. If NULL is passed, the function behaves the same as
> >>> before. But if a bool pointer is given, it contains the information
> >>> whether an option "help" without value was given (which would otherwise
> >>> either result in an error or be interpreted as the value for an implied
> >>> key).
> >>>
> >>> Signed-off-by: Kevin Wolf 
> >>> ---
> >>
> >>> +++ b/util/keyval.c
> >>
> >> Might be nice to see this before the testsuite changes by tweaking the
> >> git orderfile.
> > 
> > What does your git orderfile look like? I don't know how to exclude
> > tests/ from file type patterns like *.c.
> 
> You can start with scripts/git.orderfile, and temporarily add:
> 
>  # decoding tree specification
>  *.decode
> 
> +# Key files that I want first for this patch
> +util/*.c
> +
>  # code
>  *.c
> 
> or similar.  It's not a show-stopper if you don't, and I concede that
> remembering to do it (and then to revert back to the usual afterwards)
> is not trivial.

Ah, I see. I never did per-patch/series orderfiles, I just have my
generic one that does things like headers before implementation, and
documentation and QAPI schema changes before both.

Kevin



Re: [PATCH] qemu-storage-daemon: Fix help line for --export

2020-09-30 Thread Kevin Wolf

Am 30.09.2020 um 15:48 hat Eric Blake geschrieben:
> On 9/30/20 8:39 AM, Kevin Wolf wrote:
> > Commit 5f479a8d renamed the 'device' option of --export into
> > 'node-name', but forgot to update the help in qemu-storage-daemon.
> > 
> > Fixes: 5f479a8dc086bfa42c9f94e9ab69962f256e207f
> > Signed-off-by: Kevin Wolf 
> > ---
> >  storage-daemon/qemu-storage-daemon.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/storage-daemon/qemu-storage-daemon.c 
> > b/storage-daemon/qemu-storage-daemon.c
> > index 7cbdbf0b23..42839c981f 100644
> > --- a/storage-daemon/qemu-storage-daemon.c
> > +++ b/storage-daemon/qemu-storage-daemon.c
> > @@ -92,7 +92,7 @@ static void help(void)
> >  "  --chardev configure a character device backend\n"
> >  " (see the qemu(1) man page for possible 
> > options)\n"
> >  "\n"
> > -"  --export [type=]nbd,device=,id=,[,name=]\n"
> > +"  --export 
> > [type=]nbd,id=,node-name=,[,name=]\n"
> 
> While touching this, get rid of the doubled comma before the optional
> name key (s/,\[,/\[,/)

Somehow I didn't even notice even though I reordered the options before
it. I'm removing the extra comma now.

> With that,
> Reviewed-by: Eric Blake 

Thanks, applied.

Kevin



Re: [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace Command Set

2020-09-30 Thread Niklas Cassel

On Mon, Sep 28, 2020 at 11:35:23AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> The code to support Zone Descriptor Extensions is not included in
> this commit and ZDES 0 is always reported. A later commit in this
> series will add ZDE support.
> 
> This commit doesn't yet include checks for active and open zone
> limits. It is assumed that there are no limits on either active or
> open zones.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 
> ---
>  block/nvme.c |   2 +-
>  hw/block/nvme-ns.c   | 185 -
>  hw/block/nvme-ns.h   |   6 +-
>  hw/block/nvme.c  | 872 +--
>  include/block/nvme.h |   6 +-
>  5 files changed, 1033 insertions(+), 38 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 05485fdd11..7a513c9a17 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -333,7 +333,7 @@ static inline int nvme_translate_error(const NvmeCqe *c)
>  {
>  uint16_t status = (le16_to_cpu(c->status) >> 1) & 0xFF;
>  if (status) {
> -trace_nvme_error(le32_to_cpu(c->result),
> +trace_nvme_error(le32_to_cpu(c->result32),
>   le16_to_cpu(c->sq_head),
>   le16_to_cpu(c->sq_id),
>   le16_to_cpu(c->cid),
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index 31b7f986c3..6d9dc9205b 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -33,14 +33,14 @@ static void nvme_ns_init(NvmeNamespace *ns)
>  NvmeIdNs *id_ns = &ns->id_ns;
>  
>  if (blk_get_flags(ns->blkconf.blk) & BDRV_O_UNMAP) {
> -ns->id_ns.dlfeat = 0x9;
> +ns->id_ns.dlfeat = 0x8;

You seem to change something that is NVM namespace specific here, why?
If it is indeed needed, I assume that this change should be in a separate
patch.
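
For reference, the two values differ only in the low three bits. Below is a
minimal sketch of decoding the field, assuming the DLFEAT layout as I read it
in the NVMe 1.4 Identify Namespace data structure (bits 2:0: read behavior of
deallocated logical blocks, bit 3: Write Zeroes deallocation support). The
helper and constant names are made up for illustration and should be checked
against the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DLFEAT layout, see the hedge in the text above. */
#define TOY_DLFEAT_READ_MASK   0x07u  /* bits 2:0, 001b = reads return zeroes */
#define TOY_DLFEAT_WZ_DEALLOC  0x08u  /* bit 3 */

/* Reported read behavior for deallocated blocks (0 = not reported). */
static unsigned toy_dlfeat_read_behavior(uint8_t dlfeat)
{
    return dlfeat & TOY_DLFEAT_READ_MASK;
}

/* Whether Write Zeroes with the deallocate bit is advertised. */
static bool toy_dlfeat_wz_dealloc(uint8_t dlfeat)
{
    return (dlfeat & TOY_DLFEAT_WZ_DEALLOC) != 0;
}
```

Under that reading, 0x9 advertises Write Zeroes deallocation plus "deallocated
blocks read as zeroes", whereas 0x8 keeps the former and stops reporting any
read behavior.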

>  }
>  
>  id_ns->lbaf[0].ds = BDRV_SECTOR_BITS;
>  
>  id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
>  
> -ns->params.csi = NVME_CSI_NVM;
> +ns->csi = NVME_CSI_NVM;
>  qemu_uuid_generate(&ns->params.uuid); /* TODO make UUIDs persistent */
>  
>  /* no thin provisioning */
> @@ -73,7 +73,162 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace 
> *ns, Error **errp)
>  }
>  
>  lba_index = NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas);
> -ns->id_ns.lbaf[lba_index].ds = 31 - clz32(n->conf.logical_block_size);
> +ns->id_ns.lbaf[lba_index].ds = 31 - 
> clz32(ns->blkconf.logical_block_size);
> +
> +return 0;
> +}
> +
> +/*
> + * Add a zone to the tail of a zone list.
> + */
> +void nvme_add_zone_tail(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone)
> +{
> +uint32_t idx = (uint32_t)(zone - ns->zone_array);
> +
> +assert(nvme_zone_not_in_list(zone));
> +
> +if (!zl->size) {
> +zl->head = zl->tail = idx;
> +zone->next = zone->prev = NVME_ZONE_LIST_NIL;
> +} else {
> +ns->zone_array[zl->tail].next = idx;
> +zone->prev = zl->tail;
> +zone->next = NVME_ZONE_LIST_NIL;
> +zl->tail = idx;
> +}
> +zl->size++;
> +}
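
The index-linked list scheme used by the helper above (and by the removal
helper that follows) can be exercised in isolation. This is a simplified
sketch, not the patch's code: ToyNode, ToyList and TOY_NIL are hypothetical
names, and unlike the patch the removal path derives head and tail updates from
the node's own links instead of special-casing head and tail.

```c
#include <stdint.h>

#define TOY_NIL UINT32_MAX   /* "no neighbour" marker, like a NIL index */

typedef struct ToyNode { uint32_t next, prev; } ToyNode;
typedef struct ToyList { uint32_t head, tail, size; } ToyList;

static void toy_list_init(ToyList *l)
{
    l->head = l->tail = TOY_NIL;
    l->size = 0;
}

/* Append the array element at index idx to the tail of the list. */
static void toy_list_add_tail(ToyNode *arr, ToyList *l, uint32_t idx)
{
    arr[idx].next = TOY_NIL;
    arr[idx].prev = l->tail;       /* TOY_NIL when the list is empty */
    if (l->size == 0) {
        l->head = idx;
    } else {
        arr[l->tail].next = idx;
    }
    l->tail = idx;
    l->size++;
}

/* Unlink the element at index idx; it must currently be on the list. */
static void toy_list_remove(ToyNode *arr, ToyList *l, uint32_t idx)
{
    uint32_t next = arr[idx].next;
    uint32_t prev = arr[idx].prev;

    if (prev == TOY_NIL) {
        l->head = next;            /* idx was the head */
    } else {
        arr[prev].next = next;
    }
    if (next == TOY_NIL) {
        l->tail = prev;            /* idx was the tail */
    } else {
        arr[next].prev = prev;
    }
    arr[idx].next = arr[idx].prev = TOY_NIL;
    l->size--;
}
```

Storing indices rather than pointers keeps the links valid if the backing array
is saved and restored, at the cost of passing the array to every helper.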
> +
> +/*
> + * Remove a zone from a zone list. The zone must be linked in the list.
> + */
> +void nvme_remove_zone(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone)
> +{
> +uint32_t idx = (uint32_t)(zone - ns->zone_array);
> +
> +assert(!nvme_zone_not_in_list(zone));
> +
> +--zl->size;
> +if (zl->size == 0) {
> +zl->head = NVME_ZONE_LIST_NIL;
> +zl->tail = NVME_ZONE_LIST_NIL;
> +} else if (idx == zl->head) {
> +zl->head = zone->next;
> +ns->zone_array[zl->head].prev = NVME_ZONE_LIST_NIL;
> +} else if (idx == zl->tail) {
> +zl->tail = zone->prev;
> +ns->zone_array[zl->tail].next = NVME_ZONE_LIST_NIL;
> +} else {
> +ns->zone_array[zone->next].prev = zone->prev;
> +ns->zone_array[zone->prev].next = zone->next;
> +}
> +
> +

Re: [PATCH 0/2] hw/block/m25p80: Fix Numonyx flash dummy cycle register behavior

2020-09-30 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/1601425716-204629-1-git-send-email-koml...@xilinx.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

N/A. Internal error while reading log file



The full log is available at
http://patchew.org/logs/1601425716-204629-1-git-send-email-koml...@xilinx.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 11/13] block/export: convert vhost-user-blk server to block export API

2020-09-30 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> On Wed, Sep 30, 2020 at 07:28:58AM +0200, Markus Armbruster wrote:
>> Stefan Hajnoczi  writes:
>> 
>> > Use the new QAPI block exports API instead of defining our own QOM
>> > objects.
>> >
>> > This is a large change because the lifecycle of VuBlockDev needs to
>> > follow BlockExportDriver. QOM properties are replaced by QAPI options
>> > objects.
>> >
>> > VuBlockDev is renamed VuBlkExport and contains a BlockExport field.
>> > Several fields can be dropped since BlockExport already has equivalents.
>> >
>> > The file names and meson build integration will be adjusted in a future
>> > patch. libvhost-user should probably be built as a static library that
>> > is linked into QEMU instead of as a .c file that results in duplicate
>> > compilation.
>> >
>> > The new command-line syntax is:
>> >
>> >   $ qemu-storage-daemon \
>> >   --blockdev file,node-name=drive0,filename=test.img \
>> >   --export 
>> > vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock
>> >
>> > Note that unix-socket is optional because we may wish to accept chardevs
>> > too in the future.
>> >
>> > Signed-off-by: Stefan Hajnoczi 
>> > ---
>> > v2:
>> >  * Replace str unix-socket with SocketAddress addr to match NBD and
>> >support file descriptor passing
>> >  * Make addr mandatory [Markus]
>> >  * Update vhost-user-blk-test.c to use --export syntax
>> > ---
>> >  qapi/block-export.json   |  21 +-
>> >  block/export/vhost-user-blk-server.h |  23 +-
>> >  block/export/export.c|   8 +-
>> >  block/export/vhost-user-blk-server.c | 452 +++
>> >  tests/qtest/vhost-user-blk-test.c|   2 +-
>> >  util/vhost-user-server.c |  10 +-
>> >  block/export/meson.build |   1 +
>> >  block/meson.build|   1 -
>> >  8 files changed, 158 insertions(+), 360 deletions(-)
>> >
>> > diff --git a/qapi/block-export.json b/qapi/block-export.json
>> > index ace0d66e17..2e44625bb1 100644
>> > --- a/qapi/block-export.json
>> > +++ b/qapi/block-export.json
>> > @@ -84,6 +84,21 @@
>> >'data': { '*name': 'str', '*description': 'str',
>> >  '*bitmap': 'str' } }
>> >  
>> > +##
>> > +# @BlockExportOptionsVhostUserBlk:
>> > +#
>> > +# A vhost-user-blk block export.
>> > +#
>> > +# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd'
>> > +#SocketAddress types are supported. Passed fds must be UNIX domain
>> > +#sockets.
>> 
>> "addr.type must be 'unix' or 'fd'" is not visible in introspection.
>> Awkward.  Practical problem only if other addresses ever become
>> available here.  Is that possible?
>
> addr.type=fd itself has the same problem, because it is a file
> descriptor without type information. Therefore the QMP client cannot
> introspect which types of file descriptors can be passed.

Yes, but if introspection could tell us which values of addr.type
are valid, then a client should figure out the address families, as
follows.  Any valid value other than 'fd' corresponds to an address
family.  The set of values valid for addr.type therefore corresponds to
a set of address families.  The address families in that set are all
valid with 'fd', aren't they?

> Two ideas:
>
> 1. Introduce per-address family fd types (SocketAddrFdTcpInet,
>SocketAddrFdTcpInet6, SocketAddrFdUnix, etc) to express specific
>address families in the QAPI schema.
>
>Then use per-command unions to express the address families supported
>by specific commands. For example,
>BlockExportOptionsVhostUserBlkSocketAddr would only allow
>SocketAddrUnix and SocketAddrFdUnix. That way new address families
>can be supported in the future and introspection reports.

Awkward.  These types would have to differ structurally, or else they
are indistinguishable in introspection.

> 2. Use a side-channel (query-features, I think we discussed something
>like this a while back) to report features that cannot be
>introspected.

We implemented this in the form of QAPI feature flags, visible in
introspection.  You could do something like

  'addr': { 'type': 'SocketAddress',
'features': [ 'unix' ] }

> I think the added complexity for achieving full introspection is not
> worth it. It becomes harder to define new QAPI commands, increases the
> chance of errors, and is more tedious to program for clients/servers.

Hence my question: is it possible that address families other than unix
become available here?

When that happens, we have an introspection problem of the sort we
commonly solve with a feature flag.

> Accepting any SocketAddr seems reasonable to me since vhost-user
> requires an address family that has file descriptor passing. Very few
> address families support this feature and we don't expect to add new
> ones often.

Your answer appears to be "yes in theory, quite unlikely in practice".
Correct?

Re: [PATCH v5 06/14] hw/block/nvme: Add support for active/inactive namespaces

2020-09-30 Thread Niklas Cassel

On Mon, Sep 28, 2020 at 11:35:20AM +0900, Dmitry Fomichev wrote:
> From: Niklas Cassel 
> 
> In NVMe, a namespace is active if it exists and is attached to the
> controller.
> 
> CAP.CSS (together with the I/O Command Set data structure) defines what
> command sets are supported by the controller.
> 
> CC.CSS (together with Set Profile) can be set to enable a subset of the
> available command sets. The namespaces belonging to a disabled command set
> will not be able to attach to the controller, and will thus be inactive.
> 
> E.g., if the user sets CC.CSS to Admin Only, NVM namespaces should be
> marked as inactive.
> 
> The identify namespace, the identify namespace CSI specific, and the namespace
> list commands have two different versions, one that only shows active
> namespaces, and the other version that shows existing namespaces, regardless
> of whether the namespace is attached or not.
> 
> Add an attached member to struct NvmeNamespace, and implement the missing CNS
> commands.
> 
> The added functionality will also simplify the implementation of namespace
> management in the future, since namespace management can also attach and
> detach namespaces.

Following my previous discussion with Klaus,
I think we need to rewrite this commit message completely:

Subject: hw/block/nvme: Add support for allocated CNS command variants

Many CNS commands have "allocated" command variants.
These include a namespace as long as it is allocated
(i.e. a namespace is included regardless of whether it is active (attached)
or not).

While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
is more complete by actually providing support for these CNS values.

However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the same
result as the active CNS command variants.

In NVMe, a namespace is active if it exists and is attached to the
controller.

CAP.CSS (together with the I/O Command Set data structure) defines what
command sets are supported by the controller.

CC.CSS (together with Set Profile) can be set to enable a subset of the
available command sets.

Even if a user configures CC.CSS to e.g. Admin only, NVM namespaces
will still be attached (and thus marked as active).
Similarly, if a user configures CC.CSS to e.g. NVM, ZNS namespaces
will still be attached (and thus marked as active).

However, any operation from a disabled command set will result in an
Invalid Command Opcode error.

Add an attached struct member for struct NvmeNamespace,
so that we lay the foundation for namespace attachment
support. Also implement logic in the new CNS values to
include/exclude namespaces based on this new struct member.
The only thing missing is hooking up the actual Namespace Attachment
command opcode, which allows a user to toggle the attached
variable per namespace. The reason for not hooking up this
command completely is that the NVMe specification
requires that the Namespace Management command is supported
if the Namespace Attachment command is supported.
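
The difference between the "active" and "allocated" list variants described
above can be sketched as a single filter. This is an illustration only, not the
patch's implementation: ToyNamespace and toy_ns_list() are made-up names, and
the input array is assumed to be sorted by ascending NSID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyNamespace {
    uint32_t nsid;
    bool attached;   /* mirrors the proposed "attached" member */
} ToyNamespace;

/*
 * Collect the NSIDs greater than min_nsid into out[], at most out_cap of them.
 * With only_active set, namespaces that are allocated but not attached are
 * skipped; otherwise every allocated namespace is reported.
 * Returns the number of entries written.
 */
static size_t toy_ns_list(const ToyNamespace *ns, size_t num_ns,
                          uint32_t min_nsid, bool only_active,
                          uint32_t *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < num_ns && n < out_cap; i++) {
        if (ns[i].nsid <= min_nsid) {
            continue;
        }
        if (only_active && !ns[i].attached) {
            continue;
        }
        out[n++] = ns[i].nsid;
    }
    return n;
}
```

As long as every namespace stays attached, both variants return the same list,
which matches the statement above that the new CNS values currently give the
same result as the active ones.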

> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Dmitry Fomichev 
> ---
>  hw/block/nvme-ns.h   |  1 +
>  hw/block/nvme.c  | 60 ++--
>  include/block/nvme.h | 20 +--
>  3 files changed, 65 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index cca23bc0b3..acdb76f058 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -22,6 +22,7 @@
>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
>  uint8_t  csi;
> +bool attached;
>  QemuUUID uuid;
>  } NvmeNamespaceParams;
>  
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 4ec1ddc90a..63ad03d6d6 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c

We need to add an additional check in nvme_io_cmd()
that returns Invalid Command Opcode when CC.CSS == Admin only.
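
Something along these lines, as a standalone sketch rather than the actual
nvme_io_cmd() change. The TOY_* constants are assumptions: CC.CSS taken as
bits 6:4 of the CC register with 111b meaning Admin only, and the status values
chosen to mirror what I believe NVME_INVALID_OPCODE and NVME_DNR are in QEMU.

```c
#include <stdint.h>

/* Assumed register layout and status codes, see the hedge above. */
#define TOY_CC_CSS_SHIFT    4
#define TOY_CC_CSS_MASK     0x7u
#define TOY_CSS_ADMIN_ONLY  0x7u

#define TOY_SUCCESS         0x0000u
#define TOY_INVALID_OPCODE  0x0001u
#define TOY_DNR             0x4000u   /* Do Not Retry */

/* Extract the CSS field from a CC register value. */
static unsigned toy_cc_css(uint32_t cc)
{
    return (cc >> TOY_CC_CSS_SHIFT) & TOY_CC_CSS_MASK;
}

/*
 * Gate placed at the top of the I/O command path: when only the admin
 * command set is enabled, every I/O opcode is rejected.
 */
static uint16_t toy_io_cmd_gate(uint32_t cc)
{
    if (toy_cc_css(cc) == TOY_CSS_ADMIN_ONLY) {
        return TOY_INVALID_OPCODE | TOY_DNR;
    }
    return TOY_SUCCESS;
}
```
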

> @@ -1523,7 +1523,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
> NvmeRequest *req)
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
> +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req,
> + bool only_active)
>  {
>  NvmeNamespace *ns;
>  NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
> @@ -1540,11 +1541,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
> NvmeRequest *req)
>  return nvme_rpt_empty_id_struct(n, req);
>  }
>  
> +if (only_active && !ns->params.attached) {
> +return nvme_rpt_empty_id_struct(n, req);
> +}
> +
>  return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs),
>  DMA_DIRECTION_FROM_DEVICE, req);
>  }
>  
> -static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req)
> +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
> +

Re: [PATCH v7 06/13] qmp: Call monitor_set_cur() only in qmp_dispatch()

2020-09-30 Thread Kevin Wolf

Am 30.09.2020 um 15:14 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 30.09.2020 um 11:26 hat Markus Armbruster geschrieben:
> >> Kevin Wolf  writes:
> >> 
> >> > Am 28.09.2020 um 13:42 hat Markus Armbruster geschrieben:
> >> >> Kevin Wolf  writes:
> >> >> 
> >> >> > Am 14.09.2020 um 17:10 hat Markus Armbruster geschrieben:
> >> >> >> Kevin Wolf  writes:
> [...]
> >> >> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> >> >> >> > index 8469970c69..922fdb5541 100644
> >> >> >> > --- a/monitor/qmp.c
> >> >> >> > +++ b/monitor/qmp.c
> >> >> >> > @@ -135,16 +135,10 @@ static void monitor_qmp_respond(MonitorQMP 
> >> >> >> > *mon, QDict *rsp)
> >> >> >> >  
> >> >> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
> >> >> >> >  {
> >> >> >> > -Monitor *old_mon;
> >> >> >> >  QDict *rsp;
> >> >> >> >  QDict *error;
> >> >> >> >  
> >> >> >> > -old_mon = monitor_set_cur(&mon->common);
> >> >> >> > -assert(old_mon == NULL);
> >> >> >> > -
> >> >> >> > -rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
> >> >> >> > -
> >> >> >> > -monitor_set_cur(NULL);
> >> >> >> > +rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
> >> >> >> 
> >> >> >> Long line.  Happy to wrap it in my tree.  A few more in PATCH 08-11.
> >> >> >
> >> >> > It's 79 characters. Should be fine even with your local deviation from
> >> >> > the coding style to require less than that for comments?
> >> >> 
> >> >> Let me rephrase my remark.
> >> >> 
> >> >> For me,
> >> >> 
> >> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
> >> >>                        &mon->common);
> >> >> 
> >> >> is significantly easier to read than
> >> >> 
> >> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
> >> >
> >> > I guess this is highly subjective. I find wrapped lines harder to read.
> >> > For answering subjective questions like this, we generally use the
> >> > coding style document.
> >> >
> >> > Anyway, I guess following an idiosyncratic coding style that is
> >> > different from every other subsystem in QEMU is possible (if
> >> > inconvenient) if I know what it is.
> >> 
> >> The applicable coding style document is PEP 8.
> >
> > I'll happily apply PEP 8 to Python code, but this is C. I don't think
> > PEP 8 applies very well to C code. (In fact, PEP 7 exists as a C style
> > guide, but we're not writing C code for the Python project here...)
> 
> I got confused (too much Python code review), my apologies.
> 
> >> > My problem is more that I don't know what the exact rules are. Can they
> >> > only be figured out experimentally by submitting patches and seeing
> >> > whether you like them or not?
> >> 
> >> PEP 8:
> >> 
> >> A style guide is about consistency.  Consistency with this style
> >> guide is important.  Consistency within a project is more important.
> >> Consistency within one module or function is the most important.
> >> 
> >> In other words, you should make a reasonable effort to blend in.
> >
> > The project style guide for C is defined in CODING_STYLE.rst. Missing
> > consistency with it is what I'm complaining about.
> >
> > I also agree that consistency within one module or function is most
> > important, which is why I allow you to reformat my code. But I don't
> > think it means that local coding style rules shouldn't be documented,
> > especially if you can't just look at the code and see immediately how
> > it's supposed to be.
> >
> >> >> Would you mind me wrapping this line in my tree?
> >> >
> >> > I have no say in this subsystem and I take it that you want all code to
> >> > look as if you had written it yourself, so do as you wish.
> >> 
> >> I'm refusing the bait.
> >> 
> >> > But I understand that I'll have to respin anyway, so if you could
> >> > explain what you're after, I might be able to apply the rules for the
> >> > next version of the series.
> >> 
> >> First, PEP 8 again:
> >> 
> >> Limit all lines to a maximum of 79 characters.
> >> 
> >> For flowing long blocks of text with fewer structural restrictions
> >> (docstrings or comments), the line length should be limited to 72
> >> characters.
> >
> > Ok, that's finally clear limits at least.
> >
> > Any other rules from PEP 8 that you want to see applied to C code?
> 
> PEP 8 does not apply to C.
> 
> > Would you mind documenting this somewhere?
> >
> >> Second, an argument we two had on this list, during review of a prior
> >> version of this patch series, talking about C:
> >> 
> >> Legibility.  Humans tend to have trouble following long lines with
> >> their eyes (I sure do).  Typographic manuals suggest to limit
> >> columns to roughly 60 characters for exactly that reason[*].
> >> 
> >> Code is special.  It's typically indented, and long identifiers push
> >> it further to the right, function arguments in particular.  We
> >> compromised at 80

Re: [PATCH v2 4/4] qemu-storage-daemon: Remove QemuOpts from --object parser

2020-09-30 Thread Eric Blake

On 9/30/20 7:45 AM, Kevin Wolf wrote:
> The command line parser for --object parses the input twice: Once into
> QemuOpts just for detecting help options, and then again into a QDict
> using the keyval parser for actually creating the object.
> 
> Now that the keyval parser can also detect help options, we can simplify
> this and remove the QemuOpts part.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  storage-daemon/qemu-storage-daemon.c | 15 ---
>  1 file changed, 4 insertions(+), 11 deletions(-)

As with v1,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] qemu-storage-daemon: Fix help line for --export

2020-09-30 Thread Eric Blake

On 9/30/20 8:39 AM, Kevin Wolf wrote:
> Commit 5f479a8d renamed the 'device' option of --export into
> 'node-name', but forgot to update the help in qemu-storage-daemon.
> 
> Fixes: 5f479a8dc086bfa42c9f94e9ab69962f256e207f
> Signed-off-by: Kevin Wolf 
> ---
>  storage-daemon/qemu-storage-daemon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/storage-daemon/qemu-storage-daemon.c 
> b/storage-daemon/qemu-storage-daemon.c
> index 7cbdbf0b23..42839c981f 100644
> --- a/storage-daemon/qemu-storage-daemon.c
> +++ b/storage-daemon/qemu-storage-daemon.c
> @@ -92,7 +92,7 @@ static void help(void)
>  "  --chardev configure a character device backend\n"
>  " (see the qemu(1) man page for possible options)\n"
>  "\n"
> -"  --export [type=]nbd,device=,id=,[,name=]\n"
> +"  --export [type=]nbd,id=,node-name=,[,name=]\n"

While touching this, get rid of the doubled comma before the optional
name key (s/,\[,/\[,/)

With that,
Reviewed-by: Eric Blake 

>  "   [,writable=on|off][,bitmap=]\n"
>  " export the specified block node over NBD\n"
>  " (requires --nbd-server)\n"
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 3/4] qom: Add user_creatable_print_help_from_qdict()

2020-09-30 Thread Eric Blake

On 9/30/20 7:45 AM, Kevin Wolf wrote:
> This adds a function that, given a QDict of non-help options, prints
> help for user creatable objects.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/qom/object_interfaces.h | 9 +
>  qom/object_interfaces.c | 9 +
>  2 files changed, 18 insertions(+)

As with v1,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 2/4] qom: Factor out helpers from user_creatable_print_help()

2020-09-30 Thread Eric Blake

On 9/30/20 7:45 AM, Kevin Wolf wrote:
> This creates separate helper functions for printing a list of user
> creatable object types and for printing a list of properties of a given
> type. This allows using these parts without having a QemuOpts.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  qom/object_interfaces.c | 90 -
>  1 file changed, 52 insertions(+), 38 deletions(-)

I don't see any changes from v1, so my R-b from there could have been
applied.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/4] keyval: Parse help options

2020-09-30 Thread Eric Blake

On 9/30/20 8:04 AM, Kevin Wolf wrote:
> Am 29.09.2020 um 19:46 hat Eric Blake geschrieben:
>> On 9/29/20 12:26 PM, Kevin Wolf wrote:
>>> This adds a new parameter 'help' to keyval_parse() that enables parsing
>>> of help options. If NULL is passed, the function behaves the same as
>>> before. But if a bool pointer is given, it contains the information
>>> whether an option "help" without value was given (which would otherwise
>>> either result in an error or be interpreted as the value for an implied
>>> key).
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>
>>> +++ b/util/keyval.c
>>
>> Might be nice to see this before the testsuite changes by tweaking the
>> git orderfile.
> 
> What does your git orderfile look like? I don't know how to exclude
> tests/ from file type patterns like *.c.

You can start with scripts/git.orderfile, and temporarily add:

 # decoding tree specification
 *.decode

+# Key files that I want first for this patch
+util/*.c
+
 # code
 *.c

or similar.  It's not a show-stopper if you don't, and I concede that
remembering to do it (and then to revert back to the usual afterwards)
is not trivial.


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[PATCH] qemu-storage-daemon: Fix help line for --export

2020-09-30 Thread Kevin Wolf

Commit 5f479a8d renamed the 'device' option of --export into
'node-name', but forgot to update the help in qemu-storage-daemon.

Fixes: 5f479a8dc086bfa42c9f94e9ab69962f256e207f
Signed-off-by: Kevin Wolf 
---
 storage-daemon/qemu-storage-daemon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index 7cbdbf0b23..42839c981f 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -92,7 +92,7 @@ static void help(void)
 "  --chardev configure a character device backend\n"
 " (see the qemu(1) man page for possible options)\n"
 "\n"
-"  --export [type=]nbd,device=,id=,[,name=]\n"
+"  --export [type=]nbd,id=,node-name=,[,name=]\n"
 "   [,writable=on|off][,bitmap=]\n"
 " export the specified block node over NBD\n"
 " (requires --nbd-server)\n"
-- 
2.25.4

[PATCH] block/nvme: Add driver statistics for access alignment and hw errors

2020-09-30 Thread Philippe Mathieu-Daudé

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-access-nb": 2959,
                "aligned-access-nb": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Philippe Mathieu-Daudé 
---
 qapi/block-core.json | 24 +++-
 block/nvme.c | 27 +++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 86ed72ef9f..795e4185bd 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -941,6 +941,27 @@
   'discard-nb-failed': 'uint64',
   'discard-bytes-ok': 'uint64' } }
 
+##
+# @BlockStatsSpecificNvme:
+#
+# NVMe driver statistics
+#
+# @completion-errors: The number of completion errors.
+#
+# @aligned-access-nb: The number of aligned accesses performed by
+# the driver.
+#
+# @unaligned-access-nb: The number of unaligned accesses performed by
+#   the driver.
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockStatsSpecificNvme',
+  'data': {
+  'completion-errors': 'uint64',
+  'aligned-access-nb': 'uint64',
+  'unaligned-access-nb': 'uint64' } }
+
 ##
 # @BlockStatsSpecific:
 #
@@ -953,7 +974,8 @@
   'discriminator': 'driver',
   'data': {
   'file': 'BlockStatsSpecificFile',
-  'host_device': 'BlockStatsSpecificFile' } }
+  'host_device': 'BlockStatsSpecificFile',
+  'nvme': 'BlockStatsSpecificNvme' } }
 
 ##
 # @BlockStats:
diff --git a/block/nvme.c b/block/nvme.c
index f4f27b6da7..382f696202 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -133,6 +133,12 @@ struct BDRVNVMeState {
 
 /* PCI address (required for nvme_refresh_filename()) */
 char *device;
+
+struct {
+uint64_t completion_errors;
+uint64_t aligned_access_nb;
+uint64_t unaligned_access_nb;
+} stats;
 };
 
 #define NVME_BLOCK_OPT_DEVICE "device"
@@ -389,6 +395,9 @@ static bool nvme_process_completion(NVMeQueuePair *q)
 break;
 }
 ret = nvme_translate_error(c);
+if (ret) {
+s->stats.completion_errors++;
+}
 q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE;
 if (!q->cq.head) {
 q->cq_phase = !q->cq_phase;
@@ -1146,8 +1155,10 @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 assert(QEMU_IS_ALIGNED(bytes, s->page_size));
 assert(bytes <= s->max_transfer);
 if (nvme_qiov_aligned(bs, qiov)) {
+s->stats.aligned_access_nb++;
 return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags);
 }
+s->stats.unaligned_access_nb++;
 trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write);
 buf = qemu_try_memalign(s->page_size, bytes);
 
@@ -1443,6 +1454,21 @@ static void nvme_unregister_buf(BlockDriverState *bs, 
void *host)
 qemu_vfio_dma_unmap(s->vfio, host);
 }
 
+static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs)
+{
+BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
+BDRVNVMeState *s = bs->opaque;
+
+stats->driver = BLOCKDEV_DRIVER_NVME;
+stats->u.nvme = (BlockStatsSpecificNvme) {
+.completion_errors = s->stats.completion_errors,
+.aligned_access_nb = s->stats.aligned_access_nb,
+.unaligned_access_nb = s->stats.unaligned_access_nb,
+};
+
+return stats;
+}
+
 static const char *const nvme_strong_runtime_opts[] = {
 NVME_BLOCK_OPT_DEVICE,
 NVME_BLOCK_OPT_NAMESPACE,
@@ -1476,6 +1502,7 @@ static BlockDriver bdrv_nvme = {
 .bdrv_refresh_filename= nvme_refresh_filename,
 .bdrv_refresh_limits  = nvme_refresh_limits,
 .strong_runtime_opts  = nvme_strong_runtime_opts,
+.bdrv_get_specific_stats  = nvme_get_specific_stats,

Re: [PATCH v2 1/4] keyval: Parse help options

2020-09-30 Thread Eric Blake

On 9/30/20 7:45 AM, Kevin Wolf wrote:
> This adds a new parameter 'help' to keyval_parse() that enables parsing
> of help options. If NULL is passed, the function behaves the same as
> before. But if a bool pointer is given, it contains the information
> whether an option "help" without value was given (which would otherwise
> either result in an error or be interpreted as the value for an implied
> key).
> 
> Signed-off-by: Kevin Wolf 
> ---

> +
> +/* "help" is only a help option if it has no value */
> +qdict = keyval_parse("help=on", NULL, &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "help"), ==, "on");
> +g_assert_false(help);
> +qobject_unref(qdict);
> +
> +/* Double comma after "help" in an implied key is not a help option */
> +qdict = keyval_parse("help,,abc", "implied", &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> +g_assert_false(help);
> +qobject_unref(qdict);

Worth checking qdict_get_try_str(qdict, "implied") for "help,abc"?

> +
> +/* Without implied key and without value, it's an error */
> +qdict = keyval_parse("help,,abc", NULL, &help, &err);
> +error_free_or_abort(&err);
> +g_assert(!qdict);
> +
> +/* "help" as the only option */
> +qdict = keyval_parse("help", NULL, &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 0);
> +g_assert_true(help);
> +qobject_unref(qdict);
> +
> +/* "help" as the first part of the key */
> +qdict = keyval_parse("help.abc", NULL, &help, &err);
> +error_free_or_abort(&err);
> +g_assert(!qdict);

Worth checking qdict_get_try_str(qdict, "help.abc") for "on"? (at least,
that's my guess as what happened)

> +
> +/* "help" as the last part of the key */
> +qdict = keyval_parse("abc.help", NULL, &help, &err);
> +error_free_or_abort(&err);
> +g_assert(!qdict);

[1]

> +
> +/* "help" is not a value for the implied key if &help is given */
> +qdict = keyval_parse("help", "implied", &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 0);
> +g_assert_true(help);
> +qobject_unref(qdict);

Worth checking that the qdict does not contain "implied"?  Perhaps by
checking qdict_size() == 0?

> +
> +/* "help" is a value for the implied key when passing NULL for help */
> +qdict = keyval_parse("help", "implied", NULL, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "help");
> +qobject_unref(qdict);
> +
> +/* "help.abc" is a value for the implied key */
> +qdict = keyval_parse("help.abc", "implied", &help, &err);
> +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "help.abc");
> +g_assert_false(help);
> +qobject_unref(qdict);
> +
> +/* "abc.help" is a value for the implied key */
> +qdict = keyval_parse("abc.help", "implied", &help, &err);
> +g_assert_cmpuint(qdict_size(qdict), ==, 1);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "abc.help");
> +g_assert_false(help);
> +qobject_unref(qdict);
> +
> +/* "help" as the last part of the key */
> +qdict = keyval_parse("abc.help", NULL, &help, &err);
> +error_free_or_abort(&err);
> +g_assert(!qdict);

duplicates [1]

> +
> +/* Other keys before and after help are still parsed normally */
> +qdict = keyval_parse("number=42,help,foo=bar", NULL, &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 2);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "number"), ==, "42");
> +g_assert_cmpstr(qdict_get_try_str(qdict, "foo"), ==, "bar");
> +g_assert_true(help);
> +qobject_unref(qdict);
> +
> +/* ...even with an implied key */
> +qdict = keyval_parse("val,help,foo=bar", "implied", &help, &error_abort);
> +g_assert_cmpuint(qdict_size(qdict), ==, 2);
> +g_assert_cmpstr(qdict_get_try_str(qdict, "implied"), ==, "val");
> +g_assert_cmpstr(qdict_get_try_str(qdict, "foo"), ==, "bar");
> +g_assert_true(help);
> +qobject_unref(qdict);
>  }
>  

Overall a nice set of additions.  You could tweak it further, but I'm no
longer seeing a hole like last time.

> +++ b/util/keyval.c
> @@ -166,7 +166,7 @@ static QObject *keyval_parse_put(QDict *cur,
>   * On failure, return NULL.
>   */
>  static const char *keyval_parse_one(QDict *qdict, const char *params,
> -const char *implied_key,
> +const char *implied_key, bool *help,
>  Error **errp)
>  {
>  const char *key, *key_end, *s, *end;
> @@ -238,13 +238,20 @@ static const char *keyval_parse_one(QDict *qdict, const 
> char *params,
>  if (key == implied_key) {
>  assert(!*s);
>  s = params;
> +} else if (*s == '=') {
> +s++;
>  } else {
> -if (*s != '=') {
> +if (help && !strncmp(key, "help", s - key)) {

Should this use is_help_option() to also accept "?", or

Re: [PATCH] job: delete job_{lock, unlock} functions and replace them with lock guard

2020-09-30 Thread Paolo Bonzini

On 30/09/20 14:15, Elena Afanasova wrote:
>>> +WITH_QEMU_LOCK_GUARD(&job_mutex) {
>>> +if (ns != -1) {
>>> +timer_mod(&job->sleep_timer, ns);
>>> +}
>>> +job->busy = false;
>>> +job_event_idle(job);
>> Is this new macro safe to use in a coroutine context?
> Hi, I suppose it's safe. It would be nice to get some more opinions
> here.
> 

Yes, the macro is just a wrapper around the qemu_mutex_lock/unlock
functions (or qemu_co_mutex_lock/unlock depending on the type of its
argument).
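
For readers unfamiliar with the idiom, here is a minimal sketch of how a
scoped lock guard of this kind can be built.  It is not QEMU's
implementation: it assumes a plain pthread mutex and the GCC/Clang
"cleanup" variable attribute, and the names toy_guard, TOY_LOCK_GUARD
and counter_increment are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical guard type: remembers which mutex it has to release. */
typedef struct {
    pthread_mutex_t *mutex;
} toy_guard;

static inline toy_guard toy_guard_lock(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    return (toy_guard){ .mutex = m };
}

/* Called by the loop and, via "cleanup", when the guard leaves scope. */
static inline void toy_guard_unlock(toy_guard *g)
{
    if (g->mutex) {
        pthread_mutex_unlock(g->mutex);
        g->mutex = NULL;
    }
}

/*
 * The body runs exactly once with the mutex held.  Falling off the end
 * unlocks through the loop's iteration expression; break and return
 * unlock through the cleanup handler.
 */
#define TOY_LOCK_GUARD(m)                                        \
    for (toy_guard toy_g_                                        \
             __attribute__((cleanup(toy_guard_unlock))) =        \
             toy_guard_lock(m);                                  \
         toy_g_.mutex;                                           \
         toy_guard_unlock(&toy_g_))

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Increments the counter up to limit; returns -1 once it is reached. */
static int counter_increment(int limit)
{
    int result = -1;

    TOY_LOCK_GUARD(&counter_mutex) {
        if (counter >= limit) {
            return result;  /* early return still releases the mutex */
        }
        result = ++counter;
    }
    return result;
}
```

The point of the pattern is that no exit path from the guarded block
can forget the unlock, which is what makes the conversion from explicit
lock/unlock pairs mechanical.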

Paolo

Re: [PATCH v7 06/13] qmp: Call monitor_set_cur() only in qmp_dispatch()

2020-09-30 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 30.09.2020 um 11:26 hat Markus Armbruster geschrieben:
>> Kevin Wolf  writes:
>> 
>> > Am 28.09.2020 um 13:42 hat Markus Armbruster geschrieben:
>> >> Kevin Wolf  writes:
>> >> 
>> >> > Am 14.09.2020 um 17:10 hat Markus Armbruster geschrieben:
>> >> >> Kevin Wolf  writes:
[...]
>> >> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
>> >> >> > index 8469970c69..922fdb5541 100644
>> >> >> > --- a/monitor/qmp.c
>> >> >> > +++ b/monitor/qmp.c
>> >> >> > @@ -135,16 +135,10 @@ static void monitor_qmp_respond(MonitorQMP 
>> >> >> > *mon, QDict *rsp)
>> >> >> >  
>> >> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
>> >> >> >  {
>> >> >> > -Monitor *old_mon;
>> >> >> >  QDict *rsp;
>> >> >> >  QDict *error;
>> >> >> >  
>> >> >> > -old_mon = monitor_set_cur(&mon->common);
>> >> >> > -assert(old_mon == NULL);
>> >> >> > -
>> >> >> > -rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
>> >> >> > -
>> >> >> > -monitor_set_cur(NULL);
>> >> >> > +rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
>> >> >> 
>> >> >> Long line.  Happy to wrap it in my tree.  A few more in PATCH 08-11.
>> >> >
>> >> > It's 79 characters. Should be fine even with your local deviation from
>> >> > the coding style to require less than that for comments?
>> >> 
>> >> Let me rephrase my remark.
>> >> 
>> >> For me,
>> >> 
>> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
>> >>                        &mon->common);
>> >> 
>> >> is significantly easier to read than
>> >> 
>> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
>> >
>> > I guess this is highly subjective. I find wrapped lines harder to read.
>> > For answering subjective questions like this, we generally use the
>> > coding style document.
>> >
>> > Anyway, I guess following an idiosyncratic coding style that is
>> > different from every other subsystem in QEMU is possible (if
>> > inconvenient) if I know what it is.
>> 
>> The applicable coding style document is PEP 8.
>
> I'll happily apply PEP 8 to Python code, but this is C. I don't think
> PEP 8 applies very well to C code. (In fact, PEP 7 exists as a C style
> guide, but we're not writing C code for the Python project here...)

I got confused (too much Python code review), my apologies.

>> > My problem is more that I don't know what the exact rules are. Can they
>> > only be figured out experimentally by submitting patches and seeing
>> > whether you like them or not?
>> 
>> PEP 8:
>> 
>> A style guide is about consistency.  Consistency with this style
>> guide is important.  Consistency within a project is more important.
>> Consistency within one module or function is the most important.
>> 
>> In other words, you should make a reasonable effort to blend in.
>
> The project style guide for C is defined in CODING_STYLE.rst. Missing
> consistency with it is what I'm complaining about.
>
> I also agree that consistency within one module or function is most
> important, which is why I allow you to reformat my code. But I don't
> think it means that local coding style rules shouldn't be documented,
> especially if you can't just look at the code and see immediately how
> it's supposed to be.
>
>> >> Would you mind me wrapping this line in my tree?
>> >
>> > I have no say in this subsystem and I take it that you want all code to
>> > look as if you had written it yourself, so do as you wish.
>> 
>> I'm refusing the bait.
>> 
>> > But I understand that I'll have to respin anyway, so if you could
>> > explain what you're after, I might be able to apply the rules for the
>> > next version of the series.
>> 
>> First, PEP 8 again:
>> 
>> Limit all lines to a maximum of 79 characters.
>> 
>> For flowing long blocks of text with fewer structural restrictions
>> (docstrings or comments), the line length should be limited to 72
>> characters.
>
> Ok, that's finally clear limits at least.
>
> Any other rules from PEP 8 that you want to see applied to C code?

PEP 8 does not apply to C.

> Would you mind documenting this somewhere?
>
>> Second, an argument we two had on this list, during review of a prior
>> version of this patch series, talking about C:
>> 
>> Legibility.  Humans tend to have trouble following long lines with
>> their eyes (I sure do).  Typographic manuals suggest to limit
>> columns to roughly 60 characters for exactly that reason[*].
>> 
>> Code is special.  It's typically indented, and long identifiers push
>> it further to the right, function arguments in particular.  We
>> compromised at 80 columns.
>> 
>> [...]
>> 
>> [*] https://en.wikipedia.org/wiki/Column_(typography)#Typographic_style
>> 
>> The width of the line not counting indentation matters for legibility.
>> 
>> The line I flagged as long is 75 characters wide not counting
>> indentation.  That's needlessly hard to read for me.
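
The two numbers in this exchange (79 characters in total, 75 not
counting indentation) can be checked with a small sketch.  It assumes
the flagged line is indented by four spaces and that its last argument
is &mon->common; width_without_indent is a made-up helper, not
something from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The flagged line, split across two literals only to keep this short. */
static const char flagged_line[] =
    "    rsp = qmp_dispatch(mon->commands, req, "
    "qmp_oob_enabled(mon), &mon->common);";

/* Width of a line without its leading spaces and tabs. */
static size_t width_without_indent(const char *line)
{
    size_t indent = strspn(line, " \t");

    return strlen(line) - indent;
}
```

Under those assumptions strlen(flagged_line) is 79 and
width_without_indent(flagged_line) is 75, so both figures quoted above
describe the same line.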

Re: [PATCH 1/4] keyval: Parse help options

2020-09-30 Thread Kevin Wolf

Am 29.09.2020 um 19:46 hat Eric Blake geschrieben:
> On 9/29/20 12:26 PM, Kevin Wolf wrote:
> > This adds a new parameter 'help' to keyval_parse() that enables parsing
> > of help options. If NULL is passed, the function behaves the same as
> > before. But if a bool pointer is given, it contains the information
> > whether an option "help" without value was given (which would otherwise
> > either result in an error or be interpreted as the value for an implied
> > key).
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> 
> > +++ b/util/keyval.c
> 
> Might be nice to see this before the testsuite changes by tweaking the
> git orderfile.

What does your git orderfile look like? I don't know how to exclude
tests/ from file type patterns like *.c.

Kevin


signature.asc
Description: PGP signature

[PATCH v2 1/4] keyval: Parse help options

2020-09-30 Thread Kevin Wolf

This adds a new parameter 'help' to keyval_parse() that enables parsing
of help options. If NULL is passed, the function behaves the same as
before. But if a bool pointer is given, it contains the information
whether an option "help" without value was given (which would otherwise
either result in an error or be interpreted as the value for an implied
key).
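
To illustrate the intended semantics only, here is a toy sketch.  It is
not keyval_parse(): toy_count_pairs is a hypothetical function that
merely counts key=value pairs, and it leaves out implied keys, ",,"
escaping, key validation and nested keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts the key=value pairs in a comma-separated string.
 * Returns -1 if a token has no '=' and is not an accepted bare "help".
 * If help is non-NULL, *help reports whether a bare "help" was seen.
 */
static int toy_count_pairs(const char *params, bool *help)
{
    int pairs = 0;

    if (help) {
        *help = false;
    }
    while (*params) {
        size_t len = strcspn(params, ",");

        if (memchr(params, '=', len)) {
            pairs++;            /* ordinary key=value, even "help=on" */
        } else if (help && len == 4 && !strncmp(params, "help", 4)) {
            *help = true;       /* bare "help": report it, store nothing */
        } else {
            return -1;          /* key without value */
        }
        params += len;
        if (*params == ',') {
            params++;
        }
    }
    return pairs;
}
```

With a bool pointer, "number=42,help,foo=bar" yields two pairs and sets
the flag; with NULL, a bare "help" is rejected like any other key
without a value, which mirrors the "behaves the same as before" case.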

Signed-off-by: Kevin Wolf 
---
 include/qemu/option.h|   2 +-
 qapi/qobject-input-visitor.c |   2 +-
 storage-daemon/qemu-storage-daemon.c |   2 +-
 tests/test-keyval.c  | 205 +++
 util/keyval.c|  38 -
 5 files changed, 179 insertions(+), 70 deletions(-)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index 05e8a15c73..ac69352e0e 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -149,6 +149,6 @@ void qemu_opts_free(QemuOptsList *list);
 QemuOptsList *qemu_opts_append(QemuOptsList *dst, QemuOptsList *list);
 
 QDict *keyval_parse(const char *params, const char *implied_key,
-Error **errp);
+bool *help, Error **errp);
 
 #endif
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index f918a05e5f..7b184b50a7 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -757,7 +757,7 @@ Visitor *qobject_input_visitor_new_str(const char *str,
 assert(args);
 v = qobject_input_visitor_new(QOBJECT(args));
 } else {
-args = keyval_parse(str, implied_key, errp);
+args = keyval_parse(str, implied_key, NULL, errp);
 if (!args) {
 return NULL;
 }
diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index e6157ff518..bb9cb740f0 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -278,7 +278,7 @@ static void process_options(int argc, char *argv[])
 }
 qemu_opts_del(opts);
 
-args = keyval_parse(optarg, "qom-type", &error_fatal);
+args = keyval_parse(optarg, "qom-type", NULL, &error_fatal);
 user_creatable_add_dict(args, true, &error_fatal);
 qobject_unref(args);
 break;
diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index e331a84149..83b65f04f7 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -27,27 +27,28 @@ static void test_keyval_parse(void)
 QDict *qdict, *sub_qdict;
 char long_key[129];
 char *params;
+bool help;
 
 /* Nothing */
-qdict = keyval_parse("", NULL, &error_abort);
+qdict = keyval_parse("", NULL, NULL, &error_abort);
 g_assert_cmpuint(qdict_size(qdict), ==, 0);
 qobject_unref(qdict);
 
 /* Empty key (qemu_opts_parse() accepts this) */
-qdict = keyval_parse("=val", NULL, &err);
+qdict = keyval_parse("=val", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Empty key fragment */
-qdict = keyval_parse(".", NULL, &err);
+qdict = keyval_parse(".", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
-qdict = keyval_parse("key.", NULL, &err);
+qdict = keyval_parse("key.", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Invalid non-empty key (qemu_opts_parse() doesn't care) */
-qdict = keyval_parse("7up=val", NULL, &err);
+qdict = keyval_parse("7up=val", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
@@ -56,25 +57,25 @@ static void test_keyval_parse(void)
 long_key[127] = 'z';
 long_key[128] = 0;
 params = g_strdup_printf("k.%s=v", long_key);
-qdict = keyval_parse(params + 2, NULL, &err);
+qdict = keyval_parse(params + 2, NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Overlong key fragment */
-qdict = keyval_parse(params, NULL, &err);
+qdict = keyval_parse(params, NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 g_free(params);
 
 /* Long key (qemu_opts_parse() accepts and truncates silently) */
 params = g_strdup_printf("k.%s=v", long_key + 1);
-qdict = keyval_parse(params + 2, NULL, &error_abort);
+qdict = keyval_parse(params + 2, NULL, NULL, &error_abort);
 g_assert_cmpuint(qdict_size(qdict), ==, 1);
 g_assert_cmpstr(qdict_get_try_str(qdict, long_key + 1), ==, "v");
 qobject_unref(qdict);
 
 /* Long key fragment */
-qdict = keyval_parse(params, NULL, &error_abort);
+qdict = keyval_parse(params, NULL, NULL, &error_abort);
 g_assert_cmpuint(qdict_size(qdict), ==, 1);
 sub_qdict = qdict_get_qdict(qdict, "k");
 g_assert(sub_qdict);
@@ -84,25 +85,25 @@ static void test_keyval_parse(void)
 g_free(params);
 
 /* Crap after valid key */
-qdict = keyval_parse("key[0]=val", NULL, &err);
+qdict = keyval_parse("key[0]=val", NULL, NULL, &err);
 error_free_or_abort(&err);
 g_assert(!qdict);
 
 /* Multiple keys, last one wins */
-qdict = keyval_parse("a=1,b=2,,x,a=3",

[PATCH v2 4/4] qemu-storage-daemon: Remove QemuOpts from --object parser

2020-09-30 Thread Kevin Wolf

The command line parser for --object parses the input twice: Once into
QemuOpts just for detecting help options, and then again into a QDict
using the keyval parser for actually creating the object.

Now that the keyval parser can also detect help options, we can simplify
this and remove the QemuOpts part.

Signed-off-by: Kevin Wolf 
---
 storage-daemon/qemu-storage-daemon.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index bb9cb740f0..7cbdbf0b23 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -264,21 +264,14 @@ static void process_options(int argc, char *argv[])
 }
 case OPTION_OBJECT:
 {
-QemuOpts *opts;
-const char *type;
 QDict *args;
+bool help;
 
-/* FIXME The keyval parser rejects 'help' arguments, so we must
- * unconditionall try QemuOpts first. */
-opts = qemu_opts_parse(&qemu_object_opts,
-   optarg, true, &error_fatal);
-type = qemu_opt_get(opts, "qom-type");
-if (type && user_creatable_print_help(type, opts)) {
+args = keyval_parse(optarg, "qom-type", &help, &error_fatal);
+if (help) {
+user_creatable_print_help_from_qdict(args);
 exit(EXIT_SUCCESS);
 }
-qemu_opts_del(opts);
-
-args = keyval_parse(optarg, "qom-type", NULL, &error_fatal);
 user_creatable_add_dict(args, true, &error_fatal);
 qobject_unref(args);
 break;
-- 
2.25.4

Re: [PATCH v5 05/14] hw/block/nvme: Add support for Namespace Types

2020-09-30 Thread Niklas Cassel

On Mon, Sep 28, 2020 at 11:35:19AM +0900, Dmitry Fomichev wrote:
> From: Niklas Cassel 
> 
> Namespace Types introduce a new command set, "I/O Command Sets",
> that allows the host to retrieve the command sets associated with
> a namespace. Introduce support for the command set and enable
> detection for the NVM Command Set.
> 
> The new workflows for identify commands rely heavily on zero-filled
> identify structs. E.g., certain CNS commands are defined to return
> a zero-filled identify struct when an inactive namespace NSID
> is supplied.
> 
> Add a helper function in order to avoid code duplication when
> reporting zero-filled identify structures.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Dmitry Fomichev 
> ---
>  hw/block/nvme-ns.c |   3 +
>  hw/block/nvme.c| 210 +
>  2 files changed, 175 insertions(+), 38 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index bbd7879492..31b7f986c3 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c

(snip)

> @@ -1597,12 +1667,31 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
> *n, NvmeRequest *req)
>   * Namespace Identification Descriptor. Add a very basic Namespace UUID
>   * here.
>   */
> -ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
> -ns_descrs->uuid.hdr.nidl = NVME_NIDL_UUID;
> -stl_be_p(&ns_descrs->uuid.v, nsid);
> +desc = list_ptr;
> +desc->nidt = NVME_NIDT_UUID;
> +desc->nidl = NVME_NIDL_UUID;
> +list_ptr += sizeof(*desc);
> +memcpy(list_ptr, ns->params.uuid.data, NVME_NIDL_UUID);
> +list_ptr += NVME_NIDL_UUID;
>  
> -return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
> -DMA_DIRECTION_FROM_DEVICE, req);
> +desc = list_ptr;
> +desc->nidt = NVME_NIDT_CSI;
> +desc->nidl = NVME_NIDL_CSI;
> +list_ptr += sizeof(*desc);
> +*(uint8_t *)list_ptr = NVME_CSI_NVM;

I think that we should use ns->csi/ns->params.csi here rather than
NVME_CSI_NVM.
You do this change in a later patch, but I think it is more correct
to do it here already. (No reason not to, since ns->csi/ns->params.csi
should be set to NVME_CSI_NVM for NVM namespace already in this patch.)
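
To make the suggestion concrete, here is a rough standalone sketch, not
the patch's code.  The TOY_ names and both helpers are made up; the
numeric values (UUID descriptor type 3 with 16 bytes, CSI descriptor
type 4 with 1 byte, a 4-byte descriptor header) are my reading of the
NVMe spec and should be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, named after the constants used in the patch. */
enum {
    TOY_NIDT_UUID = 0x03, TOY_NIDL_UUID = 16,
    TOY_NIDT_CSI  = 0x04, TOY_NIDL_CSI  = 1,
};

/* Descriptor header: type, length, two reserved bytes. */
typedef struct {
    uint8_t nidt;
    uint8_t nidl;
    uint8_t rsvd[2];
} toy_id_descr;

/* Writes one header plus payload at p; returns the bytes written. */
static size_t toy_put_descr(uint8_t *p, uint8_t nidt,
                            const void *nid, uint8_t nidl)
{
    toy_id_descr hdr = { .nidt = nidt, .nidl = nidl };

    memcpy(p, &hdr, sizeof(hdr));
    memcpy(p + sizeof(hdr), nid, nidl);
    return sizeof(hdr) + nidl;
}

/*
 * Fills a zeroed list with a UUID and a CSI descriptor.  The CSI comes
 * from the caller (think ns->csi) instead of being hard-coded.
 */
static size_t toy_build_descr_list(uint8_t *list, size_t size,
                                   const uint8_t uuid[16], uint8_t csi)
{
    size_t off = 0;

    assert(size >= 2 * sizeof(toy_id_descr) + TOY_NIDL_UUID + TOY_NIDL_CSI);
    memset(list, 0, size);
    off += toy_put_descr(list + off, TOY_NIDT_UUID, uuid, TOY_NIDL_UUID);
    off += toy_put_descr(list + off, TOY_NIDT_CSI, &csi, TOY_NIDL_CSI);
    return off;
}
```

Passing the command set identifier in as a parameter means the same
code serves NVM and any later command set without being touched again.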

> +
> +return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req);
> +}

[PATCH v2 3/4] qom: Add user_creatable_print_help_from_qdict()

2020-09-30 Thread Kevin Wolf

This adds a function that, given a QDict of non-help options, prints
help for user creatable objects.

Signed-off-by: Kevin Wolf 
---
 include/qom/object_interfaces.h | 9 +
 qom/object_interfaces.c | 9 +
 2 files changed, 18 insertions(+)

diff --git a/include/qom/object_interfaces.h b/include/qom/object_interfaces.h
index f118fb516b..53b114b11a 100644
--- a/include/qom/object_interfaces.h
+++ b/include/qom/object_interfaces.h
@@ -161,6 +161,15 @@ int user_creatable_add_opts_foreach(void *opaque,
  */
 bool user_creatable_print_help(const char *type, QemuOpts *opts);
 
+/**
+ * user_creatable_print_help_from_qdict:
+ * @args: options to create
+ *
+ * Prints help considering the other options given in @args (if "qom-type" is
+ * given and valid, print properties for the type, otherwise print valid types)
+ */
+void user_creatable_print_help_from_qdict(QDict *args);
+
 /**
  * user_creatable_del:
  * @id: the unique ID for the object
diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index 3fd1da157e..ed896fe764 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -279,6 +279,15 @@ bool user_creatable_print_help(const char *type, QemuOpts 
*opts)
 return false;
 }
 
+void user_creatable_print_help_from_qdict(QDict *args)
+{
+const char *type = qdict_get_try_str(args, "qom-type");
+
+if (!type || !user_creatable_print_type_properites(type)) {
+user_creatable_print_types();
+}
+}
+
 bool user_creatable_del(const char *id, Error **errp)
 {
 Object *container;
-- 
2.25.4

[PATCH v2 2/4] qom: Factor out helpers from user_creatable_print_help()

2020-09-30 Thread Kevin Wolf

This creates separate helper functions for printing a list of user
creatable object types and for printing a list of properties of a given
type. This allows using these parts without having a QemuOpts.

Signed-off-by: Kevin Wolf 
---
 qom/object_interfaces.c | 90 -
 1 file changed, 52 insertions(+), 38 deletions(-)

diff --git a/qom/object_interfaces.c b/qom/object_interfaces.c
index e8e1523960..3fd1da157e 100644
--- a/qom/object_interfaces.c
+++ b/qom/object_interfaces.c
@@ -214,54 +214,68 @@ char *object_property_help(const char *name, const char 
*type,
 return g_string_free(str, false);
 }
 
-bool user_creatable_print_help(const char *type, QemuOpts *opts)
+static void user_creatable_print_types(void)
+{
+GSList *l, *list;
+
+printf("List of user creatable objects:\n");
+list = object_class_get_list_sorted(TYPE_USER_CREATABLE, false);
+for (l = list; l != NULL; l = l->next) {
+ObjectClass *oc = OBJECT_CLASS(l->data);
+printf("  %s\n", object_class_get_name(oc));
+}
+g_slist_free(list);
+}
+
+static bool user_creatable_print_type_properites(const char *type)
 {
 ObjectClass *klass;
+ObjectPropertyIterator iter;
+ObjectProperty *prop;
+GPtrArray *array;
+int i;
 
-if (is_help_option(type)) {
-GSList *l, *list;
+klass = object_class_by_name(type);
+if (!klass) {
+return false;
+}
 
-printf("List of user creatable objects:\n");
-list = object_class_get_list_sorted(TYPE_USER_CREATABLE, false);
-for (l = list; l != NULL; l = l->next) {
-ObjectClass *oc = OBJECT_CLASS(l->data);
-printf("  %s\n", object_class_get_name(oc));
+array = g_ptr_array_new();
+object_class_property_iter_init(&iter, klass);
+while ((prop = object_property_iter_next(&iter))) {
+if (!prop->set) {
+continue;
 }
-g_slist_free(list);
-return true;
+
+g_ptr_array_add(array,
+object_property_help(prop->name, prop->type,
+ prop->defval, prop->description));
 }
+g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
+if (array->len > 0) {
+printf("%s options:\n", type);
+} else {
+printf("There are no options for %s.\n", type);
+}
+for (i = 0; i < array->len; i++) {
+printf("%s\n", (char *)array->pdata[i]);
+}
+g_ptr_array_set_free_func(array, g_free);
+g_ptr_array_free(array, true);
+return true;
+}
 
-klass = object_class_by_name(type);
-if (klass && qemu_opt_has_help_opt(opts)) {
-ObjectPropertyIterator iter;
-ObjectProperty *prop;
-GPtrArray *array = g_ptr_array_new();
-int i;
-
-object_class_property_iter_init(&iter, klass);
-while ((prop = object_property_iter_next(&iter))) {
-if (!prop->set) {
-continue;
-}
-
-g_ptr_array_add(array,
-object_property_help(prop->name, prop->type,
- prop->defval, prop->description));
-}
-g_ptr_array_sort(array, (GCompareFunc)qemu_pstrcmp0);
-if (array->len > 0) {
-printf("%s options:\n", type);
-} else {
-printf("There are no options for %s.\n", type);
-}
-for (i = 0; i < array->len; i++) {
-printf("%s\n", (char *)array->pdata[i]);
-}
-g_ptr_array_set_free_func(array, g_free);
-g_ptr_array_free(array, true);
+bool user_creatable_print_help(const char *type, QemuOpts *opts)
+{
+if (is_help_option(type)) {
+user_creatable_print_types();
 return true;
 }
 
+if (qemu_opt_has_help_opt(opts)) {
+return user_creatable_print_type_properites(type);
+}
+
 return false;
 }
 
-- 
2.25.4

[PATCH v2 0/4] qemu-storage-daemon: Remove QemuOpts from --object parser

2020-09-30 Thread Kevin Wolf

This replaces the QemuOpts-based help code for --object in the storage
daemon with code based on the keyval parser.

v2:
- Fixed double comma by reusing the existing key and value parsers [Eric]
- More tests to cover the additional cases

Kevin Wolf (4):
  keyval: Parse help options
  qom: Factor out helpers from user_creatable_print_help()
  qom: Add user_creatable_print_help_from_qdict()
  qemu-storage-daemon: Remove QemuOpts from --object parser

 include/qemu/option.h|   2 +-
 include/qom/object_interfaces.h  |   9 ++
 qapi/qobject-input-visitor.c |   2 +-
 qom/object_interfaces.c  |  99 -
 storage-daemon/qemu-storage-daemon.c |  15 +-
 tests/test-keyval.c  | 205 +++
 util/keyval.c|  38 -
 7 files changed, 252 insertions(+), 118 deletions(-)

-- 
2.25.4

Re: [PATCH] job: delete job_{lock,unlock} functions and replace them with lock guard

2020-09-30 Thread Elena Afanasova

On Tue, 2020-09-29 at 14:04 -0400, John Snow wrote:
> On 9/29/20 9:42 AM, Elena Afanasova wrote:
> > Signed-off-by: Elena Afanasova 
> 
> Hi, can I have a commit message here, please?
> 
> > ---
> >   job.c | 46 +-
> >   1 file changed, 17 insertions(+), 29 deletions(-)
> > 
> > diff --git a/job.c b/job.c
> > index 8fecf38960..89ceb53434 100644
> > --- a/job.c
> > +++ b/job.c
> > @@ -79,16 +79,6 @@ struct JobTxn {
> >* job_enter. */
> >   static QemuMutex job_mutex;
> >   
> > -static void job_lock(void)
> > -{
> > -qemu_mutex_lock(&job_mutex);
> > -}
> > -
> > -static void job_unlock(void)
> > -{
> > -qemu_mutex_unlock(&job_mutex);
> > -}
> > -
> >   static void __attribute__((__constructor__)) job_init(void)
> >   {
> >   qemu_mutex_init(&job_mutex);
> > @@ -437,21 +427,19 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
> >   return;
> >   }
> >   
> > -job_lock();
> > -if (job->busy) {
> > -job_unlock();
> > -return;
> > -}
> > +WITH_QEMU_LOCK_GUARD(&job_mutex) {
> > +if (job->busy) {
> > +return;
> > +}
> >   
> > -if (fn && !fn(job)) {
> > -job_unlock();
> > -return;
> > -}
> > +if (fn && !fn(job)) {
> > +return;
> > +}
> >   
> > -assert(!job->deferred_to_main_loop);
> > -timer_del(&job->sleep_timer);
> > -job->busy = true;
> > -job_unlock();
> > +assert(!job->deferred_to_main_loop);
> > +timer_del(&job->sleep_timer);
> > +job->busy = true;
> > +}
> >   aio_co_enter(job->aio_context, job->co);
> >   }
> >   
> > @@ -468,13 +456,13 @@ void job_enter(Job *job)
> >* called explicitly. */
> >   static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
> >   {
> > -job_lock();
> > -if (ns != -1) {
> > -timer_mod(&job->sleep_timer, ns);
> > +WITH_QEMU_LOCK_GUARD(&job_mutex) {
> > +if (ns != -1) {
> > +timer_mod(&job->sleep_timer, ns);
> > +}
> > +job->busy = false;
> > +job_event_idle(job);
> 
> Is this new macro safe to use in a coroutine context?

Hi, I suppose it's safe. It would be nice to get some more opinions
here.

> >   }
> > -job->busy = false;
> > -job_event_idle(job);
> > -job_unlock();
> >   qemu_coroutine_yield();
> >   
> >   /* Set by job_enter_cond() before re-entering the coroutine.  */
> > 
> 
> I haven't looked into WITH_QEMU_LOCK_GUARD before, I assume it's new.
> If it works like I think it does, this change seems good.
>
> (I'm assuming it works like a Python context manager and it drops the
> lock when it leaves the scope of the macro using GCC/Clang language
> extensions.)
>
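
The scope-based unlocking assumed in the quoted review can be sketched in plain C. This is only an illustration of the general technique (a `for` loop whose control variable carries the GCC/Clang `cleanup` attribute); it is not QEMU's actual WITH_QEMU_LOCK_GUARD implementation, and `DemoLock`, `DEMO_LOCK_GUARD`, and `demo_try_enter` are hypothetical names. A toy lock that only records whether it is held stands in for a real mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held (stand-in for a mutex). */
typedef struct DemoLock { int held; } DemoLock;

static void demo_lock(DemoLock *l)   { assert(!l->held); l->held = 1; }
static void demo_unlock(DemoLock *l) { assert(l->held);  l->held = 0; }

/* Cleanup handler: the compiler calls it when the guard leaves scope. */
static void demo_guard_cleanup(DemoLock **guard)
{
    if (*guard) {
        demo_unlock(*guard);
    }
}

/* Run the following block exactly once with the lock held. */
#define DEMO_LOCK_GUARD(l)                                               \
    for (DemoLock *guard_ __attribute__((cleanup(demo_guard_cleanup))) = \
             (demo_lock(l), (l));                                        \
         guard_;                                                         \
         demo_unlock(guard_), guard_ = NULL)

static DemoLock demo_job_lock;
static bool demo_job_busy;

/* Returns true if the caller claimed the job, false if it was busy. */
static bool demo_try_enter(void)
{
    DEMO_LOCK_GUARD(&demo_job_lock) {
        if (demo_job_busy) {
            return false;   /* early return still releases the lock */
        }
        demo_job_busy = true;
    }
    return true;
}
```

On the normal path the loop's increment expression releases the lock and clears the guard, so the cleanup handler becomes a no-op; on an early `return` the cleanup handler does the release. One caveat of this shape is that `break` inside the guarded block leaves the guard, not an enclosing loop.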

[PATCH v2 4/5] nbd: Add new qemu:allocation-depth metacontext

2020-09-30 Thread Eric Blake

'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point
qemu:dirty-bitmap:NAME can expose that information via the creation of
a temporary bitmap, but we can shorten the effort by adding a new
qemu:allocation-depth context that does the same thing without an
intermediate bitmap (this patch does not eliminate the need for that
proposal, as it will have other uses as well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But this extension would require
bdrv_is_allocated_above to return a depth number.
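
As a rough illustration of the encoding described above, the following sketch decodes the tri-state value and the suggested (not implemented) depth field. The four NBD_STATE_DEPTH_* values are taken from this patch; `DEMO_DEPTH_SHIFT` and the `demo_*` helpers are hypothetical and only model the possible extension.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the include/block/nbd.h hunk of this patch. */
#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
#define NBD_STATE_DEPTH_BACKING (2 << 0)
#define NBD_STATE_DEPTH_MASK    (0x3)

/* Hypothetical: a depth count in bits 31-4. Not part of the patch. */
#define DEMO_DEPTH_SHIFT 4

/* Extract the tri-state value from bits 1-0. */
static uint32_t demo_depth_state(uint32_t flags)
{
    return flags & NBD_STATE_DEPTH_MASK;
}

/* Extract the hypothetical depth count from bits 31-4. */
static uint32_t demo_depth_count(uint32_t flags)
{
    return flags >> DEMO_DEPTH_SHIFT;
}

/* Self-check: 0x32 reads as "depth 3, inherited from a backing layer". */
static void demo_depth_selfcheck(void)
{
    assert(demo_depth_state(0x32) == NBD_STATE_DEPTH_BACKING);
    assert(demo_depth_count(0x32) == 3);
    assert(demo_depth_state(0x00) == NBD_STATE_DEPTH_UNALLOC);
}
```

Leaving bits 3-2 unused means the depth occupies whole hex digits, which is why 0x32 can be read by eye as depth 3 with state 2.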

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake 
---
 docs/interop/nbd.txt |  22 +++--
 include/block/nbd.h  |   8 +++-
 nbd/server.c | 105 ---
 3 files changed, 125 insertions(+), 10 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f3b3cacc9621..56efec7aee12 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -17,9 +17,9 @@ namespace "qemu".

 == "qemu" namespace ==

-The "qemu" namespace currently contains only one type of context,
-related to exposing the contents of a dirty bitmap alongside the
-associated disk contents.  That context has the following form:
+The "qemu" namespace currently contains two types of context.  The
+first is related to exposing the contents of a dirty bitmap alongside
+the associated disk contents.  That context has the following form:

 qemu:dirty-bitmap:

@@ -28,8 +28,21 @@ in reply for NBD_CMD_BLOCK_STATUS:

 bit 0: NBD_STATE_DIRTY, means that the extent is "dirty"

+The second is related to exposing the source of various extents within
+the image, with a single context named:
+
+qemu:allocation-depth
+
+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+bits 0-1 clear: NBD_STATE_DEPTH_UNALLOC, means the extent is unallocated
+bit 0 set: NBD_STATE_DEPTH_LOCAL, the extent is allocated in this image
+bit 1 set: NBD_STATE_DEPTH_BACKING, the extent is inherited from a
+   backing layer
+
 For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to "qemu:dirty-bitmap:":
+in addition to the specific "qemu:allocation-depth" and
+"qemu:dirty-bitmap:":

 * "qemu:" - returns list of all available metadata contexts in the
 namespace.
@@ -55,3 +68,4 @@ the operation of that feature.
 NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
+* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 3dd9a04546ec..06208bc25027 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device
@@ -259,6 +259,12 @@ enum {
 /* Extent flags for qemu:dirty-bitmap in NBD_REPLY_TYPE_BLOCK_STATUS */
 #define NBD_STATE_DIRTY (1 << 0)

+/* Extent flags for qemu:allocation-depth in NBD_REPLY_TYPE_BLOCK_STATUS */
+#define NBD_STATE_DEPTH_UNALLOC (0 << 0)
+#define NBD_STATE_DEPTH_LOCAL   (1 << 0)
+#define NBD_STATE_DEPTH_BACKING (2 << 0)
+#define NBD_STATE_DEPTH_MASK(0x3)
+
 static inline bool nbd_reply_type_is_error(int type)
 {
 return type & (1 << 15);
diff --git a/nbd/server.c b/nbd/server.c
index 7271a09b5c2b..830b21000be3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -27,7 +27,8 @@
 #include "qemu/units.h"

 #define NBD_META_ID_BASE_ALLOCATION 0
-#define NBD_META_ID_DIRTY_BITMAP 1
+#define NBD_META_ID_ALLOCATION_DEPTH 1
+#define NBD_META_ID_DIRTY_BITMAP 2

 /*
  * NBD_MAX_BLOCK_STATUS_EXTENTS: 1 MiB of extents data. An empirical
@@ -94,6 +95,7 @@ struct NBDExport {
 BlockBackend *eject_notifier_blk;
 Notifier eject_notifier;

+bool alloc_context;
 BdrvDirtyBitmap *export_bitmap;
 char *export_bitmap_context;
 };
@@ -108,6 +110,7 @@ typedef struct NBDExportMetaContexts {
 bool valid; /* means that negotiation of the option finished without
errors */
 bool base_allocation; /* export base:allocation context (block status) */
+bool allocation_depth; /* export qemu:allocation-depth */
 bool bitmap; /* export qemu:dirty-bitmap: */
 } NBDExportMetaContexts;

@@ -806,7 +809,7 @@ static bool

[PATCH v2 5/5] nbd: Add 'qemu-nbd -A' to expose allocation depth

2020-09-30 Thread Eric Blake

Allow the server to expose an additional metacontext to be requested
by savvy clients.  qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.

qemu as client can be hacked into viewing this new context by using
the now-misnamed x-dirty-bitmap option when creating an NBD blockdev
(even though our x- naming means we could rename it, I did not think
it worth breaking back-compat of tools that have been using it while
waiting for a better solution).  It is worth noting the decoding of
how such context information will appear in 'qemu-img map
--output=json':

NBD_STATE_DEPTH_UNALLOC => "zero":false, "data":true
NBD_STATE_DEPTH_LOCAL   => "zero":false, "data":false
NBD_STATE_DEPTH_BACKING => "zero":true,  "data":true

libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)

Signed-off-by: Eric Blake 
---
 docs/tools/qemu-nbd.rst|  6 
 qapi/block-core.json   |  7 ++--
 qapi/block-export.json |  6 +++-
 blockdev-nbd.c |  2 ++
 nbd/server.c   |  2 ++
 qemu-nbd.c | 14 ++--
 tests/qemu-iotests/309 | 73 ++
 tests/qemu-iotests/309.out | 22 
 tests/qemu-iotests/group   |  1 +
 9 files changed, 127 insertions(+), 6 deletions(-)
 create mode 100755 tests/qemu-iotests/309
 create mode 100644 tests/qemu-iotests/309.out

diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 667861cb22e9..0e545a97cfa3 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -72,6 +72,12 @@ driver options if ``--image-opts`` is specified.

   Export the disk as read-only.

+.. option:: -A, --allocation-depth
+
+  Expose allocation depth information via the
+  ``qemu:allocation-depth`` context accessible through
+  NBD_OPT_SET_META_CONTEXT.
+
 .. option:: -B, --bitmap=NAME

   If *filename* has a qcow2 persistent bitmap *NAME*, expose
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d620bd1302b2..0379eb992db8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3874,9 +3874,12 @@
 #
 # @tls-creds: TLS credentials ID
 #
-# @x-dirty-bitmap: A "qemu:dirty-bitmap:NAME" string to query in place of
+# @x-dirty-bitmap: A metacontext name such as "qemu:dirty-bitmap:NAME" or
+#  "qemu:allocation-depth" to query in place of the
 #  traditional "base:allocation" block status (see
-#  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
+#  NBD_OPT_LIST_META_CONTEXT in the NBD protocol; and
+#  yes, naming this option x-context would have made
+#  more sense) (since 3.0)
 #
 # @reconnect-delay: On an unexpected disconnect, the nbd client tries to
 #   connect again until succeeding or encountering a serious
diff --git a/qapi/block-export.json b/qapi/block-export.json
index 2291d6cb0cbc..45e43984b11d 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -78,11 +78,15 @@
 #  NBD client can use NBD_OPT_SET_META_CONTEXT with
 #  "qemu:dirty-bitmap:NAME" to inspect the bitmap. (since 4.0)
 #
+# @alloc: Also export the allocation map for @device, so the NBD client
+# can use NBD_OPT_SET_META_CONTEXT with "qemu:allocation-depth"
+# to inspect allocation details. (since 5.2)
+#
 # Since: 5.0
 ##
 { 'struct': 'BlockExportOptionsNbd',
   'data': { '*name': 'str', '*description': 'str',
-'*bitmap': 'str' } }
+'*bitmap': 'str', '*alloc': 'bool' } }

 ##
 # @NbdServerAddOptions:
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 8174023e5c47..f9012f93e2bb 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -212,6 +212,8 @@ void qmp_nbd_server_add(NbdServerAddOptions *arg, Error **errp)
 .description= g_strdup(arg->description),
 .has_bitmap = arg->has_bitmap,
 .bitmap = g_strdup(arg->bitmap),
+.has_alloc  = arg->alloc,
+.alloc  = arg->alloc,
 },
 };

diff --git a/nbd/server.c b/nbd/server.c
index 830b21000be3..11cdc2eab0b3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1598,6 +1598,8 @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
 assert(strlen(exp->export_bitmap_context) < NBD_MAX_STRING_SIZE);
 }

+exp->alloc_context = arg->alloc;
+
 blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);

 QTAILQ_INSERT_TAIL(, exp, next);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index e7520261134f..3a92e00464de 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -99,6 +99,7 @@ static void usage(const char *name)
 "\n"
 "Exposing part of the image:\n"
 "  -o, --offset=OFFSET   offset into the image\n"
+"  -A, --allocation-depthexpose the allocation depth\n"
 "

[PATCH v2 3/5] nbd: Simplify meta-context parsing

2020-09-30 Thread Eric Blake

We had a premature optimization of trying to read as little from the
wire as possible while handling NBD_OPT_SET_META_CONTEXT in phases.
But in reality, we HAVE to read the entire string from the client
before we can get to the next command, and it is easier to just read
it all at once than it is to read it in pieces.  And once we do that,
several functions end up no longer performing I/O, so they can drop
length and errp parameters, and just return a bool instead of
modifying through a pointer.

Our iotests still pass; I also checked that libnbd's testsuite (which
covers more corner cases of odd meta context requests) still passes.
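
Once the whole query string is in memory, parsing reduces to prefix matching on a C string. The sketch below reuses the idea of the nbd_strshift() helper from this patch in a standalone form; `demo_classify` and the `demo_ctx` enum are hypothetical and far simpler than the real negotiation code (for example, they ignore the empty-query case that applies to _LIST_).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same idea as nbd_strshift(): if *str begins with prefix, skip past it. */
static bool demo_strshift(const char **str, const char *prefix)
{
    size_t len = strlen(prefix);

    if (strncmp(*str, prefix, len) == 0) {
        *str += len;
        return true;
    }
    return false;
}

/* Hypothetical result type for this sketch. */
enum demo_ctx {
    DEMO_CTX_NONE,
    DEMO_CTX_BASE_ALLOCATION,
    DEMO_CTX_DIRTY_BITMAP,
};

/* Classify a fully read query; *bitmap_name points into query on a match. */
static enum demo_ctx demo_classify(const char *query, const char **bitmap_name)
{
    assert(query != NULL && bitmap_name != NULL);
    *bitmap_name = NULL;

    if (demo_strshift(&query, "base:")) {
        return strcmp(query, "allocation") == 0 ? DEMO_CTX_BASE_ALLOCATION
                                                : DEMO_CTX_NONE;
    }
    if (demo_strshift(&query, "qemu:") &&
        demo_strshift(&query, "dirty-bitmap:")) {
        *bitmap_name = query;   /* remainder after the two prefixes */
        return DEMO_CTX_DIRTY_BITMAP;
    }
    return DEMO_CTX_NONE;
}
```

Because no I/O happens during matching, the helpers can return a plain bool or enum and need neither a length nor an error parameter.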

Signed-off-by: Eric Blake 
---
 nbd/server.c | 193 +++
 1 file changed, 70 insertions(+), 123 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 809f88ce6607..7271a09b5c2b 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2018 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Server Side
@@ -797,135 +797,95 @@ static int nbd_negotiate_send_meta_context(NBDClient *client,
 return qio_channel_writev_all(client->ioc, iov, 2, errp) < 0 ? -EIO : 0;
 }

-/* Read strlen(@pattern) bytes, and set @match to true if they match @pattern.
- * @match is never set to false.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
- *
- * Note: return code = 1 doesn't mean that we've read exactly @pattern.
- * It only means that there are no errors.
+/*
+ * Return true if @query matches @pattern, or if @query is empty when
+ * the @client is performing _LIST_.
  */
-static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool *match,
-Error **errp)
+static bool nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
+  const char *query)
 {
-int ret;
-char *query;
-size_t len = strlen(pattern);
-
-assert(len);
-
-query = g_malloc(len);
-ret = nbd_opt_read(client, query, len, true, errp);
-if (ret <= 0) {
-g_free(query);
-return ret;
+if (!*query) {
+trace_nbd_negotiate_meta_query_parse("empty");
+return client->opt == NBD_OPT_LIST_META_CONTEXT;
 }
-
-if (strncmp(query, pattern, len) == 0) {
+if (strcmp(query, pattern) == 0) {
 trace_nbd_negotiate_meta_query_parse(pattern);
-*match = true;
-} else {
-trace_nbd_negotiate_meta_query_skip("pattern not matched");
+return true;
 }
-g_free(query);
-
-return 1;
+trace_nbd_negotiate_meta_query_skip("pattern not matched");
+return false;
 }

 /*
- * Read @len bytes, and set @match to true if they match @pattern, or if @len
- * is 0 and the client is performing _LIST_. @match is never set to false.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
- *
- * Note: return code = 1 doesn't mean that we've read exactly @pattern.
- * It only means that there are no errors.
+ * Return true and adjust @str in place if it begins with @prefix.
  */
-static int nbd_meta_empty_or_pattern(NBDClient *client, const char *pattern,
- uint32_t len, bool *match, Error **errp)
+static bool nbd_strshift(const char **str, const char *prefix)
 {
-if (len == 0) {
-if (client->opt == NBD_OPT_LIST_META_CONTEXT) {
-*match = true;
-}
-trace_nbd_negotiate_meta_query_parse("empty");
-return 1;
-}
+size_t len = strlen(prefix);

-if (len != strlen(pattern)) {
-trace_nbd_negotiate_meta_query_skip("different lengths");
-return nbd_opt_skip(client, len, errp);
+if (strncmp(*str, prefix, len) == 0) {
+*str += len;
+return true;
 }
-
-return nbd_meta_pattern(client, pattern, match, errp);
+return false;
 }

 /* nbd_meta_base_query
  *
  * Handle queries to 'base' namespace. For now, only the base:allocation
- * context is available.  'len' is the amount of text remaining to be read from
- * the current name, after the 'base:' portion has been stripped.
- *
- * Return -errno on I/O error, 0 if option was completely handled by
- * sending a reply about inconsistent lengths, or 1 on success.
+ * context is available.  Return true if @query has been handled.
  */
-static int nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
-   uint32_t len, Error **errp)
+static bool nbd_meta_base_query(NBDClient *client, NBDExportMetaContexts *meta,
+const char *query)
 {
-return nbd_meta_empty_or_pattern(client, "allocation", len,
- &meta->base_allocation, errp);
+if (!nbd_strshift(&query,

[PATCH v2 1/5] qemu-nbd: Honor SIGINT and SIGHUP

2020-09-30 Thread Eric Blake

Honoring just SIGTERM on Linux is too weak; we also want to handle
other common signals, and do so even on BSD.  Why?  Because at least
'qemu-nbd -B bitmap' needs a chance to clean up the in-use bit on
bitmaps when the server is shut down via a signal.
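
A minimal sketch of the approach, assuming a POSIX system: one handler is installed for SIGTERM, SIGINT, and SIGHUP alike. The `demo_*` names are hypothetical, and the handler here only records the signal number, whereas the real termsig_handler in qemu-nbd drives the server's shutdown state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Last termination signal seen; sig_atomic_t is safe to set in a handler. */
static volatile sig_atomic_t demo_got_signal;

static void demo_termsig_handler(int signum)
{
    demo_got_signal = signum;
}

/* Install the same handler for all three signals. Returns 0 or -1. */
static int demo_install_handlers(void)
{
    static const int signals[] = { SIGTERM, SIGINT, SIGHUP };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_termsig_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof(signals) / sizeof(signals[0]); i++) {
        if (sigaction(signals[i], &sa, NULL) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Self-check: a raised SIGHUP reaches the handler instead of killing us. */
static void demo_signal_selfcheck(void)
{
    assert(demo_install_handlers() == 0);
    raise(SIGHUP);
    assert(demo_got_signal == SIGHUP);
}
```

With the handler in place, the main loop gets a chance to notice the signal and run its normal cleanup path rather than being terminated by the default action.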

See also: http://bugzilla.redhat.com/1883608

Signed-off-by: Eric Blake 
---
 qemu-nbd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index bacb69b0898b..e7520261134f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -581,7 +581,7 @@ int main(int argc, char **argv)
 const char *pid_file_name = NULL;
 BlockExportOptions *export_opts;

-#if HAVE_NBD_DEVICE
+#ifdef CONFIG_POSIX
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
  */
@@ -589,9 +589,9 @@ int main(int argc, char **argv)
 memset(&sa_sigterm, 0, sizeof(sa_sigterm));
 sa_sigterm.sa_handler = termsig_handler;
 sigaction(SIGTERM, &sa_sigterm, NULL);
-#endif /* HAVE_NBD_DEVICE */
+sigaction(SIGINT, &sa_sigterm, NULL);
+sigaction(SIGHUP, &sa_sigterm, NULL);

-#ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
 #endif

-- 
2.28.0

[PATCH v2 2/5] nbd/server: Reject embedded NUL in NBD strings

2020-09-30 Thread Eric Blake

The NBD spec is clear that any string sent from the client must not
contain embedded NUL characters.  If the client passes "a\0", we
should reject that option request rather than act on "a".

Testing this is not possible with a compliant client, but I was able
to use gdb to coerce libnbd into temporarily behaving as such a
client.
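
The check itself is small. The patch uses strnlen() on the received buffer; the sketch below swaps in memchr(), which is ISO C and expresses the same test (no NUL byte within the first size bytes). `demo_no_embedded_nul` is a hypothetical name, not a function from nbd/server.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the first size bytes of buffer contain no NUL byte. */
static bool demo_no_embedded_nul(const void *buffer, size_t size)
{
    assert(buffer != NULL || size == 0);
    if (size == 0) {
        return true;    /* an empty payload cannot contain a NUL */
    }
    return memchr(buffer, '\0', size) == NULL;
}
```

For a client that sends the two bytes "a\0" as a string payload, the function returns false, which corresponds to the case where the server should reject the option instead of acting on "a".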

Signed-off-by: Eric Blake 
---
 nbd/server.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index f74766add7b7..809f88ce6607 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -301,10 +301,11 @@ nbd_opt_invalid(NBDClient *client, Error **errp, const char *fmt, ...)
 }

 /* Read size bytes from the unparsed payload of the current option.
+ * If @check_nul, require that no NUL bytes appear in buffer.
  * Return -errno on I/O error, 0 if option was completely handled by
  * sending a reply about inconsistent lengths, or 1 on success. */
 static int nbd_opt_read(NBDClient *client, void *buffer, size_t size,
-Error **errp)
+bool check_nul, Error **errp)
 {
 if (size > client->optlen) {
 return nbd_opt_invalid(client, errp,
@@ -312,7 +313,16 @@ static int nbd_opt_read(NBDClient *client, void *buffer, size_t size,
nbd_opt_lookup(client->opt));
 }
 client->optlen -= size;
-return qio_channel_read_all(client->ioc, buffer, size, errp) < 0 ? -EIO : 1;
+if (qio_channel_read_all(client->ioc, buffer, size, errp) < 0) {
+return -EIO;
+}
+
+if (check_nul && strnlen(buffer, size) != size) {
+return nbd_opt_invalid(client, errp,
+   "Unexpected embedded NUL in option %s",
+   nbd_opt_lookup(client->opt));
+}
+return 1;
 }

 /* Drop size bytes from the unparsed payload of the current option.
@@ -349,7 +359,7 @@ static int nbd_opt_read_name(NBDClient *client, char **name, uint32_t *length,
 g_autofree char *local_name = NULL;

 *name = NULL;
-ret = nbd_opt_read(client, &len, sizeof(len), errp);
+ret = nbd_opt_read(client, &len, sizeof(len), false, errp);
 if (ret <= 0) {
 return ret;
 }
@@ -361,7 +371,7 @@ static int nbd_opt_read_name(NBDClient *client, char **name, uint32_t *length,
 }

 local_name = g_malloc(len + 1);
-ret = nbd_opt_read(client, local_name, len, errp);
+ret = nbd_opt_read(client, local_name, len, true, errp);
 if (ret <= 0) {
 return ret;
 }
@@ -576,14 +586,14 @@ static int nbd_negotiate_handle_info(NBDClient *client, Error **errp)
 }
 trace_nbd_negotiate_handle_export_name_request(name);

-rc = nbd_opt_read(client, &requests, sizeof(requests), errp);
+rc = nbd_opt_read(client, &requests, sizeof(requests), false, errp);
 if (rc <= 0) {
 return rc;
 }
 requests = be16_to_cpu(requests);
 trace_nbd_negotiate_handle_info_requests(requests);
 while (requests--) {
-rc = nbd_opt_read(client, &request, sizeof(request), errp);
+rc = nbd_opt_read(client, &request, sizeof(request), false, errp);
 if (rc <= 0) {
 return rc;
 }
@@ -806,7 +816,7 @@ static int nbd_meta_pattern(NBDClient *client, const char *pattern, bool *match,
 assert(len);

 query = g_malloc(len);
-ret = nbd_opt_read(client, query, len, errp);
+ret = nbd_opt_read(client, query, len, true, errp);
 if (ret <= 0) {
 g_free(query);
 return ret;
@@ -943,7 +953,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 char ns[5];
 uint32_t len;

-ret = nbd_opt_read(client, &len, sizeof(len), errp);
+ret = nbd_opt_read(client, &len, sizeof(len), false, errp);
 if (ret <= 0) {
 return ret;
 }
@@ -959,7 +969,7 @@ static int nbd_negotiate_meta_query(NBDClient *client,
 }

 len -= ns_len;
-ret = nbd_opt_read(client, ns, ns_len, errp);
+ret = nbd_opt_read(client, ns, ns_len, true, errp);
 if (ret <= 0) {
 return ret;
 }
@@ -1016,7 +1026,7 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 "export '%s' not present", sane_name);
 }

-ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), errp);
+ret = nbd_opt_read(client, &nb_queries, sizeof(nb_queries), false, errp);
 if (ret <= 0) {
 return ret;
 }
-- 
2.28.0

[PATCH v2 0/5] Exposing backing-chain allocation over NBD

2020-09-30 Thread Eric Blake

v1 was here: https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg09623.html

Based-on: <20200924152717.287415-1-kw...@redhat.com>
(block/export: Add infrastructure and QAPI for block exports)

Also available at:
https://repo.or.cz/qemu/ericb.git/shortlog/refs/tags/nbd-alloc-depth-v2

Since then:
- rebase on Kevin's work
- add new patch to fix qemu-nbd SIGINT (conflicts with Stefan's work,
we can either rebase his on mine or drop mine if his goes in first)
- split out fix for handling NUL bytes from client [Vladimir]
- further cleanups of query parsing [Vladimir]
- more documentation of how we could also expose actual depth in
remaining bits of the context reply [Rich]

001/5:[down] 'qemu-nbd: Honor SIGINT and SIGHUP'
002/5:[down] 'nbd/server: Reject embedded NUL in NBD strings'
003/5:[0139] [FC] 'nbd: Simplify meta-context parsing'
004/5:[0035] [FC] 'nbd: Add new qemu:allocation-depth metacontext'
005/5:[0038] [FC] 'nbd: Add 'qemu-nbd -A' to expose allocation depth'

Eric Blake (5):
  qemu-nbd: Honor SIGINT and SIGHUP
  nbd/server: Reject embedded NUL in NBD strings
  nbd: Simplify meta-context parsing
  nbd: Add new qemu:allocation-depth metacontext
  nbd: Add 'qemu-nbd -A' to expose allocation depth

 docs/interop/nbd.txt   |  22 ++-
 docs/tools/qemu-nbd.rst|   6 +
 qapi/block-core.json   |   7 +-
 qapi/block-export.json |   6 +-
 include/block/nbd.h|   8 +-
 blockdev-nbd.c |   2 +
 nbd/server.c   | 324 +
 qemu-nbd.c |  20 ++-
 tests/qemu-iotests/309 |  73 +
 tests/qemu-iotests/309.out |  22 +++
 tests/qemu-iotests/group   |   1 +
 11 files changed, 342 insertions(+), 149 deletions(-)
 create mode 100755 tests/qemu-iotests/309
 create mode 100644 tests/qemu-iotests/309.out

-- 
2.28.0

Re: [PATCH v7 06/13] qmp: Call monitor_set_cur() only in qmp_dispatch()

2020-09-30 Thread Kevin Wolf

Am 30.09.2020 um 11:26 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > Am 28.09.2020 um 13:42 hat Markus Armbruster geschrieben:
> >> Kevin Wolf  writes:
> >> 
> >> > Am 14.09.2020 um 17:10 hat Markus Armbruster geschrieben:
> >> >> Kevin Wolf  writes:
> >> >> 
> >> >> > The correct way to set the current monitor for a coroutine handler will
> >> >> > be different than for a blocking handler, so monitor_set_cur() needs to
> >> >> > be called in qmp_dispatch().
> >> >> >
> >> >> > Signed-off-by: Kevin Wolf 
> >> >> > ---
> >> >> >  include/qapi/qmp/dispatch.h | 3 ++-
> >> >> >  monitor/qmp.c   | 8 +---
> >> >> >  qapi/qmp-dispatch.c | 8 +++-
> >> >> >  qga/main.c  | 2 +-
> >> >> >  stubs/monitor-core.c| 5 +
> >> >> >  tests/test-qmp-cmds.c   | 6 +++---
> >> >> >  6 files changed, 19 insertions(+), 13 deletions(-)
> >> >> >
> >> >> > diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
> >> >> > index 5a9cf82472..0c2f467028 100644
> >> >> > --- a/include/qapi/qmp/dispatch.h
> >> >> > +++ b/include/qapi/qmp/dispatch.h
> >> >> > @@ -14,6 +14,7 @@
> >> >> >  #ifndef QAPI_QMP_DISPATCH_H
> >> >> >  #define QAPI_QMP_DISPATCH_H
> >> >> >  
> >> >> > +#include "monitor/monitor.h"
> >> >> >  #include "qemu/queue.h"
> >> >> >  
> >> >> >  typedef void (QmpCommandFunc)(QDict *, QObject **, Error **);
> >> >> > @@ -49,7 +50,7 @@ const char *qmp_command_name(const QmpCommand *cmd);
> >> >> >  bool qmp_has_success_response(const QmpCommand *cmd);
> >> >> >  QDict *qmp_error_response(Error *err);
> >> >> >  QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
> >> >> > -bool allow_oob);
> >> >> > +bool allow_oob, Monitor *cur_mon);
> >> >> >  bool qmp_is_oob(const QDict *dict);
> >> >> >  
> >> >> >  typedef void (*qmp_cmd_callback_fn)(const QmpCommand *cmd, void *opaque);
> >> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
> >> >> > index 8469970c69..922fdb5541 100644
> >> >> > --- a/monitor/qmp.c
> >> >> > +++ b/monitor/qmp.c
> >> >> > @@ -135,16 +135,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
> >> >> >  
> >> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
> >> >> >  {
> >> >> > -Monitor *old_mon;
> >> >> >  QDict *rsp;
> >> >> >  QDict *error;
> >> >> >  
> >> >> > -old_mon = monitor_set_cur(&mon->common);
> >> >> > -assert(old_mon == NULL);
> >> >> > -
> >> >> > -rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon));
> >> >> > -
> >> >> > -monitor_set_cur(NULL);
> >> >> > +rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
> >> >> 
> >> >> Long line.  Happy to wrap it in my tree.  A few more in PATCH 08-11.
> >> >
> >> > It's 79 characters. Should be fine even with your local deviation from
> >> > the coding style to require less than that for comments?
> >> 
> >> Let me rephrase my remark.
> >> 
> >> For me,
> >> 
> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
> >>                        &mon->common);
> >> 
> >> is significantly easier to read than
> >> 
> >>     rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon), &mon->common);
> >
> > I guess this is highly subjective. I find wrapped lines harder to read.
> > For answering subjective questions like this, we generally use the
> > coding style document.
> >
> > Anyway, I guess following an idiosyncratic coding style that is
> > different from every other subsystem in QEMU is possible (if
> > inconvenient) if I know what it is.
> 
> The applicable coding style document is PEP 8.

I'll happily apply PEP 8 to Python code, but this is C. I don't think
PEP 8 applies very well to C code. (In fact, PEP 7 exists as a C style
guide, but we're not writing C code for the Python project here...)

> > My problem is more that I don't know what the exact rules are. Can they
> > only be figured out experimentally by submitting patches and seeing
> > whether you like them or not?
> 
> PEP 8:
> 
> A style guide is about consistency.  Consistency with this style
> guide is important.  Consistency within a project is more important.
> Consistency within one module or function is the most important.
> 
> In other words, you should make a reasonable effort to blend in.

The project style guide for C is defined in CODING_STYLE.rst. Missing
consistency with it is what I'm complaining about.

I also agree that consistency within one module or function is most
important, which is why I allow you to reformat my code. But I don't
think it means that local coding style rules shouldn't be documented,
especially if you can't just look at the code and see immediately how
it's supposed to be.

> >> Would you mind me wrapping this line in my tree?
> >
> > I have no say in this subsystem and I take it that you want all code to
> > look as if you had written it

[PULL 07/17] block: return error-code from bdrv_invalidate_cache

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

This is the only coroutine wrapper from block.c and block/io.c which
doesn't return a value, so let's convert it to the common behavior, to
simplify moving to generated coroutine wrappers in a further commit.

Also, bdrv_invalidate_cache is a void function, returning error only
through **errp parameter, which is considered to be bad practice, as
it forces callers to define and propagate local_err variable, so
conversion is good anyway.
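
The difference for callers can be sketched without QEMU's Error API. In the hypothetical model below, an "error" is just a message pointer and `errp` may be NULL when the caller does not want details; all `demo_*` names are illustrative and none of them exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an error object: just a message pointer. */
static void demo_error_set(const char **errp, const char *msg)
{
    assert(msg != NULL);
    if (errp) {             /* callers may pass NULL to ignore details */
        *errp = msg;
    }
}

/* Old shape: void, so failure is only visible through *errp. */
static void demo_invalidate_void(int has_driver, const char **errp)
{
    if (!has_driver) {
        demo_error_set(errp, "no driver");
    }
}

/* New shape: a negative errno value is returned as well. */
static int demo_invalidate_int(int has_driver, const char **errp)
{
    if (!has_driver) {
        demo_error_set(errp, "no driver");
        return -EINVAL;
    }
    return 0;
}

/* Old-shape caller: needs a local variable, because errp may be NULL. */
static int demo_caller_old(int has_driver, const char **errp)
{
    const char *local_err = NULL;

    demo_invalidate_void(has_driver, &local_err);
    if (local_err) {
        demo_error_set(errp, local_err);    /* "propagate" */
        return -EINVAL;
    }
    return 0;
}

/* New-shape caller: tests the return value and passes errp straight on. */
static int demo_caller_new(int has_driver, const char **errp)
{
    return demo_invalidate_int(has_driver, errp);
}
```

The old-shape caller cannot inspect `*errp` directly when `errp` might be NULL, which is what forces the local variable and the propagation step that the commit message calls bad practice.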

This patch leaves the conversion of .bdrv_co_invalidate_cache() driver
callbacks and bdrv_invalidate_cache_all() for another day.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200924185414.28642-2-vsement...@virtuozzo.com>
---
 include/block/block.h |  2 +-
 block.c   | 32 ++--
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..81d591dd4c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -460,7 +460,7 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
diff --git a/block.c b/block.c
index f72a2e26e8..4829c8ac47 100644
--- a/block.c
+++ b/block.c
@@ -5781,8 +5781,8 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
-  Error **errp)
+static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
+ Error **errp)
 {
 BdrvChild *child, *parent;
 uint64_t perm, shared_perm;
@@ -5791,14 +5791,14 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 BdrvDirtyBitmap *bm;
 
 if (!bs->drv)  {
-return;
+return -ENOMEDIUM;
 }
 
 QLIST_FOREACH(child, &bs->children, next) {
 bdrv_co_invalidate_cache(child->bs, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5821,7 +5821,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 ret = bdrv_check_perm(bs, NULL, perm, shared_perm, NULL, NULL, errp);
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
-return;
+return ret;
 }
 bdrv_set_perm(bs, perm, shared_perm);
 
@@ -5830,7 +5830,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 
@@ -5842,7 +5842,7 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (ret < 0) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
-return;
+return ret;
 }
 }
 
@@ -5852,27 +5852,30 @@ static void coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
 if (local_err) {
 bs->open_flags |= BDRV_O_INACTIVE;
 error_propagate(errp, local_err);
-return;
+return -EINVAL;
 }
 }
 }
+
+return 0;
 }
 
 typedef struct InvalidateCacheCo {
 BlockDriverState *bs;
 Error **errp;
 bool done;
+int ret;
 } InvalidateCacheCo;
 
 static void coroutine_fn bdrv_invalidate_cache_co_entry(void *opaque)
 {
 InvalidateCacheCo *ico = opaque;
-bdrv_co_invalidate_cache(ico->bs, ico->errp);
+ico->ret = bdrv_co_invalidate_cache(ico->bs, ico->errp);
 ico->done = true;
 aio_wait_kick();
 }
 
-void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
+int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 Coroutine *co;
 InvalidateCacheCo ico = {
@@ -5889,22 +5892,23 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 bdrv_coroutine_enter(bs, co);
 BDRV_POLL_WHILE(bs, !ico.done);
 }
+
+return ico.ret;
 }
 
 void bdrv_invalidate_cache_all(Error **errp)
 {
 BlockDriverState *bs;
-Error *local_err = NULL;
 BdrvNextIterator it;
 
 for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
+int ret;
 
 aio_context_acquire(aio_context);
-bdrv_invalidate_cache(bs, &local_err);
+ret =

[PULL 14/17] include/block/block.h: drop non-ascii quotation mark

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

This is the only non-ASCII character in the file and it isn't really
needed here. Let's use the normal "'" symbol for consistency with the
other 11 occurrences of "'" in the file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b87df69a1..ce2ac39299 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -304,7 +304,7 @@ enum BdrvChildRoleBits {
 BDRV_CHILD_FILTERED = (1 << 2),
 
 /*
- * Child from which to read all data that isn’t allocated in the
+ * Child from which to read all data that isn't allocated in the
  * parent (i.e., the backing child); such data is copied to the
  * parent through COW (and optionally COR).
  * This field is mutually exclusive with DATA, METADATA, and
-- 
2.26.2

[PULL 10/17] scripts: add block-coroutine-wrapper.py

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

We have a very frequent pattern of creating a coroutine from a function
with several arguments:

  - create a structure to pack parameters
  - create _entry function to call original function taking parameters
from struct
  - do different magic to handle completion: set ret to NOT_DONE or
EINPROGRESS or use separate bool field
  - fill the struct and create coroutine from _entry function with this
struct as a parameter
  - do coroutine enter and BDRV_POLL_WHILE loop

Let's reduce code duplication by generating coroutine wrappers.

This patch adds scripts/block-coroutine-wrapper.py together with some
friends, which will generate functions with declared prototypes marked
by the 'generated_co_wrapper' specifier.

The usage of new code generation is as follows:

1. define the coroutine function somewhere

int coroutine_fn bdrv_co_NAME(...) {...}

2. declare in some header file

int generated_co_wrapper bdrv_NAME(...);

   with same list of parameters (generated_co_wrapper is
   defined in "include/block/block.h").

3. Make sure the block_gen_c declaration in block/meson.build
   mentions the file with your marker function.

No function is marked yet; that work is left for the following
commit.
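
For readers who have not seen the pattern, here is a rough sketch of the shape such a wrapper takes. It is not the generator's output: there are no real coroutines, poll_co() simply runs the entry function synchronously in place of coroutine enter plus BDRV_POLL_WHILE, and every name (PollCo, AddCo, co_add, add_wrapper) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical base structure, modelled on the idea of BdrvPollCo. */
typedef struct PollCo {
    bool in_progress;
    int ret;
} PollCo;

/* The function being wrapped; stands in for bdrv_co_NAME(). */
static int co_add(int a, int b)
{
    return a + b;
}

/* Pack the parameters next to the polling state. */
typedef struct AddCo {
    PollCo poll_state;
    int a;
    int b;
} AddCo;

/* Entry function: unpack the struct, call the core, record the result. */
static void add_entry(void *opaque)
{
    AddCo *s = opaque;

    s->poll_state.ret = co_add(s->a, s->b);
    s->poll_state.in_progress = false;
}

/* Stand-in for entering the coroutine and polling until it finishes. */
static int poll_co(PollCo *s, void (*entry)(void *), void *opaque)
{
    entry(opaque);
    while (s->in_progress) {
        /* a real event loop would make progress here */
    }
    return s->ret;
}

/* The synchronous wrapper; stands in for a generated bdrv_NAME(). */
static int add_wrapper(int a, int b)
{
    AddCo s = {
        .poll_state.in_progress = true,
        .a = a,
        .b = b,
    };

    return poll_co(&s.poll_state, add_entry, &s);
}
```

Only the parameter list changes from one wrapper to the next, which is what makes the pattern a good candidate for generation.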

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
Message-Id: <20200924185414.28642-5-vsement...@virtuozzo.com>
[Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed
typo and grammar issues pointed out by Eric Blake.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block/block-gen.h  |  49 +++
 include/block/block.h  |  10 ++
 block/meson.build  |   8 ++
 docs/devel/block-coroutine-wrapper.rst |  54 +++
 docs/devel/index.rst   |   1 +
 scripts/block-coroutine-wrapper.py | 188 +
 6 files changed, 310 insertions(+)
 create mode 100644 block/block-gen.h
 create mode 100644 docs/devel/block-coroutine-wrapper.rst
 create mode 100644 scripts/block-coroutine-wrapper.py

diff --git a/block/block-gen.h b/block/block-gen.h
new file mode 100644
index 00..f80cf4897d
--- /dev/null
+++ b/block/block-gen.h
@@ -0,0 +1,49 @@
+/*
+ * Block coroutine wrapping core, used by auto-generated block/block-gen.c
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2020 Virtuozzo International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_BLOCK_GEN_H
+#define BLOCK_BLOCK_GEN_H
+
+#include "block/block_int.h"
+
+/* Base structure for argument packing structures */
+typedef struct BdrvPollCo {
+BlockDriverState *bs;
+bool in_progress;
+int ret;
+Coroutine *co; /* Keep pointer here for debugging */
+} BdrvPollCo;
+
+static inline int bdrv_poll_co(BdrvPollCo *s)
+{
+assert(!qemu_in_coroutine());
+
+bdrv_coroutine_enter(s->bs, s->co);
+BDRV_POLL_WHILE(s->bs, s->in_progress);
+
+return s->ret;
+}
+
+#endif /* BLOCK_BLOCK_GEN_H */
diff --git a/include/block/block.h b/include/block/block.h
index 81d591dd4c..0f0ddc51b4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -10,6 +10,16 @@
 #include "block/blockjob.h"
 #include "qemu/hbitmap.h"
 
+/*
+ * generated_co_wrapper
+ *
+ * Function specifier, which does nothing but mark functions to be
+ * generated by scripts/block-coroutine-wrapper.py
+ *
+ * Read more in docs/devel/block-coroutine-wrapper.rst
+ */
+#define generated_co_wrapper
+
 /* block.c */
 typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
diff --git a/block/meson.build b/block/meson.build
index a3e56b7cd1..88ad73583a 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -107,6 +107,14 @@ module_block_h = custom_target('module_block.h',
command: [module_block_py, '@OUTPUT0@', modsrc])
 block_ss.add(module_block_h)
 
+wrapper_py =

[PULL 09/17] block: declare some coroutine functions in block/coroutines.h

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

We are going to keep the coroutine-wrapper code (parameter-packing
structures, BDRV_POLL wrapper functions) in separate auto-generated
files. So we'll need a header declaring the original _co_ functions
that are currently static. We'll also need declarations for the
wrapper functions. Add these declarations now, as a preparation
step.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200924185414.28642-4-vsement...@virtuozzo.com>
---
 block/coroutines.h | 67 ++
 block.c|  8 +++---
 block/io.c | 34 +++
 3 files changed, 88 insertions(+), 21 deletions(-)
 create mode 100644 block/coroutines.h

diff --git a/block/coroutines.h b/block/coroutines.h
new file mode 100644
index 00..9ce1730a09
--- /dev/null
+++ b/block/coroutines.h
@@ -0,0 +1,67 @@
+/*
+ * Block layer I/O functions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_COROUTINES_INT_H
+#define BLOCK_COROUTINES_INT_H
+
+#include "block/block_int.h"
+
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix);
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
+
+int coroutine_fn
+bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+ bool is_write, BdrvRequestFlags flags);
+int
+bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
+  bool is_write, BdrvRequestFlags flags);
+
+int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file);
+int
+bdrv_common_block_status_above(BlockDriverState *bs,
+   BlockDriverState *base,
+   bool want_zero,
+   int64_t offset,
+   int64_t bytes,
+   int64_t *pnum,
+   int64_t *map,
+   BlockDriverState **file);
+
+int coroutine_fn
+bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+   bool is_read);
+int
+bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
+bool is_read);
+
+#endif /* BLOCK_COROUTINES_INT_H */
diff --git a/block.c b/block.c
index 4829c8ac47..517a425340 100644
--- a/block.c
+++ b/block.c
@@ -48,6 +48,7 @@
 #include "qemu/timer.h"
 #include "qemu/cutils.h"
 #include "qemu/id.h"
+#include "block/coroutines.h"
 
 #ifdef CONFIG_BSD
 #include 
@@ -4676,8 +4677,8 @@ static void bdrv_delete(BlockDriverState *bs)
  * free of errors) or -errno when an internal error occurred. The results of the
  * check are stored in res.
  */
-static int coroutine_fn bdrv_co_check(BlockDriverState *bs,
-  BdrvCheckResult *res, BdrvCheckMode fix)
+int coroutine_fn bdrv_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res, BdrvCheckMode fix)
 {
 if (bs->drv == NULL) {
 return -ENOMEDIUM;
@@ -5781,8 +5782,7 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
-static int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs,
- Error **errp)
+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *child, *parent;

[PULL 17/17] util/vfio-helpers: Rework the IOVA allocator to avoid IOVA reserved regions

2020-09-30 Thread Stefan Hajnoczi

From: Eric Auger 

Introduce the qemu_vfio_find_fixed/temp_iova helpers which
respectively allocate IOVAs from the bottom/top parts of the
usable IOVA range, without picking within host IOVA reserved
windows. The allocation remains basic: if the size is too big
for the remainder of the current usable IOVA range, we jump
to the next one, leaving a hole in the address map.
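
The bottom-up half of this can be sketched as follows. This is a simplified model under stated assumptions, not the patch code: Range and Alloc are hypothetical reduced types, the ranges are assumed sorted and non-overlapping, and only the "fixed" (low water mark) direction is shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, like struct IOVARange in the patch. */
typedef struct Range {
    uint64_t start;
    uint64_t end;
} Range;

/* Hypothetical reduced state: sorted usable ranges plus a low mark. */
typedef struct Alloc {
    const Range *ranges;
    int nb_ranges;
    uint64_t low_water_mark;
} Alloc;

static int find_fixed(Alloc *s, uint64_t size, uint64_t *iova)
{
    for (int i = 0; i < s->nb_ranges; i++) {
        const Range *r = &s->ranges[i];
        uint64_t base, avail;

        if (r->end < s->low_water_mark) {
            continue;           /* this range is already used up */
        }
        /* Never allocate below the start of the usable range. */
        base = s->low_water_mark > r->start ? s->low_water_mark : r->start;
        avail = r->end - base + 1;

        /* avail == 0 means the range spans the whole 64-bit space. */
        if (avail >= size || avail == 0) {
            *iova = base;
            s->low_water_mark = base + size;
            return 0;
        }
        /* Too small: fall through to the next range, leaving a hole. */
    }
    return -ENOMEM;
}
```

For example, with usable ranges [0x1000, 0x1fff] and [0x4000, 0x7fff], a 0x800 request lands at 0x1000 and a following 0x1000 request skips the 0x800 bytes left in the first range and lands at 0x4000.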

Signed-off-by: Eric Auger 
Message-id: 20200929085550.30926-3-eric.au...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 util/vfio-helpers.c | 57 +
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index fe9ca9ce38..c469beb061 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -667,6 +667,50 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
 return true;
 }
 
+static int
+qemu_vfio_find_fixed_iova(QEMUVFIOState *s, size_t size, uint64_t *iova)
+{
+int i;
+
+for (i = 0; i < s->nb_iova_ranges; i++) {
+if (s->usable_iova_ranges[i].end < s->low_water_mark) {
+continue;
+}
+s->low_water_mark =
+MAX(s->low_water_mark, s->usable_iova_ranges[i].start);
+
+if (s->usable_iova_ranges[i].end - s->low_water_mark + 1 >= size ||
+s->usable_iova_ranges[i].end - s->low_water_mark + 1 == 0) {
+*iova = s->low_water_mark;
+s->low_water_mark += size;
+return 0;
+}
+}
+return -ENOMEM;
+}
+
+static int
+qemu_vfio_find_temp_iova(QEMUVFIOState *s, size_t size, uint64_t *iova)
+{
+int i;
+
+for (i = s->nb_iova_ranges - 1; i >= 0; i--) {
+if (s->usable_iova_ranges[i].start > s->high_water_mark) {
+continue;
+}
+s->high_water_mark =
+MIN(s->high_water_mark, s->usable_iova_ranges[i].end + 1);
+
+if (s->high_water_mark - s->usable_iova_ranges[i].start + 1 >= size ||
+s->high_water_mark - s->usable_iova_ranges[i].start + 1 == 0) {
+*iova = s->high_water_mark - size;
+s->high_water_mark = *iova;
+return 0;
+}
+}
+return -ENOMEM;
+}
+
 /* Map [host, host + size) area into a contiguous IOVA address space, and store
  * the result in @iova if not NULL. The caller need to make sure the area is
  * aligned to page size, and mustn't overlap with existing mapping areas (split
@@ -693,7 +737,11 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 goto out;
 }
 if (!temporary) {
-iova0 = s->low_water_mark;
+if (qemu_vfio_find_fixed_iova(s, size, &iova0)) {
+ret = -ENOMEM;
+goto out;
+}
+
 mapping = qemu_vfio_add_mapping(s, host, size, index + 1, iova0);
 if (!mapping) {
 ret = -ENOMEM;
@@ -705,15 +753,16 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
 qemu_vfio_undo_mapping(s, mapping, NULL);
 goto out;
 }
-s->low_water_mark += size;
 qemu_vfio_dump_mappings(s);
 } else {
-iova0 = s->high_water_mark - size;
+if (qemu_vfio_find_temp_iova(s, size, &iova0)) {
+ret = -ENOMEM;
+goto out;
+}
 ret = qemu_vfio_do_mapping(s, host, size, iova0);
 if (ret) {
 goto out;
 }
-s->high_water_mark -= size;
 }
 }
 if (iova) {
-- 
2.26.2

[PULL 11/17] block: generate coroutine-wrapper code

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

Use the code generation implemented in the previous commit to generate
coroutine wrappers in block.c and block/io.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200924185414.28642-6-vsement...@virtuozzo.com>
---
 block/coroutines.h|   6 +-
 include/block/block.h |  16 ++--
 block.c   |  73 ---
 block/io.c| 212 --
 4 files changed, 13 insertions(+), 294 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 9ce1730a09..c62b3a2697 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -34,7 +34,7 @@ int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp);
 int coroutine_fn
 bdrv_co_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
  bool is_write, BdrvRequestFlags flags);
-int
+int generated_co_wrapper
 bdrv_prwv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov,
   bool is_write, BdrvRequestFlags flags);
 
@@ -47,7 +47,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
   int64_t *pnum,
   int64_t *map,
   BlockDriverState **file);
-int
+int generated_co_wrapper
 bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool want_zero,
@@ -60,7 +60,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
 int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
bool is_read);
-int
+int generated_co_wrapper
 bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
 bool is_read);
 
diff --git a/include/block/block.h b/include/block/block.h
index 0f0ddc51b4..f2d85f2cf1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -403,8 +403,9 @@ void bdrv_refresh_filename(BlockDriverState *bs);
 int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
   PreallocMode prealloc, BdrvRequestFlags 
flags,
   Error **errp);
-int bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
-  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
+int generated_co_wrapper
+bdrv_truncate(BdrvChild *child, int64_t offset, bool exact,
+  PreallocMode prealloc, BdrvRequestFlags flags, Error **errp);
 
 int64_t bdrv_nb_sectors(BlockDriverState *bs);
 int64_t bdrv_getlength(BlockDriverState *bs);
@@ -446,7 +447,8 @@ typedef enum {
 BDRV_FIX_ERRORS   = 2,
 } BdrvCheckMode;
 
-int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
+int generated_co_wrapper bdrv_check(BlockDriverState *bs, BdrvCheckResult *res,
+BdrvCheckMode fix);
 
 /* The units of offset and total_work_size may be chosen arbitrarily by the
  * block driver; total_work_size may change during the course of the amendment
@@ -470,12 +472,13 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);
 int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf);
 
 /* Invalidate any cached metadata used by image formats */
-int bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
+int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
+   Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
 /* Ensure contents are flushed to disk.  */
-int bdrv_flush(BlockDriverState *bs);
+int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
 int bdrv_flush_all(void);
 void bdrv_close_all(void);
@@ -490,7 +493,8 @@ void bdrv_drain_all(void);
 AIO_WAIT_WHILE(bdrv_get_aio_context(bs_),  \
cond); })
 
-int bdrv_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
+int generated_co_wrapper bdrv_pdiscard(BdrvChild *child, int64_t offset,
+   int64_t bytes);
 int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int64_t bytes);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
diff --git a/block.c b/block.c
index 517a425340..429864e204 100644
--- a/block.c
+++ b/block.c
@@ -4691,43 +4691,6 @@ int coroutine_fn bdrv_co_check(BlockDriverState *bs,
 return bs->drv->bdrv_co_check(bs, res, fix);
 }
 
-typedef struct CheckCo {
-BlockDriverState *bs;
-BdrvCheckResult *res;
-BdrvCheckMode fix;
-int ret;
-} CheckCo;
-
-static void coroutine_fn bdrv_check_co_entry(void *opaque)
-{
-CheckCo *cco = opaque;
-cco->ret = bdrv_co_check(cco->bs, cco->res, cco->fix);
-aio_wait_kick();
-}
-
-int bdrv_check(BlockDriverState *bs,
-

[PULL 13/17] block/io: refactor save/load vmstate

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

As was done for read/write in a previous commit, drop the extra
indirection layer and generate bdrv_readv_vmstate() and
bdrv_writev_vmstate() directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200924185414.28642-8-vsement...@virtuozzo.com>
---
 block/coroutines.h| 10 +++
 include/block/block.h |  6 ++--
 block/io.c| 70 ++-
 3 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 6c63a819c9..f69179f5ef 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -57,11 +57,9 @@ bdrv_common_block_status_above(BlockDriverState *bs,
int64_t *map,
BlockDriverState **file);
 
-int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read);
-int generated_co_wrapper
-bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-bool is_read);
+int coroutine_fn bdrv_co_readv_vmstate(BlockDriverState *bs,
+   QEMUIOVector *qiov, int64_t pos);
+int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
+QEMUIOVector *qiov, int64_t pos);
 
 #endif /* BLOCK_COROUTINES_INT_H */
diff --git a/include/block/block.h b/include/block/block.h
index eef4cceaf0..8b87df69a1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -572,8 +572,10 @@ int path_has_protocol(const char *path);
 int path_is_absolute(const char *path);
 char *path_combine(const char *base_path, const char *filename);
 
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
+int generated_co_wrapper
+bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos);
 int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size);
 
diff --git a/block/io.c b/block/io.c
index c3dc1db036..54f0968aee 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2475,28 +2475,50 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 }
 
 int coroutine_fn
-bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-   bool is_read)
+bdrv_co_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *child_bs = bdrv_primary_bs(bs);
 int ret = -ENOTSUP;
 
+if (!drv) {
+return -ENOMEDIUM;
+}
+
 bdrv_inc_in_flight(bs);
 
+if (drv->bdrv_load_vmstate) {
+ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+} else if (child_bs) {
+ret = bdrv_co_readv_vmstate(child_bs, qiov, pos);
+}
+
+bdrv_dec_in_flight(bs);
+
+return ret;
+}
+
+int coroutine_fn
+bdrv_co_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
+{
+BlockDriver *drv = bs->drv;
+BlockDriverState *child_bs = bdrv_primary_bs(bs);
+int ret = -ENOTSUP;
+
 if (!drv) {
-ret = -ENOMEDIUM;
-} else if (drv->bdrv_load_vmstate) {
-if (is_read) {
-ret = drv->bdrv_load_vmstate(bs, qiov, pos);
-} else {
-ret = drv->bdrv_save_vmstate(bs, qiov, pos);
-}
+return -ENOMEDIUM;
+}
+
+bdrv_inc_in_flight(bs);
+
+if (drv->bdrv_save_vmstate) {
+ret = drv->bdrv_save_vmstate(bs, qiov, pos);
 } else if (child_bs) {
-ret = bdrv_co_rw_vmstate(child_bs, qiov, pos, is_read);
+ret = bdrv_co_writev_vmstate(child_bs, qiov, pos);
 }
 
 bdrv_dec_in_flight(bs);
+
 return ret;
 }
 
@@ -2504,38 +2526,18 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
   int64_t pos, int size)
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
+int ret = bdrv_writev_vmstate(bs, &qiov, pos);
 
-ret = bdrv_writev_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
-}
-
-return size;
-}
-
-int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-{
-return bdrv_rw_vmstate(bs, qiov, pos, false);
+return ret < 0 ? ret : size;
 }
 
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
   int64_t pos, int size)
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, buf, size);
-int ret;
+int ret = bdrv_readv_vmstate(bs, &qiov, pos);
 
-ret = bdrv_readv_vmstate(bs, &qiov, pos);
-if (ret < 0) {
-return ret;
-}
-
-return size;
-}
-
-int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
-{
-return bdrv_rw_vmstate(bs, qiov, pos, true);
+return ret < 0 ? ret :

[PULL 05/17] block/nvme: Use register definitions from 'block/nvme.h'

2020-09-30 Thread Stefan Hajnoczi

From: Philippe Mathieu-Daudé 

Use the NVMe register definitions from "block/nvme.h", which
makes it a bit easier to review the code against the datasheet.
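
As an illustration of what named accessors buy, here is a small sketch. The EX_CAP_* macros and the two helpers are hypothetical; their shifts and masks are copied from the open-coded expressions removed by this patch, not from "block/nvme.h", so treat them as an assumption about the CAP layout rather than the header's definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical accessors for the 64-bit CAP register. Shifts and
 * masks mirror the open-coded expressions in the removed lines.
 */
#define EX_CAP_TO(cap)      (((cap) >> 24) & 0xFF) /* timeout, 500 ms units */
#define EX_CAP_DSTRD(cap)   (((cap) >> 32) & 0xF)  /* doorbell stride */
#define EX_CAP_CSS_NVM(cap) (((cap) >> 37) & 0x1)  /* NVM command set bit */
#define EX_CAP_MPSMIN(cap)  (((cap) >> 48) & 0xF)  /* min page size exponent */

/* Minimum page size in bytes: 2^(12 + MPSMIN), as in the old code. */
static uint64_t ex_min_page_size(uint64_t cap)
{
    return UINT64_C(1) << (12 + EX_CAP_MPSMIN(cap));
}

/* Doorbell stride in bytes: 4 << DSTRD, as in the old code. */
static uint64_t ex_doorbell_stride_bytes(uint64_t cap)
{
    return UINT64_C(4) << EX_CAP_DSTRD(cap);
}
```

A reader can then check each field against the register description by name instead of decoding a shift and a mask at every use.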

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <20200922083821.578519-6-phi...@redhat.com>
---
 block/nvme.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index bd82990b66..959569d262 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -718,22 +718,22 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
  * Initialization". */
 
 cap = le64_to_cpu(regs->cap);
-if (!(cap & (1ULL << 37))) {
+if (!NVME_CAP_CSS(cap)) {
 error_setg(errp, "Device doesn't support NVMe command set");
 ret = -EINVAL;
 goto out;
 }
 
-s->page_size = MAX(4096, 1 << (12 + ((cap >> 48) & 0xF)));
-s->doorbell_scale = (4 << (((cap >> 32) & 0xF))) / sizeof(uint32_t);
+s->page_size = MAX(4096, 1 << NVME_CAP_MPSMIN(cap));
+s->doorbell_scale = (4 << NVME_CAP_DSTRD(cap)) / sizeof(uint32_t);
 bs->bl.opt_mem_alignment = s->page_size;
-timeout_ms = MIN(500 * ((cap >> 24) & 0xFF), 3);
+timeout_ms = MIN(500 * NVME_CAP_TO(cap), 3);
 
 /* Reset device to get a clean state. */
 regs->cc = cpu_to_le32(le32_to_cpu(regs->cc) & 0xFE);
 /* Wait for CSTS.RDY = 0. */
 deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + timeout_ms * SCALE_MS;
-while (le32_to_cpu(regs->csts) & 0x1) {
+while (NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to reset (%"
  PRId64 " ms)",
@@ -761,18 +761,19 @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
 }
 s->nr_queues = 1;
 QEMU_BUILD_BUG_ON(NVME_QUEUE_SIZE & 0xF000);
-regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << 16) | NVME_QUEUE_SIZE);
+regs->aqa = cpu_to_le32((NVME_QUEUE_SIZE << AQA_ACQS_SHIFT) |
+(NVME_QUEUE_SIZE << AQA_ASQS_SHIFT));
 regs->asq = cpu_to_le64(s->queues[INDEX_ADMIN]->sq.iova);
 regs->acq = cpu_to_le64(s->queues[INDEX_ADMIN]->cq.iova);
 
 /* After setting up all control registers we can enable device now. */
-regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << 20) |
-  (ctz32(NVME_SQ_ENTRY_BYTES) << 16) |
-  0x1);
+regs->cc = cpu_to_le32((ctz32(NVME_CQ_ENTRY_BYTES) << CC_IOCQES_SHIFT) |
+   (ctz32(NVME_SQ_ENTRY_BYTES) << CC_IOSQES_SHIFT) |
+   CC_EN_MASK);
 /* Wait for CSTS.RDY = 1. */
 now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 deadline = now + timeout_ms * 100;
-while (!(le32_to_cpu(regs->csts) & 0x1)) {
+while (!NVME_CSTS_RDY(le32_to_cpu(regs->csts))) {
 if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > deadline) {
 error_setg(errp, "Timeout while waiting for device to start (%"
  PRId64 " ms)",
-- 
2.26.2

[PULL 16/17] util/vfio-helpers: Collect IOVA reserved regions

2020-09-30 Thread Stefan Hajnoczi

From: Eric Auger 

The IOVA allocator currently ignores host reserved regions.
As a result, some chosen IOVAs may collide with them,
resulting in VFIO MAP_DMA errors later on. This happens on ARM,
where the MSI reserved window is quickly encountered:
[0x800, 0x810]. Since kernel 5.4, VFIO returns the usable
IOVA regions. So let's enumerate them, with a view to avoiding
the reserved regions later on.
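
Enumerating the regions means walking a chain of capability headers inside the info buffer. A minimal sketch of such a walk follows. CapHeader is a hypothetical struct shaped like a capability header (id, version, offset of the next header from the start of the buffer, 0 terminating the chain), and the bounds and forward-only checks are additions of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header, shaped like a capability chain entry. */
typedef struct CapHeader {
    uint16_t id;
    uint16_t version;
    uint32_t next;      /* offset of next header from buffer start; 0 ends */
} CapHeader;

/* Return the offset of the capability with this id, or 0 if absent. */
static uint32_t find_cap(const uint8_t *buf, size_t len,
                         uint32_t first, uint16_t id)
{
    uint32_t off = first;

    while (off != 0 && off + sizeof(CapHeader) <= len) {
        CapHeader hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned reads */
        if (hdr.id == id) {
            return off;
        }
        if (hdr.next <= off) {
            return 0;   /* end of chain, or a malformed backward link */
        }
        off = hdr.next;
    }
    return 0;
}
```

Once the matching header is found, the range entries that follow it can be copied into the allocator's own table.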

Signed-off-by: Eric Auger 
Message-id: 20200929085550.30926-2-eric.au...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 util/vfio-helpers.c | 72 +++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 9ac307e3d4..fe9ca9ce38 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -40,6 +40,11 @@ typedef struct {
 uint64_t iova;
 } IOVAMapping;
 
+struct IOVARange {
+uint64_t start;
+uint64_t end;
+};
+
 struct QEMUVFIOState {
 QemuMutex lock;
 
@@ -49,6 +54,8 @@ struct QEMUVFIOState {
 int device;
 RAMBlockNotifier ram_notifier;
 struct vfio_region_info config_region_info, bar_region_info[6];
+struct IOVARange *usable_iova_ranges;
+uint8_t nb_iova_ranges;
 
 /* These fields are protected by @lock */
 /* VFIO's IO virtual address space is managed by splitting into a few
@@ -236,6 +243,35 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, void *buf, int size, int
 return ret == size ? 0 : -errno;
 }
 
+static void collect_usable_iova_ranges(QEMUVFIOState *s, void *buf)
+{
+struct vfio_iommu_type1_info *info = (struct vfio_iommu_type1_info *)buf;
+struct vfio_info_cap_header *cap = (void *)buf + info->cap_offset;
+struct vfio_iommu_type1_info_cap_iova_range *cap_iova_range;
+int i;
+
+while (cap->id != VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
+if (!cap->next) {
+return;
+}
+cap = (struct vfio_info_cap_header *)(buf + cap->next);
+}
+
+cap_iova_range = (struct vfio_iommu_type1_info_cap_iova_range *)cap;
+
+s->nb_iova_ranges = cap_iova_range->nr_iovas;
+if (s->nb_iova_ranges > 1) {
+s->usable_iova_ranges =
+g_realloc(s->usable_iova_ranges,
+  s->nb_iova_ranges * sizeof(struct IOVARange));
+}
+
+for (i = 0; i < s->nb_iova_ranges; i++) {
+s->usable_iova_ranges[i].start = cap_iova_range->iova_ranges[i].start;
+s->usable_iova_ranges[i].end = cap_iova_range->iova_ranges[i].end;
+}
+}
+
 static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
   Error **errp)
 {
@@ -243,10 +279,13 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
 int i;
 uint16_t pci_cmd;
 struct vfio_group_status group_status = { .argsz = sizeof(group_status) };
-struct vfio_iommu_type1_info iommu_info = { .argsz = sizeof(iommu_info) };
+struct vfio_iommu_type1_info *iommu_info = NULL;
+size_t iommu_info_size = sizeof(*iommu_info);
 struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
 char *group_file = NULL;
 
+s->usable_iova_ranges = NULL;
+
 /* Create a new container */
 s->container = open("/dev/vfio/vfio", O_RDWR);
 
@@ -310,13 +349,35 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
 goto fail;
 }
 
+iommu_info = g_malloc0(iommu_info_size);
+iommu_info->argsz = iommu_info_size;
+
 /* Get additional IOMMU info */
-if (ioctl(s->container, VFIO_IOMMU_GET_INFO, &iommu_info)) {
+if (ioctl(s->container, VFIO_IOMMU_GET_INFO, iommu_info)) {
 error_setg_errno(errp, errno, "Failed to get IOMMU info");
 ret = -errno;
 goto fail;
 }
 
+/*
+ * if the kernel does not report usable IOVA regions, choose
+ * the legacy [QEMU_VFIO_IOVA_MIN, QEMU_VFIO_IOVA_MAX -1] region
+ */
+s->nb_iova_ranges = 1;
+s->usable_iova_ranges = g_new0(struct IOVARange, 1);
+s->usable_iova_ranges[0].start = QEMU_VFIO_IOVA_MIN;
+s->usable_iova_ranges[0].end = QEMU_VFIO_IOVA_MAX - 1;
+
+if (iommu_info->argsz > iommu_info_size) {
+iommu_info_size = iommu_info->argsz;
+iommu_info = g_realloc(iommu_info, iommu_info_size);
+if (ioctl(s->container, VFIO_IOMMU_GET_INFO, iommu_info)) {
+ret = -errno;
+goto fail;
+}
+collect_usable_iova_ranges(s, iommu_info);
+}
+
 s->device = ioctl(s->group, VFIO_GROUP_GET_DEVICE_FD, device);
 
 if (s->device < 0) {
@@ -365,8 +426,13 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char *device,
 if (ret) {
 goto fail;
 }
+g_free(iommu_info);
 return 0;
 fail:
+g_free(s->usable_iova_ranges);
+s->usable_iova_ranges = NULL;
+s->nb_iova_ranges = 0;
+g_free(iommu_info);
 close(s->group);
 fail_container:
 close(s->container);
@@ -716,6 +782,8 @@ void qemu_vfio_close(QEMUVFIOState *s)
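
The hunks above query VFIO_IOMMU_GET_INFO twice: once with the fixed-size header, then again with a buffer grown to the argsz value the kernel reported. A minimal standalone sketch of that grow-and-retry pattern follows. It does not touch /dev/vfio: struct demo_info, demo_get_info() and demo_query() are hypothetical stand-ins invented for illustration, and the range values are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header mimicking the argsz convention; NOT the real VFIO struct. */
struct demo_info {
    uint32_t argsz;      /* in: buffer size; out: size the provider needs */
    uint32_t nr_ranges;  /* non-zero only if the buffer was large enough */
    uint64_t ranges[];   /* start/end pairs */
};

#define DEMO_NR_RANGES 2

/* Stand-in for ioctl(fd, VFIO_IOMMU_GET_INFO, info); the data is invented. */
static int demo_get_info(struct demo_info *info)
{
    static const uint64_t ranges[2 * DEMO_NR_RANGES] = {
        0x10000, 0xfedfffff, 0xfef00000, 0xffffffffff
    };
    uint32_t needed = sizeof(*info) + sizeof(ranges);

    assert(info != NULL);
    if (info->argsz < sizeof(*info)) {
        return -1;                      /* buffer cannot even hold the header */
    }
    if (info->argsz >= needed) {
        info->nr_ranges = DEMO_NR_RANGES;
        memcpy(info->ranges, ranges, sizeof(ranges));
    } else {
        info->nr_ranges = 0;            /* too small: report the size only */
    }
    info->argsz = needed;
    return 0;
}

/* Query with the bare header first, then retry with the reported size. */
static struct demo_info *demo_query(void)
{
    size_t size = sizeof(struct demo_info);
    struct demo_info *info = calloc(1, size);

    if (!info) {
        return NULL;
    }
    info->argsz = size;
    if (demo_get_info(info)) {
        free(info);
        return NULL;
    }
    if (info->argsz > size) {
        struct demo_info *bigger;

        size = info->argsz;
        bigger = realloc(info, size);
        if (!bigger) {
            free(info);
            return NULL;
        }
        info = bigger;                  /* argsz already holds the new size */
        if (demo_get_info(info)) {
            free(info);
            return NULL;
        }
    }
    return info;                        /* caller frees */
}
```

A caller would invoke demo_query(), read info->nr_ranges and info->ranges, and free() the result. The patch uses g_malloc0()/g_realloc() instead, which abort on allocation failure, so it needs no NULL checks.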

[PULL 08/17] block/io: refactor coroutine wrappers

2020-09-30 Thread Stefan Hajnoczi

From: Vladimir Sementsov-Ogievskiy 

Most of our coroutine wrappers already follow this convention:

We have 'coroutine_fn bdrv_co_<something>()' as the core function, and
a wrapper 'bdrv_<something>()' which does parameter packing and calls
bdrv_run_co().

The only outsiders are the bdrv_prwv_co and
bdrv_common_block_status_above wrappers. Let's refactor them to behave
as the others, it simplifies further conversion of coroutine wrappers.

This patch adds an indirection layer, but it will be compensated by
a further commit, which will drop bdrv_co_prwv together with the
is_write logic, to keep the read and write paths separate.
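
The convention described above (a core function, an entry point that unpacks a parameter struct, and a wrapper that packs parameters and hands them to a runner) can be sketched outside QEMU as follows. All demo_* names are hypothetical, and demo_run_co() simply calls the entry point directly, whereas bdrv_run_co() in QEMU runs it in coroutine context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter pack; loosely modelled on RwCo, not QEMU's type. */
typedef struct DemoRwCo {
    int64_t offset;
    size_t bytes;
    bool is_write;
} DemoRwCo;

static int demo_reads;   /* counts requests routed to the read path */
static int demo_writes;  /* counts requests routed to the write path */

/* Core function: plays the role of a 'coroutine_fn bdrv_co_*()'. */
static int demo_co_prwv(int64_t offset, size_t bytes, bool is_write)
{
    if (offset < 0) {
        return -1;                  /* reject invalid requests */
    }
    if (is_write) {
        demo_writes++;
    } else {
        demo_reads++;
    }
    return (int)bytes;
}

typedef int DemoEntryFn(void *opaque);

/* Stand-in for bdrv_run_co(): here it just invokes the entry point. */
static int demo_run_co(DemoEntryFn *entry, void *opaque)
{
    return entry(opaque);
}

/* Entry point: unpack the parameters and call the core function. */
static int demo_rw_co_entry(void *opaque)
{
    DemoRwCo *rwco = opaque;

    assert(rwco != NULL);
    return demo_co_prwv(rwco->offset, rwco->bytes, rwco->is_write);
}

/* Wrapper: pack the parameters and hand them to the runner. */
static int demo_prwv(int64_t offset, size_t bytes, bool is_write)
{
    DemoRwCo rwco = {
        .offset = offset,
        .bytes = bytes,
        .is_write = is_write,
    };

    return demo_run_co(demo_rw_co_entry, &rwco);
}
```

With this shape, every wrapper differs only in its parameter struct and the core function it forwards to, which is presumably what makes the later conversion of wrappers mechanical.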

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Stefan Hajnoczi 
Message-Id: <20200924185414.28642-3-vsement...@virtuozzo.com>
---
 block/io.c | 60 +-
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/io.c b/block/io.c
index 11df1889f1..b4f6ab0ab1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -933,27 +933,31 @@ typedef struct RwCo {
 BdrvRequestFlags flags;
 } RwCo;
 
+static int coroutine_fn bdrv_co_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
+{
+if (is_write) {
+return bdrv_co_pwritev(child, offset, qiov->size, qiov, flags);
+} else {
+return bdrv_co_preadv(child, offset, qiov->size, qiov, flags);
+}
+}
+
 static int coroutine_fn bdrv_rw_co_entry(void *opaque)
 {
 RwCo *rwco = opaque;
 
-if (!rwco->is_write) {
-return bdrv_co_preadv(rwco->child, rwco->offset,
-  rwco->qiov->size, rwco->qiov,
-  rwco->flags);
-} else {
-return bdrv_co_pwritev(rwco->child, rwco->offset,
-   rwco->qiov->size, rwco->qiov,
-   rwco->flags);
-}
+return bdrv_co_prwv(rwco->child, rwco->offset, rwco->qiov,
+rwco->is_write, rwco->flags);
 }
 
 /*
  * Process a vectored synchronous request using coroutines
  */
-static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
-QEMUIOVector *qiov, bool is_write,
-BdrvRequestFlags flags)
+static int bdrv_prwv(BdrvChild *child, int64_t offset,
+ QEMUIOVector *qiov, bool is_write,
+ BdrvRequestFlags flags)
 {
 RwCo rwco = {
 .child = child,
@@ -971,8 +975,7 @@ int bdrv_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
 
-return bdrv_prwv_co(child, offset, &qiov, true,
-BDRV_REQ_ZERO_WRITE | flags);
+return bdrv_prwv(child, offset, &qiov, true, BDRV_REQ_ZERO_WRITE | flags);
 }
 
 /*
@@ -1021,7 +1024,7 @@ int bdrv_preadv(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, false, 0);
+ret = bdrv_prwv(child, offset, qiov, false, 0);
 if (ret < 0) {
 return ret;
 }
@@ -1045,7 +1048,7 @@ int bdrv_pwritev(BdrvChild *child, int64_t offset, QEMUIOVector *qiov)
 {
 int ret;
 
-ret = bdrv_prwv_co(child, offset, qiov, true, 0);
+ret = bdrv_prwv(child, offset, qiov, true, 0);
 if (ret < 0) {
 return ret;
 }
@@ -2449,14 +2452,15 @@ early_out:
 return ret;
 }
 
-static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
-   BlockDriverState *base,
-   bool want_zero,
-   int64_t offset,
-   int64_t bytes,
-   int64_t *pnum,
-   int64_t *map,
-   BlockDriverState **file)
+static int coroutine_fn
+bdrv_co_common_block_status_above(BlockDriverState *bs,
+  BlockDriverState *base,
+  bool want_zero,
+  int64_t offset,
+  int64_t bytes,
+  int64_t *pnum,
+  int64_t *map,
+  BlockDriverState **file)
 {
 BlockDriverState *p;
 int ret = 0;
@@ -2494,10 +2498,10 @@ static int coroutine_fn bdrv_block_status_above_co_entry(void *opaque)
 {
 BdrvCoBlockStatusData *data = opaque;
 
-return bdrv_co_block_status_above(data->bs, data->base,
-  data->want_zero,
-  data->offset, data->bytes,
-  data->pnum, data->map, data->file);
+

[PULL 15/17] docs: add 'io_uring' option to 'aio' param in qemu-options.hx

2020-09-30 Thread Stefan Hajnoczi

From: Stefano Garzarella 

When we added io_uring AIO engine, we forgot to update qemu-options.hx,
so qemu(1) man page and qemu help were outdated.

Signed-off-by: Stefano Garzarella 
Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Julia Suvorova 
Reviewed-by: Pankaj Gupta 
Message-Id: <20200924151511.131471-1-sgarz...@redhat.com>
---
 qemu-options.hx | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 3564c2303f..1da52a269c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1053,7 +1053,8 @@ SRST
 The path to the image file in the local filesystem
 
 ``aio``
-Specifies the AIO backend (threads/native, default: threads)
+Specifies the AIO backend (threads/native/io_uring,
+default: threads)
 
 ``locking``
 Specifies whether the image file is protected with Linux OFD
@@ -1175,7 +1176,8 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
 "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
 "   [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
 "   [,snapshot=on|off][,rerror=ignore|stop|report]\n"
-"   [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n"
+"   [,werror=ignore|stop|report|enospc][,id=name]\n"
+"   [,aio=threads|native|io_uring]\n"
 "   [,readonly=on|off][,copy-on-read=on|off]\n"
 "   [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n"
 "   [[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n"
@@ -1247,8 +1249,8 @@ SRST
 The default mode is ``cache=writeback``.
 
 ``aio=aio``
-aio is "threads", or "native" and selects between pthread based
-disk I/O and native Linux AIO.
+aio is "threads", "native", or "io_uring" and selects between pthread
+based disk I/O, native Linux AIO, or Linux io_uring API.
 
 ``format=format``
 Specify which disk format will be used rather than detecting the
-- 
2.26.2
