[PATCH] qdev-monitor: QAPIfy QMP device_add

2024-07-08 Thread Stefan Hajnoczi
The QMP device_add monitor command converts the QDict arguments to
QemuOpts and then back again to QDict. This process only supports scalar
types. Device properties like virtio-blk-pci's iothread-vq-mapping (an
array of objects) are silently dropped by qemu_opts_from_qdict() during
the QemuOpts conversion even though QAPI is capable of validating them.
As a result, hotplugging virtio-blk-pci devices with the
iothread-vq-mapping property does not work as expected (the property is
ignored). It's time to QAPIfy QMP device_add!
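
To make the failure mode concrete, here is a minimal, self-contained sketch of a scalar-only conversion that drops list-valued entries without reporting an error. It does not use QEMU's QDict or QemuOpts; ValueKind, Entry, flatten_scalars() and has_key() are hypothetical stand-ins invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a typed QMP argument (not QEMU's QObject). */
typedef enum { VAL_STRING, VAL_INT, VAL_LIST } ValueKind;

typedef struct {
    const char *key;
    ValueKind kind;
} Entry;

/*
 * Copy only scalar entries from src to dst, mimicking a conversion that has
 * no representation for lists or nested objects. Returns the number of
 * entries kept; non-scalar entries are skipped without reporting an error.
 */
static size_t flatten_scalars(const Entry *src, size_t n, Entry *dst)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i].kind == VAL_LIST) {
            continue; /* silently dropped */
        }
        dst[kept++] = src[i];
    }
    return kept;
}

/* Return 1 if key is present among the first n entries, else 0. */
static int has_key(const Entry *entries, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(entries[i].key, key) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Passing the typed arguments through unchanged, as this patch does for QMP, avoids the lossy step entirely.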

Get rid of the QemuOpts conversion in qmp_device_add() and call
qdev_device_add_from_qdict() with from_json=true. Using the QMP
command's QDict arguments directly allows non-scalar properties.

HMP is also adjusted because qmp_device_add() now expects properly
typed JSON arguments and cannot be used from HMP anymore. Move the code
that was previously in qmp_device_add() (with QemuOpts conversion and
from_json=false) into hmp_device_add() so that its behavior is
unchanged.

This patch changes the behavior of QMP device_add but not HMP
device_add. QMP clients that sent incorrectly typed device_add QMP
commands no longer work. This is a breaking change but clients should be
using the correct types already. See the netdev_add QAPIfication in
commit db2a380c8457 for similar reasoning.

Markus helped me figure this out and even provided a draft patch. The
code ended up very close to what he suggested.

Suggested-by: Markus Armbruster 
Cc: Daniel P. Berrangé 
Signed-off-by: Stefan Hajnoczi 
---
 system/qdev-monitor.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 6af6ef7d66..1427aa173c 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -849,18 +849,9 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
 
 void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
 {
-    QemuOpts *opts;
     DeviceState *dev;
 
-    opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, errp);
-    if (!opts) {
-        return;
-    }
-    if (!monitor_cur_is_qmp() && qdev_device_help(opts)) {
-        qemu_opts_del(opts);
-        return;
-    }
-    dev = qdev_device_add(opts, errp);
+    dev = qdev_device_add_from_qdict(qdict, true, errp);
     if (!dev) {
         /*
          * Drain all pending RCU callbacks. This is done because
@@ -872,8 +863,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
          * to the user
          */
         drain_call_rcu();
-
-        qemu_opts_del(opts);
         return;
     }
     object_unref(OBJECT(dev));
@@ -967,8 +956,34 @@ void qmp_device_del(const char *id, Error **errp)
 void hmp_device_add(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
+    QemuOpts *opts;
+    DeviceState *dev;
 
-    qmp_device_add((QDict *)qdict, NULL, &err);
+    opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &err);
+    if (!opts) {
+        goto out;
+    }
+    if (qdev_device_help(opts)) {
+        qemu_opts_del(opts);
+        return;
+    }
+    dev = qdev_device_add(opts, &err);
+    if (!dev) {
+        /*
+         * Drain all pending RCU callbacks. This is done because
+         * some bus related operations can delay a device removal
+         * (in this case this can happen if device is added and then
+         * removed due to a configuration error)
+         * to a RCU callback, but user might expect that this interface
+         * will finish its job completely once qmp command returns result
+         * to the user
+         */
+        drain_call_rcu();
+
+        qemu_opts_del(opts);
+    }
+    object_unref(OBJECT(dev));
+out:
     hmp_handle_error(mon, err);
 }
 
-- 
2.45.2




Re: [PATCH v7 00/10] Support persistent reservation operations

2024-07-08 Thread Stefan Hajnoczi
On Fri, Jul 05, 2024 at 06:56:04PM +0800, Changqi Lu wrote:
> Hi,
> 
> Patch v7 has been modified.
> Thanks again to Stefan for reviewing the code.
> 
> v6->v7:
> - Add buferlen size check at SCSI layer.
> - Add pr_cap calculation in bdrv_merge_limits() function at block layer,
>   so the ugly bs->file->bs->bl.pr_cap in scsi and nvme layers was
>   changed to bs->bl.pr_cap.
> - Fix memory leak at iscsi driver, and some other spelling errors.

I have left comments. Thanks!

Stefan




Re: [PATCH v7 06/10] block/nvme: add reservation command protocol constants

2024-07-08 Thread Stefan Hajnoczi
On Fri, Jul 05, 2024 at 06:56:10PM +0800, Changqi Lu wrote:
> Add constants for the NVMe persistent command protocol.
> The constants include the reservation command opcode and
> reservation type values defined in section 7 of the NVMe
> 2.0 specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/block/nvme.h | 61 
>  1 file changed, 61 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v7 10/10] block/iscsi: add persistent reservation in/out driver

2024-07-08 Thread Stefan Hajnoczi
On Fri, Jul 05, 2024 at 06:56:14PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations for iscsi driver.
> The following methods are implemented: bdrv_co_pr_read_keys,
> bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
> bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/iscsi.c | 431 ++
>  1 file changed, 431 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 2ff14b7472..9a546f48de 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -96,6 +96,7 @@ typedef struct IscsiLun {
>  unsigned long *allocmap_valid;
>  long allocmap_size;
>  int cluster_size;
> +uint8_t pr_cap;
>  bool use_16_for_rw;
>  bool write_protected;
>  bool lbpme;
> @@ -280,6 +281,10 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>  iTask->err_code = -error;
>  iTask->err_str = g_strdup(iscsi_get_error(iscsi));
>  }
> +} else if (status == SCSI_STATUS_RESERVATION_CONFLICT) {
> +iTask->err_code = -EBADE;
> +error_report("iSCSI Persistent Reservation Conflict: %s",
> + iscsi_get_error(iscsi));
>  }
>  }
>  }
> @@ -1792,6 +1797,52 @@ static void iscsi_save_designator(IscsiLun *lun,
>  }
>  }
>  
> +static void iscsi_get_pr_cap_sync(IscsiLun *iscsilun, Error **errp)
> +{
> +struct scsi_task *task = NULL;
> +struct scsi_persistent_reserve_in_report_capabilities *rc = NULL;
> +int retries = ISCSI_CMD_RETRIES;
> +int xferlen = sizeof(struct scsi_persistent_reserve_in_report_capabilities);
> +
> +do {
> +if (task != NULL) {
> +scsi_free_scsi_task(task);
> +task = NULL;
> +}
> +
> +task = iscsi_persistent_reserve_in_sync(iscsilun->iscsi,
> +   iscsilun->lun, SCSI_PR_IN_REPORT_CAPABILITIES, xferlen);
> +if (task != NULL && task->status == SCSI_STATUS_GOOD) {
> +rc = scsi_datain_unmarshall(task);
> +if (rc == NULL) {
> +error_setg(errp,
> +"iSCSI: Failed to unmarshall report capabilities data.");
> +} else {
> +iscsilun->pr_cap =
> +scsi_pr_cap_to_block(rc->persistent_reservation_type_mask);
> +iscsilun->pr_cap |= (rc->ptpl_a) ? BLK_PR_CAP_PTPL : 0;
> +}
> +break;
> +}
> +
> +if (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> +&& task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> +break;
> +}
> +
> +} while (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> + && task->sense.key == SCSI_SENSE_UNIT_ATTENTION
> + && retries-- > 0);

The if statement has the same condition as the while statement (except
for the retry counter). It looks like the retry logic won't work in
practice because the if statement breaks out of the loop first.
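
For illustration, a retry loop without the early break could look like the sketch below. It is self-contained and does not call libiscsi; the status enum, SendCmdFn, send_with_retry(), FakeTarget and fake_send() are hypothetical names made up for this example, not code from the patch.

```c
/* Hypothetical status codes standing in for the SCSI status/sense checks. */
enum { ST_GOOD, ST_UNIT_ATTENTION, ST_OTHER_ERROR };

/* Hypothetical command callback: returns one of the status codes above. */
typedef int (*SendCmdFn)(void *opaque);

/*
 * Send a command, retrying only while the target reports UNIT ATTENTION.
 * Returns the final status. max_retries bounds the number of re-sends, so
 * the command is issued at most max_retries + 1 times.
 */
static int send_with_retry(SendCmdFn send, void *opaque, int max_retries)
{
    int status;

    do {
        status = send(opaque);
        /* No early break on UNIT ATTENTION: the loop condition decides. */
    } while (status == ST_UNIT_ATTENTION && max_retries-- > 0);

    return status;
}

/* Test double: reports UNIT ATTENTION a fixed number of times, then GOOD. */
typedef struct {
    int attentions_left;
    int calls;
} FakeTarget;

static int fake_send(void *opaque)
{
    FakeTarget *t = opaque;

    t->calls++;
    if (t->attentions_left > 0) {
        t->attentions_left--;
        return ST_UNIT_ATTENTION;
    }
    return ST_GOOD;
}
```

With a FakeTarget that reports UNIT ATTENTION twice, send_with_retry(fake_send, &target, 5) issues the command three times and returns ST_GOOD.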

> +
> +if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +error_setg(errp, "iSCSI: failed to send report capabilities command");
> +}

Did you test this function against a SCSI target that does not implement
the optional PERSISTENT RESERVE IN operation? iscsi_open() must succeed
when the target does not implement this.
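
One way to express that requirement is to separate "the target does not support the command" from "the probe itself failed". The sketch below is only an illustration under assumptions: ProbeResult and apply_pr_probe() are hypothetical, and treating other I/O errors as fatal for open is my assumption, not something the patch or libiscsi defines.

```c
#include <stdint.h>

/* Hypothetical probe outcomes; these are not libiscsi status values. */
typedef enum { PROBE_OK, PROBE_UNSUPPORTED, PROBE_IO_ERROR } ProbeResult;

/*
 * Decide what open should do with the result of the capability probe.
 * Stores the capability mask in *pr_cap and returns 0 if open may continue
 * or -1 if it should fail. An unsupported optional command is not an error:
 * the device simply advertises no persistent reservation capabilities.
 */
static int apply_pr_probe(ProbeResult res, uint8_t reported_cap,
                          uint8_t *pr_cap)
{
    switch (res) {
    case PROBE_OK:
        *pr_cap = reported_cap;
        return 0;
    case PROBE_UNSUPPORTED:
        *pr_cap = 0;
        return 0;
    default:
        /* Assumption: any other failure is treated as fatal. */
        *pr_cap = 0;
        return -1;
    }
}
```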

> +
> +if (task) {
> +scsi_free_scsi_task(task);
> +}
> +}
> +
>  static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>Error **errp)
>  {
> @@ -2024,6 +2075,11 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>  bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
>  }
>  
> +iscsi_get_pr_cap_sync(iscsilun, &local_err);
> +if (local_err != NULL) {
> +error_propagate(errp, local_err);
> +ret = -EINVAL;
> +}
>  out:
>  qemu_opts_del(opts);
>  g_free(initiator_name);
> @@ -2110,6 +2166,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>  bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
>  iscsilun->block_size);
>  }
> +
> +bs->bl.pr_cap = iscsilun->pr_cap;
>  }
>  
>  /* Note that this will not re-establish a connection with an iSCSI target - it
> @@ -2408,6 +2466,371 @@ out_unlock:
>  return r;
>  }
>  
> +static int coroutine_fn
> +iscsi_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
> +  uint32_t num_keys, uint64_t *keys)
> +{
> +IscsiLun *iscsilun = bs->opaque;
> +QEMUIOVector qiov;
> +struct IscsiTask iTask;
> +int xferlen = sizeof(struct scsi_persistent_reserve_in_read_keys) +
> +  sizeof(uint64_t) * num_keys;
> +uint8_t *buf = 

Re: [PATCH v7 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-08 Thread Stefan Hajnoczi
On Fri, Jul 05, 2024 at 06:56:09PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/scsi/scsi-disk.c | 368 
>  1 file changed, 368 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..f0c3ce774f 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
>  #include 
> @@ -1474,6 +1476,362 @@ static void scsi_disk_emulate_read_data(SCSIRequest *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +SCSIDiskReq *req;
> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +SCSIDiskReq *req;
> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);

The behavior of scsi_disk_req_check_error() above is strange for
pr_read_keys operations. When --drive ...,rerror=ignore and ret < 0 this
line is reached and we don't want a negative num_keys value. It would be
safer to use (ret > 0 ? ret : 0) instead of the raw value of ret.
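
A small sketch of that clamping, written as a standalone helper. pr_keys_to_copy() is a hypothetical name for this illustration; the patch itself computes the value inline with MIN().

```c
#include <stdint.h>

/*
 * Number of keys to copy into the response buffer: never more than the
 * buffer can hold (max_keys) and never negative when ret carries a
 * negative errno value.
 */
static uint32_t pr_keys_to_copy(uint32_t max_keys, int ret)
{
    uint32_t returned = ret > 0 ? (uint32_t)ret : 0;

    return returned < max_keys ? returned : max_keys;
}
```

With this helper a negative ret yields 0 keys, so the copy loop does not run and the length field is never computed from a negative count.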

Stefan




Re: [PATCH v7 08/10] hw/nvme: enable ONCS and rescap function

2024-07-08 Thread Stefan Hajnoczi
On Fri, Jul 05, 2024 at 06:56:12PM +0800, Changqi Lu wrote:
> This commit enables ONCS to support the reservation
> function at the controller level. Also enables rescap
> function in the namespace by detecting the supported reservation
> function in the backend driver.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/nvme/ctrl.c   | 3 ++-
>  hw/nvme/ns.c | 5 +
>  include/block/nvme.h | 2 +-
>  3 files changed, 8 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v6 08/10] hw/nvme: enable ONCS and rescap function

2024-07-05 Thread Stefan Hajnoczi
On Thu, Jul 04, 2024 at 08:20:31PM +0200, Stefan Hajnoczi wrote:
> On Thu, Jun 13, 2024 at 03:13:25PM +0800, Changqi Lu wrote:
> > This commit enables ONCS to support the reservation
> > function at the controller level. Also enables rescap
> > function in the namespace by detecting the supported reservation
> > function in the backend driver.
> > 
> > Signed-off-by: Changqi Lu 
> > Signed-off-by: zhenwei pi 
> > ---
> >  hw/nvme/ctrl.c | 3 ++-
> >  hw/nvme/ns.c   | 5 +
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index 127c3d2383..182307a48b 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8248,7 +8248,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
> >  id->nn = cpu_to_le32(NVME_MAX_NAMESPACES);
> >  id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
> > NVME_ONCS_FEATURES | NVME_ONCS_DSM |
> > -   NVME_ONCS_COMPARE | NVME_ONCS_COPY);
> > +   NVME_ONCS_COMPARE | NVME_ONCS_COPY |
> > +   NVME_ONCS_RESRVATIONS);
> 
> RESRVATIONS -> RESERVATIONS typo?
> 
> >  
> >  /*
> >   * NOTE: If this device ever supports a command set that does NOT use 0x0
> > diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> > index ea8db175db..320c9bf658 100644
> > --- a/hw/nvme/ns.c
> > +++ b/hw/nvme/ns.c
> > @@ -20,6 +20,7 @@
> >  #include "qemu/bitops.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/block-backend.h"
> > +#include "block/block_int.h"
> >  
> >  #include "nvme.h"
> >  #include "trace.h"
> > @@ -33,6 +34,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
> >  BlockDriverInfo bdi;
> >  int npdg, ret;
> >  int64_t nlbas;
> > +uint8_t blk_pr_cap;
> >  
> >  ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
> >  ns->lbasz = 1 << ns->lbaf.ds;
> > @@ -55,6 +57,9 @@ void nvme_ns_init_format(NvmeNamespace *ns)
> >  }
> >  
> >  id_ns->npda = id_ns->npdg = npdg - 1;
> > +
> > +blk_pr_cap = blk_bs(ns->blkconf.blk)->file->bs->bl.pr_cap;
> 
> Kevin: This unprotected block graph access and the assumption that
> ->file->bs exists could be problematic. What is the best practice for
> making this code safe and defensive?

I posted the following reply in another sub-thread and it seems worth
mentioning here:

"->file could be NULL if the SCSI disk points directly to
--blockdev file without a --blockdev raw on top. I think the block layer
should propagate pr_cap from the leaves of the block graph to the root
node via bdrv_merge_limits() so that traversing the graph (->file) is
not necessary. Instead this line should just be bs->bl.pr_cap."

I think ->file shouldn't be accessed at all. That also sidesteps the
block graph locking question.
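
As a rough illustration of the idea, the sketch below propagates a capability mask from the leaf of a toy node chain to its root, so a device model only ever reads the root's merged value. Node and refresh_limits() are hypothetical simplifications, not QEMU's BlockDriverState or bdrv_refresh_limits(), and the "inherit the child's value" rule is an assumption; the real merge rule in bdrv_merge_limits() may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified block node; not QEMU's BlockDriverState. */
typedef struct Node {
    struct Node *file;      /* child node, or NULL for a leaf node */
    uint8_t driver_pr_cap;  /* capabilities the node's own driver reports */
    uint8_t pr_cap;         /* merged limit, valid after refresh_limits() */
} Node;

/*
 * Recompute pr_cap bottom-up. A leaf takes its driver's value; a node with
 * a child inherits the child's merged value (assumed rule). Callers read
 * only root->pr_cap and never dereference ->file themselves.
 */
static void refresh_limits(Node *node)
{
    if (node->file != NULL) {
        refresh_limits(node->file);
        node->pr_cap = node->file->pr_cap;
    } else {
        node->pr_cap = node->driver_pr_cap;
    }
}
```

The same read of root->pr_cap works whether the root is a format node on top of a protocol node or the protocol node itself, which is the case where ->file is NULL.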

> 
> > +id_ns->rescap = block_pr_cap_to_nvme(blk_pr_cap);
> >  }
> >  
> >  static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
> > -- 
> > 2.20.1
> > 






Re: [PATCH v6 09/10] hw/nvme: add reservation protocal command

2024-07-04 Thread Stefan Hajnoczi
I will skip this since Klaus Jensen's review is required for NVMe anyway.

Stefan




Re: [PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/scsi/scsi-disk.c | 352 
>  1 file changed, 352 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..0e964dbd87 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
>  #include 
> @@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;
> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;
> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);
> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +/* The request is used as the AIO opaque value, so add a ref.  */
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t additional_len = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +blk_rsv->generation = cpu_to_be32(blk_rsv->generation);
> +memcpy(&buf[0], &blk_rsv->generation, 4);
> +if (ret) {
> +additional_len = cpu_to_be32(16);
> +blk_rsv->key = cpu_to_be64(blk_rsv->key);
> +memcpy(&buf[8], &blk_rsv->key, 8);
> +buf[21] = block_pr_type_to_scsi(blk_rsv->type) & 0xf;
> +} else {
> +additional_len = cpu_to_be32(0);
> +}
> +
> +memcpy(&buf[4], &additional_len, 4);
> + 

Re: [PATCH v6 10/10] block/iscsi: add persistent reservation in/out driver

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:27PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations for iscsi driver.
> The following methods are implemented: bdrv_co_pr_read_keys,
> bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
> bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/iscsi.c | 443 ++
>  1 file changed, 443 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 2ff14b7472..d94ebe35bd 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -96,6 +96,7 @@ typedef struct IscsiLun {
>  unsigned long *allocmap_valid;
>  long allocmap_size;
>  int cluster_size;
> +uint8_t pr_cap;
>  bool use_16_for_rw;
>  bool write_protected;
>  bool lbpme;
> @@ -280,6 +281,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
> status,
>  iTask->err_code = -error;
>  iTask->err_str = g_strdup(iscsi_get_error(iscsi));
>  }
> +} else if (status == SCSI_STATUS_RESERVATION_CONFLICT) {
> +iTask->err_code = -EBADE;

Should err_str be set too? For example, iscsi_co_writev() seems to
assume err_str is set if the iSCSI task fails.
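
To illustrate the concern, here is a minimal standalone sketch, not the actual block/iscsi.c code. MockIscsiTask, mock_strdup() and mock_set_reservation_conflict() are hypothetical names, and the message text is an assumption. It only shows the idea of setting a message alongside the error code, so that a caller which prints err_str never sees NULL:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Fallback for hosts without EBADE; an assumption made for portability. */
#ifndef EBADE
#define EBADE EINVAL
#endif

/* SCSI status code for RESERVATION CONFLICT. */
#define MOCK_STATUS_RESERVATION_CONFLICT 0x18

/* Hypothetical stand-in for IscsiTask; only the two fields discussed. */
typedef struct MockIscsiTask {
    int err_code;
    char *err_str;
} MockIscsiTask;

/* Portable string duplicate (plays the role of g_strdup() in this mock). */
static char *mock_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}

/* Set both the error code and a message on a reservation conflict. */
static void mock_set_reservation_conflict(MockIscsiTask *task, int status)
{
    if (status == MOCK_STATUS_RESERVATION_CONFLICT) {
        task->err_code = -EBADE;
        free(task->err_str);
        task->err_str = mock_strdup("iSCSI: reservation conflict");
    }
}
```

With this shape, error paths that format err_str into a message can stay unchanged.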

>  }
>  }
>  }
> @@ -1792,6 +1795,52 @@ static void iscsi_save_designator(IscsiLun *lun,
>  }
>  }
>  
> +static void iscsi_get_pr_cap_sync(IscsiLun *iscsilun, Error **errp)
> +{
> +struct scsi_task *task = NULL;
> +struct scsi_persistent_reserve_in_report_capabilities *rc = NULL;
> +int retries = ISCSI_CMD_RETRIES;
> +int xferlen = sizeof(struct 
> scsi_persistent_reserve_in_report_capabilities);
> +
> +do {
> +if (task != NULL) {
> +scsi_free_scsi_task(task);
> +task = NULL;
> +}
> +
> +task = iscsi_persistent_reserve_in_sync(iscsilun->iscsi,
> +   iscsilun->lun, SCSI_PR_IN_REPORT_CAPABILITIES, xferlen);
> +if (task != NULL && task->status == SCSI_STATUS_GOOD) {
> +rc = scsi_datain_unmarshall(task);
> +if (rc == NULL) {
> +error_setg(errp,
> +"iSCSI: Failed to unmarshall report capabilities data.");
> +} else {
> +iscsilun->pr_cap =
> +
> scsi_pr_cap_to_block(rc->persistent_reservation_type_mask);
> +iscsilun->pr_cap |= (rc->ptpl_a) ? BLK_PR_CAP_PTPL : 0;
> +}
> +break;
> +}
> +
> +if (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> +&& task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> +break;
> +}
> +
> +} while (task != NULL && task->status == SCSI_STATUS_CHECK_CONDITION
> + && task->sense.key == SCSI_SENSE_UNIT_ATTENTION
> + && retries-- > 0);
> +
> +if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +error_setg(errp, "iSCSI: failed to send report capabilities 
> command");
> +}
> +
> +if (task) {
> +scsi_free_scsi_task(task);
> +}
> +}
> +
>  static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>Error **errp)
>  {
> @@ -2024,6 +2073,11 @@ static int iscsi_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
>  }
>  
> +iscsi_get_pr_cap_sync(iscsilun, &local_err);
> +if (local_err != NULL) {
> +error_propagate(errp, local_err);
> +ret = -EINVAL;
> +}
>  out:
>  qemu_opts_del(opts);
>  g_free(initiator_name);
> @@ -2110,6 +2164,8 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
>  iscsilun->block_size);
>  }
> +
> +bs->bl.pr_cap = iscsilun->pr_cap;
>  }
>  
>  /* Note that this will not re-establish a connection with an iSCSI target - 
> it
> @@ -2408,6 +2464,385 @@ out_unlock:
>  return r;
>  }
>  
> +static int coroutine_fn
> +iscsi_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation,
> +  uint32_t num_keys, uint64_t *keys)
> +{
> +IscsiLun *iscsilun = bs->opaque;
> +QEMUIOVector qiov;
> +struct IscsiTask iTask;
> +int xferlen = sizeof(struct scsi_persistent_reserve_in_read_keys) +
> +  sizeof(uint64_t) * num_keys;
> +uint8_t *buf = g_malloc0(xferlen);
> +int32_t num_collect_keys = 0;
> +int r = 0;
> +
> +qemu_iovec_init_buf(&qiov, buf, xferlen);
> +iscsi_co_init_iscsitask(iscsilun, &iTask);
> +qemu_mutex_lock(&iscsilun->mutex);
> +retry:
> +iTask.task = iscsi_persistent_reserve_in_task(iscsilun->iscsi,
> + iscsilun->lun, SCSI_PR_IN_READ_KEYS, xferlen,
> + iscsi_co_generic_cb, &iTask);
> +
> +if 

Re: [PATCH v6 06/10] block/nvme: add reservation command protocol constants

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:23PM +0800, Changqi Lu wrote:
> Add constants for the NVMe persistent command protocol.
> The constants include the reservation command opcode and
> reservation type values defined in section 7 of the NVMe
> 2.0 specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/block/nvme.h | 61 
>  1 file changed, 61 insertions(+)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index bb231d0b9a..da6ccb0f3b 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -633,6 +633,10 @@ enum NvmeIoCommands {
>  NVME_CMD_WRITE_ZEROES   = 0x08,
>  NVME_CMD_DSM= 0x09,
>  NVME_CMD_VERIFY = 0x0c,
> +NVME_CMD_RESV_REGISTER  = 0x0d,
> +NVME_CMD_RESV_REPORT= 0x0e,
> +NVME_CMD_RESV_ACQUIRE   = 0x11,
> +NVME_CMD_RESV_RELEASE   = 0x15,
>  NVME_CMD_IO_MGMT_RECV   = 0x12,

Keep NVME_CMD_IO_MGMT_RECV (0x12) before NVME_CMD_RESV_RELEASE (0x15) in
sorted order?
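
For illustration, the ordering I have in mind would look roughly like the excerpt below. It uses only the opcode values quoted in this hunk; the nvme_cmd_excerpt array and opcodes_strictly_increasing() are hypothetical helpers added just to make the ordering checkable, not something the header needs:

```c
#include <assert.h>
#include <stddef.h>

/* Excerpt only: values as quoted in the patch, listed in opcode order. */
enum NvmeIoCommandsExcerpt {
    NVME_CMD_VERIFY         = 0x0c,
    NVME_CMD_RESV_REGISTER  = 0x0d,
    NVME_CMD_RESV_REPORT    = 0x0e,
    NVME_CMD_RESV_ACQUIRE   = 0x11,
    NVME_CMD_IO_MGMT_RECV   = 0x12,
    NVME_CMD_RESV_RELEASE   = 0x15,
    NVME_CMD_COPY           = 0x19,
};

/* The same entries in declaration order, so the order can be verified. */
static const int nvme_cmd_excerpt[] = {
    NVME_CMD_VERIFY, NVME_CMD_RESV_REGISTER, NVME_CMD_RESV_REPORT,
    NVME_CMD_RESV_ACQUIRE, NVME_CMD_IO_MGMT_RECV, NVME_CMD_RESV_RELEASE,
    NVME_CMD_COPY,
};

/* Returns 1 if every opcode is larger than the one before it. */
static int opcodes_strictly_increasing(const int *ops, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (ops[i] <= ops[i - 1]) {
            return 0;
        }
    }
    return 1;
}
```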

>  NVME_CMD_COPY   = 0x19,
>  NVME_CMD_IO_MGMT_SEND   = 0x1d,
> @@ -641,6 +645,63 @@ enum NvmeIoCommands {
>  NVME_CMD_ZONE_APPEND= 0x7d,
>  };
>  
> +typedef enum {
> +NVME_RESV_REGISTER_ACTION_REGISTER  = 0x00,
> +NVME_RESV_REGISTER_ACTION_UNREGISTER= 0x01,
> +NVME_RESV_REGISTER_ACTION_REPLACE   = 0x02,
> +} NvmeReservationRegisterAction;
> +
> +typedef enum {
> +NVME_RESV_RELEASE_ACTION_RELEASE= 0x00,
> +NVME_RESV_RELEASE_ACTION_CLEAR  = 0x01,
> +} NvmeReservationReleaseAction;
> +
> +typedef enum {
> +NVME_RESV_ACQUIRE_ACTION_ACQUIRE= 0x00,
> +NVME_RESV_ACQUIRE_ACTION_PREEMPT= 0x01,
> +NVME_RESV_ACQUIRE_ACTION_PREEMPT_AND_ABORT  = 0x02,
> +} NvmeReservationAcquireAction;
> +
> +typedef enum {
> +NVME_RESV_WRITE_EXCLUSIVE   = 0x01,
> +NVME_RESV_EXCLUSIVE_ACCESS  = 0x02,
> +NVME_RESV_WRITE_EXCLUSIVE_REGS_ONLY = 0x03,
> +NVME_RESV_EXCLUSIVE_ACCESS_REGS_ONLY= 0x04,
> +NVME_RESV_WRITE_EXCLUSIVE_ALL_REGS  = 0x05,
> +NVME_RESV_EXCLUSIVE_ACCESS_ALL_REGS = 0x06,
> +} NvmeResvType;
> +
> +typedef enum {
> +NVME_RESV_PTPL_NO_CHANGE = 0x00,
> +NVME_RESV_PTPL_DISABLE   = 0x02,
> +NVME_RESV_PTPL_ENABLE= 0x03,
> +} NvmeResvPTPL;
> +
> +typedef enum NVMEPrCap {
> +/* Persist Through Power Loss */
> +NVME_PR_CAP_PTPL = 1 << 0,
> +/* Write Exclusive reservation type */
> +NVME_PR_CAP_WR_EX = 1 << 1,
> +/* Exclusive Access reservation type */
> +NVME_PR_CAP_EX_AC = 1 << 2,
> +/* Write Exclusive Registrants Only reservation type */
> +NVME_PR_CAP_WR_EX_RO = 1 << 3,
> +/* Exclusive Access Registrants Only reservation type */
> +NVME_PR_CAP_EX_AC_RO = 1 << 4,
> +/* Write Exclusive All Registrants reservation type */
> +NVME_PR_CAP_WR_EX_AR = 1 << 5,
> +/* Exclusive Access All Registrants reservation type */
> +NVME_PR_CAP_EX_AC_AR = 1 << 6,
> +
> +NVME_PR_CAP_ALL = (NVME_PR_CAP_PTPL |
> +  NVME_PR_CAP_WR_EX |
> +  NVME_PR_CAP_EX_AC |
> +  NVME_PR_CAP_WR_EX_RO |
> +  NVME_PR_CAP_EX_AC_RO |
> +  NVME_PR_CAP_WR_EX_AR |
> +  NVME_PR_CAP_EX_AC_AR),
> +} NvmePrCap;
> +
>  typedef struct QEMU_PACKED NvmeDeleteQ {
>  uint8_t opcode;
>  uint8_t flags;
> -- 
> 2.20.1
> 


signature.asc
Description: PGP signature


Re: [PATCH v6 08/10] hw/nvme: enable ONCS and rescap function

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:25PM +0800, Changqi Lu wrote:
> This commit enables ONCS to support the reservation
> function at the controller level. Also enables rescap
> function in the namespace by detecting the supported reservation
> function in the backend driver.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  hw/nvme/ctrl.c | 3 ++-
>  hw/nvme/ns.c   | 5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 127c3d2383..182307a48b 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8248,7 +8248,8 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  id->nn = cpu_to_le32(NVME_MAX_NAMESPACES);
>  id->oncs = cpu_to_le16(NVME_ONCS_WRITE_ZEROES | NVME_ONCS_TIMESTAMP |
> NVME_ONCS_FEATURES | NVME_ONCS_DSM |
> -   NVME_ONCS_COMPARE | NVME_ONCS_COPY);
> +   NVME_ONCS_COMPARE | NVME_ONCS_COPY |
> +   NVME_ONCS_RESRVATIONS);

RESRVATIONS -> RESERVATIONS typo?

>  
>  /*
>   * NOTE: If this device ever supports a command set that does NOT use 0x0
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index ea8db175db..320c9bf658 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -20,6 +20,7 @@
>  #include "qemu/bitops.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/block-backend.h"
> +#include "block/block_int.h"
>  
>  #include "nvme.h"
>  #include "trace.h"
> @@ -33,6 +34,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  BlockDriverInfo bdi;
>  int npdg, ret;
>  int64_t nlbas;
> +uint8_t blk_pr_cap;
>  
>  ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
>  ns->lbasz = 1 << ns->lbaf.ds;
> @@ -55,6 +57,9 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  }
>  
>  id_ns->npda = id_ns->npdg = npdg - 1;
> +
> +blk_pr_cap = blk_bs(ns->blkconf.blk)->file->bs->bl.pr_cap;

Kevin: This unprotected block graph access and the assumption that
->file->bs exists could be problematic. What is the best practice for
making this code safe and defensive?
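
To make the second half of the concern concrete, here is a small standalone sketch using mock types (MockBDS, MockChild, MockLimits and mock_get_pr_cap() are all hypothetical, not QEMU APIs). It only covers the missing-child case by falling back to "no capabilities"; it does not address how the graph access itself should be protected, which is the part I am asking about:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the block layer types; only the fields used here. */
typedef struct MockBDS MockBDS;

typedef struct MockChild {
    MockBDS *bs;
} MockChild;

typedef struct MockLimits {
    uint8_t pr_cap;
} MockLimits;

struct MockBDS {
    MockChild *file;
    MockLimits bl;
};

/*
 * Return the pr_cap of the node's file child, or 0 (no reservation
 * capabilities) when any link in the chain is missing.
 */
static uint8_t mock_get_pr_cap(const MockBDS *bs)
{
    if (bs == NULL || bs->file == NULL || bs->file->bs == NULL) {
        return 0;
    }
    return bs->file->bs->bl.pr_cap;
}
```

Reporting zero capabilities for a node without a file child is an assumption; failing initialization with an error would be another option.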

> +id_ns->rescap = block_pr_cap_to_nvme(blk_pr_cap);
>  }
>  
>  static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
> -- 
> 2.20.1
> 


signature.asc
Description: PGP signature


Re: [PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 

As mentioned in my reply to a previous version, I don't understand the
buffer allocation/sizing in hw/scsi/ so I haven't been able to fully
review this code for buffer overflows and input validation. cmd.xfer
isn't consistently used for size checks in the new functions. Maybe some
checks are missing?

> ---
>  hw/scsi/scsi-disk.c | 352 
>  1 file changed, 352 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..0e964dbd87 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
> #include <scsi/sg.h>
> @@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest 
> *req)
>  scsi_req_complete(&r->req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;

Why is this field void * instead of SCSIDiskReq *?

> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;

Same here.

> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);

If buflen is an untrusted input then num_keys < 0 and maybe num_keys ==
0 need to be rejected with an error.
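
A standalone sketch of the kind of check I mean follows. pr_read_keys_capacity() and PR_READ_KEYS_HEADER_LEN are hypothetical names, and the 8-byte header size is taken from the sizeof(uint32_t) * 2 expression above. Note that sizeof yields size_t, so on typical hosts the subtraction in the quoted line is performed unsigned and a buflen below 8 wraps around instead of going negative cleanly. Checking the length before subtracting avoids that:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Generation (4 bytes) plus additional length (4 bytes). */
#define PR_READ_KEYS_HEADER_LEN (2 * sizeof(uint32_t))

/*
 * Return how many 8-byte keys fit into the smaller of xfer and buflen
 * after the header, or -1 if even the header does not fit.
 */
static int pr_read_keys_capacity(size_t xfer, size_t buflen)
{
    size_t len = xfer < buflen ? xfer : buflen;
    size_t n;

    if (len < PR_READ_KEYS_HEADER_LEN) {
        return -1; /* too small: reject before any subtraction */
    }
    n = (len - PR_READ_KEYS_HEADER_LEN) / sizeof(uint64_t);
    if (n > INT_MAX) {
        return -1; /* does not fit the int used for num_keys */
    }
    return (int)n;
}
```

The sketch returns 0 when only the header fits and leaves it to the caller to decide whether that case is an error.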

> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +/* The request is used as the AIO opaque value, so add a ref.  */
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, 
> &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, 
> blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t additional_len = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto 

Re: [PATCH v6 05/10] hw/scsi: add persistent reservation in/out api for scsi device

2024-07-04 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:22PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations in the
> SCSI device layer. By introducing the persistent
> reservation in/out api, this enables the SCSI device
> to perform reservation-related tasks, including querying
> keys, querying reservation status, registering reservation
> keys, initiating and releasing reservations, as well as
> clearing and preempting reservations held by other keys.
> 
> These operations are crucial for management and control of
> shared storage resources in a persistent manner.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 

As mentioned in my reply to a previous version, I don't understand the
buffer allocation/sizing in hw/scsi/ so I haven't been able to fully
review this code for buffer overflows and input validation. cmd.xfer
isn't consistently used for size checks in the new functions. Maybe some
checks are missing?

> ---
>  hw/scsi/scsi-disk.c | 352 
>  1 file changed, 352 insertions(+)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 4bd7af9d0c..0e964dbd87 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -32,6 +32,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/scsi/emulation.h"
>  #include "scsi/constants.h"
> +#include "scsi/utils.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
>  #include "hw/block/block.h"
> @@ -42,6 +43,7 @@
>  #include "qemu/cutils.h"
>  #include "trace.h"
>  #include "qom/object.h"
> +#include "block/block_int.h"
>  
>  #ifdef __linux
>  #include 
> @@ -1474,6 +1476,346 @@ static void scsi_disk_emulate_read_data(SCSIRequest 
> *req)
>  scsi_req_complete(>req, GOOD);
>  }
>  
> +typedef struct SCSIPrReadKeys {
> +uint32_t generation;
> +uint32_t num_keys;
> +uint64_t *keys;
> +void *req;

Why is this field void * instead of SCSIDiskReq *?

> +} SCSIPrReadKeys;
> +
> +typedef struct SCSIPrReadReservation {
> +uint32_t generation;
> +uint64_t key;
> +BlockPrType type;
> +void *req;

Same here.

> +} SCSIPrReadReservation;
> +
> +static void scsi_pr_read_keys_complete(void *opaque, int ret)
> +{
> +int num_keys;
> +uint8_t *buf;
> +SCSIPrReadKeys *blk_keys = (SCSIPrReadKeys *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_keys->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto done;
> +}
> +
> +buf = scsi_req_get_buf(&r->req);
> +num_keys = MIN(blk_keys->num_keys, ret);
> +blk_keys->generation = cpu_to_be32(blk_keys->generation);
> +memcpy(&buf[0], &blk_keys->generation, 4);
> +for (int i = 0; i < num_keys; i++) {
> +blk_keys->keys[i] = cpu_to_be64(blk_keys->keys[i]);
> +memcpy(&buf[8 + i * 8], &blk_keys->keys[i], 8);
> +}
> +num_keys = cpu_to_be32(num_keys * 8);
> +memcpy(&buf[4], &num_keys, 4);
> +
> +scsi_req_data(&r->req, r->buflen);
> +done:
> +scsi_req_unref(&r->req);
> +g_free(blk_keys->keys);
> +g_free(blk_keys);
> +}
> +
> +static int scsi_disk_emulate_pr_read_keys(SCSIRequest *req)
> +{
> +SCSIPrReadKeys *blk_keys;
> +SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
> +int buflen = MIN(r->req.cmd.xfer, r->buflen);
> +int num_keys = (buflen - sizeof(uint32_t) * 2) / sizeof(uint64_t);

If buflen is an untrusted input then num_keys < 0 and maybe num_keys ==
0 need to be rejected with an error.
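
For example, something along these lines (an untested sketch in plain C;
pr_read_keys_capacity() and PR_READ_KEYS_HDR_LEN are hypothetical names that
do not exist in the patch). It keeps the arithmetic in size_t and checks the
header size before subtracting, so a short buffer cannot wrap around into a
huge or negative key count. Whether a capacity of zero keys should also be an
error is left to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the READ KEYS header: PRGENERATION + ADDITIONAL LENGTH. */
#define PR_READ_KEYS_HDR_LEN (2 * sizeof(uint32_t))

/*
 * Hypothetical helper: work out how many 8-byte keys fit in the response.
 * xfer is the transfer length from the command, buflen the size of the
 * request buffer. Returns false (leaving *num_keys untouched) when the
 * usable length cannot even hold the header, so the caller can fail the
 * request instead of allocating a bogus number of keys.
 */
static bool pr_read_keys_capacity(size_t xfer, size_t buflen,
                                  uint32_t *num_keys)
{
    size_t len = xfer < buflen ? xfer : buflen;
    size_t n;

    assert(num_keys != NULL);
    if (len < PR_READ_KEYS_HDR_LEN) {
        return false;
    }

    n = (len - PR_READ_KEYS_HDR_LEN) / sizeof(uint64_t);
    if (n > UINT32_MAX) {
        return false;
    }
    *num_keys = (uint32_t)n;
    return true;
}
```

scsi_disk_emulate_pr_read_keys() could call a helper like this first and
complete the request with an error when it returns false.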

> +
> +blk_keys = g_new0(SCSIPrReadKeys, 1);
> +blk_keys->generation = 0;
> +/* num_keys is the maximum number of keys that can be transmitted */
> +blk_keys->num_keys = num_keys;
> +blk_keys->keys = g_malloc(sizeof(uint64_t) * num_keys);
> +blk_keys->req = r;
> +
> +/* The request is used as the AIO opaque value, so add a ref.  */
> +scsi_req_ref(&r->req);
> +r->req.aiocb = blk_aio_pr_read_keys(s->qdev.conf.blk, &blk_keys->generation,
> +blk_keys->num_keys, blk_keys->keys,
> +scsi_pr_read_keys_complete, blk_keys);
> +return 0;
> +}
> +
> +static void scsi_pr_read_reservation_complete(void *opaque, int ret)
> +{
> +uint8_t *buf;
> +uint32_t additional_len = 0;
> +SCSIPrReadReservation *blk_rsv = (SCSIPrReadReservation *)opaque;
> +SCSIDiskReq *r = (SCSIDiskReq *)blk_rsv->req;
> +SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
> +
> +assert(blk_get_aio_context(s->qdev.conf.blk) ==
> +qemu_get_current_aio_context());
> +
> +assert(r->req.aiocb != NULL);
> +r->req.aiocb = NULL;
> +
> +if (scsi_disk_req_check_error(r, ret, true)) {
> +goto 

Re: [PATCH v6 04/10] scsi/util: add helper functions for persistent reservation types conversion

2024-06-26 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:21PM +0800, Changqi Lu wrote:
> This commit introduces two helper functions
> that facilitate the conversion between the
> persistent reservation types used in the SCSI
> protocol and those used in the block layer.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/scsi/utils.h |  8 +
>  scsi/utils.c | 81 
>  2 files changed, 89 insertions(+)

Reviewed-by: Stefan Hajnoczi 


Re: [PATCH v6 03/10] scsi/constant: add persistent reservation in/out protocol constants

2024-06-26 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:20PM +0800, Changqi Lu wrote:
> Add constants for the persistent reservation in/out protocol
> in the scsi/constant module. The constants include the persistent
> reservation command, type, and scope values defined in sections
> 6.13 and 6.14 of the SCSI Primary Commands-4 (SPC-4) specification.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  include/scsi/constants.h | 52 
>  1 file changed, 52 insertions(+)

These new constants are not copied from Linux include/scsi/scsi_proto.h
like the rest of the file, but it's okay because constants.h is not
kept in sync with the Linux headers.

Reviewed-by: Stefan Hajnoczi 


Re: [PATCH v6 02/10] block/raw: add persistent reservation in/out driver

2024-06-26 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:19PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations for raw driver.
> The following methods are implemented: bdrv_co_pr_read_keys,
> bdrv_co_pr_read_reservation, bdrv_co_pr_register, bdrv_co_pr_reserve,
> bdrv_co_pr_release, bdrv_co_pr_clear and bdrv_co_pr_preempt.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/raw-format.c | 56 ++
>  1 file changed, 56 insertions(+)

Reviewed-by: Stefan Hajnoczi 


Re: [PATCH v6 01/10] block: add persistent reservation in/out api

2024-06-26 Thread Stefan Hajnoczi
On Thu, Jun 13, 2024 at 03:13:18PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations
> at the block level. The following operations
> are included:
> 
> - read_keys:retrieves the list of registered keys.
> - read_reservation: retrieves the current reservation status.
> - register: registers a new reservation key.
> - reserve:  initiates a reservation for a specific key.
> - release:  releases a reservation for a specific key.
> - clear:clears all existing reservations.
> - preempt:  preempts a reservation held by another key.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/block-backend.c | 403 ++
>  block/io.c| 163 
>  include/block/block-common.h  |  40 +++
>  include/block/block-io.h  |  20 ++
>  include/block/block_int-common.h  |  84 +++
>  include/sysemu/block-backend-io.h |  24 ++
>  6 files changed, 734 insertions(+)

Reviewed-by: Stefan Hajnoczi 


Re: [RFC PATCH 1/1] vhost-user: add shmem mmap request

2024-06-26 Thread Stefan Hajnoczi
On Wed, 26 Jun 2024 at 03:54, Albert Esteve  wrote:
>
> Hi Stefan,
>
> On Wed, Jun 5, 2024 at 4:28 PM Stefan Hajnoczi  wrote:
>>
>> On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
>> > On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi  wrote:
>> >
>> > > On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
>> > > > Add SHMEM_MAP/UNMAP requests to vhost-user.
>> > > >
>> > > > This request allows backends to dynamically map
>> > > > fds into a shared memory region identified by
>> > >
>> > > Please call this "VIRTIO Shared Memory Region" everywhere (code,
>> > > vhost-user spec, commit description, etc) so it's clear that this is not
>> > > about vhost-user shared memory tables/regions.
>> > >
>> > > > its `shmid`. Then, the fd memory is advertised
>> > > > to the frontend through a BAR+offset, so it can
>> > > > be read by the driver while it's valid.
>> > >
>> > > Why is a PCI BAR mentioned here? vhost-user does not know about the
>> > > VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
>> > > report VIRTIO Shared Memory Regions to the driver.
>> > >
>> > >
>> > I will remove PCI BAR, as it is true that it depends on the
>> > transport. I was trying to explain that the driver
>> > will use the shm_base + shm_offset to access
>> > the mapped memory.
>> >
>> >
>> > > >
>> > > > Then, the backend can munmap the memory range
>> > > > in a given shared memory region (again, identified
>> > > > by its `shmid`), to free it. After this, the
>> > > > region becomes private and shall not be accessed
>> > > > by the frontend anymore.
>> > >
>> > > What does "private" mean?
>> > >
>> > > The frontend must mmap PROT_NONE to reserve the virtual memory space
>> > > when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
>> > > unrelated mmap(NULL, ...) might use that address range and the guest
>> > > would have access to the host memory! This is a security issue and needs
>> > > to be mentioned explicitly in the spec.
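
To make the PROT_NONE point concrete, here is a minimal sketch of the
reserve/map/unmap pattern on a POSIX host. The shmem_region_*() names are
hypothetical and are not part of the patch or of QEMU; error reporting is
reduced to return codes. The important detail is that "unmap" re-establishes
the PROT_NONE reservation with MAP_FIXED rather than calling munmap(), so the
address range never becomes available to an unrelated mmap(NULL, ...):

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Reserve size bytes of address space that nothing can access yet. */
static void *shmem_region_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Map len bytes of fd over the reservation at base + offset. */
static int shmem_region_map(void *base, size_t offset, size_t len,
                            int fd, off_t fd_offset)
{
    void *p;

    assert(base != NULL);
    p = mmap((char *)base + offset, len, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, fd_offset);
    return p == MAP_FAILED ? -1 : 0;
}

/* Drop the fd mapping by putting the PROT_NONE reservation back. */
static int shmem_region_unmap(void *base, size_t offset, size_t len)
{
    void *p;

    assert(base != NULL);
    p = mmap((char *)base + offset, len, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

offset and len need to be multiples of the host page size for the MAP_FIXED
calls to succeed.
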
>> > >
>> >
>> > I mentioned private because it changes the mapping from MAP_SHARED
>> > to MAP_PRIVATE. I will highlight PROT_NONE instead.
>>
>> I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
>> mean mmap flags or something like the memory range is no longer
>> accessible to the driver.
>>
>> >
>> >
>> > >
>> > > >
>> > > > Initializing the memory region is the responsibility
>> > > > of the PCI device that will be using it.
>> > >
>> > > What does this mean?
>> > >
>> >
>> > The MemoryRegion is declared in `struct VirtIODevice`,
>> > but it is uninitialized in this commit. So I was trying to say
>> > that the initialization will happen in, e.g., vhost-user-gpu-pci.c
>> > with something like `memory_region_init` , and later `pci_register_bar`.
>>
>> Okay. The device model needs to create MemoryRegion instances for the
>> device's Shared Memory Regions and add them to the VirtIODevice.
>>
>> --device vhost-user-device will need to query the backend since, unlike
>> vhost-user-gpu-pci and friends, it doesn't have knowledge of specific
>> device types. It will need to create MemoryRegions enumerated from the
>> backend.
>>
>> By the way, the VIRTIO MMIO Transport also supports VIRTIO Shared Memory
>> Regions so this work should not be tied to PCI.
>>
>> >
>> > I am testing that part still.
>> >
>> >
>> > >
>> > > >
>> > > > Signed-off-by: Albert Esteve 
>> > > > ---
>> > > >  docs/interop/vhost-user.rst |  23 
>> > > >  hw/virtio/vhost-user.c  | 106 
>> > > >  hw/virtio/virtio.c  |   2 +
>> > > >  include/hw/virtio/virtio.h  |   3 +
>> > > >  4 files changed, 134 insertions(+)
>> > >
>> > > Two missing pieces:
>> > >
>> > > 1. QEMU's --device vhost-user-device needs a way to enumerate VIRTIO
>> > > Shared Memory Regions from the vhost-user backend. vhost-user-device is
>> > > a generic vhost-user 

Re: [RFC PATCH v3 2/5] rust: add bindgen step as a meson dependency

2024-06-24 Thread Stefan Hajnoczi
On Thu, 20 Jun 2024 at 14:35, Manos Pitsidianakis
 wrote:
>
> On Thu, 20 Jun 2024 15:32, Alex Bennée  wrote:
> >Manos Pitsidianakis  writes:
> >
> >> Add mechanism to generate rust hw targets that depend on a custom
> >> bindgen target for rust bindings to C.
> >>
> >> This way bindings will be created before the rust crate is compiled.
> >>
> >> The bindings will end up in BUILDDIR/{target}-generated.rs and have the
> >> same name as a target:
> >>
> >> ninja aarch64-softmmu-generated.rs
> >>
> >
> >> +
> >> +
> >> +rust_targets = {}
> >> +
> >> +cargo_wrapper = [
> >> +  find_program(meson.global_source_root() / 'scripts/cargo_wrapper.py'),
> >> +  '--config-headers', meson.project_build_root() / 'config-host.h',
> >> +  '--meson-build-root', meson.project_build_root(),
> >> +  '--meson-build-dir', meson.current_build_dir(),
> >> +  '--meson-source-dir', meson.current_source_dir(),
> >> +]
> >
> >I'm unclear what the difference between meson-build-root and
> >meson-build-dir is?
>
> Build-dir is the subdir of the current subdir(...) meson.build file
>
> So if we are building under qemu/build, meson_build_root is qemu/build
> and meson_build_dir is qemu/build/rust
>
> >
> >We also end up defining crate-dir and outdir. Aren't these all
> >derivable from whatever module we are building?
>
> Crate dir is the source directory (i.e. qemu/rust/pl011) that contains
> the crate's manifest file Cargo.toml.
>
> Outdir is where to put the final build artifact for meson to find. We
> could derive that from the build directories and package names somehow
> but I chose to be explicit instead of doing indirect logic to make the
> process less magic.
>
> I know it's a lot so I'm open to simplifications. The only problem is
> that all of these directories, except the crate source code, are defined
> from meson and can change with any refactor we do from the meson side of
> things.

Expanding the help text for these command-line options would make it
easier to understand. It would be great to include an example path
too.

Stefan



Re: [PATCH v2] Consider discard option when writing zeros

2024-06-24 Thread Stefan Hajnoczi
On Wed, Jun 19, 2024 at 08:43:25PM +0300, Nir Soffer wrote:
> Tested using:

Hi Nir,
This looks like a good candidate for the qemu-iotests test suite. Adding
it to the automated tests will protect against future regressions.

Please add the script and the expected output to
tests/qemu-iotests/tests/write-zeroes-unmap and run it using
`(cd build && tests/qemu-iotests/check write-zeroes-unmap)`.

See the existing test cases in tests/qemu-iotests/ and
tests/qemu-iotests/tests/ for examples. Some are shell scripts and
others are Python. I think shell makes sense for this test case. You can
copy the test framework boilerplate from an existing test case.

Thanks,
Stefan

> 
> $ cat test-unmap.sh
> #!/bin/sh
> 
> qemu=${1:?Usage: $0 qemu-executable}
> img=/tmp/test.raw
> 
> echo
> echo "defaults - write zeroes"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -z 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw >/dev/null
> du -sh $img
> 
> echo
> echo "defaults - write zeroes unmap"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw >/dev/null
> du -sh $img
> 
> echo
> echo "defaults - write actual zeros"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw >/dev/null
> du -sh $img
> 
> echo
> echo "discard=off - write zeroes unmap"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw,discard=off >/dev/null
> du -sh $img
> 
> echo
> echo "detect-zeros=on - write actual zeros"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw,detect-zeroes=on >/dev/null
> du -sh $img
> 
> echo
> echo "detect-zeros=unmap,discard=unmap - write actual zeros"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -P 0 0 1m"\nquit' |  $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw,detect-zeroes=unmap,discard=unmap >/dev/null
> du -sh $img
> 
> echo
> echo "discard=unmap - write zeroes"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -z 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw,discard=unmap >/dev/null
> du -sh $img
> 
> echo
> echo "discard=unmap - write zeroes unmap"
> fallocate -l 1m $img
> echo -e 'qemu-io none0 "write -zu 0 1m"\nquit' | $qemu -monitor stdio \
> -drive if=none,file=$img,format=raw,discard=unmap >/dev/null
> du -sh $img
> 
> rm $img
> 
> 
> Before this change:
> 
> $ cat before.out
> 
> defaults - write zeroes
> 1.0M /tmp/test.raw
> 
> defaults - write zeroes unmap
> 0 /tmp/test.raw
> 
> defaults - write actual zeros
> 1.0M /tmp/test.raw
> 
> discard=off - write zeroes unmap
> 0 /tmp/test.raw
> 
> detect-zeros=on - write actual zeros
> 1.0M /tmp/test.raw
> 
> detect-zeros=unmap,discard=unmap - write actual zeros
> 0 /tmp/test.raw
> 
> discard=unmap - write zeroes
> 1.0M /tmp/test.raw
> 
> discard=unmap - write zeroes unmap
> 0 /tmp/test.raw
> [nsoffer build (consider-discard-option)]$
> 
> 
> After this change:
> 
> $ cat after.out
> 
> defaults - write zeroes
> 1.0M /tmp/test.raw
> 
> defaults - write zeroes unmap
> 1.0M /tmp/test.raw
> 
> defaults - write actual zeros
> 1.0M /tmp/test.raw
> 
> discard=off - write zeroes unmap
> 1.0M /tmp/test.raw
> 
> detect-zeros=on - write actual zeros
> 1.0M /tmp/test.raw
> 
> detect-zeros=unmap,discard=unmap - write actual zeros
> 0 /tmp/test.raw
> 
> discard=unmap - write zeroes
> 1.0M /tmp/test.raw
> 
> discard=unmap - write zeroes unmap
> 0 /tmp/test.raw
> 
> 
> Differences:
> 
> $ diff -u before.out after.out
> --- before.out 2024-06-19 20:24:09.234083713 +0300
> +++ after.out 2024-06-19 20:24:20.526165573 +0300
> @@ -3,13 +3,13 @@
>  1.0M /tmp/test.raw
> 
>  defaults - write zeroes unmap
> -0 /tmp/test.raw
> +1.0M /tmp/test.raw
> 
>  defaults - write actual zeros
>  1.0M /tmp/test.raw
> 
>  discard=off - write zeroes unmap
> -0 /tmp/test.raw
> +1.0M /tmp/test.raw
> 
> On Wed, Jun 19, 2024 at 8:40 PM Nir Soffer  wrote:
> 
> > When opening an image with discard=off, we punch holes in the image when
> > writing zeroes, making the image sparse. This breaks users that want to
> > ensure that writes cannot fail with ENOSPC by using fully allocated
> > images.
> >
> > bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we
> > opened the child without discard=unmap or discard=on. But we don't go
> > through this function when accessing the top node. Move the check down
> > to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.
> >
> > Issues:
> > - We don't punch holes by default, so images are kept allocated. Before
> >   this change we punched holes by default. I'm not sure this is a good
> >   change in behavior.
> > - Need to run all block tests
> > - Not sure that we have tests covering unmapping, 

Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

2024-06-18 Thread Stefan Hajnoczi
On Fri, Jun 14, 2024 at 11:29:13AM +0200, Fiona Ebner wrote:
> Am 12.06.24 um 17:34 schrieb Stefan Hajnoczi:
> > 
> > Thank you for investigating! It looks like we would be trading one
> > issue (the assertion failures you mentioned) for another (a rare, but
> > possible, hang).
> > 
> > I'm not sure what the best solution is. It seems like vm_stop() is the
> > first place where things go awry. It's where we should exit device
> > emulation code. Doing that probably requires an asynchronous API that
> > takes a callback. Do you want to try that?
> > 
> 
> I can try, but I'm afraid it will be a while (at least a few weeks)
> until I can get around to it.

I am wrapping current work up and then going on vacation at the end of
June until mid-July. I'll let you know if I get a chance to look at it
when I'm back.

Stefan



Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

2024-06-12 Thread Stefan Hajnoczi
On Wed, 12 Jun 2024 at 05:21, Fiona Ebner  wrote:
>
> Am 11.06.24 um 16:04 schrieb Stefan Hajnoczi:
> > On Tue, Jun 11, 2024 at 02:08:49PM +0200, Fiona Ebner wrote:
> >> Am 06.06.24 um 20:36 schrieb Stefan Hajnoczi:
> >>> On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
> >>>> The fact that the snapshot_save_job_bh() is scheduled in the main
> >>>> loop's qemu_aio_context AioContext means that it might get executed
> >>>> during a vCPU thread's aio_poll(). But saving of the VM state cannot
> >>>> happen while the guest or devices are active and can lead to assertion
> >>>> failures. See issue #2111 for two examples. Avoid the problem by
> >>>> scheduling the snapshot_save_job_bh() in the iohandler AioContext,
> >>>> which is not polled by vCPU threads.
> >>>>
> >>>> Solves Issue #2111.
> >>>>
> >>>> This change also solves the following issue:
> >>>>
> >>>> Since commit effd60c878 ("monitor: only run coroutine commands in
> >>>> qemu_aio_context"), the 'snapshot-save' QMP call would not respond
> >>>> right after starting the job anymore, but only after the job finished,
> >>>> which can take a long time. The reason is, because after commit
> >>>> effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
> >>>> When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
> >>>> coroutine cannot be entered immediately anymore, but needs to be
> >>>> scheduled to the main loop's qemu_aio_context AioContext. But
> >>>> snapshot_save_job_bh() was scheduled first to the same AioContext and
> >>>> thus gets executed first.
> >>>>
> >>>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
> >>>> Signed-off-by: Fiona Ebner 
> >>>> ---
> >>>>
> >>>> While initial smoke testing seems fine, I'm not familiar enough with
> >>>> this to rule out any pitfalls with the approach. Any reason why
> >>>> scheduling to the iohandler AioContext could be wrong here?
> >>>
> >>> If something waits for a BlockJob to finish using aio_poll() from
> >>> qemu_aio_context then a deadlock is possible since the iohandler_ctx
> >>> won't get a chance to execute. The only suspicious code path I found was
> >>> job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
> >>> sure whether it triggers this scenario. Please check that code path.
> >>>
> >>
> >> Sorry, I don't understand. Isn't executing the scheduled BH the only
> >> additional progress that the iohandler_ctx needs to make compared to
> >> before the patch? How exactly would that cause issues when waiting for a
> >> BlockJob?
> >>
> >> Or do you mean something waiting for the SnapshotJob from
> >> qemu_aio_context before snapshot_save_job_bh had the chance to run?
> >
> > Yes, exactly. job_finish_sync_locked() will hang since iohandler_ctx has
> > no chance to execute. But I haven't audited the code to understand
> > whether this can happen.
> So job_finish_sync_locked() is executed in
> job_completed_txn_abort_locked() when the following branch is taken
>
> > if (!job_is_completed_locked(other_job))
>
> and there is no other job in the transaction, so we can assume other_job
> being the snapshot-save job itself.
>
> The callers of job_completed_txn_abort_locked():
>
> 1. in job_do_finalize_locked() if job->ret is non-zero. The callers of
> which are:
>
> 1a. in job_finalize_locked() if JOB_VERB_FINALIZE is allowed, meaning
> job->status is JOB_STATUS_PENDING, so job_is_completed_locked() will be
> true.
>
> 1b. job_completed_txn_success_locked() sets job->status to
> JOB_STATUS_WAITING before, so job_is_completed_locked() will be true.
>
> 2. in job_completed_locked() it is only done if job->ret is non-zero, in
> which case job->status was set to JOB_STATUS_ABORTING by the preceding
> job_update_rc_locked(), and thus job_is_completed_locked() will be true.
>
> 3. in job_cancel_locked() if job->deferred_to_main_loop is true, which
> is set in job_co_entry() before job_exit() is scheduled as a BH and is
> also set in job_do_dismiss_locked(). In the former case, the
> snapshot_save_job_bh has already been executed. In the latter case,
> job_is_completed_locked() will be true (since job_early_fail() is not
> used for the snapshot job).
>
>
> Howeve

Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode

2024-06-11 Thread Stefan Hajnoczi
On Tue, Jun 11, 2024 at 07:19:08AM +0200, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache.  Remove it in preparation for moving the cache
> control flags into the queue_limits.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/virtio_blk.c | 13 +++--
>  1 file changed, 3 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [RFC PATCH v1 1/6] build-sys: Add rust feature option

2024-06-11 Thread Stefan Hajnoczi
On Tue, 11 Jun 2024 at 13:54, Manos Pitsidianakis
 wrote:
>
> On Tue, 11 Jun 2024 at 17:05, Stefan Hajnoczi  wrote:
> >
> > On Mon, Jun 10, 2024 at 09:22:36PM +0300, Manos Pitsidianakis wrote:
> > > Add options for Rust in meson_options.txt, meson.build, configure to
> > > prepare for adding Rust code in the followup commits.
> > >
> > > `rust` is a reserved meson name, so we have to use an alternative.
> > > `with_rust` was chosen.
> > >
> > > Signed-off-by: Manos Pitsidianakis 
> > > ---
> > > The cargo wrapper script hardcodes some rust target triples. This is
> > > just temporary.
> > > ---
> > >  .gitignore   |   2 +
> > >  configure|  12 +++
> > >  meson.build  |  11 ++
> > >  meson_options.txt|   4 +
> > >  scripts/cargo_wrapper.py | 211 +++
> > >  5 files changed, 240 insertions(+)
> > >  create mode 100644 scripts/cargo_wrapper.py
> > >
> > > diff --git a/.gitignore b/.gitignore
> > > index 61fa39967b..f42b0d937e 100644
> > > --- a/.gitignore
> > > +++ b/.gitignore
> > > @@ -2,6 +2,8 @@
> > >  /build/
> > >  /.cache/
> > >  /.vscode/
> > > +/target/
> > > +rust/**/target
> >
> > Are these necessary since the cargo build command-line below uses
> > --target-dir ?
> >
> > Adding new build output directories outside build/ makes it harder to
> > clean up the source tree and ensure no state from previous builds
> > remains.
>
> Agreed! These build directories would show up when using cargo
> directly instead of through the cargo_wrapper.py script, i.e. during
> development. I'd consider it an edge case, it won't happen much and if
> it does it's better to gitignore them than accidentally checking them
> in. Also, whatever artifacts are in a `target` directory won't be used
> for compilation with qemu inside a build directory.

Why would someone bypass the build system? I don't think we should
encourage developers to do this.

>
>
> > >  *.pyc
> > >  .sdk
> > >  .stgit-*
> > > diff --git a/configure b/configure
> > > index 38ee257701..c195630771 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -302,6 +302,9 @@ else
> > >objcc="${objcc-${cross_prefix}clang}"
> > >  fi
> > >
> > > +with_rust="auto"
> > > +with_rust_target_triple=""
> > > +
> > >  ar="${AR-${cross_prefix}ar}"
> > >  as="${AS-${cross_prefix}as}"
> > >  ccas="${CCAS-$cc}"
> > > @@ -760,6 +763,12 @@ for opt do
> > >;;
> > >--gdb=*) gdb_bin="$optarg"
> > >;;
> > > +  --enable-rust) with_rust=enabled
> > > +  ;;
> > > +  --disable-rust) with_rust=disabled
> > > +  ;;
> > > +  --rust-target-triple=*) with_rust_target_triple="$optarg"
> > > +  ;;
> > ># everything else has the same name in configure and meson
> > >--*) meson_option_parse "$opt" "$optarg"
> > >;;
> > > @@ -1796,6 +1805,9 @@ if test "$skip_meson" = no; then
> > >test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add 
> > > "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
> > >test "$plugins" = yes && meson_option_add "-Dplugins=true"
> > >test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
> > > +  test "$with_rust" != enabled && meson_option_add 
> > > "-Dwith_rust=$with_rust"
> > > +  test "$with_rust" != enabled && meson_option_add 
> > > "-Dwith_rust=$with_rust"
> >
> > Duplicate line.
>
> Thanks!
>
> >
> > > +  test "$with_rust_target_triple" != "" && meson_option_add 
> > > "-Dwith_rust_target_triple=$with_rust_target_triple"
> > >run_meson() {
> > >  NINJA=$ninja $meson setup "$@" "$PWD" "$source_path"
> > >}
> > > diff --git a/meson.build b/meson.build
> > > index a9de71d450..3533889852 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -290,6 +290,12 @@ foreach lang : all_languages
> > >endif
> > >  endforeach
> > >
> > > +cargo = not_found
> > >

Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode

2024-06-11 Thread Stefan Hajnoczi
On Tue, Jun 11, 2024 at 07:19:08AM +0200, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache.  Remove it in preparation for moving the cache
> control flags into the queue_limits.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/virtio_blk.c | 13 +++--
>  1 file changed, 3 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: Re: [PATCH v5 00/10] Support persistent reservation operations

2024-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2024 at 07:55:20PM -0700, 卢长奇 wrote:
> Hi,
> 
> Sorry, I explained it in patch2 and forgot to reply to your email.
> 
> The existing PRManager only works with local scsi devices. This series
> will completely decouple devices and drivers. The device can not only be
> scsi, but also other devices such as nvme. The same is true for the
> driver, which is completely unrestricted.
> 
> And block/file-posix.c can implement the new block driver, and
> pr_manager can be executed after splicing ioctl commands in these
> drivers. This will be implemented in subsequent patches.

Thanks for explaining!

Stefan

> 
> On 2024/6/11 01:18, Stefan Hajnoczi wrote:
> > On Thu, Jun 06, 2024 at 08:24:34PM +0800, Changqi Lu wrote:
> >> Hi,
> >>
> >> patchv5 has been modified.
> >>
> >> Sincerely hope that everyone can help review the
> >> code and provide some suggestions.
> >>
> >> v4->v5:
> >> - Fixed a memory leak bug at hw/nvme/ctrl.c.
> >>
> >> v3->v4:
> >> - At the nvme layer, the two patches of enabling the ONCS
> >> function and enabling rescap are combined into one.
> >> - At the nvme layer, add helper functions for pr capacity
> >> conversion between the block layer and the nvme layer.
> >>
> >> v2->v3:
> >> In v2 Persist Through Power Loss (PTPL) is enabled by default.
> >> In v3 PTPL is supported, which is passed as a parameter.
> >>
> >> v1->v2:
> >> - Add sg_persist --report-capabilities for SCSI protocol and enable
> >> oncs and rescap for NVMe protocol.
> >> - Add persistent reservation capabilities constants and helper functions for
> >> SCSI and NVMe protocol.
> >> - Add comments for necessary APIs.
> >>
> >> v1:
> >> - Add seven APIs about persistent reservation command for block layer.
> >> These APIs including reading keys, reading reservations, registering,
> >> reserving, releasing, clearing and preempting.
> >> - Add the necessary pr-related operation APIs for both the
> >> SCSI protocol and NVMe protocol at the device layer.
> >> - Add scsi driver at the driver layer to verify the functions
> >
> > My question from v1 is unanswered:
> >
> > What is the relationship to the existing PRManager functionality
> > (docs/interop/pr-helper.rst) where block/file-posix.c interprets SCSI
> > ioctls and sends persistent reservation requests to an external helper
> > process?
> >
> > I wonder if block/file-posix.c can implement the new block driver
> > callbacks using pr_mgr (while keeping the existing scsi-generic
> > support).
> >
> > Thanks,
> > Stefan
> >
> >>
> >>
> >> Changqi Lu (10):
> >> block: add persistent reservation in/out api
> >> block/raw: add persistent reservation in/out driver
> >> scsi/constant: add persistent reservation in/out protocol constants
> >> scsi/util: add helper functions for persistent reservation types
> >> conversion
> >> hw/scsi: add persistent reservation in/out api for scsi device
> >> block/nvme: add reservation command protocol constants
> >> hw/nvme: add helper functions for converting reservation types
> >> hw/nvme: enable ONCS and rescap function
> >> hw/nvme: add reservation protocal command
> >> block/iscsi: add persistent reservation in/out driver
> >>
> >> block/block-backend.c | 397 ++
> >> block/io.c | 163 +++
> >> block/iscsi.c | 443 ++
> >> block/raw-format.c | 56 
> >> hw/nvme/ctrl.c | 326 +-
> >> hw/nvme/ns.c | 5 +
> >> hw/nvme/nvme.h | 84 ++
> >> hw/scsi/scsi-disk.c | 352 
> >> include/block/block-common.h | 40 +++
> >> include/block/block-io.h | 20 ++
> >> include/block/block_int-common.h | 84 ++
> >> include/block/nvme.h | 98 +++
> >> include/scsi/constants.h | 52 
> >> include/scsi/utils.h | 8 +
> >> include/sysemu/block-backend-io.h | 24 ++
> >> scsi/utils.c | 81 ++
> >> 16 files changed, 2231 insertions(+), 2 deletions(-)
> >>
> >> --
> >> 2.20.1
> >>




Re: [RFC PATCH v1 1/6] build-sys: Add rust feature option

2024-06-11 Thread Stefan Hajnoczi
On Mon, Jun 10, 2024 at 09:22:36PM +0300, Manos Pitsidianakis wrote:
> Add options for Rust in meson_options.txt, meson.build, configure to
> prepare for adding Rust code in the followup commits.
> 
> `rust` is a reserved meson name, so we have to use an alternative.
> `with_rust` was chosen.
> 
> Signed-off-by: Manos Pitsidianakis 
> ---
> The cargo wrapper script hardcodes some rust target triples. This is 
> just temporary.
> ---
>  .gitignore   |   2 +
>  configure|  12 +++
>  meson.build  |  11 ++
>  meson_options.txt|   4 +
>  scripts/cargo_wrapper.py | 211 +++
>  5 files changed, 240 insertions(+)
>  create mode 100644 scripts/cargo_wrapper.py
> 
> diff --git a/.gitignore b/.gitignore
> index 61fa39967b..f42b0d937e 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -2,6 +2,8 @@
>  /build/
>  /.cache/
>  /.vscode/
> +/target/
> +rust/**/target

Are these necessary since the cargo build command-line below uses
--target-dir ?

Adding new build output directories outside build/ makes it harder to
clean up the source tree and ensure no state from previous builds
remains.
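
One way to keep cargo output under the build directory is to pass
--target-dir on every invocation. The sketch below is only an
illustration: the function name cargo_build_cmd, the "cargo-target"
subdirectory and the manifest path are hypothetical and are not taken
from scripts/cargo_wrapper.py. --manifest-path, --target-dir and
--release are standard `cargo build` options.

```python
from pathlib import Path


def cargo_build_cmd(manifest: Path, build_dir: Path, release: bool = False) -> list[str]:
    """Return a cargo invocation whose artifacts all land under build_dir.

    Hypothetical helper for illustration only.
    """
    cmd = [
        "cargo", "build",
        "--manifest-path", str(manifest),
        # Keep cargo's output inside the build tree instead of the
        # default ./target directory next to Cargo.toml.
        "--target-dir", str(build_dir / "cargo-target"),
    ]
    if release:
        cmd.append("--release")
    return cmd


if __name__ == "__main__":
    # Example paths are made up; substitute the real crate location.
    print(" ".join(cargo_build_cmd(Path("rust/pl011/Cargo.toml"), Path("build"))))
```

With every invocation routed this way, removing build/ also removes all
Rust artifacts. Cargo also honours the CARGO_TARGET_DIR environment
variable for the same purpose.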

>  *.pyc
>  .sdk
>  .stgit-*
> diff --git a/configure b/configure
> index 38ee257701..c195630771 100755
> --- a/configure
> +++ b/configure
> @@ -302,6 +302,9 @@ else
>objcc="${objcc-${cross_prefix}clang}"
>  fi
>  
> +with_rust="auto"
> +with_rust_target_triple=""
> +
>  ar="${AR-${cross_prefix}ar}"
>  as="${AS-${cross_prefix}as}"
>  ccas="${CCAS-$cc}"
> @@ -760,6 +763,12 @@ for opt do
>;;
>--gdb=*) gdb_bin="$optarg"
>;;
> +  --enable-rust) with_rust=enabled
> +  ;;
> +  --disable-rust) with_rust=disabled
> +  ;;
> +  --rust-target-triple=*) with_rust_target_triple="$optarg"
> +  ;;
># everything else has the same name in configure and meson
>--*) meson_option_parse "$opt" "$optarg"
>;;
> @@ -1796,6 +1805,9 @@ if test "$skip_meson" = no; then
>test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add 
> "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
>test "$plugins" = yes && meson_option_add "-Dplugins=true"
>test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
> +  test "$with_rust" != enabled && meson_option_add "-Dwith_rust=$with_rust"
> +  test "$with_rust" != enabled && meson_option_add "-Dwith_rust=$with_rust"

Duplicate line.

> +  test "$with_rust_target_triple" != "" && meson_option_add 
> "-Dwith_rust_target_triple=$with_rust_target_triple"
>run_meson() {
>  NINJA=$ninja $meson setup "$@" "$PWD" "$source_path"
>}
> diff --git a/meson.build b/meson.build
> index a9de71d450..3533889852 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -290,6 +290,12 @@ foreach lang : all_languages
>endif
>  endforeach
>  
> +cargo = not_found
> +if get_option('with_rust').allowed()
> +  cargo = find_program('cargo', required: get_option('with_rust'))
> +endif
> +with_rust = cargo.found()
> +
>  # default flags for all hosts
>  # We use -fwrapv to tell the compiler that we require a C dialect where
>  # left shift of signed integers is well defined and has the expected
> @@ -2066,6 +2072,7 @@ endif
>  
>  config_host_data = configuration_data()
>  
> +config_host_data.set('CONFIG_WITH_RUST', with_rust)
>  audio_drivers_selected = []
>  if have_system
>audio_drivers_available = {
> @@ -4190,6 +4197,10 @@ if 'objc' in all_languages
>  else
>summary_info += {'Objective-C compiler': false}
>  endif
> +summary_info += {'Rust support':  with_rust}
> +if with_rust and get_option('with_rust_target_triple') != ''
> +  summary_info += {'Rust target': get_option('with_rust_target_triple')}
> +endif
>  option_cflags = (get_option('debug') ? ['-g'] : [])
>  if get_option('optimization') != 'plain'
>option_cflags += ['-O' + get_option('optimization')]
> diff --git a/meson_options.txt b/meson_options.txt
> index 4c1583eb40..223491b731 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -366,3 +366,7 @@ option('qemu_ga_version', type: 'string', value: '',
>  
>  option('hexagon_idef_parser', type : 'boolean', value : true,
> description: 'use idef-parser to automatically generate TCG code for 
> the Hexagon frontend')
> +option('with_rust', type: 'feature', value: 'auto',
> +   description: 'Enable Rust support')
> +option('with_rust_target_triple', type : 'string', value: '',
> +   description: 'Rust target triple')
> diff --git a/scripts/cargo_wrapper.py b/scripts/cargo_wrapper.py
> new file mode 100644
> index 00..d338effdaa
> --- /dev/null
> +++ b/scripts/cargo_wrapper.py
> @@ -0,0 +1,211 @@
> +#!/usr/bin/env python3
> +# Copyright (c) 2020 Red Hat, Inc.
> +# Copyright (c) 2023 Linaro Ltd.
> +#
> +# Authors:
> +#  Manos Pitsidianakis 
> +#  Marc-André Lureau 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +
> +import argparse
> +import configparser
> +import distutils.file_util

Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

2024-06-11 Thread Stefan Hajnoczi
On Tue, Jun 11, 2024 at 02:08:49PM +0200, Fiona Ebner wrote:
> Am 06.06.24 um 20:36 schrieb Stefan Hajnoczi:
> > On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
> >> The fact that the snapshot_save_job_bh() is scheduled in the main
> >> loop's qemu_aio_context AioContext means that it might get executed
> >> during a vCPU thread's aio_poll(). But saving of the VM state cannot
> >> happen while the guest or devices are active and can lead to assertion
> >> failures. See issue #2111 for two examples. Avoid the problem by
> >> scheduling the snapshot_save_job_bh() in the iohandler AioContext,
> >> which is not polled by vCPU threads.
> >>
> >> Solves Issue #2111.
> >>
> >> This change also solves the following issue:
> >>
> >> Since commit effd60c878 ("monitor: only run coroutine commands in
> >> qemu_aio_context"), the 'snapshot-save' QMP call would not respond
> >> right after starting the job anymore, but only after the job finished,
> >> which can take a long time. The reason is, because after commit
> >> effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
> >> When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
> >> coroutine cannot be entered immediately anymore, but needs to be
> >> scheduled to the main loop's qemu_aio_context AioContext. But
> >> snapshot_save_job_bh() was scheduled first to the same AioContext and
> >> thus gets executed first.
> >>
> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
> >> Signed-off-by: Fiona Ebner 
> >> ---
> >>
> >> While initial smoke testing seems fine, I'm not familiar enough with
> >> this to rule out any pitfalls with the approach. Any reason why
> >> scheduling to the iohandler AioContext could be wrong here?
> > 
> > If something waits for a BlockJob to finish using aio_poll() from
> > qemu_aio_context then a deadlock is possible since the iohandler_ctx
> > won't get a chance to execute. The only suspicious code path I found was
> > job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
> > sure whether it triggers this scenario. Please check that code path.
> > 
> 
> Sorry, I don't understand. Isn't executing the scheduled BH the only
> additional progress that the iohandler_ctx needs to make compared to
> before the patch? How exactly would that cause issues when waiting for a
> BlockJob?
> 
> Or do you mean something waiting for the SnapshotJob from
> qemu_aio_context before snapshot_save_job_bh had the chance to run?

Yes, exactly. job_finish_sync_locked() will hang since iohandler_ctx has
no chance to execute. But I haven't audited the code to understand
whether this can happen.
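
The hang I have in mind can be modelled with a toy sketch. Ctx,
wait_for and the job dict are made-up names, and real AioContext
behaviour (nested polling, fd handlers, locking) is far more involved.
Whether QEMU can actually reach this state is exactly the part that has
not been audited.

```python
from collections import deque


class Ctx:
    """Toy stand-in for an AioContext: just a queue of pending callbacks."""

    def __init__(self, name):
        self.name = name
        self.bhs = deque()

    def schedule_bh(self, fn):
        self.bhs.append(fn)

    def poll(self):
        """Run one pending callback from this context only."""
        if not self.bhs:
            return False
        self.bhs.popleft()()
        return True


def wait_for(job, ctx, max_iters=1000):
    """Poll only ctx until the job is done; give up after max_iters."""
    for _ in range(max_iters):
        if job["done"]:
            return True
        ctx.poll()
    return job["done"]


def demo(schedule_in_iohandler):
    """Return True if a waiter polling the main context sees the job finish."""
    main_ctx = Ctx("qemu_aio_context")
    iohandler_ctx = Ctx("iohandler_ctx")
    job = {"done": False}

    def job_bh():
        job["done"] = True

    target = iohandler_ctx if schedule_in_iohandler else main_ctx
    target.schedule_bh(job_bh)
    # The waiter only ever polls the main context.
    return wait_for(job, main_ctx)
```

In this model demo(False) completes while demo(True) never makes
progress, because nothing polls the context that holds the BH.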

Stefan




Re: [RFC PATCH v1 0/6] Implement ARM PL011 in Rust

2024-06-10 Thread Stefan Hajnoczi
On Mon, 10 Jun 2024 at 16:27, Manos Pitsidianakis
 wrote:
>
> On Mon, 10 Jun 2024 22:59, Stefan Hajnoczi  wrote:
> >> What are the issues with not using the compiler, rustc, directly?
> >> -----------------------------------------------------------------
> >> [whataretheissueswith] Back to [TOC]
> >>
> >> 1. Tooling
> >>Mostly writing up the build-sys tooling to do so. Ideally we'd
> >>compile everything without cargo but rustc directly.
> >
> >Why would that be ideal?
>
> It removes the indirection level of meson<->cargo<->rustc. I don't have a
> concrete idea on how to tackle this, but if cargo ends up not strictly
> necessary, I don't see why we cannot use one build system.

The convenience of being able to use cargo dependencies without
special QEMU meson build system effort seems worth the overhead of
meson<->cargo<->rustc to me. There is a blog post that explores
consuming cargo crates through meson's wrap dependencies, and it seems
like extra work:
https://coaxion.net/blog/2023/04/building-a-gstreamer-plugin-in-rust-with-meson-instead-of-cargo/

It's possible to use just meson today, but I don't think it's
practical when using cargo dependencies.

>
> >
> >>
> >>If we decide we need Rust's `std` library support, we could
> >>investigate whether building it from scratch is a good solution. This
> >>will only build the bits we need in our devices.
> >
> >Whether or not to use std is a fundamental decision. It might be
> >difficult to back out of std later on. This is something that should be
> >discussed in more detail.
> >
> >Do you want to avoid std for maximum flexibility in the future, or are
> >there QEMU use cases today where std is unavailable?
>
> For flexibility, and for being compatible with more versions.
>
> But I do not want to avoid it, what I am saying is we can do a custom
> build of it instead of linking to the rust toolchain's prebuilt version.

What advantages does a custom build of std bring?

>
> >
> >>
> >> 2. Rust dependencies
> >>We could go without them completely. I chose deliberately to include
> >>one dependency in my UART implementation, `bilge`[0], because it has
> >>an elegant way of representing typed bitfields for the UART's
> >>registers.
> >>
> >> [0]: Article: https://hecatia-elegua.github.io/blog/no-more-bit-fiddling/
> >>  Crates.io page: https://crates.io/crates/bilge
> >>  Repository: https://github.com/hecatia-elegua/bilge
> >
> >I guess there will be interest in using rust-vmm crates in some way.
> >
> >Bindings to platform features that are not available in core or std
> >will also be desirable. We probably don't want to reinvent them.
>
>
> Agreed.
>
> >
> >>
> >> Should QEMU use third-party dependencies?
> >> -----------------------------------------
> >> [shouldqemuusethirdparty] Back to [TOC]
> >>
> >> In my personal opinion, if we need a dependency we need a strong
> >> argument for it. A dependency needs a trusted upstream source, a QEMU
> >> maintainer to make sure it is up-to-date in QEMU etc.
> >>
> >> We already fetch some projects with meson subprojects, so this is not a
> >> new reality. Cargo allows you to define "locked" dependencies which is
> >> the same as only fetching specific commits by SHA. No suspicious
> >> tarballs, and no disappearing dependencies a la left-pad in npm.
> >>
> >> However, I believe it's worth considering vendoring every dependency by
> >> default, if they prove to be few, for the sake of having a local QEMU
> >> git clone buildable without network access.
> >
> >Do you mean vendoring by committing them to qemu.git or just the
> >practice of running `cargo vendor` locally for users who decide they
> >want to keep a copy of the dependencies?
>
>
> Committing, with an option to opt-out. They are generally not big in
> size. I am not of strong opinion on this one, I'm very open to
> alternatives.

Fedora and Debian want Rust applications to use distro-packaged
crates. No vendoring and no crates.io online access. It's a bit of a
pain because Rust developers need to make sure their code works with
whatever version of crates Fedora and Debian provide.

The `cargo vendor` command makes it easy for anyone wishing to collect
the required dependencies for offline builds (something I've used for
CentOS builds where vendoring is allowed).
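
For reference, `cargo vendor` prints a source-replacement snippet that
goes into .cargo/config.toml so that later builds resolve crates from
the local directory. The helper below reproduces that snippet as a
sketch. vendor_config is a hypothetical name, "vendor" is simply
cargo's default output directory, and none of this is part of any QEMU
script.

```python
def vendor_config(vendor_dir: str = "vendor") -> str:
    """Return the .cargo/config.toml stanza that redirects crates.io
    lookups to a local directory populated by `cargo vendor`.

    Hypothetical helper for illustration only.
    """
    return (
        "[source.crates-io]\n"
        'replace-with = "vendored-sources"\n'
        "\n"
        "[source.vendored-sources]\n"
        f'directory = "{vendor_dir}"\n'
    )


if __name__ == "__main__":
    print(vendor_config(), end="")
```

With that stanza in place, `cargo build --offline` resolves everything
from the vendored directory without network access.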

I suggest not vendoring packages in qemu.git. Users can still run
`cargo vendor` for easy 

Re: [RFC PATCH v1 0/6] Implement ARM PL011 in Rust

2024-06-10 Thread Stefan Hajnoczi
On Mon, 10 Jun 2024 at 14:23, Manos Pitsidianakis
 wrote:
>
> Hello everyone,
>
> This is an early draft of my work on implementing a very simple device,
> in this case the ARM PL011 (which in C code resides in hw/char/pl011.c
> and is used in hw/arm/virt.c).
>
> The device is functional, with copied logic from the C code but with
> effort not to make a direct C to Rust translation. In other words, do
> not write Rust as a C developer would.
>
> That goal is not complete but a best-effort case. To give a specific
> example, register values are typed but interrupt bit flags are not (but
> could be). I will leave such minutiae for later iterations.
>
> By the way, the wiki page for Rust was revived to keep track of all
> current series on the mailing list https://wiki.qemu.org/RustInQemu
>
> a #qemu-rust IRC channel was also created for rust-specific discussion
> that might flood #qemu
>
> 
> A request: keep comments to Rust in relation to the QEMU project and no
> debates on the merits of the language itself. These are valid concerns,
> but it'd be better if they were on separate mailing list threads.
> 
>
> Table of contents: [TOC]
>
> - How can I try it? [howcanItryit]
> - What are the most important points to focus on, at this point?
>   [whatarethemostimportant]
>   - What are the issues with not using the compiler, rustc, directly?
> [whataretheissueswith]
> 1. Tooling
> 2. Rust dependencies
>
>   - Should QEMU use third-party dependencies? [shouldqemuusethirdparty]
>   - Should QEMU provide wrapping Rust APIs over QEMU internals?
> [qemuprovidewrappingrustapis]
>   - Will QEMU now depend on Rust and thus not build on my XYZ platform?
> [qemudependonrustnotbuildonxyz]
> - How is the compilation structured? [howisthecompilationstructured]
> - The generated.rs rust file includes a bunch of junk definitions?
>   [generatedrsincludesjunk]
> - The staticlib artifact contains a bunch of mangled .o objects?
>   [staticlibmangledobjects]
>
> How can I try it?
> =================
> [howcanItryit] Back to [TOC]
>
> Hopefully applying these patches (or checking out `master` branch from
> https://gitlab.com/epilys/rust-for-qemu/ current commit
> de81929e0e9d470deac2c6b449b7a5183325e7ee )
>
> Tag for this RFC is rust-pl011-rfc-v1
>
> Rustdoc documentation is hosted on
>
> https://rust-for-qemu-epilys-aebb06ca9f9adfe6584811c14ae44156501d935ba4.gitlab.io/pl011/index.html
>
> If `cargo` and `bindgen` are installed on your system, you should be able
> to build qemu-system-aarch64 with configure flag --enable-rust and
> launch an arm virt VM. One of the patches hardcodes the default UART of
> the machine to the Rust one, so if something goes wrong you will see it
> upon launching qemu-system-aarch64.
>
> To confirm it is there for sure, run e.g. info qom-tree on the monitor
> and look for x-pl011-rust.
>
>
> What are the most important points to focus on, at this point?
> ==============================================================
> [whatarethemostimportant] Back to [TOC]
>
> In my opinion, integration of the go-to Rust build system (Cargo and
> crates.io) with the build system we use in QEMU. This is "easily" done
> in some definition of the word with a python wrapper script.
>
> What are the issues with not using the compiler, rustc, directly?
> -----------------------------------------------------------------
> [whataretheissueswith] Back to [TOC]
>
> 1. Tooling
>Mostly writing up the build-sys tooling to do so. Ideally we'd
>compile everything without cargo but rustc directly.

Why would that be ideal?

>
>If we decide we need Rust's `std` library support, we could
>investigate whether building it from scratch is a good solution. This
>will only build the bits we need in our devices.

Whether or not to use std is a fundamental decision. It might be
difficult to back out of std later on. This is something that should be
discussed in more detail.

Do you want to avoid std for maximum flexibility in the future, or are
there QEMU use cases today where std is unavailable?

>
> 2. Rust dependencies
>We could go without them completely. I chose deliberately to include
>one dependency in my UART implementation, `bilge`[0], because it has
>an elegant way of representing typed bitfields for the UART's
>registers.
>
> [0]: Article: https://hecatia-elegua.github.io/blog/no-more-bit-fiddling/
>  Crates.io page: https://crates.io/crates/bilge
>  Repository: https://github.com/hecatia-elegua/bilge

I guess there will be interest in using rust-vmm crates in some way.

Bindings to platform features that are not available in core or std
will also be desirable. We probably don't want to reinvent them.

>
> Should QEMU use third-party dependencies?
> -----------------------------------------
> [shouldqemuusethirdparty] Back to 

Re: [PATCH 0/2] virtio-fs: change handling of failure at request enqueue

2024-06-10 Thread Stefan Hajnoczi
On Fri, May 17, 2024 at 09:04:33PM +0200, Peter-Jan Gootzen wrote:
> This patch set aims to improve the latencies of virtio-fs requests when
> the system is under high load, namely when the application's IO depth
> is greater than the size of the Virtio queue.
> 
> We found that the handling of -ENOMEM when enqueueing requests is
> inconsistent with other parts of the kernel (e.g. FUSE) and also
> obstructs optimizing the enqueueing behavior.
> 
> It is important to first remove the -ENOMEM behavior as the new style of
> retrying virtio-fs requests in patch 2/2 is only correct in case of
> -ENOSPC. With -ENOMEM the failed enqueue might never be retried as there
> might not be another completion to trigger retrying the enqueue.
> 
> Note that this patch series is a revival of my patch that was last
> discussed in the mailing list on 2023-08-16.
> 
> Peter-Jan Gootzen (2):
>   virtio-fs: let -ENOMEM bubble up or burst gently
>   virtio-fs: improved request latencies when Virtio queue is full
> 
>  fs/fuse/virtio_fs.c | 40 ++--
>  1 file changed, 22 insertions(+), 18 deletions(-)
> 
> -- 
> 2.34.1
> 

This is a nice improvement, thank you!

Reviewed-by: Stefan Hajnoczi 
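
The argument in the quoted cover letter for why only -ENOSPC is safe to
retry from the completion path can be illustrated with a toy model.
ToyQueue and its methods are hypothetical names and this is not the
fs/fuse/virtio_fs.c code. The only property modelled is that a full
queue implies at least one request in flight, so a completion is
guaranteed to arrive and drain the pending list. A failure such as
-ENOMEM carries no such guarantee.

```python
import errno
from collections import deque


class ToyQueue:
    """Fixed-size queue that parks overflow requests until a completion."""

    def __init__(self, size):
        self.size = size
        self.in_flight = []      # requests currently owned by the "device"
        self.pending = deque()   # requests waiting for a free slot

    def try_enqueue(self, req):
        if len(self.in_flight) >= self.size:
            return -errno.ENOSPC
        self.in_flight.append(req)
        return 0

    def submit(self, req):
        ret = self.try_enqueue(req)
        if ret == -errno.ENOSPC:
            # Safe to park: the queue is full, so a completion must follow.
            self.pending.append(req)
        return ret

    def complete(self):
        """Complete the oldest in-flight request, then retry parked ones."""
        done = self.in_flight.pop(0)
        while self.pending and len(self.in_flight) < self.size:
            self.in_flight.append(self.pending.popleft())
        return done
```

If submit() parked requests on an error that can occur with an empty
queue, complete() might never run and the parked request would be stuck.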




Re: [PATCH v5 01/10] block: add persistent reservation in/out api

2024-06-10 Thread Stefan Hajnoczi
On Thu, Jun 06, 2024 at 08:24:35PM +0800, Changqi Lu wrote:
> Add persistent reservation in/out operations
> at the block level. The following operations
> are included:
> 
> - read_keys:retrieves the list of registered keys.
> - read_reservation: retrieves the current reservation status.
> - register: registers a new reservation key.
> - reserve:  initiates a reservation for a specific key.
> - release:  releases a reservation for a specific key.
> - clear:clears all existing reservations.
> - preempt:  preempts a reservation held by another key.
> 
> Signed-off-by: Changqi Lu 
> Signed-off-by: zhenwei pi 
> ---
>  block/block-backend.c | 397 ++
>  block/io.c| 163 
>  include/block/block-common.h  |  40 +++
>  include/block/block-io.h  |  20 ++
>  include/block/block_int-common.h  |  84 +++
>  include/sysemu/block-backend-io.h |  24 ++
>  6 files changed, 728 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index db6f9b92a3..6707d94df7 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1770,6 +1770,403 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned 
> long int req, void *buf,
>  return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, 
> opaque);
>  }
>  
> +typedef struct BlkPrInCo {
> +BlockBackend *blk;
> +uint32_t *generation;
> +uint32_t num_keys;
> +BlockPrType *type;
> +uint64_t *keys;
> +int ret;
> +} BlkPrInCo;
> +
> +typedef struct BlkPrInCB {
> +BlockAIOCB common;
> +BlkPrInCo prco;
> +bool has_returned;
> +} BlkPrInCB;
> +
> +static const AIOCBInfo blk_pr_in_aiocb_info = {
> +.aiocb_size = sizeof(BlkPrInCB),
> +};
> +
> +static void blk_pr_in_complete(BlkPrInCB *acb)
> +{
> +if (acb->has_returned) {
> +acb->common.cb(acb->common.opaque, acb->prco.ret);
> +blk_dec_in_flight(acb->prco.blk);

Did you receive my replies to v1 of this patch series?

Please take a look at them and respond:
https://lore.kernel.org/qemu-devel/20240508093629.441057-1-luchangqi@bytedance.com/

Thanks,
Stefan

> +qemu_aio_unref(acb);
> +}
> +}
> +
> +static void blk_pr_in_complete_bh(void *opaque)
> +{
> +BlkPrInCB *acb = opaque;
> +assert(acb->has_returned);
> +blk_pr_in_complete(acb);
> +}
> +
> +static BlockAIOCB *blk_aio_pr_in(BlockBackend *blk, uint32_t *generation,
> + uint32_t num_keys, BlockPrType *type,
> + uint64_t *keys, CoroutineEntry co_entry,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> +BlkPrInCB *acb;
> +Coroutine *co;
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_pr_in_aiocb_info, blk, cb, opaque);
> +acb->prco = (BlkPrInCo) {
> +.blk= blk,
> +.generation = generation,
> +.num_keys   = num_keys,
> +.type   = type,
> +.ret= NOT_DONE,
> +.keys   = keys,
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(co_entry, acb);
> +aio_co_enter(qemu_get_current_aio_context(), co);
> +
> +acb->has_returned = true;
> +if (acb->prco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
> + blk_pr_in_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +/* To be called between exactly one pair of blk_inc/dec_in_flight() */
> +static int coroutine_fn
> +blk_aio_pr_do_read_keys(BlockBackend *blk, uint32_t *generation,
> +uint32_t num_keys, uint64_t *keys)
> +{
> +IO_CODE();
> +
> +blk_wait_while_drained(blk);
> +GRAPH_RDLOCK_GUARD();
> +
> +if (!blk_co_is_available(blk)) {
> +return -ENOMEDIUM;
> +}
> +
> +return bdrv_co_pr_read_keys(blk_bs(blk), generation, num_keys, keys);
> +}
> +
> +static void coroutine_fn blk_aio_pr_read_keys_entry(void *opaque)
> +{
> +BlkPrInCB *acb = opaque;
> +BlkPrInCo *prco = &acb->prco;
> +
> +prco->ret = blk_aio_pr_do_read_keys(prco->blk, prco->generation,
> +prco->num_keys, prco->keys);
> +blk_pr_in_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation,
> + uint32_t num_keys, uint64_t *keys,
> + BlockCompletionFunc *cb, void *opaque)
> +{
> +IO_CODE();
> +return blk_aio_pr_in(blk, generation, num_keys, NULL, keys,
> + blk_aio_pr_read_keys_entry, cb, opaque);
> +}
> +
> +/* To be called between exactly one pair of blk_inc/dec_in_flight() */
> +static int coroutine_fn
> +blk_aio_pr_do_read_reservation(BlockBackend *blk, uint32_t *generation,
> +   uint64_t *key, BlockPrType 

Re: [PATCH v5 00/10] Support persistent reservation operations

2024-06-10 Thread Stefan Hajnoczi
On Thu, Jun 06, 2024 at 08:24:34PM +0800, Changqi Lu wrote:
> Hi,
> 
> patchv5 has been modified. 
> 
> Sincerely hope that everyone can help review the
> code and provide some suggestions.
> 
> v4->v5:
> - Fixed a memory leak bug at hw/nvme/ctrl.c.
> 
> v3->v4:
> - At the nvme layer, the two patches of enabling the ONCS
>   function and enabling rescap are combined into one.
> - At the nvme layer, add helper functions for pr capacity
>   conversion between the block layer and the nvme layer.
> 
> v2->v3:
> In v2 Persist Through Power Loss(PTPL) is enable default.
> In v3 PTPL is supported, which is passed as a parameter.
> 
> v1->v2:
> - Add sg_persist --report-capabilities for SCSI protocol and enable
>   oncs and rescap for NVMe protocol.
> - Add persistent reservation capabilities constants and helper functions for
>   SCSI and NVMe protocol.
> - Add comments for necessary APIs.
> 
> v1:
> - Add seven APIs about persistent reservation command for block layer.
>   These APIs including reading keys, reading reservations, registering,
>   reserving, releasing, clearing and preempting.
> - Add the necessary pr-related operation APIs for both the
>   SCSI protocol and NVMe protocol at the device layer.
> - Add scsi driver at the driver layer to verify the functions

My question from v1 is unanswered:

  What is the relationship to the existing PRManager functionality
  (docs/interop/pr-helper.rst) where block/file-posix.c interprets SCSI
  ioctls and sends persistent reservation requests to an external helper
  process?

  I wonder if block/file-posix.c can implement the new block driver
  callbacks using pr_mgr (while keeping the existing scsi-generic
  support).

Thanks,
Stefan

> 
> 
> Changqi Lu (10):
>   block: add persistent reservation in/out api
>   block/raw: add persistent reservation in/out driver
>   scsi/constant: add persistent reservation in/out protocol constants
>   scsi/util: add helper functions for persistent reservation types
> conversion
>   hw/scsi: add persistent reservation in/out api for scsi device
>   block/nvme: add reservation command protocol constants
>   hw/nvme: add helper functions for converting reservation types
>   hw/nvme: enable ONCS and rescap function
>   hw/nvme: add reservation protocal command
>   block/iscsi: add persistent reservation in/out driver
> 
>  block/block-backend.c             | 397 ++
>  block/io.c                        | 163 +++
>  block/iscsi.c                     | 443 ++
>  block/raw-format.c                |  56 
>  hw/nvme/ctrl.c                    | 326 +-
>  hw/nvme/ns.c                      |   5 +
>  hw/nvme/nvme.h                    |  84 ++
>  hw/scsi/scsi-disk.c               | 352 
>  include/block/block-common.h      |  40 +++
>  include/block/block-io.h          |  20 ++
>  include/block/block_int-common.h  |  84 ++
>  include/block/nvme.h              |  98 +++
>  include/scsi/constants.h          |  52 
>  include/scsi/utils.h              |   8 +
>  include/sysemu/block-backend-io.h |  24 ++
>  scsi/utils.c                      |  81 ++
>  16 files changed, 2231 insertions(+), 2 deletions(-)
> 
> -- 
> 2.20.1
> 




[PULL 5/6] hw/vfio: Remove newline character in trace events

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

Trace events aren't designed to be multi-lines.
Remove the newline characters.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Mads Ynddal 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20240606103943.79116-5-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 hw/vfio/trace-events | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 64161bf6f4..e16179b507 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -19,7 +19,7 @@ vfio_msix_fixup(const char *name, int bar, uint64_t start, uint64_t end) " (%s)
 vfio_msix_relo(const char *name, int bar, uint64_t offset) " (%s) BAR %d offset 0x%"PRIx64""
 vfio_msi_enable(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
 vfio_msi_disable(const char *name) " (%s)"
-vfio_pci_load_rom(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_pci_load_rom(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device '%s' ROM: size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_rom_read(const char *name, uint64_t addr, int size, uint64_t data) " (%s, 0x%"PRIx64", 0x%x) = 0x%"PRIx64
 vfio_pci_size_rom(const char *name, int size) "%s ROM size 0x%x"
 vfio_vga_write(uint64_t addr, uint64_t data, int size) " (0x%"PRIx64", 0x%"PRIx64", %d)"
@@ -35,7 +35,7 @@ vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
 vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
 vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
-vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
+vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device '%s' config: size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(const char *errstr) "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
 vfio_attach_device(const char *name, int group_id) " (%s) group %d"
 vfio_detach_device(const char *name, int group_id) " (%s) group %d"
-- 
2.45.1




[PULL 6/6] tracetool: Forbid newline character in event format

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

Events aren't designed to be multi-lines. Multiple events
can be used instead. Prevent that format using multi-lines
by forbidding the newline character.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Mads Ynddal 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20240606103943.79116-6-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 scripts/tracetool/__init__.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 7237abe0e8..bc03238c0f 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -301,6 +301,8 @@ def build(line_str, lineno, filename):
         if fmt.endswith(r'\n"'):
             raise ValueError("Event format must not end with a newline "
                              "character")
+        if '\\n' in fmt:
+            raise ValueError("Event format must not use new line character")
 
         if len(fmt_trans) > 0:
             fmt = [fmt_trans, fmt]
-- 
2.45.1
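
The check in the patch above can be tried outside tracetool. The sketch below is a standalone approximation, not the tracetool code itself: `check_event_format` is a hypothetical helper name, and it assumes `fmt` holds the quoted format text exactly as written in a trace-events file, so a newline escape shows up as the two characters backslash and `n`.

```python
def check_event_format(fmt):
    """Reject trace event formats that contain a newline escape.

    `fmt` is assumed to be the quoted format text as it appears in a
    trace-events file, for example '"%s ROM size 0x%x"'.
    """
    # Pre-existing check: a newline escape right before the closing quote.
    if fmt.endswith(r'\n"'):
        raise ValueError("Event format must not end with a newline "
                         "character")
    # Check added by the patch: a newline escape anywhere in the format.
    if '\\n' in fmt:
        raise ValueError("Event format must not use new line character")
    return fmt


if __name__ == "__main__":
    samples = [
        '"Device %s ROM: size: 0x%lx, offset: 0x%lx, flags: 0x%lx"',
        r'"Device %s ROM:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"',
    ]
    for sample in samples:
        try:
            check_event_format(sample)
            print("accepted:", sample)
        except ValueError as err:
            print("rejected:", sample, "->", err)
```

The raw-string sample mirrors the old vfio_pci_load_rom format, which is why the hw/ patches in this series had to land before this check.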




[PULL 4/6] hw/usb: Remove newline character in trace events

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

Trace events aren't designed to be multi-lines.
Remove the newline characters.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Mads Ynddal 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20240606103943.79116-4-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 hw/usb/trace-events | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/usb/trace-events b/hw/usb/trace-events
index fd7b90d70c..46732717a9 100644
--- a/hw/usb/trace-events
+++ b/hw/usb/trace-events
@@ -15,7 +15,7 @@ usb_ohci_exit(const char *s) "%s"
 
 # hcd-ohci.c
 usb_ohci_iso_td_read_failed(uint32_t addr) "ISO_TD read error at 0x%x"
-usb_ohci_iso_td_head(uint32_t head, uint32_t tail, uint32_t flags, uint32_t bp, uint32_t next, uint32_t be, uint32_t framenum, uint32_t startframe, uint32_t framecount, int rel_frame_num) "ISO_TD ED head 0x%.8x tailp 0x%.8x\n0x%.8x 0x%.8x 0x%.8x 0x%.8x\nframe_number 0x%.8x starting_frame 0x%.8x\nframe_count  0x%.8x relative %d"
+usb_ohci_iso_td_head(uint32_t head, uint32_t tail, uint32_t flags, uint32_t bp, uint32_t next, uint32_t be, uint32_t framenum, uint32_t startframe, uint32_t framecount, int rel_frame_num) "ISO_TD ED head 0x%.8x tailp 0x%.8x, flags 0x%.8x bp 0x%.8x next 0x%.8x be 0x%.8x, frame_number 0x%.8x starting_frame 0x%.8x, frame_count 0x%.8x relative %d"
 usb_ohci_iso_td_head_offset(uint32_t o0, uint32_t o1, uint32_t o2, uint32_t o3, uint32_t o4, uint32_t o5, uint32_t o6, uint32_t o7) "0x%.8x 0x%.8x 0x%.8x 0x%.8x 0x%.8x 0x%.8x 0x%.8x 0x%.8x"
 usb_ohci_iso_td_relative_frame_number_neg(int rel) "ISO_TD R=%d < 0"
 usb_ohci_iso_td_relative_frame_number_big(int rel, int count) "ISO_TD R=%d > FC=%d"
@@ -23,7 +23,7 @@ usb_ohci_iso_td_bad_direction(int dir) "Bad direction %d"
 usb_ohci_iso_td_bad_bp_be(uint32_t bp, uint32_t be) "ISO_TD bp 0x%.8x be 0x%.8x"
 usb_ohci_iso_td_bad_cc_not_accessed(uint32_t start, uint32_t next) "ISO_TD cc != not accessed 0x%.8x 0x%.8x"
 usb_ohci_iso_td_bad_cc_overrun(uint32_t start, uint32_t next) "ISO_TD start_offset=0x%.8x > next_offset=0x%.8x"
-usb_ohci_iso_td_so(uint32_t so, uint32_t eo, uint32_t s, uint32_t e, const char *str, ssize_t len, int ret) "0x%.8x eo 0x%.8x\nsa 0x%.8x ea 0x%.8x\ndir %s len %zu ret %d"
+usb_ohci_iso_td_so(uint32_t so, uint32_t eo, uint32_t s, uint32_t e, const char *str, ssize_t len, int ret) "0x%.8x eo 0x%.8x sa 0x%.8x ea 0x%.8x dir %s len %zu ret %d"
 usb_ohci_iso_td_data_overrun(int ret, ssize_t len) "DataOverrun %d > %zu"
 usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
 usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
@@ -55,7 +55,7 @@ usb_ohci_td_pkt_full(const char *dir, const char *buf) "%s data: %s"
 usb_ohci_td_too_many_pending(int ep) "ep=%d"
 usb_ohci_td_packet_status(int status) "status=%d"
 usb_ohci_ed_read_error(uint32_t addr) "ED read error at 0x%x"
-usb_ohci_ed_pkt(uint32_t cur, int h, int c, uint32_t head, uint32_t tail, uint32_t next) "ED @ 0x%.8x h=%u c=%u\n  head=0x%.8x tailp=0x%.8x next=0x%.8x"
+usb_ohci_ed_pkt(uint32_t cur, int h, int c, uint32_t head, uint32_t tail, uint32_t next) "ED @ 0x%.8x h=%u c=%u head=0x%.8x tailp=0x%.8x next=0x%.8x"
 usb_ohci_ed_pkt_flags(uint32_t fa, uint32_t en, uint32_t d, int s, int k, int f, uint32_t mps) "fa=%u en=%u d=%u s=%u k=%u f=%u mps=%u"
 usb_ohci_hcca_read_error(uint32_t addr) "HCCA read error at 0x%x"
 usb_ohci_mem_read(uint32_t size, const char *name, uint32_t addr, uint32_t offs, uint32_t val) "%d %s 0x%x %d -> 0x%x"
-- 
2.45.1




[PULL 3/6] hw/sh4: Remove newline character in trace events

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

Trace events aren't designed to be multi-lines. Remove
the newline character which doesn't bring much value.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Mads Ynddal 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20240606103943.79116-3-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 hw/sh4/trace-events | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/sh4/trace-events b/hw/sh4/trace-events
index 4b61cd56c8..6bfd7eebc4 100644
--- a/hw/sh4/trace-events
+++ b/hw/sh4/trace-events
@@ -1,3 +1,3 @@
 # sh7750.c
-sh7750_porta(uint16_t prev, uint16_t cur, uint16_t pdtr, uint16_t pctr) "porta changed from 0x%04x to 0x%04x\npdtra=0x%04x, pctra=0x%08x"
-sh7750_portb(uint16_t prev, uint16_t cur, uint16_t pdtr, uint16_t pctr) "portb changed from 0x%04x to 0x%04x\npdtrb=0x%04x, pctrb=0x%08x"
+sh7750_porta(uint16_t prev, uint16_t cur, uint16_t pdtr, uint16_t pctr) "porta changed from 0x%04x to 0x%04x (pdtra=0x%04x, pctra=0x%08x)"
+sh7750_portb(uint16_t prev, uint16_t cur, uint16_t pdtr, uint16_t pctr) "portb changed from 0x%04x to 0x%04x (pdtrb=0x%04x, pctrb=0x%08x)"
-- 
2.45.1




[PULL 2/6] backends/tpm: Remove newline character in trace event

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

Split the 'tpm_util_show_buffer' event in two to avoid
using a newline character.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Mads Ynddal 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Stefan Berger 
Message-id: 20240606103943.79116-2-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 backends/tpm/tpm_util.c   | 5 +++--
 backends/tpm/trace-events | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/backends/tpm/tpm_util.c b/backends/tpm/tpm_util.c
index 1856589c3b..cf138551df 100644
--- a/backends/tpm/tpm_util.c
+++ b/backends/tpm/tpm_util.c
@@ -339,10 +339,11 @@ void tpm_util_show_buffer(const unsigned char *buffer,
     size_t len, i;
     char *line_buffer, *p;
 
-    if (!trace_event_get_state_backends(TRACE_TPM_UTIL_SHOW_BUFFER)) {
+    if (!trace_event_get_state_backends(TRACE_TPM_UTIL_SHOW_BUFFER_CONTENT)) {
         return;
     }
     len = MIN(tpm_cmd_get_size(buffer), buffer_size);
+    trace_tpm_util_show_buffer_header(string, len);
 
     /*
      * allocate enough room for 3 chars per buffer entry plus a
@@ -356,7 +357,7 @@ void tpm_util_show_buffer(const unsigned char *buffer,
         }
         p += sprintf(p, "%.2X ", buffer[i]);
     }
-    trace_tpm_util_show_buffer(string, len, line_buffer);
+    trace_tpm_util_show_buffer_content(line_buffer);
 
     g_free(line_buffer);
 }
diff --git a/backends/tpm/trace-events b/backends/tpm/trace-events
index 1ecef42a07..cb5cfa6510 100644
--- a/backends/tpm/trace-events
+++ b/backends/tpm/trace-events
@@ -10,7 +10,8 @@ tpm_util_get_buffer_size_len(uint32_t len, size_t expected) "tpm_resp->len = %u,
 tpm_util_get_buffer_size_hdr_len2(uint32_t len, size_t expected) "tpm2_resp->hdr.len = %u, expected = %zu"
 tpm_util_get_buffer_size_len2(uint32_t len, size_t expected) "tpm2_resp->len = %u, expected = %zu"
 tpm_util_get_buffer_size(size_t len) "buffersize of device: %zu"
-tpm_util_show_buffer(const char *direction, size_t len, const char *buf) "direction: %s len: %zu\n%s"
+tpm_util_show_buffer_header(const char *direction, size_t len) "direction: %s len: %zu"
+tpm_util_show_buffer_content(const char *buf) "%s"
 
 # tpm_emulator.c
 tpm_emulator_set_locality(uint8_t locty) "setting locality to %d"
-- 
2.45.1




[PULL 1/6] tracetool: Remove unused vcpu.py script

2024-06-10 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

vcpu.py is pointless since commit 89aafcf2a7 ("trace:
remove code that depends on setting vcpu"), remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Zhao Liu 
Message-id: 20240606102631.78152-1-phi...@linaro.org
Signed-off-by: Stefan Hajnoczi 
---
 meson.build                   |  1 -
 scripts/tracetool/__init__.py |  8 +
 scripts/tracetool/vcpu.py     | 59 ---
 3 files changed, 1 insertion(+), 67 deletions(-)
 delete mode 100644 scripts/tracetool/vcpu.py

diff --git a/meson.build b/meson.build
index ec59effca2..91278667ea 100644
--- a/meson.build
+++ b/meson.build
@@ -3232,7 +3232,6 @@ tracetool_depends = files(
   'scripts/tracetool/format/log_stap.py',
   'scripts/tracetool/format/stap.py',
   'scripts/tracetool/__init__.py',
-  'scripts/tracetool/vcpu.py'
 )
 
 qemu_version_cmd = [find_program('scripts/qemu-version.sh'),
diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index b887540a55..7237abe0e8 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -306,13 +306,7 @@ def build(line_str, lineno, filename):
             fmt = [fmt_trans, fmt]
         args = Arguments.build(groups["args"])
 
-        event = Event(name, props, fmt, args, lineno, filename)
-
-        # add implicit arguments when using the 'vcpu' property
-        import tracetool.vcpu
-        event = tracetool.vcpu.transform_event(event)
-
-        return event
+        return Event(name, props, fmt, args, lineno, filename)
 
     def __repr__(self):
         """Evaluable string representation for this object."""
diff --git a/scripts/tracetool/vcpu.py b/scripts/tracetool/vcpu.py
deleted file mode 100644
index d232cb1d06..00
--- a/scripts/tracetool/vcpu.py
+++ /dev/null
@@ -1,59 +0,0 @@
-# -*- coding: utf-8 -*-
-
-"""
-Generic management for the 'vcpu' property.
-
-"""
-
-__author__ = "Lluís Vilanova "
-__copyright__  = "Copyright 2016, Lluís Vilanova "
-__license__= "GPL version 2 or (at your option) any later version"
-
-__maintainer__ = "Stefan Hajnoczi"
-__email__  = "stefa...@redhat.com"
-
-
-from tracetool import Arguments, try_import
-
-
-def transform_event(event):
-"""Transform event to comply with the 'vcpu' property (if present)."""
-if "vcpu" in event.properties:
-event.args = Arguments([("void *", "__cpu"), event.args])
-fmt = "\"cpu=%p \""
-event.fmt = fmt + event.fmt
-return event
-
-
-def transform_args(format, event, *args, **kwargs):
-"""Transforms the arguments to suit the specified format.
-
-The format module must implement function 'vcpu_args', which receives the
-implicit arguments added by the 'vcpu' property, and must return suitable
-arguments for the given format.
-
-The function is only called for events with the 'vcpu' property.
-
-Parameters
-==
-format : str
-Format module name.
-event : Event
-args, kwargs
-Passed to 'vcpu_transform_args'.
-
-Returns
-===
-Arguments
-The transformed arguments, including the non-implicit ones.
-
-"""
-if "vcpu" in event.properties:
-ok, func = try_import("tracetool.format." + format,
-  "vcpu_transform_args")
-assert ok
-assert func
-return Arguments([func(event.args[:1], *args, **kwargs),
-  event.args[1:]])
-else:
-return event.args
-- 
2.45.1




[PULL 0/6] Tracing patches

2024-06-10 Thread Stefan Hajnoczi
The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

  Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to 4c2b6f328742084a5bd770af7c3a2ef07828c41c:

  tracetool: Forbid newline character in event format (2024-06-10 13:05:27 -0400)


Pull request

Cleanups from Philippe Mathieu-Daudé.



Philippe Mathieu-Daudé (6):
  tracetool: Remove unused vcpu.py script
  backends/tpm: Remove newline character in trace event
  hw/sh4: Remove newline character in trace events
  hw/usb: Remove newline character in trace events
  hw/vfio: Remove newline character in trace events
  tracetool: Forbid newline character in event format

 meson.build                   |  1 -
 backends/tpm/tpm_util.c       |  5 +--
 backends/tpm/trace-events     |  3 +-
 hw/sh4/trace-events           |  4 +--
 hw/usb/trace-events           |  6 ++--
 hw/vfio/trace-events          |  4 +--
 scripts/tracetool/__init__.py | 10 ++
 scripts/tracetool/vcpu.py     | 59 ---
 8 files changed, 15 insertions(+), 77 deletions(-)
 delete mode 100644 scripts/tracetool/vcpu.py

-- 
2.45.1




Re: [PATCH 0/5] trace: Remove and forbid newline characters in event format

2024-06-10 Thread Stefan Hajnoczi
On Thu, Jun 06, 2024 at 12:39:38PM +0200, Philippe Mathieu-Daudé wrote:
> Trace events aren't designed to be multi-lines.
> Few format use the newline character: remove it
> and forbid further uses.
> 
> Philippe Mathieu-Daudé (5):
>   backends/tpm: Remove newline character in trace event
>   hw/sh4: Remove newline character in trace events
>   hw/usb: Remove newline character in trace events
>   hw/vfio: Remove newline character in trace events
>   tracetool: Forbid newline character in event format
> 
>  backends/tpm/tpm_util.c       | 5 +++--
>  backends/tpm/trace-events     | 3 ++-
>  hw/sh4/trace-events           | 4 ++--
>  hw/usb/trace-events           | 6 +++---
>  hw/vfio/trace-events          | 4 ++--
>  scripts/tracetool/__init__.py | 2 ++
>  6 files changed, 14 insertions(+), 10 deletions(-)
> 
> -- 
> 2.41.0
> 

Thanks, applied to my tracing tree:
https://gitlab.com/stefanha/qemu/commits/tracing

Stefan




Re: [PATCH] tracetool: Remove unused vcpu.py script

2024-06-10 Thread Stefan Hajnoczi
On Thu, Jun 06, 2024 at 12:26:31PM +0200, Philippe Mathieu-Daudé wrote:
> vcpu.py is pointless since commit 89aafcf2a7 ("trace:
> remove code that depends on setting vcpu"), remove it.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  meson.build                   |  1 -
>  scripts/tracetool/__init__.py |  8 +
>  scripts/tracetool/vcpu.py     | 59 ---
>  3 files changed, 1 insertion(+), 67 deletions(-)
>  delete mode 100644 scripts/tracetool/vcpu.py

Thanks, applied to my tracing tree:
https://gitlab.com/stefanha/qemu/commits/tracing

Stefan




Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

2024-06-06 Thread Stefan Hajnoczi
On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
> The fact that the snapshot_save_job_bh() is scheduled in the main
> loop's qemu_aio_context AioContext means that it might get executed
> during a vCPU thread's aio_poll(). But saving of the VM state cannot
> happen while the guest or devices are active and can lead to assertion
> failures. See issue #2111 for two examples. Avoid the problem by
> scheduling the snapshot_save_job_bh() in the iohandler AioContext,
> which is not polled by vCPU threads.
> 
> Solves Issue #2111.
> 
> This change also solves the following issue:
> 
> Since commit effd60c878 ("monitor: only run coroutine commands in
> qemu_aio_context"), the 'snapshot-save' QMP call would not respond
> right after starting the job anymore, but only after the job finished,
> which can take a long time. The reason is, because after commit
> effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
> When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
> coroutine cannot be entered immediately anymore, but needs to be
> scheduled to the main loop's qemu_aio_context AioContext. But
> snapshot_save_job_bh() was scheduled first to the same AioContext and
> thus gets executed first.
> 
> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
> Signed-off-by: Fiona Ebner 
> ---
> 
> While initial smoke testing seems fine, I'm not familiar enough with
> this to rule out any pitfalls with the approach. Any reason why
> scheduling to the iohandler AioContext could be wrong here?

If something waits for a BlockJob to finish using aio_poll() from
qemu_aio_context then a deadlock is possible since the iohandler_ctx
won't get a chance to execute. The only suspicious code path I found was
job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
sure whether it triggers this scenario. Please check that code path.

> Should the same be done for the snapshot_load_job_bh and
> snapshot_delete_job_bh to keep it consistent?

In the long term it would be cleaner to move away from synchronous APIs
that rely on nested event loops. They have been a source of bugs for
years.

If vm_stop() and perhaps other operations in save_snapshot() were
asynchronous, then it would be safe to run the operation in
qemu_aio_context without using iohandler_ctx. vm_stop() wouldn't invoke
its callback until vCPUs were quiesced and outside device emulation
code.

I think this patch is fine as a one-line bug fix, but we should be
careful about falling back on this trick because it makes the codebase
harder to understand and more fragile.

> 
>  migration/savevm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c621f2359b..0086b76ab0 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -3459,7 +3459,7 @@ static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
>  SnapshotJob *s = container_of(job, SnapshotJob, common);
>  s->errp = errp;
>  s->co = qemu_coroutine_self();
> -aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +aio_bh_schedule_oneshot(iohandler_get_aio_context(),
>  snapshot_save_job_bh, job);
>  qemu_coroutine_yield();
>  return s->ret ? 0 : -1;
> -- 
> 2.39.2




Re: [PATCH] target/s390x: Fix tracing header path in TCG mem_helper.c

2024-06-06 Thread Stefan Hajnoczi
On Thu, Jun 06, 2024 at 12:30:26PM +0200, Philippe Mathieu-Daudé wrote:
> Commit c9274b6bf0 ("target/s390x: start moving TCG-only code
> to tcg/") moved mem_helper.c, but the trace-events file is
> still in the parent directory, so is the generated trace.h.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Ideally we should only use trace events from current directory.

Yes, that would be cleaner. Is it possible to move the relevant trace
events to the trace-events file in target/s390x/tcg/?

> ---
>  target/s390x/tcg/mem_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
> index 6a308c5553..1fb6cbb6cf 100644
> --- a/target/s390x/tcg/mem_helper.c
> +++ b/target/s390x/tcg/mem_helper.c
> @@ -30,7 +30,7 @@
>  #include "hw/core/tcg-cpu-ops.h"
>  #include "qemu/int128.h"
>  #include "qemu/atomic128.h"
> -#include "trace.h"
> +#include "../trace.h"
>  
>  #if !defined(CONFIG_USER_ONLY)
>  #include "hw/s390x/storage-keys.h"
> -- 
> 2.41.0
> 




Re: [RFC PATCH 1/1] vhost-user: add shmem mmap request

2024-06-05 Thread Stefan Hajnoczi
On Wed, Jun 5, 2024, 12:02 David Hildenbrand  wrote:

> On 05.06.24 17:19, Stefan Hajnoczi wrote:
> > On Wed, 5 Jun 2024 at 10:29, Stefan Hajnoczi 
> wrote:
> >>
> >> On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
> >>> On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi 
> wrote:
> >>>
> >>>> On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> >>>>> Add SHMEM_MAP/UNMAP requests to vhost-user.
> >>>>>
> >>>>> This request allows backends to dynamically map
> >>>>> fds into a shared memory region identified by
> >>>>
> >>>> Please call this "VIRTIO Shared Memory Region" everywhere (code,
> >>>> vhost-user spec, commit description, etc) so it's clear that this is
> not
> >>>> about vhost-user shared memory tables/regions.
> >>>>
> >>>>> its `shmid`. Then, the fd memory is advertised
> >>>>> to the frontend through a BAR+offset, so it can
> >>>>> be read by the driver while it's valid.
> >>>>
> >>>> Why is a PCI BAR mentioned here? vhost-user does not know about the
> >>>> VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> >>>> report VIRTIO Shared Memory Regions to the driver.
> >>>>
> >>>>
> >>> I will remove PCI BAR, as it is true that it depends on the
> >>> transport. I was trying to explain that the driver
> >>> will use the shm_base + shm_offset to access
> >>> the mapped memory.
> >>>
> >>>
> >>>>>
> >>>>> Then, the backend can munmap the memory range
> >>>>> in a given shared memory region (again, identified
> >>>>> by its `shmid`), to free it. After this, the
> >>>>> region becomes private and shall not be accessed
> >>>>> by the frontend anymore.
> >>>>
> >>>> What does "private" mean?
> >>>>
> >>>> The frontend must mmap PROT_NONE to reserve the virtual memory space
> >>>> when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> >>>> unrelated mmap(NULL, ...) might use that address range and the guest
> >>>> would have access to the host memory! This is a security issue and
> needs
> >>>> to be mentioned explicitly in the spec.
> >>>>
> >>>
> >>> I mentioned private because it changes the mapping from MAP_SHARED
> >>> to MAP_PRIVATE. I will highlight PROT_NONE instead.
> >>
> >> I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
> >> mean mmap flags or something like the memory range is no longer
> >> accessible to the driver.
> >
> > One more thing: please check whether kvm.ko memory regions need to be
> > modified or split to match the SHMEM_MAP mapping's read/write
> > permissions.
> >
> > The VIRTIO Shared Memory Area pages can have PROT_READ, PROT_WRITE,
> > PROT_READ|PROT_WRITE, or PROT_NONE.
> >
> > kvm.ko memory regions are read/write or read-only. I'm not sure what
> > happens when the guest accesses a kvm.ko memory region containing
> > mappings with permissions more restrictive than its kvm.ko memory
> > region.
>
> IIRC, the KVM R/O memory region requests could allow to further reduce
> permissions (assuming your mmap is R/W you could map it R/O into the KVM
> MMU), but I might remember things incorrectly.
>

I'm thinking about the opposite case where KVM is configured for R/W but
the mmap is more restrictive. This patch series makes this scenario
possible.


>
> > In other words, the kvm.ko memory region would allow the
> > access but the Linux virtual memory configuration would cause a page
> > fault.
> >
> > For example, imagine a QEMU MemoryRegion containing a SHMEM_MAP
> > mapping with PROT_READ. The kvm.ko memory region would be read/write
> > (unless extra steps were taken to tell kvm.ko about the permissions).
> > When the guest stores to the PROT_READ page, kvm.ko will process a
> > fault...and I'm not sure what happens next.
> >
> > A similar scenario occurs when a PROT_NONE mapping exists within a
> > kvm.ko memory region. I don't remember how kvm.ko behaves when the
> > guest tries to access the pages.
> >
> > It's worth figuring this out before going further because it could
> > become tricky if issues are discovered later. I have CCed David

Re: [RFC PATCH 1/1] vhost-user: add shmem mmap request

2024-06-05 Thread Stefan Hajnoczi
On Wed, 5 Jun 2024 at 10:29, Stefan Hajnoczi  wrote:
>
> On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
> > On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi  wrote:
> >
> > > On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> > > > Add SHMEM_MAP/UNMAP requests to vhost-user.
> > > >
> > > > This request allows backends to dynamically map
> > > > fds into a shared memory region identified by
> > >
> > > Please call this "VIRTIO Shared Memory Region" everywhere (code,
> > > vhost-user spec, commit description, etc) so it's clear that this is not
> > > about vhost-user shared memory tables/regions.
> > >
> > > > its `shmid`. Then, the fd memory is advertised
> > > > to the frontend through a BAR+offset, so it can
> > > > be read by the driver while it's valid.
> > >
> > > Why is a PCI BAR mentioned here? vhost-user does not know about the
> > > VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> > > report VIRTIO Shared Memory Regions to the driver.
> > >
> > >
> > I will remove PCI BAR, as it is true that it depends on the
> > transport. I was trying to explain that the driver
> > will use the shm_base + shm_offset to access
> > the mapped memory.
> >
> >
> > > >
> > > > Then, the backend can munmap the memory range
> > > > in a given shared memory region (again, identified
> > > > by its `shmid`), to free it. After this, the
> > > > region becomes private and shall not be accessed
> > > > by the frontend anymore.
> > >
> > > What does "private" mean?
> > >
> > > The frontend must mmap PROT_NONE to reserve the virtual memory space
> > > when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> > > unrelated mmap(NULL, ...) might use that address range and the guest
> > > would have access to the host memory! This is a security issue and needs
> > > to be mentioned explicitly in the spec.
> > >
> >
> > I mentioned private because it changes the mapping from MAP_SHARED
> > to MAP_PRIVATE. I will highlight PROT_NONE instead.
>
> I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
> mean mmap flags or something like the memory range is no longer
> accessible to the driver.

One more thing: please check whether kvm.ko memory regions need to be
modified or split to match the SHMEM_MAP mapping's read/write
permissions.

The VIRTIO Shared Memory Area pages can have PROT_READ, PROT_WRITE,
PROT_READ|PROT_WRITE, or PROT_NONE.

kvm.ko memory regions are read/write or read-only. I'm not sure what
happens when the guest accesses a kvm.ko memory region containing
mappings with permissions more restrictive than its kvm.ko memory
region. In other words, the kvm.ko memory region would allow the
access but the Linux virtual memory configuration would cause a page
fault.

For example, imagine a QEMU MemoryRegion containing a SHMEM_MAP
mapping with PROT_READ. The kvm.ko memory region would be read/write
(unless extra steps were taken to tell kvm.ko about the permissions).
When the guest stores to the PROT_READ page, kvm.ko will process a
fault...and I'm not sure what happens next.

A similar scenario occurs when a PROT_NONE mapping exists within a
kvm.ko memory region. I don't remember how kvm.ko behaves when the
guest tries to access the pages.

It's worth figuring this out before going further because it could
become tricky if issues are discovered later. I have CCed David
Hildenbrand in case he knows.

Stefan



Re: [RFC PATCH 1/1] vhost-user: add shmem mmap request

2024-06-05 Thread Stefan Hajnoczi
On Wed, Jun 05, 2024 at 10:13:32AM +0200, Albert Esteve wrote:
> On Tue, Jun 4, 2024 at 8:54 PM Stefan Hajnoczi  wrote:
> 
> > On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> > > Add SHMEM_MAP/UNMAP requests to vhost-user.
> > >
> > > This request allows backends to dynamically map
> > > fds into a shared memory region identified by
> >
> > Please call this "VIRTIO Shared Memory Region" everywhere (code,
> > vhost-user spec, commit description, etc) so it's clear that this is not
> > about vhost-user shared memory tables/regions.
> >
> > > its `shmid`. Then, the fd memory is advertised
> > > to the frontend through a BAR+offset, so it can
> > > be read by the driver while it's valid.
> >
> > Why is a PCI BAR mentioned here? vhost-user does not know about the
> > VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
> > report VIRTIO Shared Memory Regions to the driver.
> >
> >
> I will remove PCI BAR, as it is true that it depends on the
> transport. I was trying to explain that the driver
> will use the shm_base + shm_offset to access
> the mapped memory.
> 
> 
> > >
> > > Then, the backend can munmap the memory range
> > > in a given shared memory region (again, identified
> > > by its `shmid`), to free it. After this, the
> > > region becomes private and shall not be accessed
> > > by the frontend anymore.
> >
> > What does "private" mean?
> >
> > The frontend must mmap PROT_NONE to reserve the virtual memory space
> > when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
> > unrelated mmap(NULL, ...) might use that address range and the guest
> > would have access to the host memory! This is a security issue and needs
> > to be mentioned explicitly in the spec.
> >
> 
> I mentioned private because it changes the mapping from MAP_SHARED
> to MAP_PRIVATE. I will highlight PROT_NONE instead.

I see. Then "MAP_PRIVATE" would be clearer. I wasn't sure whether you
mean mmap flags or something like the memory range is no longer
accessible to the driver.

> 
> 
> >
> > >
> > > Initializing the memory region is the responsibility
> > > of the PCI device that will be using it.
> >
> > What does this mean?
> >
> 
> The MemoryRegion is declared in `struct VirtIODevice`,
> but it is uninitialized in this commit. So I was trying to say
> that the initialization will happen in, e.g., vhost-user-gpu-pci.c
> with something like `memory_region_init` , and later `pci_register_bar`.

Okay. The device model needs to create MemoryRegion instances for the
device's Shared Memory Regions and add them to the VirtIODevice.

--device vhost-user-device will need to query the backend since, unlike
vhost-user-gpu-pci and friends, it doesn't have knowledge of specific
device types. It will need to create MemoryRegions enumerated from the
backend.

By the way, the VIRTIO MMIO Transport also supports VIRTIO Shared Memory
Regions so this work should not be tied to PCI.

> 
> I am testing that part still.
> 
> 
> >
> > >
> > > Signed-off-by: Albert Esteve 
> > > ---
> > >  docs/interop/vhost-user.rst |  23 
> > >  hw/virtio/vhost-user.c  | 106 
> > >  hw/virtio/virtio.c  |   2 +
> > >  include/hw/virtio/virtio.h  |   3 +
> > >  4 files changed, 134 insertions(+)
> >
> > Two missing pieces:
> >
> > 1. QEMU's --device vhost-user-device needs a way to enumerate VIRTIO
> > Shared Memory Regions from the vhost-user backend. vhost-user-device is
> > a generic vhost-user frontend without knowledge of the device type, so
> > it doesn't know what the valid shmids are and what size the regions
> > have.
> >
> 
> Ok. I was assuming that if a device (backend) makes a request without a
> valid shmid or not enough size in the region to perform the mmap, would
> just fail in the VMM performing the actual mmap operation. So it would
> not necessarily need to know about valid shmids or region sizes.

But then --device vhost-user-device wouldn't be able to support VIRTIO
Shared Memory Regions, which means this patch series is incomplete. New
vhost-user features need to support both --device vhost-user--*
and --device vhost-user-device.

What's needed is:
1. New vhost-user messages so the frontend can query the shmids and
   sizes from the backend.
2. QEMU --device vhost-user-device code that queries the VIRTIO Shared
   Memory Regions from the vhost-user backend and then creates
   MemoryRegions for them.
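
As a rough sketch of the frontend side: once the regions have been
enumerated, each map request can be checked against them before any mmap
is attempted. The struct and function names below are made up for
illustration, and the table is assumed to have been filled in from the
reply to the new (not yet defined) query message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per VIRTIO Shared Memory Region reported by
 * the backend. */
struct shmem_region_info {
    uint8_t shmid;
    uint64_t size;
};

/* Find the region with the given shmid, or NULL if the backend never
 * reported it. */
static const struct shmem_region_info *
shmem_find(const struct shmem_region_info *tbl, size_t n, uint8_t shmid)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].shmid == shmid) {
            return &tbl[i];
        }
    }
    return NULL;
}

/* True if [shm_offset, shm_offset + len) lies inside the region.
 * Written so that the arithmetic cannot overflow. */
static bool shmem_request_in_bounds(const struct shmem_region_info *tbl,
                                    size_t n, uint8_t shmid,
                                    uint64_t shm_offset, uint64_t len)
{
    const struct shmem_region_info *r = shmem_find(tbl, n, shmid);

    if (!r || len == 0) {
        return false;
    }
    return shm_offset <= r->size && len <= r->size - shm_offset;
}
```

The same table would also give the generic frontend the sizes it needs
to create one MemoryRegion per shmid.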

> 
> 
> >
>

Re: [RFC PATCH 0/1] vhost-user: Add SHMEM_MAP/UNMAP requests

2024-06-05 Thread Stefan Hajnoczi
On Wed, Jun 05, 2024 at 09:24:36AM +0200, Albert Esteve wrote:
> On Tue, Jun 4, 2024 at 8:16 PM Stefan Hajnoczi  wrote:
> 
> > On Thu, May 30, 2024 at 05:22:22PM +0200, Albert Esteve wrote:
> > > Hi all,
> > >
> > > This is an early attempt to have backends
> > > support dynamic fd mapping into shared
> > > memory regions. As such, there are a few
> > > things that need settling, so I wanted to
> > > post this first to have some early feedback.
> > >
> > > The usecase for this is, e.g., to support
> > > vhost-user-gpu RESOURCE_BLOB operations,
> > > or DAX Window request for virtio-fs. In
> > > general, any operation where a backend
> > > would need to mmap an fd to a shared
> > > memory so that the guest can access it.
> >
> > I wanted to mention that this sentence confuses me because:
> >
> > - The frontend will mmap an fd into the guest's memory space so that a
> >   VIRTIO Shared Memory Region is exposed to the guest. The backend
> >   requests the frontend to perform this operation. The backend does not
> >   invoke mmap itself.
> >
> 
> Sorry for the confused wording. It is true that the backend does not
> do the mmap, but requests it to be done. One point of confusion for
> me from your sentence is that I refer to the driver as the frontend,

They are different concepts. Frontend is defined in the vhost-user spec
and driver is defined in the VIRTIO spec.

The frontend is the application that uses vhost-user protocol messages
to communicate with the backend.

The driver uses VIRTIO device model interfaces like virtqueues to
communicate with the device.

> and the mapping is done by the VMM (i.e., QEMU).
> 
> But yeah, I agree and the scenario you describe is what
> I had in mind. Thanks for pointing it out. I will rephrase it
> in follow-up patches.

Thanks!

> 
> 
> >
> > - "Shared memory" is ambiguous. Please call it VIRTIO Shared Memory
> >   Region to differentiate from vhost-user shared memory tables/regions.
> >
> 
> Ok!
> 
> 
> >
> > > The request will be processed by the VMM,
> > > that will, in turn, trigger a mmap with
> > > the instructed parameters (i.e., shmid,
> > > shm_offset, fd_offset, fd, length).
> > >
> > > As there are already a couple devices
> > > that could benefit from such a feature,
> > > and more could require it in the future,
> > > my intention was to make it generic.
> > >
> > > To that end, I declared the shared
> > > memory region list in `VirtIODevice`.
> > > I could add a couple commodity
> > > functions to add new regions to the list,
> > > so that the devices can use them. But
> > > I wanted to gather some feedback before
> > > refining it further, as I am probably
> > > missing some required steps/or security
> > > concerns that I am not taking into account.
> > >
> > > Albert Esteve (1):
> > >   vhost-user: add shmem mmap request
> > >
> > >  docs/interop/vhost-user.rst |  23 
> > >  hw/virtio/vhost-user.c  | 106 
> > >  hw/virtio/virtio.c  |   2 +
> > >  include/hw/virtio/virtio.h  |   3 +
> > >  4 files changed, 134 insertions(+)
> > >
> > > --
> > > 2.44.0
> > >
> >




Re: [PATCH] fuse: cleanup request queuing towards virtiofs

2024-06-05 Thread Stefan Hajnoczi
On Wed, Jun 05, 2024 at 10:40:44AM +, Peter-Jan Gootzen wrote:
> On Wed, 2024-05-29 at 14:32 -0400, Stefan Hajnoczi wrote:
> > On Wed, May 29, 2024 at 05:52:07PM +0200, Miklos Szeredi wrote:
> > > Virtiofs has its own queuing mechanism, but still requests are first
> > > queued on fiq->pending to be immediately dequeued and queued onto the
> > > virtio queue.
> > > 
> > > The queuing on fiq->pending is unnecessary and might even have some
> > > performance impact due to being a contention point.
> > > 
> > > Forget requests are handled similarly.
> > > 
> > > Move the queuing of requests and forgets into the fiq->ops->*.
> > > fuse_iqueue_ops are renamed to reflect the new semantics.
> > > 
> > > Signed-off-by: Miklos Szeredi 
> > > ---
> > >  fs/fuse/dev.c   | 159 -
> > > ---
> > >  fs/fuse/fuse_i.h    |  19 ++
> > >  fs/fuse/virtio_fs.c |  41 
> > >  3 files changed, 106 insertions(+), 113 deletions(-)
> > 
> > This is a little scary but I can't think of a scenario where directly
> > dispatching requests to virtqueues is a problem.
> > 
> > > Is there someone who can run single and multiqueue virtiofs performance
> > > benchmarks?
> > 
> > Reviewed-by: Stefan Hajnoczi 
> 
> I ran some tests and experiments on the patch (on top of v6.10-rc2) with
> our multi-queue capable virtio-fs device. No issues were found.
> 
> Experimental system setup (which is not the fastest possible setup nor
> the most optimized setup!):
> # Host:
>- Dell PowerEdge R7525
>- CPU: 2x AMD EPYC 7413 24-Core
>- VM: QEMU KVM with 24 cores, vCPUs locked to the NUMA nodes on which
> the DPU is attached. VFIO-pci device to passthrough the DPU.   
> Running a default x86_64 ext4 buildroot with fio installed.
> # Virtio-fs device:
>- BlueField-3 DPU
>- CPU: ARM Cortex-A78AE, 16 cores
>- One thread per queue, each busy polling on one request queue
>- Each queue is 1024 descriptors deep
> # Workload (deviations are specified in the table):
>- fio 3.34
>- sequential read
>- ioengine=io_uring, single 4GiB file, iodepth=128, bs=256KiB,
> runtime=30s, ramp_time=10s, direct=1
>- T is the number of threads (numjobs=T with thread=1)
>- Q is the number of request queues
> 
> | Workload           | Before patch | After patch |
> | ------------------ | ------------ | ----------- |
> | T=1 Q=1            | 9216MiB/s    | 9201MiB/s   |
> | T=2 Q=2            | 10.8GiB/s    | 10.7GiB/s   |
> | T=4 Q=4            | 12.6GiB/s    | 12.2GiB/s   |
> | T=8 Q=8            | 19.5GiB/s    | 19.7GiB/s   |
> | T=16 Q=1           | 9451MiB/s    | 9558MiB/s   |
> | T=16 Q=2           | 13.5GiB/s    | 13.4GiB/s   |
> | T=16 Q=4           | 11.8GiB/s    | 11.4GiB/s   |
> | T=16 Q=8           | 11.1GiB/s    | 10.8GiB/s   |
> | T=24 Q=24          | 26.5GiB/s    | 26.5GiB/s   |
> | T=24 Q=24 24 files | 26.5GiB/s    | 26.6GiB/s   |
> | T=24 Q=24 4k       | 948MiB/s     | 955MiB/s    |
> 
> Averaging out those results, the difference is within a reasonable
> margin of error (less than 1%). So in this setup's
> case we see no difference in performance.
> However if the virtio-fs device was more optimized, e.g. if it didn't
> copy the data to its memory, then the bottleneck could possibly be on
> the driver-side and this patch could show some benefit at those higher
> message rates.
> 
> So although I would have hoped for some performance increase already
> with this setup, I still think this is a good patch and a logical
> optimization for high performance virtio-fs devices that might show a
> benefit in the future.
> 
> Tested-by: Peter-Jan Gootzen 
> Reviewed-by: Peter-Jan Gootzen 

Thank you!

Stefan





Re: [RFC PATCH 1/1] vhost-user: add shmem mmap request

2024-06-04 Thread Stefan Hajnoczi
On Thu, May 30, 2024 at 05:22:23PM +0200, Albert Esteve wrote:
> Add SHMEM_MAP/UNMAP requests to vhost-user.
> 
> This request allows backends to dynamically map
> fds into a shared memory region identified by

Please call this "VIRTIO Shared Memory Region" everywhere (code,
vhost-user spec, commit description, etc) so it's clear that this is not
about vhost-user shared memory tables/regions.

> its `shmid`. Then, the fd memory is advertised
> to the frontend through a BAR+offset, so it can
> be read by the driver while it's valid.

Why is a PCI BAR mentioned here? vhost-user does not know about the
VIRTIO Transport (e.g. PCI) being used. It's the frontend's job to
report VIRTIO Shared Memory Regions to the driver.

> 
> Then, the backend can munmap the memory range
> in a given shared memory region (again, identified
> by its `shmid`), to free it. After this, the
> region becomes private and shall not be accessed
> by the frontend anymore.

What does "private" mean?

The frontend must mmap PROT_NONE to reserve the virtual memory space
when no fd is mapped in the VIRTIO Shared Memory Region. Otherwise an
unrelated mmap(NULL, ...) might use that address range and the guest
would have access to the host memory! This is a security issue and needs
to be mentioned explicitly in the spec.
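
A minimal sketch of what I mean, assuming a Linux/POSIX host. The
function names are hypothetical and not taken from the patch: the whole
region is reserved once with PROT_NONE, fds are mapped over part of the
reservation with MAP_FIXED, and "unmapping" puts the PROT_NONE
reservation back instead of calling munmap():

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, pwrite() and ftruncate() on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve address space for a whole VIRTIO Shared Memory Region.
 * Nothing is accessible until an fd is mapped into it. */
static void *shm_reserve(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Map len bytes of fd at shm_offset inside the reservation.
 * The caller must already have validated shm_offset and len. */
static int shm_map(void *base, size_t shm_offset, int fd, off_t fd_offset,
                   size_t len, int prot)
{
    void *p = mmap((char *)base + shm_offset, len, prot,
                   MAP_SHARED | MAP_FIXED, fd, fd_offset);

    return p == MAP_FAILED ? -1 : 0;
}

/* Replace the fd mapping with PROT_NONE again so the address range
 * stays reserved and cannot be picked by an unrelated mmap(NULL, ...). */
static int shm_unmap(void *base, size_t shm_offset, size_t len)
{
    void *p = mmap((char *)base + shm_offset, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return p == MAP_FAILED ? -1 : 0;
}
```

Only when the device is torn down would the whole reservation be
released with a single munmap(base, size).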

> 
> Initializing the memory region is the responsibility
> of the PCI device that will be using it.

What does this mean?

> 
> Signed-off-by: Albert Esteve 
> ---
>  docs/interop/vhost-user.rst |  23 
>  hw/virtio/vhost-user.c  | 106 
>  hw/virtio/virtio.c  |   2 +
>  include/hw/virtio/virtio.h  |   3 +
>  4 files changed, 134 insertions(+)

Two missing pieces:

1. QEMU's --device vhost-user-device needs a way to enumerate VIRTIO
Shared Memory Regions from the vhost-user backend. vhost-user-device is
a generic vhost-user frontend without knowledge of the device type, so
it doesn't know what the valid shmids are and what size the regions
have.

2. Other backends don't see these mappings. If the guest submits a vring
descriptor referencing a mapping to another backend, then that backend
won't be able to access this memory. David Gilbert hit this problem when
working on the virtiofs DAX Window. Either the frontend needs to forward
all SHMEM_MAP/UNMAP messages to the other backends (inefficient and
maybe racy!) or a new "memcpy" message is needed as a fallback for when
vhost-user memory region translation fails.

> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index d8419fd2f1..3caf2a290c 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1859,6 +1859,29 @@ is sent by the front-end.
>when the operation is successful, or non-zero otherwise. Note that if the
>operation fails, no fd is sent to the backend.
>  
> +``VHOST_USER_BACKEND_SHMEM_MAP``
> +  :id: 9
> +  :equivalent ioctl: N/A
> +  :request payload: fd and ``struct VhostUserMMap``
> +  :reply payload: N/A
> +
> +  This message can be submitted by the backends to advertise a new mapping
> +  to be made in a given shared memory region. Upon receiving the message,
> +  QEMU will mmap the given fd into the shared memory region with the

s/QEMU/the frontend/

> +  requested ``shmid``. A reply is generated indicating whether mapping
> +  succeeded.

Please document whether mapping over an existing mapping is allowed. I
think it should be allowed because it might be useful to atomically
update a mapping without a race where the driver sees unmapped memory.

If mapping over an existing mapping is allowed, does the new mapping
need to cover the old mapping exactly, or can it span multiple previous
mappings or a subset of an existing mapping?

From a security point of view we need to be careful here. A potentially
untrusted backend process now has the ability to mmap an arbitrary file
descriptor into the frontend process. The backend can cause
denial of service by creating many small mappings to exhaust the OS
limits on virtual memory areas. The backend can map memory to use as
part of a security compromise, so we need to be sure the virtual memory
addresses are not leaked to the backend and only read/write page
permissions are available.
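
For example, the frontend could apply checks along these lines before
honoring a map request. This is only a sketch: the cap of 64 mappings is
an arbitrary placeholder, the names are hypothetical, and it assumes the
request's permissions have already been translated to mmap PROT_* bits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical per-region cap on live mappings (placeholder value). */
#define SHMEM_MAX_MAPPINGS 64

/* Accept only read and/or write. PROT_EXEC and any unknown bits are
 * rejected. */
static bool shmem_prot_allowed(int prot)
{
    return (prot & ~(PROT_READ | PROT_WRITE)) == 0;
}

/* Refuse new mappings once the cap is reached so that a backend cannot
 * exhaust the frontend's virtual memory areas with tiny mappings. */
static bool shmem_may_add_mapping(size_t current_mappings, int prot)
{
    return current_mappings < SHMEM_MAX_MAPPINGS &&
           shmem_prot_allowed(prot);
}
```

Replies to the backend should carry only success or failure, never the
address returned by mmap.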

> +``VHOST_USER_BACKEND_SHMEM_UNMAP``
> +  :id: 10
> +  :equivalent ioctl: N/A
> +  :request payload: ``struct VhostUserMMap``
> +  :reply payload: N/A
> +
> +  This message can be submitted by the backends so that QEMU un-mmap

s/QEMU/the frontend/

> +  a given range (``offset``, ``len``) in the shared memory region with the
> +  requested ``shmid``.

Does the range need to correspond to a previously-mapped VhostUserMMap
or can it cross multiple VhostUserMMaps, be a subset of a VhostUserMMap,
etc?
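
To make the cases concrete, here is a small sketch that classifies a
requested range against one existing mapping. The enum and function are
hypothetical, not from the patch, and the caller is assumed to have
checked that offset + len does not overflow. The spec should say which
of these outcomes are valid for SHMEM_MAP and SHMEM_UNMAP:

```c
#include <assert.h>
#include <stdint.h>

/* How a requested range relates to one existing mapping. */
enum range_rel {
    RANGE_DISJOINT, /* no bytes in common */
    RANGE_EXACT,    /* same start and same end */
    RANGE_SUBSET,   /* request lies inside the mapping */
    RANGE_SUPERSET, /* request covers the whole mapping and more */
    RANGE_PARTIAL,  /* request overlaps only one end of the mapping */
};

static enum range_rel range_classify(uint64_t req_off, uint64_t req_len,
                                     uint64_t map_off, uint64_t map_len)
{
    uint64_t req_end = req_off + req_len; /* exclusive end */
    uint64_t map_end = map_off + map_len; /* exclusive end */

    if (req_end <= map_off || map_end <= req_off) {
        return RANGE_DISJOINT;
    }
    if (req_off == map_off && req_end == map_end) {
        return RANGE_EXACT;
    }
    if (req_off >= map_off && req_end <= map_end) {
        return RANGE_SUBSET;
    }
    if (req_off <= map_off && req_end >= map_end) {
        return RANGE_SUPERSET;
    }
    return RANGE_PARTIAL;
}
```

A request that crosses multiple VhostUserMMaps shows up as
RANGE_SUPERSET or RANGE_PARTIAL against more than one existing mapping.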

> +  A reply is generated indicating whether unmapping succeeded.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..9526b9d07f 100644
> --- 

Re: [RFC PATCH 0/1] vhost-user: Add SHMEM_MAP/UNMAP requests

2024-06-04 Thread Stefan Hajnoczi
On Thu, May 30, 2024 at 05:22:22PM +0200, Albert Esteve wrote:
> Hi all,
> 
> This is an early attempt to have backends
> support dynamic fd mapping into shared
> memory regions. As such, there are a few
> things that need settling, so I wanted to
> post this first to have some early feedback.
> 
> The usecase for this is, e.g., to support
> vhost-user-gpu RESOURCE_BLOB operations,
> or DAX Window request for virtio-fs. In
> general, any operation where a backend
> would need to mmap an fd to a shared
> memory so that the guest can access it.

I wanted to mention that this sentence confuses me because:

- The frontend will mmap an fd into the guest's memory space so that a
  VIRTIO Shared Memory Region is exposed to the guest. The backend
  requests the frontend to perform this operation. The backend does not
  invoke mmap itself.

- "Shared memory" is ambiguous. Please call it VIRTIO Shared Memory
  Region to differentiate from vhost-user shared memory tables/regions.

> The request will be processed by the VMM,
> that will, in turn, trigger a mmap with
> the instructed parameters (i.e., shmid,
> shm_offset, fd_offset, fd, length).
> 
> As there are already a couple devices
> that could benefit from such a feature,
> and more could require it in the future,
> my intention was to make it generic.
> 
> To that end, I declared the shared
> memory region list in `VirtIODevice`.
> I could add a couple commodity
> functions to add new regions to the list,
> so that the devices can use them. But
> I wanted to gather some feedback before
> refining it further, as I am probably
> missing some required steps/or security
> concerns that I am not taking into account.
> 
> Albert Esteve (1):
>   vhost-user: add shmem mmap request
> 
>  docs/interop/vhost-user.rst |  23 
>  hw/virtio/vhost-user.c  | 106 
>  hw/virtio/virtio.c  |   2 +
>  include/hw/virtio/virtio.h  |   3 +
>  4 files changed, 134 insertions(+)
> 
> -- 
> 2.44.0
> 




Re: Addressing architectural differences between FUSE driver and fs - Re: virtio-fs tests between host(x86) and dpu(arm64)

2024-06-03 Thread Stefan Hajnoczi
On Mon, Jun 03, 2024 at 04:56:14PM +0200, Miklos Szeredi wrote:
> On Mon, Jun 3, 2024 at 3:44 PM Stefan Hajnoczi  wrote:
> >
> > On Mon, Jun 03, 2024 at 11:06:19AM +0200, Miklos Szeredi wrote:
> > > On Mon, 3 Jun 2024 at 10:53, Peter-Jan Gootzen  
> > > wrote:
> > >
> > > > We also considered this idea, it would kind of be like locking FUSE into
> > > > being x86. However I think this is not backwards compatible. Currently
> > > > an ARM64 client and ARM64 server work just fine. But making such a
> > > > change would break if the client has the new driver version and the
> > > > server is not updated to know that it should interpret x86 specifically.
> > >
> > > This would need to be negotiated, of course.
> > >
> > > But it's certainly simpler to just indicate the client arch in the
> > > INIT request.   Let's go with that for now.
> >
> > In the long term it would be cleanest to choose a single canonical
> > format instead of requiring drivers and devices to implement many
> > arch-specific formats. I liked the single canonical format idea you
> > suggested.
> >
> > My only concern is whether there are more commands/fields in FUSE that
> > operate in an arch-specific way (e.g. ioctl)? If there really are parts
> > that need to be arch-specific, then it might be necessary to negotiate
> > an architecture after all.
> 
> How about something like this:
> 
>  - by default fall back to no translation for backward compatibility
>  - server may request matching by sending its own arch identifier in
> fuse_init_in
>  - client sends back its arch identifier in fuse_init_out
>  - client also sends back a flag indicating whether it will transform
> to canonical or not
> 
> This means the client does not have to implement translation for every
> architecture, only ones which are frequently used as guest.  The
> server may opt to implement its own translation if it's lacking in the
> client, or it can just fail.

From the client perspective:

1. Do not negotiate arch in fuse_init_out - hopefully client and server
   know what they are doing :). This is the current behavior.
2. Reply to fuse_init_in with server's arch in fuse_init_out - client
   translates according to server's arch.
3. Reply to fuse_init_in with canonical flag set in fuse_init_out -
   client and server use canonical format.

From the server perspective:

1. Client does not negotiate arch - the current behavior (good luck!).
2. Arch received in fuse_init_out from client - must be equal to
   server's arch since there is no way for the server to reject the
   arch.
3. Canonical flag received in fuse_init_out from client - client and
   server use canonical format.
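
The server-side cases above can be sketched as a small decision
function. All names are hypothetical since FUSE has no such fields
today, and the mismatch outcome only reports the situation where, as
noted in case 2, the server has no way to reject the arch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of the negotiation, seen from the server. */
enum fuse_wire_format {
    WIRE_UNNEGOTIATED, /* case 1: current behavior, no translation */
    WIRE_SERVER_ARCH,  /* case 2: client translates to server's arch */
    WIRE_CANONICAL,    /* case 3: both sides use the canonical format */
    WIRE_MISMATCH,     /* arch received but it is not the server's */
};

/*
 * server_arch:  the server's own arch identifier
 * arch_present: whether the client sent an arch identifier at all
 * arch:         the identifier the client sent (if arch_present)
 * canonical:    whether the client set the canonical flag
 */
static enum fuse_wire_format
server_pick_format(uint32_t server_arch, bool arch_present, uint32_t arch,
                   bool canonical)
{
    if (canonical) {
        return WIRE_CANONICAL;
    }
    if (!arch_present) {
        return WIRE_UNNEGOTIATED;
    }
    return arch == server_arch ? WIRE_SERVER_ARCH : WIRE_MISMATCH;
}
```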

Is this what you had in mind?

Stefan




Re: Addressing architectural differences between FUSE driver and fs - Re: virtio-fs tests between host(x86) and dpu(arm64)

2024-06-03 Thread Stefan Hajnoczi
On Mon, Jun 03, 2024 at 11:06:19AM +0200, Miklos Szeredi wrote:
> On Mon, 3 Jun 2024 at 10:53, Peter-Jan Gootzen  wrote:
> 
> > We also considered this idea, it would kind of be like locking FUSE into
> > being x86. However I think this is not backwards compatible. Currently
> > an ARM64 client and ARM64 server work just fine. But making such a
> > change would break if the client has the new driver version and the
> > server is not updated to know that it should interpret x86 specifically.
> 
> This would need to be negotiated, of course.
> 
> But it's certainly simpler to just indicate the client arch in the
> INIT request.   Let's go with that for now.

In the long term it would be cleanest to choose a single canonical
format instead of requiring drivers and devices to implement many
arch-specific formats. I liked the single canonical format idea you
suggested.

My only concern is whether there are more commands/fields in FUSE that
operate in an arch-specific way (e.g. ioctl)? If there really are parts
that need to be arch-specific, then it might be necessary to negotiate
an architecture after all.

Stefan

> 
> Thanks,
> Miklos
> 



Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust

2024-05-29 Thread Stefan Hajnoczi
On Wed, May 29, 2024 at 10:10:00PM +0800, Zhao Liu wrote:
> Hi Stefan and Mads,
> 
> On Wed, May 29, 2024 at 11:33:42AM +0200, Mads Ynddal wrote:
> > Date: Wed, 29 May 2024 11:33:42 +0200
> > From: Mads Ynddal 
> > Subject: Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust
> > X-Mailer: Apple Mail (2.3774.600.62)
> > 
> > 
> > >> Maybe later, Rust-simpletrace and python-simpletrace can differ, e.g.
> > >> the former goes for performance and the latter for scalability.
> > > 
> > > Rewriting an existing, maintained component without buy-in from the
> > > maintainers is risky. Mads is the maintainer of simpletrace.py and I am
> > > the overall tracing maintainer. While the performance improvement is
> > > > nice, I'm skeptical about the need for this and wonder whether thought
> > > was put into how simpletrace should evolve.
> > > 
> > > There are disadvantages to maintaining multiple implementations:
> > > - File format changes need to be coordinated across implementations to
> > >  prevent compatibility issues. In other words, changing the
> > >  trace-events format becomes harder and discourages future work.
> > > > - Multiple implementations make life harder for users because they need
> > >  to decide between implementations and understand the trade-offs.
> > > - There is more maintenance overall.
> > > 
> > > I think we should have a single simpletrace implementation to avoid
> > > these issues. The Python implementation is more convenient for
> > > user-written trace analysis scripts. The Rust implementation has better
> > > performance (although I'm not aware of efforts to improve the Python
> > > implementation's performance, so who knows).
> > > 
> > > I'm ambivalent about why a reimplementation is necessary. What I would
> > > like to see first is the TCG binary tracing functionality. Find the
> > > limits of the Python simpletrace implementation and then it will be
> > > clear whether a Rust reimplementation makes sense.
> > > 
> > > If Mads agrees, I am happy with a Rust reimplementation, but please
> > > demonstrate why a reimplementation is necessary first.
> > > 
> > > Stefan
> > 
> > > I didn't want to shoot down the idea, since it seemed like somebody had a plan
> > > with GSoC. But I actually agree, that I'm not quite convinced.
> > > 
> > > I think I'd need to see some data that showed the Python version is inadequate.
> > > I know Python is not the fastest, but is it so prohibitively slow, that we
> > > cannot make the TCG analysis? I'm not saying it can't be true, but it'd be nice
> > > to see it demonstrated before making decisions.
> > > 
> > > Because, as you point out, there's a lot of downsides to having two versions. So
> > > the benefits have to clearly outweigh the additional work.
> > > 
> > > I have a lot of other questions, but let's maybe start with the core idea first.
> > 
> > —
> > Mads Ynddal
> >
> 
> I really appreciate your patience and explanations, and your feedback
> and review has helped me a lot!
> 
> Yes, simple repetition creates much maintenance burden (though I'm happy
> to help with it), and the argument for current performance isn't convincing
> enough.
> 
> Getting back to the project itself, as you have said, the core is still
> further support for TCG-related traces, and I'll continue to work on it,
> and then look back based on such work to see what issues there are with
> traces or what improvements can be made.

Thanks for doing that and sorry for holding up the work you have already
done!

Stefan



Re: [RFC 1/6] scripts/simpletrace-rust: Add the basic cargo framework

2024-05-29 Thread Stefan Hajnoczi
On Wed, May 29, 2024 at 10:30:13PM +0800, Zhao Liu wrote:
> Hi Stefan,
> 
> On Tue, May 28, 2024 at 10:14:01AM -0400, Stefan Hajnoczi wrote:
> > Date: Tue, 28 May 2024 10:14:01 -0400
> > From: Stefan Hajnoczi 
> > Subject: Re: [RFC 1/6] scripts/simpletrace-rust: Add the basic cargo
> >  framework
> > 
> > On Tue, May 28, 2024 at 03:53:55PM +0800, Zhao Liu wrote:
> > > Hi Stefan,
> > > 
> > > [snip]
> > > 
> > > > > > diff --git a/scripts/simpletrace-rust/.rustfmt.toml b/scripts/simpletrace-rust/.rustfmt.toml
> > > > > new file mode 100644
> > > > > index ..97a97c24ebfb
> > > > > --- /dev/null
> > > > > +++ b/scripts/simpletrace-rust/.rustfmt.toml
> > > > > @@ -0,0 +1,9 @@
> > > > > +brace_style = "AlwaysNextLine"
> > > > > +comment_width = 80
> > > > > +edition = "2021"
> > > > > +group_imports = "StdExternalCrate"
> > > > > +imports_granularity = "item"
> > > > > +max_width = 80
> > > > > +use_field_init_shorthand = true
> > > > > +use_try_shorthand = true
> > > > > +wrap_comments = true
> > > > 
> > > > There should be QEMU-wide policy. That said, why is it necessary to 
> > > > customize rustfmt?
> > > 
> > > Indeed, but QEMU's style for Rust is currently undefined, so I'm trying
> > > to add this to make it easier to check the style...I will separate it
> > > out as a style policy proposal.
> > 
> > Why is a config file necessary? QEMU should use the default Rust style.
> > 
> 
> Some of the options may be overdone, but I think a few basic ones may
> still be necessary, like "comment_width = 80", "max_width = 80" and
> "wrap_comments". Is it necessary to specify the width, as we do for C?

Let's agree to follow the Rust coding style from the start, then the
problem is solved. My view is that deviating from the standard Rust
coding style in order to make QEMU Rust code resemble QEMU C code is
less helpful than following Rust conventions so our Rust code looks like
Rust.
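
For reference, here is a small sketch of what the default rustfmt style
looks like: opening braces stay on the same line, indentation is 4 spaces,
and max_width defaults to 100 columns. The function itself is a made-up
example, not code from the series.

```rust
/// Made-up example, laid out the way default rustfmt formats it
/// (same-line braces, 4-space indentation).
fn clamp_width(width: usize, max_width: usize) -> usize {
    if width > max_width {
        max_width
    } else {
        width
    }
}
```

With no .rustfmt.toml present, running "cargo fmt" should produce this
layout without any project-specific configuration.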

> 
> And "group_imports" and "imports_granularity" (borrowed from crosvm)
> can also be used to standardize import styles and improve
> readability, since imports can be written in many different styles.
> 
> This fmt config is something like scripts/checkpatch.pl for QEMU/Linux.
> Different programs have different practices, so I feel like that's an
> open question too!

In languages like Rust that have a standard, let's follow the standard
instead of inventing our own way of formatting code.

> This certainly also depends on the maintainers' and your preferences ;-)
> Whichever way looks most comfortable and convenient is best, and
> following the defaults completely is also fine.

This will probably affect all Rust code in QEMU so everyone's opinion
counts.

Stefan



Re: [PATCH] fuse: cleanup request queuing towards virtiofs

2024-05-29 Thread Stefan Hajnoczi
On Wed, May 29, 2024 at 05:52:07PM +0200, Miklos Szeredi wrote:
> Virtiofs has its own queuing mechanism, but still requests are first queued
> on fiq->pending to be immediately dequeued and queued onto the virtio
> queue.
> 
> The queuing on fiq->pending is unnecessary and might even have some
> performance impact due to being a contention point.
> 
> Forget requests are handled similarly.
> 
> Move the queuing of requests and forgets into the fiq->ops->*.
> fuse_iqueue_ops are renamed to reflect the new semantics.
> 
> Signed-off-by: Miklos Szeredi 
> ---
>  fs/fuse/dev.c   | 159 
>  fs/fuse/fuse_i.h|  19 ++
>  fs/fuse/virtio_fs.c |  41 
>  3 files changed, 106 insertions(+), 113 deletions(-)

This is a little scary but I can't think of a scenario where directly
dispatching requests to virtqueues is a problem.

Is there someone who can run single and multiqueue virtiofs performance
benchmarks?

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH] Use "void *" as parameter for functions that are used for aio_set_event_notifier()

2024-05-29 Thread Stefan Hajnoczi
On Wed, May 29, 2024 at 07:49:48PM +0200, Thomas Huth wrote:
> aio_set_event_notifier() and aio_set_event_notifier_poll() in
> util/aio-posix.c and util/aio-win32.c are casting function pointers of
> functions that take an "EventNotifier *" pointer as parameter to function
> pointers that take a "void *" pointer as parameter (i.e. the IOHandler
> type). When those function pointers are later used to call the referenced
> function, this triggers undefined behavior errors with the latest version
> of Clang in Fedora 40 when compiling with the option "-fsanitize=undefined".
> And this also prevents enabling the strict mode of CFI which is currently
> disabled with -fsanitize-cfi-icall-generalize-pointers. Thus let us avoid
> the problem by using "void *" as parameter in all spots where it is needed.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Yes, I know, the patch looks ugly ... but I don't see a better way to
>  tackle this. If someone has a better idea, suggestions are welcome!

An alternative is adding EventNotifierHandler *io_read, *io_poll_ready,
*io_poll_begin, and *io_poll_end fields to EventNotifier so that
aio_set_event_notifier() and aio_set_event_notifier_poll() can pass
helper functions to the underlying aio_set_fd_handler() and
aio_set_fd_poll() APIs. These helper functions then invoke the
EventNotifier callbacks:

/* Helpers */
static void event_notifier_io_read(void *opaque)
{
    EventNotifier *notifier = opaque;
    notifier->io_read(notifier);
}

static void event_notifier_io_poll_ready(void *opaque)
{
    EventNotifier *notifier = opaque;
    notifier->io_poll_ready(notifier);
}

void aio_set_event_notifier(AioContext *ctx,
                            EventNotifier *notifier,
                            EventNotifierHandler *io_read,
                            AioPollFn *io_poll,
                            EventNotifierHandler *io_poll_ready)
{
    notifier->io_read = io_read;
    notifier->io_poll_ready = io_poll_ready;

    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
                       io_read ? event_notifier_io_read : NULL,
                       NULL, io_poll,
                       io_poll_ready ? event_notifier_io_poll_ready : NULL,
                       notifier);
}

...same for aio_set_event_notifier_poll()...

This is not beautiful either but keeps the API type safe and simpler for
users.

> 
>  include/block/aio.h|  8 
>  include/hw/virtio/virtio.h |  2 +-
>  include/qemu/main-loop.h   |  3 +--
>  block/linux-aio.c  |  6 +++---
>  block/nvme.c   |  8 
>  block/win32-aio.c  |  4 ++--
>  hw/hyperv/hyperv.c |  6 +++---
>  hw/hyperv/hyperv_testdev.c |  5 +++--
>  hw/hyperv/vmbus.c  |  8 
>  hw/nvme/ctrl.c |  8 
>  hw/usb/ccid-card-emulated.c|  5 +++--
>  hw/virtio/vhost-shadow-virtqueue.c | 11 ++-
>  hw/virtio/vhost.c  |  5 +++--
>  hw/virtio/virtio.c | 26 ++
>  tests/unit/test-aio.c  |  9 +
>  tests/unit/test-nested-aio-poll.c  |  8 
>  util/aio-posix.c   | 14 ++
>  util/aio-win32.c   | 10 +-
>  util/async.c   |  6 +++---
>  util/main-loop.c   |  3 +--
>  20 files changed, 79 insertions(+), 76 deletions(-)
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 8378553eb9..01e7ea069d 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -476,9 +476,9 @@ void aio_set_fd_handler(AioContext *ctx,
>   */
>  void aio_set_event_notifier(AioContext *ctx,
>  EventNotifier *notifier,
> -EventNotifierHandler *io_read,
> +IOHandler *io_read,
>  AioPollFn *io_poll,
> -EventNotifierHandler *io_poll_ready);
> +IOHandler *io_poll_ready);
>  
>  /*
>   * Set polling begin/end callbacks for an event notifier that has already been
> @@ -491,8 +491,8 @@ void aio_set_event_notifier(AioContext *ctx,
>   */
>  void aio_set_event_notifier_poll(AioContext *ctx,
>   EventNotifier *notifier,
> - EventNotifierHandler *io_poll_begin,
> - EventNotifierHandler *io_poll_end);
> + IOHandler *io_poll_begin,
> + IOHandler *io_poll_end);
>  
>  /* Return a GSource that lets the main loop poll the file descriptors attached
>   * to this AioContext.
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..e98cecfdd7 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -396,7 +396,7 @@ void 

Re: [PATCH 0/2] block/crypto: do not require number of threads upfront

2024-05-29 Thread Stefan Hajnoczi
On Wed, May 29, 2024 at 06:50:34PM +0200, Kevin Wolf wrote:
> Am 27.05.2024 um 17:58 hat Stefan Hajnoczi geschrieben:
> > The block layer does not know how many threads will perform I/O. It is possible
> > to exceed the number of threads that is given to qcrypto_block_open() and this
> > can trigger an assertion failure in qcrypto_block_pop_cipher().
> > 
> > This patch series removes the n_threads argument and instead handles an
> > arbitrary number of threads.
> > ---
> > Is it secure to store the key in QCryptoBlock? In this series I assumed the
> > answer is yes since the QCryptoBlock's cipher state is equally sensitive, but
> > I'm neither familiar with this code nor a crypto expert.
> 
> I would assume the same, but I'm not merging this yet because I think
> you said you'd like to have input from danpb?
> 
> Reviewed-by: Kevin Wolf 

Yes, please wait until Dan comments.

Thanks,
Stefan



Re: [RFC 1/6] scripts/simpletrace-rust: Add the basic cargo framework

2024-05-28 Thread Stefan Hajnoczi
On Tue, May 28, 2024 at 03:53:55PM +0800, Zhao Liu wrote:
> Hi Stefan,
> 
> [snip]
> 
> > > > diff --git a/scripts/simpletrace-rust/.rustfmt.toml b/scripts/simpletrace-rust/.rustfmt.toml
> > > new file mode 100644
> > > index ..97a97c24ebfb
> > > --- /dev/null
> > > +++ b/scripts/simpletrace-rust/.rustfmt.toml
> > > @@ -0,0 +1,9 @@
> > > +brace_style = "AlwaysNextLine"
> > > +comment_width = 80
> > > +edition = "2021"
> > > +group_imports = "StdExternalCrate"
> > > +imports_granularity = "item"
> > > +max_width = 80
> > > +use_field_init_shorthand = true
> > > +use_try_shorthand = true
> > > +wrap_comments = true
> > 
> > There should be QEMU-wide policy. That said, why is it necessary to 
> > customize rustfmt?
> 
> Indeed, but QEMU's style for Rust is currently undefined, so I'm trying
> to add this to make it easier to check the style...I will separate it
> out as a style policy proposal.

Why is a config file necessary? QEMU should use the default Rust style.

Stefan



Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust

2024-05-28 Thread Stefan Hajnoczi
On Tue, May 28, 2024 at 02:48:42PM +0800, Zhao Liu wrote:
> Hi Stefan,
> 
> On Mon, May 27, 2024 at 03:59:44PM -0400, Stefan Hajnoczi wrote:
> > Date: Mon, 27 May 2024 15:59:44 -0400
> > From: Stefan Hajnoczi 
> > Subject: Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust
> > 
> > On Mon, May 27, 2024 at 04:14:15PM +0800, Zhao Liu wrote:
> > > Hi maintainers and list,
> > > 
> > > This RFC series attempts to re-implement simpletrace.py with Rust, which
> > > is the 1st task of Paolo's GSoC 2024 proposal.
> > > 
> > > There are two motivations for this work:
> > > 1. This is an open chance to discuss how to integrate Rust into QEMU.
> > > 2. Rust delivers faster parsing.
> > > 
> > > 
> > > > Introduction
> > > > ============
> > > > 
> > > > Code framework
> > > > --------------
> > > 
> > > > I chose "cargo" to organize the code, because the current
> > > > implementation depends on external crates (Rust's libraries), such as
> > > > "backtrace" for getting frameinfo, "clap" for parsing the cli, "regex" for
> > > > regular expression matching, and so on. (Meson's support for external crates is
> > > > still incomplete. [2])
> > > > 
> > > > The simpletrace-rust created in this series is not yet integrated into
> > > > the QEMU compilation chain, so it can only be compiled independently, e.g.
> > > > under ./scripts/simpletrace/, compile it with:
> > > 
> > > cargo build --release
> > 
> > Please make sure it's built by .gitlab-ci.d/ so that the continuous
> > integration system prevents bitrot. You can add a job that runs the
> > cargo build.
> 
> Thanks! I'll do this.
> 
> > > 
> > > The code tree for the entire simpletrace-rust is as follows:
> > > 
> > > $ script/simpletrace-rust .
> > > .
> > > > ├── Cargo.toml
> > > > └── src
> > > >     ├── main.rs   // The simpletrace logic (similar to simpletrace.py).
> > > >     └── trace.rs  // The Argument and Event abstraction (refer to
> > > >                   // tracetool/__init__.py).
> > > 
> > > My question about meson v.s. cargo, I put it at the end of the cover
> > > letter (the section "Opens on Rust Support").
> > > 
> > > The following two sections are lessons I've learned from this Rust
> > > practice.
> > > 
> > > 
> > > > Performance
> > > > -----------
> > > 
> > > I did the performance comparison using the rust-simpletrace prototype with
> > > the python one:
> > > 
> > > * On the i7-10700 CPU @ 2.90GHz machine, parsing and outputting a 35M
> > > trace binary file for 10 times on each item:
> > > 
> > > >                    AVE (ms)   Rust v.s. Python
> > > > Rust   (stdout)   12687.16            114.46%
> > > > Python (stdout)   14521.85
> > > > 
> > > > Rust   (file)      1422.44            264.99%
> > > > Python (file)      3769.37
> > > 
> > > - The "stdout" lines represent output to the screen.
> > > - The "file" lines represent output to a file (via "> file").
> > > 
> > > This Rust version contains some optimizations (including print, regular
> > > matching, etc.), but there should be plenty of room for optimization.
> > > 
> > > > The current performance bottleneck is reading the binary trace file,
> > > since I am parsing headers and event payloads one after the other, so
> > > that the IO read overhead accounts for 33%, which can be further
> > > optimized in the future.
> > 
> > > Performance will become more important when large amounts of TCG data are
> > captured, as described in the project idea:
> > https://wiki.qemu.org/Internships/ProjectIdeas/TCGBinaryTracing
> > 
> > While I can't think of a time in the past where simpletrace.py's
> > performance bothered me, improving performance is still welcome. Just
> > don't spend too much time on performance (and making the code more
> > complex) unless there is a real need.
> 
> Yes, I agree that it shouldn't be over-optimized.
> 
> The logic in the current Rust version is pretty much a carbon copy of
> the Python version, without additional complex logic introduced, but the
> 2.6x improvement was obtained by optimizing IO:
> 
> * reading the event configuration file, where I called the buffered
>   interface,
> * and the output formatted trace log, w

Re: [RFC 4/6] scripts/simpletrace-rust: Parse and check trace recode file

2024-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2024 at 04:14:19PM +0800, Zhao Liu wrote:
> Refer to scripts/simpletrace.py, parse and check the simple trace
> backend binary trace file.
> 
> Note, in order to keep certain backtrace information to get frame,
> adjust the cargo debug level for release version to "line-tables-only",
> which slows down the program, but is necessary.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Zhao Liu 
> ---
>  scripts/simpletrace-rust/Cargo.lock  |  79 +
>  scripts/simpletrace-rust/Cargo.toml  |   4 +
>  scripts/simpletrace-rust/src/main.rs | 253 ++-
>  3 files changed, 329 insertions(+), 7 deletions(-)
> 
> diff --git a/scripts/simpletrace-rust/Cargo.lock b/scripts/simpletrace-rust/Cargo.lock
> index 3d815014eb44..37d80974ffe7 100644
> --- a/scripts/simpletrace-rust/Cargo.lock
> +++ b/scripts/simpletrace-rust/Cargo.lock
> @@ -2,6 +2,21 @@
>  # It is not intended for manual editing.
>  version = 3
>  
> +[[package]]
> +name = "addr2line"
> +version = "0.21.0"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "8a30b2e23b9e17a9f90641c7ab1549cd9b44f296d3ccbf309d2863cfe398a0cb"
> +dependencies = [
> + "gimli",
> +]
> +
> +[[package]]
> +name = "adler"
> +version = "1.0.2"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe"
> +
>  [[package]]
>  name = "aho-corasick"
>  version = "1.1.3"
> @@ -60,6 +75,33 @@ dependencies = [
>   "windows-sys",
>  ]
>  
> +[[package]]
> +name = "backtrace"
> +version = "0.3.71"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "26b05800d2e817c8b3b4b54abd461726265fa9789ae34330622f2db9ee696f9d"
> +dependencies = [
> + "addr2line",
> + "cc",
> + "cfg-if",
> + "libc",
> + "miniz_oxide",
> + "object",
> + "rustc-demangle",
> +]
> +
> +[[package]]
> +name = "cc"
> +version = "1.0.98"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "41c270e7540d725e65ac7f1b212ac8ce349719624d7bcff99f8e2e488e8cf03f"
> +
> +[[package]]
> +name = "cfg-if"
> +version = "1.0.0"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
> +
>  [[package]]
>  name = "clap"
>  version = "4.5.4"
> @@ -93,18 +135,48 @@ version = "1.0.1"
>  source = "registry+https://github.com/rust-lang/crates.io-index"
>  checksum = "0b6a852b24ab71dffc585bcb46eaf7959d175cb865a7152e35b348d1b2960422"
>  
> +[[package]]
> +name = "gimli"
> +version = "0.28.1"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "4271d37baee1b8c7e4b708028c57d816cf9d2434acb33a549475f78c181f6253"
> +
>  [[package]]
>  name = "is_terminal_polyfill"
>  version = "1.70.0"
>  source = "registry+https://github.com/rust-lang/crates.io-index"
>  checksum = "f8478577c03552c21db0e2724ffb8986a5ce7af88107e6be5d2ee6e158c12800"
>  
> +[[package]]
> +name = "libc"
> +version = "0.2.155"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "97b3888a4aecf77e811145cadf6eef5901f4782c53886191b2f693f24761847c"
> +
>  [[package]]
>  name = "memchr"
>  version = "2.7.2"
>  source = "registry+https://github.com/rust-lang/crates.io-index"
>  checksum = "6c8640c5d730cb13ebd907d8d04b52f55ac9a2eec55b440c8892f40d56c76c1d"
>  
> +[[package]]
> +name = "miniz_oxide"
> +version = "0.7.3"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "87dfd01fe195c66b572b37921ad8803d010623c0aca821bea2302239d155cdae"
> +dependencies = [
> + "adler",
> +]
> +
> +[[package]]
> +name = "object"
> +version = "0.32.2"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "a6a622008b6e321afc04970976f62ee297fdbaa6f95318ca343e3eebb9648441"
> +dependencies = [
> + "memchr",
> +]
> +
>  [[package]]
>  name = "once_cell"
>  version = "1.19.0"
> @@ -158,10 +230,17 @@ version = "0.8.3"
>  source = "registry+https://github.com/rust-lang/crates.io-index"
>  checksum = "adad44e29e4c806119491a7f06f03de4d1af22c3a680dd47f1e6e179439d1f56"
>  
> +[[package]]
> +name = "rustc-demangle"
> +version = "0.1.24"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "719b953e2095829ee67db738b3bfa9fa368c94900df327b3f07fe6e794d2fe1f"
> +
>  [[package]]
>  name = "simpletrace-rust"
>  version = "0.1.0"
>  dependencies = [
> + "backtrace",
>   "clap",
>   "once_cell",
>   "regex",
> diff --git a/scripts/simpletrace-rust/Cargo.toml b/scripts/simpletrace-rust/Cargo.toml
> index 24a79d04e566..23a3179de01c 100644
> --- a/scripts/simpletrace-rust/Cargo.toml
> +++ b/scripts/simpletrace-rust/Cargo.toml
> @@ -7,7 +7,11 @@ authors = ["Zhao Liu "]
>  license = "GPL-2.0-or-later"
>  
>  [dependencies]
> +backtrace = "0.3"
>  clap = "4.5.4"
>  once_cell = "1.19.0"
>  regex = "1.10.4"
>  thiserror = "1.0.20"
> +
> +[profile.release]
> +debug = "line-tables-only"
> diff 

Re: [RFC 3/6] scripts/simpletrace-rust: Add helpers to parse trace file

2024-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2024 at 04:14:18PM +0800, Zhao Liu wrote:
> Refer to scripts/simpletrace.py, add the helpers to read the trace file
> and parse the record type field, record header and log header.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Zhao Liu 
> ---
>  scripts/simpletrace-rust/src/main.rs | 151 +++
>  1 file changed, 151 insertions(+)
> 
> diff --git a/scripts/simpletrace-rust/src/main.rs b/scripts/simpletrace-rust/src/main.rs
> index 2d2926b7658d..b3b8baee7c66 100644
> --- a/scripts/simpletrace-rust/src/main.rs
> +++ b/scripts/simpletrace-rust/src/main.rs
> @@ -14,21 +14,172 @@
>  mod trace;
>  
>  use std::env;
> +use std::fs::File;
> +use std::io::Error as IOError;
> +use std::io::ErrorKind;
> +use std::io::Read;
>  
>  use clap::Arg;
>  use clap::Command;
>  use thiserror::Error;
>  use trace::Event;
>  
> +const RECORD_TYPE_MAPPING: u64 = 0;
> +const RECORD_TYPE_EVENT: u64 = 1;
> +
>  #[derive(Error, Debug)]
>  pub enum Error
>  {
>  #[error("usage: {0} [--no-header] <trace-events> <trace-file>")]
>  CliOptionUnmatch(String),
> +#[error("Failed to read file: {0}")]
> +ReadFile(IOError),
> +#[error("Unknown record type ({0})")]
> +UnknownRecType(u64),
>  }
>  
>  pub type Result<T> = std::result::Result<T, Error>;
>  
> +enum RecordType
> +{
> +Empty,
> +Mapping,
> +Event,
> +}
> +
> +#[repr(C)]
> +#[derive(Clone, Copy, Default)]
> +struct RecordRawType
> +{
> +rtype: u64,
> +}
> +
> +impl RecordType
> +{
> +fn read_type(mut fobj: &File) -> Result<RecordType>
> +{
> +let mut tbuf = [0u8; 8];
> +if let Err(e) = fobj.read_exact(&mut tbuf) {
> +if e.kind() == ErrorKind::UnexpectedEof {
> +return Ok(RecordType::Empty);
> +} else {
> +return Err(Error::ReadFile(e));
> +}
> +}
> +
> +/*
> + * Safe because the layout of the trace record requires us to parse
> + * the type first, and then there is a check on the validity of the
> + * record type.
> + */
> +let raw_t =
> +unsafe { std::mem::transmute::<[u8; 8], RecordRawType>(tbuf) };

A safe alternative: 
https://doc.rust-lang.org/std/primitive.u64.html#method.from_ne_bytes?
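
A minimal sketch of that approach, assuming the 8 type bytes have already
been read into a buffer. parse_record_type() is a hypothetical helper name,
and the enum and constants are trimmed-down copies of the ones in the patch:

```rust
const RECORD_TYPE_MAPPING: u64 = 0;
const RECORD_TYPE_EVENT: u64 = 1;

#[derive(Debug, PartialEq)]
enum RecordType {
    Mapping,
    Event,
}

/// Hypothetical helper: decode the 8-byte record type without `unsafe`.
/// Returns the unknown raw value as the error.
fn parse_record_type(tbuf: [u8; 8]) -> Result<RecordType, u64> {
    // from_ne_bytes() interprets the bytes in host byte order, which is
    // the same interpretation the transmute relied on.
    match u64::from_ne_bytes(tbuf) {
        RECORD_TYPE_MAPPING => Ok(RecordType::Mapping),
        RECORD_TYPE_EVENT => Ok(RecordType::Event),
        other => Err(other),
    }
}
```

This also makes the RecordRawType wrapper struct unnecessary.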

> +match raw_t.rtype {
> +RECORD_TYPE_MAPPING => Ok(RecordType::Mapping),
> +RECORD_TYPE_EVENT => Ok(RecordType::Event),
> +_ => Err(Error::UnknownRecType(raw_t.rtype)),
> +}
> +}
> +}
> +
> +trait ReadHeader
> +{
> +fn read_header(fobj: &File) -> Result<Self>
> +where
> +Self: Sized;
> +}
> +
> +#[repr(C)]
> +#[derive(Clone, Copy)]
> +struct LogHeader
> +{
> +event_id: u64,
> +magic: u64,
> +version: u64,
> +}
> +
> +impl ReadHeader for LogHeader
> +{
> +fn read_header(mut fobj: &File) -> Result<Self>
> +{
> +let mut raw_hdr = [0u8; 24];
> +fobj.read_exact(&mut raw_hdr).map_err(Error::ReadFile)?;
> +
> +/*
> + * Safe because the size of log header (struct LogHeader)
> + * is 24 bytes, which is ensured by simple trace backend.
> + */
> +let hdr =
> +unsafe { std::mem::transmute::<[u8; 24], LogHeader>(raw_hdr) };

Or u64::from_ne_bytes() for each field.
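
One way this could look is to read the fields one at a time from any reader,
so no 24-byte buffer or struct layout assumption is needed. This is only a
sketch: read_u64() and read_log_header() are hypothetical names, and it
returns std::io::Result rather than the patch's own Result type:

```rust
use std::io::{self, Read};

struct LogHeader {
    event_id: u64,
    magic: u64,
    version: u64,
}

/// Hypothetical helper: read one native-endian u64 from any reader.
fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_ne_bytes(buf))
}

/// Hypothetical replacement for the transmute-based read_header().
/// Struct fields are evaluated in the order written, so the reads
/// happen in file order.
fn read_log_header<R: Read>(r: &mut R) -> io::Result<LogHeader> {
    Ok(LogHeader {
        event_id: read_u64(r)?,
        magic: read_u64(r)?,
        version: read_u64(r)?,
    })
}
```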

> +Ok(hdr)
> +}
> +}
> +
> +#[derive(Default)]
> +struct RecordInfo
> +{
> +event_id: u64,
> +timestamp_ns: u64,
> +record_pid: u32,
> +args_payload: Vec<u8>,
> +}
> +
> +impl RecordInfo
> +{
> +fn new() -> Self
> +{
> +Default::default()
> +}
> +}
> +
> +#[repr(C)]
> +#[derive(Clone, Copy)]
> +struct RecordHeader
> +{
> +event_id: u64,
> +timestamp_ns: u64,
> +record_length: u32,
> +record_pid: u32,
> +}
> +
> +impl RecordHeader
> +{
> +fn extract_record(&self, mut fobj: &File) -> Result<RecordInfo>
> +{
> +let mut info = RecordInfo::new();
> +
> +info.event_id = self.event_id;
> +info.timestamp_ns = self.timestamp_ns;
> +info.record_pid = self.record_pid;
> +info.args_payload = vec![
> +0u8;
> +self.record_length as usize
> +- std::mem::size_of::<RecordHeader>()
> +];
> +fobj.read_exact(&mut info.args_payload)
> +.map_err(Error::ReadFile)?;
> +
> +Ok(info)
> +}
> +}
> +
> +impl ReadHeader for RecordHeader
> +{
> +fn read_header(mut fobj: ) -> Result
> +{
> +let mut raw_hdr = [0u8; 24];
> +fobj.read_exact(&mut raw_hdr).map_err(Error::ReadFile)?;
> +
> +/*
> + * Safe because the size of record header (struct RecordHeader)
> + * is 24 bytes, which is ensured by simple trace backend.
> + */
> +let hdr: RecordHeader =
> +unsafe { std::mem::transmute::<[u8; 24], RecordHeader>(raw_hdr) };

Or u64::from_ne_bytes() and u32::from_ne_bytes() for all fields.
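
One possible shape, sketched under assumptions: the fields are read one
at a time from any Read implementation, so no 24-byte staging buffer is
needed. The helpers read_u64(), read_u32() and payload_len() are
hypothetical and not part of the patch, and io::Error stands in for the
patch's own error type.

```rust
use std::io::{self, Read};

#[derive(Debug, Clone, Copy, PartialEq)]
struct RecordHeader {
    event_id: u64,
    timestamp_ns: u64,
    record_length: u32,
    record_pid: u32,
}

// Size of the on-disk record header: two u64 fields plus two u32 fields.
const RECORD_HEADER_LEN: usize = 24;

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut bytes = [0u8; 8];
    r.read_exact(&mut bytes)?;
    Ok(u64::from_ne_bytes(bytes))
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    r.read_exact(&mut bytes)?;
    Ok(u32::from_ne_bytes(bytes))
}

// Struct fields are evaluated in the order written, which matches the
// order of the fields in the file.
fn read_record_header<R: Read>(r: &mut R) -> io::Result<RecordHeader> {
    Ok(RecordHeader {
        event_id: read_u64(r)?,
        timestamp_ns: read_u64(r)?,
        record_length: read_u32(r)?,
        record_pid: read_u32(r)?,
    })
}

// Payload size, or None if record_length is smaller than the header
// itself (a corrupt record), instead of underflowing.
fn payload_len(hdr: &RecordHeader) -> Option<usize> {
    (hdr.record_length as usize).checked_sub(RECORD_HEADER_LEN)
}
```

The checked subtraction is an extra that the original code does not
have; it only illustrates how a bogus record_length could be rejected
before allocating the payload buffer.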

> +Ok(hdr)
> +}
> +}
> +
>  pub struct EventArgPayload {}
>  
>  trait Analyzer
> -- 
> 2.34.1
> 



Re: [RFC 2/6] scripts/simpletrace-rust: Support Event & Arguments in trace module

2024-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2024 at 04:14:17PM +0800, Zhao Liu wrote:
> Refer to scripts/tracetool/__init__.py, add Event & Arguments
> abstractions in trace module.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Zhao Liu 
> ---
>  scripts/simpletrace-rust/Cargo.lock   |  52 
>  scripts/simpletrace-rust/Cargo.toml   |   2 +
>  scripts/simpletrace-rust/src/trace.rs | 330 +-
>  3 files changed, 383 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/simpletrace-rust/Cargo.lock 
> b/scripts/simpletrace-rust/Cargo.lock
> index 4a0ff8092dcb..3d815014eb44 100644
> --- a/scripts/simpletrace-rust/Cargo.lock
> +++ b/scripts/simpletrace-rust/Cargo.lock
> @@ -2,6 +2,15 @@
>  # It is not intended for manual editing.
>  version = 3
>  
> +[[package]]
> +name = "aho-corasick"
> +version = "1.1.3"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "8e60d3430d3a69478ad0993f19238d2df97c507009a52b3c10addcd7f6bcb916"
> +dependencies = [
> + "memchr",
> +]
> +
>  [[package]]
>  name = "anstream"
>  version = "0.6.14"
> @@ -90,6 +99,18 @@ version = "1.70.0"
>  source = "registry+https://github.com/rust-lang/crates.io-index"
>  checksum = "f8478577c03552c21db0e2724ffb8986a5ce7af88107e6be5d2ee6e158c12800"
>  
> +[[package]]
> +name = "memchr"
> +version = "2.7.2"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "6c8640c5d730cb13ebd907d8d04b52f55ac9a2eec55b440c8892f40d56c76c1d"
> +
> +[[package]]
> +name = "once_cell"
> +version = "1.19.0"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "3fdb12b2476b595f9358c5161aa467c2438859caa136dec86c26fdd2efe17b92"
> +
>  [[package]]
>  name = "proc-macro2"
>  version = "1.0.83"
> @@ -108,11 +129,42 @@ dependencies = [
>   "proc-macro2",
>  ]
>  
> +[[package]]
> +name = "regex"
> +version = "1.10.4"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "c117dbdfde9c8308975b6a18d71f3f385c89461f7b3fb054288ecf2a2058ba4c"
> +dependencies = [
> + "aho-corasick",
> + "memchr",
> + "regex-automata",
> + "regex-syntax",
> +]
> +
> +[[package]]
> +name = "regex-automata"
> +version = "0.4.6"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "86b83b8b9847f9bf95ef68afb0b8e6cdb80f498442f5179a29fad448fcc1eaea"
> +dependencies = [
> + "aho-corasick",
> + "memchr",
> + "regex-syntax",
> +]
> +
> +[[package]]
> +name = "regex-syntax"
> +version = "0.8.3"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "adad44e29e4c806119491a7f06f03de4d1af22c3a680dd47f1e6e179439d1f56"
> +
>  [[package]]
>  name = "simpletrace-rust"
>  version = "0.1.0"
>  dependencies = [
>   "clap",
> + "once_cell",
> + "regex",
>   "thiserror",
>  ]
>  
> diff --git a/scripts/simpletrace-rust/Cargo.toml 
> b/scripts/simpletrace-rust/Cargo.toml
> index b44ba1569dad..24a79d04e566 100644
> --- a/scripts/simpletrace-rust/Cargo.toml
> +++ b/scripts/simpletrace-rust/Cargo.toml
> @@ -8,4 +8,6 @@ license = "GPL-2.0-or-later"
>  
>  [dependencies]
>  clap = "4.5.4"
> +once_cell = "1.19.0"
> +regex = "1.10.4"
>  thiserror = "1.0.20"
> diff --git a/scripts/simpletrace-rust/src/trace.rs 
> b/scripts/simpletrace-rust/src/trace.rs
> index 3a4b06435b8b..f41d6e0b5bc3 100644
> --- a/scripts/simpletrace-rust/src/trace.rs
> +++ b/scripts/simpletrace-rust/src/trace.rs
> @@ -8,4 +8,332 @@
>   * SPDX-License-Identifier: GPL-2.0-or-later
>   */
>  
> -pub struct Event {}
> +#![allow(dead_code)]
> +
> +use std::fs::read_to_string;
> +
> +use once_cell::sync::Lazy;
> +use regex::Regex;
> +use thiserror::Error;
> +
> +#[derive(Error, Debug)]
> +pub enum Error
> +{
> +#[error("Empty argument (did you forget to use 'void'?)")]
> +EmptyArg,
> +#[error("Event '{0}' has more than maximum permitted argument count")]
> +InvalidArgCnt(String),
> +#[error("{0} does not end with a new line")]
> +InvalidEventFile(String),
> +#[error("Invalid format: {0}")]
> +InvalidFormat(String),
> +#[error(
> +"Argument type '{0}' is not allowed. \
> +Only standard C types and fixed size integer \
> +types should be used. struct, union, and \
> +other complex pointer types should be \
> +declared as 'void *'"
> +)]
> +InvalidType(String),
> +#[error("Error at {0}:{1}: {2}")]
> +ReadEventFail(String, usize, String),
> +#[error("Unknown event: {0}")]
> +UnknownEvent(String),
> +#[error("Unknown properties: {0}")]
> +UnknownProp(String),
> +}
> +
> +pub type Result<T> = std::result::Result<T, Error>;
> +
> +/*
> + * Refer to the description of ALLOWED_TYPES in
> + * scripts/tracetool/__init__.py.

Please don't reference the Python implementation because this will not
age well. The reference may bitrot if the Python code changes, and if the
Python implementation is deprecated the source file will go away
altogether. Make the Rust implementation self-contained. If there are
common file format 

Re: [RFC 1/6] scripts/simpletrace-rust: Add the basic cargo framework

2024-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2024 at 04:14:16PM +0800, Zhao Liu wrote:
> Define the basic cargo framework to support compiling simpletrace-rust
> via cargo, and add the Rust code style (with some nightly features)
> check items to make Rust style as close to the QEMU C code as possible.
> 
> With the base cargo package, define the basic code framework for
> simpletrace-rust, approximating the Python version, and also abstract
> Analyzer operations for simpletrace-rust. Event and other future
> trace-related structures are placed in the trace module.
> 
> Additionally, support basic command line parsing for simpletrace-rust as
> a start.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Zhao Liu 
> ---
>  scripts/simpletrace-rust/.gitignore|   1 +
>  scripts/simpletrace-rust/.rustfmt.toml |   9 +
>  scripts/simpletrace-rust/Cargo.lock| 239 +
>  scripts/simpletrace-rust/Cargo.toml|  11 ++
>  scripts/simpletrace-rust/src/main.rs   | 173 ++
>  scripts/simpletrace-rust/src/trace.rs  |  11 ++
>  6 files changed, 444 insertions(+)
>  create mode 100644 scripts/simpletrace-rust/.gitignore
>  create mode 100644 scripts/simpletrace-rust/.rustfmt.toml
>  create mode 100644 scripts/simpletrace-rust/Cargo.lock
>  create mode 100644 scripts/simpletrace-rust/Cargo.toml
>  create mode 100644 scripts/simpletrace-rust/src/main.rs
>  create mode 100644 scripts/simpletrace-rust/src/trace.rs
> 
> diff --git a/scripts/simpletrace-rust/.gitignore 
> b/scripts/simpletrace-rust/.gitignore
> new file mode 100644
> index 000000000000..2f7896d1d136
> --- /dev/null
> +++ b/scripts/simpletrace-rust/.gitignore
> @@ -0,0 +1 @@
> +target/
> diff --git a/scripts/simpletrace-rust/.rustfmt.toml 
> b/scripts/simpletrace-rust/.rustfmt.toml
> new file mode 100644
> index 000000000000..97a97c24ebfb
> --- /dev/null
> +++ b/scripts/simpletrace-rust/.rustfmt.toml
> @@ -0,0 +1,9 @@
> +brace_style = "AlwaysNextLine"
> +comment_width = 80
> +edition = "2021"
> +group_imports = "StdExternalCrate"
> +imports_granularity = "item"
> +max_width = 80
> +use_field_init_shorthand = true
> +use_try_shorthand = true
> +wrap_comments = true

There should be a QEMU-wide policy. That said, why is it necessary to
customize rustfmt?

> diff --git a/scripts/simpletrace-rust/Cargo.lock 
> b/scripts/simpletrace-rust/Cargo.lock
> new file mode 100644
> index 000000000000..4a0ff8092dcb
> --- /dev/null
> +++ b/scripts/simpletrace-rust/Cargo.lock
> @@ -0,0 +1,239 @@
> +# This file is automatically @generated by Cargo.
> +# It is not intended for manual editing.
> +version = 3
> +
> +[[package]]
> +name = "anstream"
> +version = "0.6.14"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "418c75fa768af9c03be99d17643f93f79bbba589895012a80e3452a19ddda15b"
> +dependencies = [
> + "anstyle",
> + "anstyle-parse",
> + "anstyle-query",
> + "anstyle-wincon",
> + "colorchoice",
> + "is_terminal_polyfill",
> + "utf8parse",
> +]
> +
> +[[package]]
> +name = "anstyle"
> +version = "1.0.7"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "038dfcf04a5feb68e9c60b21c9625a54c2c0616e79b72b0fd87075a056ae1d1b"
> +
> +[[package]]
> +name = "anstyle-parse"
> +version = "0.2.4"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "c03a11a9034d92058ceb6ee011ce58af4a9bf61491aa7e1e59ecd24bd40d22d4"
> +dependencies = [
> + "utf8parse",
> +]
> +
> +[[package]]
> +name = "anstyle-query"
> +version = "1.0.3"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "a64c907d4e79225ac72e2a354c9ce84d50ebb4586dee56c82b3ee73004f537f5"
> +dependencies = [
> + "windows-sys",
> +]
> +
> +[[package]]
> +name = "anstyle-wincon"
> +version = "3.0.3"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "61a38449feb7068f52bb06c12759005cf459ee52bb4adc1d5a7c4322d716fb19"
> +dependencies = [
> + "anstyle",
> + "windows-sys",
> +]
> +
> +[[package]]
> +name = "clap"
> +version = "4.5.4"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "90bc066a67923782aa8515dbaea16946c5bcc5addbd668bb80af688e53e548a0"
> +dependencies = [
> + "clap_builder",
> +]
> +
> +[[package]]
> +name = "clap_builder"
> +version = "4.5.2"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "ae129e2e766ae0ec03484e609954119f123cc1fe650337e155d03b022f24f7b4"
> +dependencies = [
> + "anstream",
> + "anstyle",
> + "clap_lex",
> + "strsim",
> +]
> +
> +[[package]]
> +name = "clap_lex"
> +version = "0.7.0"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "98cc8fbded0c607b7ba9dd60cd98df59af97e84d24e49c8557331cfc26d301ce"
> +
> +[[package]]
> +name = "colorchoice"
> +version = "1.0.1"
> +source = "registry+https://github.com/rust-lang/crates.io-index"
> +checksum = "0b6a852b24ab71dffc585bcb46eaf7959d175cb865a7152e35b348d1b2960422"
> +
> +[[package]]
> +name = "is_terminal_polyfill"

Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust

2024-05-27 Thread Stefan Hajnoczi
On Mon, May 27, 2024 at 04:14:15PM +0800, Zhao Liu wrote:
> Hi maintainers and list,
> 
> This RFC series attempts to re-implement simpletrace.py with Rust, which
> is the 1st task of Paolo's GSoC 2024 proposal.
> 
> There are two motivations for this work:
> 1. This is an open chance to discuss how to integrate Rust into QEMU.
> 2. Rust delivers faster parsing.
> 
> 
> Introduction
> ============
> 
> Code framework
> --------------
> 
> I choose "cargo" to organize the code, because the current
> implementation depends on external crates (Rust's library), such as
> "backtrace" for getting frameinfo, "clap" for parsing the cli, "regex" for
> regular expression matching, and so on. (Meson's support for external crates is
> still incomplete. [2])
> 
> The simpletrace-rust created in this series is not yet integrated into
> the QEMU compilation chain, so it can only be compiled independently, e.g.
> under ./scripts/simpletrace-rust/, compile it with:
> 
> cargo build --release

Please make sure it's built by .gitlab-ci.d/ so that the continuous
integration system prevents bitrot. You can add a job that runs the
cargo build.

> 
> The code tree for the entire simpletrace-rust is as follows:
> 
> $ script/simpletrace-rust .
> .
> ├── Cargo.toml
> └── src
>     ├── main.rs   // The simpletrace logic (similar to simpletrace.py).
>     └── trace.rs  // The Argument and Event abstraction (refer to
>                   // tracetool/__init__.py).
> 
> I put my question about meson v.s. cargo at the end of the cover
> letter (the section "Opens on Rust Support").
> 
> The following two sections are lessons I've learned from this Rust
> practice.
> 
> 
> Performance
> -----------
> 
> I did the performance comparison using the rust-simpletrace prototype with
> the python one:
> 
> * On the i7-10700 CPU @ 2.90GHz machine, parsing and outputting a 35M
> trace binary file for 10 times on each item:
> 
>                     AVE (ms)    Rust v.s. Python
> Rust   (stdout)     12687.16    114.46%
> Python (stdout)     14521.85
> 
> Rust   (file)        1422.44    264.99%
> Python (file)        3769.37
> 
> - The "stdout" lines represent output to the screen.
> - The "file" lines represent output to a file (via "> file").
> 
> This Rust version contains some optimizations (including print, regular
> matching, etc.), but there should be plenty of room for optimization.
> 
> The current performance bottleneck is reading the binary trace file:
> I am parsing headers and event payloads one after the other, so the
> IO read overhead accounts for 33%, which can be further optimized in
> the future.

Performance will become more important when large amounts of TCG data are
captured, as described in the project idea:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGBinaryTracing

While I can't think of a time in the past where simpletrace.py's
performance bothered me, improving performance is still welcome. Just
don't spend too much time on performance (and making the code more
complex) unless there is a real need.
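
If the read overhead does turn out to matter, one low-complexity option
is to wrap the file in a std::io::BufReader so that the many small
header and payload reads are served from memory. The sketch below shows
only that idea. It uses a made-up record layout (a 4-byte host-endian
length followed by the payload), not the real simple trace format, and
the function name and buffer size are arbitrary choices.

```rust
use std::io::{self, BufReader, ErrorKind, Read};

// Hypothetical layout for illustration only: each record is a 4-byte
// host-endian payload length followed by that many payload bytes.
fn read_all_payloads<R: Read>(src: R) -> io::Result<Vec<Vec<u8>>> {
    // 64 KiB is an arbitrary capacity; small read_exact() calls below are
    // satisfied from this buffer instead of hitting the file each time.
    let mut reader = BufReader::with_capacity(64 * 1024, src);
    let mut payloads = Vec::new();

    loop {
        let mut len_buf = [0u8; 4];
        match reader.read_exact(&mut len_buf) {
            Ok(()) => {}
            // End of input (or a truncated length field) ends the loop.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }

        let len = u32::from_ne_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        // A payload that is cut short is reported as an error.
        reader.read_exact(&mut payload)?;
        payloads.push(payload);
    }

    Ok(payloads)
}
```

The parsing code stays the same as with an unbuffered reader, so this
does not add complexity to the record handling itself.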

> Security
> --------
> 
> This is an example.
> 
> Rust is very strict about type-checking, and it found a timestamp reversal
> issue in simpletrace-rust [3] (sorry, I haven't gotten around to digging
> deeper yet)... In this RFC, I work around it by allowing negative values.
> The Python version just silently covered this issue up.
>
> Opens on Rust Support
> =====================
> 
> Meson v.s. Cargo
> ----------------
> 
> The first question is whether all Rust code (including under scripts)
> must be integrated into meson?
> 
> If so, because of [2] then I have to discard the external crates and
> build some more Rust wheels of my own to replace the previous external
> crates.
> 
> For the main part of the QEMU code, I think the answer must be Yes, but
> for the tools in the scripts directory, would it be possible to allow
> the use of cargo to build small tools/program for flexibility and
> migrate to meson later (as meson's support for rust becomes more
> mature)?

I have not seen a satisfying way to natively build Rust code using
meson. I remember reading about a tool that converts Cargo.toml files to
meson wrap files or something similar. That still doesn't feel great
because upstream works with Cargo and duplicating build information in
meson is a drag.

Calling cargo from meson is not ideal either, but it works and avoids
duplicating build information. This is the approach I would use for now
unless someone can point to an example of native Rust support in meson
that is clean.

Here is how libblkio calls cargo from meson:
https://gitlab.com/libblkio/libblkio/-/blob/main/src/meson.build
https://gitlab.com/libblkio/libblkio/-/blob/main/src/cargo-build.sh

> 
> 
> External crates
> ---------------
> 
> This is an additional question that naturally follows from the above
> question, do we have requirements for Rust's external crate? Is only std
> allowed?

There is no 

[PATCH 1/2] block/crypto: create ciphers on demand

2024-05-27 Thread Stefan Hajnoczi
Ciphers are pre-allocated by qcrypto_block_init_cipher() depending on
the given number of threads. The -device
virtio-blk-pci,iothread-vq-mapping= feature allows users to assign
multiple IOThreads to a virtio-blk device, but the association between
the virtio-blk device and the block driver happens after the block
driver is already open.

When the number of threads given to qcrypto_block_init_cipher() is
smaller than the actual number of threads at runtime, the
block->n_free_ciphers > 0 assertion in qcrypto_block_pop_cipher() can
fail.

Get rid of the n_threads argument of qcrypto_block_init_cipher() and
allocate ciphers on demand.

Reported-by: Qing Wang 
Buglink: https://issues.redhat.com/browse/RHEL-36159
Signed-off-by: Stefan Hajnoczi 
---
 crypto/blockpriv.h  |  12 +++--
 crypto/block-luks.c |   3 +-
 crypto/block-qcow.c |   2 +-
 crypto/block.c  | 113 ++--
 4 files changed, 79 insertions(+), 51 deletions(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 836f3b4726..4bf6043d5d 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -32,8 +32,14 @@ struct QCryptoBlock {
 const QCryptoBlockDriver *driver;
 void *opaque;
 
-QCryptoCipher **ciphers;
-size_t n_ciphers;
+/* Cipher parameters */
+QCryptoCipherAlgorithm alg;
+QCryptoCipherMode mode;
+uint8_t *key;
+size_t nkey;
+
+QCryptoCipher **free_ciphers;
+size_t max_free_ciphers;
 size_t n_free_ciphers;
 QCryptoIVGen *ivgen;
 QemuMutex mutex;
@@ -130,7 +136,7 @@ int qcrypto_block_init_cipher(QCryptoBlock *block,
   QCryptoCipherAlgorithm alg,
   QCryptoCipherMode mode,
   const uint8_t *key, size_t nkey,
-  size_t n_threads, Error **errp);
+  Error **errp);
 
 void qcrypto_block_free_cipher(QCryptoBlock *block);
 
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3ee928fb5a..3357852c0a 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1262,7 +1262,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
   luks->cipher_mode,
   masterkey,
   luks->header.master_key_len,
-  n_threads,
   errp) < 0) {
 goto fail;
 }
@@ -1456,7 +1455,7 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 /* Setup the block device payload encryption objects */
 if (qcrypto_block_init_cipher(block, luks_opts.cipher_alg,
   luks_opts.cipher_mode, masterkey,
-  luks->header.master_key_len, 1, errp) < 0) {
+  luks->header.master_key_len, errp) < 0) {
 goto error;
 }
 
diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c
index 4d7cf36a8f..02305058e3 100644
--- a/crypto/block-qcow.c
+++ b/crypto/block-qcow.c
@@ -75,7 +75,7 @@ qcrypto_block_qcow_init(QCryptoBlock *block,
 ret = qcrypto_block_init_cipher(block, QCRYPTO_CIPHER_ALG_AES_128,
 QCRYPTO_CIPHER_MODE_CBC,
 keybuf, G_N_ELEMENTS(keybuf),
-n_threads, errp);
+errp);
 if (ret < 0) {
 ret = -ENOTSUP;
 goto fail;
diff --git a/crypto/block.c b/crypto/block.c
index 506ea1d1a3..ba6d1cebc7 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/lockable.h"
 #include "blockpriv.h"
 #include "block-qcow.h"
 #include "block-luks.h"
@@ -57,6 +58,8 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
 {
 QCryptoBlock *block = g_new0(QCryptoBlock, 1);
 
+qemu_mutex_init(&block->mutex);
+
 block->format = options->format;
 
 if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
@@ -76,8 +79,6 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
 return NULL;
 }
 
-qemu_mutex_init(&block->mutex);
-
 return block;
 }
 
@@ -92,6 +93,8 @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions 
*options,
 {
 QCryptoBlock *block = g_new0(QCryptoBlock, 1);
 
+qemu_mutex_init(&block->mutex);
+
 block->format = options->format;
 
 if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
@@ -111,8 +114,6 @@ QCryptoBlock 
*qcrypto_block_create(QCryptoBlockCreateOptions *options,
 return NULL;
 }
 
-qemu_mutex_init(&block->mutex);
-
 return block;
 }
 
@@ -227,37 +228,42 @@ QCryptoCipher *qcrypto_block_get_cipher(QCryptoBlock 
*block)
  * This function is used only in test with one thread (it's safe to skip

[PATCH 0/2] block/crypto: do not require number of threads upfront

2024-05-27 Thread Stefan Hajnoczi
The block layer does not know how many threads will perform I/O. It is possible
to exceed the number of threads that is given to qcrypto_block_open() and this
can trigger an assertion failure in qcrypto_block_pop_cipher().

This patch series removes the n_threads argument and instead handles an
arbitrary number of threads.
---
Is it secure to store the key in QCryptoBlock? In this series I assumed the
answer is yes since the QCryptoBlock's cipher state is equally sensitive, but
I'm neither familiar with this code nor a crypto expert.

Stefan Hajnoczi (2):
  block/crypto: create ciphers on demand
  crypto/block: drop qcrypto_block_open() n_threads argument

 crypto/blockpriv.h |  13 ++--
 include/crypto/block.h |   2 -
 block/crypto.c |   1 -
 block/qcow.c   |   2 +-
 block/qcow2.c  |   5 +-
 crypto/block-luks.c|   4 +-
 crypto/block-qcow.c|   8 +--
 crypto/block.c | 116 -
 tests/unit/test-crypto-block.c |   4 --
 9 files changed, 85 insertions(+), 70 deletions(-)

-- 
2.45.1




[PATCH 2/2] crypto/block: drop qcrypto_block_open() n_threads argument

2024-05-27 Thread Stefan Hajnoczi
The n_threads argument is no longer used since the previous commit.
Remove it.

Signed-off-by: Stefan Hajnoczi 
---
 crypto/blockpriv.h | 1 -
 include/crypto/block.h | 2 --
 block/crypto.c | 1 -
 block/qcow.c   | 2 +-
 block/qcow2.c  | 5 ++---
 crypto/block-luks.c| 1 -
 crypto/block-qcow.c| 6 ++
 crypto/block.c | 3 +--
 tests/unit/test-crypto-block.c | 4 
 9 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 4bf6043d5d..b8f77cb5eb 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -59,7 +59,6 @@ struct QCryptoBlockDriver {
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp);
 
 int (*create)(QCryptoBlock *block,
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 92e823c9f2..5b5d039800 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -76,7 +76,6 @@ typedef enum {
  * @readfunc: callback for reading data from the volume
  * @opaque: data to pass to @readfunc
  * @flags: bitmask of QCryptoBlockOpenFlags values
- * @n_threads: allow concurrent I/O from up to @n_threads threads
  * @errp: pointer to a NULL-initialized error object
  *
  * Create a new block encryption object for an existing
@@ -113,7 +112,6 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
  QCryptoBlockReadFunc readfunc,
  void *opaque,
  unsigned int flags,
- size_t n_threads,
  Error **errp);
 
 typedef enum {
diff --git a/block/crypto.c b/block/crypto.c
index 21eed909c1..4eed3ffa6a 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -363,7 +363,6 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
block_crypto_read_func,
bs,
cflags,
-   1,
errp);
 
 if (!crypto->block) {
diff --git a/block/qcow.c b/block/qcow.c
index ca8e1d5ec8..c2f89db055 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -211,7 +211,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(crypto_opts, "encrypt.",
-   NULL, NULL, cflags, 1, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..10883a2494 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -321,7 +321,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
qcow2_crypto_hdr_read_func,
-   bs, cflags, QCOW2_MAX_THREADS, 
errp);
+   bs, cflags, errp);
 if (!s->crypto) {
 return -EINVAL;
 }
@@ -1701,8 +1701,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int 
flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
-   NULL, NULL, cflags,
-   QCOW2_MAX_THREADS, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3357852c0a..5b777c15d3 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1189,7 +1189,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp)
 {
 QCryptoBlockLUKS *luks = NULL;
diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c
index 02305058e3..42e9556e42 100644
--- a/crypto/block-qcow.c
+++ b/crypto/block-qcow.c
@@ -44,7 +44,6 @@ qcrypto_block_qcow_has_format(const uint8_t *buf 
G_GNUC_UNUSED,
 static int
 qcrypto_block_qcow_init(QCryptoBlock *block,
 const char *keysecret,
-size_t n_threads,
 Error **errp)
 {
 char *password;
@@ -1

[PATCH 2/2] crypto/block: drop qcrypto_block_open() n_threads argument

2024-05-27 Thread Stefan Hajnoczi
The n_threads argument is no longer used since the previous commit.
Remove it.

Signed-off-by: Stefan Hajnoczi 
---
 crypto/blockpriv.h | 1 -
 include/crypto/block.h | 2 --
 block/crypto.c | 1 -
 block/qcow.c   | 2 +-
 block/qcow2.c  | 5 ++---
 crypto/block-luks.c| 1 -
 crypto/block-qcow.c| 6 ++
 crypto/block.c | 3 +--
 tests/unit/test-crypto-block.c | 4 
 9 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 4bf6043d5d..b8f77cb5eb 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -59,7 +59,6 @@ struct QCryptoBlockDriver {
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp);
 
 int (*create)(QCryptoBlock *block,
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 92e823c9f2..5b5d039800 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -76,7 +76,6 @@ typedef enum {
  * @readfunc: callback for reading data from the volume
  * @opaque: data to pass to @readfunc
  * @flags: bitmask of QCryptoBlockOpenFlags values
- * @n_threads: allow concurrent I/O from up to @n_threads threads
  * @errp: pointer to a NULL-initialized error object
  *
  * Create a new block encryption object for an existing
@@ -113,7 +112,6 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
  QCryptoBlockReadFunc readfunc,
  void *opaque,
  unsigned int flags,
- size_t n_threads,
  Error **errp);
 
 typedef enum {
diff --git a/block/crypto.c b/block/crypto.c
index 21eed909c1..4eed3ffa6a 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -363,7 +363,6 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
block_crypto_read_func,
bs,
cflags,
-   1,
errp);
 
 if (!crypto->block) {
diff --git a/block/qcow.c b/block/qcow.c
index ca8e1d5ec8..c2f89db055 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -211,7 +211,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(crypto_opts, "encrypt.",
-   NULL, NULL, cflags, 1, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..10883a2494 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -321,7 +321,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
qcow2_crypto_hdr_read_func,
-   bs, cflags, QCOW2_MAX_THREADS, errp);
+   bs, cflags, errp);
 if (!s->crypto) {
 return -EINVAL;
 }
@@ -1701,8 +1701,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
-   NULL, NULL, cflags,
-   QCOW2_MAX_THREADS, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3357852c0a..5b777c15d3 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1189,7 +1189,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp)
 {
 QCryptoBlockLUKS *luks = NULL;
diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c
index 02305058e3..42e9556e42 100644
--- a/crypto/block-qcow.c
+++ b/crypto/block-qcow.c
@@ -44,7 +44,6 @@ qcrypto_block_qcow_has_format(const uint8_t *buf G_GNUC_UNUSED,
 static int
 qcrypto_block_qcow_init(QCryptoBlock *block,
 const char *keysecret,
-size_t n_threads,
 Error **errp)
 {
 char *password;
@@ -1

Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators

2024-05-21 Thread Stefan Hajnoczi
On Thu, 16 May 2024 at 12:23, Daniel P. Berrangé  wrote:
>
> This patch kicks the hornet's nest of AI / LLM code generators.
>
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
>
> The question for the project is whether that is a good position for
> QEMU to take or not?
>
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherently against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
>
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherent bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
>
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
>
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
>
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
>
> Discuss...

Although this policy is unenforceable, I think it's a valid position
to take until the legal situation becomes clear.

Acked-by: Stefan Hajnoczi 



Re: [PATCH] qio: Inherit follow_coroutine_ctx across TLS

2024-05-16 Thread Stefan Hajnoczi
On Wed, May 15, 2024 at 09:14:06PM -0500, Eric Blake wrote:
> Since qemu 8.2, the combination of NBD + TLS + iothread crashes on an
> assertion failure:
> 
> qemu-kvm: ../io/channel.c:534: void qio_channel_restart_read(void *): Assertion `qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)' failed.
> 
> It turns out that when we removed AioContext locking, we did so by
> having NBD tell its qio channels that it wanted to opt in to
> qio_channel_set_follow_coroutine_ctx(); but while we opted in on the
> main channel, we did not opt in on the TLS wrapper channel.
> qemu-iotests has coverage of NBD+iothread and NBD+TLS, but apparently
> no coverage of NBD+TLS+iothread, or we would have noticed this
> regression sooner.  (I'll add that in the next patch)
> 
> But while we could manually opt in to the TLS thread in nbd/server.c,
> it is more generic if all qio channels that wrap other channels
> inherit the follow status, in the same way that they inherit feature
> bits.
> 
> CC: Stefan Hajnoczi 
> CC: Daniel P. Berrangé 
> CC: qemu-sta...@nongnu.org
> Fixes: https://issues.redhat.com/browse/RHEL-34786
> Fixes: 06e0f098 ("io: follow coroutine AioContext in qio_channel_yield()", v8.2.0)
> Signed-off-by: Eric Blake 
> 
> ---
> 
> Maybe we should turn ioc->follow_coroutine_ctx into a
> QIO_CHANNEL_FEATURE_* bit?

It seems like existing feature bits are for characteristics inherent in
the specific channel, like whether it can pass file descriptors.
Following the coroutine AioContext is not inherent in the specific
channel; it's something that can be toggled at runtime on any channel.

If larger changes are being considered, I would look into always
following the coroutine's AioContext and getting rid of the API to
toggle this behavior completely. From commit 06e0f098d612's description:

  While the API has become simpler, there is one wart: QIOChannel has a
  special case for the iohandler AioContext (used for handlers that must not run
  in nested event loops). I didn't find an elegant way to preserve that behavior, so
  I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
  for opting in to the new AioContext model. By default QIOChannel uses the
  iohandler AioHandler. Code that formerly called
  qio_channel_attach_aio_context() now calls
  qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
  created.
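
To illustrate the inheritance idea from the patch description, here is a
minimal sketch. It does not use the real QIOChannel type: Channel and
channel_wrap() are hypothetical names, and copying the flag once at wrap time
is an assumption about how a wrapper could pick up the setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QIOChannel; not the real QEMU type. */
typedef struct Channel {
    struct Channel *inner;       /* wrapped channel, or NULL for a base channel */
    bool follow_coroutine_ctx;   /* follow the coroutine's AioContext? */
} Channel;

/*
 * Hypothetical helper: set up 'wrapper' around 'inner' and inherit the
 * follow status, so the caller does not have to opt in separately on the
 * wrapper (for example a TLS channel on top of a socket channel).
 */
static void channel_wrap(Channel *wrapper, Channel *inner)
{
    assert(inner != NULL);
    wrapper->inner = inner;
    wrapper->follow_coroutine_ctx = inner->follow_coroutine_ctx;
}
```

Because the flag can be toggled at runtime, a copy made at wrap time goes stale
if the inner channel is toggled later. That is one reason to consider always
following the coroutine's AioContext instead.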

> And I have not yet written the promised qemu-iotests patch, but I
> wanted to get this on the list before I'm offline for a week.
> 
> ---
>  io/channel-tls.c | 26 +++++++---
>  io/channel-websock.c |  1 +
>  2 files changed, 16 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH 00/20] qapi: new sphinx qapi domain pre-requisites

2024-05-16 Thread Stefan Hajnoczi
  |   4 +-
>  scripts/qapi/visit.py |   9 +-
>  tests/qapi-schema/doc-empty-section.err   |   2 +-
>  tests/qapi-schema/doc-empty-section.json  |   2 +-
>  tests/qapi-schema/doc-good.json   |  18 +-
>  tests/qapi-schema/doc-good.out|  61 +++---
>  tests/qapi-schema/doc-good.txt    |  31 +--
>  .../qapi-schema/doc-interleaved-section.json  |   2 +-
>  47 files changed, 1152 insertions(+), 753 deletions(-)
>  create mode 100755 scripts/qapi-lint.sh
>  create mode 100644 scripts/qapi/Makefile
> 
> -- 
> 2.44.0
> 
> 

For block-core.json/block-export.json/block.json:

Acked-by: Stefan Hajnoczi 




Re: [PATCH] scripts/simpletrace: Mark output with unstable timestamp as WARN

2024-05-14 Thread Stefan Hajnoczi
On Tue, May 14, 2024, 03:57 Zhao Liu  wrote:

> Hi Stefan,
>
> > QEMU uses clock_gettime(CLOCK_MONOTONIC) on Linux hosts. The man page
> > says:
> >
> >   All CLOCK_MONOTONIC variants guarantee that the time returned by
> >   consecutive calls will not go backwards, but successive calls
> >   may—depending on the architecture—return identical (not-increased)
> >   time values.
> >
> > trace_record_start() calls clock_gettime(CLOCK_MONOTONIC) so trace events
> > should have monotonically increasing timestamps.
> >
> > I don't see a scenario where trace record A's timestamp is greater than
> > trace record B's timestamp unless the clock is non-monotonic.
> >
> > Which host CPU architecture and operating system are you running?
>
> I tested on these 2 machines:
> * CML (intel 10th) with Ubuntu 22.04 + kernel v6.5.0-28
> * MTL (intel 14th) with Ubuntu 22.04.2 + kernel v6.9.0
>
> > Please attach to the QEMU process with gdb and print out the value of
> > the use_rt_clock variable or add a printf in init_get_clock(). The value
> > should be 1.
>
> Thanks. On both of the above machines, use_rt_clock is 1, and both show
> timestamp reversals with the following debug print:
>
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 9a366e551fb3..7657785c27dc 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -831,10 +831,17 @@ extern int use_rt_clock;
>
>  static inline int64_t get_clock(void)
>  {
> +static int64_t clock = 0;
>

Please try with a thread local variable (__thread) to check whether this
happens within a single thread.

If it only happens with a global variable then we'd need to look more
closely at race conditions in the patch below. I don't think the patch is a
reliable way to detect non-monotonic timestamps in a multi-threaded program.
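
A minimal sketch of the per-thread check, assuming GCC/Clang __thread support.
The names get_clock_ns() and went_backwards() are hypothetical, not QEMU
functions. Each thread compares a new timestamp only against its own previous
one, so a reversal reported here cannot be caused by another thread updating
a shared variable between the clock read and the comparison.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read CLOCK_MONOTONIC in nanoseconds. */
static int64_t get_clock_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is defined */
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Last timestamp observed by the calling thread only. */
static __thread int64_t last_ns;

/*
 * Hypothetical helper: returns 1 if now_ns is earlier than the previous
 * value seen by this thread, 0 otherwise. Equal values are not reported
 * because CLOCK_MONOTONIC may return identical values on successive calls.
 */
static int went_backwards(int64_t now_ns)
{
    int backwards = now_ns < last_ns;

    last_ns = now_ns;
    return backwards;
}
```

Calling went_backwards(get_clock_ns()) at each timestamp read would show
whether reversals still appear within a single thread. If they only appear
with the global variable, the likely cause is threads racing on that variable.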

>  if (use_rt_clock) {
>  struct timespec ts;
>  clock_gettime(CLOCK_MONOTONIC, &ts);
> -return ts.tv_sec * 1000000000LL + ts.tv_nsec;
> +int64_t tmp = ts.tv_sec * 1000000000LL + ts.tv_nsec;
> +if (tmp <= clock) {
> +printf("get_clock: strange, clock: %ld, tmp: %ld\n", clock, tmp);
> +}
> +assert(tmp > clock);
> +clock = tmp;
> +return clock;
>  } else {
>  /* XXX: using gettimeofday leads to problems if the date changes, so it should be avoided. */
> diff --git a/util/qemu-timer-common.c b/util/qemu-timer-common.c
> index cc1326f72646..3bf06eb4a4ce 100644
> --- a/util/qemu-timer-common.c
> +++ b/util/qemu-timer-common.c
> @@ -59,5 +59,6 @@ static void __attribute__((constructor)) init_get_clock(void)
>  use_rt_clock = 1;
>  }
>  clock_start = get_clock();
> +printf("init_get_clock: use_rt_clock: %d\n", use_rt_clock);
>  }
>  #endif
>
> ---
> The timestamp interval is very small, for example:
> get_clock: strange, clock: 3302130503505, tmp: 3302130503503
>
> or
>
> get_clock: strange, clock: 2761577819846455, tmp: 2761577819846395
>
> I also tried to use CLOCK_MONOTONIC_RAW, but there's still the reversal
> issue.
>
> Thanks,
> Zhao
>
>

