[PATCH] target/riscv/vector_helper.c: clean up reference of MTYPE

2023-06-07 Thread Xiao Wang
There's no code using MTYPE, which was a concept used in the older vector
implementation.

Signed-off-by: Xiao Wang 
---
 target/riscv/vector_helper.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f261e726c2..1e06e7447c 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -378,7 +378,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
 
 /*
  * masked unit-stride load and store operation will be a special case of
- * stride, stride = NF * sizeof (MTYPE)
+ * stride, stride = NF * sizeof (ETYPE)
  */
 
 #define GEN_VEXT_LD_US(NAME, ETYPE, LOAD_FN)\
@@ -650,10 +650,6 @@ GEN_VEXT_LDFF(vle64ff_v, int64_t, lde_d)
 #define DO_MAX(N, M)  ((N) >= (M) ? (N) : (M))
 #define DO_MIN(N, M)  ((N) >= (M) ? (M) : (N))
 
-/* Unsigned min/max */
-#define DO_MAXU(N, M) DO_MAX((UMTYPE)N, (UMTYPE)M)
-#define DO_MINU(N, M) DO_MIN((UMTYPE)N, (UMTYPE)M)
-
 /*
  * load and store whole register instructions
  */
-- 
2.25.1




Re: [RFC PATCH 00/19] hugetlb support for KVM guest_mem

2023-06-07 Thread Isaku Yamahata
On Tue, Jun 06, 2023 at 07:03:45PM +,
Ackerley Tng  wrote:

> Hello,
> 
> This patchset builds upon a soon-to-be-published WIP patchset that Sean
> published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, 
> mentioned
> at [1].
> 
> The tree can be found at:
> https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1
> 
> In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is 
> introduced,
> allowing VM private memory (for confidential computing) to be backed by 
> hugetlb
> pages.
> 
> guest_mem provides userspace with a handle, with which userspace can allocate
> and deallocate memory for confidential VMs without mapping the memory into
> userspace.
> 
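A rough sketch of that usage model, as this editor reads it from the linked
WIP series: the gmem fd comes from a KVM interface defined there (assumed
here, not shown), and plain fallocate() is used to populate or punch ranges
without ever mmap()ing them into userspace:

    /* sketch only -- how the gmem fd is created is up to the WIP series */
    #define _GNU_SOURCE
    #include <fcntl.h>

    static int gmem_alloc(int gmem_fd, off_t offset, off_t len)
    {
        /* populate backing (hugetlb) pages for [offset, offset + len) */
        return fallocate(gmem_fd, 0, offset, len);
    }

    static int gmem_free(int gmem_fd, off_t offset, off_t len)
    {
        /* return the range's pages to the pool */
        return fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, len);
    }
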
> Why use hugetlb instead of introducing a new allocator, like gmem does for 4K
> and transparent hugepages?
> 
> + hugetlb provides the following useful functionality, which would otherwise
>   have to be reimplemented:
> + Allocation of hugetlb pages at boot time, including
> + Parsing of kernel boot parameters to configure hugetlb
> + Tracking of usage in hstate
> + gmem will share the same system-wide pool of hugetlb pages, so users
>   don't have to have separate pools for hugetlb and gmem
> + Page accounting with subpools
> + hugetlb pages are tracked in subpools, which gmem uses to reserve
>   pages from the global hstate
> + Memory charging
> + hugetlb provides code that charges memory to cgroups
> + Reporting: hugetlb usage and availability are available at 
> /proc/meminfo,
>   etc
> 
> The first 11 patches in this patchset is a series of refactoring to decouple
> hugetlb and hugetlbfs.
> 
> The central thread binding the refactoring is that some functions (like
> inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on a hugetlbfs
> concept, that the resv_map, subpool, hstate, are in a specific field in a
> hugetlb inode.
> 
> Refactoring to parametrize functions by hstate, subpool, resv_map will allow
> hugetlb to be used by gmem and in other places where these data structures
> aren't necessarily stored in the same positions in the inode.
> 
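To make the direction concrete, a signature-level sketch; the first function
name and the argument lists are illustrative, while
alloc_hugetlb_folio_from_subpool() and the inode_*() accessors are the ones
named above:

    struct inode;
    struct hstate;
    struct hugepage_subpool;
    struct folio;

    /* hugetlbfs-style: hstate/subpool are dug out of the inode internally,
     * via inode_hstate(), inode_subpool(), inode_resv_map(). */
    struct folio *hugetlbfs_alloc_folio(struct inode *inode,
                                        unsigned long addr);

    /* decoupled: a caller such as gmem passes the objects explicitly and
     * needs no hugetlbfs inode at all. */
    struct folio *alloc_hugetlb_folio_from_subpool(struct hugepage_subpool *spool,
                                                   struct hstate *h,
                                                   unsigned long addr);
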
> The refactoring proposed here is just the minimum required to get a
> proof-of-concept working with gmem. I would like to get opinions on this
> approach before doing further refactoring. (See TODOs)
> 
> TODOs:
> 
> + hugetlb/hugetlbfs refactoring
> + remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs
>   specific and used only in inode.c
> + remove_mapping_hugepages(), remove_inode_single_folio(),
>   hugetlb_unreserve_pages() shouldn't need to take inode as a parameter
> + Updating inode->i_blocks can be refactored to a separate function 
> and
>   called from hugetlbfs and gmem
> + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by
>   vma
> + hugetlb_reserve_pages() should be refactored to be symmetric with
>   hugetlb_unreserve_pages()
> + It should be parametrized by resv_map
> + alloc_hugetlb_folio_from_subpool() could perhaps use
>   hugetlb_reserve_pages()?
> + gmem
> + Figure out if resv_map should be used by gmem at all
> + Probably needs more refactoring to decouple resv_map from hugetlb
>   functions

Hi. If KVM gmem is compiled as a kernel module, many symbols fail to link.
You need to add EXPORT_SYMBOL{,_GPL} for the exported symbols.
Or build it into the kernel instead of as a module?
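For reference, exporting a symbol so a module can link against it is a
one-line annotation next to the definition; the function below is purely
illustrative, not one from this series:

    /* mm/hugetlb.c (illustrative) */
    #include <linux/export.h>

    int hugetlb_do_something(void)
    {
            return 0;
    }
    EXPORT_SYMBOL_GPL(hugetlb_do_something); /* resolvable from GPL modules */
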

Thanks,

> Questions for the community:
> 
> 1. In this patchset, every gmem file backed with hugetlb is given a new
>subpool. Is that desirable?
> + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one
>   mount per hugetlb size (2M, 1G, etc)
> + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so 
> it
>   (rightfully) uses the hugetlbfs kernel mounts and their subpools
> + I gave each file a subpool mostly to speed up implementation and still 
> be
>   able to reserve hugetlb pages from the global hstate based on the gmem
>   file size.
> + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so
> + Should there be multiple mounts, one for each hugetlb size?
> + Will the mounts be initialized on boot or on first gmem file 
> creation?
> + Or is one subpool per gmem file fine?
> 2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace
>reservations?
> 
> [1] https://lore.kernel.org/lkml/zem5zq8oo+xna...@google.com/
> 
> ---
> 
> Ackerley Tng (19):
>   mm: hugetlb: Expose get_hstate_idx()
>   mm: hugetlb: Move and expose hugetlbfs_zero_partial_page
>   mm: hugetlb: Expose remove_inode_hugepages
>   mm: hugetlb: Decouple hstate, subpool from inode
>   mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool
> and hstate
>   mm: hugetlb: Provide hugetlb_filemap_add_folio()
>   mm: hugetlb: Refactor 

[QEMU PATCH 1/1] virtgpu: do not destroy resources when guest suspend

2023-06-07 Thread Jiqian Chen
After suspending and resuming the guest VM, you get
a black screen and the display can't come back.

This is because, when the guest suspends, it calls
into QEMU, which calls virtio_gpu_gl_reset. In
virtio_gpu_gl_reset, the resources and the renderer that
were used for the display are destroyed and reset. As a
result, the guest's screen can't come back to the state it
was in when it was suspended and only shows black.

So, this patch adds a new ctrl message,
VIRTIO_GPU_CMD_STATUS_FREEZING, to get notification from
the guest. While the guest is suspending, it sets the
freezing status of virtgpu to true, which prevents
destroying resources and resetting the renderer when the
guest calls into virtio_gpu_gl_reset. While the guest is
resuming, it sets freezing to false, and then
virtio_gpu_gl_reset keeps its original behavior and has no
other impact.
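The new command also needs a wire format in virtio_gpu.h; that hunk is not
shown below, but given how the command is parsed with VIRTIO_GPU_FILL_CMD()
and virtio_gpu_bswap_32(), it presumably looks roughly like this (the field
layout is an assumption, not taken from the patch):

    /* include/standard-headers/linux/virtio_gpu.h (assumed layout) */
    struct virtio_gpu_status_freezing {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t freezing;  /* 1: guest is suspending, 0: guest has resumed */
        uint32_t padding;
    };
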

Signed-off-by: Jiqian Chen 
---
 hw/display/virtio-gpu-gl.c  |  9 ++-
 hw/display/virtio-gpu-virgl.c   |  3 +++
 hw/display/virtio-gpu.c | 26 +++--
 include/hw/virtio/virtio-gpu.h  |  3 +++
 include/standard-headers/linux/virtio_gpu.h |  9 +++
 5 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfb..e11ad233eb 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -100,7 +100,14 @@ static void virtio_gpu_gl_reset(VirtIODevice *vdev)
  */
 if (gl->renderer_inited && !gl->renderer_reset) {
 virtio_gpu_virgl_reset_scanout(g);
-gl->renderer_reset = true;
+/*
+ * If guest is suspending, we shouldn't reset renderer,
+ * otherwise, the display can't come back to the time when
+ * it was suspended after guest resumed.
+ */
+if (!g->freezing) {
+gl->renderer_reset = true;
+}
 }
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 73cb92c8d5..183ec92d53 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -464,6 +464,9 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
 case VIRTIO_GPU_CMD_GET_EDID:
 virtio_gpu_get_edid(g, cmd);
 break;
+case VIRTIO_GPU_CMD_STATUS_FREEZING:
+virtio_gpu_cmd_status_freezing(g, cmd);
+break;
 default:
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 break;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 5e15c79b94..8f235d7848 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -373,6 +373,16 @@ static void virtio_gpu_resource_create_blob(VirtIOGPU *g,
 QTAILQ_INSERT_HEAD(&g->reslist, res, next);
 }
 
+void virtio_gpu_cmd_status_freezing(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_status_freezing sf;
+
+VIRTIO_GPU_FILL_CMD(sf);
+virtio_gpu_bswap_32(&sf, sizeof(sf));
+g->freezing = sf.freezing;
+}
+
 static void virtio_gpu_disable_scanout(VirtIOGPU *g, int scanout_id)
 {
 struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
@@ -986,6 +996,9 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
 case VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING:
 virtio_gpu_resource_detach_backing(g, cmd);
 break;
+case VIRTIO_GPU_CMD_STATUS_FREEZING:
+virtio_gpu_cmd_status_freezing(g, cmd);
+break;
 default:
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 break;
@@ -1344,6 +1357,8 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 QTAILQ_INIT(&g->reslist);
 QTAILQ_INIT(&g->cmdq);
 QTAILQ_INIT(&g->fenceq);
+
+g->freezing = false;
 }
 
 void virtio_gpu_reset(VirtIODevice *vdev)
@@ -1352,8 +1367,15 @@ void virtio_gpu_reset(VirtIODevice *vdev)
 struct virtio_gpu_simple_resource *res, *tmp;
 struct virtio_gpu_ctrl_command *cmd;
 
-QTAILQ_FOREACH_SAFE(res, &g->reslist, next, tmp) {
-virtio_gpu_resource_destroy(g, res);
+/*
+ * If guest is suspending, we shouldn't destroy resources,
+ * otherwise, the display can't come back to the time when
+ * it was suspended after guest resumed.
+ */
+if (!g->freezing) {
+QTAILQ_FOREACH_SAFE(res, &g->reslist, next, tmp) {
+virtio_gpu_resource_destroy(g, res);
+}
 }
 
 while (!QTAILQ_EMPTY(&g->cmdq)) {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..c21c2990fb 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -173,6 +173,7 @@ struct VirtIOGPU {
 
 uint64_t hostmem;
 
+bool freezing;
 bool processing_cmdq;
 QEMUTimer *fence_poll;
 QEMUTimer *print_stats;
@@ -284,5 +285,7 @@ void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g);
 void virtio_gpu_virgl_reset(VirtIOGPU *g);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
+void virtio_gpu_cmd_status_freezing(VirtIOGPU 

[PATCH QEMU v5 2/8] qapi/migration: Introduce x-vcpu-dirty-limit-period parameter

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Introduce the "x-vcpu-dirty-limit-period" experimental migration
parameter, which is in the range of 1 to 1000 ms and is used to
make the dirty rate calculation period configurable.

Currently, as "x-vcpu-dirty-limit-period" varies, the total time
of live migration changes; test results show that the optimal value
of "x-vcpu-dirty-limit-period" ranges from 500 ms to 1000 ms.
"x-vcpu-dirty-limit-period" should be made stable once it is proven
that a best value cannot be determined from the developer's
experiments.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Markus Armbruster 
---
 migration/migration-hmp-cmds.c |  8 
 migration/options.c| 28 
 qapi/migration.json| 34 +++---
 3 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 9885d7c9f7..352e9ec716 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -364,6 +364,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 }
 }
 }
+
+monitor_printf(mon, "%s: %" PRIu64 " ms\n",
+MigrationParameter_str(MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD),
+params->x_vcpu_dirty_limit_period);
 }
 
 qapi_free_MigrationParameters(params);
@@ -620,6 +624,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 error_setg(&err, "The block-bitmap-mapping parameter can only be set "
"through QMP");
 break;
+case MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD:
+p->has_x_vcpu_dirty_limit_period = true;
+visit_type_size(v, param, &p->x_vcpu_dirty_limit_period, &err);
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..1cb735e35f 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -80,6 +80,8 @@
 #define DEFINE_PROP_MIG_CAP(name, x) \
 DEFINE_PROP_BOOL(name, MigrationState, capabilities[x], false)
 
+#define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD 1000    /* milliseconds */
+
 Property migration_properties[] = {
 DEFINE_PROP_BOOL("store-global-state", MigrationState,
  store_global_state, true),
@@ -163,6 +165,9 @@ Property migration_properties[] = {
 DEFINE_PROP_STRING("tls-creds", MigrationState, parameters.tls_creds),
 DEFINE_PROP_STRING("tls-hostname", MigrationState, 
parameters.tls_hostname),
 DEFINE_PROP_STRING("tls-authz", MigrationState, parameters.tls_authz),
+DEFINE_PROP_UINT64("x-vcpu-dirty-limit-period", MigrationState,
+   parameters.x_vcpu_dirty_limit_period,
+   DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -891,6 +896,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
s->parameters.block_bitmap_mapping);
 }
 
+params->has_x_vcpu_dirty_limit_period = true;
+params->x_vcpu_dirty_limit_period = 
s->parameters.x_vcpu_dirty_limit_period;
+
 return params;
 }
 
@@ -923,6 +931,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_announce_max = true;
 params->has_announce_rounds = true;
 params->has_announce_step = true;
+params->has_x_vcpu_dirty_limit_period = true;
 }
 
 /*
@@ -1083,6 +1092,15 @@ bool migrate_params_check(MigrationParameters *params, 
Error **errp)
 }
 #endif
 
+if (params->has_x_vcpu_dirty_limit_period &&
+(params->x_vcpu_dirty_limit_period < 1 ||
+ params->x_vcpu_dirty_limit_period > 1000)) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "x-vcpu-dirty-limit-period",
+   "a value between 1 and 1000");
+return false;
+}
+
 return true;
 }
 
@@ -1182,6 +1200,11 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 dest->has_block_bitmap_mapping = true;
 dest->block_bitmap_mapping = params->block_bitmap_mapping;
 }
+
+if (params->has_x_vcpu_dirty_limit_period) {
+dest->x_vcpu_dirty_limit_period =
+params->x_vcpu_dirty_limit_period;
+}
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1300,6 +1323,11 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 QAPI_CLONE(BitmapMigrationNodeAliasList,
params->block_bitmap_mapping);
 }
+
+if (params->has_x_vcpu_dirty_limit_period) {
+s->parameters.x_vcpu_dirty_limit_period =
+params->x_vcpu_dirty_limit_period;
+}
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 179af0c4d8..8d491ee121 100644
--- a/qapi/migration.json
+++ 

[PATCH QEMU v5 3/8] qapi/migration: Introduce vcpu-dirty-limit parameters

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Introduce the "vcpu-dirty-limit" migration parameter, used
to limit the dirty page rate during live migration.

"vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are
two dirty-limit-related migration parameters, which can
be set before and during live migration via qmp
migrate-set-parameters.

These two parameters are used to help implement the dirty
page rate limit algorithm of migration.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 migration/migration-hmp-cmds.c |  8 
 migration/options.c| 21 +
 qapi/migration.json| 18 +++---
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 352e9ec716..35e8020bbf 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -368,6 +368,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %" PRIu64 " ms\n",
 MigrationParameter_str(MIGRATION_PARAMETER_X_VCPU_DIRTY_LIMIT_PERIOD),
 params->x_vcpu_dirty_limit_period);
+
+monitor_printf(mon, "%s: %" PRIu64 " MB/s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT),
+params->vcpu_dirty_limit);
 }
 
 qapi_free_MigrationParameters(params);
@@ -628,6 +632,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_x_vcpu_dirty_limit_period = true;
 visit_type_size(v, param, &p->x_vcpu_dirty_limit_period, &err);
 break;
+case MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT:
+p->has_vcpu_dirty_limit = true;
+visit_type_size(v, param, &p->vcpu_dirty_limit, &err);
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 1cb735e35f..8dc1ab10e1 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -81,6 +81,7 @@
 DEFINE_PROP_BOOL(name, MigrationState, capabilities[x], false)
 
 #define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD 1000    /* milliseconds */
+#define DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT        1   /* MB/s */
 
 Property migration_properties[] = {
 DEFINE_PROP_BOOL("store-global-state", MigrationState,
@@ -168,6 +169,9 @@ Property migration_properties[] = {
 DEFINE_PROP_UINT64("x-vcpu-dirty-limit-period", MigrationState,
parameters.x_vcpu_dirty_limit_period,
DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT_PERIOD),
+DEFINE_PROP_UINT64("vcpu-dirty-limit", MigrationState,
+   parameters.vcpu_dirty_limit,
+   DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -898,6 +902,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 
 params->has_x_vcpu_dirty_limit_period = true;
 params->x_vcpu_dirty_limit_period = 
s->parameters.x_vcpu_dirty_limit_period;
+params->has_vcpu_dirty_limit = true;
+params->vcpu_dirty_limit = s->parameters.vcpu_dirty_limit;
 
 return params;
 }
@@ -932,6 +938,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_announce_rounds = true;
 params->has_announce_step = true;
 params->has_x_vcpu_dirty_limit_period = true;
+params->has_vcpu_dirty_limit = true;
 }
 
 /*
@@ -1101,6 +1108,14 @@ bool migrate_params_check(MigrationParameters *params, 
Error **errp)
 return false;
 }
 
+if (params->has_vcpu_dirty_limit &&
+(params->vcpu_dirty_limit < 1)) {
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+   "vcpu_dirty_limit",
+   "is invalid, it must greater then 1 MB/s");
+return false;
+}
+
 return true;
 }
 
@@ -1205,6 +1220,9 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 dest->x_vcpu_dirty_limit_period =
 params->x_vcpu_dirty_limit_period;
 }
+if (params->has_vcpu_dirty_limit) {
+dest->vcpu_dirty_limit = params->vcpu_dirty_limit;
+}
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1328,6 +1346,9 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 s->parameters.x_vcpu_dirty_limit_period =
 params->x_vcpu_dirty_limit_period;
 }
+if (params->has_vcpu_dirty_limit) {
+s->parameters.vcpu_dirty_limit = params->vcpu_dirty_limit;
+}
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 8d491ee121..b970b68672 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -783,6 +783,9 @@
 # live migration. Should be in the range 1 to 
1000ms,
 # defaults to 1000ms. (Since 8.1)
 #
+# @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
+#  

[QEMU PATCH 0/1]

2023-06-07 Thread Jiqian Chen
Hi all,

I am working to implement the virtgpu S3 function on Xen.

Currently on Xen, if we start a guest that enables virtgpu, then
run "echo mem > /sys/power/state" to suspend the guest, and then run
"sudo xl trigger <domid> s3resume" to resume it, we find that the
guest kernel comes back, but the display doesn't. It just shows a
black screen.

Through reading the code, I found that while the guest is suspending,
it calls into QEMU, which calls virtio_gpu_gl_reset. In
virtio_gpu_gl_reset, all resources are destroyed and the renderer is
reset. This makes the display gone after the guest resumes.

I think we should keep the resources, or prevent them from being
destroyed, while the guest is suspending. So, I add a new status named
freezing to virtgpu, and add a new ctrl message
VIRTIO_GPU_CMD_STATUS_FREEZING to get notification from the guest. If
freezing is set to true, QEMU knows that the guest is suspending, so it
will not destroy resources and will not reset the renderer. If freezing
is set to false, QEMU performs its original actions and there is no
other impact.

And now, the display can come back and applications can continue from
where they were after the guest resumes.

Jiqian Chen (1):
  virtgpu: do not destroy resources when guest suspend

 hw/display/virtio-gpu-gl.c  |  9 ++-
 hw/display/virtio-gpu-virgl.c   |  3 +++
 hw/display/virtio-gpu.c | 26 +++--
 include/hw/virtio/virtio-gpu.h  |  3 +++
 include/standard-headers/linux/virtio_gpu.h |  9 +++
 5 files changed, 47 insertions(+), 3 deletions(-)

-- 
2.34.1




[PATCH QEMU v5 7/8] migration: Extend query-migrate to provide dirty page limit info

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Extend query-migrate to provide the throttle time and estimated
ring full time when the dirty-limit capability is enabled, through which
we can observe whether dirty-limit takes effect during live migration.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Markus Armbruster 
---
 include/sysemu/dirtylimit.h|  2 ++
 migration/migration-hmp-cmds.c | 10 +
 migration/migration.c  | 10 +
 qapi/migration.json| 15 -
 softmmu/dirtylimit.c   | 39 ++
 5 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/dirtylimit.h b/include/sysemu/dirtylimit.h
index 8d2c1f3a6b..410a2bc0b6 100644
--- a/include/sysemu/dirtylimit.h
+++ b/include/sysemu/dirtylimit.h
@@ -34,4 +34,6 @@ void dirtylimit_set_vcpu(int cpu_index,
 void dirtylimit_set_all(uint64_t quota,
 bool enable);
 void dirtylimit_vcpu_execute(CPUState *cpu);
+int64_t dirtylimit_throttle_time_per_round(void);
+int64_t dirtylimit_ring_full_time(void);
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 35e8020bbf..893c87493d 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -190,6 +190,16 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
info->cpu_throttle_percentage);
 }
 
+if (info->has_dirty_limit_throttle_time_per_round) {
+monitor_printf(mon, "dirty-limit throttle time: %" PRIi64 " us\n",
+   info->dirty_limit_throttle_time_per_round);
+}
+
+if (info->has_dirty_limit_ring_full_time) {
+monitor_printf(mon, "dirty-limit ring full time: %" PRIi64 " us\n",
+   info->dirty_limit_ring_full_time);
+}
+
 if (info->has_postcopy_blocktime) {
 monitor_printf(mon, "postcopy blocktime: %u\n",
info->postcopy_blocktime);
diff --git a/migration/migration.c b/migration/migration.c
index 4278b48af0..5e1abc9cee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -64,6 +64,7 @@
 #include "yank_functions.h"
 #include "sysemu/qtest.h"
 #include "options.h"
+#include "sysemu/dirtylimit.h"
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -968,6 +969,15 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 info->ram->dirty_pages_rate =
   stat64_get(&mig_stats.dirty_pages_rate);
 }
+
+if (migrate_dirty_limit() && dirtylimit_in_service()) {
+info->has_dirty_limit_throttle_time_per_round = true;
+info->dirty_limit_throttle_time_per_round =
+dirtylimit_throttle_time_per_round();
+
+info->has_dirty_limit_ring_full_time = true;
+info->dirty_limit_ring_full_time = dirtylimit_ring_full_time();
+}
 }
 
 static void populate_disk_info(MigrationInfo *info)
diff --git a/qapi/migration.json b/qapi/migration.json
index 0c4827d9c9..b31a8c615c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -250,6 +250,17 @@
 # blocked.  Present and non-empty when migration is blocked.
 # (since 6.0)
 #
+# @dirty-limit-throttle-time-per-round: Maximum throttle time (in 
microseconds) of virtual
+#   CPUs each dirty ring full round, which 
shows how
+#   MigrationCapability dirty-limit 
affects the guest
+#   during live migration. (since 8.1)
+#
+# @dirty-limit-ring-full-time: Estimated average dirty ring full time (in 
microseconds)
+#  each dirty ring full round, note that the value 
equals
+#  dirty ring memory size divided by average dirty 
page rate
+#  of virtual CPU, which can be used to observe 
the average
+#  memory load of virtual CPU indirectly. (since 
8.1)
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
@@ -267,7 +278,9 @@
'*postcopy-blocktime' : 'uint32',
'*postcopy-vcpu-blocktime': ['uint32'],
'*compression': 'CompressionStats',
-   '*socket-address': ['SocketAddress'] } }
+   '*socket-address': ['SocketAddress'],
+   '*dirty-limit-throttle-time-per-round': 'int64',
+   '*dirty-limit-ring-full-time': 'int64'} }
 
 ##
 # @query-migrate:
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index ee47158986..0fb9d5b171 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -558,6 +558,45 @@ out:
 hmp_handle_error(mon, err);
 }
 
+/* Return the max throttle time of each virtual CPU */
+int64_t dirtylimit_throttle_time_per_round(void)
+{
+CPUState *cpu;
+int64_t max = 0;
+
+CPU_FOREACH(cpu) {
+if (cpu->throttle_us_per_full > max) {
+max = cpu->throttle_us_per_full;
+}
+}
+
+return max;
+}
+
+/*
+ * 

[PATCH QEMU v5 0/8] migration: introduce dirtylimit capability

2023-06-07 Thread ~hyman
I'm awfully sorry about not having updated the patchset for a long time.
I have changed my email address to "yong.hu...@smartx.com", and
this email address will be used to post the unfinished commits in the
future.

I have dropped the performance improvement data; please refer to the
following link to see the details.
https://lore.kernel.org/qemu-devel/13a62aaf-f340-0dc7-7b68-7ecc4bb64...@chinatelecom.cn/

Please review if anyone has time. Thanks.
Yong

v5:
1. Rebase on master and enrich the comment for the "dirty-limit" capability,
as suggested by Markus.
2. Drop commits that have already been merged.

v4:
1. Polish the docs and update the release version suggested by Markus
2. Rename the migrate exported info "dirty-limit-throttle-time-per-
round"
   to "dirty-limit-throttle-time-per-full".

v3(resend):
- fix the syntax error of the topic.

v3:
This version makes some modifications inspired by Peter and Markus,
as follows:
1. Do the code clean up in [PATCH v2 02/11] suggested by Markus
2. Replace the [PATCH v2 03/11] with a much simpler patch posted by
   Peter to fix the following bug:
   https://bugzilla.redhat.com/show_bug.cgi?id=2124756
3. Fix the error path of migrate_params_check in [PATCH v2 04/11]
   pointed out by Markus. Enrich the commit message to explain why
   x-vcpu-dirty-limit-period is an unstable parameter.
4. Refactor the dirty-limit convergence algo in [PATCH v2 07/11]
   suggested by Peter:
   a. apply blk_mig_bulk_active check before enable dirty-limit
   b. drop the unhelpful check function before enable dirty-limit
   c. change the migration_cancel logic, just cancel dirty-limit
  only if dirty-limit capability turned on.
   d. abstract a code clean commit [PATCH v3 07/10] to adjust
  the check order before enable auto-converge
5. Change the name of the observed indexes during dirty-limit live
   migration to make them easier to understand. Use the
   maximum throttle time of vcpus as "dirty-limit-throttle-time-per-full"
6. Fix some grammatical and spelling errors pointed out by Markus
   and enrich the document about the dirty-limit live migration
   observing indexes "dirty-limit-ring-full-time"
   and "dirty-limit-throttle-time-per-full"
7. Change the default value of x-vcpu-dirty-limit-period to 1000ms,
   which is the optimal value pointed out in the cover letter for that
   testing environment.
8. Drop the 2 guestperf test commits [PATCH v2 10/11],
   [PATCH v2 11/11] and post them with a standalone series in the
   future.

v2:
This version makes a few modifications compared with
version 1, as follows:
1. fix the overflow issue reported by Peter Maydell
2. add parameter check for hmp "set_vcpu_dirty_limit" command
3. fix the racing issue between dirty ring reaper thread and
   Qemu main thread.
4. add migrate parameter check for x-vcpu-dirty-limit-period
   and vcpu-dirty-limit.
5. add the logic to forbid hmp/qmp commands set_vcpu_dirty_limit,
   cancel_vcpu_dirty_limit during dirty-limit live migration when
   implement dirty-limit convergence algo.
6. add capability check to ensure auto-converge and dirty-limit
   are mutually exclusive.
7. pre-check if kvm dirty ring size is configured before setting
   dirty-limit migrate parameter

Hyman Huang(黄勇) (8):
  softmmu/dirtylimit: Add parameter check for hmp "set_vcpu_dirty_limit"
  qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
  qapi/migration: Introduce vcpu-dirty-limit parameters
  migration: Introduce dirty-limit capability
  migration: Refactor auto-converge capability logic
  migration: Implement dirty-limit convergence algo
  migration: Extend query-migrate to provide dirty page limit info
  tests: Add migration dirty-limit capability test

 include/sysemu/dirtylimit.h|   2 +
 migration/migration-hmp-cmds.c |  26 ++
 migration/migration.c  |  13 +++
 migration/options.c|  72 +++
 migration/options.h|   1 +
 migration/ram.c|  63 +++---
 migration/trace-events |   1 +
 qapi/migration.json|  73 ++--
 softmmu/dirtylimit.c   |  90 +--
 tests/qtest/migration-test.c   | 154 +
 10 files changed, 464 insertions(+), 31 deletions(-)

-- 
2.38.5



[PATCH QEMU v5 6/8] migration: Implement dirty-limit convergence algo

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Implement the dirty-limit convergence algorithm for live migration,
which is kind of like the auto-converge algorithm but uses dirty-limit
instead of cpu throttling to make migration converge.

Enable the dirty page limit if dirty_rate_high_cnt is greater than 2
when the dirty-limit capability is enabled; disable dirty-limit if
migration is cancelled.

Note that the "set_vcpu_dirty_limit" and "cancel_vcpu_dirty_limit"
commands are not allowed during dirty-limit live migration.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Markus Armbruster 
---
 migration/migration.c  |  3 ++
 migration/ram.c| 63 --
 migration/trace-events |  1 +
 softmmu/dirtylimit.c   | 22 +++
 4 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..4278b48af0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,6 +165,9 @@ void migration_cancel(const Error *error)
 if (error) {
 migrate_set_error(current_migration, error);
 }
+if (migrate_dirty_limit()) {
+qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
+}
 migrate_fd_cancel(current_migration);
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 132f1a81d9..d26c7a8193 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -46,6 +46,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-events-migration.h"
+#include "qapi/qapi-commands-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "trace.h"
 #include "exec/ram_addr.h"
@@ -59,6 +60,8 @@
 #include "multifd.h"
 #include "sysemu/runstate.h"
 #include "options.h"
+#include "sysemu/dirtylimit.h"
+#include "sysemu/kvm.h"
 
 #include "hw/boards.h" /* for machine_dump_guest_core() */
 
@@ -983,6 +986,30 @@ static void migration_update_rates(RAMState *rs, int64_t 
end_time)
 }
 }
 
+/*
+ * Enable dirty-limit to throttle down the guest
+ */
+static void migration_dirty_limit_guest(void)
+{
+static int64_t quota_dirtyrate;
+MigrationState *s = migrate_get_current();
+
+/*
+ * If dirty limit already enabled and migration parameter
+ * vcpu-dirty-limit untouched.
+ */
+if (dirtylimit_in_service() &&
+quota_dirtyrate == s->parameters.vcpu_dirty_limit) {
+return;
+}
+
+quota_dirtyrate = s->parameters.vcpu_dirty_limit;
+
+/* Set or update quota dirty limit */
+qmp_set_vcpu_dirty_limit(false, -1, quota_dirtyrate, NULL);
+trace_migration_dirty_limit_guest(quota_dirtyrate);
+}
+
 static void migration_trigger_throttle(RAMState *rs)
 {
 uint64_t threshold = migrate_throttle_trigger_threshold();
@@ -991,26 +1018,32 @@ static void migration_trigger_throttle(RAMState *rs)
 uint64_t bytes_dirty_period = rs->num_dirty_pages_period * 
TARGET_PAGE_SIZE;
 uint64_t bytes_dirty_threshold = bytes_xfer_period * threshold / 100;
 
-/* During block migration the auto-converge logic incorrectly detects
- * that ram migration makes no progress. Avoid this by disabling the
- * throttling logic during the bulk phase of block migration. */
-if (blk_mig_bulk_active()) {
-return;
-}
+/*
+ * The following detection logic can be refined later. For now:
+ * Check to see if the ratio between dirtied bytes and the approx.
+ * amount of bytes that just got transferred since the last time
+ * we were in this routine reaches the threshold. If that happens
+ * twice, start or increase throttling.
+ */
 
-if (migrate_auto_converge()) {
-/* The following detection logic can be refined later. For now:
-   Check to see if the ratio between dirtied bytes and the approx.
-   amount of bytes that just got transferred since the last time
-   we were in this routine reaches the threshold. If that happens
-   twice, start or increase throttling. */
+if ((bytes_dirty_period > bytes_dirty_threshold) &&
+(++rs->dirty_rate_high_cnt >= 2)) {
+rs->dirty_rate_high_cnt = 0;
+/*
+ * During block migration the auto-converge logic incorrectly detects
+ * that ram migration makes no progress. Avoid this by disabling the
+ * throttling logic during the bulk phase of block migration
+ */
+if (blk_mig_bulk_active()) {
+return;
+}
 
-if ((bytes_dirty_period > bytes_dirty_threshold) &&
-(++rs->dirty_rate_high_cnt >= 2)) {
+if (migrate_auto_converge()) {
 trace_migration_throttle();
-rs->dirty_rate_high_cnt = 0;
 mig_throttle_guest_down(bytes_dirty_period,
 bytes_dirty_threshold);
+} else if (migrate_dirty_limit()) {
+migration_dirty_limit_guest();
 }
 }
 }
diff --git a/migration/trace-events b/migration/trace-events
index cdaef7a1ea..c5cb280d95 100644
--- a/migration/trace-events
+++ 

[PATCH QEMU v5 8/8] tests: Add migration dirty-limit capability test

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Add a migration dirty-limit capability test, run if the kernel
supports the dirty ring.

Migration dirty-limit capability introduces the dirty limit
capability; two parameters, x-vcpu-dirty-limit-period and
vcpu-dirty-limit, are introduced to implement live
migration with dirty limit.

The test case does the following things:
1. start src and dst VMs and enable the dirty-limit capability
2. start migration and then cancel it, to check whether dirty limit
   stops working.
3. restart the dst VM
4. start migration and enable the dirty-limit capability
5. check whether migration satisfies the convergence condition
   during the pre-switchover phase.

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/migration-test.c | 154 +++
 1 file changed, 154 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..60789a8d9f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2609,6 +2609,158 @@ static void test_vcpu_dirty_limit(void)
 dirtylimit_stop_vm(vm);
 }
 
+static void migrate_dirty_limit_wait_showup(QTestState *from,
+const int64_t period,
+const int64_t value)
+{
+/* Enable dirty limit capability */
+migrate_set_capability(from, "dirty-limit", true);
+
+/* Set dirty limit parameters */
+migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period);
+migrate_set_parameter_int(from, "vcpu-dirty-limit", value);
+
+/* Make sure migrate can't converge */
+migrate_ensure_non_converge(from);
+
+/* To check limit rate after precopy */
+migrate_set_capability(from, "pause-before-switchover", true);
+
+/* Wait for the serial output from the source */
+wait_for_serial("src_serial");
+}
+
+/*
+ * This test does:
+ *  source   target
+ *   migrate_incoming
+ * migrate
+ * migrate_cancel
+ *   restart target
+ * migrate
+ *
+ *  And see that if dirty limit works correctly
+ */
+static void test_migrate_dirty_limit(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+QTestState *from, *to;
+int64_t remaining, throttle_us_per_full;
+/*
+ * We want the test to be stable and as fast as possible.
+ * E.g., with 1Gb/s bandwith migration may pass without dirty limit,
+ * so we need to decrease a bandwidth.
+ */
+const int64_t dirtylimit_period = 1000, dirtylimit_value = 50;
+const int64_t max_bandwidth = 4; /* ~400Mb/s */
+const int64_t downtime_limit = 250; /* 250ms */
+/*
+ * We migrate through unix-socket (> 500Mb/s).
+ * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
+ * So, we can predict expected_threshold
+ */
+const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
+int max_try_count = 10;
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Start src, dst vm */
+if (test_migrate_start(&from, &to, args.listen_uri, &args.start)) {
+return;
+}
+
+/* Prepare for dirty limit migration and wait src vm show up */
+migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value);
+
+/* Start migrate */
+migrate_qmp(from, uri, "{}");
+
+/* Wait for dirty limit throttle begin */
+throttle_us_per_full = 0;
+while (throttle_us_per_full == 0) {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+}
+
+/* Now cancel migrate and wait for dirty limit throttle switch off */
+migrate_cancel(from);
+wait_for_migration_status(from, "cancelled", NULL);
+
+/* Check if dirty limit throttle switched off, set timeout 1ms */
+do {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+} while (throttle_us_per_full != 0 && --max_try_count);
+
+/* Assert dirty limit is not in service */
+g_assert_cmpint(throttle_us_per_full, ==, 0);
+
+args = (MigrateCommon) {
+.start = {
+.only_target = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Restart dst vm, src vm already show up so we needn't wait anymore */
+if (test_migrate_start(&from, &to, args.listen_uri, &args.start)) {
+return;
+}
+
+/* Start migrate */
+migrate_qmp(from, uri, "{}");
+
+/* Wait for dirty limit throttle begin */
+throttle_us_per_full = 0;
+while (throttle_us_per_full == 0) {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+

[PATCH QEMU v5 1/8] softmmu/dirtylimit: Add parameter check for hmp "set_vcpu_dirty_limit"

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

The dirty_rate parameter of the hmp command "set_vcpu_dirty_limit" is
invalid if it is less than 0, so add a parameter check for it.

Note that this patch also deletes the unsolicited help message and
cleans up the code.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 softmmu/dirtylimit.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 015a9038d1..5c12d26d49 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -515,14 +515,15 @@ void hmp_set_vcpu_dirty_limit(Monitor *mon, const QDict 
*qdict)
 int64_t cpu_index = qdict_get_try_int(qdict, "cpu_index", -1);
 Error *err = NULL;
 
-qmp_set_vcpu_dirty_limit(!!(cpu_index != -1), cpu_index, dirty_rate, &err);
-if (err) {
-hmp_handle_error(mon, err);
-return;
+if (dirty_rate < 0) {
+error_setg(&err, "invalid dirty page limit %ld", dirty_rate);
+goto out;
 }
 
-monitor_printf(mon, "[Please use 'info vcpu_dirty_limit' to query "
-   "dirty limit for virtual CPU]\n");
+qmp_set_vcpu_dirty_limit(!!(cpu_index != -1), cpu_index, dirty_rate, &err);
+
+out:
+hmp_handle_error(mon, err);
 }
 
 static struct DirtyLimitInfo *dirtylimit_query_vcpu(int cpu_index)
-- 
2.38.5




[PATCH QEMU v5 4/8] migration: Introduce dirty-limit capability

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Introduce the migration dirty-limit capability, which can
be turned on before live migration and limits the dirty
page rate during live migration.

Introduce the migrate_dirty_limit function to help check
whether the dirty-limit capability is enabled during live migration.

Meanwhile, refactor vcpu_dirty_rate_stat_collect
so that the period can be configured instead of hardcoded.

The dirty-limit capability is kind of like auto-converge
but uses dirty limit instead of the traditional cpu-throttle
to throttle the guest down. To enable this feature, turn on
the dirty-limit capability before live migration using
migrate-set-capabilities, and set the parameters
"x-vcpu-dirty-limit-period" and "vcpu-dirty-limit" suitably
to speed up convergence.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 migration/options.c  | 23 +++
 migration/options.h  |  1 +
 qapi/migration.json  | 12 +++-
 softmmu/dirtylimit.c | 18 ++
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 8dc1ab10e1..a68264f3c3 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -27,6 +27,7 @@
 #include "qemu-file.h"
 #include "ram.h"
 #include "options.h"
+#include "sysemu/kvm.h"
 
 /* Maximum migrate downtime set to 2000 seconds */
 #define MAX_MIGRATE_DOWNTIME_SECONDS 2000
@@ -194,6 +195,7 @@ Property migration_properties[] = {
 DEFINE_PROP_MIG_CAP("x-zero-copy-send",
 MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+DEFINE_PROP_MIG_CAP("x-dirty-limit", MIGRATION_CAPABILITY_DIRTY_LIMIT),
 
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -205,6 +207,13 @@ bool migrate_auto_converge(void)
 return s->capabilities[MIGRATION_CAPABILITY_AUTO_CONVERGE];
 }
 
+bool migrate_dirty_limit(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->capabilities[MIGRATION_CAPABILITY_DIRTY_LIMIT];
+}
+
 bool migrate_background_snapshot(void)
 {
 MigrationState *s = migrate_get_current();
@@ -556,6 +565,20 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
Error **errp)
 }
 }
 
+if (new_caps[MIGRATION_CAPABILITY_DIRTY_LIMIT]) {
+if (new_caps[MIGRATION_CAPABILITY_AUTO_CONVERGE]) {
+error_setg(errp, "dirty-limit conflicts with auto-converge"
+   " either of then available currently");
+return false;
+}
+
+if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
+error_setg(errp, "dirty-limit requires KVM with accelerator"
+   " property 'dirty-ring-size' set");
+return false;
+}
+}
+
 return true;
 }
 
diff --git a/migration/options.h b/migration/options.h
index 45991af3c2..6f0d837932 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -24,6 +24,7 @@ extern Property migration_properties[];
 /* capabilities */
 
 bool migrate_auto_converge(void);
+bool migrate_dirty_limit(void);
 bool migrate_background_snapshot(void);
 bool migrate_block(void);
 bool migrate_colo(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index b970b68672..0c4827d9c9 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -487,6 +487,16 @@
 # and should not affect the correctness of postcopy migration.
 # (since 7.1)
 #
+# @dirty-limit: If enabled, migration will use the dirty-limit algo to
+#   throttle down guest instead of auto-converge algo.
+#   Throttle algo only works when vCPU's dirtyrate greater
+#   than 'vcpu-dirty-limit', read processes in guest os
+#   aren't penalized any more, so this algo can improve
+#   performance of vCPU during live migration. This is an
+#   optional performance feature and should not affect the
+#   correctness of the existing auto-converge algo.
+#   (since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -502,7 +512,7 @@
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
-   'zero-copy-send', 'postcopy-preempt'] }
+   'zero-copy-send', 'postcopy-preempt', 'dirty-limit'] }
 
 ##
 # @MigrationCapabilityStatus:
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 5c12d26d49..3f1103b04b 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -24,6 +24,9 @@
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
+#include "migration/misc.h"
+#include "migration/migration.h"
+#include "migration/options.h"
 
 /*
  * Dirtylimit stop working if dirty page rate error
@@ -75,14 +78,21 @@ static bool dirtylimit_quit;
 
 static void vcpu_dirty_rate_stat_collect(void)
 {
+MigrationState *s = migrate_get_current();
 VcpuStat stat;
 int i = 0;
+int64_t period = DIRTYLIMIT_CALC_TIME_MS;
+
+if 

[PATCH QEMU v5 5/8] migration: Refactor auto-converge capability logic

2023-06-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Check whether block migration is running before throttling
the guest down in the auto-converge way.

Note that this modification is mostly a code cleanup,
because block migration does not depend on the auto-converge
capability, so the order of the checks can be adjusted.

Signed-off-by: Hyman Huang(黄勇) 
Acked-by: Peter Xu 
---
 migration/ram.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 88a6c82e63..132f1a81d9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -994,7 +994,11 @@ static void migration_trigger_throttle(RAMState *rs)
 /* During block migration the auto-converge logic incorrectly detects
  * that ram migration makes no progress. Avoid this by disabling the
  * throttling logic during the bulk phase of block migration. */
-if (migrate_auto_converge() && !blk_mig_bulk_active()) {
+if (blk_mig_bulk_active()) {
+return;
+}
+
+if (migrate_auto_converge()) {
 /* The following detection logic can be refined later. For now:
Check to see if the ratio between dirtied bytes and the approx.
amount of bytes that just got transferred since the last time
-- 
2.38.5




Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu

2023-06-07 Thread Shesha Bhushan Sreenivasamurthy
Hi Fan,
   I am implementing DCD FMAPI commands and planning to start pushing changes
to the branch below. That requires the contributions you have made. Can your
changes be pushed to that branch?

https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25


From: Fan Ni 
Sent: Monday, June 5, 2023 10:51 AM
To: Ira Weiny 
Cc: qemu-devel@nongnu.org ; jonathan.came...@huawei.com 
; linux-...@vger.kernel.org 
; gregory.pr...@memverge.com 
; hch...@avery-design.com.tw 
; cbr...@avery-design.com 
; dan.j.willi...@intel.com ; 
Adam Manzanares ; d...@stgolabs.net 
; nmtadam.sams...@gmail.com ; 
ni...@outlook.com 
Subject: Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu 
 
On Mon, Jun 05, 2023 at 10:35:48AM -0700, Ira Weiny wrote:
> Fan Ni wrote:
> > Since the early draft of DCD support in kernel is out
> > (https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/#t),
> > this patch series provide dcd emulation in qemu so people who are interested
> > can have an early try. It is noted that the patch series may need to be 
> > updated
> > accordingly if the kernel side implementation changes.
> 
> Fan,
> 
> Do you have a git tree we can pull this from which is updated to a more
> recent CXL branch from Jonathan?
> 
> Thanks,
> Ira

Hi Ira,

I have a git tree of the patch series based on Jonathan's branch
cxl-2023-02-28: 
https://github.com/moking/qemu-dev/tree/dcd-rfe .

That may be not new enough to include some of the recent patches, but I can
rebase it to a newer branch if you can tell me which branch you want to use.

Thanks,
Fan

> 
> > 
> > To support DCD emulation, the patch series add DCD related mailbox command
> > support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
> > with dynamic capacity extent and region representative.
> > To support read/write to the dynamic capacity of the device, a host backend
> > is provided and necessary check mechnism is added to ensure the dynamic
> > capacity accessed is backed with active dc extents.
> > Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) is not 
> > supported
> > , but we add two qmp interfaces for adding/releasing dynamic capacity 
> > extents.
> > Also, the support for multiple hosts sharing the same DCD case is missing.
> > 
> > Things we can try with the patch series together with kernel dcd code:
> > 1. Create DC regions to cover the address range of the dynamic capacity
> > regions.
> > 2. Add/release dynamic capacity extents to the device and notify the
> > kernel.
> > 3. Test kernel side code to accept added dc extents and create dax devices,
> > and release dc extents and notify the device
> > 4. Online the memory range backed with dc extents and let application use
> > them.
> > 
> > The patch series is based on Jonathan's local qemu branch:
> > https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
> >  
> > 
> > Simple tests peformed with the patch series:
> > 1 Install cxl modules:
> > 
> > modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> > 
> > 2 Create dc regions:
> > 
> > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> > echo 0x1000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> > echo 0x1000 > /sys/bus/cxl/devices/$region/size
> > echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> > echo 1 > /sys/bus/cxl/devices/$region/commit
> > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > 
> > /home/fan/cxl/tools-and-scripts# cxl list
> > [
> >   {
> > "memdevs":[
> >   {
> > "memdev":"mem0",
> > "pmem_size":536870912,
> > "ram_size":0,
> > "serial":0,
> > "host":":0d:00.0"
> >   }
> > ]
> >   },
> >   {
> > "regions":[
> >   {
> > "region":"region0",
> > "resource":45365592064,
> > "size":268435456,
> > "interleave_ways":1,
> > "interleave_granularity":256,
> > "decode_state":"commit"
> >   }
> > ]
> >   }
> > ]
> > 
> > 3 Add two dc extents (128MB each) through qmp interface
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity-event",
> >  "arguments": {
> >   "path": "/machine/peripheral/cxl-pmem0",
> >  

Re: [PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Sam Li
Damien Le Moal wrote on Thursday, June 8, 2023 at 09:29:
>
> On 6/8/23 03:57, Sam Li wrote:
> > If the write operation fails and the wps is NULL, then accessing it will
> > lead to data corruption.
> >
> > Solving the issue by adding a nullptr checking in get_zones_wp() where
> > the wps is used.
> >
> > This issue is found by Peter Maydell using the Coverity Tool (CID
> > 1512459).
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/file-posix.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index ac1ed54811..4a6c71c7f5 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2523,7 +2523,7 @@ out:
> >  }
> >  }
> >  } else {
> > -if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> > +if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
> >  update_zones_wp(bs, s->fd, 0, 1);
>
> Nit: this could be:
>
> } else if (wps && type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
>
> However, both if & else side do something only if the above condition is true
> and we only need to that for a zoned drive. So the entire code block could
> really be simplified to be a lot more readable. Something like this (totally
> untested, not even compiled):
>
> #if defined(CONFIG_BLKZONED)
> if (bs->bl.zone_size && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) 
> {
> BlockZoneWps *wps = bs->wps;
> uint64_t *wp;
>
> if (!wps) {
> return ret;
> }
>
> if (ret) {
> /* write error: update the wp from the underlying device */
> update_zones_wp(bs, s->fd, 0, 1);
> goto unlock;
> }
>
> > wp = &wps->wp[offset / bs->bl.zone_size];
> if (BDRV_ZT_IS_CONV(*wp)) {
> /* Conventional zones do not have a write pointer */
> goto unlock;
> }
>
> /* Return the written position for zone append */
> if (type & QEMU_AIO_ZONE_APPEND) {
> *s->offset = *wp;
> trace_zbd_zone_append_complete(bs,
> *s->offset >> BDRV_SECTOR_BITS);
> }
>
> /* Advance the wp if needed */
> if (offset + bytes > *wp) {
> *wp = offset + bytes;
> }
>
> unlock:
> qemu_co_mutex_unlock(&wps->colock);
> }
> #endif
>
> And making this entire block a helper function (e.g. advance_zone_wp()) would
> further clean the code. But that should be done in another patch. Care to 
> send one ?

Sure. If we replace the current code block with advance_zone_wp(),
I guess this patch won't be necessary. So I will send another patch
(advance_zone_wp()...) after testing.

Sam
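
A possible shape for that helper, closely following Damien's sketch above
(this is a sketch, not a tested patch; the function name comes from the reply,
and the types and fields used are assumed to match the current raw_co_prw
code):

    /* block/file-posix.c (sketch): the zone lock is assumed to have been
     * taken by the caller whenever bs->wps is non-NULL, as today. */
    static void coroutine_fn advance_zone_wp(BlockDriverState *bs,
                                             BDRVRawState *s,
                                             uint64_t offset, uint64_t bytes,
                                             int type, int ret)
    {
        BlockZoneWps *wps = bs->wps;
        uint64_t *wp;

        if (!wps) {
            return;
        }

        if (ret) {
            /* write error: refresh the wp from the underlying device */
            update_zones_wp(bs, s->fd, 0, 1);
            goto unlock;
        }

        wp = &wps->wp[offset / bs->bl.zone_size];
        if (!BDRV_ZT_IS_CONV(*wp)) {
            if (type & QEMU_AIO_ZONE_APPEND) {
                /* return the written position for zone append */
                *s->offset = *wp;
            }
            /* advance the wp if needed */
            if (offset + bytes > *wp) {
                *wp = offset + bytes;
            }
        }

    unlock:
        qemu_co_mutex_unlock(&wps->colock);
    }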



Re: [PATCH v2 3/3] hw/smbios: Fix core count in type4

2023-06-07 Thread Zhao Liu
On Wed, Jun 07, 2023 at 04:51:07PM +0200, Igor Mammedov wrote:
> 
> On Thu,  1 Jun 2023 17:29:52 +0800
> Zhao Liu  wrote:
> 
> > From: Zhao Liu 
> > 
> > From SMBIOS 3.0 specification, core count field means:
> > 
> > Core Count is the number of cores detected by the BIOS for this
> > processor socket. [1]
> > 
> > Before 003f230e37d7 ("machine: Tweak the order of topology members in
> > struct CpuTopology"), MachineState.smp.cores means "the number of cores
> > in one package", and it's correct to use smp.cores for core count.
> > 
> > But 003f230e37d7 changes the smp.cores' meaning to "the number of cores
> > in one die" and doesn't change the original smp.cores' use in smbios as
> > well, which makes core count in type4 go wrong.
> > 
> > Fix this issue with the correct "cores per socket" caculation.
> 
> see comment on 2/3 patch and do the same for cores.

Ok, thanks.
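
For concreteness, a small worked example of the two calculations that the
new assert cross-checks (the -smp numbers below are illustrative, not from
the patch):

    #include <assert.h>

    int main(void)
    {
        /* illustrative -smp: sockets=2,dies=2,clusters=1,cores=4,threads=2 */
        unsigned sockets = 2, dies = 2, clusters = 1, cores = 4, threads = 2;
        unsigned max_cpus = sockets * dies * clusters * cores * threads; /* 32 */

        unsigned threads_per_socket = max_cpus / sockets;     /* 32 / 2 = 16 */
        unsigned cores_per_socket = cores * clusters * dies;  /* 4 * 1 * 2 = 8 */

        /* the sanity check from the patch: both routes agree */
        assert(cores_per_socket == threads_per_socket / threads); /* 8 == 8 */
        return 0;
    }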

> 
> > 
> > [1] SMBIOS 3.0.0, section 7.5.6, Processor Information - Core Count
> > 
> > Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in 
> > struct CpuTopology")
> > Signed-off-by: Zhao Liu 
> > ---
> > Changes since v1:
> >  * Calculate cores_per_socket in a different way from
> >threads_per_socket.
> >  * Add the sanity check to ensure consistency of results between these 2
> >ways. This can help not miss any future change of cpu topology.
> > ---
> >  hw/smbios/smbios.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> > index faf82d4ae646..2b46a51dfcad 100644
> > --- a/hw/smbios/smbios.c
> > +++ b/hw/smbios/smbios.c
> > @@ -714,6 +714,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  char sock_str[128];
> >  size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
> >  unsigned threads_per_socket;
> > +unsigned cores_per_socket;
> >  
> >  if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {
> >  tbl_len = SMBIOS_TYPE_4_LEN_V30;
> > @@ -750,8 +751,16 @@ static void smbios_build_type_4_table(MachineState 
> > *ms, unsigned instance)
> >  
> >  /* smp.max_cpus is the total number of threads for the system. */
> >  threads_per_socket = ms->smp.max_cpus / ms->smp.sockets;
> > +cores_per_socket = ms->smp.cores * ms->smp.clusters * ms->smp.dies;
> >  
> > -t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;
> > +/*
> > + * Currently, max_cpus = threads * cores * clusters * dies * sockets.
> > + * threads_per_socket and cores_per_socket are calculated in 2 ways so
> > + * that this sanity check ensures we won't miss any topology level.
> > + */
> > +g_assert(cores_per_socket == (threads_per_socket / ms->smp.threads));
> > +
> > +t->core_count = (cores_per_socket > 255) ? 0xFF : cores_per_socket;
> >  t->core_enabled = t->core_count;
> >  
> >  t->thread_count = (threads_per_socket > 255) ? 0xFF : 
> > threads_per_socket;
> > @@ -760,7 +769,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  t->processor_family2 = cpu_to_le16(0x01); /* Other */
> >  
> >  if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
> > -t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
> > +t->core_count2 = t->core_enabled2 = cpu_to_le16(cores_per_socket);
> >  t->thread_count2 = cpu_to_le16(threads_per_socket);
> >  }
> >  
> 

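As a concrete illustration of the calculation discussed above (assuming the usual
-smp semantics, where max_cpus = threads * cores * clusters * dies * sockets):
with -smp sockets=2,dies=2,clusters=2,cores=4,threads=2, max_cpus is 64, so
threads_per_socket = 64 / 2 = 32 and cores_per_socket = 4 * 2 * 2 = 16, and the
sanity check holds since 32 / 2 = 16.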


Re: [PATCH v2 2/3] hw/smbios: Fix thread count in type4

2023-06-07 Thread Zhao Liu
On Wed, Jun 07, 2023 at 04:49:34PM +0200, Igor Mammedov wrote:
> Date: Wed, 7 Jun 2023 16:49:34 +0200
> From: Igor Mammedov 
> Subject: Re: [PATCH v2 2/3] hw/smbios: Fix thread count in type4
> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; x86_64-redhat-linux-gnu)
> 
> On Thu,  1 Jun 2023 17:29:51 +0800
> Zhao Liu  wrote:
> 
> > From: Zhao Liu 
> > 
> > From SMBIOS 3.0 specification, thread count field means:
> > 
> > Thread Count is the total number of threads detected by the BIOS for
> > this processor socket. It is a processor-wide count, not a
> > thread-per-core count. [1]
> > 
> > So here we should use threads per socket rather than threads per core.
> > 
> > [1] SMBIOS 3.0.0, section 7.5.8, Processor Information - Thread Count
> > 
> > Fixes: c97294ec1b9e ("SMBIOS: Build aggregate smbios tables and entry 
> > point")
> > Signed-off-by: Zhao Liu 
> > ---
> > Changes since v1:
> >  * Rename cpus_per_socket to threads_per_socket.
> >  * Add the comment about smp.max_cpus. Thread count and core count will
> >be calculated in 2 ways and will add a sanity check to ensure we
> >don't miss any topology level.
> > ---
> >  hw/smbios/smbios.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> > index d67415d44dd8..faf82d4ae646 100644
> > --- a/hw/smbios/smbios.c
> > +++ b/hw/smbios/smbios.c
> > @@ -713,6 +713,7 @@ static void smbios_build_type_4_table(MachineState *ms, 
> > unsigned instance)
> >  {
> >  char sock_str[128];
> >  size_t tbl_len = SMBIOS_TYPE_4_LEN_V28;
> > +unsigned threads_per_socket;
> >  
> >  if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_64) {
> >  tbl_len = SMBIOS_TYPE_4_LEN_V30;
> > @@ -747,17 +748,20 @@ static void smbios_build_type_4_table(MachineState 
> > *ms, unsigned instance)
> >  SMBIOS_TABLE_SET_STR(4, asset_tag_number_str, type4.asset);
> >  SMBIOS_TABLE_SET_STR(4, part_number_str, type4.part);
> >  
> > +/* smp.max_cpus is the total number of threads for the system. */
> > +threads_per_socket = ms->smp.max_cpus / ms->smp.sockets;
> 
> what I dislike here is introducing topo calculations with its own assumptions
> in random places.
> 
> I'd suggest to add threads_per_socket (even if it's just a helper field) into
> topo structure and calculate it with the rest on topology.
> And then use result here.

Thanks, I will try this way.

Zhao

> 
> > +
> >  t->core_count = (ms->smp.cores > 255) ? 0xFF : ms->smp.cores;
> >  t->core_enabled = t->core_count;
> >  
> > -t->thread_count = (ms->smp.threads > 255) ? 0xFF : ms->smp.threads;
> > +t->thread_count = (threads_per_socket > 255) ? 0xFF : 
> > threads_per_socket;
> >  
> >  t->processor_characteristics = cpu_to_le16(0x02); /* Unknown */
> >  t->processor_family2 = cpu_to_le16(0x01); /* Other */
> >  
> >  if (tbl_len == SMBIOS_TYPE_4_LEN_V30) {
> >  t->core_count2 = t->core_enabled2 = cpu_to_le16(ms->smp.cores);
> > -t->thread_count2 = cpu_to_le16(ms->smp.threads);
> > +t->thread_count2 = cpu_to_le16(threads_per_socket);
> >  }
> >  
> >  SMBIOS_BUILD_TABLE_POST;
> 



Re: [PATCH v2 1/3] hw/smbios: Fix smbios_smp_sockets caculation

2023-06-07 Thread Zhao Liu
On Wed, Jun 07, 2023 at 04:35:03PM +0200, Igor Mammedov wrote:
> Date: Wed, 7 Jun 2023 16:35:03 +0200
> From: Igor Mammedov 
> Subject: Re: [PATCH v2 1/3] hw/smbios: Fix smbios_smp_sockets caculation
> X-Mailer: Claws Mail 4.1.1 (GTK 3.24.38; x86_64-redhat-linux-gnu)

Hi Igor,

> 
> On Thu,  1 Jun 2023 17:29:50 +0800
> Zhao Liu  wrote:
> 
> > From: Zhao Liu 
> > 
> > Here're 2 mistakes:
> > 1. 003f230e37d7 ("machine: Tweak the order of topology members in struct
> >CpuTopology") changes the meaning of smp.cores but doesn't fix
> >original smp.cores uses. And because of the introduction of cluster,
> >now smp.cores means the number of cores in one cluster. So smp.cores
> >* smp.threads just means the cpus in a cluster not in a socket.
> 
> > 2. smp.cpus means the number of initial online cpus, not the total
> >number of cpus. For such topology calculation, smp.max_cpus
> >should be considered.
> that's probably not relevant to the patch.
> 

For the 2nd point, I mean the original calculation should use max_cpus
rather than cpus to calculate sockets:

- smbios_smp_sockets = DIV_ROUND_UP(ms->smp.cpus,
+ smbios_smp_sockets = DIV_ROUND_UP(ms->smp.max_cpus,
ms->smp.cores * ms->smp.threads);


But since we already have smp.sockets, we can use it directly.

> 
> > 
> > Since the number of sockets has already been recorded in smp structure,
> > use smp.sockets directly.
> 
> 
> I'd rephrase commit message to something like this:
> ---
> CPU topology is calculated by ..., and trying to recalculate it here
> with other rules leads to an error, such as 
> 
>  ... example follows ..
> 
> So stop reinventing the wheel and use topo values that ... has 
> calculated. 

Looks good for me. Thanks!

Regards,
Zhao

> 
> > 
> > Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in 
> > struct CpuTopology")
> > Signed-off-by: Zhao Liu 
> > ---
> >  hw/smbios/smbios.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> > index d2007e70fb05..d67415d44dd8 100644
> > --- a/hw/smbios/smbios.c
> > +++ b/hw/smbios/smbios.c
> > @@ -1088,8 +1088,7 @@ void smbios_get_tables(MachineState *ms,
> >  smbios_build_type_2_table();
> >  smbios_build_type_3_table();
> >  
> > -smbios_smp_sockets = DIV_ROUND_UP(ms->smp.cpus,
> > -  ms->smp.cores * ms->smp.threads);
> > +smbios_smp_sockets = ms->smp.sockets;
> >  assert(smbios_smp_sockets >= 1);
> >  
> >  for (i = 0; i < smbios_smp_sockets; i++) {
> 



Re: [PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Damien Le Moal
On 6/8/23 03:57, Sam Li wrote:
> If the write operation fails and the wps is NULL, then accessing it will
> lead to data corruption.
> 
> Solve the issue by adding a null pointer check in get_zones_wp() where
> the wps is used.
> 
> This issue was found by Peter Maydell using the Coverity Tool (CID
> 1512459).
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ac1ed54811..4a6c71c7f5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2523,7 +2523,7 @@ out:
>  }
>  }
>  } else {
> -if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
> +if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
>  update_zones_wp(bs, s->fd, 0, 1);

Nit: this could be:

} else if (wps && type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {

However, both the if and else sides do something only if the above condition is true,
and we only need to do that for a zoned drive. So the entire code block could
really be simplified to be a lot more readable. Something like this (totally
untested, not even compiled):

#if defined(CONFIG_BLKZONED)
if (bs->bl.zone_size && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
BlockZoneWps *wps = bs->wps;
uint64_t *wp;

if (!wps) {
return ret;
}

if (ret) {
/* write error: update the wp from the underlying device */
update_zones_wp(bs, s->fd, 0, 1);
goto unlock;
}

wp = &wps->wp[offset / bs->bl.zone_size];
if (BDRV_ZT_IS_CONV(*wp)) {
/* Conventional zones do not have a write pointer */
goto unlock;
}

/* Return the written position for zone append */
if (type & QEMU_AIO_ZONE_APPEND) {
*s->offset = *wp;
trace_zbd_zone_append_complete(bs,
*s->offset >> BDRV_SECTOR_BITS);
}

/* Advance the wp if needed */
if (offset + bytes > *wp) {
*wp = offset + bytes;
}

unlock:
qemu_co_mutex_unlock(&wps->colock);
}
#endif

And making this entire block a helper function (e.g. advance_zone_wp()) would
further clean up the code. But that should be done in another patch. Care to send
one?

-- 
Damien Le Moal
Western Digital Research

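Following up on the advance_zone_wp() idea above, a rough sketch of what such a
helper could look like. This is untested and for illustration only: the parameter
names are made up, append_offset stands in for s->offset in the block above, and
for zoned drives the caller is assumed to already hold wps->colock.

static void advance_zone_wp(BlockDriverState *bs, int fd, int ret, int type,
                            uint64_t offset, uint64_t bytes,
                            uint64_t *append_offset)
{
    BlockZoneWps *wps = bs->wps;
    uint64_t *wp;

    if (!wps) {
        return;
    }

    if (ret) {
        /* write error: refresh the write pointers from the device */
        update_zones_wp(bs, fd, 0, 1);
        goto unlock;
    }

    wp = &wps->wp[offset / bs->bl.zone_size];
    if (!BDRV_ZT_IS_CONV(*wp)) {
        /* return the written position for zone append */
        if (type & QEMU_AIO_ZONE_APPEND) {
            *append_offset = *wp;
        }
        /* advance the wp if needed */
        if (offset + bytes > *wp) {
            *wp = offset + bytes;
        }
    }

unlock:
    qemu_co_mutex_unlock(&wps->colock);
}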



Re: Building of docs does not work anymore

2023-06-07 Thread John Snow
On Wed, Jun 7, 2023 at 5:46 AM Thomas Huth  wrote:
>
> On 07/06/2023 11.42, Thomas Huth wrote:
> >
> >   Hi Paolo, hi John,
> >
> > since the recent reworks with the Python venv, building of the docs does not
> > work for me on my RHEL 8 installation anymore.
> >
> > If I just run "configure" without any additional arguments, I get:
> >
> > - 8< -
> > $ ./configure
> > Using './build' as the directory for build output
> > python determined to be '/usr/bin/python3.8'
> > python version: Python 3.8.13
> > mkvenv: Creating non-isolated virtual environment at 'pyvenv'
> > mkvenv: checking for meson>=0.63.0
> > mkvenv: installing meson>=0.63.0
> > mkvenv: checking for sphinx>=1.6.0, sphinx-rtd-theme>=0.5.0
> >
> > *** Ouch! ***
> >
> > Could not provide build dependency 'sphinx>=1.6.0':
> >   • Python package 'sphinx' was not found nor installed.
> >   • mkvenv was configured to operate offline and did not check PyPI.
> >   • 'sphinx-build' was detected on your system at '/usr/bin/sphinx-build',
> > but the Python package 'sphinx' was not found by this Python interpreter
> > ('/usr/bin/python3.8'). Typically this means that 'sphinx-build' has been
> > installed against a different Python interpreter on your system.
> >
> > Sphinx not found/usable, disabling docs.
> > - 8< -

Looks right as far as I can see. Should this behave differently, do you think?

> >
> > If I enable downloads and enforce --enable-docs , I get:
> >
> > - 8< -
> > ./configure --enable-docs --enable-download
> > Using './build' as the directory for build output
> > python determined to be '/usr/bin/python3.8'
> > python version: Python 3.8.13
> > mkvenv: Creating non-isolated virtual environment at 'pyvenv'
> > mkvenv: checking for meson>=0.63.0
> > mkvenv: installing meson>=0.63.0
> > mkvenv: checking for sphinx>=1.6.0, sphinx-rtd-theme>=0.5.0
> > mkvenv: installing sphinx>=1.6.0, sphinx-rtd-theme>=0.5.0
> > ERROR: sphinx-rtd-theme 1.2.1 has requirement docutils<0.19, but you'll have
> > docutils 0.20.1 which is incompatible.
> > ERROR: sphinx-rtd-theme 1.2.1 has requirement sphinx<7,>=1.6, but you'll
> > have sphinx 7.0.1 which is incompatible.
> > - 8< -
>
> Actually, it seems like it builds the docs in the latter case ... but the
> two error messages still look quite menacing (printed with red letters).
>
>   Thomas

Hm, in this case it appears that the latest versions of these packages
for Python 3.8 actually conflict with each other, which is ... funny.
I would think that in this case:

1. You don't have existing Python3.8 packages if it isn't your
system's native Python interpreter
2. Pip should not have chosen packages that conflict with each other ...

Can you do me a favor and try:

> python3.8 -m pip list

and tell me what packages it says you have there?

When I run "pip install sphinx>=1.6.0 sphinx-rtd-theme>=0.5.0" in a
fresh Python3.8 virtual environment, it chooses these versions:
- Sphinx 6.2.1
- sphinx-rtd-theme 1.2.2

Seems like Sphinx 7.0.1 is quite a bit too new for this, I wonder why
it chose it?

--js




Re: [PATCH 1/1] tests/avocado: update firmware to enable sbsa-ref/max

2023-06-07 Thread Philippe Mathieu-Daudé

On 7/6/23 17:29, Marcin Juszkiewicz wrote:

On 7.06.2023 at 16:33, Philippe Mathieu-Daudé wrote:

On 30/5/23 17:22, Marcin Juszkiewicz wrote:

Update prebuilt firmware images to have TF-A with FEAT_FGT support
enabled. This allowed us to enable test for "max" cpu in sbsa-ref
machine.

Signed-off-by: Marcin Juszkiewicz 
---
  tests/avocado/machine_aarch64_sbsaref.py | 22 +++---
  1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/tests/avocado/machine_aarch64_sbsaref.py 
b/tests/avocado/machine_aarch64_sbsaref.py

index 0a79fa7ab6..35f8042416 100644
--- a/tests/avocado/machine_aarch64_sbsaref.py
+++ b/tests/avocado/machine_aarch64_sbsaref.py
@@ -29,23 +29,23 @@ def fetch_firmware(self):
  """
  Flash volumes generated using:
-    - Fedora GNU Toolchain version 12.2.1 20220819 (Red Hat 
Cross 12.2.1-2)
+    - Fedora GNU Toolchain version 13.1.1 20230511 (Red Hat 
13.1.1-2)

  - Trusted Firmware-A
- https://github.com/ARM-software/arm-trusted-firmware/tree/5fdb2e54
+ https://github.com/ARM-software/arm-trusted-firmware/tree/c0d8ee38
  - Tianocore EDK II
-  https://github.com/tianocore/edk2/tree/494127613b
-  https://github.com/tianocore/edk2-non-osi/tree/41876073
-  https://github.com/tianocore/edk2-platforms/tree/8efa4f42
+  https://github.com/tianocore/edk2/tree/0f9283429dd4
+  https://github.com/tianocore/edk2-non-osi/tree/f0bb00937ad6
+  https://github.com/tianocore/edk2-platforms/tree/7880b92e2a04


Thanks for updating this comment!


Having a way to reproduce is crucial for CI.


-    @skip("requires TF-A update to handle FEAT_FGT")
+    @skipUnless(os.getenv("AVOCADO_TIMEOUT_EXPECTED"), "Test might 
timeout")


Can it still timeout?


All Linux based tests in this file have that @skipUnless as they take 
some time:


test_sbsaref_edk2_firmware: PASS (2.72 s)
test_sbsaref_alpine_linux_cortex_a57: PASS (23.71 s)
test_sbsaref_alpine_linux_neoverse_n1: PASS (23.53 s)
test_sbsaref_alpine_linux_max: PASS (28.16 s)


I suppose this was due to a bug we had with Avocado consuming QEMU's
console. I don't remember recent complaints. Alex, do you know if this
was fixed?
We define the class timeout to 180s, so all tests inherit it. In your
run all tests take <30 sec, so it should be fine to run them on CI.
Adding ~1min30s extra on the job running these Avocado tests seems
reasonable to me, but I have Cc'ed Thomas who took care to reduce
testing time.

Regards,

Phil.



[PATCH v6 1/2] hw/i386/pc: Default to use SMBIOS 3.0 for newer machine models

2023-06-07 Thread Suravee Suthikulpanit
Currently, the pc-q35 and pc-i440fx machine models default to SMBIOS 2.8
(32-bit entry point). SMBIOS 3.0 (64-bit entry point) has been fully
supported since QEMU 7.0, so default to SMBIOS 3.0 for newer machine
models. This is necessary to avoid the following message when launching
a VM with a large number of vcpus.

   "SMBIOS 2.1 table length 66822 exceeds 65535"

Signed-off-by: Suravee Suthikulpanit 
---
 hw/i386/pc.c | 4 +++-
 hw/i386/pc_piix.c| 5 +
 hw/i386/pc_q35.c | 5 +
 include/hw/i386/pc.h | 1 +
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index bb62c994fa..33ffb03a32 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1853,6 +1853,7 @@ static void pc_machine_set_max_fw_size(Object *obj, 
Visitor *v,
 static void pc_machine_initfn(Object *obj)
 {
 PCMachineState *pcms = PC_MACHINE(obj);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 
 #ifdef CONFIG_VMPORT
 pcms->vmport = ON_OFF_AUTO_AUTO;
@@ -1860,7 +1861,7 @@ static void pc_machine_initfn(Object *obj)
 pcms->vmport = ON_OFF_AUTO_OFF;
 #endif /* CONFIG_VMPORT */
 pcms->max_ram_below_4g = 0; /* use default */
-pcms->smbios_entry_point_type = SMBIOS_ENTRY_POINT_TYPE_32;
+pcms->smbios_entry_point_type = pcmc->default_smbios_ep_type;
 
 /* acpi build is enabled by default if machine supports it */
 pcms->acpi_build_enabled = PC_MACHINE_GET_CLASS(pcms)->has_acpi_build;
@@ -1980,6 +1981,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
 mc->default_ram_id = "pc.ram";
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
 pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d5b0dcd1fe..49462b0e29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -476,11 +476,16 @@ DEFINE_I440FX_MACHINE(v8_1, "pc-i440fx-8.1", NULL,
 
 static void pc_i440fx_8_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+
 pc_i440fx_8_1_machine_options(m);
 m->alias = NULL;
 m->is_default = false;
 compat_props_add(m->compat_props, hw_compat_8_0, hw_compat_8_0_len);
 compat_props_add(m->compat_props, pc_compat_8_0, pc_compat_8_0_len);
+
+/* For pc-i440fx-8.0 and older, use SMBIOS 2.8 by default */
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
 }
 
 DEFINE_I440FX_MACHINE(v8_0, "pc-i440fx-8.0", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6155427e48..6b9fd4d537 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -387,10 +387,15 @@ DEFINE_Q35_MACHINE(v8_1, "pc-q35-8.1", NULL,
 
 static void pc_q35_8_0_machine_options(MachineClass *m)
 {
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+
 pc_q35_8_1_machine_options(m);
 m->alias = NULL;
 compat_props_add(m->compat_props, hw_compat_8_0, hw_compat_8_0_len);
 compat_props_add(m->compat_props, pc_compat_8_0, pc_compat_8_0_len);
+
+/* For pc-q35-8.0 and older, use SMBIOS 2.8 by default */
+pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
 }
 
 DEFINE_Q35_MACHINE(v8_0, "pc-q35-8.0", NULL,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c661e9cc80..6eec0fc51d 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -110,6 +110,7 @@ struct PCMachineClass {
 bool smbios_defaults;
 bool smbios_legacy_mode;
 bool smbios_uuid_encoded;
+SmbiosEntryPointType default_smbios_ep_type;
 
 /* RAM / address space compat: */
 bool gigabyte_align;
-- 
2.34.1

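Note that users who still need the old 32-bit tables on a newer machine type can
keep overriding the default on the command line, e.g. (illustrative invocation):
-M pc-q35-8.1,smbios-entry-point-type=32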



[PATCH v6 0/2] hw/i386/pc: Update max_cpus and default to SMBIOS

2023-06-07 Thread Suravee Suthikulpanit
In order to support a large number of vcpus, a newer 64-bit SMBIOS
entry point type is needed. Therefore, upgrade the default SMBIOS version
for PC machines to SMBIOS 3.0 for newer systems. Then increase the maximum
number of vCPUs for Q35 models to 1024, which is the limit for KVM.

Changes from V5:
(https://lore.kernel.org/qemu-devel/20230607024939.703991-1-suravee.suthikulpa...@amd.com/T/#m5a9f0d0e2355aebf81501355a1bf349a9929f4bb)
 * Patch 1: Get rid of pc_machine_init_smbios() and simplify the logic
   per Igor's suggestion.
 * Patch 2: Added reviewed-by tag.

Thank you,
Suravee

Suravee Suthikulpanit (2):
  hw/i386/pc: Default to use SMBIOS 3.0 for newer machine models
  pc: q35: Bump max_cpus to 1024

 hw/i386/pc.c | 4 +++-
 hw/i386/pc_piix.c| 5 +
 hw/i386/pc_q35.c | 8 +++-
 include/hw/i386/pc.h | 1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

-- 
2.34.1




[PATCH v6 2/2] pc: q35: Bump max_cpus to 1024

2023-06-07 Thread Suravee Suthikulpanit
Since KVM_MAX_VCPUS is currently defined to 1024 for x86 as shown in
arch/x86/include/asm/kvm_host.h, update QEMU limits to the same number.

In case KVM could not support the specified number of vcpus, QEMU would
return the following error message:

  qemu-system-x86_64: kvm_init_vcpu: kvm_get_vcpu failed (xxx): Invalid argument

Also, keep max_cpus at 288 for machine version 8.0 and older.

Cc: Igor Mammedov 
Cc: Daniel P. Berrangé 
Cc: Michael S. Tsirkin 
Cc: Julia Suvorova 
Reviewed-by: Igor Mammedov 
Signed-off-by: Suravee Suthikulpanit 
---
 hw/i386/pc_q35.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6b9fd4d537..b26fd9bbaf 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -368,12 +368,12 @@ static void pc_q35_machine_options(MachineClass *m)
 m->default_nic = "e1000e";
 m->default_kernel_irqchip_split = false;
 m->no_floppy = 1;
+m->max_cpus = 1024;
 m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
-m->max_cpus = 288;
 }
 
 static void pc_q35_8_1_machine_options(MachineClass *m)
@@ -396,6 +396,7 @@ static void pc_q35_8_0_machine_options(MachineClass *m)
 
 /* For pc-q35-8.0 and older, use SMBIOS 2.8 by default */
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
+m->max_cpus = 288;
 }
 
 DEFINE_Q35_MACHINE(v8_0, "pc-q35-8.0", NULL,
-- 
2.34.1




Re: [PATCH 1/1] maintainers: update maintainers list for vfio-user & multi-process QEMU

2023-06-07 Thread Philippe Mathieu-Daudé

On 7/6/23 17:58, Jagannathan Raman wrote:

Signed-off-by: Jagannathan Raman 
---
  MAINTAINERS | 1 -
  1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 436b3f0afefd..4a80a385118d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3786,7 +3786,6 @@ F: tests/tcg/aarch64/system/semiheap.c
  Multi-process QEMU
  M: Elena Ufimtseva 
  M: Jagannathan Raman 
-M: John G Johnson 
  S: Maintained
  F: docs/devel/multi-process.rst
  F: docs/system/multi-process.rst


Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v5 1/3] hw/i386/pc: Refactor logic to set SMBIOS defaults

2023-06-07 Thread Suthikulpanit, Suravee




On 6/7/2023 3:11 PM, Daniel P. Berrangé wrote:

On Tue, Jun 06, 2023 at 09:49:37PM -0500, Suravee Suthikulpanit wrote:

Into a helper function pc_machine_init_smbios() in preparation for
subsequent code to upgrade default SMBIOS entry point type.

Then, call the helper function from the pc_machine_initfn() to eliminate
duplicate code in pc_q35.c and pc_piix.c. However, this changes the
ordering of when the smbios_set_defaults() is called to before
pc_machine_set_smbios_ep() (i.e. before handling the user specified
QEMU option "-M ...,smbios-entry-point-type=[32|64]" to override
the default type.)

Therefore, also call the helper function in pc_machine_set_smbios_ep()
to update the defaults.


This is unsafe - smbios_set_defaults is only intended to be called
once. Calling it twice leads to a SEGV due to double-free

$  ./build/qemu-system-x86_64 -machine pc,smbios-entry-point-type=64 -smbios 
file=/tmp/smbios_entry_point
Segmentation fault (core dumped)


Thanks for pointing this out. I missed this


IMHO we should just not do this refactoring. The existing duplicated
code is not a significant burden, and thus is better than having to
work around calling pc_machine_set_smbios_ep too early in startup.


Ok

Thanks,
Suravee



Re: [PATCH v5 2/3] hw/i386/pc: Default to use SMBIOS 3.0 for newer machine models

2023-06-07 Thread Suthikulpanit, Suravee




On 6/7/2023 8:49 PM, Igor Mammedov wrote:

On Tue, 6 Jun 2023 21:49:38 -0500
Suravee Suthikulpanit  wrote:



and use this with the rest of your patch

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b3d826a83a..c5bab28e9c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1859,7 +1859,7 @@ static void pc_machine_initfn(Object *obj)
  pcms->vmport = ON_OFF_AUTO_OFF;
  #endif /* CONFIG_VMPORT */
  pcms->max_ram_below_4g = 0; /* use default */
-pcms->smbios_entry_point_type = SMBIOS_ENTRY_POINT_TYPE_32;
+pcms->smbios_entry_point_type = pcmc->default_smbios_ep_type;


Ah, I missed this part. Thanks for suggestions. I'll send out v6.

Suravee


  /* acpi build is enabled by default if machine supports it */
  pcms->acpi_build_enabled = PC_MACHINE_GET_CLASS(pcms)->has_acpi_build;
@@ -1979,6 +1979,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
  mc->nvdimm_supported = true;
  mc->smp_props.dies_supported = true;
  mc->default_ram_id = "pc.ram";
+mc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
  
  object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",

  pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,






Re: [PATCH 16/16] target/riscv/kvm.c: read/write (cbom|cboz)_blocksize in KVM

2023-06-07 Thread Daniel Henrique Barboza




On 6/7/23 10:01, Andrew Jones wrote:

On Tue, May 30, 2023 at 04:46:23PM -0300, Daniel Henrique Barboza wrote:

If we don't set a proper cbom_blocksize|cboz_blocksize in the FDT the
Linux Kernel will fail to detect the availability of the CBOM/CBOZ
extensions, regardless of the contents of the 'riscv,isa' DT prop.

The FDT is being written using the cpu->cfg.cbom|z_blocksize attributes,
so let's use them. We'll also expose them as user flags like it is
already done with TCG.

However, in contrast with what happens with TCG, the user is not able to
set any value that is different from the 'host' value. And KVM can be
harsh dealing with it: an ENOTSUPP can be thrown for the mere attempt of
executing kvm_set_one_reg() for these 2 regs.

We'll read the 'host' value and use it to set these values, regardless of
user choice. If the user happened to choose a different value, error out.
We'll also error out if we failed to read the block sizes.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/kvm.c | 94 +-
  1 file changed, 92 insertions(+), 2 deletions(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 92b99fe261..7789d835e5 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -241,8 +241,16 @@ static void kvm_cpu_cfg_set(RISCVCPU *cpu, 
RISCVCPUMultiExtConfig *multi_ext,
  uint32_t val)
  {
  int cpu_cfg_offset = multi_ext->cpu_cfg_offset;
-bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+uint16_t *blocksize;
+bool *ext_enabled;
  
+if (strstr(multi_ext->name, "blocksize")) {

+blocksize = (void *)&cpu->cfg + cpu_cfg_offset;
+*blocksize = val;
+return;
+}


We should add 'get' accessors to each property and then always use those
accessors to get the values. Trying to share a single accessor across
properties, using the names to determine their sizes, is basically trying
to reinvent 'get' without the function pointer.


To be honest we don't need all this machinery for the blocksize attributes.
We check them only in a few cases and could access them directly via cpu->cfg.

I'll change this up in v2.


Daniel


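For illustration, a dedicated 'get' accessor along the lines suggested above
could look roughly like this (sketch only, mirroring the setter in the patch;
v2 may instead just read cpu->cfg directly as mentioned):

static void kvm_cpu_get_cbomz_blksize(Object *obj, Visitor *v,
                                      const char *name,
                                      void *opaque, Error **errp)
{
    RISCVCPUMultiExtConfig *cbomz_size_cfg = opaque;
    RISCVCPU *cpu = RISCV_CPU(obj);
    uint16_t value = kvm_cpu_cfg_get(cpu, cbomz_size_cfg);

    visit_type_uint16(v, name, &value, errp);
}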


+
+ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
  *ext_enabled = val;
  }
  
@@ -250,8 +258,15 @@ static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,

  RISCVCPUMultiExtConfig *multi_ext)
  {
  int cpu_cfg_offset = multi_ext->cpu_cfg_offset;
-bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+uint16_t *blocksize;
+bool *ext_enabled;
  
+if (strstr(multi_ext->name, "blocksize")) {

+blocksize = (void *)&cpu->cfg + cpu_cfg_offset;
+return *blocksize;
+}
+
+ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
  return *ext_enabled;
  }
  
@@ -295,6 +310,33 @@ static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,

  kvm_cpu_cfg_set(cpu, multi_ext_cfg, value);
  }
  
+/*

+ * We'll avoid extra complexity by always assuming this
+ * array order with cbom first.
+ */
+static RISCVCPUMultiExtConfig kvm_cbomz_blksize_cfgs[] = {


Hmm, yet another cfg struct type, and this one is specific to block sizes.
I'd rather we find a way to keep cfg definitions more general and then use
the same struct for all.


+{.name = "cbom_blocksize", .cpu_cfg_offset = CPUCFG(cbom_blocksize),
+ .kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicbom_block_size)},
+{.name = "cboz_blocksize", .cpu_cfg_offset = CPUCFG(cboz_blocksize),
+ .kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicboz_block_size)},
+};
+
+static void kvm_cpu_set_cbomz_blksize(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+RISCVCPUMultiExtConfig *cbomz_size_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+uint16_t value;
+
+if (!visit_type_uint16(v, name, &value, errp)) {
+return;
+}
+
+cbomz_size_cfg->user_set = true;
+kvm_cpu_cfg_set(cpu, cbomz_size_cfg, value);
+}
+
  static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU *cpu, CPUState *cs)
  {
CPURISCVState *env = &cpu->env;
@@ -321,6 +363,45 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU 
*cpu, CPUState *cs)
  }
  }
  
+static void kvm_riscv_finalize_features(RISCVCPU *cpu, CPUState *cs)

+{
+CPURISCVState *env = &cpu->env;
+uint64_t id, reg;
+int i, ret;
+
+for (i = 0; i < ARRAY_SIZE(kvm_cbomz_blksize_cfgs); i++) {
+RISCVCPUMultiExtConfig *cbomz_cfg = &kvm_cbomz_blksize_cfgs[i];
+uint64_t host_val;
+
+if ((i == 0 && !cpu->cfg.ext_icbom) ||
+(i == 1 && !cpu->cfg.ext_icboz)) {


Rather than the required array order and this magic index stuff, we can
just save the offset of the ext_* boolean in the cfg structure, like we
already do for the *_blocksize, and then check it here.

Also, I think we want to warn here if cbomz_cfg->user_set is set. If the
user set some block size, but disabled the 

[PATCH v2] hw/acpi: Fix PM control register access

2023-06-07 Thread BALATON Zoltan
On pegasos2, which has ACPI as part of the VT8231 south bridge, the board
firmware writes the PM control register by accessing the second byte, so
addr will be 1. This wasn't handled correctly and the write went to
addr 0 instead. Remove the acpi_pm1_cnt_write() function, which is used
only once and does not take addr into account, and handle non-zero
addresses in acpi_pm_cnt_{read|write}(). This fixes ACPI shutdown with
the pegasos2 firmware.

Signed-off-by: BALATON Zoltan 
---
 hw/acpi/core.c | 52 +-
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 6da275c599..00b1e79a30 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -551,30 +551,6 @@ void acpi_pm_tmr_reset(ACPIREGS *ar)
 }
 
 /* ACPI PM1aCNT */
-static void acpi_pm1_cnt_write(ACPIREGS *ar, uint16_t val)
-{
-ar->pm1.cnt.cnt = val & ~(ACPI_BITMASK_SLEEP_ENABLE);
-
-if (val & ACPI_BITMASK_SLEEP_ENABLE) {
-/* change suspend type */
-uint16_t sus_typ = (val >> 10) & 7;
-switch (sus_typ) {
-case 0: /* soft power off */
-qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-break;
-case 1:
-qemu_system_suspend_request();
-break;
-default:
-if (sus_typ == ar->pm1.cnt.s4_val) { /* S4 request */
-qapi_event_send_suspend_disk();
-qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-}
-break;
-}
-}
-}
-
 void acpi_pm1_cnt_update(ACPIREGS *ar,
  bool sci_enable, bool sci_disable)
 {
@@ -593,13 +569,37 @@ void acpi_pm1_cnt_update(ACPIREGS *ar,
 static uint64_t acpi_pm_cnt_read(void *opaque, hwaddr addr, unsigned width)
 {
 ACPIREGS *ar = opaque;
-return ar->pm1.cnt.cnt;
+return ar->pm1.cnt.cnt >> addr * 8;
 }
 
 static void acpi_pm_cnt_write(void *opaque, hwaddr addr, uint64_t val,
   unsigned width)
 {
-acpi_pm1_cnt_write(opaque, val);
+ACPIREGS *ar = opaque;
+
+if (addr == 1) {
+val = val << 8 | (ar->pm1.cnt.cnt & 0xff);
+}
+ar->pm1.cnt.cnt = val & ~(ACPI_BITMASK_SLEEP_ENABLE);
+
+if (val & ACPI_BITMASK_SLEEP_ENABLE) {
+/* change suspend type */
+uint16_t sus_typ = (val >> 10) & 7;
+switch (sus_typ) {
+case 0: /* soft power off */
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+break;
+case 1:
+qemu_system_suspend_request();
+break;
+default:
+if (sus_typ == ar->pm1.cnt.s4_val) { /* S4 request */
+qapi_event_send_suspend_disk();
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+}
+break;
+}
+}
 }
 
 static const MemoryRegionOps acpi_pm_cnt_ops = {
-- 
2.30.9

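For illustration, the byte-lane behaviour of the new handlers boils down to
this (sketch only, ignoring the SLEEP_ENABLE masking and suspend handling):

    uint16_t cnt = 0x00ff;              /* current PM1 control value */
    /* firmware writes 0x34 to the second byte, i.e. addr == 1 */
    cnt = (0x34 << 8) | (cnt & 0xff);   /* cnt == 0x34ff */
    /* reading the second byte back */
    uint8_t hi = cnt >> (1 * 8);        /* hi == 0x34; SLP_TYP/SLP_EN live in this byte */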



Re: [PATCH 13/16] target/riscv/kvm.c: add multi-letter extension KVM properties

2023-06-07 Thread Daniel Henrique Barboza




On 6/7/23 08:48, Andrew Jones wrote:

On Tue, May 30, 2023 at 04:46:20PM -0300, Daniel Henrique Barboza wrote:

Let's add KVM user properties for the multi-letter extensions that KVM
currently supports: zicbom, zicboz, zihintpause, zbb, ssaia, sstc,
svinval and svpbmt.

As with the recently added MISA properties we're also going to add a
'user_set' flag in each of them. The flag will be set only if the user
chose an option that's different from the host and will require extra
handling from the KVM driver.

However, multi-letter extensions have more cases to cover than MISA
extensions, so we're adding an extra 'supported' flag as well. This flag
will reflect if a given extension is supported by KVM, i.e. KVM knows
how to handle it. This is determined during KVM extension discovery in
kvm_riscv_init_multiext_cfg(), where we test for EINVAL errors. Any
other error different from EINVAL will cause an abort.


I wish that was ENOENT, but I suppose that ship sailed.



The 'supported' flag will then be used later on to give an exception for
users that are disabling multi-letter extensions that are unknown to
KVM.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/kvm.c | 136 +
  1 file changed, 136 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index bb1dafe263..b4193a10d8 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -202,6 +202,99 @@ static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, 
CPUState *cs)
  }
  }
  
+typedef struct RISCVCPUMultiExtConfig {

+const char *name;


No description? I'd prefer we use the same cfg struct for single-letter
and multi-letter extensions. We can use a union to overlap cpu_cfg_offset
and misa_bit.


multi-letter extensions don't have a 'description' field in TCG. Nothing
prevents us from adding one for KVM though.

And yes, I'll create a single struct to handle both MISA and multi-letter
extensions. We just need to have different getters/setters for each
category.

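A possible shape of that unified structure, for illustration only (field names
invented here, not taken from an actual patch):

typedef struct RISCVCPUExtConfig {
    const char *name;
    const char *description;
    int kvm_reg_id;
    union {
        uint32_t misa_bit;      /* single-letter (MISA) extensions */
        int cpu_cfg_offset;     /* multi-letter extensions */
    };
    bool supported;
    bool user_set;
} RISCVCPUExtConfig;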



+int kvm_reg_id;
+int cpu_cfg_offset;
+bool supported;
+bool user_set;
+} RISCVCPUMultiExtConfig;
+
+#define CPUCFG(_prop) offsetof(struct RISCVCPUConfig, _prop)
+
+/*
+ * KVM ISA Multi-letter extensions. We care about the order
+ * since it'll be used to create the ISA string later on.
+ * We follow the same ordering rules of isa_edata_arr[]
+ * from target/riscv/cpu.c.
+ */
+static RISCVCPUMultiExtConfig kvm_multi_ext_cfgs[] = {
+{.name = "zicbom", .kvm_reg_id = KVM_RISCV_ISA_EXT_ZICBOM,
+ .cpu_cfg_offset = CPUCFG(ext_icbom)},
+{.name = "zicboz", .kvm_reg_id = KVM_RISCV_ISA_EXT_ZICBOZ,
+ .cpu_cfg_offset = CPUCFG(ext_icboz)},
+{.name = "zihintpause", .kvm_reg_id = KVM_RISCV_ISA_EXT_ZIHINTPAUSE,
+ .cpu_cfg_offset = CPUCFG(ext_zihintpause)},
+{.name = "zbb", .kvm_reg_id = KVM_RISCV_ISA_EXT_ZBB,
+ .cpu_cfg_offset = CPUCFG(ext_zbb)},
+{.name = "ssaia", .kvm_reg_id = KVM_RISCV_ISA_EXT_SSAIA,
+ .cpu_cfg_offset = CPUCFG(ext_ssaia)},
+{.name = "sstc", .kvm_reg_id = KVM_RISCV_ISA_EXT_SSTC,
+ .cpu_cfg_offset = CPUCFG(ext_sstc)},
+{.name = "svinval", .kvm_reg_id = KVM_RISCV_ISA_EXT_SVINVAL,
+ .cpu_cfg_offset = CPUCFG(ext_svinval)},
+{.name = "svpbmt", .kvm_reg_id = KVM_RISCV_ISA_EXT_SVPBMT,
+ .cpu_cfg_offset = CPUCFG(ext_svpbmt)},


As pointed out in the last patch, it'd be nice to share names (and
descriptions) with TCG.


I believe it's ok to do that for the MISA bits since it's a handful of entries.

But I'd rather deal with a little code duplication for multi-letter extensions
for now. We have 73 multi-letters extensions in TCG. Adding a description for
each one, change how the properties are being created and so on seems too
much for this series.




+};
+
+static void kvm_cpu_cfg_set(RISCVCPU *cpu, RISCVCPUMultiExtConfig *multi_ext,
+uint32_t val)
+{
+int cpu_cfg_offset = multi_ext->cpu_cfg_offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+*ext_enabled = val;
+}
+
+static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
+RISCVCPUMultiExtConfig *multi_ext)
+{
+int cpu_cfg_offset = multi_ext->cpu_cfg_offset;
+bool *ext_enabled = (void *)&cpu->cfg + cpu_cfg_offset;
+
+return *ext_enabled;
+}
+
+static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+RISCVCPUMultiExtConfig *multi_ext_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+bool value, host_val;
+
+if (!visit_type_bool(v, name, &value, errp)) {
+return;
+}
+
+host_val = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
+
+/*
+ * Ignore if the user is setting the same value
+ * as the host.
+ */
+if (value == host_val) {
+return;
+}
+
+if (!multi_ext_cfg->supported) {
+/*
+ * Error out if the user is trying to enable 

Re: [RFC v2 3/6] target/i386: Add native library calls

2023-06-07 Thread Richard Henderson

On 6/7/23 09:47, Yeqi Fu wrote:

+/* One unknown opcode for native call */
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+case 0x1ff:
+uint16_t sig = x86_lduw_code(env, s);
+switch (sig) {
+case NATIVE_MEMCPY:
+gen_helper_native_memcpy(cpu_env);
+break;
+case NATIVE_MEMSET:
+gen_helper_native_memset(cpu_env);
+break;
+case NATIVE_MEMCMP:
+gen_helper_native_memcmp(cpu_env);
+break;
+default:
+goto unknown_op;
+}
+break;
+#endif


This bit of code must be protected by native_calls_enabled() or some such, as we do with 
semihosting_enabled().


Which means that patch 6 should come before this, so that native_calls_enabled() can be 
true if and only if "-native-bypass" is given.



r~



Re: [RFC v2 4/6] target/mips: Add native library calls

2023-06-07 Thread Richard Henderson

On 6/7/23 09:47, Yeqi Fu wrote:

+void helper_native_memcpy(CPUMIPSState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *dest = g2h(cs, arg0);
+void *src = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcpy(dest, src, n);
+env->active_tc.gpr[2] = (target_ulong)h2g(ret);
+}


I would expect everything except for the guest ABI to be handled by common code, so that 
you do not have N copies of every native emulated function.  This needs to be something like


abi_ptr do_native_memcpy(CPUArchState *env, abi_ptr dst, abi_ptr src,
 abi_ptr len, uintptr_t ra);

void helper_native_memcpy(CPUMIPSState *env)
{
env->active_tc.gpr[2] =
do_native_memcpy(env, env->active_tc.gpr[4],
 env->active_tc.gpr[5],
 env->active_tc.gpr[6], GETPC());
}

Even better, provide some guest abstraction akin to va_start/va_arg so that all of the 
per-native function code becomes shared.



r~

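A very rough sketch of what such an abstraction could look like (the names are
invented here for illustration and are not an existing QEMU API): each target
implements two small hooks for its ABI, and the per-function glue becomes
common code.

typedef struct NativeCallArgs {
    CPUArchState *env;
    int index;                       /* next argument to fetch */
} NativeCallArgs;

/* per-target: fetch the next integer/pointer argument of the guest ABI */
target_ulong native_arg(NativeCallArgs *a);
/* per-target: store the return value according to the guest ABI */
void native_ret(CPUArchState *env, target_ulong val);

static void do_native_memcpy_call(CPUArchState *env, uintptr_t ra)
{
    NativeCallArgs a = { .env = env };
    abi_ptr dst = native_arg(&a);
    abi_ptr src = native_arg(&a);
    abi_ptr len = native_arg(&a);

    native_ret(env, do_native_memcpy(env, dst, src, len, ra));
}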


Re: [RFC v2 3/6] target/i386: Add native library calls

2023-06-07 Thread Richard Henderson

On 6/7/23 09:47, Yeqi Fu wrote:

+arg0 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 4); \
+arg1 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 8); \
+arg2 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 12);


This is not correct, and will fail on big-endian hosts.

You need to use

uintptr_t ra = GETPC();
cpu_ldl_data_ra(env, guest_pointer, ra);

which will (amongst other things) take care of the byte swapping.


+void helper_native_memcpy(CPUX86State *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *dest = g2h(cs, arg0);
+void *src = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcpy(dest, src, n);
+env->regs[R_EAX] = (target_ulong)h2g(ret);
+}


You need to do something for the case in which either src or dst is not 
accessible.

Routines like cpu_ldl_data_ra handle this for you, but you don't want to use 
that for memcpy.

There are several ways of doing this.  None of the existing helpers are ideal.

(A) void *dest = probe_write(env, arg0, arg2, MMU_USER_IDX, ra);
void *src = probe_read(env, arg1, arg2, MMU_USER_IDX, ra);

which will raise SIGSEGV in case any byte of either region is not correctly mapped, and 
also perform the guest-to-host address remapping.  However, probe_* are written to expect 
probing of no more than one page.  Which means you'd need a loop, processing remaining 
page fractions.


(B) There is page_check_range(), which can check a large region, but doesn't handle 
address translation.  And you still wind up with a race condition if another thread 
changes page mappings at the same time.


(C) Perform the address translation etc yourself, and then protect the actual host memory 
operation in the same way as exec/cpu_ldst.h functions:


set_helper_retaddr(ra);
memcpy(dest, src, n);
clear_helper_retaddr();

In this case you must also validate that 'n' is representable.  This is only an issue for 
32-bit host and 64-bit guest.  A check like (arg2 > SIZE_MAX) is likely to generate a 
silly warning about always false comparison on 64-bit hosts.  Therefore I suggest


if (n != arg2) {
/*
 * Overflow of size_t means that sequential pointer access would wrap.
 * We know that NULL is unmapped, so at least that one byte would fault.
 * There is nothing in the specification of memcpy that requires bytes
 * to be accessed in order, so we are allowed to fault early.
 */
cpu_loop_exit_sigsegv(env_cpu(env), 0, MMU_DATA_LOAD, true, ra);
}

Finally, you know the return value from the specification of memcpy: arg0.
There is no need to remap the return value back from host to guest space.


r~

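Putting option (C) and the points above together, a do_native_memcpy() could
look roughly like this (sketch only, untested; it assumes a user-mode-only
build where g2h(), set_helper_retaddr() and cpu_loop_exit_sigsegv() are
available):

abi_ptr do_native_memcpy(CPUArchState *env, abi_ptr dst, abi_ptr src,
                         abi_ptr len, uintptr_t ra)
{
    CPUState *cs = env_cpu(env);
    size_t n = len;

    if (n != len) {
        /* size_t overflow: fault early, as discussed above */
        cpu_loop_exit_sigsegv(cs, 0, MMU_DATA_LOAD, true, ra);
    }

    set_helper_retaddr(ra);
    memcpy(g2h(cs, dst), g2h(cs, src), n);
    clear_helper_retaddr();

    /* memcpy returns its first argument, so no h2g() round trip is needed */
    return dst;
}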


Re: [PATCH v5 00/11] *** Add allwinner-r40 support ***

2023-06-07 Thread Niek Linnenbank
Hi Peter, Qianfan,

On Mon, Jun 5, 2023 at 5:31 PM Peter Maydell 
wrote:

> On Thu, 1 Jun 2023 at 19:48, Niek Linnenbank 
> wrote:
> >
> > Hi Qianfan,
> >
> > Thanks for sending the v5. From my side, I have no further comments on
> the content.
> > So please feel free to add the following to each of the patches 01-11 in
> the series:
> >
> > Reviewed-by: Niek Linnenbank 
> >
> > As a reminder and explained here on this page, you'll need to make sure
> these lines get added to each of the commit messages:
> >
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html#proper-use-of-reviewed-by-tags-can-aid-review
> >
> > Doing so would require you to send another updated v6, baselined on
> the latest master.
>
> The rebase was simple, so I've applied this v5 to target-arm.next.
>
Great news!


> (The patch application tools can pick up Reviewed-by tags that are
> on-list without requiring a respin just for that.)
>
Ahh right, I wasn't aware of that. Thanks, I learned something new here.


>
> Qianfan: thanks for working on this feature and for your efforts
> in working through our patch review process.
>
Yes, great work indeed!

>
> Niek: thanks very much for taking the lead on the patch review of
> this series, it's been a tremendous help.
>
I'm glad to help. I only do qemu work in my spare time when I can, so it's
not much.
But I enjoy doing it and learn something new in the process also.

Regards,
Niek


>
> -- PMM
>


-- 
Niek Linnenbank


[PATCH v2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Sam Li
If the write operation fails and the wps is NULL, then accessing it will
lead to data corruption.

Solve the issue by adding a null pointer check in get_zones_wp() where
the wps is used.

This issue was found by Peter Maydell using the Coverity Tool (CID
1512459).

Signed-off-by: Sam Li 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ac1ed54811..4a6c71c7f5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2523,7 +2523,7 @@ out:
 }
 }
 } else {
-if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND) && wps) {
 update_zones_wp(bs, s->fd, 0, 1);
 }
 }
-- 
2.40.1




Re: [PATCH] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Richard Henderson

On 6/7/23 07:36, Vladimir Sementsov-Ogievskiy wrote:

The event is racy: it will not appear in the output if bitmap is
migrated during downtime period of migration and postcopy phase is not
started.

Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
Reported-by: Richard Henderson 
Signed-off-by: Vladimir Sementsov-Ogievskiy 


Queued and applied.


r~



Re: [PULL 00/12] xen queue

2023-06-07 Thread Richard Henderson

On 6/7/23 07:18, Anthony PERARD via wrote:

From: Anthony PERARD

The following changes since commit f5e6786de4815751b0a3d2235c760361f228ea48:

   Merge tag 'pull-target-arm-20230606' 
of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-06-06 
12:11:34 -0700)

are available in the Git repository at:

   https://xenbits.xen.org/git-http/people/aperard/qemu-dm.git  
tags/pull-xen-20230607

for you to fetch changes up to 9000666052f99ed4217e75b73636acae61e6fc2c:

   xen-block: fix segv on unrealize (2023-06-07 15:07:10 +0100)


Xen queue

- fix for xen-block segv
- Resolve TYPE_PIIX3_XEN_DEVICE
- Xen emulation build/Coverity fixes


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PULL 0/2] vfio-user queue

2023-06-07 Thread Richard Henderson

On 6/7/23 07:38, Jagannathan Raman wrote:

The following changes since commit f5e6786de4815751b0a3d2235c760361f228ea48:

   Merge tag 'pull-target-arm-20230606' 
of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-06-06 
12:11:34 -0700)

are available in the Git repository at:

   https://gitlab.com/jraman/qemu.git  tags/pull-vfio-user-20230607

for you to fetch changes up to 7771e8b86335968ee46538d1afd44246e7a062bc:

   docs: fix multi-process QEMU documentation (2023-06-07 10:21:53 -0400)


vfio-user: Fix the documentation for vfio-user and multi-process QEMU

Signed-off-by: Jagannathan Raman


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH 04/20] migration: file URI

2023-06-07 Thread Steven Sistare
Please ignore this, wrong subject line.  I will resend. - steve

On 6/7/2023 2:37 PM, Steve Sistare wrote:
> Extend the migration URI to support file:.  This can be used for
> any migration scenario that does not require a reverse path.  It can be used
> as an alternative to 'exec:cat > file' in minimized containers that do not
> contain /bin/sh, and it is easier to use than the fd: URI.  It can
> be used in HMP commands, and as a qemu command-line parameter.
> 
> Signed-off-by: Steve Sistare 
> ---
>  migration/file.c   | 62 
> ++
>  migration/file.h   | 14 
>  migration/meson.build  |  1 +
>  migration/migration.c  |  5 
>  migration/trace-events |  4 
>  qemu-options.hx|  6 -
>  6 files changed, 91 insertions(+), 1 deletion(-)
>  create mode 100644 migration/file.c
>  create mode 100644 migration/file.h
> 
> diff --git a/migration/file.c b/migration/file.c
> new file mode 100644
> index 000..8e35827
> --- /dev/null
> +++ b/migration/file.c
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright (c) 2021-2023 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "channel.h"
> +#include "file.h"
> +#include "migration.h"
> +#include "io/channel-file.h"
> +#include "io/channel-util.h"
> +#include "trace.h"
> +
> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
> +   Error **errp)
> +{
> +g_autoptr(QIOChannelFile) fioc = NULL;
> +QIOChannel *ioc;
> +
> +trace_migration_file_outgoing(filename);
> +
> +fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
> + 0600, errp);
> +if (!fioc) {
> +return;
> +}
> +
> +ioc = QIO_CHANNEL(fioc);
> +qio_channel_set_name(ioc, "migration-file-outgoing");
> +migration_channel_connect(s, ioc, NULL, NULL);
> +}
> +
> +static gboolean file_accept_incoming_migration(QIOChannel *ioc,
> +   GIOCondition condition,
> +   gpointer opaque)
> +{
> +migration_channel_process_incoming(ioc);
> +object_unref(OBJECT(ioc));
> +return G_SOURCE_REMOVE;
> +}
> +
> +void file_start_incoming_migration(const char *filename, Error **errp)
> +{
> +QIOChannelFile *fioc = NULL;
> +QIOChannel *ioc;
> +
> +trace_migration_file_incoming(filename);
> +
> +fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
> +if (!fioc) {
> +return;
> +}
> +
> +ioc = QIO_CHANNEL(fioc);
> +qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
> +qio_channel_add_watch_full(ioc, G_IO_IN,
> +   file_accept_incoming_migration,
> +   NULL, NULL,
> +   g_main_context_get_thread_default());
> +}
> diff --git a/migration/file.h b/migration/file.h
> new file mode 100644
> index 000..841b94a
> --- /dev/null
> +++ b/migration/file.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (c) 2021-2023 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_FILE_H
> +#define QEMU_MIGRATION_FILE_H
> +void file_start_incoming_migration(const char *filename, Error **errp);
> +
> +void file_start_outgoing_migration(MigrationState *s, const char *filename,
> +   Error **errp);
> +#endif
> diff --git a/migration/meson.build b/migration/meson.build
> index 8ba6e42..3af817e 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -16,6 +16,7 @@ softmmu_ss.add(files(
>'dirtyrate.c',
>'exec.c',
>'fd.c',
> +  'file.c',
>'global_state.c',
>'migration-hmp-cmds.c',
>'migration.c',
> diff --git a/migration/migration.c b/migration/migration.c
> index dc05c6f..cfbde86 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -20,6 +20,7 @@
>  #include "migration/blocker.h"
>  #include "exec.h"
>  #include "fd.h"
> +#include "file.h"
>  #include "socket.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
> @@ -442,6 +443,8 @@ static void qemu_start_incoming_migration(const char 
> *uri, Error **errp)
>  exec_start_incoming_migration(p, errp);
>  } else if (strstart(uri, "fd:", &p)) {
>  fd_start_incoming_migration(p, errp);
> +} else if (strstart(uri, "file:", &p)) {
> +file_start_incoming_migration(p, errp);
>  } else {
>  error_setg(errp, "unknown migration protocol: %s", uri);
>  }
> @@ -1662,6 +1665,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
> blk,
>  exec_start_outgoing_migration(s, p, &local_err);
>  } else if 

[PATCH V2] migration: file URI

2023-06-07 Thread Steve Sistare
Extend the migration URI to support file:.  This can be used for
any migration scenario that does not require a reverse path.  It can be used
as an alternative to 'exec:cat > file' in minimized containers that do not
contain /bin/sh, and it is easier to use than the fd: URI.  It can
be used in HMP commands, and as a qemu command-line parameter.

Signed-off-by: Steve Sistare 
---
 migration/file.c   | 62 ++
 migration/file.h   | 14 
 migration/meson.build  |  1 +
 migration/migration.c  |  5 
 migration/trace-events |  4 
 qemu-options.hx|  6 -
 6 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

diff --git a/migration/file.c b/migration/file.c
new file mode 100644
index 000..8e35827
--- /dev/null
+++ b/migration/file.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "channel.h"
+#include "file.h"
+#include "migration.h"
+#include "io/channel-file.h"
+#include "io/channel-util.h"
+#include "trace.h"
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp)
+{
+g_autoptr(QIOChannelFile) fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_outgoing(filename);
+
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
+ 0600, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, "migration-file-outgoing");
+migration_channel_connect(s, ioc, NULL, NULL);
+}
+
+static gboolean file_accept_incoming_migration(QIOChannel *ioc,
+   GIOCondition condition,
+   gpointer opaque)
+{
+migration_channel_process_incoming(ioc);
+object_unref(OBJECT(ioc));
+return G_SOURCE_REMOVE;
+}
+
+void file_start_incoming_migration(const char *filename, Error **errp)
+{
+QIOChannelFile *fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_incoming(filename);
+
+fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   file_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
+}
diff --git a/migration/file.h b/migration/file.h
new file mode 100644
index 000..841b94a
--- /dev/null
+++ b/migration/file.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+void file_start_incoming_migration(const char *filename, Error **errp);
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp);
+#endif
diff --git a/migration/meson.build b/migration/meson.build
index 8ba6e42..3af817e 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -16,6 +16,7 @@ softmmu_ss.add(files(
   'dirtyrate.c',
   'exec.c',
   'fd.c',
+  'file.c',
   'global_state.c',
   'migration-hmp-cmds.c',
   'migration.c',
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f..cfbde86 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -442,6 +443,8 @@ static void qemu_start_incoming_migration(const char *uri, 
Error **errp)
 exec_start_incoming_migration(p, errp);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_incoming_migration(p, errp);
+} else if (strstart(uri, "file:", &p)) {
+file_start_incoming_migration(p, errp);
 } else {
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
@@ -1662,6 +1665,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 exec_start_outgoing_migration(s, p, &local_err);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
+} else if (strstart(uri, "file:", &p)) {
+file_start_outgoing_migration(s, p, &local_err);
 } else {
 if (!(has_resume && resume)) {
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
diff --git a/migration/trace-events b/migration/trace-events
index cdaef7a..c8c1771 100644
--- 

[PATCH 04/20] migration: file URI

2023-06-07 Thread Steve Sistare
Extend the migration URI to support file:.  This can be used for
any migration scenario that does not require a reverse path.  It can be used
as an alternative to 'exec:cat > file' in minimized containers that do not
contain /bin/sh, and it is easier to use than the fd: URI.  It can
be used in HMP commands, and as a qemu command-line parameter.

Signed-off-by: Steve Sistare 
---
 migration/file.c   | 62 ++
 migration/file.h   | 14 
 migration/meson.build  |  1 +
 migration/migration.c  |  5 
 migration/trace-events |  4 
 qemu-options.hx|  6 -
 6 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

diff --git a/migration/file.c b/migration/file.c
new file mode 100644
index 000..8e35827
--- /dev/null
+++ b/migration/file.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "channel.h"
+#include "file.h"
+#include "migration.h"
+#include "io/channel-file.h"
+#include "io/channel-util.h"
+#include "trace.h"
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp)
+{
+g_autoptr(QIOChannelFile) fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_outgoing(filename);
+
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
+ 0600, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, "migration-file-outgoing");
+migration_channel_connect(s, ioc, NULL, NULL);
+}
+
+static gboolean file_accept_incoming_migration(QIOChannel *ioc,
+   GIOCondition condition,
+   gpointer opaque)
+{
+migration_channel_process_incoming(ioc);
+object_unref(OBJECT(ioc));
+return G_SOURCE_REMOVE;
+}
+
+void file_start_incoming_migration(const char *filename, Error **errp)
+{
+QIOChannelFile *fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_incoming(filename);
+
+fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   file_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
+}
diff --git a/migration/file.h b/migration/file.h
new file mode 100644
index 000..841b94a
--- /dev/null
+++ b/migration/file.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+void file_start_incoming_migration(const char *filename, Error **errp);
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp);
+#endif
diff --git a/migration/meson.build b/migration/meson.build
index 8ba6e42..3af817e 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -16,6 +16,7 @@ softmmu_ss.add(files(
   'dirtyrate.c',
   'exec.c',
   'fd.c',
+  'file.c',
   'global_state.c',
   'migration-hmp-cmds.c',
   'migration.c',
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f..cfbde86 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -442,6 +443,8 @@ static void qemu_start_incoming_migration(const char *uri, 
Error **errp)
 exec_start_incoming_migration(p, errp);
 } else if (strstart(uri, "fd:", )) {
 fd_start_incoming_migration(p, errp);
+} else if (strstart(uri, "file:", )) {
+file_start_incoming_migration(p, errp);
 } else {
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
@@ -1662,6 +1665,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 exec_start_outgoing_migration(s, p, &local_err);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
+} else if (strstart(uri, "file:", &p)) {
+file_start_outgoing_migration(s, p, &local_err);
 } else {
 if (!(has_resume && resume)) {
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
diff --git a/migration/trace-events b/migration/trace-events
index cdaef7a..c8c1771 100644
--- 

Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu

2023-06-07 Thread Fan Ni
On Wed, Jun 07, 2023 at 06:13:01PM +, Shesha Bhushan Sreenivasamurthy wrote:
> Hi Fan,
>I am implementing DCD FMAPI commands and planning to start pushing changes 
> to the below branch. That requires the contributions you have made. Can your 
> changes be pushed to the below branch ?
> 
> https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2023-05-25__;!!EwVzqGoTKBqv-0DWAJBm!Vt5uIqwW-L4c4gh02ulI4M762JNQ3_aE9k9lb6QlwE2xm6T23ic7ig7Y77i1VN7l_RX_ySIQhre_z7Q0JA$
>  

Can you push changes to the branch directly? I think it is Jonathan's private
branch. However, I can fork the branch, rebase my patch series on top of it, and
share the new repo with you if that helps you move your work forward.
Let me know your thoughts.

Fan

> 
> 
> From: Fan Ni 
> Sent: Monday, June 5, 2023 10:51 AM
> To: Ira Weiny 
> Cc: qemu-devel@nongnu.org ; 
> jonathan.came...@huawei.com ; 
> linux-...@vger.kernel.org ; 
> gregory.pr...@memverge.com ; 
> hch...@avery-design.com.tw ; 
> cbr...@avery-design.com ; dan.j.willi...@intel.com 
> ; Adam Manzanares ; 
> d...@stgolabs.net ; nmtadam.sams...@gmail.com 
> ; ni...@outlook.com 
> Subject: Re: [Qemu RFC 0/7] Early enabling of DCD emulation in Qemu 
>  
> On Mon, Jun 05, 2023 at 10:35:48AM -0700, Ira Weiny wrote:
> > Fan Ni wrote:
> > > Since the early draft of DCD support in kernel is out
> > > (https://urldefense.com/v3/__https://lore.kernel.org/linux-cxl/20230417164126.GA1904906@bgt-140510-bm03/T/*t__;Iw!!EwVzqGoTKBqv-0DWAJBm!RHzXPIcSiGsqUciUIH6HnlG_W--4L5CHfvcOIeUFdwKFhAujXuFDxjymmpCdOu7SLr61rww7lr21LzAGNOk$
> > >  ),
> > > this patch series provides dcd emulation in qemu so people who are 
> > > interested
> > > can have an early try. It is noted that the patch series may need to be 
> > > updated
> > > accordingly if the kernel side implementation changes.
> > 
> > Fan,
> > 
> > Do you have a git tree we can pull this from which is updated to a more
> > recent CXL branch from Jonathan?
> > 
> > Thanks,
> > Ira
> 
> Hi Ira,
> 
> I have a git tree of the patch series based on Jonathan's branch
> cxl-2023-02-28: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_moking_qemu-2Ddev_tree_dcd-2Drfe=DwIFAg=nKjWec2b6R0mOyPaz7xtfQ=Zta64bwn4nurTRpD4LY2OGr8KklkMRPn7Z_Qy0o4unU=w6dicn5kXEG4Imk6TpICIjdA6KJ-xt84dtHui-Y0fv5H13bijtzEvjxECKE5MHYf=3yeO9RN5FY3gPfO2y19X057YeqRTTQTQNfNA-Gfir_Q=
>  .
> 
> That may be not new enough to include some of the recent patches, but I can
> rebase it to a newer branch if you can tell me which branch you want to use.
> 
> Thanks,
> Fan
> 
> > 
> > > 
> > > To support DCD emulation, the patch series adds DCD related mailbox command
> > > support (CXL Spec 3.0: 8.2.9.8.9), and extend the cxl type3 memory device
> > > with dynamic capacity extent and region representative.
> > > To support read/write to the dynamic capacity of the device, a host 
> > > backend
> > > is provided and a necessary check mechanism is added to ensure the dynamic
> > > capacity accessed is backed with active dc extents.
> > > Currently FM related mailbox commands (cxl spec 3.0: 7.6.7.6) are not 
> > > supported
> > > , but we add two qmp interfaces for adding/releasing dynamic capacity 
> > > extents.
> > > Also, the support for multiple hosts sharing the same DCD case is missing.
> > > 
> > > Things we can try with the patch series together with kernel dcd code:
> > > 1. Create DC regions to cover the address range of the dynamic capacity
> > > regions.
> > > 2. Add/release dynamic capacity extents to the device and notify the
> > > kernel.
> > > 3. Test kernel side code to accept added dc extents and create dax 
> > > devices,
> > > and release dc extents and notify the device
> > > 4. Online the memory range backed with dc extents and let application use
> > > them.
> > > 
> > > The patch series is based on Jonathan's local qemu branch:
> > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28__;!!EwVzqGoTKBqv-0DWAJBm!RHzXPIcSiGsqUciUIH6HnlG_W--4L5CHfvcOIeUFdwKFhAujXuFDxjymmpCdOu7SLr61rww7lr21OO3UHEM$
> > >  
> > > 
> > > Simple tests performed with the patch series:
> > > 1 Install cxl modules:
> > > 
> > > modprobe -a cxl_acpi cxl_core cxl_pci cxl_port cxl_mem
> > > 
> > > 2 Create dc regions:
> > > 
> > > region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region)
> > > echo $region> /sys/bus/cxl/devices/decoder0.0/create_dc_region
> > > echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity
> > > echo 1 > /sys/bus/cxl/devices/$region/interleave_ways
> > > echo "dc" >/sys/bus/cxl/devices/decoder2.0/mode
> > > echo 0x1000 >/sys/bus/cxl/devices/decoder2.0/dpa_size
> > > echo 0x1000 > /sys/bus/cxl/devices/$region/size
> > > echo  "decoder2.0" > /sys/bus/cxl/devices/$region/target0
> > > echo 1 > /sys/bus/cxl/devices/$region/commit
> > > echo $region > /sys/bus/cxl/drivers/cxl_region/bind
> > > 
> > > /home/fan/cxl/tools-and-scripts# cxl list
> > > [
> > >   {
> > > "memdevs":[
> > >   {
> > > 

Re: [PATCH v3 10/14] nbd/client: Initial support for extended headers

2023-06-07 Thread Eric Blake
The subject lines are confusing: 9/14 enables extended headers in the
server, while this one does not yet enable the headers but is merely a
preliminary step.  I'll probably retitle one or the other in v4.

On Wed, May 31, 2023 at 06:26:17PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.05.23 22:53, Eric Blake wrote:
> > Update the client code to be able to send an extended request, and
> > parse an extended header from the server.  Note that since we reject
> > any structured reply with a too-large payload, we can always normalize
> > a valid header back into the compact form, so that the caller need not
> > deal with two branches of a union.  Still, until a later patch lets
> > the client negotiate extended headers, the code added here should not
> > be reached.  Note that because of the different magic numbers, it is
> > just as easy to trace and then tolerate a non-compliant server sending
> > the wrong header reply as it would be to insist that the server is
> > compliant.
> > 
> > The only caller to nbd_receive_reply() always passed NULL for errp;
> > since we are changing the signature anyways, I decided to sink the
> > decision to ignore errors one layer lower.
> 
> This way nbd_receive_simple_reply() and nbd_receive_structured_reply_chunk() 
> are now called only with an explicit NULL last argument. And we start to drop 
> all errors.
> 
> Also, actually, we'd better add an errp parameter to the caller - 
> nbd_receive_replies(), because its caller (nbd_co_do_receive_one_chunk()) 
> will benefit from it, as it already has errp.

I can explore plumbing errp back through for v4.

> > @@ -1394,28 +1401,34 @@ static int nbd_receive_simple_reply(QIOChannel 
> > *ioc, NBDSimpleReply *reply,
> > 
> >   /* nbd_receive_structured_reply_chunk
> >* Read structured reply chunk except magic field (which should be already
> > - * read).
> > + * read).  Normalize into the compact form.
> >* Payload is not read.
> >*/
> > -static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> > -  NBDStructuredReplyChunk 
> > *chunk,
> > +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc, NBDReply 
> > *chunk,
> > Error **errp)
> 
> Hmm, _structured_or_extended_? Or at least we should mention this in the 
> comment above the function.

I'm going with 'nbd_receive_reply_chunk', since both structured and
extended modes receive chunks.

> 
> >   {
> >   int ret;
> > +size_t len;
> > +uint64_t payload_len;
> > 
> > -assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
> > +if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
> > +len = sizeof(chunk->structured);
> > +} else {
> > +assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
> > +len = sizeof(chunk->extended);
> > +}
> > 
> >   ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> > -   sizeof(*chunk) - sizeof(chunk->magic), "structured 
> > chunk",
> 
> Would be good to print "extended chunk" in error message for EXTENDED case.

Or even just "chunk header", which covers both modes.

> >   int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
> > -   NBDReply *reply, Error **errp)
> > +   NBDReply *reply, NBDHeaderStyle hdr)
> >   {
> >   int ret;
> >   const char *type;
> > 
> > -ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), errp);
> > +ret = nbd_read_eof(bs, ioc, &reply->magic, sizeof(reply->magic), NULL);
> >   if (ret <= 0) {
> >   return ret;
> >   }
> > 
> >   reply->magic = be32_to_cpu(reply->magic);
> > 
> > +/* Diagnose but accept wrong-width header */
> >   switch (reply->magic) {
> >   case NBD_SIMPLE_REPLY_MAGIC:
> > -ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> > +if (hdr >= NBD_HEADER_EXTENDED) {
> > +trace_nbd_receive_wrong_header(reply->magic);
> 
> Maybe also trace the expected style?

Sure, I can give that a shot.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Stefan Hajnoczi
On Wed, Jun 07, 2023 at 05:36:06PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The event is racy: it will not appear in the output if bitmap is
> migrated during downtime period of migration and postcopy phase is not
> started.
> 
> Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
> Reported-by: Richard Henderson 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> The patch fixes the problem described in
>   [PATCH] gitlab: Disable io-raw-194 for build-tcg-disabled
> and we can keep the test in gitlab ci
> 
>  tests/qemu-iotests/194 | 5 +
>  tests/qemu-iotests/194.out | 1 -
>  2 files changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] block/file-posix: fix wps checking in raw_co_prw

2023-06-07 Thread Stefan Hajnoczi
On Sun, Jun 04, 2023 at 02:16:58PM +0800, Sam Li wrote:
> If the write operation fails and the wps is NULL, then accessing it will
> lead to data corruption.
> 
> Solve the issue by adding a NULL pointer check in get_zones_wp(), where
> the wps is used.
> 
> This issue is found by Peter Maydell using the Coverity Tool (CID
> 1512459).
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 0d9d179a35..620942bf40 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1340,6 +1340,10 @@ static int get_zones_wp(BlockDriverState *bs, int fd, 
> int64_t offset,
>  rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> blk_zone);
>  g_autofree struct blk_zone_report *rep = NULL;
>  
> +if (!wps) {
> +return -1;
> +}

An error will be printed every time this happens on a non-zoned device:

  static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
  unsigned int nrz)
  {
  if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
  error_report("update zone wp failed");

Please change the following code to avoid the call to update_zones_wp():

  #if defined(CONFIG_BLKZONED)
  {
  BlockZoneWps *wps = bs->wps;
  if (ret == 0) {
  if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
  && wps && bs->bl.zone_size) {
   uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
  if (!BDRV_ZT_IS_CONV(*wp)) {
  if (type & QEMU_AIO_ZONE_APPEND) {
  *s->offset = *wp;
  trace_zbd_zone_append_complete(bs, *s->offset
  >> BDRV_SECTOR_BITS);
  }
  /* Advance the wp if needed */
  if (offset + bytes > *wp) {
  *wp = offset + bytes;
  }
  }
  }
  } else {
- if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+ if (wps && (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
  update_zones_wp(bs, s->fd, 0, 1);
  }
  }

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 1/2] block/file-posix: fix g_file_get_contents return path

2023-06-07 Thread Stefan Hajnoczi
On Sun, Jun 04, 2023 at 02:16:57PM +0800, Sam Li wrote:
> The g_file_get_contents() function returns a gboolean. If it fails, the
> returned value will be 0 instead of -1. Solve the issue by not assigning
> that value to ret.
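
For illustration only (not part of the patch), the gboolean convention means
a caller that wants an errno-style result has to check the flag rather than
propagate the return value, e.g.:

    gchar *buf = NULL;
    gsize len = 0;

    /* g_file_get_contents() returns TRUE/FALSE, not a byte count or -errno */
    if (!g_file_get_contents("/sys/block/nvme0n1/queue/zoned",  /* illustrative path */
                             &buf, &len, NULL)) {
        return -EINVAL;
    }
    /* ... use buf ... */
    g_free(buf);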
> 
> This issue was found by Matthew Rosato using virtio-blk-{pci,ccw} backed
> by an NVMe partition e.g. /dev/nvme0n1p1 on s390x.
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)

The number of bytes returned was never used, so changing the return
value to 0 or -errno is fine:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 00/11] block: Re-enable the graph lock

2023-06-07 Thread Stefan Hajnoczi
On Mon, Jun 05, 2023 at 10:57:00AM +0200, Kevin Wolf wrote:
> This series fixes the deadlock that was observed before commit ad128dff
> ('graph-lock: Disable locking for now'), which just disabled the graph
> lock completely as a workaround to get 8.0.1 stable.
> 
> In theory the problem is simple: We can't poll while still holding the
> lock of a different AioContext. So bdrv_graph_wrlock() just needs to
> drop that lock before it polls. However, there are a number of callers
> that don't even hold the AioContext lock they are supposed to hold, so
> temporarily unlocking tries to unlock a mutex that isn't locked,
> resulting in assertion failures.
> 
> Therefore, much of this series is just for fixing AioContext locking
> correctness. It is only the last two patches that actually fix the
> deadlock and reenable the graph locking.
> 
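As a rough sketch of the unlock-before-poll pattern described above (the names
below are simplified placeholders, not the actual code in this series):

    static void graph_wrlock_sketch(AioContext *caller_ctx)
    {
        if (caller_ctx) {
            /* never poll while still holding another AioContext's lock */
            aio_context_release(caller_ctx);
        }
        /* poll until no reader holds the graph lock (placeholder condition) */
        while (graph_has_readers()) {
            aio_poll(qemu_get_aio_context(), true);
        }
        if (caller_ctx) {
            aio_context_acquire(caller_ctx);
        }
    }
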
> v2:
> - Fixed patch 2 to actually lock the correct AioContext even if the
>   device doesn't support iothreads
> - Improved the commit message for patch 7 [Eric]
> - Fixed mismerge in patch 11 (v1 incorrectly left an #if 0 around)
> 
> Kevin Wolf (11):
>   iotests: Test active commit with iothread and background I/O
>   qdev-properties-system: Lock AioContext for blk_insert_bs()
>   test-block-iothread: Lock AioContext for blk_insert_bs()
>   block: Fix AioContext locking in bdrv_open_child()
>   block: Fix AioContext locking in bdrv_attach_child_common()
>   block: Fix AioContext locking in bdrv_reopen_parse_file_or_backing()
>   block: Fix AioContext locking in bdrv_open_inherit()
>   block: Fix AioContext locking in bdrv_open_backing_file()
>   blockjob: Fix AioContext locking in block_job_add_bdrv()
>   graph-lock: Unlock the AioContext while polling
>   Revert "graph-lock: Disable locking for now"
> 
>  include/block/graph-lock.h|   6 +-
>  block.c   | 103 --
>  block/graph-lock.c|  42 ---
>  blockjob.c|  17 ++-
>  hw/core/qdev-properties-system.c  |   8 +-
>  tests/unit/test-block-iothread.c  |   7 +-
>  .../tests/iothreads-commit-active |  85 +++
>  .../tests/iothreads-commit-active.out |  23 
>  8 files changed, 250 insertions(+), 41 deletions(-)
>  create mode 100755 tests/qemu-iotests/tests/iothreads-commit-active
>  create mode 100644 tests/qemu-iotests/tests/iothreads-commit-active.out
> 
> -- 
> 2.40.1
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH V9 00/46] Live Update

2023-06-07 Thread Steven Sistare
On 6/7/2023 11:55 AM, Michael Galaxy wrote:
> Another option could be to expose "-migrate-mode-disable" (instead of enable) 
> and just enable all 3 modes by default,
> since we are already required to switch from "normal" mode to a CPR-specific 
> mode when it is time to do a live update,
> if the intention is to preserve the capability to completely prevent a 
> running QEMU from using these modes
> before the VM starts up.
> 
> - Michael
> 
> On 6/6/23 17:15, Michael Galaxy wrote:
>> Hi Steve,
>>
>> In the current design you have, we have to specify both the command line 
>> parameter "-migrate-mode-enable cpr-reboot"
>> *and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".
>>
>> Is it possible to opt-in to the CPR mode just once over the monitor instead 
>> of having to specify it twice on the command line?
>> This would also match the live migration model: You do not need to 
>> necessarily "opt in" to live migration mode through
>> a command line parameter, you simply request it when you need to. Can CPR 
>> behave the same way?
>>
>> This would also make switching over to a CPR-capable version of QEMU much 
>> simpler and would even make it work for
>> existing libvirt-managed guests as their command line parameters would no 
>> longer need to change. This would allow us to
>> simply power-off and power-on existing VMs to make them CPR-capable and then 
>> work on a libvirt patch later when
>> we're ready to do so.
>>
>>
>> Comments?

Hi Michael,
  Requiring -migrate-enable-mode allows qemu to initialize objects
differently, if necessary, so that migration for a mode is not blocked.
See callers of migrate_mode_enabled.  There is only one so far, in
ram_block_add.  If the mode is cpr-exec, then it creates anonymous ram
blocks using memfd_create, else using MAP_ANON.  In the V7 series, this
was controlled by a '-machine memfd-alloc=on' option.  

migrate-enable-mode is more future proof for the user.  If something new must
initialize differently to support cpr, then it adds a call to 
migrate_mode_enabled,
and the command line remains the same.  However, I could be persuaded to go 
either way.
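
For reference, the two-step opt-in being discussed looks roughly like this
(spellings as used earlier in the thread; details may differ in the final
series):

    # at VM start, allow the mode so objects can be initialized accordingly:
    qemu-system-x86_64 ... -migrate-mode-enable cpr-reboot

    # later, when it is time to do the live update:
    (qemu) migrate_set_parameter mode cpr-reboot
    (qemu) migrate -d <uri>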

A secondary reason for -migrate-enable-mode is to support the only-cpr-capable
option.  It needs to know which mode will be used, in order to check a
mode-specific blocker list.

- Steve



Re: [PATCH v4] 9pfs: prevent opening special files (CVE-2023-2861)

2023-06-07 Thread Michael Tokarev

07.06.2023 19:29, Christian Schoenebeck wrote:

The 9p protocol does not specifically define how a server shall behave when
a client tries to open a special file; however, from a security POV it does
make sense for a 9p server to prohibit opening any special file on the host side
in general. A sane Linux 9p client for instance would never attempt to
open a special file on host side, it would always handle those exclusively
on its guest side. A malicious client however could potentially escape
from the exported 9p tree by creating and opening a device file on host
side.

With QEMU this could only be exploited in the following unsafe setups:

   - Running QEMU binary as root AND 9p 'local' fs driver AND 'passthrough'
 security model.

or

   - Using 9p 'proxy' fs driver (which is running its helper daemon as
 root).

These setups were already discouraged for safety reasons before; however, for
obvious reasons we are now tightening behaviour on this.
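
For clarity, the first of those setups corresponds to a command line along
these lines (illustrative only, with QEMU running as root):

    qemu-system-x86_64 ... \
        -virtfs local,path=/srv/export,mount_tag=host9p,security_model=passthrough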

Fixes: CVE-2023-2861
Reported-by: Yanwu Shen 
Reported-by: Jietao Xiao 
Reported-by: Jinku Li 
Reported-by: Wenbo Shen 
Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 


Reviewed-by: Michael Tokarev 

Thank you!

/mjt



[QEMU][PATCH v7 09/10] hw/arm: introduce xenpvh machine

2023-06-07 Thread Vikram Garhwal
Add a new machine xenpvh which creates an IOREQ server to register/connect with
Xen Hypervisor.

Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator and connects to swtpm running on host machine via chardev socket
and support TPM functionalities for a guest domain.

Extra command line for aarch64 xenpvh QEMU to connect to swtpm:
-chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
-tpmdev emulator,id=tpm0,chardev=chrtpm \
-machine tpm-base-addr=0x0c00 \

swtpm implements a TPM software emulator (TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interface.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
mkdir /tmp/vtpm2
swtpm socket --tpmstate dir=/tmp/vtpm2 \
--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Stefano Stabellini 
---
 docs/system/arm/xenpvh.rst|  34 +++
 docs/system/target-arm.rst|   1 +
 hw/arm/meson.build|   2 +
 hw/arm/xen_arm.c  | 181 ++
 include/hw/arm/xen_arch_hvm.h |   9 ++
 include/hw/xen/arch_hvm.h |   2 +
 6 files changed, 229 insertions(+)
 create mode 100644 docs/system/arm/xenpvh.rst
 create mode 100644 hw/arm/xen_arm.c
 create mode 100644 include/hw/arm/xen_arch_hvm.h

diff --git a/docs/system/arm/xenpvh.rst b/docs/system/arm/xenpvh.rst
new file mode 100644
index 00..e1655c7ab8
--- /dev/null
+++ b/docs/system/arm/xenpvh.rst
@@ -0,0 +1,34 @@
+XENPVH (``xenpvh``)
+=
+This machine creates an IOREQ server to register/connect with Xen Hypervisor.
+
+When TPM is enabled, this machine also creates a tpm-tis-device at a user input
+tpm base address, adds a TPM emulator and connects to a swtpm application
+running on host machine via chardev socket. This enables xenpvh to support TPM
+functionalities for a guest domain.
+
+More information about TPM use and installing swtpm linux application can be
+found at: docs/specs/tpm.rst.
+
+Example for starting swtpm on host machine:
+.. code-block:: console
+
+mkdir /tmp/vtpm2
+swtpm socket --tpmstate dir=/tmp/vtpm2 \
+--ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &
+
+Sample QEMU xenpvh commands for running and connecting with Xen:
+.. code-block:: console
+
+qemu-system-aarch64 -xen-domid 1 \
+-chardev socket,id=libxl-cmd,path=qmp-libxl-1,server=on,wait=off \
+-mon chardev=libxl-cmd,mode=control \
+-chardev socket,id=libxenstat-cmd,path=qmp-libxenstat-1,server=on,wait=off 
\
+-mon chardev=libxenstat-cmd,mode=control \
+-xen-attach -name guest0 -vnc none -display none -nographic \
+-machine xenpvh -m 1301 \
+-chardev socket,id=chrtpm,path=tmp/vtpm2/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm -machine tpm-base-addr=0x0C00
+
+In above QEMU command, last two lines are for connecting xenpvh QEMU to swtpm
+via chardev socket.
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 91ebc26c6d..af8d7c77d6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
arm/stm32
arm/virt
arm/xlnx-versal-virt
+   arm/xenpvh
 
 Emulated CPU architecture support
 =
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index b545ba0e4f..1b2a01a005 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -62,6 +62,8 @@ arm_ss.add(when: 'CONFIG_FSL_IMX7', if_true: 
files('fsl-imx7.c', 'mcimx7d-sabre.
 arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 
'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
+arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add_all(xen_ss)
 
 softmmu_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
 softmmu_ss.add(when: 'CONFIG_EXYNOS4', if_true: files('exynos4_boards.c'))
diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
new file mode 100644
index 00..19b1cb81ad
--- /dev/null
+++ b/hw/arm/xen_arm.c
@@ -0,0 +1,181 @@
+/*
+ * QEMU ARM Xen PVH Machine
+ *
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT 

[QEMU][PATCH v7 07/10] hw/xen/xen-hvm-common: Use g_new and error_report

2023-06-07 Thread Vikram Garhwal
Replace g_malloc with g_new and perror with error_report.

Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/xen/xen-hvm-common.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index cb82f4b83d..42339c96bd 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -33,7 +33,7 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, 
MemoryRegion *mr,
 trace_xen_ram_alloc(ram_addr, size);
 
 nr_pfn = size >> TARGET_PAGE_BITS;
-pfn_list = g_malloc(sizeof (*pfn_list) * nr_pfn);
+pfn_list = g_new(xen_pfn_t, nr_pfn);
 
 for (i = 0; i < nr_pfn; i++) {
 pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
@@ -730,7 +730,7 @@ void destroy_hvm_domain(bool reboot)
 return;
 }
 if (errno != ENOTTY /* old Xen */) {
-perror("xendevicemodel_shutdown failed");
+error_report("xendevicemodel_shutdown failed with error %d", 
errno);
 }
 /* well, try the old thing then */
 }
@@ -784,7 +784,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 }
 
 /* Note: cpus is empty at this point in init */
-state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
+state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
 rc = xen_set_ioreq_server_state(xen_domid, state->ioservid, true);
 if (rc < 0) {
@@ -793,7 +793,7 @@ static void xen_do_ioreq_register(XenIOState *state,
 goto err;
 }
 
-state->ioreq_local_port = g_malloc0(max_cpus * sizeof (evtchn_port_t));
+state->ioreq_local_port = g_new0(evtchn_port_t, max_cpus);
 
 /* FIXME: how about if we overflow the page here? */
 for (i = 0; i < max_cpus; i++) {
@@ -850,13 +850,13 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 
 state->xce_handle = qemu_xen_evtchn_open();
 if (state->xce_handle == NULL) {
-perror("xen: event channel open");
+error_report("xen: event channel open failed with error %d", errno);
 goto err;
 }
 
 state->xenstore = xs_daemon_open();
 if (state->xenstore == NULL) {
-perror("xen: xenstore open");
+error_report("xen: xenstore open failed with error %d", errno);
 goto err;
 }
 
-- 
2.17.1




[QEMU][PATCH v7 06/10] hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration failure

2023-06-07 Thread Vikram Garhwal
From: Stefano Stabellini 

On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails, continue
to the PV backends initialization.

Also, moved the IOREQ registration and mapping subroutine to new function
xen_do_ioreq_register().

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/xen/xen-hvm-common.c | 57 +++--
 1 file changed, 38 insertions(+), 19 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index a31b067404..cb82f4b83d 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -764,27 +764,12 @@ void xen_shutdown_fatal_error(const char *fmt, ...)
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR);
 }
 
-void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
-MemoryListener xen_memory_listener)
+static void xen_do_ioreq_register(XenIOState *state,
+   unsigned int max_cpus,
+   MemoryListener xen_memory_listener)
 {
 int i, rc;
 
-setup_xen_backend_ops();
-
-state->xce_handle = qemu_xen_evtchn_open();
-if (state->xce_handle == NULL) {
-perror("xen: event channel open");
-goto err;
-}
-
-state->xenstore = xs_daemon_open();
-if (state->xenstore == NULL) {
-perror("xen: xenstore open");
-goto err;
-}
-
-xen_create_ioreq_server(xen_domid, &state->ioservid);
-
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
@@ -849,12 +834,46 @@ void xen_register_ioreq(XenIOState *state, unsigned int 
max_cpus,
 QLIST_INIT(&state->dev_list);
 device_listener_register(&state->device_listener);
 
+return;
+
+err:
+error_report("xen hardware virtual machine initialisation failed");
+exit(1);
+}
+
+void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
+MemoryListener xen_memory_listener)
+{
+int rc;
+
+setup_xen_backend_ops();
+
+state->xce_handle = qemu_xen_evtchn_open();
+if (state->xce_handle == NULL) {
+perror("xen: event channel open");
+goto err;
+}
+
+state->xenstore = xs_daemon_open();
+if (state->xenstore == NULL) {
+perror("xen: xenstore open");
+goto err;
+}
+
+rc = xen_create_ioreq_server(xen_domid, &state->ioservid);
+if (!rc) {
+xen_do_ioreq_register(state, max_cpus, xen_memory_listener);
+} else {
+warn_report("xen: failed to create ioreq server");
+}
+
 xen_bus_init();
 
 xen_be_init();
 
 return;
+
 err:
-error_report("xen hardware virtual machine initialisation failed");
+error_report("xen hardware virtual machine backend registration failed");
 exit(1);
 }
-- 
2.17.1




[QEMU][PATCH v7 10/10] meson.build: enable xenpv machine build for ARM

2023-06-07 Thread Vikram Garhwal
Add CONFIG_XEN for aarch64 device to support build for ARM targets.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Alex Bennée 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 786c69b06d..afba3b6441 100644
--- a/meson.build
+++ b/meson.build
@@ -136,7 +136,7 @@ endif
 if cpu in ['x86', 'x86_64', 'arm', 'aarch64']
   # i386 emulator provides xenpv machine type for multiple architectures
   accelerator_targets += {
-'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu'],
+'CONFIG_XEN': ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu'],
   }
 endif
 if cpu in ['x86', 'x86_64']
-- 
2.17.1




[QEMU][PATCH v7 08/10] meson.build: do not set have_xen_pci_passthrough for aarch64 targets

2023-06-07 Thread Vikram Garhwal
From: Stefano Stabellini 

have_xen_pci_passthrough is only used for Xen x86 VMs.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Alex Bennée 
---
 meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/meson.build b/meson.build
index a61d3e9b06..786c69b06d 100644
--- a/meson.build
+++ b/meson.build
@@ -1737,6 +1737,8 @@ have_xen_pci_passthrough = 
get_option('xen_pci_passthrough') \
error_message: 'Xen PCI passthrough requested but Xen not enabled') 
\
   .require(targetos == 'linux',
error_message: 'Xen PCI passthrough not available on this 
platform') \
+  .require(cpu == 'x86'  or cpu == 'x86_64',
+   error_message: 'Xen PCI passthrough not available on this 
platform') \
   .allowed()
 
 
-- 
2.17.1




[QEMU][PATCH v7 04/10] xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common

2023-06-07 Thread Vikram Garhwal
From: Stefano Stabellini 

This patch does the following:
1. creates arch_handle_ioreq() and arch_xen_set_memory(). This is done in
preparation for moving most of the xen-hvm code to an arch-neutral location:
move the x86-specific portion of xen_set_memory to arch_xen_set_memory,
and move handle_vmport_ioreq to arch_handle_ioreq.

2. Pure code movement: move common functions to hw/xen/xen-hvm-common.c
Extract common functionalities from hw/i386/xen/xen-hvm.c and move them to
hw/xen/xen-hvm-common.c. These common functions are useful for creating
an IOREQ server.

xen_hvm_init_pc() contains the architecture independent code for creating
and mapping an IOREQ server, connecting memory and IO listeners, initializing
a xen bus and registering backends. Moved this common xen code to a new
function xen_register_ioreq() which can be used by both x86 and ARM 
machines.

Following functions are moved to hw/xen/xen-hvm-common.c:
xen_vcpu_eport(), xen_vcpu_ioreq(), xen_ram_alloc(), xen_set_memory(),
xen_region_add(), xen_region_del(), xen_io_add(), xen_io_del(),
xen_device_realize(), xen_device_unrealize(),
cpu_get_ioreq_from_shared_memory(), cpu_get_ioreq(), do_inp(),
do_outp(), rw_phys_req_item(), read_phys_req_item(),
write_phys_req_item(), cpu_ioreq_pio(), cpu_ioreq_move(),
cpu_ioreq_config(), handle_ioreq(), handle_buffered_iopage(),
handle_buffered_io(), cpu_handle_ioreq(), xen_main_loop_prepare(),
xen_hvm_change_state_handler(), xen_exit_notifier(),
xen_map_ioreq_server(), destroy_hvm_domain() and
xen_shutdown_fatal_error()

3. Removed static type from below functions:
1. xen_region_add()
2. xen_region_del()
3. xen_io_add()
4. xen_io_del()
5. xen_device_realize()
6. xen_device_unrealize()
7. xen_hvm_change_state_handler()
8. cpu_ioreq_pio()
9. xen_exit_notifier()

4. Replace TARGET_PAGE_SIZE with XC_PAGE_SIZE to match the page size with Xen.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/xen/trace-events|   14 -
 hw/i386/xen/xen-hvm.c   | 1016 ++-
 hw/xen/meson.build  |5 +-
 hw/xen/trace-events |   14 +
 hw/xen/xen-hvm-common.c |  860 ++
 include/hw/i386/xen_arch_hvm.h  |   11 +
 include/hw/xen/arch_hvm.h   |3 +
 include/hw/xen/xen-hvm-common.h |   99 +++
 8 files changed, 1054 insertions(+), 968 deletions(-)
 create mode 100644 hw/xen/xen-hvm-common.c
 create mode 100644 include/hw/i386/xen_arch_hvm.h
 create mode 100644 include/hw/xen/arch_hvm.h
 create mode 100644 include/hw/xen/xen-hvm-common.h

diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index a0c89d91c4..5d0a8d6dcf 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -7,17 +7,3 @@ xen_platform_log(char *s) "xen platform: %s"
 xen_pv_mmio_read(uint64_t addr) "WARNING: read from Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space 
(address 0x%"PRIx64")"
 
-# xen-hvm.c
-xen_ram_alloc(unsigned long ram_addr, unsigned long size) "requested: 0x%lx, 
size 0x%lx"
-xen_client_set_memory(uint64_t start_addr, unsigned long size, bool log_dirty) 
"0x%"PRIx64" size 0x%lx, log_dirty %i"
-handle_ioreq(void *req, uint32_t type, uint32_t dir, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p type=%d dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-handle_ioreq_read(void *req, uint32_t type, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p read 
type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-handle_ioreq_write(void *req, uint32_t type, uint32_t df, uint32_t 
data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) 
"I/O=%p write type=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d 
size=%d"
-cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p pio dir=%d 
df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) 
"I/O=%p pio read reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
size) "I/O=%p pio write reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
-cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
-xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: %p"
-cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t 

[QEMU][PATCH v7 01/10] hw/i386/xen/: move xen-mapcache.c to hw/xen/

2023-06-07 Thread Vikram Garhwal
xen-mapcache.c contains common functions which can be used for enabling Xen on
aarch64 with IOREQ handling. Move it out of hw/i386/xen to hw/xen to make it
accessible to both aarch64 and x86.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/i386/meson.build  | 1 +
 hw/i386/xen/meson.build  | 1 -
 hw/i386/xen/trace-events | 5 -
 hw/xen/meson.build   | 4 
 hw/xen/trace-events  | 5 +
 hw/{i386 => }/xen/xen-mapcache.c | 0
 6 files changed, 10 insertions(+), 6 deletions(-)
 rename hw/{i386 => }/xen/xen-mapcache.c (100%)

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3..cfdbfdcbcb 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -33,5 +33,6 @@ subdir('kvm')
 subdir('xen')
 
 i386_ss.add_all(xenpv_ss)
+i386_ss.add_all(xen_ss)
 
 hw_arch += {'i386': i386_ss}
diff --git a/hw/i386/xen/meson.build b/hw/i386/xen/meson.build
index 2e64a34e16..3dc4c4f106 100644
--- a/hw/i386/xen/meson.build
+++ b/hw/i386/xen/meson.build
@@ -1,6 +1,5 @@
 i386_ss.add(when: 'CONFIG_XEN', if_true: files(
   'xen-hvm.c',
-  'xen-mapcache.c',
   'xen_apic.c',
   'xen_pvdevice.c',
 ))
diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 5d6be61090..a0c89d91c4 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -21,8 +21,3 @@ xen_map_resource_ioreq(uint32_t id, void *addr) "id: %u addr: 
%p"
 cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 
-# xen-mapcache.c
-xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
-xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
-xen_map_cache_return(void* ptr) "%p"
-
diff --git a/hw/xen/meson.build b/hw/xen/meson.build
index 19c6aabc7c..202752e557 100644
--- a/hw/xen/meson.build
+++ b/hw/xen/meson.build
@@ -26,3 +26,7 @@ else
 endif
 
 specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss)
+
+xen_ss = ss.source_set()
+
+xen_ss.add(when: 'CONFIG_XEN', if_true: files('xen-mapcache.c'))
diff --git a/hw/xen/trace-events b/hw/xen/trace-events
index 55c9e1df68..f977c7c8c6 100644
--- a/hw/xen/trace-events
+++ b/hw/xen/trace-events
@@ -41,3 +41,8 @@ xs_node_vprintf(char *path, char *value) "%s %s"
 xs_node_vscanf(char *path, char *value) "%s %s"
 xs_node_watch(char *path) "%s"
 xs_node_unwatch(char *path) "%s"
+
+# xen-mapcache.c
+xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
+xen_remap_bucket(uint64_t index) "index 0x%"PRIx64
+xen_map_cache_return(void* ptr) "%p"
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
similarity index 100%
rename from hw/i386/xen/xen-mapcache.c
rename to hw/xen/xen-mapcache.c
-- 
2.17.1




[QEMU][PATCH v7 05/10] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2023-06-07 Thread Vikram Garhwal
From: Stefano Stabellini 

This is done to prepare for enabling xenpv support for ARM architecture.
On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails,
continue to the PV backends initialization.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 include/hw/xen/xen_native.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/hw/xen/xen_native.h b/include/hw/xen/xen_native.h
index 6bcc83baf9..8b01b071e5 100644
--- a/include/hw/xen/xen_native.h
+++ b/include/hw/xen/xen_native.h
@@ -433,9 +433,10 @@ static inline void xen_unmap_pcidev(domid_t dom,
 {
 }
 
-static inline void xen_create_ioreq_server(domid_t dom,
-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
 {
+return 0;
 }
 
 static inline void xen_destroy_ioreq_server(domid_t dom,
@@ -566,8 +567,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
   PCI_FUNC(pci_dev->devfn));
 }
 
-static inline void xen_create_ioreq_server(domid_t dom,
-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
 {
 int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
 HVM_IOREQSRV_BUFIOREQ_ATOMIC,
@@ -575,12 +576,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
 
 if (rc == 0) {
 trace_xen_ioreq_server_create(*ioservid);
-return;
+return rc;
 }
 
 *ioservid = 0;
 use_default_ioreq_server = true;
 trace_xen_default_ioreq_server();
+
+return rc;
 }
 
 static inline void xen_destroy_ioreq_server(domid_t dom,
-- 
2.17.1




[QEMU][PATCH v7 03/10] hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState

2023-06-07 Thread Vikram Garhwal
From: Stefano Stabellini 

In preparation for moving most of the xen-hvm code to an arch-neutral location, move:
- shared_vmport_page
- log_for_dirtybit
- dirty_bitmap
- suspend
- wakeup

out of XenIOState struct as these are only used on x86, especially the ones
related to dirty logging.
Updated XenIOState can be used for both aarch64 and x86.

Also, remove free_phys_offset as it was unused.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Vikram Garhwal 
Reviewed-by: Paul Durrant 
Reviewed-by: Alex Bennée 
---
 hw/i386/xen/xen-hvm.c | 58 ---
 1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 5403ac4b89..6be5a250a8 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -74,6 +74,7 @@ struct shared_vmport_iopage {
 };
 typedef struct shared_vmport_iopage shared_vmport_iopage_t;
 #endif
+static shared_vmport_iopage_t *shared_vmport_page;
 
 static inline uint32_t xen_vcpu_eport(shared_iopage_t *shared_page, int i)
 {
@@ -96,6 +97,11 @@ typedef struct XenPhysmap {
 } XenPhysmap;
 
 static QLIST_HEAD(, XenPhysmap) xen_physmap;
+static const XenPhysmap *log_for_dirtybit;
+/* Buffer used by xen_sync_dirty_bitmap */
+static unsigned long *dirty_bitmap;
+static Notifier suspend;
+static Notifier wakeup;
 
 typedef struct XenPciDevice {
 PCIDevice *pci_dev;
@@ -106,7 +112,6 @@ typedef struct XenPciDevice {
 typedef struct XenIOState {
 ioservid_t ioservid;
 shared_iopage_t *shared_page;
-shared_vmport_iopage_t *shared_vmport_page;
 buffered_iopage_t *buffered_io_page;
 xenforeignmemory_resource_handle *fres;
 QEMUTimer *buffered_io_timer;
@@ -126,14 +131,8 @@ typedef struct XenIOState {
 MemoryListener io_listener;
 QLIST_HEAD(, XenPciDevice) dev_list;
 DeviceListener device_listener;
-hwaddr free_phys_offset;
-const XenPhysmap *log_for_dirtybit;
-/* Buffer used by xen_sync_dirty_bitmap */
-unsigned long *dirty_bitmap;
 
 Notifier exit;
-Notifier suspend;
-Notifier wakeup;
 } XenIOState;
 
 /* Xen specific function for piix pci */
@@ -463,10 +462,10 @@ static int xen_remove_from_physmap(XenIOState *state,
 }
 
 QLIST_REMOVE(physmap, list);
-if (state->log_for_dirtybit == physmap) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+if (log_for_dirtybit == physmap) {
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 }
 g_free(physmap);
 
@@ -627,16 +626,16 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 return;
 }
 
-if (state->log_for_dirtybit == NULL) {
-state->log_for_dirtybit = physmap;
-state->dirty_bitmap = g_new(unsigned long, bitmap_size);
-} else if (state->log_for_dirtybit != physmap) {
+if (log_for_dirtybit == NULL) {
+log_for_dirtybit = physmap;
+dirty_bitmap = g_new(unsigned long, bitmap_size);
+} else if (log_for_dirtybit != physmap) {
 /* Only one range for dirty bitmap can be tracked. */
 return;
 }
 
 rc = xen_track_dirty_vram(xen_domid, start_addr >> TARGET_PAGE_BITS,
-  npages, state->dirty_bitmap);
+  npages, dirty_bitmap);
 if (rc < 0) {
 #ifndef ENODATA
 #define ENODATA  ENOENT
@@ -651,7 +650,7 @@ static void xen_sync_dirty_bitmap(XenIOState *state,
 }
 
 for (i = 0; i < bitmap_size; i++) {
-unsigned long map = state->dirty_bitmap[i];
+unsigned long map = dirty_bitmap[i];
 while (map != 0) {
 j = ctzl(map);
 map &= ~(1ul << j);
@@ -677,12 +676,10 @@ static void xen_log_start(MemoryListener *listener,
 static void xen_log_stop(MemoryListener *listener, MemoryRegionSection 
*section,
  int old, int new)
 {
-XenIOState *state = container_of(listener, XenIOState, memory_listener);
-
 if (old & ~new & (1 << DIRTY_MEMORY_VGA)) {
-state->log_for_dirtybit = NULL;
-g_free(state->dirty_bitmap);
-state->dirty_bitmap = NULL;
+log_for_dirtybit = NULL;
+g_free(dirty_bitmap);
+dirty_bitmap = NULL;
 /* Disable dirty bit tracking */
 xen_track_dirty_vram(xen_domid, 0, 0, NULL);
 }
@@ -1022,9 +1019,9 @@ static void handle_vmport_ioreq(XenIOState *state, 
ioreq_t *req)
 {
 vmware_regs_t *vmport_regs;
 
-assert(state->shared_vmport_page);
+assert(shared_vmport_page);
 vmport_regs =
-&state->shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
+&shared_vmport_page->vcpu_vmport_regs[state->send_vcpu];
 QEMU_BUILD_BUG_ON(sizeof(*req) < sizeof(*vmport_regs));
 
 current_cpu = state->cpu_by_vcpu_id[state->send_vcpu];
@@ -1472,7 +1469,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 
 state->memory_listener = xen_memory_listener;
 

[QEMU][PATCH v7 00/10] Introduce xenpvh machine for arm architecture

2023-06-07 Thread Vikram Garhwal
Hi,
Rebased and resending the series with the latest QEMU as it's been quite some time.
There is a one-line code change in patch 04/10. The rest is just rebased onto the latest.

Also, this series has a dependency on the following gitlab-ci
patch: https://lists.nongnu.org/archive/html/qemu-devel/2023-06/msg01471.html.

I ran gitlab-ci and here are the successful build logs:
https://gitlab.com/Vikram.garhwal/qemu-ioreq/-/pipelines/891635328/

This series adds a xenpvh machine for aarch64. The motivation behind creating a
xenpvh machine with IOREQ and TPM was to enable each guest on Xen aarch64 to have
its own unique, emulated TPM.

This series does the following:
1. Moved common xen functionalities from hw/i386/xen to hw/xen/ so those can
   be used for aarch64.
2. We added a minimal xenpvh arm machine which creates an IOREQ server and
   supports TPM.

Also, checkpatch.pl fails for 03/12 and 06/12. These failures are due to
moving old code to a new place that was not QEMU code-style compatible.
No new code was added.

Regards,
Vikram

ChangeLog:
v5->v7:
Change in PATCH 04/10:
Fix build error for cross compile case by adding
"#include "qemu/error-report.h" in include/hw/xen/xen-hvm-common.h
v4->v5:
Fix missing 3 lines of codes in xen_exit_notifier() due to rebase.
Fix 07/10 patch subject.

v3->v4:
Removed the out of series 04/12 patch.

v2->v3:
1. Change machine name to xenpvh as per Jurgen's input.
2. Add docs/system/xenpvh.rst documentation.
3. Removed GUEST_TPM_BASE and added tpm_base_address as property.
4. Correct CONFIG_TPM related issues.
5. Added xen_register_backend() function call to xen_register_ioreq().
6. Added Oleksandr's suggestion i.e. removed extra interface opening and
   used accel=xen option

v1 -> v2
Merged patch 05 and 06.
04/12: xen-hvm-common.c:
1. Moved xen_be_init() and xen_be_register_common() from
xen_register_ioreq() to xen_register_backend().
2. Changed g_malloc to g_new and perror -> error_setg_errno.
3. Created a local subroutine function for Xen_IOREQ_register.
4. Fixed build issues with inclusion of xenstore.h.
5. Fixed minor errors.

Stefano Stabellini (5):
  hw/i386/xen/xen-hvm: move x86-specific fields out of XenIOState
  xen-hvm: reorganize xen-hvm and move common function to xen-hvm-common
  include/hw/xen/xen_common: return error from xen_create_ioreq_server
  hw/xen/xen-hvm-common: skip ioreq creation on ioreq registration
failure
  meson.build: do not set have_xen_pci_passthrough for aarch64 targets

Vikram Garhwal (5):
  hw/i386/xen/: move xen-mapcache.c to hw/xen/
  hw/i386/xen: rearrange xen_hvm_init_pc
  hw/xen/xen-hvm-common: Use g_new and error_report
  hw/arm: introduce xenpvh machine
  meson.build: enable xenpv machine build for ARM

 docs/system/arm/xenpvh.rst   |   34 +
 docs/system/target-arm.rst   |1 +
 hw/arm/meson.build   |2 +
 hw/arm/xen_arm.c |  181 +
 hw/i386/meson.build  |1 +
 hw/i386/xen/meson.build  |1 -
 hw/i386/xen/trace-events |   19 -
 hw/i386/xen/xen-hvm.c| 1075 +++---
 hw/xen/meson.build   |7 +
 hw/xen/trace-events  |   19 +
 hw/xen/xen-hvm-common.c  |  879 
 hw/{i386 => }/xen/xen-mapcache.c |0
 include/hw/arm/xen_arch_hvm.h|9 +
 include/hw/i386/xen_arch_hvm.h   |   11 +
 include/hw/xen/arch_hvm.h|5 +
 include/hw/xen/xen-hvm-common.h  |   99 +++
 include/hw/xen/xen_native.h  |   13 +-
 meson.build  |4 +-
 18 files changed, 1350 insertions(+), 1010 deletions(-)
 create mode 100644 docs/system/arm/xenpvh.rst
 create mode 100644 hw/arm/xen_arm.c
 create mode 100644 hw/xen/xen-hvm-common.c
 rename hw/{i386 => }/xen/xen-mapcache.c (100%)
 create mode 100644 include/hw/arm/xen_arch_hvm.h
 create mode 100644 include/hw/i386/xen_arch_hvm.h
 create mode 100644 include/hw/xen/arch_hvm.h
 create mode 100644 include/hw/xen/xen-hvm-common.h

-- 
2.17.1




[QEMU][PATCH v7 02/10] hw/i386/xen: rearrange xen_hvm_init_pc

2023-06-07 Thread Vikram Garhwal
In preparation for moving most of the xen-hvm code to an arch-neutral location,
move non-IOREQ references to:
- xen_get_vmport_regs_pfn
- xen_suspend_notifier
- xen_wakeup_notifier
- xen_ram_init

towards the end of the xen_hvm_init_pc() function.

This is done to keep the common ioreq functions in one place; they will be
moved to a new function in the next patch in order to make them common to both
x86 and aarch64 machines.

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Paul Durrant 
---
 hw/i386/xen/xen-hvm.c | 49 ++-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index 56641a550e..5403ac4b89 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -1419,12 +1419,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 state->exit.notify = xen_exit_notifier;
 qemu_add_exit_notifier(&state->exit);
 
-state->suspend.notify = xen_suspend_notifier;
-qemu_register_suspend_notifier(&state->suspend);
-
-state->wakeup.notify = xen_wakeup_notifier;
-qemu_register_wakeup_notifier(&state->wakeup);
-
 /*
  * Register wake-up support in QMP query-current-machine API
  */
@@ -1435,23 +1429,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 goto err;
 }
 
-rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
-if (!rc) {
-DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
-state->shared_vmport_page =
-xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
- 1, &ioreq_pfn, NULL);
-if (state->shared_vmport_page == NULL) {
-error_report("map shared vmport IO page returned error %d 
handle=%p",
- errno, xen_xc);
-goto err;
-}
-} else if (rc != -ENOSYS) {
-error_report("get vmport regs pfn returned error %d, rc=%d",
- errno, rc);
-goto err;
-}
-
 /* Note: cpus is empty at this point in init */
 state->cpu_by_vcpu_id = g_new0(CPUState *, max_cpus);
 
@@ -1490,7 +1467,6 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 #else
 xen_map_cache_init(NULL, state);
 #endif
-xen_ram_init(pcms, ms->ram_size, ram_memory);
 
 qemu_add_vm_change_state_handler(xen_hvm_change_state_handler, state);
 
@@ -1511,6 +1487,31 @@ void xen_hvm_init_pc(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 QLIST_INIT(&xen_physmap);
 xen_read_physmap(state);
 
+state->suspend.notify = xen_suspend_notifier;
+qemu_register_suspend_notifier(&state->suspend);
+
+state->wakeup.notify = xen_wakeup_notifier;
+qemu_register_wakeup_notifier(&state->wakeup);
+
+rc = xen_get_vmport_regs_pfn(xen_xc, xen_domid, &ioreq_pfn);
+if (!rc) {
+DPRINTF("shared vmport page at pfn %lx\n", ioreq_pfn);
+state->shared_vmport_page =
+xenforeignmemory_map(xen_fmem, xen_domid, PROT_READ|PROT_WRITE,
+ 1, &ioreq_pfn, NULL);
+if (state->shared_vmport_page == NULL) {
+error_report("map shared vmport IO page returned error %d 
handle=%p",
+ errno, xen_xc);
+goto err;
+}
+} else if (rc != -ENOSYS) {
+error_report("get vmport regs pfn returned error %d, rc=%d",
+ errno, rc);
+goto err;
+}
+
+xen_ram_init(pcms, ms->ram_size, ram_memory);
+
 /* Disable ACPI build because Xen handles it */
 pcms->acpi_build_enabled = false;
 
-- 
2.17.1




Re: [PATCH v2 06/12] aspeed/smc: Wire CS lines at reset

2023-06-07 Thread Cédric Le Goater

On 6/7/23 12:49, Joel Stanley wrote:

On Wed, 7 Jun 2023 at 04:40, Cédric Le Goater  wrote:


Currently, a set of default flash devices is created at machine init
and drives defined on the QEMU command line are associated to the FMC
and SPI controllers in sequence :

-drive file,format=raw,if=mtd
-drive file,format=raw,if=mtd

The CS lines are wired in the same creation loop. This makes a strong
assumption on the ordering and is not very flexible since only a
limited set of flash devices can be defined : 1 FMC + 1 or 2 SPI,
which is less than what the SoC really supports.

A better alternative would be to define the flash devices on the
command line using a blockdev attached to a CS line of a SSI bus :

 -blockdev node-name=fmc0,driver=file,filename=./flash.img
 -device mx66u51235f,addr=0x0,bus=ssi.0,drive=fmc0


I don't like the idea of making the command line more complicated

There are benefits to this change and patch 8:

 - it is possible to define block backends out of order
 - it is possible to define *all* device backends. Some machines support
   up to 8.
 - it is possible to use different flash models without adding new boards
 - as a consequence, the machine options "spi-model" and "fmc-model" can
   be deprecated. These were a clumsy interface.
 - with -nodefaults, the machine starts running by fetching instructions
   from the FMC0 device, which is what HW does.
 - and the machine option "execute-in-place" can be deprecated.


That is not a comment on this patch though, but it would be nice if we
could head towards decreasing the complexity.


Describing the devices on various buses comes at a cost.

Using -drive is still possible. It should be considered an optimization that
loads the FMC0 contents as a ROM to speed up boot.

Thanks,

C.





However, user created flash devices are not correctly wired to their
SPI controller and consequently can not be used by the machine. Fix
that and wire the CS lines of all available devices when the SSI bus
is reset.

Signed-off-by: Cédric Le Goater 


Reviewed-by: Joel Stanley 



---
  hw/arm/aspeed.c | 5 +
  hw/ssi/aspeed_smc.c | 8 
  2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 76a1e7303de1..e5a49bb0b1a7 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -299,17 +299,14 @@ void aspeed_board_init_flashes(AspeedSMCState *s, const 
char *flashtype,

  for (i = 0; i < count; ++i) {
  DriveInfo *dinfo = drive_get(IF_MTD, 0, unit0 + i);
-qemu_irq cs_line;
  DeviceState *dev;

  dev = qdev_new(flashtype);
  if (dinfo) {
  qdev_prop_set_drive(dev, "drive", blk_by_legacy_dinfo(dinfo));
  }
+qdev_prop_set_uint8(dev, "addr", i);
 qdev_realize_and_unref(dev, BUS(s->spi), &error_fatal);
-
-cs_line = qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0);
-qdev_connect_gpio_out_named(DEVICE(s), "cs", i, cs_line);
  }
  }

diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index 72811693224d..2a4001b774a2 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -692,6 +692,14 @@ static void aspeed_smc_reset(DeviceState *d)
  memset(s->regs, 0, sizeof s->regs);
  }

+for (i = 0; i < asc->cs_num_max; i++) {
+DeviceState *dev = ssi_get_cs(s->spi, i);
+if (dev) {
+qemu_irq cs_line = qdev_get_gpio_in_named(dev, SSI_GPIO_CS, 0);
+qdev_connect_gpio_out_named(DEVICE(s), "cs", i, cs_line);
+}
+}
+
  /* Unselect all peripherals */
  for (i = 0; i < asc->cs_num_max; ++i) {
  s->regs[s->r_ctrl0 + i] |= CTRL_CE_STOP_ACTIVE;
--
2.40.1






Re: [PULL 0/5] misc ci fixes

2023-06-07 Thread Richard Henderson

On 6/7/23 08:40, Richard Henderson wrote:

The following changes since commit f5e6786de4815751b0a3d2235c760361f228ea48:

   Merge tag 'pull-target-arm-20230606' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-06-06 
12:11:34 -0700)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git tags/pull-ci-20230607

for you to fetch changes up to dcc28ab603f30df5cc8be1f759b423e94ae7d10f:

   iotests: fix 194: filter out racy postcopy-active event (2023-06-07 08:36:55 
-0700)


Fix TCI regressions vs Int128
Fix Arm build vs --disable-tcg
Fix iotest 194.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~





[RFC v2 4/6] target/mips: Add native library calls

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 target/mips/helper.h|  6 
 target/mips/tcg/meson.build |  1 +
 target/mips/tcg/native_helper.c | 55 +
 target/mips/tcg/translate.c | 20 +++-
 target/mips/tcg/translate.h | 12 +++
 5 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 target/mips/tcg/native_helper.c

diff --git a/target/mips/helper.h b/target/mips/helper.h
index de32d82e98..9fa949d78c 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -589,6 +589,12 @@ DEF_HELPER_FLAGS_3(dmthlip, 0, void, tl, tl, env)
 DEF_HELPER_FLAGS_3(wrdsp, 0, void, tl, tl, env)
 DEF_HELPER_FLAGS_2(rddsp, 0, tl, tl, env)
 
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+DEF_HELPER_1(native_memcpy, void, env)
+DEF_HELPER_1(native_memcmp, void, env)
+DEF_HELPER_1(native_memset, void, env)
+#endif
+
 #ifndef CONFIG_USER_ONLY
 #include "tcg/sysemu_helper.h.inc"
 #endif /* !CONFIG_USER_ONLY */
diff --git a/target/mips/tcg/meson.build b/target/mips/tcg/meson.build
index 7ee969ec8f..fb1ea64047 100644
--- a/target/mips/tcg/meson.build
+++ b/target/mips/tcg/meson.build
@@ -22,6 +22,7 @@ mips_ss.add(files(
   'txx9_translate.c',
   'vr54xx_helper.c',
   'vr54xx_translate.c',
+  'native_helper.c',
 ))
 mips_ss.add(when: 'TARGET_MIPS64', if_true: files(
   'tx79_translate.c',
diff --git a/target/mips/tcg/native_helper.c b/target/mips/tcg/native_helper.c
new file mode 100644
index 00..bfd9c92e17
--- /dev/null
+++ b/target/mips/tcg/native_helper.c
@@ -0,0 +1,55 @@
+/*
+ *  native function call helpers
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+
+#define NATIVE_FN_W_3W()   \
+target_ulong arg0, arg1, arg2; \
+arg0 = env->active_tc.gpr[4]; /*"a0"*/ \
+arg1 = env->active_tc.gpr[5]; /*"a1"*/ \
+arg2 = env->active_tc.gpr[6]; /*"a2"*/
+
+void helper_native_memcpy(CPUMIPSState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *dest = g2h(cs, arg0);
+void *src = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcpy(dest, src, n);
+env->active_tc.gpr[2] = (target_ulong)h2g(ret);
+}
+
+void helper_native_memcmp(CPUMIPSState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+int ret;
+void *s1 = g2h(cs, arg0);
+void *s2 = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcmp(s1, s2, n);
+env->active_tc.gpr[2] = ret;
+}
+
+void helper_native_memset(CPUMIPSState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *s = g2h(cs, arg0);
+int c = (int)arg1;
+size_t n = (size_t)arg2;
+ret = memset(s, c, n);
+env->active_tc.gpr[2] = (target_ulong)h2g(ret);
+}
+
+#endif
diff --git a/target/mips/tcg/translate.c b/target/mips/tcg/translate.c
index a6ca2e5a3b..d68ce6bc2f 100644
--- a/target/mips/tcg/translate.c
+++ b/target/mips/tcg/translate.c
@@ -36,6 +36,7 @@
 #include "qemu/qemu-print.h"
 #include "fpu_helper.h"
 #include "translate.h"
+#include "native/native-func.h"
 
 /*
  * Many sysemu-only helpers are not reachable for user-only.
@@ -13591,7 +13592,24 @@ static void decode_opc_special(CPUMIPSState *env, 
DisasContext *ctx)
 gen_helper_pmon(cpu_env, tcg_constant_i32(sa));
 #endif
 break;
-case OPC_SYSCALL:
+case OPC_SYSCALL:  /* 00 00 00 0C */
+if (native_bypass() && ((((ctx->opcode) >> 24) & 0xff) == 0x1)) {
+uint16_t sig =  (ctx->opcode) >> 8 & 0xffff;
+switch (sig) {
+case NATIVE_MEMCPY:
+gen_helper_native_memcpy(cpu_env);
+break;
+case NATIVE_MEMSET:
+gen_helper_native_memset(cpu_env);
+break;
+case NATIVE_MEMCMP:
+gen_helper_native_memcmp(cpu_env);
+break;
+default:
+gen_reserved_instruction(ctx);
+}
+break;
+}
 generate_exception_end(ctx, EXCP_SYSCALL);
 break;
 case OPC_BREAK:
diff --git a/target/mips/tcg/translate.h b/target/mips/tcg/translate.h
index 69f85841d2..f0112d88aa 100644
--- a/target/mips/tcg/translate.h
+++ b/target/mips/tcg/translate.h
@@ -237,3 +237,15 @@ static inline bool cpu_is_bigendian(DisasContext *ctx)
 }
 
 #endif
+
+/*
+ * Check if the native bypass feature is enabled.
+ */
+static inline bool native_bypass(void)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USER_NATIVE_CALL)
+return true;
+#else
+return false;
+#endif
+}
-- 
2.34.1




Re: [PATCH 2/3] migration/multifd: Protect accesses to migration_threads

2023-06-07 Thread Juan Quintela
Sounds good.

On Wed, Jun 7, 2023, 18:28 Peter Xu  wrote:

> On Wed, Jun 07, 2023 at 09:00:14AM -0300, Fabiano Rosas wrote:
> > >> diff --git a/migration/migration.c b/migration/migration.c
> > >> index e731fc98a1..b3b8345eb2 100644
> > >> --- a/migration/migration.c
> > >> +++ b/migration/migration.c
> > >> @@ -1146,6 +1146,7 @@ static void migrate_fd_cleanup(MigrationState
> *s)
> > >>  qemu_mutex_lock_iothread();
> > >>
> > >>  multifd_save_cleanup();
> > >> +qmp_migration_threads_cleanup();
> > >
> > > I think I will spare this one as the mutex is static, so we are not
> > > winning any memory back.
> > >
> >
> > Ok
>
> We could consider __attribute__((__constructor__)) in this case.
>
> --
> Peter Xu
>
>
>
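
For reference, a minimal sketch of what the __attribute__((__constructor__))
approach suggested above could look like -- illustrative only, the names below
are not taken from the actual series:

#include "qemu/osdep.h"
#include "qemu/thread.h"

/* Protect accesses to migration_threads with a statically allocated mutex
 * that is initialized before main() runs, so no explicit init or cleanup
 * call is needed in migrate_fd_cleanup(). */
static QemuMutex migration_threads_lock;

static void __attribute__((__constructor__)) migration_threads_lock_init(void)
{
    qemu_mutex_init(&migration_threads_lock);
}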


[PATCH v4] 9pfs: prevent opening special files (CVE-2023-2861)

2023-06-07 Thread Christian Schoenebeck
The 9p protocol does not specifically define how server shall behave when
client tries to open a special file, however from security POV it does
make sense for 9p server to prohibit opening any special file on host side
in general. A sane Linux 9p client for instance would never attempt to
open a special file on host side, it would always handle those exclusively
on its guest side. A malicious client however could potentially escape
from the exported 9p tree by creating and opening a device file on host
side.

With QEMU this could only be exploited in the following unsafe setups:

  - Running QEMU binary as root AND 9p 'local' fs driver AND 'passthrough'
security model.

or

  - Using 9p 'proxy' fs driver (which is running its helper daemon as
root).

These setups were already discouraged for safety reasons before,
however for obvious reasons we are now tightening behaviour on this.

Fixes: CVE-2023-2861
Reported-by: Yanwu Shen 
Reported-by: Jietao Xiao 
Reported-by: Jinku Li 
Reported-by: Wenbo Shen 
Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
---
 v3 -> v4:
 - Rename function check_is_regular_file_or_dir() ->
   close_if_special_file() and add detailed API comment.
 - Minor code style fix on open_regular().

 fsdev/virtfs-proxy-helper.c | 27 +++--
 hw/9pfs/9p-util.h   | 39 +
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/fsdev/virtfs-proxy-helper.c b/fsdev/virtfs-proxy-helper.c
index 5cafcd7703..d9511f429c 100644
--- a/fsdev/virtfs-proxy-helper.c
+++ b/fsdev/virtfs-proxy-helper.c
@@ -26,6 +26,7 @@
 #include "qemu/xattr.h"
 #include "9p-iov-marshal.h"
 #include "hw/9pfs/9p-proxy.h"
+#include "hw/9pfs/9p-util.h"
 #include "fsdev/9p-iov-marshal.h"
 
 #define PROGNAME "virtfs-proxy-helper"
@@ -338,6 +339,28 @@ static void resetugid(int suid, int sgid)
 }
 }
 
+/*
+ * Open regular file or directory. Attempts to open any special file are
+ * rejected.
+ *
+ * returns file descriptor or -1 on error
+ */
+static int open_regular(const char *pathname, int flags, mode_t mode)
+{
+int fd;
+
+fd = open(pathname, flags, mode);
+if (fd < 0) {
+return fd;
+}
+
+if (close_if_special_file(fd) < 0) {
+return -1;
+}
+
+return fd;
+}
+
 /*
  * send response in two parts
  * 1) ProxyHeader
@@ -682,7 +705,7 @@ static int do_create(struct iovec *iovec)
 if (ret < 0) {
 goto unmarshal_err_out;
 }
-ret = open(path.data, flags, mode);
+ret = open_regular(path.data, flags, mode);
 if (ret < 0) {
 ret = -errno;
 }
@@ -707,7 +730,7 @@ static int do_open(struct iovec *iovec)
 if (ret < 0) {
 goto err_out;
 }
-ret = open(path.data, flags);
+ret = open_regular(path.data, flags, 0);
 if (ret < 0) {
 ret = -errno;
 }
diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
index c314cf381d..df1b583a5e 100644
--- a/hw/9pfs/9p-util.h
+++ b/hw/9pfs/9p-util.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_9P_UTIL_H
 #define QEMU_9P_UTIL_H
 
+#include "qemu/error-report.h"
+
 #ifdef O_PATH
 #define O_PATH_9P_UTIL O_PATH
 #else
@@ -95,6 +97,7 @@ static inline int errno_to_dotl(int err) {
 #endif
 
 #define qemu_openat openat
+#define qemu_fstat  fstat
 #define qemu_fstatat    fstatat
 #define qemu_mkdirat    mkdirat
 #define qemu_renameat   renameat
@@ -108,6 +111,38 @@ static inline void close_preserve_errno(int fd)
 errno = serrno;
 }
 
+/**
+ * close_if_special_file() - Close @fd if neither regular file nor directory.
+ *
+ * @fd: file descriptor of open file
+ * Return: 0 on regular file or directory, -1 otherwise
+ *
+ * CVE-2023-2861: Prohibit opening any special file directly on host
+ * (especially device files), as a compromised client could potentially gain
+ * access outside exported tree under certain, unsafe setups. We expect
+ * client to handle I/O on special files exclusively on guest side.
+ */
+static inline int close_if_special_file(int fd)
+{
+struct stat stbuf;
+
+if (qemu_fstat(fd, &stbuf) < 0) {
+close_preserve_errno(fd);
+return -1;
+}
+if (!S_ISREG(stbuf.st_mode) && !S_ISDIR(stbuf.st_mode)) {
+error_report_once(
+"9p: broken or compromised client detected; attempt to open "
+"special file (i.e. neither regular file, nor directory)"
+);
+close(fd);
+errno = ENXIO;
+return -1;
+}
+
+return 0;
+}
+
 static inline int openat_dir(int dirfd, const char *name)
 {
 return qemu_openat(dirfd, name,
@@ -142,6 +177,10 @@ again:
 return -1;
 }
 
+if (close_if_special_file(fd) < 0) {
+return -1;
+}
+
 serrno = errno;
 /* O_NONBLOCK was only needed to open the file. Let's drop it. We don't
  * do that with O_PATH since fcntl(F_SETFL) isn't supported, and openat()
-- 
2.30.2




[RFC v2 6/6] linux-user: Add '-native-bypass' option

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 include/qemu/envlist.h |  1 +
 linux-user/main.c  | 23 +
 util/envlist.c | 56 ++
 3 files changed, 80 insertions(+)

diff --git a/include/qemu/envlist.h b/include/qemu/envlist.h
index 6006dfae44..865eb18e17 100644
--- a/include/qemu/envlist.h
+++ b/include/qemu/envlist.h
@@ -7,6 +7,7 @@ envlist_t *envlist_create(void);
 void envlist_free(envlist_t *);
 int envlist_setenv(envlist_t *, const char *);
 int envlist_unsetenv(envlist_t *, const char *);
+int envlist_appendenv(envlist_t *, const char *, const char *);
 int envlist_parse_set(envlist_t *, const char *);
 int envlist_parse_unset(envlist_t *, const char *);
 char **envlist_to_environ(const envlist_t *, size_t *);
diff --git a/linux-user/main.c b/linux-user/main.c
index 5e6b2e1714..313c116b3b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -125,6 +125,8 @@ static void usage(int exitcode);
 static const char *interp_prefix = CONFIG_QEMU_INTERP_PREFIX;
 const char *qemu_uname_release;
 
+static const char *native_lib;
+
 #if !defined(TARGET_DEFAULT_STACK_SIZE)
 /* XXX: on x86 MAP_GROWSDOWN only works if ESP <= address + 32, so
we allocate a bigger stack. Need a better solution, for example
@@ -293,6 +295,13 @@ static void handle_arg_set_env(const char *arg)
 free(r);
 }
 
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+static void handle_arg_native_bypass(const char *arg)
+{
+native_lib = arg;
+}
+#endif
+
 static void handle_arg_unset_env(const char *arg)
 {
 char *r, *p, *token;
@@ -522,6 +531,10 @@ static const struct qemu_argument arg_table[] = {
  "",   "Generate a /tmp/perf-${pid}.map file for perf"},
 {"jitdump","QEMU_JITDUMP", false, handle_arg_jitdump,
  "",   "Generate a jit-${pid}.dump file for perf"},
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+{"native-bypass", "QEMU_NATIVE_BYPASS", true, handle_arg_native_bypass,
+ "",   "native bypass for library calls in user mode only."},
+#endif
 {NULL, NULL, false, NULL, NULL, NULL}
 };
 
@@ -826,6 +839,16 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
+/* Set the library for native bypass  */
+if (native_lib != NULL) {
+char *token = malloc(strlen(native_lib) + 12);
+strcpy(token, "LD_PRELOAD=");
+strcat(token, native_lib);
+ if (envlist_appendenv(envlist, token, ":") != 0) {
+usage(EXIT_FAILURE);
+}
+}
+
 target_environ = envlist_to_environ(envlist, NULL);
 envlist_free(envlist);
 
diff --git a/util/envlist.c b/util/envlist.c
index db937c0427..713d52497e 100644
--- a/util/envlist.c
+++ b/util/envlist.c
@@ -201,6 +201,62 @@ envlist_unsetenv(envlist_t *envlist, const char *env)
 return (0);
 }
 
+/*
+ * Appends environment value to envlist. If the environment
+ * variable already exists, the new value is appended to the
+ * existing one.
+ *
+ * Returns 0 in success, errno otherwise.
+ */
+int
+envlist_appendenv(envlist_t *envlist, const char *env, const char *separator)
+{
+struct envlist_entry *entry = NULL;
+const char *eq_sign;
+size_t envname_len;
+
+if ((envlist == NULL) || (env == NULL)) {
+return (EINVAL);
+}
+
+/* find out first equals sign in given env */
+eq_sign = strchr(env, '=');
+if (eq_sign == NULL) {
+return (EINVAL);
+}
+envname_len = eq_sign - env + 1;
+
+/*
+ * If there already exists variable with given name,
+ * we append the new value to the existing one.
+ */
+for (entry = envlist->el_entries.lh_first; entry != NULL;
+entry = entry->ev_link.le_next) {
+if (strncmp(entry->ev_var, env, envname_len) == 0) {
+break;
+}
+}
+
+if (entry != NULL) {
+char *new_env_value = NULL;
+size_t new_env_len = strlen(entry->ev_var) + strlen(eq_sign)
++ strlen(separator) + 1;
+new_env_value = g_malloc(new_env_len);
+strcpy(new_env_value, entry->ev_var);
+strcat(new_env_value, separator);
+strcat(new_env_value, eq_sign + 1);
+g_free((char *)entry->ev_var);
+entry->ev_var = new_env_value;
+} else {
+envlist->el_count++;
+entry = g_malloc(sizeof(*entry));
+entry->ev_var = g_strdup(env);
+QLIST_INSERT_HEAD(&envlist->el_entries, entry, ev_link);
+}
+
+return (0);
+}
+
 /*
  * Returns given envlist as array of strings (in same form that
  * global variable environ is).  Caller must free returned memory
-- 
2.34.1
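
For illustration, a small usage sketch of the envlist_appendenv() helper added
above -- the paths are made up, this is not code from the series:

#include "qemu/envlist.h"

static void example_append(void)
{
    envlist_t *envlist = envlist_create();

    envlist_setenv(envlist, "LD_PRELOAD=/usr/lib/libfoo.so");
    envlist_appendenv(envlist, "LD_PRELOAD=/path/to/libnative.so", ":");
    /* envlist now holds "LD_PRELOAD=/usr/lib/libfoo.so:/path/to/libnative.so" */

    envlist_free(envlist);
}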




[RFC v2 2/6] Add the libnative library

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 common-user/native/libnative.c | 65 ++
 include/native/libnative.h | 11 ++
 include/native/native-func.h   | 11 ++
 3 files changed, 87 insertions(+)
 create mode 100644 common-user/native/libnative.c
 create mode 100644 include/native/libnative.h
 create mode 100644 include/native/native-func.h

diff --git a/common-user/native/libnative.c b/common-user/native/libnative.c
new file mode 100644
index 00..d40e43c6fe
--- /dev/null
+++ b/common-user/native/libnative.c
@@ -0,0 +1,65 @@
+#include 
+#include 
+
+#include "native/libnative.h"
+#include "native/native-func.h"
+
+#define STR_MACRO(str) #str
+#define STR(num) STR_MACRO(num)
+
+#if defined(TARGET_I386) || defined(TARGET_X86_64)
+
+/* unused opcode */
+#define __PREFIX_INSTR \
+".byte 0x0f,0xff;"
+
+#define NATIVE_CALL_EXPR(func) \
+__PREFIX_INSTR \
+".word " STR(func) ";" : ::
+#endif
+
+#if defined(TARGET_ARM) || defined(TARGET_AARCH64)
+
+/* unused syscall number */
+#define __PREFIX_INSTR \
+"svc 0xff;"
+
+#define NATIVE_CALL_EXPR(func) \
+__PREFIX_INSTR \
+".word " STR(func) ";" : ::
+
+#endif
+
+#if defined(TARGET_MIPS) || defined(TARGET_MIPS64)
+
+/* unused bytes in syscall instructions */
+#define NATIVE_CALL_EXPR(func) \
+".long " STR((0x1 << 24) + (func << 8) + 0xC) ";" : ::
+
+#endif
+
+void *memcpy(void *dest, const void *src, size_t n)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_MEMCPY));
+}
+
+int memcmp(const void *s1, const void *s2, size_t n)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_MEMCMP));
+}
+void *memset(void *s, int c, size_t n)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_MEMSET));
+}
+char *strcpy(char *dest, const char *src)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_STRCPY));
+}
+int strcmp(const char *s1, const char *s2)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_STRCMP));
+}
+char *strcat(char *dest, const char *src)
+{
+__asm__ volatile(NATIVE_CALL_EXPR(NATIVE_STRCAT));
+}
diff --git a/include/native/libnative.h b/include/native/libnative.h
new file mode 100644
index 00..d3c24f89f4
--- /dev/null
+++ b/include/native/libnative.h
@@ -0,0 +1,11 @@
+#ifndef __LIBNATIVE_H__
+#define __LIBNATIVE_H__
+
+void *memcpy(void *dest, const void *src, size_t n);
+int memcmp(const void *s1, const void *s2, size_t n);
+void *memset(void *s, int c, size_t n);
+char *strcpy(char *dest, const char *src);
+int strcmp(const char *s1, const char *s2);
+char *strcat(char *dest, const char *src);
+
+#endif /* __LIBNATIVE_H__ */
diff --git a/include/native/native-func.h b/include/native/native-func.h
new file mode 100644
index 00..d48a8e547a
--- /dev/null
+++ b/include/native/native-func.h
@@ -0,0 +1,11 @@
+#ifndef __NATIVE_FUNC_H__
+#define __NATIVE_FUNC_H__
+
+#define NATIVE_MEMCPY 0x1001
+#define NATIVE_MEMCMP 0x1002
+#define NATIVE_MEMSET 0x1003
+#define NATIVE_STRCPY 0x1004
+#define NATIVE_STRCMP 0x1005
+#define NATIVE_STRCAT 0x1006
+
+#endif
-- 
2.34.1




Re: [PATCH 0/1] update maintainers list for vfio-user & multi-process QEMU

2023-06-07 Thread Jag Raman



> On Jun 7, 2023, at 12:44 PM, Stefan Hajnoczi  wrote:
> 
> On Wed, 7 Jun 2023 at 11:58, Jagannathan Raman  wrote:
>> 
>> John Johnson doesn't work at Oracle anymore. I tried to contact him to
>> get his updated email address, but I haven't heard anything from him.
>> 
>> Jagannathan Raman (1):
>>  maintainers: update maintainers list for vfio-user & multi-process
>>QEMU
>> 
>> MAINTAINERS | 1 -
>> 1 file changed, 1 deletion(-)
> 
> JJ's last email to qemu-devel was in February 2023. Since he no longer
> works at Oracle, his email address is probably no longer functional.
> Therefore, I think it makes sense to remove him from MAINTAINERS for
> the time being. If he resumes work in this area he can be added back
> with a new email address.

I got it, thank you!

> 
> Reviewed-by: Stefan Hajnoczi 




[RFC v2 5/6] target/arm: Add native library calls

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 target/arm/helper.c| 47 ++
 target/arm/helper.h|  6 +
 target/arm/tcg/translate-a64.c | 22 
 target/arm/tcg/translate.c | 25 +-
 target/arm/tcg/translate.h | 19 ++
 5 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0b7fd2e7e6..03fbc3724b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -25,6 +25,7 @@
 #include "sysemu/tcg.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
+#include "exec/cpu_ldst.h"
 #ifdef CONFIG_TCG
 #include "semihosting/common-semi.h"
 #endif
@@ -12045,3 +12046,49 @@ void aarch64_sve_change_el(CPUARMState *env, int 
old_el,
 }
 }
 #endif
+
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+
+#define NATIVE_FN_W_3W()   \
+target_ulong arg0, arg1, arg2; \
+arg0 = env->regs[0];   \
+arg1 = env->regs[1];   \
+arg2 = env->regs[2];
+
+void helper_native_memcpy(CPUARMState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *dest = g2h(cs, arg0);
+void *src = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcpy(dest, src, n);
+env->regs[0] = (target_ulong)h2g(ret);
+}
+
+void helper_native_memcmp(CPUARMState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+int ret;
+void *s1 = g2h(cs, arg0);
+void *s2 = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcmp(s1, s2, n);
+env->regs[0] = ret;
+}
+
+void helper_native_memset(CPUARMState *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *s = g2h(cs, arg0);
+int c = (int)arg1;
+size_t n = (size_t)arg2;
+ret = memset(s, c, n);
+env->regs[0] = (target_ulong)h2g(ret);
+}
+
+#endif
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3335c2b10b..57144bf6fb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1038,6 +1038,12 @@ DEF_HELPER_FLAGS_5(gvec_uclamp_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_uclamp_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
 
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+DEF_HELPER_1(native_memcpy, void, env)
+DEF_HELPER_1(native_memcmp, void, env)
+DEF_HELPER_1(native_memset, void, env)
+#endif
+
 #ifdef TARGET_AARCH64
 #include "tcg/helper-a64.h"
 #include "tcg/helper-sve.h"
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 741a608739..04421af6c6 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -35,6 +35,7 @@
 #include "cpregs.h"
 #include "translate-a64.h"
 #include "qemu/atomic128.h"
+#include "native/native-func.h"
 
 static TCGv_i64 cpu_X[32];
 static TCGv_i64 cpu_pc;
@@ -2291,6 +2292,9 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 if (s->fgt_svc) {
 gen_exception_insn_el(s, 0, EXCP_UDEF, syndrome, 2);
 break;
+} else if (native_bypass() && imm16 == 0xff) {
+s->native_call_status = true;
+break;
 }
 gen_ss_advance(s);
 gen_exception_insn(s, 4, EXCP_SWI, syndrome);
@@ -14203,6 +14207,24 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 s->fp_access_checked = false;
 s->sve_access_checked = false;
 
+if (native_bypass() && s->native_call_status) {
+switch (insn) {
+case NATIVE_MEMCPY:
+gen_helper_native_memcpy(cpu_env);
+break;
+case NATIVE_MEMCMP:
+gen_helper_native_memcmp(cpu_env);
+break;
+case NATIVE_MEMSET:
+gen_helper_native_memset(cpu_env);
+break;
+default:
+unallocated_encoding(s);
+}
+s->native_call_status = false;
+return;
+}
+
 if (s->pstate_il) {
 /*
  * Illegal execution state. This has priority over BTI
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 7468476724..83ce0f7437 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -34,7 +34,7 @@
 #include "exec/helper-gen.h"
 #include "exec/log.h"
 #include "cpregs.h"
-
+#include "native/native-func.h"
 
 #define ENABLE_ARCH_4T    arm_dc_feature(s, ARM_FEATURE_V4T)
 #define ENABLE_ARCH_5 arm_dc_feature(s, ARM_FEATURE_V5)
@@ -58,6 +58,10 @@ TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
 
+#if defined(CONFIG_USER_ONLY) && !defined(TARGET_AARCH64)  \
+&& defined(CONFIG_USER_NATIVE_CALL)
+#endif
+
 #include "exec/gen-icount.h"
 
 static const char * const regnames[] =
@@ -8576,6 +8580,8 @@ static bool trans_SVC(DisasContext *s, arg_SVC *a)
 if (s->fgt_svc) {
 uint32_t syndrome = syn_aa32_svc(a->imm, s->thumb);
 

[RFC v2 1/6] build: Add configure options for native calls

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 Makefile|  4 +++
 common-user/native/Makefile.include |  9 ++
 common-user/native/Makefile.target  | 22 +
 configure   | 50 +
 docs/devel/build-system.rst |  4 +++
 meson.build |  8 +
 meson_options.txt   |  2 ++
 scripts/meson-buildoptions.sh   |  4 +++
 8 files changed, 103 insertions(+)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target

diff --git a/Makefile b/Makefile
index 3c7d67142f..923da109bf 100644
--- a/Makefile
+++ b/Makefile
@@ -185,6 +185,10 @@ SUBDIR_MAKEFLAGS=$(if $(V),,--no-print-directory --quiet)
 
 include $(SRC_PATH)/tests/Makefile.include
 
+ifeq ($(CONFIG_USER_NATIVE),y)
+   include $(SRC_PATH)/common-user/native/Makefile.include
+endif
+
 all: recurse-all
 
 ROMS_RULES=$(foreach t, all clean distclean, $(addsuffix /$(t), $(ROMS)))
diff --git a/common-user/native/Makefile.include 
b/common-user/native/Makefile.include
new file mode 100644
index 00..40d20bcd4c
--- /dev/null
+++ b/common-user/native/Makefile.include
@@ -0,0 +1,9 @@
+.PHONY: build-native
+build-native: $(NATIVE_TARGETS:%=build-native-library-%)
+$(NATIVE_TARGETS:%=build-native-library-%): build-native-library-%:
+   $(call quiet-command, \
+   $(MAKE) -C common-user/native/$* $(SUBDIR_MAKEFLAGS), \
+   "BUILD","$* native library")
+# endif
+
+all: build-native
diff --git a/common-user/native/Makefile.target 
b/common-user/native/Makefile.target
new file mode 100644
index 00..1038367b37
--- /dev/null
+++ b/common-user/native/Makefile.target
@@ -0,0 +1,22 @@
+# -*- Mode: makefile -*-
+#
+# Library for native calls 
+#
+
+all:
+-include ../config-host.mak
+-include config-target.mak
+
+CFLAGS+=-I$(SRC_PATH)/include -O1 -fPIC -shared -fno-stack-protector
+LDFLAGS+=
+
+SRC = $(SRC_PATH)/common-user/native/libnative.c
+TARGET = libnative.so
+
+all: $(TARGET)
+
+$(TARGET): $(SRC)
+   $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
+
+clean:
+   rm -f $(TARGET)
diff --git a/configure b/configure
index 2a556d14c9..cc94d10c98 100755
--- a/configure
+++ b/configure
@@ -275,6 +275,7 @@ use_containers="yes"
 gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb")
 gdb_arches=""
 werror=""
+user_native_call="disabled"
 
 # Don't accept a target_list environment variable.
 unset target_list
@@ -787,6 +788,10 @@ for opt do
   ;;
   --disable-vfio-user-server) vfio_user_server="disabled"
   ;;
+  --enable-user-native-call) user_native_call="enabled"
+  ;;
+  --disable-user-native-call) user_native_call="disabled"
+  ;;
   # everything else has the same name in configure and meson
   --*) meson_option_parse "$opt" "$optarg"
   ;;
@@ -1898,6 +1903,50 @@ if test "$tcg" = "enabled"; then
 fi
 )
 
+# common-user/native configuration
+native_flag_i386="-DTARGET_I386"
+native_flag_x86_64="-DTARGET_X86_64"
+native_flag_mips="-DTARGET_MIPS"
+native_flag_mips64="-DTARGET_MIPS64"
+native_flag_arm="-DTARGET_ARM"
+native_flag_aarch64="-DTARGET_AARCH64"
+
+(config_host_mak=common-user/native/config-host.mak
+mkdir -p common-user/native
+echo "# Automatically generated by configure - do not modify" > 
$config_host_mak
+echo "SRC_PATH=$source_path" >> $config_host_mak
+echo "HOST_CC=$host_cc" >> $config_host_mak
+
+native_targets=
+for target in $target_list; do
+  arch=${target%%-*}
+
+  case $target in
+*-linux-user|*-bsd-user)
+if probe_target_compiler $target || test -n "$container_image"; then
+mkdir -p "common-user/native/$target"
+config_target_mak=common-user/native/$target/config-target.mak
+ln -sf "$source_path/common-user/native/Makefile.target" 
"common-user/native/$target/Makefile"
+echo "# Automatically generated by configure - do not modify" > 
"$config_target_mak"
+echo "TARGET_NAME=$arch" >> "$config_target_mak"
+echo "TARGET=$target" >> "$config_target_mak"
+eval "target_native_flag=\${native_flag_$target_arch}"
+target_cflags="$target_cflags $target_native_flag"
+write_target_makefile "build-native-library-$target" >> 
"$config_target_mak"
+native_targets="$native_targets $target"
+fi
+  ;;
+  esac
+done
+
+# if native enabled
+if test "$user_native_call" = "enabled"; then
+echo "CONFIG_USER_NATIVE=y" >> config-host.mak
+echo "NATIVE_TARGETS=$native_targets" >> config-host.mak
+
+fi
+)
+
 if test "$skip_meson" = no; then
   cross="config-meson.cross.new"
   meson_quote() {
@@ -1980,6 +2029,7 @@ if test "$skip_meson" = no; then
   test "$smbd" != '' && meson_option_add "-Dsmbd=$smbd"
   test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
   test "$vfio_user_server" != auto && meson_option_add 
"-Dvfio_user_server=$vfio_user_server"
+  test "$user_native_call" != auto && meson_option_add 
"-Duser_native_call=$user_native_call"
   run_meson() {

[RFC v2 0/6] Native Library Calls

2023-06-07 Thread Yeqi Fu
This patch introduces a set of feature instructions for native calls
and provides helpers to translate these instructions to corresponding
native functions. A shared library is also implemented, where native
functions are rewritten as feature instructions. At runtime, user
programs load the shared library, and feature instructions are
executed when native functions are called. This patch is applicable
to user programs with architectures x86, x86_64, arm, aarch64, mips,
and mips64. To build, compile libnative.c into a shared library for
the user program's architecture and run the
'../configure --enable-user-native-call && make' command.

Yeqi Fu (6):
  build: Add configure options for native calls
  Add the libnative library
  target/i386: Add native library calls
  target/mips: Add native library calls
  target/arm: Add native library calls
  linux-user: Add '-native-bypass' option

 Makefile |  4 ++
 common-user/native/Makefile.include  |  9 
 common-user/native/Makefile.target   | 22 ++
 common-user/native/libnative.c   | 65 
 configure| 50 +
 docs/devel/build-system.rst  |  4 ++
 include/native/libnative.h   | 11 +
 include/native/native-func.h | 11 +
 include/qemu/envlist.h   |  1 +
 linux-user/main.c| 23 ++
 meson.build  |  8 
 meson_options.txt|  2 +
 scripts/meson-buildoptions.sh|  4 ++
 target/arm/helper.c  | 47 
 target/arm/helper.h  |  6 +++
 target/arm/tcg/translate-a64.c   | 22 ++
 target/arm/tcg/translate.c   | 25 ++-
 target/arm/tcg/translate.h   | 19 
 target/i386/helper.h |  6 +++
 target/i386/tcg/translate.c  | 20 +
 target/i386/tcg/user/meson.build |  1 +
 target/i386/tcg/user/native_helper.c | 65 
 target/mips/helper.h |  6 +++
 target/mips/tcg/meson.build  |  1 +
 target/mips/tcg/native_helper.c  | 55 +++
 target/mips/tcg/translate.c  | 20 -
 target/mips/tcg/translate.h  | 12 +
 util/envlist.c   | 56 
 28 files changed, 573 insertions(+), 2 deletions(-)
 create mode 100644 common-user/native/Makefile.include
 create mode 100644 common-user/native/Makefile.target
 create mode 100644 common-user/native/libnative.c
 create mode 100644 include/native/libnative.h
 create mode 100644 include/native/native-func.h
 create mode 100644 target/i386/tcg/user/native_helper.c
 create mode 100644 target/mips/tcg/native_helper.c

-- 
2.34.1




[RFC v2 3/6] target/i386: Add native library calls

2023-06-07 Thread Yeqi Fu
Signed-off-by: Yeqi Fu 
---
 target/i386/helper.h |  6 +++
 target/i386/tcg/translate.c  | 20 +
 target/i386/tcg/user/meson.build |  1 +
 target/i386/tcg/user/native_helper.c | 65 
 4 files changed, 92 insertions(+)
 create mode 100644 target/i386/tcg/user/native_helper.c

diff --git a/target/i386/helper.h b/target/i386/helper.h
index e627a93107..6c91655887 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -221,3 +221,9 @@ DEF_HELPER_3(rcrq, tl, env, tl, tl)
 #endif
 
 DEF_HELPER_1(rdrand, tl, env)
+
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+DEF_HELPER_1(native_memcpy, void, env)
+DEF_HELPER_1(native_memcmp, void, env)
+DEF_HELPER_1(native_memset, void, env)
+#endif
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 91c9c0c478..eb0c1e9566 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -33,6 +33,7 @@
 #include "helper-tcg.h"
 
 #include "exec/log.h"
+#include "native/native-func.h"
 
 #define PREFIX_REPZ   0x01
 #define PREFIX_REPNZ  0x02
@@ -6806,6 +6807,25 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x1d0 ... 0x1fe:
 disas_insn_new(s, cpu, b);
 break;
+/* One unknown opcode for native call */
+#if defined(CONFIG_USER_ONLY)  && defined(CONFIG_USER_NATIVE_CALL)
+case 0x1ff:
+uint16_t sig = x86_lduw_code(env, s);
+switch (sig) {
+case NATIVE_MEMCPY:
+gen_helper_native_memcpy(cpu_env);
+break;
+case NATIVE_MEMSET:
+gen_helper_native_memset(cpu_env);
+break;
+case NATIVE_MEMCMP:
+gen_helper_native_memcmp(cpu_env);
+break;
+default:
+goto unknown_op;
+}
+break;
+#endif
 default:
 goto unknown_op;
 }
diff --git a/target/i386/tcg/user/meson.build b/target/i386/tcg/user/meson.build
index 1df6bc4343..490808bd65 100644
--- a/target/i386/tcg/user/meson.build
+++ b/target/i386/tcg/user/meson.build
@@ -1,4 +1,5 @@
 i386_user_ss.add(when: ['CONFIG_TCG', 'CONFIG_USER_ONLY'], if_true: files(
   'excp_helper.c',
   'seg_helper.c',
+  'native_helper.c',
 ))
diff --git a/target/i386/tcg/user/native_helper.c 
b/target/i386/tcg/user/native_helper.c
new file mode 100644
index 00..4a9b98eee2
--- /dev/null
+++ b/target/i386/tcg/user/native_helper.c
@@ -0,0 +1,65 @@
+/*
+ *  native function call helpers
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "exec/exec-all.h"
+#include "exec/cpu_ldst.h"
+#include "tcg/helper-tcg.h"
+#include "tcg/seg_helper.h"
+
+#ifdef TARGET_X86_64
+#define NATIVE_FN_W_3W()   \
+target_ulong arg0, arg1, arg2; \
+arg0 = env->regs[R_EDI];   \
+arg1 = env->regs[R_ESI];   \
+arg2 = env->regs[R_EDX];
+#else
+/*
+ *  linux x86 has several calling conventions. The following implementation
+ *  is for the most commonly used cdecl calling convention.
+ */
+#define NATIVE_FN_W_3W()   \
+target_ulong arg0, arg1, arg2; \
+arg0 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 4); \
+arg1 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 8); \
+arg2 = *(target_ulong *)g2h(cs, env->regs[R_ESP] + 12);
+#endif
+
+void helper_native_memcpy(CPUX86State *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *dest = g2h(cs, arg0);
+void *src = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcpy(dest, src, n);
+env->regs[R_EAX] = (target_ulong)h2g(ret);
+}
+
+void helper_native_memcmp(CPUX86State *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+int ret;
+void *s1 = g2h(cs, arg0);
+void *s2 = g2h(cs, arg1);
+size_t n = (size_t)arg2;
+ret = memcmp(s1, s2, n);
+env->regs[R_EAX] = ret;
+}
+
+void helper_native_memset(CPUX86State *env)
+{
+CPUState *cs = env_cpu(env);
+NATIVE_FN_W_3W();
+void *ret;
+void *s = g2h(cs, arg0);
+int c = (int)arg1;
+size_t n = (size_t)arg2;
+ret = memset(s, c, n);
+env->regs[R_EAX] = (target_ulong)h2g(ret);
+}
-- 
2.34.1




Re: [PATCH 0/1] update maintainers list for vfio-user & multi-process QEMU

2023-06-07 Thread Stefan Hajnoczi
On Wed, 7 Jun 2023 at 11:58, Jagannathan Raman  wrote:
>
> John Johnson doesn't work at Oracle anymore. I tried to contact him to
> get his updated email address, but I haven't heard anything from him.
>
> Jagannathan Raman (1):
>   maintainers: update maintainers list for vfio-user & multi-process
> QEMU
>
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)

JJ's last email to qemu-devel was in February 2023. Since he no longer
works at Oracle, his email address is probably no longer functional.
Therefore, I think it makes sense to remove him from MAINTAINERS for
the time being. If he resumes work in this area he can be added back
with a new email address.

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH V2] oslib: qemu_clear_cloexec

2023-06-07 Thread Steven Sistare
Hi Paolo,
  Can I get an RB from you on this patch, since you maintain posix?
This is needed for live update, to preserve vfio device descriptors and
character device descriptors across the exec of the new qemu binary.
If yes, I will rebase to the tip and repost a V3.

- Steve

On 2/7/2023 2:03 PM, Steve Sistare wrote:
> Define qemu_clear_cloexec, analogous to qemu_set_cloexec.  This will be
> used to preserve selected descriptors during cpr.
> 
> Signed-off-by: Steve Sistare 
> Reviewed-by: Dr. David Alan Gilbert 
> Reviewed-by: Marc-André Lureau 
> ---
>  include/qemu/osdep.h | 9 +
>  util/oslib-posix.c   | 9 +
>  util/oslib-win32.c   | 4 
>  3 files changed, 22 insertions(+)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 88c9fac..9c8c536 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -618,6 +618,15 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t 
> count)
>  
>  void qemu_set_cloexec(int fd);
>  
> +/*
> + * Clear FD_CLOEXEC for a descriptor.
> + *
> + * The caller must guarantee that no other fork+exec's occur before the
> + * exec that is intended to inherit this descriptor, eg by suspending CPUs
> + * and blocking monitor commands.
> + */
> +void qemu_clear_cloexec(int fd);
> +
>  /* Return a dynamically allocated directory path that is appropriate for 
> storing
>   * local state.
>   *
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 59a891b..a8cc3d0 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -273,6 +273,15 @@ int qemu_socketpair(int domain, int type, int protocol, 
> int sv[2])
>  return ret;
>  }
>  
> +void qemu_clear_cloexec(int fd)
> +{
> +int f;
> +f = fcntl(fd, F_GETFD);
> +assert(f != -1);
> +f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
> +assert(f != -1);
> +}
> +
>  char *
>  qemu_get_local_state_dir(void)
>  {
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 07ade41..756bee3 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -222,6 +222,10 @@ void qemu_set_cloexec(int fd)
>  {
>  }
>  
> +void qemu_clear_cloexec(int fd)
> +{
> +}
> +
>  int qemu_get_thread_id(void)
>  {
>  return GetCurrentThreadId();
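
As a usage sketch (not code from the cpr series -- the fd and argv names are
made up), the new helper is meant to be called right before exec'ing the new
binary so that selected descriptors survive the exec:

#include "qemu/osdep.h"

static void exec_new_qemu(int preserved_fd, char *const new_argv[])
{
    /* Caller guarantees no other fork+exec happens in between. */
    qemu_clear_cloexec(preserved_fd);   /* drop FD_CLOEXEC so exec keeps it */
    execv(new_argv[0], new_argv);       /* new QEMU inherits preserved_fd */
}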



Re: [PATCH v2] target/riscv/vector_helper.c: Remove the check for extra tail elements

2023-06-07 Thread Weiwei Li



On 2023/6/7 17:16, Xiao Wang wrote:

Commit 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector
load / store instructions") added an extra check for LMUL fragmentation,
intended for setting the "rest tail elements" in the last register for a
segment load insn.

Actually, the max_elements derived in vext_ld*() won't be a fraction of
vector register size, since the lmul encoded in desc is emul, which has
already been adjusted to 1 for LMUL fragmentation case by vext_get_emul()
in trans_rvv.c.inc, for ld_stride(), ld_us(), ld_index() and ldff().

Besides, vext_get_emul() has also taken EEW/SEW into consideration, so no
need to call vext_get_total_elems() which would base on the emul to derive
another emul, the second emul would be incorrect when esz differs from sew.

Thus this patch removes the check for extra tail elements.

Fixes: 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector load / store 
instructions")

Signed-off-by: Xiao Wang 
---

Reviewed-by: Weiwei Li 

Weiwei Li

v2:
* Rebased on top of Alistair's riscv-to-apply.next branch.
---
  target/riscv/vector_helper.c | 22 ++
  1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 7505f9470a..f261e726c2 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -264,11 +264,10 @@ GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
  GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
  GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
  
-static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,

-   void *vd, uint32_t desc, uint32_t nf,
+static void vext_set_tail_elems_1s(target_ulong vl, void *vd,
+   uint32_t desc, uint32_t nf,
 uint32_t esz, uint32_t max_elems)
  {
-uint32_t total_elems, vlenb, registers_used;
  uint32_t vta = vext_vta(desc);
  int k;
  
@@ -276,19 +275,10 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,

  return;
  }
  
-total_elems = vext_get_total_elems(env, desc, esz);

-vlenb = riscv_cpu_cfg(env)->vlen >> 3;
-
  for (k = 0; k < nf; ++k) {
  vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
(k * max_elems + max_elems) * esz);
  }
-
-if (nf * max_elems % total_elems != 0) {
-registers_used = ((nf * max_elems) * esz + (vlenb - 1)) / vlenb;
-vext_set_elems_1s(vd, vta, (nf * max_elems) * esz,
-  registers_used * vlenb);
-}
  }
  
  /*

@@ -324,7 +314,7 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
  }
  env->vstart = 0;
  
-vext_set_tail_elems_1s(env, env->vl, vd, desc, nf, esz, max_elems);

+vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
  }
  
  #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)\

@@ -383,7 +373,7 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
  }
  env->vstart = 0;
  
-vext_set_tail_elems_1s(env, evl, vd, desc, nf, esz, max_elems);

+vext_set_tail_elems_1s(evl, vd, desc, nf, esz, max_elems);
  }
  
  /*

@@ -504,7 +494,7 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
  }
  env->vstart = 0;
  
-vext_set_tail_elems_1s(env, env->vl, vd, desc, nf, esz, max_elems);

+vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
  }
  
  #define GEN_VEXT_LD_INDEX(NAME, ETYPE, INDEX_FN, LOAD_FN)  \

@@ -634,7 +624,7 @@ ProbeSuccess:
  }
  env->vstart = 0;
  
-vext_set_tail_elems_1s(env, env->vl, vd, desc, nf, esz, max_elems);

+vext_set_tail_elems_1s(env->vl, vd, desc, nf, esz, max_elems);
  }
  
  #define GEN_VEXT_LDFF(NAME, ETYPE, LOAD_FN)   \





Re: [PATCH v5 1/3] hw/i386/pc: Refactor logic to set SMBIOS defaults

2023-06-07 Thread Igor Mammedov
On Tue, 6 Jun 2023 21:49:37 -0500
Suravee Suthikulpanit  wrote:

> Into a helper function pc_machine_init_smbios() in preparation for
> subsequent code to upgrade default SMBIOS entry point type.
> 
> Then, call the helper function from the pc_machine_initfn() to eliminate
> duplicate code in pc_q35.c and pc_pixx.c. However, this changes the
> ordering of when the smbios_set_defaults() is called to before
> pc_machine_set_smbios_ep() (i.e. before handling the user specified
> QEMU option "-M ...,smbios-entry-point-type=[32|64]" to override
> the default type.)
> 
> Therefore, also call the helper function in pc_machine_set_smbios_ep()
> to update the defaults.
> 
> There is no functional change.

with 2/3 amended as suggested, this patch is not necessary 
and 2/3 and 3/3 would do the job just fine

> 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  hw/i386/pc.c  | 24 +++-
>  hw/i386/pc_piix.c |  9 -
>  hw/i386/pc_q35.c  |  8 
>  3 files changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index bb62c994fa..b720dc67b6 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1756,6 +1756,22 @@ static void 
> pc_machine_set_default_bus_bypass_iommu(Object *obj, bool value,
>  pcms->default_bus_bypass_iommu = value;
>  }
>  
> +static void pc_machine_init_smbios(PCMachineState *pcms)
> +{
> +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +MachineClass *mc = MACHINE_GET_CLASS(pcms);
> +
> +if (!pcmc->smbios_defaults) {
> +return;
> +}
> +
> +/* These values are guest ABI, do not change */
> +smbios_set_defaults("QEMU", mc->desc,
> +mc->name, pcmc->smbios_legacy_mode,
> +pcmc->smbios_uuid_encoded,
> +pcms->smbios_entry_point_type);
> +}
> +
>  static void pc_machine_get_smbios_ep(Object *obj, Visitor *v, const char 
> *name,
>   void *opaque, Error **errp)
>  {
> @@ -1768,9 +1784,14 @@ static void pc_machine_get_smbios_ep(Object *obj, 
> Visitor *v, const char *name,
>  static void pc_machine_set_smbios_ep(Object *obj, Visitor *v, const char 
> *name,
>   void *opaque, Error **errp)
>  {
> +SmbiosEntryPointType ep_type;
>  PCMachineState *pcms = PC_MACHINE(obj);
>  
> -visit_type_SmbiosEntryPointType(v, name, &pcms->smbios_entry_point_type, 
> errp);
> +if (!visit_type_SmbiosEntryPointType(v, name, &ep_type, errp)) {
> +return;
> +}
> +pcms->smbios_entry_point_type = ep_type;
> +pc_machine_init_smbios(pcms);
>  }
>  
>  static void pc_machine_get_max_ram_below_4g(Object *obj, Visitor *v,
> @@ -1878,6 +1899,7 @@ static void pc_machine_initfn(Object *obj)
>  object_property_add_alias(OBJECT(pcms), "pcspk-audiodev",
>OBJECT(pcms->pcspk), "audiodev");
>  cxl_machine_init(obj, >cxl_devices_state);
> +pc_machine_init_smbios(pcms);
>  }
>  
>  int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index d5b0dcd1fe..da6ba4eeb4 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -198,15 +198,6 @@ static void pc_init1(MachineState *machine,
>  
>  pc_guest_info_init(pcms);
>  
> -if (pcmc->smbios_defaults) {
> -MachineClass *mc = MACHINE_GET_CLASS(machine);
> -/* These values are guest ABI, do not change */
> -smbios_set_defaults("QEMU", mc->desc,
> -mc->name, pcmc->smbios_legacy_mode,
> -pcmc->smbios_uuid_encoded,
> -pcms->smbios_entry_point_type);
> -}
> -
>  /* allocate ram and load rom/bios */
>  if (!xen_enabled()) {
>  pc_memory_init(pcms, system_memory, rom_memory, hole64_size);
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 6155427e48..a58cd1d3ea 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -198,14 +198,6 @@ static void pc_q35_init(MachineState *machine)
>  
>  pc_guest_info_init(pcms);
>  
> -if (pcmc->smbios_defaults) {
> -/* These values are guest ABI, do not change */
> -smbios_set_defaults("QEMU", mc->desc,
> -mc->name, pcmc->smbios_legacy_mode,
> -pcmc->smbios_uuid_encoded,
> -pcms->smbios_entry_point_type);
> -}
> -
>  /* create pci host bus */
>  q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
>  




[PULL 5/5] iotests: fix 194: filter out racy postcopy-active event

2023-06-07 Thread Richard Henderson
From: Vladimir Sementsov-Ogievskiy 

The event is racy: it will not appear in the output if bitmap is
migrated during downtime period of migration and postcopy phase is not
started.

Fixes: ae00aa239847 "iotests: 194: test also migration of dirty bitmap"
Reported-by: Richard Henderson 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20230607143606.1557395-1-vsement...@yandex-team.ru>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 tests/qemu-iotests/194 | 5 +
 tests/qemu-iotests/194.out | 1 -
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 68894371f5..c0ce82dd25 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -74,6 +74,11 @@ with iotests.FilePath('source.img') as source_img_path, \
 
 while True:
 event1 = source_vm.event_wait('MIGRATION')
+if event1['data']['status'] == 'postcopy-active':
+# This event is racy, it depends do we really do postcopy or bitmap
+# was migrated during downtime (and no data to migrate in postcopy
+# phase). So, don't log it.
+continue
 iotests.log(event1, filters=[iotests.filter_qmp_event])
 if event1['data']['status'] in ('completed', 'failed'):
 iotests.log('Gracefully ending the `drive-mirror` job on 
source...')
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index 4e6df1565a..376ed1d2e6 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -14,7 +14,6 @@ Starting migration...
 {"return": {}}
 {"data": {"status": "setup"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"data": {"status": "active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"status": "postcopy-active"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 Gracefully ending the `drive-mirror` job on source...
 {"return": {}}
-- 
2.34.1




Re: [PATCH v2 5/8] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-06-07 Thread Niklas Cassel
On Wed, Jun 07, 2023 at 06:01:17PM +0200, Niklas Cassel wrote:
> On Mon, Jun 05, 2023 at 08:19:43PM -0400, John Snow wrote:
> > On Thu, Jun 1, 2023 at 9:46 AM Niklas Cassel  wrote:
> > >
> > > From: Niklas Cassel 
> > >
> > > For NCQ, PxCI is cleared on command queued successfully.
> > > For non-NCQ, PxCI is cleared on command completed successfully.
> > > Successfully means ERR_STAT, BUSY and DRQ are all cleared.
> > >
> > > A command that has ERR_STAT set, does not get to clear PxCI.
> > > See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
> > > and 5.3.16.5 ERR:FatalTaskfile.
> > >
> > > In the case of non-NCQ commands, not clearing PxCI is needed in order
> > > for host software to be able to see which command slot that failed.
> > >
> > > Signed-off-by: Niklas Cassel 
> > 
> > This patch causes the ahci test suite to hang. You might just need to
> > update the AHCI test suite.
> > 
> > "make check" will hang on the ahci-test as of this patch.
> 
> Argh :)
> 
> Is there any simple way to run only the ahci test suite?

To answer my own question:
QTEST_QEMU_BINARY=./build/qemu-system-x86_64 QTEST_QEMU_IMG=./build/qemu-img 
gtester -k --verbose -m=quick build/tests/qtest/ahci-test -o test_log.xml


Kind regards,
Niklas
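
As an aside, the PxCI rule quoted above (PxCI is only cleared for a non-NCQ
command once ERR_STAT, BUSY and DRQ are all clear) boils down to something
like the following sketch. This is illustrative only, not the actual
hw/ide/ahci.c hunk; the bit values are the standard ATA status bits:

#include <stdint.h>

#define SK_ERR_STAT   0x01
#define SK_DRQ_STAT   0x08
#define SK_BUSY_STAT  0x80

static inline uint32_t pxci_after_completion(uint32_t px_ci, unsigned slot,
                                             uint8_t tf_status)
{
    if (tf_status & (SK_ERR_STAT | SK_DRQ_STAT | SK_BUSY_STAT)) {
        return px_ci;                 /* failed slot stays visible to host */
    }
    return px_ci & ~(1u << slot);     /* completed cleanly: clear PxCI bit */
}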

[PULL 1/3] meson: fix "static build" entry in summary

2023-06-07 Thread Paolo Bonzini
Fixes: a0cbd2e8496 ("meson: use prefer_static option", 2023-05-18)
Signed-off-by: Paolo Bonzini 
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 553c8e0b9c5..c03326c922e 100644
--- a/meson.build
+++ b/meson.build
@@ -4088,7 +4088,7 @@ summary_info += {'QEMU_LDFLAGS':  ' 
'.join(qemu_ldflags)}
 summary_info += {'profiler':  get_option('profiler')}
 summary_info += {'link-time optimization (LTO)': get_option('b_lto')}
 summary_info += {'PIE':   get_option('b_pie')}
-summary_info += {'static build':  config_host.has_key('CONFIG_STATIC')}
+summary_info += {'static build':  get_option('prefer_static')}
 summary_info += {'malloc trim support': has_malloc_trim}
 summary_info += {'membarrier':have_membarrier}
 summary_info += {'debug graph lock':  get_option('debug_graph_lock')}
-- 
2.40.1




[PULL 2/3] configure: check for $download value properly

2023-06-07 Thread Paolo Bonzini
From: Michal Privoznik 

If configure was invoked with --disable-download and git
submodules were not checked out, a warning is produced and the
configure script fails. But the $download variable (which
reflects the enable/disable download argument) is checked for in
a weird fashion:

  test -f "$download" = disabled

Drop the '-f' to check for the actual value of the variable.

Fixes: 2019cabfee0 ("meson: subprojects: replace submodules with wrap files", 
2023-06-06)
Signed-off-by: Michal Privoznik 
Signed-off-by: Paolo Bonzini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 8765b88e12f..8a638dd82ae 100755
--- a/configure
+++ b/configure
@@ -767,7 +767,7 @@ if test "$plugins" = "yes" -a "$tcg" = "disabled"; then
 fi
 
 if ! test -f "$source_path/subprojects/keycodemapdb/README" \
-&& test -f "$download" = disabled
+&& test "$download" = disabled
 then
 echo
 echo "ERROR: missing subprojects"
-- 
2.40.1




[PULL 3/3] tests: fp: remove unused submodules

2023-06-07 Thread Paolo Bonzini
tests/fp/berkeley-softfloat-3 and tests/fp/berkeley-testfloat-3
have been replaced by subprojects, so remove the now-unnecessary
submodules.

Reported-by: Michal Privoznik 
Signed-off-by: Paolo Bonzini 
---
 tests/fp/berkeley-softfloat-3 | 1 -
 tests/fp/berkeley-testfloat-3 | 1 -
 2 files changed, 2 deletions(-)
 delete mode 16 tests/fp/berkeley-softfloat-3
 delete mode 16 tests/fp/berkeley-testfloat-3

diff --git a/tests/fp/berkeley-softfloat-3 b/tests/fp/berkeley-softfloat-3
deleted file mode 16
index b64af41c327..000
--- a/tests/fp/berkeley-softfloat-3
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit b64af41c3276f97f0e181920400ee056b9c88037
diff --git a/tests/fp/berkeley-testfloat-3 b/tests/fp/berkeley-testfloat-3
deleted file mode 16
index 40619cbb3bf..000
--- a/tests/fp/berkeley-testfloat-3
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 40619cbb3bf32872df8c53cc457039229428a263
-- 
2.40.1




[PULL 3/5] target/arm: Only include tcg/oversized-guest.h if CONFIG_TCG

2023-06-07 Thread Richard Henderson
Fixes the build for --disable-tcg.

This header is only needed for cross-hosting.  Without CONFIG_TCG,
we know this is an AArch64 host, CONFIG_ATOMIC64 will be set, and
the TCG_OVERSIZED_GUEST block will never be compiled.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index b2dc223525..37bcb17a9e 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -14,8 +14,9 @@
 #include "cpu.h"
 #include "internals.h"
 #include "idau.h"
-#include "tcg/oversized-guest.h"
-
+#ifdef CONFIG_TCG
+# include "tcg/oversized-guest.h"
+#endif
 
 typedef struct S1Translate {
 ARMMMUIdx in_mmu_idx;
-- 
2.34.1




[PULL 5/6] target/tricore: Fix wrong PSW for call insns

2023-06-07 Thread Bastian Koppelmann
We were copying PSW into a local variable, updated PSW.CDE in the local
copy and never wrote it back. So when we called save_context_upper() we were
using the non-local version of PSW which did not contain the updated
PSW.CDE.

Signed-off-by: Bastian Koppelmann 
Message-Id: <20230526061946.54514-6-kbast...@mail.uni-paderborn.de>
---
 target/tricore/op_helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/tricore/op_helper.c b/target/tricore/op_helper.c
index 6fd2cbe20f..54f54811d9 100644
--- a/target/tricore/op_helper.c
+++ b/target/tricore/op_helper.c
@@ -2447,6 +2447,8 @@ void helper_call(CPUTriCoreState *env, uint32_t next_pc)
 }
 /* PSW.CDE = 1;*/
 psw |= MASK_PSW_CDE;
+psw_write(env, psw);
+
 /* tmp_FCX = FCX; */
 tmp_FCX = env->FCX;
 /* EA = {FCX.FCXS, 6'b0, FCX.FCXO, 6'b0}; */
-- 
2.40.1




[PULL 6/6] tests/tcg/tricore: Add recursion test for CSAs

2023-06-07 Thread Bastian Koppelmann
Signed-off-by: Bastian Koppelmann 
Message-Id: <20230526061946.54514-7-kbast...@mail.uni-paderborn.de>
---
 tests/tcg/tricore/Makefile.softmmu-target |  3 ++-
 tests/tcg/tricore/c/test_context_save_areas.c | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/tricore/c/test_context_save_areas.c

diff --git a/tests/tcg/tricore/Makefile.softmmu-target 
b/tests/tcg/tricore/Makefile.softmmu-target
index f051444991..aff7c1b580 100644
--- a/tests/tcg/tricore/Makefile.softmmu-target
+++ b/tests/tcg/tricore/Makefile.softmmu-target
@@ -4,7 +4,7 @@ C_TESTS_PATH = $(TESTS_PATH)/c
 
 LDFLAGS = -T$(TESTS_PATH)/link.ld --mcpu=tc162
 ASFLAGS = -mtc162
-CFLAGS = -mtc162 -c
+CFLAGS = -mtc162 -c -I$(TESTS_PATH)
 
 TESTS += test_abs.asm.tst
 TESTS += test_bmerge.asm.tst
@@ -23,6 +23,7 @@ TESTS += test_msub.asm.tst
 TESTS += test_muls.asm.tst
 
 TESTS += test_boot_to_main.c.tst
+TESTS += test_context_save_areas.c.tst
 
 QEMU_OPTS += -M tricore_testboard -cpu tc27x -nographic -kernel
 
diff --git a/tests/tcg/tricore/c/test_context_save_areas.c 
b/tests/tcg/tricore/c/test_context_save_areas.c
new file mode 100644
index 00..a300ee2f9c
--- /dev/null
+++ b/tests/tcg/tricore/c/test_context_save_areas.c
@@ -0,0 +1,15 @@
+#include "testdev_assert.h"
+
+static int fib(int n)
+{
+if (n == 1 || n == 2) {
+return 1;
+}
+return fib(n - 2) + fib(n - 1);
+}
+
+int main(int argc, char **argv)
+{
+testdev_assert(fib(10) == 55);
+return 0;
+}
-- 
2.40.1




[PULL 3/6] tests/tcg/tricore: Add first C program

2023-06-07 Thread Bastian Koppelmann
this allows us to exercise the startup code used by GCC to call main().

Signed-off-by: Bastian Koppelmann 
Message-Id: <20230526061946.54514-4-kbast...@mail.uni-paderborn.de>
---
 configure |   1 +
 tests/tcg/tricore/Makefile.softmmu-target |  13 +
 tests/tcg/tricore/c/crt0-tc2x.S   | 335 ++
 tests/tcg/tricore/c/test_boot_to_main.c   |  13 +
 tests/tcg/tricore/c/testdev_assert.h  |  18 ++
 tests/tcg/tricore/link.ld |  16 ++
 6 files changed, 396 insertions(+)
 create mode 100644 tests/tcg/tricore/c/crt0-tc2x.S
 create mode 100644 tests/tcg/tricore/c/test_boot_to_main.c
 create mode 100644 tests/tcg/tricore/c/testdev_assert.h

diff --git a/configure b/configure
index 8765b88e12..768e674633 100755
--- a/configure
+++ b/configure
@@ -1383,6 +1383,7 @@ probe_target_compiler() {
 container_cross_prefix=tricore-
 container_cross_as=tricore-as
 container_cross_ld=tricore-ld
+container_cross_cc=tricore-gcc
 break
 ;;
   x86_64)
diff --git a/tests/tcg/tricore/Makefile.softmmu-target 
b/tests/tcg/tricore/Makefile.softmmu-target
index 29c75acfb3..f051444991 100644
--- a/tests/tcg/tricore/Makefile.softmmu-target
+++ b/tests/tcg/tricore/Makefile.softmmu-target
@@ -1,8 +1,10 @@
 TESTS_PATH = $(SRC_PATH)/tests/tcg/tricore
 ASM_TESTS_PATH = $(TESTS_PATH)/asm
+C_TESTS_PATH = $(TESTS_PATH)/c
 
 LDFLAGS = -T$(TESTS_PATH)/link.ld --mcpu=tc162
 ASFLAGS = -mtc162
+CFLAGS = -mtc162 -c
 
 TESTS += test_abs.asm.tst
 TESTS += test_bmerge.asm.tst
@@ -20,6 +22,8 @@ TESTS += test_madd.asm.tst
 TESTS += test_msub.asm.tst
 TESTS += test_muls.asm.tst
 
+TESTS += test_boot_to_main.c.tst
+
 QEMU_OPTS += -M tricore_testboard -cpu tc27x -nographic -kernel
 
 %.pS: $(ASM_TESTS_PATH)/%.S
@@ -31,5 +35,14 @@ QEMU_OPTS += -M tricore_testboard -cpu tc27x -nographic 
-kernel
 %.asm.tst: %.o
$(LD) $(LDFLAGS) $< -o $@
 
+crt0-tc2x.o: $(C_TESTS_PATH)/crt0-tc2x.S
+   $(AS) $(ASFLAGS) -o $@ $<
+
+%.o: $(C_TESTS_PATH)/%.c
+   $(CC) $(CFLAGS) -o $@ $<
+
+%.c.tst: %.o crt0-tc2x.o
+   $(LD) $(LDFLAGS) -o $@ $^
+
 # We don't currently support the multiarch system tests
 undefine MULTIARCH_TESTS
diff --git a/tests/tcg/tricore/c/crt0-tc2x.S b/tests/tcg/tricore/c/crt0-tc2x.S
new file mode 100644
index 00..3100da123c
--- /dev/null
+++ b/tests/tcg/tricore/c/crt0-tc2x.S
@@ -0,0 +1,335 @@
+/*
+ * crt0-tc2x.S -- Startup code for GNU/TriCore applications.
+ *
+ * Copyright (C) 1998-2014 HighTec EDV-Systeme GmbH.
+ *
+ * This file is part of GCC.
+ *
+ * GCC is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3, or (at your option)
+ * any later version.
+ *
+ * GCC is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * .  */
+
+/* Define the Derivate Name as a hexvalue. This value
+ * is built-in defined in tricore-c.c (from tricore-devices.c)
+ * the derivate number as a hexvalue (e.g. TC1796 => 0x1796
+ * This name will be used in the memory.x Memory description to
+ * to confirm that the crt0.o and the memory.x will be get from
+ * same directory
+ */
+.section ".startup_code", "ax", @progbits
+.global _start
+.type _start,@function
+
+/* default BMI header (only TC2xxx devices) */
+.word   0x
+.word   0xb3590070
+.word   0x
+.word   0x
+.word   0x
+.word   0x
+.word   0x791eb864
+.word   0x86e1479b
+
+_start:
+.code32
+j   _startaddr
+.align  2
+
+_startaddr:
+/*
+ * initialize user and interrupt stack pointers
+ */
+movh.a  %sp,hi:__USTACK # load %sp
+lea %sp,[%sp]lo:__USTACK
+movh%d0,hi:__ISTACK # load $isp
+addi%d0,%d0,lo:__ISTACK
+mtcr$isp,%d0
+isync
+
+#;  install trap handlers
+
+movh%d0,hi:first_trap_table #; load $btv
+addi%d0,%d0,lo:first_trap_table
+mtcr$btv,%d0
+isync
+
+/*
+ * initialize call depth counter
+ */
+
+mfcr%d0,$psw
+or  %d0,%d0,0x7f# disable call depth counting
+andn%d0,%d0,0x80# clear CDE bit
+mtcr$psw,%d0
+isync
+
+/*
+ * initialize access to system global registers
+ 

[PULL 4/6] target/tricore: Refactor PCXI/ICR register fields

2023-06-07 Thread Bastian Koppelmann
Starting from ISA version 1.6.1 (previously known as 1.6P/E), some
bitfields in PCXI and ICR have changed. We also refactor these
registers using the register fields API.
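
For readers unfamiliar with hw/registerfields.h: a FIELD() declaration only
defines shift/length/mask constants, and the value is then read and written
with FIELD_EX32()/FIELD_DP32(). A minimal illustration (not part of the
patch), using the names added below:

    uint32_t pie = FIELD_EX32(env->PCXI, PCXI, PIE_161);       /* read bit 21  */
    env->PCXI = FIELD_DP32(env->PCXI, PCXI, PIE_161, pie);     /* write bit 21 */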

Signed-off-by: Bastian Koppelmann 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1453
Message-Id: <20230526061946.54514-5-kbast...@mail.uni-paderborn.de>
---
 target/tricore/cpu.h   | 39 -
 target/tricore/helper.c| 45 
 target/tricore/op_helper.c | 85 +++---
 target/tricore/translate.c | 10 -
 4 files changed, 123 insertions(+), 56 deletions(-)

diff --git a/target/tricore/cpu.h b/target/tricore/cpu.h
index 47d0ffb745..d98a3fb671 100644
--- a/target/tricore/cpu.h
+++ b/target/tricore/cpu.h
@@ -21,6 +21,7 @@
 #define TRICORE_CPU_H
 
 #include "cpu-qom.h"
+#include "hw/registerfields.h"
 #include "exec/cpu-defs.h"
 #include "qemu/cpu-float.h"
 #include "tricore-defs.h"
@@ -199,13 +200,33 @@ struct ArchCPU {
 hwaddr tricore_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
 void tricore_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
 
-
-#define MASK_PCXI_PCPN 0xff00
-#define MASK_PCXI_PIE_1_3  0x0080
-#define MASK_PCXI_PIE_1_6  0x0020
-#define MASK_PCXI_UL   0x0040
-#define MASK_PCXI_PCXS 0x000f
-#define MASK_PCXI_PCXO 0x
+FIELD(PCXI, PCPN_13, 24, 8)
+FIELD(PCXI, PCPN_161, 22, 8)
+FIELD(PCXI, PIE_13, 23, 1)
+FIELD(PCXI, PIE_161, 21, 1)
+FIELD(PCXI, UL_13, 22, 1)
+FIELD(PCXI, UL_161, 20, 1)
+FIELD(PCXI, PCXS, 16, 4)
+FIELD(PCXI, PCXO, 0, 16)
+uint32_t pcxi_get_ul(CPUTriCoreState *env);
+uint32_t pcxi_get_pie(CPUTriCoreState *env);
+uint32_t pcxi_get_pcpn(CPUTriCoreState *env);
+uint32_t pcxi_get_pcxs(CPUTriCoreState *env);
+uint32_t pcxi_get_pcxo(CPUTriCoreState *env);
+void pcxi_set_ul(CPUTriCoreState *env, uint32_t val);
+void pcxi_set_pie(CPUTriCoreState *env, uint32_t val);
+void pcxi_set_pcpn(CPUTriCoreState *env, uint32_t val);
+
+FIELD(ICR, IE_161, 15, 1)
+FIELD(ICR, IE_13, 8, 1)
+FIELD(ICR, PIPN, 16, 8)
+FIELD(ICR, CCPN, 0, 8)
+
+uint32_t icr_get_ie(CPUTriCoreState *env);
+uint32_t icr_get_ccpn(CPUTriCoreState *env);
+
+void icr_set_ccpn(CPUTriCoreState *env, uint32_t val);
+void icr_set_ie(CPUTriCoreState *env, uint32_t val);
 
 #define MASK_PSW_USB 0xff00
 #define MASK_USB_C   0x8000
@@ -228,10 +249,6 @@ void tricore_cpu_dump_state(CPUState *cpu, FILE *f, int 
flags);
 #define MASK_CPUID_MOD_32B 0xff00
 #define MASK_CPUID_REV 0x00ff
 
-#define MASK_ICR_PIPN 0x00ff
-#define MASK_ICR_IE_1_3   0x0100
-#define MASK_ICR_IE_1_6   0x8000
-#define MASK_ICR_CCPN 0x00ff
 
 #define MASK_FCX_FCXS 0x000f
 #define MASK_FCX_FCXO 0x
diff --git a/target/tricore/helper.c b/target/tricore/helper.c
index 114685cce4..284a749e50 100644
--- a/target/tricore/helper.c
+++ b/target/tricore/helper.c
@@ -17,6 +17,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "hw/registerfields.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "fpu/softfloat-helpers.h"
@@ -152,3 +153,47 @@ void psw_write(CPUTriCoreState *env, uint32_t val)
 
 fpu_set_state(env);
 }
+
+#define FIELD_GETTER_WITH_FEATURE(NAME, REG, FIELD, FEATURE) \
+uint32_t NAME(CPUTriCoreState *env) \
+{\
+if (tricore_feature(env, TRICORE_FEATURE_##FEATURE)) {   \
+return FIELD_EX32(env->REG, REG, FIELD ## _ ## FEATURE); \
+}\
+return FIELD_EX32(env->REG, REG, FIELD ## _13);  \
+}
+
+#define FIELD_GETTER(NAME, REG, FIELD)   \
+uint32_t NAME(CPUTriCoreState *env) \
+{\
+return FIELD_EX32(env->REG, REG, FIELD); \
+}
+
+#define FIELD_SETTER_WITH_FEATURE(NAME, REG, FIELD, FEATURE)  \
+void NAME(CPUTriCoreState *env, uint32_t val)\
+{ \
+if (tricore_feature(env, TRICORE_FEATURE_##FEATURE)) {\
+env->REG = FIELD_DP32(env->REG, REG, FIELD ## _ ## FEATURE, val); \
+} \
+env->REG = FIELD_DP32(env->REG, REG, FIELD ## _13, val);  \
+}
+
+#define FIELD_SETTER(NAME, REG, FIELD)\
+void NAME(CPUTriCoreState *env, uint32_t val)\
+{ \
+env->REG = FIELD_DP32(env->REG, REG, FIELD, val); \
+}
+
+FIELD_GETTER_WITH_FEATURE(pcxi_get_pcpn, PCXI, PCPN, 161)
+FIELD_SETTER_WITH_FEATURE(pcxi_set_pcpn, PCXI, PCPN, 161)
+FIELD_GETTER_WITH_FEATURE(pcxi_get_pie, PCXI, PIE, 161)
+FIELD_SETTER_WITH_FEATURE(pcxi_set_pie, PCXI, PIE, 161)
+FIELD_GETTER_WITH_FEATURE(pcxi_get_ul, PCXI, UL, 161)
+FIELD_SETTER_WITH_FEATURE(pcxi_set_ul, PCXI, UL, 161)
+FIELD_GETTER(pcxi_get_pcxs, PCXI, PCXS)

[PULL 0/6] tricore queue

2023-06-07 Thread Bastian Koppelmann
The following changes since commit f5e6786de4815751b0a3d2235c760361f228ea48:

  Merge tag 'pull-target-arm-20230606' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-06-06 
12:11:34 -0700)

are available in the Git repository at:

  https://github.com/bkoppelmann/qemu.git tags/pull-tricore-20230607

for you to fetch changes up to e926c94171ae37397c8c4b54cef60e5c7ebbf243:

  tests/tcg/tricore: Add recursion test for CSAs (2023-06-07 18:20:51 +0200)


- Refactor PCXI/ICR field handling in newer ISA versions
- Add simple tests written in C


Bastian Koppelmann (6):
  tests/tcg/tricore: Move asm tests into 'asm' directory
  tests/tcg/tricore: Uses label for memory addresses
  tests/tcg/tricore: Add first C program
  target/tricore: Refactor PCXI/ICR register fields
  target/tricore: Fix wrong PSW for call insns
  tests/tcg/tricore: Add recursion test for CSAs

 configure |   1 +
 target/tricore/cpu.h  |  39 ++-
 target/tricore/helper.c   |  45 
 target/tricore/op_helper.c|  87 +++
 target/tricore/translate.c|  10 +-
 tests/tcg/tricore/Makefile.softmmu-target |  49 ++--
 tests/tcg/tricore/{ => asm}/macros.h  |   1 -
 tests/tcg/tricore/{ => asm}/test_abs.S|   0
 tests/tcg/tricore/{ => asm}/test_bmerge.S |   0
 tests/tcg/tricore/{ => asm}/test_clz.S|   0
 tests/tcg/tricore/{ => asm}/test_dextr.S  |   0
 tests/tcg/tricore/{ => asm}/test_dvstep.S |   0
 tests/tcg/tricore/{ => asm}/test_fadd.S   |   0
 tests/tcg/tricore/{ => asm}/test_fmul.S   |   0
 tests/tcg/tricore/{ => asm}/test_ftoi.S   |   0
 tests/tcg/tricore/{ => asm}/test_imask.S  |   0
 tests/tcg/tricore/{ => asm}/test_insert.S |   0
 tests/tcg/tricore/{ => asm}/test_ld_bu.S  |   4 +-
 tests/tcg/tricore/asm/test_ld_h.S |  15 ++
 tests/tcg/tricore/{ => asm}/test_madd.S   |   0
 tests/tcg/tricore/{ => asm}/test_msub.S   |   0
 tests/tcg/tricore/{ => asm}/test_muls.S   |   0
 tests/tcg/tricore/c/crt0-tc2x.S   | 335 ++
 tests/tcg/tricore/c/test_boot_to_main.c   |  13 +
 tests/tcg/tricore/c/test_context_save_areas.c |  15 ++
 tests/tcg/tricore/c/testdev_assert.h  |  18 ++
 tests/tcg/tricore/link.ld |  16 ++
 tests/tcg/tricore/test_ld_h.S |  15 --
 28 files changed, 572 insertions(+), 91 deletions(-)
 rename tests/tcg/tricore/{ => asm}/macros.h (99%)
 rename tests/tcg/tricore/{ => asm}/test_abs.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_bmerge.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_clz.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_dextr.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_dvstep.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_fadd.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_fmul.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_ftoi.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_imask.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_insert.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_ld_bu.S (68%)
 create mode 100644 tests/tcg/tricore/asm/test_ld_h.S
 rename tests/tcg/tricore/{ => asm}/test_madd.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_msub.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_muls.S (100%)
 create mode 100644 tests/tcg/tricore/c/crt0-tc2x.S
 create mode 100644 tests/tcg/tricore/c/test_boot_to_main.c
 create mode 100644 tests/tcg/tricore/c/test_context_save_areas.c
 create mode 100644 tests/tcg/tricore/c/testdev_assert.h
 delete mode 100644 tests/tcg/tricore/test_ld_h.S



[PULL 2/6] tests/tcg/tricore: Uses label for memory addresses

2023-06-07 Thread Bastian Koppelmann
The linker might rearrange sections, so let's reference memory by label
name instead of addr + off.

Signed-off-by: Bastian Koppelmann 
Message-Id: <20230526061946.54514-3-kbast...@mail.uni-paderborn.de>
---
 tests/tcg/tricore/asm/macros.h | 1 -
 tests/tcg/tricore/asm/test_ld_bu.S | 4 ++--
 tests/tcg/tricore/asm/test_ld_h.S  | 8 
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/tests/tcg/tricore/asm/macros.h b/tests/tcg/tricore/asm/macros.h
index 3df2e0de82..b5087b5c97 100644
--- a/tests/tcg/tricore/asm/macros.h
+++ b/tests/tcg/tricore/asm/macros.h
@@ -25,7 +25,6 @@
 
 #define AREG_ADDR %a0
 #define AREG_CORRECT_RESULT %a3
-#define MEM_BASE_ADDR 0xd000
 
 #define DREG_DEV_ADDR %a15
 
diff --git a/tests/tcg/tricore/asm/test_ld_bu.S 
b/tests/tcg/tricore/asm/test_ld_bu.S
index ff9dac128b..4a1f40c37b 100644
--- a/tests/tcg/tricore/asm/test_ld_bu.S
+++ b/tests/tcg/tricore/asm/test_ld_bu.S
@@ -9,7 +9,7 @@ _start:
 #expect. addr reg val after load
 #   insn  num  expect. load value |  pattern for loading
 # || ||  |
-TEST_LD(ld.bu, 1, 0xff, MEM_BASE_ADDR + 4, [+AREG_ADDR]4) # pre_inc
-TEST_LD(ld.bu, 2, 0xad, MEM_BASE_ADDR + 4, [AREG_ADDR+]4) # post_inc
+TEST_LD(ld.bu, 1, 0xff, test_data + 4, [+AREG_ADDR]4) # pre_inc
+TEST_LD(ld.bu, 2, 0xad, test_data + 4, [AREG_ADDR+]4) # post_inc
 
 TEST_PASSFAIL
diff --git a/tests/tcg/tricore/asm/test_ld_h.S 
b/tests/tcg/tricore/asm/test_ld_h.S
index d3c157a046..f5e4959198 100644
--- a/tests/tcg/tricore/asm/test_ld_h.S
+++ b/tests/tcg/tricore/asm/test_ld_h.S
@@ -7,9 +7,9 @@ test_data:
 .global _start
 _start:
 #   expect. addr reg val after load
-#  insn  num  expect. load value |  pattern for loading
-#|| ||  |
-TEST_LD(ld.h, 1, 0xaffe, MEM_BASE_ADDR, [AREG_ADDR]2)
-TEST_LD_SRO(ld.h, 2, 0x22ff, MEM_BASE_ADDR, [AREG_ADDR]4)
+#  insn  num expect. load value |pattern for loading
+#|| |   |  |
+TEST_LD(ld.h, 1, 0xaffe, test_data, [AREG_ADDR]2)
+TEST_LD_SRO(ld.h, 2, 0x22ff, test_data, [AREG_ADDR]4)
 
 TEST_PASSFAIL
-- 
2.40.1




[PULL 1/6] tests/tcg/tricore: Move asm tests into 'asm' directory

2023-06-07 Thread Bastian Koppelmann
This separates these tests from the upcoming tests written in C.
Also rename the compiled tests to 'test_.asm.tst'.

Signed-off-by: Bastian Koppelmann 
Message-Id: <20230526061946.54514-2-kbast...@mail.uni-paderborn.de>
---
 tests/tcg/tricore/Makefile.softmmu-target | 35 ---
 tests/tcg/tricore/{ => asm}/macros.h  |  0
 tests/tcg/tricore/{ => asm}/test_abs.S|  0
 tests/tcg/tricore/{ => asm}/test_bmerge.S |  0
 tests/tcg/tricore/{ => asm}/test_clz.S|  0
 tests/tcg/tricore/{ => asm}/test_dextr.S  |  0
 tests/tcg/tricore/{ => asm}/test_dvstep.S |  0
 tests/tcg/tricore/{ => asm}/test_fadd.S   |  0
 tests/tcg/tricore/{ => asm}/test_fmul.S   |  0
 tests/tcg/tricore/{ => asm}/test_ftoi.S   |  0
 tests/tcg/tricore/{ => asm}/test_imask.S  |  0
 tests/tcg/tricore/{ => asm}/test_insert.S |  0
 tests/tcg/tricore/{ => asm}/test_ld_bu.S  |  0
 tests/tcg/tricore/{ => asm}/test_ld_h.S   |  0
 tests/tcg/tricore/{ => asm}/test_madd.S   |  0
 tests/tcg/tricore/{ => asm}/test_msub.S   |  0
 tests/tcg/tricore/{ => asm}/test_muls.S   |  0
 17 files changed, 18 insertions(+), 17 deletions(-)
 rename tests/tcg/tricore/{ => asm}/macros.h (100%)
 rename tests/tcg/tricore/{ => asm}/test_abs.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_bmerge.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_clz.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_dextr.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_dvstep.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_fadd.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_fmul.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_ftoi.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_imask.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_insert.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_ld_bu.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_ld_h.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_madd.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_msub.S (100%)
 rename tests/tcg/tricore/{ => asm}/test_muls.S (100%)

diff --git a/tests/tcg/tricore/Makefile.softmmu-target 
b/tests/tcg/tricore/Makefile.softmmu-target
index 49e573bc3b..29c75acfb3 100644
--- a/tests/tcg/tricore/Makefile.softmmu-target
+++ b/tests/tcg/tricore/Makefile.softmmu-target
@@ -1,33 +1,34 @@
 TESTS_PATH = $(SRC_PATH)/tests/tcg/tricore
+ASM_TESTS_PATH = $(TESTS_PATH)/asm
 
 LDFLAGS = -T$(TESTS_PATH)/link.ld --mcpu=tc162
 ASFLAGS = -mtc162
 
-TESTS += test_abs.tst
-TESTS += test_bmerge.tst
-TESTS += test_clz.tst
-TESTS += test_dextr.tst
-TESTS += test_dvstep.tst
-TESTS += test_fadd.tst
-TESTS += test_fmul.tst
-TESTS += test_ftoi.tst
-TESTS += test_imask.tst
-TESTS += test_insert.tst
-TESTS += test_ld_bu.tst
-TESTS += test_ld_h.tst
-TESTS += test_madd.tst
-TESTS += test_msub.tst
-TESTS += test_muls.tst
+TESTS += test_abs.asm.tst
+TESTS += test_bmerge.asm.tst
+TESTS += test_clz.asm.tst
+TESTS += test_dextr.asm.tst
+TESTS += test_dvstep.asm.tst
+TESTS += test_fadd.asm.tst
+TESTS += test_fmul.asm.tst
+TESTS += test_ftoi.asm.tst
+TESTS += test_imask.asm.tst
+TESTS += test_insert.asm.tst
+TESTS += test_ld_bu.asm.tst
+TESTS += test_ld_h.asm.tst
+TESTS += test_madd.asm.tst
+TESTS += test_msub.asm.tst
+TESTS += test_muls.asm.tst
 
 QEMU_OPTS += -M tricore_testboard -cpu tc27x -nographic -kernel
 
-%.pS: $(TESTS_PATH)/%.S
+%.pS: $(ASM_TESTS_PATH)/%.S
$(HOST_CC) -E -o $@ $<
 
 %.o: %.pS
$(AS) $(ASFLAGS) -o $@ $<
 
-%.tst: %.o
+%.asm.tst: %.o
$(LD) $(LDFLAGS) $< -o $@
 
 # We don't currently support the multiarch system tests
diff --git a/tests/tcg/tricore/macros.h b/tests/tcg/tricore/asm/macros.h
similarity index 100%
rename from tests/tcg/tricore/macros.h
rename to tests/tcg/tricore/asm/macros.h
diff --git a/tests/tcg/tricore/test_abs.S b/tests/tcg/tricore/asm/test_abs.S
similarity index 100%
rename from tests/tcg/tricore/test_abs.S
rename to tests/tcg/tricore/asm/test_abs.S
diff --git a/tests/tcg/tricore/test_bmerge.S 
b/tests/tcg/tricore/asm/test_bmerge.S
similarity index 100%
rename from tests/tcg/tricore/test_bmerge.S
rename to tests/tcg/tricore/asm/test_bmerge.S
diff --git a/tests/tcg/tricore/test_clz.S b/tests/tcg/tricore/asm/test_clz.S
similarity index 100%
rename from tests/tcg/tricore/test_clz.S
rename to tests/tcg/tricore/asm/test_clz.S
diff --git a/tests/tcg/tricore/test_dextr.S b/tests/tcg/tricore/asm/test_dextr.S
similarity index 100%
rename from tests/tcg/tricore/test_dextr.S
rename to tests/tcg/tricore/asm/test_dextr.S
diff --git a/tests/tcg/tricore/test_dvstep.S 
b/tests/tcg/tricore/asm/test_dvstep.S
similarity index 100%
rename from tests/tcg/tricore/test_dvstep.S
rename to tests/tcg/tricore/asm/test_dvstep.S
diff --git a/tests/tcg/tricore/test_fadd.S b/tests/tcg/tricore/asm/test_fadd.S
similarity index 100%
rename from tests/tcg/tricore/test_fadd.S
rename to tests/tcg/tricore/asm/test_fadd.S
diff --git a/tests/tcg/tricore/test_fmul.S b/tests/tcg/tricore/asm/test_fmul.S
similarity index 100%
rename from tests/tcg/tricore/test_fmul.S
rename to 

Re: [PATCH v2 0/3] migration: Fix multifd cancel test

2023-06-07 Thread Peter Xu
On Wed, Jun 07, 2023 at 01:13:03PM -0300, Fabiano Rosas wrote:
> Fabiano Rosas (3):
>   migration/multifd: Rename threadinfo.c functions
>   migration/multifd: Protect accesses to migration_threads
>   tests/qtest: Re-enable multifd cancel test

Reviewed-by: Peter Xu 

-- 
Peter Xu




[PATCH v2 0/3] migration: Fix multifd cancel test

2023-06-07 Thread Fabiano Rosas
v2:
- patch 1: dropped the qmp_ prefix;

- patch 2: dropped the qemu_mutex_destroy;

   stopped moving the _remove functions (don't strictly need it
   anymore since not destroying the mutex explicitly);

   added the lock to protect the loop in
   qmp_query_migrationthreads;

   added __attribute__((constructor)).

CI run: https://gitlab.com/farosas/qemu/-/pipelines/892563231

v1:
https://lore.kernel.org/r/20230606144551.24367-1-faro...@suse.de

When doing cleanup of the multifd send threads we're calling
QLIST_REMOVE concurrently on the migration_threads list. This seems to
be the source of the crashes we've seen on the
multifd/tcp/plain/cancel tests.

I'm running the test in a loop and after a few dozen iterations I see
the crash in dmesg.

  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
  QEMU_TEST_FLAKY_TESTS=1 \
  ./tests/qtest/migration-test -p /x86_64/migration/multifd/tcp/plain/cancel

  multifdsend_10[11382]: segfault at 18 ip 564b77de1e25 sp
  7fdf767fb610 error 6 in qemu-system-x86_64[564b777b4000+e1c000]
  Code: ec 10 48 89 7d f8 48 83 7d f8 00 74 58 48 8b 45 f8 48 8b 40 10
  48 85 c0 74 14 48 8b 45 f8 48 8b 40 10 48 8b 55 f8 48 8b 52 18 <48> 89
  50 18 48 8b 45 f8 48 8b 40 18 48 8b 55 f8 48 8b 52 10 48 89

The offending instruction is a mov dereferencing the
thread->node.le_next pointer at QLIST_REMOVE in MigrationThreadDel:

  void MigrationThreadDel(MigrationThread *thread)
  {
  if (thread) {
  QLIST_REMOVE(thread, node);
  g_free(thread);
  }
  }

where:
  #define QLIST_REMOVE(elm, field) do {   \
  if ((elm)->field.le_next != NULL)   \
  (elm)->field.le_next->field.le_prev =   \ <-- HERE
  (elm)->field.le_prev;   \
  *(elm)->field.le_prev = (elm)->field.le_next;   \
  (elm)->field.le_next = NULL;\
  (elm)->field.le_prev = NULL;\
  } while (/*CONSTCOND*/0)

The MigrationThreadDel function is called from the multifd threads and
is not under any lock, so several calls can race when accessing the
list.

(I actually hit this first on my fixed-ram branch which changes some
synchronization in multifd and makes the issue more frequent)
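
Conceptually, patch 2 serializes all accesses to the list with a mutex; a
minimal sketch of the removal path (the real change is in the patch below):

    QEMU_LOCK_GUARD(&migration_threads_lock);
    if (thread) {
        QLIST_REMOVE(thread, node);
        g_free(thread);
    }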

CI run: https://gitlab.com/farosas/qemu/-/pipelines/891000519

Fabiano Rosas (3):
  migration/multifd: Rename threadinfo.c functions
  migration/multifd: Protect accesses to migration_threads
  tests/qtest: Re-enable multifd cancel test

 migration/migration.c|  4 ++--
 migration/multifd.c  |  4 ++--
 migration/threadinfo.c   | 19 ---
 migration/threadinfo.h   |  7 ++-
 tests/qtest/migration-test.c | 10 ++
 5 files changed, 24 insertions(+), 20 deletions(-)

-- 
2.35.3




[PATCH v2 3/3] tests/qtest: Re-enable multifd cancel test

2023-06-07 Thread Fabiano Rosas
We've found the source of flakiness in this test, so re-enable it.

Reviewed-by: Juan Quintela 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b0c355bbd9..800ad23b75 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2778,14 +2778,8 @@ int main(int argc, char **argv)
 }
 qtest_add_func("/migration/multifd/tcp/plain/none",
test_multifd_tcp_none);
-/*
- * This test is flaky and sometimes fails in CI and otherwise:
- * don't run unless user opts in via environment variable.
- */
-if (getenv("QEMU_TEST_FLAKY_TESTS")) {
-qtest_add_func("/migration/multifd/tcp/plain/cancel",
-   test_multifd_tcp_cancel);
-}
+qtest_add_func("/migration/multifd/tcp/plain/cancel",
+   test_multifd_tcp_cancel);
 qtest_add_func("/migration/multifd/tcp/plain/zlib",
test_multifd_tcp_zlib);
 #ifdef CONFIG_ZSTD
-- 
2.35.3




[PATCH v2 2/3] migration/multifd: Protect accesses to migration_threads

2023-06-07 Thread Fabiano Rosas
This doubly linked list is common for all the multifd and migration
threads so we need to avoid concurrent access.

Add a mutex to protect the data from concurrent access. This fixes a
crash when removing two MigrationThread objects from the list at the
same time during cleanup of multifd threads.

Fixes: 671326201d ("migration: Introduce interface query-migrationthreads")
Signed-off-by: Fabiano Rosas 
---
 migration/threadinfo.c | 15 ++-
 migration/threadinfo.h |  2 --
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 3dd9b14ae6..262990dd75 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -10,23 +10,35 @@
  *  See the COPYING file in the top-level directory.
  */
 
+#include "qemu/osdep.h"
+#include "qemu/queue.h"
+#include "qemu/lockable.h"
 #include "threadinfo.h"
 
+QemuMutex migration_threads_lock;
 static QLIST_HEAD(, MigrationThread) migration_threads;
 
+static void __attribute__((constructor)) migration_threads_init(void)
+{
+qemu_mutex_init(&migration_threads_lock);
+}
+
 MigrationThread *migration_threads_add(const char *name, int thread_id)
 {
 MigrationThread *thread =  g_new0(MigrationThread, 1);
 thread->name = name;
 thread->thread_id = thread_id;
 
-QLIST_INSERT_HEAD(&migration_threads, thread, node);
+WITH_QEMU_LOCK_GUARD(&migration_threads_lock) {
+QLIST_INSERT_HEAD(&migration_threads, thread, node);
+}
 
 return thread;
 }
 
 void migration_threads_remove(MigrationThread *thread)
 {
+QEMU_LOCK_GUARD(&migration_threads_lock);
 if (thread) {
 QLIST_REMOVE(thread, node);
 g_free(thread);
@@ -39,6 +51,7 @@ MigrationThreadInfoList *qmp_query_migrationthreads(Error 
**errp)
 MigrationThreadInfoList **tail = 
 MigrationThread *thread = NULL;
 
+QEMU_LOCK_GUARD(&migration_threads_lock);
 QLIST_FOREACH(thread, &migration_threads, node) {
 MigrationThreadInfo *info = g_new0(MigrationThreadInfo, 1);
 info->name = g_strdup(thread->name);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 8aa6999d58..2f356ff312 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -10,8 +10,6 @@
  *  See the COPYING file in the top-level directory.
  */
 
-#include "qemu/queue.h"
-#include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-migration.h"
 
-- 
2.35.3




[PATCH v2 1/3] migration/multifd: Rename threadinfo.c functions

2023-06-07 Thread Fabiano Rosas
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Juan Quintela 
Reviewed-by: Philippe Mathieu-Daudé 
---
 migration/migration.c  | 4 ++--
 migration/multifd.c| 4 ++--
 migration/threadinfo.c | 4 ++--
 migration/threadinfo.h | 5 ++---
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f6ea..3a001dd042 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2922,7 +2922,7 @@ static void *migration_thread(void *opaque)
 MigThrError thr_error;
 bool urgent = false;
 
-thread = MigrationThreadAdd("live_migration", qemu_get_thread_id());
+thread = migration_threads_add("live_migration", qemu_get_thread_id());
 
 rcu_register_thread();
 
@@ -3000,7 +3000,7 @@ static void *migration_thread(void *opaque)
 migration_iteration_finish(s);
 object_unref(OBJECT(s));
 rcu_unregister_thread();
-MigrationThreadDel(thread);
+migration_threads_remove(thread);
 return NULL;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 3387d8277f..4c6cee6547 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -651,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
 int ret = 0;
 bool use_zero_copy_send = migrate_zero_copy_send();
 
-thread = MigrationThreadAdd(p->name, qemu_get_thread_id());
+thread = migration_threads_add(p->name, qemu_get_thread_id());
 
 trace_multifd_send_thread_start(p->id);
 rcu_register_thread();
@@ -767,7 +767,7 @@ out:
 qemu_mutex_unlock(&p->mutex);
 
 rcu_unregister_thread();
-MigrationThreadDel(thread);
+migration_threads_remove(thread);
 trace_multifd_send_thread_end(p->id, p->num_packets, 
p->total_normal_pages);
 
 return NULL;
diff --git a/migration/threadinfo.c b/migration/threadinfo.c
index 1de8b31855..3dd9b14ae6 100644
--- a/migration/threadinfo.c
+++ b/migration/threadinfo.c
@@ -14,7 +14,7 @@
 
 static QLIST_HEAD(, MigrationThread) migration_threads;
 
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id)
+MigrationThread *migration_threads_add(const char *name, int thread_id)
 {
 MigrationThread *thread =  g_new0(MigrationThread, 1);
 thread->name = name;
@@ -25,7 +25,7 @@ MigrationThread *MigrationThreadAdd(const char *name, int 
thread_id)
 return thread;
 }
 
-void MigrationThreadDel(MigrationThread *thread)
+void migration_threads_remove(MigrationThread *thread)
 {
 if (thread) {
 QLIST_REMOVE(thread, node);
diff --git a/migration/threadinfo.h b/migration/threadinfo.h
index 4d69423c0a..8aa6999d58 100644
--- a/migration/threadinfo.h
+++ b/migration/threadinfo.h
@@ -23,6 +23,5 @@ struct MigrationThread {
 QLIST_ENTRY(MigrationThread) node;
 };
 
-MigrationThread *MigrationThreadAdd(const char *name, int thread_id);
-
-void MigrationThreadDel(MigrationThread *info);
+MigrationThread *migration_threads_add(const char *name, int thread_id);
+void migration_threads_remove(MigrationThread *info);
-- 
2.35.3




Re: [PATCH] vdpa: fix not using CVQ buffer in case of error

2023-06-07 Thread Michael Tokarev

07.06.2023 16:52, Eugenio Perez Martin wrote:

On Wed, Jun 7, 2023 at 12:11 PM Michael Tokarev  wrote:

..

Again, smells like a stable material, is it not?

Please Cc: qemu-sta...@nongnu.org for other changes you think should be
applied to stable qemu series.


Sorry, I totally forgot. This one should go to stable, yes.


That's okay, nothing to be sorry about. You did a good job
already fixing the issues.

Queued up.

Thank you!

/mjt



Re: [PATCH v2 5/8] hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set

2023-06-07 Thread Niklas Cassel
On Mon, Jun 05, 2023 at 08:19:43PM -0400, John Snow wrote:
> On Thu, Jun 1, 2023 at 9:46 AM Niklas Cassel  wrote:
> >
> > From: Niklas Cassel 
> >
> > For NCQ, PxCI is cleared on command queued successfully.
> > For non-NCQ, PxCI is cleared on command completed successfully.
> > Successfully means ERR_STAT, BUSY and DRQ are all cleared.
> >
> > A command that has ERR_STAT set, does not get to clear PxCI.
> > See AHCI 1.3.1, section 5.3.8, states RegFIS:Entry and RegFIS:ClearCI,
> > and 5.3.16.5 ERR:FatalTaskfile.
> >
> > In the case of non-NCQ commands, not clearing PxCI is needed in order
> > for host software to be able to see which command slot that failed.
> >
> > Signed-off-by: Niklas Cassel 
> 
> This patch causes the ahci test suite to hang. You might just need to
> update the AHCI test suite.
> 
> "make check" will hang on the ahci-test as of this patch.

Argh :)

Is there any simple way to run only the ahci test suite?

"make check" and "make check-qtest" are running many tests that I'm not
interested in.


Kind regards,
Niklas

[PATCH 1/1] maintainers: update maintainers list for vfio-user & multi-process QEMU

2023-06-07 Thread Jagannathan Raman
Signed-off-by: Jagannathan Raman 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 436b3f0afefd..4a80a385118d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3786,7 +3786,6 @@ F: tests/tcg/aarch64/system/semiheap.c
 Multi-process QEMU
 M: Elena Ufimtseva 
 M: Jagannathan Raman 
-M: John G Johnson 
 S: Maintained
 F: docs/devel/multi-process.rst
 F: docs/system/multi-process.rst
-- 
2.20.1




[PATCH 0/1] update maintainers list for vfio-user & multi-process QEMU

2023-06-07 Thread Jagannathan Raman
John Johnson doesn't work at Oracle anymore. I tried to contact him to
get his updated email address, but I haven't heard anything from him.

Jagannathan Raman (1):
  maintainers: update maintainers list for vfio-user & multi-process
QEMU

 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

-- 
2.20.1




Re: [PATCH V3] migration: simplify blockers

2023-06-07 Thread Peter Xu
On Wed, Jun 07, 2023 at 07:35:32AM -0700, Steve Sistare wrote:
> Modify migrate_add_blocker and migrate_del_blocker to take an Error **
> reason.  This allows migration to own the Error object, so that if
> an error occurs, migration code can free the Error and clear the client
> handle, simplifying client code.
> 
> This is also a pre-requisite for future patches that will add a mode
> argument to migration requests to support live update, and will maintain
> a list of blockers for each mode.  A blocker may apply to a single mode
> or to multiple modes, and passing Error** will allow one Error object to
> be registered for multiple modes.
> 
> No functional change.
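
For illustration, client code with the Error ** interface would look roughly
like this (a sketch assuming the signatures described above; "my_blocker" and
the helper names are hypothetical):

    static Error *my_blocker;

    static void my_feature_enable(Error **errp)
    {
        error_setg(&my_blocker, "feature X blocks migration");
        if (migrate_add_blocker(&my_blocker, errp) < 0) {
            /* migration already freed the Error and cleared my_blocker */
            return;
        }
    }

    static void my_feature_disable(void)
    {
        migrate_del_blocker(&my_blocker);  /* frees the Error, NULLs my_blocker */
    }
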
> 
> Signed-off-by: Steve Sistare 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH V9 00/46] Live Update

2023-06-07 Thread Michael Galaxy
Another option could be to expose "-migrate-mode-disable" (instead of 
enable) and just enable all 3 modes by default,
since we are already required to switch from "normal" mode to a 
CPR-specific mode when it is time to do a live update,
if the intention is to preserve the capability to completely prevent a 
running QEMU from using these modes

before the VM starts up.

- Michael

On 6/6/23 17:15, Michael Galaxy wrote:

Hi Steve,

In the current design you have, we have to specify both the command 
line parameter "-migrate-mode-enable cpr-reboot"

*and* issue the monitor command "migrate_set_parameter mode cpr-${mode}".

Is it possible to opt-in to the CPR mode just once over the monitor 
instead of having to specify it twice on the command line?
This would also match the live migration model: You do not need to 
necessarily "opt in" to live migration mode through
a command line parameter, you simply request it when you need to. Can 
CPR behave the same way?


This would also make switching over to a CPR-capable version of QEMU 
much simpler and would even make it work for
existing libvirt-managed guests as their command line parameters would 
no longer need to change. This would allow us to
simply power-off and power-on existing VMs to make them CPR-capable 
and then work on a libvirt patch later when

we're ready to do so.


Comments?

- Michael


On 12/7/22 09:48, Steven Sistare wrote:

This series desperately needs review in its intersection with live migration.
The code in other areas has been reviewed and revised multiple times -- thank 
you!

David, Juan, can you spare some time to review this?  I have done my best to 
order
the patches logically (see the labelled groups in this email), and to provide
complete and clear cover letter and commit messages. Can I do anything to 
facilitate,
like doing a code walk through via zoom?

And of course, I welcome anyone's feedback.

Here is the original posting.

https://lore.kernel.org/qemu-devel/1658851843-236870-1-git-send-email-steven.sist...@oracle.com/


- Steve

On 7/26/2022 12:09 PM, Steve Sistare wrote:

This version of the live update patch series integrates live update into the
live migration framework.  The new interfaces are:
   * mode (migration parameter)
   * cpr-exec-args (migration parameter)
   * file (migration URI)
   * migrate-mode-enable (command-line argument)
   * only-cpr-capable (command-line argument)

Provide the cpr-exec and cpr-reboot migration modes for live update.  These
save and restore VM state, with minimal guest pause time, so that qemu may be
updated to a new version in between.  The caller sets the mode parameter
before invoking the migrate or migrate-incoming commands.
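
For example, on the save side the flow is roughly (illustrative syntax and
path, using the interfaces listed above):

    (qemu) migrate_set_parameter mode cpr-reboot      # or cpr-exec
    (qemu) migrate file:/var/tmp/vm.state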

In cpr-reboot mode, the migrate command saves state to a file, allowing
one to quit qemu, reboot to an updated kernel, start an updated version of
qemu, and resume via the migrate-incoming command.  The caller must specify
a migration URI that writes to and reads from a file.  Unlike normal mode,
the use of certain local storage options does not block the migration, but
the caller must not modify guest block devices between the quit and restart.
The guest RAM memory-backend must be shared, and the @x-ignore-shared
migration capability must be set, to avoid saving it to the file.  Guest RAM
must be non-volatile across reboot, which can be achieved by backing it with
a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used

to initially start qemu, plus the -incoming option.
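
After the reboot, resumption is roughly (illustrative; the command line must
repeat the original arguments):

    qemu-system-x86_64 <original arguments> -incoming file:/var/tmp/vm.state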

The reboot mode supports vfio devices if the caller first suspends the guest,
such as by issuing guest-suspend-ram to the qemu guest agent.  The guest
drivers' suspend methods flush outstanding requests and re-initialize the
devices, and thus there is no device state to save and restore.  After
issuing migrate-incoming, the caller must issue a system_wakeup command to
resume.
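
So for a guest with vfio devices the sequence is, roughly (illustrative
commands):

    guest agent:  {"execute": "guest-suspend-ram"}
    ... migrate to file, reboot, restart with -incoming as above ...
    (qemu) system_wakeup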

In cpr-exec mode, the migrate command saves state to a file and directly
exec's a new version of qemu on the same host, replacing the original process
while retaining its PID.  The caller must specify a migration URI that writes
to and reads from a file, and resumes execution via the migrate-incoming
command.  Arguments for the new qemu process are taken from the cpr-exec-args
migration parameter, and must include the -incoming option.
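
A sketch of the exec-mode flow (parameter syntax illustrative only); it
differs from reboot mode mainly in supplying the new process's arguments:

    (qemu) migrate_set_parameter mode cpr-exec
    (qemu) migrate_set_parameter cpr-exec-args <new qemu command line, including -incoming>
    (qemu) migrate file:/var/tmp/vm.state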

Guest RAM must be backed by a memory backend with share=on, but cannot be
memory-backend-ram.  The memory is re-mmap'd in the updated process, so guest
ram is efficiently preserved in place, albeit with new virtual addresses.
In addition, the '-migrate-mode-enable 
