date:20230216

Re: [RFC 12/52] hw/acpi: Replace MachineState.smp access with topology helpers

2023-02-16 Thread wangyanan (Y)

On 2023/2/17 11:14, Zhao Liu wrote:

On Thu, Feb 16, 2023 at 05:31:11PM +0800, wangyanan (Y) wrote:

Date: Thu, 16 Feb 2023 17:31:11 +0800
From: "wangyanan (Y)" 
Subject: Re: [RFC 12/52] hw/acpi: Replace MachineState.smp access with
  topology helpers

Hi Zhao,

On 2023/2/13 17:49, Zhao Liu wrote:

From: Zhao Liu 

At present, in QEMU only arm needs the PPTT table to build the cpu topology.

Until QEMU's arm support covers hybrid architectures, it's enough to limit
the cpu topology of PPTT to the smp type through the explicit smp interface
(machine_topo_get_smp_threads()).

Cc: Michael S. Tsirkin 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Signed-off-by: Zhao Liu 
---
   hw/acpi/aml-build.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ea331a20d131..693bd8833d10 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
   cluster_offset = socket_offset;
   }
-if (ms->smp.threads == 1) {
+if (machine_topo_get_smp_threads(ms) == 1) {
   build_processor_hierarchy_node(table_data,
   (1 << 1) | /* ACPI Processor ID valid */
   (1 << 3),  /* Node is a Leaf */

The ACPI PPTT table is designed to also support the hybrid CPU topology
case, where nodes on the same CPU topology level can have a different
number of child nodes.

So to be general, the diff should be:
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ea331a20d1..dfded95bbc 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker
*linker, MachineState *ms,
  cluster_offset = socket_offset;
  }

-    if (ms->smp.threads == 1) {
+    if (machine_topo_get_threads_by_idx(ms, n) == 1) {
  build_processor_hierarchy_node(table_data,
  (1 << 1) | /* ACPI Processor ID valid */
  (1 << 3),  /* Node is a Leaf */

Nice! I'll replace that.

Actually, I've recently been working on ARM HMP virtualization, which relies
on PPTT for topology representation, so we will need PPTT to be general
for the hybrid case anyway.

Good to know that you are considering hybrid support for arm.
BTW, I explained the difference between arm's and x86's hybrid in previous
emails [1] [2], mainly about whether the cpu model is the same.

I tentatively think that this difference can be solved by arch-specific
coretype(). Do you have any comments on this? Thanks!

Will look at that. Thanks.

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03884.html
[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03789.html

Thanks,
Yanan

Re: [RFC 41/52] machine: Introduce core_type() hook

2023-02-16 Thread wangyanan (Y)


On 2023/2/17 11:26, Zhao Liu wrote:

On Thu, Feb 16, 2023 at 08:15:23PM +0800, wangyanan (Y) wrote:

Date: Thu, 16 Feb 2023 20:15:23 +0800
From: "wangyanan (Y)" 
Subject: Re: [RFC 41/52] machine: Introduce core_type() hook

Hi Zhao,

On 2023/2/13 17:50, Zhao Liu wrote:

From: Zhao Liu 

Since supported core types are architecture specific, we need this hook
to allow each arch to define its own parsing or validation method.

As an example, add the x86 core_type() hook, which will be used in "-hybrid"
parameter parsing.

Signed-off-by: Zhao Liu 
---
   hw/core/machine-topo.c | 14 ++
   hw/core/machine.c  |  1 +
   hw/i386/x86.c  | 15 +++
   include/hw/boards.h|  7 +++
   4 files changed, 37 insertions(+)

diff --git a/hw/core/machine-topo.c b/hw/core/machine-topo.c
index 12c05510c1b5..f9ab08a1252e 100644
--- a/hw/core/machine-topo.c
+++ b/hw/core/machine-topo.c
@@ -352,3 +352,17 @@ void machine_parse_smp_config(MachineState *ms,
   return;
   }
   }
+
+/*
+ * machine_parse_hybrid_core_type: the default hook to parse hybrid core
+ * type corresponding to the coretype
+ * string option.
+ */
+int machine_parse_hybrid_core_type(MachineState *ms, const char *coretype)
+{
+if (strcmp(coretype, "") == 0 || strcmp(coretype, "none") == 0) {
+return 0;
+}
+
+return -1;
+}

Is it possible that coretype can be NULL?
What would *coretype be if the users don't explicitly specify coretype
in the command line?

At present, the coretype field cannot be omitted, which requires other code
changes to support omission (if omission is required in the future, there
should be an arch-specific method to supplement the default coretype at the
same time).

IIUC, we may need to support the handling of omission case at the
beginning. Not all archs have/need the core type concept when they
support hybrid, and if an arch does not have the core type concept,
it's best to forbid it in the CLI and leave the handling to the generic
machine_parse_hybrid_core_type and the arch-specific core_type
hook should be NULL.
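
To make the omission case concrete, here is a minimal standalone sketch of a default parser that tolerates a missing coretype and of a dispatcher that falls back to it when no arch hook is installed. The names parse_default_core_type, resolve_core_type and core_type_hook, and the return codes, are hypothetical and are not taken from the patch; this only illustrates the behaviour suggested above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes for this sketch; not taken from the patch. */
enum {
    CORE_TYPE_NONE = 0,
    CORE_TYPE_INVALID = -1,
};

/*
 * Default (arch-independent) parser: an omitted option (NULL), an empty
 * string, or "none" all mean "no specific core type". Anything else is
 * rejected because the generic code knows no core types.
 */
int parse_default_core_type(const char *coretype)
{
    if (coretype == NULL || coretype[0] == '\0' ||
        strcmp(coretype, "none") == 0) {
        return CORE_TYPE_NONE;
    }
    return CORE_TYPE_INVALID;
}

/* Optional arch-specific hook; NULL when the arch has no core types. */
typedef int (*core_type_hook)(const char *coretype);

/* Call the arch hook only if one is installed and a type was given. */
int resolve_core_type(core_type_hook arch_hook, const char *coretype)
{
    if (arch_hook != NULL && coretype != NULL) {
        return arch_hook(coretype);
    }
    return parse_default_core_type(coretype);
}
```

With this shape, an arch without core types leaves its hook NULL, so an explicit "atom" on the command line is rejected by the generic parser while an omitted coretype is accepted.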

Thanks,
Yanan



diff --git a/hw/core/machine.c b/hw/core/machine.c
index fad990f49b03..acc32b3be5f6 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -926,6 +926,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
* On Linux, each node's border has to be 8MB aligned
*/
   mc->numa_mem_align_shift = 23;
+mc->core_type = machine_parse_hybrid_core_type;
   object_class_property_add_str(oc, "kernel",
   machine_get_kernel, machine_set_kernel);
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index f381fdc43180..f58a90359170 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1569,6 +1569,20 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, 
const char *name,
   qapi_free_SgxEPCList(list);
   }
+static int x86_parse_hybrid_core_type(MachineState *ms, const char *coretype)
+{
+X86HybridCoreType type;
+
+if (strcmp(coretype, "atom") == 0) {
+type = INTEL_ATOM_TYPE;
+} else if (strcmp(coretype, "core") == 0) {
+type = INTEL_CORE_TYPE;
+} else {
+type = INVALID_HYBRID_TYPE;
+}

What about:
INTEL_CORE_TYPE_ATOM
INTEL_CORE_TYPE_CORE
X86_CORE_TYPE_UNKNOWN ?
just a suggestion.

It looks better! Thanks.


Thanks,
Yanan

+return type;
+}
+
   static void x86_machine_initfn(Object *obj)
   {
   X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1596,6 +1610,7 @@ static void x86_machine_class_init(ObjectClass *oc, void 
*data)
   x86mc->save_tsc_khz = true;
   x86mc->fwcfg_dma_enabled = true;
   nc->nmi_monitor_handler = x86_nmi;
+mc->core_type = x86_parse_hybrid_core_type;
   object_class_property_add(oc, X86_MACHINE_SMM, "OnOffAuto",
   x86_machine_get_smm, x86_machine_set_smm,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 9364c90d5f1a..34ec035b5c9f 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -36,6 +36,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
  Error **errp);
   void machine_parse_smp_config(MachineState *ms,
 const SMPConfiguration *config, Error **errp);
+int machine_parse_hybrid_core_type(MachineState *ms, const char *coretype);
   /**
* machine_class_allow_dynamic_sysbus_dev: Add type to list of valid devices
@@ -199,6 +200,11 @@ typedef struct {
*Return the type of KVM corresponding to the kvm-type string option or
*computed based on other criteria such as the host kernel capabilities.
*kvm-type may be NULL if it is not needed.
+ * @core_type:
+ *Return the type of hybrid cores corresponding to the coretype string
+ *option. The default hook only accepts "none" or "" since the most generic
+ *core topology should not specify any specific core type. Each arch can
+ *define its own core_type() hook to override the default one.

Re: [PATCH v2] Adding new machine Tiogapass in QEMU

2023-02-16 Thread Cédric Le Goater


Hello Karthikeyan,

On 2/16/23 19:43, Karthikeyan Pasupathi wrote:

This patch adds support for Tiogapass in the QEMU environment
and introduces EEPROM BMC FRU data support ("add tiogapass_bmc_fruid data")
along with the machine support.

Signed-off-by: Karthikeyan Pasupathi 


There are a couple of coding style issues that I will fix. This is minor.
(./scripts/checkpatch.pl is a good tool to run before sending.)

Reviewed-by: Cédric Le Goater 

Thanks,

C.


---
  hw/arm/aspeed.c| 32 
  hw/arm/aspeed_eeprom.c | 22 ++
  hw/arm/aspeed_eeprom.h |  3 +++
  3 files changed, 57 insertions(+)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 27dda58338..d12164420d 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -530,6 +530,15 @@ static void romulus_bmc_i2c_init(AspeedMachineState *bmc)
  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, 11), "ds1338", 
0x32);
  }
  
+static void tiogapass_bmc_i2c_init(AspeedMachineState *bmc)

+{
+AspeedSoCState *soc = &bmc->soc;
+
+at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 4), 0x54, 128 * KiB);
+at24c_eeprom_init_rom(aspeed_i2c_get_bus(&soc->i2c, 6), 0x54, 128 * KiB, 
tiogapass_bmc_fruid,
+  tiogapass_bmc_fruid_len);
+}
+
  static void create_pca9552(AspeedSoCState *soc, int bus_id, int addr)
  {
  i2c_slave_create_simple(aspeed_i2c_get_bus(&soc->i2c, bus_id),
@@ -1191,6 +1200,25 @@ static void 
aspeed_machine_romulus_class_init(ObjectClass *oc, void *data)
  aspeed_soc_num_cpus(amc->soc_name);
  };
  
+static void aspeed_machine_tiogapass_class_init(ObjectClass *oc, void *data)

+{
+MachineClass *mc = MACHINE_CLASS(oc);
+AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
+
+mc->desc   = "Facebook Tiogapass BMC (ARM1176)";
+amc->soc_name  = "ast2500-a1";
+amc->hw_strap1 = AST2500_EVB_HW_STRAP1;
+amc->hw_strap2 = 0;
+amc->fmc_model = "n25q256a";
+amc->spi_model = "mx25l25635e";
+amc->num_cs= 2;
+amc->i2c_init  = tiogapass_bmc_i2c_init;
+mc->default_ram_size   = 1 * GiB;
+mc->default_cpus = mc->min_cpus = mc->max_cpus =
+aspeed_soc_num_cpus(amc->soc_name);
+aspeed_soc_num_cpus(amc->soc_name);
+};
+
  static void aspeed_machine_sonorapass_class_init(ObjectClass *oc, void *data)
  {
  MachineClass *mc = MACHINE_CLASS(oc);
@@ -1566,6 +1594,10 @@ static const TypeInfo aspeed_machine_types[] = {
  .name  = MACHINE_TYPE_NAME("tacoma-bmc"),
  .parent= TYPE_ASPEED_MACHINE,
  .class_init= aspeed_machine_tacoma_class_init,
+}, {
+.name  = MACHINE_TYPE_NAME("tiogapass-bmc"),
+.parent= TYPE_ASPEED_MACHINE,
+.class_init= aspeed_machine_tiogapass_class_init,
  }, {
  .name  = MACHINE_TYPE_NAME("g220a-bmc"),
  .parent= TYPE_ASPEED_MACHINE,
diff --git a/hw/arm/aspeed_eeprom.c b/hw/arm/aspeed_eeprom.c
index 04463acc9d..f937a6ceaa 100644
--- a/hw/arm/aspeed_eeprom.c
+++ b/hw/arm/aspeed_eeprom.c
@@ -6,6 +6,27 @@
  
  #include "aspeed_eeprom.h"
  
+/* Tiogapass BMC FRU */

+const uint8_t tiogapass_bmc_fruid[] = {
+0x01, 0x00, 0x00, 0x01, 0x0d, 0x00, 0x00, 0xf1, 0x01, 0x0c, 0x00, 0x36,
+0xe6, 0xd0, 0xc6, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x42, 0x4d,
+0x43, 0x20, 0x53, 0x74, 0x6f, 0x72, 0x61, 0x67, 0x65, 0x20, 0x4d, 0x6f,
+0x64, 0x75, 0x6c, 0x65, 0xcd, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e,
+0x30, 0xc9, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc1, 0x39, 0x01, 0x0c, 0x00, 0xc6,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x54, 0x69, 0x6f, 0x67, 0x61,
+0x20, 0x50, 0x61, 0x73, 0x73, 0x20, 0x53, 0x69, 0x6e, 0x67, 0x6c, 0x65,
+0x32, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58,
+0x58, 0x58, 0x58, 0x58, 0xc4, 0x58, 0x58, 0x58, 0x32, 0xcd, 0x58, 0x58,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc7,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e, 0x30, 0xc9,
+0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc8, 0x43, 0x6f,
+0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45,
+};
+
  const uint8_t fby35_nic_fruid[] = {
  0x01, 0x00, 0x00, 0x01, 0x0f, 0x20, 0x00, 0xcf, 0x01, 0x0e, 0x19, 0xd7,
  0x5e, 0xcf, 0xc8, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xdd,
@@ -77,6 +98,7 @@ const uint8_t fby35_bmc_fruid[] = {
  0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45,
  };
  
+const size_t tiogapass_bmc_fruid_len = sizeof(tiogapass_bmc_fruid);

  const size_t fby35_nic_fruid_len = sizeof(fby35_nic_fruid);
  const size_t fby35_bb_fruid_len = sizeof(fby35_bb_fruid);
  cons

Re: [PATCH qemu v3 0/2] hw/at24c support eeprom size less than equal 256 byte

2023-02-16 Thread Cédric Le Goater


Add Peter D. since he wrote the fuji.

Thanks for the contribution !

C.

On 2/17/23 04:43, ~ssinprem wrote:

- hw/at24c : modify at24c to support 1 byte address mode
- aspeed/fuji : correct the eeprom size

Sittisak Sinprem (2):
   hw/at24c : modify at24c to support 1 byte address mode
   aspeed/fuji : correct the eeprom size

  hw/arm/aspeed.c | 36 
  hw/nvram/eeprom_at24c.c | 28 +---
  2 files changed, 45 insertions(+), 19 deletions(-)

Re: [RFC 08/52] machine: Add helpers to get cpu topology info from MachineState.topo

2023-02-16 Thread wangyanan (Y)

On 2023/2/17 11:07, Zhao Liu wrote:

On Thu, Feb 16, 2023 at 04:38:38PM +0800, wangyanan (Y) wrote:

Date: Thu, 16 Feb 2023 16:38:38 +0800
From: "wangyanan (Y)" 
Subject: Re: [RFC 08/52] machine: Add helpers to get cpu topology info from
  MachineState.topo

Hi Zhao,

On 2023/2/13 17:49, Zhao Liu wrote:

From: Zhao Liu 

When MachineState.topo is introduced, the topology related structures
become complicated. In the general case (hybrid or smp topology),
accessing the topology information needs to determine whether it is
currently smp or hybrid topology, and then access the corresponding
MachineState.topo.smp or MachineState.topo.hybrid.

The best way to do this is to wrap the access to the topology to
avoid having to check each time it is accessed.

The following helpers are provided here:

- General interfaces - no need to worry about whether the underlying
topology is smp or hybrid:

* machine_topo_get_cpus()
* machine_topo_get_max_cpus()
* machine_topo_is_smp()
* machine_topo_get_sockets()
* machine_topo_get_dies()
* machine_topo_get_clusters()
* machine_topo_get_threads();
* machine_topo_get_cores();
* machine_topo_get_threads_by_idx()
* machine_topo_get_cores_by_idx()
* machine_topo_get_cores_per_socket()
* machine_topo_get_threads_per_socket()

- SMP-specific interfaces - provided for the cases that are clearly
known to be smp topology:

* machine_topo_get_smp_cores()
* machine_topo_get_smp_threads()

Since in a hybrid topology each core may have a different number of
threads, anyone who wants "cpus per core" needs a cpu_index to target a
specific core (machine_topo_get_threads_by_idx()). For smp there is no
need for this extra step, so for that case we provide smp-specific
interfaces.

Signed-off-by: Zhao Liu 
---
   hw/core/machine-topo.c | 142 +
   include/hw/boards.h|  35 ++
   2 files changed, 177 insertions(+)

diff --git a/hw/core/machine-topo.c b/hw/core/machine-topo.c
index 7223f73f99b0..b20160479629 100644
--- a/hw/core/machine-topo.c
+++ b/hw/core/machine-topo.c
@@ -21,6 +21,148 @@
   #include "hw/boards.h"
   #include "qapi/error.h"
+unsigned int machine_topo_get_sockets(const MachineState *ms)
+{
+return machine_topo_is_smp(ms) ? ms->topo.smp.sockets :
+ ms->topo.hybrid.sockets;
+}
+
+unsigned int machine_topo_get_dies(const MachineState *ms)
+{
+return machine_topo_is_smp(ms) ? ms->topo.smp.dies :
+ ms->topo.hybrid.dies;
+}
+
+unsigned int machine_topo_get_clusters(const MachineState *ms)
+{
+return machine_topo_is_smp(ms) ? ms->topo.smp.clusters :
+ ms->topo.hybrid.clusters;
+}
+
+unsigned int machine_topo_get_smp_cores(const MachineState *ms)
+{
+g_assert(machine_topo_is_smp(ms));
+return ms->topo.smp.cores;
+}
+
+unsigned int machine_topo_get_smp_threads(const MachineState *ms)
+{
+g_assert(machine_topo_is_smp(ms));
+return ms->topo.smp.threads;
+}
+
+unsigned int machine_topo_get_threads(const MachineState *ms,
+  unsigned int cluster_id,
+  unsigned int core_id)
+{
+if (machine_topo_is_smp(ms)) {
+return ms->topo.smp.threads;
+} else {
+return ms->topo.hybrid.cluster_list[cluster_id]
+   .core_list[core_id].threads;
+}
+
+return 0;
+}
+
+unsigned int machine_topo_get_cores(const MachineState *ms,
+unsigned int cluster_id)
+{
+if (machine_topo_is_smp(ms)) {
+return ms->topo.smp.cores;
+} else {
+return ms->topo.hybrid.cluster_list[cluster_id].cores;
+}
+}

Is it possible to use a variadic function so that those two smp-specific
helpers can be avoided? It's a bit weird that we have the generic
machine_topo_get_threads but also need machine_topo_get_smp_threads
at the same time.

I am not sure about this, because variadic functions unify function
naming, but eliminate the "smp-specific" information from the name.

Trying to get the cores/threads without considering the cpu index can
only be used in smp scenarios, and I think the caller needs to
make it explicit that it knows the topology is smp.

Ok, I get the point.
When it comes to the naming, would it be more concise to remove the
*_get_* in the function names, such as machine_topo_get_cpus to
machine_topo_cpus, machine_topo_get_clusters to machine_topo_clusters?

And maybe rename machine_topo_get_cores(int cluster_id, int core_id) to
machine_topo_cores_by_ids?

Or machine_topo_get_cores() to machine_topo_cores_by_topo_ids()
and machine_topo_get_cores_by_idx to machine_topo_cores_by_cpu_idx()
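
For reference, the split between an smp-only accessor and a by-cpu-index accessor can be sketched in a few lines of standalone C. The Topo, HybridCluster and HybridCore types below are simplified stand-ins (one socket, one die, flat cluster list), and the function names are hypothetical; the real helpers in the series operate on MachineState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the structures discussed in this thread. */
typedef struct {
    unsigned int threads;
} HybridCore;

typedef struct {
    unsigned int cores;
    const HybridCore *core_list;
} HybridCluster;

typedef struct {
    bool is_smp;
    unsigned int smp_cores;              /* cores per cluster, smp only */
    unsigned int smp_threads;            /* threads per core, smp only */
    unsigned int clusters;               /* hybrid only */
    const HybridCluster *cluster_list;   /* hybrid only */
} Topo;

/* smp-only accessor: the name and the assert both state the assumption. */
unsigned int topo_smp_threads(const Topo *t)
{
    assert(t->is_smp);
    return t->smp_threads;
}

/*
 * Generic accessor: walk clusters and cores in order until the core that
 * contains cpu_index is found. Returns 0 if cpu_index is out of range.
 */
unsigned int topo_threads_by_cpu_idx(const Topo *t, unsigned int cpu_index)
{
    if (t->is_smp) {
        return t->smp_threads;
    }
    for (unsigned int c = 0; c < t->clusters; c++) {
        const HybridCluster *cl = &t->cluster_list[c];
        for (unsigned int k = 0; k < cl->cores; k++) {
            if (cpu_index < cl->core_list[k].threads) {
                return cl->core_list[k].threads;
            }
            cpu_index -= cl->core_list[k].threads;
        }
    }
    return 0;
}
```

Whatever naming is chosen, the by-index variant is the only one that stays meaningful once cores stop being uniform.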

+
+unsigned int machine_topo_get_threads_by_idx(const MachineState *ms,
+ unsigned int cpu_index)
+{
+unsigned cpus_per_die;
+unsigned tmp_idx;
+HybridCluster *cluster;
+HybridCore *core;
+
+if (machine_topo_is_smp(ms)) {
+return ms->t

Re: [PATCH v2 01/13] vdpa net: move iova tree creation from init to start

2023-02-16 Thread Si-Wei Liu





On 2/15/2023 11:35 PM, Eugenio Perez Martin wrote:

On Thu, Feb 16, 2023 at 3:15 AM Si-Wei Liu  wrote:



On 2/14/2023 11:07 AM, Eugenio Perez Martin wrote:

On Tue, Feb 14, 2023 at 2:45 AM Si-Wei Liu  wrote:


On 2/13/2023 3:14 AM, Eugenio Perez Martin wrote:

On Mon, Feb 13, 2023 at 7:51 AM Si-Wei Liu  wrote:

On 2/8/2023 1:42 AM, Eugenio Pérez wrote:

Only create iova_tree if and when it is needed.

The cleanup remains responsible for the last VQ, but this change allows
both cleanup functions to be merged.

Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
---
 net/vhost-vdpa.c | 99 ++--
 1 file changed, 71 insertions(+), 28 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index de5ed8ff22..a9e6c8f28e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -178,13 +178,9 @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
-struct vhost_dev *dev = &s->vhost_net->dev;

 qemu_vfree(s->cvq_cmd_out_buffer);
 qemu_vfree(s->status);
-if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
-g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
-}
 if (s->vhost_net) {
 vhost_net_cleanup(s->vhost_net);
 g_free(s->vhost_net);
@@ -234,10 +230,64 @@ static ssize_t vhost_vdpa_receive(NetClientState *nc, 
const uint8_t *buf,
 return size;
 }

+/** From any vdpa net client, get the netclient of first queue pair */
+static VhostVDPAState *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
+{
+NICState *nic = qemu_get_nic(s->nc.peer);
+NetClientState *nc0 = qemu_get_peer(nic->ncs, 0);
+
+return DO_UPCAST(VhostVDPAState, nc, nc0);
+}
+
+static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
+{
+struct vhost_vdpa *v = &s->vhost_vdpa;
+
+if (v->shadow_vqs_enabled) {
+v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+   v->iova_range.last);
+}
+}
+
+static int vhost_vdpa_net_data_start(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_vdpa *v = &s->vhost_vdpa;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+if (v->index == 0) {
+vhost_vdpa_net_data_start_first(s);
+return 0;
+}
+
+if (v->shadow_vqs_enabled) {
+VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
+v->iova_tree = s0->vhost_vdpa.iova_tree;
+}
+
+return 0;
+}
+
+static void vhost_vdpa_net_client_stop(NetClientState *nc)
+{
+VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+struct vhost_dev *dev;
+
+assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
+
+dev = s->vhost_vdpa.dev;
+if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+}
+}
+
 static NetClientInfo net_vhost_vdpa_info = {
 .type = NET_CLIENT_DRIVER_VHOST_VDPA,
 .size = sizeof(VhostVDPAState),
 .receive = vhost_vdpa_receive,
+.start = vhost_vdpa_net_data_start,
+.stop = vhost_vdpa_net_client_stop,
 .cleanup = vhost_vdpa_cleanup,
 .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
 .has_ufo = vhost_vdpa_has_ufo,
@@ -351,7 +401,7 @@ dma_map_err:

 static int vhost_vdpa_net_cvq_start(NetClientState *nc)
 {
-VhostVDPAState *s;
+VhostVDPAState *s, *s0;
 struct vhost_vdpa *v;
 uint64_t backend_features;
 int64_t cvq_group;
@@ -425,6 +475,15 @@ out:
 return 0;
 }

+s0 = vhost_vdpa_net_first_nc_vdpa(s);
+if (s0->vhost_vdpa.iova_tree) {
+/* SVQ is already configured for all virtqueues */
+v->iova_tree = s0->vhost_vdpa.iova_tree;
+} else {
+v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
+   v->iova_range.last);

I wonder how this case could happen, vhost_vdpa_net_data_start_first()
should've allocated an iova tree on the first data vq. Is zero data vq
ever possible on net vhost-vdpa?


It's the case of the current qemu master when only CVQ is being
shadowed. It's not that "there are no data vq": If that case were
possible, CVQ vhost-vdpa state would be s0.

The case is that since only CVQ vhost-vdpa is the one being migrated,
only CVQ has an iova tree.

OK, so this corresponds to the case where live migration is not started
and CVQ starts in its own address space of VHOST_VDPA_NET_CVQ_ASID.
Thanks for explaining it!


With this series applied and with no migration running, the case is
the same as before: only CVQ gets shadowed. When migration starts, all
vqs are migrated, and share the iova tree.
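
The create-or-share behaviour described here can be illustrated with a small standalone model. IovaTree and VdpaClient below are hypothetical stand-ins, not the QEMU types, and the fallback branch models only the "CVQ alone is shadowed" case; it is a sketch of the ownership rule, not of the patch itself.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the real IOVA tree; only the range is kept. */
typedef struct {
    unsigned long first;
    unsigned long last;
} IovaTree;

typedef struct VdpaClient {
    unsigned int index;          /* queue pair index; 0 is the first */
    bool shadow_vqs_enabled;
    IovaTree *iova_tree;
    struct VdpaClient *first;    /* client of queue pair 0 */
    bool is_last;                /* this client owns the final virtqueue */
} VdpaClient;

/* On start: share the first client's tree if it has one, else create one. */
void client_start(VdpaClient *c, unsigned long lo, unsigned long hi)
{
    if (!c->shadow_vqs_enabled) {
        return;
    }
    if (c->first != c && c->first->iova_tree != NULL) {
        c->iova_tree = c->first->iova_tree;   /* all vqs shadowed: share */
        return;
    }
    /* First data vq, or CVQ shadowed on its own: allocate a new tree. */
    c->iova_tree = malloc(sizeof(*c->iova_tree));
    if (c->iova_tree != NULL) {
        c->iova_tree->first = lo;
        c->iova_tree->last = hi;
    }
}

/* On stop: only the client owning the last virtqueue frees the tree. */
void client_stop(VdpaClient *c)
{
    if (c->is_last) {
        free(c->iova_tree);
    }
    c->iova_tree = NULL;
}
```

In this model the tree is freed exactly once, by whichever client holds the last virtqueue, regardless of who allocated it.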

I wonder what is the reason to share the iova tree when migration
starts, I think CVQ may stay on its own VHOST_VDPA_NET_CVQ_ASID still?

Actual

Re: [PATCH RESEND 18/18] i386: Add new property to control L2 cache topo in CPUID.04H

2023-02-16 Thread Zhao Liu

On Fri, Feb 17, 2023 at 12:07:01PM +0800, wangyanan (Y) wrote:
> Date: Fri, 17 Feb 2023 12:07:01 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [PATCH RESEND 18/18] i386: Add new property to control L2
>  cache topo in CPUID.04H
> 
> On 2023/2/17 11:35, Zhao Liu wrote:
> > On Thu, Feb 16, 2023 at 09:14:54PM +0800, wangyanan (Y) wrote:
> > > Date: Thu, 16 Feb 2023 21:14:54 +0800
> > > From: "wangyanan (Y)" 
> > > Subject: Re: [PATCH RESEND 18/18] i386: Add new property to control L2
> > >   cache topo in CPUID.04H
> > > 
> > > On 2023/2/13 17:36, Zhao Liu wrote:
> > > > From: Zhao Liu 
> > > > 
> > > > The property x-l2-cache-topo will be used to change the L2 cache
> > > > topology in CPUID.04H.
> > > > 
> > > > Now it allows the user to set whether the L2 cache is shared at core
> > > > level or cluster level.
> > > > 
> > > > If the user passes "-cpu x-l2-cache-topo=[core|cluster]" then the older
> > > > L2 cache topology will be overridden by the new topology setting.
> > > Currently x-l2-cache-topo only defines the share level *globally*.
> > Yes, will set for all CPUs.
> > 
> > > I'm thinking how we can make the property more powerful so that it
> > > can specify which CPUs share l2 on core level and which CPUs share
> > > l2 on cluster level.
> > > 
> > > What would Intel's Hybrid CPUs do? Determine the l2 share level
> > > is core or cluster according to the CPU core type (Atom or Core)?
> > > While ARM does not have the core type concept but have CPUs
> > > that l2 is shared on different levels in the same system.
> > For example, Alderlake's "core" shares 1 L2 per core and every 4 "atom"s
> > share 1 L2. For this case, we can set the topology as:
> > 
> > cluster0 has 1 "core" and cluster1 has 4 "atom". Then set L2 shared on
> > cluster level.
> > 
> > Since cluster0 has only 1 "core" type core, then L2 per "core" works.
> > 
> > Not sure if this idea can be applied to arm?
> For a CPU topology where we have 2 clusters in total, the 2 cores in
> cluster0 each have their own L1/L2 cache and 2 threads per core, while the
> 4 cores in cluster1 share one L2 cache and have 1 thread per core. The
> global way does not work well here.
> 
> What about defining something general, which looks like -numa config:
> -cache-topo cache=l2, share_level="core", cpus='0-3'
> -cache-topo cache=l2, share_level="cluster", cpus='4-7'

Hi Yanan, here it may be necessary to check whether the cpu index set
in "cpus" is reasonable through the specific cpu topology.

For example, core0 has 2 CPUs: cpu0 and cpu1, and core1 has 2 CPUs: cpu2
and cpu3, then set l2 as:

-cache-topo cache=l2, share_level="core", cpus='0-2'
-cache-topo cache=l2, share_level="core", cpus='3'

Whether this command is legal depends on the meaning we give to the
parameter "cpus":
1. If "cpus" means that all the listed cpus share the cache set in this
command, then this command should fail, since cpu2 and cpu3 are in the
same core.

2. If "cpus" means the affected cpus, then this command should find the
cores they belong to according to the cpu topology, and set L2 for those
cores. This command may return success.
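
The two readings can be compared with a short standalone check. It assumes a uniform layout in which core k owns CPUs k * threads_per_core through (k + 1) * threads_per_core - 1; the function names are hypothetical and nothing here is an existing QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Reading 1: every CPU in [first, last] shares one cache at "core" level,
 * so the range must cover exactly one whole core.
 */
bool cpus_match_one_core(unsigned int first, unsigned int last,
                         unsigned int threads_per_core)
{
    assert(threads_per_core > 0);
    if (last < first) {
        return false;
    }
    return first % threads_per_core == 0 &&
           last - first + 1 == threads_per_core;
}

/*
 * Reading 2: the range only names the affected CPUs, so report the
 * inclusive range of cores they belong to.
 */
void cpus_to_core_range(unsigned int first, unsigned int last,
                        unsigned int threads_per_core,
                        unsigned int *first_core, unsigned int *last_core)
{
    assert(threads_per_core > 0 && first <= last);
    *first_core = first / threads_per_core;
    *last_core = last / threads_per_core;
}
```

With 2 threads per core, cpus='0-2' and cpus='3' both fail under reading 1, while under reading 2 they resolve to cores 0-1 and core 1, which is why the second command may succeed.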

What about removing share_level and asking "cpus" to mean all the sharing
cpus, to avoid checking the cpu topology?

Then the above example should be:

-cache-topo cache=l2, cpus='0-1'
-cache-topo cache=l2, cpus='2-3'

This decouples cpu topology and cache topology completely and is very
simple. In this way, determining the cache by specifying the sharing cpus
is similar to what x86 CPUID.04H does.

But the price of simplicity is we may build a cache topology that doesn't
match the reality.

But if the cache topology must be set based on the cpu topology, another
way is to consider specifying the cache when setting the topology
structure, which can be based on @Daniel's format [1]:

  -object cpu-socket,id=sock0,cache=l3
  -object cpu-die,id=die0,parent=sock0
  -object cpu-cluster,id=cluster0,parent=die0
  -object cpu-cluster,id=cluster1,parent=die0,cache=l2
  -object x86-cpu-model-core,id=cpu0,parent=cluster0,threads=2,cache=l1i,l1d,l2
  -object x86-cpu-model-atom,id=cpu1,parent=cluster1,cache=l1i,l1d
  -object x86-cpu-model-atom,id=cpu2,parent=cluster1,cache=l1i,l1d

Then from this command, cpu0 has its own l2, and cpu1 and cpu2 share an l2
(the l2 is inserted in cluster1).

This whole process is like designing or building a CPU: the user
decides where to insert the caches. The advantage is that it is intuitive
and easier to verify for consistency, but it is also more complicated.
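
As a rough model of how such a description could be resolved, the sketch below stores a bitmask of caches on each topology node and walks the parent chain to find the node that owns a given cache for a cpu. TopoNode, the CACHE_* flags and cache_owner are hypothetical names for illustration only.

```c
#include <stddef.h>

/* Bit flags for the caches that can be inserted at a topology node. */
enum {
    CACHE_L1I = 1 << 0,
    CACHE_L1D = 1 << 1,
    CACHE_L2  = 1 << 2,
    CACHE_L3  = 1 << 3,
};

typedef struct TopoNode {
    const char *id;
    const struct TopoNode *parent;   /* NULL for the socket */
    unsigned int caches;             /* caches inserted at this node */
} TopoNode;

/*
 * Walk from the cpu towards the socket and return the first node that
 * carries the requested cache, or NULL if no ancestor provides it.
 */
const TopoNode *cache_owner(const TopoNode *cpu, unsigned int cache)
{
    for (const TopoNode *n = cpu; n != NULL; n = n->parent) {
        if (n->caches & cache) {
            return n;
        }
    }
    return NULL;
}
```

Two cpus share a cache exactly when cache_owner() returns the same node for both, which makes the sharing relation easy to validate against the topology.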

(Also CC @Daniel for comments).

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03320.html

Thanks,
Zhao

> If we ever want to support custom share-level for L3/L1, no extra work
> is needed. We can also extend the CLI to support custom cache size, etc..
> 
> If you think this is a good idea to explore, I can work on it, since I'm
> planning to add cache topology support for ARM.
> 
> Thanks,
> Yanan
> > > Thanks,
> > > Yanan
> > > > Here we expose to user "cluster" instead of "module", to be consistent
> > > > with "cluster-id" naming.
> > > > 
> > > > Si

[PATCH v3] audio/pwaudio.c: Add Pipewire audio backend for QEMU

2023-02-16 Thread Dorinda Bassey

This commit adds a new audiodev backend to allow QEMU to use Pipewire as
both an audio sink and source. This backend is available on most systems.

Add Pipewire entry points for the QEMU Pipewire audio backend.
Add wrappers for the QEMU Pipewire audio backend in qpw_pcm_ops().
The qpw_write function returns the current state of the stream to pwaudio
and writes some data to the server for playback streams using the pipewire
spa_ringbuffer implementation.
The qpw_read function returns the current state of the stream to pwaudio and
reads some data from the server for capture streams using the pipewire
spa_ringbuffer implementation. These functions, qpw_write and qpw_read,
are called during playback and capture.
Add some functions that convert pw audio formats to QEMU audio formats
and vice versa, which are needed in the pipewire audio sink and
source functions qpw_init_in() & qpw_init_out().
These methods, which implement playback and recording, create streams
for playback and capture that start processing and cause the
on_process callbacks to be called.
Build a connection to the Pipewire sound system server in the
qpw_audio_init() method.
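
For readers unfamiliar with the size/mask pattern behind a power-of-two ring buffer, here is a generic standalone sketch. It does not use the spa_ringbuffer API; RingBuf, rb_write, rb_read and the tiny RB_SIZE are hypothetical and chosen only to keep the example readable (the patch defines a 1u << 22 byte buffer).

```c
#include <stdint.h>

#define RB_SIZE 8u                 /* must be a power of two */
#define RB_MASK (RB_SIZE - 1)

typedef struct {
    uint8_t data[RB_SIZE];
    uint32_t read_idx;             /* free-running; masked on access */
    uint32_t write_idx;            /* free-running; masked on access */
} RingBuf;

/* Number of bytes currently stored; unsigned wrap-around keeps it exact. */
uint32_t rb_filled(const RingBuf *rb)
{
    return rb->write_idx - rb->read_idx;
}

/* Copy up to len bytes in; returns how many bytes were accepted. */
uint32_t rb_write(RingBuf *rb, const uint8_t *src, uint32_t len)
{
    uint32_t avail = RB_SIZE - rb_filled(rb);
    if (len > avail) {
        len = avail;
    }
    for (uint32_t i = 0; i < len; i++) {
        rb->data[(rb->write_idx + i) & RB_MASK] = src[i];
    }
    rb->write_idx += len;
    return len;
}

/* Copy up to len bytes out; returns how many bytes were delivered. */
uint32_t rb_read(RingBuf *rb, uint8_t *dst, uint32_t len)
{
    uint32_t filled = rb_filled(rb);
    if (len > filled) {
        len = filled;
    }
    for (uint32_t i = 0; i < len; i++) {
        dst[i] = rb->data[(rb->read_idx + i) & RB_MASK];
    }
    rb->read_idx += len;
    return len;
}
```

Because the size is a power of two, masking replaces the modulo and the indices can run freely without special wrap handling.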

Signed-off-by: Dorinda Bassey 
---
v3:
Wrap commit log
add checks for v->stream
use constants instead of literals
fix typo error

 audio/audio.c |   3 +
 audio/audio_template.h|   4 +
 audio/meson.build |   1 +
 audio/pwaudio.c   | 827 ++
 meson.build   |   7 +
 meson_options.txt |   4 +-
 qapi/audio.json   |  45 ++
 qemu-options.hx   |  17 +
 scripts/meson-buildoptions.sh |   8 +-
 9 files changed, 913 insertions(+), 3 deletions(-)
 create mode 100644 audio/pwaudio.c

diff --git a/audio/audio.c b/audio/audio.c
index 4290309d18..aa55e41ad8 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -2069,6 +2069,9 @@ void audio_create_pdos(Audiodev *dev)
 #ifdef CONFIG_AUDIO_PA
 CASE(PA, pa, Pa);
 #endif
+#ifdef CONFIG_AUDIO_PIPEWIRE
+CASE(PIPEWIRE, pipewire, Pipewire);
+#endif
 #ifdef CONFIG_AUDIO_SDL
 CASE(SDL, sdl, Sdl);
 #endif
diff --git a/audio/audio_template.h b/audio/audio_template.h
index 42b4712acb..0f02afb921 100644
--- a/audio/audio_template.h
+++ b/audio/audio_template.h
@@ -355,6 +355,10 @@ AudiodevPerDirectionOptions *glue(audio_get_pdo_, 
TYPE)(Audiodev *dev)
 case AUDIODEV_DRIVER_PA:
 return qapi_AudiodevPaPerDirectionOptions_base(dev->u.pa.TYPE);
 #endif
+#ifdef CONFIG_AUDIO_PIPEWIRE
+case AUDIODEV_DRIVER_PIPEWIRE:
+return 
qapi_AudiodevPipewirePerDirectionOptions_base(dev->u.pipewire.TYPE);
+#endif
 #ifdef CONFIG_AUDIO_SDL
 case AUDIODEV_DRIVER_SDL:
 return qapi_AudiodevSdlPerDirectionOptions_base(dev->u.sdl.TYPE);
diff --git a/audio/meson.build b/audio/meson.build
index 074ba9..65a49c1a10 100644
--- a/audio/meson.build
+++ b/audio/meson.build
@@ -19,6 +19,7 @@ foreach m : [
   ['sdl', sdl, files('sdlaudio.c')],
   ['jack', jack, files('jackaudio.c')],
   ['sndio', sndio, files('sndioaudio.c')],
+  ['pipewire', pipewire, files('pwaudio.c')],
   ['spice', spice, files('spiceaudio.c')]
 ]
   if m[1].found()
diff --git a/audio/pwaudio.c b/audio/pwaudio.c
new file mode 100644
index 00..05a00b0859
--- /dev/null
+++ b/audio/pwaudio.c
@@ -0,0 +1,827 @@
+/*
+ * QEMU Pipewire audio driver
+ *
+ * Copyright (c) 2023 Red Hat Inc.
+ *
+ * Author: Dorinda Bassey   
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/module.h"
+#include "audio.h"
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define AUDIO_CAP "pipewire"
+#define RINGBUFFER_SIZE   (1u << 22)
+#define RINGBUFFER_MASK   (RINGBUFFER_SIZE - 1)
+#define BUFFER_SAMPLES    512
+
+#include "audio_int.h"
+
+enum {
+MODE_SINK,
+MODE_SOURCE
+};
+
+typedef struct pwaudio {
+Audiodev *dev;
+struct pw_thread_loop *thread_loop;
+

Re: CXL 2.0 memory pooling emulation

2023-02-16 Thread Gregory Price

On Thu, Feb 16, 2023 at 06:00:57PM +, Jonathan Cameron wrote:
> On Wed, 15 Feb 2023 04:10:20 -0500
> Gregory Price  wrote:
> 
> > On Wed, Feb 15, 2023 at 03:18:54PM +, Jonathan Cameron via wrote:
> > > On Wed, 8 Feb 2023 16:28:44 -0600
> > > zhiting zhu  wrote:
> > >   
> > > > 1) Emulate a Multi-Headed Device.
> > >Initially connect two heads to different host bridges on a single QEMU
> > >machine.  That lets us test most of the code flows without needing
> > >to handle tests that involve multiple machines.
> > >Later, we could add a means to connect between two instances of QEMU.  
> > 
> > Hackiest way to do this is to connect the same memory backend to two
> > type-3 devices, with the obvious caveat that the device state will not
> > be consistent between views.
> > 
> > But we could, for example, just put the relevant shared state into an
> > optional shared memory area instead of a normally allocated region.
> > 
> > > I can imagine this looking something like
> > 
> > memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=4G,share=true
> > cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,shm_token=mytoken
> > 
> > then you can have multiple qemu instances hook their relevant devices up
> > > to a backend that points to the same file, and instantiate their
> > shared state in the region shmget(mytoken).
> 
> That's not pretty.  For local instance I was thinking a primary device
> which also has the FM-API tunneled access via mailbox, and secondary devices
> that don't.  That would also apply to remote. The secondary device would
> then just receive some control commands on what to expose up to its host.
> Not sure what convention on how to do that is in QEMU. Maybe a socket
> interface like is done for swtpm? With some ordering constraints on startup.
> 

I agree, it's certainly "not pretty".

I'd go so far as to call the baby ugly :].  Like I said: "The Hackiest way"

My understanding from looking around at some road shows is that some
of these early multi-headed devices are basically just SLDs with multiple
heads. Most of these devices had to be developed well before DCDs, and
therefore the FM-API, were placed in the spec, and we haven't seen or
heard of any of these early devices having any form of switch yet.

I don't see how this type of device is feasible unless it's either statically
provisioned (change firmware settings from bios on reboot) or implements
custom firmware commands to implement some form of exclusivity controls over
memory regions.

The former makes it not really a useful pooling device, so I'm sorta guessing
we'll see most of these early devices implement custom commands.

I'm just not sure these early MHDs are going to have any real form of
FM-API, but it would still be nice to emulate them.

~Gregory

Re: [PATCH v6 1/3] multifd: Create property multifd-flush-after-each-section

2023-02-16 Thread Markus Armbruster

Juan Quintela  writes:

> Markus Armbruster  wrote:
>> Juan Quintela  writes:
>>
> @@ -478,6 +478,24 @@
>  #should not affect the correctness of postcopy migration.
>  #(since 7.1)
>  #
> +# @multifd-flush-after-each-section: flush every channel after each
> +#section sent.  This assures that
> +#we can't mix pages from one
> +#iteration through ram pages with
>>
>> RAM
>
> OK.
>
> +#pages for the following
> +#iteration.  We really only need
> +#to do this flush after we have go
>>
>> to flush after we have gone
>
> OK
>
> +#through all the dirty pages.
> +#For historical reasons, we do
> +#that after each section.  This is
>
>> we flush after each section
>
> OK
>
> +#suboptimal (we flush too many
> +#times).
>
>> inefficient: we flush too often.
>
> OK
>
> +#Default value is false.
> +#Setting this capability has no
> +#effect until the patch that
> +#removes this comment.
> +#(since 8.0)

 IMHO the core of this new "cap" is the new RAM_SAVE_FLAG_MULTIFD_FLUSH bit
 in the stream protocol, but it's not referenced here.  I would suggest
 simplifying the content and highlighting the core change:
>>>
>>> Actually it is the other way around.  What this capability will do is
>>> _NOT_ use RAM_SAVE_FLAG_MULTIFD_FLUSH protocol.
>>>
  @multifd-lazy-flush:  When enabled, multifd will only do sync flush after
>>
>> Spell out "synchronous".
>
> ok.
>
each whole round of bitmap scan.  Otherwise it'll be
>>
>> Suggest to scratch "whole".
>
> ok.
>
done per RAM save iteration (which happens with a much
higher frequency).
>>
>> Less detail than Juan's version.  I'm not sure how much detail is
>> appropriate for QMP reference documentation.
>>
Please consider enabling this whenever possible, and
keep this off only if any of the src/dst QEMU binary
doesn't support it.
>>
>> Clear guidance on how to use it, good!
>>
>> Perhaps state it more forcefully: "Enable this when both source and
>> destination support it."
>>

This capability is bound to the new RAM save flag
RAM_SAVE_FLAG_MULTIFD_FLUSH, the new flag will only
be used and recognized when this feature bit set.
>>
>> Is RAM_SAVE_FLAG_MULTIFD_FLUSH visible in the QMP interface?  Or in the
>> migration stream?
>
> No.  Only migration stream.

Doc comments should be written for readers of the QEMU QMP Reference
Manual.  Is RAM_SAVE_FLAG_MULTIFD_FLUSH relevant for them?

Perhaps the relevant part is "the peer needs to enable this capability
too, or else", for a value of "or else".

What happens when the source enables, and the destination doesn't?

What happens when the destination enables, and the source doesn't?

Any particular reason for having the destination recognize the flag only
when the capability is enabled?

>> I'm asking because doc comments are QMP reference documentation, but
>> when writing them, it's easy to mistake them for internal documentation,
>> because, well, they're comments.
>
>>> Name is wrong.  It would be multifd-non-lazy-flush.  And I don't like
>>> negatives.  Real name is:
>>>
>>> multifd-I-messed-and-flush-too-many-times.
>>
>> If you don't like "non-lazy", say "eager".
>
> more than eager, it is unnecessary.

"overeager"?

 I know you dislike multifd-lazy-flush, but that's still the best I can come
 up with when writing this (yeah I still like it :-p), please bear with me
 and take whatever you think the best.
>>>
>>> Libvirt assumes that all capabilities are false except if enabled.
>>> We want RAM_SAVE_FLAG_MULTIFD_FLUSH by default (in new machine types).
>>>
>>> So, if we can do
>>>
>>> capability_use_new_way = true
>>>
>>> We change that to
>>>
>>> capability_use_old_way = true
>>>
>>> And then by default with false value is what we want.
>>
>> Eventually, all supported migration peers will support lazy flush.  What
>> then?  Will we flip the default?  Or will we ignore the capability and
>> always flush lazily?
>
> I have to take a step back.  Bear with me.
>
> How we fix problems in migration that make the stream incompatible.
> We create a property.
>
> sta

[PULL V3 9/9] vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check

2023-02-16 Thread Jason Wang

From: Eugenio Pérez 

VHOST_BACKEND_F_IOTLB_ASID is the feature bit, not the bitmask. Since
the device under test also provided VHOST_BACKEND_F_IOTLB_MSG_V2 and
VHOST_BACKEND_F_IOTLB_BATCH, this went unnoticed.
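
As a minimal sketch of why the old check went unnoticed: the EX_* names
below are hypothetical, and the bit numbers 1, 2 and 3 are assumed to
mirror the vhost UAPI values. Using the bit number 3 directly as a mask
overlaps with bits 0 and 1 rather than testing bit 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions (not masks); values are assumed for illustration. */
#define EX_F_IOTLB_MSG_V2 1
#define EX_F_IOTLB_BATCH  2
#define EX_F_IOTLB_ASID   3

/* Hypothetical stand-in for a BIT_ULL()-style helper. */
#define EX_BIT_ULL(nr) (1ULL << (nr))

/* Buggy: uses the bit number 3 (binary 11) as if it were a mask. */
static bool ex_has_asid_buggy(uint64_t features)
{
    return (features & EX_F_IOTLB_ASID) != 0;
}

/* Fixed: converts the bit number into a single-bit mask first. */
static bool ex_has_asid_fixed(uint64_t features)
{
    return (features & EX_BIT_ULL(EX_F_IOTLB_ASID)) != 0;
}
```

With only MSG_V2 and BATCH set (features == 0x6), the buggy check still
reports ASID as present because 0x6 & 3 is non-zero, while the fixed
check correctly reports it as absent.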

Fixes: c1a1008685 ("vdpa: always start CVQ in SVQ mode if possible")
Signed-off-by: Eugenio Pérez 
Reviewed-by: Michael S. Tsirkin 
Acked-by: Jason Wang 
Signed-off-by: Jason Wang 
---
 net/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 1a13a34..de5ed8f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -384,7 +384,7 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
 g_strerror(errno), errno);
 return -1;
 }
-if (!(backend_features & VHOST_BACKEND_F_IOTLB_ASID) ||
+if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) ||
 !vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
 return 0;
 }
-- 
2.7.4

[PULL V3 8/9] net: stream: add a new option to automatically reconnect

2023-02-16 Thread Jason Wang

From: Laurent Vivier 

In stream mode, if the server shuts down there is currently
no way to reconnect the client to a new server without removing
the NIC device and the netdev backend (or to reboot).

This patch introduces a reconnect option that specifies a delay
to try to reconnect with the same parameters.
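
The arming rule can be sketched in isolation, without GLib. This is a
simplified model, not the patch itself: ExStream, ex_arm_reconnect() and
ex_timer_fired() are hypothetical names, and the timer is reduced to a
non-zero tag. A retry is scheduled only when a delay is configured and
no retry is already pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ExStream {
    uint32_t reconnect;   /* delay in seconds; 0 disables reconnecting */
    unsigned timer_tag;   /* non-zero while a retry is pending */
    unsigned next_tag;    /* source of fresh, non-zero tags */
} ExStream;

/* Returns true if a new retry was scheduled. */
static bool ex_arm_reconnect(ExStream *s)
{
    if (s->reconnect && s->timer_tag == 0) {
        s->timer_tag = ++s->next_tag;
        return true;
    }
    return false;
}

/* Models the timer callback: the pending retry is consumed. */
static void ex_timer_fired(ExStream *s)
{
    s->timer_tag = 0;
}
```

Because both the disconnect path and the failed-connect path may try to
arm the timer, the timer_tag check keeps the two from stacking retries.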

Add a new test in qtest to test the reconnect option and the
connect/disconnect events.

Signed-off-by: Laurent Vivier 
Signed-off-by: Jason Wang 
---
 net/stream.c|  53 ++-
 qapi/net.json   |   7 ++-
 qemu-options.hx |   6 +--
 tests/qtest/netdev-socket.c | 101 
 4 files changed, 162 insertions(+), 5 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 37ff727..9204b4c 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -39,6 +39,8 @@
 #include "io/channel-socket.h"
 #include "io/net-listener.h"
 #include "qapi/qapi-events-net.h"
+#include "qapi/qapi-visit-sockets.h"
+#include "qapi/clone-visitor.h"
 
 typedef struct NetStreamState {
 NetClientState nc;
@@ -49,11 +51,15 @@ typedef struct NetStreamState {
 guint ioc_write_tag;
 SocketReadState rs;
 unsigned int send_index;  /* number of bytes sent*/
+uint32_t reconnect;
+guint timer_tag;
+SocketAddress *addr;
 } NetStreamState;
 
 static void net_stream_listen(QIONetListener *listener,
   QIOChannelSocket *cioc,
   void *opaque);
+static void net_stream_arm_reconnect(NetStreamState *s);
 
 static gboolean net_stream_writable(QIOChannel *ioc,
 GIOCondition condition,
@@ -170,6 +176,7 @@ static gboolean net_stream_send(QIOChannel *ioc,
 qemu_set_info_str(&s->nc, "%s", "");
 
 qapi_event_send_netdev_stream_disconnected(s->nc.name);
+net_stream_arm_reconnect(s);
 
 return G_SOURCE_REMOVE;
 }
@@ -187,6 +194,14 @@ static gboolean net_stream_send(QIOChannel *ioc,
 static void net_stream_cleanup(NetClientState *nc)
 {
 NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
+if (s->timer_tag) {
+g_source_remove(s->timer_tag);
+s->timer_tag = 0;
+}
+if (s->addr) {
+qapi_free_SocketAddress(s->addr);
+s->addr = NULL;
+}
 if (s->ioc) {
 if (QIO_CHANNEL_SOCKET(s->ioc)->fd != -1) {
 if (s->ioc_read_tag) {
@@ -346,12 +361,37 @@ static void net_stream_client_connected(QIOTask *task, gpointer opaque)
 error:
 object_unref(OBJECT(s->ioc));
 s->ioc = NULL;
+net_stream_arm_reconnect(s);
+}
+
+static gboolean net_stream_reconnect(gpointer data)
+{
+NetStreamState *s = data;
+QIOChannelSocket *sioc;
+
+s->timer_tag = 0;
+
+sioc = qio_channel_socket_new();
+s->ioc = QIO_CHANNEL(sioc);
+qio_channel_socket_connect_async(sioc, s->addr,
+ net_stream_client_connected, s,
+ NULL, NULL);
+return G_SOURCE_REMOVE;
+}
+
+static void net_stream_arm_reconnect(NetStreamState *s)
+{
+if (s->reconnect && s->timer_tag == 0) {
+s->timer_tag = g_timeout_add_seconds(s->reconnect,
+ net_stream_reconnect, s);
+}
 }
 
 static int net_stream_client_init(NetClientState *peer,
   const char *model,
   const char *name,
   SocketAddress *addr,
+  uint32_t reconnect,
   Error **errp)
 {
 NetStreamState *s;
@@ -364,6 +404,10 @@ static int net_stream_client_init(NetClientState *peer,
 s->ioc = QIO_CHANNEL(sioc);
 s->nc.link_down = true;
 
+s->reconnect = reconnect;
+if (reconnect) {
+s->addr = QAPI_CLONE(SocketAddress, addr);
+}
 qio_channel_socket_connect_async(sioc, addr,
  net_stream_client_connected, s,
  NULL, NULL);
@@ -380,7 +424,14 @@ int net_init_stream(const Netdev *netdev, const char *name,
 sock = &netdev->u.stream;
 
 if (!sock->has_server || !sock->server) {
-return net_stream_client_init(peer, "stream", name, sock->addr, errp);
+return net_stream_client_init(peer, "stream", name, sock->addr,
+  sock->has_reconnect ? sock->reconnect : 0,
+  errp);
+}
+if (sock->has_reconnect) {
+error_setg(errp, "'reconnect' option is incompatible with "
+ "socket in server mode");
+return -1;
 }
 return net_stream_server_init(peer, "stream", name, sock->addr, errp);
 }
diff --git a/qapi/net.json b/qapi/net.json
index 522ac58..d6eb300 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -585,6 +585,10 @@
 # @addr: socket address to listen on (server=true)
 #or connect to (server=

[PULL V3 3/9] net: Replace "Supported NIC models" with "Available NIC models"

2023-02-16 Thread Jason Wang

From: Thomas Huth 

Just because a NIC model is compiled into the QEMU binary does not
necessarily mean that it can be used with each and every machine.
So let's rather talk about "available" models instead of "supported"
models, just to avoid confusion.

Reviewed-by: Claudio Fontana 
Signed-off-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 net/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/net.c b/net/net.c
index e8cd95c..ebc7ce0 100644
--- a/net/net.c
+++ b/net/net.c
@@ -941,7 +941,7 @@ int qemu_show_nic_models(const char *arg, const char *const *models)
 return 0;
 }
 
-printf("Supported NIC models:\n");
+printf("Available NIC models:\n");
 for (i = 0 ; models[i]; i++) {
 printf("%s\n", models[i]);
 }
-- 
2.7.4

[PULL V3 5/9] hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value

2023-02-16 Thread Jason Wang

From: Fiona Ebner 

Currently, VMXNET3_MAX_MTU itself (being 9000) is not considered a
valid value for the MTU, but a guest running ESXi 7.0 might try to
set it and fail the assert [0].

In the Linux kernel, dev->max_mtu itself is a valid value for the MTU
and for the vmxnet3 driver it's 9000, so a guest running Linux will
also fail the assert when trying to set an MTU of 9000.

VMXNET3_MAX_MTU and s->mtu don't seem to be used in relation to buffer
allocations/accesses, so allowing the upper limit itself as a value
should be fine.
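
A small sketch of the off-by-one, assuming a lower bound of 60 purely
for illustration (only the 9000 upper bound is stated above); the EX_*
names and helper functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MIN_MTU 60    /* assumed lower bound, for illustration only */
#define EX_MAX_MTU 9000  /* upper bound quoted in the commit message */

/* Old check: the upper limit itself is rejected. */
static bool ex_mtu_valid_old(uint32_t mtu)
{
    return EX_MIN_MTU <= mtu && mtu < EX_MAX_MTU;
}

/* New check: the upper limit is an allowed value. */
static bool ex_mtu_valid_new(uint32_t mtu)
{
    return EX_MIN_MTU <= mtu && mtu <= EX_MAX_MTU;
}
```

An MTU of exactly 9000 fails the old predicate and passes the new one;
9001 is rejected by both.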

[0]: https://forum.proxmox.com/threads/114011/

Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during activate (CVE-2021-20203)")
Signed-off-by: Fiona Ebner 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index d2ab527..56559cd 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1441,7 +1441,7 @@ static void vmxnet3_activate_device(VMXNET3State *s)
 vmxnet3_setup_rx_filtering(s);
 /* Cache fields from shared memory */
 s->mtu = VMXNET3_READ_DRV_SHARED32(d, s->drv_shmem, devRead.misc.mtu);
-assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu < VMXNET3_MAX_MTU);
+assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu <= VMXNET3_MAX_MTU);
 VMW_CFPRN("MTU is %u", s->mtu);
 
 s->max_rx_frags =
-- 
2.7.4

[PULL V3 1/9] net: Move the code to collect available NIC models to a separate function

2023-02-16 Thread Jason Wang

From: Thomas Huth 

The code that collects the available NIC models is not really specific
to PCI anymore and will be required in the next patch, too, so let's
move this into a new separate function in net.c instead.

Signed-off-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 hw/pci/pci.c  | 29 +
 include/net/net.h | 14 ++
 net/net.c | 34 ++
 3 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 208c16f..cc51f98 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1789,7 +1789,6 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus,
const char *default_devaddr)
 {
 const char *devaddr = nd->devaddr ? nd->devaddr : default_devaddr;
-GSList *list;
 GPtrArray *pci_nic_models;
 PCIBus *bus;
 PCIDevice *pci_dev;
@@ -1804,33 +1803,7 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus,
 nd->model = g_strdup("virtio-net-pci");
 }
 
-list = object_class_get_list_sorted(TYPE_PCI_DEVICE, false);
-pci_nic_models = g_ptr_array_new();
-while (list) {
-DeviceClass *dc = OBJECT_CLASS_CHECK(DeviceClass, list->data,
- TYPE_DEVICE);
-GSList *next;
-if (test_bit(DEVICE_CATEGORY_NETWORK, dc->categories) &&
-dc->user_creatable) {
-const char *name = object_class_get_name(list->data);
-/*
- * A network device might also be something else than a NIC, see
- * e.g. the "rocker" device. Thus we have to look for the "netdev"
- * property, too. Unfortunately, some devices like virtio-net only
- * create this property during instance_init, so we have to create
- * a temporary instance here to be able to check it.
- */
-Object *obj = object_new_with_class(OBJECT_CLASS(dc));
-if (object_property_find(obj, "netdev")) {
-g_ptr_array_add(pci_nic_models, (gpointer)name);
-}
-object_unref(obj);
-}
-next = list->next;
-g_slist_free_1(list);
-list = next;
-}
-g_ptr_array_add(pci_nic_models, NULL);
+pci_nic_models = qemu_get_nic_models(TYPE_PCI_DEVICE);
 
 if (qemu_show_nic_models(nd->model, (const char **)pci_nic_models->pdata)) {
 exit(0);
diff --git a/include/net/net.h b/include/net/net.h
index fad589c..1d88621 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -203,6 +203,20 @@ void net_socket_rs_init(SocketReadState *rs,
 bool vnet_hdr);
 NetClientState *qemu_get_peer(NetClientState *nc, int queue_index);
 
+/**
+ * qemu_get_nic_models:
+ * @device_type: Defines which devices should be taken into consideration
+ *   (e.g. TYPE_DEVICE for all devices, or TYPE_PCI_DEVICE for PCI)
+ *
+ * Get an array of pointers to names of NIC devices that are available in
+ * the QEMU binary. The array is terminated with a NULL pointer entry.
+ * The caller is responsible for freeing the memory when it is not required
+ * anymore, e.g. with g_ptr_array_free(..., true).
+ *
+ * Returns: Pointer to the array that contains the pointers to the names.
+ */
+GPtrArray *qemu_get_nic_models(const char *device_type);
+
 /* NIC info */
 
 #define MAX_NICS 8
diff --git a/net/net.c b/net/net.c
index 251fc5a..476a4b7 100644
--- a/net/net.c
+++ b/net/net.c
@@ -899,6 +899,40 @@ static int nic_get_free_idx(void)
 return -1;
 }
 
+GPtrArray *qemu_get_nic_models(const char *device_type)
+{
+GPtrArray *nic_models = g_ptr_array_new();
+GSList *list = object_class_get_list_sorted(device_type, false);
+
+while (list) {
+DeviceClass *dc = OBJECT_CLASS_CHECK(DeviceClass, list->data,
+ TYPE_DEVICE);
+GSList *next;
+if (test_bit(DEVICE_CATEGORY_NETWORK, dc->categories) &&
+dc->user_creatable) {
+const char *name = object_class_get_name(list->data);
+/*
+ * A network device might also be something else than a NIC, see
+ * e.g. the "rocker" device. Thus we have to look for the "netdev"
+ * property, too. Unfortunately, some devices like virtio-net only
+ * create this property during instance_init, so we have to create
+ * a temporary instance here to be able to check it.
+ */
+Object *obj = object_new_with_class(OBJECT_CLASS(dc));
+if (object_property_find(obj, "netdev")) {
+g_ptr_array_add(nic_models, (gpointer)name);
+}
+object_unref(obj);
+}
+next = list->next;
+g_slist_free_1(list);
+list = next;
+}
+g_ptr_array_add(nic_models, NULL);
+
+return nic_models;
+}
+
 int qemu_show_nic_models(const char *arg, const char *const *models)
 {

[PULL V3 7/9] vmnet: stop recieving events when VM is stopped

2023-02-16 Thread Jason Wang

From: Joelle van Dyne 

When the VM is stopped using the HMP command "stop", soon the handler will
stop reading from the vmnet interface. This causes a flood of
`VMNET_INTERFACE_PACKETS_AVAILABLE` events to arrive and puts the host CPU
at 100%. We fix this by removing the event handler from vmnet when the VM
is no longer in a running state and restore it when we return to a running
state.
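
The register/unregister idea can be modelled in plain C. This is only a
sketch under assumptions: ExIface and the ex_* functions are
hypothetical and do not use the vmnet framework; the callback slot
stands in for the vmnet event handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ExIface ExIface;
typedef void (*ex_event_cb)(ExIface *iface);

struct ExIface {
    ex_event_cb cb;        /* NULL while the VM is stopped */
    int events_delivered;  /* counts handler invocations */
};

/* Stand-in for the "packets available" handler. */
static void ex_on_packets(ExIface *iface)
{
    iface->events_delivered++;
}

/* Called on every run-state change: install or remove the handler. */
static void ex_vm_state_change(ExIface *iface, bool running)
{
    iface->cb = running ? ex_on_packets : NULL;
}

/* Models the framework raising an event; dropped if no handler is set. */
static void ex_raise_event(ExIface *iface)
{
    if (iface->cb) {
        iface->cb(iface);
    }
}
```

While the handler is removed, raised events do no work on the QEMU
side, which is the effect the commit message describes.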

Signed-off-by: Joelle van Dyne 
Signed-off-by: Jason Wang 
---
 net/vmnet-common.m | 48 +++-
 net/vmnet_int.h|  2 ++
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/net/vmnet-common.m b/net/vmnet-common.m
index 2cb60b9..2958283 100644
--- a/net/vmnet-common.m
+++ b/net/vmnet-common.m
@@ -17,6 +17,7 @@
 #include "clients.h"
 #include "qemu/error-report.h"
 #include "qapi/error.h"
+#include "sysemu/runstate.h"
 
 #include 
 #include 
@@ -242,6 +243,35 @@ static void vmnet_bufs_init(VmnetState *s)
 }
 }
 
+/**
+ * Called on state change to un-register/re-register handlers
+ */
+static void vmnet_vm_state_change_cb(void *opaque, bool running, RunState state)
+{
+VmnetState *s = opaque;
+
+if (running) {
+vmnet_interface_set_event_callback(
+s->vmnet_if,
+VMNET_INTERFACE_PACKETS_AVAILABLE,
+s->if_queue,
+^(interface_event_t event_id, xpc_object_t event) {
+assert(event_id == VMNET_INTERFACE_PACKETS_AVAILABLE);
+/*
+ * This function is being called from a non qemu thread, so
+ * we only schedule a BH, and do the rest of the io completion
+ * handling from vmnet_send_bh() which runs in a qemu context.
+ */
+qemu_bh_schedule(s->send_bh);
+});
+} else {
+vmnet_interface_set_event_callback(
+s->vmnet_if,
+VMNET_INTERFACE_PACKETS_AVAILABLE,
+NULL,
+NULL);
+}
+}
 
 int vmnet_if_create(NetClientState *nc,
 xpc_object_t if_desc,
@@ -329,19 +359,9 @@ int vmnet_if_create(NetClientState *nc,
 s->packets_send_current_pos = 0;
 s->packets_send_end_pos = 0;
 
-vmnet_interface_set_event_callback(
-s->vmnet_if,
-VMNET_INTERFACE_PACKETS_AVAILABLE,
-s->if_queue,
-^(interface_event_t event_id, xpc_object_t event) {
-assert(event_id == VMNET_INTERFACE_PACKETS_AVAILABLE);
-/*
- * This function is being called from a non qemu thread, so
- * we only schedule a BH, and do the rest of the io completion
- * handling from vmnet_send_bh() which runs in a qemu context.
- */
-qemu_bh_schedule(s->send_bh);
-});
+vmnet_vm_state_change_cb(s, 1, RUN_STATE_RUNNING);
+
+s->change = qemu_add_vm_change_state_handler(vmnet_vm_state_change_cb, s);
 
 return 0;
 }
@@ -356,6 +376,8 @@ void vmnet_cleanup_common(NetClientState *nc)
 return;
 }
 
+vmnet_vm_state_change_cb(s, 0, RUN_STATE_SHUTDOWN);
+qemu_del_vm_change_state_handler(s->change);
 if_stopped_sem = dispatch_semaphore_create(0);
 vmnet_stop_interface(
 s->vmnet_if,
diff --git a/net/vmnet_int.h b/net/vmnet_int.h
index d0b9059..a8a033d 100644
--- a/net/vmnet_int.h
+++ b/net/vmnet_int.h
@@ -45,6 +45,8 @@ typedef struct VmnetState {
 int packets_send_end_pos;
 
 struct iovec iov_buf[VMNET_PACKETS_LIMIT];
+
+VMChangeStateEntry *change;
 } VmnetState;
 
 const char *vmnet_status_map_str(vmnet_return_t status);
-- 
2.7.4

[PULL V3 4/9] hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort

2023-02-16 Thread Jason Wang

From: Qiang Liu 

This patch replaces hw_error with a guest error log for [read|write]b
accesses when mode_16bit is enabled. This avoids aborting QEMU.

Fixes: 1248f8d4cbc3 ("hw/lan9118: Add basic 16-bit mode support.")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1433
Reported-by: Qiang Liu 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Qiang Liu 
Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 hw/net/lan9118.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
index f1cba55..e5c4af1 100644
--- a/hw/net/lan9118.c
+++ b/hw/net/lan9118.c
@@ -15,7 +15,6 @@
 #include "migration/vmstate.h"
 #include "net/net.h"
 #include "net/eth.h"
-#include "hw/hw.h"
 #include "hw/irq.h"
 #include "hw/net/lan9118.h"
 #include "hw/ptimer.h"
@@ -32,12 +31,8 @@
 #ifdef DEBUG_LAN9118
 #define DPRINTF(fmt, ...) \
 do { printf("lan9118: " fmt , ## __VA_ARGS__); } while (0)
-#define BADF(fmt, ...) \
-do { hw_error("lan9118: error: " fmt , ## __VA_ARGS__);} while (0)
 #else
 #define DPRINTF(fmt, ...) do {} while(0)
-#define BADF(fmt, ...) \
-do { fprintf(stderr, "lan9118: error: " fmt , ## __VA_ARGS__);} while (0)
 #endif
 
 /* The tx and rx fifo ports are a range of aliased 32-bit registers */
@@ -848,7 +843,8 @@ static uint32_t do_phy_read(lan9118_state *s, int reg)
 case 30: /* Interrupt mask */
 return s->phy_int_mask;
 default:
-BADF("PHY read reg %d\n", reg);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "do_phy_read: PHY read reg %d\n", reg);
 return 0;
 }
 }
@@ -876,7 +872,8 @@ static void do_phy_write(lan9118_state *s, int reg, uint32_t val)
 phy_update_irq(s);
 break;
 default:
-BADF("PHY write reg %d = 0x%04x\n", reg, val);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "do_phy_write: PHY write reg %d = 0x%04x\n", reg, val);
 }
 }
 
@@ -1209,7 +1206,8 @@ static void lan9118_16bit_mode_write(void *opaque, hwaddr offset,
 return;
 }
 
-hw_error("lan9118_write: Bad size 0x%x\n", size);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "lan9118_16bit_mode_write: Bad size 0x%x\n", size);
 }
 
 static uint64_t lan9118_readl(void *opaque, hwaddr offset,
@@ -1324,7 +1322,8 @@ static uint64_t lan9118_16bit_mode_read(void *opaque, hwaddr offset,
 return lan9118_readl(opaque, offset, size);
 }
 
-hw_error("lan9118_read: Bad size 0x%x\n", size);
+qemu_log_mask(LOG_GUEST_ERROR,
+  "lan9118_16bit_mode_read: Bad size 0x%x\n", size);
 return 0;
 }
 
-- 
2.7.4

[PULL V3 6/9] net: Increase L2TPv3 buffer to fit jumboframes

2023-02-16 Thread Jason Wang

From: Christian Svensson 

Increase the allocated buffer size to fit larger packets.
Given that jumboframes can commonly be up to 9000 bytes the closest suitable
value seems to be 16 KiB.
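
One way to arrive at that number, assuming "suitable" means the next
power of two at or above the frame size (the commit message does not
say so explicitly), is sketched below; ex_next_pow2() is a hypothetical
helper.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is greater than or equal to n (n >= 1). */
static size_t ex_next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n) {
        p <<= 1;
    }
    return p;
}
```

Under that assumption, a 9000-byte frame no longer fits the old
2048-byte buffer, 8192 is still too small, and 16384 is the first power
of two that holds it.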

Tested by running qemu towards a Linux L2TPv3 endpoint and pushing
jumboframe traffic through the interfaces.

Signed-off-by: Christian Svensson 
Signed-off-by: Jason Wang 
---
 net/l2tpv3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 53b2d32..b5547cb 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -42,7 +42,7 @@
  */
 
 #define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
-#define BUFFER_SIZE 2048
+#define BUFFER_SIZE 16384
 #define IOVSIZE 2
 #define MAX_L2TPV3_MSGCNT 64
 #define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
-- 
2.7.4

[PULL V3 2/9] net: Restore printing of the help text with "-nic help"

2023-02-16 Thread Jason Wang

From: Thomas Huth 

Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions
(it showed the available netdev backends), but this feature got broken during
some refactoring in version 6.0. Let's restore the old behavior, and while
we're at it, let's also print the available NIC models here now since this
option can be used to configure both, netdev backend and model in one go.

Fixes: ad6f932fe8 ("net: do not exit on "netdev_add help" monitor command")
Signed-off-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 net/net.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/net.c b/net/net.c
index 476a4b7..e8cd95c 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1542,8 +1542,18 @@ static int net_param_nic(void *dummy, QemuOpts *opts, Error **errp)
 const char *type;
 
 type = qemu_opt_get(opts, "type");
-if (type && g_str_equal(type, "none")) {
-return 0;/* Nothing to do, default_net is cleared in vl.c */
+if (type) {
+if (g_str_equal(type, "none")) {
+return 0;/* Nothing to do, default_net is cleared in vl.c */
+}
+if (is_help_option(type)) {
+GPtrArray *nic_models = qemu_get_nic_models(TYPE_DEVICE);
+show_netdevs();
+printf("\n");
+qemu_show_nic_models(type, (const char **)nic_models->pdata);
+g_ptr_array_free(nic_models, true);
+exit(0);
+}
 }
 
 idx = nic_get_free_idx();
-- 
2.7.4

[PULL V3 0/9] Net patches

2023-02-16 Thread Jason Wang

The following changes since commit 6dffbe36af79e26a4d23f94a9a1c1201de99c261:

  Merge tag 'migration-20230215-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-02-16 13:09:51 +)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 525ae115222f0b0b6de7f9665976f640d18c200a:

  vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check (2023-02-17 13:31:33 +0800)



Changes since V2:
- drop patch hw/net/can/xlnx-zynqmp-can: fix assertion failures in transfer_fifo()

Changes since V1:
- Fix the wrong guest error detection in xlnx-zynqmp-can


Christian Svensson (1):
  net: Increase L2TPv3 buffer to fit jumboframes

Eugenio Pérez (1):
  vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check

Fiona Ebner (1):
  hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value

Joelle van Dyne (1):
  vmnet: stop recieving events when VM is stopped

Laurent Vivier (1):
  net: stream: add a new option to automatically reconnect

Qiang Liu (1):
  hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort

Thomas Huth (3):
  net: Move the code to collect available NIC models to a separate function
  net: Restore printing of the help text with "-nic help"
  net: Replace "Supported NIC models" with "Available NIC models"

 hw/net/lan9118.c|  17 
 hw/net/vmxnet3.c|   2 +-
 hw/pci/pci.c|  29 +
 include/net/net.h   |  14 ++
 net/l2tpv3.c|   2 +-
 net/net.c   |  50 --
 net/stream.c|  53 ++-
 net/vhost-vdpa.c|   2 +-
 net/vmnet-common.m  |  48 +++--
 net/vmnet_int.h |   2 +
 qapi/net.json   |   7 ++-
 qemu-options.hx |   6 +--
 tests/qtest/netdev-socket.c | 101 
 13 files changed, 272 insertions(+), 61 deletions(-)

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread Richard Henderson


On 2/16/23 11:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct from the RISCVCPU, having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.

Signed-off-by: Daniel Henrique Barboza
---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)


Reviewed-by: Richard Henderson 

r~

TCG asserts on some of translation blocks with plugin memory callback

2023-02-16 Thread Mikhail Tyutin

Hello,

I have been testing a TCG plugin patch on the latest QEMU build and noticed
that it fails with an assert on some applications.

   ERROR:../accel/tcg/cpu-exec.c:983:cpu_exec_loop:
   assertion failed: (cpu->plugin_mem_cbs == ((void *)0))

It happens when a TCG plugin sets a memory callback in some translation blocks.
The callback can be empty; it just needs to be there. Debugging further, I
see the inject_mem_enable_helper() and inject_mem_disable_helper() functions,
which are intended to set and reset cpu->plugin_mem_cbs to the appropriate value.

The problem is that the inject_mem_disable_helper() part gets removed inside
the reachable_code_pass() function. As a result we see this assert (the pointer
is not set to NULL at the end of the translation block, as it expects). Here is
the op listing just before the reachable_code_pass() call:

  ext32u_i64 rcx,tmp3
  add_i64 rip,rip,$0xa
  goto_tb $0x0
  exit_tb $0x7fff64013300
  mov_i64 tmp11,$0x0; this is a part
  st_i64 $0x0,env,$0xf540   ; of inject_mem_disable_helper()
  set_label $L0
  exit_tb $0x7fff64013303


reachable_code_pass() removes everything after exit_tb until it reaches
set_label op as ‘dead’ code, which seems to be correct.

The question is: how is this expected to work? Should inject_mem_disable_helper()
insert its zeroing OPs after “set_label $L0” or before “goto_tb $0x0” operation
to avoid dead code block?
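
To illustrate the behaviour described above, here is a heavily
simplified model of such a pass. It is not the TCG implementation (the
real pass also handles branches and label reference counts); ExOpKind
and ex_remove_unreachable() are hypothetical names. Everything after an
unconditional exit is dropped until the next label.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    EX_OP_OTHER,      /* any ordinary op (mov, st, add, goto_tb, ...) */
    EX_OP_EXIT_TB,    /* unconditional exit from the block */
    EX_OP_SET_LABEL   /* a label: code after it is reachable again */
} ExOpKind;

/*
 * Compacts ops[] in place, removing ops that follow an exit and precede
 * the next label. Returns the number of ops kept.
 */
static size_t ex_remove_unreachable(ExOpKind *ops, size_t n)
{
    size_t out = 0;
    int dead = 0;

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == EX_OP_SET_LABEL) {
            dead = 0;           /* a label ends the dead region */
        }
        if (dead) {
            continue;           /* unreachable: drop it */
        }
        ops[out++] = ops[i];
        if (ops[i] == EX_OP_EXIT_TB) {
            dead = 1;           /* nothing after an exit can run */
        }
    }
    return out;
}
```

Applied to the eight ops listed above, the two ops between the first
exit_tb and set_label (the mov_i64 and st_i64 that zero the pointer)
are exactly the ones removed.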

Re: [PULL V2 00/10] Net patches

2023-02-16 Thread Jason Wang

On Thu, Feb 16, 2023 at 4:00 PM Philippe Mathieu-Daudé
 wrote:
>
> Hi Jason,
>
> On 16/2/23 06:24, Jason Wang wrote:
> > The following changes since commit 6a50f64ca01d0a7b97f14f069762bfd88160f31e:
> >
> >Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into staging (2023-02-14 14:46:10 +)
> >
> > are available in the git repository at:
> >
> >https://github.com/jasowang/qemu.git tags/net-pull-request
> >
> > for you to fetch changes up to 5e53a346d8bd2bd22522e1e7abd8f122673e4adf:
> >
> >vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check (2023-02-16 13:17:57 +0800)
> >
> > 
> >
> > Changes since V1:
> > - Fix the wrong guest error detection in xlnx-zynqmp-can
> >
> > 
> > Christian Svensson (1):
> >net: Increase L2TPv3 buffer to fit jumboframes
> >
> > Eugenio Pérez (1):
> >vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check
> >
> > Fiona Ebner (1):
> >hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value
> >
> > Joelle van Dyne (1):
> >vmnet: stop recieving events when VM is stopped
> >
> > Laurent Vivier (1):
> >net: stream: add a new option to automatically reconnect
> >
> > Qiang Liu (2):
> >hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort
> >hw/net/can/xlnx-zynqmp-can: fix assertion failures in transfer_fifo()
>
> Can you have a look at this comment from v1?
> https://lore.kernel.org/qemu-devel/572fcb76-b2f7-20ca-0701-e22dd4e4c...@linaro.org/

For some reason, I missed this.

I will drop this patch from the pull request now.

Thanks

>

[PULL 10/10] docs/fuzz: remove mentions of fork-based fuzzing

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 docs/devel/fuzzing.rst | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst
index 715330c856..3bfcb33fc4 100644
--- a/docs/devel/fuzzing.rst
+++ b/docs/devel/fuzzing.rst
@@ -19,11 +19,6 @@ responsibility to ensure that state is reset between 
fuzzing-runs.
 Building the fuzzers
 
 
-*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
-much faster, since the page-map has a smaller size. This is due to the fact 
that
-AddressSanitizer maps ~20TB of memory, as part of its detection. This results
-in a large page-map, and a much slower ``fork()``.
-
 To build the fuzzers, install a recent version of clang:
 Configure with (substitute the clang binaries with the version you installed).
 Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
@@ -296,10 +291,9 @@ input. It is also responsible for manually calling 
``main_loop_wait`` to ensure
 that bottom halves are executed and any cleanup required before the next input.
 
 Since the same process is reused for many fuzzing runs, QEMU state needs to
-be reset at the end of each run. There are currently two implemented
-options for resetting state:
+be reset at the end of each run. For example, this can be done by rebooting the
+VM, after each run.
 
-- Reboot the guest between runs.
   - *Pros*: Straightforward and fast for simple fuzz targets.
 
   - *Cons*: Depending on the device, does not reset all device state. If the
@@ -308,15 +302,3 @@ options for resetting state:
 reboot.
 
   - *Example target*: ``i440fx-qtest-reboot-fuzz``
-
-- Run each test case in a separate forked process and copy the coverage
-   information back to the parent. This is fairly similar to AFL's "deferred"
-   fork-server mode [3]
-
-  - *Pros*: Relatively fast. Devices only need to be initialized once. No need 
to
-do slow reboots or vmloads.
-
-  - *Cons*: Not officially supported by libfuzzer. Does not work well for
- devices that rely on dedicated threads.
-
-  - *Example target*: ``virtio-net-fork-fuzz``
-- 
2.39.0

[PULL 07/10] fuzz/virtio-blk: remove fork-based fuzzer

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/virtio_blk_fuzz.c | 51 --
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_blk_fuzz.c 
b/tests/qtest/fuzz/virtio_blk_fuzz.c
index a9fb9ecf6c..651fd4f043 100644
--- a/tests/qtest/fuzz/virtio_blk_fuzz.c
+++ b/tests/qtest/fuzz/virtio_blk_fuzz.c
@@ -19,7 +19,6 @@
 #include "standard-headers/linux/virtio_pci.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
@@ -128,48 +127,24 @@ static void virtio_blk_fuzz(QTestState *s, 
QVirtioBlkQueues* queues,
 }
 }
 
-static void virtio_blk_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-QVirtioBlk *blk = fuzz_qos_obj;
-static QVirtioBlkQueues *queues;
-if (!queues) {
-queues = qvirtio_blk_init(blk->vdev, 0);
-}
-if (fork() == 0) {
-virtio_blk_fuzz(s, queues, Data, Size);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static void virtio_blk_with_flag_fuzz(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
 QVirtioBlk *blk = fuzz_qos_obj;
 static QVirtioBlkQueues *queues;
 
-if (fork() == 0) {
-if (Size >= sizeof(uint64_t)) {
-queues = qvirtio_blk_init(blk->vdev, *(uint64_t *)Data);
-virtio_blk_fuzz(s, queues,
- Data + sizeof(uint64_t), Size - sizeof(uint64_t));
-flush_events(s);
-}
-_Exit(0);
-} else {
+if (Size >= sizeof(uint64_t)) {
+queues = qvirtio_blk_init(blk->vdev, *(uint64_t *)Data);
+virtio_blk_fuzz(s, queues,
+Data + sizeof(uint64_t), Size - sizeof(uint64_t));
 flush_events(s);
-wait(NULL);
 }
+fuzz_reset(s);
 }
 
 static void virtio_blk_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void drive_destroy(void *path)
@@ -208,22 +183,10 @@ static void *virtio_blk_test_setup(GString *cmd_line, 
void *arg)
 
 static void register_virtio_blk_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-blk-fuzz",
-.description = "Fuzz the virtio-blk virtual queues, forking "
-"for each fuzz run",
-.pre_vm_init = &counter_shm_init,
-.pre_fuzz = &virtio_blk_pre_fuzz,
-.fuzz = virtio_blk_fork_fuzz,},
-"virtio-blk",
-&(QOSGraphTestOptions){.before = virtio_blk_test_setup}
-);
-
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-blk-flags-fuzz",
-.description = "Fuzz the virtio-blk virtual queues, forking "
-"for each fuzz run (also fuzzes the virtio flags)",
-.pre_vm_init = &counter_shm_init,
+.description = "Fuzz the virtio-blk virtual queues. "
+"Also fuzzes the virtio flags)",
 .pre_fuzz = &virtio_blk_pre_fuzz,
 .fuzz = virtio_blk_with_flag_fuzz,},
 "virtio-blk",
-- 
2.39.0

[PULL 03/10] fuzz/generic-fuzz: use reboots instead of forks to reset state

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/generic_fuzz.c | 114 ++--
 1 file changed, 22 insertions(+), 92 deletions(-)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index 7326f6840b..f4acfa45cc 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -18,7 +18,6 @@
 #include "tests/qtest/libqtest.h"
 #include "tests/qtest/libqos/pci-pc.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "string.h"
 #include "exec/memory.h"
 #include "exec/ramblock.h"
@@ -29,6 +28,8 @@
 #include "generic_fuzz_configs.h"
 #include "hw/mem/sparse-mem.h"
 
+static void pci_enum(gpointer pcidev, gpointer bus);
+
 /*
  * SEPARATOR is used to separate "operations" in the fuzz input
  */
@@ -47,7 +48,6 @@ enum cmds {
 OP_CLOCK_STEP,
 };
 
-#define DEFAULT_TIMEOUT_US 10
 #define USEC_IN_SEC 10
 
 #define MAX_DMA_FILL_SIZE 0x1
@@ -60,8 +60,6 @@ typedef struct {
 ram_addr_t size; /* The number of bytes until the end of the I/O region */
 } address_range;
 
-static useconds_t timeout = DEFAULT_TIMEOUT_US;
-
 static bool qtest_log_enabled;
 
 MemoryRegion *sparse_mem_mr;
@@ -589,30 +587,6 @@ static void op_disable_pci(QTestState *s, const unsigned 
char *data, size_t len)
 pci_disabled = true;
 }
 
-static void handle_timeout(int sig)
-{
-if (qtest_log_enabled) {
-fprintf(stderr, "[Timeout]\n");
-fflush(stderr);
-}
-
-/*
- * If there is a crash, libfuzzer/ASAN forks a child to run an
- * "llvm-symbolizer" process for printing out a pretty stacktrace. It
- * communicates with this child using a pipe.  If we timeout+Exit, while
- * libfuzzer is still communicating with the llvm-symbolizer child, we will
- * be left with an orphan llvm-symbolizer process. Sometimes, this appears
- * to lead to a deadlock in the forkserver. Use waitpid to check if there
- * are any waitable children. If so, exit out of the signal-handler, and
- * let libfuzzer finish communicating with the child, and exit, on its own.
- */
-if (waitpid(-1, NULL, WNOHANG) == 0) {
-return;
-}
-
-_Exit(0);
-}
-
 /*
  * Here, we interpret random bytes from the fuzzer, as a sequence of commands.
  * Some commands can be variable-width, so we use a separator, SEPARATOR, to
@@ -669,64 +643,32 @@ static void generic_fuzz(QTestState *s, const unsigned 
char *Data, size_t Size)
 size_t cmd_len;
 uint8_t op;
 
-if (fork() == 0) {
-struct sigaction sact;
-struct itimerval timer;
-sigset_t set;
-/*
- * Sometimes the fuzzer will find inputs that take quite a long time to
- * process. Often times, these inputs do not result in new coverage.
- * Even if these inputs might be interesting, they can slow down the
- * fuzzer, overall. Set a timeout for each command to avoid hurting
- * performance, too much
- */
-if (timeout) {
-
-sigemptyset(&sact.sa_mask);
-sact.sa_flags   = SA_NODEFER;
-sact.sa_handler = handle_timeout;
-sigaction(SIGALRM, &sact, NULL);
+op_clear_dma_patterns(s, NULL, 0);
+pci_disabled = false;
 
-sigemptyset(&set);
-sigaddset(&set, SIGALRM);
-pthread_sigmask(SIG_UNBLOCK, &set, NULL);
-
-memset(&timer, 0, sizeof(timer));
-timer.it_value.tv_sec = timeout / USEC_IN_SEC;
-timer.it_value.tv_usec = timeout % USEC_IN_SEC;
-}
-
-op_clear_dma_patterns(s, NULL, 0);
-pci_disabled = false;
-
-while (cmd && Size) {
-/* Reset the timeout, each time we run a new command */
-if (timeout) {
-setitimer(ITIMER_REAL, &timer, NULL);
-}
+QPCIBus *pcibus = qpci_new_pc(s, NULL);
+g_ptr_array_foreach(fuzzable_pci_devices, pci_enum, pcibus);
+qpci_free_pc(pcibus);
 
-/* Get the length until the next command or end of input */
-nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
-cmd_len = nextcmd ? nextcmd - cmd : Size;
+while (cmd && Size) {
+/* Get the length until the next command or end of input */
+nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
+cmd_len = nextcmd ? nextcmd - cmd : Size;
 
-if (cmd_len > 0) {
-/* Interpret the first byte of the command as an opcode */
-op = *cmd % (sizeof(ops) / sizeof((ops)[0]));
-ops[op](s, cmd + 1, cmd_len - 1);
+if (cmd_len > 0) {
+/* Interpret the first byte of the command as an opcode */
+op = *cmd % (sizeof(ops) / sizeof((ops)[0]));
+ops[op](s, cmd + 1, cmd_len - 1);
 
-/* Run the main loop */
-flush_events(s);
-}
-/* Advance to the next command */
-

[PULL 08/10] fuzz/i440fx: remove fork-based fuzzer

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/i440fx_fuzz.c | 27 +--
 1 file changed, 1 insertion(+), 26 deletions(-)

diff --git a/tests/qtest/fuzz/i440fx_fuzz.c b/tests/qtest/fuzz/i440fx_fuzz.c
index b17fc725df..155fe018f8 100644
--- a/tests/qtest/fuzz/i440fx_fuzz.c
+++ b/tests/qtest/fuzz/i440fx_fuzz.c
@@ -18,7 +18,6 @@
 #include "tests/qtest/libqos/pci-pc.h"
 #include "fuzz.h"
 #include "qos_fuzz.h"
-#include "fork_fuzz.h"
 
 
 #define I440FX_PCI_HOST_BRIDGE_CFG 0xcf8
@@ -89,6 +88,7 @@ static void i440fx_fuzz_qtest(QTestState *s,
   size_t Size)
 {
 ioport_fuzz_qtest(s, Data, Size);
+fuzz_reset(s);
 }
 
 static void pciconfig_fuzz_qos(QTestState *s, QPCIBus *bus,
@@ -145,17 +145,6 @@ static void i440fx_fuzz_qos(QTestState *s,
 pciconfig_fuzz_qos(s, bus, Data, Size);
 }
 
-static void i440fx_fuzz_qos_fork(QTestState *s,
-const unsigned char *Data, size_t Size) {
-if (fork() == 0) {
-i440fx_fuzz_qos(s, Data, Size);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static const char *i440fx_qtest_argv = TARGET_NAME " -machine accel=qtest"
" -m 0 -display none";
 static GString *i440fx_argv(FuzzTarget *t)
@@ -163,10 +152,6 @@ static GString *i440fx_argv(FuzzTarget *t)
 return g_string_new(i440fx_qtest_argv);
 }
 
-static void fork_init(void)
-{
-counter_shm_init();
-}
 
 static void register_pci_fuzz_targets(void)
 {
@@ -178,16 +163,6 @@ static void register_pci_fuzz_targets(void)
 .get_init_cmdline = i440fx_argv,
 .fuzz = i440fx_fuzz_qtest});
 
-/* Uses libqos and forks to prevent state leakage */
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "i440fx-qos-fork-fuzz",
-.description = "Fuzz the i440fx using raw qtest commands and "
-   "rebooting after each run",
-.pre_vm_init = &fork_init,
-.fuzz = i440fx_fuzz_qos_fork,},
-"i440FX-pcihost",
-&(QOSGraphTestOptions){}
-);
 
 /*
  * Uses libqos. Doesn't do anything to reset state. Note that if we were to
-- 
2.39.0

[PULL 05/10] fuzz/virtio-scsi: remove fork-based fuzzer

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/virtio_scsi_fuzz.c | 51 -
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_scsi_fuzz.c 
b/tests/qtest/fuzz/virtio_scsi_fuzz.c
index b3220ef6cb..b6268efd59 100644
--- a/tests/qtest/fuzz/virtio_scsi_fuzz.c
+++ b/tests/qtest/fuzz/virtio_scsi_fuzz.c
@@ -20,7 +20,6 @@
 #include "standard-headers/linux/virtio_pci.h"
 #include "standard-headers/linux/virtio_scsi.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 #define PCI_SLOT0x02
@@ -132,48 +131,24 @@ static void virtio_scsi_fuzz(QTestState *s, 
QVirtioSCSIQueues* queues,
 }
 }
 
-static void virtio_scsi_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-QVirtioSCSI *scsi = fuzz_qos_obj;
-static QVirtioSCSIQueues *queues;
-if (!queues) {
-queues = qvirtio_scsi_init(scsi->vdev, 0);
-}
-if (fork() == 0) {
-virtio_scsi_fuzz(s, queues, Data, Size);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static void virtio_scsi_with_flag_fuzz(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
 QVirtioSCSI *scsi = fuzz_qos_obj;
 static QVirtioSCSIQueues *queues;
 
-if (fork() == 0) {
-if (Size >= sizeof(uint64_t)) {
-queues = qvirtio_scsi_init(scsi->vdev, *(uint64_t *)Data);
-virtio_scsi_fuzz(s, queues,
- Data + sizeof(uint64_t), Size - sizeof(uint64_t));
-flush_events(s);
-}
-_Exit(0);
-} else {
+if (Size >= sizeof(uint64_t)) {
+queues = qvirtio_scsi_init(scsi->vdev, *(uint64_t *)Data);
+virtio_scsi_fuzz(s, queues,
+Data + sizeof(uint64_t), Size - sizeof(uint64_t));
 flush_events(s);
-wait(NULL);
 }
+fuzz_reset(s);
 }
 
 static void virtio_scsi_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void *virtio_scsi_test_setup(GString *cmd_line, void *arg)
@@ -189,22 +164,10 @@ static void *virtio_scsi_test_setup(GString *cmd_line, 
void *arg)
 
 static void register_virtio_scsi_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-scsi-fuzz",
-.description = "Fuzz the virtio-scsi virtual queues, forking "
-"for each fuzz run",
-.pre_vm_init = &counter_shm_init,
-.pre_fuzz = &virtio_scsi_pre_fuzz,
-.fuzz = virtio_scsi_fork_fuzz,},
-"virtio-scsi",
-&(QOSGraphTestOptions){.before = virtio_scsi_test_setup}
-);
-
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-scsi-flags-fuzz",
-.description = "Fuzz the virtio-scsi virtual queues, forking "
-"for each fuzz run (also fuzzes the virtio flags)",
-.pre_vm_init = &counter_shm_init,
+.description = "Fuzz the virtio-scsi virtual queues. "
+"Also fuzzes the virtio flags",
 .pre_fuzz = &virtio_scsi_pre_fuzz,
 .fuzz = virtio_scsi_with_flag_fuzz,},
 "virtio-scsi",
-- 
2.39.0

[PULL 09/10] fuzz: remove fork-fuzzing scaffolding

2023-02-16 Thread Alexander Bulekov

Fork-fuzzing provides a few pros, but our implementation prevents us
from using fuzzers other than libFuzzer, and may be causing issues such
as coverage-failure builds on OSS-Fuzz. It is not a great long-term
solution as it depends on internal implementation details of libFuzzer
(which is no longer in active development). Remove it in favor of other
methods of resetting state between inputs.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 meson.build   |  4 ---
 tests/qtest/fuzz/fork_fuzz.c  | 41 -
 tests/qtest/fuzz/fork_fuzz.h  | 23 --
 tests/qtest/fuzz/fork_fuzz.ld | 56 ---
 tests/qtest/fuzz/meson.build  |  6 ++--
 5 files changed, 3 insertions(+), 127 deletions(-)
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.c
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.h
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.ld

diff --git a/meson.build b/meson.build
index a76c855312..b6f92bba35 100644
--- a/meson.build
+++ b/meson.build
@@ -215,10 +215,6 @@ endif
 # Specify linker-script with add_project_link_arguments so that it is not 
placed
 # within a linker --start-group/--end-group pair
 if get_option('fuzzing')
-  add_project_link_arguments(['-Wl,-T,',
-  (meson.current_source_dir() / 
'tests/qtest/fuzz/fork_fuzz.ld')],
- native: false, language: all_languages)
-
   # Specify a filter to only instrument code that is directly related to
   # virtual-devices.
   configure_file(output: 'instrumentation-filter',
diff --git a/tests/qtest/fuzz/fork_fuzz.c b/tests/qtest/fuzz/fork_fuzz.c
deleted file mode 100644
index 6ffb2a7937..00
--- a/tests/qtest/fuzz/fork_fuzz.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Fork-based fuzzing helpers
- *
- * Copyright Red Hat Inc., 2019
- *
- * Authors:
- *  Alexander Bulekov   
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "fork_fuzz.h"
-
-
-void counter_shm_init(void)
-{
-/* Copy what's in the counter region to a temporary buffer.. */
-void *copy = malloc(&__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-memcpy(copy,
-   &__FUZZ_COUNTERS_START,
-   &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-
-/* Map a shared region over the counter region */
-if (mmap(&__FUZZ_COUNTERS_START,
- &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS,
- 0, 0) == MAP_FAILED) {
-perror("Error: ");
-exit(1);
-}
-
-/* Copy the original data back to the counter-region */
-memcpy(&__FUZZ_COUNTERS_START, copy,
-   &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-free(copy);
-}
-
-
diff --git a/tests/qtest/fuzz/fork_fuzz.h b/tests/qtest/fuzz/fork_fuzz.h
deleted file mode 100644
index 9ecb8b58ef..00
--- a/tests/qtest/fuzz/fork_fuzz.h
+++ /dev/null
@@ -1,23 +0,0 @@
-/*
- * Fork-based fuzzing helpers
- *
- * Copyright Red Hat Inc., 2019
- *
- * Authors:
- *  Alexander Bulekov   
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef FORK_FUZZ_H
-#define FORK_FUZZ_H
-
-extern uint8_t __FUZZ_COUNTERS_START;
-extern uint8_t __FUZZ_COUNTERS_END;
-
-void counter_shm_init(void);
-
-#endif
-
diff --git a/tests/qtest/fuzz/fork_fuzz.ld b/tests/qtest/fuzz/fork_fuzz.ld
deleted file mode 100644
index cfb88b7fdb..00
--- a/tests/qtest/fuzz/fork_fuzz.ld
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * We adjust linker script modification to place all of the stuff that needs to
- * persist across fuzzing runs into a contiguous section of memory. Then, it is
- * easy to re-map the counter-related memory as shared.
- */
-
-SECTIONS
-{
-  .data.fuzz_start : ALIGN(4K)
-  {
-  __FUZZ_COUNTERS_START = .;
-  __start___sancov_cntrs = .;
-  *(_*sancov_cntrs);
-  __stop___sancov_cntrs = .;
-
-  /* Lowest stack counter */
-  *(__sancov_lowest_stack);
-  }
-}
-INSERT AFTER .data;
-
-SECTIONS
-{
-  .data.fuzz_ordered :
-  {
-  /*
-   * Coverage counters. They're not necessary for fuzzing, but are useful
-   * for analyzing the fuzzing performance
-   */
-  __start___llvm_prf_cnts = .;
-  *(*llvm_prf_cnts);
-  __stop___llvm_prf_cnts = .;
-
-  /* Internal Libfuzzer TracePC object which contains the ValueProfileMap 
*/
-  FuzzerTracePC*(.bss*);
-  /*
-   * In case the above line fails, explicitly specify the (mangled) name of
-   * the object we care about
-   */
-   *(.bss._ZN6fuzzer3TPCE);
-  }
-}
-INSERT AFTER .data.fuzz_start;
-
-SECTIONS
-{
-  .data.fuzz_end : ALIGN(4K)
-  {
-  __FUZZ_COUNTERS_END = .;
-  }
-}
-/*
- * Don't overwrite the SECTIONS in the default linker script. Instead insert

[PULL 06/10] fuzz/virtio-net: remove fork-based fuzzer

2023-02-16 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/virtio_net_fuzz.c | 54 +++---
 1 file changed, 5 insertions(+), 49 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_net_fuzz.c 
b/tests/qtest/fuzz/virtio_net_fuzz.c
index c2c15f07f0..e239875e3b 100644
--- a/tests/qtest/fuzz/virtio_net_fuzz.c
+++ b/tests/qtest/fuzz/virtio_net_fuzz.c
@@ -16,7 +16,6 @@
 #include "tests/qtest/libqtest.h"
 #include "tests/qtest/libqos/virtio-net.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 
@@ -115,36 +114,18 @@ static void virtio_net_fuzz_multi(QTestState *s,
 }
 }
 
-static void virtio_net_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-if (fork() == 0) {
-virtio_net_fuzz_multi(s, Data, Size, false);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
 
-static void virtio_net_fork_fuzz_check_used(QTestState *s,
+static void virtio_net_fuzz_check_used(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
-if (fork() == 0) {
-virtio_net_fuzz_multi(s, Data, Size, true);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
+virtio_net_fuzz_multi(s, Data, Size, true);
+flush_events(s);
+fuzz_reset(s);
 }
 
 static void virtio_net_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void *virtio_net_test_setup_socket(GString *cmd_line, void *arg)
@@ -158,23 +139,8 @@ static void *virtio_net_test_setup_socket(GString 
*cmd_line, void *arg)
 return arg;
 }
 
-static void *virtio_net_test_setup_user(GString *cmd_line, void *arg)
-{
-g_string_append_printf(cmd_line, " -netdev user,id=hs0 ");
-return arg;
-}
-
 static void register_virtio_net_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-net-socket",
-.description = "Fuzz the virtio-net virtual queues. Fuzz incoming "
-"traffic using the socket backend",
-.pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz,},
-"virtio-net",
-&(QOSGraphTestOptions){.before = virtio_net_test_setup_socket}
-);
 
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-net-socket-check-used",
@@ -182,20 +148,10 @@ static void register_virtio_net_fuzz_targets(void)
 "descriptors to be used. Timeout may indicate improperly handled "
 "input",
 .pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz_check_used,},
+.fuzz = virtio_net_fuzz_check_used,},
 "virtio-net",
 &(QOSGraphTestOptions){.before = virtio_net_test_setup_socket}
 );
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-net-slirp",
-.description = "Fuzz the virtio-net virtual queues with the slirp "
-" backend. Warning: May result in network traffic emitted from the 
"
-" process. Run in an isolated network environment.",
-.pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz,},
-"virtio-net",
-&(QOSGraphTestOptions){.before = virtio_net_test_setup_user}
-);
 }
 
 fuzz_target_init(register_virtio_net_fuzz_targets);
-- 
2.39.0

[PULL 02/10] fuzz: add fuzz_reset API

2023-02-16 Thread Alexander Bulekov

As we are converting most fuzzers to rely on reboots to reset state,
introduce an API to make sure reboots are invoked in a consistent
manner.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/fuzz.c | 6 ++
 tests/qtest/fuzz/fuzz.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/fuzz/fuzz.c b/tests/qtest/fuzz/fuzz.c
index eb7520544b..3bedb81b32 100644
--- a/tests/qtest/fuzz/fuzz.c
+++ b/tests/qtest/fuzz/fuzz.c
@@ -51,6 +51,12 @@ void flush_events(QTestState *s)
 }
 }
 
+void fuzz_reset(QTestState *s)
+{
+qemu_system_reset(SHUTDOWN_CAUSE_GUEST_RESET);
+main_loop_wait(true);
+}
+
 static QTestState *qtest_setup(void)
 {
 qtest_server_set_send_handler(&qtest_client_inproc_recv, &fuzz_qts);
diff --git a/tests/qtest/fuzz/fuzz.h b/tests/qtest/fuzz/fuzz.h
index 327c1c5a55..21d1362d65 100644
--- a/tests/qtest/fuzz/fuzz.h
+++ b/tests/qtest/fuzz/fuzz.h
@@ -103,7 +103,7 @@ typedef struct FuzzTarget {
 } FuzzTarget;
 
 void flush_events(QTestState *);
-void reboot(QTestState *);
+void fuzz_reset(QTestState *);
 
 /* Use the QTest ASCII protocol or call address_space API directly?*/
 void fuzz_qtest_set_serialize(bool option);
-- 
2.39.0
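
The pattern this API standardises, running one input and then resetting
before the next, can be sketched outside QEMU as follows. ToyDevice and the
toy_* functions are hypothetical stand-ins; in the patch itself the reset is
done by qemu_system_reset() followed by main_loop_wait().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for guest-visible device state that must not leak between runs. */
typedef struct {
    uint8_t regs[4];
    unsigned resets;
} ToyDevice;

/* Return the device to its power-on state. */
void toy_device_reset(ToyDevice *d)
{
    assert(d != NULL);
    memset(d->regs, 0, sizeof(d->regs));
    d->resets++;
}

/* "Execute" a fuzz input by folding its bytes into the registers. */
void toy_device_write(ToyDevice *d, const uint8_t *data, size_t size)
{
    assert(d != NULL);
    for (size_t i = 0; i < size; i++) {
        d->regs[i % 4] ^= data[i];
    }
}

/* One fuzzing run: execute the input, then always reset before returning. */
void toy_fuzz_one_input(ToyDevice *d, const uint8_t *data, size_t size)
{
    toy_device_write(d, data, size);
    toy_device_reset(d);
}
```

Because the reset is unconditional, every input starts from the same state
even though the process is reused across runs.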

[PULL 01/10] hw/sparse-mem: clear memory on reset

2023-02-16 Thread Alexander Bulekov

We use sparse-mem for fuzzing. For long-running fuzzing processes, we
eventually end up with many allocated sparse-mem pages. To avoid this,
clear the allocated pages on system-reset.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/mem/sparse-mem.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/mem/sparse-mem.c b/hw/mem/sparse-mem.c
index e6640eb8e7..72f038d47d 100644
--- a/hw/mem/sparse-mem.c
+++ b/hw/mem/sparse-mem.c
@@ -77,6 +77,13 @@ static void sparse_mem_write(void *opaque, hwaddr addr, 
uint64_t v,
 
 }
 
+static void sparse_mem_enter_reset(Object *obj, ResetType type)
+{
+SparseMemState *s = SPARSE_MEM(obj);
+g_hash_table_remove_all(s->mapped);
+return;
+}
+
 static const MemoryRegionOps sparse_mem_ops = {
 .read = sparse_mem_read,
 .write = sparse_mem_write,
@@ -123,7 +130,8 @@ static void sparse_mem_realize(DeviceState *dev, Error 
**errp)
 
 assert(s->baseaddr + s->length > s->baseaddr);
 
-s->mapped = g_hash_table_new(NULL, NULL);
+s->mapped = g_hash_table_new_full(NULL, NULL, NULL,
+  (GDestroyNotify)g_free);
 memory_region_init_io(&s->mmio, OBJECT(s), &sparse_mem_ops, s,
   "sparse-mem", s->length);
 sysbus_init_mmio(sbd, &s->mmio);
@@ -131,12 +139,15 @@ static void sparse_mem_realize(DeviceState *dev, Error 
**errp)
 
 static void sparse_mem_class_init(ObjectClass *klass, void *data)
 {
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 device_class_set_props(dc, sparse_mem_properties);
 
 dc->desc = "Sparse Memory Device";
 dc->realize = sparse_mem_realize;
+
+rc->phases.enter = sparse_mem_enter_reset;
 }
 
 static const TypeInfo sparse_mem_types[] = {
-- 
2.39.0
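
The idea in the patch above, letting the container own its pages so that a
reset can drop them all at once, can be sketched in plain C. This is a
simplified model under stated assumptions: it swaps the GHashTable for a
linked list, and ToySparseMem, ToyPage and the toy_sparse_* functions are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* One lazily allocated page of backing storage. */
typedef struct ToyPage {
    uint64_t index;
    uint8_t data[TOY_PAGE_SIZE];
    struct ToyPage *next;
} ToyPage;

typedef struct {
    ToyPage *pages;
    size_t count;
} ToySparseMem;

/* Write one byte, allocating the page on first touch. Returns 0 on success. */
int toy_sparse_write(ToySparseMem *m, uint64_t addr, uint8_t value)
{
    uint64_t index = addr / TOY_PAGE_SIZE;
    ToyPage *p;

    assert(m != NULL);
    for (p = m->pages; p != NULL && p->index != index; p = p->next) {
    }
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        if (p == NULL) {
            return -1;
        }
        p->index = index;
        p->next = m->pages;
        m->pages = p;
        m->count++;
    }
    p->data[addr % TOY_PAGE_SIZE] = value;
    return 0;
}

/* Unmapped addresses read as zero. */
uint8_t toy_sparse_read(const ToySparseMem *m, uint64_t addr)
{
    uint64_t index = addr / TOY_PAGE_SIZE;

    assert(m != NULL);
    for (const ToyPage *p = m->pages; p != NULL; p = p->next) {
        if (p->index == index) {
            return p->data[addr % TOY_PAGE_SIZE];
        }
    }
    return 0;
}

/* Reset hook: free every page so long-running processes do not accumulate. */
void toy_sparse_reset(ToySparseMem *m)
{
    assert(m != NULL);
    while (m->pages != NULL) {
        ToyPage *next = m->pages->next;
        free(m->pages);
        m->pages = next;
    }
    m->count = 0;
}
```

The patch gets the same effect by registering g_free as the value destroy
function, so that g_hash_table_remove_all() releases the pages.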

[PULL 04/10] fuzz/generic-fuzz: add a limit on DMA bytes written

2023-02-16 Thread Alexander Bulekov

As we have replaced fork-based fuzzing with reboots, we can no longer
use a timeout+exit() to avoid slow inputs. Libfuzzer has its own timer
that it uses to catch slow inputs; however, these timeouts are usually
seconds to minutes long: more than enough to bog down the fuzzing process.
I found, though, that slow inputs often attempt to fill overly large DMA
requests. Thus, we can mitigate most timeouts by setting a cap on the
total number of DMA bytes written by an input.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Darren Kenny 
---
 tests/qtest/fuzz/generic_fuzz.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index f4acfa45cc..c525d22951 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -51,6 +51,7 @@ enum cmds {
 #define USEC_IN_SEC 10
 
 #define MAX_DMA_FILL_SIZE 0x1
+#define MAX_TOTAL_DMA_SIZE 0x1000
 
 #define PCI_HOST_BRIDGE_CFG 0xcf8
 #define PCI_HOST_BRIDGE_DATA 0xcfc
@@ -61,6 +62,7 @@ typedef struct {
 } address_range;
 
 static bool qtest_log_enabled;
+size_t dma_bytes_written;
 
 MemoryRegion *sparse_mem_mr;
 
@@ -194,6 +196,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion 
*mr)
  */
 if (dma_patterns->len == 0
 || len == 0
+|| dma_bytes_written + len > MAX_TOTAL_DMA_SIZE
 || (mr != current_machine->ram && mr != sparse_mem_mr)) {
 return;
 }
@@ -266,6 +269,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion 
*mr)
 fflush(stderr);
 }
 qtest_memwrite(qts_global, addr, buf, l);
+dma_bytes_written += l;
 }
 len -= l;
 buf += l;
@@ -645,6 +649,7 @@ static void generic_fuzz(QTestState *s, const unsigned char 
*Data, size_t Size)
 
 op_clear_dma_patterns(s, NULL, 0);
 pci_disabled = false;
+dma_bytes_written = 0;
 
 QPCIBus *pcibus = qpci_new_pc(s, NULL);
 g_ptr_array_foreach(fuzzable_pci_devices, pci_enum, pcibus);
-- 
2.39.0
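
The per-input cap introduced by this patch can be sketched as a small byte
budget. This is only an illustration: ToyDmaBudget and the toy_dma_*
functions are hypothetical, the cap is whatever the caller picks, and the
check is written as a subtraction so that it cannot overflow, which differs
slightly from the addition used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Running total of DMA bytes written for the current fuzz input. */
typedef struct {
    size_t written;
    size_t cap;
} ToyDmaBudget;

/* Called at the start of each input, like dma_bytes_written = 0. */
void toy_dma_new_input(ToyDmaBudget *b)
{
    assert(b != NULL);
    b->written = 0;
}

/*
 * Account for a DMA fill of len bytes. Returns false, and writes nothing,
 * if the fill is empty or would push the total past the cap.
 */
bool toy_dma_try_write(ToyDmaBudget *b, size_t len)
{
    assert(b != NULL && b->written <= b->cap);
    if (len == 0 || len > b->cap - b->written) {
        return false;
    }
    b->written += len;
    return true;
}
```

An input that keeps requesting large fills is cut off once the budget is
spent, so it finishes quickly instead of waiting for libFuzzer's own timer.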

[PULL 00/10] Replace fork-based fuzzing with reboots

2023-02-16 Thread Alexander Bulekov

Hi Peter,
The following changes since commit 6dffbe36af79e26a4d23f94a9a1c1201de99c261:

  Merge tag 'migration-20230215-pull-request' of 
https://gitlab.com/juan.quintela/qemu into staging (2023-02-16 13:09:51 +)

are available in the Git repository at:

  https://gitlab.com/a1xndr/qemu/ tags/pr-2023-02-16

for you to fetch changes up to 7d9e5f18a94792ed875a1caed2bfcd1e68a49481:

  docs/fuzz: remove mentions of fork-based fuzzing (2023-02-16 23:02:46 -0500)


Replace fork-based fuzzing with reboots.
Now the fuzzers will reboot the guest between inputs.


Alexander Bulekov (10):
  hw/sparse-mem: clear memory on reset
  fuzz: add fuzz_reset API
  fuzz/generic-fuzz: use reboots instead of forks to reset state
  fuzz/generic-fuzz: add a limit on DMA bytes written
  fuzz/virtio-scsi: remove fork-based fuzzer
  fuzz/virtio-net: remove fork-based fuzzer
  fuzz/virtio-blk: remove fork-based fuzzer
  fuzz/i440fx: remove fork-based fuzzer
  fuzz: remove fork-fuzzing scaffolding
  docs/fuzz: remove mentions of fork-based fuzzing

 docs/devel/fuzzing.rst  |  22 +--
 hw/mem/sparse-mem.c |  13 +++-
 meson.build |   4 --
 tests/qtest/fuzz/fork_fuzz.c|  41 -
 tests/qtest/fuzz/fork_fuzz.h|  23 ---
 tests/qtest/fuzz/fork_fuzz.ld   |  56 -
 tests/qtest/fuzz/fuzz.c |   6 ++
 tests/qtest/fuzz/fuzz.h |   2 +-
 tests/qtest/fuzz/generic_fuzz.c | 119 
 tests/qtest/fuzz/i440fx_fuzz.c  |  27 +---
 tests/qtest/fuzz/meson.build|   6 +-
 tests/qtest/fuzz/virtio_blk_fuzz.c  |  51 +++-
 tests/qtest/fuzz/virtio_net_fuzz.c  |  54 ++--
 tests/qtest/fuzz/virtio_scsi_fuzz.c |  51 +++-
 14 files changed, 71 insertions(+), 404 deletions(-)
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.c
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.h
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.ld

Re: [PATCH RESEND 18/18] i386: Add new property to control L2 cache topo in CPUID.04H

2023-02-16 Thread wangyanan (Y)

On 2023/2/17 11:35, Zhao Liu wrote:

On Thu, Feb 16, 2023 at 09:14:54PM +0800, wangyanan (Y) wrote:

Date: Thu, 16 Feb 2023 21:14:54 +0800
From: "wangyanan (Y)" 
Subject: Re: [PATCH RESEND 18/18] i386: Add new property to control L2
  cache topo in CPUID.04H

On 2023/2/13 17:36, Zhao Liu wrote:

From: Zhao Liu 

The property x-l2-cache-topo will be used to change the L2 cache
topology in CPUID.04H.

It now lets the user choose whether the L2 cache is shared at core level or
at cluster level.

If the user passes "-cpu x-l2-cache-topo=[core|cluster]", the older L2 cache
topology will be overridden by the new topology setting.

Currently x-l2-cache-topo only defines the share level *globally*.

Yes, will set for all CPUs.

I'm thinking about how we can make the property more powerful, so that it
can specify which CPUs share L2 at core level and which CPUs share
L2 at cluster level.

What would Intel's hybrid CPUs do? Would the L2 share level (core or
cluster) be determined by the CPU core type (Atom or Core)?
ARM does not have the core-type concept, but it does have systems in which
L2 is shared at different levels for different CPUs.

For example, on Alder Lake each "core" has its own L2 and every 4 "atom"s
share one L2. For this case, we can set the topology as:

cluster0 has 1 "core" and cluster1 has 4 "atom"s. Then set L2 to be shared at
cluster level.

Since cluster0 has only one "core"-type core, L2 per "core" works.

Not sure whether this idea can be applied to ARM?

Consider a CPU topology with 2 clusters in total: the 2 cores in cluster0
each have their own L1/L2 cache and 2 threads, while the 4 cores in cluster1
share one L2 cache and have 1 thread each. The global way does not
work well for this.

What about defining something general, which looks like -numa config:
-cache-topo cache=l2, share_level="core", cpus='0-3'
-cache-topo cache=l2, share_level="cluster", cpus='4-7'
If we ever want to support a custom share level for L3/L1, no extra work
is needed. We can also extend the CLI to support custom cache sizes, etc.
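
To make the proposal concrete, here is a rough sketch of how per-range rules
could resolve to L2 sharing groups for the 8-CPU example above. Everything in
it is an assumption for illustration: the -cache-topo option does not exist,
and ToyCpu, ToyL2Rule and the toy_l2_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_SHARE_CORE, TOY_SHARE_CLUSTER } ToyShareLevel;

/* Position of one CPU; core_id is assumed unique across the whole system. */
typedef struct {
    unsigned cluster_id;
    unsigned core_id;
} ToyCpu;

/* One hypothetical "-cache-topo cache=l2,..." rule for CPUs first..last. */
typedef struct {
    unsigned first;
    unsigned last;
    ToyShareLevel level;
} ToyL2Rule;

/* Share level for a CPU: the first matching rule wins, else the default. */
ToyShareLevel toy_l2_level(const ToyL2Rule *rules, size_t nrules,
                           unsigned cpu, ToyShareLevel dflt)
{
    for (size_t i = 0; i < nrules; i++) {
        if (cpu >= rules[i].first && cpu <= rules[i].last) {
            return rules[i].level;
        }
    }
    return dflt;
}

/* Nonzero if CPUs a and b end up behind the same L2 cache. */
int toy_l2_shared(const ToyCpu *cpus, const ToyL2Rule *rules, size_t nrules,
                  unsigned a, unsigned b)
{
    ToyShareLevel la = toy_l2_level(rules, nrules, a, TOY_SHARE_CORE);
    ToyShareLevel lb = toy_l2_level(rules, nrules, b, TOY_SHARE_CORE);

    if (la != lb) {
        return 0;
    }
    if (la == TOY_SHARE_CORE) {
        return cpus[a].core_id == cpus[b].core_id;
    }
    return cpus[a].cluster_id == cpus[b].cluster_id;
}

/* Number of distinct L2 caches: count CPUs that start a new sharing group. */
size_t toy_l2_count(const ToyCpu *cpus, size_t ncpus,
                    const ToyL2Rule *rules, size_t nrules)
{
    size_t count = 0;

    assert(cpus != NULL || ncpus == 0);
    for (size_t i = 0; i < ncpus; i++) {
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++) {
            seen = toy_l2_shared(cpus, rules, nrules,
                                 (unsigned)j, (unsigned)i);
        }
        if (!seen) {
            count++;
        }
    }
    return count;
}
```

With one rule for CPUs 0-3 at core level and one for CPUs 4-7 at cluster
level, this model yields three L2 caches: one per core in cluster0 and one
for all of cluster1. A single global cluster-level rule would yield two,
merging the L2 caches of the two cluster0 cores.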

If you think this is a good idea to explore, I can work on it, since I'm
planning to add cache topology support for ARM.

Thanks,
Yanan

Thanks,
Yanan

Here we expose to user "cluster" instead of "module", to be consistent
with "cluster-id" naming.

Since CPUID.04H is used by Intel CPUs, this property is available only on
Intel CPUs for now.

When necessary, it can be extended to CPUID.801DH for AMD CPUs.

Signed-off-by: Zhao Liu 
---
   target/i386/cpu.c | 33 -
   target/i386/cpu.h |  2 ++
   2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5816dc99b1d4..cf84c720a431 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -240,12 +240,15 @@ static uint32_t max_processor_ids_for_cache(CPUCacheInfo 
*cache,
   case CORE:
   num_ids = 1 << apicid_core_offset(topo_info);
   break;
+case MODULE:
+num_ids = 1 << apicid_module_offset(topo_info);
+break;
   case DIE:
   num_ids = 1 << apicid_die_offset(topo_info);
   break;
   default:
   /*
- * Currently there is no use case for SMT, MODULE and PACKAGE, so use
+ * Currently there is no use case for SMT and PACKAGE, so use
* assert directly to facilitate debugging.
*/
   g_assert_not_reached();
@@ -6633,6 +6636,33 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
   env->cache_info_amd.l3_cache = &legacy_l3_cache;
   }
+if (cpu->l2_cache_topo_level) {
+/*
+ * FIXME: Currently only supports changing CPUID[4] (for intel), and
+ * will support changing CPUID[0x801D] when necessary.
+ */
+if (!IS_INTEL_CPU(env)) {
+error_setg(errp, "only intel cpus supports x-l2-cache-topo");
+return;
+}
+
+if (!strcmp(cpu->l2_cache_topo_level, "core")) {
+env->cache_info_cpuid4.l2_cache->share_level = CORE;
+} else if (!strcmp(cpu->l2_cache_topo_level, "cluster")) {
+/*
+ * We expose to users "cluster" instead of "module", to be
+ * consistent with "cluster-id" naming.
+ */
+env->cache_info_cpuid4.l2_cache->share_level = MODULE;
+} else {
+error_setg(errp,
+   "x-l2-cache-topo doesn't support '%s', "
+   "and it only supports 'core' or 'cluster'",
+   cpu->l2_cache_topo_level);
+return;
+}
+}
+
   #ifndef CONFIG_USER_ONLY
   MachineState *ms = MACHINE(qdev_get_machine());
   qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
@@ -7135,6 +7165,7 @@ static Property x86_cpu_properties[] = {
false),
   DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
true),
+DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),
   DEFINE_PR

Re: [PATCH 03/10] fuzz/generic-fuzz: use reboots instead of forks to reset state

2023-02-16 Thread Alexander Bulekov

On 230213 1426, Darren Kenny wrote:
> Hi Alex,
> 
> On Saturday, 2023-02-04 at 23:29:44 -05, Alexander Bulekov wrote:
> > Signed-off-by: Alexander Bulekov 
> > ---
> >  tests/qtest/fuzz/generic_fuzz.c | 106 +++-
> >  1 file changed, 23 insertions(+), 83 deletions(-)
> >
> > diff --git a/tests/qtest/fuzz/generic_fuzz.c 
> > b/tests/qtest/fuzz/generic_fuzz.c
> > index 7326f6840b..c2e5642150 100644
> > --- a/tests/qtest/fuzz/generic_fuzz.c
> > +++ b/tests/qtest/fuzz/generic_fuzz.c
> > @@ -18,7 +18,6 @@
> >  #include "tests/qtest/libqtest.h"
> >  #include "tests/qtest/libqos/pci-pc.h"
> >  #include "fuzz.h"
> > -#include "fork_fuzz.h"
> >  #include "string.h"
> >  #include "exec/memory.h"
> >  #include "exec/ramblock.h"
> > @@ -29,6 +28,8 @@
> >  #include "generic_fuzz_configs.h"
> >  #include "hw/mem/sparse-mem.h"
> >  
> > +static void pci_enum(gpointer pcidev, gpointer bus);
> > +
> >  /*
> >   * SEPARATOR is used to separate "operations" in the fuzz input
> >   */
> > @@ -589,30 +590,6 @@ static void op_disable_pci(QTestState *s, const 
> > unsigned char *data, size_t len)
> >  pci_disabled = true;
> >  }
> >  
> > -static void handle_timeout(int sig)
> > -{
> > -if (qtest_log_enabled) {
> > -fprintf(stderr, "[Timeout]\n");
> > -fflush(stderr);
> > -}
> > -
> > -/*
> > - * If there is a crash, libfuzzer/ASAN forks a child to run an
> > - * "llvm-symbolizer" process for printing out a pretty stacktrace. It
> > - * communicates with this child using a pipe.  If we timeout+Exit, 
> > while
> > - * libfuzzer is still communicating with the llvm-symbolizer child, we 
> > will
> > - * be left with an orphan llvm-symbolizer process. Sometimes, this 
> > appears
> > - * to lead to a deadlock in the forkserver. Use waitpid to check if 
> > there
> > - * are any waitable children. If so, exit out of the signal-handler, 
> > and
> > - * let libfuzzer finish communicating with the child, and exit, on its 
> > own.
> > - */
> > -if (waitpid(-1, NULL, WNOHANG) == 0) {
> > -return;
> > -}
> > -
> > -_Exit(0);
> > -}
> > -
> >  /*
> >
> 
> I'm presuming that the timeout is being left to the fuzz orchestrator
> now, rather than us managing it directly in our own way?

Yes. The fuzzer should handle timeouts directly now. 

-Alex

Re: [PATCH 04/10] fuzz/generic-fuzz: add a limit on DMA bytes written

2023-02-16 Thread Alexander Bulekov

On 230213 1438, Darren Kenny wrote:
> Hi Alex,
> 
> On Saturday, 2023-02-04 at 23:29:45 -05, Alexander Bulekov wrote:
> > As we have replaced fork-based fuzzing with reboots - we can no longer
> > use a timeout+exit() to avoid slow inputs. Libfuzzer has its own timer
> > that it uses to catch slow inputs, however these timeouts are usually
> > seconds-minutes long: more than enough to bog-down the fuzzing process.
> > However, I found that slow inputs often attempt to fill overly large DMA
> > requests. Thus, we can mitigate most timeouts by setting a cap on the
> > total number of DMA bytes written by an input.
> >
> > Signed-off-by: Alexander Bulekov 
> > ---
> >  tests/qtest/fuzz/generic_fuzz.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/tests/qtest/fuzz/generic_fuzz.c 
> > b/tests/qtest/fuzz/generic_fuzz.c
> > index c2e5642150..eab92cbc23 100644
> > --- a/tests/qtest/fuzz/generic_fuzz.c
> > +++ b/tests/qtest/fuzz/generic_fuzz.c
> > @@ -52,6 +52,7 @@ enum cmds {
> >  #define USEC_IN_SEC 10
> >  
> >  #define MAX_DMA_FILL_SIZE 0x1
> > +#define MAX_TOTAL_DMA_SIZE 0x1000
> >  
> >  #define PCI_HOST_BRIDGE_CFG 0xcf8
> >  #define PCI_HOST_BRIDGE_DATA 0xcfc
> > @@ -64,6 +65,7 @@ typedef struct {
> >  static useconds_t timeout = DEFAULT_TIMEOUT_US;
> >  
> >  static bool qtest_log_enabled;
> > +size_t dma_bytes_written;
> >  
> >  MemoryRegion *sparse_mem_mr;
> >  
> > @@ -197,6 +199,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, 
> > MemoryRegion *mr)
> >   */
> >  if (dma_patterns->len == 0
> >  || len == 0
> > +|| dma_bytes_written > MAX_TOTAL_DMA_SIZE
> 
> NIT: Just wondering if you should check dma_bytes_written + l as opposed
>  to dma_bytes_written? It's probably not important enough given it's
>  just an artificial limit, but thought I'd ask.
>

Done :)
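
For reference, the kind of check Darren suggests could look roughly like the sketch below. It is a standalone model, not the code from the patch: the cap value and the names `DMA_CAP_BYTES`, `dma_written`, `dma_budget_allows()`, `dma_account()` and `dma_reset()` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative cap only; the patch's MAX_TOTAL_DMA_SIZE may differ. */
#define DMA_CAP_BYTES ((size_t)0x1000)

/* Bytes written by DMA fills for the input currently being run. */
static size_t dma_written;

/* True if writing `len` more bytes keeps the running total within the cap. */
static bool dma_budget_allows(size_t len)
{
    /* Phrased as a subtraction so dma_written + len can never overflow. */
    return dma_written <= DMA_CAP_BYTES &&
           len <= DMA_CAP_BYTES - dma_written;
}

/* Account for a write of `len` bytes; refuses it if it would exceed the cap. */
static bool dma_account(size_t len)
{
    if (!dma_budget_allows(len)) {
        return false;
    }
    dma_written += len;
    return true;
}

/* Called at the start of every fuzz input. */
static void dma_reset(void)
{
    dma_written = 0;
}
```

Checking the request size together with the running total means the write that would cross the limit is refused, instead of only the one after it.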

> >  || (mr != current_machine->ram && mr != sparse_mem_mr)) {
> >  return;
> >  }
> > @@ -269,6 +272,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, 
> > MemoryRegion *mr)
> >  fflush(stderr);
> >  }
> >  qtest_memwrite(qts_global, addr, buf, l);
> > +dma_bytes_written += l;
> >  }
> >  len -= l;
> >  buf += l;
> > @@ -648,6 +652,7 @@ static void generic_fuzz(QTestState *s, const unsigned 
> > char *Data, size_t Size)
> >  
> >  op_clear_dma_patterns(s, NULL, 0);
> >  pci_disabled = false;
> > +dma_bytes_written = 0;
> >  
> >  QPCIBus *pcibus = qpci_new_pc(s, NULL);
> >  g_ptr_array_foreach(fuzzable_pci_devices, pci_enum, pcibus);
> > -- 
> > 2.39.0
> 
> While this will still consume the existing corpus, is it likely to
> cause these existing corpus to be trimmed?

Not sure - It would affect inputs that generate a lot of DMA
activity (though those should have been caught by our previous timeout
mechanism).

> 
> Otherwise, the changes look good:
> 
> Reviewed-by: Darren Kenny 
> 
> Thanks,
> 
> Darren.

Re: [PATCH RESEND 18/18] i386: Add new property to control L2 cache topo in CPUID.04H

2023-02-16 Thread wangyanan (Y)

On 2023/2/17 11:35, Zhao Liu wrote:

On Thu, Feb 16, 2023 at 09:14:54PM +0800, wangyanan (Y) wrote:

Date: Thu, 16 Feb 2023 21:14:54 +0800
From: "wangyanan (Y)" 
Subject: Re: [PATCH RESEND 18/18] i386: Add new property to control L2
  cache topo in CPUID.04H

On 2023/2/13 17:36, Zhao Liu wrote:

From: Zhao Liu 

The property x-l2-cache-topo will be used to change the L2 cache
topology in CPUID.04H.

Now it allows user to set the L2 cache is shared in core level or
cluster level.

If user passes "-cpu x-l2-cache-topo=[core|cluster]" then older L2 cache
topology will be overrided by the new topology setting.

Currently x-l2-cache-topo only defines the share level *globally*.

Yes, will set for all CPUs.

I'm thinking how we can make the property more powerful so that it
can specify which CPUs share l2 on core level and which CPUs share
l2 on cluster level.

What would Intel's Hybrid CPUs do? Determine the l2 share level
is core or cluster according to the CPU core type (Atom or Core)?
While ARM does not have the core type concept but have CPUs
that l2 is shared on different levels in the same system.

For example, Alderlake's "core" shares 1 L2 per core and every 4 "atom"s
share 1 L2. For this case, we can set the topology as:

cluster0 has 1 "core" and cluster1 has 4 "atom". Then set L2 shared on
cluster level.

Since cluster0 has only 1 "core" type core, then L2 per "core" works.
This restricts users: cluster0 must have exactly *1* core-type core.
What if we set 2 vCores in cluster0 and 4 vCores in cluster1, and bind the
cores in cluster0 to 2 core-type pCores and the cores in cluster1 to 4
atom-type pCores? I think this is a necessary use case too.

Not sure if this idea can be applied to arm?

Thanks,
Yanan

Here we expose to user "cluster" instead of "module", to be consistent
with "cluster-id" naming.

Since CPUID.04H is used by intel CPUs, this property is available on
intel CPUs as for now.

When necessary, it can be extended to CPUID.801DH for amd CPUs.

Signed-off-by: Zhao Liu 
---
   target/i386/cpu.c | 33 -
   target/i386/cpu.h |  2 ++
   2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5816dc99b1d4..cf84c720a431 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -240,12 +240,15 @@ static uint32_t max_processor_ids_for_cache(CPUCacheInfo 
*cache,
   case CORE:
   num_ids = 1 << apicid_core_offset(topo_info);
   break;
+case MODULE:
+num_ids = 1 << apicid_module_offset(topo_info);
+break;
   case DIE:
   num_ids = 1 << apicid_die_offset(topo_info);
   break;
   default:
   /*
- * Currently there is no use case for SMT, MODULE and PACKAGE, so use
+ * Currently there is no use case for SMT and PACKAGE, so use
* assert directly to facilitate debugging.
*/
   g_assert_not_reached();
@@ -6633,6 +6636,33 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
   env->cache_info_amd.l3_cache = &legacy_l3_cache;
   }
+if (cpu->l2_cache_topo_level) {
+/*
+ * FIXME: Currently only supports changing CPUID[4] (for intel), and
+ * will support changing CPUID[0x801D] when necessary.
+ */
+if (!IS_INTEL_CPU(env)) {
+error_setg(errp, "only intel cpus supports x-l2-cache-topo");
+return;
+}
+
+if (!strcmp(cpu->l2_cache_topo_level, "core")) {
+env->cache_info_cpuid4.l2_cache->share_level = CORE;
+} else if (!strcmp(cpu->l2_cache_topo_level, "cluster")) {
+/*
+ * We expose to users "cluster" instead of "module", to be
+ * consistent with "cluster-id" naming.
+ */
+env->cache_info_cpuid4.l2_cache->share_level = MODULE;
+} else {
+error_setg(errp,
+   "x-l2-cache-topo doesn't support '%s', "
+   "and it only supports 'core' or 'cluster'",
+   cpu->l2_cache_topo_level);
+return;
+}
+}
+
   #ifndef CONFIG_USER_ONLY
   MachineState *ms = MACHINE(qdev_get_machine());
   qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
@@ -7135,6 +7165,7 @@ static Property x86_cpu_properties[] = {
false),
   DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
true),
+DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),
   DEFINE_PROP_END_OF_LIST()
   };
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5a955431f759..aa7e96c586c7 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1987,6 +1987,8 @@ struct ArchCPU {
   int32_t thread_id;
   int32_t hv_max_vps;
+
+char *l2_cache_topo_level;
   };
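
As a rough illustration of what the new MODULE case computes, the sketch below maps the property string to a share level and then to the number of logical processor IDs sharing the cache. It is a simplified model, not the QEMU code: `topo_offsets`, `parse_l2_topo()` and `max_ids_for_level()` are hypothetical names, and the offsets are assumed to be the APIC ID bit positions at which the core and module fields start.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum share_level { LVL_INVALID = -1, LVL_CORE, LVL_MODULE };

/* Hypothetical stand-in for the APIC ID bit offsets of each topology level. */
struct topo_offsets {
    unsigned core_offset;    /* number of bits used for the thread ID */
    unsigned module_offset;  /* thread bits + core bits */
};

/* Map the user-visible string to a level; "cluster" means MODULE here. */
static enum share_level parse_l2_topo(const char *s)
{
    if (strcmp(s, "core") == 0) {
        return LVL_CORE;
    }
    if (strcmp(s, "cluster") == 0) {
        return LVL_MODULE;
    }
    return LVL_INVALID;
}

/* Number of logical processor IDs that share a cache at the given level. */
static uint32_t max_ids_for_level(enum share_level lvl,
                                  const struct topo_offsets *t)
{
    switch (lvl) {
    case LVL_CORE:
        return 1u << t->core_offset;
    case LVL_MODULE:
        return 1u << t->module_offset;
    default:
        return 0;   /* unsupported level */
    }
}
```

For example, with 2 threads per core and 4 cores per cluster the offsets would be 1 and 3, giving 2 IDs for "core" and 8 IDs for "cluster".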

[PATCH qemu v3 2/2] aspeed/fuji : correct the eeprom size

2023-02-16 Thread ~ssinprem

From: Sittisak Sinprem 

The 24C64 device holds 64 kilobits = 8 kilobytes.
The 24C02 device holds 2 kilobits = 256 bytes.

Signed-off-by: Sittisak Sinprem 
---
 hw/arm/aspeed.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 27dda58338..40f6076b44 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -840,42 +840,46 @@ static void fuji_bmc_i2c_init(AspeedMachineState *bmc)
 i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4c);
 i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4d);
 
-at24c_eeprom_init(i2c[19], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[20], 0x50, 2 * KiB);
-at24c_eeprom_init(i2c[22], 0x52, 2 * KiB);
+/*
+* EEPROM 24c64 size is 64Kbits or 8 Kbytes
+*24c02 size is 2Kbits or 256 bytes
+*/
+at24c_eeprom_init(i2c[19], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[20], 0x50, 256);
+at24c_eeprom_init(i2c[22], 0x52, 256);
 
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x48);
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x49);
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x4a);
 i2c_slave_create_simple(i2c[3], TYPE_TMP422, 0x4c);
 
-at24c_eeprom_init(i2c[8], 0x51, 64 * KiB);
+at24c_eeprom_init(i2c[8], 0x51, 8 * KiB);
 i2c_slave_create_simple(i2c[8], TYPE_LM75, 0x4a);
 
 i2c_slave_create_simple(i2c[50], TYPE_LM75, 0x4c);
-at24c_eeprom_init(i2c[50], 0x52, 64 * KiB);
+at24c_eeprom_init(i2c[50], 0x52, 8 * KiB);
 i2c_slave_create_simple(i2c[51], TYPE_TMP75, 0x48);
 i2c_slave_create_simple(i2c[52], TYPE_TMP75, 0x49);
 
 i2c_slave_create_simple(i2c[59], TYPE_TMP75, 0x48);
 i2c_slave_create_simple(i2c[60], TYPE_TMP75, 0x49);
 
-at24c_eeprom_init(i2c[65], 0x53, 64 * KiB);
+at24c_eeprom_init(i2c[65], 0x53, 8 * KiB);
 i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x49);
 i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x48);
-at24c_eeprom_init(i2c[68], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[69], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[70], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[71], 0x52, 64 * KiB);
+at24c_eeprom_init(i2c[68], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[69], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[70], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[71], 0x52, 8 * KiB);
 
-at24c_eeprom_init(i2c[73], 0x53, 64 * KiB);
+at24c_eeprom_init(i2c[73], 0x53, 8 * KiB);
 i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x49);
 i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x48);
-at24c_eeprom_init(i2c[76], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[77], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[78], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[79], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[28], 0x50, 2 * KiB);
+at24c_eeprom_init(i2c[76], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[77], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[78], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[79], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[28], 0x50, 256);
 
 for (int i = 0; i < 8; i++) {
 at24c_eeprom_init(i2c[81 + i * 8], 0x56, 64 * KiB);
-- 
2.34.6

[PATCH qemu v3 0/2] hw/at24c support eeprom size less than equal 256 byte

2023-02-16 Thread ~ssinprem

- hw/at24c : modify at24c to support 1 byte address mode
- aspeed/fuji : correct the eeprom size

Sittisak Sinprem (2):
  hw/at24c : modify at24c to support 1 byte address mode
  aspeed/fuji : correct the eeprom size

 hw/arm/aspeed.c | 36 
 hw/nvram/eeprom_at24c.c | 28 +---
 2 files changed, 45 insertions(+), 19 deletions(-)

-- 
2.34.6

[PATCH qemu v3 1/2] hw/at24c : modify at24c to support 1 byte address mode

2023-02-16 Thread ~ssinprem

From: Sittisak Sinprem 

Signed-off-by: Sittisak Sinprem 
---
 hw/nvram/eeprom_at24c.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
index 3328c32814..64259cde67 100644
--- a/hw/nvram/eeprom_at24c.c
+++ b/hw/nvram/eeprom_at24c.c
@@ -41,6 +41,12 @@ struct EEPROMState {
 uint16_t cur;
 /* total size in bytes */
 uint32_t rsize;
+/* address byte number 
+ *  for  24c01, 24c02 size <= 256 byte, use only 1 byte
+ *  otherwise size > 256, use 2 byte
+ */
+uint8_t asize;
+
 bool writable;
 /* cells changed since last START? */
 bool changed;
@@ -91,7 +97,10 @@ uint8_t at24c_eeprom_recv(I2CSlave *s)
 EEPROMState *ee = AT24C_EE(s);
 uint8_t ret;
 
-if (ee->haveaddr == 1) {
+/* If got the byte address but not completely with address size
+ *  will return the invalid value
+ */
+if (ee->haveaddr > 0 && ee->haveaddr < ee->asize) {
 return 0xff;
 }
 
@@ -108,11 +117,11 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
 {
 EEPROMState *ee = AT24C_EE(s);
 
-if (ee->haveaddr < 2) {
+if (ee->haveaddr < ee->asize) {
 ee->cur <<= 8;
 ee->cur |= data;
 ee->haveaddr++;
-if (ee->haveaddr == 2) {
+if (ee->haveaddr == ee->asize) {
 ee->cur %= ee->rsize;
 DPRINTK("Set pointer %04x\n", ee->cur);
 }
@@ -199,6 +208,18 @@ static void at24c_eeprom_realize(DeviceState *dev, Error 
**errp)
 }
 DPRINTK("Reset read backing file\n");
 }
+
+/*
+ * If address size didn't define with property set
+ *   value is 0 as default, setting it by Rom size detecting.
+ */
+if (ee->asize == 0) {
+if (ee->rsize <= 256) {
+ee->asize = 1;
+} else {
+ee->asize = 2;
+}
+}
 }
 
 static
@@ -213,6 +234,7 @@ void at24c_eeprom_reset(DeviceState *state)
 
 static Property at24c_eeprom_props[] = {
 DEFINE_PROP_UINT32("rom-size", EEPROMState, rsize, 0),
+DEFINE_PROP_UINT8("address-size", EEPROMState, asize, 0),
 DEFINE_PROP_BOOL("writable", EEPROMState, writable, true),
 DEFINE_PROP_DRIVE("drive", EEPROMState, blk),
 DEFINE_PROP_END_OF_LIST()
-- 
2.34.6
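
The address handling this patch introduces can be modelled in isolation as below. This is a simplified sketch of the logic visible in the diff, not the device code itself: `eeprom_model`, `model_init()` and `model_send()` are hypothetical names, and backing storage, I2C events and the read path are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the address state kept by the EEPROM device. */
struct eeprom_model {
    uint32_t rsize;     /* total size in bytes */
    uint8_t asize;      /* number of address bytes: 1 or 2 */
    uint8_t haveaddr;   /* address bytes received so far */
    uint16_t cur;       /* current memory pointer */
};

/* Derive the address size from the ROM size, as the realize hook does
 * when the "address-size" property is left at 0. */
static void model_init(struct eeprom_model *ee, uint32_t rsize)
{
    assert(rsize > 0);
    ee->rsize = rsize;
    ee->asize = (rsize <= 256) ? 1 : 2;
    ee->haveaddr = 0;
    ee->cur = 0;
}

/*
 * Feed one byte written by the I2C master.
 * Returns 1 if the byte was consumed as part of the address,
 * 0 if the address is already complete and the byte is data.
 */
static int model_send(struct eeprom_model *ee, uint8_t data)
{
    if (ee->haveaddr < ee->asize) {
        ee->cur = (uint16_t)((ee->cur << 8) | data);
        ee->haveaddr++;
        if (ee->haveaddr == ee->asize) {
            ee->cur %= ee->rsize;   /* wrap into the device's range */
        }
        return 1;
    }
    return 0;
}
```

With a 256-byte part the first written byte already completes the address, whereas an 8 KiB part still needs two bytes before any data byte is accepted.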

Re: [RFC 20/52] s390x: Replace MachineState.smp access with topology helpers

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 02:38:55PM +0100, Thomas Huth wrote:
> Date: Thu, 16 Feb 2023 14:38:55 +0100
> From: Thomas Huth 
> Subject: Re: [RFC 20/52] s390x: Replace MachineState.smp access with
>  topology helpers
> 
> On 13/02/2023 10.50, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > When MachineState.topo is introduced, the topology related structures
> > become complicated. So we wrapped the access to topology fields of
> > MachineState.topo into some helpers, and we are using these helpers
> > to replace the use of MachineState.smp.
> > 
> > In hw/s390x/s390-virtio-ccw.c, s390_init_cpus() needs "threads per core".
> > Before s390x supports hybrid, here we use smp-specific interface to get
> > "threads per core".
> > 
> > For other cases, it's straightforward to replace topology access with
> > wrapped generic interfaces.
> ...
> > diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
> > index 3ac7ec9acf4e..d297daed1117 100644
> > --- a/target/s390x/kvm/kvm.c
> > +++ b/target/s390x/kvm/kvm.c
> > @@ -406,9 +406,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
> >   int kvm_arch_init_vcpu(CPUState *cs)
> >   {
> > -unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
> > +unsigned int max_cpus;
> >   S390CPU *cpu = S390_CPU(cs);
> > +
> >   kvm_s390_set_cpu_state(cpu, cpu->env.cpu_state);
> > +max_cpus = machine_topo_get_max_cpus(MACHINE(qdev_get_machine()));
> >   cpu->irqstate = g_malloc0(VCPU_IRQ_BUF_SIZE(max_cpus));
> >   return 0;
> >   }
> > @@ -2097,14 +2099,15 @@ int kvm_s390_set_cpu_state(S390CPU *cpu, uint8_t 
> > cpu_state)
> >   void kvm_s390_vcpu_interrupt_pre_save(S390CPU *cpu)
> >   {
> > -unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
> > -struct kvm_s390_irq_state irq_state = {
> > -.buf = (uint64_t) cpu->irqstate,
> > -.len = VCPU_IRQ_BUF_SIZE(max_cpus),
> > -};
> > +unsigned int max_cpus;
> > +struct kvm_s390_irq_state irq_state;
> >   CPUState *cs = CPU(cpu);
> >   int32_t bytes;
> > +max_cpus = machine_topo_get_max_cpus(MACHINE(qdev_get_machine()));
> > +irq_state.buf = (uint64_t) cpu->irqstate;
> > +irq_state.len = VCPU_IRQ_BUF_SIZE(max_cpus);
> 
>  Hi!
> 
> Please don't replace struct initializers like this. There's a reason why
> these structs like irq_state are directly initialized with "= { ... }" at
> the beginning of the function: This automatically clears all fields that are
> not mentioned, e.g. also the "flags" field of struct kvm_s390_irq_state,
> which can be very important for structs that are passed to the kernel via an
> ioctl.
> You could use memset(..., 0, ...) instead, but people tend to forget that,
> too, so we settled on using struct initializers at the beginning instead. So
> please stick to that.

Thanks Thomas! Sorry I didn't notice this, I'll fix it and be careful in the
future.

Zhao
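
To illustrate Thomas's point: with a designated initializer, every member that is not named is zero-initialized, so the declaration can keep the initializer even when one value comes from a helper. The struct below is a hypothetical stand-in loosely modelled on struct kvm_s390_irq_state, and `make_irq_state()` is an illustrative name, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a struct that is handed to the kernel. */
struct irq_state_model {
    uint64_t buf;
    uint32_t flags;
    uint32_t len;
    uint32_t reserved[4];
};

/*
 * Build the struct with a designated initializer. Only buf and len are
 * named; flags and reserved[] are implicitly set to zero by the language.
 */
static struct irq_state_model make_irq_state(uint64_t buf, uint32_t len)
{
    struct irq_state_model s = {
        .buf = buf,
        .len = len,
    };
    return s;
}
```

Assigning the members one by one after a plain declaration would leave flags and reserved[] with indeterminate values unless the struct is cleared explicitly first.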

> 
>  Thanks,
>   Thomas
> 
> 
> >   if (!kvm_check_extension(kvm_state, KVM_CAP_S390_IRQ_STATE)) {
> >   return;
> >   }
> > diff --git a/target/s390x/tcg/excp_helper.c b/target/s390x/tcg/excp_helper.c
> > index bc767f044381..e396a89d5540 100644
> > --- a/target/s390x/tcg/excp_helper.c
> > +++ b/target/s390x/tcg/excp_helper.c
> > @@ -321,7 +321,7 @@ static void do_ext_interrupt(CPUS390XState *env)
> >   if ((env->pending_int & INTERRUPT_EMERGENCY_SIGNAL) &&
> >   (env->cregs[0] & CR0_EMERGENCY_SIGNAL_SC)) {
> >   MachineState *ms = MACHINE(qdev_get_machine());
> > -unsigned int max_cpus = ms->smp.max_cpus;
> > +unsigned int max_cpus = machine_topo_get_max_cpus(ms);
> >   lowcore->ext_int_code = cpu_to_be16(EXT_EMERGENCY);
> >   cpu_addr = find_first_bit(env->emergency_signals, S390_MAX_CPUS);
>

Re: [PATCH RESEND 18/18] i386: Add new property to control L2 cache topo in CPUID.04H

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 09:14:54PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 21:14:54 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [PATCH RESEND 18/18] i386: Add new property to control L2
>  cache topo in CPUID.04H
> 
> On 2023/2/13 17:36, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > The property x-l2-cache-topo will be used to change the L2 cache
> > topology in CPUID.04H.
> > 
> > Now it allows user to set the L2 cache is shared in core level or
> > cluster level.
> > 
> > If user passes "-cpu x-l2-cache-topo=[core|cluster]" then older L2 cache
> > topology will be overrided by the new topology setting.
> Currently x-l2-cache-topo only defines the share level *globally*.

Yes, will set for all CPUs.

> I'm thinking how we can make the property more powerful so that it
> can specify which CPUs share l2 on core level and which CPUs share
> l2 on cluster level.
> 
> What would Intel's Hybrid CPUs do? Determine the l2 share level
> is core or cluster according to the CPU core type (Atom or Core)?
> While ARM does not have the core type concept but have CPUs
> that l2 is shared on different levels in the same system.

For example, Alderlake's "core" shares 1 L2 per core and every 4 "atom"s
share 1 L2. For this case, we can set the topology as:

cluster0 has 1 "core" and cluster1 has 4 "atom". Then set L2 shared on
cluster level.

Since cluster0 has only 1 "core" type core, then L2 per "core" works.

Not sure if this idea can be applied to arm?

> 
> Thanks,
> Yanan
> > Here we expose to user "cluster" instead of "module", to be consistent
> > with "cluster-id" naming.
> > 
> > Since CPUID.04H is used by intel CPUs, this property is available on
> > intel CPUs as for now.
> > 
> > When necessary, it can be extended to CPUID.801DH for amd CPUs.
> > 
> > Signed-off-by: Zhao Liu 
> > ---
> >   target/i386/cpu.c | 33 -
> >   target/i386/cpu.h |  2 ++
> >   2 files changed, 34 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > index 5816dc99b1d4..cf84c720a431 100644
> > --- a/target/i386/cpu.c
> > +++ b/target/i386/cpu.c
> > @@ -240,12 +240,15 @@ static uint32_t 
> > max_processor_ids_for_cache(CPUCacheInfo *cache,
> >   case CORE:
> >   num_ids = 1 << apicid_core_offset(topo_info);
> >   break;
> > +case MODULE:
> > +num_ids = 1 << apicid_module_offset(topo_info);
> > +break;
> >   case DIE:
> >   num_ids = 1 << apicid_die_offset(topo_info);
> >   break;
> >   default:
> >   /*
> > - * Currently there is no use case for SMT, MODULE and PACKAGE, so 
> > use
> > + * Currently there is no use case for SMT and PACKAGE, so use
> >* assert directly to facilitate debugging.
> >*/
> >   g_assert_not_reached();
> > @@ -6633,6 +6636,33 @@ static void x86_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> >   env->cache_info_amd.l3_cache = &legacy_l3_cache;
> >   }
> > +if (cpu->l2_cache_topo_level) {
> > +/*
> > + * FIXME: Currently only supports changing CPUID[4] (for intel), 
> > and
> > + * will support changing CPUID[0x801D] when necessary.
> > + */
> > +if (!IS_INTEL_CPU(env)) {
> > +error_setg(errp, "only intel cpus supports x-l2-cache-topo");
> > +return;
> > +}
> > +
> > +if (!strcmp(cpu->l2_cache_topo_level, "core")) {
> > +env->cache_info_cpuid4.l2_cache->share_level = CORE;
> > +} else if (!strcmp(cpu->l2_cache_topo_level, "cluster")) {
> > +/*
> > + * We expose to users "cluster" instead of "module", to be
> > + * consistent with "cluster-id" naming.
> > + */
> > +env->cache_info_cpuid4.l2_cache->share_level = MODULE;
> > +} else {
> > +error_setg(errp,
> > +   "x-l2-cache-topo doesn't support '%s', "
> > +   "and it only supports 'core' or 'cluster'",
> > +   cpu->l2_cache_topo_level);
> > +return;
> > +}
> > +}
> > +
> >   #ifndef CONFIG_USER_ONLY
> >   MachineState *ms = MACHINE(qdev_get_machine());
> >   qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
> > @@ -7135,6 +7165,7 @@ static Property x86_cpu_properties[] = {
> >false),
> >   DEFINE_PROP_BOOL("x-intel-pt-auto-level", X86CPU, intel_pt_auto_level,
> >true),
> > +DEFINE_PROP_STRING("x-l2-cache-topo", X86CPU, l2_cache_topo_level),
> >   DEFINE_PROP_END_OF_LIST()
> >   };
> > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > index 5a955431f759..aa7e96c586c7 100644
> > --- a/target/i386/cpu.h
> > +++ b/target/i386/cpu.h
> > @@ -1987,6 +1987,8 @@ struct ArchCPU {
> >   int32_t thread_id;
> >   int32_t hv_max_vps;
> > +
> > +char *l2_cache_topo_level;
> >   };
>

Re: [PATCH qemu v2 1/2] hw/at24c : modify at24c to support 1 byte address mode

2023-02-16 Thread Sittisak Sinprem

The rebase auto-merge failed; I will resend the patches.

On Fri, Feb 17, 2023 at 10:12 AM ~ssinprem  wrote:
>
> From: Sittisak Sinprem 
>
> Signed-off-by: Sittisak Sinprem 
> ---
>  hw/nvram/eeprom_at24c.c | 46 +
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
> index 3328c32814..0cb650d635 100644
> --- a/hw/nvram/eeprom_at24c.c
> +++ b/hw/nvram/eeprom_at24c.c
> @@ -41,6 +41,12 @@ struct EEPROMState {
>  uint16_t cur;
>  /* total size in bytes */
>  uint32_t rsize;
> +/* address byte number
> + *  for  24c01, 24c02 size <= 256 byte, use only 1 byte
> + *  otherwise size > 256, use 2 byte
> + */
> +uint8_t asize;
> +
>  bool writable;
>  /* cells changed since last START? */
>  bool changed;
> @@ -91,7 +97,7 @@ uint8_t at24c_eeprom_recv(I2CSlave *s)
>  EEPROMState *ee = AT24C_EE(s);
>  uint8_t ret;
>
> -if (ee->haveaddr == 1) {
> +if (ee->haveaddr > 0 && ee->haveaddr < ee->asize) {
>  return 0xff;
>  }
>
> @@ -108,11 +114,11 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
>  {
>  EEPROMState *ee = AT24C_EE(s);
>
> -if (ee->haveaddr < 2) {
> +if (ee->haveaddr < ee->asize) {
>  ee->cur <<= 8;
>  ee->cur |= data;
>  ee->haveaddr++;
> -if (ee->haveaddr == 2) {
> +if (ee->haveaddr == ee->asize) {
>  ee->cur %= ee->rsize;
>  DPRINTK("Set pointer %04x\n", ee->cur);
>  }
> @@ -184,6 +190,29 @@ static void at24c_eeprom_realize(DeviceState *dev, Error 
> **errp)
>  }
>
>  ee->mem = g_malloc0(ee->rsize);
> +
> +/*
> + * If address size didn't define with property set
> + *  setting it from Rom size
> + */
> +if (ee->asize == 0) {
> +if (ee->rsize <= 256) {
> +ee->asize = 1;
> +} else {
> +ee->asize = 2;
> +}
> +}
> +}
> +
> +static
> +void at24c_eeprom_reset(DeviceState *state)
> +{
> +EEPROMState *ee = AT24C_EE(state);
> +
> +ee->changed = false;
> +ee->cur = 0;
> +ee->haveaddr = 0;
> +
>  memset(ee->mem, 0, ee->rsize);
>
>  if (ee->init_rom) {
> @@ -201,18 +230,9 @@ static void at24c_eeprom_realize(DeviceState *dev, Error 
> **errp)
>  }
>  }
>
> -static
> -void at24c_eeprom_reset(DeviceState *state)
> -{
> -EEPROMState *ee = AT24C_EE(state);
> -
> -ee->changed = false;
> -ee->cur = 0;
> -ee->haveaddr = 0;
> -}
> -
>  static Property at24c_eeprom_props[] = {
>  DEFINE_PROP_UINT32("rom-size", EEPROMState, rsize, 0),
> +DEFINE_PROP_UINT8("address-size", EEPROMState, asize, 0),
>  DEFINE_PROP_BOOL("writable", EEPROMState, writable, true),
>  DEFINE_PROP_DRIVE("drive", EEPROMState, blk),
>  DEFINE_PROP_END_OF_LIST()
> --
> 2.34.6
>

Re: [PATCH qemu v2 2/2] aspeed/fuji : correct the eeprom size

2023-02-16 Thread Sittisak Sinprem

The rebase auto-merge failed; I will resend the patches.



On Fri, Feb 17, 2023 at 10:13 AM ~ssinprem  wrote:
>
> From: Sittisak Sinprem 
>
> The 24C64 device holds 64 kilobits = 8 kilobytes.
> The 24C02 device holds 2 kilobits = 256 bytes.
>
> Signed-off-by: Sittisak Sinprem 
> ---
>  hw/arm/aspeed.c | 36 
>  1 file changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 27dda58338..40f6076b44 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -840,42 +840,46 @@ static void fuji_bmc_i2c_init(AspeedMachineState *bmc)
>  i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4c);
>  i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4d);
>
> -at24c_eeprom_init(i2c[19], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[20], 0x50, 2 * KiB);
> -at24c_eeprom_init(i2c[22], 0x52, 2 * KiB);
> +/*
> +* EEPROM 24c64 size is 64Kbits or 8 Kbytes
> +*24c02 size is 2Kbits or 256 bytes
> +*/
> +at24c_eeprom_init(i2c[19], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[20], 0x50, 256);
> +at24c_eeprom_init(i2c[22], 0x52, 256);
>
>  i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x48);
>  i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x49);
>  i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x4a);
>  i2c_slave_create_simple(i2c[3], TYPE_TMP422, 0x4c);
>
> -at24c_eeprom_init(i2c[8], 0x51, 64 * KiB);
> +at24c_eeprom_init(i2c[8], 0x51, 8 * KiB);
>  i2c_slave_create_simple(i2c[8], TYPE_LM75, 0x4a);
>
>  i2c_slave_create_simple(i2c[50], TYPE_LM75, 0x4c);
> -at24c_eeprom_init(i2c[50], 0x52, 64 * KiB);
> +at24c_eeprom_init(i2c[50], 0x52, 8 * KiB);
>  i2c_slave_create_simple(i2c[51], TYPE_TMP75, 0x48);
>  i2c_slave_create_simple(i2c[52], TYPE_TMP75, 0x49);
>
>  i2c_slave_create_simple(i2c[59], TYPE_TMP75, 0x48);
>  i2c_slave_create_simple(i2c[60], TYPE_TMP75, 0x49);
>
> -at24c_eeprom_init(i2c[65], 0x53, 64 * KiB);
> +at24c_eeprom_init(i2c[65], 0x53, 8 * KiB);
>  i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x49);
>  i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x48);
> -at24c_eeprom_init(i2c[68], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[69], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[70], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[71], 0x52, 64 * KiB);
> +at24c_eeprom_init(i2c[68], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[69], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[70], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[71], 0x52, 8 * KiB);
>
> -at24c_eeprom_init(i2c[73], 0x53, 64 * KiB);
> +at24c_eeprom_init(i2c[73], 0x53, 8 * KiB);
>  i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x49);
>  i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x48);
> -at24c_eeprom_init(i2c[76], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[77], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[78], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[79], 0x52, 64 * KiB);
> -at24c_eeprom_init(i2c[28], 0x50, 2 * KiB);
> +at24c_eeprom_init(i2c[76], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[77], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[78], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[79], 0x52, 8 * KiB);
> +at24c_eeprom_init(i2c[28], 0x50, 256);
>
>  for (int i = 0; i < 8; i++) {
>  at24c_eeprom_init(i2c[81 + i * 8], 0x56, 64 * KiB);
> --
> 2.34.6

Re: [RFC 42/52] hw/machine: Add hybrid_supported in generic topo properties

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 08:28:37PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 20:28:37 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [RFC 42/52] hw/machine: Add hybrid_supported in generic topo
>  properties
> 
> On 2023/2/15 10:53, Zhao Liu wrote:
> > On Tue, Feb 14, 2023 at 09:46:50AM +0800, wangyanan (Y) wrote:
> > > Date: Tue, 14 Feb 2023 09:46:50 +0800
> > > From: "wangyanan (Y)" 
> > > Subject: Re: [RFC 42/52] hw/machine: Add hybrid_supported in generic topo
> > >   properties
> > > 
> > > Hi Zhao,
> > > 
> > > On 2023/2/13 17:50, Zhao Liu wrote:
> > > > From: Zhao Liu 
> > > > 
> > > > Since hybrid cpu topology configuration can benefit not only x86, but
> > > > also other architectures/platforms that have supported (in real
> > > > machines) or will support hybrid CPU topology, "-hybrid" can be generic.
> > > > 
> > > > So add the generic topology property to configure if support hybrid
> > > > cpu topology for architectures/platforms in SmpCompatProps.
> > > > 
> > > > Also rename SmpCompatProps to TopoCompatProps to make this structure
> > > > more generic for both smp topology and hybrid topology.
> > > > 
> > > > Signed-off-by: Zhao Liu 
> > > > ---
> > > >include/hw/boards.h | 15 +++
> > > >1 file changed, 11 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > index 34ec035b5c9f..17be3485e823 100644
> > > > --- a/include/hw/boards.h
> > > > +++ b/include/hw/boards.h
> > > > @@ -127,19 +127,26 @@ typedef struct {
> > > >} CPUArchIdList;
> > > >/**
> > > > - * SMPCompatProps:
> > > > - * @prefer_sockets - whether sockets are preferred over cores in smp 
> > > > parsing
> > > > + * TopoCompatProps:
> > > > + * @hybrid_support - whether hybrid cpu topology are supported by 
> > > > machine.
> > > inconsistent with the name in the definition below.
> > Thanks! Will fix.
> > 
> > > > + *   Note that hybrid cpu topology requires to specify 
> > > > the
> > > > + *   topology of each core so that there will no 
> > > > longer be
> > > > + *   a default core topology, thus prefer_sockets 
> > > > won't work
> > > > + *   when hybrid_support is enabled.
> > > > + * @prefer_sockets - whether sockets are preferred over cores in smp 
> > > > parsing.
> > > > + *   Not work when hybrid_support is enabled.
> > > > * @dies_supported - whether dies are supported by the machine
> > > > * @clusters_supported - whether clusters are supported by the 
> > > > machine
> > > > * @has_clusters - whether clusters are explicitly specified in the 
> > > > user
> > > > * provided SMP configuration
> > > > */
> > > >typedef struct {
> > > > +bool hybrid_supported;
> > > >bool prefer_sockets;
> > > >bool dies_supported;
> > > >bool clusters_supported;
> > > >bool has_clusters;
> > > > -} SMPCompatProps;
> > > > +} TopoCompatProps;
> > > Also here. "Rename SMPCompatProps to TopoCompatProps and
> > > move it to cpu-topology.h and adapt the code" should be organized
> > > in one or more separate patches, being pre-patches together with
> > > the conversion of CpuTopology before.
> > Do you think TopoCompatProps/SMPCompatProps should also be moved
> > into cpu-topology.h? It seems that SMPCompatProps is a collection
> > of properties of MachineClass.
> TopoCompatProps holds properties that are all about CPU topology, so I think
> we can do this; cpu-topology.h will be included in boards.h anyway. But it's
> up to you whether to do this.😉

Yeah, it makes sense to manage everything topology-related in one place.
So I will move it, thanks!

> 
> Thanks,
> Yanan
> > > And put the "hybrid_supported"
> > > extension into another patch. Would this make it easier to review?
> > Yes, I agree. Thanks!
> > 
> > Zhao
> > 
> > > Thanks,
> > > Yanan
> > > >/**
> > > > * MachineClass:
> > > > @@ -281,7 +288,7 @@ struct MachineClass {
> > > >bool nvdimm_supported;
> > > >bool numa_mem_supported;
> > > >bool auto_enable_numa;
> > > > -SMPCompatProps smp_props;
> > > > +TopoCompatProps smp_props;
> > > >const char *default_ram_id;
> > > >HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>

Re: [RFC 41/52] machine: Introduce core_type() hook

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 08:15:23PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 20:15:23 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [RFC 41/52] machine: Introduce core_type() hook
> 
> Hi Zhao,
> 
> On 2023/2/13 17:50, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > Since supported core types are architecture specific, we need this hook
> > to allow archs define its own parsing or validation method.
> > 
> > As the example, add the x86 core_type() which will be used in "-hybrid"
> > parameter parsing.
> > 
> > Signed-off-by: Zhao Liu 
> > ---
> >   hw/core/machine-topo.c | 14 ++
> >   hw/core/machine.c  |  1 +
> >   hw/i386/x86.c  | 15 +++
> >   include/hw/boards.h|  7 +++
> >   4 files changed, 37 insertions(+)
> > 
> > diff --git a/hw/core/machine-topo.c b/hw/core/machine-topo.c
> > index 12c05510c1b5..f9ab08a1252e 100644
> > --- a/hw/core/machine-topo.c
> > +++ b/hw/core/machine-topo.c
> > @@ -352,3 +352,17 @@ void machine_parse_smp_config(MachineState *ms,
> >   return;
> >   }
> >   }
> > +
> > +/*
> > + * machine_parse_hybrid_core_type: the default hook to parse hybrid core
> > + * type corresponding to the coretype
> > + * string option.
> > + */
> > +int machine_parse_hybrid_core_type(MachineState *ms, const char *coretype)
> > +{
> > +if (strcmp(coretype, "") == 0 || strcmp(coretype, "none") == 0) {
> > +return 0;
> > +}
> > +
> > +return -1;
> > +}
> Is it possible that coretype can be NULL?
> What would *coretype be if the users don't explicitly specify coretype
> in the command line?

At present, the coretype field cannot be omitted; supporting omission would
require other code changes (if omission is needed in the future, there
should also be an arch-specific method to supply the default coretype at
the same time).
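
To make the NULL question concrete, here is a minimal sketch of a default
hook that would tolerate an omitted coretype. It assumes the option parser
could hand the hook a NULL string, which per the reply above it currently
cannot. parse_core_type_sketch() and the enum values are hypothetical names
for illustration only, not part of the series.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the values a core_type() hook returns. */
enum sketch_core_type {
    SKETCH_CORE_TYPE_NONE = 0,
    SKETCH_CORE_TYPE_UNKNOWN = -1,
};

/*
 * Sketch of a default hook that tolerates an omitted coretype:
 * NULL is treated like "" or "none"; any other string is rejected.
 */
static int parse_core_type_sketch(const char *coretype)
{
    if (coretype == NULL || coretype[0] == '\0' ||
        strcmp(coretype, "none") == 0) {
        return SKETCH_CORE_TYPE_NONE;
    }
    return SKETCH_CORE_TYPE_UNKNOWN;
}
```

Checking for NULL before calling strcmp() matters because passing NULL to
strcmp() is undefined behaviour; an arch-specific hook could apply the same
guard and then fall back to its own default type.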

> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index fad990f49b03..acc32b3be5f6 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -926,6 +926,7 @@ static void machine_class_init(ObjectClass *oc, void 
> > *data)
> >* On Linux, each node's border has to be 8MB aligned
> >*/
> >   mc->numa_mem_align_shift = 23;
> > +mc->core_type = machine_parse_hybrid_core_type;
> >   object_class_property_add_str(oc, "kernel",
> >   machine_get_kernel, machine_set_kernel);
> > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > index f381fdc43180..f58a90359170 100644
> > --- a/hw/i386/x86.c
> > +++ b/hw/i386/x86.c
> > @@ -1569,6 +1569,20 @@ static void machine_set_sgx_epc(Object *obj, Visitor 
> > *v, const char *name,
> >   qapi_free_SgxEPCList(list);
> >   }
> > +static int x86_parse_hybrid_core_type(MachineState *ms, const char 
> > *coretype)
> > +{
> > +X86HybridCoreType type;
> > +
> > +if (strcmp(coretype, "atom") == 0) {
> > +type = INTEL_ATOM_TYPE;
> > +} else if (strcmp(coretype, "core") == 0) {
> > +type = INTEL_CORE_TYPE;
> > +} else {
> > +type = INVALID_HYBRID_TYPE;
> > +}
> What about:
> INTEL_CORE_TYPE_ATOM
> INTEL_CORE_TYPE_CORE
> X86_CORE_TYPE_UNKNOWN ?
> just a suggestion.

It looks better! Thanks.

> 
> Thanks,
> Yanan
> > +return type;
> > +}
> > +
> >   static void x86_machine_initfn(Object *obj)
> >   {
> >   X86MachineState *x86ms = X86_MACHINE(obj);
> > @@ -1596,6 +1610,7 @@ static void x86_machine_class_init(ObjectClass *oc, 
> > void *data)
> >   x86mc->save_tsc_khz = true;
> >   x86mc->fwcfg_dma_enabled = true;
> >   nc->nmi_monitor_handler = x86_nmi;
> > +mc->core_type = x86_parse_hybrid_core_type;
> >   object_class_property_add(oc, X86_MACHINE_SMM, "OnOffAuto",
> >   x86_machine_get_smm, x86_machine_set_smm,
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 9364c90d5f1a..34ec035b5c9f 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -36,6 +36,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
> >  Error **errp);
> >   void machine_parse_smp_config(MachineState *ms,
> > const SMPConfiguration *config, Error 
> > **errp);
> > +int machine_parse_hybrid_core_type(MachineState *ms, const char *coretype);
> >   /**
> >* machine_class_allow_dynamic_sysbus_dev: Add type to list of valid 
> > devices
> > @@ -199,6 +200,11 @@ typedef struct {
> >*Return the type of KVM corresponding to the kvm-type string option 
> > or
> >*computed based on other criteria such as the host kernel 
> > capabilities.
> >*kvm-type may be NULL if it is not needed.
> > + * @core_type:
> > + *Return the type of hybrid cores corresponding to the coretype string
> > + *option. The default hook only accept "none" or "" since the most 
> > generic
> > + *core topology should not specify any specific core type. Each arch 
> > can
> > + *define its own core_type(

Re: [RFC 23/52] arm: Replace MachineState.smp access with topology helpers

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 06:46:30PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 18:46:30 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [RFC 23/52] arm: Replace MachineState.smp access with topology
>  helpers
> 
> On 2023/2/13 17:50, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > When MachineState.topo is introduced, the topology related structures
> > become complicated. So we wrapped the access to topology fields of
> > MachineState.topo into some helpers, and we are using these helpers
> > to replace the use of MachineState.smp.
> > 
> > Before arm supports hybrid, here we use smp-specific interface to get
> > "threads per core" and "cores per cluster".
> > 
> > For other cases, it's straightforward to replace topology access with
> > wrapped generic interfaces.
> Sorry, I have not yet understood the necessity of mixing generic
> helpers and smp-specific helpers😉. For a machine, the
> topo type is either smp or hybrid. So far the ARM virt machine's
> topo type is always smp, so I don't see the difference between
> machine_topo_get_cores and machine_topo_get_smp_cores.

For hybrid, the cpu index is necessary.
But in the common smp usage, callers don't care about the cpu index,
so the cpu index is not available in many places.

Of course, in this smp case, it is also possible to pass in any cpu index
for machine_topo_get_cores() (such as machine_topo_get_cores(0)), but an
irrelevant cpu index always looks strange...so I introduced the
smp-specific interface.
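
The distinction can be sketched with toy structures. This is a simplified
illustration, not the code from the series: all Sketch* names are
hypothetical, and the index here selects a core directly, whereas the
series' *_by_idx() helpers take a cpu index.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the topology structures; all names are hypothetical. */
typedef struct {
    unsigned int cores;    /* cores per cluster, identical everywhere */
    unsigned int threads;  /* threads per core, identical everywhere */
} SketchSmpTopo;

typedef struct {
    unsigned int ncores;
    const unsigned int *threads_per_core;  /* one entry per core */
} SketchHybridTopo;

typedef struct {
    bool is_smp;
    SketchSmpTopo smp;
    SketchHybridTopo hybrid;
} SketchTopo;

/* SMP-only accessor: no index needed, but the caller must know it is smp. */
static unsigned int sketch_get_smp_threads(const SketchTopo *t)
{
    assert(t->is_smp);
    return t->smp.threads;
}

/* Generic accessor: hybrid needs an index to pick a specific core. */
static unsigned int sketch_get_threads_by_core(const SketchTopo *t,
                                               unsigned int core_idx)
{
    if (t->is_smp) {
        return t->smp.threads;  /* the index is irrelevant for smp */
    }
    assert(core_idx < t->hybrid.ncores);
    return t->hybrid.threads_per_core[core_idx];
}
```

With an smp topology both accessors return the same value for any index,
which is why passing an arbitrary index such as 0 would work but reads
oddly at call sites that have no cpu at hand.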

> 
> When we want to support hybrid for ARM, changing the naming
> of variables will be enough.
> 
> Thanks,
> Yanan
> > Cc: Jean-Christophe Dubois 
> > Cc: Andrey Smirnov 
> > Cc: Radoslaw Biernacki 
> > Cc: Leif Lindholm 
> > Cc: Shannon Zhao 
> > Cc: Alistair Francis 
> > Cc: Edgar E. Iglesias 
> > Signed-off-by: Zhao Liu 
> > ---
> >   hw/arm/fsl-imx6.c|  4 ++--
> >   hw/arm/fsl-imx6ul.c  |  4 ++--
> >   hw/arm/fsl-imx7.c|  4 ++--
> >   hw/arm/highbank.c|  2 +-
> >   hw/arm/realview.c|  2 +-
> >   hw/arm/sbsa-ref.c|  8 +++
> >   hw/arm/vexpress.c|  2 +-
> >   hw/arm/virt-acpi-build.c |  4 ++--
> >   hw/arm/virt.c| 50 ++--
> >   hw/arm/xlnx-zynqmp.c |  6 ++---
> >   include/hw/arm/virt.h|  2 +-
> >   target/arm/cpu.c |  2 +-
> >   target/arm/cpu_tcg.c |  2 +-
> >   target/arm/kvm.c |  2 +-
> >   14 files changed, 50 insertions(+), 44 deletions(-)
> > 
> > diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
> > index 00dafe3f62de..e94dec5e6c8d 100644
> > --- a/hw/arm/fsl-imx6.c
> > +++ b/hw/arm/fsl-imx6.c
> > @@ -41,7 +41,7 @@ static void fsl_imx6_init(Object *obj)
> >   char name[NAME_SIZE];
> >   int i;
> > -for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6_NUM_CPUS); i++) {
> > +for (i = 0; i < MIN(machine_topo_get_cpus(ms), FSL_IMX6_NUM_CPUS); 
> > i++) {
> >   snprintf(name, NAME_SIZE, "cpu%d", i);
> >   object_initialize_child(obj, name, &s->cpu[i],
> >   ARM_CPU_TYPE_NAME("cortex-a9"));
> > @@ -108,7 +108,7 @@ static void fsl_imx6_realize(DeviceState *dev, Error 
> > **errp)
> >   FslIMX6State *s = FSL_IMX6(dev);
> >   uint16_t i;
> >   Error *err = NULL;
> > -unsigned int smp_cpus = ms->smp.cpus;
> > +unsigned int smp_cpus = machine_topo_get_cpus(ms);
> >   if (smp_cpus > FSL_IMX6_NUM_CPUS) {
> >   error_setg(errp, "%s: Only %d CPUs are supported (%d requested)",
> > diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
> > index d88d6cc1c5f9..1216b7ff1a92 100644
> > --- a/hw/arm/fsl-imx6ul.c
> > +++ b/hw/arm/fsl-imx6ul.c
> > @@ -160,9 +160,9 @@ static void fsl_imx6ul_realize(DeviceState *dev, Error 
> > **errp)
> >   SysBusDevice *sbd;
> >   DeviceState *d;
> > -if (ms->smp.cpus > 1) {
> > +if (machine_topo_get_cpus(ms) > 1) {
> >   error_setg(errp, "%s: Only a single CPU is supported (%d 
> > requested)",
> > -   TYPE_FSL_IMX6UL, ms->smp.cpus);
> > +   TYPE_FSL_IMX6UL, machine_topo_get_cpus(ms));
> >   return;
> >   }
> > diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
> > index afc74807990f..f3e569a6ec29 100644
> > --- a/hw/arm/fsl-imx7.c
> > +++ b/hw/arm/fsl-imx7.c
> > @@ -36,7 +36,7 @@ static void fsl_imx7_init(Object *obj)
> >   char name[NAME_SIZE];
> >   int i;
> > -for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX7_NUM_CPUS); i++) {
> > +for (i = 0; i < MIN(machine_topo_get_cpus(ms), FSL_IMX7_NUM_CPUS); 
> > i++) {
> >   snprintf(name, NAME_SIZE, "cpu%d", i);
> >   object_initialize_child(obj, name, &s->cpu[i],
> >   ARM_CPU_TYPE_NAME("cortex-a7"));
> > @@ -148,7 +148,7 @@ static void fsl_imx7_realize(DeviceState *dev, Error 
> > **errp)
> >   int i;
> >   qemu_irq irq;
> >   char name[NAME_SIZE];
> > -unsigned int smp_cpus = ms->smp.cpus;
> > +unsigned int s

[PATCH qemu v2 1/2] hw/at24c : modify at24c to support 1 byte address mode

2023-02-16 Thread ~ssinprem

From: Sittisak Sinprem 

Signed-off-by: Sittisak Sinprem 
---
 hw/nvram/eeprom_at24c.c | 46 +
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
index 3328c32814..0cb650d635 100644
--- a/hw/nvram/eeprom_at24c.c
+++ b/hw/nvram/eeprom_at24c.c
@@ -41,6 +41,12 @@ struct EEPROMState {
 uint16_t cur;
 /* total size in bytes */
 uint32_t rsize;
+/* address byte number 
+ *  for  24c01, 24c02 size <= 256 byte, use only 1 byte
+ *  otherwise size > 256, use 2 byte
+ */
+uint8_t asize;
+
 bool writable;
 /* cells changed since last START? */
 bool changed;
@@ -91,7 +97,7 @@ uint8_t at24c_eeprom_recv(I2CSlave *s)
 EEPROMState *ee = AT24C_EE(s);
 uint8_t ret;
 
-if (ee->haveaddr == 1) {
+if (ee->haveaddr > 0 && ee->haveaddr < ee->asize) {
 return 0xff;
 }
 
@@ -108,11 +114,11 @@ int at24c_eeprom_send(I2CSlave *s, uint8_t data)
 {
 EEPROMState *ee = AT24C_EE(s);
 
-if (ee->haveaddr < 2) {
+if (ee->haveaddr < ee->asize) {
 ee->cur <<= 8;
 ee->cur |= data;
 ee->haveaddr++;
-if (ee->haveaddr == 2) {
+if (ee->haveaddr == ee->asize) {
 ee->cur %= ee->rsize;
 DPRINTK("Set pointer %04x\n", ee->cur);
 }
@@ -184,6 +190,29 @@ static void at24c_eeprom_realize(DeviceState *dev, Error 
**errp)
 }
 
 ee->mem = g_malloc0(ee->rsize);
+
+/*
+ * If address size didn't define with property set
+ *  setting it from Rom size
+ */
+if (ee->asize == 0) {
+if (ee->rsize <= 256) {
+ee->asize = 1;
+} else {
+ee->asize = 2;
+}
+}
+}
+
+static
+void at24c_eeprom_reset(DeviceState *state)
+{
+EEPROMState *ee = AT24C_EE(state);
+
+ee->changed = false;
+ee->cur = 0;
+ee->haveaddr = 0;
+
 memset(ee->mem, 0, ee->rsize);
 
 if (ee->init_rom) {
@@ -201,18 +230,9 @@ static void at24c_eeprom_realize(DeviceState *dev, Error 
**errp)
 }
 }
 
-static
-void at24c_eeprom_reset(DeviceState *state)
-{
-EEPROMState *ee = AT24C_EE(state);
-
-ee->changed = false;
-ee->cur = 0;
-ee->haveaddr = 0;
-}
-
 static Property at24c_eeprom_props[] = {
 DEFINE_PROP_UINT32("rom-size", EEPROMState, rsize, 0),
+DEFINE_PROP_UINT8("address-size", EEPROMState, asize, 0),
 DEFINE_PROP_BOOL("writable", EEPROMState, writable, true),
 DEFINE_PROP_DRIVE("drive", EEPROMState, blk),
 DEFINE_PROP_END_OF_LIST()
-- 
2.34.6
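
The address-width handling in the patch above can be sketched in isolation.
This is a simplified model under stated assumptions: SketchEeprom and the
sketch_* functions are hypothetical names, rsize is assumed to be non-zero,
and the data-byte path of the real device is left out.

```c
#include <stdint.h>

/* Hypothetical miniature of the device state; not the QEMU struct. */
typedef struct {
    uint32_t rsize;    /* total size in bytes, assumed non-zero */
    uint8_t asize;     /* number of address bytes: 1 or 2 */
    uint8_t haveaddr;  /* address bytes received so far */
    uint16_t cur;      /* current memory pointer */
} SketchEeprom;

/* Derive the address width from the ROM size when no property is set. */
static uint8_t sketch_default_asize(uint32_t rsize)
{
    return rsize <= 256 ? 1 : 2;
}

/*
 * Feed one byte written by the bus master. Returns 1 while the byte is
 * consumed as part of the address, 0 once it would be a data byte.
 */
static int sketch_send(SketchEeprom *ee, uint8_t data)
{
    if (ee->haveaddr < ee->asize) {
        ee->cur = (uint16_t)((ee->cur << 8) | data);
        ee->haveaddr++;
        if (ee->haveaddr == ee->asize) {
            ee->cur %= ee->rsize;  /* wrap the pointer into the device */
        }
        return 1;
    }
    return 0;
}
```

For a 256-byte part the first byte after START completes the address; for
an 8 KiB part two bytes are needed, and the second write is still treated
as address rather than data.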

[PATCH qemu v2 2/2] aspeed/fuji : correct the eeprom size

2023-02-16 Thread ~ssinprem

From: Sittisak Sinprem 

The 24C64 device holds 64 kilobits = 8 kilobytes.
The 24C02 device holds 2 kilobits = 256 bytes.

Signed-off-by: Sittisak Sinprem 
---
 hw/arm/aspeed.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 27dda58338..40f6076b44 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -840,42 +840,46 @@ static void fuji_bmc_i2c_init(AspeedMachineState *bmc)
 i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4c);
 i2c_slave_create_simple(i2c[17], TYPE_LM75, 0x4d);
 
-at24c_eeprom_init(i2c[19], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[20], 0x50, 2 * KiB);
-at24c_eeprom_init(i2c[22], 0x52, 2 * KiB);
+/*
+* EEPROM 24c64 size is 64Kbits or 8 Kbytes
+*24c02 size is 2Kbits or 256 bytes
+*/
+at24c_eeprom_init(i2c[19], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[20], 0x50, 256);
+at24c_eeprom_init(i2c[22], 0x52, 256);
 
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x48);
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x49);
 i2c_slave_create_simple(i2c[3], TYPE_LM75, 0x4a);
 i2c_slave_create_simple(i2c[3], TYPE_TMP422, 0x4c);
 
-at24c_eeprom_init(i2c[8], 0x51, 64 * KiB);
+at24c_eeprom_init(i2c[8], 0x51, 8 * KiB);
 i2c_slave_create_simple(i2c[8], TYPE_LM75, 0x4a);
 
 i2c_slave_create_simple(i2c[50], TYPE_LM75, 0x4c);
-at24c_eeprom_init(i2c[50], 0x52, 64 * KiB);
+at24c_eeprom_init(i2c[50], 0x52, 8 * KiB);
 i2c_slave_create_simple(i2c[51], TYPE_TMP75, 0x48);
 i2c_slave_create_simple(i2c[52], TYPE_TMP75, 0x49);
 
 i2c_slave_create_simple(i2c[59], TYPE_TMP75, 0x48);
 i2c_slave_create_simple(i2c[60], TYPE_TMP75, 0x49);
 
-at24c_eeprom_init(i2c[65], 0x53, 64 * KiB);
+at24c_eeprom_init(i2c[65], 0x53, 8 * KiB);
 i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x49);
 i2c_slave_create_simple(i2c[66], TYPE_TMP75, 0x48);
-at24c_eeprom_init(i2c[68], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[69], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[70], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[71], 0x52, 64 * KiB);
+at24c_eeprom_init(i2c[68], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[69], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[70], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[71], 0x52, 8 * KiB);
 
-at24c_eeprom_init(i2c[73], 0x53, 64 * KiB);
+at24c_eeprom_init(i2c[73], 0x53, 8 * KiB);
 i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x49);
 i2c_slave_create_simple(i2c[74], TYPE_TMP75, 0x48);
-at24c_eeprom_init(i2c[76], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[77], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[78], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[79], 0x52, 64 * KiB);
-at24c_eeprom_init(i2c[28], 0x50, 2 * KiB);
+at24c_eeprom_init(i2c[76], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[77], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[78], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[79], 0x52, 8 * KiB);
+at24c_eeprom_init(i2c[28], 0x50, 256);
 
 for (int i = 0; i < 8; i++) {
 at24c_eeprom_init(i2c[81 + i * 8], 0x56, 64 * KiB);
-- 
2.34.6

Re: [RFC 12/52] hw/acpi: Replace MachineState.smp access with topology helpers

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 05:31:11PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 17:31:11 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [RFC 12/52] hw/acpi: Replace MachineState.smp access with
>  topology helpers
> 
> Hi Zhao,
> 
> On 2023/2/13 17:49, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > At present, in QEMU only arm needs PPTT table to build cpu topology.
> > 
> > Before QEMU's arm supports hybrid architectures, it's enough to limit
> > the cpu topology of PPTT to smp type through the explicit smp interface
> > (machine_topo_get_smp_threads()).
> > 
> > Cc: Michael S. Tsirkin 
> > Cc: Igor Mammedov 
> > Cc: Ani Sinha 
> > Signed-off-by: Zhao Liu 
> > ---
> >   hw/acpi/aml-build.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index ea331a20d131..693bd8833d10 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker 
> > *linker, MachineState *ms,
> >   cluster_offset = socket_offset;
> >   }
> > -if (ms->smp.threads == 1) {
> > +if (machine_topo_get_smp_threads(ms) == 1) {
> >   build_processor_hierarchy_node(table_data,
> >   (1 << 1) | /* ACPI Processor ID valid */
> >   (1 << 3),  /* Node is a Leaf */
> ACPI PPTT table is designed to also support the hybrid CPU topology
> case where nodes on the same CPU topology level can have different
> number of child nodes.
> 
> So to be general, the diff should be:
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index ea331a20d1..dfded95bbc 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker
> *linker, MachineState *ms,
>  cluster_offset = socket_offset;
>  }
> 
> -    if (ms->smp.threads == 1) {
> +    if (machine_topo_get_threads_by_idx(n) == 1) {
>  build_processor_hierarchy_node(table_data,
>  (1 << 1) | /* ACPI Processor ID valid */
>  (1 << 3),  /* Node is a Leaf */

Nice! I'll replace that.

> 
> Actually I've recently been working on ARM hmp virtualization, which relies
> on PPTT for topology representation, so we will also need PPTT to be general
> for the hybrid case anyway.

Good to know that you are considering hybrid support for arm.
BTW, I explained the difference between arm's and x86's hybrid in previous
emails [1] [2], mainly about whether the cpu model is the same.

I tentatively think that this difference can be solved by arch-specific
coretype(). Do you have any comments on this? Thanks!

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03884.html
[2]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03789.html

> 
> Thanks,
> Yanan

Re: [RFC 08/52] machine: Add helpers to get cpu topology info from MachineState.topo

2023-02-16 Thread Zhao Liu

On Thu, Feb 16, 2023 at 04:38:38PM +0800, wangyanan (Y) wrote:
> Date: Thu, 16 Feb 2023 16:38:38 +0800
> From: "wangyanan (Y)" 
> Subject: Re: [RFC 08/52] machine: Add helpers to get cpu topology info from
>  MachineState.topo
> 
> Hi Zhao,
> 
> On 2023/2/13 17:49, Zhao Liu wrote:
> > From: Zhao Liu 
> > 
> > When MachineState.topo is introduced, the topology related structures
> > become complicated. In the general case (hybrid or smp topology),
> > accessing the topology information needs to determine whether it is
> > currently smp or hybrid topology, and then access the corresponding
> > MachineState.topo.smp or MachineState.topo.hybrid.
> > 
> > The best way to do this is to wrap the access to the topology to
> > avoid having to check each time it is accessed.
> > 
> > The following helpers are provided here:
> > 
> > - General interfaces - no need to worry about whether the underlying
> >topology is smp or hybrid:
> > 
> > * machine_topo_get_cpus()
> > * machine_topo_get_max_cpus()
> > * machine_topo_is_smp()
> > * machine_topo_get_sockets()
> > * machine_topo_get_dies()
> > * machine_topo_get_clusters()
> > * machine_topo_get_threads();
> > * machine_topo_get_cores();
> > * machine_topo_get_threads_by_idx()
> > * machine_topo_get_cores_by_idx()
> > * machine_topo_get_cores_per_socket()
> > * machine_topo_get_threads_per_socket()
> > 
> > - SMP-specific interfaces - provided for the cases that are clearly
> > known to be smp topology:
> > 
> > * machine_topo_get_smp_cores()
> > * machine_topo_get_smp_threads()
> > 
> > For hybrid topology, each core may have a different number of threads,
> > so if someone wants "cpus per core", the cpu_index is needed to target a
> > specific core (machine_topo_get_threads_by_idx()). For smp there is
> > no need for this extra step, so for this case we provide smp-specific
> > interfaces.
> > 
> > Signed-off-by: Zhao Liu 
> > ---
> >   hw/core/machine-topo.c | 142 +
> >   include/hw/boards.h|  35 ++
> >   2 files changed, 177 insertions(+)
> > 
> > diff --git a/hw/core/machine-topo.c b/hw/core/machine-topo.c
> > index 7223f73f99b0..b20160479629 100644
> > --- a/hw/core/machine-topo.c
> > +++ b/hw/core/machine-topo.c
> > @@ -21,6 +21,148 @@
> >   #include "hw/boards.h"
> >   #include "qapi/error.h"
> > +unsigned int machine_topo_get_sockets(const MachineState *ms)
> > +{
> > +return machine_topo_is_smp(ms) ? ms->topo.smp.sockets :
> > + ms->topo.hybrid.sockets;
> > +}
> > +
> > +unsigned int machine_topo_get_dies(const MachineState *ms)
> > +{
> > +return machine_topo_is_smp(ms) ? ms->topo.smp.dies :
> > + ms->topo.hybrid.dies;
> > +}
> > +
> > +unsigned int machine_topo_get_clusters(const MachineState *ms)
> > +{
> > +return machine_topo_is_smp(ms) ? ms->topo.smp.clusters :
> > + ms->topo.hybrid.clusters;
> > +}
> > +
> > +unsigned int machine_topo_get_smp_cores(const MachineState *ms)
> > +{
> > +g_assert(machine_topo_is_smp(ms));
> > +return ms->topo.smp.cores;
> > +}
> > +
> > +unsigned int machine_topo_get_smp_threads(const MachineState *ms)
> > +{
> > +g_assert(machine_topo_is_smp(ms));
> > +return ms->topo.smp.threads;
> > +}
> > +
> > +unsigned int machine_topo_get_threads(const MachineState *ms,
> > +  unsigned int cluster_id,
> > +  unsigned int core_id)
> > +{
> > +if (machine_topo_is_smp(ms)) {
> > +return ms->topo.smp.threads;
> > +} else {
> > +return ms->topo.hybrid.cluster_list[cluster_id]
> > +   .core_list[core_id].threads;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +unsigned int machine_topo_get_cores(const MachineState *ms,
> > +unsigned int cluster_id)
> > +{
> > +if (machine_topo_is_smp(ms)) {
> > +return ms->topo.smp.cores;
> > +} else {
> > +return ms->topo.hybrid.cluster_list[cluster_id].cores;
> > +}
> > +}
> Is it possible to use a variadic function so that those two smp-specific
> helpers can be avoided? It's a bit weird that we have the generic
> machine_topo_get_threads but also need machine_topo_get_smp_threads
> at the same time.

I am not sure about this, because variadic functions unify function
naming, but eliminate the "smp-specific" information from the name.

Getting the cores/threads without considering the cpu index only works
in smp scenarios, and I think the call site should make it explicit
that the caller knows the topology is smp.

> > +
> > +unsigned int machine_topo_get_threads_by_idx(const MachineState *ms,
> > + unsigned int cpu_index)
> > +{
> > +unsigned cpus_per_die;
> > +unsigned tmp_idx;
> > +HybridCluster *cluster;
> > +HybridCore *core;
> > +
> > +if (machine_topo_is_smp(ms)) {
> > +return ms->to

Re: [PATCH 13/18] target/riscv: Allow debugger to access seed CSR

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 9:09, Bin Meng wrote:

At present seed CSR is not reported in the CSR XML hence gdb cannot
access it.

Fix it by adding a debugger check in its predicate() routine.

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 515b05348b..f1075b5728 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -458,6 +458,10 @@ static RISCVException seed(CPURISCVState *env, int csrno)
  }
  
  #if !defined(CONFIG_USER_ONLY)

+if (env->debugger) {
+return RISCV_EXCP_NONE;
+}
+


Reviewed-by: LIU Zhiwei 

Zhiwei


  /*
   * With a CSR read-write instruction:
   * 1) The seed CSR is always available in machine mode as normal.

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread LIU Zhiwei



On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct from the RISCVCPU, having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.

Signed-off-by: Daniel Henrique Barboza
---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01803a020d..5e9626837b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -653,6 +653,11 @@ static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
  #endif
  #define riscv_cpu_mxl_bits(env) (1UL << (4 + riscv_cpu_mxl(env)))
  
+static inline const RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)

+{
+return &env_archcpu(env)->cfg;
+}
+


There are many places in the branch that should use this interface, not
just the ones in this patch set.


For example,

static RISCVException seed(CPURISCVState *env, int csrno)
{
    RISCVCPU *cpu = env_archcpu(env);

    if (!cpu->cfg.ext_zkr) {
    return RISCV_EXCP_ILLEGAL_INST;
    }

The cpu here is not used for anything except referring to the cfg.

Would you mind unifying the use?

Zhiwei
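
The suggested unification can be sketched with toy types. This is an
illustration only: the Sketch* types and sketch_* functions are hypothetical
stand-ins, and the return codes merely stand in for RISCV_EXCP_NONE and
RISCV_EXCP_ILLEGAL_INST.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the real types live in target/riscv/cpu.h. */
typedef struct { bool ext_zkr; } SketchCfg;
typedef struct { int priv; } SketchEnv;
typedef struct { SketchEnv env; SketchCfg cfg; } SketchCpu;

/* Recover the containing CPU from its embedded env (container-of pattern). */
static SketchCpu *sketch_env_archcpu(SketchEnv *env)
{
    return (SketchCpu *)((char *)env - offsetof(SketchCpu, env));
}

/* Helper in the spirit of riscv_cpu_cfg(): env in, read-only config out. */
static const SketchCfg *sketch_cpu_cfg(SketchEnv *env)
{
    return &sketch_env_archcpu(env)->cfg;
}

/* Predicate written without a local cpu variable. */
static int sketch_seed_predicate(SketchEnv *env)
{
    if (!sketch_cpu_cfg(env)->ext_zkr) {
        return -1;  /* stands in for RISCV_EXCP_ILLEGAL_INST */
    }
    return 0;       /* stands in for RISCV_EXCP_NONE */
}
```

The local cpu pointer disappears from the predicate, and the const return
type documents that callers only read the configuration.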


  #if defined(TARGET_RISCV32)
  #define cpu_recompute_xl(env)  ((void)(env), MXL_RV32)
  #else

Re: [PATCH 12/18] target/riscv: Allow debugger to access user timer and counter CSRs

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 9:09, Bin Meng wrote:

At present user timer and counter CSRs are not reported in the
CSR XML hence gdb cannot access them.

Fix it by adding a debugger check in their predicate() routine.

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 749d0ef83e..515b05348b 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -131,6 +131,10 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
  
  skip_ext_pmu_check:
  
+if (env->debugger) {

+return RISCV_EXCP_NONE;
+}
+


Reviewed-by: LIU Zhiwei 


Zhiwei


  if (env->priv < PRV_M && !get_field(env->mcounteren, ctr_mask)) {
  return RISCV_EXCP_ILLEGAL_INST;
  }

Re: [PATCH] thread-posix: add support for setting threads name on OpenBSD

2023-02-16 Thread Brad Smith


ping.

On 2022-12-18 3:22 a.m., Brad Smith wrote:

Make use of pthread_set_name_np() to be able to set the threads name
on OpenBSD.

Signed-off-by: Brad Smith 
---
  meson.build  | 12 
  util/qemu-thread-posix.c |  9 -
  2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 5c6b5a1c75..68adcb6291 100644
--- a/meson.build
+++ b/meson.build
@@ -2123,6 +2123,18 @@ config_host_data.set('CONFIG_PTHREAD_SETNAME_NP_WO_TID', 
cc.links(gnu_source_pre
  pthread_create(&thread, 0, f, 0);
  return 0;
}''', dependencies: threads))
+config_host_data.set('CONFIG_PTHREAD_SET_NAME_NP', cc.links(gnu_source_prefix 
+ '''
+  #include 
+  #include 
+
+  static void *f(void *p) { return NULL; }
+  int main(void)
+  {
+pthread_t thread;
+pthread_create(&thread, 0, f, 0);
+pthread_set_name_np(thread, "QEMU");
+return 0;
+  }''', dependencies: threads))
  config_host_data.set('CONFIG_PTHREAD_CONDATTR_SETCLOCK', 
cc.links(gnu_source_prefix + '''
#include 
#include 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index bae938c670..412caa45ef 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -18,6 +18,10 @@
  #include "qemu/tsan.h"
  #include "qemu/bitmap.h"
  
+#ifdef CONFIG_PTHREAD_SET_NAME_NP

+#include 
+#endif
+
  static bool name_threads;
  
  void qemu_thread_naming(bool enable)

@@ -25,7 +29,8 @@ void qemu_thread_naming(bool enable)
  name_threads = enable;
  
  #if !defined CONFIG_PTHREAD_SETNAME_NP_W_TID && \

-!defined CONFIG_PTHREAD_SETNAME_NP_WO_TID
+!defined CONFIG_PTHREAD_SETNAME_NP_WO_TID && \
+!defined CONFIG_PTHREAD_SET_NAME_NP
  /* This is a debugging option, not fatal */
  if (enable) {
  fprintf(stderr, "qemu: thread naming not supported on this host\n");
@@ -480,6 +485,8 @@ static void *qemu_thread_start(void *args)
  pthread_setname_np(pthread_self(), qemu_thread_args->name);
  # elif defined(CONFIG_PTHREAD_SETNAME_NP_WO_TID)
  pthread_setname_np(qemu_thread_args->name);
+# elif defined(CONFIG_PTHREAD_SET_NAME_NP)
+pthread_set_name_np(pthread_self(), qemu_thread_args->name);
  # endif
  }
  QEMU_TSAN_ANNOTATE_THREAD_NAME(qemu_thread_args->name);

Re: [PATCH 11/18] target/riscv: gdbstub: Drop the vector CSRs in riscv-vector.xml

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

It's worth noting that the vector CSR predicate() has a similar
run-time check logic to the FPU CSR. With the previous patch our
gdbstub can correctly report these vector CSRs via the CSR xml.

Commit 719d3561b269 ("target/riscv: gdb: support vector registers for rv64 & 
rv32")
inserted these vector CSRs in an ad-hoc, non-standard way in the
riscv-vector.xml. Now we can treat these CSRs no different from
other CSRs.

Signed-off-by: Bin Meng 
---

  target/riscv/gdbstub.c | 75 --
  1 file changed, 75 deletions(-)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index ef52f41460..6048541606 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -127,40 +127,6 @@ static int riscv_gdb_set_fpu(CPURISCVState *env, uint8_t 
*mem_buf, int n)
  return 0;
  }
  
-/*

- * Convert register index number passed by GDB to the correspond
- * vector CSR number. Vector CSRs are defined after vector registers
- * in dynamic generated riscv-vector.xml, thus the starting register index
- * of vector CSRs is 32.
- * Return 0 if register index number is out of range.
- */
-static int riscv_gdb_vector_csrno(int num_regs)
-{
-/*
- * The order of vector CSRs in the switch case
- * should match with the order defined in csr_ops[].
- */
-switch (num_regs) {
-case 32:
-return CSR_VSTART;
-case 33:
-return CSR_VXSAT;
-case 34:
-return CSR_VXRM;
-case 35:
-return CSR_VCSR;
-case 36:
-return CSR_VL;
-case 37:
-return CSR_VTYPE;
-case 38:
-return CSR_VLENB;
-default:
-/* Unknown register. */
-return 0;
-}
-}
-
  static int riscv_gdb_get_vector(CPURISCVState *env, GByteArray *buf, int n)
  {
  uint16_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
@@ -174,19 +140,6 @@ static int riscv_gdb_get_vector(CPURISCVState *env, 
GByteArray *buf, int n)
  return cnt;
  }
  
-int csrno = riscv_gdb_vector_csrno(n);

-
-if (!csrno) {
-return 0;
-}
-
-target_ulong val = 0;
-int result = riscv_csrrw_debug(env, csrno, &val, 0, 0);
-
-if (result == RISCV_EXCP_NONE) {
-return gdb_get_regl(buf, val);
-}
-
  return 0;
  }
  
@@ -201,19 +154,6 @@ static int riscv_gdb_set_vector(CPURISCVState *env, uint8_t *mem_buf, int n)

  return vlenb;
  }
  
-int csrno = riscv_gdb_vector_csrno(n);

-
-if (!csrno) {
-return 0;
-}
-
-target_ulong val = ldtul_p(mem_buf);
-int result = riscv_csrrw_debug(env, csrno, NULL, val, -1);
-
-if (result == RISCV_EXCP_NONE) {
-return sizeof(target_ulong);
-}
-
  return 0;
  }
  
@@ -361,21 +301,6 @@ static int ricsv_gen_dynamic_vector_xml(CPUState *cs, int base_reg)

  num_regs++;
  }
  
-/* Define vector CSRs */

-const char *vector_csrs[7] = {
-"vstart", "vxsat", "vxrm", "vcsr",
-"vl", "vtype", "vlenb"
-};
-
-for (i = 0; i < 7; i++) {
-g_string_append_printf(s,
-   "",
-   vector_csrs[i], TARGET_LONG_BITS, base_reg++);
-num_regs++;
-}
-


Reviewed-by: LIU Zhiwei 

Zhiwei


  g_string_append_printf(s, "");
  
  cpu->dyn_vreg_xml = g_string_free(s, false);

Re: [PATCH 10/18] target/riscv: gdbstub: Turn on debugger mode before calling CSR predicate()

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

Since commit 94452ac4cf26 ("target/riscv: remove fflags, frm, and fcsr
from riscv-*-fpu.xml") the 3 FPU CSRs are removed from the XML target
description. That commit assumed that the 3 FPU CSRs would show up in
riscv-csr.xml, making the ones in riscv-*-fpu.xml redundant.
Unfortunately that is not true. The FPU CSR predicate() has a run-time
check on MSTATUS.FS, and MSTATUS.FS is unset at the time the CSR XML is
generated, hence no FPU CSRs are reported.

The FPU CSR predicate() already considered such a case of being
accessed by a debugger. All we need to do is to turn on debugger
mode before calling predicate().
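
The effect of the debugger flag can be sketched in isolation. This is a
minimal illustration under stated assumptions, not QEMU's code: ToyFpuEnv,
fs_enabled, the TOY_EXCP_* values and both functions are hypothetical
stand-ins for CPURISCVState, MSTATUS.FS, the RISCVException values and the
XML generation loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RISCVException values. */
enum { TOY_EXCP_NONE = 0, TOY_EXCP_ILLEGAL_INST = 2 };

/* Hypothetical, reduced CPU state: only what the predicate looks at. */
typedef struct {
    bool fs_enabled; /* models MSTATUS.FS != 0 */
    bool debugger;   /* models env->debugger */
} ToyFpuEnv;

/* FPU CSR predicate: a debugger access bypasses the run-time FS check. */
static int toy_fs_predicate(const ToyFpuEnv *env)
{
    if (!env->debugger && !env->fs_enabled) {
        return TOY_EXCP_ILLEGAL_INST;
    }
    return TOY_EXCP_NONE;
}

/* Count how many of n FPU CSRs would be listed in the generated XML. */
static int toy_count_reported_fpu_csrs(ToyFpuEnv *env, int n,
                                       bool use_debugger_mode)
{
    int reported = 0;
    bool saved = env->debugger;

    assert(n >= 0);

    if (use_debugger_mode) {
        env->debugger = true;   /* turn debugger mode on for the scan */
    }
    for (int i = 0; i < n; i++) {
        if (toy_fs_predicate(env) == TOY_EXCP_NONE) {
            reported++;
        }
    }
    env->debugger = saved;      /* restore the previous mode afterwards */
    return reported;
}
```

With fs_enabled unset, the scan lists no FPU CSRs unless debugger mode is
switched on around it, which mirrors the problem and the fix described
above.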

Signed-off-by: Bin Meng 
---

  target/riscv/gdbstub.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 294f0ceb1c..ef52f41460 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -280,6 +280,10 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
base_reg)
  int bitsize = 16 << env->misa_mxl_max;
  int i;
  
+#if !defined(CONFIG_USER_ONLY)
+env->debugger = true;
+#endif
+
  /* Until gdb knows about 128-bit registers */
  if (bitsize > 64) {
  bitsize = 64;
@@ -308,6 +312,11 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
base_reg)
  g_string_append_printf(s, "");
  
  cpu->dyn_csr_xml = g_string_free(s, false);

+
+#if !defined(CONFIG_USER_ONLY)
+env->debugger = false;
+#endif
+


Reviewed-by: LIU Zhiwei 

Zhiwei


  return CSR_TABLE_SIZE;
  }

Re: [PATCH 09/18] target/riscv: Avoid reporting odd-numbered pmpcfgX in the CSR XML for RV64

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

At present the odd-numbered PMP configuration registers for RV64 are
reported in the CSR XML by QEMU gdbstub. However these registers do
not exist on RV64 so trying to access them from gdb results in 'E14'.

Move the pmpcfgX index check from the actual read/write routine to
the PMP CSR predicate() routine, so that non-existent pmpcfgX won't
be reported in the CSR XML for RV64.
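
The index check being moved can be sketched as a standalone helper. This
is a sketch, assuming a hypothetical ToyXlen enum and function name; only
the two CSR addresses come from the RISC-V privileged specification. On
RV64 each pmpcfg register holds eight entries, so only the even-numbered
registers exist.

```c
#include <assert.h>
#include <stdbool.h>

/* CSR addresses as defined by the RISC-V privileged specification. */
#define CSR_PMPCFG0 0x3A0
#define CSR_PMPCFG3 0x3A3

/* Hypothetical stand-in for the MXL_RV32 / MXL_RV64 distinction. */
typedef enum { TOY_RV32, TOY_RV64 } ToyXlen;

/* Returns true if the pmpcfg CSR at csrno exists for the given XLEN. */
static bool toy_pmpcfg_exists(int csrno, ToyXlen xlen)
{
    assert(csrno >= CSR_PMPCFG0 && csrno <= CSR_PMPCFG3);

    unsigned reg_index = (unsigned)(csrno - CSR_PMPCFG0);

    /* Odd-numbered pmpcfg registers are absent on RV64. */
    return !((reg_index & 1) && xlen == TOY_RV64);
}
```

A predicate built on such a helper rejects pmpcfg1 and pmpcfg3 on RV64
before the read/write routines are ever reached, so those registers also
drop out of anything generated from the predicate result.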

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 23 ---
  1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0a3f2bef6f..749d0ef83e 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -412,6 +412,14 @@ static int aia_hmode32(CPURISCVState *env, int csrno)
  static RISCVException pmp(CPURISCVState *env, int csrno)
  {
  if (riscv_feature(env, RISCV_FEATURE_PMP)) {
+if (csrno <= CSR_PMPCFG3) {
+uint32_t reg_index = csrno - CSR_PMPCFG0;
+
+if ((reg_index & 1) && (riscv_cpu_mxl(env) == MXL_RV64)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+}
+
  return RISCV_EXCP_NONE;
  }
  
@@ -3334,23 +3342,11 @@ static RISCVException write_mseccfg(CPURISCVState *env, int csrno,

  return RISCV_EXCP_NONE;
  }
  
-static bool check_pmp_reg_index(CPURISCVState *env, uint32_t reg_index)

-{
-/* TODO: RV128 restriction check */


This TODO comment should be kept. Otherwise,

Reviewed-by: LIU Zhiwei 

Zhiwei


-if ((reg_index & 1) && (riscv_cpu_mxl(env) == MXL_RV64)) {
-return false;
-}
-return true;
-}
-
  static RISCVException read_pmpcfg(CPURISCVState *env, int csrno,
target_ulong *val)
  {
  uint32_t reg_index = csrno - CSR_PMPCFG0;
  
-if (!check_pmp_reg_index(env, reg_index)) {

-return RISCV_EXCP_ILLEGAL_INST;
-}
  *val = pmpcfg_csr_read(env, reg_index);
  return RISCV_EXCP_NONE;
  }
@@ -3360,9 +3356,6 @@ static RISCVException write_pmpcfg(CPURISCVState *env, 
int csrno,
  {
  uint32_t reg_index = csrno - CSR_PMPCFG0;
  
-if (!check_pmp_reg_index(env, reg_index)) {

-return RISCV_EXCP_ILLEGAL_INST;
-}
  pmpcfg_csr_write(env, reg_index, val);
  return RISCV_EXCP_NONE;
  }

Re: [PATCH v6 1/9] target/riscv: turn write_misa() into an official no-op

2023-02-16 Thread weiwei




On 2023/2/17 05:55, Daniel Henrique Barboza wrote:

At this moment, and apparently since forever, we have no way of enabling
RISCV_FEATURE_MISA. This means that all the code in write_misa(), all
the nuts and bolts that handle how to write this CSR, has always been a
no-op as well, because write_misa() always exits early.

This seems to be benign in the majority of cases. Booting an Ubuntu
'virt' guest and logging all the calls to 'write_misa' shows that no
writes to the MISA CSR were attempted. Writing MISA, i.e.
enabling/disabling RISC-V extensions after the machine is powered on,
seems to be a niche use.

Before proceeding, let's recap what the spec says about MISA. It is a
CSR that is divided into 3 fields:

- MXL, Machine XLEN, described as "may be writable";

- MXLEN, the XLEN in M-mode, which is given by the setting of MXL or a
fixed value if MISA is zero;

- Extensions is defined as "a WARL field that can contain writable bits
where the implementation allows the supported ISA to be modified"

Thus what we have today (write_misa() being a no-op) is already a valid
spec implementation. We're not obliged to have a particular set of MISA
writable bits, and at this moment we have none.
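
Why an empty set of writable bits is still a valid WARL implementation
can be shown with a small sketch. The function name and the mask values
are hypothetical and chosen only for illustration; this is not the code
removed by the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * WARL-style write: only bits in writable_mask take the new value,
 * all other bits keep their old value.
 */
static uint32_t toy_warl_write(uint32_t old, uint32_t val,
                               uint32_t writable_mask)
{
    uint32_t result = (old & ~writable_mask) | (val & writable_mask);

    /* Bits outside the writable mask never change. */
    assert(((result ^ old) & ~writable_mask) == 0);
    return result;
}
```

With writable_mask == 0 every write leaves the register unchanged, which
is exactly the "official no-op" behaviour argued for here.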

Given that allowing the dormant code to write MISA can cause bugs that
are tricky to solve later on, that we don't have a particularly
interesting case of writing MISA to support today, and that we're
already not violating the specification, let's erase the entire body of
write_misa() and turn it into an official no-op instead of an accidental
one. We stay consistent with what we provide users today, with 50+
fewer lines to maintain.

The RISCV_FEATURE_MISA enum value is removed in the process since
nothing else uses it.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 

Reviewed-by: Weiwei Li 

Regards,

Weiwei Li

---
  target/riscv/cpu.h |  1 -
  target/riscv/csr.c | 55 --
  2 files changed, 56 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7128438d8e..01803a020d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -89,7 +89,6 @@ enum {
  RISCV_FEATURE_MMU,
  RISCV_FEATURE_PMP,
  RISCV_FEATURE_EPMP,
-RISCV_FEATURE_MISA,
  RISCV_FEATURE_DEBUG
  };
  
diff --git a/target/riscv/csr.c b/target/riscv/csr.c

index 1b0a0c1693..f7862ff4a4 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1329,61 +1329,6 @@ static RISCVException read_misa(CPURISCVState *env, int 
csrno,
  static RISCVException write_misa(CPURISCVState *env, int csrno,
   target_ulong val)
  {
-if (!riscv_feature(env, RISCV_FEATURE_MISA)) {
-/* drop write to misa */
-return RISCV_EXCP_NONE;
-}
-
-/* 'I' or 'E' must be present */
-if (!(val & (RVI | RVE))) {
-/* It is not, drop write to misa */
-return RISCV_EXCP_NONE;
-}
-
-/* 'E' excludes all other extensions */
-if (val & RVE) {
-/* when we support 'E' we can do "val = RVE;" however
- * for now we just drop writes if 'E' is present.
- */
-return RISCV_EXCP_NONE;
-}
-
-/*
- * misa.MXL writes are not supported by QEMU.
- * Drop writes to those bits.
- */
-
-/* Mask extensions that are not supported by this hart */
-val &= env->misa_ext_mask;
-
-/* Mask extensions that are not supported by QEMU */
-val &= (RVI | RVE | RVM | RVA | RVF | RVD | RVC | RVS | RVU | RVV);
-
-/* 'D' depends on 'F', so clear 'D' if 'F' is not present */
-if ((val & RVD) && !(val & RVF)) {
-val &= ~RVD;
-}
-
-/* Suppress 'C' if next instruction is not aligned
- * TODO: this should check next_pc
- */
-if ((val & RVC) && (GETPC() & ~3) != 0) {
-val &= ~RVC;
-}
-
-/* If nothing changed, do nothing. */
-if (val == env->misa_ext) {
-return RISCV_EXCP_NONE;
-}
-
-if (!(val & RVF)) {
-env->mstatus &= ~MSTATUS_FS;
-}
-
-/* flush translation cache */
-tb_flush(env_cpu(env));
-env->misa_ext = val;
-env->xl = riscv_cpu_mxl(env);
  return RISCV_EXCP_NONE;
  }

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread weiwei




On 2023/2/17 05:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct of the RISCVCPU while having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.
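
The shape of such a helper can be sketched with toy types. Everything
below is hypothetical (ToyConfig, ToyCpuEnv, ToyCpu and both functions);
in particular, recovering the CPU object through offsetof() is one common
way to go from an embedded member to its container and is only an
assumption about how a helper like env_archcpu() could work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the config, env and CPU objects. */
typedef struct { bool mmu; bool pmp; } ToyConfig;
typedef struct { int priv; } ToyCpuEnv;
typedef struct {
    ToyCpuEnv env;   /* embedded architectural state */
    ToyConfig cfg;   /* per-CPU configuration */
} ToyCpu;

/* Recover the enclosing ToyCpu from a pointer to its embedded env. */
static ToyCpu *toy_env_archcpu(ToyCpuEnv *env)
{
    assert(env != NULL);
    return (ToyCpu *)((char *)env - offsetof(ToyCpu, env));
}

/* Read-only view of the configuration, reachable from env alone. */
static const ToyConfig *toy_cpu_cfg(ToyCpuEnv *env)
{
    return &toy_env_archcpu(env)->cfg;
}
```

Returning a pointer to const lets callers test flags such as
toy_cpu_cfg(env)->mmu without a local CPU variable, while the compiler
rejects accidental writes through the returned pointer.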

Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Weiwei Li 

Regards,

Weiwei Li

---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01803a020d..5e9626837b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -653,6 +653,11 @@ static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
  #endif
  #define riscv_cpu_mxl_bits(env) (1UL << (4 + riscv_cpu_mxl(env)))
  
+static inline const RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)

+{
+return &env_archcpu(env)->cfg;
+}
+
  #if defined(TARGET_RISCV32)
  #define cpu_recompute_xl(env)  ((void)(env), MXL_RV32)
  #else

Re: [PATCH 08/18] target/riscv: Simplify getting RISCVCPU pointer from env

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

Use env_archcpu() to get RISCVCPU pointer from env directly.

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 36 
  1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index da3b770894..0a3f2bef6f 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,8 +46,7 @@ static RISCVException smstateen_acc_ok(CPURISCVState *env, 
int index,
 uint64_t bit)
  {
  bool virt = riscv_cpu_virt_enabled(env);
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  if (env->priv == PRV_M || !cpu->cfg.ext_smstateen) {

  return RISCV_EXCP_NONE;
@@ -90,8 +89,7 @@ static RISCVException fs(CPURISCVState *env, int csrno)
  
  static RISCVException vs(CPURISCVState *env, int csrno)

  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);


I see that many of the RISCVCPU pointers in this patch are used only to
reach cfg. We could also use the new riscv_cpu_cfg() interface from
Daniel, which would unify how this is done.


Otherwise,

Reviewed-by: LIU Zhiwei 

Zhiwei

  
  if (env->misa_ext & RVV ||

  cpu->cfg.ext_zve32f || cpu->cfg.ext_zve64f) {
@@ -108,8 +106,7 @@ static RISCVException vs(CPURISCVState *env, int csrno)
  static RISCVException ctr(CPURISCVState *env, int csrno)
  {
  #if !defined(CONFIG_USER_ONLY)
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  int ctr_index;
  target_ulong ctr_mask;
  int base_csrno = CSR_CYCLE;
@@ -166,8 +163,7 @@ static RISCVException ctr32(CPURISCVState *env, int csrno)
  #if !defined(CONFIG_USER_ONLY)
  static RISCVException mctr(CPURISCVState *env, int csrno)
  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  int ctr_index;
  int base_csrno = CSR_MHPMCOUNTER3;
  
@@ -195,8 +191,7 @@ static RISCVException mctr32(CPURISCVState *env, int csrno)
  
  static RISCVException sscofpmf(CPURISCVState *env, int csrno)

  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  if (!cpu->cfg.ext_sscofpmf) {

  return RISCV_EXCP_ILLEGAL_INST;
@@ -321,8 +316,7 @@ static RISCVException umode32(CPURISCVState *env, int csrno)
  
  static RISCVException mstateen(CPURISCVState *env, int csrno)

  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  if (!cpu->cfg.ext_smstateen) {

  return RISCV_EXCP_ILLEGAL_INST;
@@ -333,8 +327,7 @@ static RISCVException mstateen(CPURISCVState *env, int 
csrno)
  
  static RISCVException hstateen_pred(CPURISCVState *env, int csrno, int base)

  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  if (!cpu->cfg.ext_smstateen) {

  return RISCV_EXCP_ILLEGAL_INST;
@@ -363,8 +356,7 @@ static RISCVException sstateen(CPURISCVState *env, int 
csrno)
  {
  bool virt = riscv_cpu_virt_enabled(env);
  int index = csrno - CSR_SSTATEEN0;
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  if (!cpu->cfg.ext_smstateen) {

  return RISCV_EXCP_ILLEGAL_INST;
@@ -918,8 +910,7 @@ static RISCVException read_timeh(CPURISCVState *env, int 
csrno,
  
  static RISCVException sstc(CPURISCVState *env, int csrno)

  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  bool hmode_check = false;
  
  if (!cpu->cfg.ext_sstc || !env->rdtime_fn) {

@@ -1152,8 +1143,7 @@ static RISCVException write_ignore(CPURISCVState *env, 
int csrno,
  static RISCVException read_mvendorid(CPURISCVState *env, int csrno,
   target_ulong *val)
  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  *val = cpu->cfg.mvendorid;

  return RISCV_EXCP_NONE;
@@ -1162,8 +1152,7 @@ static RISCVException read_mvendorid(CPURISCVState *env, 
int csrno,
  static RISCVException read_marchid(CPURISCVState *env, int csrno,
 target_ulong *val)
  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  *val = cpu->cfg.marchid;

  return RISCV_EXCP_NONE;
@@ -1172,8 +1161,7 @@ static RISCVException read_marchid(CPURISCVState *env, 
int csrno,
  static RISCVException read_mimpid(CPURISCVState *env, int csrno,
target_ulong *val)
  {
-CPUState *cs = env_cpu(env);
-RISCVCPU *cpu = RISCV_CPU(cs);
+RISCVCPU *cpu = env_archcpu(env);
  
  *val = cpu->cfg.mimpid;

  return RISCV_EXCP_NONE;

Re: [PATCH 07/18] target/riscv: Simplify {read,write}_pmpcfg() a little bit

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

Use the register index that has already been calculated in the
pmpcfg_csr_{read,write} call.

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 8bbc75cbfa..da3b770894 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3363,7 +3363,7 @@ static RISCVException read_pmpcfg(CPURISCVState *env, int 
csrno,
  if (!check_pmp_reg_index(env, reg_index)) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
-*val = pmpcfg_csr_read(env, csrno - CSR_PMPCFG0);
+*val = pmpcfg_csr_read(env, reg_index);
  return RISCV_EXCP_NONE;
  }
  
@@ -3375,7 +3375,7 @@ static RISCVException write_pmpcfg(CPURISCVState *env, int csrno,

  if (!check_pmp_reg_index(env, reg_index)) {
  return RISCV_EXCP_ILLEGAL_INST;
  }
-pmpcfg_csr_write(env, csrno - CSR_PMPCFG0, val);
+pmpcfg_csr_write(env, reg_index, val);


Reviewed-by: LIU Zhiwei 

Zhiwei


  return RISCV_EXCP_NONE;
  }

Re: [PATCH 06/18] target/riscv: Use 'bool' type for read_only

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

The read_only variable is currently declared as an 'int', but it
should really be a 'bool'.
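
For context, the expression being retyped extracts bits [11:10] of the CSR
address; the RISC-V privileged specification reserves the value 0b11 there
for read-only CSRs. A minimal sketch, assuming hypothetical helper names
(toy_get_field is written here for illustration and is not QEMU's
get_field macro):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the field selected by a contiguous, non-zero mask. */
static uint32_t toy_get_field(uint32_t reg, uint32_t mask)
{
    assert(mask != 0);
    /* mask & ~(mask << 1) isolates the lowest set bit of the mask. */
    return (reg & mask) / (mask & ~(mask << 1));
}

/* A CSR is read-only when address bits [11:10] are both set. */
static bool toy_csr_is_read_only(uint32_t csrno)
{
    return toy_get_field(csrno, 0xC00) == 3;
}
```

The comparison already yields a truth value, which is why 'bool' is the
natural type for the result.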

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index cc74819759..8bbc75cbfa 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3778,7 +3778,7 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
 RISCVCPU *cpu)
  {
  /* check privileges and return RISCV_EXCP_ILLEGAL_INST if check fails */
-int read_only = get_field(csrno, 0xC00) == 3;
+bool read_only = get_field(csrno, 0xC00) == 3;


Reviewed-by: LIU Zhiwei 

Zhiwei


  int csr_min_priv = csr_ops[csrno].min_priv_ver;
  
  /* ensure the CSR extension is enabled. */

Re: [PATCH 05/18] target/riscv: Coding style fixes in csr.c

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

Fix various places that violate QEMU coding style:

- correct multi-line comment format
- indent to opening parenthesis

Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 62 --
  1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index c2dd9d5af0..cc74819759 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -963,7 +963,7 @@ static RISCVException sstc_32(CPURISCVState *env, int csrno)
  }
  
  static RISCVException read_vstimecmp(CPURISCVState *env, int csrno,

-target_ulong *val)
+ target_ulong *val)
  {
  *val = env->vstimecmp;
  
@@ -971,7 +971,7 @@ static RISCVException read_vstimecmp(CPURISCVState *env, int csrno,

  }
  
  static RISCVException read_vstimecmph(CPURISCVState *env, int csrno,

-target_ulong *val)
+  target_ulong *val)
  {
  *val = env->vstimecmp >> 32;
  
@@ -979,7 +979,7 @@ static RISCVException read_vstimecmph(CPURISCVState *env, int csrno,

  }
  
  static RISCVException write_vstimecmp(CPURISCVState *env, int csrno,

-target_ulong val)
+  target_ulong val)
  {
  RISCVCPU *cpu = env_archcpu(env);
  
@@ -996,7 +996,7 @@ static RISCVException write_vstimecmp(CPURISCVState *env, int csrno,

  }
  
  static RISCVException write_vstimecmph(CPURISCVState *env, int csrno,

-target_ulong val)
+   target_ulong val)
  {
  RISCVCPU *cpu = env_archcpu(env);
  
@@ -1020,7 +1020,7 @@ static RISCVException read_stimecmp(CPURISCVState *env, int csrno,

  }
  
  static RISCVException read_stimecmph(CPURISCVState *env, int csrno,

-target_ulong *val)
+ target_ulong *val)
  {
  if (riscv_cpu_virt_enabled(env)) {
  *val = env->vstimecmp >> 32;
@@ -1032,7 +1032,7 @@ static RISCVException read_stimecmph(CPURISCVState *env, 
int csrno,
  }
  
  static RISCVException write_stimecmp(CPURISCVState *env, int csrno,

-target_ulong val)
+ target_ulong val)
  {
  RISCVCPU *cpu = env_archcpu(env);
  
@@ -1055,7 +1055,7 @@ static RISCVException write_stimecmp(CPURISCVState *env, int csrno,

  }
  
  static RISCVException write_stimecmph(CPURISCVState *env, int csrno,

-target_ulong val)
+  target_ulong val)
  {
  RISCVCPU *cpu = env_archcpu(env);
  
@@ -1342,7 +1342,8 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
  
  /* 'E' excludes all other extensions */

  if (val & RVE) {
-/* when we support 'E' we can do "val = RVE;" however
+/*
+ * when we support 'E' we can do "val = RVE;" however
   * for now we just drop writes if 'E' is present.
   */
  return RISCV_EXCP_NONE;
@@ -1364,7 +1365,8 @@ static RISCVException write_misa(CPURISCVState *env, int 
csrno,
  val &= ~RVD;
  }
  
-/* Suppress 'C' if next instruction is not aligned

+/*
+ * Suppress 'C' if next instruction is not aligned
   * TODO: this should check next_pc
   */
  if ((val & RVC) && (GETPC() & ~3) != 0) {
@@ -1833,28 +1835,28 @@ static RISCVException write_mscratch(CPURISCVState 
*env, int csrno,
  }
  
  static RISCVException read_mepc(CPURISCVState *env, int csrno,

- target_ulong *val)
+target_ulong *val)
  {
  *val = env->mepc;
  return RISCV_EXCP_NONE;
  }
  
  static RISCVException write_mepc(CPURISCVState *env, int csrno,

- target_ulong val)
+ target_ulong val)
  {
  env->mepc = val;
  return RISCV_EXCP_NONE;
  }
  
  static RISCVException read_mcause(CPURISCVState *env, int csrno,

- target_ulong *val)
+  target_ulong *val)
  {
  *val = env->mcause;
  return RISCV_EXCP_NONE;
  }
  
  static RISCVException write_mcause(CPURISCVState *env, int csrno,

- target_ulong val)
+   target_ulong val)
  {
  env->mcause = val;
  return RISCV_EXCP_NONE;
@@ -1876,14 +1878,14 @@ static RISCVException write_mtval(CPURISCVState *env, 
int csrno,
  
  /* Execution environment configuration setup */

  static RISCVException read_menvcfg(CPURISCVState *env, int csrno,
- target_ulong *val)
+   target_ulong *val)
  {
  *val = env->menvcfg;
  return RISCV_EXCP_NONE;
  }
  
  static RISCVException write_menvcf

Re: [PATCH 04/18] target/riscv: gdbstub: Do not generate CSR XML if Zicsr is disabled

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:02, Bin Meng wrote:

There is no need to generate the CSR XML if the Zicsr extension
is not enabled.

Signed-off-by: Bin Meng 
---

  target/riscv/gdbstub.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 704f3d6922..294f0ceb1c 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -406,7 +406,10 @@ void riscv_cpu_register_gdb_regs_for_features(CPUState *cs)
  g_assert_not_reached();
  }
  
-gdb_register_coprocessor(cs, riscv_gdb_get_csr, riscv_gdb_set_csr,

- riscv_gen_dynamic_csr_xml(cs, cs->gdb_num_regs),
- "riscv-csr.xml", 0);
+if (cpu->cfg.ext_icsr) {
+int base_reg = cs->gdb_num_regs;
+gdb_register_coprocessor(cs, riscv_gdb_get_csr, riscv_gdb_set_csr,
+ riscv_gen_dynamic_csr_xml(cs, base_reg),
+ "riscv-csr.xml", 0);
+}


Reviewed-by: LIU Zhiwei 

Zhiwei


  }

Re: [PATCH 03/18] target/riscv: gdbstub: Minor change for better readability

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:01, Bin Meng wrote:

Use a variable 'base_reg' to represent cs->gdb_num_regs so that
the call to ricsv_gen_dynamic_vector_xml() can be placed in one
single line for better readability.

Signed-off-by: Bin Meng 
---

  target/riscv/gdbstub.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index e57372db38..704f3d6922 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -385,9 +385,9 @@ void riscv_cpu_register_gdb_regs_for_features(CPUState *cs)
   32, "riscv-32bit-fpu.xml", 0);
  }
  if (env->misa_ext & RVV) {
+int base_reg = cs->gdb_num_regs;
  gdb_register_coprocessor(cs, riscv_gdb_get_vector, 
riscv_gdb_set_vector,
- ricsv_gen_dynamic_vector_xml(cs,
-  
cs->gdb_num_regs),
+ ricsv_gen_dynamic_vector_xml(cs, base_reg),


Reviewed-by: LIU Zhiwei 

Zhiwei


   "riscv-vector.xml", 0);
  }
  switch (env->misa_mxl_max) {

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 9:50, LIU Zhiwei wrote:


On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct of the RISCVCPU while having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01803a020d..5e9626837b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -653,6 +653,11 @@ static inline RISCVMXL 
riscv_cpu_mxl(CPURISCVState *env)

  #endif
  #define riscv_cpu_mxl_bits(env) (1UL << (4 + riscv_cpu_mxl(env)))
  +static inline const RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)


Maybe we should

static inline const* RISCVCPUConfig riscv_cpu_cfg(CPURISCVState *env) 
or just

static inline RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)


Ignore this comment. I see that you never change the fields from this 
pointer.


Zhiwei



Zhiwei


+{
+    return &env_archcpu(env)->cfg;
+}
+
  #if defined(TARGET_RISCV32)
  #define cpu_recompute_xl(env)  ((void)(env), MXL_RV32)
  #else

Re: [PATCH 02/18] target/riscv: Correct the priority policy of riscv_csrrw_check()

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:01, Bin Meng wrote:

The priority policy of riscv_csrrw_check() was once adjusted in
commit eacaf4401956 ("target/riscv: Fix priority of csr related check
in riscv_csrrw_check"), whose commit message says the CSR existence
check should come before the access control check. The code changes
did not agree with the commit message, however: the predicate() check
still came after the read / write check.
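
The intended ordering can be sketched as follows. This is an illustration
under stated assumptions, not the QEMU function: the TOY_EXCP_* values,
the predicate typedef and the two sample predicates are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RISCVException values. */
enum {
    TOY_EXCP_NONE = 0,
    TOY_EXCP_ILLEGAL_INST = 2,
    TOY_EXCP_VIRT_INST = 22
};

typedef int (*toy_predicate_fn)(int csrno);

static int toy_csrrw_check(int csrno, bool write, toy_predicate_fn predicate)
{
    bool read_only = ((csrno >> 10) & 3) == 3;

    assert(predicate != NULL);

    /* 1. Existence / accessibility, as decided by the predicate. */
    int ret = predicate(csrno);
    if (ret != TOY_EXCP_NONE) {
        return ret;
    }

    /* 2. Only then reject writes to read-only CSRs. */
    if (write && read_only) {
        return TOY_EXCP_ILLEGAL_INST;
    }
    return TOY_EXCP_NONE;
}

/* Sample predicates: one reports a different fault, one accepts. */
static int toy_pred_virt_fault(int csrno) { (void)csrno; return TOY_EXCP_VIRT_INST; }
static int toy_pred_ok(int csrno) { (void)csrno; return TOY_EXCP_NONE; }
```

The order matters whenever the predicate can return something other than
an illegal instruction exception: with the read-only test first, a write
to such a CSR would report the wrong exception.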

Fixes: eacaf4401956 ("target/riscv: Fix priority of csr related check in 
riscv_csrrw_check")
Signed-off-by: Bin Meng 
---

  target/riscv/csr.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 1b0a0c1693..c2dd9d5af0 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3793,15 +3793,15 @@ static inline RISCVException 
riscv_csrrw_check(CPURISCVState *env,
  return RISCV_EXCP_ILLEGAL_INST;
  }
  
-if (write_mask && read_only) {

-return RISCV_EXCP_ILLEGAL_INST;
-}
-
  RISCVException ret = csr_ops[csrno].predicate(env, csrno);
  if (ret != RISCV_EXCP_NONE) {
  return ret;
  }
  
+if (write_mask && read_only) {

+return RISCV_EXCP_ILLEGAL_INST;
+}
+


Reviewed-by: LIU Zhiwei 

Zhiwei


  #if !defined(CONFIG_USER_ONLY)
  int csr_priv, effective_priv = env->priv;

Re: [PATCH 01/18] target/riscv: gdbstub: Check priv spec version before reporting CSR

2023-02-16 Thread LIU Zhiwei




On 2023/2/14 2:01, Bin Meng wrote:

The gdbstub CSR XML is dynamically generated according to the result
of the CSR predicate(). This worked fine until
commit 7100fe6c2441 ("target/riscv: Enable privileged spec version 1.12")
introduced the privilege spec version check in riscv_csrrw_check().

When debugging the 'sifive_u' machine, whose priv spec is at 1.10,
gdbstub reports priv spec 1.12 CSRs like menvcfg in the XML, hence
we see a "remote failure reply 'E14'" message when examining all CSRs
via "info register system" from gdb.

Add the priv spec version check in the CSR XML generation logic to
fix this issue.
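
The filter added to the generation loop can be sketched with a toy table.
The ToyCsrOp struct, the TOY_PRIV_* constants and the function are
hypothetical; only the comparison mirrors the check added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering of privileged spec versions. */
enum { TOY_PRIV_1_10, TOY_PRIV_1_11, TOY_PRIV_1_12 };

/* Hypothetical, reduced CSR table entry. */
typedef struct {
    const char *name;
    int min_priv_ver;   /* first priv spec version that defines the CSR */
} ToyCsrOp;

/* Count the CSRs that would be listed for a hart at hart_priv_ver. */
static size_t toy_count_reported(const ToyCsrOp *ops, size_t n,
                                 int hart_priv_ver)
{
    size_t reported = 0;

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (hart_priv_ver < ops[i].min_priv_ver) {
            continue;   /* CSR is newer than the hart's priv spec */
        }
        reported++;
    }
    return reported;
}
```

A hart at priv spec 1.10 then no longer advertises a 1.12-only CSR such as
menvcfg, so gdb never asks for a register that the access check would
reject.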

Fixes: 7100fe6c2441 ("target/riscv: Enable privileged spec version 1.12")
Signed-off-by: Bin Meng 
---

  target/riscv/gdbstub.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 6e7bbdbd5e..e57372db38 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -290,6 +290,9 @@ static int riscv_gen_dynamic_csr_xml(CPUState *cs, int 
base_reg)
  g_string_append_printf(s, "");
  
  for (i = 0; i < CSR_TABLE_SIZE; i++) {

+if (env->priv_ver < csr_ops[i].min_priv_ver) {
+continue;
+}

Reviewed-by: LIU Zhiwei 

Zhiwei


  predicate = csr_ops[i].predicate;
  if (predicate && (predicate(env, i) == RISCV_EXCP_NONE)) {
  if (csr_ops[i].name) {

Re: [PATCH v6 9/9] target/riscv/cpu: remove CPUArchState::features and friends

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

The attribute is no longer used since we can retrieve all the enabled
features in the hart by using cpu->cfg instead.

Remove env->features, riscv_feature() and riscv_set_feature(). We also
need to bump vmstate_riscv_cpu version_id and minimum_version_id since
'features' is no longer being migrated.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.h | 12 
  target/riscv/machine.c |  5 ++---
  2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 0519d2ab0c..9897305184 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -173,8 +173,6 @@ struct CPUArchState {
  /* 128-bit helpers upper part return value */
  target_ulong retxh;
  
-uint32_t features;

-
  #ifdef CONFIG_USER_ONLY
  uint32_t elf_flags;
  #endif
@@ -524,16 +522,6 @@ static inline int riscv_has_ext(CPURISCVState *env, 
target_ulong ext)
  return (env->misa_ext & ext) != 0;
  }
  
-static inline bool riscv_feature(CPURISCVState *env, int feature)

-{
-return env->features & (1ULL << feature);
-}
-
-static inline void riscv_set_feature(CPURISCVState *env, int feature)
-{
-env->features |= (1ULL << feature);
-}
-
  #include "cpu_user.h"
  
  extern const char * const riscv_int_regnames[];

diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index 67e9e56853..9c455931d8 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -331,8 +331,8 @@ static const VMStateDescription vmstate_pmu_ctr_state = {
  
  const VMStateDescription vmstate_riscv_cpu = {

  .name = "cpu",
-.version_id = 6,
-.minimum_version_id = 6,
+.version_id = 7,
+.minimum_version_id = 7,
  .post_load = riscv_cpu_post_load,
  .fields = (VMStateField[]) {
  VMSTATE_UINTTL_ARRAY(env.gpr, RISCVCPU, 32),
@@ -351,7 +351,6 @@ const VMStateDescription vmstate_riscv_cpu = {
  VMSTATE_UINT32(env.misa_ext, RISCVCPU),
  VMSTATE_UINT32(env.misa_mxl_max, RISCVCPU),
  VMSTATE_UINT32(env.misa_ext_mask, RISCVCPU),
-VMSTATE_UINT32(env.features, RISCVCPU),


Reviewed-by: LIU Zhiwei 

Zhiwei


  VMSTATE_UINTTL(env.priv, RISCVCPU),
  VMSTATE_UINTTL(env.virt, RISCVCPU),
  VMSTATE_UINT64(env.resetvec, RISCVCPU),

Re: [PATCH v6 8/9] target/riscv: remove RISCV_FEATURE_MMU

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

RISCV_FEATURE_MMU is set whenever cpu->cfg.mmu is set, so let's just use
the flag directly instead.

With this change the enum is also removed. It is worth noting that
this enum, and all the RISCV_FEATURE_* values that were contained in it,
predate the existence of the cpu->cfg object. Today, using cpu->cfg is
an easier way to retrieve all the features and extensions enabled in the
hart.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c| 4 
  target/riscv/cpu.h| 7 ---
  target/riscv/cpu_helper.c | 2 +-
  target/riscv/csr.c| 4 ++--
  target/riscv/monitor.c| 2 +-
  target/riscv/pmp.c| 2 +-
  6 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 7b1360d6ba..075033006c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -919,10 +919,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  }
  }
  
-if (cpu->cfg.mmu) {

-riscv_set_feature(env, RISCV_FEATURE_MMU);
-}
-
  if (cpu->cfg.epmp && !cpu->cfg.pmp) {
  /*
   * Enhanced PMP should only be available
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 119a022af9..0519d2ab0c 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -81,13 +81,6 @@
  #define RVH RV('H')
  #define RVJ RV('J')
  
-/* S extension denotes that Supervisor mode exists, however it is possible

-   to have a core that support S mode but does not have an MMU and there
-   is currently no bit in misa to indicate whether an MMU exists or not
-   so a cpu features bitfield is required, likewise for optional PMP support */
-enum {
-RISCV_FEATURE_MMU,
-};
  
  /* Privileged specification version */

  enum {
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 292b6b3168..eda2293470 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -796,7 +796,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr 
*physical,
  mode = PRV_U;
  }
  
-if (mode == PRV_M || !riscv_feature(env, RISCV_FEATURE_MMU)) {

+if (mode == PRV_M || !riscv_cpu_cfg(env)->mmu) {
  *physical = addr;
  *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
  return TRANSLATE_SUCCESS;
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d0ab00d870..fcc271c93c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -2569,7 +2569,7 @@ static RISCVException rmw_siph(CPURISCVState *env, int 
csrno,
  static RISCVException read_satp(CPURISCVState *env, int csrno,
  target_ulong *val)
  {
-if (!riscv_feature(env, RISCV_FEATURE_MMU)) {
+if (!riscv_cpu_cfg(env)->mmu) {
  *val = 0;
  return RISCV_EXCP_NONE;
  }
@@ -2588,7 +2588,7 @@ static RISCVException write_satp(CPURISCVState *env, int 
csrno,
  {
  target_ulong vm, mask;
  
-if (!riscv_feature(env, RISCV_FEATURE_MMU)) {

+if (!riscv_cpu_cfg(env)->mmu) {
  return RISCV_EXCP_NONE;
  }
  
diff --git a/target/riscv/monitor.c b/target/riscv/monitor.c

index 236f93b9f5..f36ddfa967 100644
--- a/target/riscv/monitor.c
+++ b/target/riscv/monitor.c
@@ -218,7 +218,7 @@ void hmp_info_mem(Monitor *mon, const QDict *qdict)
  return;
  }
  
-if (!riscv_feature(env, RISCV_FEATURE_MMU)) {

+if (!riscv_cpu_cfg(env)->mmu) {
  monitor_printf(mon, "S-mode MMU unavailable\n");
  return;
  }
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 205bfbe090..a08cd95658 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -315,7 +315,7 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong 
addr,
  }
  
  if (size == 0) {

-if (riscv_feature(env, RISCV_FEATURE_MMU)) {
+if (riscv_cpu_cfg(env)->mmu) {


Reviewed-by: LIU Zhiwei 

Zhiwei


  /*
   * If size is unknown (0), assume that all bytes
   * from addr to the end of the page will be accessed.

Re: [PATCH v1 0/6] hw/arm/virt: Support dirty ring

2023-02-16 Thread Zhenyu Zhang

[PATCH v1 0/6] hw/arm/virt: Support dirty ring

The patches work well on my arm Ampere host.
The test results are as expected.

Testing
===
(1) kvm-unit-tests/its-pending-migration,
kvm-unit-tests/its-migrate-unmapped-collection and
kvm-unit-tests/its-migration with dirty ring or normal dirty page
tracking mechanism. All test cases passed.

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
PROCESSOR=host ./its-migration

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
PROCESSOR=host ./its-migrate-unmapped-collection

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 ACCEL=kvm \
PROCESSOR=host ./its-pending-migration

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
ACCEL=kvm,dirty-ring-size=65536 \
PROCESSOR=host ./its-migration

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
ACCEL=kvm,dirty-ring-size=65536 \
PROCESSOR=host ./its-migrate-unmapped-collection

QEMU=/home/zhenyzha/sandbox/qemu/build/qemu-system-aarch64 \
ACCEL=kvm,dirty-ring-size=65536 \
PROCESSOR=host ./its-pending-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
devices. All test cases passed.

-device '{"driver": "virtio-net-pci", "mac": "9a:97:8f:c7:cc:a6",
"rombar": 0, "netdev": "idDGdh30", "bus": "pcie-root-port-4", "addr":
"0x0"}'  \
-netdev tap,id=idDGdh30,vhost=on

-device '{"driver": "e1000e", "mac": "9a:fd:93:f1:97:b1",
"netdev": "idXDOtMA", "bus": "pcie-root-port-4", "addr": "0x0"}'  \
-netdev tap,id=idXDOtMA,vhost=on  \

(3) Simulate heavy memory pressure scenarios and compare the migration
performance difference between dirty ring and dirty logging.

I tested with a 200G memory guest with 40 vcpus, using a 10g NIC as the
migration channel.  When the guest is idle or the dirty workload is small,
I don't observe a major difference in total migration time.  With a higher
random dirty workload (256MB/s dirty rate on an anonymous mapping of 180G
memory), the total migration time is (in seconds):


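|-------------------------+---------------|
| dirty ring (4k entries) | dirty logging |
|-------------------------+---------------|
|                      67 |            74 |
|                      67 |            74 |
|                      66 |            76 |
|                      66 |            73 |
|                      67 |            76 |
|                      67 |            76 |
|                      66 |            73 |
|                      67 |            74 |
|-------------------------+---------------|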

Summary:

dirty ring average:    67s
dirty logging average: 75s
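
As a cross-check of the summary, here is a minimal sketch in C that recomputes
the two averages from the eight runs in the table. The names demo_mean,
dirty_ring_s and dirty_logging_s are made up for this illustration and are not
part of any test script. The exact means come out at 66.625s and 74.5s, which
round to the 67s and 75s reported above.

```c
#include <assert.h>
#include <stddef.h>

/* Total migration times in seconds, copied from the table above. */
static const int dirty_ring_s[]    = { 67, 67, 66, 66, 67, 67, 66, 67 };
static const int dirty_logging_s[] = { 74, 74, 76, 73, 76, 76, 73, 74 };

/* Arithmetic mean of n integer samples (illustrative helper). */
static double demo_mean(const int *samples, size_t n)
{
    long sum = 0;

    assert(samples != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return (double)sum / (double)n;
}
```

Calling demo_mean(dirty_ring_s, 8) and demo_mean(dirty_logging_s, 8) gives the
two figures. With these samples the dirty ring runs finish roughly 10% faster
than the dirty logging runs.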

Tested-by: Zhenyu Zhang 


On Mon, Feb 13, 2023 at 8:39 AM Gavin Shan  wrote:
>
> This series intends to support dirty ring for live migration. The dirty
> ring uses a discrete buffer to track dirty pages. For ARM64, the speciality
> is to use a backup bitmap to track dirty pages when there is no running
> vcpu context. It's known that the backup bitmap needs to be synchronized
> when KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The
> backup bitmap is collected in the last stage of migration.
>
> PATCH[1]Synchronize linux-headers for dirty ring
> PATCH[2]Introduce indicator of the last stage migration and pass it
> all the way down
> PATCH[3]Synchronize the backup bitmap in the last stage of live migration
> PATCH[4]Introduce helper kvm_dirty_ring_init() to enable the dirty ring
> PATCH[5-6]  Enable dirty ring for hw/arm/virt
>
> RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html
>
> Testing
> ===
> (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
> dirty ring or normal dirty page tracking mechanism. All test cases passed.
>
> QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
> ./its-pending-migration
>
> QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
> ./its-migration
>
> QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
> ./its-pending-migration
>
> QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
> ./its-migration
>
> (2) Combinations of migration, post-copy migration, e1000e and virtio-net
> devices. All test cases passed.
>
> -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
> -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0
>
> -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0
>
> Changelog
> =
> v1:
>   * Combine two patches into one PATCH[v1 2/6] for the last stage indicator   
>  (

Re: [PATCH v6 7/9] hw/riscv/virt.c: do not use RISCV_FEATURE_MMU in create_fdt_socket_cpus()

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

Read cpu_ptr->cfg.mmu directly. As a bonus, use cpu_ptr in
riscv_isa_string().

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  hw/riscv/virt.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 86c4adc0c9..49f2c157f7 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -232,20 +232,21 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
socket,
  bool is_32_bit = riscv_is_32bit(&s->soc[0]);
  
  for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {

+RISCVCPU *cpu_ptr = &s->soc[socket].harts[cpu];
+
  cpu_phandle = (*phandle)++;
  
  cpu_name = g_strdup_printf("/cpus/cpu@%d",

  s->soc[socket].hartid_base + cpu);
  qemu_fdt_add_subnode(ms->fdt, cpu_name);
-if (riscv_feature(&s->soc[socket].harts[cpu].env,
-  RISCV_FEATURE_MMU)) {
+if (cpu_ptr->cfg.mmu) {
  qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type",
  (is_32_bit) ? "riscv,sv32" : 
"riscv,sv48");
  } else {
  qemu_fdt_setprop_string(ms->fdt, cpu_name, "mmu-type",
  "riscv,none");
  }
-name = riscv_isa_string(&s->soc[socket].harts[cpu]);
+name = riscv_isa_string(cpu_ptr);


Reviewed-by: LIU Zhiwei 

Zhiwei


  qemu_fdt_setprop_string(ms->fdt, cpu_name, "riscv,isa", name);
  g_free(name);
  qemu_fdt_setprop_string(ms->fdt, cpu_name, "compatible", "riscv");

Re: [PATCH 18/18] target/riscv: Move configuration check to envcfg CSRs predicate()

2023-02-16 Thread Bin Meng

Hi Palmer,

On Fri, Feb 17, 2023 at 12:40 AM Palmer Dabbelt  wrote:
>
> On Tue, 14 Feb 2023 18:22:21 PST (-0800), Bin Meng wrote:
> > On Tue, Feb 14, 2023 at 10:59 PM weiwei  wrote:
> >>
> >>
> >> On 2023/2/14 22:27, Bin Meng wrote:
> >> > At present the envcfg CSRs predicate() routines are generic ones like
> >> > smode() and hmode(). The configuration check is done in the read / write
> >> > routine. Create a new predicate routine to cover this check, so that
> >> > gdbstub can correctly report its existence.
> >> >
> >> > Signed-off-by: Bin Meng 
> >> >
> >> > ---
> >> >
> >> >   target/riscv/csr.c | 98 +-
> >> >   1 file changed, 61 insertions(+), 37 deletions(-)
> >> >
> >> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> >> > index 37350b8a6d..284ccc09dd 100644
> >> > --- a/target/riscv/csr.c
> >> > +++ b/target/riscv/csr.c
> >> > @@ -41,40 +41,6 @@ void riscv_set_csr_ops(int csrno, 
> >> > riscv_csr_operations *ops)
> >> >   }
> >> >
> >> >   /* Predicates */
> >> > -#if !defined(CONFIG_USER_ONLY)
> >> > -static RISCVException smstateen_acc_ok(CPURISCVState *env, int index,
> >> > -   uint64_t bit)
> >> > -{
> >> > -bool virt = riscv_cpu_virt_enabled(env);
> >> > -RISCVCPU *cpu = env_archcpu(env);
> >> > -
> >> > -if (env->priv == PRV_M || !cpu->cfg.ext_smstateen) {
> >> > -return RISCV_EXCP_NONE;
> >> > -}
> >> > -
> >> > -if (!(env->mstateen[index] & bit)) {
> >> > -return RISCV_EXCP_ILLEGAL_INST;
> >> > -}
> >> > -
> >> > -if (virt) {
> >> > -if (!(env->hstateen[index] & bit)) {
> >> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> >> > -}
> >> > -
> >> > -if (env->priv == PRV_U && !(env->sstateen[index] & bit)) {
> >> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> >> > -}
> >> > -}
> >> > -
> >> > -if (env->priv == PRV_U && riscv_has_ext(env, RVS)) {
> >> > -if (!(env->sstateen[index] & bit)) {
> >> > -return RISCV_EXCP_ILLEGAL_INST;
> >> > -}
> >> > -}
> >> > -
> >> > -return RISCV_EXCP_NONE;
> >> > -}
> >> > -#endif
> >> >
> >> >   static RISCVException fs(CPURISCVState *env, int csrno)
> >> >   {
> >> > @@ -318,6 +284,32 @@ static RISCVException umode32(CPURISCVState *env, 
> >> > int csrno)
> >> >   return umode(env, csrno);
> >> >   }
> >> >
> >> > +static RISCVException envcfg(CPURISCVState *env, int csrno)
> >> > +{
> >> > +RISCVCPU *cpu = env_archcpu(env);
> >> > +riscv_csr_predicate_fn predicate;
> >> > +
> >> > +if (cpu->cfg.ext_smstateen) {
> >> > +return RISCV_EXCP_ILLEGAL_INST;
> >> > +}
> >>
> >> This check does not seem right here.  Why is ILLEGAL_INST directly
> >> triggered if smstateen is enabled?
> >
> > This logic was there in the original code. I was confused when I
> > looked at this as well.
> >
> > Anyway, if it is an issue, it should be a separate patch.
>
> Seems reasonable to me, it's always nice to split up the refactoring types.
> So I queued this up as 4ac6c32224 ("Merge patch series "target/riscv: Various
> fixes to gdbstub and CSR access"").
>
> I had to fix up the From address on the patch you re-sent and there was a
> minor merge conflict, but otherwise things look sane to me.  I'll hold off on
> sending anything for a bit just in case, though.
>

There are some open comments in this series I need to address. Please
drop this v1. I will send v2 soon.

Regards,
Bin

Re: [PATCH v6 6/9] target/riscv: remove RISCV_FEATURE_PMP

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

RISCV_FEATURE_PMP is being set via riscv_set_feature() by mirroring the
cpu->cfg.pmp flag. Use the flag instead.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c| 4 
  target/riscv/cpu.h| 1 -
  target/riscv/cpu_helper.c | 2 +-
  target/riscv/csr.c| 2 +-
  target/riscv/machine.c| 3 +--
  target/riscv/op_helper.c  | 2 +-
  target/riscv/pmp.c| 2 +-
  7 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 71b2042d73..7b1360d6ba 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -923,10 +923,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  riscv_set_feature(env, RISCV_FEATURE_MMU);
  }
  
-if (cpu->cfg.pmp) {

-riscv_set_feature(env, RISCV_FEATURE_PMP);
-}
-
  if (cpu->cfg.epmp && !cpu->cfg.pmp) {
  /*
   * Enhanced PMP should only be available
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6d659d74fa..119a022af9 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -87,7 +87,6 @@
 so a cpu features bitfield is required, likewise for optional PMP support 
*/
  enum {
  RISCV_FEATURE_MMU,
-RISCV_FEATURE_PMP,
  };
  
  /* Privileged specification version */

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 4cdd247c6c..292b6b3168 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -706,7 +706,7 @@ static int get_physical_address_pmp(CPURISCVState *env, int 
*prot,
  pmp_priv_t pmp_priv;
  int pmp_index = -1;
  
-if (!riscv_feature(env, RISCV_FEATURE_PMP)) {

+if (!riscv_cpu_cfg(env)->pmp) {
  *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
  return TRANSLATE_SUCCESS;
  }
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index cdc68d3676..d0ab00d870 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -419,7 +419,7 @@ static int aia_hmode32(CPURISCVState *env, int csrno)
  
  static RISCVException pmp(CPURISCVState *env, int csrno)

  {
-if (riscv_feature(env, RISCV_FEATURE_PMP)) {
+if (riscv_cpu_cfg(env)->pmp) {
  return RISCV_EXCP_NONE;
  }
  
diff --git a/target/riscv/machine.c b/target/riscv/machine.c

index 4634968898..67e9e56853 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -27,9 +27,8 @@
  static bool pmp_needed(void *opaque)
  {
  RISCVCPU *cpu = opaque;
-CPURISCVState *env = &cpu->env;
  
-return riscv_feature(env, RISCV_FEATURE_PMP);

+return cpu->cfg.pmp;
  }
  
  static int pmp_post_load(void *opaque, int version_id)

diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 48f918b71b..9c0b91c88f 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -195,7 +195,7 @@ target_ulong helper_mret(CPURISCVState *env)
  uint64_t mstatus = env->mstatus;
  target_ulong prev_priv = get_field(mstatus, MSTATUS_MPP);
  
-if (riscv_feature(env, RISCV_FEATURE_PMP) &&

+if (riscv_cpu_cfg(env)->pmp &&
  !pmp_get_num_rules(env) && (prev_priv != PRV_M)) {
  riscv_raise_exception(env, RISCV_EXCP_INST_ACCESS_FAULT, GETPC());
  }
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index aa4d1996e9..205bfbe090 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -265,7 +265,7 @@ static bool pmp_hart_has_privs_default(CPURISCVState *env, 
target_ulong addr,
  }
  }
  
-if ((!riscv_feature(env, RISCV_FEATURE_PMP)) || (mode == PRV_M)) {

+if (!riscv_cpu_cfg(env)->pmp || (mode == PRV_M)) {


Reviewed-by: LIU Zhiwei 

Zhiwei


  /*
   * Privileged spec v1.10 states if HW doesn't implement any PMP entry
   * or no PMP entry matches an M-Mode access, the access succeeds.

Re: [PATCH v6 5/9] target/riscv: remove RISCV_FEATURE_EPMP

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

RISCV_FEATURE_EPMP is always set to the same value as the cpu->cfg.epmp
flag. Use the flag directly.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c | 10 +++---
  target/riscv/cpu.h |  1 -
  target/riscv/csr.c |  2 +-
  target/riscv/pmp.c |  4 ++--
  4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4585ca74dc..71b2042d73 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -927,17 +927,13 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  riscv_set_feature(env, RISCV_FEATURE_PMP);
  }
  
-if (cpu->cfg.epmp) {

-riscv_set_feature(env, RISCV_FEATURE_EPMP);
-
+if (cpu->cfg.epmp && !cpu->cfg.pmp) {
  /*
   * Enhanced PMP should only be available
   * on harts with PMP support
   */
-if (!cpu->cfg.pmp) {
-error_setg(errp, "Invalid configuration: EPMP requires PMP 
support");
-return;
-}
+error_setg(errp, "Invalid configuration: EPMP requires PMP support");
+return;
  }
  
  
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h

index 2afb705930..6d659d74fa 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -88,7 +88,6 @@
  enum {
  RISCV_FEATURE_MMU,
  RISCV_FEATURE_PMP,
-RISCV_FEATURE_EPMP,
  };
  
  /* Privileged specification version */

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 58af2c0e66..cdc68d3676 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -428,7 +428,7 @@ static RISCVException pmp(CPURISCVState *env, int csrno)
  
  static RISCVException epmp(CPURISCVState *env, int csrno)

  {
-if (env->priv == PRV_M && riscv_feature(env, RISCV_FEATURE_EPMP)) {
+if (env->priv == PRV_M && riscv_cpu_cfg(env)->epmp) {
  return RISCV_EXCP_NONE;
  }
  
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c

index 4bc4113531..aa4d1996e9 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -88,7 +88,7 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t 
pmp_index, uint8_t val)
  if (pmp_index < MAX_RISCV_PMPS) {
  bool locked = true;
  
-if (riscv_feature(env, RISCV_FEATURE_EPMP)) {

+if (riscv_cpu_cfg(env)->epmp) {
  /* mseccfg.RLB is set */
  if (MSECCFG_RLB_ISSET(env)) {
  locked = false;
@@ -239,7 +239,7 @@ static bool pmp_hart_has_privs_default(CPURISCVState *env, 
target_ulong addr,
  {
  bool ret;
  
-if (riscv_feature(env, RISCV_FEATURE_EPMP)) {

+if (riscv_cpu_cfg(env)->epmp) {


Reviewed-by: LIU Zhiwei 

Zhiwei


  if (MSECCFG_MMWP_ISSET(env)) {
  /*
   * The Machine Mode Whitelist Policy (mseccfg.MMWP) is set

Re: [PATCH v6 4/9] target/riscv/cpu.c: error out if EPMP is enabled without PMP

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

Instead of silently ignoring the EPMP setting if there is no PMP
available, error out informing the user that EPMP depends on PMP
support:

$ ./qemu-system-riscv64 -cpu rv64,pmp=false,x-epmp=true
qemu-system-riscv64: Invalid configuration: EPMP requires PMP support

This will force users to pick saner options in the QEMU command line.
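
The check described above can be sketched in isolation as follows. This is
only an illustration under stated assumptions: DemoPmpCfg and
demo_check_epmp() are hypothetical names standing in for the pmp/epmp flags of
cpu->cfg and for the error path in riscv_cpu_realize(). Only the error string
is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two relevant cpu->cfg flags. */
typedef struct {
    bool pmp;
    bool epmp;
} DemoPmpCfg;

/*
 * Returns NULL when the configuration is acceptable, or the error
 * message when EPMP is requested without PMP.
 */
static const char *demo_check_epmp(const DemoPmpCfg *cfg)
{
    assert(cfg != NULL);

    /* Enhanced PMP should only be available on harts with PMP support. */
    if (cfg->epmp && !cfg->pmp) {
        return "Invalid configuration: EPMP requires PMP support";
    }
    return NULL;
}
```

With pmp=false and epmp=true, as in the command line above, the sketch returns
the error string. Every other combination of the two flags is accepted.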

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e34a5e3f11..4585ca74dc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -925,13 +925,18 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
  
  if (cpu->cfg.pmp) {

  riscv_set_feature(env, RISCV_FEATURE_PMP);
+}
+
+if (cpu->cfg.epmp) {
+riscv_set_feature(env, RISCV_FEATURE_EPMP);
  
  /*

   * Enhanced PMP should only be available
   * on harts with PMP support
   */
-if (cpu->cfg.epmp) {
-riscv_set_feature(env, RISCV_FEATURE_EPMP);
+if (!cpu->cfg.pmp) {
+error_setg(errp, "Invalid configuration: EPMP requires PMP 
support");
+return;


Reviewed-by: LIU Zhiwei 

Zhiwei


  }
  }

Re: [PATCH v6 3/9] target/riscv: remove RISCV_FEATURE_DEBUG

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

RISCV_FEATURE_DEBUG will always follow the value defined by
cpu->cfg.debug flag. Read the flag instead.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Weiwei Li 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.c| 6 +-
  target/riscv/cpu.h| 1 -
  target/riscv/cpu_helper.c | 2 +-
  target/riscv/csr.c| 2 +-
  target/riscv/machine.c| 3 +--
  5 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 93b52b826c..e34a5e3f11 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -637,7 +637,7 @@ static void riscv_cpu_reset_hold(Object *obj)
  set_default_nan_mode(1, &env->fp_status);
  
  #ifndef CONFIG_USER_ONLY

-if (riscv_feature(env, RISCV_FEATURE_DEBUG)) {
+if (cpu->cfg.debug) {
  riscv_trigger_init(env);
  }
  
@@ -935,10 +935,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)

  }
  }
  
-if (cpu->cfg.debug) {

-riscv_set_feature(env, RISCV_FEATURE_DEBUG);
-}
-
  
  #ifndef CONFIG_USER_ONLY

  if (cpu->cfg.ext_sstc) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5e9626837b..2afb705930 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -89,7 +89,6 @@ enum {
  RISCV_FEATURE_MMU,
  RISCV_FEATURE_PMP,
  RISCV_FEATURE_EPMP,
-RISCV_FEATURE_DEBUG
  };
  
  /* Privileged specification version */

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index ad8d82662c..4cdd247c6c 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -105,7 +105,7 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
  flags = FIELD_DP32(flags, TB_FLAGS, MSTATUS_HS_VS,
 get_field(env->mstatus_hs, MSTATUS_VS));
  }
-if (riscv_feature(env, RISCV_FEATURE_DEBUG) && !icount_enabled()) {
+if (cpu->cfg.debug && !icount_enabled()) {
  flags = FIELD_DP32(flags, TB_FLAGS, ITRIGGER, env->itrigger_enabled);
  }
  #endif
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index f7862ff4a4..58af2c0e66 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -437,7 +437,7 @@ static RISCVException epmp(CPURISCVState *env, int csrno)
  
  static RISCVException debug(CPURISCVState *env, int csrno)

  {
-if (riscv_feature(env, RISCV_FEATURE_DEBUG)) {
+if (riscv_cpu_cfg(env)->debug) {
  return RISCV_EXCP_NONE;
  }
  
diff --git a/target/riscv/machine.c b/target/riscv/machine.c

index c6ce318cce..4634968898 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -226,9 +226,8 @@ static const VMStateDescription vmstate_kvmtimer = {
  static bool debug_needed(void *opaque)
  {
  RISCVCPU *cpu = opaque;
-CPURISCVState *env = &cpu->env;
  
-return riscv_feature(env, RISCV_FEATURE_DEBUG);

+return cpu->cfg.debug;
  }
  


Reviewed-by: LIU Zhiwei 

Zhiwei


  static int debug_post_load(void *opaque, int version_id)

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct from the RISCVCPU, having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01803a020d..5e9626837b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -653,6 +653,11 @@ static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
  #endif
  #define riscv_cpu_mxl_bits(env) (1UL << (4 + riscv_cpu_mxl(env)))
  
+static inline const RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)


Maybe we should use

static inline const* RISCVCPUConfig riscv_cpu_cfg(CPURISCVState *env) or just
static inline RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)

Zhiwei


+{
+return &env_archcpu(env)->cfg;
+}
+
  #if defined(TARGET_RISCV32)
  #define cpu_recompute_xl(env)  ((void)(env), MXL_RV32)
  #else
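
For readers following the const discussion, here is a reduced sketch of what
the helper in the patch does. All Demo* types and functions are hypothetical
stand-ins, not the QEMU definitions. The sketch assumes that 'env' is embedded
in the CPU object (the other patches in this series use &cpu->env), so the
owning object can be recovered from the 'env' pointer. The return type
"const DemoCPUConfig *" is a pointer to const data: callers can read the
flags through it but cannot write them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the QEMU types. */
typedef struct {
    bool mmu;
    bool pmp;
    bool epmp;
    bool debug;
} DemoCPUConfig;

typedef struct {
    int priv;
} DemoCPUState;

typedef struct {
    DemoCPUState env;   /* embedded, like cpu->env */
    DemoCPUConfig cfg;  /* like cpu->cfg */
} DemoCPU;

/* Recover the containing CPU object from a pointer to its env member. */
static inline DemoCPU *demo_env_archcpu(DemoCPUState *env)
{
    return (DemoCPU *)((char *)env - offsetof(DemoCPU, env));
}

/* Read-only view of the configuration, reachable from env alone. */
static inline const DemoCPUConfig *demo_cpu_cfg(DemoCPUState *env)
{
    assert(env != NULL);
    return &demo_env_archcpu(env)->cfg;
}
```

A call site then reads demo_cpu_cfg(env)->mmu instead of going through a
separate feature bit, which is the pattern the later patches in the series
apply.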

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

We're going to make changes that require accessing the RISCVCPUConfig
struct from the RISCVCPU, having access only to a CPURISCVState 'env'
pointer. Add a helper to make the code easier to read.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 01803a020d..5e9626837b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -653,6 +653,11 @@ static inline RISCVMXL riscv_cpu_mxl(CPURISCVState *env)
  #endif
  #define riscv_cpu_mxl_bits(env) (1UL << (4 + riscv_cpu_mxl(env)))
  
+static inline const RISCVCPUConfig *riscv_cpu_cfg(CPURISCVState *env)

+{
+return &env_archcpu(env)->cfg;
+}
+
  #if defined(TARGET_RISCV32)
  #define cpu_recompute_xl(env)  ((void)(env), MXL_RV32)
  #else


Reviewed-by: LIU Zhiwei 

Zhiwei

Re: [PATCH v6 1/9] target/riscv: turn write_misa() into an official no-op

2023-02-16 Thread LIU Zhiwei




On 2023/2/17 5:55, Daniel Henrique Barboza wrote:

At this moment, and apparently since forever, we have no way of enabling
RISCV_FEATURE_MISA. This means that all the code from write_misa(), all
the nuts and bolts that handle how to write this CSR, has always been a
no-op as well because write_misa() will always exit earlier.

This seems to be benign in the majority of cases. Booting an Ubuntu
'virt' guest and logging all the calls to 'write_misa' shows that no
writes to the MISA CSR were attempted. Writing MISA, i.e. enabling/disabling
RISC-V extensions after the machine is powered on, seems to be a niche
use.

Before proceeding, let's recap what the spec says about MISA. It is a
CSR that is divided into 3 fields:

- MXL, Machine XLEN, described as "may be writable";

- MXLEN, the XLEN in M-mode, which is given by the setting of MXL or a
fixed value if MISA is zero;

- Extensions is defined as "a WARL field that can contain writable bits
where the implementation allows the supported ISA to be modified"

Thus what we have today (write_misa() being a no-op) is already a valid
spec implementation. We're not obliged to have a particular set of MISA
writable bits, and at this moment we have none.
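
To make the "no writable bits" case concrete, here is a minimal sketch of a
CSR write handler that accepts every write without raising an exception and
leaves the stored value untouched. DemoMisaState, DEMO_EXCP_NONE,
demo_write_misa() and demo_read_misa() are hypothetical names for this
illustration only. DEMO_EXCP_NONE plays the role of RISCV_EXCP_NONE.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define DEMO_EXCP_NONE 0  /* stands in for RISCV_EXCP_NONE */

/* Hypothetical state holding the extension bits reported by misa. */
typedef struct {
    uint64_t misa_ext;
} DemoMisaState;

/*
 * With no writable bits, any written value is legal to ignore:
 * the write succeeds and the field keeps its reset value.
 */
static int demo_write_misa(DemoMisaState *st, uint64_t val)
{
    assert(st != NULL);
    (void)val;  /* dropped on purpose */
    return DEMO_EXCP_NONE;
}

static uint64_t demo_read_misa(const DemoMisaState *st)
{
    assert(st != NULL);
    return st->misa_ext;
}
```

A guest that writes zero and reads the CSR back sees the original extension
bits, which is how software can discover that a bit is not writable.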


Hi Daniel,

I see there has been a discussion on this topic, and a no-op does no harm
to the current implementation. However, I still think we should make misa
writable by default, which is also a valid spec implementation.

One reason is that we may need dynamic write access for some CPUs in the
future. The other is that we should make QEMU a more useful implementation,
not just a legal one. We have already moved in this direction in many
respects.

I prefer your implementation before v4. It's not a complicated
implementation, and I think most of the other extensions in QEMU are
already configurable.

Your work is a good step towards unifying the configuration and the
checks.  I think we can go two steps further.

1) Remove RVI/RVF and similar macros, and add fields for them in the
configuration struct.

2) Unify the configuration checks. write_misa and cpu_realize_fn can use
the same check function.

Once these two steps are done, I think we can move closer to the profile
extension.



Zhiwei


Given that allowing the dormant code to write MISA can cause tricky bugs
to solve later on, and we don't have a particularly interesting case of
writing MISA to support today, and we're already not violating the
specification, let's erase all the body of write_misa() and turn it into
an official no-op instead of an accidental one. We'll keep consistent
with what we provide users today but with 50+ less lines to maintain.

RISCV_FEATURE_MISA enum is erased in the process since there's no one
else using it.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Bin Meng 
Reviewed-by: Andrew Jones 
---
  target/riscv/cpu.h |  1 -
  target/riscv/csr.c | 55 --
  2 files changed, 56 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7128438d8e..01803a020d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -89,7 +89,6 @@ enum {
  RISCV_FEATURE_MMU,
  RISCV_FEATURE_PMP,
  RISCV_FEATURE_EPMP,
-RISCV_FEATURE_MISA,
  RISCV_FEATURE_DEBUG
  };
  
diff --git a/target/riscv/csr.c b/target/riscv/csr.c

index 1b0a0c1693..f7862ff4a4 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1329,61 +1329,6 @@ static RISCVException read_misa(CPURISCVState *env, int 
csrno,
  static RISCVException write_misa(CPURISCVState *env, int csrno,
   target_ulong val)
  {
-if (!riscv_feature(env, RISCV_FEATURE_MISA)) {
-/* drop write to misa */
-return RISCV_EXCP_NONE;
-}
-
-/* 'I' or 'E' must be present */
-if (!(val & (RVI | RVE))) {
-/* It is not, drop write to misa */
-return RISCV_EXCP_NONE;
-}
-
-/* 'E' excludes all other extensions */
-if (val & RVE) {
-/* when we support 'E' we can do "val = RVE;" however
- * for now we just drop writes if 'E' is present.
- */
-return RISCV_EXCP_NONE;
-}
-
-/*
- * misa.MXL writes are not supported by QEMU.
- * Drop writes to those bits.
- */
-
-/* Mask extensions that are not supported by this hart */
-val &= env->misa_ext_mask;
-
-/* Mask extensions that are not supported by QEMU */
-val &= (RVI | RVE | RVM | RVA | RVF | RVD | RVC | RVS | RVU | RVV);
-
-/* 'D' depends on 'F', so clear 'D' if 'F' is not present */
-if ((val & RVD) && !(val & RVF)) {
-val &= ~RVD;
-}
-
-/* Suppress 'C' if next instruction is not aligned
- * TODO: this should check next_pc
- */
-if ((val & RVC) && (GETPC() & ~3) != 0) {
-val &= ~RVC;
-}
-
-/* If nothing changed, do nothing. */
-if (val == env->misa_ext) {
-return RISCV_EXCP_NONE;
-}
-
-if (!(val & RVF)) {
-

Re: [PATCH] i386: Add new CPU model SapphireRapids

2023-02-16 Thread Xiaoyao Li


On 9/21/2022 10:51 PM, Dr. David Alan Gilbert wrote:

* Wang, Lei (lei4.w...@intel.com) wrote:

The new CPU model mostly inherits features from Icelake-Server, while
adding new features:
  - AMX (Advanced Matrix Extensions)
  - Bus Lock Debug Exception
and new instructions:
  - AVX VNNI (Vector Neural Network Instruction):
     - VPDPBUSD: Multiply and Add Unsigned and Signed Bytes
 - VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
 - VPDPWSSD: Multiply and Add Signed Word Integers
 - VPDPWSSDS: Multiply and Add Signed Integers with Saturation
  - FP16: Replicates existing AVX512 computational SP (FP32) instructions
using FP16 instead of FP32 for ~2X performance gain
  - SERIALIZE: Provide software with a simple way to force the processor to
complete all modifications, faster, allowed in all privilege levels and
not causing an unconditional VM exit
  - TSX Suspend Load Address Tracking: Allows programmers to choose which
memory accesses do not need to be tracked in the TSX read set
  - AVX512_BF16: Vector Neural Network Instructions supporting BFLOAT16
inputs and conversion instructions from IEEE single precision

Features may be added in future versions:
  - CET (virtualization support hasn't been merged)
Instructions may be added in future versions:
  - fast zero-length MOVSB (KVM doesn't support yet)
  - fast short STOSB (KVM doesn't support yet)
  - fast short CMPSB, SCASB (KVM doesn't support yet)

Signed-off-by: Wang, Lei 
Reviewed-by: Robert Hoo 


Hi,
What fills in the AMX tile and tmul information leafs
(0x1D, 0x1E)?


Current QEMU hard-codes the values of the AMX tile and tmul information
leafs (0x1D, 0x1E) if AMX is exposed to the guest. See cpu_x86_cpuid() in
target/i386/cpu.c:


    case 0x1D: {
        /* AMX TILE */
        *eax = 0;
        *ebx = 0;
        *ecx = 0;
        *edx = 0;
        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
            break;
        }

        if (count == 0) {
            /* Highest numbered palette subleaf */
            *eax = INTEL_AMX_TILE_MAX_SUBLEAF;
        } else if (count == 1) {
            *eax = INTEL_AMX_TOTAL_TILE_BYTES |
                   (INTEL_AMX_BYTES_PER_TILE << 16);
            *ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
            *ecx = INTEL_AMX_TILE_MAX_ROWS;
        }
        break;
    }
    case 0x1E: {
        /* AMX TMUL */
        *eax = 0;
        *ebx = 0;
        *ecx = 0;
        *edx = 0;
        if (!(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_AMX_TILE)) {
            break;
        }

        if (count == 0) {
            /* Highest numbered palette subleaf */
            *ebx = INTEL_AMX_TMUL_MAX_K | (INTEL_AMX_TMUL_MAX_N << 8);
        }
        break;
    }




   In particular, how would we make sure when we migrate between two
generations of AMX/Tile/Tmul capable devices with different
register/palette/tmul limits that the migration is tied to the CPU type
correctly?


Since they are hard-coded, the values seen by the guest never change, no
matter what the host HW is.



   Would you expect all devices called a 'SapphireRapids' to have the same
sizes?


I suppose by devices you mean the HW platform? If so, Intel commits
that the palette 1 values (CPUID leaf 0x1d, subleaf 0x1) will never change,
and that the TMUL capabilities (CPUID leaf 0x1e) are constant for a few
generations after SPR.


But this has no impact on migration safety as long as the CPUID values of
leaves 0x1d and 0x1e are tied to a named CPU model and don't vary across
hosts. The current QEMU code satisfies this, since the values are
hard-coded.
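
To illustrate the point about hard-coded values, here is a minimal standalone sketch of leaf 0x1D as the guest sees it. The constant names follow the quoted cpu_x86_cpuid() code; the numeric values are assumptions for illustration and should be checked against target/i386/cpu.h. struct cpuid_regs and guest_cpuid_0x1d() are hypothetical names, not QEMU API.

```c
#include <stdint.h>

/*
 * Assumed palette 1 constants. The names mirror the quoted QEMU code;
 * the numeric values are illustrative and not taken from this thread.
 */
#define INTEL_AMX_TILE_MAX_SUBLEAF  0x1
#define INTEL_AMX_TOTAL_TILE_BYTES  0x2000
#define INTEL_AMX_BYTES_PER_TILE    0x400
#define INTEL_AMX_BYTES_PER_ROW     0x40
#define INTEL_AMX_TILE_MAX_NAMES    0x8
#define INTEL_AMX_TILE_MAX_ROWS     0x10

/* Hypothetical container for the four CPUID output registers. */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Hypothetical helper: compute guest CPUID leaf 0x1D. Note that no host
 * CPUID value is consulted, so the result is the same on every host.
 */
static struct cpuid_regs guest_cpuid_0x1d(uint32_t subleaf, int amx_tile_enabled)
{
    struct cpuid_regs r = { 0, 0, 0, 0 };

    if (!amx_tile_enabled) {
        return r;
    }
    if (subleaf == 0) {
        /* Highest numbered palette subleaf */
        r.eax = INTEL_AMX_TILE_MAX_SUBLEAF;
    } else if (subleaf == 1) {
        r.eax = INTEL_AMX_TOTAL_TILE_BYTES | (INTEL_AMX_BYTES_PER_TILE << 16);
        r.ebx = INTEL_AMX_BYTES_PER_ROW | (INTEL_AMX_TILE_MAX_NAMES << 16);
        r.ecx = INTEL_AMX_TILE_MAX_ROWS;
    }
    return r;
}
```

Because the function takes only the subleaf and the feature bit as inputs, two hosts with different physical tile limits would still report identical values to a guest using the same named CPU model.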


So, IMHO, it seems OK to define AMX in the SPR CPU model with current QEMU,
as this patch does. The hard-coded value of 0x1E could become an issue if a
far-future product reports a smaller value (so far Intel can only commit
that it doesn't change for a few generations), but that is a separate thing
we can handle for AMX later.


Dave and Paolo, what do you think?


Dave
  

---
  target/i386/cpu.c | 128 ++
  target/i386/cpu.h |   4 ++
  2 files changed, 132 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1db1278a59..abb43853d4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -3467,6 +3467,134 @@ static const X86CPUDefinition builtin_x86_defs[] = {
  { /* end of list */ }
  }
  },
+{
+.name = "SapphireRapids",
+.level = 0x20,
+.vendor = CPUID_VENDOR_INTEL,
+.family = 6,
+.model = 143,
+.stepping = 4,
+/*
+ * please keep the ascending order so that we can have a clear view of
+ * bit position of each feature.
+ */
+.features[FEAT_1_EDX] =
+CPUID_FP87 | CPUID_VME | CPUID_DE | CPUID_PSE | CPUID_TSC |
+CPUID_MSR | CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC |
+CPUID_SEP | CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV |
+

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread Bin Meng

On Fri, Feb 17, 2023 at 8:45 AM Bin Meng  wrote:
>
> On Fri, Feb 17, 2023 at 5:57 AM Daniel Henrique Barboza
>  wrote:
> >
> > We're going to make changes that require accessing the RISCVCPUConfig
> > struct from the RISCVCPU, having access only to a CPURISCVState 'env'
> > pointer. Add a helper to make the code easier to read.
> >
> > Signed-off-by: Daniel Henrique Barboza 
> > ---
> >  target/riscv/cpu.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
>
> Looks like the RB tag is missing somehow?

Never mind. I see the difference :)

Regards,
Bin

Re: [PATCH v6 2/9] target/riscv: introduce riscv_cpu_cfg()

2023-02-16 Thread Bin Meng

On Fri, Feb 17, 2023 at 5:57 AM Daniel Henrique Barboza
 wrote:
>
> We're going to make changes that require accessing the RISCVCPUConfig
> struct from the RISCVCPU, having access only to a CPURISCVState 'env'
> pointer. Add a helper to make the code easier to read.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  target/riscv/cpu.h | 5 +
>  1 file changed, 5 insertions(+)
>

Looks like the RB tag is missing somehow?

Reviewed-by: Bin Meng
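
For context, the kind of helper the quoted commit message describes can be sketched with mock types. This is only an illustration of the pattern (recover the containing CPU object from the env pointer, then return its config); MockCPU, MockEnv, MockCPUConfig, mock_env_archcpu() and mock_cpu_cfg() are hypothetical stand-ins, not the real QEMU definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock types, heavily reduced for illustration. */
typedef struct {
    bool ext_example;   /* stand-in for a real extension flag */
} MockCPUConfig;

typedef struct {
    int dummy;          /* stand-in for architectural state */
} MockEnv;

typedef struct {
    int parent;         /* stand-in for the parent object */
    MockEnv env;        /* embedded state, as 'env' is in the CPU object */
    MockCPUConfig cfg;  /* per-CPU configuration */
} MockCPU;

/* Recover the containing CPU object from a pointer to its embedded env. */
static inline MockCPU *mock_env_archcpu(MockEnv *env)
{
    return (MockCPU *)((char *)env - offsetof(MockCPU, env));
}

/* The helper itself: one call instead of open-coding the lookup. */
static inline const MockCPUConfig *mock_cpu_cfg(MockEnv *env)
{
    return &mock_env_archcpu(env)->cfg;
}
```

Callers that only hold an env pointer can then write mock_cpu_cfg(env)->ext_example rather than repeating the container lookup at each site.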

Re: [PATCH v3 02/11] build: Don't specify -no-pie for --static user-mode programs

2023-02-16 Thread Alex Bennée



Warner Losh  writes:

> When building with clang, -no-pie gives a warning on every single build,
> so remove it.
>
> Signed-off-by: Warner Losh 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v2 1/1] vhost-user-fs: add property to allow migration

2023-02-16 Thread Anton Kuchin


/* resend with fixed to: and cc: */

On 16/02/2023 18:22, Juan Quintela wrote:

"Michael S. Tsirkin"  wrote:

On Thu, Feb 16, 2023 at 11:11:22AM -0500, Michael S. Tsirkin wrote:

On Thu, Feb 16, 2023 at 03:14:05PM +0100, Juan Quintela wrote:

Anton Kuchin  wrote:

Currently any vhost-user-fs device makes the VM unmigratable, which also
prevents updating qemu without stopping the VM. In most cases that makes
sense because qemu has no way to transfer FUSE session state.

But it is good to have an option for the orchestrator to tune this according
to backend capabilities and migration configuration.

This patch adds a device property 'migration' that is 'none' by default
to keep the old behaviour, but can be set to 'external' to explicitly allow
migration with minimal virtio device state in the migration stream if the
daemon has some way to sync FUSE state on src and dst without help from qemu.

Signed-off-by: Anton Kuchin 

Reviewed-by: Juan Quintela 

The migration bits are correct.

And I can think of a better way to explain that one device is migrated
externally.


I'm bad at wording but I'll try to improve this one.
Suggestions will be really appreciated.



If you have to respin:


+static int vhost_user_fs_pre_save(void *opaque)
+{
+VHostUserFS *fs = (VHostUserFS *)opaque;

This hack is useless.


I will. Will get rid of that, thanks.


meaning the cast? yes.


I know that there is still a lot of code that has it.


Now remember that I have no clue about vhost-user-fs.

But this looks fishy

  static const VMStateDescription vuf_vmstate = {
  .name = "vhost-user-fs",
-.unmigratable = 1,
+.minimum_version_id = 0,
+.version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_VIRTIO_DEVICE,
+VMSTATE_UINT8(migration_type, VHostUserFS),
+VMSTATE_END_OF_LIST()

In fact why do we want to migrate this property?
We generally don't, we only migrate state.

See previous discussion.
In a nutshell, we are going to have internal migration in the future
(not done yet).

Later, Juan.


I think Michael is right. We don't need it at destination to know
what data is in the stream because subsections will tell us all we
need to know.




+},
+   .pre_save = vhost_user_fs_pre_save,
  };
  
  static Property vuf_properties[] = {

@@ -309,6 +337,10 @@ static Property vuf_properties[] = {
  DEFINE_PROP_UINT16("num-request-queues", VHostUserFS,
 conf.num_request_queues, 1),
  DEFINE_PROP_UINT16("queue-size", VHostUserFS, conf.queue_size, 128),
+DEFINE_PROP_UNSIGNED("migration", VHostUserFS, migration_type,
+ VHOST_USER_MIGRATION_TYPE_NONE,
+ qdev_prop_vhost_user_migration_type,
+ uint8_t),
  DEFINE_PROP_END_OF_LIST(),

We have four properties here (5 with the new migration one), and you
only migrate one.

This looks fishy, but I don't know if it makes sense.
If they _have_ to be configured the same on source and destination, I
would transfer them and check in post_load that the values are correct.

Later, Juan.

Weird suggestion.  We generally don't do this kind of check - that
would be open-coding each property. It's management's job to make
sure things are consistent.

--
MST

[PATCH v3 04/11] bsd-user: various helper routines for sysctl

2023-02-16 Thread Warner Losh

cap_memory - caps the memory to most of the 32-bit address space (ABI32 only)
scale_to_guest_pages - accounts for the difference in host / guest page size
h2g_long_sat - converts an int64_t to an int32_t, saturating at max / min values
h2g_ulong_sat - converts a uint64_t to a uint32_t, saturating at max value
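
The saturating conversions and the page scaling listed above can be sketched standalone as follows. This is a simplified illustration, not the patch code: sat_to_int32(), sat_to_uint32() and scale_pages() are hypothetical names, and scale_pages() assumes the intermediate product fits in 64 bits (the patch uses muldiv64() instead).

```c
#include <stdint.h>

/* Clamp a 64-bit signed value into the 32-bit signed range. */
static int32_t sat_to_int32(int64_t v)
{
    if (v > INT32_MAX) {
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)v;
}

/* Clamp a 64-bit unsigned value into the 32-bit unsigned range. */
static uint32_t sat_to_uint32(uint64_t v)
{
    return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/*
 * Convert a page count expressed in host pages into guest pages.
 * Assumes pages * host_page_size does not overflow 64 bits.
 */
static uint64_t scale_pages(uint64_t pages, uint64_t host_page_size,
                            uint64_t guest_page_size)
{
    return pages * host_page_size / guest_page_size;
}
```

Saturating rather than truncating means an oversized host value shows up in the guest as "very large" instead of wrapping around to a small or negative number.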

Signed-off-by: Warner Losh 
---
 bsd-user/freebsd/os-sys.c | 86 +++
 1 file changed, 86 insertions(+)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 1676ec10f83..9b84e90cb32 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -21,6 +21,92 @@
 #include "qemu.h"
 #include "target_arch_sysarch.h"
 
+#include 
+
+/*
+ * Length for the fixed length types.
+ * 0 means variable length for strings and structures
+ * Compare with sys/kern_sysctl.c ctl_size
+ * Note: Not all types appear to be used in-tree.
+ */
+static const int G_GNUC_UNUSED guest_ctl_size[CTLTYPE+1] = {
+[CTLTYPE_INT] = sizeof(abi_int),
+[CTLTYPE_UINT] = sizeof(abi_uint),
+[CTLTYPE_LONG] = sizeof(abi_long),
+[CTLTYPE_ULONG] = sizeof(abi_ulong),
+[CTLTYPE_S8] = sizeof(int8_t),
+[CTLTYPE_S16] = sizeof(int16_t),
+[CTLTYPE_S32] = sizeof(int32_t),
+[CTLTYPE_S64] = sizeof(int64_t),
+[CTLTYPE_U8] = sizeof(uint8_t),
+[CTLTYPE_U16] = sizeof(uint16_t),
+[CTLTYPE_U32] = sizeof(uint32_t),
+[CTLTYPE_U64] = sizeof(uint64_t),
+};
+
+static const int G_GNUC_UNUSED host_ctl_size[CTLTYPE+1] = {
+[CTLTYPE_INT] = sizeof(int),
+[CTLTYPE_UINT] = sizeof(u_int),
+[CTLTYPE_LONG] = sizeof(long),
+[CTLTYPE_ULONG] = sizeof(u_long),
+[CTLTYPE_S8] = sizeof(int8_t),
+[CTLTYPE_S16] = sizeof(int16_t),
+[CTLTYPE_S32] = sizeof(int32_t),
+[CTLTYPE_S64] = sizeof(int64_t),
+[CTLTYPE_U8] = sizeof(uint8_t),
+[CTLTYPE_U16] = sizeof(uint16_t),
+[CTLTYPE_U32] = sizeof(uint32_t),
+[CTLTYPE_U64] = sizeof(uint64_t),
+};
+
+#ifdef TARGET_ABI32
+/*
+ * Limit the amount of available memory to be most of the 32-bit address
+ * space. 0x100c000 was arrived at through trial and error as a good
+ * definition of 'most'.
+ */
+static const abi_ulong guest_max_mem = UINT32_MAX - 0x100c000 + 1;
+
+static abi_ulong G_GNUC_UNUSED cap_memory(uint64_t mem)
+{
+return MIN(guest_max_mem, mem);
+}
+#endif
+
+static abi_ulong G_GNUC_UNUSED scale_to_guest_pages(uint64_t pages)
+{
+/* Scale pages from host to guest */
+pages = muldiv64(pages, qemu_real_host_page_size(), TARGET_PAGE_SIZE);
+#ifdef TARGET_ABI32
+/* cap pages if need be */
+pages = MIN(pages, guest_max_mem / (abi_ulong)TARGET_PAGE_SIZE);
+#endif
+return pages;
+}
+
+#ifdef TARGET_ABI32
+/* Used only for TARGET_ABI32 */
+static abi_long G_GNUC_UNUSED h2g_long_sat(long l)
+{
+if (l > INT32_MAX) {
+l = INT32_MAX;
+} else if (l < INT32_MIN) {
+l = INT32_MIN;
+}
+return l;
+}
+
+static abi_ulong G_GNUC_UNUSED h2g_ulong_sat(u_long ul)
+{
+return MIN(ul, UINT32_MAX);
+}
+#endif
+
+/*
+ * placeholder until bsd-user downstream upstreams this with its thread support
+ */
+#define bsd_get_ncpu() 1
+
 /* sysarch() is architecture dependent. */
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2)
 {
-- 
2.39.1

[PATCH v3 09/11] bsd-user: Start translation of arch-specific sysctls

2023-02-16 Thread Warner Losh

From: Juergen Lock 

Intercept some sysctls that we need to translate (like the architecture
we're running on) and translate them. These are only the simplest ones
so far.

Signed-off-by: Juergen Lock 
Co-Authored-by: Stacey Son 
Signed-off-by: Stacey Son 
Co-Authored-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 145 +-
 1 file changed, 143 insertions(+), 2 deletions(-)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 42f0cc82279..1464e64428f 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -67,13 +67,13 @@ static const int host_ctl_size[CTLTYPE+1] = {
  */
 static const abi_ulong guest_max_mem = UINT32_MAX - 0x100c000 + 1;
 
-static abi_ulong G_GNUC_UNUSED cap_memory(uint64_t mem)
+static abi_ulong cap_memory(uint64_t mem)
 {
 return MIN(guest_max_mem, mem);
 }
 #endif
 
-static abi_ulong G_GNUC_UNUSED scale_to_guest_pages(uint64_t pages)
+static abi_ulong scale_to_guest_pages(uint64_t pages)
 {
 /* Scale pages from host to guest */
 pages = muldiv64(pages, qemu_real_host_page_size(), TARGET_PAGE_SIZE);
@@ -264,6 +264,146 @@ static abi_long G_GNUC_UNUSED do_freebsd_sysctl_oid(CPUArchState *env, int32_t *
 oidfmt(snamep, namelen, NULL, &kind);
 
 /* Handle some arch/emulator dependent sysctl()'s here. */
+switch (snamep[0]) {
+case CTL_KERN:
+switch (snamep[1]) {
+case KERN_USRSTACK:
+if (oldlen) {
+(*(abi_ulong *)holdp) = tswapal(TARGET_USRSTACK);
+}
+holdlen = sizeof(abi_ulong);
+ret = 0;
+goto out;
+
+case KERN_PS_STRINGS:
+if (oldlen) {
+(*(abi_ulong *)holdp) = tswapal(TARGET_PS_STRINGS);
+}
+holdlen = sizeof(abi_ulong);
+ret = 0;
+goto out;
+
+default:
+break;
+}
+break;
+
+case CTL_HW:
+switch (snamep[1]) {
+case HW_MACHINE:
+holdlen = sizeof(TARGET_HW_MACHINE);
+if (holdp) {
+strlcpy(holdp, TARGET_HW_MACHINE, oldlen);
+}
+ret = 0;
+goto out;
+
+case HW_MACHINE_ARCH:
+{
+holdlen = sizeof(TARGET_HW_MACHINE_ARCH);
+if (holdp) {
+strlcpy(holdp, TARGET_HW_MACHINE_ARCH, oldlen);
+}
+ret = 0;
+goto out;
+}
+case HW_NCPU:
+if (oldlen) {
+(*(abi_int *)holdp) = tswap32(bsd_get_ncpu());
+}
+holdlen = sizeof(int32_t);
+ret = 0;
+goto out;
+#if defined(TARGET_ARM)
+case HW_FLOATINGPT:
+if (oldlen) {
+ARMCPU *cpu = env_archcpu(env);
+*(abi_int *)holdp = cpu_isar_feature(aa32_vfp, cpu);
+}
+holdlen = sizeof(abi_int);
+ret = 0;
+goto out;
+#endif
+
+
+#ifdef TARGET_ABI32
+case HW_PHYSMEM:
+case HW_USERMEM:
+case HW_REALMEM:
+holdlen = sizeof(abi_ulong);
+ret = 0;
+
+if (oldlen) {
+int mib[2] = {snamep[0], snamep[1]};
+unsigned long lvalue;
+size_t len = sizeof(lvalue);
+
+if (sysctl(mib, 2, &lvalue, &len, NULL, 0) == -1) {
+ret = -1;
+} else {
+lvalue = cap_memory(lvalue);
+(*(abi_ulong *)holdp) = tswapal((abi_ulong)lvalue);
+}
+}
+goto out;
+#endif
+
+default:
+{
+static int oid_hw_availpages;
+static int oid_hw_pagesizes;
+
+if (!oid_hw_availpages) {
+int real_oid[CTL_MAXNAME + 2];
+size_t len = sizeof(real_oid) / sizeof(int);
+
+if (sysctlnametomib("hw.availpages", real_oid, &len) >= 0) {
+oid_hw_availpages = real_oid[1];
+}
+}
+if (!oid_hw_pagesizes) {
+int real_oid[CTL_MAXNAME + 2];
+size_t len = sizeof(real_oid) / sizeof(int);
+
+if (sysctlnametomib("hw.pagesizes", real_oid, &len) >= 0) {
+oid_hw_pagesizes = real_oid[1];
+}
+}
+
+if (oid_hw_availpages && snamep[1] == oid_hw_availpages) {
+long lvalue;
+size_t len = sizeof(lvalue);
+
+if (sysctlbyname("hw.availpages", &lvalue, &len, NULL, 0) == -1) {
+ret = -1;
+} else {
+if (oldlen) {
+lvalue = scale_to_guest_pages(lvalue);
+(*(abi_ulong *)holdp) = tswapal((abi_ulong)lvalue);
+}
+holdlen = sizeof(abi_ulong);
+

[PATCH v3 06/11] bsd-user: Helper routines h2g_old_sysctl

2023-02-16 Thread Warner Losh

h2g_old_sysctl does the byte swapping in the data to return it to the
target for the 'well known' types. For most of the types, either the
data is returned verbatim (strings, byte size, opaque we don't know
about) or it's returned with byte swapping (for all the integer
types). However, for ABI32 targets, LONG and ULONG differ in size between
host and target, and need to be carefully converted (with help from the
caller).
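
A minimal sketch of the LONG down-conversion described above, under simplifying assumptions: the host long is 8 bytes, the target long is 4 bytes, and host and target share the same endianness, so the byte swap is left out. narrow_longs_in_place() and sat32() are hypothetical names, not the functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp a 64-bit signed value into the 32-bit signed range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) {
        return INT32_MAX;
    }
    if (v < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)v;
}

/*
 * Narrow an array of 8-byte host longs to 4-byte guest longs in place.
 * On entry *len is the host length in bytes; on return it is the guest
 * length. The write cursor never passes the read cursor, so converting
 * in the same buffer is safe.
 */
static void narrow_longs_in_place(void *buf, size_t *len)
{
    uint8_t *hp = buf;   /* read cursor: host-sized elements */
    uint8_t *gp = buf;   /* write cursor: guest-sized elements */
    size_t done = 0, out = 0;

    while (done + sizeof(int64_t) <= *len) {
        int64_t h;
        int32_t g;

        memcpy(&h, hp, sizeof(h));
        g = sat32(h);
        memcpy(gp, &g, sizeof(g));

        hp += sizeof(int64_t);
        gp += sizeof(int32_t);
        done += sizeof(int64_t);
        out += sizeof(int32_t);
    }
    *len = out;
}
```

The adjusted length is what the caller would report back to the target, which is why the function takes the length by pointer.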

Co-Authored-by: Sean Bruno 
Signed-off-by: Sean Bruno 
Co-Authored-by: Juergen Lock 
Signed-off-by: Juergen Lock 
Co-Authored-by: Raphael Kubo da Costa 
Signed-off-by: Raphael Kubo da Costa 
Co-Authored-by: Stacey Son 
Signed-off-by: Stacey Son 
Signed-off-by: Warner Losh 
---
 bsd-user/freebsd/os-sys.c | 100 --
 1 file changed, 96 insertions(+), 4 deletions(-)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 1bf2b51820e..77c2b157c61 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -29,7 +29,7 @@
  * Compare with sys/kern_sysctl.c ctl_size
  * Note: Not all types appear to be used in-tree.
  */
-static const int G_GNUC_UNUSED guest_ctl_size[CTLTYPE+1] = {
+static const int guest_ctl_size[CTLTYPE+1] = {
 [CTLTYPE_INT] = sizeof(abi_int),
 [CTLTYPE_UINT] = sizeof(abi_uint),
 [CTLTYPE_LONG] = sizeof(abi_long),
@@ -44,7 +44,7 @@ static const int G_GNUC_UNUSED guest_ctl_size[CTLTYPE+1] = {
 [CTLTYPE_U64] = sizeof(uint64_t),
 };
 
-static const int G_GNUC_UNUSED host_ctl_size[CTLTYPE+1] = {
+static const int host_ctl_size[CTLTYPE+1] = {
 [CTLTYPE_INT] = sizeof(int),
 [CTLTYPE_UINT] = sizeof(u_int),
 [CTLTYPE_LONG] = sizeof(long),
@@ -86,7 +86,7 @@ static abi_ulong G_GNUC_UNUSED scale_to_guest_pages(uint64_t pages)
 
 #ifdef TARGET_ABI32
 /* Used only for TARGET_ABI32 */
-static abi_long G_GNUC_UNUSED h2g_long_sat(long l)
+static abi_long h2g_long_sat(long l)
 {
 if (l > INT32_MAX) {
 l = INT32_MAX;
@@ -96,7 +96,7 @@ static abi_long G_GNUC_UNUSED h2g_long_sat(long l)
 return l;
 }
 
-static abi_ulong G_GNUC_UNUSED h2g_ulong_sat(u_long ul)
+static abi_ulong h2g_ulong_sat(u_long ul)
 {
 return MIN(ul, UINT32_MAX);
 }
@@ -139,6 +139,98 @@ static int G_GNUC_UNUSED oidfmt(int *oid, int len, char *fmt, uint32_t *kind)
 return 0;
 }
 
+/*
+ * Convert the old value from host to guest.
+ *
+ * For LONG and ULONG on ABI32, we need to 'down convert' the 8 byte quantities
+ * to 4 bytes. The caller setup a buffer in host memory to get this data from
+ * the kernel and pass it to us. We do the down conversion and adjust the length
+ * so the caller knows what to write as the returned length into the target when
+ * it copies the down converted values into the target.
+ *
+ * For normal integral types, we just need to byte swap. No size changes.
+ *
+ * For strings and node data, there's no conversion needed.
+ *
+ * For opaque data, per sysctl OID converts take care of it.
+ */
+static void G_GNUC_UNUSED h2g_old_sysctl(void *holdp, size_t *holdlen, uint32_t kind)
+{
+size_t len;
+int hlen, glen;
+uint8_t *hp, *gp;
+
+/*
+ * Although rare, we can have arrays of sysctl. Both sysctl_old_ddb in
+ * kern_sysctl.c and show_var in sbin/sysctl/sysctl.c have code that loops
+ * this way.  *holdlen has been set by the kernel to the host's length.
+ * Only LONG and ULONG on ABI32 have different sizes: see below.
+ */
+gp = hp = (uint8_t *)holdp;
+len = 0;
+hlen = host_ctl_size[kind & CTLTYPE];
+glen = guest_ctl_size[kind & CTLTYPE];
+
+/*
+ * hlen == 0 for CTLTYPE_STRING and CTLTYPE_NODE, which need no conversion
+ * as well as CTLTYPE_OPAQUE, which needs special converters.
+ */
+if (hlen == 0) {
+return;
+}
+
+while (len < *holdlen) {
+if (hlen == glen) {
+switch (hlen) {
+case 1:
+/* Nothing needed: no byteswapping and assigning in place */
+break;
+case 2:
+*(uint16_t *)gp = tswap16(*(uint16_t *)hp);
+break;
+case 4:
+*(uint32_t *)gp = tswap32(*(uint32_t *)hp);
+break;
+case 8:
+*(uint64_t *)gp = tswap64(*(uint64_t *)hp);
+break;
+default:
+g_assert_not_reached();
+}
+}
+else {
+#ifdef TARGET_ABI32
+/*
+ * Saturating assignment for the only two types that differ between
+ * 32-bit and 64-bit machines. All other integral types have the
+ * same, fixed size and will be converted w/o loss of precision
+ * in the above switch.
+ */
+switch (kind & CTLTYPE) {
+case CTLTYPE_LONG:
+*(abi_long *)gp = tswap32(h2g_long_sat(*(long *)hp));
+break;
+case CTLTYPE_ULONG:
+*(abi_ulong *)gp

[PATCH v3 08/11] bsd-user: common routine do_freebsd_sysctl_oid for all sysctl variants

2023-02-16 Thread Warner Losh

From: Juergen Lock 

do_freebsd_sysctl_oid filters out some of the binary and special sysctls
where host != target. None of the sysctls that have to be translated from
host to target are handled here.

Signed-off-by: Juergen Lock 
Co-Authored-by: Stacey Son 
Signed-off-by: Stacey Son 
Co-Authored-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 90 +--
 1 file changed, 86 insertions(+), 4 deletions(-)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index dee8c92309b..42f0cc82279 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -112,7 +112,7 @@ static abi_ulong h2g_ulong_sat(u_long ul)
  * sysctl, see /sys/kern/kern_sysctl.c:sysctl_sysctl_oidfmt() (compare to
  * src/sbin/sysctl/sysctl.c)
  */
-static int G_GNUC_UNUSED oidfmt(int *oid, int len, char *fmt, uint32_t *kind)
+static int oidfmt(int *oid, int len, char *fmt, uint32_t *kind)
 {
 int qoid[CTL_MAXNAME + 2];
 uint8_t buf[BUFSIZ];
@@ -154,7 +154,7 @@ static int G_GNUC_UNUSED oidfmt(int *oid, int len, char *fmt, uint32_t *kind)
  *
  * For opaque data, per sysctl OID converts take care of it.
  */
-static void G_GNUC_UNUSED h2g_old_sysctl(void *holdp, size_t *holdlen, uint32_t kind)
+static void h2g_old_sysctl(void *holdp, size_t *holdlen, uint32_t kind)
 {
 size_t len;
 int hlen, glen;
@@ -234,7 +234,7 @@ static void G_GNUC_UNUSED h2g_old_sysctl(void *holdp, size_t *holdlen, uint32_t
 /*
  * Convert the undocmented name2oid sysctl data for the target.
  */
-static inline void G_GNUC_UNUSED sysctl_name2oid(uint32_t *holdp, size_t holdlen)
+static inline void sysctl_name2oid(uint32_t *holdp, size_t holdlen)
 {
 size_t i, num = holdlen / sizeof(uint32_t);
 
@@ -243,12 +243,94 @@ static inline void G_GNUC_UNUSED sysctl_name2oid(uint32_t *holdp, size_t holdlen
 }
 }
 
-static inline void G_GNUC_UNUSED sysctl_oidfmt(uint32_t *holdp)
+static inline void sysctl_oidfmt(uint32_t *holdp)
 {
 /* byte swap the kind */
 holdp[0] = tswap32(holdp[0]);
 }
 
+static abi_long G_GNUC_UNUSED do_freebsd_sysctl_oid(CPUArchState *env, int32_t *snamep,
+int32_t namelen, void *holdp, size_t *holdlenp, void *hnewp,
+size_t newlen)
+{
+uint32_t kind = 0;
+abi_long ret;
+size_t holdlen, oldlen;
+#ifdef TARGET_ABI32
+void *old_holdp;
+#endif
+
+holdlen = oldlen = *holdlenp;
+oidfmt(snamep, namelen, NULL, &kind);
+
+/* Handle some arch/emulator dependent sysctl()'s here. */
+
+#ifdef TARGET_ABI32
+/*
+ * For long and ulong with a 64-bit host and a 32-bit target we have to do
+ * special things. holdlen here is the length provided by the target to the
+ * system call. So we allocate a buffer twice as large because longs are twice
+ * as big on the host which will be writing them. In h2g_old_sysctl we'll adjust
+ * them and adjust the length.
+ */
+if (kind == CTLTYPE_LONG || kind == CTLTYPE_ULONG) {
+old_holdp = holdp;
+holdlen = holdlen * 2;
+holdp = g_malloc(holdlen);
+}
+#endif
+
+ret = get_errno(sysctl(snamep, namelen, holdp, &holdlen, hnewp, newlen));
+if (!ret && (holdp != 0)) {
+
+if (snamep[0] == CTL_SYSCTL) {
+switch (snamep[1]) {
+case CTL_SYSCTL_NEXT:
+case CTL_SYSCTL_NAME2OID:
+case CTL_SYSCTL_NEXTNOSKIP:
+/*
+ * All of these return an OID array, so we need to convert to
+ * target.
+ */
+sysctl_name2oid(holdp, holdlen);
+break;
+
+case CTL_SYSCTL_OIDFMT:
+/* Handle oidfmt */
+sysctl_oidfmt(holdp);
+break;
+case CTL_SYSCTL_OIDDESCR:
+case CTL_SYSCTL_OIDLABEL:
+default:
+/* Handle it based on the type */
+h2g_old_sysctl(holdp, &holdlen, kind);
+/* NB: None of these are LONG or ULONG */
+break;
+}
+} else {
+/*
+ * Need to convert from host to target. All the weird special cases
+ * are handled above.
+ */
+h2g_old_sysctl(holdp, &holdlen, kind);
+#ifdef TARGET_ABI32
+/*
+ * For the 32-bit on 64-bit case, for longs we need to copy the
+ * now-converted buffer to the target and free the buffer.
+ */
+if (kind == CTLTYPE_LONG || kind == CTLTYPE_ULONG) {
+memcpy(old_holdp, holdp, holdlen);
+g_free(holdp);
+holdp = old_holdp;
+}
+#endif
+}
+}
+
+*holdlenp = holdlen;
+return ret;
+}
+
 /* sysarch() is architecture dependent. */
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2)
 {
-- 
2.39.1

[PATCH v3 07/11] bsd-user: sysctl helper functions: sysctl_name2oid and sysctl_oidfmt

2023-02-16 Thread Warner Losh

From: Juergen Lock 

Helper functions for sysctl implementations. sysctl_name2oid and
sysctl_oidfmt convert OIDs between host and target byte order.
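
As a rough illustration of what converting an OID array involves, the sketch below byte-swaps each 32-bit element unconditionally. In the patch the swap goes through tswap32(), which only swaps when host and target endianness differ; swap32() and swap_oid_array() here are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/*
 * Byte-swap every 32-bit element of an OID array. len_bytes is the
 * buffer length in bytes, as reported back by the sysctl call.
 */
static void swap_oid_array(uint32_t *oid, size_t len_bytes)
{
    size_t i, num = len_bytes / sizeof(uint32_t);

    for (i = 0; i < num; i++) {
        oid[i] = swap32(oid[i]);
    }
}
```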

Signed-off-by: Juergen Lock 
Reviewed-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 77c2b157c61..dee8c92309b 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -231,6 +231,24 @@ static void G_GNUC_UNUSED h2g_old_sysctl(void *holdp, size_t *holdlen, uint32_t
 #endif
 }
 
+/*
+ * Convert the undocmented name2oid sysctl data for the target.
+ */
+static inline void G_GNUC_UNUSED sysctl_name2oid(uint32_t *holdp, size_t holdlen)
+{
+size_t i, num = holdlen / sizeof(uint32_t);
+
+for (i = 0; i < num; i++) {
+holdp[i] = tswap32(holdp[i]);
+}
+}
+
+static inline void G_GNUC_UNUSED sysctl_oidfmt(uint32_t *holdp)
+{
+/* byte swap the kind */
+holdp[0] = tswap32(holdp[0]);
+}
+
 /* sysarch() is architecture dependent. */
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2)
 {
-- 
2.39.1

[PATCH v3 01/11] bsd-user: Don't truncate the return value from freebsd_syscall

2023-02-16 Thread Warner Losh

From: Doug Rabson 

System call return values on FreeBSD are in a register (which is spelled
abi_long in qemu). This was being assigned into an int variable, which
causes problems for 64-bit targets.
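
The effect can be sketched in isolation. The helper below models what happens when a 64-bit return value passes through a 32-bit int on common ABIs: only the low 32 bits survive and are then sign-extended. through_int() is a hypothetical name used for illustration only.

```c
#include <stdint.h>

/*
 * Model a 64-bit syscall result being stored in a 32-bit 'int' and
 * widened again: keep the low 32 bits, then sign-extend them by hand.
 */
static int64_t through_int(int64_t ret)
{
    uint32_t low = (uint32_t)ret;   /* upper 32 bits are dropped here */

    if (low & 0x80000000u) {
        return (int64_t)low - INT64_C(0x100000000);
    }
    return (int64_t)low;
}
```

Small values and error codes such as -1 survive the round trip, which is why the problem only shows up for results above 4 GiB, for example a high mmap address on a 64-bit target.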

Resolves: https://github.com/qemu-bsd-user/qemu-bsd-user/issues/40
Signed-off-by: Doug Rabson 
Reviewed-by: Warner Losh 
[ Edited commit message for upstreaming into qemu-project ]
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 57996cad8ae..b4a663fc021 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -512,7 +512,7 @@ abi_long do_freebsd_syscall(void *cpu_env, int num, abi_long arg1,
 abi_long arg8)
 {
 CPUState *cpu = env_cpu(cpu_env);
-int ret;
+abi_long ret;
 
 trace_guest_user_syscall(cpu, num, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8);
 if (do_strace) {
-- 
2.39.1

[PATCH v3 02/11] build: Don't specify -no-pie for --static user-mode programs

2023-02-16 Thread Warner Losh

When building with clang, -no-pie gives a warning on every single build,
so remove it.

Signed-off-by: Warner Losh 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 64960c6000f..eb284ccf308 100755
--- a/configure
+++ b/configure
@@ -1313,7 +1313,7 @@ if test "$static" = "yes"; then
 error_exit "-static-pie not available due to missing toolchain support"
   else
 pie="no"
-QEMU_CFLAGS="-fno-pie -no-pie $QEMU_CFLAGS"
+QEMU_CFLAGS="-fno-pie $QEMU_CFLAGS"
   fi
 elif test "$pie" = "no"; then
   if compile_prog "-Werror -fno-pie" "-no-pie"; then
-- 
2.39.1

[PATCH v3 05/11] bsd-user: Helper routines oidfmt

2023-02-16 Thread Warner Losh

From: Stacey Son 

oidfmt uses the undocumented oidfmt sysctl interface to get the type of a sysctl.
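
The query itself is just the requested OID prefixed with two fixed elements. The sketch below shows only that construction step, without calling sysctl(), so it can be read on any host. The SKETCH_* values are assumptions taken from FreeBSD's sys/sysctl.h as far as I recall and should be verified there; build_oidfmt_query() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed FreeBSD values; verify against sys/sysctl.h. */
#define SKETCH_CTL_MAXNAME        24
#define SKETCH_CTL_SYSCTL         0
#define SKETCH_CTL_SYSCTL_OIDFMT  4

/*
 * Build the query OID {CTL_SYSCTL, CTL_SYSCTL_OIDFMT, oid...}.
 * qoid must have room for SKETCH_CTL_MAXNAME + 2 ints.
 * Returns the query length in ints, or -1 if len is out of range.
 */
static int build_oidfmt_query(const int *oid, int len, int *qoid)
{
    if (len < 0 || len > SKETCH_CTL_MAXNAME) {
        return -1;
    }
    qoid[0] = SKETCH_CTL_SYSCTL;
    qoid[1] = SKETCH_CTL_SYSCTL_OIDFMT;
    memcpy(qoid + 2, oid, (size_t)len * sizeof(int));
    return len + 2;
}
```

Unlike the patch, this sketch bounds-checks len before copying; that check is an addition for the example, not something the original code does.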

Co-Authored-by: Sean Bruno 
Signed-off-by: Sean Bruno 
Co-Authored-by: Juergen Lock 
Signed-off-by: Juergen Lock 
Co-Authored-by: Raphael Kubo da Costa 
Signed-off-by: Raphael Kubo da Costa 
Signed-off-by: Stacey Son 
Reviewed-by: Warner Losh 
Signed-off-by: Warner Losh 
Acked-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 9b84e90cb32..1bf2b51820e 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -107,6 +107,38 @@ static abi_ulong G_GNUC_UNUSED h2g_ulong_sat(u_long ul)
  */
 #define bsd_get_ncpu() 1
 
+/*
+ * This uses the undocumented oidfmt interface to find the kind of a requested
+ * sysctl, see /sys/kern/kern_sysctl.c:sysctl_sysctl_oidfmt() (compare to
+ * src/sbin/sysctl/sysctl.c)
+ */
+static int G_GNUC_UNUSED oidfmt(int *oid, int len, char *fmt, uint32_t *kind)
+{
+int qoid[CTL_MAXNAME + 2];
+uint8_t buf[BUFSIZ];
+int i;
+size_t j;
+
+qoid[0] = CTL_SYSCTL;
+qoid[1] = CTL_SYSCTL_OIDFMT;
+memcpy(qoid + 2, oid, len * sizeof(int));
+
+j = sizeof(buf);
+i = sysctl(qoid, len + 2, buf, &j, 0, 0);
+if (i) {
+return i;
+}
+
+if (kind) {
+*kind = *(uint32_t *)buf;
+}
+
+if (fmt) {
+strcpy(fmt, (char *)(buf + sizeof(uint32_t)));
+}
+return 0;
+}
+
 /* sysarch() is architecture dependent. */
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2)
 {
-- 
2.39.1

[PATCH v3 11/11] bsd-user: implement sysctlbyname(2)

2023-02-16 Thread Warner Losh

From: Kyle Evans 

do_freebsd_sysctlbyname needs to translate the 'name' back down to an OID
so we can intercept the special ones. Do that and call the common wrapper
do_freebsd_sysctl_oid.

Signed-off-by: Kyle Evans 
Reviewed-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 67 +++
 bsd-user/freebsd/os-syscall.c |  4 +++
 bsd-user/qemu.h   |  3 ++
 3 files changed, 74 insertions(+)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index f07ae7da740..d9386d3e7ef 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -472,6 +472,73 @@ out:
 return ret;
 }
 
+/*
+ * This syscall was created to make sysctlbyname(3) more efficient, but we can't
+ * really provide it in bsd-user.  Notably, we must always translate the names
+ * independently since some sysctl values have to be faked for the target
+ * environment, so it still has to break down to two syscalls for the underlying
+ * implementation.
+ */
+abi_long do_freebsd_sysctlbyname(CPUArchState *env, abi_ulong namep,
+int32_t namelen, abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp,
+abi_ulong newlen)
+{
+abi_long ret = -TARGET_EFAULT;
+void *holdp = NULL, *hnewp = NULL;
+char *snamep = NULL;
+int oid[CTL_MAXNAME + 2];
+size_t holdlen, oidplen;
+abi_ulong oldlen = 0;
+
+/* oldlenp is read/write, pre-check here for write */
+if (oldlenp) {
+if (!access_ok(VERIFY_WRITE, oldlenp, sizeof(abi_ulong)) ||
+get_user_ual(oldlen, oldlenp)) {
+goto out;
+}
+}
+snamep = lock_user_string(namep);
+if (snamep == NULL) {
+goto out;
+}
+if (newp) {
+hnewp = lock_user(VERIFY_READ, newp, newlen, 1);
+if (hnewp == NULL) {
+goto out;
+}
+}
+if (oldp) {
+holdp = lock_user(VERIFY_WRITE, oldp, oldlen, 0);
+if (holdp == NULL) {
+goto out;
+}
+}
+holdlen = oldlen;
+
+oidplen = ARRAY_SIZE(oid);
+if (sysctlnametomib(snamep, oid, &oidplen) != 0) {
+ret = -TARGET_EINVAL;
+goto out;
+}
+
+ret = do_freebsd_sysctl_oid(env, oid, oidplen, holdp, &holdlen, hnewp,
+newlen);
+
+/*
+ * writeability pre-checked above. __sysctl(2) returns ENOMEM and updates
+ * oldlenp for the proper size to use.
+ */
+if (oldlenp && (ret == 0 || ret == -TARGET_ENOMEM)) {
+put_user_ual(holdlen, oldlenp);
+}
+out:
+unlock_user(snamep, namep, 0);
+unlock_user(holdp, oldp, ret == 0 ? holdlen : 0);
+unlock_user(hnewp, newp, 0);
+
+return ret;
+}
+
 abi_long do_freebsd_sysctl(CPUArchState *env, abi_ulong namep, int32_t namelen,
 abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp, abi_ulong newlen)
 {
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index 20ab3d4d9a1..179a20c304b 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -498,6 +498,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
 ret = do_freebsd_sysctl(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6);
 break;
 
+case TARGET_FREEBSD_NR___sysctlbyname: /* sysctlbyname(2) */
+ret = do_freebsd_sysctlbyname(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6);
+break;
+
 case TARGET_FREEBSD_NR_sysarch: /* sysarch(2) */
 ret = do_freebsd_sysarch(cpu_env, arg1, arg2);
 break;
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index c7248cfde6f..e24a8cfcfb1 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -254,6 +254,9 @@ int host_to_target_errno(int err);
 /* os-sys.c */
 abi_long do_freebsd_sysctl(CPUArchState *env, abi_ulong namep, int32_t namelen,
 abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp, abi_ulong newlen);
+abi_long do_freebsd_sysctlbyname(CPUArchState *env, abi_ulong namep,
+int32_t namelen, abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp,
+abi_ulong newlen);
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2);
 
 /* user access */
-- 
2.39.1

[PATCH v3 03/11] bsd-user: Add sysarch syscall

2023-02-16 Thread Warner Losh

From: Stacey Son 

Connect up the sysarch system call.

Signed-off-by: Juergen Lock 
Co-authored-by: Juergen Lock 
Signed-off-by: Stacey Son 
Reviewed-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-syscall.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index b4a663fc021..e00997a818c 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -491,6 +491,13 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
 ret = do_bsd_undelete(arg1);
 break;
 
+/*
+ * sys{ctl, arch, call}
+ */
+case TARGET_FREEBSD_NR_sysarch: /* sysarch(2) */
+ret = do_freebsd_sysarch(cpu_env, arg1, arg2);
+break;
+
 default:
 qemu_log_mask(LOG_UNIMP, "Unsupported syscall: %d\n", num);
 ret = -TARGET_ENOSYS;
-- 
2.39.1

[PATCH v3 10/11] bsd-user: do_freebsd_sysctl helper for sysctl(2)

2023-02-16 Thread Warner Losh

From: Kyle Evans 

Implement the wrapper function for sysctl(2). This puts the oid
arguments into a standard form and calls the common
do_freebsd_sysctl_oid.

Signed-off-by: Kyle Evans 
Co-Authored-by: Juergen Lock 
Signed-off-by: Juergen Lock 
Co-Authored-by: Stacey Son 
Signed-off-by: Stacey Son 
Reviewed-by: Warner Losh 
Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/freebsd/os-sys.c | 56 ++-
 bsd-user/freebsd/os-syscall.c |  4 +++
 bsd-user/qemu.h   |  2 ++
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/bsd-user/freebsd/os-sys.c b/bsd-user/freebsd/os-sys.c
index 1464e64428f..f07ae7da740 100644
--- a/bsd-user/freebsd/os-sys.c
+++ b/bsd-user/freebsd/os-sys.c
@@ -249,7 +249,7 @@ static inline void sysctl_oidfmt(uint32_t *holdp)
 holdp[0] = tswap32(holdp[0]);
 }
 
-static abi_long G_GNUC_UNUSED do_freebsd_sysctl_oid(CPUArchState *env, int32_t *snamep,
+static abi_long do_freebsd_sysctl_oid(CPUArchState *env, int32_t *snamep,
 int32_t namelen, void *holdp, size_t *holdlenp, void *hnewp,
 size_t newlen)
 {
@@ -472,6 +472,60 @@ out:
 return ret;
 }
 
+abi_long do_freebsd_sysctl(CPUArchState *env, abi_ulong namep, int32_t namelen,
+abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp, abi_ulong newlen)
+{
+abi_long ret = -TARGET_EFAULT;
+void *hnamep, *holdp = NULL, *hnewp = NULL;
+size_t holdlen;
+abi_ulong oldlen = 0;
+int32_t *snamep = g_malloc(sizeof(int32_t) * namelen), *p, *q, i;
+
+/* oldlenp is read/write, pre-check here for write */
+if (oldlenp) {
+if (!access_ok(VERIFY_WRITE, oldlenp, sizeof(abi_ulong)) ||
+get_user_ual(oldlen, oldlenp)) {
+goto out;
+}
+}
+hnamep = lock_user(VERIFY_READ, namep, namelen, 1);
+if (hnamep == NULL) {
+goto out;
+}
+if (newp) {
+hnewp = lock_user(VERIFY_READ, newp, newlen, 1);
+if (hnewp == NULL) {
+goto out;
+}
+}
+if (oldp) {
+holdp = lock_user(VERIFY_WRITE, oldp, oldlen, 0);
+if (holdp == NULL) {
+goto out;
+}
+}
+holdlen = oldlen;
+for (p = hnamep, q = snamep, i = 0; i < namelen; p++, i++, q++) {
+*q = tswap32(*p);
+}
+
+ret = do_freebsd_sysctl_oid(env, snamep, namelen, holdp, &holdlen, hnewp,
+newlen);
+
+/*
+ * writeability pre-checked above. __sysctl(2) returns ENOMEM and updates
+ * oldlenp for the proper size to use.
+ */
+if (oldlenp && (ret == 0 || ret == -TARGET_ENOMEM)) {
+put_user_ual(holdlen, oldlenp);
+}
+unlock_user(hnamep, namep, 0);
+unlock_user(holdp, oldp, ret == 0 ? holdlen : 0);
+out:
+g_free(snamep);
+return ret;
+}
+
 /* sysarch() is architecture dependent. */
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2)
 {
diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
index e00997a818c..20ab3d4d9a1 100644
--- a/bsd-user/freebsd/os-syscall.c
+++ b/bsd-user/freebsd/os-syscall.c
@@ -494,6 +494,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
 /*
  * sys{ctl, arch, call}
  */
+case TARGET_FREEBSD_NR___sysctl: /* sysctl(3) */
+ret = do_freebsd_sysctl(cpu_env, arg1, arg2, arg3, arg4, arg5, arg6);
+break;
+
 case TARGET_FREEBSD_NR_sysarch: /* sysarch(2) */
 ret = do_freebsd_sysarch(cpu_env, arg1, arg2);
 break;
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 0ceecfb6dfa..c7248cfde6f 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -252,6 +252,8 @@ bool is_error(abi_long ret);
 int host_to_target_errno(int err);
 
 /* os-sys.c */
+abi_long do_freebsd_sysctl(CPUArchState *env, abi_ulong namep, int32_t namelen,
+abi_ulong oldp, abi_ulong oldlenp, abi_ulong newp, abi_ulong newlen);
 abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2);
 
 /* user access */
-- 
2.39.1
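
The step in do_freebsd_sysctl() that puts the oid arguments into a standard
form is a per-element byte swap from guest to host order. A minimal
standalone sketch of that loop, assuming the cross-endian case: swap32() and
oid_to_host() are hypothetical names, and swap32() always swaps, whereas
QEMU's tswap32() is a no-op when guest and host endianness match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tswap32(): unconditional 32-bit byte swap. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Copy n guest-order OID components into a host-order array. */
static void oid_to_host(const int32_t *guest, int32_t *host, size_t n)
{
    assert(guest != NULL && host != NULL);
    for (size_t i = 0; i < n; i++) {
        host[i] = (int32_t)swap32((uint32_t)guest[i]);
    }
}
```

Converting into a separate host-side array, as the patch does with snamep,
leaves the guest's copy of the name untouched.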

[PATCH v3 00/11] 2023 Q1 bsd-user upstreaming: bugfixes and sysctl

2023-02-16 Thread Warner Losh

[ letter edited -- need reviews for these hunks
 bsd-user: Helper routines h2g_old_sysctl
 bsd-user: various helper routines for sysctl
]

This group of patches gets the basic framework for sysctl upstreamed. There's a
lot more work needed to translate the far too many binary blobs the kernel
publishes via sysctls, but I'm leaving that out in the name of simplicity.

There's also a bug fix from Doug Rabson that fixes a long int confusion leading
to a truncation of addresses (oops).

There's a fix for the -static option, since clang hates -no-pie and needs only
-fno-pie.

Finally, I'm changing how I'm upstreaming a little. I'm diving a little deeper
into our rather chaotic repo to find a couple of authors I might have
missed. From here on out, I'll be using the original author's name as the git
author. I'll also tag the co-authors better when there are multiple people
that did something (other than reformat and/or move code around). I've
discovered more code moved about than I'd previously known. This seems more in
line with standard practice. Also, I've reviewed all these changes, but I don't
know if I need to add Reviewed-by: or not, so I've done it for one or two and
will make it consistent before the pull request. git log suggests maintainers
are inconsistent about this (or I've not discovered the rules they follow).

check-patch gives some line length warnings, but should otherwise be OK. There
are several static functions that aren't used until the end of the patch
series... Not sure of the best way to suppress the build warnings there (but
since they are just warnings...).

v3:
o Removed -strict, it's not ready and needs a complete rethink.
o Add g_assert_not_reached()
o target -> guest in most places
o Use MIN() to simplify things
o Better types in many places (abi_int instead of int32_t)
o Use ARRAY_COUNT
o fix tabs copied from FreeBSD sources to spaces

v2:
o Created various helper functions to make the code a little better
o split a few patches that I thought would be approved together but
  that generated commentary. It's easier to manage 1 per patch for
  those.
o Add/delete G_GNUC_UNUSED to ensure all patches compile w/o warnings
o Fix 64-bit running 32-bit binary to get a LONG or ULONG. Add a
  bounce buffer for these so we don't overflow anything on the target
  and return all the elements of arrays.
o Fixed a number of nits noticed in the review.
o Add or improve comments to explain things there were questions on
  during the review.
o fix noted typos
o fix host != target page size differences
o Add pointers to FreeBSD source code, as appropriate
o fix locking (mostly unlocking) on error paths
o Note: -strict feedback not yet applied due to large numbers of changes
  from the rest. Next round.
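
The bounce-buffer item above (a 32-bit binary on a 64-bit host reading LONG or
ULONG values) can be sketched roughly as follows. This is only an
illustration of the idea, not the code in the series: narrow_longs() and its
parameters are hypothetical, and byte swapping is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: narrow host 64-bit longs into a 32-bit guest buffer.
 * Writes at most guest_len bytes and returns the number of bytes written.
 */
static size_t narrow_longs(const int64_t *host, size_t count,
                           int32_t *guest, size_t guest_len)
{
    size_t max = guest_len / sizeof(int32_t);
    size_t n = count < max ? count : max;

    for (size_t i = 0; i < n; i++) {
        /* Values that do not fit in 32 bits lose their upper bits here. */
        guest[i] = (int32_t)host[i];
    }
    return n * sizeof(int32_t);
}
```

Bounding the copy by the guest's buffer length is what keeps the narrowing
from overflowing anything on the guest side.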

Doug Rabson (1):
  bsd-user: Don't truncate the return value from freebsd_syscall

Juergen Lock (3):
  bsd-user: sysctl helper funtions: sysctl_name2oid and sysctl_oidfmt
  bsd-user: common routine do_freebsd_sysctl_oid for all sysctl variants
  bsd-user: Start translation of arch-specific sysctls

Kyle Evans (2):
  bsd-user: do_freebsd_sysctl helper for sysctl(2)
  bsd-user: implement sysctlbyname(2)

Stacey Son (2):
  bsd-user: Add sysarch syscall
  bsd-user: Helper routines oidfmt

Warner Losh (3):
  build: Don't specify -no-pie for --static user-mode programs
  bsd-user: various helper routines for sysctl
  bsd-user: Helper routines h2g_old_sysctl

 bsd-user/freebsd/os-sys.c | 572 ++
 bsd-user/freebsd/os-syscall.c |  17 +-
 bsd-user/qemu.h   |   5 +
 configure |   2 +-
 4 files changed, 594 insertions(+), 2 deletions(-)

-- 
2.39.1
