Re: [RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support

2021-03-04 Thread Ying Fang




On 3/1/2021 5:48 PM, Andrew Jones wrote:

On Fri, Feb 26, 2021 at 04:41:45PM +0800, Ying Fang wrote:



On 2/25/2021 8:02 PM, Andrew Jones wrote:

On Thu, Feb 25, 2021 at 04:56:22PM +0800, Ying Fang wrote:

An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems, so a cpu topology description
is helpful to provide the guest with the right view. Dario Faggioli's talk
in [0] also shows that the virtual topology may have an impact on scheduler
performance. Thus this patch series is posted to introduce cpu topology
support for the arm platform.

Both fdt and ACPI are introduced to present the cpu topology. To describe
the cpu topology via ACPI, a PPTT table is introduced according to the
processor hierarchy node structure. This series is derived from [1]; in
[1] we were trying to bring both cpu and cache topology support to the arm
platform, but there are still some issues to solve to support the cache
hierarchy. So we split the cpu topology part out and send it separately.
The patch series to support the cache hierarchy will be sent later, since
Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and
he is waiting for it to be upstreamed.

This patch series was initially based on the patches posted by Andrew Jones [2].
I jumped in on it since some OS vendor partners are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-6.0 and a cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
messages below from the lscpu command.

---------------------
Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2


What CPU model was used? Did it actually support threads? If these were


It's tested on the Huawei Kunpeng 920 CPU with vcpu host-passthrough.
It does not support threads for now, but the next version, the 930, may
support them. Here we emulate a virtual cpu topology; two virtual threads
are used to do the test.



KVM VCPUs, then I guess MPIDR.MT was not set on the CPUs. Apparently
that didn't confuse Linux? See [1] for how I once tried to deal with
threads.

[1] 
https://github.com/rhdrjones/qemu/commit/60218e0dd7b331031b644872d56f2aca42d0ff1e



If an ACPI PPTT table is specified, the linux kernel won't check the MPIDR
register to populate the cpu topology. Moreover, MPIDR does not guarantee a
correct cpu topology. So it won't be a problem if MPIDR.MT is not set.


OK, so Linux doesn't care about MPIDR.MT with ACPI. What happens with
DT?


Following the logic of the Linux kernel: it tries to parse the cpu topology
in smp_prepare_cpus (arch/arm64/kernel/topology.c). If the cpu topology is
provided via DT, the Linux kernel won't check MPIDR any more. This is the
same as when ACPI is enabled.






Core(s) per socket:  4
Socket(s):           2


Good, but what happens if you specify '-smp 16'? Do you get 16 sockets

   ^^ You didn't answer this question.


The latest QEMU uses smp_parse to parse the -smp command line; by default,
if -smp 16 is given, the arm64 virt machine will get 16 sockets.




each with 1 core? Or, do you get 1 socket with 16 cores? And, which do
we want and why? If you look at [2], then you'll see I was assuming we
want to prefer cores over sockets, since without topology descriptions
that's what the Linux guest kernel would do.

[2] 
https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd



Thanks, I'll check what Linux does by default.
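
For concreteness, the two alternatives discussed above correspond to
invocations along these lines (a sketch; all other options elided):

    # current default parsing: -smp 16 yields 16 sockets with 1 core each
    qemu-system-aarch64 -machine virt -smp 16 ...
    # preferring cores over sockets, as [2] assumes
    qemu-system-aarch64 -machine virt -smp 16,sockets=1,cores=16 ...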


NUMA node(s):        2


Why do we have two NUMA nodes in the guest? The two sockets in the
guest should not imply this.


The two NUMA nodes are emulated by QEMU since we already have the guest
numa topology feature.
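
For reference, the node0/node1 split shown in the lscpu output can be
produced with options along these lines (a sketch; memory-backend sizes
are illustrative, not the ones used in the test):

    qemu-system-aarch64 ... \
        -smp 16,sockets=2,cores=4,threads=2 \
        -object memory-backend-ram,id=m0,size=2G \
        -object memory-backend-ram,id=m1,size=2G \
        -numa node,nodeid=0,cpus=0-7,memdev=m0 \
        -numa node,nodeid=1,cpus=8-15,memdev=m1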


That's what I suspected, and I presume only a single node is present when
you don't use QEMU's NUMA feature - even when you supply a VCPU topology
with multiple sockets?


Agreed, I would expect a single numa node too if we do not use the guest
numa feature. Here I provided the guest with two numa nodes and set the
cpu affinity only as a test.





Thanks,
drew


So the two sockets in the guest have nothing to do with
it. Actually, even one socket may have two numa nodes in a real cpu
model.



Thanks,
drew


Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

[0] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
[2] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Ying Fang (5):
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: Add cpu-map to device tree
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table

Re: [RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure

2021-03-03 Thread Ying Fang




On 3/1/2021 11:50 PM, Michael S. Tsirkin wrote:

On Mon, Mar 01, 2021 at 10:39:19AM +0100, Andrew Jones wrote:

On Fri, Feb 26, 2021 at 10:23:03AM +0800, Ying Fang wrote:



On 2/25/2021 7:47 PM, Andrew Jones wrote:

On Thu, Feb 25, 2021 at 04:56:26PM +0800, Ying Fang wrote:

Add the processor hierarchy node structures to build ACPI information
for CPU topology. Since the private resources may be used to describe
the cache hierarchy, and they vary among different topology levels,
three helpers are introduced to describe the hierarchy.

(1) build_socket_hierarchy for socket description
(2) build_processor_hierarchy for processor description
(3) build_thread_hierarchy for thread (logical processor) description

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
   hw/acpi/aml-build.c | 40 +
   include/hw/acpi/acpi-defs.h | 13 
   include/hw/acpi/aml-build.h |  7 +++
   3 files changed, 60 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a2cd7a5830..a0af3e9d73 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
table_data->len - slit_start, 1, oem_id, oem_table_id);
   }
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4);


Missing '/* Flags */'


Will fix.




+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD |
+  ACPI_PPTT_ACPI_LEAF_NODE, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent , 4); /* parent */


'parent' not capitalized. We want these comments to exactly match the text
in the spec.


Will fix.




+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
   /* build rev1/rev3/rev5.1 FADT */
   void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
   const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index cf9f44299c..45e10d886f 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,17 @@ struct AcpiIortRC {
   } QEMU_PACKED;
   typedef struct AcpiIortRC AcpiIortRC;
+enum {
+ACPI_PPTT_TYPE_PROCESSOR = 0,
+ACPI_PPTT_TYPE_CACHE,
+ACPI_PPTT_TYPE_ID,
+ACPI_PPTT_TYPE_RESERVED
+};
+
+#define ACPI_PPTT_PHYSICAL_PACKAGE  (1)
+#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID   (1 << 1)
+#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD  (1 << 2)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_LEAF_NODE(1 << 3)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_IDENTICAL(1 << 4)  /* ACPI 6.3 */


You need to quote specific place in spec where this appeared, not
just version. and what about previous ones?


Thanks, Will fix.





+
   #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 380d3e3924..7f0ca1a198 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
   void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
   const char *oem_id, const char *oem_table_id);
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);

Re: [RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support

2021-02-26 Thread Ying Fang




On 2/25/2021 8:02 PM, Andrew Jones wrote:

On Thu, Feb 25, 2021 at 04:56:22PM +0800, Ying Fang wrote:

An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems, so a cpu topology description
is helpful to provide the guest with the right view. Dario Faggioli's talk
in [0] also shows that the virtual topology may have an impact on scheduler
performance. Thus this patch series is posted to introduce cpu topology
support for the arm platform.

Both fdt and ACPI are introduced to present the cpu topology. To describe
the cpu topology via ACPI, a PPTT table is introduced according to the
processor hierarchy node structure. This series is derived from [1]; in
[1] we were trying to bring both cpu and cache topology support to the arm
platform, but there are still some issues to solve to support the cache
hierarchy. So we split the cpu topology part out and send it separately.
The patch series to support the cache hierarchy will be sent later, since
Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and
he is waiting for it to be upstreamed.

This patch series was initially based on the patches posted by Andrew Jones [2].
I jumped in on it since some OS vendor partners are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-6.0 and a cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
messages below from the lscpu command.

---------------------
Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2


What CPU model was used? Did it actually support threads? If these were


It's tested on the Huawei Kunpeng 920 CPU with vcpu host-passthrough.
It does not support threads for now, but the next version, the 930, may
support them. Here we emulate a virtual cpu topology; two virtual threads
are used to do the test.



KVM VCPUs, then I guess MPIDR.MT was not set on the CPUs. Apparently
that didn't confuse Linux? See [1] for how I once tried to deal with
threads.

[1] 
https://github.com/rhdrjones/qemu/commit/60218e0dd7b331031b644872d56f2aca42d0ff1e



If an ACPI PPTT table is specified, the linux kernel won't check the MPIDR
register to populate the cpu topology. Moreover, MPIDR does not guarantee a
correct cpu topology. So it won't be a problem if MPIDR.MT is not set.


Core(s) per socket:  4
Socket(s):           2


Good, but what happens if you specify '-smp 16'? Do you get 16 sockets
each with 1 core? Or, do you get 1 socket with 16 cores? And, which do
we want and why? If you look at [2], then you'll see I was assuming we
want to prefer cores over sockets, since without topology descriptions
that's what the Linux guest kernel would do.

[2] 
https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd


NUMA node(s):        2


Why do we have two NUMA nodes in the guest? The two sockets in the
guest should not imply this.


The two NUMA nodes are emulated by QEMU since we already have the guest
numa topology feature. So the two sockets in the guest have nothing to do
with it. Actually, even one socket may have two numa nodes in a real cpu
model.



Thanks,
drew


Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

[0] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
[2] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Ying Fang (5):
   device_tree: Add qemu_fdt_add_path
   hw/arm/virt: Add cpu-map to device tree
   hw/arm/virt-acpi-build: distinguish possible and present cpus
   hw/acpi/aml-build: add processor hierarchy node structure
   hw/arm/virt-acpi-build: add PPTT table

  hw/acpi/aml-build.c  | 40 ++
  hw/arm/virt-acpi-build.c | 64 +---
  hw/arm/virt.c| 40 +-
  include/hw/acpi/acpi-defs.h  | 13 
  include/hw/acpi/aml-build.h  |  7 
  include/hw/arm/virt.h|  1 +
  include/sysemu/device_tree.h |  1 +
  softmmu/device_tree.c| 45 +++--
  8 files changed, 204 insertions(+), 7 deletions(-)

--
2.23.0








Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: add PPTT table

2021-02-25 Thread Ying Fang




On 2/25/2021 7:38 PM, Andrew Jones wrote:


This is just [*] with some minor code changes

[*] 
https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11

so it's disappointing that my name is nowhere to be found on it.

Also, the explanation of the DT and ACPI differences has been
dropped from the commit message of [*]. I'm not sure why.



Will fix that. I will add your Signed-off-by, and then you can help to comment on it.


Thanks,
drew

On Thu, Feb 25, 2021 at 04:56:27PM +0800, Ying Fang wrote:

Add the Processor Properties Topology Table (PPTT) to present
CPU topology information to the guest. A three-level cpu
topology is built, in accordance with what the linux kernel currently does.

Tested-by: Jiajie Li 
Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 50 
  1 file changed, 50 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bb91152fe2..38d50ce66c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -436,6 +436,50 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   vms->oem_table_id);
  }
  
+static void

+build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket = 0;
+MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_LEAF_NODE,
+  socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID,
+  socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_thread_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2,
+ vms->oem_id, vms->oem_table_id);
+}
+
  /* GTDT */
  static void
  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -688,6 +732,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
  unsigned dsdt, xsdt;
  GArray *tables_blob = tables->table_data;
  MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->no_cpu_topology;
  
  table_offsets = g_array_new(false, true /* clear */,

  sizeof(uint32_t));
@@ -707,6 +752,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
  acpi_add_table(table_offsets, tables_blob);
  build_madt(tables_blob, tables->linker, vms);
  
+if (ms->smp.cpus > 1 && cpu_topology_enabled) {

+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, vms);
+}
+
  acpi_add_table(table_offsets, tables_blob);
  build_gtdt(tables_blob, tables->linker, vms);
  
--

2.23.0









Re: [RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure

2021-02-25 Thread Ying Fang




On 2/25/2021 7:47 PM, Andrew Jones wrote:

On Thu, Feb 25, 2021 at 04:56:26PM +0800, Ying Fang wrote:

Add the processor hierarchy node structures to build ACPI information
for CPU topology. Since the private resources may be used to describe
the cache hierarchy, and they vary among different topology levels,
three helpers are introduced to describe the hierarchy.

(1) build_socket_hierarchy for socket description
(2) build_processor_hierarchy for processor description
(3) build_thread_hierarchy for thread (logical processor) description

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
  hw/acpi/aml-build.c | 40 +
  include/hw/acpi/acpi-defs.h | 13 
  include/hw/acpi/aml-build.h |  7 +++
  3 files changed, 60 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a2cd7a5830..a0af3e9d73 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
   table_data->len - slit_start, 1, oem_id, oem_table_id);
  }
  
+/*

+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4);


Missing '/* Flags */'


Will fix.




+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD |
+  ACPI_PPTT_ACPI_LEAF_NODE, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent , 4); /* parent */


'parent' not capitalized. We want these comments to exactly match the text
in the spec.


Will fix.




+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
  /* build rev1/rev3/rev5.1 FADT */
  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
  const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index cf9f44299c..45e10d886f 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,17 @@ struct AcpiIortRC {
  } QEMU_PACKED;
  typedef struct AcpiIortRC AcpiIortRC;
  
+enum {

+ACPI_PPTT_TYPE_PROCESSOR = 0,
+ACPI_PPTT_TYPE_CACHE,
+ACPI_PPTT_TYPE_ID,
+ACPI_PPTT_TYPE_RESERVED
+};
+
+#define ACPI_PPTT_PHYSICAL_PACKAGE  (1)
+#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID   (1 << 1)
+#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD  (1 << 2)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_LEAF_NODE(1 << 3)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_IDENTICAL(1 << 4)  /* ACPI 6.3 */
+
  #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 380d3e3924..7f0ca1a198 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
  void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
  const char *oem_id, const char *oem_table_id);
  
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);

+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);


Why does build_processor_hierarchy() take a flags argument

Re: [RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path

2021-02-25 Thread Ying Fang




On 2/25/2021 9:25 PM, Andrew Jones wrote:

On Thu, Feb 25, 2021 at 08:54:40PM +0800, Ying Fang wrote:



On 2/25/2021 7:03 PM, Andrew Jones wrote:

Hi Ying Fang,

I don't see any change in this patch from what I have in my
tree, so this should be

   From: Andrew Jones 

Thanks,
drew



Yes, I picked it from your qemu branch:
https://github.com/rhdrjones/qemu/commit/ecfc1565f22187d2c715a99bbcd35cf3a7e428fa

So what can I do to make it "From: Andrew Jones " ?

Can I do it by using git commit --amend like below?

git commit --amend --author "Andrew Jones "


That's one way to fix it now, but normally when you apply/cherry-pick
a patch it will keep the authorship. Then, all you have to do is
post like usual and the "From: ..." will show up automatically.



Hmm, I know cherry-pick can do that. But sometimes there may be
conflicts, so I have to backport it by hand and copy the commit
msg back; thus the authorship may be lost.



Thanks,
drew




On Thu, Feb 25, 2021 at 04:56:23PM +0800, Ying Fang wrote:

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
   include/sysemu/device_tree.h |  1 +
   softmmu/device_tree.c| 45 ++--
   2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
   uint32_t qemu_fdt_alloc_phandle(void *fdt);
   int qemu_fdt_nop_node(void *fdt, const char *node_path);
   int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
   #define qemu_fdt_setprop_cells(fdt, node_path, property, ...)
 \
   do { 
 \
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index b9a3ddc518..1e3857ca0c 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
   retval = fdt_add_subnode(fdt, parent, basename);
   if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
   exit(1);
   }
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
   return retval;
   }
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
   void qemu_fdt_dumpdtb(void *fdt, int size)
   {
   const char *dumpdtb = current_machine->dumpdtb;
--
2.23.0














Re: [RFC PATCH 2/5] hw/arm/virt: Add cpu-map to device tree

2021-02-25 Thread Ying Fang




On 2/25/2021 7:16 PM, Andrew Jones wrote:

Hi Ying Fang,

The only difference between this and what I have in my tree[*]
is the removal of the socket node (which has been in the Linux
docs since June 2019). Any reason why you removed that node? In
any case, I think I deserve a bit more credit for this patch.


Sorry, you surely deserve it. I forgot to add it here.
Should I add your Signed-off-by here?

The latest linux kernel uses a four-level cpu topology defined in

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt?h=v5.11

i.e. socket node, cluster node, core node, thread node.

The linux kernel 4.19 LTS uses a three-level cpu topology defined in
Documentation/devicetree/bindings/arm/topology.txt

i.e. cluster node, core node, thread node.

Currently QEMU x86 has a 4-level cpu topology: socket, die, core,
thread. Should arm64 act like that here?

Furthermore, the latest linux kernel defines the cpu topology struct as
below, so maybe it only cares about the socket, core and thread topology
levels.

struct cpu_topology {
    int thread_id;
    int core_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t llc_sibling;
};
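
As a quick way to see which of these levels a guest actually populates,
the parsed topology is visible in sysfs (a sketch; file names per the
kernel's cpu topology ABI of this era):

    # in the guest
    cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    ls /proc/device-tree/cpus/cpu-map/    # DT boot only: cluster*/core* nodes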



[*] 
https://github.com/rhdrjones/qemu/commit/35feecdd43475608c8f55973a0c159eac4aafefd

Thanks,
drew

On Thu, Feb 25, 2021 at 04:56:24PM +0800, Ying Fang wrote:

Support device tree CPU topology descriptions.

Signed-off-by: Ying Fang 
---
  hw/arm/virt.c | 38 +-
  include/hw/arm/virt.h |  1 +
  2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 371147f3ae..c133b342b8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,10 +351,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  int cpu;
  int addr_cells = 1;
  const MachineState *ms = MACHINE(vms);
+const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
  int smp_cpus = ms->smp.cpus;
  
  /*

- * From Documentation/devicetree/bindings/arm/cpus.txt
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
   *  On ARM v8 64-bit systems value should be set to 2,
   *  that corresponds to the MPIDR_EL1 register size.
   *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
  }
  
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {

+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+}
+
  g_free(nodename);
  }
+
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {
+/*
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d/%s%d",
+"cluster", cpu / (ms->smp.cores * ms->smp.threads),


The cluster node may be replaced by a socket node to keep in accord with the
latest kernel.



+"core", (cpu / ms->smp.threads) % ms->smp.cores,
+"thread", cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d",
+"cluster", cpu / ms->smp.cores,
+"core", cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
  }
  
  static void fdt_add_its_gic_node(VirtMachineState *vms)

@@ -2742,6 +2777,7 @@ static void virt_machine_5_2_options(MachineClass *mc)
  virt_machine_6_0_options(mc);
  compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
  vmc->no_secure_gpio = true;
+vmc->no_cpu_topology = true;
  }
  DEFINE_VIRT_MACHINE(5, 2)
  
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h

index ee9a93101e..7ef6d08ac3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -129,6 +129,7 @@ struct VirtMachineClass {
  bool no_kvm_steal_time;
  bool acpi_expose_flash;
  bool no_secure_gpio;
+bool no_cpu_topology;
  };
  
  struct VirtMachineState {

--
2.23.0









Re: [RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path

2021-02-25 Thread Ying Fang




On 2/25/2021 7:03 PM, Andrew Jones wrote:

Hi Ying Fang,

I don't see any change in this patch from what I have in my
tree, so this should be

  From: Andrew Jones 

Thanks,
drew



Yes, I picked it from your qemu branch:
https://github.com/rhdrjones/qemu/commit/ecfc1565f22187d2c715a99bbcd35cf3a7e428fa

So what can I do to make it "From: Andrew Jones " ?

Can I do it by using git commit --amend like below?

git commit --amend --author "Andrew Jones "


On Thu, Feb 25, 2021 at 04:56:23PM +0800, Ying Fang wrote:

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
  include/sysemu/device_tree.h |  1 +
  softmmu/device_tree.c| 45 ++--
  2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
  uint32_t qemu_fdt_alloc_phandle(void *fdt);
  int qemu_fdt_nop_node(void *fdt, const char *node_path);
  int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
  
  #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \

  do {  
\
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index b9a3ddc518..1e3857ca0c 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
  
  retval = fdt_add_subnode(fdt, parent, basename);

  if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
  exit(1);
  }
  
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)

  return retval;
  }
  
+/*

+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
  void qemu_fdt_dumpdtb(void *fdt, int size)
  {
  const char *dumpdtb = current_machine->dumpdtb;
--
2.23.0









[RFC PATCH 2/5] hw/arm/virt: Add cpu-map to device tree

2021-02-25 Thread Ying Fang
Support device tree CPU topology descriptions.

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 38 +-
 include/hw/arm/virt.h |  1 +
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 371147f3ae..c133b342b8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,10 +351,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
+const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 int smp_cpus = ms->smp.cpus;
 
 /*
- * From Documentation/devicetree/bindings/arm/cpus.txt
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
  *  On ARM v8 64-bit systems value should be set to 2,
  *  that corresponds to the MPIDR_EL1 register size.
  *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+}
+
 g_free(nodename);
 }
+
+if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) {
+/*
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d/%s%d",
+"cluster", cpu / (ms->smp.cores * ms->smp.threads),
+"core", (cpu / ms->smp.threads) % ms->smp.cores,
+"thread", cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d",
+"cluster", cpu / ms->smp.cores,
+"core", cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2742,6 +2777,7 @@ static void virt_machine_5_2_options(MachineClass *mc)
 virt_machine_6_0_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
 vmc->no_secure_gpio = true;
+vmc->no_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ee9a93101e..7ef6d08ac3 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -129,6 +129,7 @@ struct VirtMachineClass {
 bool no_kvm_steal_time;
 bool acpi_expose_flash;
 bool no_secure_gpio;
+bool no_cpu_topology;
 };
 
 struct VirtMachineState {
-- 
2.23.0




[RFC PATCH 3/5] hw/arm/virt-acpi-build: distinguish possible and present cpus

2021-02-25 Thread Ying Fang
When building ACPI tables for CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled in the MADT.
Furthermore, this is also needed if we are going to support CPU
hotplug in the future.

This patch is a rework based on Andrew Jones's contribution at
https://lists.gnu.org/archive/html/qemu-arm/2018-07/msg00076.html

Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 14 ++
 hw/arm/virt.c|  2 ++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f9c9df916c..bb91152fe2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -61,13 +61,16 @@
 
 static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
 {
-MachineState *ms = MACHINE(vms);
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 uint16_t i;
 
-for (i = 0; i < ms->smp.cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
 aml_append(scope, dev);
 }
 }
@@ -479,6 +482,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 const int *irqmap = vms->irqmap;
 AcpiMadtGenericDistributor *gicd;
 AcpiMadtGenericMsiFrame *gic_msi;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 int i;
 
 acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -489,7 +493,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -504,7 +508,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicc->cpu_interface_number = cpu_to_le32(i);
 gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
 gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (possible_cpus->cpus[i].cpu != NULL) {
+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
 
 if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
 gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c133b342b8..75659502e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2047,6 +2047,8 @@ static void machvirt_init(MachineState *machine)
 
 qdev_realize(DEVICE(cpuobj), NULL, _fatal);
 object_unref(cpuobj);
+/* Initialize cpu member here since cpu hotplug is not supported yet */
+machine->possible_cpus->cpus[n].cpu = cpuobj;
 }
 fdt_add_timer_nodes(vms);
 fdt_add_cpu_nodes(vms);
-- 
2.23.0




[RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path

2021-02-25 Thread Ying Fang
qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 include/sysemu/device_tree.h |  1 +
 softmmu/device_tree.c| 45 ++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
index b9a3ddc518..1e3857ca0c 100644
--- a/softmmu/device_tree.c
+++ b/softmmu/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 
 retval = fdt_add_subnode(fdt, parent, basename);
 if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
 exit(1);
 }
 
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = current_machine->dumpdtb;
-- 
2.23.0




[RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure

2021-02-25 Thread Ying Fang
Add the processor hierarchy node structures to build ACPI information
for CPU topology. Since the private resources may be used to describe
the cache hierarchy, and they vary among different topology levels,
three helpers are introduced to describe the hierarchy.

(1) build_socket_hierarchy for socket description
(2) build_processor_hierarchy for processor description
(3) build_thread_hierarchy for thread (logical processor) description

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 40 +
 include/hw/acpi/acpi-defs.h | 13 
 include/hw/acpi/aml-build.h |  7 +++
 3 files changed, 60 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a2cd7a5830..a0af3e9d73 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
  table_data->len - slit_start, 1, oem_id, oem_table_id);
 }
 
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4);
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD |
+  ACPI_PPTT_ACPI_LEAF_NODE, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent , 4); /* parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index cf9f44299c..45e10d886f 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,17 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+enum {
+ACPI_PPTT_TYPE_PROCESSOR = 0,
+ACPI_PPTT_TYPE_CACHE,
+ACPI_PPTT_TYPE_ID,
+ACPI_PPTT_TYPE_RESERVED
+};
+
+#define ACPI_PPTT_PHYSICAL_PACKAGE  (1)
+#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID   (1 << 1)
+#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD  (1 << 2)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_LEAF_NODE(1 << 3)  /* ACPI 6.3 */
+#define ACPI_PPTT_ACPI_IDENTICAL(1 << 4)  /* ACPI 6.3 */
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 380d3e3924..7f0ca1a198 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 const char *oem_id, const char *oem_table_id);
 
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.23.0




[RFC PATCH 5/5] hw/arm/virt-acpi-build: add PPTT table

2021-02-25 Thread Ying Fang
Add the Processor Properties Topology Table (PPTT) to present
CPU topology information to the guest. A three-level cpu
topology is built, in accordance with what the linux kernel currently does.

Tested-by: Jiajie Li 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 50 
 1 file changed, 50 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bb91152fe2..38d50ce66c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -436,6 +436,50 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  vms->oem_table_id);
 }
 
+static void
+build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket = 0;
+MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+  ACPI_PPTT_ACPI_LEAF_NODE,
+  socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data,
+  ACPI_PPTT_ACPI_PROCESSOR_ID_VALID,
+  socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_thread_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2,
+ vms->oem_id, vms->oem_table_id);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -688,6 +732,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
 MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->no_cpu_topology;
 
 table_offsets = g_array_new(false, true /* clear */,
 sizeof(uint32_t));
@@ -707,6 +752,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (ms->smp.cpus > 1 && cpu_topology_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, vms);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.23.0
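
A quick way to inspect the PPTT the guest actually received (a sketch;
assumes acpica-tools is installed in the guest):

    # in the guest, as root
    cp /sys/firmware/acpi/tables/PPTT /tmp/PPTT.dat
    iasl -d /tmp/PPTT.dat    # disassembles to /tmp/PPTT.dsl
    lscpu                    # cross-check sockets/cores/threads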




[RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support

2021-02-25 Thread Ying Fang
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems, so a cpu topology description
is helpful to provide the guest with the right view. Dario Faggioli's talk
in [0] also shows that the virtual topology may have an impact on scheduler
performance. Thus this patch series is posted to introduce cpu topology
support for the arm platform.

Both fdt and ACPI are introduced to present the cpu topology. To describe
the cpu topology via ACPI, a PPTT table is introduced according to the
processor hierarchy node structure. This series is derived from [1]; in
[1] we were trying to bring both cpu and cache topology support to the arm
platform, but there are still some issues to solve to support the cache
hierarchy. So we split the cpu topology part out and send it separately.
The patch series to support the cache hierarchy will be sent later, since
Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and
he is waiting for it to be upstreamed.

This patch series was initially based on the patches posted by Andrew Jones [2].
I jumped in on it since some OS vendor partners are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-6.0 and a cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
messages below from the lscpu command.
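
A sketch of the kind of invocation meant here (disk, console and firmware
options elided; -cpu host with KVM per the test setup described in the
replies above):

    qemu-system-aarch64 -machine virt-6.0 -enable-kvm -cpu host \
        -smp 16,sockets=2,cores=4,threads=2 -m 4G -nographic ...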

---------------------
Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

[0] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
[2] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Ying Fang (5):
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: Add cpu-map to device tree
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table

 hw/acpi/aml-build.c  | 40 ++
 hw/arm/virt-acpi-build.c | 64 +---
 hw/arm/virt.c| 40 +-
 include/hw/acpi/acpi-defs.h  | 13 
 include/hw/acpi/aml-build.h  |  7 
 include/hw/arm/virt.h|  1 +
 include/sysemu/device_tree.h |  1 +
 softmmu/device_tree.c| 45 +++--
 8 files changed, 204 insertions(+), 7 deletions(-)

-- 
2.23.0




Re: [PATCH v4 0/7] block: Add retry for werror=/rerror= mechanism

2021-01-24 Thread Ying Fang

Kindly pinging for it.

Thanks for Stefan's suggestion; we have re-implemented the concept by
introducing a 'retry' feature based on the werror=/rerror= mechanism.

Hope this thread won't be missed. Any comments and reviews are welcome.

Thanks.
Ying Fang.

On 12/15/2020 8:30 PM, Jiahui Cen wrote:

A VM in the cloud environment may use a virtual disk as the backend storage,
and there are usually filesystems on the virtual block device. When the backend
storage is temporarily down, any I/O issued to the virtual block device
will cause an error. For example, an error in an ext4 filesystem would
make the filesystem read-only. In a production environment, cloud backend
storage can soon be recovered. For example, an IP-SAN may be down due to
network failure and will be online soon after the network is recovered. However,
the error in the filesystem may not be recovered without a device reattach
or a system restart. Thus an I/O retry mechanism is needed to implement a
self-healing system.

This patch series proposes to extend the werror=/rerror= mechanism to add
a 'retry' feature. It can automatically retry failed I/O requests on error
without sending the error back to the guest, so the guest can get back to
running smoothly when I/O is recovered.
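
As a sketch of how this would be used once the series lands ('retry' is
the new error action added by patch 1; the retry timeout knob from
patches 5-6 is omitted here since its exact option name is defined by the
series):

    -drive file=/dev/sdx,format=raw,if=none,id=d0,werror=retry,rerror=retry \
    -device virtio-blk-pci,drive=d0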

v3->v4:
* Adapt to werror=/rerror= mechanism.

v2->v3:
* Add a doc to describe I/O hang.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect remove of rehandle list.
* Provide rehandle pause interface.

REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 

Jiahui Cen (7):
   qapi/block-core: Add retry option for error action
   block-backend: Introduce retry timer
   block-backend: Add device specific retry callback
   block-backend: Enable retry action on errors
   block-backend: Add timeout support for retry
   block: Add error retry param setting
   virtio_blk: Add support for retry on errors

  block/block-backend.c  | 66 
  blockdev.c | 52 +++
  hw/block/block.c   | 10 +++
  hw/block/virtio-blk.c  | 19 +-
  include/hw/block/block.h   |  7 ++-
  include/sysemu/block-backend.h | 10 +++
  qapi/block-core.json   |  4 +-
  7 files changed, 162 insertions(+), 6 deletions(-)





Re: [RFC PATCH v3 10/13] target/arm/cpu: Add cpu cache description for arm

2021-01-12 Thread Ying Fang




On 11/30/2020 9:00 PM, Peter Maydell wrote:

On Mon, 9 Nov 2020 at 03:05, Ying Fang  wrote:


Add the CPUCacheInfo structure to hold cpu cache information for ARM cpus.
A classic three-level cache topology is used here. Default cache
capacities are given, and userspace can override these values.

Signed-off-by: Ying Fang 
---
  target/arm/cpu.c | 42 ++
  target/arm/cpu.h | 27 +++
  2 files changed, 69 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 056319859f..f1bac7452c 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -27,6 +27,7 @@
  #include "qapi/visitor.h"
  #include "cpu.h"
  #include "internals.h"
+#include "qemu/units.h"
  #include "exec/exec-all.h"
  #include "hw/qdev-properties.h"
  #if !defined(CONFIG_USER_ONLY)
@@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
  return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
  }

+static CPUCaches default_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x02,
+},


Would it be possible to populate this structure from the
CLIDR/CCSIDR ID register values, rather than having to
specify the same thing in two places?


Sorry, I missed this reply.

I had tried to fetch the CLIDR/CCSIDR ID register values of the host cpu
from KVM; however, I did not get the values I expected. Maybe I made
some mistakes on the KVM side.

Thanks for your guidance, I'll try to populate them again.



thanks
-- PMM



Thanks.
Ying.
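
For reference, the cache values the guest ends up with can be
cross-checked from sysfs inside the guest (a sketch):

    cat /sys/devices/system/cpu/cpu0/cache/index0/level
    cat /sys/devices/system/cpu/cpu0/cache/index0/type
    cat /sys/devices/system/cpu/cpu0/cache/index0/size
    cat /sys/devices/system/cpu/cpu0/cache/index0/ways_of_associativity
    cat /sys/devices/system/cpu/cpu0/cache/index0/number_of_sets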



Re: [PATCH] hw/arm/virt: Remove virt machine state 'smp_cpus'

2020-12-16 Thread Ying Fang




On 12/16/2020 1:48 AM, Andrew Jones wrote:

virt machine's 'smp_cpus' and machine->smp.cpus must always have the
same value. And, anywhere we have virt machine state we have machine
state. So let's remove the redundancy. Also, to make it easier to see
that machine->smp is the true source for "smp_cpus" and "max_cpus",
avoid passing them in function parameters, preferring instead to get
them from the state.

No functional change intended.

Signed-off-by: Andrew Jones 


Reviewed-by: Ying Fang 


---
  hw/arm/virt-acpi-build.c |  9 +
  hw/arm/virt.c| 24 +++-
  include/hw/arm/virt.h|  3 +--
  3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 711cf2069fe8..9d9ee2405345 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -59,11 +59,12 @@
  
  #define ACPI_BUILD_TABLE_SIZE 0x2
  
-static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)

+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
  {
+MachineState *ms = MACHINE(vms);
  uint16_t i;
  
-for (i = 0; i < smp_cpus; i++) {

+for (i = 0; i < ms->smp.cpus; i++) {
  Aml *dev = aml_device("C%.03X", i);
  aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
  aml_append(dev, aml_name_decl("_UID", aml_int(i)));
@@ -484,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
  gicd->version = vms->gic_version;
  
-for (i = 0; i < vms->smp_cpus; i++) {

+for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
  AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
 sizeof(*gicc));
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -603,7 +604,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   * the RTC ACPI device at all when using UEFI.
   */
  scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, vms->smp_cpus);
+acpi_dsdt_add_cpus(scope, vms);
  acpi_dsdt_add_uart(scope, [VIRT_UART],
 (irqmap[VIRT_UART] + ARM_SPI_BASE));
  if (vmc->acpi_expose_flash) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 556592012ee0..534d306f3104 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -323,7 +323,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
  if (vms->gic_version == VIRT_GIC_VERSION_2) {
  irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
   GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
  }
  
  qemu_fdt_add_subnode(vms->fdt, "/timer");

@@ -347,9 +347,9 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
  
  static void fdt_add_cpu_nodes(const VirtMachineState *vms)

  {
-int cpu;
-int addr_cells = 1;
  const MachineState *ms = MACHINE(vms);
+int smp_cpus = ms->smp.cpus, cpu;
+int addr_cells = 1;
  
  /*
   * From Documentation/devicetree/bindings/arm/cpus.txt
@@ -364,7 +364,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
   *  The simplest way to go is to examine affinity IDs of all our CPUs. If
   *  at least one of them has Aff3 populated, we set #address-cells to 2.
   */
-for (cpu = 0; cpu < vms->smp_cpus; cpu++) {
+for (cpu = 0; cpu < smp_cpus; cpu++) {
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
  
  if (armcpu->mp_affinity & ARM_AFF3_MASK) {

@@ -377,7 +377,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
  qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0);
  
-for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
+for (cpu = smp_cpus - 1; cpu >= 0; cpu--) {
  char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
  CPUState *cs = CPU(armcpu);
@@ -387,8 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  qemu_fdt_setprop_string(vms->fdt, nodename, "compatible",
  armcpu->dtb_compatible);
  
-if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED
-&& vms->smp_cpus > 1) {
+if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED && smp_cpus > 1) {
  qemu_fdt_setprop_string(vms->fdt, nodename,
  "enable-

Re: [RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus

2020-11-17 Thread Ying Fang




On 11/9/2020 6:45 PM, Salil Mehta wrote:

Hi Fangying,
A trivial thing. This patch looks bit of a noise in this patch-set. Better
to send it as a separate patch-set and get it accepted.


Hmm, this patch looks like a code refactor for the somewhat confusing
*smp_cpus*, which will tidy up the code. Maybe Andrew could do that.


Thanks


From: fangying
Sent: Monday, November 9, 2020 3:05 AM
To: peter.mayd...@linaro.org
Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; drjo...@redhat.com;
imamm...@redhat.com; shannon.zha...@gmail.com; alistair.fran...@wdc.com;
Zhanghailiang ; Salil Mehta

Subject: [RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus

From: Andrew Jones 

Prefer to spell out the smp.cpus and smp.max_cpus machine state
variables in order to make grepping easier and to avoid any
confusion as to what cpu count is being used where.

Signed-off-by: Andrew Jones 
---
  hw/arm/virt-acpi-build.c |  8 +++
  hw/arm/virt.c| 51 +++-
  include/hw/arm/virt.h|  2 +-
  3 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9747a6458f..a222981737 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,11 +57,11 @@

  #define ARM_SPI_BASE 32

-static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
  {
  uint16_t i;

-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < cpus; i++) {
  Aml *dev = aml_device("C%.03X", i);
  aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
  aml_append(dev, aml_name_decl("_UID", aml_int(i)));
@@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
  gicd->version = vms->gic_version;

-for (i = 0; i < vms->smp_cpus; i++) {
+for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
  AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
 sizeof(*gicc));
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -599,7 +599,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
   * the RTC ACPI device at all when using UEFI.
   */
  scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, vms->smp_cpus);
+acpi_dsdt_add_cpus(scope, ms->smp.cpus);
 acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
 (irqmap[VIRT_UART] + ARM_SPI_BASE));
  if (vmc->acpi_expose_flash) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e465a988d6..0069fa1298 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
  if (vms->gic_version == VIRT_GIC_VERSION_2) {
  irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
   GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
  }

  qemu_fdt_add_subnode(vms->fdt, "/timer");
@@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
   *  The simplest way to go is to examine affinity IDs of all our CPUs. If
   *  at least one of them has Aff3 populated, we set #address-cells to 2.
   */
-for (cpu = 0; cpu < vms->smp_cpus; cpu++) {
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));

  if (armcpu->mp_affinity & ARM_AFF3_MASK) {
@@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
  qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0);

-for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
  char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
  CPUState *cs = CPU(armcpu);
@@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  armcpu->dtb_compatible);

  if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED
-&& vms->smp_cpus > 1) {
+&& ms->smp.cpus > 1) {
  qemu_fdt_setprop_string(vms->fdt, nodename,
  "enable-method", "psci");
  }
@@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms)
  if (vms->gic_version == VIRT_GIC_VERSION_2) {
  irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
   GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
  }

  qemu_fdt_add_subnode(vms->fdt, "/pmu");
@@ -622,14 

Re: Question on UEFI ACPI tables setup and probing on arm64

2020-11-09 Thread Ying Fang




On 11/7/2020 1:09 AM, Laszlo Ersek wrote:

On 11/05/20 05:30, Ying Fang wrote:


I see that in QEMU the *loader_start* is fixed at 1 GiB in the
physical address space, which points to the DRAM base. Correspondingly,
in ArmVirtQemu.dsc PcdDeviceTreeInitialBaseAddress is set to 0x40000000.

Here I also see the discussion about DRAM base for ArmVirtQemu.
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html

I am still not sure how UEFI knows that it is running on an ArmVirtQemu
machine type.


It doesn't know. It remains a convention.

This part is not auto-detected; the constants in QEMU and edk2 are
independently open-coded, their values were synchronized by human effort
initially.

The user or the management layer has to make sure they boot a UEFI
firmware binary that is compatible with the machine type.

There is some meta-data to help with that:



Thanks so much for the reply,
I now have the basic understanding how QEMU and EDK2 works together
after reading the docs and code there.


Does UEFI derive it from the fdt *compatible* property?


Please see the schema "docs/interop/firmware.json" in the QEMU tree; in
particular the @FirmwareTarget element.

For an actual example: QEMU bundles some edk2 firmware binaries (purely
as a convenience, not for production), and those are accompanied by
matching descriptor files. See
"pc-bios/descriptors/60-edk2-aarch64.json". (It is a template that's
fixed up during QEMU installation, but that's tangential here.)

 "targets": [
 {
 "architecture": "aarch64",
 "machines": [
 "virt-*"
 ]
 }
 ],



Thanks, I'll look closer into it.


Thanks
Laszlo






[RFC PATCH v3 13/13] hw/arm/virt-acpi-build: Enable cpu and cache topology

2020-11-08 Thread Ying Fang
A helper struct AcpiCacheOffset is introduced to describe the offsets
of the three cache levels. The cache hierarchy is built according to
ACPI spec v6.3 5.2.29.2. Let's enable CPU cache topology now.

Signed-off-by: Ying Fang 
---
 hw/acpi/aml-build.c | 19 +-
 hw/arm/virt-acpi-build.c| 52 -
 include/hw/acpi/acpi-defs.h |  6 +
 include/hw/acpi/aml-build.h |  7 ++---
 4 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 1a38110149..93a81fbaf5 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1799,27 +1799,32 @@ void build_cache_hierarchy(GArray *tbl,
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
-void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+void build_socket_hierarchy(GArray *tbl, uint32_t parent,
+uint32_t offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 24); /* Length, with private resources */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
 build_append_int_noprefix(tbl, parent, 4);  /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 1, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset, 4);  /* Private resources */
 }
 
-void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-   uint32_t parent, uint32_t id)
+void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent,
+   AcpiCacheOffset offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 32); /* Length, with private resources */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, flags, 4);  /* Flags */
 build_append_int_noprefix(tbl, parent, 4); /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 3, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset.l1d_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l1i_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l2_offset, 4);  /* Private resources */
 }
 
 void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 5784370257..ad49006b42 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,29 +429,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
-static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache,
+   AcpiCacheInfo *acpi_cache)
 {
+acpi_cache->size = cpu_cache->size;
+acpi_cache->sets = cpu_cache->sets;
+acpi_cache->associativity = cpu_cache->associativity;
+acpi_cache->attributes = cpu_cache->attributes;
+acpi_cache->line_size = cpu_cache->line_size;
+}
+
+static void build_pptt(GArray *table_data, BIOSLinker *linker,
+   VirtMachineState *vms)
+{
+MachineState *ms = MACHINE(vms);
 int pptt_start = table_data->len;
 int uid = 0, cpus = 0, socket;
 unsigned int smp_cores = ms->smp.cores;
 unsigned int smp_threads = ms->smp.threads;
+AcpiCacheOffset offset;
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus));
+AcpiCacheInfo cache_info;
 
 acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
 for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
-uint32_t socket_offset = table_data->len - pptt_start;
+uint32_t l3_offset = table_data->len - pptt_start;
+uint32_t socket_offset;
 int core;
 
-build_socket_hierarchy(table_data, 0, socket);
+/* L3 cache type structure */
+arm_acpi_cache_info(cpu->caches.l3_cache, &cache_info);
+build_cache_hierarchy(table_data, 0, &cache_info);
+
+socket_offset = table_data->len - pptt_start;
+build_socket_hierarchy(table_data, 0, l3_offset, socket);
 
 for (core = 0; core < smp_cores; core++) {
 uint32_t core_offset = table_data->len - pptt_start;
 int th

[RFC PATCH v3 03/13] hw/arm/virt: Replace smp_parse with one that prefers cores

2020-11-08 Thread Ying Fang
From: Andrew Jones 

The virt machine type has never used the CPU topology parameters, other
than number of online CPUs and max CPUs. When choosing how to allocate
those CPUs the default has been to assume cores. In preparation for
using the other CPU topology parameters let's use an smp_parse that
prefers cores over sockets. We can also enforce the topology matches
max_cpus check because we have no legacy to preserve.
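
For illustration (examples of mine, not from the patch), the parser
below fills in missing values like so:

    -smp 8                    ->  sockets=1, cores=8, threads=1
    -smp 8,sockets=2          ->  sockets=2, cores=4, threads=1
    -smp 8,cores=2,threads=2  ->  sockets=2, cores=2, threads=2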

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea24b576c6..ba902b53ba 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,8 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
+#include "qapi/qmp/qerror.h"
+#include "sysemu/replay.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -2444,6 +2446,79 @@ static int virt_kvm_type(MachineState *ms, const char 
*type_str)
 return requested_pa_size > 40 ? requested_pa_size : 0;
 }
 
+/*
+ * Unlike smp_parse() in hw/core/machine.c, we prefer cores over sockets,
+ * e.g. '-smp 8' creates 1 socket with 8 cores.  Whereas '-smp 8' with
+ * hw/core/machine.c's smp_parse() creates 8 sockets, each with 1 core.
+ * Additionally, we can enforce the topology matches max_cpus check,
+ * because we have no legacy to preserve.
+ */
+static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
+{
+if (opts) {
+unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
+unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
+unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+
+/*
+ * Compute missing values; prefer cores over sockets and
+ * sockets over threads.
+ */
+if (cpus == 0 || cores == 0) {
+sockets = sockets > 0 ? sockets : 1;
+threads = threads > 0 ? threads : 1;
+if (cpus == 0) {
+cores = cores > 0 ? cores : 1;
+cpus = cores * threads * sockets;
+} else {
+ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+cores = ms->smp.max_cpus / (sockets * threads);
+}
+} else if (sockets == 0) {
+threads = threads > 0 ? threads : 1;
+sockets = cpus / (cores * threads);
+sockets = sockets > 0 ? sockets : 1;
+} else if (threads == 0) {
+threads = cpus / (cores * sockets);
+threads = threads > 0 ? threads : 1;
+} else if (sockets * cores * threads < cpus) {
+error_report("cpu topology: "
+ "sockets (%u) * cores (%u) * threads (%u) < "
+ "smp_cpus (%u)",
+ sockets, cores, threads, cpus);
+exit(1);
+}
+
+ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+
+if (ms->smp.max_cpus < cpus) {
+error_report("maxcpus must be equal to or greater than smp");
+exit(1);
+}
+
+if (sockets * cores * threads != ms->smp.max_cpus) {
+error_report("cpu topology: "
+ "sockets (%u) * cores (%u) * threads (%u)"
+ "!= maxcpus (%u)",
+ sockets, cores, threads,
+ ms->smp.max_cpus);
+exit(1);
+}
+
+ms->smp.cpus = cpus;
+ms->smp.cores = cores;
+ms->smp.threads = threads;
+ms->smp.sockets = sockets;
+}
+
+if (ms->smp.cpus > 1) {
+Error *blocker = NULL;
+error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
+replay_add_blocker(blocker);
+}
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -2469,6 +2544,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+mc->smp_parse = virt_smp_parse;
 mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
-- 
2.23.0




[RFC PATCH v3 08/13] hw/acpi/aml-build: add processor hierarchy node structure

2020-11-08 Thread Ying Fang
Add the processor hierarchy node structures to build ACPI information
for cpu topology. Three helpers are introduced:

(1) build_socket_hierarchy for socket description structure
(2) build_processor_hierarchy for processor description structure
(3) build_smt_hierarchy for thread (logic processor) description structure

We split the processor hierarchy node structure descriptions into
three helpers even though there are some identical code snippets
between them. The reason is that the private resources vary among
the different topology levels. This makes the ACPI PPTT table much
more readable and easier to construct.
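
As a rough usage sketch (illustrative only; it mirrors how build_pptt()
in a later patch of this series drives these helpers, with offsets
taken relative to the start of the PPTT table):

    uint32_t socket_offset = tbl->len - pptt_start;
    build_socket_hierarchy(tbl, 0, socket_id);
    for (core = 0; core < smp_cores; core++) {
        uint32_t core_offset = tbl->len - pptt_start;
        build_processor_hierarchy(tbl, 0, socket_offset, core);
        for (thread = 0; thread < smp_threads; thread++) {
            build_smt_hierarchy(tbl, core_offset, uid++);
        }
    }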

Cc: Igor Mammedov 
Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 37 +
 include/hw/acpi/aml-build.h |  7 +++
 2 files changed, 44 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 3792ba96ce..d1aa9fd716 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);/* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl, 0x0e, 4);  /* Processor is a thread */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index fe0055fffb..56474835a7 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.23.0




[RFC PATCH v3 11/13] hw/arm/virt: add fdt cache information

2020-11-08 Thread Ying Fang
Support devicetree cpu cache information descriptions
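
With the default cache description added elsewhere in this series, the
generated L3 node would look roughly like this (an illustrative dts
fragment, not literal QEMU output):

    l3-cache0 {
        compatible = "cache";
        cache-unified = "true";
        cache-level = <3>;
        cache-size = <0x4000000>;       /* 64 MiB */
        cache-line-size = <64>;
        cache-sets = <2048>;
    };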

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b6cebb5549..21275e03c2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -346,6 +346,89 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags);
 }
 
+static void fdt_add_l3cache_nodes(const VirtMachineState *vms)
+{
+int i;
+const MachineState *ms = MACHINE(vms);
+ARMCPU *cpu = ARM_CPU(first_cpu);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+
+for (i = 0; i < sockets; i++) {
+char *nodename = g_strdup_printf("/cpus/l3-cache%d", i);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l3_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l3_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l3_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+}
+
+static void fdt_add_l2cache_nodes(const VirtMachineState *vms)
+{
+int i, j;
+const MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+ARMCPU *cpu = ARM_CPU(first_cpu);
+
+for (i = 0; i < sockets; i++) {
+char *next_path = g_strdup_printf("/cpus/l3-cache%d", i);
+for (j = 0; j < smp_cores; j++) {
+char *nodename = g_strdup_printf("/cpus/l2-cache%d",
+  i * smp_cores + j);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l2_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l2_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l2_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename,
+  "next-level-cache", next_path);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+g_free(next_path);
+}
+}
+
+static void fdt_add_l1cache_prop(const VirtMachineState *vms,
+char *nodename, int cpu_index)
+{
+
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index));
+CPUCaches caches = cpu->caches;
+
+char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index);
+
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size",
+  caches.l1d_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size",
+  caches.l1d_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets",
+  caches.l1d_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size",
+  caches.l1i_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size",
+  caches.l1i_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets",
+  caches.l1i_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache",
+  cachename);
+g_free(cachename);
+}
+
 static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 {
 int cpu;
@@ -379,6 +462,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#s

[RFC PATCH v3 10/13] target/arm/cpu: Add cpu cache description for arm

2020-11-08 Thread Ying Fang
Add the CPUCacheInfo structure to hold cpu cache information for ARM cpus.
A classic three-level cache topology is used here. Default cache
capacities are given, and userspace can override these values.

Signed-off-by: Ying Fang 
---
 target/arm/cpu.c | 42 ++
 target/arm/cpu.h | 27 +++
 2 files changed, 69 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 056319859f..f1bac7452c 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "cpu.h"
 #include "internals.h"
+#include "qemu/units.h"
 #include "exec/exec-all.h"
 #include "hw/qdev-properties.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
 return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
 }
 
+static CPUCaches default_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x02,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x04,
+},
+.l2_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 2,
+.size = 512 * KiB,
+.line_size = 64,
+.associativity = 8,
+.sets = 1024,
+.attributes = 0x0a,
+},
+.l3_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 3,
+.size = 65536 * KiB,
+.line_size = 64,
+.associativity = 15,
+.sets = 2048,
+.attributes = 0x0a,
+},
+};
+
 static void cpreg_hashtable_data_destroy(gpointer data)
 {
 /*
@@ -1841,6 +1881,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 }
 }
 
+cpu->caches = default_cache_info;
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cfff1b5c8f..dbc33a9802 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -746,6 +746,30 @@ typedef enum ARMPSCIState {
 
 typedef struct ARMISARegisters ARMISARegisters;
 
+/* Cache information type */
+enum CacheType {
+DATA_CACHE,
+INSTRUCTION_CACHE,
+UNIFIED_CACHE
+};
+
+typedef struct CPUCacheInfo {
+enum CacheType type;  /* Cache type */
+uint8_t level;
+uint32_t size;/* Size in bytes */
+uint16_t line_size;   /* Line size in bytes */
+uint8_t associativity;/* Cache associativity */
+uint32_t sets;/* Number of sets */
+uint8_t attributes;   /* Cache attributes */
+} CPUCacheInfo;
+
+typedef struct CPUCaches {
+CPUCacheInfo *l1d_cache;
+CPUCacheInfo *l1i_cache;
+CPUCacheInfo *l2_cache;
+CPUCacheInfo *l3_cache;
+} CPUCaches;
+
 /**
  * ARMCPU:
  * @env: #CPUARMState
@@ -987,6 +1011,9 @@ struct ARMCPU {
 
 /* Generic timer counter frequency, in Hz */
 uint64_t gt_cntfrq_hz;
+
+/* CPU cache information */
+CPUCaches caches;
 };
 
 unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
-- 
2.23.0




[RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus

2020-11-08 Thread Ying Fang
From: Andrew Jones 

Prefer to spell out the smp.cpus and smp.max_cpus machine state
variables in order to make grepping easier and to avoid any
confusion as to what cpu count is being used where.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt-acpi-build.c |  8 +++
 hw/arm/virt.c| 51 +++-
 include/hw/arm/virt.h|  2 +-
 3 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9747a6458f..a222981737 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,11 +57,11 @@
 
 #define ARM_SPI_BASE 32
 
-static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
 {
 uint16_t i;
 
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < cpus; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
@@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < vms->smp_cpus; i++) {
+for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -599,7 +599,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  * the RTC ACPI device at all when using UEFI.
  */
 scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, vms->smp_cpus);
+acpi_dsdt_add_cpus(scope, ms->smp.cpus);
 acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e465a988d6..0069fa1298 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
  GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
 }
 
 qemu_fdt_add_subnode(vms->fdt, "/timer");
@@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  *  The simplest way to go is to examine affinity IDs of all our CPUs. If
  *  at least one of them has Aff3 populated, we set #address-cells to 2.
  */
-for (cpu = 0; cpu < vms->smp_cpus; cpu++) {
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
 
 if (armcpu->mp_affinity & ARM_AFF3_MASK) {
@@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0);
 
-for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
 char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
 CPUState *cs = CPU(armcpu);
@@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 armcpu->dtb_compatible);
 
 if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED
-&& vms->smp_cpus > 1) {
+&& ms->smp.cpus > 1) {
 qemu_fdt_setprop_string(vms->fdt, nodename,
 "enable-method", "psci");
 }
@@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms)
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
  GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
 }
 
 qemu_fdt_add_subnode(vms->fdt, "/pmu");
@@ -622,14 +622,13 @@ static void create_gic(VirtMachineState *vms)
 SysBusDevice *gicbusdev;
 const char *gictype;
 int type = vms->gic_version, i;
-unsigned int smp_cpus = ms->smp.cpus;
 uint32_t nb_redist_regions = 0;
 
 gictype = (type == 3) ? gicv3_class_name() : gic_class_name();
 
 vms->gic = qdev_new(gictype);
 qdev_prop_set_uint32(vms->gic, "revision", type);
-qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus);
+qdev_prop_set_uint32(vms->gic, "num-cpu", ms->smp.cpus);
 /* Note that the num-irq property counts both internal and external
  * interrupts; there are always 32 of the former (mandated by GIC spec).
  */
@@ -641,7 +640,7 @@ static void 

[RFC PATCH v3 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus

2020-11-08 Thread Ying Fang
When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled in the MADT.
Furthermore, this is also needed if we are going to support CPU
hotplug in the future.

This patch is a rework based on Andrew Jones's contribution at
https://lists.gnu.org/archive/html/qemu-arm/2018-07/msg00076.html
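
Illustratively (a hand-written ASL sketch, not output shipped with the
patch), a possible-but-not-present CPU ends up in the DSDT as a device
whose _STA reports 0:

    Device (C004) {
        Name (_HID, "ACPI0007")
        Name (_UID, 0x04)
        Name (_STA, Zero)
    }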

Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 17 -
 hw/arm/virt.c|  3 +++
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a222981737..9edd6385dc 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,14 +57,18 @@
 
 #define ARM_SPI_BASE 32
 
-static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
 {
 uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 
-for (i = 0; i < cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
 aml_append(scope, dev);
 }
 }
@@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 const int *irqmap = vms->irqmap;
 AcpiMadtGenericDistributor *gicd;
 AcpiMadtGenericMsiFrame *gic_msi;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 int i;
 
 acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 gicc->cpu_interface_number = cpu_to_le32(i);
 gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
 gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (possible_cpus->cpus[i].cpu != NULL) {
+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
 
  if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
 gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  * the RTC ACPI device at all when using UEFI.
  */
 scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, ms->smp.cpus);
+acpi_dsdt_add_cpus(scope, vms);
 acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d23b941020..b6cebb5549 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1977,6 +1977,9 @@ static void machvirt_init(MachineState *machine)
 
 qdev_realize(DEVICE(cpuobj), NULL, _fatal);
 object_unref(cpuobj);
+
+/* Initialize cpu member here since cpu hotplug is not supported yet */
+machine->possible_cpus->cpus[n].cpu = cpuobj;
 }
 fdt_add_timer_nodes(vms);
 fdt_add_cpu_nodes(vms);
-- 
2.23.0




[RFC PATCH v3 05/13] hw: add compat machines for 5.3

2020-11-08 Thread Ying Fang
Add 5.3 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c  |  9 -
 hw/core/machine.c  |  3 +++
 hw/i386/pc.c   |  3 +++
 hw/i386/pc_piix.c  | 15 ++-
 hw/i386/pc_q35.c   | 14 +-
 hw/ppc/spapr.c | 15 +--
 hw/s390x/s390-virtio-ccw.c | 14 +-
 include/hw/boards.h|  3 +++
 include/hw/i386/pc.h   |  3 +++
 9 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ba902b53ba..ff8a14439e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
+static void virt_machine_5_3_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
+
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+virt_machine_5_3_options(mc);
+compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
+DEFINE_VIRT_MACHINE(5, 2)
 
 static void virt_machine_5_1_options(MachineClass *mc)
 {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7e2f4ec08e..6dc77699a9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,9 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
+GlobalProperty hw_compat_5_2[] = { };
+const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
+
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
 { "vhost-user-blk", "num-queues", "1"},
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e87be5d29a..eaa046ff5d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,6 +97,9 @@
 #include "trace.h"
 #include CONFIG_DEVICES
 
+GlobalProperty pc_compat_5_2[] = { };
+const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2);
+
 GlobalProperty pc_compat_5_1[] = {
 { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
 };
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 3c2ae0612b..01254090ce 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
 }
 
-static void pc_i440fx_5_2_machine_options(MachineClass *m)
+static void pc_i440fx_5_3_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_machine_options(m);
@@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL,
+  pc_i440fx_5_3_machine_options);
+
+static void pc_i440fx_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_i440fx_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
 DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
   pc_i440fx_5_2_machine_options);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a3f4959c43..dd14803edb 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->max_cpus = 288;
 }
 
-static void pc_q35_5_2_machine_options(MachineClass *m)
+static void pc_q35_5_3_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_machine_options(m);
@@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL,
+   pc_q35_5_3_machine_options);
+
+static void pc_q35_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_q35_machine_options(m);
+m->alias = NULL;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
 DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL,
pc_q35_5_2_machine_options);
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2db810f73a..c292a3edd9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4511,15 +4511,26 @@ static void spapr_machine_latest_class_options(MachineClass *mc)
 }\
 type_init(spapr_machine_register_##suffix)
 
+/*
+ * pseries-5.3
+ */
+static void spapr_machine_5_3_class_options(MachineClass *mc)
+{
+/* Defaults for the latest behaviour inherited from the base class */
+}
+
+DEFINE_SPAPR_MACHINE(5_3, "5.3", true);
+
 /*
  * pseries-5.2
  */
 static void spapr

[RFC PATCH v3 02/13] hw/arm/virt: Remove unused variable

2020-11-08 Thread Ying Fang
From: Andrew Jones 

We no longer use the smp_cpus virtual machine state variable.
Remove it.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 2 --
 include/hw/arm/virt.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0069fa1298..ea24b576c6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,8 +1820,6 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
-vms->smp_cpus = smp_cpus;
-
 if (vms->virt && kvm_enabled()) {
 error_report("mach-virt: KVM does not support providing "
  "Virtualization extensions to the guest CPU");
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 953d94acc0..010f24f580 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -151,7 +151,6 @@ struct VirtMachineState {
 MemMapEntry *memmap;
 char *pciehb_nodename;
 const int *irqmap;
-int smp_cpus;
 void *fdt;
 int fdt_size;
 uint32_t clock_phandle;
-- 
2.23.0




[RFC PATCH v3 09/13] hw/arm/virt-acpi-build: add PPTT table

2020-11-08 Thread Ying Fang
Add the Processor Properties Topology Table (PPTT) to present cpu topology
information to the guest.
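
As a rough illustration (mine, not from the patch), with
sockets:cores:threads = 2:4:2 the generated PPTT describes a hierarchy
like:

    socket0 (physical package)
        core0: thread0, thread1
        core1: thread2, thread3
        ...
    socket1 (physical package)
        core0: thread8, thread9
        ...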

Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 42 
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9edd6385dc..5784370257 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
+static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket;
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data, 2, socket_offset, uid++);
+} else {
+build_processor_hierarchy(table_data, 0, socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_smt_hierarchy(table_data, core_offset, uid++);
+}
+}
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2, NULL, NULL);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
 MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
 
 table_offsets = g_array_new(false, true /* clear */,
 sizeof(uint32_t));
@@ -688,6 +725,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (cpu_topology_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, ms);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.23.0




[RFC PATCH v3 00/13] hw/arm/virt: Introduce cpu and cache topology support

2020-11-08 Thread Ying Fang
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems. So a cpu topology description
is helpful to provide the guest with the right view. Cpu cache information may
also have a slight impact on the sched domain, and even userspace software
may check the cpu cache information to do some optimizations. Dario Faggioli's
talk in [0] also shows the virtual topology may have an impact on sched performance.
Thus this patch series is posted to provide cpu and cache topology support
for the arm platform.

Both fdt and ACPI are introduced to present the cpu and cache topology.
To describe the cpu topology via ACPI, a PPTT table is introduced according
to the processor hierarchy node structure. To describe the cpu cache
information, a default cache hierarchy is given and built according to the
cache type structure defined by ACPI, it can be made configurable later.

The RFC v1 was posted at [1]; there we tried to map the MPIDR register into
the cpu topology, however that was totally wrong. Andrew pointed out that the
Linux kernel is going to stop using MPIDR for topology information [2]. The
root cause is that the MPIDR register has been abused by ARM OEM manufacturers:
it is only an identifier for a specific cpu, not a representation of the
topology. Moreover, v2 was rebased on Andrew's latest branch shared at [4].

This patch series was initially based on the patches posted by Andrew Jones [3].
I jumped in on it since some OS vendor cooperative partner are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-5.3 and cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
below output from the lscpu command.

-
Architecture:aarch64
CPU op-mode(s):  64-bit
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   2
NUMA node(s):2
Vendor ID:   HiSilicon
Model:   0
Model name:  Kunpeng-920
Stepping:0x1
BogoMIPS:200.00
L1d cache:   512 KiB
L1i cache:   512 KiB
L2 cache:4 MiB
L3 cache:128 MiB
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15


changelog
v2 -> v3:
- Make use of possible_cpus->cpus[i].cpu to check against current online cpus

v1 -> v2:
- Rebased to the latest branch shared by Andrew Jones [4]
- Stop mapping MPIDR into vcpu topology

[0] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg06027.html
[2] 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20200829130016.26106-1-valentin.schnei...@arm.com/
[3] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com
[4] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh 

Andrew Jones (5):
  hw/arm/virt: Spell out smp.cpus and smp.max_cpus
  hw/arm/virt: Remove unused variable
  hw/arm/virt: Replace smp_parse with one that prefers cores
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: DT: add cpu-map

Ying Fang (8):
  hw: add compat machines for 5.3
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table
  target/arm/cpu: Add cpu cache description for arm
  hw/arm/virt: add fdt cache information
  hw/acpi/aml-build: Build ACPI cpu cache hierarchy information
  hw/arm/virt-acpi-build: Enable cpu and cache topology

 device_tree.c|  45 +-
 hw/acpi/aml-build.c  |  68 +
 hw/arm/virt-acpi-build.c |  99 -
 hw/arm/virt.c| 273 +++
 hw/core/machine.c|   3 +
 hw/i386/pc.c |   3 +
 hw/i386/pc_piix.c|  15 +-
 hw/i386/pc_q35.c |  14 +-
 hw/ppc/spapr.c   |  15 +-
 hw/s390x/s390-virtio-ccw.c   |  14 +-
 include/hw/acpi/acpi-defs.h  |  14 ++
 include/hw/acpi/aml-build.h  |  11 ++
 include/hw/arm/virt.h|   4 +-
 include/hw/boards.h  |   3 +
 include/hw/i386/pc.h |   3 +
 include/sysemu/device_tree.h |   1 +
 target/arm/cpu.c |  42 ++
 target/arm/cpu.h |  27 
 18 files changed, 609 insertions(+), 45 deletions(-)

-- 
2.23.0




[RFC PATCH v3 12/13] hw/acpi/aml-build: Build ACPI cpu cache hierarchy information

2020-11-08 Thread Ying Fang
To build the cache information, an AcpiCacheInfo structure is defined to
hold the type 1 cache structure according to ACPI spec v6.3 5.2.29.2.
A helper function build_cache_hierarchy is also introduced to encode
the cache information.
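
For reference, the 24-byte Type 1 structure emitted by build_cache_head()
and build_cache_tail() below is laid out as follows (field names per
ACPI 6.3; the flags value 0x7f marks all fields as valid):

    offset  size  field
    0       1     Type (1)
    1       1     Length (24)
    2       2     Reserved
    4       4     Flags (0x7f)
    8       4     Next Level of Cache
    12      4     Size
    16      4     Number of Sets
    20      1     Associativity
    21      1     Attributes
    22      2     Line Size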

Signed-off-by: Ying Fang 
---
 hw/acpi/aml-build.c | 26 ++
 include/hw/acpi/acpi-defs.h |  8 
 include/hw/acpi/aml-build.h |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index d1aa9fd716..1a38110149 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/* ACPI 6.3: 5.2.29.2 Cache type structure (Type 1) */
+static void build_cache_head(GArray *tbl, uint32_t next_level)
+{
+build_append_byte(tbl, 1);
+build_append_byte(tbl, 24);
+build_append_int_noprefix(tbl, 0, 2);
+build_append_int_noprefix(tbl, 0x7f, 4);
+build_append_int_noprefix(tbl, next_level, 4);
+}
+
+static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info)
+{
+build_append_int_noprefix(tbl, cache_info->size, 4);
+build_append_int_noprefix(tbl, cache_info->sets, 4);
+build_append_byte(tbl, cache_info->associativity);
+build_append_byte(tbl, cache_info->attributes);
+build_append_int_noprefix(tbl, cache_info->line_size, 2);
+}
+
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info)
+{
+build_cache_head(tbl, next_level);
+build_cache_tail(tbl, cache_info);
+}
+
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 38a42f409a..3df38ab449 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,12 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+typedef struct AcpiCacheInfo {
+uint32_t size;
+uint32_t sets;
+uint8_t  associativity;
+uint8_t  attributes;
+uint16_t line_size;
+} AcpiCacheInfo;
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 56474835a7..01078753a8 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info);
+
 void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
 
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-- 
2.23.0




[RFC PATCH v3 04/13] device_tree: Add qemu_fdt_add_path

2020-11-08 Thread Ying Fang
From: Andrew Jones 

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().

We'll make use of the new function in a coming patch.
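
For instance (an illustrative call, not part of the patch), on an fdt
that so far only contains the root node:

    int offset = qemu_fdt_add_path(fdt, "/cpus/cpu-map/cluster0");

creates /cpus, /cpus/cpu-map and /cpus/cpu-map/cluster0 in turn, and
returns the offset of the final node.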

Signed-off-by: Andrew Jones 
---
 device_tree.c| 45 ++--
 include/sysemu/device_tree.h |  1 +
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index b335dae707..c080909bb9 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 
 retval = fdt_add_subnode(fdt, parent, basename);
 if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
 exit(1);
 }
 
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb");
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
-- 
2.23.0




[RFC PATCH v3 06/13] hw/arm/virt: DT: add cpu-map

2020-11-08 Thread Ying Fang
From: Andrew Jones 

Support devicetree CPU topology descriptions.
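
For sockets:cores:threads = 1:2:2, the generated cpu-map would look
roughly like this (an illustrative dts fragment; the node names and the
"cpu" phandle property match what the code below generates):

    cpu-map {
        cluster0 {
            core0 {
                thread0 { cpu = <&cpu0>; };
                thread1 { cpu = <&cpu1>; };
            };
            core1 {
                thread0 { cpu = <&cpu2>; };
                thread1 { cpu = <&cpu3>; };
            };
        };
    };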

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 40 +++-
 include/hw/arm/virt.h |  1 +
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ff8a14439e..d23b941020 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,9 +351,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 
 /*
- * From Documentation/devicetree/bindings/arm/cpus.txt
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
  *  On ARM v8 64-bit systems value should be set to 2,
  *  that corresponds to the MPIDR_EL1 register size.
  *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+}
+
 g_free(nodename);
 }
+
+if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+/*
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d/%s%d",
+"cluster", cpu / (ms->smp.cores * ms->smp.threads),
+"core", (cpu / ms->smp.threads) % ms->smp.cores,
+"thread", cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d",
+"cluster", cpu / ms->smp.cores,
+"core", cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2672,8 +2707,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
 
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_5_3_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+vmc->ignore_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 010f24f580..917bd8b645 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -118,6 +118,7 @@ typedef enum VirtGICType {
 struct VirtMachineClass {
 MachineClass parent;
 bool disallow_affinity_adjustment;
+bool ignore_cpu_topology;
 bool no_its;
 bool no_pmu;
 bool claim_edge_triggered_timers;
-- 
2.23.0




Re: Question on UEFI ACPI tables setup and probing on arm64

2020-11-04 Thread Ying Fang




On 11/5/2020 5:46 AM, Laszlo Ersek wrote:

+Ard, +Drew

On 11/03/20 13:39, Igor Mammedov wrote:

On Fri, 30 Oct 2020 10:50:01 +0800
Ying Fang  wrote:


Hi,

I have a question on UEFI/ACPI tables setup and probing on arm64 platform.


CCing Laszlo,
who might know how it's implemented.
  

Currently on the arm64 platform a guest can be booted with both fdt and ACPI
supported. If ACPI is enabled, [1] says the only defined method for
passing ACPI tables to the kernel is via the UEFI system configuration
table. So AFAIK, ACPI should be dependent on UEFI.


That's correct. The ACPI entry point (RSD PTR) on AARCH64 is defined in
terms of UEFI.



What's more [2] says UEFI kernel support on the ARM architectures
is only available through a *stub*. The stub populates the FDT /chosen
node with some UEFI parameters describing the UEFI location info.


Yes.



So i dump /sys/firmware/fdt from the guest, it does have something like:

/dts-v1/;

/ {
#size-cells = <0x02>;
#address-cells = <0x02>;

chosen {
linux,uefi-mmap-desc-ver = <0x01>;
linux,uefi-mmap-desc-size = <0x30>;
linux,uefi-mmap-size = <0x810>;
linux,uefi-mmap-start = <0x04 0x3c0ce018>;
linux,uefi-system-table = <0x04 0x3f8b0018>;
bootargs = 
"BOOT_IMAGE=/vmlinuz-4.19.90-2003.4.0.0036.oe1.aarch64
root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root
rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me
smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15
crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me";
linux,initrd-end = <0x04 0x3a85a5da>;
linux,initrd-start = <0x04 0x392f2000>;
};
};

But the question is that I did not see any code adding the UEFI
properties to the fdt /chosen node in *arm_load_dtb* or anywhere else.


That's because the "UEFI stub" is a part of the guest kernel. It wraps
the guest kernel image into a UEFI application binary. For a while, the
guest kernel runs as a UEFI application, stashing some UEFI artifacts in
*a* device tree, and then (after some other heavy lifting) jumping into
the kernel proper.


QEMU only maps the OVMF binary file into a pflash device.
So I'm really confused about how UEFI information is provided to
the guest by QEMU. Does anybody know the details?


It's complex, unfortunately.

(1) QEMU always generates a DTB for the guest firmware. This DTB is
placed at the base of the guest RAM.

See the arm_load_dtb() call in virt_machine_done() [hw/arm/virt.c] in
QEMU. I think.


Hi Laszlo. Thanks so much for sharing the details with us.
The reply covers nearly the whole aarch64 boot sequence.

I see that in QEMU *loader_start* is fixed at 1 GiB in the physical
address space, which points to the DRAM base. Correspondingly,
ArmVirtQemu.dsc sets PcdDeviceTreeInitialBaseAddress to 0x40000000.

Here I also see the discussion about DRAM base for ArmVirtQemu.
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html

I am still not sure how UEFI knows that it is running on an ArmVirtQemu
machine type. Does UEFI derive it from the fdt *compatible* property?




(2) QEMU generates ACPI content, and exposes it via fw_cfg.

See the virt_acpi_setup() call in the same virt_machine_done() function
[hw/arm/virt.c] in QEMU.


(3) The fw_cfg device itself is apparent to the guest firmware via the
DTB from point (1). See the following steps in edk2:

(3a) "ArmVirtPkg/Library/PlatformPeiLib/PlatformPeiLib.c"

This saves the initial DTB (from the base of guest RAM, where it could
be overwritten by whatever) to a dynamically allocated area. This
"stashing" occurs early.

(3b) "ArmVirtPkg/FdtClientDxe/FdtClientDxe.c"

This driver exposes the (dynamically reallocated / copied) DTB via a
custom UEFI protocol to the rest of the firmware. (This happens much
later.) This protocol / driver can be considered the "owner" of the
stashed DTB from (3a).

(3c) "ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c"

This is the fw_cfg device access library, discovering the fw_cfg
registers via the above UEFI protocol. The library is linked into each
firmware module that needs fw_cfg access.
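
For reference, the fw_cfg registers that this library discovers are described
by a DTB node QEMU generates for the virt machine, along the lines of the
fragment below (illustrative; the base address is the usual VIRT_FW_CFG
mapping, check hw/arm/virt.c for the authoritative memmap):

fw-cfg@9020000 {
        compatible = "qemu,fw-cfg-mmio";
        reg = <0x0 0x9020000 0x0 0x18>;
};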


(4) The firmware interprets QEMU's DTB for actual content (parsing
values, configuring hardware, accessing devices).

This occurs in a whole bunch of locations, mostly via consuming the
custom protocol from (3b). Some info that's needed very early is parsed
out of the DTB right in step (3a).


(5) The guest firmware has a dedicated driver that checks whether QEMU
was configured with ACPI enabled or disabled, and publishes that choice
to the rest of the firmware. This is necessary because some firmware
actions / infrastructure parts cannot (must not) proceed until this
decision has been interpreted.

See in edk2:

- ArmVirtPkg/PlatformHasAcpiDtDxe

This 

Re: [RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message

2020-11-02 Thread Ying Fang




On 10/30/2020 1:20 AM, Andrew Jones wrote:


You need to remove 'Message' from the summary.

On Tue, Oct 20, 2020 at 09:14:34PM +0800, Ying Fang wrote:

When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.

Signed-off-by: Andrew Jones 


I guess my s-o-b is here because this is a rework of

https://github.com/rhdrjones/qemu/commit/b18d7a889f424b8a8679c43d7f4804fdeeeaf3fd

I think it changed enough you could just drop my authorship. A
based-on comment in the commit message would be more than enough.

Comment on the patch below.


Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a222981737..fae5a26741 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,14 +57,18 @@
  
  #define ARM_SPI_BASE 32
  
-static void acpi_dsdt_add_cpus(Aml *scope, int cpus)

+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
  {
  uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
  
-for (i = 0; i < cpus; i++) {

+for (i = 0; i < possible_cpus->len; i++) {
  Aml *dev = aml_device("C%.03X", i);
  aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
  aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
  aml_append(scope, dev);
  }
  }
@@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  const int *irqmap = vms->irqmap;
  AcpiMadtGenericDistributor *gicd;
  AcpiMadtGenericMsiFrame *gic_msi;
+int possible_cpus = MACHINE(vms)->possible_cpus->len;
  int i;
  
  acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));

@@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
  gicd->version = vms->gic_version;
  
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {

+for (i = 0; i < possible_cpus; i++) {
  AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
 sizeof(*gicc));
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicc->cpu_interface_number = cpu_to_le32(i);
  gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
  gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (i < MACHINE(vms)->smp.cpus) {


Shouldn't this be

 if (possible_cpus->cpus[i].cpu != NULL) {


+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}


I now realize that I switched to using the current CPU count as the limit
for setting the GICC enabled flag here.
However, checking for NULL is much more suitable.

Thanks,
Ying.

  
  if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {

  gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   * the RTC ACPI device at all when using UEFI.
   */
  scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, ms->smp.cpus);
+acpi_dsdt_add_cpus(scope, vms);
  acpi_dsdt_add_uart(scope, [VIRT_UART],
 (irqmap[VIRT_UART] + ARM_SPI_BASE));
  if (vmc->acpi_expose_flash) {
--
2.23.0




Thanks,
drew

.





Re: [RFC PATCH v2 09/13] hw/arm/virt-acpi-build: add PPTT table

2020-11-02 Thread Ying Fang




On 10/30/2020 12:56 AM, Andrew Jones wrote:

On Tue, Oct 20, 2020 at 09:14:36PM +0800, Ying Fang wrote:

Add the Processor Properties Topology Table (PPTT) to present CPU topology
information to the guest.

Signed-off-by: Andrew Jones 


I don't know why I have an s-o-b here. I guess it's because this code
looks nearly identical to what I wrote, except for using the new and,
IMO, unnecessary build_socket_hierarchy and build_smt_hierarchy functions.

IMHO, you should drop the last patch and just take

https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11

as it is, unless it needs to be fixed somehow.

Thanks,
drew


This patch is based on your branch; however, it is slightly modified.
As described in:

[RFC,v2,08/13] hw/acpi/aml-build: add processor hierarchy node structure

The wrapper functions build_socket_hierarchy and build_smt_hierarchy are
introduced to make the later patch more readable and to prepare for the
cache hierarchy.


Hope that doesn't cause confusion. I will drop your branch patch and
give details in the commit message in the next post.

Thanks,
Ying



Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 42 
  1 file changed, 42 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index fae5a26741..e1f3ea50ad 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   "SRAT", table_data->len - srat_start, 3, NULL, NULL);
  }
  
+static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)

+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket;
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data, 2, socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data, 0, socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_smt_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2, NULL, NULL);
+}
+
  /* GTDT */
  static void
  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
  unsigned dsdt, xsdt;
  GArray *tables_blob = tables->table_data;
  MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
  
  table_offsets = g_array_new(false, true /* clear */,

  sizeof(uint32_t));
@@ -688,6 +725,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
  acpi_add_table(table_offsets, tables_blob);
  build_madt(tables_blob, tables->linker, vms);
  
+if (cpu_topology_enabled) {

+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, ms);
+}
+
  acpi_add_table(table_offsets, tables_blob);
  build_gtdt(tables_blob, tables->linker, vms);
  
--

2.23.0




.
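
To make the nesting above concrete, the tree build_pptt() emits for a
hypothetical -smp 8,sockets=2,cores=2,threads=2 guest looks like this
(a sketch; the parents are byte offsets into the table, not pointers):

/*
 * socket0      Type 0, flags = physical package, parent = 0
 *   core0      Type 0, flags = 0, parent = socket0's offset
 *     thread0  Type 0, flags = 0x0e, parent = core0's offset, uid 0
 *     thread1  uid 1
 *   core1
 *     thread2, thread3   uid 2, 3
 * socket1
 *   core2, core3         threads with uid 4..7
 */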





Re: [RFC PATCH v2 08/13] hw/acpi/aml-build: add processor hierarchy node structure

2020-11-02 Thread Ying Fang




On 10/30/2020 1:24 AM, Andrew Jones wrote:

On Tue, Oct 20, 2020 at 09:14:35PM +0800, Ying Fang wrote:

Add the processor hierarchy node structures to build ACPI information
for CPU topology. Three helpers are introduced:

(1) build_socket_hierarchy for socket description structure
(2) build_processor_hierarchy for processor description structure
(3) build_smt_hierarchy for thread (logic processor) description structure


I see now the reason to introduce three functions is because the last
patch adds different private resources. You should point that plan out
in this commit message.


Yes, the private resources are used to describe the cache hierarchy,
and they vary among topology levels. I will point that out in the
commit message to avoid any confusion.

Thanks,
Ying



Thanks,
drew



Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
  hw/acpi/aml-build.c | 37 +
  include/hw/acpi/aml-build.h |  7 +++
  2 files changed, 44 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 3792ba96ce..da3b41b514 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
   table_data->len - slit_start, 1, NULL, NULL);
  }
  
+/*

+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);/* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl, 0x0e, 4);   /* Flags: ID valid, thread, leaf */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
  /* build rev1/rev3/rev5.1 FADT */
  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
  const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index fe0055fffb..56474835a7 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
  
  void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
  
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);

+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
  const char *oem_id, const char *oem_table_id);
  
--

2.23.0




.
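
For readers decoding the flag values these helpers use (1, 2, 0, and 0x0e),
the Type 0 flag bits from ACPI 6.3 can be spelled out as named constants.
This is a sketch with hypothetical macro names, not part of the patch:

/* ACPI 6.3 processor hierarchy node flags (illustrative names) */
#define PPTT_FLAG_PHYSICAL_PACKAGE  (1 << 0)  /* node is a physical package */
#define PPTT_FLAG_ACPI_ID_VALID     (1 << 1)  /* ACPI processor ID is valid */
#define PPTT_FLAG_IS_THREAD         (1 << 2)  /* processor is a thread */
#define PPTT_FLAG_IS_LEAF           (1 << 3)  /* node is a leaf */

/*
 * So build_smt_hierarchy()'s 0x0e is ID-valid | thread | leaf, and the
 * flags=2 passed for a thread-less core marks a valid ACPI processor ID.
 */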





Re: [RFC PATCH v2 05/13] hw: add compat machines for 5.3

2020-11-02 Thread Ying Fang




On 10/30/2020 1:08 AM, Andrew Jones wrote:

On Tue, Oct 20, 2020 at 09:14:32PM +0800, Ying Fang wrote:

Add 5.2 machine types for arm/i440fx/q35/s390x/spapr.

   ^ 5.3


Thanks. Will fix; careless spelling mistake.



Thanks,
drew



Signed-off-by: Ying Fang 
---
  hw/arm/virt.c  |  9 -
  hw/core/machine.c  |  3 +++
  hw/i386/pc.c   |  3 +++
  hw/i386/pc_piix.c  | 15 ++-
  hw/i386/pc_q35.c   | 14 +-
  hw/ppc/spapr.c | 15 +--
  hw/s390x/s390-virtio-ccw.c | 14 +-
  include/hw/boards.h|  3 +++
  include/hw/i386/pc.h   |  3 +++
  9 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ba902b53ba..ff8a14439e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void)
  }
  type_init(machvirt_machine_init);
  
+static void virt_machine_5_3_options(MachineClass *mc)

+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
+
  static void virt_machine_5_2_options(MachineClass *mc)
  {
+virt_machine_5_3_options(mc);
+compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
  }
-DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
+DEFINE_VIRT_MACHINE(5, 2)
  
  static void virt_machine_5_1_options(MachineClass *mc)

  {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7e2f4ec08e..6dc77699a9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,9 @@
  #include "hw/mem/nvdimm.h"
  #include "migration/vmstate.h"
  
+GlobalProperty hw_compat_5_2[] = { };

+const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
+
  GlobalProperty hw_compat_5_1[] = {
  { "vhost-scsi", "num_queues", "1"},
  { "vhost-user-blk", "num-queues", "1"},
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e87be5d29a..eaa046ff5d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,6 +97,9 @@
  #include "trace.h"
  #include CONFIG_DEVICES
  
+GlobalProperty pc_compat_5_2[] = { };

+const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2);
+
  GlobalProperty pc_compat_5_1[] = {
  { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
  };
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 3c2ae0612b..01254090ce 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
  machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
  }
  
-static void pc_i440fx_5_2_machine_options(MachineClass *m)

+static void pc_i440fx_5_3_machine_options(MachineClass *m)
  {
  PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
  pc_i440fx_machine_options(m);
@@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
  pcmc->default_cpu_version = 1;
  }
  
+DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL,

+  pc_i440fx_5_3_machine_options);
+
+static void pc_i440fx_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_i440fx_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
  DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
pc_i440fx_5_2_machine_options);
  
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c

index a3f4959c43..dd14803edb 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m)
  m->max_cpus = 288;
  }
  
-static void pc_q35_5_2_machine_options(MachineClass *m)

+static void pc_q35_5_3_machine_options(MachineClass *m)
  {
  PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
  pc_q35_machine_options(m);
@@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m)
  pcmc->default_cpu_version = 1;
  }
  
+DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL,

+   pc_q35_5_3_machine_options);
+
+static void pc_q35_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_q35_machine_options(m);
+m->alias = NULL;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
  DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL,
 pc_q35_5_2_machine_options);
  
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c

index 2db810f73a..c292a3edd9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4511,15 +4511,26 @@ static void 
spapr_machine_latest_class_options(MachineClass *mc)
  }\
  type_init(spapr_machine_register_##suffix)
  
+/*

+ * pseries-

Re: [RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message

2020-11-01 Thread Ying Fang




On 10/30/2020 1:20 AM, Andrew Jones wrote:


You need to remove 'Message' from the summary.

On Tue, Oct 20, 2020 at 09:14:34PM +0800, Ying Fang wrote:

When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.

Signed-off-by: Andrew Jones 


I guess my s-o-b is here because this is a rework of

https://github.com/rhdrjones/qemu/commit/b18d7a889f424b8a8679c43d7f4804fdeeeaf3fd


The s-o-b is given since this one is based on your branch.



I think it changed enough you could just drop my authorship. A
based-on comment in the commit message would be more than enough.


Thanks. Will fix it. Hope that doesn't cause confusion.



Comment on the patch below.


Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a222981737..fae5a26741 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,14 +57,18 @@
  
  #define ARM_SPI_BASE 32
  
-static void acpi_dsdt_add_cpus(Aml *scope, int cpus)

+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
  {
  uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
  
-for (i = 0; i < cpus; i++) {

+for (i = 0; i < possible_cpus->len; i++) {
  Aml *dev = aml_device("C%.03X", i);
  aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
  aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
  aml_append(scope, dev);
  }
  }
@@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  const int *irqmap = vms->irqmap;
  AcpiMadtGenericDistributor *gicd;
  AcpiMadtGenericMsiFrame *gic_msi;
+int possible_cpus = MACHINE(vms)->possible_cpus->len;
  int i;
  
  acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));

@@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
  gicd->version = vms->gic_version;
  
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {

+for (i = 0; i < possible_cpus; i++) {
  AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
 sizeof(*gicc));
  ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  gicc->cpu_interface_number = cpu_to_le32(i);
  gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
  gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (i < MACHINE(vms)->smp.cpus) {


Shouldn't this be


Yes, stupid mistake. Maybe it was lost while I was doing the rebase.
Will fix that. Thanks for your patience in the reply and review.

Ying Fang.


 if (possible_cpus->cpus[i].cpu != NULL) {


+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
  
  if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {

  gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
   * the RTC ACPI device at all when using UEFI.
   */
  scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, ms->smp.cpus);
+acpi_dsdt_add_cpus(scope, vms);
  acpi_dsdt_add_uart(scope, [VIRT_UART],
 (irqmap[VIRT_UART] + ARM_SPI_BASE));
  if (vmc->acpi_expose_flash) {
--
2.23.0




Thanks,
drew

.





Question on UEFI ACPI tables setup and probing on arm64

2020-10-29 Thread Ying Fang

Hi,

I have a question on UEFI/ACPI tables setup and probing on arm64 platform.

Currently on the arm64 platform a guest can be booted with both fdt and ACPI
supported. If ACPI is enabled, [1] says the only defined method for
passing ACPI tables to the kernel is via the UEFI system configuration
table. So AFAIK, ACPI should be dependent on UEFI.

What's more [2] says UEFI kernel support on the ARM architectures
is only available through a *stub*. The stub populates the FDT /chosen
node with some UEFI parameters describing the UEFI location info.

So I dumped /sys/firmware/fdt from the guest, and it does have something like:

/dts-v1/;

/ {
#size-cells = <0x02>;
#address-cells = <0x02>;

chosen {
linux,uefi-mmap-desc-ver = <0x01>;
linux,uefi-mmap-desc-size = <0x30>;
linux,uefi-mmap-size = <0x810>;
linux,uefi-mmap-start = <0x04 0x3c0ce018>;
linux,uefi-system-table = <0x04 0x3f8b0018>;
		bootargs = "BOOT_IMAGE=/vmlinuz-4.19.90-2003.4.0.0036.oe1.aarch64 
root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root 
rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me 
smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 
crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me";

linux,initrd-end = <0x04 0x3a85a5da>;
linux,initrd-start = <0x04 0x392f2000>;
};
};

But the question is that I did not see any code adding the UEFI properties
to the fdt /chosen node in *arm_load_dtb* or anywhere else.
QEMU only maps the OVMF binary file into a pflash device.
So I'm really confused about how UEFI information is provided to
the guest by QEMU. Does anybody know the details?

[1] https://www.kernel.org/doc/html/latest/arm64/arm-acpi.html
[2] https://www.kernel.org/doc/Documentation/arm/uefi.rst

Thanks.
Ying



[RFC PATCH v2 13/13] hw/arm/virt-acpi-build: Enable CPU cache topology

2020-10-20 Thread Ying Fang
A helper struct AcpiCacheOffset is introduced to describe the offsets
of the three cache levels. The cache hierarchy is built according to
ACPI spec v6.3, section 5.2.29.2. Let's enable CPU cache topology now.

Signed-off-by: Ying Fang 
---
 hw/acpi/aml-build.c | 19 +-
 hw/arm/virt-acpi-build.c| 52 -
 include/hw/acpi/acpi-defs.h |  6 +
 include/hw/acpi/aml-build.h |  7 ++---
 4 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6f0e8df49b..f449fa27e7 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1799,27 +1799,32 @@ void build_cache_hierarchy(GArray *tbl,
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
-void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+void build_socket_hierarchy(GArray *tbl, uint32_t parent,
+uint32_t offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 24); /* Length, 1 private resource */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
 build_append_int_noprefix(tbl, parent, 4);  /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 1, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset, 4);  /* Private resources */
 }
 
-void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-   uint32_t parent, uint32_t id)
+void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent,
+   AcpiCacheOffset offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 32); /* Length, 3 private resources */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, flags, 4);  /* Flags */
 build_append_int_noprefix(tbl, parent, 4); /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 3, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset.l1d_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l1i_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l2_offset, 4);  /* Private resources */
 }
 
 void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e1f3ea50ad..8a026ba24e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,29 +429,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
-static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState 
*ms)
+static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache,
+   AcpiCacheInfo *acpi_cache)
 {
+acpi_cache->size = cpu_cache->size;
+acpi_cache->sets = cpu_cache->sets;
+acpi_cache->associativity = cpu_cache->associativity;
+acpi_cache->attributes = cpu_cache->attributes;
+acpi_cache->line_size = cpu_cache->line_size;
+}
+
+static void build_pptt(GArray *table_data, BIOSLinker *linker,
+   VirtMachineState *vms)
+{
+MachineState *ms = MACHINE(vms);
 int pptt_start = table_data->len;
 int uid = 0, cpus = 0, socket;
 unsigned int smp_cores = ms->smp.cores;
 unsigned int smp_threads = ms->smp.threads;
+AcpiCacheOffset offset;
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus));
+AcpiCacheInfo cache_info;
 
 acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
 for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
-uint32_t socket_offset = table_data->len - pptt_start;
+uint32_t l3_offset = table_data->len - pptt_start;
+uint32_t socket_offset;
 int core;
 
-build_socket_hierarchy(table_data, 0, socket);
+/* L3 cache type structure */
+arm_acpi_cache_info(cpu->caches.l3_cache, &cache_info);
+build_cache_hierarchy(table_data, 0, &cache_info);
+
+socket_offset = table_data->len - pptt_start;
+build_socket_hierarchy(table_data, 0, l3_offset, socket);
 
 for (core = 0; core < smp_cores; core++) {
 uint32_t core_offset = table_data->len - pptt_start;
 int th
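
The message is truncated here. For orientation, the AcpiCacheOffset helper
named in the commit message is not visible in the quoted hunks; judging from
its uses (offset.l1d_offset and friends in build_processor_hierarchy), it
presumably looks like this (an assumed layout, not taken from the patch):

typedef struct AcpiCacheOffset {
    uint32_t l1d_offset;   /* offset of the L1D Type 1 cache structure */
    uint32_t l1i_offset;   /* offset of the L1I Type 1 cache structure */
    uint32_t l2_offset;    /* offset of the L2 Type 1 cache structure */
} AcpiCacheOffset;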

[RFC PATCH v2 04/13] device_tree: Add qemu_fdt_add_path

2020-10-20 Thread Ying Fang
From: Andrew Jones 

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except
it also adds any missing parent nodes. We also tweak an error
message of qemu_fdt_add_subnode().

We'll make use of the new function in a coming patch.

Signed-off-by: Andrew Jones 
---
 device_tree.c| 45 ++--
 include/sysemu/device_tree.h |  1 +
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index b335dae707..c080909bb9 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 
 retval = fdt_add_subnode(fdt, parent, basename);
 if (retval < 0) {
-error_report("FDT: Failed to create subnode %s: %s", name,
- fdt_strerror(retval));
+error_report("%s: Failed to create subnode %s: %s",
+ __func__, name, fdt_strerror(retval));
 exit(1);
 }
 
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *dupname, *basename, *p;
+int parent, retval = -1;
+
+if (path[0] != '/') {
+return retval;
+}
+
+parent = fdt_path_offset(fdt, "/");
+p = dupname = g_strdup(path);
+
+while (p) {
+*p = '/';
+basename = p + 1;
+p = strchr(p + 1, '/');
+if (p) {
+*p = '\0';
+}
+retval = fdt_path_offset(fdt, dupname);
+if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+error_report("%s: Invalid path %s: %s",
+ __func__, path, fdt_strerror(retval));
+exit(1);
+} else if (retval == -FDT_ERR_NOTFOUND) {
+retval = fdt_add_subnode(fdt, parent, basename);
+if (retval < 0) {
+break;
+}
+}
+parent = retval;
+}
+
+g_free(dupname);
+return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb");
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
-- 
2.23.0
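
A minimal usage sketch (hypothetical call site): unlike
qemu_fdt_add_subnode(), a single call creates every missing node along the
path and returns the offset of the final one.

    /* creates /cpus/cpu-map, cluster0 and core0 if they don't exist yet */
    int node = qemu_fdt_add_path(fdt, "/cpus/cpu-map/cluster0/core0");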




[RFC PATCH v2 03/13] hw/arm/virt: Replace smp_parse with one that prefers cores

2020-10-20 Thread Ying Fang
From: Andrew Jones 

The virt machine type has never used the CPU topology parameters, other
than number of online CPUs and max CPUs. When choosing how to allocate
those CPUs the default has been to assume cores. In preparation for
using the other CPU topology parameters let's use an smp_parse that
prefers cores over sockets. We can also enforce the topology matches
max_cpus check because we have no legacy to preserve.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ea24b576c6..ba902b53ba 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,8 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
+#include "qapi/qmp/qerror.h"
+#include "sysemu/replay.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -2444,6 +2446,79 @@ static int virt_kvm_type(MachineState *ms, const char 
*type_str)
 return requested_pa_size > 40 ? requested_pa_size : 0;
 }
 
+/*
+ * Unlike smp_parse() in hw/core/machine.c, we prefer cores over sockets,
+ * e.g. '-smp 8' creates 1 socket with 8 cores.  Whereas '-smp 8' with
+ * hw/core/machine.c's smp_parse() creates 8 sockets, each with 1 core.
+ * Additionally, we can enforce the topology matches max_cpus check,
+ * because we have no legacy to preserve.
+ */
+static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
+{
+if (opts) {
+unsigned cpus= qemu_opt_get_number(opts, "cpus", 0);
+unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
+unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+
+/*
+ * Compute missing values; prefer cores over sockets and
+ * sockets over threads.
+ */
+if (cpus == 0 || cores == 0) {
+sockets = sockets > 0 ? sockets : 1;
+threads = threads > 0 ? threads : 1;
+if (cpus == 0) {
+cores = cores > 0 ? cores : 1;
+cpus = cores * threads * sockets;
+} else {
+ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+cores = ms->smp.max_cpus / (sockets * threads);
+}
+} else if (sockets == 0) {
+threads = threads > 0 ? threads : 1;
+sockets = cpus / (cores * threads);
+sockets = sockets > 0 ? sockets : 1;
+} else if (threads == 0) {
+threads = cpus / (cores * sockets);
+threads = threads > 0 ? threads : 1;
+} else if (sockets * cores * threads < cpus) {
+error_report("cpu topology: "
+ "sockets (%u) * cores (%u) * threads (%u) < "
+ "smp_cpus (%u)",
+ sockets, cores, threads, cpus);
+exit(1);
+}
+
+ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
+
+if (ms->smp.max_cpus < cpus) {
+error_report("maxcpus must be equal to or greater than smp");
+exit(1);
+}
+
+if (sockets * cores * threads != ms->smp.max_cpus) {
+error_report("cpu topology: "
+ "sockets (%u) * cores (%u) * threads (%u)"
+ "!= maxcpus (%u)",
+ sockets, cores, threads,
+ ms->smp.max_cpus);
+exit(1);
+}
+
+ms->smp.cpus = cpus;
+ms->smp.cores = cores;
+ms->smp.threads = threads;
+ms->smp.sockets = sockets;
+}
+
+if (ms->smp.cpus > 1) {
+Error *blocker = NULL;
+error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp");
+replay_add_blocker(blocker);
+}
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -2469,6 +2544,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+mc->smp_parse = virt_smp_parse;
 mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
-- 
2.23.0
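
Worked examples of the precedence implemented above (derived directly from
the code as posted):

/*
 * -smp 8            -> sockets=1, cores=8, threads=1  (cores preferred)
 * -smp 8,sockets=2  -> threads=1, cores = 8 / (2 * 1) = 4
 * -smp 8,threads=2  -> sockets=1, cores = 8 / (1 * 2) = 4
 * -smp 4,maxcpus=8  -> sockets=1, threads=1, cores = 8 (cores fill maxcpus)
 */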




[RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message

2020-10-20 Thread Ying Fang
When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a222981737..fae5a26741 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,14 +57,18 @@
 
 #define ARM_SPI_BASE 32
 
-static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
 {
 uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 
-for (i = 0; i < cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
 aml_append(scope, dev);
 }
 }
@@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 const int *irqmap = vms->irqmap;
 AcpiMadtGenericDistributor *gicd;
 AcpiMadtGenericMsiFrame *gic_msi;
+int possible_cpus = MACHINE(vms)->possible_cpus->len;
 int i;
 
 acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
+for (i = 0; i < possible_cpus; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicc->cpu_interface_number = cpu_to_le32(i);
 gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
 gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (i < MACHINE(vms)->smp.cpus) {
+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
 
 if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
 gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  * the RTC ACPI device at all when using UEFI.
  */
 scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, ms->smp.cpus);
+acpi_dsdt_add_cpus(scope, vms);
 acpi_dsdt_add_uart(scope, [VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
-- 
2.23.0
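
The effect of the _STA change on the guest-visible DSDT, decompiled from a
hypothetical guest started with fewer online than possible CPUs, would be
along these lines (illustrative; shown as a comment since the AML is
generated):

/*
 * Device (C002)
 * {
 *     Name (_HID, "ACPI0007")   // processor device
 *     Name (_UID, 0x02)
 *     Name (_STA, Zero)         // 0: not present
 * }
 *
 * Present CPUs get no _STA object, so the OSPM applies the default 0x0F
 * (present, enabled, shown in UI, functioning).
 */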




[RFC PATCH v2 05/13] hw: add compat machines for 5.3

2020-10-20 Thread Ying Fang
Add 5.2 machine types for arm/i440fx/q35/s390x/spapr.

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c  |  9 -
 hw/core/machine.c  |  3 +++
 hw/i386/pc.c   |  3 +++
 hw/i386/pc_piix.c  | 15 ++-
 hw/i386/pc_q35.c   | 14 +-
 hw/ppc/spapr.c | 15 +--
 hw/s390x/s390-virtio-ccw.c | 14 +-
 include/hw/boards.h|  3 +++
 include/hw/i386/pc.h   |  3 +++
 9 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ba902b53ba..ff8a14439e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void)
 }
 type_init(machvirt_machine_init);
 
+static void virt_machine_5_3_options(MachineClass *mc)
+{
+}
+DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
+
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+virt_machine_5_3_options(mc);
+compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
 }
-DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
+DEFINE_VIRT_MACHINE(5, 2)
 
 static void virt_machine_5_1_options(MachineClass *mc)
 {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7e2f4ec08e..6dc77699a9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -28,6 +28,9 @@
 #include "hw/mem/nvdimm.h"
 #include "migration/vmstate.h"
 
+GlobalProperty hw_compat_5_2[] = { };
+const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
+
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
 { "vhost-user-blk", "num-queues", "1"},
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e87be5d29a..eaa046ff5d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -97,6 +97,9 @@
 #include "trace.h"
 #include CONFIG_DEVICES
 
+GlobalProperty pc_compat_5_2[] = { };
+const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2);
+
 GlobalProperty pc_compat_5_1[] = {
 { "ICH9-LPC", "x-smi-cpu-hotplug", "off" },
 };
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 3c2ae0612b..01254090ce 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
 }
 
-static void pc_i440fx_5_2_machine_options(MachineClass *m)
+static void pc_i440fx_5_3_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_machine_options(m);
@@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL,
+  pc_i440fx_5_3_machine_options);
+
+static void pc_i440fx_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_i440fx_machine_options(m);
+m->alias = NULL;
+m->is_default = false;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
 DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL,
   pc_i440fx_5_2_machine_options);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a3f4959c43..dd14803edb 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m)
 m->max_cpus = 288;
 }
 
-static void pc_q35_5_2_machine_options(MachineClass *m)
+static void pc_q35_5_3_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_machine_options(m);
@@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m)
 pcmc->default_cpu_version = 1;
 }
 
+DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL,
+   pc_q35_5_3_machine_options);
+
+static void pc_q35_5_2_machine_options(MachineClass *m)
+{
+PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
+pc_q35_machine_options(m);
+m->alias = NULL;
+compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len);
+}
+
 DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL,
pc_q35_5_2_machine_options);
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2db810f73a..c292a3edd9 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4511,15 +4511,26 @@ static void 
spapr_machine_latest_class_options(MachineClass *mc)
 }\
 type_init(spapr_machine_register_##suffix)
 
+/*
+ * pseries-5.3
+ */
+static void spapr_machine_5_3_class_options(MachineClass *mc)
+{
+/* Defaults for the latest behaviour inherited from the base class */
+}
+
+DEFINE_SPAPR_MACHINE(5_3, "5.3", true);
+
 /*
  * pseries-5.2
  */
 static void spapr

[RFC PATCH v2 09/13] hw/arm/virt-acpi-build: add PPTT table

2020-10-20 Thread Ying Fang
Add the Processor Properties Topology Table (PPTT) to present CPU topology
information to the guest.

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 42 
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index fae5a26741..e1f3ea50ad 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
+static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState 
*ms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket;
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data, 2, socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data, 0, socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_smt_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2, NULL, NULL);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
 MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
 
 table_offsets = g_array_new(false, true /* clear */,
 sizeof(uint32_t));
@@ -688,6 +725,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (cpu_topology_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, ms);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.23.0




[RFC PATCH v2 12/13] hw/acpi/aml-build: build ACPI CPU cache hierarchy information

2020-10-20 Thread Ying Fang
To build cache information, An AcpiCacheInfo structure is defined to
hold the Type 1 cache structure according to ACPI spec v6.3 5.2.29.2.
A helper function build_cache_hierarchy is introduced to encode the
cache information.

Signed-off-by: Ying Fang 
---
 hw/acpi/aml-build.c | 26 ++
 include/hw/acpi/acpi-defs.h |  8 
 include/hw/acpi/aml-build.h |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index da3b41b514..6f0e8df49b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/* ACPI 6.3: 5.2.29.2 Cache type structure (Type 1) */
+static void build_cache_head(GArray *tbl, uint32_t next_level)
+{
+build_append_byte(tbl, 1);                     /* Type 1 - cache */
+build_append_byte(tbl, 24);                    /* Length */
+build_append_int_noprefix(tbl, 0, 2);          /* Reserved */
+build_append_int_noprefix(tbl, 0x7f, 4);       /* Flags: all fields valid */
+build_append_int_noprefix(tbl, next_level, 4); /* Next level of cache */
+}
+
+static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info)
+{
+build_append_int_noprefix(tbl, cache_info->size, 4);      /* Size */
+build_append_int_noprefix(tbl, cache_info->sets, 4);      /* Number of sets */
+build_append_byte(tbl, cache_info->associativity);        /* Associativity */
+build_append_byte(tbl, cache_info->attributes);           /* Attributes */
+build_append_int_noprefix(tbl, cache_info->line_size, 2); /* Line size */
+}
+
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info)
+{
+build_cache_head(tbl, next_level);
+build_cache_tail(tbl, cache_info);
+}
+
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 38a42f409a..3df38ab449 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,12 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+typedef struct AcpiCacheInfo {
+uint32_t size;
+uint32_t sets;
+uint8_t  associativity;
+uint8_t  attributes;
+uint16_t line_size;
+} AcpiCacheInfo;
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 56474835a7..01078753a8 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info);
+
 void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
 
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-- 
2.23.0




[RFC PATCH v2 08/13] hw/acpi/aml-build: add processor hierarchy node structure

2020-10-20 Thread Ying Fang
Add the processor hierarchy node structures to build ACPI information
for CPU topology. Three helpers are introduced:

(1) build_socket_hierarchy for socket description structure
(2) build_processor_hierarchy for processor description structure
(3) build_smt_hierarchy for thread (logic processor) description structure

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 37 +
 include/hw/acpi/aml-build.h |  7 +++
 2 files changed, 44 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 3792ba96ce..da3b41b514 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);/* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl, 0x0e, 4);   /* Flags: ID valid, thread, leaf */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index fe0055fffb..56474835a7 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.23.0




[RFC PATCH v2 06/13] hw/arm/virt: DT: add cpu-map

2020-10-20 Thread Ying Fang
From: Andrew Jones 

Support devicetree CPU topology descriptions.

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 40 +++-
 include/hw/arm/virt.h |  1 +
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ff8a14439e..d23b941020 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,9 +351,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 
 /*
- * From Documentation/devicetree/bindings/arm/cpus.txt
+ * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
  *  On ARM v8 64-bit systems value should be set to 2,
  *  that corresponds to the MPIDR_EL1 register size.
  *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+}
+
 g_free(nodename);
 }
+
+if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+/*
+ * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+ */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d/%s%d",
+"cluster", cpu / (ms->smp.cores * ms->smp.threads),
+"core", (cpu / ms->smp.threads) % ms->smp.cores,
+"thread", cpu % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/%s%d/%s%d",
+"cluster", cpu / ms->smp.cores,
+"core", cpu % ms->smp.cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2672,8 +2707,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
 
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_5_3_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+vmc->ignore_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 010f24f580..917bd8b645 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -118,6 +118,7 @@ typedef enum VirtGICType {
 struct VirtMachineClass {
 MachineClass parent;
 bool disallow_affinity_adjustment;
+bool ignore_cpu_topology;
 bool no_its;
 bool no_pmu;
 bool claim_edge_triggered_timers;
-- 
2.23.0




[RFC PATCH v2 11/13] hw/arm/virt: add fdt cache information

2020-10-20 Thread Ying Fang
Support devicetree CPU cache information descriptions.

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d23b941020..adcfa52854 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -346,6 +346,89 @@ static void fdt_add_timer_nodes(const VirtMachineState 
*vms)
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags);
 }
 
+static void fdt_add_l3cache_nodes(const VirtMachineState *vms)
+{
+int i;
+const MachineState *ms = MACHINE(vms);
+ARMCPU *cpu = ARM_CPU(first_cpu);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+
+for (i = 0; i < sockets; i++) {
+char *nodename = g_strdup_printf("/cpus/l3-cache%d", i);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l3_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l3_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l3_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+}
+
+static void fdt_add_l2cache_nodes(const VirtMachineState *vms)
+{
+int i, j;
+const MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+ARMCPU *cpu = ARM_CPU(first_cpu);
+
+for (i = 0; i < sockets; i++) {
+char *next_path = g_strdup_printf("/cpus/l3-cache%d", i);
+for (j = 0; j < smp_cores; j++) {
+char *nodename = g_strdup_printf("/cpus/l2-cache%d",
+  i * smp_cores + j);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l2_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l2_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l2_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename,
+  "next-level-cache", next_path);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+g_free(next_path);
+}
+}
+
+static void fdt_add_l1cache_prop(const VirtMachineState *vms,
+char *nodename, int cpu_index)
+{
+
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index));
+CPUCaches caches = cpu->caches;
+
+char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index);
+
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size",
+  caches.l1d_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size",
+  caches.l1d_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets",
+  caches.l1d_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size",
+  caches.l1i_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size",
+  caches.l1i_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets",
+  caches.l1i_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache",
+  cachename);
+g_free(cachename);
+}
+
 static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 {
 int cpu;
@@ -379,6 +462,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#s

[RFC PATCH v2 10/13] target/arm/cpu: Add CPU cache description for arm

2020-10-20 Thread Ying Fang
Add the CPUCacheInfo structure to hold CPU cache information for ARM CPUs.
A classic three-level cache topology is used here. Default cache
capacities are given and userspace can overwrite these values.
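
As a quick sanity check of these defaults (a standalone sketch, not part of
the patch, assuming the usual set-associative identity
size == sets * associativity * line_size):

#include <assert.h>
#include <stdio.h>

#define KiB 1024

int main(void)
{
    assert(256 * 4 * 64 == 64 * KiB);    /* L1D/L1I: consistent */
    assert(1024 * 8 * 64 == 512 * KiB);  /* L2: consistent */
    /* L3: 2048 sets * 15 ways * 64 B = 1920 KiB, which does not match the
     * 65536 KiB (64 MiB) size below, so one of the L3 fields looks off. */
    printf("implied L3 size: %d KiB\n", 2048 * 15 * 64 / KiB);
    return 0;
}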

Signed-off-by: Ying Fang 
---
 target/arm/cpu.c | 42 ++
 target/arm/cpu.h | 27 +++
 2 files changed, 69 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 056319859f..f1bac7452c 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "cpu.h"
 #include "internals.h"
+#include "qemu/units.h"
 #include "exec/exec-all.h"
 #include "hw/qdev-properties.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
 return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
 }
 
+static CPUCaches default_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x02,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x04,
+},
+.l2_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 2,
+.size = 512 * KiB,
+.line_size = 64,
+.associativity = 8,
+.sets = 1024,
+.attributes = 0x0a,
+},
+.l3_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 3,
+.size = 65536 * KiB,
+.line_size = 64,
+.associativity = 15,
+.sets = 2048,
+.attributes = 0x0a,
+},
+};
+
 static void cpreg_hashtable_data_destroy(gpointer data)
 {
 /*
@@ -1841,6 +1881,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 }
 }
 
+cpu->caches = default_cache_info;
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cfff1b5c8f..dbc33a9802 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -746,6 +746,30 @@ typedef enum ARMPSCIState {
 
 typedef struct ARMISARegisters ARMISARegisters;
 
+/* Cache information type */
+enum CacheType {
+DATA_CACHE,
+INSTRUCTION_CACHE,
+UNIFIED_CACHE
+};
+
+typedef struct CPUCacheInfo {
+enum CacheType type;  /* Cache type */
+uint8_t level;
+uint32_t size;/* Size in bytes */
+uint16_t line_size;   /* Line size in bytes */
+uint8_t associativity;/* Cache associativity */
+uint32_t sets;/* Number of sets */
+uint8_t attributes;   /* Cache attributes */
+} CPUCacheInfo;
+
+typedef struct CPUCaches {
+CPUCacheInfo *l1d_cache;
+CPUCacheInfo *l1i_cache;
+CPUCacheInfo *l2_cache;
+CPUCacheInfo *l3_cache;
+} CPUCaches;
+
 /**
  * ARMCPU:
  * @env: #CPUARMState
@@ -987,6 +1011,9 @@ struct ARMCPU {
 
 /* Generic timer counter frequency, in Hz */
 uint64_t gt_cntfrq_hz;
+
+/* CPU cache information */
+CPUCaches caches;
 };
 
 unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
-- 
2.23.0




[RFC PATCH v2 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus

2020-10-20 Thread Ying Fang
From: Andrew Jones 

Prefer to spell out the smp.cpus and smp.max_cpus machine state
variables in order to make grepping easier and to avoid any
confusion as to what cpu count is being used where.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt-acpi-build.c |  8 +++
 hw/arm/virt.c| 51 +++-
 include/hw/arm/virt.h|  2 +-
 3 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9747a6458f..a222981737 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -57,11 +57,11 @@
 
 #define ARM_SPI_BASE 32
 
-static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, int cpus)
 {
 uint16_t i;
 
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < cpus; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
@@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < vms->smp_cpus; i++) {
+for (i = 0; i < MACHINE(vms)->smp.cpus; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -599,7 +599,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  * the RTC ACPI device at all when using UEFI.
  */
 scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, vms->smp_cpus);
+acpi_dsdt_add_cpus(scope, ms->smp.cpus);
 acpi_dsdt_add_uart(scope, &memmap[VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e465a988d6..0069fa1298 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
  GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
 }
 
 qemu_fdt_add_subnode(vms->fdt, "/timer");
@@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
  *  The simplest way to go is to examine affinity IDs of all our CPUs. If
  *  at least one of them has Aff3 populated, we set #address-cells to 2.
  */
-for (cpu = 0; cpu < vms->smp_cpus; cpu++) {
+for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
 
 if (armcpu->mp_affinity & ARM_AFF3_MASK) {
@@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0);
 
-for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
+for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
 char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
 CPUState *cs = CPU(armcpu);
@@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 armcpu->dtb_compatible);
 
 if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED
-&& vms->smp_cpus > 1) {
+&& ms->smp.cpus > 1) {
 qemu_fdt_setprop_string(vms->fdt, nodename,
 "enable-method", "psci");
 }
@@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms)
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
  GIC_FDT_IRQ_PPI_CPU_WIDTH,
- (1 << vms->smp_cpus) - 1);
+ (1 << MACHINE(vms)->smp.cpus) - 1);
 }
 
 qemu_fdt_add_subnode(vms->fdt, "/pmu");
@@ -622,14 +622,13 @@ static void create_gic(VirtMachineState *vms)
 SysBusDevice *gicbusdev;
 const char *gictype;
 int type = vms->gic_version, i;
-unsigned int smp_cpus = ms->smp.cpus;
 uint32_t nb_redist_regions = 0;
 
 gictype = (type == 3) ? gicv3_class_name() : gic_class_name();
 
 vms->gic = qdev_new(gictype);
 qdev_prop_set_uint32(vms->gic, "revision", type);
-qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus);
+qdev_prop_set_uint32(vms->gic, "num-cpu", ms->smp.cpus);
 /* Note that the num-irq property counts both internal and external
  * interrupts; there are always 32 of the former (mandated by GIC spec).
  */
@@ -641,7 +640,7 @@ static void 

[RFC PATCH v2 02/13] hw/arm/virt: Remove unused variable

2020-10-20 Thread Ying Fang
From: Andrew Jones 

We no longer use the smp_cpus virtual machine state variable.
Remove it.

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 2 --
 include/hw/arm/virt.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0069fa1298..ea24b576c6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1820,8 +1820,6 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
-vms->smp_cpus = smp_cpus;
-
 if (vms->virt && kvm_enabled()) {
 error_report("mach-virt: KVM does not support providing "
  "Virtualization extensions to the guest CPU");
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 953d94acc0..010f24f580 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -151,7 +151,6 @@ struct VirtMachineState {
 MemMapEntry *memmap;
 char *pciehb_nodename;
 const int *irqmap;
-int smp_cpus;
 void *fdt;
 int fdt_size;
 uint32_t clock_phandle;
-- 
2.23.0




[RFC PATCH v2 00/13] hw/arm/virt: Introduce cpu and cache topology support

2020-10-20 Thread Ying Fang
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with a multi-core system. So a cpu topology description
is helpful to provide the guest with the right view. Cpu cache information may
also have a slight impact on the sched domains, and even userspace software
may check the cpu cache information to do some optimizations. Thus this patch
series is posted to provide cpu and cache topology support for arm.

Both fdt and ACPI are introduced to present the cpu and cache topology.
To describe the cpu topology via ACPI, a PPTT table is introduced according
to the processor hierarchy node structure. To describe the cpu cache
information, a default cache hierarchy is given and built according to the
cache type structure defined by ACPI, it can be made configurable later.
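
As a rough sketch of how that nesting looks when emitted (using the
build_processor_hierarchy(tbl, flags, parent, id) helper discussed later in
this thread; offsets are simplified and assumed relative to the start of the
table, and table headers/checksums are omitted):

static void build_pptt_sketch(GArray *tbl, int sockets, int cores, int threads)
{
    uint32_t uid = 0;

    for (int s = 0; s < sockets; s++) {
        uint32_t socket_off = tbl->len;  /* children name this as parent */
        build_processor_hierarchy(tbl, 0x1, 0, s);  /* physical package */

        for (int c = 0; c < cores; c++) {
            uint32_t core_off = tbl->len;
            build_processor_hierarchy(tbl, 0x0, socket_off, c);

            for (int t = 0; t < threads; t++) {
                /* leaf: ACPI processor ID valid, processor is a thread */
                build_processor_hierarchy(tbl, 0x0e, core_off, uid++);
            }
        }
    }
}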

The RFC v1 was posted at [1], where we tried to map the MPIDR register into
the cpu topology; that turned out to be wrong. Andrew pointed out that the
Linux kernel is going to stop using MPIDR for topology information [2]. The
root cause is that the MPIDR register has been abused by ARM OEM
manufacturers: it is only an identifier for a specific cpu, not a
representation of the topology. Moreover, this v2 is rebased on Andrew's
latest shared branch [4].

This patch series was initially based on the patches posted by Andrew Jones [3].
I jumped in on it since some OS vendor cooperation partners are eager for it.
Thanks for Andrew's contribution.

After applying this patch series, launch a guest with virt-5.3 and a cpu
topology configured with sockets:cores:threads = 2:4:2, and you will get the
output below from the lscpu command.

Architecture:aarch64
CPU op-mode(s):  64-bit
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   2
NUMA node(s):2
Vendor ID:   HiSilicon
Model:   0
Model name:  Kunpeng-920
Stepping:0x1
BogoMIPS:200.00
L1d cache:   512 KiB
L1i cache:   512 KiB
L2 cache:4 MiB
L3 cache:128 MiB
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

changelog
v1 -> v2:
* Rebased to the latest branch shared by Andrew Jones [4]
* Stop mapping MPIDR into vcpu topology

[1] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg06027.html
[2] 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20200829130016.26106-1-valentin.schnei...@arm.com/
[3] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com
[4] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh


Andrew Jones (5):
  hw/arm/virt: Spell out smp.cpus and smp.max_cpus
  hw/arm/virt: Remove unused variable
  hw/arm/virt: Replace smp_parse with one that prefers cores
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: DT: add cpu-map

Ying Fang (8):
  hw: add compat machines for 5.3
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table
  target/arm/cpu: Add CPU cache description for arm
  hw/arm/virt: add fdt cache information
  hw/acpi/aml-build: build ACPI CPU cache hierarchy information
  hw/arm/virt-acpi-build: Enable CPU cache topology

 device_tree.c|  45 +-
 hw/acpi/aml-build.c  |  68 +
 hw/arm/virt-acpi-build.c |  99 -
 hw/arm/virt.c| 270 +++
 hw/core/machine.c|   3 +
 hw/i386/pc.c |   3 +
 hw/i386/pc_piix.c|  15 +-
 hw/i386/pc_q35.c |  14 +-
 hw/ppc/spapr.c   |  15 +-
 hw/s390x/s390-virtio-ccw.c   |  14 +-
 include/hw/acpi/acpi-defs.h  |  14 ++
 include/hw/acpi/aml-build.h  |  11 ++
 include/hw/arm/virt.h|   4 +-
 include/hw/boards.h  |   3 +
 include/hw/i386/pc.h |   3 +
 include/sysemu/device_tree.h |   1 +
 target/arm/cpu.c |  42 ++
 target/arm/cpu.h |  27 
 18 files changed, 606 insertions(+), 45 deletions(-)

-- 
2.23.0




Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support

2020-10-19 Thread Ying Fang




On 10/16/2020 6:07 PM, Andrew Jones wrote:

On Fri, Oct 16, 2020 at 05:40:02PM +0800, Ying Fang wrote:



On 10/15/2020 3:59 PM, Andrew Jones wrote:

On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:



On 10/14/2020 2:08 AM, Andrew Jones wrote:

On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote:

Cc valentin


-Original Message-
From: Qemu-devel
[mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org]
On Behalf Of Ying Fang
Sent: Thursday, September 17, 2020 11:20 AM
To: qemu-devel@nongnu.org
Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang;
Chenzhendong (alex); shannon.zha...@gmail.com;
qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying;
imamm...@redhat.com
Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
topology support

An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with a multi-core system. So a cpu topology description
is helpful to provide the guest with the right view. Cpu cache information
may also have a slight impact on the sched domains, and even userspace
software may check the cpu cache information to do some optimizations. Thus
this patch series is posted to provide cpu and cache topology support for arm.

To make the cpu topology consistent with MPIDR, a vcpu ioctl


For aarch64, the cpu topology don't depends on the MPDIR.
See https://patchwork.kernel.org/patch/11744387/



The topology should not be inferred from the MPIDR Aff fields,


MPIDR is abused by ARM OEM manufacturers. It is only used as an
identifier for a specific cpu, not a representation of the topology.


Right, which is why I stated topology should not be inferred from
it.




but MPIDR is the CPU identifier. When describing a topology
with ACPI or DT the CPU elements in the topology description
must map to actual CPUs. MPIDR is that mapping link. KVM
currently determines what the MPIDR of a VCPU is. If KVM


KVM currently assigns MPIDR with vcpu->vcpu_id, which is mapped
into affinity levels. See reset_mpidr in sys_regs.c


I know, but how KVM assigns MPIDRs today is not really important
to KVM userspace. KVM userspace shouldn't depend on a KVM
algorithm, as it could change.




userspace is going to determine the VCPU topology, then it
also needs control over the MPIDR values, otherwise it
becomes quite messy trying to get the mapping right.

If we are going to control MPIDR, shall we assign MPIDR from
vcpu_id, map the topology hierarchy into affinity levels, or use
some other mapping scheme?



We can assign them to whatever we want, as long as they're
unique and as long as Aff0 is assigned per the GIC requirements,
e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
Aff3,Aff2,Aff1 fields should actually be peers with respect to
the GIC.


Still not clear why a vCPU's MPIDR needs to match the pCPU's GIC affinity.
Maybe I should read the GICv3 spec.


Look at how IPIs are efficiently sent to "peers", where the definition
of a peer is that only Aff0 differs in its MPIDR. But, GICv3's
optimizations can only handle 16 peers. If we want pinned VCPUs to
have the same performance as PCPUs, then we should maintain this
Aff0 limit.


Yes, I see. I think *virt_cpu_mp_affinity* in qemu has a limit on the
cluster size (clustersz): it groups every 16 vCPUs into a cluster
and then maps them into the first two affinity levels.
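
A minimal sketch of that mapping (modeled loosely on virt_cpu_mp_affinity();
the 16 comes from the GICv3 target-list limit, and the shifts stand in for
ARM_AFF0_SHIFT/ARM_AFF1_SHIFT):

static uint64_t mp_affinity_sketch(int idx)
{
    const int clustersz = 16;         /* GICv3 can address at most 16 Aff0 peers */
    uint64_t aff0 = idx % clustersz;  /* position within the cluster */
    uint64_t aff1 = idx / clustersz;  /* cluster number */
    return (aff1 << 8) | aff0;        /* Aff1 in bits [15:8], Aff0 in [7:0] */
}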

Thanks.
Ying.



Thanks,
drew





We shouldn't try to encode topology in the MPIDR in any way,
so we might as well simply increment a counter to assign them,
which could possibly be the same as the VCPU ID.


Hmm, then we can leave it as it is.



Thanks,
drew

.





.





Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support

2020-10-16 Thread Ying Fang




On 10/15/2020 3:59 PM, Andrew Jones wrote:

On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:



On 10/14/2020 2:08 AM, Andrew Jones wrote:

On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote:

Cc valentin


-Original Message-
From: Qemu-devel
[mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org]
On Behalf Of Ying Fang
Sent: Thursday, September 17, 2020 11:20 AM
To: qemu-devel@nongnu.org
Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang;
Chenzhendong (alex); shannon.zha...@gmail.com;
qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying;
imamm...@redhat.com
Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
topology support

An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with a multi-core system. So a cpu topology description
is helpful to provide the guest with the right view. Cpu cache information
may also have a slight impact on the sched domains, and even userspace
software may check the cpu cache information to do some optimizations. Thus
this patch series is posted to provide cpu and cache topology support for arm.

To make the cpu topology consistent with MPIDR, a vcpu ioctl


For aarch64, the cpu topology don't depends on the MPDIR.
See https://patchwork.kernel.org/patch/11744387/



The topology should not be inferred from the MPIDR Aff fields,


MPIDR is abused by ARM OEM manufacturers. It is only used as an
identifier for a specific cpu, not a representation of the topology.


Right, which is why I stated topology should not be inferred from
it.




but MPIDR is the CPU identifier. When describing a topology
with ACPI or DT the CPU elements in the topology description
must map to actual CPUs. MPIDR is that mapping link. KVM
currently determines what the MPIDR of a VCPU is. If KVM


KVM currently assigns MPIDR with vcpu->vcpu_id, which is mapped
into affinity levels. See reset_mpidr in sys_regs.c


I know, but how KVM assigns MPIDRs today is not really important
to KVM userspace. KVM userspace shouldn't depend on a KVM
algorithm, as it could change.




userspace is going to determine the VCPU topology, then it
also needs control over the MPIDR values, otherwise it
becomes quite messy trying to get the mapping right.

If we are going to control MPIDR, shall we assign MPIDR from
vcpu_id, map the topology hierarchy into affinity levels, or use
some other mapping scheme?



We can assign them to whatever we want, as long as they're
unique and as long as Aff0 is assigned per the GIC requirements,
e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
Aff3,Aff2,Aff1 fields should actually be peers with respect to
the GIC.


Still not clear why a vCPU's MPIDR needs to match the pCPU's GIC affinity.
Maybe I should read the GICv3 spec.



We shouldn't try to encode topology in the MPIDR in any way,
so we might as well simply increment a counter to assign them,
which could possibly be the same as the VCPU ID.


Hmm, then we can leave it as it is.



Thanks,
drew

.





Re: [RFC PATCH v2 0/8] block-backend: Introduce I/O hang

2020-10-15 Thread Ying Fang




On 10/10/2020 10:27 AM, cenjiahui wrote:

Hi Kevin,

Could you please spend some time reviewing and commenting on this patch series.

Thanks,
Jiahui Cen


This feature is confirmed effective in a cloud storage environment, since
it can help to improve availability without pausing the entire
guest. Hope it won't get lost in the thread. Any comments or reviews
are welcome.



On 2020/9/30 17:45, Jiahui Cen wrote:

A VM in the cloud environment may use a virtual disk as the backend storage,
and there are usually filesystems on the virtual block device. When the backend
storage is temporarily down, any I/O issued to the virtual block device will
cause an error. For example, an error in an ext4 filesystem would make
the filesystem read-only. However, a cloud backend storage can often be
recovered soon; for example, an IP-SAN may be down due to network failure and
will be online soon after the network is recovered. The error in the
filesystem may not be recovered without a device reattach or a system restart,
so an I/O rehandle mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It can rehandle AIOs
that fail with EIO without sending the error back to the guest. From the
guest's point of view it just looks like an I/O request is hanging rather
than failing. The guest gets back to running smoothly once I/O recovers,
with this feature enabled.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect remove of rehandle list.
* Provide rehandle pause interface.

Jiahui Cen (8):
   block-backend: introduce I/O rehandle info
   block-backend: rehandle block aios when EIO
   block-backend: add I/O hang timeout
   block-backend: add I/O rehandle pause/unpause
   block-backend: enable I/O hang when timeout is set
   virtio-blk: pause I/O hang when resetting
   qemu-option: add I/O hang timeout option
   qapi: add I/O hang and I/O hang timeout qapi event

  block/block-backend.c  | 300 +
  blockdev.c |  11 ++
  hw/block/virtio-blk.c  |   8 +
  include/sysemu/block-backend.h |   5 +
  qapi/block-core.json   |  26 +++
  5 files changed, 350 insertions(+)


.





Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support

2020-10-14 Thread Ying Fang




On 10/14/2020 2:08 AM, Andrew Jones wrote:

On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote:

Cc valentin


-Original Message-
From: Qemu-devel
[mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org]
On Behalf Of Ying Fang
Sent: Thursday, September 17, 2020 11:20 AM
To: qemu-devel@nongnu.org
Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang;
Chenzhendong (alex); shannon.zha...@gmail.com;
qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying;
imamm...@redhat.com
Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
topology support

An accurate cpu topology may help improve the cpu scheduler's
decision
making when dealing with multi-core system. So cpu topology
description
is helpful to provide guest with the right view. Cpu cache information
may
also have slight impact on the sched domain, and even userspace
software
may check the cpu cache information to do some optimizations. Thus
this patch
series is posted to provide cpu and cache topology support for arm.

To make the cpu topology consistent with MPIDR, an vcpu ioctl


For aarch64, the cpu topology don't depends on the MPDIR.
See https://patchwork.kernel.org/patch/11744387/



The topology should not be inferred from the MPIDR Aff fields,


MPIDR is abused by ARM OEM manufactures. It is only used as a
identifer for a specific cpu, not representation of the topology.


but MPIDR is the CPU identifier. When describing a topology
with ACPI or DT the CPU elements in the topology description
must map to actual CPUs. MPIDR is that mapping link. KVM
currently determines what the MPIDR of a VCPU is. If KVM


KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
into affinity levels. See reset_mpidr in sys_regs.c


userspace is going to determine the VCPU topology, then it
also needs control over the MPIDR values, otherwise it
becomes quite messy trying to get the mapping right.

If we are going to control MPIDR, shall we assign MPIDR with
vcpu_id or map topology hierarchy into affinity levels or any
other link schema ?



Thanks,
drew

.


Thanks Ying.



[RFC PATCH 7/7] qapi: add I/O hang and I/O hang timeout qapi event

2020-09-27 Thread Ying Fang
Sometimes hypervisor management tools like libvirt may need to monitor
I/O hang events. Let's report the I/O hang and I/O hang timeout events via QAPI.
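
For illustration, a management tool watching the QMP event stream would then
see something shaped like this when a hang begins (timestamp values
hypothetical):

{"timestamp": {"seconds": 1601190000, "microseconds": 123456},
 "event": "BLOCK_IO_HANG", "data": {"set": true}}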

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 
---
 block/block-backend.c |  3 +++
 qapi/block-core.json  | 26 ++
 2 files changed, 29 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 95b2d6a679..5dc5b11bcc 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2540,6 +2540,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int 
new_status)
 /* Case when I/O Hang is recovered */
 blk->is_iohang_timeout = false;
 blk->iohang_time = 0;
+qapi_event_send_block_io_hang(false);
 }
 break;
 case BLOCK_IO_HANG_STATUS_HANG:
@@ -2547,12 +2548,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int 
new_status)
 /* Case when I/O hang is first triggered */
 blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
 need_rehandle = true;
+qapi_event_send_block_io_hang(true);
 } else {
 if (!blk->is_iohang_timeout) {
 now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
 if (now >= (blk->iohang_time + blk->iohang_timeout)) {
 /* Case when I/O hang is timeout */
 blk->is_iohang_timeout = true;
+qapi_event_send_block_io_hang_timeout(true);
 } else {
 /* Case when I/O hang is continued */
 need_rehandle = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e11d..7bdf75c6d7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5535,3 +5535,29 @@
 { 'command': 'blockdev-snapshot-delete-internal-sync',
   'data': { 'device': 'str', '*id': 'str', '*name': 'str'},
   'returns': 'SnapshotInfo' }
+
+##
+# @BLOCK_IO_HANG:
+#
+# Emitted when a device I/O hang begins or ends
+#
+# @set: true if I/O hang begin; false if I/O hang end.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG',
+  'data': { 'set': 'bool' }}
+
+##
+# @BLOCK_IO_HANG_TIMEOUT:
+#
+# Emitted when the device I/O hang timeout is set or cleared
+#
+# @set: true if set; false if clear.
+#
+# Since: 5.2
+#
+##
+{ 'event': 'BLOCK_IO_HANG_TIMEOUT',
+  'data': { 'set': 'bool' }}
-- 
2.23.0




[RFC PATCH 2/7] block-backend: rehandle block aios when EIO

2020-09-27 Thread Ying Fang
When a backend device temporarily does not respond, e.g. a network disk is down
due to some network fault, any I/O to the corresponding virtual block device
in the VM would return an I/O error. If the hypervisor returns the error to the
VM, the filesystem on this block device may not work as usual. And in many
situations, the returned error is often an EIO.

To avoid this unavailability, we can store the failed AIOs, and resend them
later. If the error is temporary, the retries can succeed and the AIOs can
be successfully completed.

Signed-off-by: Ying Fang 
Signed-off-by: Jiahui Cen 
---
 block/block-backend.c | 89 +++
 1 file changed, 89 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index bf104a7cf5..90f1ca5753 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 notifier_list_init(&blk->remove_bs_notifiers);
 notifier_list_init(&blk->insert_bs_notifiers);
 
+/* for rehandle */
+blk->reinfo.enable = false;
+blk->reinfo.ts = NULL;
+atomic_set(&blk->reinfo.in_flight, 0);
+QTAILQ_INIT(&blk->reinfo.re_aios);
+
 QLIST_INIT(&blk->aio_notifiers);
 
 QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 .get_aio_context= blk_aio_em_aiocb_get_aio_context,
 };
 
+static void blk_rehandle_timer_cb(void *opaque);
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb);
+
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
+if (acb->rwco.blk->reinfo.enable) {
+blk_rehandle_aio_complete(acb);
+return;
+}
+
 if (acb->has_returned) {
 acb->common.cb(acb->common.opaque, acb->rwco.ret);
 blk_dec_in_flight(acb->rwco.blk);
@@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset, int bytes,
 .ret= NOT_DONE,
 };
 acb->bytes = bytes;
+acb->co_entry = co_entry;
 acb->has_returned = false;
 
 co = qemu_coroutine_create(co_entry, acb);
@@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, 
AioContext *new_context,
 throttle_group_attach_aio_context(tgm, new_context);
 bdrv_drained_end(bs);
 }
+
+if (blk->reinfo.enable) {
+if (blk->reinfo.ts) {
+timer_del(blk->reinfo.ts);
+timer_free(blk->reinfo.ts);
+}
+blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME,
+   SCALE_MS, blk_rehandle_timer_cb,
+   blk);
+if (atomic_read(&blk->reinfo.in_flight)) {
+timer_mod(blk->reinfo.ts,
+  qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
+}
+}
 }
 
 blk->ctx = new_context;
@@ -2405,6 +2434,66 @@ static void blk_root_drained_end(BdrvChild *child, int 
*drained_end_counter)
 }
 }
 
+static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+assert(blk->reinfo.enable);
+
+atomic_inc(&blk->reinfo.in_flight);
+QTAILQ_INSERT_TAIL(&blk->reinfo.re_aios, acb, list);
+timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
+  blk->reinfo.timer_interval_ms);
+}
+
+static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
+{
+QTAILQ_REMOVE(&blk->reinfo.re_aios, acb, list);
+atomic_dec(&blk->reinfo.in_flight);
+}
+
+static void blk_rehandle_timer_cb(void *opaque)
+{
+BlockBackend *blk = opaque;
+BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+BlkAioEmAIOCB *acb, *tmp;
+Coroutine *co;
+
+aio_context_acquire(blk_get_aio_context(blk));
+QTAILQ_FOREACH_SAFE(acb, &reinfo->re_aios, list, tmp) {
+if (acb->rwco.ret == NOT_DONE) {
+continue;
+}
+
+blk_inc_in_flight(acb->rwco.blk);
+acb->rwco.ret = NOT_DONE;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(acb->co_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+blk_rehandle_remove_aiocb(acb->rwco.blk, acb);
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+}
+aio_context_release(blk_get_aio_context(blk));
+}
+
+static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
+{
+if (acb->has_returned) {
+blk_dec_in_flight(acb->rwco.blk);
+if (acb->rwco.ret == -EIO) {
+blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+return;
+}
+
+acb->common.cb(acb->common.opaque, acb->rwco.ret);
+qemu_aio_unref(acb);
+}
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
 bdrv_register_buf(blk_bs(blk), host, size);
-- 
2.23.0




[RFC PATCH 5/7] virtio-blk: disable I/O hang when resetting

2020-09-27 Thread Ying Fang
All AIOs, including the hanging AIOs, need to be drained when resetting
virtio-blk. So it is necessary to disable I/O hang before resetting,
and to enable I/O hang again after resetting, if I/O hang is enabled.

Signed-off-by: Ying Fang 
Signed-off-by: Jiahui Cen 
---
 hw/block/virtio-blk.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 2204ba149e..11837a54f5 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -892,6 +892,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 AioContext *ctx;
 VirtIOBlockReq *req;
 
+if (blk_iohang_is_enabled(s->blk)) {
+blk_rehandle_disable(s->blk);
+}
+
 ctx = blk_get_aio_context(s->blk);
 aio_context_acquire(ctx);
 blk_drain(s->blk);
@@ -909,6 +913,10 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 
 assert(!s->dataplane_started);
 blk_set_enable_write_cache(s->blk, s->original_wce);
+
+if (blk_iohang_is_enabled(s->blk)) {
+blk_rehandle_enable(s->blk);
+}
 }
 
 /* coalesce internal state, copy to pci i/o region 0
-- 
2.23.0




[RFC PATCH 1/7] block-backend: introduce I/O rehandle info

2020-09-27 Thread Ying Fang
The I/O hang feature is realized based on a rehandle mechanism.
Each block backend will have a list to store hanging block AIOs,
and a timer to regularly resend these AIOs. In order to issue
an AIO again, each block AIO also needs to store its coroutine entry.

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 
---
 block/block-backend.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 24dd0670d1..bf104a7cf5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -35,6 +35,18 @@
 
 static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
+/* block backend rehandle timer interval 5s */
+#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
+
+typedef struct BlockBackendRehandleInfo {
+bool enable;
+QEMUTimer *ts;
+unsigned timer_interval_ms;
+
+unsigned int in_flight;
+QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+} BlockBackendRehandleInfo;
+
 typedef struct BlockBackendAioNotifier {
 void (*attached_aio_context)(AioContext *new_context, void *opaque);
 void (*detach_aio_context)(void *opaque);
@@ -95,6 +107,8 @@ struct BlockBackend {
  * Accessed with atomic ops.
  */
 unsigned int in_flight;
+
+BlockBackendRehandleInfo reinfo;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 qemu_co_queue_init(&blk->queued_requests);
 notifier_list_init(&blk->remove_bs_notifiers);
 notifier_list_init(&blk->insert_bs_notifiers);
+
 QLIST_INIT(&blk->aio_notifiers);
 
 QTAILQ_INSERT_TAIL(&block_backends, blk, link);
@@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB {
 BlkRwCo rwco;
 int bytes;
 bool has_returned;
+
+/* for rehandle */
+CoroutineEntry *co_entry;
+QTAILQ_ENTRY(BlkAioEmAIOCB) list;
 } BlkAioEmAIOCB;
 
 static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-- 
2.23.0




[RFC PATCH 3/7] block-backend: add I/O hang timeout

2020-09-27 Thread Ying Fang
Not all errors can be recovered from, so it is better to add a rehandle
timeout for I/O hang.

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 
---
 block/block-backend.c  | 99 +-
 include/sysemu/block-backend.h |  2 +
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 90f1ca5753..d0b2b59f55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB 
*acb);
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
 
+enum BlockIOHangStatus {
+BLOCK_IO_HANG_STATUS_NORMAL = 0,
+BLOCK_IO_HANG_STATUS_HANG,
+};
+
 typedef struct BlockBackendRehandleInfo {
 bool enable;
 QEMUTimer *ts;
@@ -109,6 +114,11 @@ struct BlockBackend {
 unsigned int in_flight;
 
 BlockBackendRehandleInfo reinfo;
+
+int64_t iohang_timeout; /* The I/O hang timeout value in sec. */
+int64_t iohang_time;/* The I/O hang start time */
+bool is_iohang_timeout;
+int iohang_status;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -2480,20 +2490,107 @@ static void blk_rehandle_timer_cb(void *opaque)
 aio_context_release(blk_get_aio_context(blk));
 }
 
+static bool blk_iohang_handle(BlockBackend *blk, int new_status)
+{
+int64_t now;
+int old_status = blk->iohang_status;
+bool need_rehandle = false;
+
+switch (new_status) {
+case BLOCK_IO_HANG_STATUS_NORMAL:
+if (old_status == BLOCK_IO_HANG_STATUS_HANG) {
+/* Case when I/O Hang is recovered */
+blk->is_iohang_timeout = false;
+blk->iohang_time = 0;
+}
+break;
+case BLOCK_IO_HANG_STATUS_HANG:
+if (old_status != BLOCK_IO_HANG_STATUS_HANG) {
+/* Case when I/O hang is first triggered */
+blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+need_rehandle = true;
+} else {
+if (!blk->is_iohang_timeout) {
+now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+if (now >= (blk->iohang_time + blk->iohang_timeout)) {
+/* Case when I/O hang is timeout */
+blk->is_iohang_timeout = true;
+} else {
+/* Case when I/O hang is continued */
+need_rehandle = true;
+}
+}
+}
+break;
+default:
+break;
+}
+
+blk->iohang_status = new_status;
+return need_rehandle;
+}
+
+static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout)
+{
+bool need_rehandle = false;
+
+/* Rehandle aio which returns EIO before hang timeout */
+if (acb->rwco.ret == -EIO) {
+if (acb->rwco.blk->is_iohang_timeout) {
+/* I/O hang has timeout and not recovered */
+*has_timeout = true;
+} else {
+need_rehandle = blk_iohang_handle(acb->rwco.blk,
+  BLOCK_IO_HANG_STATUS_HANG);
+/* I/O hang timeout first trigger */
+if (acb->rwco.blk->is_iohang_timeout) {
+*has_timeout = true;
+}
+}
+}
+
+return need_rehandle;
+}
+
 static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 {
+bool has_timeout = false;
+bool need_rehandle = false;
+
 if (acb->has_returned) {
 blk_dec_in_flight(acb->rwco.blk);
-if (acb->rwco.ret == -EIO) {
+need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+if (need_rehandle) {
 blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
 return;
 }
 
 acb->common.cb(acb->common.opaque, acb->rwco.ret);
+
+/* I/O hang return to normal status */
+if (!has_timeout) {
+blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL);
+}
+
 qemu_aio_unref(acb);
 }
 }
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
+{
+if (!blk) {
+return;
+}
+
+blk->is_iohang_timeout = false;
+blk->iohang_time = 0;
+blk->iohang_timeout = 0;
+blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
+if (iohang_timeout > 0) {
+blk->iohang_timeout = iohang_timeout;
+}
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
 bdrv_register_buf(blk_bs(blk), host, size);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..bfebe3a960 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
+
 #endif
-- 
2.23.0




[RFC PATCH 6/7] qemu-option: add I/O hang timeout option

2020-09-27 Thread Ying Fang
The I/O hang timeout should differ under different situations, so it is
better to provide an option that lets the user set the I/O hang timeout for
each block device.
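
For illustration, with this option a per-device timeout would be given on the
command line roughly as follows (file path hypothetical; the value is in
seconds, matching the QEMU_CLOCK_REALTIME millisecond-to-second arithmetic in
blk_iohang_handle()):

-drive file=/path/to/disk.img,format=raw,if=none,id=drive0,iohang-timeout=30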

Signed-off-by: Jiahui Cen 
Signed-off-by: Ying Fang 
---
 blockdev.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..ff8cdcd497 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,6 +500,7 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 BlockdevDetectZeroesOptions detect_zeroes =
 BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
 const char *throttling_group = NULL;
+int64_t iohang_timeout = 0;
 
 /* Check common options by copying from bs_opts to opts, all other options
  * stay in bs_opts for processing by bdrv_open(). */
@@ -622,6 +623,12 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 
 bs->detect_zeroes = detect_zeroes;
 
+/* init timeout value for I/O Hang */
+iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0);
+if (iohang_timeout > 0) {
+blk_iohang_init(blk, iohang_timeout);
+}
+
 block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);
 
 if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
@@ -3786,6 +3793,10 @@ QemuOptsList qemu_common_drive_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "whether to account for failed I/O operations "
 "in the statistics",
+},{
+.name = "iohang-timeout",
+.type = QEMU_OPT_NUMBER,
+.help = "timeout value for I/O Hang",
 },
 { /* end of list */ }
 },
-- 
2.23.0




[RFC PATCH 0/7] block-backend: Introduce I/O hang

2020-09-27 Thread Ying Fang
A VM in the cloud environment may use a virtual disk as the backend storage,
and there are usually filesystems on the virtual block device. When the backend
storage is temporarily down, any I/O issued to the virtual block device will
cause an error. For example, an error in an ext4 filesystem would make
the filesystem read-only. However, a cloud backend storage can often be
recovered soon; for example, an IP-SAN may be down due to network failure and
will be online soon after the network is recovered. The error in the
filesystem may not be recovered without a device reattach or a system restart,
so an I/O rehandle mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It can rehandle AIOs
that fail with EIO without sending the error back to the guest. From the
guest's point of view it just looks like an I/O request is hanging rather
than failing. The guest gets back to running smoothly once I/O recovers,
with this feature enabled.


Ying Fang (7):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O hang drain when disabled
  virtio-blk: disable I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event

 block/block-backend.c  | 285 +
 blockdev.c |  11 ++
 hw/block/virtio-blk.c  |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json   |  26 +++
 5 files changed, 335 insertions(+)

-- 
2.23.0




[RFC PATCH 4/7] block-backend: add I/O hang drain when disabled

2020-09-27 Thread Ying Fang
To disable I/O hang, all hanging AIOs need to be drained. A rehandle status
field is introduced to tell the rehandle mechanism not to rehandle failed AIOs
while I/O hang is being disabled.

Signed-off-by: Ying Fang 
Signed-off-by: Jiahui Cen 
---
 block/block-backend.c  | 85 --
 include/sysemu/block-backend.h |  3 ++
 2 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0b2b59f55..95b2d6a679 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL   5000
+#define BLOCK_BACKEND_REHANDLE_NORMAL   1
+#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED  2
+#define BLOCK_BACKEND_REHANDLE_DRAINED  3
 
 enum BlockIOHangStatus {
 BLOCK_IO_HANG_STATUS_NORMAL = 0,
@@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo {
 
 unsigned int in_flight;
 QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+
+int status;
 } BlockBackendRehandleInfo;
 
 typedef struct BlockBackendAioNotifier {
@@ -471,6 +476,8 @@ static void blk_delete(BlockBackend *blk)
 assert(!blk->refcnt);
 assert(!blk->name);
 assert(!blk->dev);
+assert(atomic_read(&blk->reinfo.in_flight) == 0);
+blk_rehandle_disable(blk);
 if (blk->public.throttle_group_member.throttle_state) {
 blk_io_limits_disable(blk);
 }
@@ -2460,6 +2467,37 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, 
BlkAioEmAIOCB *acb)
 atomic_dec(>reinfo.in_flight);
 }
 
+static void blk_rehandle_drain(BlockBackend *blk)
+{
+if (blk_bs(blk)) {
+bdrv_drained_begin(blk_bs(blk));
+BDRV_POLL_WHILE(blk_bs(blk), atomic_read(&blk->reinfo.in_flight) > 0);
+bdrv_drained_end(blk_bs(blk));
+}
+}
+
+static bool blk_rehandle_is_paused(BlockBackend *blk)
+{
+return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED ||
+   blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED;
+}
+
+static void blk_rehandle_pause(BlockBackend *blk)
+{
+BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+aio_context_acquire(blk_get_aio_context(blk));
+if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_DRAINED) {
+aio_context_release(blk_get_aio_context(blk));
+return;
+}
+
+reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED;
+blk_rehandle_drain(blk);
+reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED;
+aio_context_release(blk_get_aio_context(blk));
+}
+
 static void blk_rehandle_timer_cb(void *opaque)
 {
 BlockBackend *blk = opaque;
@@ -2559,10 +2597,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB 
*acb)
 
 if (acb->has_returned) {
 blk_dec_in_flight(acb->rwco.blk);
-need_rehandle = blk_rehandle_aio(acb, &has_timeout);
-if (need_rehandle) {
-blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
-return;
+if (!blk_rehandle_is_paused(acb->rwco.blk)) {
+need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+if (need_rehandle) {
+blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+return;
+}
 }
 
 acb->common.cb(acb->common.opaque, acb->rwco.ret);
@@ -2576,6 +2616,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 }
 }
 
+void blk_rehandle_enable(BlockBackend *blk)
+{
+BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+aio_context_acquire(blk_get_aio_context(blk));
+if (reinfo->enable) {
+aio_context_release(blk_get_aio_context(blk));
+return;
+}
+
+reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME,
+   SCALE_MS, blk_rehandle_timer_cb, blk);
+reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL;
+reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL;
+reinfo->enable = true;
+aio_context_release(blk_get_aio_context(blk));
+}
+
+void blk_rehandle_disable(BlockBackend *blk)
+{
+if (!blk->reinfo.enable) {
+return;
+}
+
+blk_rehandle_pause(blk);
+timer_del(blk->reinfo.ts);
+timer_free(blk->reinfo.ts);
+blk->reinfo.ts = NULL;
+blk->reinfo.enable = false;
+}
+
+bool blk_iohang_is_enabled(BlockBackend *blk)
+{
+return blk->iohang_timeout != 0;
+}
+
 void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
 {
 if (!blk) {
@@ -2588,6 +2664,7 @@ void blk_iohang_init(BlockBackend *blk, int64_t 
iohang_timeout)
 blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
 if (iohang_timeout > 0) {
 blk->iohang_timeout = iohang_timeout;
+blk_rehandle_enable(blk);
 }
 }
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index bfebe3

Re: [RFC PATCH 07/12] hw/acpi/aml-build: add processor hierarchy node structure

2020-09-17 Thread Ying Fang




On 9/17/2020 4:27 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:28AM +0800, Ying Fang wrote:

Add the processor hierarchy node structures to build ACPI information
for CPU topology. Three helpers are introduced:

(1) build_socket_hierarchy for the socket description structure
(2) build_processor_hierarchy for the processor description structure
(3) build_smt_hierarchy for the thread (logical processor) description structure

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
  hw/acpi/aml-build.c | 37 +
  include/hw/acpi/aml-build.h |  7 +++
  2 files changed, 44 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f6fbc9b95d..13eb6e1345 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
   table_data->len - slit_start, 1, NULL, NULL);
  }
  
+/*

+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}


I see you took this from
https://patchwork.ozlabs.org/project/qemu-devel/patch/20180704124923.32483-6-drjo...@redhat.com/
(even though you neglected to mention that)

I've tweaked my implementation of it slightly per Igor's comments for the
refresh. See build_processor_hierarchy_node() in
https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11

Ok, I will sync with your work and test it.



+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);/* Type 0 - processor */
+build_append_byte(tbl, 20);   /* Length, add private resources */
+build_append_int_noprefix(tbl, 0, 2); /* Reserved */
+build_append_int_noprefix(tbl, 0x0e, 4);/* Processor is a thread */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);   /* Num of private resources */
+}
+
  /* build rev1/rev3/rev5.1 FADT */
  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
  const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index d27da03d64..ff4c6a38f3 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
  
  void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
  
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);

+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);


Why add build_socket_hierarchy() and build_smt_hierarchy() ?


To distinguish between the socket, core and thread topology levels,
build_socket_hierarchy and build_smt_hierarchy are introduced.
They make the code logic in build_pptt much more
straightforward, I think.





+
  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
  const char *oem_id, const char *oem_table_id);
  
--

2.23.0



Thanks,
drew

.





Re: [RFC PATCH 09/12] target/arm/cpu: Add CPU cache description for arm

2020-09-17 Thread Ying Fang




On 9/17/2020 4:39 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:30AM +0800, Ying Fang wrote:

Add the CPUCacheInfo structure to hold CPU cache information for ARM CPUs.
A classic three-level cache topology is used here. Default cache
capacities are given and userspace can overwrite these values.


Doesn't TCG already have some sort of fake cache hierarchy? If so, then


TCG may have some sort of fake cache hierarchy via CCSIDR.


we shouldn't be adding another one, we should be simply describing the
one we already have. For KVM, we shouldn't describe anything other than
what is actually on the host.


Yes, agreed. Cache capacity should be consistent with the host, otherwise it
may have a bad impact on guest performance. We can do that by querying the
host and making cache capacity configurable from userspace.

Dario Faggioli is going to give a talk about it in KVM forum [1].

[1] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse?iframe=no&w=100%&sidebar=yes&bg=no


Thanks.



drew

.





Re: [RFC PATCH 06/12] hw/arm/virt-acpi-build: distinguish possible and present cpus

2020-09-17 Thread Ying Fang




On 9/17/2020 4:20 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:27AM +0800, Ying Fang wrote:

When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
  hw/arm/virt-acpi-build.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)


I approached this in a different way in the refresh, so this patch
was dropped, but the refresh is completely untested, so something similar
may still be necessary.


Nice work, I'll open it and take a look.
Thanks.


Thanks,
drew

.





Re: [RFC PATCH 04/12] device_tree: add qemu_fdt_add_path

2020-09-17 Thread Ying Fang




On 9/17/2020 4:12 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:25AM +0800, Ying Fang wrote:

From: Andrew Jones 

qemu_fdt_add_path works like qemu_fdt_add_subnode, except it
also recursively adds any missing parent nodes.

Cc: Peter Crosthwaite 
Cc: Alexander Graf 
Signed-off-by: Andrew Jones 
---
  device_tree.c| 24 
  include/sysemu/device_tree.h |  1 +
  2 files changed, 25 insertions(+)

diff --git a/device_tree.c b/device_tree.c
index b335dae707..1854be3a02 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
  return retval;
  }
  
+int qemu_fdt_add_path(void *fdt, const char *path)

+{
+char *parent;
+int offset;
+
+offset = fdt_path_offset(fdt, path);
+if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
+error_report("%s Couldn't find node %s: %s", __func__, path,
+ fdt_strerror(offset));
+exit(1);
+}
+
+if (offset != -FDT_ERR_NOTFOUND) {
+return offset;
+}
+
+parent = g_strdup(path);
+strrchr(parent, '/')[0] = '\0';
+qemu_fdt_add_path(fdt, parent);
+g_free(parent);
+
+return qemu_fdt_add_subnode(fdt, path);
+}


Igor didn't like the recursion when I posted this before so I changed
it when doing the refresh[*] that I gave to Salil Mehta. Salil also
works for Huawei, are you guys not working together?

[*] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh
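
For reference, an iteration-based variant might look like the sketch below
(an assumption-laden sketch, not the refresh branch's actual code: it assumes
a well-formed absolute path and leans on libfdt's *_namelen helpers):

int qemu_fdt_add_path(void *fdt, const char *path)
{
    const char *name;
    int namelen, parent = 0, offset = 0;

    assert(path[0] == '/');

    do {
        name = path + 1;                      /* component after the '/' */
        path = strchr(name, '/');             /* next separator, if any */
        namelen = path ? path - name : strlen(name);

        offset = fdt_subnode_offset_namelen(fdt, parent, name, namelen);
        if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
            error_report("%s: Couldn't find node %.*s: %s", __func__,
                         namelen, name, fdt_strerror(offset));
            exit(1);
        }
        if (offset == -FDT_ERR_NOTFOUND) {
            offset = fdt_add_subnode_namelen(fdt, parent, name, namelen);
            if (offset < 0) {
                error_report("%s: Couldn't add node %.*s: %s", __func__,
                             namelen, name, fdt_strerror(offset));
                exit(1);
            }
        }
        parent = offset;                      /* descend one level */
    } while (path);

    return offset;
}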


Thanks for the sync. I'll look into it. I did not know about the refresh
and the effort Salil Mehta has made on this. We are not in the same dept 
and work for different projects.


Thanks Ying.


Thanks,
drew


+
  void qemu_fdt_dumpdtb(void *fdt, int size)
  {
  const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb");
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
  uint32_t qemu_fdt_alloc_phandle(void *fdt);
  int qemu_fdt_nop_node(void *fdt, const char *node_path);
  int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
  
  #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \

  do {  
\
--
2.23.0




.





Re: [RFC PATCH 03/12] target/arm/kvm32: make MPIDR consistent with CPU Topology

2020-09-17 Thread Ying Fang




On 9/17/2020 4:07 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:24AM +0800, Ying Fang wrote:

MPIDR helps to provide an additional PE identification in a multiprocessor
system. This patch adds support for setting MPIDR from userspace, so that
MPIDR is consistent with the configured CPU topology.

Signed-off-by: Ying Fang 
---
  target/arm/kvm32.c | 46 ++
  1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 0af46b41c8..85694dc8bf 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c


This file no longer exists in mainline. Please rebase the whole series.

Thanks, it is gone. Will rebase it.


Thanks,
drew

.





Re: [RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology

2020-09-17 Thread Ying Fang




On 9/17/2020 6:59 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 09:53:35AM +0200, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:23AM +0800, Ying Fang wrote:

MPIDR helps to provide an additional PE identification in a multiprocessor
system. This patch adds support for setting MPIDR from userspace, so that
MPIDR is consistent with the configured CPU topology.

Signed-off-by: Ying Fang 
---
  target/arm/kvm64.c | 46 ++
  1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index ef1e960285..fcce261a10 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs)
  
  #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
  
+static int kvm_arm_set_mp_affinity(CPUState *cs)

+{
+uint64_t mpidr;
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) {
+/* Make MPIDR consistent with CPU topology */
+MachineState *ms = MACHINE(qdev_get_machine());
+
+mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT;


We should query KVM first to determine if it wants guests to see their PEs
as threads or not. If not, and ms->smp.threads is > 1, then that's an
error. And, in any case, if ms->smp.threads == 1, then we shouldn't waste
aff0 on it, as that could reduce IPI broadcast performance.


+mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores)
+& 0xff) << ARM_AFF1_SHIFT;
+mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads)
+& 0xff) << ARM_AFF2_SHIFT;


Also, as pointed out in the KVM thread, we should not be attempting to
describe topology with the MPIDR at all. Alexandru pointed out [*] as
evidence for that.

However, we do need to consider the limits on Aff0 imposed by the GIC.
See hw/arm/virt.c:virt_cpu_mp_affinity() for how we currently do it
for TCG. We should do something similar for KVM guests when we're taking
full control of the MPIDR.

[*] 
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=3102bc0e6ac7

Thanks,
drew


Thanks for your information on MPIDR. As described in [*], MPIDR cannot
be trusted to reflect the actual topology. After applying:
arm64: topology: Stop using MPIDR for topology information

can we just use the topology information from ACPI or fdt as the topology
and ignore MPIDR?





+
+/* Override mp affinity when KVM is in use */
+cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
+
+/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */
+mpidr |= (1ULL << 31);
+return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr);
+} else {
+/*
+ * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its
+ * own idea about MPIDR assignment, so we override our defaults with
+ * what we get from KVM.
+ */
+int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+if (ret) {
+error_report("failed to set MPIDR");


We don't need this error, kvm_get_one_reg() has trace support already.
Anyway, the wording is wrong since it says 'set' instead of 'get'.


+return ret;
+}
+cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
+return ret;
+}
+}
+
  int kvm_arch_init_vcpu(CPUState *cs)
  {
  int ret;
-uint64_t mpidr;
  ARMCPU *cpu = ARM_CPU(cs);
  CPUARMState *env = &cpu->env;
  
@@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs)

  }
  }
  
-/*
- * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
- * Currently KVM has its own idea about MPIDR assignment, so we
- * override our defaults with what we get from KVM.
- */
-ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+ret = kvm_arm_set_mp_affinity(cs);
  if (ret) {
  return ret;
  }
-cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
  
  kvm_arm_init_debug(cs);
  
--

2.23.0




Thanks,
drew







Re: [RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology

2020-09-17 Thread Ying Fang




On 9/17/2020 3:53 PM, Andrew Jones wrote:

On Thu, Sep 17, 2020 at 11:20:23AM +0800, Ying Fang wrote:

MPIDR helps to provide an additional PE identification in a multiprocessor
system. This patch adds support for setting MPIDR from userspace, so that
MPIDR is consistent with CPU topology configured.

Signed-off-by: Ying Fang 
---
  target/arm/kvm64.c | 46 ++
  1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index ef1e960285..fcce261a10 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs)
  
  #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
  
+static int kvm_arm_set_mp_affinity(CPUState *cs)
+{
+uint64_t mpidr;
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) {
+/* Make MPIDR consistent with CPU topology */
+MachineState *ms = MACHINE(qdev_get_machine());
+
+mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT;


We should query KVM first to determine if it wants guests to see their PEs
as threads or not. If not, and ms->smp.threads is > 1, then that's an
error. And, in any case, if ms->smp.threads == 1, then we shouldn't waste
aff0 on it, as that could reduce IPI broadcast performance.


Yes, good catch. Should check against smp.threads before filling
the MPIDR value.




+mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores)
+& 0xff) << ARM_AFF1_SHIFT;
+mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads)
+& 0xff) << ARM_AFF2_SHIFT;
+
+/* Override mp affinity when KVM is in use */
+cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
+
+/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */
+mpidr |= (1ULL << 31);
+return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr);
+} else {
+/*
+ * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its
+ * own idea about MPIDR assignment, so we override our defaults with
+ * what we get from KVM.
+ */
+int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+if (ret) {
+error_report("failed to set MPIDR");


We don't need this error, kvm_get_one_reg() has trace support already.
Anyway, the wording is wrong since it says 'set' instead of 'get'.


Yes, my carelessness; I will fix it.




+return ret;
+}
+cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
+return ret;
+}
+}
+
  int kvm_arch_init_vcpu(CPUState *cs)
  {
  int ret;
-uint64_t mpidr;
  ARMCPU *cpu = ARM_CPU(cs);
  CPUARMState *env = &cpu->env;
  
@@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs)

  }
  }
  
-/*
- * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
- * Currently KVM has its own idea about MPIDR assignment, so we
- * override our defaults with what we get from KVM.
- */
-ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+ret = kvm_arm_set_mp_affinity(cs);
  if (ret) {
  return ret;
  }
-cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
  
  kvm_arm_init_debug(cs);
  
--

2.23.0




Thanks,
drew






[RFC PATCH 12/12] hw/arm/virt-acpi-build: Enable CPU cache topology

2020-09-16 Thread Ying Fang
A helper struct AcpiCacheOffset is introduced to describe the offsets
of the three cache levels. The cache hierarchy is built according to
ACPI spec v6.3 5.2.29.2. Let's enable CPU cache topology now.

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 19 +-
 hw/arm/virt-acpi-build.c| 52 -
 include/hw/acpi/acpi-defs.h |  6 +
 include/hw/acpi/aml-build.h |  7 ++---
 4 files changed, 68 insertions(+), 16 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 123eb032cd..f8d74f3f10 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1783,27 +1783,32 @@ void build_cache_hierarchy(GArray *tbl,
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
-void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+void build_socket_hierarchy(GArray *tbl, uint32_t parent,
+uint32_t offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 24); /* Length, 1 private resource */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
 build_append_int_noprefix(tbl, parent, 4);  /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 1, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset, 4);  /* Private resources */
 }
 
-void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-   uint32_t parent, uint32_t id)
+void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent,
+   AcpiCacheOffset offset, uint32_t id)
 {
 build_append_byte(tbl, 0);  /* Type 0 - processor */
-build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_byte(tbl, 32); /* Length, 3 private resources */
 build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
 build_append_int_noprefix(tbl, flags, 4);  /* Flags */
 build_append_int_noprefix(tbl, parent, 4); /* Parent */
 build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
-build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, 3, 4);  /* Number of private resources */
+build_append_int_noprefix(tbl, offset.l1d_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l1i_offset, 4); /* Private resources */
+build_append_int_noprefix(tbl, offset.l2_offset, 4);  /* Private resources */
 }
 
 void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index b5aa3d3c83..375fb9e24f 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -594,29 +594,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
-static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache,
+   AcpiCacheInfo *acpi_cache)
 {
+acpi_cache->size = cpu_cache->size;
+acpi_cache->sets = cpu_cache->sets;
+acpi_cache->associativity = cpu_cache->associativity;
+acpi_cache->attributes = cpu_cache->attributes;
+acpi_cache->line_size = cpu_cache->line_size;
+}
+
+static void build_pptt(GArray *table_data, BIOSLinker *linker,
+   VirtMachineState *vms)
+{
+MachineState *ms = MACHINE(vms);
 int pptt_start = table_data->len;
 int uid = 0, cpus = 0, socket;
 unsigned int smp_cores = ms->smp.cores;
 unsigned int smp_threads = ms->smp.threads;
+AcpiCacheOffset offset;
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus));
+AcpiCacheInfo cache_info;
 
 acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
 for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
-uint32_t socket_offset = table_data->len - pptt_start;
+uint32_t l3_offset = table_data->len - pptt_start;
+uint32_t socket_offset;
 int core;
 
-build_socket_hierarchy(table_data, 0, socket);
+/* L3 cache type structure */
+arm_acpi_cache_info(cpu->caches.l3_cache, &cache_info);
+build_cache_hierarchy(table_data, 0, &cache_info);
+
+socket_offset = table_data->len - pptt_start;
+build_socket_hierarchy(table_data, 0, l3_offset, socket);
 
 for (core = 0; core < smp_cores; core++) {
uint32_t core_offset = table_data->len - pptt_start;

[RFC PATCH 11/12] hw/acpi/aml-build: build ACPI CPU cache topology information

2020-09-16 Thread Ying Fang
To build cache information, an AcpiCacheInfo structure is defined to
hold the Type 1 cache structure according to ACPI spec v6.3 5.2.29.2.
A helper function build_cache_hierarchy is introduced to encode the
cache information.

Signed-off-by: Ying Fang 
---
 hw/acpi/aml-build.c | 26 ++
 include/hw/acpi/acpi-defs.h |  8 
 include/hw/acpi/aml-build.h |  3 +++
 3 files changed, 37 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 13eb6e1345..123eb032cd 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1754,6 +1754,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/* ACPI 6.3: 5.29.2 Cache type structure (Type 1) */
+static void build_cache_head(GArray *tbl, uint32_t next_level)
+{
+build_append_byte(tbl, 1);  /* Type 1 - cache */
+build_append_byte(tbl, 24); /* Length */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 0x7f, 4);  /* Flags: all fields valid */
+build_append_int_noprefix(tbl, next_level, 4);  /* Next level of cache */
+}
+
+static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info)
+{
+build_append_int_noprefix(tbl, cache_info->size, 4);  /* Size */
+build_append_int_noprefix(tbl, cache_info->sets, 4);  /* Number of sets */
+build_append_byte(tbl, cache_info->associativity);    /* Associativity */
+build_append_byte(tbl, cache_info->attributes);       /* Attributes */
+build_append_int_noprefix(tbl, cache_info->line_size, 2);  /* Line size */
+}
+
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info)
+{
+build_cache_head(tbl, next_level);
+build_cache_tail(tbl, cache_info);
+}
+
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 38a42f409a..3df38ab449 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,12 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+typedef struct AcpiCacheInfo {
+uint32_t size;
+uint32_t sets;
+uint8_t  associativity;
+uint8_t  attributes;
+uint16_t line_size;
+} AcpiCacheInfo;
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index ff4c6a38f3..ced1ae6a83 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -435,6 +435,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_cache_hierarchy(GArray *tbl,
+  uint32_t next_level, AcpiCacheInfo *cache_info);
+
 void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
 
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-- 
2.23.0




[RFC PATCH 10/12] hw/arm/virt: add fdt cache information

2020-09-16 Thread Ying Fang
Support devicetree CPU cache information descriptions

Signed-off-by: Ying Fang 
---
 hw/arm/virt.c | 91 +++
 1 file changed, 91 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 71f7dbb317..74b748ae35 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -343,6 +343,89 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms)
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags);
 }
 
+static void fdt_add_l3cache_nodes(const VirtMachineState *vms)
+{
+int i;
+const MachineState *ms = MACHINE(vms);
+ARMCPU *cpu = ARM_CPU(first_cpu);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+
+for (i = 0; i < sockets; i++) {
+char *nodename = g_strdup_printf("/cpus/l3-cache%d", i);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l3_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l3_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l3_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+}
+
+static void fdt_add_l2cache_nodes(const VirtMachineState *vms)
+{
+int i, j;
+const MachineState *ms = MACHINE(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int sockets = ms->smp.max_cpus / smp_cores;
+ARMCPU *cpu = ARM_CPU(first_cpu);
+
+for (i = 0; i < sockets; i++) {
+char *next_path = g_strdup_printf("/cpus/l3-cache%d", i);
+for (j = 0; j < smp_cores; j++) {
+char *nodename = g_strdup_printf("/cpus/l2-cache%d",
+  i * smp_cores + j);
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache");
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size",
+  cpu->caches.l2_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size",
+  cpu->caches.l2_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets",
+  cpu->caches.l2_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename,
+  "next-level-cache", next_path);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+g_free(nodename);
+}
+g_free(next_path);
+}
+}
+
+static void fdt_add_l1cache_prop(const VirtMachineState *vms,
+char *nodename, int cpu_index)
+{
+
+ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index));
+CPUCaches caches = cpu->caches;
+
+char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index);
+
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size",
+  caches.l1d_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size",
+  caches.l1d_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets",
+  caches.l1d_cache->sets);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size",
+  caches.l1i_cache->size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size",
+  caches.l1i_cache->line_size);
+qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets",
+  caches.l1i_cache->sets);
+qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache",
+  cachename);
+g_free(cachename);
+}
+
 static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 {
 int cpu;
@@ -378,6 +461,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
 qemu_fdt_setprop_cell(vms->fdt, "/cpus", &qu

[RFC PATCH 08/12] hw/arm/virt-acpi-build: add PPTT table

2020-09-16 Thread Ying Fang
Add the Processor Properties Topology Table (PPTT) to present CPU topology
information to the guest.
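
For a concrete picture of the node stream this builds (a sketch with
illustrative offsets, not output from the patch), '-smp 16,sockets=2,cores=4,threads=2'
yields:

    /*
     * PPTT
     *   Type 0: socket 0 (flags = 1: physical package; parent = 0)
     *     Type 0: core 0 (parent = socket 0)
     *       Type 0: thread, uid 0 (flags = 0x0e: ID valid, thread, leaf)
     *       Type 0: thread, uid 1
     *     ... cores 1..3 carry uid 2..7 the same way
     *   Type 0: socket 1 ... (uid 8..15)
     *
     * With threads == 1 the core nodes are the leaves and carry the uid
     * directly, built with flags = 2 (ACPI processor ID valid).
     */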

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 42 
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f1d574b5d3..b5aa3d3c83 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -594,6 +594,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
+static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+{
+int pptt_start = table_data->len;
+int uid = 0, cpus = 0, socket;
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_socket_hierarchy(table_data, 0, socket);
+
+for (core = 0; core < smp_cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (smp_threads <= 1) {
+build_processor_hierarchy(table_data, 2, socket_offset, uid++);
+ } else {
+build_processor_hierarchy(table_data, 0, socket_offset, core);
+for (thread = 0; thread < smp_threads; thread++) {
+build_smt_hierarchy(table_data, core_offset, uid++);
+}
+ }
+}
+cpus += smp_cores * smp_threads;
+}
+
+build_header(linker, table_data,
+ (void *)(table_data->data + pptt_start), "PPTT",
+ table_data->len - pptt_start, 2, NULL, NULL);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -834,6 +870,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
 MachineState *ms = MACHINE(vms);
+bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
 
 table_offsets = g_array_new(false, true /* clear */,
 sizeof(uint32_t));
@@ -853,6 +890,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
 
+if (cpu_topology_enabled) {
+acpi_add_table(table_offsets, tables_blob);
+build_pptt(tables_blob, tables->linker, ms);
+}
+
 acpi_add_table(table_offsets, tables_blob);
 build_gtdt(tables_blob, tables->linker, vms);
 
-- 
2.23.0




[RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support

2020-09-16 Thread Ying Fang
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core system. So cpu topology description
is helpful to provide the guest with the right view. Cpu cache information may
also have a slight impact on the sched domain, and even userspace software
may check the cpu cache information to do some optimizations. Thus this patch
series is posted to provide cpu and cache topology support for arm.

To make the cpu topology consistent with MPIDR, a vcpu ioctl
KVM_ARM_SET_MP_AFFINITY is introduced so that userspace can set MPIDR
according to the topology specified [1]. To describe the cpu topology
both fdt and ACPI are supported. To describe the cpu cache information,
a default cache hierarchy is given and can be made configurable later.
The cpu topology is built according to processor hierarchy node structure.
The cpu cache information is built according to cache type structure.

This patch series is partially based on the patches posted by Andrew Jones
years ago [2]. I jumped in on it since some OS vendor cooperative partners
are eager for it. Thanks for Andrew's contribution. Please feel free to reply
to me if there is anything improper.

[1] https://patchwork.kernel.org/cover/11781317
[2] 
https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Andrew Jones (2):
  device_tree: add qemu_fdt_add_path
  hw/arm/virt: DT: add cpu-map

Ying Fang (10):
  linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY
  target/arm/kvm64: make MPIDR consistent with CPU Topology
  target/arm/kvm32: make MPIDR consistent with CPU Topology
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table
  target/arm/cpu: Add CPU cache description for arm
  hw/arm/virt: add fdt cache information
  hw/acpi/aml-build: build ACPI CPU cache topology information
  hw/arm/virt-acpi-build: Enable CPU cache topology

 device_tree.c|  24 +++
 hw/acpi/aml-build.c  |  68 +++
 hw/arm/virt-acpi-build.c |  99 +--
 hw/arm/virt.c| 128 ++-
 include/hw/acpi/acpi-defs.h  |  14 
 include/hw/acpi/aml-build.h  |  11 +++
 include/hw/arm/virt.h|   1 +
 include/sysemu/device_tree.h |   1 +
 linux-headers/linux/kvm.h|   3 +
 target/arm/cpu.c |  42 
 target/arm/cpu.h |  27 
 target/arm/kvm32.c   |  46 ++---
 target/arm/kvm64.c   |  46 ++---
 13 files changed, 488 insertions(+), 22 deletions(-)

-- 
2.23.0




[RFC PATCH 07/12] hw/acpi/aml-build: add processor hierarchy node structure

2020-09-16 Thread Ying Fang
Add the processor hierarchy node structures to build ACPI information
for CPU topology. Three helpers are introduced:

(1) build_socket_hierarchy for socket description structure
(2) build_processor_hierarchy for processor description structure
(3) build_smt_hierarchy for thread (logical processor) description structure
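
All three helpers emit the same 20-byte Type 0 node and differ only in the
flags they hard-code; a sketch of the layout per ACPI 6.3 section 5.2.29.1:

    /*
     * Processor hierarchy node structure (Type 0), 20 bytes when the
     * private-resource list is empty:
     *
     *   byte  0      Type (0)
     *   byte  1      Length (20)
     *   bytes 2-3    Reserved
     *   bytes 4-7    Flags (1: physical package; 0x0e: leaf thread)
     *   bytes 8-11   Parent (offset of the containing node)
     *   bytes 12-15  ACPI processor ID
     *   bytes 16-19  Number of private resources (0)
     */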

Signed-off-by: Ying Fang 
Signed-off-by: Henglong Fan 
---
 hw/acpi/aml-build.c | 37 +
 include/hw/acpi/aml-build.h |  7 +++
 2 files changed, 44 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f6fbc9b95d..13eb6e1345 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/*
+ * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
+ */
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, flags, 4);  /* Flags */
+build_append_int_noprefix(tbl, parent, 4); /* Parent */
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+build_append_byte(tbl, 0);  /* Type 0 - processor */
+build_append_byte(tbl, 20); /* Length, no private resources */
+build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
+build_append_int_noprefix(tbl, 0x0e, 4);  /* Flags: Processor is a thread */
+build_append_int_noprefix(tbl, parent, 4);  /* Parent */
+build_append_int_noprefix(tbl, id, 4);  /* ACPI processor ID */
+build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
+}
+
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index d27da03d64..ff4c6a38f3 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
+void build_processor_hierarchy(GArray *tbl, uint32_t flags,
+   uint32_t parent, uint32_t id);
+
+void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id);
 
-- 
2.23.0




[RFC PATCH 06/12] hw/arm/virt-acpi-build: distinguish possible and present cpus

2020-09-16 Thread Ying Fang
When building ACPI tables regarding CPUs we should always build
them for the number of possible CPUs, not the number of present
CPUs. We then ensure only the present CPUs are enabled.
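
The _STA handling below leans on an ACPI default worth spelling out (a note
on the mechanism, with bit values from ACPI 6.3 section 6.3.7):

    /*
     * A device without a _STA method defaults to 0x0F: present, enabled,
     * shown in UI, and functioning. Only the possible-but-not-present
     * CPUs therefore need an explicit _STA, and aml_int(0) marks them
     * absent until hotplugged.
     */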

Signed-off-by: Andrew Jones 
Signed-off-by: Ying Fang 
---
 hw/arm/virt-acpi-build.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9efd7a3881..f1d574b5d3 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -56,14 +56,18 @@
 
 #define ARM_SPI_BASE 32
 
-static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
+static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
 {
 uint16_t i;
+CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus;
 
-for (i = 0; i < smp_cpus; i++) {
+for (i = 0; i < possible_cpus->len; i++) {
 Aml *dev = aml_device("C%.03X", i);
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007")));
 aml_append(dev, aml_name_decl("_UID", aml_int(i)));
+if (possible_cpus->cpus[i].cpu == NULL) {
+aml_append(dev, aml_name_decl("_STA", aml_int(0)));
+}
 aml_append(scope, dev);
 }
 }
@@ -635,6 +639,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 const int *irqmap = vms->irqmap;
 AcpiMadtGenericDistributor *gicd;
 AcpiMadtGenericMsiFrame *gic_msi;
+int possible_cpus = MACHINE(vms)->possible_cpus->len;
 int i;
 
 acpi_data_push(table_data, sizeof(AcpiMultipleApicTable));
@@ -645,7 +650,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base);
 gicd->version = vms->gic_version;
 
-for (i = 0; i < vms->smp_cpus; i++) {
+for (i = 0; i < possible_cpus; i++) {
 AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data,
sizeof(*gicc));
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i));
@@ -660,7 +665,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 gicc->cpu_interface_number = cpu_to_le32(i);
 gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity);
 gicc->uid = cpu_to_le32(i);
-gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+if (i < vms->smp_cpus) {
+gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED);
+}
 
 if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) {
 gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ));
@@ -764,7 +771,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
  * the RTC ACPI device at all when using UEFI.
  */
 scope = aml_scope("\\_SB");
-acpi_dsdt_add_cpus(scope, vms->smp_cpus);
+acpi_dsdt_add_cpus(scope, vms);
 acpi_dsdt_add_uart(scope, [VIRT_UART],
(irqmap[VIRT_UART] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
-- 
2.23.0




[RFC PATCH 09/12] target/arm/cpu: Add CPU cache description for arm

2020-09-16 Thread Ying Fang
Add the CPUCacheInfo structure to hold CPU cache information for ARM cpus.
A classic three-level cache topology is used here. Default cache
capacities are given and userspace can override these values.
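
As a sanity check on these defaults (a worked check, not part of the patch),
cache geometry should satisfy size = sets * associativity * line_size:

    /* L1: 256 sets * 4 ways * 64 B  = 64 KiB   (matches .size below)
     * L2: 1024 sets * 8 ways * 64 B = 512 KiB  (matches .size below)
     * L3: 2048 sets * 15 ways * 64 B ~= 1.9 MiB, which does not match
     *     the 65536 KiB default below; one of those L3 values likely
     *     wants adjusting.
     */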

Signed-off-by: Ying Fang 
---
 target/arm/cpu.c | 42 ++
 target/arm/cpu.h | 27 +++
 2 files changed, 69 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index c179e0752d..efa8e1974a 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -27,6 +27,7 @@
 #include "qapi/visitor.h"
 #include "cpu.h"
 #include "internals.h"
+#include "qemu/units.h"
 #include "exec/exec-all.h"
 #include "hw/qdev-properties.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -998,6 +999,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz)
 return (Aff1 << ARM_AFF1_SHIFT) | Aff0;
 }
 
+static CPUCaches default_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x02,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 64 * KiB,
+.line_size = 64,
+.associativity = 4,
+.sets = 256,
+.attributes = 0x04,
+},
+.l2_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 2,
+.size = 512 * KiB,
+.line_size = 64,
+.associativity = 8,
+.sets = 1024,
+.attributes = 0x0a,
+},
+.l3_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 3,
+.size = 65536 * KiB,
+.line_size = 64,
+.associativity = 15,
+.sets = 2048,
+.attributes = 0x0a,
+},
+};
+
 static void cpreg_hashtable_data_destroy(gpointer data)
 {
 /*
@@ -1835,6 +1875,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 }
 }
 
+cpu->caches = default_cache_info;
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a1c7d8ebae..e9e3817e20 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -745,6 +745,30 @@ typedef enum ARMPSCIState {
 
 typedef struct ARMISARegisters ARMISARegisters;
 
+/* Cache information type */
+enum CacheType {
+DATA_CACHE,
+INSTRUCTION_CACHE,
+UNIFIED_CACHE
+};
+
+typedef struct CPUCacheInfo {
+enum CacheType type;  /* Cache type */
+uint8_t level;
+uint32_t size;/* Size in bytes */
+uint16_t line_size;   /* Line size in bytes */
+uint8_t associativity;/* Cache associativity */
+uint32_t sets;/* Number of sets */
+uint8_t attributes;   /* Cache attributes */
+} CPUCacheInfo;
+
+typedef struct CPUCaches {
+CPUCacheInfo *l1d_cache;
+CPUCacheInfo *l1i_cache;
+CPUCacheInfo *l2_cache;
+CPUCacheInfo *l3_cache;
+} CPUCaches;
+
 /**
  * ARMCPU:
  * @env: #CPUARMState
@@ -986,6 +1010,9 @@ struct ARMCPU {
 
 /* Generic timer counter frequency, in Hz */
 uint64_t gt_cntfrq_hz;
+
+/* CPU cache information */
+CPUCaches caches;
 };
 
 unsigned int gt_cntfrq_period_ns(ARMCPU *cpu);
-- 
2.23.0




[RFC PATCH 01/12] linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY

2020-09-16 Thread Ying Fang
Signed-off-by: Ying Fang 
---
 linux-headers/linux/kvm.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index a28c366737..461a2302e7 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_ARM_MP_AFFINITY 187
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1470,6 +1471,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_SET_CMMA_BITS  _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 /* Memory Encryption Commands */
 #define KVM_MEMORY_ENCRYPT_OP  _IOWR(KVMIO, 0xba, unsigned long)
+/* Available with KVM_CAP_ARM_MP_AFFINITY */
+#define KVM_ARM_SET_MP_AFFINITY  _IOWR(KVMIO, 0xbb, unsigned long)
 
 struct kvm_enc_region {
__u64 addr;
-- 
2.23.0




[RFC PATCH 04/12] device_tree: add qemu_fdt_add_path

2020-09-16 Thread Ying Fang
From: Andrew Jones 

qemu_fdt_add_path works like qemu_fdt_add_subnode, except it
also recursively adds any missing parent nodes.
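
A usage sketch (not part of the patch) of why the recursion matters for the
cpu-map nodes added later in this series:

    /*
     * One call creates /cpus/cpu-map, cluster0 and core1 as needed, and
     * then thread0; ancestors that already exist are found rather than
     * re-created, and the offset of the final node is returned.
     */
    qemu_fdt_add_path(fdt, "/cpus/cpu-map/cluster0/core1/thread0");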

Cc: Peter Crosthwaite 
Cc: Alexander Graf 
Signed-off-by: Andrew Jones 
---
 device_tree.c| 24 
 include/sysemu/device_tree.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/device_tree.c b/device_tree.c
index b335dae707..1854be3a02 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
 return retval;
 }
 
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+char *parent;
+int offset;
+
+offset = fdt_path_offset(fdt, path);
+if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
+error_report("%s Couldn't find node %s: %s", __func__, path,
+ fdt_strerror(offset));
+exit(1);
+}
+
+if (offset != -FDT_ERR_NOTFOUND) {
+return offset;
+}
+
+parent = g_strdup(path);
+strrchr(parent, '/')[0] = '\0';
+qemu_fdt_add_path(fdt, parent);
+g_free(parent);
+
+return qemu_fdt_add_subnode(fdt, path);
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
 const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb");
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
 do {  \
-- 
2.23.0




[RFC PATCH 05/12] hw/arm/virt: DT: add cpu-map

2020-09-16 Thread Ying Fang
From: Andrew Jones 

Support devicetree CPU topology descriptions.
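
As a worked example of the path construction in this patch (illustrative
only), '-smp 8,sockets=2,cores=2,threads=2' maps vCPU 5 as follows:

    /*
     * cluster = 5 / (2 * 2) = 1
     * core    = (5 / 2) % 2 = 0
     * thread  = 5 % 2       = 1
     *
     * => /cpus/cpu-map/cluster1/core0/thread1, whose "cpu" property is a
     *    phandle pointing at /cpus/cpu@5.
     */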

Signed-off-by: Andrew Jones 
---
 hw/arm/virt.c | 37 -
 include/hw/arm/virt.h |  1 +
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index acf9bfbece..71f7dbb317 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -348,7 +348,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 int cpu;
 int addr_cells = 1;
 const MachineState *ms = MACHINE(vms);
-
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+unsigned int smp_cores = ms->smp.cores;
+unsigned int smp_threads = ms->smp.threads;
+bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
 /*
  * From Documentation/devicetree/bindings/arm/cpus.txt
  *  On ARM v8 64-bit systems value should be set to 2,
@@ -404,8 +407,37 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
 }
 
+qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(vms->fdt));
+
 g_free(nodename);
 }
+if (cpu_topology_enabled) {
+/* Add vcpu topology by fdt node cpu-map. */
+qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
+char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+char *map_path;
+
+if (smp_threads > 1) {
+map_path = g_strdup_printf(
+   "/cpus/cpu-map/%s%d/%s%d/%s%d",
+   "cluster", cpu / (smp_cores * smp_threads),
+   "core", (cpu / smp_threads) % smp_cores,
+   "thread", cpu % smp_threads);
+} else {
+map_path = g_strdup_printf(
+   "/cpus/cpu-map/%s%d/%s%d",
+   "cluster", cpu / smp_cores,
+   "core", cpu % smp_cores);
+}
+qemu_fdt_add_path(vms->fdt, map_path);
+qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+g_free(map_path);
+g_free(cpu_path);
+}
+}
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2553,8 +2585,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 2)
 
 static void virt_machine_5_1_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_5_2_options(mc);
 compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len);
+vmc->ignore_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index dff67e1bef..d37c6b7858 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -119,6 +119,7 @@ typedef struct {
 MachineClass parent;
 bool disallow_affinity_adjustment;
 bool no_its;
+bool ignore_cpu_topology;
 bool no_pmu;
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
-- 
2.23.0




[RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology

2020-09-16 Thread Ying Fang
MPIDR helps to provide an additional PE identification in a multiprocessor
system. This patch adds support for setting MPIDR from userspace, so that
MPIDR is consistent with CPU topology configured.
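
As a worked example of the Aff field arithmetic below (arbitrary numbers,
chosen only for illustration): with '-smp 16,sockets=2,cores=4,threads=2',
vCPU 13 gets Aff0 = 1, Aff1 = 2, Aff2 = 1:

    uint64_t vcpu = 13;                /* kvm_arch_vcpu_id(cs) */
    unsigned threads = 2, cores = 4;   /* from ms->smp */

    uint64_t aff0 = vcpu % threads;            /* 13 % 2 = 1 */
    uint64_t aff1 = vcpu / threads % cores;    /* 13 / 2 % 4 = 2 */
    uint64_t aff2 = vcpu / (cores * threads);  /* 13 / 8 = 1 */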

Signed-off-by: Ying Fang 
---
 target/arm/kvm64.c | 46 ++
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index ef1e960285..fcce261a10 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs)
 
 #define ARM_CPU_ID_MPIDR   3, 0, 0, 0, 5
 
+static int kvm_arm_set_mp_affinity(CPUState *cs)
+{
+uint64_t mpidr;
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) {
+/* Make MPIDR consistent with CPU topology */
+MachineState *ms = MACHINE(qdev_get_machine());
+
+mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT;
+mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores)
+& 0xff) << ARM_AFF1_SHIFT;
+mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads)
+& 0xff) << ARM_AFF2_SHIFT;
+
+/* Override mp affinity when KVM is in use */
+cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
+
+/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */
+mpidr |= (1ULL << 31);
+return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr);
+} else {
+/*
+ * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its
+ * own idea about MPIDR assignment, so we override our defaults with
+ * what we get from KVM.
+ */
+int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+if (ret) {
+error_report("failed to set MPIDR");
+return ret;
+}
+cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
+return ret;
+}
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret;
-uint64_t mpidr;
 ARMCPU *cpu = ARM_CPU(cs);
  CPUARMState *env = &cpu->env;
 
@@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 }
 
-/*
- * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
- * Currently KVM has its own idea about MPIDR assignment, so we
- * override our defaults with what we get from KVM.
- */
-ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr);
+ret = kvm_arm_set_mp_affinity(cs);
 if (ret) {
 return ret;
 }
-cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK;
 
 kvm_arm_init_debug(cs);
 
-- 
2.23.0




[RFC PATCH 03/12] target/arm/kvm32: make MPIDR consistent with CPU Topology

2020-09-16 Thread Ying Fang
MPIDR helps to provide an additional PE identification in a multiprocessor
system. This patch adds support for setting MPIDR from userspace, so that
MPIDR is consistent with CPU topology configured.

Signed-off-by: Ying Fang 
---
 target/arm/kvm32.c | 46 ++
 1 file changed, 38 insertions(+), 8 deletions(-)

diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 0af46b41c8..85694dc8bf 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -201,11 +201,47 @@ int kvm_arm_cpreg_level(uint64_t regidx)
 
 #define ARM_CPU_ID_MPIDR   0, 0, 0, 5
 
+static int kvm_arm_set_mp_affinity(CPUState *cs)
+{
+uint32_t mpidr;
+ARMCPU *cpu = ARM_CPU(cs);
+
+if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) {
+/* Make MPIDR consistent with CPU topology */
+MachineState *ms = MACHINE(qdev_get_machine());
+
+mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT;
+mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores)
+& 0xff) << ARM_AFF1_SHIFT;
+mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads)
+   & 0xff) << ARM_AFF2_SHIFT;
+
+/* Override mp affinity when KVM is in use */
+cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
+
+/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */
+mpidr |= (1ULL << 31);
+return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr);
+} else {
+/*
+ * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its
+ * own idea about MPIDR assignment, so we override our defaults with
+ * what we get from KVM.
+ */
+int ret = kvm_get_one_reg(cs, ARM_CP15_REG32(ARM_CPU_ID_MPIDR), &mpidr);
+if (ret) {
+error_report("failed to set MPIDR");
+return ret;
+}
+cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
+return ret;
+}
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 int ret;
 uint64_t v;
-uint32_t mpidr;
 struct kvm_one_reg r;
 ARMCPU *cpu = ARM_CPU(cs);
 
@@ -244,16 +280,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return -EINVAL;
 }
 
-/*
- * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
- * Currently KVM has its own idea about MPIDR assignment, so we
- * override our defaults with what we get from KVM.
- */
-ret = kvm_get_one_reg(cs, ARM_CP15_REG32(ARM_CPU_ID_MPIDR), &mpidr);
+ret = kvm_arm_set_mp_affinity(cs);
 if (ret) {
 return ret;
 }
-cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK;
 
 /* Check whether userspace can specify guest syndrome value */
 kvm_arm_init_serror_injection(cs);
-- 
2.23.0




Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters

2020-08-13 Thread Ying Fang




On 8/7/2020 4:13 PM, Kevin Wolf wrote:

Am 07.08.2020 um 09:42 hat Ying Fang geschrieben:



On 8/6/2020 5:13 PM, Kevin Wolf wrote:

Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:

From: fangying 

When a qemu or qemu-nbd process uses a qcow2 image configured with
'cache = none', it writes to the image through an in-memory cache of
L2 tables, but the cached L2 entries are not written back without an
explicit flush or the image being closed. The on-disk metadata may
therefore stay inconsistent with the written data for a long time,
and if the process exits abnormally in that window, the issued
written data will be lost.

Therefore, in order to keep data consistency, we need to flush the
changes to the L2 entries to the disk in time for newly allocated
clusters.

Signed-off-by: Ying Fang 


If you want to have data safely written to the disk after each write
request, you need to use cache=writethrough/directsync (in other words,
aliases that are equivalent to setting -device ...,write-cache=off).
Note that this will have a major impact on write performance.

cache=none means bypassing the kernel page cache (O_DIRECT), but not
flushing after each write request.


Well, IIUC, cache=none does not guarantee data safety and we should not
expect that. Then this patch can be ignored.


Indeed, cache=none is a writeback cache mode with all of the
consequences. In practice, this is normally good enough because the
guest OS will send flush requests when needed (e.g. because a guest
application called fsync()), but if the guest doesn't do this, it may
suffer data loss. This behaviour is comparable to a volatile disk cache
on real hard disks and is a good default, but sometimes you need a
writethrough cache mode at the cost of a performance penalty.
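
A minimal sketch of what "the guest OS will send flush requests when needed"
looks like from a guest application (plain POSIX, nothing QEMU-specific):

    #include <unistd.h>

    /* Write, then make it durable: the guest kernel turns the fsync()
     * into a flush request that QEMU honours even with cache=none. */
    static int write_durably(int fd, const void *buf, size_t len)
    {
        if (write(fd, buf, len) != (ssize_t)len) {
            return -1;
        }
        return fsync(fd);
    }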


Sorry for the late reply. Thanks for your detailed explanation of the
'cache' option; I have a much better understanding of it now.


Kevin






Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters

2020-08-07 Thread Ying Fang




On 8/6/2020 5:13 PM, Kevin Wolf wrote:

Am 05.08.2020 um 04:38 hat Ying Fang geschrieben:

From: fangying 

When a qemu or qemu-nbd process uses a qcow2 image configured with
'cache = none', it writes to the image through an in-memory cache of
L2 tables, but the cached L2 entries are not written back without an
explicit flush or the image being closed. The on-disk metadata may
therefore stay inconsistent with the written data for a long time,
and if the process exits abnormally in that window, the issued
written data will be lost.

Therefore, in order to keep data consistency, we need to flush the
changes to the L2 entries to the disk in time for newly allocated
clusters.

Signed-off-by: Ying Fang 


If you want to have data safely written to the disk after each write
request, you need to use cache=writethrough/directsync (in other words,
aliases that are equivalent to setting -device ...,write-cache=off).
Note that this will have a major impact on write performance.

cache=none means bypassing the kernel page cache (O_DIRECT), but not
flushing after each write request.


Well, IIUC, cache=none does not guarantee data safety and we should not
expect that. Then this patch can be ignored.

Thanks.


Kevin






Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters

2020-08-06 Thread Ying Fang




On 8/5/2020 10:43 AM, no-re...@patchew.org wrote:

Patchew URL: https://patchew.org/QEMU/20200805023826.184-1-fangyi...@huawei.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.
I see an error message which says ** No space left on device **.

However, I do not know what is wrong with this build test.
Could you give me some help here?

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
error: copy-fd: write returned No space left on device
fatal: failed to copy file to 
'/var/tmp/patchew-tester-tmp-wtnwtuq5/src/.git/objects/pack/pack-518a8ad92e3ce11d2627a7221e2d360b337cb27d.pack': 
No space left on device

fatal: The remote end hung up unexpectedly
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 53, in git_clone_repo
subprocess.check_call(clone_cmd, stderr=logf, stdout=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", 
line 291, in check_call

raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'clone', '-q', 
'/home/patchew/.cache/patchew-git-cache/httpsgithubcompatchewprojectqemu-3c8cf5a9c21ff8782164d1def7f44bd888713384', 
'/var/tmp/patchew-tester-tmp-wtnwtuq5/src']' returned non-zero exit 
status 128.








The full log is available at
http://patchew.org/logs/20200805023826.184-1-fangyi...@huawei.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com





Re: [PATCH v5 0/2] add new options to set smbios type 4 fields

2020-08-06 Thread Ying Fang




On 8/6/2020 2:01 PM, Michael S. Tsirkin wrote:

On Thu, Aug 06, 2020 at 11:56:32AM +0800, Ying Fang wrote:

From: fangying 

Hi, this patchset was previously posted by my teammate Heyi Guo several
months ago; however, we missed the merge window. It is reposted here to
bring it to completion. Thanks.



Thanks, I will tag it for after the release.
Pls ping me after the release to make sure I don't drop it by mistake.


Yes, I will do that. Hope it won't be missed this time.
Thanks.



Patch description:
  
Common VM users sometimes care about CPU speed, so we add two new
options to allow VM vendors to present CPU speed to their users.
Normally this information can be fetched from the host smbios.

Strictly speaking, the "max speed" and "current speed" in type 4
are not really for the max speed and current speed of processor, for
"max speed" identifies a capability of the system, and "current speed"
identifies the processor's speed at boot (see smbios spec), but some
applications do not tell the difference.

Changelog:

v4 -> v5:
- Rebase patch for latest upstream

v3 -> v4:
- Fix the default value when not specifying "-smbios type=4" option;
it would be 0 instead of 2000 in previous versions
- Use uint64_t type to check value overflow
- Add test case to check smbios type 4 CPU speed
- v4 https://patchwork.kernel.org/cover/11444635/

v2 -> v3:
- Refine comments per Igor's suggestion.

v1 -> v2:
- change "_" in option names to "-"
- check if option value is too large to fit in SMBIOS type 4 speed
fields.

Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 

Ying Fang (2):
   hw/smbios: add options for type 4 max-speed and current-speed
   tests/bios-tables-test: add smbios cpu speed test

  hw/smbios/smbios.c   | 36 ++
  qemu-options.hx  |  2 +-
  tests/bios-tables-test.c | 42 
  3 files changed, 75 insertions(+), 5 deletions(-)

--
2.23.0







[PATCH v5 1/2] hw/smbios: add options for type 4 max-speed and current-speed

2020-08-05 Thread Ying Fang
Common VM users sometimes care about CPU speed, so we add two new
options to allow VM vendors to present CPU speed to their users.
Normally this information can be fetched from the host smbios.

Strictly speaking, the "max speed" and "current speed" in type 4
are not really for the max speed and current speed of processor, for
"max speed" identifies a capability of the system, and "current speed"
identifies the processor's speed at boot (see smbios spec), but some
applications do not tell the difference.

Reviewed-by: Igor Mammedov 
Signed-off-by: Ying Fang 
Signed-off-by: Heyi Guo 
---
 hw/smbios/smbios.c | 36 
 qemu-options.hx|  2 +-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 11d476c4a2..53181a58eb 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -93,9 +93,21 @@ static struct {
 const char *manufacturer, *version, *serial, *asset, *sku;
 } type3;
 
+/*
+ * SVVP requires max_speed and current_speed to be set and not being
+ * 0 which counts as unknown (SMBIOS 3.1.0/Table 21). Set the
+ * default value to 2000MHz as we did before.
+ */
+#define DEFAULT_CPU_SPEED 2000
+
 static struct {
 const char *sock_pfx, *manufacturer, *version, *serial, *asset, *part;
-} type4;
+uint64_t max_speed;
+uint64_t current_speed;
+} type4 = {
+.max_speed = DEFAULT_CPU_SPEED,
+.current_speed = DEFAULT_CPU_SPEED
+};
 
 static struct {
 size_t nvalues;
@@ -273,6 +285,14 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = {
 .name = "version",
 .type = QEMU_OPT_STRING,
 .help = "version number",
+},{
+.name = "max-speed",
+.type = QEMU_OPT_NUMBER,
+.help = "max speed in MHz",
+},{
+.name = "current-speed",
+.type = QEMU_OPT_NUMBER,
+.help = "speed at system boot in MHz",
 },{
 .name = "serial",
 .type = QEMU_OPT_STRING,
@@ -587,9 +607,8 @@ static void smbios_build_type_4_table(MachineState *ms, unsigned instance)
 SMBIOS_TABLE_SET_STR(4, processor_version_str, type4.version);
 t->voltage = 0;
 t->external_clock = cpu_to_le16(0); /* Unknown */
-/* SVVP requires max_speed and current_speed to not be unknown. */
-t->max_speed = cpu_to_le16(2000); /* 2000 MHz */
-t->current_speed = cpu_to_le16(2000); /* 2000 MHz */
+t->max_speed = cpu_to_le16(type4.max_speed);
+t->current_speed = cpu_to_le16(type4.current_speed);
 t->status = 0x41; /* Socket populated, CPU enabled */
 t->processor_upgrade = 0x01; /* Other */
t->l1_cache_handle = cpu_to_le16(0xFFFF); /* N/A */
@@ -1130,6 +1149,15 @@ void smbios_entry_add(QemuOpts *opts, Error **errp)
 save_opt(, opts, "serial");
 save_opt(, opts, "asset");
 save_opt(, opts, "part");
+type4.max_speed = qemu_opt_get_number(opts, "max-speed",
+  DEFAULT_CPU_SPEED);
+type4.current_speed = qemu_opt_get_number(opts, "current-speed",
+  DEFAULT_CPU_SPEED);
+if (type4.max_speed > UINT16_MAX ||
+type4.current_speed > UINT16_MAX) {
+error_setg(errp, "SMBIOS CPU speed is too large (> %d)",
+   UINT16_MAX);
+}
 return;
 case 11:
qemu_opts_validate(opts, qemu_smbios_type11_opts, &err);
diff --git a/qemu-options.hx b/qemu-options.hx
index ea0638e92d..50b068423c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2073,7 +2073,7 @@ DEF("smbios", HAS_ARG, QEMU_OPTION_smbios,
 "  [,sku=str]\n"
 "specify SMBIOS type 3 fields\n"
 "-smbios 
type=4[,sock_pfx=str][,manufacturer=str][,version=str][,serial=str]\n"
-"  [,asset=str][,part=str]\n"
+"  [,asset=str][,part=str][,max-speed=%d][,current-speed=%d]\n"
 "specify SMBIOS type 4 fields\n"
 "-smbios 
type=17[,loc_pfx=str][,bank=str][,manufacturer=str][,serial=str]\n"
 "   [,asset=str][,part=str][,speed=%d]\n"
-- 
2.23.0




[PATCH v5 0/2] add new options to set smbios type 4 fields

2020-08-05 Thread Ying Fang
From: fangying 

Hi, this patchset was previously posted by my teammate Heyi Guo several
months ago; however, we missed the merge window. It is reposted here to
bring it to completion. Thanks.

Patch description:
 
Common VM users sometimes care about CPU speed, so we add two new
options to allow VM vendors to present CPU speed to their users.
Normally this information can be fetched from the host smbios.

Strictly speaking, the "max speed" and "current speed" in type 4
are not really for the max speed and current speed of processor, for
"max speed" identifies a capability of the system, and "current speed"
identifies the processor's speed at boot (see smbios spec), but some
applications do not tell the difference.

Changelog:

v4 -> v5:
- Rebase patch for latest upstream

v3 -> v4:
- Fix the default value when not specifying "-smbios type=4" option;
it would be 0 instead of 2000 in previous versions
- Use uint64_t type to check value overflow
- Add test case to check smbios type 4 CPU speed
- v4 https://patchwork.kernel.org/cover/11444635/

v2 -> v3:
- Refine comments per Igor's suggestion.

v1 -> v2:
- change "_" in option names to "-"
- check if option value is too large to fit in SMBIOS type 4 speed
fields.

Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 

Ying Fang (2):
  hw/smbios: add options for type 4 max-speed and current-speed
  tests/bios-tables-test: add smbios cpu speed test

 hw/smbios/smbios.c   | 36 ++
 qemu-options.hx  |  2 +-
 tests/bios-tables-test.c | 42 
 3 files changed, 75 insertions(+), 5 deletions(-)

-- 
2.23.0




[PATCH v5 2/2] tests/bios-tables-test: add smbios cpu speed test

2020-08-05 Thread Ying Fang
Add an smbios type 4 CPU speed check, since we added new options to set
the smbios type 4 "max speed" and "current speed" fields. The default
value should be 2000 when no option is specified, just as the old
version did.

We add the test case to one machine of each architecture, though it
doesn't really run on the aarch64 platform, since the smbios test can't
run on a UEFI-only platform yet.

Signed-off-by: Ying Fang 
Signed-off-by: Heyi Guo 
---
 tests/bios-tables-test.c | 42 
 1 file changed, 42 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index a356ac3489..6bd165021b 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -37,6 +37,8 @@ typedef struct {
 GArray *tables;
 uint32_t smbios_ep_addr;
 struct smbios_21_entry_point smbios_ep_table;
+uint16_t smbios_cpu_max_speed;
+uint16_t smbios_cpu_curr_speed;
 uint8_t *required_struct_types;
 int required_struct_types_len;
 QTestState *qts;
@@ -516,6 +518,31 @@ static inline bool smbios_single_instance(uint8_t type)
 }
 }
 
+static bool smbios_cpu_test(test_data *data, uint32_t addr)
+{
+uint16_t expect_speed[2];
+uint16_t real;
+int offset[2];
+int i;
+
+/* Check CPU speed for backward compatibility */
+offset[0] = offsetof(struct smbios_type_4, max_speed);
+offset[1] = offsetof(struct smbios_type_4, current_speed);
+expect_speed[0] = data->smbios_cpu_max_speed ? : 2000;
+expect_speed[1] = data->smbios_cpu_curr_speed ? : 2000;
+
+for (i = 0; i < 2; i++) {
+real = qtest_readw(data->qts, addr + offset[i]);
+if (real != expect_speed[i]) {
+fprintf(stderr, "Unexpected SMBIOS CPU speed: real %u expect %u\n",
+real, expect_speed[i]);
+return false;
+}
+}
+
+return true;
+}
+
 static void test_smbios_structs(test_data *data)
 {
 DECLARE_BITMAP(struct_bitmap, SMBIOS_MAX_TYPE+1) = { 0 };
@@ -538,6 +565,10 @@ static void test_smbios_structs(test_data *data)
 }
 set_bit(type, struct_bitmap);
 
+if (type == 4) {
+g_assert(smbios_cpu_test(data, addr));
+}
+
 /* seek to end of unformatted string area of this struct ("\0\0") */
 prv = crt = 1;
 while (prv || crt) {
@@ -673,6 +704,11 @@ static void test_acpi_q35_tcg(void)
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
 test_acpi_one(NULL, &data);
 free_test_data();
+
+data.smbios_cpu_max_speed = 3000;
+data.smbios_cpu_curr_speed = 2600;
+test_acpi_one("-smbios type=4,max-speed=3000,current-speed=2600", );
+free_test_data();
 }
 
 static void test_acpi_q35_tcg_bridge(void)
@@ -885,6 +921,12 @@ static void test_acpi_virt_tcg(void)
 
 test_acpi_one("-cpu cortex-a57", );
 free_test_data();
+
+data.smbios_cpu_max_speed = 2900;
+data.smbios_cpu_curr_speed = 2700;
+test_acpi_one("-cpu cortex-a57 "
+  "-smbios type=4,max-speed=2900,current-speed=2700", );
+free_test_data();
 }
 
 int main(int argc, char *argv[])
-- 
2.23.0




[PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters

2020-08-04 Thread Ying Fang
From: fangying 

When a qemu or qemu-nbd process uses a qcow2 image configured with
'cache = none', it writes to the image through an in-memory cache of
L2 tables, but the cached L2 entries are not written back without an
explicit flush or the image being closed. The on-disk metadata may
therefore stay inconsistent with the written data for a long time,
and if the process exits abnormally in that window, the issued
written data will be lost.

Therefore, in order to keep data consistency, we need to flush the
changes to the L2 entries to the disk in time for newly allocated
clusters.

Signed-off-by: Ying Fang 

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 7444b9c..ab6e812 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -266,6 +266,22 @@ int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
 return result;
 }
 
+#define L2_ENTRIES_PER_SECTOR 64  /* 512-byte sector / 8-byte L2 entry */
+int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c,
+   void *table, int index, int num)
+{
+int ret;
+int i = qcow2_cache_get_table_idx(c, table);
+int start_sector = index / L2_ENTRIES_PER_SECTOR;
+int end_sector = (index + num - 1) / L2_ENTRIES_PER_SECTOR;
+int nr_sectors = end_sector - start_sector + 1;
+ret = bdrv_pwrite(bs->file,
+  c->entries[i].offset + start_sector * BDRV_SECTOR_SIZE,
+  table + start_sector * BDRV_SECTOR_SIZE,
+  nr_sectors * BDRV_SECTOR_SIZE);
+return ret;
+}
+
 int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
 Qcow2Cache *dependency)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a677ba9..ae49a83 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -998,6 +998,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
  }
 
 
+    ret = qcow2_cache_l2_write_entry(bs, s->l2_table_cache, l2_slice,
+                                     l2_index, m->nb_clusters);
+
 qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
 /*
diff --git a/block/qcow2.h b/block/qcow2.h
index 7ce2c23..168ab59 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -748,6 +748,8 @@ int qcow2_cache_destroy(Qcow2Cache *c);
 void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
 int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
 int qcow2_cache_write(BlockDriverState *bs, Qcow2Cache *c);
+int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c,
+                               void *table, int index, int num);
 int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
 Qcow2Cache *dependency);
 void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-- 
1.8.3.1
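A note on the sector arithmetic in qcow2_cache_l2_write_entry() above:
each L2 entry is an 8-byte offset, so one 512-byte sector
(BDRV_SECTOR_SIZE) holds 64 entries, and only the sectors actually
covered by entries [index, index + num) need to be rewritten. A
standalone sketch of the same calculation, with example values:

#include <stdio.h>

#define BDRV_SECTOR_SIZE      512
#define L2_ENTRY_SIZE         8   /* each L2 entry is a 64-bit offset */
#define L2_ENTRIES_PER_SECTOR (BDRV_SECTOR_SIZE / L2_ENTRY_SIZE)   /* 64 */

/* Which 512-byte sectors of an L2 table cover entries [index, index+num)? */
int main(void)
{
    int index = 100, num = 30;   /* example: entries 100..129 */
    int start_sector = index / L2_ENTRIES_PER_SECTOR;               /* 1 */
    int end_sector = (index + num - 1) / L2_ENTRIES_PER_SECTOR;     /* 2 */
    int nr_sectors = end_sector - start_sector + 1;                 /* 2 */

    printf("write sectors %d..%d (%d bytes)\n",
           start_sector, end_sector, nr_sectors * BDRV_SECTOR_SIZE);
    return 0;
}

Writing only the touched sectors keeps the flush cheap compared with
rewriting the whole L2 table, which occupies a full cluster (typically
64 KiB).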




Re: [PATCH v3] target/arm/cpu: adjust virtual time for arm cpu

2020-06-15 Thread Ying Fang




On 6/10/2020 3:40 PM, Andrew Jones wrote:

On Wed, Jun 10, 2020 at 09:32:06AM +0800, Ying Fang wrote:



On 6/8/2020 8:49 PM, Andrew Jones wrote:

On Mon, Jun 08, 2020 at 08:12:43PM +0800, Ying Fang wrote:

From: fangying 

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the cpu property was enabled only for the host-passthrough and
max cpu models. Let's add it for every arm cpu that has the generic
timer feature enabled.

Suggested-by: Andrew Jones 


This isn't true. I did suggest the way to arrange the code, after
Peter suggested to move the kvm_arm_add_vcpu_properties() call to
arm_cpu_post_init(), but I didn't suggest making this change in general,
which is what this tag means. In fact, I've argued that it's pretty

I'm quite sorry for adding it here.


No problem.


pointless to do this, since KVM users should be using '-cpu host' or
'-cpu max' anyway. Since I don't need credit for the code arranging,

As discussed in thread [1], there is a situation where a 'custom' cpu model
is needed for us to keep the instruction set compatible so that migration can
be done, just like x86 does.


I understand the motivation. But, as I've said, KVM doesn't work that way.


And we are planning to add support for it if
nobody is currently doing that.


Great! I'm looking forward to seeing the KVM patches. Especially since,
without the KVM patches, the 'custom' CPU model isn't a custom CPU model,
it's just a misleading way to use host passthrough. Indeed, I'm a bit
opposed to allowing anything other than '-cpu host' and '-cpu max' (with
features explicitly enabled/disabled, e.g. -cpu host,pmu=off) to work
until KVM actually works with CPU models. Otherwise, how do we know the
difference between a model that actually works and one that is just
misleadingly named?

Yes you are right here.
My colleague zhanghailiang and I are now working on it. We will post
the patch set soon.


Thanks,
drew



Thanks.
Ying

[1]: https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00022.html

please just drop the tag. Peter can maybe do that on merge though. Also,
despite not agreeing that we need this change today, as there's nothing
wrong with it and it looks good to me

Reviewed-by: Andrew Jones 

Thanks,
drew


Signed-off-by: Ying Fang 

---
v3:
- set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties

v2:
- move kvm_arm_add_vcpu_properties into arm_cpu_post_init

v1:
- initial commit
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 32bec156f2..5b7a36b5d7 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj)
    if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
        qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
   }
+
+if (kvm_enabled()) {
+kvm_arm_add_vcpu_properties(obj);
+}
   }
   static void arm_cpu_finalizefn(Object *obj)
@@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj)
   if (kvm_enabled()) {
   kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
   } else {
   cortex_a15_initfn(obj);
@@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj)
    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
   aarch64_add_sve_properties(obj);
   }
-kvm_arm_add_vcpu_properties(obj);
   arm_cpu_post_init(obj);
   }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index cbc5c3868f..778cecc2e6 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj)
   if (kvm_enabled()) {
   kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
   } else {
   uint64_t t;
   uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 4bdbe6dcac..eef3bbd1cc 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
   /* KVM VCPU properties should be prefixed with "kvm-". */
   void kvm_arm_add_vcpu_properties(Object *obj)
   {
-if (!kvm_enabled()) {
-return;
-}
+ARMCPU *cpu = ARM_CPU(obj);
+CPUARMState *env = &cpu->env;
-ARM_CPU(obj)->kvm_adjvtime = true;
-object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
- kvm_no_adjvtime_set);
-object_property_set_description(obj, "kvm-no-adjvtime",
-"Set on to disable the adjustment of "
-"the virtual counter. VM stopped time "
-"will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);

Re: [PATCH v3] target/arm/cpu: adjust virtual time for arm cpu

2020-06-09 Thread Ying Fang




On 6/8/2020 8:49 PM, Andrew Jones wrote:

On Mon, Jun 08, 2020 at 08:12:43PM +0800, Ying Fang wrote:

From: fangying 

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the cpu property was enabled only for the host-passthrough and
max cpu models. Let's add it for every arm cpu that has the generic
timer feature enabled.

Suggested-by: Andrew Jones 


This isn't true. I did suggest the way to arrange the code, after
Peter suggested to move the kvm_arm_add_vcpu_properties() call to
arm_cpu_post_init(), but I didn't suggest making this change in general,
which is what this tag means. In fact, I've argued that it's pretty

I'm quite sorry for adding it here.

pointless to do this, since KVM users should be using '-cpu host' or
'-cpu max' anyway. Since I don't need credit for the code arranging,
As discussed in thread [1], there is a situation where a 'custom' cpu
model is needed for us to keep the instruction set compatible so that
migration can be done, just like x86 does. And we are planning to add
support for it if nobody is currently doing that.


Thanks.
Ying

[1]: https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00022.html

please just drop the tag. Peter can maybe do that on merge though. Also,
despite not agreeing that we need this change today, as there's nothing
wrong with it and it looks good to me

Reviewed-by: Andrew Jones 

Thanks,
drew


Signed-off-by: Ying Fang 

---
v3:
- set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties

v2:
- move kvm_arm_add_vcpu_properties into arm_cpu_post_init

v1:
- initial commit
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 32bec156f2..5b7a36b5d7 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj)
    if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
        qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
  }
+
+if (kvm_enabled()) {
+kvm_arm_add_vcpu_properties(obj);
+}
  }
  
  static void arm_cpu_finalizefn(Object *obj)

@@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj)
  
  if (kvm_enabled()) {

  kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
  } else {
  cortex_a15_initfn(obj);
  
@@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj)

    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
  aarch64_add_sve_properties(obj);
  }
-kvm_arm_add_vcpu_properties(obj);
  arm_cpu_post_init(obj);
  }
  
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c

index cbc5c3868f..778cecc2e6 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj)
  
  if (kvm_enabled()) {

  kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
  } else {
  uint64_t t;
  uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 4bdbe6dcac..eef3bbd1cc 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
  /* KVM VCPU properties should be prefixed with "kvm-". */
  void kvm_arm_add_vcpu_properties(Object *obj)
  {
-if (!kvm_enabled()) {
-return;
-}
+ARMCPU *cpu = ARM_CPU(obj);
+CPUARMState *env = &cpu->env;
  
-ARM_CPU(obj)->kvm_adjvtime = true;

-object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
- kvm_no_adjvtime_set);
-object_property_set_description(obj, "kvm-no-adjvtime",
-"Set on to disable the adjustment of "
-"the virtual counter. VM stopped time "
-"will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
  }
  
  bool kvm_arm_pmu_supported(CPUState *cpu)

--
2.23.0









[PATCH v3] target/arm/cpu: adjust virtual time for arm cpu

2020-06-08 Thread Ying Fang
From: fangying 

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the cpu property was enabled only for the host-passthrough and
max cpu models. Let's add it for every arm cpu that has the generic
timer feature enabled.

Suggested-by: Andrew Jones 
Signed-off-by: Ying Fang 

---
v3:
- set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties

v2:
- move kvm_arm_add_vcpu_properties into arm_cpu_post_init

v1:
- initial commit
- https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 32bec156f2..5b7a36b5d7 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj)
 if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
     qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
 }
+
+if (kvm_enabled()) {
+kvm_arm_add_vcpu_properties(obj);
+}
 }
 
 static void arm_cpu_finalizefn(Object *obj)
@@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj)
 
 if (kvm_enabled()) {
 kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
 } else {
 cortex_a15_initfn(obj);
 
@@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj)
 if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 aarch64_add_sve_properties(obj);
 }
-kvm_arm_add_vcpu_properties(obj);
 arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index cbc5c3868f..778cecc2e6 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj)
 
 if (kvm_enabled()) {
 kvm_arm_set_cpu_features_from_host(cpu);
-kvm_arm_add_vcpu_properties(obj);
 } else {
 uint64_t t;
 uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 4bdbe6dcac..eef3bbd1cc 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
 /* KVM VCPU properties should be prefixed with "kvm-". */
 void kvm_arm_add_vcpu_properties(Object *obj)
 {
-if (!kvm_enabled()) {
-return;
-}
+ARMCPU *cpu = ARM_CPU(obj);
+CPUARMState *env = &cpu->env;
 
-ARM_CPU(obj)->kvm_adjvtime = true;
-object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
- kvm_no_adjvtime_set);
-object_property_set_description(obj, "kvm-no-adjvtime",
-"Set on to disable the adjustment of "
-"the virtual counter. VM stopped time "
-"will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
 }
 
 bool kvm_arm_pmu_supported(CPUState *cpu)
-- 
2.23.0
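As background on what the adjustment does: the guest's virtual counter
CNTVCT is the host counter minus an offset (CNTVOFF), and with
kvm-adjvtime enabled the offset is recomputed across a stop/resume so
the guest does not observe the time the VM spent stopped (with
kvm-no-adjvtime=on, the offset is left alone and the stopped time is
counted, as the property description above says). A conceptual sketch
with hypothetical names, not QEMU's or KVM's actual code:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical sketch of virtual-time adjustment, not QEMU/KVM code. */
struct vtime_state {
    uint64_t cntvct_at_stop;   /* guest CNTVCT sampled when the VM stopped */
    bool stopped;
};

static void on_vm_stop(struct vtime_state *s, uint64_t guest_cntvct)
{
    s->cntvct_at_stop = guest_cntvct;
    s->stopped = true;
}

/* Returns the new CNTVOFF so the guest counter resumes where it left off */
static uint64_t on_vm_resume(struct vtime_state *s, uint64_t host_counter)
{
    s->stopped = false;
    return host_counter - s->cntvct_at_stop;
}

int main(void)
{
    struct vtime_state s = { 0 };
    on_vm_stop(&s, 1000);                   /* VM stops at CNTVCT == 1000 */
    uint64_t off = on_vm_resume(&s, 5000);  /* host counter is now 5000 */
    /* guest reads CNTVCT = 5000 - off == 1000: the stopped time is hidden */
    return (int)(5000 - off - 1000);        /* 0 on success */
}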




Re: Forward migration broken down since virt-4.2 machine type

2020-06-07 Thread Ying Fang

ping

On 6/4/2020 4:51 PM, Ying Fang wrote:

Hi Richard,

Recently we have been testing forward migration on the arm virt
machine, and we found that the patch below breaks forward migration
compatibility from virt-4.2 to virt-5.0 and later machine types. The
offending commit found by git bisect is

commit f9506e162c33e87b609549157dd8431fcc732085
target/arm: Remove ARM_FEATURE_VFP*

QEMU may crash on the destination host while loading the cpu state.
Since I am not familiar with the VFP feature, my questions are:
1: Should we keep forward migration compatibility here?
2: If so, how can we fix it?

Below is the crash stack:
Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 712330]
armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391

391    if (s->cpu->env.v7m.faultmask[secure]) {
#0  armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391
#1  0xaaae6f766510 in arm_v7m_mmu_idx_for_secstate_and_priv (env=0xaaae73456780, secstate=false, priv=true) at qemu/target/arm/m_helper.c:2711
#2  0xaaae6f7163f0 in arm_mmu_idx_el (env=env@entry=0xaaae73456780, el=el@entry=1) at qemu/target/arm/helper.c:12386
#3  0xaaae6f717000 in rebuild_hflags_internal (env=0xaaae73456780) at qemu/target/arm/helper.c:12611
#4  arm_rebuild_hflags (env=env@entry=0xaaae73456780) at qemu/target/arm/helper.c:12624
#5  0xaaae6f722940 in cpu_post_load (opaque=0xaaae7344ceb0, version_id=<optimized out>) at qemu/target/arm/machine.c:767
#6  0xaaae6f9e0e78 in vmstate_load_state (f=f@entry=0xaaae73020260, vmsd=0xaaae6fe93178 <vmstate_arm_cpu>, opaque=0xaaae7344ceb0, version_id=22) at migration/vmstate.c:168
#7  0xaaae6f9d9858 in vmstate_load (f=f@entry=0xaaae73020260, se=se@entry=0xaaae7302f750) at migration/savevm.c:885
#8  0xaaae6f9dab90 in qemu_loadvm_section_start_full (f=f@entry=0xaaae73020260, mis=0xaaae72fb88a0) at migration/savevm.c:2302
#9  0xaaae6f9dd248 in qemu_loadvm_state_main (f=f@entry=0xaaae73020260, mis=mis@entry=0xaaae72fb88a0) at migration/savevm.c:2486
#10 0xaaae6f9de3bc in qemu_loadvm_state (f=0xaaae73020260) at migration/savevm.c:2560
#11 0xaaae6f9d489c in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:461
#12 0xaaae6fb59850 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#13 0xfffdd6c16030 in ?? () from target:/usr/lib64/libc.so.6

#0  armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391

(gdb) p s
$4 = (NVICState *) 0x0

Thanks.
Ying




Re: About the kvm-no-adjvtime CPU property

2020-06-04 Thread Ying Fang




On 6/3/2020 4:53 PM, Andrew Jones wrote:

On Tue, Jun 02, 2020 at 03:47:22PM +0800, Ying Fang wrote:



On 2020/6/1 20:29, Andrew Jones wrote:

On Mon, Jun 01, 2020 at 08:07:31PM +0800, Ying Fang wrote:



On 2020/6/1 16:07, Andrew Jones wrote:

On Sat, May 30, 2020 at 04:56:26PM +0800, Ying Fang wrote:

About the kvm-no-adjvtime CPU property

Hi Andrew,
To adjust virtual time, a new kvm cpu property kvm-no-adjvtime
was introduced with the 5.0 virt machine types. However, the property
was enabled only for the host-passthrough and max cpu models. For
other cpu models like cortex-a57, cortex-a53 and cortex-a72,
kvm-adjvtime is not enabled by default, which means the virtual
time cannot be adjusted for them.

Here, for example, if the VM is configured with kvm enabled and a
'custom' cpu model in the libvirt XML:

  <cpu mode='custom'>
    <model fallback='forbid'>cortex-a72</model>
  </cpu>

We cannot adjust virtual time even if the 5.0 virt machine type is
used. So I'd like to add the property to the other cpu models as
well; do you have any suggestions here?




Hi Fang,

The cpu feature only requires kvm.  If a cpu model may be used with kvm,
then the feature can be allowed to be used with the model.  What I find
interesting is that the cpu model is being used with kvm instead of 'host'
or 'max'.  Can you explain the reasons for that?  Currently, when using

Yes, the cpu model is indeed used with kvm.

There is a situation where the host cpu model is Cortex-A72 and
a 'custom' cpu model is used to keep the instruction set compatible
between the source and destination host machines when doing live
migration. So the host physical machine cpu model is Cortex-A72 but
the host-passthrough model is the mode used here.

I mean host-passthrough model is 'not' used here. Sorry to make it
confusing.


I guessed as much.



Are the source and destinations hosts used in the migration identical?
If so, then the guest can use cpu 'host' and disable cpu features that
should not be exposed (e.g. -cpu host,pmu=off).  If the source and
destination hosts are not identical, then I'm curious what those exact
differences are.  With the way AArch64 KVM works today, even using the
Cortex-A72 cpu model should require identical hosts when migrating.  Or,
at least both hosts must be compatible with Cortex-A72 and any difference
in ID registers must be somehow getting hidden from the guest.

Yes, you are right.
We have AArch64 servers whose cpu is based on Cortex-A72 with some
extra instructions added. The source host cpu is based on V1 and the
destination host cpu on V2, and both are compatible with Cortex-A72. We
want to use a 'custom' cpu model here to make live migration between
them possible. This is the scenario where the 'host' cpu model is not
used, since the 'custom' cpu model Cortex-A72 is used instead.


What you've described here is indeed the reason to use CPU models. I.e.
enabling the migration from a host of one type to another by creating a
guest that only enables the features contained in both hosts (as well as
maintaining all state that describes the CPU type, e.g. MIDR).
Unfortunately, unless your KVM has patches that aren't upstream, then that
doesn't work on AArch64 KVM (more on that below). It may appear to be
working for you, because your guest kernel and userspace don't mind the
slight differences exposed to it between the hosts, or those differences
are limited to explicitly disabled features. If that's the case, then I
would guess that using '-cpu host' and disabling the same features would
"work" as well.


Yes, upstream KVM currently does not support it. We are planning to add
that support on the aarch64 platform, since our hardware needs it.

@Marc Zyngier, is anyone already working on this?


Here's some more details on why the Cortex-A72 CPU model doesn't matter
with upstream KVM. First, upstream AArch64 KVM doesn't support CPU models,
and it doesn't even have a Cortex-A72 preferred target. For Cortex-A72
it will use "KVM_ARM_TARGET_GENERIC_V8", which is the same thing 'host'
would do when running on a Cortex-A72. Second, if V2 of the Cortex-A72-
based CPU you're using changed the revision of the MIDR, or any other
state that gets passed directly through to the guest like the MIDR, then
that state will change on migration. If a guest looks before migration and
again after migration, then it could get confused. A guest kernel may only
look once on boot and then not notice, but anything exposed to userspace
is extra risky, as userspace may check more frequently.
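
To make the userspace risk concrete: on Linux/arm64 the kernel traps and
emulates MRS reads of ID registers such as MIDR_EL1 from EL0 (the
cpu-feature-registers ABI), so a guest application can sample the CPU
identity directly and would notice it changing across a migration. A
minimal sketch, assuming a guest kernel that provides this emulation:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read MIDR_EL1 from user space; the arm64 kernel emulates this MRS */
static uint64_t read_midr(void)
{
    uint64_t midr;
    __asm__ volatile("mrs %0, midr_el1" : "=r"(midr));
    return midr;
}

int main(void)
{
    /* Compare this value before and after a live migration to spot
     * any MIDR difference between the source and destination hosts. */
    printf("MIDR_EL1 = 0x%016" PRIx64 "\n", read_midr());
    return 0;
}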


Yes, just as explained here.



In short, without KVM patches that aren't upstream, it's risky to
migrate between machines with V1 and V2 of these CPUs. And it doesn't
help to use the Cortex-A72 CPU model.

Thanks for your detailed introduction.



Thanks,
drew



However, the kvm-adjvtime
feature is also needed. So I think we should move kvm_arm_add_vcpu_properties
into arm_cpu_post_init, instead of limiting it to the 'host' and 'max' cpu
models [1].

1: https://lists.gnu.org/archive/html/qemu-de

Forward migration broken down since virt-4.2 machine type

2020-06-04 Thread Ying Fang

Hi Richard,

Recently we have been testing forward migration on the arm virt
machine, and we found that the patch below breaks forward migration
compatibility from virt-4.2 to virt-5.0 and later machine types. The
offending commit found by git bisect is

commit f9506e162c33e87b609549157dd8431fcc732085
target/arm: Remove ARM_FEATURE_VFP*

QEMU may crash on the destination host while loading the cpu state.
Since I am not familiar with the VFP feature, my questions are:
1: Should we keep forward migration compatibility here?
2: If so, how can we fix it?

Below is the crash stack:
Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 712330]
armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391

391    if (s->cpu->env.v7m.faultmask[secure]) {
#0  armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391
#1  0xaaae6f766510 in arm_v7m_mmu_idx_for_secstate_and_priv (env=0xaaae73456780, secstate=false, priv=true) at qemu/target/arm/m_helper.c:2711
#2  0xaaae6f7163f0 in arm_mmu_idx_el (env=env@entry=0xaaae73456780, el=el@entry=1) at qemu/target/arm/helper.c:12386
#3  0xaaae6f717000 in rebuild_hflags_internal (env=0xaaae73456780) at qemu/target/arm/helper.c:12611
#4  arm_rebuild_hflags (env=env@entry=0xaaae73456780) at qemu/target/arm/helper.c:12624
#5  0xaaae6f722940 in cpu_post_load (opaque=0xaaae7344ceb0, version_id=<optimized out>) at qemu/target/arm/machine.c:767
#6  0xaaae6f9e0e78 in vmstate_load_state (f=f@entry=0xaaae73020260, vmsd=0xaaae6fe93178 <vmstate_arm_cpu>, opaque=0xaaae7344ceb0, version_id=22) at migration/vmstate.c:168
#7  0xaaae6f9d9858 in vmstate_load (f=f@entry=0xaaae73020260, se=se@entry=0xaaae7302f750) at migration/savevm.c:885
#8  0xaaae6f9dab90 in qemu_loadvm_section_start_full (f=f@entry=0xaaae73020260, mis=0xaaae72fb88a0) at migration/savevm.c:2302
#9  0xaaae6f9dd248 in qemu_loadvm_state_main (f=f@entry=0xaaae73020260, mis=mis@entry=0xaaae72fb88a0) at migration/savevm.c:2486
#10 0xaaae6f9de3bc in qemu_loadvm_state (f=0xaaae73020260) at migration/savevm.c:2560
#11 0xaaae6f9d489c in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:461
#12 0xaaae6fb59850 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115
#13 0xfffdd6c16030 in ?? () from target:/usr/lib64/libc.so.6

#0  armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391

(gdb) p s
$4 = (NVICState *) 0x0

Thanks.
Ying



