[Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration() follow-up

2018-07-10 Thread Dou Liyang
Commit 7747abf11487 missed the curly brackets. Following Igor's and Eduardo's
suggestion, add a follow-up patch for it.

Signed-off-by: Dou Liyang 
---
 hw/core/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index a9aeb22f03..6b68e1218f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -793,8 +793,9 @@ void machine_run_board_init(MachineState *machine)
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
 numa_complete_configuration(machine);
-if (nb_numa_nodes)
+if (nb_numa_nodes) {
 machine_numa_finish_cpu_init(machine);
+}
 
 /* If the machine supports the valid_cpu_types check and the user
  * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3






Re: [Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()

2018-07-10 Thread Dou Liyang

Hi Eduardo,

At 07/10/2018 06:27 PM, Eduardo Habkost wrote:

On Tue, Jul 10, 2018 at 04:15:30PM +0800, Dou Liyang wrote:
[...]

+if (nb_numa_nodes)
   machine_numa_finish_cpu_init(machine);
-}

missing curly brackets, should look like:
 if (nb_numa_nodes) {
 machine_numa_finish_cpu_init(machine);
 }

with that fixed
Reviewed-by: Igor Mammedov 



Thank you, Igor.

Since Eduardo has already sent the pull request containing this patch,
can I fix this and send a v2 patch now?


You can fix this on a follow-up patch and CC qemu-trivial.



Ok, I see. ;-)

Thanks,
dou





[Qemu-devel] [PATCH v2 1/2] hw/acpi-build: Add a check for memory-less NUMA nodes

2018-07-10 Thread Dou Liyang
Currently, the QEMU ACPI builder doesn't consider memory-less NUMA nodes, e.g.:

  -m 4G,slots=4,maxmem=8G \
  -numa node,nodeid=0 \
  -numa node,nodeid=1,mem=2G \
  -numa node,nodeid=2,mem=2G \
  -numa node,nodeid=3\

Guest Linux will report

  [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory affinity entries for memory-less
NUMA nodes, and also update the test file. Now the info in guest Linux will be:

  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug
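
A quick way to double-check the result from inside the guest is to grep the
boot log for the SRAT lines quoted above (a minimal sketch):

  dmesg | grep "SRAT: Node"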

Signed-off-by: Dou Liyang 
Reviewed-by: Igor Mammedov 
---
Note to maintainer: update ACPI tables test blobs on commit.
---
 hw/i386/acpi-build.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 mem_len = next_base - pcms->below_4g_mem_size;
 next_base = mem_base + mem_len;
 }
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
 }
 slots = (table_data->len - numa_start) / sizeof *numamem;
 for (; slots < pcms->numa_nodes + 2; slots++) {
-- 
2.14.3






[Qemu-devel] [PATCH v2 2/2][DO NOT APPLY] tests/acpi-test: update ACPI tables test blobs

2018-07-10 Thread Dou Liyang
Now, QEMU adds a new check for memory-less NUMA nodes in build_srat().

This affects the ACPI tests.

So, update the ACPI tables test blobs.
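
For reference, a minimal sketch of how such blobs get refreshed (assuming the
TEST_ACPI_REBUILD_AML switch mentioned elsewhere in this archive, and the blob
paths from the diffstat below):

  make TEST_ACPI_REBUILD_AML=1 check
  git add tests/acpi-test-data/pc/SRAT.numamem \
          tests/acpi-test-data/q35/SRAT.numamem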

Signed-off-by: Dou Liyang 
---
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/tests/acpi-test-data/pc/SRAT.numamem 
b/tests/acpi-test-data/pc/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem 
b/tests/acpi-test-data/q35/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

-- 
2.14.3






Re: [Qemu-devel] [PATCH] hw/acpi-build: Add a check for non-memory NUMA nodes.

2018-07-10 Thread Dou Liyang

Hi Igor,

At 07/10/2018 03:52 PM, Igor Mammedov wrote:

On Thu, 5 Jul 2018 10:10:38 +0800
Dou Liyang  wrote:


Currently, the QEMU ACPI builder doesn't consider non-memory NUMA nodes, e.g.:

s/non-memory/memory-less/ throughout subj/commit message



Yes, I will do it.




   -m 4G,slots=4,maxmem=8G \
   -numa node,nodeid=0 \
   -numa node,nodeid=1,mem=2G \
   -numa node,nodeid=2,mem=2G \
   -numa node,nodeid=3\

Guest Linux will report

   [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
   [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
   [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
   [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
   [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
   [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
   [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory affinity entries for non-memory NUMA
nodes, and also update the test file. Now the info in guest Linux will be:

   [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
   [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
   [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
   [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
   [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

Signed-off-by: Dou Liyang 
---
Have done a boot-up test on Linux and on Windows 10 and 7.


add here":
note to maintainer: update ACPI tables test blobs on commit.

with patch amended

Reviewed-by: Igor Mammedov 


---
  hw/i386/acpi-build.c  |   9 ++---
  tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
  tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
  3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
  mem_len = next_base - pcms->below_4g_mem_size;
  next_base = mem_base + mem_len;
  }
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
  }
  slots = (table_data->len - numa_start) / sizeof *numamem;
  for (; slots < pcms->numa_nodes + 2; slots++) {


Drop binary blobs from patch (for reviewer convenience we post
blobs as separate patch with DO NOT APPLY tag in subject).

Michael will update test blobs manually when merging your patch.


Thank you for your explanation. I see; I will split them out in v2.

Thanks,
dou


diff --git a/tests/acpi-test-data/pc/SRAT.numamem 
b/tests/acpi-test-data/pc/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem 
b/tests/acpi-test-data/q35/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8












Re: [Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()

2018-07-10 Thread Dou Liyang

Hi Igor,

At 07/10/2018 03:42 PM, Igor Mammedov wrote:

On Wed, 4 Jul 2018 21:22:39 +0800
Dou Liyang  wrote:


Commit 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
broke the commit 7b8be49d36fc("NUMA: Enable adding NUMA node implicitly").

The machine_run_board_init() doesn't do NUMA setup if nb_numa_nodes=0,
but the numa_complete_configuration need add a new node if memory hotplug
is enabled (slots > 0) even nb_numa_nodes=0.

So, Remove the check for numa_complete_configuration() to fix this.

Fixes 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
Signed-off-by: Dou Liyang 
---
v1 --> v2:
   -Fix it to pass the "make check".
---
  hw/core/machine.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..32b3c5a7d3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,9 @@ void machine_run_board_init(MachineState *machine)
  {
  MachineClass *machine_class = MACHINE_GET_CLASS(machine);
  
-if (nb_numa_nodes) {

-numa_complete_configuration(machine);
+numa_complete_configuration(machine);
+if (nb_numa_nodes)
  machine_numa_finish_cpu_init(machine);
-}

missing curly brackets, should look like:
if (nb_numa_nodes) {
machine_numa_finish_cpu_init(machine);
}

with that fixed
Reviewed-by: Igor Mammedov 



Thank you, Igor.

Since Eduardo has already sent the pull request containing this patch,
can I fix this and send a v2 patch now?

Thanks,
dou
  
  /* If the machine supports the valid_cpu_types check and the user

   * specified a CPU with -cpu check here that the user CPU is supported.











[Qemu-devel] [PATCH] hw/acpi-build: Add a check for non-memory NUMA nodes.

2018-07-04 Thread Dou Liyang
Currently, the QEMU ACPI builder doesn't consider non-memory NUMA nodes, e.g.:

  -m 4G,slots=4,maxmem=8G \
  -numa node,nodeid=0 \
  -numa node,nodeid=1,mem=2G \
  -numa node,nodeid=2,mem=2G \
  -numa node,nodeid=3\

Guest Linux will report

  [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory affinity entries for non-memory NUMA
nodes, and also update the test file. Now the info in guest Linux will be:

  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
  [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
  [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
  [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

Signed-off-by: Dou Liyang 
---
Have done a boot-up test on Linux and on Windows 10 and 7.
---
 hw/i386/acpi-build.c  |   9 ++---
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 mem_len = next_base - pcms->below_4g_mem_size;
 next_base = mem_base + mem_len;
 }
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
 }
 slots = (table_data->len - numa_start) / sizeof *numamem;
 for (; slots < pcms->numa_nodes + 2; slots++) {
diff --git a/tests/acpi-test-data/pc/SRAT.numamem 
b/tests/acpi-test-data/pc/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem 
b/tests/acpi-test-data/q35/SRAT.numamem
index 
dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3
 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

-- 
2.14.3






[Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()

2018-07-04 Thread Dou Liyang
Commit 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
broke the commit 7b8be49d36fc("NUMA: Enable adding NUMA node implicitly").

The machine_run_board_init() doesn't do NUMA setup if nb_numa_nodes=0,
but the numa_complete_configuration need add a new node if memory hotplug
is enabled (slots > 0) even nb_numa_nodes=0.

So, Remove the check for numa_complete_configuration() to fix this.
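
For illustration, a command line that hits the broken path is one with memory
hotplug enabled (slots > 0) but no '-numa' option; the values here are
arbitrary:

  qemu-system-x86_64 -m 2G,slots=4,maxmem=8G ...

With commit 7b8be49d36fc such a setup is supposed to get one NUMA node added
implicitly, but the nb_numa_nodes check meant numa_complete_configuration()
was never reached for it.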

Fixes: 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()")
Signed-off-by: Dou Liyang 
---
v1 --> v2:
  -Fix it to pass the "make check". 
---
 hw/core/machine.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..32b3c5a7d3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,9 @@ void machine_run_board_init(MachineState *machine)
 {
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-if (nb_numa_nodes) {
-numa_complete_configuration(machine);
+numa_complete_configuration(machine);
+if (nb_numa_nodes)
 machine_numa_finish_cpu_init(machine);
-}
 
 /* If the machine supports the valid_cpu_types check and the user
  * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3






Re: [Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes

2018-07-04 Thread Dou Liyang




At 07/04/2018 07:52 PM, Dou Liyang wrote:

Commit 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
broke the commit 7b8be49d36fc("NUMA: Enable adding NUMA node implicitly").

The machine_run_board_init() doesn't do NUMA setup if nb_numa_nodes=0.

Due to the NUMA setup will also check the value of nb_numa_nodes, remove the
check from machine_run_board_init() to fix ths bug.

Fixes 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
Signed-off-by: Dou Liyang 
---
  hw/core/machine.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)


Please ignore this patch; it breaks "make check". I will send a v2.

Thanks,
dou



diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..962346f90d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,8 @@ void machine_run_board_init(MachineState *machine)
  {
  MachineClass *machine_class = MACHINE_GET_CLASS(machine);
  
-if (nb_numa_nodes) {

-numa_complete_configuration(machine);
-machine_numa_finish_cpu_init(machine);
-}
+numa_complete_configuration(machine);
+machine_numa_finish_cpu_init(machine);
  
  /* If the machine supports the valid_cpu_types check and the user

   * specified a CPU with -cpu check here that the user CPU is supported.







[Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes

2018-07-04 Thread Dou Liyang
Commit 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
broke the commit 7b8be49d36fc("NUMA: Enable adding NUMA node implicitly").

The machine_run_board_init() doesn't do NUMA setup if nb_numa_nodes=0.

Due to the NUMA setup will also check the value of nb_numa_nodes, remove the
check from machine_run_board_init() to fix ths bug.

Fixes 7a3099fc9c5c("numa: postpone options post-processing till 
machine_run_board_init()")
Signed-off-by: Dou Liyang 
---
 hw/core/machine.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..962346f90d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,8 @@ void machine_run_board_init(MachineState *machine)
 {
 MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-if (nb_numa_nodes) {
-numa_complete_configuration(machine);
-machine_numa_finish_cpu_init(machine);
-}
+numa_complete_configuration(machine);
+machine_numa_finish_cpu_init(machine);
 
 /* If the machine supports the valid_cpu_types check and the user
  * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3






Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase

2018-01-11 Thread Dou Liyang



At 01/12/2018 03:23 PM, Dou Liyang wrote:

[...]


BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT
file. Is it correct?


This sounds correct to me.  Igor, can you confirm?

It will rebuild/update the reference tables.

I'm for merging it early and tracking/fixing the issue Peter saw during
this dev cycle.





Send an ack then?


Hi Michael,

I saw Igor gave the Reviewed-by at 21 Dec 2017

                                     ^^ Oops! Sorry, it's 27 Dec.


What is the state of this patchset?

I have rebased and retested it; do I need to send it again?

Thanks,
 dou









Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase

2018-01-11 Thread Dou Liyang

[...]


BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT
file. Is it correct?


This sounds correct to me.  Igor, can you confirm?

It will rebuild/update the reference tables.

I'm for merging it early and tracking/fixing the issue Peter saw during this
dev cycle.




Send an ack then?


Hi Michael,

I saw Igor gave the Reviewed-by at 21 Dec 2017.

What is the state of this patchset?

I have rebased and retested it; do I need to send it again?

Thanks,
dou









Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase

2017-12-18 Thread Dou Liyang

Hi Eduardo,

At 12/19/2017 06:09 AM, Eduardo Habkost wrote:

On Thu, Dec 14, 2017 at 12:08:53PM +0800, Dou Liyang wrote:

These are the patches left over from the pull request:

   [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

because of some errors when tested with the "make check" command [1].

Now that QEMU 2.11.0 has been released, rebase/retest/respin them
for the v2.12 dev cycle.

Changelog:
   --Rebase and retest this patch in v2.11.0
   --Update the DSDT.numamem:

[1]: http://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01334.html


Do we know what caused the failures Peter saw, and if we can now
be sure it won't happen again?


Sorry, I can't tell. I tested it many times with my colleagues, and it
doesn't happen the way Peter saw. This issue is very subtle and strange.
Could you give me some suggestions?

BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT
file. Is it correct?

Thanks,
dou.








[Qemu-devel] [RESEND PATCH for-2.12 1/2] ACPI/unit-test: Add a testcase for RAM allocation in numa node

2017-12-13 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Fixing that alone is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for the guest.
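
A sketch of roughly what the added test case in tests/bios-tables-test.c could
look like (the real hunk is not quoted in this excerpt; the function name and
qtest path below are illustrative assumptions, and the command line is the one
from the message above):

  static void test_acpi_piix4_tcg_numamem(void)
  {
      test_data data;

      memset(&data, 0, sizeof(data));
      data.machine = MACHINE_PC;
      data.variant = ".numamem";
      test_acpi_one(" -m 128,slots=3,maxmem=1G"
                    " -numa node -numa node,mem=128M", &data);
      free_test_data(&data);
  }

  /* registered in main(), e.g.: */
  qtest_add_func("acpi/piix4/numamem", test_acpi_piix4_tcg_numamem);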

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5150 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7834 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..224cfdd9e983e02dac5f4bf7e210eaa64cb0dc78
GIT binary patch
literal 5150
zcmb7I-EJG#5uUTVl$K{nX(_FLmDq$F*GSsf{PAB<pb)uBiL^v&&81uyE0jxFLFpiX
z7fuXB5rWbTpg6t@R2RMI1#R%!dW7~7@(AfGxM-}aurs?n6hmo%sF%B
zjAS`h<AMRe-W1o(vXd^}w@nM@7y!_ie)Wp732w(Kt~8k@Da?JU1!H^^RpWP7abt!3
zqwW3O^FIH^V=W)HUg<BkTK?D1ye%NmlP#Zf8t98nr`UA63$|IwsLgW4N)*25vf3Dd4?gJ9
zBCr+q7#@Q(wF7SV)@soj!DZQq2dgp)G<eYi4;a3+4cqv;C}F_eynrWAx(e{N#%&
zk1c{uz=FDLnWIgd9(uIE
<

[Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase

2017-12-13 Thread Dou Liyang
These are the patches left over from the pull request:

  [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

because of some errors when tested with the "make check" command [1].

Now that QEMU 2.11.0 has been released, rebase/retest/respin them
for the v2.12 dev cycle.

Changelog:
  --Rebase and retest this patch in v2.11.0
  --Update the DSDT.numamem:

[1]: http://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01334.html

Dou Liyang (2):
  ACPI/unit-test: Add a testcase for RAM allocation in numa node
  hw/acpi-build: Make next_base easy to follow

 hw/i386/acpi-build.c  |   2 +-
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5150 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7834 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 6 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

-- 
2.14.3






[Qemu-devel] [RESEND PATCH for-2.12 2/2] hw/acpi-build: Make next_base easy to follow

2017-12-13 Thread Dou Liyang
It may be hard to read the assignment statement of "next_base", so

S/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 /next_base = mem_base + mem_len;

... for readability.

No functionality change.
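
For reference, the two forms are equivalent given the assignments just above
this statement in build_srat() (visible in the hunk below):

  mem_base = 1ULL << 32;
  mem_len  = next_base - pcms->below_4g_mem_size;

  /* old */ next_base += (1ULL << 32) - pcms->below_4g_mem_size;
  /* new */ next_base  = mem_base + mem_len;
            /*         = (1ULL << 32) + (next_base - pcms->below_4g_mem_size) */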

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 73519ab3ac..2010108ae0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2381,7 +2381,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,
-- 
2.14.3






Re: [Qemu-devel] [PULL 1/9] hw/acpi-build: Make assignment statement of next_base easy to read

2017-11-28 Thread Dou Liyang

Hi, Igor,

At 11/28/2017 11:07 PM, Igor Mammedov wrote:

On Thu,  5 Oct 2017 17:36:30 -0300
Eduardo Habkost <ehabk...@redhat.com> wrote:


From: Dou Liyang <douly.f...@cn.fujitsu.com>

It may be hard to read the assignment statement of "next_base", so

S/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 /next_base = mem_base + mem_len;

... for readability.

No functionality change.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Message-Id: <1504231805-30957-3-git-send-email-douly.f...@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>

Pls, retest/respin once 2.12 dev window is open



Yes, I will.

Thanks,
dou.


---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2af37a9129..73e3443bce 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2381,7 +2381,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,











Re: [Qemu-devel] [PULL 2/9] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-11-28 Thread Dou Liyang

Hi Igor,

At 11/28/2017 11:08 PM, Igor Mammedov wrote:

On Thu,  5 Oct 2017 17:36:31 -0300
Eduardo Habkost <ehabk...@redhat.com> wrote:


From: Dou Liyang <douly.f...@cn.fujitsu.com>

As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Message-Id: <1504231805-30957-4-git-send-email-douly.f...@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>


Pls retest/respin once dev window is open.



Yes, Thanks so much for your reminder. I will do that.

Thanks,
dou.











Re: [Qemu-devel] [PATCH v6] NUMA: Enable adding NUMA node implicitly

2017-11-14 Thread Dou Liyang

Hi Igor,

[...]

+parse_numa_node(ms, &node, NULL);

I get build break here:

numa.c:451:13: error: too few arguments to function ‘parse_numa_node’
 parse_numa_node(ms, &node, NULL);



In upstream tree, your commit

  cc001888b780 ("numa: fixup parsed NumaNodeOptions earlier")

removed an argument from parse_numa_node() recently. The definition
of the function becomes:

static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
Error **errp)

This patch is based on the upstream tree, so parse_numa_node() should
take three arguments.

I am not sure why you got this build failure log; can you tell me
which branch you tested?

Thanks,
dou





[Qemu-devel] [PATCH v6] NUMA: Enable adding NUMA node implicitly

2017-11-13 Thread Dou Liyang
Linux and Windows need an ACPI SRAT table to make memory hotplug work
properly; however, currently QEMU doesn't create an SRAT table if NUMA options
aren't present on the CLI.

This breaks both Linux and Windows guests in certain conditions:
 * Windows: won't enable memory hotplug without an SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, the guest kernel will use nommu DMA ops, which breaks 32bit hw
   drivers when memory is hotplugged and the guest tries to use it with those
   drivers.

Fix the above issues by automatically creating a NUMA node when QEMU is started
with memory hotplug enabled but without '-numa' options on the CLI.
(PS: auto-create the NUMA node only for new machine types so as not to break
migration.)

This would provide an SRAT table to guests without explicit -numa options on
the CLI and would allow:
 * Windows: to enable memory hotplug
 * Linux: to switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit
   allocated buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V5 --> V6:
  - rebase it to avoid building failure
  - test again
---
 hw/i386/pc.c|  1 +
 hw/i386/pc_piix.c   |  1 +
 hw/i386/pc_q35.c|  1 +
 include/hw/boards.h |  1 +
 numa.c  | 21 -
 vl.c|  3 +--
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e11a65b..156501c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..f1077f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -192,6 +192,7 @@ struct MachineClass {
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
 const char **valid_cpu_types;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);
 
diff --git a/numa.c b/numa.c
index 8d78d95..7151b24 100644
--- a/numa.c
+++ b/numa.c
@@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 }
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+nb_numa_nodes++;
 }
 
 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 if (err) {
 goto end;
 }
-nb_numa_nodes++;
 break;
 case NUMA_OPTIONS_TYPE_DIST:
 parse_numa_distance(&object->u.dist, &err);
@@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms)
 exit(1);
 }
 
+/*
+ * If memory hotplug is enabled (slots > 0) but without '-numa'
+ * options explicitly on CLI, guestes will break.
+ *
+ *   Windows: won't enable memory hotplug without SRAT table at all
+ *
+ *   Linux: if QEMU is started with initial memory a

Re: [Qemu-devel] [PATCH v5] NUMA: Enable adding NUMA node implicitly

2017-11-13 Thread Dou Liyang

Hi Michael,

[...]

Seems to cause build failures:
/scm/qemu/numa.c:452:13: error: too many arguments to function ‘parse_numa_node’
 parse_numa_node(ms, &node, NULL, NULL);



Yes, commit

  cc001888b780 ("numa: fixup parsed NumaNodeOptions earlier")

removed an argument from the function recently.

I will rebase this patch.

Thanks,
dou.




---
changelog V4 --> V5:

  - Avoid calling qemu_opts_parse*()
  - Add a new NUMA node by calling parse_numa_node() directly
  - Remove the redundant argument in parse_numa_opts()

These all were suggested by Eduardo.

---
 hw/i386/pc.c|  1 +
 hw/i386/pc_piix.c   |  1 +
 hw/i386/pc_q35.c|  1 +
 include/hw/boards.h |  1 +
 numa.c  | 21 -
 vl.c|  3 +--
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e307f7..ec4eb97 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }

 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }

 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..f1077f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -192,6 +192,7 @@ struct MachineClass {
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
 const char **valid_cpu_types;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);

diff --git a/numa.c b/numa.c
index 100a67f..4d12f30 100644
--- a/numa.c
+++ b/numa.c
@@ -221,6 +221,7 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 }
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+nb_numa_nodes++;
 }

 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -281,7 +282,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 if (err) {
 goto end;
 }
-nb_numa_nodes++;
 break;
 case NUMA_OPTIONS_TYPE_DIST:
 parse_numa_distance(&object->u.dist, &err);
@@ -432,6 +432,25 @@ void parse_numa_opts(MachineState *ms)
 exit(1);
 }

+/*
+ * If memory hotplug is enabled (slots > 0) but without '-numa'
+ * options explicitly on CLI, guestes will break.
+ *
+ *   Windows: won't enable memory hotplug without SRAT table at all
+ *
+ *   Linux: if QEMU is started with initial memory all below 4Gb
+ *   and no SRAT table present, guest kernel will use nommu DMA ops,
+ *   which breaks 32bit hw drivers when memory is hotplugged and
+ *   guest tries to use it with that drivers.
+ *
+ * Enable NUMA implicitly by adding a new NUMA node automatically.
+ */
+if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+mc->auto_enable_numa_with_memhp) {
+NumaNodeOptions node = { };
+parse_numa_node(ms, &node, NULL, NULL);
+}
+
 assert(max_numa_nodeid <= MAX_NODES);

 /* No support for sparse NUMA node IDs yet: */
diff --git a/vl.c b/vl.c
index ec29909..be332d1 100644
--- a/vl.c
+++ b/vl.c
@@ -4675,8 +4675,6 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

-parse_numa_opts(current_machine);
-
 if (qemu_opts_foreach(qemu_find_opts("mon"),
   mon_init_func, NULL, NULL)) {
 exit(1);
@@ -4726,6 +4724,7 @@ int main(int argc, char **argv, char **envp)
 current_machine->boot_order = boot_order;
 current_machine->cpu_model = cpu_model;

+parse_numa_opts(current_machine);

 /* parse features once if machine provides default cpu_type */
 if (machine_class->default_cpu_type) {

[Qemu-devel] [PATCH v5] NUMA: Enable adding NUMA node implicitly

2017-10-26 Thread Dou Liyang
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
 * Windows: won't enable memory hotplug without SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break 
migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
 * Windows: to enable memory hotplug
 * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V4 --> V5:

  - Avoid calling qemu_opts_parse*()
  - Add a new NUMA node by calling parse_numa_node() directly
  - Remove the redundant argument in parse_numa_opts()

These all were suggested by Eduardo.

---
 hw/i386/pc.c|  1 +
 hw/i386/pc_piix.c   |  1 +
 hw/i386/pc_q35.c|  1 +
 include/hw/boards.h |  1 +
 numa.c  | 21 -
 vl.c|  3 +--
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e307f7..ec4eb97 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..f1077f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -192,6 +192,7 @@ struct MachineClass {
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
 const char **valid_cpu_types;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);
 
diff --git a/numa.c b/numa.c
index 100a67f..4d12f30 100644
--- a/numa.c
+++ b/numa.c
@@ -221,6 +221,7 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 }
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+nb_numa_nodes++;
 }
 
 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -281,7 +282,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 if (err) {
 goto end;
 }
-nb_numa_nodes++;
 break;
 case NUMA_OPTIONS_TYPE_DIST:
 parse_numa_distance(&object->u.dist, &err);
@@ -432,6 +432,25 @@ void parse_numa_opts(MachineState *ms)
 exit(1);
 }
 
+/*
+ * If memory hotplug is enabled (slots > 0) but without '-numa'
+ * options explicitly on CLI, gueste

Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly

2017-10-26 Thread Dou Liyang

Dear Eduardo,

At 10/26/2017 04:02 PM, Eduardo Habkost wrote:

Hi,

Sorry for taking so long to review it:


No problem. It's my honor!



On Mon, Oct 23, 2017 at 09:33:42AM +0800, Dou Liyang wrote:


[...]


+ */
+if (ram_slots > 0 && QTAILQ_EMPTY(&numa_opts->head)) {
+if (mc->auto_enable_numa_with_memhp) {


If you move the code after qemu_opts_foreach(), you could just
check if nb_numa_nodes is 0 instead of peeking at
numa_opts->head.



+qemu_opts_parse_noisily(numa_opts, "node", true);
+}
+}


Calling qemu_opts_parse*() has additional user-visible side
effects (it can make -writeconfig include the new option,
depending on the initialization ordering).  Affecting QemuOpts
depending on the machine-type breaks the separation between
machine configuration from machine initialization, so I would
like to avoid it.


Yes, indeed. Thank you so much for the kind explanation!


We could simply call parse_numa_node() (after making it increment
nb_numa_nodes automatically).



Yes, I will use it in the next version.


e.g.:


diff --git a/numa.c b/numa.c
index 8d78d959f6..da18e42ce7 100644
--- a/numa.c
+++ b/numa.c
@@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 }
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+nb_numa_nodes++;
 }

 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
**errp)
 if (err) {
 goto end;
 }
-nb_numa_nodes++;
 break;
 case NUMA_OPTIONS_TYPE_DIST:
 parse_numa_distance(&object->u.dist, &err);
@@ -433,6 +433,26 @@ void parse_numa_opts(MachineState *ms)
 exit(1);
 }

+/*
+ * If memory hotplug is enabled (slots > 0) but without '-numa'
+ * options explicitly on CLI, guestes will break.
+ *
+ *   Windows: won't enable memory hotplug without SRAT table at all
+ *
+ *   Linux: if QEMU is started with initial memory all below 4Gb
+ *   and no SRAT table present, guest kernel will use nommu DMA ops,
+ *   which breaks 32bit hw drivers when memory is hotplugged and
+ *   guest tries to use it with that drivers.
+ *
+ * Enable NUMA implicitly by adding a new NUMA node automatically.
+ */
+if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+mc->auto_enable_numa_with_memhp) {
+NumaNodeOptions node = { };
+parse_numa_node(ms, &node, &error_abort);
+}
+
+
 assert(max_numa_nodeid <= MAX_NODES);

 /* No support for sparse NUMA node IDs yet: */


+
+if (qemu_opts_foreach(numa_opts, parse_numa, ms, NULL)) {
 exit(1);
 }

diff --git a/vl.c b/vl.c
index 0723835..516d0c9 100644
--- a/vl.c
+++ b/vl.c
@@ -4677,8 +4677,6 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

-parse_numa_opts(current_machine);
-
 if (qemu_opts_foreach(qemu_find_opts("mon"),
   mon_init_func, NULL, NULL)) {
 exit(1);
@@ -4728,6 +4726,7 @@ int main(int argc, char **argv, char **envp)
 current_machine->boot_order = boot_order;
 current_machine->cpu_model = cpu_model;

+parse_numa_opts(current_machine, ram_slots);


Why did you add a ram_slots argument if it's already present at
current_machine->ram_slots?


Oops, I forgot to update it after moving the call behind the setup of
current_machine. I will remove the redundant ram_slots argument.

Thanks,
dou




 /* parse features once if machine provides default cpu_type */
 if (machine_class->default_cpu_type) {
--
2.5.5












Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly

2017-10-22 Thread Dou Liyang

Hi, Fam

At 10/23/2017 09:37 AM, no-re...@patchew.org wrote:

Hi,

This series failed build test on s390x host. Please find the details below.

Type: series
Message-id: 1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com
Subject: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly



  GEN hw/pci/trace.h
  GEN hw/s390x/trace.h
  GEN hw/vfio/trace.h
  GEN hw/acpi/trace.h
  GEN hw/arm/trace.h
  GEN hw/alpha/trace.h
Makefile:30: recipe for target 'git-submodule-update' failed
make: *** [git-submodule-update] Error 1


It also seems that there is something wrong with our test machines.

Thanks,
dou.


make: *** Waiting for unfinished jobs
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org







Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly

2017-10-22 Thread Dou Liyang

Hi Fam,

At 10/23/2017 09:37 AM, no-re...@patchew.org wrote:

Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com
Subject: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
time make docker-test-block@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com -> 
patchew/1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com
 t [tag update]patchew/cover.1508329282.git.riku.voi...@linaro.org 
-> patchew/cover.1508329282.git.riku.voi...@linaro.org
Switched to a new branch 'test'
b8c0d922bf NUMA: Enable adding NUMA node implicitly

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc'...
fatal: Could not read from remote repository.



Wow, it seems there is something wrong with the test robot.

Thanks,
dou.


Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git://git.qemu-project.org/dtc.git' into submodule path 
'/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc' failed
Failed to clone 'dtc'. Retry scheduled
Cloning into '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc'...
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git://git.qemu-project.org/dtc.git' into submodule path 
'/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc' failed
Failed to clone 'dtc' a second time, aborting
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org







[Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly

2017-10-22 Thread Dou Liyang
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
 * Windows: won't enable memory hotplug without SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break 
migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
 * Windows: to enable memory hotplug
 * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V3 --> V4:
  -Move the parse_numa_opts behind the setup of current_machine
   for safety suggested by Igor
 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/boards.h   |  1 +
 include/sysemu/numa.h |  2 +-
 numa.c| 24 ++--
 vl.c  |  3 +--
 7 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e307f7..ec4eb97 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..f1077f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -192,6 +192,7 @@ struct MachineClass {
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
 const char **valid_cpu_types;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5c6df28..31d3ac0 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,7 +30,7 @@ struct NumaNodeMem {
 };
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineState *ms);
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
diff --git a/numa.c b/numa.c
index 100a67f..ba8d813 100644
--- a/numa.c
+++ b/numa.c
@@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, 
NodeInfo *nodes,
 nodes[i].node_mem = size - usedmem;
 }
 
-void parse_numa_opts(MachineState *ms)
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots)
 {
 int i;
 MachineClass *mc = MACHINE_GET_CLASS(ms);
+QemuOptsList *num

Re: [Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node

2017-10-22 Thread Dou Liyang

Hi Eduardo,

Thanks for your reply.

At 10/21/2017 03:15 AM, Eduardo Habkost wrote:

On Wed, Oct 18, 2017 at 11:50:32AM +0800, Dou Liyang wrote:


At 10/18/2017 11:48 AM, Dou Liyang wrote:

As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Changelog:

This patch can pass the 'make check' in Peter's machine,


Oops,  s/can/can not  :-).

Thanks,
dou.


But on my own and Eduardo's machines, it passes.

So I rebase and spilt it independently.


Thanks!



Peter:

could you help me test it on your machine and then give me the output
files, such as the /tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl files?


As the failures reported by Peter were about the *.memhp test
case files, I think this patch is unlikely to be the cause of
those failures.  I believe the problem is at:

  From: Dou Liyang <douly.f...@cn.fujitsu.com>
  Date: Fri, 1 Sep 2017 10:10:03 +0800
  Message-ID: <1504231805-30957-3-git-send-email-douly.f...@cn.fujitsu.com>
  Subject: [PATCH v6 2/4] hw/acpi-build: Make assignment statement
   of next_base easy to read



That patch was just for readability, so I discarded it. And we
need a test case for the commit

  4926403c250f ("hw/acpi-build: Fix SRAT memory building in case of 
node 0 without RAM")


So, I re-post this patch. ;-)



However, I would still like to get test results from Peter before
applying this patch again.


I see.

Thanks,
dou





Re: [Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly

2017-10-18 Thread Dou Liyang

Hi Igor,


diff --git a/vl.c b/vl.c
index 9bb5058..d083b4d 100644
--- a/vl.c
+++ b/vl.c
@@ -4665,7 +4665,11 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

-parse_numa_opts(current_machine);
+current_machine->ram_size = ram_size;
+current_machine->maxram_size = maxram_size;
+current_machine->ram_slots = ram_slots;
+
+parse_numa_opts(current_machine, ram_slots);

 if (qemu_opts_foreach(qemu_find_opts("mon"),
   mon_init_func, NULL, NULL)) {
@@ -4710,9 +4714,6 @@ int main(int argc, char **argv, char **envp)
 replay_checkpoint(CHECKPOINT_INIT);
 qdev_machine_init();

-current_machine->ram_size = ram_size;
-current_machine->maxram_size = maxram_size;
-current_machine->ram_slots = ram_slots;
 current_machine->boot_order = boot_order;
 current_machine->cpu_model = cpu_model;

it should be safe to move parse_numa_opts(current_machine) here



Yes, indeed!

I have modified and tested this; it's good. I will change it in the next
version.

Thanks,
dou.













Re: [Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly

2017-10-17 Thread Dou Liyang

Hi all,

It seems that a month has passed

So, Ping...

Thanks,
dou.

At 09/21/2017 05:23 PM, Dou Liyang wrote:

Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
 * Windows: won't enable memory hotplug without SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break 
migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
 * Windows: to enable memory hotplug
 * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V2 --> V3:
  -Replace the callback function with a boolean parameter suggested by Igor
  -Use QTAILQ_EMPTY() macro to check the QemuOptsList

 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/boards.h   |  1 +
 include/sysemu/numa.h |  2 +-
 numa.c| 24 ++--
 vl.c  |  9 +
 7 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 05985d4..f1a44cc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2318,6 +2318,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9ff79b1..d87a433 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -449,6 +449,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }

 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6c4ec4b..68cbfc5 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -319,6 +319,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }

 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 156e0a5..0fe2c8f 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -191,6 +191,7 @@ struct MachineClass {
 bool has_hotpluggable_cpus;
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5c6df28..31d3ac0 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,7 +30,7 @@ struct NumaNodeMem {
 };

 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineState *ms);
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
diff --git a/numa.c b/numa.c
index 100a67f..ba8d813 100644
--- a/numa.c
+++ b/numa.c
@@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, 
NodeInfo *nodes,
 nodes[i].node_mem = size - usedmem;
 }

-void parse_numa_opts(Ma

Re: [Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node

2017-10-17 Thread Dou Liyang


At 10/18/2017 11:48 AM, Dou Liyang wrote:

As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Changelog:

This patch can pass the 'make check' in Peter's machine,


Oops,  s/can/can not  :-).

Thanks,
dou.


But on my own machine and Eduardo's, it passes.

So I rebased it and split it out independently.

Peter:

Could you help me test it on your machine, and then send me the
output files, such as the /tmp/asl-6QYK7Y.dsl and
/tmp/asl-1H1I7Y.dsl files?

 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..bc703d77ee48720af1b801c518b43511c95c852d
GIT binary patch
literal 5104
[GIT binary patch data mangled and truncated by the list archive; omitted]

[Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node

2017-10-17 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Changelog:

This patch can pass the 'make check' in Peter's machine,
But on my own machine and Eduardo's, it passes.

So I rebased it and split it out independently.

Peter:

Could you help me test it on your machine, and then send me the
output files, such as the /tmp/asl-6QYK7Y.dsl and
/tmp/asl-1H1I7Y.dsl files?
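
For reference, the kind of testcase being added looks roughly like the
sketch below (hypothetical function and test-path names, following the
existing test_acpi_one()/free_test_data() pattern in
tests/bios-tables-test.c; the real tests/bios-tables-test.c hunk is
truncated in this archive):

    /* Sketch only: boot a piix4 guest whose first NUMA node has no RAM
     * and diff the resulting ACPI tables against the new .numamem blobs. */
    static void test_acpi_piix4_tcg_numamem(void)
    {
        test_data data;

        memset(&data, 0, sizeof(data));
        data.machine = MACHINE_PC;
        data.variant = ".numamem";
        test_acpi_one(" -m 128,slots=3,maxmem=1G"
                      " -numa node -numa node,mem=128M", &data);
        free_test_data(&data);
    }

    /* ...registered from main() with something like:
     *     qtest_add_func("acpi/piix4/numamem", test_acpi_piix4_tcg_numamem);
     */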

 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..bc703d77ee48720af1b801c518b43511c95c852d
GIT binary patch
literal 5104
[GIT binary patch data mangled and truncated by the list archive; omitted]

Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

2017-10-10 Thread Dou Liyang

Hi Peter,

At 10/10/2017 08:56 PM, Peter Maydell wrote:

On 10 October 2017 at 13:52, Eduardo Habkost  wrote:

This shouldn't happen if CONFIG_IASL is not defined.  Did you run
configure again after removing iasl?


I've neither added nor removed iasl from the build systems
in question. Pull request tests just apply the pull and
run make/make check.



Yes, indeed.

Now I can be sure that the problem is not related to whether iasl is
installed or not; both cases are fine on my host.

According to:
>>> acpi-test: Warning! DSDT mismatch. Actual [asl:/tmp/asl-6QYK7Y.dsl,
>>> aml:/tmp/aml-E8YK7Y], Expected [asl:/tmp/asl-1H1I7Y.dsl,
>>> aml:tests/acpi-test-data/pc/DSDT.memhp].

May I resend the patch independently?  Could you help me test it on
your machine and then send me the output files, such as the
/tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl files?


Thanks,
dou.

thanks
-- PMM









Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

2017-10-10 Thread Dou Liyang

Hi Peter,

At 10/10/2017 05:40 PM, Peter Maydell wrote:

On 10 October 2017 at 08:41, Dou Liyang <douly.f...@cn.fujitsu.com> wrote:

If we want to run the "bios-tables-test", we need iasl support.

Otherwise, when I run "make V=1 check" without iasl, the test
reports an error:


This works in master at the moment, I think (whether by skipping
the test or by using pre-compiled tables I don't know). We need to
keep 'make check' working with and without iasl.


Yes, you are right: QEMU uses the CONFIG_IASL macro to skip
the test if the ACPICA tools are not present on the system.

QEMU already keeps 'make check' working both with and without iasl.

The way I tested was wrong.

Thanks,
dou.


thanks
-- PMM









Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

2017-10-10 Thread Dou Liyang

Hi Eduardo,

At 10/10/2017 08:52 PM, Eduardo Habkost wrote:

This shouldn't happen if CONFIG_IASL is not defined.  Did you run
configure again after removing iasl?


Oops, yes, the result above was with CONFIG_IASL = iasl.  :-(
I am aware of this problem now.

I am waiting for the result of the test with './configure --iasl="" '.

Thanks,
dou.





Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

2017-10-10 Thread Dou Liyang

Hi Eduardo,

I couldn't reproduce the failure either, but I have some concerns below.

At 10/06/2017 11:37 PM, Peter Maydell wrote:

On 6 October 2017 at 16:27, Eduardo Habkost  wrote:

On Fri, Oct 06, 2017 at 01:17:42PM +0100, Peter Maydell wrote:

This fails tests, I'm afraid:

[...]

acpi-test: Warning! DSDT mismatch. Actual [asl:/tmp/asl-6QYK7Y.dsl,
aml:/tmp/aml-E8YK7Y], Expected [asl:/tmp/asl-1H1I7Y.dsl,
aml:tests/acpi-test-data/pc/DSDT.memhp].


This warning is about the memory hot-plug tests.  A DSDT mismatch is very
strange; I would like to see the actual difference.


acpi-test: Warning. not showing difference since no diff utility is
specified. Set 'DIFF' environment variable to a preferred diff utility
and run 'make V=1 check' again to see ASL difference.OK

and same for /i386/acpi/piix4/numamem and for the
equivalent q35 tests.

Fails on s390, netbsd, freebsd, aarch32, x86-64.



Oops, sorry!

I need to find out why it didn't fail on travis-ci:
https://travis-ci.org/ehabkost/qemu/builds/283884670


My guess is that one of {iasl installed, iasl not installed}
is broken, but the travis environments don't test both.


If we want to run the "bios-tables-test", we need iasl support.

Otherwise, when I run "make V=1 check" without iasl, the test
reports an error:

...
TEST: tests/bios-tables-test... (pid=1006)
  /i386/acpi/piix4:
Looking for expected file 'tests/acpi-test-data/pc/DSDT'

Using expected file 'tests/acpi-test-data/pc/DSDT'

Looking for expected file 'tests/acpi-test-data/pc/APIC'

Using expected file 'tests/acpi-test-data/pc/APIC'

Looking for expected file 'tests/acpi-test-data/pc/HPET'

Using expected file 'tests/acpi-test-data/pc/HPET'
**
ERROR:tests/bios-tables-test.c:328:load_asl: assertion failed (error == 
NULL): Failed to execute child process "iasl" (No such file or 
directory) (g-exec-error-quark, 8)

FAIL
...

Thanks,
dou.

thanks
-- PMM










Re: [Qemu-devel] [PATCH v6 2/4] hw/acpi-build: Make assignment statement of next_base easy to read

2017-10-09 Thread Dou Liyang

Hi Eduardo,

At 10/10/2017 10:29 AM, Eduardo Habkost wrote:

On Fri, Sep 01, 2017 at 10:10:03AM +0800, Dou Liyang wrote:

It may be hard to read the assignment statement of "next_base", so

S/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 /next_base = mem_base + mem_len;

... for readability.

No functionality change.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>


This or the new test case at patch 3/4 seem to have caused build
failures under some circumstances:

https://www.mail-archive.com/qemu-devel@nongnu.org/msg486368.html

I couldn't reproduce the failure yet, but I'm removing this and
patch 3/4 from the queue until this is investigated.


Hmm, I see, I will do it.

Thanks,
dou.



---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a0cf3bf..48525a1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2411,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,
--
2.5.5












[Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly

2017-09-21 Thread Dou Liyang
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
 * Windows: won't enable memory hotplug without SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI.
(PS: auto-create numa node only for new machine types so not to break 
migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
 * Windows: to enable memory hotplug
 * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V2 --> V3:
  -Replace the callback function with a boolean parameter suggested by Igor
  -Use QTAILQ_EMPTY() macro to check the QemuOptsList

 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/boards.h   |  1 +
 include/sysemu/numa.h |  2 +-
 numa.c| 24 ++--
 vl.c  |  9 +
 7 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 05985d4..f1a44cc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2318,6 +2318,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->auto_enable_numa_with_memhp = true;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9ff79b1..d87a433 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -449,6 +449,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 6c4ec4b..68cbfc5 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -319,6 +319,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 156e0a5..0fe2c8f 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -191,6 +191,7 @@ struct MachineClass {
 bool has_hotpluggable_cpus;
 bool ignore_memory_transaction_failures;
 int numa_mem_align_shift;
+bool auto_enable_numa_with_memhp;
 void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
  int nb_nodes, ram_addr_t size);
 
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5c6df28..31d3ac0 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,7 +30,7 @@ struct NumaNodeMem {
 };
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineState *ms);
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
 extern QemuOptsList qemu_numa_opts;
 void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node);
diff --git a/numa.c b/numa.c
index 100a67f..ba8d813 100644
--- a/numa.c
+++ b/numa.c
@@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, 
NodeInfo *nodes,
 nodes[i].node_mem = size - usedmem;
 }
 
-void parse_numa_opts(MachineState *ms)
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots)
 {
 int i;
 Mach

Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-21 Thread Dou Liyang

Hi Igor,

At 09/21/2017 03:54 PM, Igor Mammedov wrote:

On Thu, 21 Sep 2017 12:19:17 +0800


[...]



I've meant something like that:

parse_numa_opts() {
if (mc->auto_enable_numa_with_memhp == true) {
qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
}
}



Oops, Understood!  ;-) .


where x86 sets mc->auto_enable_numa_with_memhp to true by default
and sets it to false for 2.10 and older machine types.
It will allow individual machines to enable feature if they need it
but prevent guest ABI breakage for old machine types
(i.e. won't break migration)



Got it.


grep source for 'pcmc->legacy_cpu_hotplug = true' to see how
this compact stuff works.



OK. Thank you very much.

Thanks,
dou






Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-20 Thread Dou Liyang

Hi Igor,

I am sorry I missed some of the comments you gave me.

My reply is below.
At 09/18/2017 05:24 PM, Dou Liyang wrote:
[...]

ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.

how about:
s/auto_enable_numa_with_memhp/


Yes, that is much better than my name; I will use it.


boolean instead, see below how it could improve patch.



I am not really sure why we want to make this a boolean instead of a callback.


  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties
(*cpu_index_to_instance_props)(MachineState *machine,
  unsigned
cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState
*machine);
+
+void (*numa_implicit_add_node0)(void);
 };

 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {
 BlockdevOptions *bdo;
 Location loc;
@@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
+has_numa_config_in_CLI = true;
 break;
 case QEMU_OPTION_display:
 display_type = select_display(optarg);
@@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

+/*
+ * If memory hotplug is enabled i.e. slots > 0 and user hasn't add
+ * NUMA nodes explicitly on CLI
+ *
+ * Enable NUMA implicitly for guest to know the maximum memory
+ * from ACPI SRAT table, which is used for SWIOTLB.
+ */
+if (ram_slots > 0 && !has_numa_config_in_CLI) {
+if (machine_class->numa_implicit_add_node0) {
+machine_class->numa_implicit_add_node0();
+}
+}
 parse_numa_opts(current_machine);

it would be better to put this logic inside of parse_numa_opts()
I'd suggest to move:

current_machine->ram_size = ram_size;
current_machine->maxram_size = maxram_size;
current_machine->ram_slots = ram_slots;

before parse_numa_opts() is called, and then
handle 'memhp present+no numa on CLI" logic inside of
parse_numa_opts(). With this you won't have to track
'has_numa_config_in_CLI', drop callback numa_implicit_add_node0()
and numa nuances would be in place they are supposed to be: numa.c



Does "dropping the callback..." mean:

static void auto_enable_numa_with_memhp(QemuOptsList *list)
{
...
}

void parse_numa_opts(MachineState *ms, uint64_t ram_slots)
{
QemuOptsList *numa_opts = qemu_find_opts("numa");
...
auto_enable_numa_with_memhp(numa_opts);
...
}
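
Fleshed out a little, the idea could look roughly like this (just a
sketch with a hypothetical helper name; qemu_find_opts(),
qemu_opts_parse_noisily() and QTAILQ_EMPTY() are existing QEMU helpers,
and the usual QEMU headers are assumed):

    /* Sketch only: auto-add an implicit "-numa node" when memory hotplug
     * is enabled (ram_slots > 0) but no -numa option was given on the CLI,
     * gated by a machine-class flag so old machine types keep their ABI. */
    static void numa_auto_enable_with_memhp(MachineState *ms,
                                            uint64_t ram_slots)
    {
        MachineClass *mc = MACHINE_GET_CLASS(ms);
        QemuOptsList *numa_opts = qemu_find_opts("numa");

        if (mc->auto_enable_numa_with_memhp && ram_slots > 0 &&
            QTAILQ_EMPTY(&numa_opts->head)) {
            qemu_opts_parse_noisily(numa_opts, "node", true);
        }
    }

called from parse_numa_opts() before the options are processed.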

So, no matter which arch it is, if it supports NUMA, we will enable NUMA
implicitly whenever memory hotplug has already been enabled explicitly on
the CLI with "slots=xx,maxmem=xx".


I am not sure about that, though; as far as I know this bug only affects
x86, so there seems to be no need to affect other arches that support
NUMA as well.

Thanks,
dou.


 if (qemu_opts_foreach(qemu_find_opts("mon"),











[Qemu-devel] [PATCH v2] NUMA: Enable adding NUMA node implicitly

2017-09-18 Thread Dou Liyang
Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
however currently QEMU doesn't create SRAT table if numa options aren't present
on CLI.

Which breaks both linux and windows guests in certain conditions:
 * Windows: won't enable memory hotplug without SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table
   present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
   when memory is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when QEMU is started with
memory hotplug enabled but without '-numa' options on CLI. 
(PS: auto-create numa node only for new machine types so not to break 
migration).

Which would provide SRAT table to guests without explicit -numa options on CLI
and would allow:
 * Windows: to enable memory hotplug
 * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated
   buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V1 --> V2:
  -Move the logic from vl.c to numa.c suggested by Igor
  -Fix the guest ABI problem reported by Daniel
  -make the function name more understandable

 hw/i386/pc.c  |  1 +
 hw/i386/pc_piix.c |  1 +
 hw/i386/pc_q35.c  |  1 +
 include/hw/boards.h   |  4 
 include/sysemu/numa.h |  3 ++-
 numa.c| 29 +++--
 vl.c  |  9 +
 7 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..7a753ee 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2349,6 +2349,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->add_numa_node_implicitly = numa_add_node_implicitly;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b03cc04..cc94334 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -452,6 +452,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
 m->is_default = 0;
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+m->add_numa_node_implicitly = NULL;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c1cba58..7329819 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -317,6 +317,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
 m->alias = NULL;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
 m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+m->add_numa_node_implicitly = NULL;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..c8c4f25 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @add_numa_node_implicitly:
+ *Enable NUMA implicitly by add a new NUMA node automatically.
  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
  unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
+
+void (*add_numa_node_implicitly)(QemuOptsList *list);
 };
 
 /**
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5c6df28..1660249 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -30,7 +30,7 @@ struct NumaNodeMem {
 };
 
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(MachineState *ms);
+void parse_numa_opts(MachineState *ms, uint64_t ram_slots);
 void query_numa_node_mem(NumaNodeMem node_mem[]);
 extern QemuOptsList q

Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-18 Thread Dou Liyang

Hi Igor,

At 09/18/2017 05:08 PM, Igor Mammedov wrote:

On Fri, 15 Sep 2017 16:33:18 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table
for transfering NUMA configuration to the guest. So, the maximum memory in
SRAT can be used to determine whether to use the swiotlb for IOMMU or not.
However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will
never be built. When memory hotplug is enabled, some guest's devices may
start failing due to SWIOTLB is disabled.

Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU
parse NUMA options to enable adding NUMA node implicitly.



I'd rewrite commit message with something like:

Linux and Windows need ACPI SRAT table to make memory
hotplug work properly, however currently QEMU doesn't
create SRAT table if numa options aren't present on CLI.

Which breaks both linux and windows guests in certain
conditions:
 * windows: won't enable memory hotplug without SRAT table at all
 * linux: if QEMU is started with initial memory all below
   4Gb and no SRAT table present, guest kernel will use
   nommu DMA ops, which breaks 32bit hw drivers when memory
   is hotplugged and guest tries to use it with that drivers.

Fix above issues by automatically creating a numa node when
QEMU is started with memory hotplug enabled but without '-numa'
options on CLI.
(PS: auto-create numa node only for new machine types so
not to break migration).

Which would provide SRAT table to guests without explicit
-numa options on CLI and would allow:
 * windows: to enable memory hotplug
 * linux: switch to SWIOTLB DMA ops, to bounce DMA transfers
   to 32bit allocated buffers that legacy drivers/hw can handle.



It's more detailed and easier to understand; I will update the changelog
in the next version.

Thanks,
dou.



Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>

---
 hw/i386/pc.c|  6 ++
 include/hw/boards.h |  4 
 vl.c| 14 ++
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..3c40117 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,6 +2308,11 @@ static const CPUArchIdList 
*pc_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }

+static void numa_implicit_add_node0(void)
+{
+qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
+}
+
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
@@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->numa_implicit_add_node0 = numa_implicit_add_node0;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..898d841 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.

how about:
s/auto_enable_numa_with_memhp/
boolean instead, see below how it could improve patch.


  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
  unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
+
+void (*numa_implicit_add_node0)(void);
 };

 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {

Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-18 Thread Dou Liyang



At 09/18/2017 03:40 PM, Igor Mammedov wrote:

On Mon, 18 Sep 2017 11:54:50 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


At 09/15/2017 06:05 PM, Dou Liyang wrote:

Hi Daniel,

At 09/15/2017 04:40 PM, Daniel P. Berrange wrote:

On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote:

In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT
table
for transfering NUMA configuration to the guest. So, the maximum
memory in
SRAT can be used to determine whether to use the swiotlb for IOMMU or
not.

However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will
never be built. When memory hotplug is enabled, some guest's devices may
start failing due to SWIOTLB is disabled.

Add numa_implicit_add_node0 in struct MachineClass, Invoke it before
QEMU
parse NUMA options to enable adding NUMA node implicitly.

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>

---
 hw/i386/pc.c|  6 ++
 include/hw/boards.h |  4 
 vl.c| 14 ++
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..3c40117 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,6 +2308,11 @@ static const CPUArchIdList
*pc_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }

+static void numa_implicit_add_node0(void)
+{
+qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
+}
+
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
@@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass
*oc, void *data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->numa_implicit_add_node0 = numa_implicit_add_node0;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..898d841 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory
ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.
  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties
(*cpu_index_to_instance_props)(MachineState *machine,
  unsigned
cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState
*machine);
+
+void (*numa_implicit_add_node0)(void);
 };

 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {
 BlockdevOptions *bdo;
 Location loc;
@@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
+has_numa_config_in_CLI = true;
 break;
 case QEMU_OPTION_display:
 display_type = select_display(optarg);
@@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

+/*
+ * If memory hotplug is enabled i.e. slots > 0 and user hasn't add
+ * NUMA nodes explicitly on CLI
+ *
+ * Enable NUMA implicitly for guest to know the maximum memory
+ * from ACPI SRAT table, which is used for SWIOTLB.
+ */
+if (ram_slots > 0 && !has_numa_config_in_CLI) {
+if (machine_class->numa_implicit_add_node0) {
+machine_class->numa_implicit_add_node0();
+}
+}


Won't this change guest ABI


Hi Daniel,

No, this will change the ACPI table in following case without NUMA
con

Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-17 Thread Dou Liyang

At 09/15/2017 06:05 PM, Dou Liyang wrote:

Hi Daniel,

At 09/15/2017 04:40 PM, Daniel P. Berrange wrote:

On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote:

In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT
table
for transfering NUMA configuration to the guest. So, the maximum
memory in
SRAT can be used to determine whether to use the swiotlb for IOMMU or
not.

However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will
never be built. When memory hotplug is enabled, some guest's devices may
start failing due to SWIOTLB is disabled.

Add numa_implicit_add_node0 in struct MachineClass, Invoke it before
QEMU
parse NUMA options to enable adding NUMA node implicitly.

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>

---
 hw/i386/pc.c|  6 ++
 include/hw/boards.h |  4 
 vl.c| 14 ++
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..3c40117 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,6 +2308,11 @@ static const CPUArchIdList
*pc_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }

+static void numa_implicit_add_node0(void)
+{
+qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
+}
+
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
@@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass
*oc, void *data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->numa_implicit_add_node0 = numa_implicit_add_node0;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..898d841 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory
ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.
  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties
(*cpu_index_to_instance_props)(MachineState *machine,
  unsigned
cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState
*machine);
+
+void (*numa_implicit_add_node0)(void);
 };

 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {
 BlockdevOptions *bdo;
 Location loc;
@@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
+has_numa_config_in_CLI = true;
 break;
 case QEMU_OPTION_display:
 display_type = select_display(optarg);
@@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

+/*
+ * If memory hotplug is enabled i.e. slots > 0 and user hasn't add
+ * NUMA nodes explicitly on CLI
+ *
+ * Enable NUMA implicitly for guest to know the maximum memory
+ * from ACPI SRAT table, which is used for SWIOTLB.
+ */
+if (ram_slots > 0 && !has_numa_config_in_CLI) {
+if (machine_class->numa_implicit_add_node0) {
+machine_class->numa_implicit_add_node0();
+}
+}


Won't this change guest ABI


Hi Daniel,

No.  This will change the ACPI table only in the following case, where
no NUMA configuration is given:

...
-m 128M,slots=4,maxmem=1G
...

How about fixing it by adding

m->numa_implicit_add_node0 = NULL;

in pc_i440fx/q35_2_

Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-15 Thread Dou Liyang

Hi Daniel,

At 09/15/2017 04:40 PM, Daniel P. Berrange wrote:

On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote:

In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table
for transfering NUMA configuration to the guest. So, the maximum memory in
SRAT can be used to determine whether to use the swiotlb for IOMMU or not.

However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will
never be built. When memory hotplug is enabled, some guest's devices may
start failing due to SWIOTLB is disabled.

Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU
parse NUMA options to enable adding NUMA node implicitly.

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>

---
 hw/i386/pc.c|  6 ++
 include/hw/boards.h |  4 
 vl.c| 14 ++
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..3c40117 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,6 +2308,11 @@ static const CPUArchIdList 
*pc_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }

+static void numa_implicit_add_node0(void)
+{
+qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
+}
+
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
@@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->numa_implicit_add_node0 = numa_implicit_add_node0;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..898d841 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.
  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
  unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
+
+void (*numa_implicit_add_node0)(void);
 };

 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {
 BlockdevOptions *bdo;
 Location loc;
@@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
+has_numa_config_in_CLI = true;
 break;
 case QEMU_OPTION_display:
 display_type = select_display(optarg);
@@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);

+/*
+ * If memory hotplug is enabled i.e. slots > 0 and user hasn't add
+ * NUMA nodes explicitly on CLI
+ *
+ * Enable NUMA implicitly for guest to know the maximum memory
+ * from ACPI SRAT table, which is used for SWIOTLB.
+ */
+if (ram_slots > 0 && !has_numa_config_in_CLI) {
+if (machine_class->numa_implicit_add_node0) {
+machine_class->numa_implicit_add_node0();
+}
+}


Won't this change guest ABI and so break migration/save/restore ?


Thank you for your reply.  I can't answer this immediately; I will
look into it and reply to you soon.

This patch works for x86 and has no influence on other arches; we
can regard its functionality as equivalent to "-numa node" on the CLI.

Thanks,
dou.



Regards,
Daniel







[Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly

2017-09-15 Thread Dou Liyang
In QEMU, if we enable NUMA and have nodes, QEMU will build an ACPI SRAT table
for transferring the NUMA configuration to the guest. The maximum memory in
the SRAT can then be used to determine whether to use SWIOTLB for IOMMU or not.

However, if QEMU doesn't enable NUMA explicitly on the CLI, the SRAT will
never be built. When memory hotplug is enabled, some guest devices may
then start failing because SWIOTLB is disabled.

Add numa_implicit_add_node0 to struct MachineClass and invoke it before QEMU
parses the NUMA options, to enable adding a NUMA node implicitly.

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: f4...@amsat.org
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>

---
 hw/i386/pc.c|  6 ++
 include/hw/boards.h |  4 
 vl.c| 14 ++
 3 files changed, 24 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2108104..3c40117 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,6 +2308,11 @@ static const CPUArchIdList 
*pc_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }
 
+static void numa_implicit_add_node0(void)
+{
+qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true);
+}
+
 static void x86_nmi(NMIState *n, int cpu_index, Error **errp)
 {
 /* cpu index isn't used */
@@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->get_hotplug_handler = pc_get_hotpug_handler;
 mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
 mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+mc->numa_implicit_add_node0 = numa_implicit_add_node0;
 mc->has_hotpluggable_cpus = true;
 mc->default_boot_order = "cad";
 mc->hot_add_cpu = pc_hot_add_cpu;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 7f044d1..898d841 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -141,6 +141,8 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @numa_implicit_add_node0:
+ *Enable NUMA implicitly by add a NUMA node.
  */
 struct MachineClass {
 /*< private >*/
@@ -191,6 +193,8 @@ struct MachineClass {
 CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
  unsigned cpu_index);
 const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
+
+void (*numa_implicit_add_node0)(void);
 };
 
 /**
diff --git a/vl.c b/vl.c
index fb1f05b..814a5fa 100644
--- a/vl.c
+++ b/vl.c
@@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp)
 Error *main_loop_err = NULL;
 Error *err = NULL;
 bool list_data_dirs = false;
+bool has_numa_config_in_CLI = false;
 typedef struct BlockdevOptions_queue {
 BlockdevOptions *bdo;
 Location loc;
@@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp)
 if (!opts) {
 exit(1);
 }
+has_numa_config_in_CLI = true;
 break;
 case QEMU_OPTION_display:
 display_type = select_display(optarg);
@@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
 default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
+/*
+ * If memory hotplug is enabled i.e. slots > 0 and user hasn't add
+ * NUMA nodes explicitly on CLI
+ *
+ * Enable NUMA implicitly for guest to know the maximum memory
+ * from ACPI SRAT table, which is used for SWIOTLB.
+ */
+if (ram_slots > 0 && !has_numa_config_in_CLI) {
+if (machine_class->numa_implicit_add_node0) {
+machine_class->numa_implicit_add_node0();
+}
+}
 parse_numa_opts(current_machine);
 
 if (qemu_opts_foreach(qemu_find_opts("mon"),
-- 
2.5.5






Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled

2017-09-11 Thread Dou Liyang

Hi Igor,

At 09/11/2017 06:58 PM, Igor Mammedov wrote:

> >
> > Igor's suggestion to enable NUMA implicitly sounds safer to me.
> >

>
> I agree with Igor too.
>
> Is anybody doing this? If not, may I make a patch to enable adding NUMA
> node implicitly first. let's see what it looks like.

As far as I'm aware nobody is doing it, so feel free to look into it.



Got it!  I will do it right now.

Thanks,
dou.


>
> Thanks,
>dou.
>
>






Re: [Qemu-devel] [PATCH V5 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison

2017-09-04 Thread Dou Liyang

Hi Chen,

At 09/04/2017 02:14 PM, Zhang Chen wrote:

When network is busy, some tcp options(like sack) will unpredictable
occur in primary side or secondary side. it will make packet size
not same, but the two packet's payload is identical. colo just
care about packet payload, so we skip the option field.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 40 
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index ca67c68..18a9ebf 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:0  means packet same
  *> 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
+static int colo_packet_compare_common(Packet *ppkt,
+  Packet *spkt,
+  int poffset,
+  int soffset)
 {
 if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
 char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
@@ -201,12 +204,14 @@ static int colo_packet_compare_common(Packet *ppkt, 
Packet *spkt, int offset)
sec_ip_src, sec_ip_dst);
 }

-offset = ppkt->vnet_hdr_len + offset;
+poffset = ppkt->vnet_hdr_len + poffset;
+soffset = ppkt->vnet_hdr_len + soffset;

-if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset,
-  spkt->data + offset,
-  spkt->size - offset);
+if (ppkt->size == spkt->size ||
+ppkt->size - poffset == spkt->size - soffset) {


This logic has a problem:

ppkt->data
    |-------|------------rangeP------------|
    |       |                              |
    0     poffset                     ppkt->size
             \                         \
              \       equal range       \
spkt->data     \                         \
    |-----------|---------rangeS---------|
    |           |                        |
    0        soffset                spkt->size

In the picture above, (ppkt->size == spkt->size) is true.  If the bytes in
[soffset, spkt->size) also happen to match the bytes in
[poffset, poffset + (spkt->size - soffset)), the code returns 0, but the
two payloads are actually not equal.

Please use following code instead,

if (ppkt->size - poffset == spkt->size - soffset)

I am new to COLO, so let's look at what we actually want to compare;
if I am wrong, please correct me. :-)

ppkt->data
    |-------|------------rangeP------------|
    |       |                              |
    0     poffset                     ppkt->size

spkt->data
    |-----------|---------rangeS-----------|
    |           |                          |
    0        soffset                  spkt->size

The data in rangeP and rangeS is what we want to compare.
So we only need to care about each range's start pointer and length,
not the whole packet size.
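
As a minimal, self-contained sketch of that condition (illustrative names
only, not the actual net/colo-compare.c code):

    #include <stdint.h>
    #include <string.h>

    /* Sketch only: compare the payload ranges [poffset, psize) and
     * [soffset, ssize).  Returns 0 when the payloads match, non-zero
     * otherwise, mirroring colo_packet_compare_common()'s convention. */
    static int compare_payload_ranges(const uint8_t *pdata, int psize,
                                      int poffset,
                                      const uint8_t *sdata, int ssize,
                                      int soffset)
    {
        if (psize - poffset != ssize - soffset) {
            return -1;    /* payload lengths differ -> packets differ */
        }
        return memcmp(pdata + poffset, sdata + soffset, psize - poffset);
    }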


Thanks,
dou.





Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled

2017-09-04 Thread Dou Liyang

Hi Eduardo,

At 09/04/2017 09:08 PM, Eduardo Habkost wrote:
[...]

In my opinion, this code also adds an entry for the hotpluggable memory;
see the following comments.

/*
 * Entry is required for Windows to enable memory hotplug in OS
 * and for Linux to enable SWIOTLB when booted with less than
 * 4G of RAM. Windows works better if the entry sets proximity
 * to the highest NUMA node in the machine.
 * Memory devices may override proximity set by this entry,
 * providing _PXM method if necessary.
 */
if (hotplugabble_address_space_size) {
    numamem = acpi_data_push(table_data, sizeof *numamem);
    build_srat_memory(numamem, pcms->hotplug_memory.base,
                      hotplugabble_address_space_size, pcms->numa_nodes - 1,
                      MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
}


You are correct, I didn't see that part of the code.  If that's
the entry that's missing, the patch makes sense.  Thanks!

However, the resulting tables still don't look correct: it will
generate an entry assigned to NUMA node (uint32_t)-1 if no NUMA
nodes are configured elsewhere, some APIC entries, but no entries
for the rest of the memory.


Yes, indeed.



Igor's suggestion to enable NUMA implicitly sounds safer to me.



I agree with Igor too.

Is anybody doing this? If not, may I make a patch to enable adding a NUMA
node implicitly first? Let's see what it looks like.

Thanks,
dou.





Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

2017-09-04 Thread Dou Liyang

Hi Igor,

At 09/04/2017 07:11 PM, Igor Mammedov wrote:
[...]

+if (mem_base <= HOLE_640K_START &&
+next_base > HOLE_640K_START) {
+mem_len -= next_base - HOLE_640K_START;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
+
+/* Check for the rare case: 640K < RAM < 1M */
+if (next_base <= HOLE_640K_END) {
+next_base = HOLE_640K_END;

Is this assignment really necessary?



It is necessary, because we set mem_base to next_base before setting
next_base;

But, I can refine it:

MEM_AFFINITY_ENABLED);
  }

+mem_base = HOLE_640K_END;
  /* Check for the rare case: 640K < RAM < 1M */
  if (next_base <= HOLE_640K_END) {
-next_base = HOLE_640K_END;
  continue;
  }
-mem_base = HOLE_640K_END;
  mem_len = next_base - HOLE_640K_END;
  }

Is it?

I was wrong, so just leave it as it is now.



OK, I see.

Thanks,
dou.





Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

2017-09-04 Thread Dou Liyang



At 09/04/2017 05:39 PM, Igor Mammedov wrote:

On Thu, 31 Aug 2017 20:04:26 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


From: Eduardo Habkost <ehabk...@redhat.com>

Currently, Using the fisrt node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds ACPI SRAT, it regards node 0 as the
default node to deal with the memory hole(640K-1M). this means the
node0 must have some memory(>1M), but, actually it can have no
memory.

Fix this problem by  cut out the 640K hole in the same way the PCI
4G hole does. Also do some cleanup.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..48525a1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }

+#define HOLE_640K_START  (640 * 1024)
+#define HOLE_640K_END   (1024 * 1024)
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 next_base = 0;
 numa_start = table_data->len;

-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
 for (i = 1; i < pcms->numa_nodes + 1; ++i) {
 mem_base = next_base;
 mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
-}
 next_base = mem_base + mem_len;

+/* Cut out the 640K hole */
+if (mem_base <= HOLE_640K_START &&
+next_base > HOLE_640K_START) {
+mem_len -= next_base - HOLE_640K_START;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
+
+/* Check for the rare case: 640K < RAM < 1M */
+if (next_base <= HOLE_640K_END) {
+next_base = HOLE_640K_END;

Is this assignment really necessary?



It is necessary, because at the top of the next iteration we set mem_base to
next_base before next_base is recomputed;

But, I can refine it:

   MEM_AFFINITY_ENABLED);
 }

+mem_base = HOLE_640K_END;
 /* Check for the rare case: 640K < RAM < 1M */
 if (next_base <= HOLE_640K_END) {
-next_base = HOLE_640K_END;
 continue;
 }
-mem_base = HOLE_640K_END;
 mem_len = next_base - HOLE_640K_END;
 }

Is it?

Thanks,
dou.


it seems that next_base will be set at the start of the loop.





+continue;
+}
+mem_base = HOLE_640K_END;
+mem_len = next_base - HOLE_640K_END;
+}
+
 /* Cut out the ACPI_PCI hole */
 if (mem_base <= pcms->below_4g_mem_size &&
 next_base > pcms->below_4g_mem_size) {
@@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,











Re: [Qemu-devel] [PATCH V4 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison

2017-09-03 Thread Dou Liyang

Hi Chen,

At 09/02/2017 12:02 AM, Zhang Chen wrote:



On 09/01/2017 05:35 PM, Dou Liyang wrote:

Hi chen,

At 08/21/2017 04:55 PM, Zhang Chen wrote:

When the network is busy, some TCP options (like SACK) may unpredictably
appear on the primary side or the secondary side. This can make the packet
sizes differ even though the two packets' payloads are identical. COLO only
cares about the packet payload, so we skip the option field.

Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com>
---
 net/colo-compare.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index ca67c68..f6bda41 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int
mode)
  * return:0  means packet same
  *> 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt,
int offset)
+static int colo_packet_compare_common(Packet *ppkt,
+  Packet *spkt,
+  int poffset,
+  int soffset)
 {
 if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
 char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20],
sec_ip_dst[20];
@@ -201,12 +204,13 @@ static int colo_packet_compare_common(Packet
*ppkt, Packet *spkt, int offset)
sec_ip_src, sec_ip_dst);
 }

-offset = ppkt->vnet_hdr_len + offset;
+poffset = ppkt->vnet_hdr_len + poffset;
+soffset = ppkt->vnet_hdr_len + soffset;

-if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset,
-  spkt->data + offset,
-  spkt->size - offset);
+if (ppkt->size == spkt->size || poffset != soffset) {
+return memcmp(ppkt->data + poffset,
+  spkt->data + soffset,
+  spkt->size - soffset);


There may be a mistake here.

I think you should also make sure that

(ppkt->size - poffset) == (spkt->size - soffset) holds in the case of
(poffset != soffset).

Because if SACK makes the headers unequal, it is also possible that the
payloads are unequal at the same time.


Good catch! In rare situations this would miss a differing packet. I will fix
it in the next version.



Thank you very much!




 } else {
 trace_colo_compare_main("Net packet size are not the same");
 return -1;
@@ -263,13 +267,22 @@ static int colo_packet_compare_tcp(Packet
*spkt, Packet *ppkt)
  * so we just need skip this field.
  */
 if (ptcp->th_off > 5) {
-ptrdiff_t tcp_offset;
+ptrdiff_t ptcp_offset, stcp_offset;

-tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
- + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
-res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
+ptcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
+  + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
+stcp_offset = spkt->transport_header - (uint8_t *)spkt->data
+  + (stcp->th_off * 4) - spkt->vnet_hdr_len;
+
+/*
+ * When network is busy, some tcp options(like sack) will
unpredictable
+ * occur in primary side or secondary side. it will make
packet size
+ * not same, but the two packet's payload is identical. colo
just
+ * care about packet payload, so we skip the option field.
+ */
+res = colo_packet_compare_common(ppkt, spkt, ptcp_offset,
stcp_offset);


We add a new parameter to xxx_common() just for xxx_tcp().
In my opinion, that does not seem good.

Can we split the TCP-related code out of xxx_common() and make the
xxx_common() function do just what its name says?


We introduce this parameter for all protocols; TCP is just the first user.
We should have the ability to compare all kinds of packets.
We may have to use different offsets for protocols other than TCP in the
future.



Got it,

Thanks,
dou.


Thanks
Zhang Chen





 } else if (ptcp->th_sum == stcp->th_sum) {
-res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
+res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN,
ETH_HLEN);
 } else {
 res = -1;
 }
@@ -329,6 +342,7 @@ static int colo_packet_compare_udp(Packet *spkt,
Packet *ppkt)
  * the ip payload here.
  */
 ret = colo_packet_compare_common(ppkt, spkt,
+ network_header_length + ETH_HLEN,
  network_header_length + ETH_HLEN);


ditto



 if (ret) {
@@ -366,6 +380,7 @@ static int colo_packet_compare_icmp(Packet *spkt,
Packet *ppkt)
  * the ip payload here.
  */
 if (colo_packet_compare_common(ppkt, spkt,
+  

Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled

2017-09-03 Thread Dou Liyang

Hi Eduardo, Thadeu,

At 09/02/2017 12:11 AM, Eduardo Habkost wrote:

On Fri, Sep 01, 2017 at 12:45:42PM -0300, Thadeu Lima de Souza Cascardo wrote:

Linux uses the SRAT to determine the maximum memory in a system, which is
then used to decide whether or not to use the swiotlb for IOMMU purposes
for a device that supports only 32-bit addresses.


Do you have a pointer to the corresponding Linux code, for
reference?  Which SRAT entries Linux uses to make this decision?



When there is no NUMA configuration, QEMU will not build an SRAT. And when
memory hotplug is performed, some Linux device drivers start failing.

Tested by running with -m 512M,slots=8,maxmem=1G, adding the memory,
putting it online and using the system. Without the patch, the swiotlb is
not used and the ATA driver fails. With the patch, the swiotlb is used and
no driver failure is observed.

Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>


As far as I can see, this will only add APIC entries and a memory
affinity entry for the first 640KB (which would be obviously
wrong) if pcms->numa_nodes is 0.



In my opinion, this may also add the hotpluggable memory entry; see the
following comments.


/*
 * Entry is required for Windows to enable memory hotplug in OS
 * and for Linux to enable SWIOTLB when booted with less than
 * 4G of RAM. Windows works better if the entry sets proximity
 * to the highest NUMA node in the machine.
 * Memory devices may override proximity set by this entry,
 * providing _PXM method if necessary.
 */
if (hotplugabble_address_space_size) {
    numamem = acpi_data_push(table_data, sizeof *numamem);
    build_srat_memory(numamem, pcms->hotplug_memory.base,
                      hotplugabble_address_space_size,
                      pcms->numa_nodes - 1,
                      MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
}


Thanks,
dou.


Once we apply the "Fix SRAT memory building in case of node 0
without RAM" patch from Dou Liyang, no memory affinity entries
will be generated if pcms->numa_nodes is 0.  Would this cause the
problem to happen again?








---
 hw/i386/acpi-build.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424678..fb94249779 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2645,6 +2645,9 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 GArray *tables_blob = tables->table_data;
 AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
 Object *vmgenid_dev;
+ram_addr_t hotplugabble_address_space_size =
+object_property_get_int(OBJECT(pcms), PC_MACHINE_MEMHP_REGION_SIZE,
+NULL);

 acpi_get_pm_info();
 acpi_get_misc_info();
@@ -2708,7 +2711,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 build_tpm2(tables_blob, tables->linker);
 }
 }
-if (pcms->numa_nodes) {
+if (pcms->numa_nodes || hotplugabble_address_space_size) {
 acpi_add_table(table_offsets, tables_blob);
 build_srat(tables_blob, tables->linker, machine);
 if (have_numa_distance) {
--
2.11.0









Re: [Qemu-devel] [PATCH V4 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison

2017-09-01 Thread Dou Liyang

Hi chen,

At 08/21/2017 04:55 PM, Zhang Chen wrote:

When the network is busy, some TCP options (like SACK) may unpredictably
appear on the primary side or the secondary side. This can make the packet
sizes differ even though the two packets' payloads are identical. COLO only
cares about the packet payload, so we skip the option field.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 39 +++
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index ca67c68..f6bda41 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int mode)
  * return:0  means packet same
  *> 0 || < 0 means packet different
  */
-static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
+static int colo_packet_compare_common(Packet *ppkt,
+  Packet *spkt,
+  int poffset,
+  int soffset)
 {
 if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) {
 char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20];
@@ -201,12 +204,13 @@ static int colo_packet_compare_common(Packet *ppkt, 
Packet *spkt, int offset)
sec_ip_src, sec_ip_dst);
 }

-offset = ppkt->vnet_hdr_len + offset;
+poffset = ppkt->vnet_hdr_len + poffset;
+soffset = ppkt->vnet_hdr_len + soffset;

-if (ppkt->size == spkt->size) {
-return memcmp(ppkt->data + offset,
-  spkt->data + offset,
-  spkt->size - offset);
+if (ppkt->size == spkt->size || poffset != soffset) {
+return memcmp(ppkt->data + poffset,
+  spkt->data + soffset,
+  spkt->size - soffset);


There may be a mistake here.

I think you should also make sure that

(ppkt->size - poffset) == (spkt->size - soffset) holds in the case of
(poffset != soffset).

Because if SACK makes the headers unequal, it is also possible that the
payloads are unequal at the same time.


 } else {
 trace_colo_compare_main("Net packet size are not the same");
 return -1;
@@ -263,13 +267,22 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet 
*ppkt)
  * so we just need skip this field.
  */
 if (ptcp->th_off > 5) {
-ptrdiff_t tcp_offset;
+ptrdiff_t ptcp_offset, stcp_offset;

-tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
- + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
-res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
+ptcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
+  + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
+stcp_offset = spkt->transport_header - (uint8_t *)spkt->data
+  + (stcp->th_off * 4) - spkt->vnet_hdr_len;
+
+/*
+ * When network is busy, some tcp options(like sack) will unpredictable
+ * occur in primary side or secondary side. it will make packet size
+ * not same, but the two packet's payload is identical. colo just
+ * care about packet payload, so we skip the option field.
+ */
+res = colo_packet_compare_common(ppkt, spkt, ptcp_offset, stcp_offset);


We add a new parameter to xxx_common() just for xxx_tcp().
In my opinion, that does not seem good.

Can we split the TCP-related code out of xxx_common() and make the
xxx_common() function do just what its name says?




 } else if (ptcp->th_sum == stcp->th_sum) {
-res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
+res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN, ETH_HLEN);
 } else {
 res = -1;
 }
@@ -329,6 +342,7 @@ static int colo_packet_compare_udp(Packet *spkt, Packet 
*ppkt)
  * the ip payload here.
  */
 ret = colo_packet_compare_common(ppkt, spkt,
+ network_header_length + ETH_HLEN,
  network_header_length + ETH_HLEN);


ditto



 if (ret) {
@@ -366,6 +380,7 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet 
*ppkt)
  * the ip payload here.
  */
 if (colo_packet_compare_common(ppkt, spkt,
+   network_header_length + ETH_HLEN,
network_header_length + ETH_HLEN)) {


ditto


 trace_colo_compare_icmp_miscompare("primary pkt size",
ppkt->size);
@@ -403,7 +418,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet 
*ppkt)
sec_ip_src, sec_ip_dst);
 }

-return colo_packet_compare_common(ppkt, spkt, 0);
+return colo_packet_compare_common(ppkt, spkt, 0, 0);


ditto

Thanks,
dou.


 }

 static int colo_old_packet_check_one(Packet *pkt, int64_t 

[Qemu-devel] [PATCH v6 3/4] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-31 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table, and only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..bc703d77ee48720af1b801c518b43511c95c852d
GIT binary patch
literal 5104
    [base85-encoded binary patch data truncated in the list archive]

[Qemu-devel] [PATCH v6 4/4] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop

2017-08-31 Thread Dou Liyang
In QEMU, the number of NUMA nodes is determined by parse_numa_opts().
QEMU then uses it for iteration, for example:
  for (i = 0; i < nb_numa_nodes; i++)

However, in memory_region_allocate_system_memory(), it uses MAX_NODES
instead of nb_numa_nodes.

So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and
reduce the number of loop iterations.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imamm...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
---
 numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/numa.c b/numa.c
index e32af04..5f2916d 100644
--- a/numa.c
+++ b/numa.c
@@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, 
Object *owner,
 }
 
 memory_region_init(mr, owner, name, ram_size);
-for (i = 0; i < MAX_NODES; i++) {
+for (i = 0; i < nb_numa_nodes; i++) {
 uint64_t size = numa_info[i].node_mem;
 HostMemoryBackend *backend = numa_info[i].node_memdev;
 if (!backend) {
-- 
2.5.5






[Qemu-devel] [PATCH v6 0/4]ACPI: NUMA: Fix ACPI SRAT Memory Affinity building

2017-08-31 Thread Dou Liyang
v5 --> v6:
  - Split the unrelated code into a separate patch [2/4]

v4 --> v5:
  - Replace the original way with Eduardo's method.
  - Rewrite the testcase.
  - Drop the SLIT data
  - The 2.11 development tree is open, so add the third patch for re-posting.

v3 --> v4:
  - Add a new testcase.

This patchset fixes an ACPI building bug caused by having no RAM
in the first NUMA node, and also adds a new testcase for the bug.

Dou Liyang (3):
  hw/acpi-build: Make assignment statement of next_base easy to read
  ACPI/unit-test: Add a new testcase for RAM allocation in numa node
  NUMA: Replace MAX_NODES with nb_numa_nodes in for loop

Eduardo Habkost (1):
  hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

 hw/i386/acpi-build.c  |  30 +++---
 numa.c|   2 +-
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 7 files changed, 48 insertions(+), 8 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

-- 
2.5.5






[Qemu-devel] [PATCH v6 1/4] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

2017-08-31 Thread Dou Liyang
From: Eduardo Habkost <ehabk...@redhat.com>

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node 0 as the
default node to deal with the memory hole (640K-1M). This means node 0
must have some memory (>1M), but actually it can have no memory.

Fix this problem by cutting out the 640K hole in the same way the PCI
4G hole is cut out.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..a0cf3bf 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+#define HOLE_640K_START  (640 * 1024)
+#define HOLE_640K_END   (1024 * 1024)
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 next_base = 0;
 numa_start = table_data->len;
 
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
 for (i = 1; i < pcms->numa_nodes + 1; ++i) {
 mem_base = next_base;
 mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
-}
 next_base = mem_base + mem_len;
 
+/* Cut out the 640K hole */
+if (mem_base <= HOLE_640K_START &&
+next_base > HOLE_640K_START) {
+mem_len -= next_base - HOLE_640K_START;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
+
+/* Check for the rare case: 640K < RAM < 1M */
+if (next_base <= HOLE_640K_END) {
+next_base = HOLE_640K_END;
+continue;
+}
+mem_base = HOLE_640K_END;
+mem_len = next_base - HOLE_640K_END;
+}
+
 /* Cut out the ACPI_PCI hole */
 if (mem_base <= pcms->below_4g_mem_size &&
 next_base > pcms->below_4g_mem_size) {
-- 
2.5.5
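
To make the hole-cutting above easier to follow, here is a small standalone sketch (not QEMU code) that reproduces only the arithmetic of the loop and prints the ranges that would become SRAT Memory Affinity entries. The layout is an assumed "-numa node,mem=2G -numa node,mem=2G" guest with a below_4g_mem_size of 3G, which is typical for the pc machine but depends on the configuration:

#include <stdio.h>
#include <inttypes.h>
#include <stdint.h>

#define HOLE_640K_START  (640ULL * 1024)
#define HOLE_640K_END    (1024ULL * 1024)

static void emit(int node, uint64_t base, uint64_t len)
{
    printf("node %d: [0x%09" PRIx64 " - 0x%09" PRIx64 ")\n",
           node, base, base + len);
}

int main(void)
{
    const uint64_t GiB = 1ULL << 30;
    uint64_t node_mem[] = { 2 * GiB, 2 * GiB };  /* assumed -numa node,mem=2G x2 */
    uint64_t below_4g = 3 * GiB;                 /* assumed below_4g_mem_size */
    int nodes = 2, i;
    uint64_t next_base = 0;

    for (i = 1; i < nodes + 1; ++i) {
        uint64_t mem_base = next_base;
        uint64_t mem_len = node_mem[i - 1];
        next_base = mem_base + mem_len;

        /* Cut out the 640K hole */
        if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) {
            mem_len -= next_base - HOLE_640K_START;
            if (mem_len > 0) {
                emit(i - 1, mem_base, mem_len);
            }
            if (next_base <= HOLE_640K_END) {  /* rare case: 640K < RAM < 1M */
                next_base = HOLE_640K_END;
                continue;
            }
            mem_base = HOLE_640K_END;
            mem_len = next_base - HOLE_640K_END;
        }

        /* Cut out the ACPI_PCI hole */
        if (mem_base <= below_4g && next_base > below_4g) {
            mem_len -= next_base - below_4g;
            if (mem_len > 0) {
                emit(i - 1, mem_base, mem_len);
            }
            mem_base = 1ULL << 32;
            mem_len = next_base - below_4g;
            next_base = mem_base + mem_len;
        }
        emit(i - 1, mem_base, mem_len);
    }
    return 0;
}

With these inputs it prints [0, 640K) and [1M, 2G) for node 0, then [2G, 3G) and [4G, 5G) for node 1, i.e. both holes are carved out of the per-node ranges. The hotplug entry and the padding entries for empty slots that the real build_srat() adds afterwards are omitted here.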






[Qemu-devel] [PATCH v6 2/4] hw/acpi-build: Make assignment statement of next_base easy to read

2017-08-31 Thread Dou Liyang
It may be hard to read the assignment statement of "next_base", so

S/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 /next_base = mem_base + mem_len;

... for readability.

No functionality change.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a0cf3bf..48525a1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2411,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,
-- 
2.5.5
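
For clarity, the two forms really are equivalent given the assignments made just above them in build_srat(); a short derivation in comment form:

/*
 *   mem_base = 1ULL << 32;
 *   mem_len  = next_base - pcms->below_4g_mem_size;
 *
 *   mem_base + mem_len
 *       = (1ULL << 32) + (next_base - pcms->below_4g_mem_size)
 *       = next_base + ((1ULL << 32) - pcms->below_4g_mem_size)
 *
 * which is exactly what the old
 *   next_base += (1ULL << 32) - pcms->below_4g_mem_size;
 * computed, so the patch is a pure readability change.
 */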






Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

2017-08-31 Thread Dou Liyang

Hi, Eduardo

At 09/01/2017 05:36 AM, Eduardo Habkost wrote:

On Thu, Aug 31, 2017 at 08:04:26PM +0800, Dou Liyang wrote:

From: Eduardo Habkost <ehabk...@redhat.com>

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node 0 as the
default node to deal with the memory hole (640K-1M). This means node 0
must have some memory (>1M), but actually it can have no memory.

Fix this problem by cutting out the 640K hole in the same way the PCI
4G hole is cut out. Also do some cleanup.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..48525a1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }

+#define HOLE_640K_START  (640 * 1024)
+#define HOLE_640K_END   (1024 * 1024)
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 next_base = 0;
 numa_start = table_data->len;

-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
 for (i = 1; i < pcms->numa_nodes + 1; ++i) {
 mem_base = next_base;
 mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
-}
 next_base = mem_base + mem_len;

+/* Cut out the 640K hole */
+if (mem_base <= HOLE_640K_START &&
+next_base > HOLE_640K_START) {
+mem_len -= next_base - HOLE_640K_START;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
+
+/* Check for the rare case: 640K < RAM < 1M */
+if (next_base <= HOLE_640K_END) {
+next_base = HOLE_640K_END;
+continue;
+}
+mem_base = HOLE_640K_END;
+mem_len = next_base - HOLE_640K_END;
+}
+
 /* Cut out the ACPI_PCI hole */
 if (mem_base <= pcms->below_4g_mem_size &&
 next_base > pcms->below_4g_mem_size) {
@@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;


Is this extra change intentional?



Yes, it is, Just for readability. :-)


I find the code more readable with it, but it should go in a
separate patch because it is unrelated to the bug fix.



Indeed, I will split it out.

Thanks,
dou.






Re: [Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-31 Thread Dou Liyang

Hi Eduardo,

At 08/31/2017 06:38 PM, Dou Liyang wrote:

Hi Eduardo,

[...]

+continue;
 }


Now the nodes will be out of order, if node 0 has no RAM.  Why?



Because the code parses the other nodes with RAM first, and then parses
node 0.


Why not implement this in the same way the PCI 4GB hole is
already implemented.  e.g.:



Indeed, it is better and more refined. I will do it in the next version.


#define HOLE_640K_START  (640 * 1024)
#define HOLE_640K_END   (1024 * 1024)

for (node = 0; node < pcms->numa_nodes; node++) {
mem_base = next_base;
mem_len = pcms->node_mem[node];
next_base = mem_base + mem_len;

/* Cut out the 640K hole */
if (mem_base <= HOLE_640K_START &&
next_base > HOLE_640K_START) {
mem_len -= next_base - HOLE_640K_START;
if (mem_len > 0) {
numamem = acpi_data_push(table_data, sizeof *numamem);
build_srat_memory(numamem, mem_base, mem_len, node,
  MEM_AFFINITY_ENABLED);
}
mem_base = HOLE_640K_END;
/* ... */


BTW, in this case:

       0
       |-----------|-----------|----|-----
       |           |           |    |
   mem_base       640K   next_base  1M


I want to add a check here:

/* Check for the rare case: 640K < RAM < 1M */
if (next_base <= HOLE_640K_END) {
next_base = HOLE_640K_END;
continue;
}
mem_base = HOLE_640K_END;
mem_len = next_base - HOLE_640K_END;

I guess no one would give a node this little RAM,
so is that check necessary?



I have posted the v5 patches and used your code.

It may not be explained very clearly there; for the details, please see the
new version.

Thanks,
dou.


Thanks,
dou.


}
/* Cut out the ACPI_PCI hole */
if (mem_base <= pcms->below_4g_mem_size &&
next_base > pcms->below_4g_mem_size) {
mem_len -= next_base - pcms->below_4g_mem_size;
/* ... */
}
numamem = acpi_data_push(table_data, sizeof *numamem);
build_srat_memory(numamem, mem_base, mem_len, node,
  MEM_AFFINITY_ENABLED);
/* ... */
}


-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+mem_base = build_srat_node_entry(table_data, pcms, i,
+mem_base,
pcms->node_mem[i]);
 }
 slots = (table_data->len - numa_start) / sizeof *numamem;
 for (; slots < pcms->numa_nodes + 2; slots++) {
--
2.5.5












[Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

2017-08-31 Thread Dou Liyang
From: Eduardo Habkost <ehabk...@redhat.com>

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node 0 as the
default node to deal with the memory hole (640K-1M). This means node 0
must have some memory (>1M), but actually it can have no memory.

Fix this problem by cutting out the 640K hole in the same way the PCI
4G hole is cut out. Also do some cleanup.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..48525a1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+#define HOLE_640K_START  (640 * 1024)
+#define HOLE_640K_END   (1024 * 1024)
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 next_base = 0;
 numa_start = table_data->len;
 
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
 for (i = 1; i < pcms->numa_nodes + 1; ++i) {
 mem_base = next_base;
 mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
-}
 next_base = mem_base + mem_len;
 
+/* Cut out the 640K hole */
+if (mem_base <= HOLE_640K_START &&
+next_base > HOLE_640K_START) {
+mem_len -= next_base - HOLE_640K_START;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i - 1,
+  MEM_AFFINITY_ENABLED);
+}
+
+/* Check for the rare case: 640K < RAM < 1M */
+if (next_base <= HOLE_640K_END) {
+next_base = HOLE_640K_END;
+continue;
+}
+mem_base = HOLE_640K_END;
+mem_len = next_base - HOLE_640K_END;
+}
+
 /* Cut out the ACPI_PCI hole */
 if (mem_base <= pcms->below_4g_mem_size &&
 next_base > pcms->below_4g_mem_size) {
@@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 mem_base = 1ULL << 32;
 mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+next_base = mem_base + mem_len;
 }
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, mem_base, mem_len, i - 1,
-- 
2.5.5






[Qemu-devel] [PATCH v5 2/3] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-31 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table, and only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..bc703d77ee48720af1b801c518b43511c95c852d
GIT binary patch
literal 5104
    [base85-encoded binary patch data truncated in the list archive]

[Qemu-devel] [PATCH v5 0/3] hw/acpi-build: Fix ACPI SRAT Memory Affinity building

2017-08-31 Thread Dou Liyang
v4 --> v5:
  - Replace the original approach with Eduardo's method.
  - Rewrite the testcase.
  - Drop the SLIT data.
  - The 2.11 development tree is open, so add the third patch for re-posting.

v3 --> v4:
  - Add a new testcase.

This patchset fixes an ACPI building bug caused by having no RAM
in the first NUMA node, and also adds a new testcase for the bug.

Dou Liyang (2):
  ACPI/unit-test: Add a new testcase for RAM allocation in numa node
  NUMA: Replace MAX_NODES with nb_numa_nodes in for loop

Eduardo Habkost (1):
  hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM

 hw/i386/acpi-build.c  |  30 +++---
 numa.c|   2 +-
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 7 files changed, 48 insertions(+), 8 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

-- 
2.5.5






[Qemu-devel] [PATCH v5 3/3] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop

2017-08-31 Thread Dou Liyang
In QEMU, the number of NUMA nodes is determined by parse_numa_opts().
QEMU then uses it for iteration, for example:
  for (i = 0; i < nb_numa_nodes; i++)

However, in memory_region_allocate_system_memory(), it uses MAX_NODES
instead of nb_numa_nodes.

So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and
reduce the number of loop iterations.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imamm...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
---
 numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/numa.c b/numa.c
index e32af04..5f2916d 100644
--- a/numa.c
+++ b/numa.c
@@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, 
Object *owner,
 }
 
 memory_region_init(mr, owner, name, ram_size);
-for (i = 0; i < MAX_NODES; i++) {
+for (i = 0; i < nb_numa_nodes; i++) {
 uint64_t size = numa_info[i].node_mem;
 HostMemoryBackend *backend = numa_info[i].node_memdev;
 if (!backend) {
-- 
2.5.5






Re: [Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-31 Thread Dou Liyang

Hi Eduardo,

[...]

+continue;
 }


Now the nodes will be out of order, if node 0 has no RAM.  Why?



Because the code parses the other nodes with RAM first, and then parses
node 0.



Why not implement this in the same way the PCI 4GB hole is
already implemented.  e.g.:



Indeed, it is better and more refined. I will do it in the next version.


#define HOLE_640K_START  (640 * 1024)
#define HOLE_640K_END   (1024 * 1024)

for (node = 0; node < pcms->numa_nodes; node++) {
mem_base = next_base;
mem_len = pcms->node_mem[node];
next_base = mem_base + mem_len;

/* Cut out the 640K hole */
if (mem_base <= HOLE_640K_START &&
next_base > HOLE_640K_START) {
mem_len -= next_base - HOLE_640K_START;
if (mem_len > 0) {
numamem = acpi_data_push(table_data, sizeof *numamem);
build_srat_memory(numamem, mem_base, mem_len, node,
  MEM_AFFINITY_ENABLED);
}
mem_base = HOLE_640K_END;
/* ... */


BTW, in this case:

       0
       |-----------|-----------|----|-----
       |           |           |    |
   mem_base       640K   next_base  1M


I want to add a check here:

/* Check for the rare case: 640K < RAM < 1M */
if (next_base <= HOLE_640K_END) {
next_base = HOLE_640K_END;
continue;
}
mem_base = HOLE_640K_END;
mem_len = next_base - HOLE_640K_END;

I guess no one would give a node this little RAM,
so is that check necessary?

Thanks,
dou.


}
/* Cut out the ACPI_PCI hole */
if (mem_base <= pcms->below_4g_mem_size &&
next_base > pcms->below_4g_mem_size) {
mem_len -= next_base - pcms->below_4g_mem_size;
/* ... */
}
numamem = acpi_data_push(table_data, sizeof *numamem);
build_srat_memory(numamem, mem_base, mem_len, node,
  MEM_AFFINITY_ENABLED);
/* ... */
}


-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+mem_base = build_srat_node_entry(table_data, pcms, i,
+mem_base, pcms->node_mem[i]);
 }
 slots = (table_data->len - numa_start) / sizeof *numamem;
 for (; slots < pcms->numa_nodes + 2; slots++) {
--
2.5.5












[Qemu-devel] [PATCH v5 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-24 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table, and only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
v4 --> v5:
  - Rewrite the testcase.
  - Drop the SLIT data.

 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5104 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c  |  24 
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..bc703d77ee48720af1b801c518b43511c95c852d
GIT binary patch
literal 5104
    [base85-encoded binary patch data truncated in the list archive]

Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-24 Thread Dou Liyang

Hi Igor,

At 08/24/2017 08:33 PM, Igor Mammedov wrote:

> test_acpi_one(" -numa node -numa node,mem=128", );
>
> but, the DSDT didn't match the default one. because, if we support
> NUMA, the DSDT will give us "_PXM" to map the CPU to node.

Ok, looks like you'll have to include your variant of DSDT along with SRAT



Yes, got it.

I will modify and post this patch first.

Thanks,
dou.














Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-23 Thread Dou Liyang

Hi Igor,

At 08/24/2017 01:47 AM, Igor Mammedov wrote:

On Wed, 23 Aug 2017 21:35:29 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi Igor,

At 08/23/2017 08:45 PM, Igor Mammedov wrote:

On Wed, 23 Aug 2017 20:12:51 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi Igor,

At 08/23/2017 04:40 PM, Igor Mammedov wrote:

On Tue, 22 Aug 2017 11:24:10 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem



considering only SRAT table has been changed and the other
tables match with default blobs, I'd suggest to keep only



Our testcase is:

+test_acpi_one(" -m 128,slots=3,maxmem=1G"
+  " -numa node -numa node,mem=128"
+  " -numa dist,src=0,dst=1,val=21",
+  );

The DSDT and SLIT don't match with default blobs.

do you actually need SLIT table /i.e. -numa dist/ for test at all?
it looks not relevant for the test case at the hand,
I'd suggest to drop '-numa dist' option for the test.



OK, Got it, will drop '-numa dist' option in next version.



So, they can't be dropped.


I wonder what's changed, could you post DSDT diff here?



Just like the memory hot-plug cases, when we use the '-m
128,slots=3,maxmem=1G' option, as the ACPI spec says, there may be some
Memory Device objects in the DSDT table.

for your case '-numa node -numa node,mem=128', there is no need in enabling 
memory hotplug.


Thank you very much for your kind explanation.

Yes, I understood.


If I recall it correctly the default memory for x86 is 128Mb,
hence removing "-m" would probably make DSDT match default one.


Yes, I removed the "-m":

test_acpi_one(" -numa node -numa node,mem=128", );

but the DSDT didn't match the default one, because if we support
NUMA, the DSDT will contain "_PXM" to map the CPU to a node.


--- a/tmp/asl-7QO54Y.dsl
+++ b/tmp/asl-EWM54Y.dsl
@@ -5,13 +5,13 @@
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/acpi-test-data/pc/DSDT, Thu Aug 24 09:20:29 2017
+ * Disassembly of /tmp/aml-1YM54Y, Thu Aug 24 09:20:29 2017
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x13EA (5098)
+ * Length   0x13F0 (5104)
  * Revision 0x01  32-bit table (V1), no 64-bit math 
support

- * Checksum 0x78
+ * Checksum 0x13
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPCDSDT"
  * OEM Revision 0x0001 (1)
@@ -783,6 +783,8 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", 
"BXPCDSDT", 0x0001)

 {
 COST (Zero, Arg0, Arg1, Arg2)
 }
+
+Name (_PXM, Zero)  // _PXM: Device Proximity
 }
 }
 }


Thanks,
dou.





[...]










Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-23 Thread Dou Liyang

Hi Eduardo,

At 08/24/2017 01:25 AM, Eduardo Habkost wrote:

On Wed, Aug 23, 2017 at 09:35:29PM +0800, Dou Liyang wrote:

Hi Igor,

At 08/23/2017 08:45 PM, Igor Mammedov wrote:

On Wed, 23 Aug 2017 20:12:51 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi Igor,

At 08/23/2017 04:40 PM, Igor Mammedov wrote:

On Tue, 22 Aug 2017 11:24:10 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem



considering only SRAT table has been changed and the other
tables match with default blobs, I'd suggest to keep only



Our testcase is:

+test_acpi_one(" -m 128,slots=3,maxmem=1G"
+  " -numa node -numa node,mem=128"
+  " -numa dist,src=0,dst=1,val=21",
+  );

The DSDT and SLIT don't match with default blobs.

do you actually need SLIT table /i.e. -numa dist/ for test at all?
it looks not relevant for the test case at the hand,
I'd suggest to drop '-numa dist' option for the test.



OK, Got it, will drop '-numa dist' option in next version.



So, they can't be dropped.


I wonder what's changed, could you post DSDT diff here?



Just like the memory hot-plug cases, when we use the '-m
128,slots=3,maxmem=1G' option, as the ACPI spec says, there may be some
Memory Device objects in the DSDT table.


Do you really need to use -m 128,slots=3,maxmem=1G to test your
bug fix?



I was wrong. As the default memory for x86 is 128MB, I will remove this
option so that one case does just one thing.


Thanks,
dou.





Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-23 Thread Dou Liyang

Hi Igor,

At 08/23/2017 08:45 PM, Igor Mammedov wrote:

On Wed, 23 Aug 2017 20:12:51 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi Igor,

At 08/23/2017 04:40 PM, Igor Mammedov wrote:

On Tue, 22 Aug 2017 11:24:10 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem



considering only SRAT table has been changed and the other
tables match with default blobs, I'd suggest to keep only



Our testcase is:

+test_acpi_one(" -m 128,slots=3,maxmem=1G"
+  " -numa node -numa node,mem=128"
+  " -numa dist,src=0,dst=1,val=21",
+  );

The DSDT and SLIT don't match with default blobs.

do you actually need SLIT table /i.e. -numa dist/ for test at all?
it looks not relevant for the test case at the hand,
I'd suggest to drop '-numa dist' option for the test.



OK, Got it, will drop '-numa dist' option in next version.



So, they can't be dropped.


I wonder what's changed, could you post DSDT diff here?



Just like the memory hot-plug cases, when we use the '-m
128,slots=3,maxmem=1G' option, as the ACPI spec says, there may be some
Memory Device objects in the DSDT table.


(to get diff run 'make V=1 check' without DSDT.numamem being present)


diff --git a/asl-YJ034Y.dsl b/asl-JLX34Y.dsl
index c7b187b..6cd9e5d 100644
--- a/asl-YJ034Y.dsl
+++ b/asl-JLX34Y.dsl
@@ -5,13 +5,13 @@
  *
  * Disassembling to symbolic ASL+ operators
  *
- * Disassembly of tests/acpi-test-data/pc/DSDT, Wed Aug 23 21:22:56 2017
+ * Disassembly of /tmp/aml-8IX34Y, Wed Aug 23 21:22:56 2017
  *
  * Original Table Header:
  * Signature"DSDT"
- * Length   0x13EA (5098)
+ * Length   0x193F (6463)
  * Revision 0x01  32-bit table (V1), no 64-bit math 
support

- * Checksum 0x78
+ * Checksum 0x7B
  * OEM ID   "BOCHS "
  * OEM Table ID "BXPCDSDT"
  * OEM Revision 0x0001 (1)
@@ -783,6 +783,8 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", 
"BXPCDSDT", 0x0001)

 {
 COST (Zero, Arg0, Arg1, Arg2)
 }
+
+Name (_PXM, Zero)  // _PXM: Device Proximity
 }
 }
 }
@@ -792,6 +794,310 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", 
"BXPCDSDT", 0x0001)

 \_SB.CPUS.CSCN ()
 }

+Device (\_SB.PCI0.MHPD)
+{
+Name (_HID, "PNP0A06" /* Generic Container Device */)  // _HID: 
Hardware ID

+Name (_UID, "Memory hotplug resources")  // _UID: Unique ID
+Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
+{
+IO (Decode16,
+0x0A00, // Range Minimum
+0x0A00, // Range Maximum
+0x00,   // Alignment
+0x18,   // Length
+)
+})
+OperationRegion (HPMR, SystemIO, 0x0A00, 0x18)
+}
+
+Device (\_SB.MHPC)
+{
+Name (_HID, "PNP0A06" /* Generic Container Device */)  // _HID: 
Hardware ID

+Name (_UID, "DIMM devices")  // _UID: Unique ID
+Name (MDNR, 0x03)
+Field (\_SB.PCI0.MHPD.HPMR, DWordAcc, NoLock, Preserve)
+{
+MRBL,   32,
+MRBH,   32,
+MRLL,   32,
+MRLH,   32,
+MPX,32
+}
+
+Field (\_SB.PCI0.MHPD.HPMR, ByteAcc, NoLock, WriteAsZeros)
+{
+

Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-23 Thread Dou Liyang

Hi Igor,

At 08/23/2017 04:40 PM, Igor Mammedov wrote:

On Tue, 22 Aug 2017 11:24:10 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As QEMU supports the memory-less node, it is possible that there is
no RAM in the first numa node(also be called as node0). eg:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But, this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem



considering only SRAT table has been changed and the other
tables match with default blobs, I'd suggest to keep only



Our testcase is:

+test_acpi_one(" -m 128,slots=3,maxmem=1G"
+  " -numa node -numa node,mem=128"
+  " -numa dist,src=0,dst=1,val=21",
+  );

The DSDT and SLIT don't match with default blobs.

So, they can't be dropped.

Thanks,
dou.


  tests/acpi-test-data/[pc|q35]/SRAT.numamem
in this patch and drop the rest of the *.numamem tables,
as the test case should fall back to the default tables
when a .numamem variant doesn't exist.









Re: [Qemu-devel] [PATCH v4 0/2] hw/acpi-build: Fix ACPI SRAT Memory Affinity building

2017-08-22 Thread Dou Liyang

Hi Michael,

At 08/23/2017 10:48 AM, Michael S. Tsirkin wrote:

On Tue, Aug 22, 2017 at 11:24:08AM +0800, Dou Liyang wrote:

V3 --> v4:
  -add a new testcase.

This patchset fixs an ACPI building bug which caused by no RAM
in the first NUAM node. and also add a new testcase for the bug.


thanks!
Pls remember to ping or repost after release is out.



OK, I will.

Thanks,
dou.


Dou Liyang (2):
  hw/acpi-build: Fix SRAT memory building when there is no memory in
node0
  ACPI/unit-test: Add a new testcase for RAM allocation in numa node

 hw/i386/acpi-build.c  |  78 ++
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 +
 8 files changed, 80 insertions(+), 28 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

--
2.5.5












[Qemu-devel] [PATCH] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop

2017-08-22 Thread Dou Liyang
In QEMU, the number of NUMA nodes is determined by parse_numa_opts().
QEMU then uses it for iteration, for example:
  for (i = 0; i < nb_numa_nodes; i++)

However, in memory_region_allocate_system_memory(), it uses MAX_NODES
instead of nb_numa_nodes.

So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and
reduce the number of loop iterations.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/numa.c b/numa.c
index e32af04..5f2916d 100644
--- a/numa.c
+++ b/numa.c
@@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, 
Object *owner,
 }
 
 memory_region_init(mr, owner, name, ram_size);
-for (i = 0; i < MAX_NODES; i++) {
+for (i = 0; i < nb_numa_nodes; i++) {
 uint64_t size = numa_info[i].node_mem;
 HostMemoryBackend *backend = numa_info[i].node_memdev;
 if (!backend) {
-- 
2.5.5






[Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-21 Thread Dou Liyang
Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node to deal with the memory hole (640K-1M). This means node0
must have some memory (>1M), but actually it can have no memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new helper function used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;
 
-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;
 
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
+/* get the first node which has memory and map the hole from 640K-1M */
+for (node = 0; node < pcms->numa_nodes; node++) {
+if (pcms->node_mem[node] != 0) {
+break;
 }
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+/* map the rest of memory from 1M */
+mem_base = 1024 * 1024;
+mem_len = pcms->node_mem[node] - mem_base;
+mem_base = build_srat_node_entry(table_data, pcms, node,
+mem_base, mem_len);
+
+for (i = 0; i < pcms->numa_nodes; i++) {
+if (i == node) {
+continue;
 }
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
- 

[Qemu-devel] [PATCH v4 0/2] hw/acpi-build: Fix ACPI SRAT Memory Affinity building

2017-08-21 Thread Dou Liyang
V3 --> v4:
  -add a new testcase.

This patchset fixes an ACPI building bug caused by there being no RAM
in the first NUMA node, and also adds a new testcase for the bug.

Dou Liyang (2):
  hw/acpi-build: Fix SRAT memory building when there is no memory in
node0
  ACPI/unit-test: Add a new testcase for RAM allocation in numa node

 hw/i386/acpi-build.c  |  78 ++
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 +
 8 files changed, 80 insertions(+), 28 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

-- 
2.5.5






[Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node

2017-08-21 Thread Dou Liyang
As QEMU supports memory-less nodes, it is possible that there is
no RAM in the first NUMA node (also called node0), e.g.:
  ... \
  -m 128,slots=3,maxmem=1G \
  -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table, and fixing the builder alone is not enough.

Add a testcase for this situation to make sure the ACPI table is
correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c  |  30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem 
b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index 
..53f6d5824359ff0ca20179b19b3b5cb79f017f82
GIT binary patch
literal 6463
[base85-encoded binary patch data omitted; the message is truncated at this point by the list archive]

Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-21 Thread Dou Liyang

Hi, Igor

[...]


Ok, I will do it right now.


see commit 6b9c1dd2c for an example of adding a test case;
currently the test has base SRAT tables that are reused by all tests.
When you create a testcase that produces a different SRAT table,
'make check' will print warnings that tables do not match;
you can add the "V=1" env. var. ('make V=1 check') to get a detailed
diff of what's changed.

also see how 'test_data.variant' is handled to get an idea of how to use
it to differentiate a specific test case vs. the shared ACPI table baseline blobs.



Thanks for teaching me that, it's very helpful to me.

Apart from adding a new case, can we extend the memhp testcase with a
node-RAM check, just like the following shows:

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 564da45..c79ece4 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -788,7 +788,7 @@ static void test_acpi_q35_tcg_memhp(void)
 data.machine = MACHINE_Q35;
 data.variant = ".memhp";
 test_acpi_one(" -m 128,slots=3,maxmem=1G"
-  " -numa node -numa node"
+  " -numa node -numa node,mem=128"
   " -numa dist,src=0,dst=1,val=21",
  &data);
 free_test_data(&data);
@@ -802,7 +802,7 @@ static void test_acpi_piix4_tcg_memhp(void)
 data.machine = MACHINE_PC;
 data.variant = ".memhp";
 test_acpi_one(" -m 128,slots=3,maxmem=1G"
-  " -numa node -numa node"
+  " -numa node -numa node,mem=128"
   " -numa dist,src=0,dst=1,val=21",
  &data);
 free_test_data(&data);
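
An alternative to tweaking the shared memhp baseline above would be a
dedicated variant. The following is a rough sketch only -- the function
name, the ".numamem" variant string and the command line are assumptions
for illustration, not the code that was eventually posted -- reusing the
test_data/test_acpi_one/free_test_data helpers already visible above:

static void test_acpi_piix4_tcg_numamem(void)
{
    test_data data;

    memset(&data, 0, sizeof(data));
    data.machine = MACHINE_PC;
    data.variant = ".numamem";
    /* first node deliberately left without RAM */
    test_acpi_one(" -m 128,slots=3,maxmem=1G"
                  " -numa node -numa node,mem=128", &data);
    free_test_data(&data);
}

Such a variant would keep the existing memhp blobs untouched and give the
node0-without-RAM case its own expected SRAT/DSDT blobs.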


Thanks,
dou.





Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-20 Thread Dou Liyang

Hi Eduardo,

At 08/19/2017 12:48 AM, Eduardo Habkost wrote:

On Thu, Aug 17, 2017 at 01:58:40PM +0800, Dou Liyang wrote:

Hi Igor,

I tested this patch with the following guests:

1. RHEL 6.5 with Linux 2.6.32
2. RHEL 7.0 with Linux 3.10.0
3. Fedora 23 with Linux 4.13.0-rc5
4. Windows 2003 Server
5. Windows 7
6. Windows 10


What's the command-line(s) you have tested with each OS?  Have



I am sorry; here is one of the command lines for the node0-without-RAM case:

./x86_64-softmmu/qemu-system-x86_64 \
-hda /home/douly/image/rhel6.5.qcow2 \
-m 2G,slots=4,maxmem=32G \
-enable-kvm \
-smp 2,maxcpus=8,sockets=2,cores=2,threads=2\
-object memory-backend-ram,id=mem0,size=1024M \
-object memory-backend-ram,id=mem1,size=1024M \
-object memory-backend-ram,id=mem2,size=1024M \
-object memory-backend-ram,id=mem3,size=1024M \
-device pc-dimm,id=dimm0,memdev=mem0,slot=0,node=0 \
-device pc-dimm,id=dimm1,memdev=mem1,slot=1,node=1 \
-device pc-dimm,id=dimm2,memdev=mem2,slot=2,node=2 \
-device pc-dimm,id=dimm3,memdev=mem3,slot=3,node=3 \
-device qemu64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0 \
-device qemu64-x86_64-cpu,id=cpu3,socket-id=0,core-id=1,thread-id=1 \
-device qemu64-x86_64-cpu,id=cpu4,socket-id=1,core-id=0,thread-id=0 \
-device qemu64-x86_64-cpu,id=cpu5,socket-id=1,core-id=0,thread-id=1 \
-device qemu64-x86_64-cpu,id=cpu6,socket-id=1,core-id=1,thread-id=0 \
-device qemu64-x86_64-cpu,id=cpu7,socket-id=1,core-id=1,thread-id=1 \
-numa node,nodeid=0,cpus=0-1 \
-numa node,mem=2G,nodeid=1,cpus=2-3 \
-numa node,nodeid=2,cpus=4-5 \
-numa node,nodeid=3,cpus=6-7 \
-serial stdio \
-monitor telnet:127.0.0.1:,server,nowait \
#   -kernel /home/douly/openSource/linux/arch/x86_64/boot/bzImage \
#	-append "root=/dev/mapper/fedora_s3101490-root console=tty0 
console=ttyS0 earlyprintk=ttyS0,115200n8 movable_node" \

#   -initrd /home/douly/openSource/initramfs_image/4.13.0.img \
 you tested both the node0-with-RAM and node0-without-RAM cases?




Yes, I tested three cases for each guest:

Case 1)  all node with RAM.

-numa node,nodeid=0,cpus=0-1 \
-numa node,nodeid=1,cpus=2-3 \
-numa node,nodeid=2,cpus=4-5 \
-numa node,nodeid=3,cpus=6-7 \

Case 2) the node0-with-RAM:

-numa node,mem=2G,nodeid=0,cpus=0-1 \
-numa node,nodeid=1,cpus=2-3 \
-numa node,nodeid=2,cpus=4-5 \
-numa node,nodeid=3,cpus=6-7 \

Case 3) the node0-without-RAM:

-numa node,nodeid=0,cpus=0-1 \
-numa node,mem=2G,nodeid=1,cpus=2-3 \
-numa node,nodeid=2,cpus=4-5 \
-numa node,nodeid=3,cpus=6-7 \


I would be interested in a demonstration that no bytes in the
ACPI table are changed by this patch when there is some RAM


Yes, you are right, there is no change if node 0 has some RAM.
This patch only takes effect in the node0-without-RAM situation.


configured in node 0.  (Is that already covered by our existing
ACPI test cases?)


I am not sure about that; could you tell me where the SRAT test cases are?
It seems there are no test cases about SRAT in bios-tables-test.c.



A new test case in bios-tables-test.c for the bug you are fixing
would be nice to have.



Ok, I will do it right now.

Thanks,
dou.






Thanks,
dou.

At 08/16/2017 09:45 AM, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V3 --> V2
  -Modify the title
V2 --> V1:
  -Fix a coding style problem
Replace
for (node = 0;
node < pcms->numa_nodes && pcms->node_mem[node] == 0;
node++);

with
for (node = 0; node < pcms->numa_nodes; node++) {
   if (pcms->node_mem[node] != 0) {
break;
 }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_p

Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-16 Thread Dou Liyang

Hi Igor,

I tested this patch with the following guests:

1. RHEL 6.5 with Linux 2.6.32
2. RHEL 7.0 with Linux 3.10.0
3. Fedora 23 with Linux 4.13.0-rc5
4. Windows 2003 Server
5. Windows 7
6. Windows 10

Thanks,
dou.

At 08/16/2017 09:45 AM, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V3 --> V2
  -Modify the title
V2 --> V1:
  -Fix a coding style problem
Replace
for (node = 0;
node < pcms->numa_nodes && pcms->node_mem[node] == 0;
node++);

with
for (node = 0; node < pcms->numa_nodes; node++) {
   if (pcms->node_mem[node] != 0) {
break;
 }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }

+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;

-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;

-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
+/* get the first node which has memory and map the hole from 640K-1M */
+for (node = 0; node < pcms->numa_nodes; node++) {
+if (pcms->node_mem[node] != 0) {
+break;
 }
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(tabl

Re: [Qemu-devel] [PATCH v2] hw/acpi: Select an node with memory for mapping memory hole to

2017-08-16 Thread Dou Liyang

Hi Igor,

At 08/16/2017 05:18 PM, Igor Mammedov wrote:

On Wed, 16 Aug 2017 09:26:51 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

It seems harmless but one never knows for sure,
could you test it with different guests including old windows (up to XP)/
linux (2.6 stable kernel) versions?



I have tested it with the following guests:

Windows 7.
Linux 4.13-rc5.

I will test it with Linux (2.6 and 3.10 stable kernels).

Thanks,
dou.



Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V2 --> V1:
  -Fix a coding style problem
Replace
for (node = 0;
node < pcms->numa_nodes && pcms->node_mem[node] == 0;
node++);

with
for (node = 0; node < pcms->numa_nodes; node++) {
   if (pcms->node_mem[node] != 0) {
break;
 }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }

+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;

-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;

-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
+/* get the first node which has memory and map the hole from 640K-1M */
+for (node = 0; node < pcms->numa_nodes; node++) {
+if (pcms->node_mem[node] != 0) {
+break;
 }
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = nex

[Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0

2017-08-15 Thread Dou Liyang
Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V3 --> V2
  -Modify the title
V2 --> V1:
  -Fix a coding style problem
Replace
for (node = 0;
node < pcms->numa_nodes && pcms->node_mem[node] == 0;
node++);

with
for (node = 0; node < pcms->numa_nodes; node++) {
   if (pcms->node_mem[node] != 0) {
break;
 }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;
 
-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;
 
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
+/* get the first node which has memory and map the hole from 640K-1M */
+for (node = 0; node < pcms->numa_nodes; node++) {
+if (pcms->node_mem[node] != 0) {
+break;
 }
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+/* map the rest of memory from 1M */
+mem_base = 1024 * 1024;
+mem_len = pcms->node_mem[node] - mem_base;
+mem_base = build

[Qemu-devel] [PATCH v2] hw/acpi: Select an node with memory for mapping memory hole to

2017-08-15 Thread Dou Liyang
Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V2 --> V1:
  -Fix a coding style problem
Replace
for (node = 0;
node < pcms->numa_nodes && pcms->node_mem[node] == 0;
node++);

with
for (node = 0; node < pcms->numa_nodes; node++) {
   if (pcms->node_mem[node] != 0) {
break;
 }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;
 
-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;
 
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
+/* get the first node which has memory and map the hole from 640K-1M */
+for (node = 0; node < pcms->numa_nodes; node++) {
+if (pcms->node_mem[node] != 0) {
+break;
 }
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+/* map the rest of memory from 1M */
+mem_base = 1024 * 1024;
+mem_len = pcms->node_mem[node] - mem_base;
+mem_base = build_srat_node_entry(table_data,

Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not

2017-08-15 Thread Dou Liyang

Hi Igor,

At 08/15/2017 04:04 PM, Igor Mammedov wrote:

On Tue, 15 Aug 2017 09:26:46 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi Eduardo,

At 08/14/2017 08:44 PM, Eduardo Habkost wrote:

On Mon, Aug 14, 2017 at 06:11:11PM +0800, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
 if (i == nb_numa_nodes) {
 assert(mc->numa_auto_assign_ram);
 mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+} else if (i != 0) {
+error_report("The first NUMA node must have some memory"
+  " for building ACPI SART");
+exit(1);


This doesn't belong to numa.c.  numa.c is generic code, and the
requirement you described is specific for PC.

Anyway, adding this check would make existing VM configurations
refuse to run after a QEMU upgrade.  I suggest fixing the bug in
the ACPI code instead.



I see.

If we fix the bug in the ACPI code, I have two solutions:

1). Add a check in build_srat(): if the first node has no memory, QEMU
will exit.

2). Fix the initialization of the memory affinity structure to cover this
situation, using the first node which has memory to deal with the memory
hole.

I prefer solution 2. What about you?

I'd go for 2nd solution



Yeah, I see. And I just sent the 2nd solution, waiting for your reply.

Thanks,
dou.



Thanks,
dou.













[Qemu-devel] [PATCH] hw/acpi: Select an node with memory for mapping memory hole to

2017-08-15 Thread Dou Liyang
Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M), but actually it can have no
memory.

Fix this problem by replacing node0 with the first node which has
memory on it. Add a new function that is used for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 76 +---
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..e5f57d2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+int i, uint64_t mem_base, uint64_t mem_len)
+{
+AcpiSratMemoryAffinity *numamem;
+uint64_t next_base;
+
+next_base = mem_base + mem_len;
+
+/* Cut out the ACPI_PCI hole */
+if (mem_base <= pcms->below_4g_mem_size &&
+next_base > pcms->below_4g_mem_size) {
+mem_len -= next_base - pcms->below_4g_mem_size;
+if (mem_len > 0) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+}
+mem_base = 1ULL << 32;
+mem_len = next_base - pcms->below_4g_mem_size;
+next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+}
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, mem_base, mem_len, i,
+  MEM_AFFINITY_ENABLED);
+return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
 AcpiSystemResourceAffinityTable *srat;
 AcpiSratMemoryAffinity *numamem;
 
-int i;
+int i, node;
 int srat_start, numa_start, slots;
-uint64_t mem_len, mem_base, next_base;
+uint64_t mem_len, mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
 PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,28 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 /* the memory map is a bit tricky, it contains at least one hole
  * from 640k-1M and possibly another one from 3.5G-4G.
  */
-next_base = 0;
+
 numa_start = table_data->len;
 
+/* get the first node which has memory and  map the hole from 640K-1M */
+for (node = 0;
+node < pcms->numa_nodes && pcms->node_mem[node] == 0;
+node++);
 numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-next_base = 1024 * 1024;
-for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-mem_base = next_base;
-mem_len = pcms->node_mem[i - 1];
-if (i == 1) {
-mem_len -= 1024 * 1024;
-}
-next_base = mem_base + mem_len;
-
-/* Cut out the ACPI_PCI hole */
-if (mem_base <= pcms->below_4g_mem_size &&
-next_base > pcms->below_4g_mem_size) {
-mem_len -= next_base - pcms->below_4g_mem_size;
-if (mem_len > 0) {
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
-}
-mem_base = 1ULL << 32;
-mem_len = next_base - pcms->below_4g_mem_size;
-next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+/* map the rest of memory from 1M */
+mem_base = 1024 * 1024;
+mem_len = pcms->node_mem[node] - mem_base;
+mem_base = build_srat_node_entry(table_data, pcms, node,
+mem_base, mem_len);
+
+for (i = 0; i < pcms->numa_nodes; i++) {
+if (i == node) {
+continue;
 }
-numamem = acpi_data_push(table_data, sizeof *numamem);
-build_srat_memory(numamem, mem_base, mem_len, i - 1,
-  MEM_AFFINITY_ENABLED);
+mem_base = build_srat_node_ent

Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not

2017-08-14 Thread Dou Liyang

Hi Eduardo,

At 08/14/2017 08:44 PM, Eduardo Habkost wrote:

On Mon, Aug 14, 2017 at 06:11:11PM +0800, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
 if (i == nb_numa_nodes) {
 assert(mc->numa_auto_assign_ram);
 mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+} else if (i != 0) {
+error_report("The first NUMA node must have some memory"
+  " for building ACPI SART");
+exit(1);


This doesn't belong to numa.c.  numa.c is generic code, and the
requirement you described is specific for PC.

Anyway, adding this check would make existing VM configurations
refuse to run after a QEMU upgrade.  I suggest fixing the bug in
the ACPI code instead.



I see.

If we fix the bug in the ACPI code, I have two solutions:

1). Add a check in build_srat(): if the first node has no memory, QEMU
will exit.

2). Fix the initialization of the memory affinity structure to cover this
situation, using the first node which has memory to deal with the memory
hole.

I prefer solution 2. What about you?

Thanks,
dou.





Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not

2017-08-14 Thread Dou Liyang

I'm sorry, forgot to cc Michael S. Tsirkin

At 08/14/2017 06:11 PM, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
 if (i == nb_numa_nodes) {
 assert(mc->numa_auto_assign_ram);
 mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+} else if (i != 0) {
+error_report("The first NUMA node must have some memory"
+  " for building ACPI SART");
+exit(1);
 }

 numa_total = 0;







[Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not

2017-08-14 Thread Dou Liyang
Currently, using the first node without memory on the machine makes
QEMU unhappy. With this example command line:
  ... \
  -m 1024M,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,mem=1024M,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
Guest reports "No NUMA configuration found" and the NUMA topology is
wrong.

This is because when QEMU builds the ACPI SRAT, it regards node0 as the
default node for dealing with the memory hole (640K-1M). This means
node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
 if (i == nb_numa_nodes) {
 assert(mc->numa_auto_assign_ram);
 mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+} else if (i != 0) {
+error_report("The first NUMA node must have some memory"
+  " for building ACPI SART");
+exit(1);
 }
 
 numa_total = 0;
-- 
2.5.5






Re: [Qemu-devel] Memory Hotplug : Does QEmu support cold-plugged memory?

2017-06-05 Thread Dou Liyang

Hi Igor,

At 06/05/2017 04:45 PM, Igor Mammedov wrote:

On Mon, 5 Jun 2017 15:53:02 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


Hi all,

I want to boot up a system with some removable memory.

So, I used '-object' and '-device' to cold-plug memory
in QEMU as follows:

./x86_64-softmmu/qemu-system-x86_64 \
...
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
...

Then I want to hot-remove this memory with 'device-del' and
'object-del'.

But, unfortunately, I can't remove the memory device.

In the guest OS, I found the memory's 'removable' attribute is 0,
but in the QEMU monitor, 'info memory-devices' shows
hotplugged as false and hotpluggable as true.

It's an issue in the guest-side kernel.
In short, it's not guaranteed that the guest kernel will be able
to hot-remove a DIMM module (even if the hardware is capable of it), since
the kernel might use its memory as non-MOVABLE, so the DIMM might be
temporarily or permanently pinned by kernel allocations.



I see.


There is ongoing refactoring/cleanup work upstream that
tries to make memory hotplug behavior consistent/reasonable.

If you are interested, I'd suggest reviewing/testing the changes that
Michal is working on upstream, to make sure that the refactoring won't
break your use case.


OK, I will.



You can find the latest series Michal posted at:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1410289.html



um, I found the patchset, and I am reading the discussions and test
cases you gave in v2.





Could you help me:

1, Can the ram based memory-backend be plugged like above?

yes, it can be used as above


2, Does the 'info memory-devices' command show wrong info?

nope,
the command shows the hotpluggable status the way the hw (QEMU) sees it


3, Can file memory backend(memory-backend-file) be cold-plugged?

yes, it can.

[...]



Thanks a lot ;)

Thanks,
dou.










[Qemu-devel] Memory Hotplug : Does QEmu support cold-plugged memory?

2017-06-05 Thread Dou Liyang

Hi all,

I want to boot up a system with some removable memory.

So, I used '-object' and '-device' to cold-plug memory
in QEMU as follows:

./x86_64-softmmu/qemu-system-x86_64 \
...
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
...

Then I want to hot-remove this memory with 'device-del' and
'object-del'.

But, unfortunately, I can't remove the memory device.

In the guest OS, I found the memory's 'removable' attribute is 0,
but in the QEMU monitor, 'info memory-devices' shows
hotplugged as false and hotpluggable as true.

Could you help me:

1, Can the ram based memory-backend be plugged like above?

2, Does the 'info memory-devices' command show wrong info?

3, Can file memory backend(memory-backend-file) be cold-plugged?


PS: here is my whole script:

gdb --arg ./x86_64-softmmu/qemu-system-x86_64 \
-hda /image/fedora.img \
-m 256M,slots=4,maxmem=1G \
-enable-kvm \
-smp 2,maxcpus=16,sockets=2,cores=2,threads=2\
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
-device qemu64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0 \
-device qemu64-x86_64-cpu,id=cpu3,socket-id=0,core-id=1,thread-id=1 \
-numa node,memdev=mem1,nodeid=0 \
-numa node,memdev=mem2,nodeid=1 \
-serial stdio \
-monitor telnet:127.0.0.1:,server,nowait \
-kernel /home/douly/openSource/orignal_linux/arch/x86_64/boot/bzImage \
	-append "root=/dev/mapper/fedora_s3101490-root mem=256M movable_node 
nokaslr earlyprintk=serial,ttyS0,115200" \

-initrd /home/douly/openSource/initramfs_image/4.12.0-rc3.img \


Thanks,
dou.





Re: [Qemu-devel] [PATCH v4 2/2] block/qapi: reduce the execution time of qmp_query_blockstats

2017-01-31 Thread Dou Liyang

Hi, Max

At 02/01/2017 11:02 AM, Max Reitz wrote:

On 15.01.2017 09:01, Dou Liyang wrote:

In order to reduce the execution time, this patch optimizes
qmp_query_blockstats():
Remove the next_query_bds function.
Remove the bdrv_query_stats function.
Remove some judgement statements.

The original qmp_query_blockstats() calls next_query_bds() to get
the next objects in each loop. In next_query_bds(), it checks
query_nodes and blk. It also calls bdrv_query_stats() to get
the stats; in bdrv_query_stats(), it checks blk and bs each
time. This wastes time, which may stall the main loop a
bit. And if there are many disks and the dataplane feature is not
used, this may affect the performance of the main loop thread.

This patch removes those two functions and makes the structure
clearer.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 block/qapi.c | 73 
 1 file changed, 29 insertions(+), 44 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index bc622cd..45e9d47 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -456,23 +456,6 @@ static BlockStats *bdrv_query_bds_stats(const 
BlockDriverState *bs,
 return s;
 }

-static BlockStats *bdrv_query_stats(BlockBackend *blk,
-const BlockDriverState *bs,
-bool query_backing)
-{
-BlockStats *s;
-
-s = bdrv_query_bds_stats(bs, query_backing);
-
-if (blk) {
-s->has_device = true;
-s->device = g_strdup(blk_name(blk));
-bdrv_query_blk_stats(s->stats, blk);
-}
-
-return s;
-}
-
 BlockInfoList *qmp_query_block(Error **errp)
 {
 BlockInfoList *head = NULL, **p_next = &head;
@@ -496,42 +479,44 @@ BlockInfoList *qmp_query_block(Error **errp)
 return head;
 }

-static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
-   bool query_nodes)
-{
-if (query_nodes) {
-*bs = bdrv_next_node(*bs);
-return !!*bs;
-}
-
-*blk = blk_next(*blk);
-*bs = *blk ? blk_bs(*blk) : NULL;
-
-return !!*blk;
-}
-
 BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
  bool query_nodes,
  Error **errp)
 {
 BlockStatsList *head = NULL, **p_next = &head;
-BlockBackend *blk = NULL;
-BlockDriverState *bs = NULL;
+BlockBackend *blk;
+BlockDriverState *bs;

 /* Just to be safe if query_nodes is not always initialized */
-query_nodes = has_query_nodes && query_nodes;
-
-while (next_query_bds(&blk, &bs, query_nodes)) {
-BlockStatsList *info = g_malloc0(sizeof(*info));
-AioContext *ctx = blk ? blk_get_aio_context(blk)
-  : bdrv_get_aio_context(bs);
+if (has_query_nodes && query_nodes) {
+for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
+BlockStatsList *info = g_malloc0(sizeof(*info));
+AioContext *ctx = bdrv_get_aio_context(bs);

-aio_context_acquire(ctx);
-info->value = bdrv_query_stats(blk, bs, !query_nodes);
-aio_context_release(ctx);
+aio_context_acquire(ctx);
+info->value = bdrv_query_bds_stats(bs, false);
+aio_context_release(ctx);

-*p_next = info;
-p_next = &info->next;
+*p_next = info;
+p_next = &info->next;
+}
+} else {
+for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+BlockStatsList *info = g_malloc0(sizeof(*info));
+AioContext *ctx = blk_get_aio_context(blk);
+BlockStats *s;
+
+aio_context_acquire(ctx);
+info->value = s = bdrv_query_bds_stats(blk_bs(blk), true);


Superfluous assignment to info->value here...


Yes, It is.




+s->has_device = true;
+s->device = g_strdup(blk_name(blk));
+bdrv_query_blk_stats(s->stats, blk);
+aio_context_release(ctx);
+
+info->value = s;


...since it's written here anyway. I'll remove the first assignment when
applying the patch to my tree.


Thanks very much!



Max


+*p_next = info;
+p_next = &info->next;
+}
 }

 return head;






Thanks,
Liyang





[Qemu-devel] [PATCH v3] vl: Move the cpu_synchronize_all_post_init() after generic devices initialization

2017-01-31 Thread Dou Liyang
During QEMU initialization, we call cpu_synchronize_all_post_init()
in ./vl.c::main() to synchronize all CPU states to KVM.

Currently, it is called before we initialize the CPUs that are created
by the "-device" command and parsed during generic device initialization, so
these CPUs may be skipped by the synchronization.

The patch moves the cpu_synchronize_all_post_init() call after generic
device initialization to make sure that all the CPUs are included.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Change log:
v2-> v3:
1. Rebase the patch.
2. Rewrite the log
v1-> v2:
1. Split it from
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 0b72b12..bceb835 100644
--- a/vl.c
+++ b/vl.c
@@ -4490,8 +4490,6 @@ int main(int argc, char **argv, char **envp)
 
 audio_init();
 
-cpu_synchronize_all_post_init();
-
 if (hax_enabled()) {
 hax_sync_vcpus();
 }
@@ -4517,6 +4515,8 @@ int main(int argc, char **argv, char **envp)
 exit(1);
 }
 
+cpu_synchronize_all_post_init();
+
 numa_post_machine_init();
 
 rom_reset_order_override();
-- 
2.5.5






Re: [Qemu-devel] [PATCH v2] vl: Ensure the cpu_synchronize_all_post_init() in the appropriate location

2017-01-31 Thread Dou Liyang

Hi, Alex

Sorry for late reply.

At 01/27/2017 10:03 PM, Alex Bennée wrote:


Dou Liyang <douly.f...@cn.fujitsu.com> writes:


At the Qemu initialization, we call the cpu_synchronize_all_post_init()
to synchronize All CPU states to KVM in the ./vl.c::main().

Currently, it is called before we initialize the CPUs, which created by
"-device" command, So, these CPUs may be ignored to synchronize.

The patch moves the numa_post_machine_init func in the appropriate
location to make sure that all the CPUs has already been created
when it is called.


This doesn't match what the patch does (the function is
cpu_synchronise_all_post_init) but I see another patch is already merged
that moved the NUMA one.



Oops, yes, I made that mistake.


I'm afraid you'll need to re-base anyway because it no longer applies
cleanly.



Yes, I will re-base it.

Thanks,

Liyang






Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---

Change log v1-> v2:
1. Split it from
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index c643d3f..38269be 100644
--- a/vl.c
+++ b/vl.c
@@ -4547,8 +4547,6 @@ int main(int argc, char **argv, char **envp)

 audio_init();

-cpu_synchronize_all_post_init();
-
 numa_post_machine_init();

 if (qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -4571,6 +4569,9 @@ int main(int argc, char **argv, char **envp)
   device_init_func, NULL, NULL)) {
 exit(1);
 }
+
+cpu_synchronize_all_post_init();
+
 rom_reset_order_override();

 /* Did we create any drives that we failed to create a device for? */



--
Alex Bennée









Re: [Qemu-devel] [PATCH v4 0/2] block/qapi: refactor and optimize the qmp_query_blockstats()

2017-01-22 Thread Dou Liyang

Hi, Markus

Here is the non-RFC patch.
Any suggestions are welcome. :)

Thanks,
Liyang.

At 01/15/2017 04:01 PM, Dou Liyang wrote:

Change log v3 -> v4:
 1. Develop these into the non-RFC patches.
 2. Fix some comments.
 3. do declarations first.

Change log v2 -> v3:
 1. Remove the unnecessary code for the bdrv_next_node().
 2. Remove the change of the locking rules.
Even though this change could improve performance, it may
affect consistency.

QEMU uses the qmp_query_blockstats() function to get the
block stats. As the function has been changed several times, it
has become longer and more complex.

For a multi-disk guest, if we want to execute I/O operations
and this function at the same time, we can use the dataplane
feature so that I/O performance does not drop. But normally,
without this feature, how can we reduce the decline in performance?

These patches refactor qmp_query_blockstats() to make the
code easier to follow and shorter, as follows:

From:

+--+  +-+
 | 1|  | 4.  |
 |next_query_bds|  |bdrv_query_bds_stats +---+
 |  |  | |   |
 +^-+  +-^---+   |
  |  |   |
+-+--+  ++---+   |
| 0. |  | 2. |   |
|qmp_query_blockstats+-->bdrv_query_stats<
||  ||
++  ++---+
 |
   +-v---+
   | 3.  |
   |bdrv_query_blk_stats |
   | |
   +-+

To:

+--+
|  |
   +v---+  |
   +--->  3.|  |
+---+  |   |bdrv_query_bds_stats+--+
| 1.+--+   ||
|   +  ++
|qmp_query_blockstats--+
|   |  |
+---+  |   ++
   |   | 2. |
   +--->|
   |bdrv_query_blk_stats|
   ||
   ++


They also optimize the function by reducing its running time.

1. Function running time

Time taken (ns) per request:

number of disks      |    10 |    500
---------------------|-------|-------
before these patches | 19429 | 667722
after these patches  | 18536 | 627945

2. I/O performance
A dd command like this was used to test:
dd if=date_1.dat of=date_2.dat conv=fsync oflag=direct bs=1k count=100k

I/O performance degradation (%) while monitoring:

number of disks      |  10 | 500
---------------------|-----|-----
before these patches | 1.3 | 14.2
after these patches  | 1.0 | 11.3
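
(The cover letter does not say how the per-request nanosecond numbers
above were collected. Purely as an illustration of one way to take such
a measurement, independent of QEMU and with a dummy stand-in for the
monitor command being timed:)

#include <stdio.h>
#include <time.h>

static void dummy_query(void)
{
    /* stand-in for the call being timed */
}

int main(void)
{
    struct timespec t0, t1;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    dummy_query();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
    printf("call took %lld ns\n", ns);
    return 0;
}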

Dou Liyang (2):
  block/qapi: reduce the coupling between the bdrv_query_stats and
bdrv_query_bds_stats
  block/qapi: reduce the execution time of qmp_query_blockstats

 block/qapi.c | 95 ++--
 1 file changed, 41 insertions(+), 54 deletions(-)







Re: [Qemu-devel] [RFC 05/13] pc: move pcms->possible_cpus init out of pc_cpus_init()

2017-01-19 Thread Dou Liyang



At 01/19/2017 01:13 AM, Igor Mammedov wrote:

possible_cpus could be initialized earlier then cpu objects,


s/then/than/


i.e. when -smp is parsed so move init code to possible_cpu_arch_ids()


[...]





Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes

2017-01-19 Thread Dou Liyang

Hi, Eduardo

At 01/19/2017 01:06 AM, Eduardo Habkost wrote:

On Wed, Jan 18, 2017 at 09:26:36PM +0800, Dou Liyang wrote:

Hi, All


**
ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:668:test_migrate: assertion failed: 
(qdict_haskey(rsp, "return"))
GTester: last random seed: R02Sf52546c4daff8087416f43fa7c146db8
ftruncate: Permission denied
ftruncate: Permission denied
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe


I don't know what this log means.

Is it that qemu-system-aarch64 is tripping the assertion at
qom/cpu.c:346: assert(cpu->cpu_index < max_cpus);?


This means the assert() line is being triggered for some reason,
and cpu_index is >= max_cpus when cpu_common_map_numa_node()
gets called. We need to investigate why.



I have investigated the reason why it fails.

Not all targets (aarch64-linux-user, x86_64-linux-user, ...)
need to compile vl.c (which contains max_cpus) and numa.c, but all of them
may compile ./qom/cpu.c.
So, when we link those targets, we may not find the vl.o or numa.o
that we want.

Add "#ifdef CONFIG_NUMA" to fix it.

+static void cpu_common_map_numa_node(CPUState *cpu)
+{
+#ifdef CONFIG_NUMA
+int i;
+
+assert(cpu->cpu_index < max_cpus);
+for (i = 0; i < nb_numa_nodes; i++) {
+if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
+cpu->numa_node = i;
+return;
+}
+}
+#endif
+}
+

And I am not sure whether it is necessary to resend this patch to fix
the bug before Igor's patches are completely ready.

Thanks,
Liyang.






Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes

2017-01-19 Thread Dou Liyang

Hi, Igor,

At 01/18/2017 09:46 PM, Igor Mammedov wrote:

On Wed, 18 Jan 2017 20:40:04 +0800
Dou Liyang <douly.f...@cn.fujitsu.com> wrote:


As we fixed a bug (Bug 1) in the links below, named "Method-A":

https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html

Then Eduardo gave us many suggestions. Thanks very much!
When we tried them, we also found another bug, named "Bug 2".

I have an alternative fix for both issues, for which
I was writing the cover letter when I saw this series.

My fix series is more intrusive though, as its goal isn't
just to fix 'info numa' but rather to switch away from
cpu-index based mapping to socket/core/thread based mapping
and stop using bitmaps for the mapping. And as 'info numa' was
getting in the way, I fixed both issues in a slightly
different way.

So pls wait a bit; once the travis build is completed,
I'll post the series here.


That's fine. I am studying your patches.
I think my method is a temporary one; your fix is better than mine.

Thanks,
Liyang.





[Qemu-devel] [PATCH v2] vl: Ensure the cpu_synchronize_all_post_init() in the appropriate location

2017-01-18 Thread Dou Liyang
At the Qemu initialization, we call the cpu_synchronize_all_post_init()
to synchronize All CPU states to KVM in the ./vl.c::main().

Currently, it is called before we initialize the CPUs, which created by
"-device" command, So, these CPUs may be ignored to synchronize.

The patch moves the numa_post_machine_init func in the appropriate
location to make sure that all the CPUs has already been created
when it is called.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---

Change log v1-> v2:
1. Split it from 
https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index c643d3f..38269be 100644
--- a/vl.c
+++ b/vl.c
@@ -4547,8 +4547,6 @@ int main(int argc, char **argv, char **envp)
 
     audio_init();
 
-    cpu_synchronize_all_post_init();
-
     numa_post_machine_init();
 
     if (qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -4571,6 +4569,9 @@ int main(int argc, char **argv, char **envp)
                           device_init_func, NULL, NULL)) {
         exit(1);
     }
+
+    cpu_synchronize_all_post_init();
+
     rom_reset_order_override();
 
     /* Did we create any drives that we failed to create a device for? */
-- 
2.5.5
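
To make the ordering problem in the patch above concrete: the synchronize-all step only sees the vCPUs that exist at the moment it runs, so vCPUs created later (for example via "-device") are silently missed. Below is a self-contained toy illustration, using a hypothetical ToyCPU type rather than QEMU's real data structures:

#include <stdio.h>

/* Toy model of the ordering bug: "synchronize" only covers the vCPUs
 * that already exist when it is called. All names are hypothetical. */
typedef struct { int id; int synced; } ToyCPU;

static void synchronize_all(ToyCPU *cpus, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        cpus[i].synced = 1;
    }
}

int main(void)
{
    ToyCPU cpus[4] = { { 0, 0 }, { 1, 0 } };
    int n = 2;                        /* boot CPUs created early */
    int i;

    synchronize_all(cpus, n);         /* called too early ... */

    cpus[n++] = (ToyCPU){ 2, 0 };     /* ... CPUs added later by -device */
    cpus[n++] = (ToyCPU){ 3, 0 };

    for (i = 0; i < n; i++) {
        printf("cpu %d synced=%d\n", cpus[i].id, cpus[i].synced);
    }
    /* cpus 2 and 3 print synced=0: they were missed, which is what moving
     * the call after device initialization fixes. */
    return 0;
}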






Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes

2017-01-18 Thread Dou Liyang

Hi, All


**
ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:668:test_migrate: assertion failed: 
(qdict_haskey(rsp, "return"))
GTester: last random seed: R02Sf52546c4daff8087416f43fa7c146db8
ftruncate: Permission denied
ftruncate: Permission denied
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe


I don't know what this log means.

Is qemu-system-aarch64 tripping over the assertion at
qom/cpu.c:346: assert(cpu->cpu_index < max_cpus); ?

Thanks,
Liyang



GTester: last random seed: R02Sa39aa674143b4b48a89276d59eee19b3
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02Sbd557f67296de3764183db6b2105d88b
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S79c36cf5f50e19956d11fbc925f86df0
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S1186d0871ff44bed258ec179722caf91
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: 
Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S3429594491b10489f9abab73dcc15151
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
make[1]: *** [docker-run] Error 2
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-3q9yfmbs/src'
make: *** [docker-run-test-quick@centos6] Error 2
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org







Re: [Qemu-devel] [PATCH 1/3] cpu: Make the mapping of CPUs and NUMA nodes in cpu_common_realizefn

2017-01-18 Thread Dou Liyang

Hi, Eduardo

At 01/18/2017 08:56 PM, Eduardo Habkost wrote:

On Wed, Jan 18, 2017 at 08:40:05PM +0800, Dou Liyang wrote:

The current default way of setting CPUState::numa_node might be wrong
in the case of cold/hot-plugged CPUs, leaving users confused about why
the NUMA info differs between the guest and the monitor.

Make the mapping of CPUs and NUMA nodes in qom/cpu.c:
cpu_common_realizefn(), where each VCPU needs to be realized.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>


parse_numa_opts() is called a long time before any CPU is
created, so this should be safe.

Reviewed-by: Eduardo Habkost <ehabk...@redhat.com>

(But we can squash patches 2/3 and 3/3 into this patch.)


OK, :)

Thanks,
Liyang.




---
 qom/cpu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/qom/cpu.c b/qom/cpu.c
index 61ee0cb..e08dceb 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "qom/cpu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/numa.h"
 #include "qemu/notify.h"
 #include "qemu/log.h"
 #include "exec/log.h"
@@ -338,6 +339,18 @@ static void cpu_common_parse_features(const char *typename, char *features,
     }
 }
 
+static void cpu_common_map_numa_node(CPUState *cpu)
+{
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        assert(cpu->cpu_index < max_cpus);
+        if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
+            cpu->numa_node = i;
+        }
+    }
+}
+
 static void cpu_common_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cpu = CPU(dev);
@@ -347,6 +360,8 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
         cpu_resume(cpu);
     }
 
+    cpu_common_map_numa_node(cpu);
+
     /* NOTE: latest generic point where the cpu is fully realized */
     trace_init_vcpu(cpu);
 }
--
2.5.5
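
For readers unfamiliar with the lookup this patch relies on: numa_info[i].node_cpu is a bitmap with one bit per cpu_index, and the loop asks each node in turn whether it owns that index. Below is a minimal, self-contained sketch of the same lookup with toy bitmaps and a hypothetical toy_test_bit() helper (QEMU itself uses its bitmap.h API):

#include <stdio.h>

/* Toy stand-ins for numa_info[i].node_cpu: one bit per cpu_index.
 * Illustration only; the values and helper names are made up. */
static const unsigned long node_cpu[2] = {
    0x0FUL,   /* node 0 owns cpus 0-3 */
    0xF0UL,   /* node 1 owns cpus 4-7 */
};

static int toy_test_bit(int nr, unsigned long map)
{
    return (map >> nr) & 1;
}

int main(void)
{
    int cpu_index, i;

    for (cpu_index = 0; cpu_index < 8; cpu_index++) {
        int node = 0;                 /* default when no bitmap matches */

        for (i = 0; i < 2; i++) {
            if (toy_test_bit(cpu_index, node_cpu[i])) {
                node = i;
                break;
            }
        }
        printf("cpu %d -> node %d\n", cpu_index, node);
    }
    return 0;
}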











[Qemu-devel] [PATCH 3/3] cpu: make the function of cpu_common_map_numa_node more efficiently

2017-01-18 Thread Dou Liyang
The current function does some unnecessary work: the assert() is
executed on every loop iteration, and the loop does not stop once a
matching node has been found.

This patch moves the assert() out of the loop and returns as soon as
the node is found.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 qom/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qom/cpu.c b/qom/cpu.c
index e08dceb..3c655b2 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -343,10 +343,11 @@ static void cpu_common_map_numa_node(CPUState *cpu)
 {
     int i;
 
+    assert(cpu->cpu_index < max_cpus);
     for (i = 0; i < nb_numa_nodes; i++) {
-        assert(cpu->cpu_index < max_cpus);
         if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
             cpu->numa_node = i;
+            return;
         }
     }
 }
-- 
2.5.5





