[Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration() follow-up
Commit 7747abf11487 misses the curly brackets required by the QEMU coding style. Following Igor's and Eduardo's suggestion, add a follow-up patch for it.

Signed-off-by: Dou Liyang
---
 hw/core/machine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index a9aeb22f03..6b68e1218f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -793,8 +793,9 @@ void machine_run_board_init(MachineState *machine)
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
     numa_complete_configuration(machine);
-    if (nb_numa_nodes)
+    if (nb_numa_nodes) {
         machine_numa_finish_cpu_init(machine);
+    }
 
     /* If the machine supports the valid_cpu_types check and the user
      * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3
Re: [Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()
Hi Eduardo,

At 07/10/2018 06:27 PM, Eduardo Habkost wrote:
On Tue, Jul 10, 2018 at 04:15:30PM +0800, Dou Liyang wrote:
[...]

+if (nb_numa_nodes)
     machine_numa_finish_cpu_init(machine);
-}

missing curly brackets, should look like:

    if (nb_numa_nodes) {
        machine_numa_finish_cpu_init(machine);
    }

with that fixed
Reviewed-by: Igor Mammedov

Thank you, Igor. Since Eduardo has already sent the pull request containing
this patch, can I fix it and send a v2 patch now?

You can fix this in a follow-up patch and CC qemu-trivial.

Ok, I see. ;-)

Thanks,
dou
[Qemu-devel] [PATCH v2 1/2] hw/acpi-build: Add a check for memory-less NUMA nodes
Currently, QEMU's ACPI builder doesn't consider memory-less NUMA nodes, eg:

 -m 4G,slots=4,maxmem=8G \
 -numa node,nodeid=0 \
 -numa node,nodeid=1,mem=2G \
 -numa node,nodeid=2,mem=2G \
 -numa node,nodeid=3 \

Guest Linux will report

 [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory affinity structures for
memory-less NUMA nodes, and update the test file accordingly. Now the
info in guest Linux will be

 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

Signed-off-by: Dou Liyang
Reviewed-by: Igor Mammedov
---
Note to maintainer: update ACPI tables test blobs on commit.
---
 hw/i386/acpi-build.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
                 mem_len = next_base - pcms->below_4g_mem_size;
                 next_base = mem_base + mem_len;
             }
-            numamem = acpi_data_push(table_data, sizeof *numamem);
-            build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                              MEM_AFFINITY_ENABLED);
+
+            if (mem_len > 0) {
+                numamem = acpi_data_push(table_data, sizeof *numamem);
+                build_srat_memory(numamem, mem_base, mem_len, i - 1,
+                                  MEM_AFFINITY_ENABLED);
+            }
         }
         slots = (table_data->len - numa_start) / sizeof *numamem;
         for (; slots < pcms->numa_nodes + 2; slots++) {
-- 
2.14.3
[Qemu-devel] [PATCH v2 2/2][DO NOT APPLY] tests/acpi-test: update ACPI tables test blobs
QEMU now adds a new check for memory-less NUMA nodes in build_srat().
This affects the ACPI tests, so update the ACPI tables test blobs.

Signed-off-by: Dou Liyang
---
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/tests/acpi-test-data/pc/SRAT.numamem b/tests/acpi-test-data/pc/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem b/tests/acpi-test-data/q35/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8
-- 
2.14.3
Re: [Qemu-devel] [PATCH] hw/acpi-build: Add a check for non-memory NUMA nodes.
Hi Igor,

At 07/10/2018 03:52 PM, Igor Mammedov wrote:
On Thu, 5 Jul 2018 10:10:38 +0800 Dou Liyang wrote:

Currently, the Qemu ACPI builder doesn't consider non-memory NUMA nodes, eg:

s/non-memory/memory-less/ throughout subj/commit message

Yes, I will do it.

 -m 4G,slots=4,maxmem=8G \
 -numa node,nodeid=0 \
 -numa node,nodeid=1,mem=2G \
 -numa node,nodeid=2,mem=2G \
 -numa node,nodeid=3 \

Guest Linux will report

 [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory for non-memory NUMA nodes, and
also update the test file. Now the info in guest Linux will be

 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

Signed-off-by: Dou Liyang
---
Have done a bootup test in Linux and Windows 10 and 7.

add here:
"note to maintainer: update ACPI tables test blobs on commit."

with patch amended:
Reviewed-by: Igor Mammedov

---
 hw/i386/acpi-build.c                  | 9 ++++++---
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
                 mem_len = next_base - pcms->below_4g_mem_size;
                 next_base = mem_base + mem_len;
             }
-            numamem = acpi_data_push(table_data, sizeof *numamem);
-            build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                              MEM_AFFINITY_ENABLED);
+
+            if (mem_len > 0) {
+                numamem = acpi_data_push(table_data, sizeof *numamem);
+                build_srat_memory(numamem, mem_base, mem_len, i - 1,
+                                  MEM_AFFINITY_ENABLED);
+            }
         }
         slots = (table_data->len - numa_start) / sizeof *numamem;
         for (; slots < pcms->numa_nodes + 2; slots++) {

Drop the binary blobs from the patch (for reviewer convenience we post
blobs as a separate patch with a DO NOT APPLY tag in the subject).
Michael will update the test blobs manually when merging your patch.

Thank you for your explanation. I see, I will split them out in v2.

Thanks,
dou

diff --git a/tests/acpi-test-data/pc/SRAT.numamem b/tests/acpi-test-data/pc/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem b/tests/acpi-test-data/q35/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8
Re: [Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()
Hi Igor,

At 07/10/2018 03:42 PM, Igor Mammedov wrote:
On Wed, 4 Jul 2018 21:22:39 +0800 Dou Liyang wrote:

Commit 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()") broke commit 7b8be49d36fc ("NUMA: Enable
adding NUMA node implicitly"). machine_run_board_init() doesn't do the
NUMA setup if nb_numa_nodes=0, but numa_complete_configuration() needs
to add a new node if memory hotplug is enabled (slots > 0), even when
nb_numa_nodes=0.

So, remove the check before numa_complete_configuration() to fix this.

Fixes: 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()")

Signed-off-by: Dou Liyang
---
v1 --> v2:
 - Fix it to pass "make check".
---
 hw/core/machine.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..32b3c5a7d3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,9 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    if (nb_numa_nodes) {
-        numa_complete_configuration(machine);
+    numa_complete_configuration(machine);
+    if (nb_numa_nodes)
         machine_numa_finish_cpu_init(machine);
-    }

missing curly brackets, should look like:

    if (nb_numa_nodes) {
        machine_numa_finish_cpu_init(machine);
    }

with that fixed
Reviewed-by: Igor Mammedov

Thank you, Igor. Since Eduardo has already sent the pull request containing
this patch, can I fix it and send a v2 patch now?

Thanks,
dou

     /* If the machine supports the valid_cpu_types check and the user
      * specified a CPU with -cpu check here that the user CPU is supported.
[Qemu-devel] [PATCH] hw/acpi-build: Add a check for non-memory NUMA nodes.
Currently, the Qemu ACPI builder doesn't consider non-memory NUMA nodes, eg:

 -m 4G,slots=4,maxmem=8G \
 -numa node,nodeid=0 \
 -numa node,nodeid=1,mem=2G \
 -numa node,nodeid=2,mem=2G \
 -numa node,nodeid=3 \

Guest Linux will report

 [0.00] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

[mem 0x-0x] and [mem 0x14000-0x13fff] are bogus.

Add a check to avoid building SRAT memory for non-memory NUMA nodes, and
also update the test file. Now the info in guest Linux will be

 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x-0x0009]
 [0.00] ACPI: SRAT: Node 1 PXM 1 [mem 0x0010-0x7fff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x8000-0xbfff]
 [0.00] ACPI: SRAT: Node 2 PXM 2 [mem 0x1-0x13fff]
 [0.00] ACPI: SRAT: Node 3 PXM 3 [mem 0x14000-0x33fff] hotplug

Signed-off-by: Dou Liyang
---
Have done a bootup test in Linux and Windows 10 and 7.
---
 hw/i386/acpi-build.c                  | 9 ++++++---
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 224 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 224 -> 224 bytes
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9e8350c55d..c584642e4e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2392,9 +2392,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
                 mem_len = next_base - pcms->below_4g_mem_size;
                 next_base = mem_base + mem_len;
             }
-            numamem = acpi_data_push(table_data, sizeof *numamem);
-            build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                              MEM_AFFINITY_ENABLED);
+
+            if (mem_len > 0) {
+                numamem = acpi_data_push(table_data, sizeof *numamem);
+                build_srat_memory(numamem, mem_base, mem_len, i - 1,
+                                  MEM_AFFINITY_ENABLED);
+            }
         }
         slots = (table_data->len - numa_start) / sizeof *numamem;
         for (; slots < pcms->numa_nodes + 2; slots++) {

diff --git a/tests/acpi-test-data/pc/SRAT.numamem b/tests/acpi-test-data/pc/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8

diff --git a/tests/acpi-test-data/q35/SRAT.numamem b/tests/acpi-test-data/q35/SRAT.numamem
index dbc595d9cb85d3fcb5a4243153f42bb431c9de8f..119922f4973f621602047d1dc160519f810922a3 100644
GIT binary patch
delta 24
gcmaFB_<)fsILI;N0RsaAt<8
-- 
2.14.3
[Qemu-devel] [PATCH v2] hw/machine: Remove the Zero check of nb_numa_nodes for numa_complete_configuration()
Commit 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()") broke commit 7b8be49d36fc ("NUMA: Enable
adding NUMA node implicitly"). machine_run_board_init() doesn't do the
NUMA setup if nb_numa_nodes=0, but numa_complete_configuration() needs
to add a new node if memory hotplug is enabled (slots > 0), even when
nb_numa_nodes=0.

So, remove the check before numa_complete_configuration() to fix this.

Fixes: 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()")

Signed-off-by: Dou Liyang
---
v1 --> v2:
 - Fix it to pass "make check".
---
 hw/core/machine.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..32b3c5a7d3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,9 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    if (nb_numa_nodes) {
-        numa_complete_configuration(machine);
+    numa_complete_configuration(machine);
+    if (nb_numa_nodes)
         machine_numa_finish_cpu_init(machine);
-    }
 
     /* If the machine supports the valid_cpu_types check and the user
      * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3
Re: [Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes
At 07/04/2018 07:52 PM, Dou Liyang wrote:

Commit 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()") broke commit 7b8be49d36fc ("NUMA: Enable
adding NUMA node implicitly"). machine_run_board_init() doesn't do the
NUMA setup if nb_numa_nodes=0. Since the NUMA setup also checks the
value of nb_numa_nodes, remove the check from machine_run_board_init()
to fix this bug.

Fixes: 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()")

Signed-off-by: Dou Liyang
---
 hw/core/machine.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Please ignore this patch, it breaks "make check"; I will send v2.

Thanks,
dou

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..962346f90d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,8 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    if (nb_numa_nodes) {
-        numa_complete_configuration(machine);
-        machine_numa_finish_cpu_init(machine);
-    }
+    numa_complete_configuration(machine);
+    machine_numa_finish_cpu_init(machine);
 
     /* If the machine supports the valid_cpu_types check and the user
      * specified a CPU with -cpu check here that the user CPU is supported.
[Qemu-devel] [PATCH] hw/machine: Remove the Zero check of nb_numa_nodes
Commit 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()") broke commit 7b8be49d36fc ("NUMA: Enable
adding NUMA node implicitly"). machine_run_board_init() doesn't do the
NUMA setup if nb_numa_nodes=0. Since the NUMA setup also checks the
value of nb_numa_nodes, remove the check from machine_run_board_init()
to fix this bug.

Fixes: 7a3099fc9c5c ("numa: postpone options post-processing till
machine_run_board_init()")

Signed-off-by: Dou Liyang
---
 hw/core/machine.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 2077328bcc..962346f90d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -791,10 +791,8 @@ void machine_run_board_init(MachineState *machine)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
-    if (nb_numa_nodes) {
-        numa_complete_configuration(machine);
-        machine_numa_finish_cpu_init(machine);
-    }
+    numa_complete_configuration(machine);
+    machine_numa_finish_cpu_init(machine);
 
     /* If the machine supports the valid_cpu_types check and the user
      * specified a CPU with -cpu check here that the user CPU is supported.
-- 
2.14.3
Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase
At 01/12/2018 03:23 PM, Dou Liyang wrote:
[...]

BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT file.
Is it correct?

This sounds correct to me. Igor, can you confirm?

It will rebuild/update the reference tables. I'm for merging it early and
tracking/fixing the issue Peter saw during this dev cycle.

Send an ack then?

Hi Michael,

I saw Igor gave the Reviewed-by at 21 Dec 2017 ^^
Oops! Sorry, it's 27 Dec.

What is the state of this patchset? I have rebased and retested it again.
Do I need to send it again?

Thanks,
dou
Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase
[...]

BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT file.
Is it correct?

This sounds correct to me. Igor, can you confirm?

It will rebuild/update the reference tables. I'm for merging it early and
tracking/fixing the issue Peter saw during this dev cycle.

Send an ack then?

Hi Michael,

I saw Igor gave the Reviewed-by at 21 Dec 2017. What is the state of this
patchset? I have rebased and retested it again. Do I need to send it again?

Thanks,
dou
Re: [Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase
Hi Eduardo,

At 12/19/2017 06:09 AM, Eduardo Habkost wrote:
On Thu, Dec 14, 2017 at 12:08:53PM +0800, Dou Liyang wrote:

These are the patches left over from the pull request:

  [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

because of some errors when tested by the "make check" command[1].
Now that QEMU 2.11.0 has been released, rebase/retest/respin them for
the v2.12 dev cycle.

Changelog:
 - Rebase and retest this patch on v2.11.0
 - Update the DSDT.numamem

[1]: http://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01334.html

Do we know what caused the failures Peter saw, and if we can now be sure
it won't happen again?

Sorry, I can't. I tested it many times with my colleagues, and it didn't
happen again like Peter saw. This issue is very subtle and funny. Could
you give me some suggestions?

BTW, I used "make TEST_ACPI_REBUILD_AML=1 check" to create the DSDT file.
Is it correct?

Thanks,
dou.
[Qemu-devel] [RESEND PATCH for-2.12 1/2] ACPI/unit-test: Add a testcase for RAM allocation in numa node
As QEMU supports memory-less nodes, it is possible that there is no RAM
in the first numa node (also called node0), eg:

 ... \
 -m 128,slots=3,maxmem=1G \
 -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough. Add a testcase for this situation
to make sure the ACPI table is correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5150 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7834 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c              | 24 ++++++++++++++++++++++++
 5 files changed, 24 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index ..224cfdd9e983e02dac5f4bf7e210eaa64cb0dc78
GIT binary patch
literal 5150
zcmb7I-EJG#5uUTVl$K{nX(_FLmDq$F*GSsf{PAB<pb)uBiL^v&&81uyE0jxFLFpiX
z7fuXB5rWbTpg6t@R2RMI1#R%!dW7~7@(AfGxM-}aurs?n6hmo%sF%B
jAS`h<AMRe-W1o(vXd^}w@nM@7y!_ie)Wp732w(Kt~8k@Da?JU1!H^^RpWP7abt!3
qwW3O^FIH^V=W)HUg<BkTK?D1ye%NmlP#Zf8t98nr`UA63$|IwsLgW4N)*25vf3Dd4?gJ9
BCr+q7#@Q(wF7SV)@soj!DZQq2dgp)G<eYi4;a3+4cqv;C}F_eynrWAx(e{N#%&
k1c{uz=FDLnWIgd9(uIE <
[Qemu-devel] [RESEND PATCH for-2.12 0/2] ACPI/unit-test: Add a testcase
These are the patches left over from the pull request:

  [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05

because of some errors when tested by the "make check" command[1].
Now that QEMU 2.11.0 has been released, rebase/retest/respin them for
the v2.12 dev cycle.

Changelog:
 - Rebase and retest this patch on v2.11.0
 - Update the DSDT.numamem

[1]: http://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01334.html

Dou Liyang (2):
  ACPI/unit-test: Add a testcase for RAM allocation in numa node
  hw/acpi-build: Make next_base easy to follow

 hw/i386/acpi-build.c                  | 2 +-
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 5150 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7834 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes
 tests/bios-tables-test.c              | 24 
 6 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem
-- 
2.14.3
[Qemu-devel] [RESEND PATCH for-2.12 2/2] hw/acpi-build: Make next_base easy to follow
It may be hard to read the assignment statement of "next_base", so

  s/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
   /next_base = mem_base + mem_len;/

... for readability. No functional change.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 73519ab3ac..2010108ae0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2381,7 +2381,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
             }
             mem_base = 1ULL << 32;
             mem_len = next_base - pcms->below_4g_mem_size;
-            next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+            next_base = mem_base + mem_len;
         }
         numamem = acpi_data_push(table_data, sizeof *numamem);
         build_srat_memory(numamem, mem_base, mem_len, i - 1,
-- 
2.14.3
Re: [Qemu-devel] [PULL 1/9] hw/acpi-build: Make assignment statement of next_base easy to read
Hi Igor,

At 11/28/2017 11:07 PM, Igor Mammedov wrote:
On Thu, 5 Oct 2017 17:36:30 -0300 Eduardo Habkost <ehabk...@redhat.com> wrote:

From: Dou Liyang <douly.f...@cn.fujitsu.com>

It may be hard to read the assignment statement of "next_base", so

  s/next_base += (1ULL << 32) - pcms->below_4g_mem_size;
   /next_base = mem_base + mem_len;/

... for readability. No functional change.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Message-Id: <1504231805-30957-3-git-send-email-douly.f...@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>

Pls, retest/respin once 2.12 dev window is open

Yes, I will.

Thanks,
dou.

---
 hw/i386/acpi-build.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2af37a9129..73e3443bce 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2381,7 +2381,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
             }
             mem_base = 1ULL << 32;
             mem_len = next_base - pcms->below_4g_mem_size;
-            next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+            next_base = mem_base + mem_len;
         }
         numamem = acpi_data_push(table_data, sizeof *numamem);
         build_srat_memory(numamem, mem_base, mem_len, i - 1,
Re: [Qemu-devel] [PULL 2/9] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Igor,

At 11/28/2017 11:08 PM, Igor Mammedov wrote:
On Thu, 5 Oct 2017 17:36:31 -0300 Eduardo Habkost <ehabk...@redhat.com> wrote:

From: Dou Liyang <douly.f...@cn.fujitsu.com>

As QEMU supports memory-less nodes, it is possible that there is no RAM
in the first numa node (also called node0), eg:

 ... \
 -m 128,slots=3,maxmem=1G \
 -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT
table. Only fixing it is not enough. Add a testcase for this situation
to make sure the ACPI table is correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Message-Id: <1504231805-30957-4-git-send-email-douly.f...@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>

Pls retest/respin once dev window is open.

Yes, thanks so much for your reminder. I will do that.

Thanks,
dou.
Re: [Qemu-devel] [PATCH v6] NUMA: Enable adding NUMA node implicitly
Hi Igor,

[...]
+    parse_numa_node(ms, &node, NULL);

I get a build break here:

  numa.c:451:13: error: too few arguments to function 'parse_numa_node'
      parse_numa_node(ms, &node, NULL);

In the upstream tree, your commit cc001888b780 ("numa: fixup parsed
NumaNodeOptions earlier") recently removed an argument from
parse_numa_node(). The definition of the function becomes:

  static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
                              Error **errp)

This patch is based on the upstream tree, so parse_numa_node() should
have three arguments. I am not sure why you got this build failure; can
you tell me which branch you tested?

Thanks,
dou
[Qemu-devel] [PATCH v6] NUMA: Enable adding NUMA node implicitly
Linux and Windows need the ACPI SRAT table to make memory hotplug work
properly, however currently QEMU doesn't create an SRAT table if numa
options aren't present on the CLI. This breaks both Linux and Windows
guests in certain conditions:

 * Windows: won't enable memory hotplug without an SRAT table at all
 * Linux: if QEMU is started with initial memory all below 4Gb and no
   SRAT table present, the guest kernel will use nommu DMA ops, which
   breaks 32bit hw drivers when memory is hotplugged and the guest
   tries to use it with those drivers.

Fix the above issues by automatically creating a numa node when QEMU is
started with memory hotplug enabled but without '-numa' options on the
CLI. (PS: auto-create the numa node only for new machine types so as not
to break migration.)

This provides an SRAT table to guests without explicit -numa options on
the CLI and allows:

 * Windows: to enable memory hotplug
 * Linux: to switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit
   allocated buffers that legacy drivers/hw can handle.

[Rewritten by Igor]

Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com>
Suggested-by: Igor Mammedov <imamm...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Cc: Richard Henderson <r...@twiddle.net>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: "Michael S. Tsirkin" <m...@redhat.com>
Cc: Marcel Apfelbaum <mar...@redhat.com>
Cc: Igor Mammedov <imamm...@redhat.com>
Cc: David Hildenbrand <da...@redhat.com>
Cc: Thomas Huth <th...@redhat.com>
Cc: Alistair Francis <alistai...@gmail.com>
Cc: Takao Indoh <indou.ta...@jp.fujitsu.com>
Cc: Izumi Taku <izumi.t...@jp.fujitsu.com>
---
changelog V5 --> V6:
 - rebase it to avoid the build failure
 - test again
---
 hw/i386/pc.c        |  1 +
 hw/i386/pc_piix.c   |  1 +
 hw/i386/pc_q35.c    |  1 +
 include/hw/boards.h |  1 +
 numa.c              | 21 -
 vl.c                |  3 +--
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e11a65b..156501c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
     mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
     mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
     mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
+    mc->auto_enable_numa_with_memhp = true;
     mc->has_hotpluggable_cpus = true;
     mc->default_boot_order = "cad";
     mc->hot_add_cpu = pc_hot_add_cpu;

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index f79d5cb..5e47528 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m)
     m->is_default = 0;
     m->alias = NULL;
     SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
+    m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index da3ea60..d606004 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
     m->alias = NULL;
     SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
     m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+    m->auto_enable_numa_with_memhp = false;
 }
 
 DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 191a5b3..f1077f1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -192,6 +192,7 @@ struct MachineClass {
     bool ignore_memory_transaction_failures;
     int numa_mem_align_shift;
     const char **valid_cpu_types;
+    bool auto_enable_numa_with_memhp;
     void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size);

diff --git a/numa.c b/numa.c
index 8d78d95..7151b24 100644
--- a/numa.c
+++ b/numa.c
@@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
     }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
+    nb_numa_nodes++;
 }
 
 static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
@@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         if (err) {
             goto end;
         }
-        nb_numa_nodes++;
         break;
     case NUMA_OPTIONS_TYPE_DIST:
         parse_numa_distance(&object->u.dist, &err);
@@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms)
         exit(1);
     }
 
+    /*
+     * If memory hotplug is enabled (slots > 0) but without '-numa'
+     * options explicitly on CLI, guests will break.
+     *
+     * Windows: won't enable memory hotplug without SRAT table at all
+     *
+     * Linux: if QEMU is started with initial memory all below 4Gb
+     * and no SRAT table present, guest kernel will use nommu DMA ops,
+     * which breaks 32bit hw drivers when memory is hotplugged and
+     * guest tries to use it with that drivers.
+     *
+     * Enable NUMA implicitly by adding a new NUMA node automatically.
+     */
+    if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
+        mc->auto_enable_numa_with_memhp) {
+        NumaNodeOptions node = { };
+        parse_numa_node(ms, &node, NULL);
+    }
+
     assert(max_numa_nodeid <= MAX_NODES);
Re: [Qemu-devel] [PATCH v5] NUMA: Enable adding NUMA node implicitly
Hi Michael, [...] Seems to cause build failures: /scm/qemu/numa.c:452:13: error: too many arguments to function ‘parse_numa_node’ parse_numa_node(ms, , NULL, NULL); Yes, commit cc001888b780 ("numa: fixup parsed NumaNodeOptions earlier") removed a argument from the function recently. I will rebase this patch. Thanks, dou. --- changelog V4 --> V5: - Avoid calling qemu_opts_parse*() - Add a new NUMA node by calling parse_numa_node() directly - Remove the redundant argument in parse_numa_opts() These all were suggested by Eduardo. --- hw/i386/pc.c| 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c| 1 + include/hw/boards.h | 1 + numa.c | 21 - vl.c| 3 +-- 6 files changed, 25 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 8e307f7..ec4eb97 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->auto_enable_numa_with_memhp = true; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index f79d5cb..5e47528 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->auto_enable_numa_with_memhp = false; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index da3ea60..d606004 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->auto_enable_numa_with_memhp = false; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff 
--git a/include/hw/boards.h b/include/hw/boards.h index 191a5b3..f1077f1 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -192,6 +192,7 @@ struct MachineClass { bool ignore_memory_transaction_failures; int numa_mem_align_shift; const char **valid_cpu_types; +bool auto_enable_numa_with_memhp; void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, int nb_nodes, ram_addr_t size); diff --git a/numa.c b/numa.c index 100a67f..4d12f30 100644 --- a/numa.c +++ b/numa.c @@ -221,6 +221,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } numa_info[nodenr].present = true; max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1); +nb_numa_nodes++; } static void parse_numa_distance(NumaDistOptions *dist, Error **errp) @@ -281,7 +282,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) if (err) { goto end; } -nb_numa_nodes++; break; case NUMA_OPTIONS_TYPE_DIST: parse_numa_distance(>u.dist, ); @@ -432,6 +432,25 @@ void parse_numa_opts(MachineState *ms) exit(1); } +/* + * If memory hotplug is enabled (slots > 0) but without '-numa' + * options explicitly on CLI, guestes will break. + * + * Windows: won't enable memory hotplug without SRAT table at all + * + * Linux: if QEMU is started with initial memory all below 4Gb + * and no SRAT table present, guest kernel will use nommu DMA ops, + * which breaks 32bit hw drivers when memory is hotplugged and + * guest tries to use it with that drivers. + * + * Enable NUMA implicitly by adding a new NUMA node automatically. 
+ */ +if (ms->ram_slots > 0 && nb_numa_nodes == 0 && +mc->auto_enable_numa_with_memhp) { +NumaNodeOptions node = { }; +parse_numa_node(ms, , NULL, NULL); +} + assert(max_numa_nodeid <= MAX_NODES); /* No support for sparse NUMA node IDs yet: */ diff --git a/vl.c b/vl.c index ec29909..be332d1 100644 --- a/vl.c +++ b/vl.c @@ -4675,8 +4675,6 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); -parse_numa_opts(current_machine); - if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, NULL)) { exit(1); @@ -4726,6 +4724,7 @@ int main(int argc, char **argv, char **envp) current_machine->boot_order = boot_order; current_machine->cpu_model = cpu_model; +parse_numa_opts(current_machine); /* parse features once if machine provides default cpu_type */ if (machine_class->default_cpu_type) {
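The guard added to parse_numa_opts() above is a simple three-way condition. As a toy model (plain Python, not QEMU code; the parameter names merely mirror the MachineState/MachineClass fields from the patch), the decision of whether to create an implicit NUMA node looks like this:

```python
def should_auto_add_numa_node(ram_slots, nb_numa_nodes,
                              auto_enable_numa_with_memhp):
    """Mirror of the check added in parse_numa_opts(): create one implicit
    NUMA node only when memory hotplug is configured (slots > 0), no -numa
    options were given on the CLI, and the machine type opts in."""
    return ram_slots > 0 and nb_numa_nodes == 0 and auto_enable_numa_with_memhp

# -m 4G,slots=4,maxmem=8G with no -numa on a new machine type -> implicit node
print(should_auto_add_numa_node(4, 0, True))   # True
# pc-i440fx-2.10 / pc-q35-2.10 and older opt out (False) for migration compat
print(should_auto_add_numa_node(4, 0, False))  # False
# no hotplug slots -> no implicit node either
print(should_auto_add_numa_node(0, 0, True))   # False
```

The opt-out is why the patch explicitly sets auto_enable_numa_with_memhp = false in the 2.10 machine options: older machine types must keep their old (no-SRAT) behaviour so existing guests can still migrate.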
[Qemu-devel] [PATCH v5] NUMA: Enable adding NUMA node implicitly
Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with those drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so as not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- changelog V4 --> V5: - Avoid calling qemu_opts_parse*() - Add a new NUMA node by calling parse_numa_node() directly - Remove the redundant argument in parse_numa_opts() All of these were suggested by Eduardo.
--- hw/i386/pc.c| 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c| 1 + include/hw/boards.h | 1 + numa.c | 21 - vl.c| 3 +-- 6 files changed, 25 insertions(+), 3 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 8e307f7..ec4eb97 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->auto_enable_numa_with_memhp = true; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index f79d5cb..5e47528 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->auto_enable_numa_with_memhp = false; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index da3ea60..d606004 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->auto_enable_numa_with_memhp = false; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff --git a/include/hw/boards.h b/include/hw/boards.h index 191a5b3..f1077f1 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -192,6 +192,7 @@ struct MachineClass { bool ignore_memory_transaction_failures; int numa_mem_align_shift; const char **valid_cpu_types; +bool auto_enable_numa_with_memhp; void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, int nb_nodes, ram_addr_t size); diff --git a/numa.c b/numa.c index 100a67f..4d12f30 100644 --- a/numa.c +++ b/numa.c @@ -221,6 +221,7 @@ static void 
parse_numa_node(MachineState *ms, NumaNodeOptions *node, } numa_info[nodenr].present = true; max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1); +nb_numa_nodes++; } static void parse_numa_distance(NumaDistOptions *dist, Error **errp) @@ -281,7 +282,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) if (err) { goto end; } -nb_numa_nodes++; break; case NUMA_OPTIONS_TYPE_DIST: parse_numa_distance(>u.dist, ); @@ -432,6 +432,25 @@ void parse_numa_opts(MachineState *ms) exit(1); } +/* + * If memory hotplug is enabled (slots > 0) but without '-numa' + * options explicitly on CLI, gueste
Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly
Dear Eduardo, At 10/26/2017 04:02 PM, Eduardo Habkost wrote: Hi, Sorry for taking so long to review it: No matter. It's my honor! On Mon, Oct 23, 2017 at 09:33:42AM +0800, Dou Liyang wrote: [...] + */ +if (ram_slots > 0 && QTAILQ_EMPTY(_opts->head)) { +if (mc->auto_enable_numa_with_memhp) { If you move the code after qemu_opts_foreach(), you could just check if nb_numa_nodes is 0 instead of peeking at numa_opts->head. +qemu_opts_parse_noisily(numa_opts, "node", true); +} +} Calling qemu_opts_parse*() has additional user-visible side effects (it can make -writeconfig include the new option, depending on the initialization ordering). Affecting QemuOpts depending on the machine-type breaks the separation of machine configuration from machine initialization, so I would like to avoid it. Yes, indeed. Thank you so much for the kind explanation! We could simply call parse_numa_node() (after making it increment nb_numa_nodes automatically). Yes, I will use it in the next version. e.g.: diff --git a/numa.c b/numa.c index 8d78d959f6..da18e42ce7 100644 --- a/numa.c +++ b/numa.c @@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node, } numa_info[nodenr].present = true; max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1); +nb_numa_nodes++; } static void parse_numa_distance(NumaDistOptions *dist, Error **errp) @@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) if (err) { goto end; } -nb_numa_nodes++; break; case NUMA_OPTIONS_TYPE_DIST: parse_numa_distance(>u.dist, ); @@ -433,6 +433,26 @@ void parse_numa_opts(MachineState *ms) exit(1); } +/* + * If memory hotplug is enabled (slots > 0) but without '-numa' + * options explicitly on CLI, guests will break.
+ * + * Windows: won't enable memory hotplug without SRAT table at all + * + * Linux: if QEMU is started with initial memory all below 4Gb + * and no SRAT table present, guest kernel will use nommu DMA ops, + * which breaks 32bit hw drivers when memory is hotplugged and + * guest tries to use it with those drivers. + * + * Enable NUMA implicitly by adding a new NUMA node automatically. + */ +if (ms->ram_slots > 0 && nb_numa_nodes == 0 && +mc->auto_enable_numa_with_memhp) { +NumaNodeOptions node = { }; +parse_numa_node(ms, , _abort); +} + + assert(max_numa_nodeid <= MAX_NODES); /* No support for sparse NUMA node IDs yet: */ + +if (qemu_opts_foreach(numa_opts, parse_numa, ms, NULL)) { exit(1); } diff --git a/vl.c b/vl.c index 0723835..516d0c9 100644 --- a/vl.c +++ b/vl.c @@ -4677,8 +4677,6 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); -parse_numa_opts(current_machine); - if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, NULL)) { exit(1); @@ -4728,6 +4726,7 @@ int main(int argc, char **argv, char **envp) current_machine->boot_order = boot_order; current_machine->cpu_model = cpu_model; +parse_numa_opts(current_machine, ram_slots); Why did you add a ram_slots argument if it's already present at current_machine->ram_slots? Oops, I forgot to update it after moving it behind the setup of current_machine. I will remove the redundant ram_slots argument. Thanks, dou /* parse features once if machine provides default cpu_type */ if (machine_class->default_cpu_type) { -- 2.5.5
Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly
Hi Fam, At 10/23/2017 09:37 AM, no-re...@patchew.org wrote: Hi, This series failed build test on s390x host. Please find the details below. Type: series Message-id: 1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com Subject: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly GEN hw/pci/trace.h GEN hw/s390x/trace.h GEN hw/vfio/trace.h GEN hw/acpi/trace.h GEN hw/arm/trace.h GEN hw/alpha/trace.h Makefile:30: recipe for target 'git-submodule-update' failed make: *** [git-submodule-update] Error 1 It also seems that there is something wrong with our test machines. Thanks, dou. make: *** Waiting for unfinished jobs === OUTPUT END === Test command exited with code: 2 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
Re: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly
Hi Fam, At 10/23/2017 09:37 AM, no-re...@patchew.org wrote: Hi, This series failed automatic build test. Please find the testing commands and their output below. If you have docker installed, you can probably reproduce it locally. Type: series Message-id: 1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com Subject: [Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=8 time make docker-test-quick@centos6 time make docker-test-build@min-glib time make docker-test-mingw@fedora time make docker-test-block@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com -> patchew/1508722422-3861-1-git-send-email-douly.f...@cn.fujitsu.com t [tag update]patchew/cover.1508329282.git.riku.voi...@linaro.org -> patchew/cover.1508329282.git.riku.voi...@linaro.org Switched to a new branch 'test' b8c0d922bf NUMA: Enable adding NUMA node implicitly === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc'... fatal: Could not read from remote repository. Wow, it seems there is something wrong with the test robot. Thanks, dou. Please make sure you have the correct access rights and the repository exists. fatal: clone of 'git://git.qemu-project.org/dtc.git' into submodule path '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc' failed Failed to clone 'dtc'. Retry scheduled Cloning into '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc'... fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
fatal: clone of 'git://git.qemu-project.org/dtc.git' into submodule path '/var/tmp/patchew-tester-tmp-z9ao6big/src/dtc' failed Failed to clone 'dtc' a second time, aborting === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
[Qemu-devel] [PATCH v4] NUMA: Enable adding NUMA node implicitly
Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with those drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so as not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S.
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- changelog V3 --> V4: -Move the parse_numa_opts behind the setup of current_machine for safety suggested by Igor hw/i386/pc.c | 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + include/hw/boards.h | 1 + include/sysemu/numa.h | 2 +- numa.c| 24 ++-- vl.c | 3 +-- 7 files changed, 28 insertions(+), 5 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 8e307f7..ec4eb97 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2325,6 +2325,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->auto_enable_numa_with_memhp = true; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index f79d5cb..5e47528 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->auto_enable_numa_with_memhp = false; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index da3ea60..d606004 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->auto_enable_numa_with_memhp = false; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff --git a/include/hw/boards.h b/include/hw/boards.h index 
191a5b3..f1077f1 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -192,6 +192,7 @@ struct MachineClass { bool ignore_memory_transaction_failures; int numa_mem_align_shift; const char **valid_cpu_types; +bool auto_enable_numa_with_memhp; void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, int nb_nodes, ram_addr_t size); diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 5c6df28..31d3ac0 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -30,7 +30,7 @@ struct NumaNodeMem { }; extern NodeInfo numa_info[MAX_NODES]; -void parse_numa_opts(MachineState *ms); +void parse_numa_opts(MachineState *ms, uint64_t ram_slots); void query_numa_node_mem(NumaNodeMem node_mem[]); extern QemuOptsList qemu_numa_opts; void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node); diff --git a/numa.c b/numa.c index 100a67f..ba8d813 100644 --- a/numa.c +++ b/numa.c @@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, nodes[i].node_mem = size - usedmem; } -void parse_numa_opts(MachineState *ms) +void parse_numa_opts(MachineState *ms, uint64_t ram_slots) { int i; MachineClass *mc = MACHINE_GET_CLASS(ms); +QemuOptsList *num
Re: [Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node
Hi Eduardo, Thanks for your reply. At 10/21/2017 03:15 AM, Eduardo Habkost wrote: On Wed, Oct 18, 2017 at 11:50:32AM +0800, Dou Liyang wrote: At 10/18/2017 11:48 AM, Dou Liyang wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node (also called node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Fixing it alone is not enough. Add a testcase for this situation to make sure the ACPI table is correct for the guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- Changelog: This patch can pass the 'make check' in Peter's machine, Oops, s/can/can not :-). Thanks, dou. But, in my own and Eduardo's machine, we can pass it. So I rebase and split it independently. Thanks! Peter: Could you help me test it on your machine, then give me the output files, such as /tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl file. As the failures reported by Peter were about the *.memhp test case files, I think this patch is unlikely to be the cause of those failures. I believe the problem is at: From: Dou Liyang <douly.f...@cn.fujitsu.com> Date: Fri, 1 Sep 2017 10:10:03 +0800 Message-ID: <1504231805-30957-3-git-send-email-douly.f...@cn.fujitsu.com> Subject: [PATCH v6 2/4] hw/acpi-build: Make assignment statement of next_base easy to read That patch is just for readability, so I discarded it. And we need a test case for commit 4926403c250f ("hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM"). So, I re-post this patch. ;-) However, I would still like to get test results from Peter before applying this patch again. I see. Thanks, dou
Re: [Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly
Hi Igor, diff --git a/vl.c b/vl.c index 9bb5058..d083b4d 100644 --- a/vl.c +++ b/vl.c @@ -4665,7 +4665,11 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); -parse_numa_opts(current_machine); +current_machine->ram_size = ram_size; +current_machine->maxram_size = maxram_size; +current_machine->ram_slots = ram_slots; + +parse_numa_opts(current_machine, ram_slots); if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, NULL)) { @@ -4710,9 +4714,6 @@ int main(int argc, char **argv, char **envp) replay_checkpoint(CHECKPOINT_INIT); qdev_machine_init(); -current_machine->ram_size = ram_size; -current_machine->maxram_size = maxram_size; -current_machine->ram_slots = ram_slots; current_machine->boot_order = boot_order; current_machine->cpu_model = cpu_model; it should be safe to move parse_numa_opts(current_machine) here Yes, indeed! I have modified and tested this; it's good. I will change it in the next version. Thanks, dou.
Re: [Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly
Hi all, It seems that a month has passed. So, ping... Thanks, dou. At 09/21/2017 05:23 PM, Dou Liyang wrote: Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with those drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so as not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S.
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- changelog V2 --> V3: -Replace the callback function with a boolean parameter suggested by Igor -Use QTAILQ_EMPTY() macro to check the QemuOptsList hw/i386/pc.c | 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + include/hw/boards.h | 1 + include/sysemu/numa.h | 2 +- numa.c| 24 ++-- vl.c | 9 + 7 files changed, 32 insertions(+), 7 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 05985d4..f1a44cc 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2318,6 +2318,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->auto_enable_numa_with_memhp = true; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 9ff79b1..d87a433 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -449,6 +449,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->auto_enable_numa_with_memhp = false; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 6c4ec4b..68cbfc5 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -319,6 +319,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->auto_enable_numa_with_memhp = false; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff --git 
a/include/hw/boards.h b/include/hw/boards.h index 156e0a5..0fe2c8f 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -191,6 +191,7 @@ struct MachineClass { bool has_hotpluggable_cpus; bool ignore_memory_transaction_failures; int numa_mem_align_shift; +bool auto_enable_numa_with_memhp; void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, int nb_nodes, ram_addr_t size); diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 5c6df28..31d3ac0 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -30,7 +30,7 @@ struct NumaNodeMem { }; extern NodeInfo numa_info[MAX_NODES]; -void parse_numa_opts(MachineState *ms); +void parse_numa_opts(MachineState *ms, uint64_t ram_slots); void query_numa_node_mem(NumaNodeMem node_mem[]); extern QemuOptsList qemu_numa_opts; void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node); diff --git a/numa.c b/numa.c index 100a67f..ba8d813 100644 --- a/numa.c +++ b/numa.c @@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, nodes[i].node_mem = size - usedmem; } -void parse_numa_opts(Ma
Re: [Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node
At 10/18/2017 11:48 AM, Dou Liyang wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node (also called node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Fixing it alone is not enough. Add a testcase for this situation to make sure the ACPI table is correct for the guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- Changelog: This patch can pass the 'make check' in Peter's machine, Oops, s/can/can not :-). Thanks, dou. But, in my own and Eduardo's machine, we can pass it. So I rebase and split it independently. Peter: Could you help me test it on your machine, then give me the output files, such as /tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl file. tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 5 files changed, 24 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem new file mode 100644 index ..bc703d77ee48720af1b801c518b43511c95c852d GIT binary patch literal 5104 [binary patch data omitted; not recoverable in this archive]
[Qemu-devel] [PATCH] ACPI/unit-test: Add a testcase for RAM allocation in numa node
As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node (also called node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Fixing it alone is not enough. Add a testcase for this situation to make sure the ACPI table is correct for the guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- Changelog: This patch can pass the 'make check' in Peter's machine, But, in my own and Eduardo's machine, we can pass it. So I rebase and split it independently. Peter: Could you help me test it on your machine, then give me the output files, such as /tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl file. tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 5 files changed, 24 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem new file mode 100644 index ..bc703d77ee48720af1b801c518b43511c95c852d GIT binary patch literal 5104 [binary patch data omitted; not recoverable in this archive]
Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05
Hi Peter, At 10/10/2017 08:56 PM, Peter Maydell wrote: On 10 October 2017 at 13:52, Eduardo Habkost wrote: This shouldn't happen if CONFIG_IASL is not defined. Did you run configure again after removing iasl? I've neither added nor removed iasl from the build systems in question. Pull request tests just apply the pull and run make/make check. Yes, indeed. Now I can be sure that the problem is not related to whether iasl is installed or not. Both are OK on my host. According to: >>> acpi-test: Warning! DSDT mismatch. Actual [asl:/tmp/asl-6QYK7Y.dsl, >>> aml:/tmp/aml-E8YK7Y], Expected [asl:/tmp/asl-1H1I7Y.dsl, >>> aml:tests/acpi-test-data/pc/DSDT.memhp]. May I resend the patch independently, and could you help me test it on your machine, then give me the output files, such as the /tmp/asl-6QYK7Y.dsl and /tmp/asl-1H1I7Y.dsl files. Thanks, dou. thanks -- PMM
Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05
Hi Peter, At 10/10/2017 05:40 PM, Peter Maydell wrote: On 10 October 2017 at 08:41, Dou Liyang <douly.f...@cn.fujitsu.com> wrote: If we want to do the "bios-tables-test", we need iasl support; otherwise, when I run "make V=1 check" without iasl, the test reports an error: This works in master at the moment, I think (whether by skipping the test or by using pre-compiled tables I don't know). We need to keep 'make check' working with and without iasl. Yes, you are right. QEMU uses the macro named CONFIG_IASL to skip the test if the ACPICA tools are not installed in the system. QEMU already keeps 'make check' working with and without iasl. The way I tested was wrong. Thanks, dou. thanks -- PMM
Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05
Hi Eduardo, At 10/10/2017 08:52 PM, Eduardo Habkost wrote: This shouldn't happen if CONFIG_IASL is not defined. Did you run configure again after removing iasl? Oops, yes, the result above is with CONFIG_IASL = iasl. :-( I am now aware of this problem. I am waiting for the result of the test with './configure --iasl="" ' Thanks, dou.
Re: [Qemu-devel] [PULL 0/9] x86 and machine queue, 2017-10-05
Hi Eduardo, I couldn't reproduce the failure either. But I have some concerns below. At 10/06/2017 11:37 PM, Peter Maydell wrote: On 6 October 2017 at 16:27, Eduardo Habkost wrote: On Fri, Oct 06, 2017 at 01:17:42PM +0100, Peter Maydell wrote: This fails tests, I'm afraid: [...] acpi-test: Warning! DSDT mismatch. Actual [asl:/tmp/asl-6QYK7Y.dsl, aml:/tmp/aml-E8YK7Y], Expected [asl:/tmp/asl-1H1I7Y.dsl, aml:tests/acpi-test-data/pc/DSDT.memhp]. This warning is about memory hot-plug tests. The DSDT mismatch is very strange; I'd like to see the difference. acpi-test: Warning. not showing difference since no diff utility is specified. Set 'DIFF' environment variable to a preferred diff utility and run 'make V=1 check' again to see ASL difference. OK and same for /i386/acpi/piix4/numamem and for the equivalent q35 tests. Fails on s390, netbsd, freebsd, aarch32, x86-64. Oops, sorry! I need to find out why it didn't fail on travis-ci: https://travis-ci.org/ehabkost/qemu/builds/283884670 My guess is that one of {iasl installed, iasl not installed} is broken, but the travis environments don't test both. If we want to do the "bios-tables-test", we need iasl support; otherwise, when I run "make V=1 check" without iasl, the test reports an error: ... TEST: tests/bios-tables-test... (pid=1006) /i386/acpi/piix4: Looking for expected file 'tests/acpi-test-data/pc/DSDT' Using expected file 'tests/acpi-test-data/pc/DSDT' Looking for expected file 'tests/acpi-test-data/pc/APIC' Using expected file 'tests/acpi-test-data/pc/APIC' Looking for expected file 'tests/acpi-test-data/pc/HPET' Using expected file 'tests/acpi-test-data/pc/HPET' ** ERROR:tests/bios-tables-test.c:328:load_asl: assertion failed (error == NULL): Failed to execute child process "iasl" (No such file or directory) (g-exec-error-quark, 8) FAIL ... Thanks, dou. thanks -- PMM
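For readers who have not used it: when the bios-tables test detects a table mismatch it only prints the temp-file names unless the DIFF environment variable is set, in which case it feeds the expected and actual disassembled (ASL) tables to that tool. The sketch below fakes two tiny "ASL" files purely to show the mechanism; the contents are made up, the real test compares full DSDT/SRAT disassembly:

```shell
# Fake an "expected" and an "actual" disassembled table, then run them
# through diff -u, which is what the test does when DIFF="diff -u" is set.
expected_asl=$(mktemp)
actual_asl=$(mktemp)
printf 'Scope (\\_SB) { Device (MHPD) { } }\n' > "$expected_asl"
printf 'Scope (\\_SB) { }\n' > "$actual_asl"
# '|| true' because diff exits non-zero when the files differ
out=$(diff -u "$expected_asl" "$actual_asl" || true)
printf '%s\n' "$out"
rm -f "$expected_asl" "$actual_asl"
```

With the real test you would instead run something like `DIFF="diff -u" make V=1 check` in the build tree, and the unified diff of the two /tmp/asl-*.dsl files is printed in place of the warning above.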
Re: [Qemu-devel] [PATCH v6 2/4] hw/acpi-build: Make assignment statement of next_base easy to read
Hi Eduardo, At 10/10/2017 10:29 AM, Eduardo Habkost wrote: On Fri, Sep 01, 2017 at 10:10:03AM +0800, Dou Liyang wrote: It may be hard to read the assignment statement of "next_base", so s/next_base += (1ULL << 32) - pcms->below_4g_mem_size;/next_base = mem_base + mem_len;/ for readability. No functionality change. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> This or the new test case at patch 3/4 seems to have caused build failures under some circumstances: https://www.mail-archive.com/qemu-devel@nongnu.org/msg486368.html I couldn't reproduce the failure yet, but I'm removing this and patch 3/4 from the queue until this is investigated. Hmm, I see, I will do it. Thanks, dou. --- hw/i386/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index a0cf3bf..48525a1 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2411,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } mem_base = 1ULL << 32; mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +next_base = mem_base + mem_len; } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, i - 1, -- 2.5.5
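The substitution discussed above is behavior-preserving: after the hole is crossed, the old `next_base += (1ULL << 32) - pcms->below_4g_mem_size` and the new `next_base = mem_base + mem_len` compute the same value. A small standalone sketch of the build_srat() range split makes this checkable — note this is not QEMU code; the struct, the function name, and the use of a `hole` parameter in place of `pcms->below_4g_mem_size` are all illustrative:

```c
#include <stdint.h>

#define FOUR_G (1ULL << 32)

/* One SRAT memory range; a node that straddles the start of the PCI hole
 * below 4G is split into a low part and a part relocated above 4G. */
typedef struct {
    uint64_t base;
    uint64_t len;
} SratRange;

/* Sketch of the per-node loop body in build_srat(): given the running
 * next_base cursor and this node's size, fill up to two ranges and return
 * how many were emitted (0 for a memory-less node, which is the case the
 * "mem_len > 0" guard in the SRAT patch handles). */
static int srat_node_ranges(uint64_t *next_base, uint64_t node_mem,
                            uint64_t hole, SratRange out[2])
{
    uint64_t mem_base = *next_base;
    uint64_t mem_len = node_mem;
    int n = 0;

    *next_base = mem_base + mem_len;
    if (mem_base <= hole && *next_base > hole) {
        mem_len -= *next_base - hole;            /* low part, may be 0 */
        if (mem_len > 0) {
            out[n++] = (SratRange){ mem_base, mem_len };
        }
        mem_base = FOUR_G;
        mem_len = *next_base - hole;             /* remainder goes above 4G */
        *next_base = mem_base + mem_len;         /* the readable form */
    }
    if (mem_len > 0) {
        out[n++] = (SratRange){ mem_base, mem_len };
    }
    return n;
}
```

For example, with a hypothetical 3G hole start and two 2G nodes, node 0 yields one range [0, 2G) and node 1 yields [2G, 3G) plus [4G, 5G), with the cursor left at 5G — exactly what the rewritten assignment produces.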
[Qemu-devel] [PATCH v3] NUMA: Enable adding NUMA node implicitly
Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. 
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- changelog V2 --> V3: -Replace the callback function with a boolean parameter suggested by Igor -Use QTAILQ_EMPTY() macro to check the QemuOptsList hw/i386/pc.c | 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + include/hw/boards.h | 1 + include/sysemu/numa.h | 2 +- numa.c| 24 ++-- vl.c | 9 + 7 files changed, 32 insertions(+), 7 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 05985d4..f1a44cc 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2318,6 +2318,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->auto_enable_numa_with_memhp = true; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 9ff79b1..d87a433 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -449,6 +449,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->auto_enable_numa_with_memhp = false; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 6c4ec4b..68cbfc5 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -319,6 +319,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->auto_enable_numa_with_memhp = false; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff --git 
a/include/hw/boards.h b/include/hw/boards.h index 156e0a5..0fe2c8f 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -191,6 +191,7 @@ struct MachineClass { bool has_hotpluggable_cpus; bool ignore_memory_transaction_failures; int numa_mem_align_shift; +bool auto_enable_numa_with_memhp; void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, int nb_nodes, ram_addr_t size); diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 5c6df28..31d3ac0 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -30,7 +30,7 @@ struct NumaNodeMem { }; extern NodeInfo numa_info[MAX_NODES]; -void parse_numa_opts(MachineState *ms); +void parse_numa_opts(MachineState *ms, uint64_t ram_slots); void query_numa_node_mem(NumaNodeMem node_mem[]); extern QemuOptsList qemu_numa_opts; void numa_set_mem_node_id(ram_addr_t addr, uint64_t size, uint32_t node); diff --git a/numa.c b/numa.c index 100a67f..ba8d813 100644 --- a/numa.c +++ b/numa.c @@ -423,12 +423,32 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes, nodes[i].node_mem = size - usedmem; } -void parse_numa_opts(MachineState *ms) +void parse_numa_opts(MachineState *ms, uint64_t ram_slots) { int i; Mach
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
Hi Igor, At 09/21/2017 03:54 PM, Igor Mammedov wrote: On Thu, 21 Sep 2017 12:19:17 +0800 [...] I meant something like this: parse_numa_opts() { if (mc->auto_enable_numa_with_memhp == true) { qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); } } Oops, understood! ;-) where x86 sets mc->auto_enable_numa_with_memhp to true by default and sets it to false for 2.10 and older machine types. It will allow individual machines to enable the feature if they need it but prevent guest ABI breakage for old machine types (i.e. won't break migration) Got it. grep the source for 'pcmc->legacy_cpu_hotplug = true' to see how this compat stuff works. OK. Thank you very much. Thanks, dou
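Igor's suggestion boils down to a per-machine boolean gate. Stripped of QEMU's option machinery, the decision rule being discussed can be sketched as a pure predicate — the function name and signature here are made up for illustration, not QEMU API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the auto-enable rule: implicitly create a NUMA node only when
 * the machine type opts in (new machine types set
 * auto_enable_numa_with_memhp = true; 2.10 and older leave it false, so the
 * guest ABI and migration are preserved), memory hotplug is configured
 * (slots > 0), and the user passed no -numa options on the CLI. */
static bool should_auto_enable_numa(bool auto_enable_numa_with_memhp,
                                    uint64_t ram_slots,
                                    bool numa_opts_present)
{
    return auto_enable_numa_with_memhp && ram_slots > 0 && !numa_opts_present;
}
```

When the predicate holds, parse_numa_opts() would do the equivalent of qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true), i.e. behave as if "-numa node" had been given on the command line.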
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
Hi Igor, I am sorry I missed some comments you gave to me. my reply is below. At 09/18/2017 05:24 PM, Dou Liyang wrote: [...] ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. how about: s/auto_enable_numa_with_memhp/ Yes, really better than me, will do it. boolean instead, see below how it could improve patch. I am not really sure why do we want to make this function boolean. */ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue { BlockdevOptions *bdo; Location loc; @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) if (!opts) { exit(1); } +has_numa_config_in_CLI = true; break; case QEMU_OPTION_display: display_type = select_display(optarg); @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); +/* + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add + * NUMA nodes explicitly on CLI + * + * Enable NUMA implicitly for guest to know the maximum memory + * from ACPI SRAT table, which is used for SWIOTLB. 
+ */ +if (ram_slots > 0 && !has_numa_config_in_CLI) { +if (machine_class->numa_implicit_add_node0) { +machine_class->numa_implicit_add_node0(); +} +} parse_numa_opts(current_machine); it would be better to put this logic inside of parse_numa_opts() I'd suggest to move: current_machine->ram_size = ram_size; current_machine->maxram_size = maxram_size; current_machine->ram_slots = ram_slots; before parse_numa_opts() is called, and then handle the "memhp present + no numa on CLI" logic inside of parse_numa_opts(). With this you won't have to track 'has_numa_config_in_CLI', drop the callback numa_implicit_add_node0(), and the numa nuances would be in the place they are supposed to be: numa.c Does "dropping the callback..." mean: static void auto_enable_numa_with_memhp(QemuOptsList *list) { ... } void parse_numa_opts(MachineState *ms, uint64_t ram_slots) { QemuOptsList *numa_opts = qemu_find_opts("numa"); ... auto_enable_numa_with_memhp(numa_opts); ... } So, no matter what arch it is, if it supports NUMA, we will enable NUMA implicitly when it has already enabled memory hotplug by "slots=xx,maxmem=xx" CLI explicitly. I am not sure about that, but as far as I know this bug only affects x86, so there seems to be no need to affect other arches which support NUMA as well. Thanks, dou. if (qemu_opts_foreach(qemu_find_opts("mon"),
[Qemu-devel] [PATCH v2] NUMA: Enable adding NUMA node implicitly
Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. 
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- changelog V1 --> V2: -Move the logic from vl.c to numa.c suggested by Igor -Fix the guest ABI problem reported by Daniel -make the function name more understandable hw/i386/pc.c | 1 + hw/i386/pc_piix.c | 1 + hw/i386/pc_q35.c | 1 + include/hw/boards.h | 4 include/sysemu/numa.h | 3 ++- numa.c| 29 +++-- vl.c | 9 + 7 files changed, 41 insertions(+), 7 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..7a753ee 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2349,6 +2349,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->add_numa_node_implicitly = numa_add_node_implicitly; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index b03cc04..cc94334 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -452,6 +452,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass *m) m->is_default = 0; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); +m->add_numa_node_implicitly = NULL; } DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index c1cba58..7329819 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -317,6 +317,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; +m->add_numa_node_implicitly = NULL; } DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, diff 
--git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..c8c4f25 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @add_numa_node_implicitly: + *Enable NUMA implicitly by add a new NUMA node automatically. */ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*add_numa_node_implicitly)(QemuOptsList *list); }; /** diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 5c6df28..1660249 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -30,7 +30,7 @@ struct NumaNodeMem { }; extern NodeInfo numa_info[MAX_NODES]; -void parse_numa_opts(MachineState *ms); +void parse_numa_opts(MachineState *ms, uint64_t ram_slots); void query_numa_node_mem(NumaNodeMem node_mem[]); extern QemuOptsList q
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
Hi Igor, At 09/18/2017 05:08 PM, Igor Mammedov wrote: On Fri, 15 Sep 2017 16:33:18 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table for transfering NUMA configuration to the guest. So, the maximum memory in SRAT can be used to determine whether to use the swiotlb for IOMMU or not. However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will never be built. When memory hotplug is enabled, some guest's devices may start failing due to SWIOTLB is disabled. Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU parse NUMA options to enable adding NUMA node implicitly. I'd rewrite commit message with something like: Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * windows: won't enable memory hotplug without SRAT table at all * linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * windows: to enable memory hotplug * linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. It's more detailed and clear to understand, I will update the changelog in the next version. Thanks, dou. 
Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- hw/i386/pc.c| 6 ++ include/hw/boards.h | 4 vl.c| 14 ++ 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..3c40117 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,6 +2308,11 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) return ms->possible_cpus; } +static void numa_implicit_add_node0(void) +{ +qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); +} + static void x86_nmi(NMIState *n, int cpu_index, Error **errp) { /* cpu index isn't used */ @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->numa_implicit_add_node0 = numa_implicit_add_node0; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..898d841 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. 
+ * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. how about: s/auto_enable_numa_with_memhp/ boolean instead, see below how it could improve patch. */ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue {
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
At 09/18/2017 03:40 PM, Igor Mammedov wrote: On Mon, 18 Sep 2017 11:54:50 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: At 09/15/2017 06:05 PM, Dou Liyang wrote: Hi Daniel, At 09/15/2017 04:40 PM, Daniel P. Berrange wrote: On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote: In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table for transfering NUMA configuration to the guest. So, the maximum memory in SRAT can be used to determine whether to use the swiotlb for IOMMU or not. However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will never be built. When memory hotplug is enabled, some guest's devices may start failing due to SWIOTLB is disabled. Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU parse NUMA options to enable adding NUMA node implicitly. Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. 
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- hw/i386/pc.c| 6 ++ include/hw/boards.h | 4 vl.c| 14 ++ 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..3c40117 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,6 +2308,11 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) return ms->possible_cpus; } +static void numa_implicit_add_node0(void) +{ +qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); +} + static void x86_nmi(NMIState *n, int cpu_index, Error **errp) { /* cpu index isn't used */ @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->numa_implicit_add_node0 = numa_implicit_add_node0; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..898d841 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. 
*/ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue { BlockdevOptions *bdo; Location loc; @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) if (!opts) { exit(1); } +has_numa_config_in_CLI = true; break; case QEMU_OPTION_display: display_type = select_display(optarg); @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); +/* + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add + * NUMA nodes explicitly on CLI + * + * Enable NUMA implicitly for guest to know the maximum memory + * from ACPI SRAT table, which is used for SWIOTLB. + */ +if (ram_slots > 0 && !has_numa_config_in_CLI) { +if (machine_class->numa_implicit_add_node0) { +machine_class->numa_implicit_add_node0(); +} +} Won't this change guest ABI Hi Daniel, No, this will change the ACPI table in following case without NUMA con
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
At 09/15/2017 06:05 PM, Dou Liyang wrote: Hi Daniel, At 09/15/2017 04:40 PM, Daniel P. Berrange wrote: On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote: In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table for transfering NUMA configuration to the guest. So, the maximum memory in SRAT can be used to determine whether to use the swiotlb for IOMMU or not. However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will never be built. When memory hotplug is enabled, some guest's devices may start failing due to SWIOTLB is disabled. Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU parse NUMA options to enable adding NUMA node implicitly. Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. 
Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- hw/i386/pc.c| 6 ++ include/hw/boards.h | 4 vl.c| 14 ++ 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..3c40117 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,6 +2308,11 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) return ms->possible_cpus; } +static void numa_implicit_add_node0(void) +{ +qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); +} + static void x86_nmi(NMIState *n, int cpu_index, Error **errp) { /* cpu index isn't used */ @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->numa_implicit_add_node0 = numa_implicit_add_node0; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..898d841 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. 
*/ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue { BlockdevOptions *bdo; Location loc; @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) if (!opts) { exit(1); } +has_numa_config_in_CLI = true; break; case QEMU_OPTION_display: display_type = select_display(optarg); @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); +/* + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add + * NUMA nodes explicitly on CLI + * + * Enable NUMA implicitly for guest to know the maximum memory + * from ACPI SRAT table, which is used for SWIOTLB. + */ +if (ram_slots > 0 && !has_numa_config_in_CLI) { +if (machine_class->numa_implicit_add_node0) { +machine_class->numa_implicit_add_node0(); +} +} Won't this change guest ABI Hi Daniel, No, this will change the ACPI table in following case without NUMA configuration: ... -m 128M,slots=4,maxmem=1G ... How about fixing it by adding m->numa_implicit_add_node0 = NULL; in pc_i440fx/q35_2_
Re: [Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
Hi Daniel, At 09/15/2017 04:40 PM, Daniel P. Berrange wrote: On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote: In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT table for transfering NUMA configuration to the guest. So, the maximum memory in SRAT can be used to determine whether to use the swiotlb for IOMMU or not. However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will never be built. When memory hotplug is enabled, some guest's devices may start failing due to SWIOTLB is disabled. Add numa_implicit_add_node0 in struct MachineClass, Invoke it before QEMU parse NUMA options to enable adding NUMA node implicitly. Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- hw/i386/pc.c| 6 ++ include/hw/boards.h | 4 vl.c| 14 ++ 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..3c40117 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,6 +2308,11 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) return ms->possible_cpus; } +static void numa_implicit_add_node0(void) +{ +qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); +} + static void x86_nmi(NMIState *n, int cpu_index, Error **errp) { /* cpu index isn't used */ @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; 
mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->numa_implicit_add_node0 = numa_implicit_add_node0; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..898d841 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. */ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue { BlockdevOptions *bdo; Location loc; @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) if (!opts) { exit(1); } +has_numa_config_in_CLI = true; break; case QEMU_OPTION_display: display_type = select_display(optarg); @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); +/* + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add + * NUMA nodes explicitly on CLI + * + * Enable NUMA implicitly for guest to know the maximum memory + * from ACPI SRAT table, which is used for SWIOTLB. 
+ */ +if (ram_slots > 0 && !has_numa_config_in_CLI) { +if (machine_class->numa_implicit_add_node0) { +machine_class->numa_implicit_add_node0(); +} +} Won't this change guest ABI and so break migration/save/restore ? Thank you for your reply, I can't answer this immediately, I will look into it and reply you soon. This patch works for X86, has no influence on other arch, and we can regard it's functionality as "-numa node" in CLI. Thanks, dou. Regards, Daniel
[Qemu-devel] [RFC PATCH] NUMA: Enable adding NUMA node implicitly
In QEMU, if we enable NUMA and have nodes, QEMU will build an ACPI SRAT table for transferring the NUMA configuration to the guest. The maximum memory in the SRAT can then be used to determine whether to use the SWIOTLB for the IOMMU or not. However, if QEMU doesn't enable NUMA explicitly on the CLI, the SRAT will never be built. When memory hotplug is enabled, some guest devices may start failing because SWIOTLB is disabled. Add numa_implicit_add_node0 to struct MachineClass and invoke it before QEMU parses the NUMA options, to enable adding a NUMA node implicitly. Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> Suggested-by: Igor Mammedov <imamm...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Cc: Paolo Bonzini <pbonz...@redhat.com> Cc: Richard Henderson <r...@twiddle.net> Cc: Eduardo Habkost <ehabk...@redhat.com> Cc: "Michael S. Tsirkin" <m...@redhat.com> Cc: Marcel Apfelbaum <mar...@redhat.com> Cc: Igor Mammedov <imamm...@redhat.com> Cc: David Hildenbrand <da...@redhat.com> Cc: Thomas Huth <th...@redhat.com> Cc: Alistair Francis <alistai...@gmail.com> Cc: f4...@amsat.org Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> --- hw/i386/pc.c| 6 ++ include/hw/boards.h | 4 vl.c| 14 ++ 3 files changed, 24 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 2108104..3c40117 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,6 +2308,11 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms) return ms->possible_cpus; } +static void numa_implicit_add_node0(void) +{ +qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); +} + static void x86_nmi(NMIState *n, int cpu_index, Error **errp) { /* cpu index isn't used */ @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) mc->get_hotplug_handler = pc_get_hotpug_handler; mc->cpu_index_to_instance_props = pc_cpu_index_to_props; mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; +mc->numa_implicit_add_node0 =
numa_implicit_add_node0; mc->has_hotpluggable_cpus = true; mc->default_boot_order = "cad"; mc->hot_add_cpu = pc_hot_add_cpu; diff --git a/include/hw/boards.h b/include/hw/boards.h index 7f044d1..898d841 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -141,6 +141,8 @@ typedef struct { *should instead use "unimplemented-device" for all memory ranges where *the guest will attempt to probe for a device that QEMU doesn't *implement and a stub device is required. + * @numa_implicit_add_node0: + *Enable NUMA implicitly by add a NUMA node. */ struct MachineClass { /*< private >*/ @@ -191,6 +193,8 @@ struct MachineClass { CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine, unsigned cpu_index); const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine); + +void (*numa_implicit_add_node0)(void); }; /** diff --git a/vl.c b/vl.c index fb1f05b..814a5fa 100644 --- a/vl.c +++ b/vl.c @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) Error *main_loop_err = NULL; Error *err = NULL; bool list_data_dirs = false; +bool has_numa_config_in_CLI = false; typedef struct BlockdevOptions_queue { BlockdevOptions *bdo; Location loc; @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) if (!opts) { exit(1); } +has_numa_config_in_CLI = true; break; case QEMU_OPTION_display: display_type = select_display(optarg); @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); +/* + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add + * NUMA nodes explicitly on CLI + * + * Enable NUMA implicitly for guest to know the maximum memory + * from ACPI SRAT table, which is used for SWIOTLB. 
+ */ +if (ram_slots > 0 && !has_numa_config_in_CLI) { +if (machine_class->numa_implicit_add_node0) { +machine_class->numa_implicit_add_node0(); +} +} parse_numa_opts(current_machine); if (qemu_opts_foreach(qemu_find_opts("mon"), -- 2.5.5
Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled
Hi Igor, At 09/11/2017 06:58 PM, Igor Mammedov wrote: > > > > Igor's suggestion to enable NUMA implicitly sounds safer to me. > > > > I agree with Igor too. > > Is anybody doing this? If not, may I make a patch to enable adding NUMA > node implicitly first? Let's see what it looks like. As far as I'm aware nobody is doing it, so feel free to look into it. Got it! I will do it right now. Thanks, dou. > > Thanks, >dou. > >
Re: [Qemu-devel] [PATCH V5 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison
Hi Chen, At 09/04/2017 02:14 PM, Zhang Chen wrote: When the network is busy, some TCP options (like SACK) will unpredictably occur on the primary side or the secondary side. This makes the packet sizes differ, but the two packets' payloads are identical. COLO only cares about the packet payload, so we skip the option field. Signed-off-by: Zhang Chen--- net/colo-compare.c | 40 1 file changed, 28 insertions(+), 12 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index ca67c68..18a9ebf 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) +static int colo_packet_compare_common(Packet *ppkt, + Packet *spkt, + int poffset, + int soffset) { if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20]; @@ -201,12 +204,14 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) sec_ip_src, sec_ip_dst); } -offset = ppkt->vnet_hdr_len + offset; +poffset = ppkt->vnet_hdr_len + poffset; +soffset = ppkt->vnet_hdr_len + soffset; -if (ppkt->size == spkt->size) { -return memcmp(ppkt->data + offset, - spkt->data + offset, - spkt->size - offset); +if (ppkt->size == spkt->size || +ppkt->size - poffset == spkt->size - soffset) { This logic has a problem:

ppkt->data
|--------|------rangeP------|
         |                  |
      poffset          ppkt->size

spkt->data
|----|--------rangeS------------|
     |                          |
  soffset                  spkt->size

See the above picture: (ppkt->size == spkt->size) is true, and if [soffset, spkt->size] == [poffset, poffset + (spkt->size - soffset)] is also true, the code will return 0, but actually the packets are not equal. Please use the following code instead: if (ppkt->size - poffset == spkt->size - soffset) I am new to COLO; let's see what we actually want to compare. If I am wrong, please correct me.
:-)

ppkt->data
|---------|------rangeP------|
          |                  |
       poffset          ppkt->size

spkt->data
|-----|--------rangeS--------|
      |                      |
   soffset              spkt->size

The data in rangeP and rangeS is what we want to compare. So we just need to care about each range's size and head pointer, not the whole packet size. Thanks, dou.
Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled
Hi Eduardo, At 09/04/2017 09:08 PM, Eduardo Habkost wrote: [...] In my opinion, this may also add the hotpluggable memory; see the following comments. /* * Entry is required for Windows to enable memory hotplug in OS * and for Linux to enable SWIOTLB when booted with less than * 4G of RAM. Windows works better if the entry sets proximity * to the highest NUMA node in the machine. * Memory devices may override proximity set by this entry, * providing _PXM method if necessary. */ if (hotplugabble_address_space_size) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, pcms->hotplug_memory.base, hotplugabble_address_space_size, pcms->numa_nodes - 1, MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED); } You are correct, I didn't see that part of the code. If that's the entry that's missing, the patch makes sense. Thanks! However, the resulting tables still don't look correct: it will generate an entry assigned to NUMA node (uint32_t)-1 if no NUMA nodes are configured elsewhere, some APIC entries, but no entries for the rest of the memory. Yes, indeed. Igor's suggestion to enable NUMA implicitly sounds safer to me. I agree with Igor too. Is anybody doing this? If not, may I make a patch to enable adding a NUMA node implicitly first? Let's see what it looks like. Thanks, dou.
Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
Hi Igor, At 09/04/2017 07:11 PM, Igor Mammedov wrote: [...] +if (mem_base <= HOLE_640K_START && +next_base > HOLE_640K_START) { +mem_len -= next_base - HOLE_640K_START; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, mem_base, mem_len, i - 1, + MEM_AFFINITY_ENABLED); +} + +/* Check for the rare case: 640K < RAM < 1M */ +if (next_base <= HOLE_640K_END) { +next_base = HOLE_640K_END; Is this assignment really necessary? It is necessary, because we set mem_base to next_base before setting next_base; But, I can refine it: MEM_AFFINITY_ENABLED); } +mem_base = HOLE_640K_END; /* Check for the rare case: 640K < RAM < 1M */ if (next_base <= HOLE_640K_END) { -next_base = HOLE_640K_END; continue; } -mem_base = HOLE_640K_END; mem_len = next_base - HOLE_640K_END; } Is it? I was wrong, so just leave it as it is now. OK, I see. Thanks, dou.
Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
At 09/04/2017 05:39 PM, Igor Mammedov wrote: On Thu, 31 Aug 2017 20:04:26 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: From: Eduardo Habkost <ehabk...@redhat.com> Currently, Using the fisrt node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds ACPI SRAT, it regards node 0 as the default node to deal with the memory hole(640K-1M). this means the node0 must have some memory(>1M), but, actually it can have no memory. Fix this problem by cut out the 640K hole in the same way the PCI 4G hole does. Also do some cleanup. Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424..48525a1 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker) (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL); } +#define HOLE_640K_START (640 * 1024) +#define HOLE_640K_END (1024 * 1024) + static void build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) { @@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) next_base = 0; numa_start = table_data->len; -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED); -next_base = 1024 * 1024; for (i = 1; i < pcms->numa_nodes + 1; ++i) { mem_base = next_base; mem_len = pcms->node_mem[i - 1]; -if (i == 1) { -mem_len -= 1024 * 1024; -} next_base = mem_base + mem_len; +/* Cut out the 640K hole */ +if (mem_base <= HOLE_640K_START && +next_base > 
HOLE_640K_START) { +mem_len -= next_base - HOLE_640K_START; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, mem_base, mem_len, i - 1, + MEM_AFFINITY_ENABLED); +} + +/* Check for the rare case: 640K < RAM < 1M */ +if (next_base <= HOLE_640K_END) { +next_base = HOLE_640K_END; Is this assignment really necessary? It is necessary, because we set mem_base to next_base before setting next_base; But, I can refine it: MEM_AFFINITY_ENABLED); } +mem_base = HOLE_640K_END; /* Check for the rare case: 640K < RAM < 1M */ if (next_base <= HOLE_640K_END) { -next_base = HOLE_640K_END; continue; } -mem_base = HOLE_640K_END; mem_len = next_base - HOLE_640K_END; } Is it? Thanks, dou. it seems that next_base will be set at the start of the loop. +continue; +} +mem_base = HOLE_640K_END; +mem_len = next_base - HOLE_640K_END; +} + /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { @@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } mem_base = 1ULL << 32; mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +next_base = mem_base + mem_len; } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, i - 1,
Re: [Qemu-devel] [PATCH V4 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison
Hi Chen, At 09/02/2017 12:02 AM, Zhang Chen wrote: On 09/01/2017 05:35 PM, Dou Liyang wrote: Hi chen, At 08/21/2017 04:55 PM, Zhang Chen wrote: When network is busy, some tcp options(like sack) will unpredictable occur in primary side or secondary side. it will make packet size not same, but the two packet's payload is identical. colo just care about packet payload, so we skip the option field. Signed-off-by: Zhang Chen <zhangchen.f...@cn.fujitsu.com> --- net/colo-compare.c | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index ca67c68..f6bda41 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) +static int colo_packet_compare_common(Packet *ppkt, + Packet *spkt, + int poffset, + int soffset) { if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20]; @@ -201,12 +204,13 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) sec_ip_src, sec_ip_dst); } -offset = ppkt->vnet_hdr_len + offset; +poffset = ppkt->vnet_hdr_len + poffset; +soffset = ppkt->vnet_hdr_len + soffset; -if (ppkt->size == spkt->size) { -return memcmp(ppkt->data + offset, - spkt->data + offset, - spkt->size - offset); +if (ppkt->size == spkt->size || poffset != soffset) { +return memcmp(ppkt->data + poffset, + spkt->data + soffset, + spkt->size - soffset); Here maybe a mistake, I guess you should make sure (ppkt->size - poffset) == (spkt->size - soffset) is true in case of (poffset != soffset) at the same time. Because, if sack make the header not equal, it is also possible that the data is not equal at the same time. Good catch! In rare situation will miss a different packet, I will fix it in next version. 
Thank you very much! } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -263,13 +267,22 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) * so we just need skip this field. */ if (ptcp->th_off > 5) { -ptrdiff_t tcp_offset; +ptrdiff_t ptcp_offset, stcp_offset; -tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data - + (ptcp->th_off * 4) - ppkt->vnet_hdr_len; -res = colo_packet_compare_common(ppkt, spkt, tcp_offset); +ptcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data + + (ptcp->th_off * 4) - ppkt->vnet_hdr_len; +stcp_offset = spkt->transport_header - (uint8_t *)spkt->data + + (stcp->th_off * 4) - spkt->vnet_hdr_len; + +/* + * When network is busy, some tcp options(like sack) will unpredictable + * occur in primary side or secondary side. it will make packet size + * not same, but the two packet's payload is identical. colo just + * care about packet payload, so we skip the option field. + */ +res = colo_packet_compare_common(ppkt, spkt, ptcp_offset, stcp_offset); we add a new parameter in xxx_common() just for xxx_tcp(). In my opinion, it seems not good. Can we split the tcp related code out from xxx_common, and make the xxx_common function just do like what its name says. We introduce this parameter for all protocol, just tcp is the first user. We should have the ability of compare all kinds of packet. Maybe we must use different offset for other protocol except tcp in the future. Got it, Thanks, dou. Thanks Zhang Chen } else if (ptcp->th_sum == stcp->th_sum) { -res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN, ETH_HLEN); } else { res = -1; } @@ -329,6 +342,7 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) * the ip payload here. 
*/ ret = colo_packet_compare_common(ppkt, spkt, + network_header_length + ETH_HLEN, network_header_length + ETH_HLEN); ditto if (ret) { @@ -366,6 +380,7 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) * the ip payload here. */ if (colo_packet_compare_common(ppkt, spkt, +
Re: [Qemu-devel] [PATCH] x86/acpi: build SRAT when memory hotplug is enabled
Hi Eduardo, Thadeu, At 09/02/2017 12:11 AM, Eduardo Habkost wrote: On Fri, Sep 01, 2017 at 12:45:42PM -0300, Thadeu Lima de Souza Cascardo wrote: Linux uses SRAT to determine the maximum memory in a system, which is used to determine whether to use the swiotlb for IOMMU or not for a device that supports only 32 bits of addresses. Do you have a pointer to the corresponding Linux code, for reference? Which SRAT entries Linux uses to make this decision? When there is no NUMA configuration, qemu will not build SRAT. And when memory hotplug is done, some Linux device drivers start failing. Tested by running with -m 512M,slots=8,maxmem=1G, adding the memory, putting that online and using the system. Without the patch, swiotlb is not used and ATA driver fails. With the patch, swiotlb is used, no driver failure is observed. Signed-off-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> As far as I can see, this will only add APIC entries and a memory affinity entry for the first 640KB (which would be obviously wrong) if pcms->numa_nodes is 0. In my opinion, this may also add the hotpluggable memory; see the following comments. /* * Entry is required for Windows to enable memory hotplug in OS * and for Linux to enable SWIOTLB when booted with less than * 4G of RAM. Windows works better if the entry sets proximity * to the highest NUMA node in the machine. * Memory devices may override proximity set by this entry, * providing _PXM method if necessary. */ if (hotplugabble_address_space_size) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, pcms->hotplug_memory.base, hotplugabble_address_space_size, pcms->numa_nodes - 1, MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED); } Thanks, dou. Once we apply the "Fix SRAT memory building in case of node 0 without RAM" patch from Dou Liyang, no memory affinity entries will be generated if pcms->numa_nodes is 0. Would this cause the problem to happen again?
--- hw/i386/acpi-build.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424678..fb94249779 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2645,6 +2645,9 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) GArray *tables_blob = tables->table_data; AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL }; Object *vmgenid_dev; +ram_addr_t hotplugabble_address_space_size = +object_property_get_int(OBJECT(pcms), PC_MACHINE_MEMHP_REGION_SIZE, +NULL); acpi_get_pm_info(); acpi_get_misc_info(); @@ -2708,7 +2711,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) build_tpm2(tables_blob, tables->linker); } } -if (pcms->numa_nodes) { +if (pcms->numa_nodes || hotplugabble_address_space_size) { acpi_add_table(table_offsets, tables_blob); build_srat(tables_blob, tables->linker, machine); if (have_numa_distance) { -- 2.11.0
Re: [Qemu-devel] [PATCH V4 1/3] net/colo-compare.c: Optimize unpredictable tcp options comparison
Hi chen, At 08/21/2017 04:55 PM, Zhang Chen wrote: When network is busy, some tcp options(like sack) will unpredictable occur in primary side or secondary side. it will make packet size not same, but the two packet's payload is identical. colo just care about packet payload, so we skip the option field. Signed-off-by: Zhang Chen--- net/colo-compare.c | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index ca67c68..f6bda41 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -186,7 +186,10 @@ static int packet_enqueue(CompareState *s, int mode) * return:0 means packet same *> 0 || < 0 means packet different */ -static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) +static int colo_packet_compare_common(Packet *ppkt, + Packet *spkt, + int poffset, + int soffset) { if (trace_event_get_state(TRACE_COLO_COMPARE_MISCOMPARE)) { char pri_ip_src[20], pri_ip_dst[20], sec_ip_src[20], sec_ip_dst[20]; @@ -201,12 +204,13 @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset) sec_ip_src, sec_ip_dst); } -offset = ppkt->vnet_hdr_len + offset; +poffset = ppkt->vnet_hdr_len + poffset; +soffset = ppkt->vnet_hdr_len + soffset; -if (ppkt->size == spkt->size) { -return memcmp(ppkt->data + offset, - spkt->data + offset, - spkt->size - offset); +if (ppkt->size == spkt->size || poffset != soffset) { +return memcmp(ppkt->data + poffset, + spkt->data + soffset, + spkt->size - soffset); Here maybe a mistake, I guess you should make sure (ppkt->size - poffset) == (spkt->size - soffset) is true in case of (poffset != soffset) at the same time. Because, if sack make the header not equal, it is also possible that the data is not equal at the same time. } else { trace_colo_compare_main("Net packet size are not the same"); return -1; @@ -263,13 +267,22 @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt) * so we just need skip this field. 
*/ if (ptcp->th_off > 5) { -ptrdiff_t tcp_offset; +ptrdiff_t ptcp_offset, stcp_offset; -tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data - + (ptcp->th_off * 4) - ppkt->vnet_hdr_len; -res = colo_packet_compare_common(ppkt, spkt, tcp_offset); +ptcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data + + (ptcp->th_off * 4) - ppkt->vnet_hdr_len; +stcp_offset = spkt->transport_header - (uint8_t *)spkt->data + + (stcp->th_off * 4) - spkt->vnet_hdr_len; + +/* + * When network is busy, some tcp options(like sack) will unpredictable + * occur in primary side or secondary side. it will make packet size + * not same, but the two packet's payload is identical. colo just + * care about packet payload, so we skip the option field. + */ +res = colo_packet_compare_common(ppkt, spkt, ptcp_offset, stcp_offset); we add a new parameter in xxx_common() just for xxx_tcp(). In my opinion, it seems not good. Can we split the tcp related code out from xxx_common, and make the xxx_common function just do like what its name says. } else if (ptcp->th_sum == stcp->th_sum) { -res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN); +res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN, ETH_HLEN); } else { res = -1; } @@ -329,6 +342,7 @@ static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt) * the ip payload here. */ ret = colo_packet_compare_common(ppkt, spkt, + network_header_length + ETH_HLEN, network_header_length + ETH_HLEN); ditto if (ret) { @@ -366,6 +380,7 @@ static int colo_packet_compare_icmp(Packet *spkt, Packet *ppkt) * the ip payload here. */ if (colo_packet_compare_common(ppkt, spkt, + network_header_length + ETH_HLEN, network_header_length + ETH_HLEN)) { ditto trace_colo_compare_icmp_miscompare("primary pkt size", ppkt->size); @@ -403,7 +418,7 @@ static int colo_packet_compare_other(Packet *spkt, Packet *ppkt) sec_ip_src, sec_ip_dst); } -return colo_packet_compare_common(ppkt, spkt, 0); +return colo_packet_compare_common(ppkt, spkt, 0, 0); ditto Thanks, dou. 
} static int colo_old_packet_check_one(Packet *pkt, int64_t
[Qemu-devel] [PATCH v6 3/4] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
As QEMU supports memory-less nodes, it is possible that there is no RAM in the first NUMA node (also called node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for the guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 5 files changed, 24 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem new file mode 100644 index ..bc703d77ee48720af1b801c518b43511c95c852d GIT binary patch literal 5104 [base85-encoded binary data truncated in this archive]
[Qemu-devel] [PATCH v6 4/4] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
In QEMU, the number of NUMA nodes is determined by parse_numa_opts(). Then QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, memory_region_allocate_system_memory() uses MAX_NODES, not nb_numa_nodes. So replace MAX_NODES with nb_numa_nodes to keep the code consistent and reduce the number of loop iterations. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Reviewed-by: Igor Mammedov <imamm...@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org> --- numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/numa.c b/numa.c index e32af04..5f2916d 100644 --- a/numa.c +++ b/numa.c @@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, } memory_region_init(mr, owner, name, ram_size); -for (i = 0; i < MAX_NODES; i++) { +for (i = 0; i < nb_numa_nodes; i++) { uint64_t size = numa_info[i].node_mem; HostMemoryBackend *backend = numa_info[i].node_memdev; if (!backend) { -- 2.5.5
[Qemu-devel] [PATCH v6 0/4]ACPI: NUMA: Fix ACPI SRAT Memory Affinity building
v5 --> v6: - Split the unrelated code into a separate patch [2/4] v4 --> v5: - Replace the original way with Eduardo's method. - Rewrite the testcase. - Drop the SLIT data - The 2.11 development tree is open, so add the third patch for re-posting it. v3 --> v4: - Add a new testcase. This patchset fixes an ACPI building bug caused by having no RAM in the first NUMA node, and also adds a new testcase for the bug. Dou Liyang (3): hw/acpi-build: Make assignment statement of next_base easy to read ACPI/unit-test: Add a new testcase for RAM allocation in numa node NUMA: Replace MAX_NODES with nb_numa_nodes in for loop Eduardo Habkost (1): hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM hw/i386/acpi-build.c | 30 +++--- numa.c| 2 +- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 7 files changed, 48 insertions(+), 8 deletions(-) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem -- 2.5.5
[Qemu-devel] [PATCH v6 1/4] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
From: Eduardo Habkost <ehabk...@redhat.com> Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node 0 as the default node to deal with the memory hole (640K-1M). This means node 0 must have some memory (>1M), but actually it can have no memory. Fix this problem by cutting out the 640K hole in the same way the PCI 4G hole does. Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424..a0cf3bf 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker) (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL); } +#define HOLE_640K_START (640 * 1024) +#define HOLE_640K_END (1024 * 1024) + static void build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) { @@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) next_base = 0; numa_start = table_data->len; -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED); -next_base = 1024 * 1024; for (i = 1; i < pcms->numa_nodes + 1; ++i) { mem_base = next_base; mem_len = pcms->node_mem[i - 1]; -if (i == 1) { -mem_len -= 1024 * 1024; -} next_base = mem_base + mem_len; +/* Cut out the 640K hole */ +if (mem_base <= HOLE_640K_START && +next_base > HOLE_640K_START) { +mem_len -= next_base - HOLE_640K_START; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem,
mem_base, mem_len, i - 1, + MEM_AFFINITY_ENABLED); +} + +/* Check for the rare case: 640K < RAM < 1M */ +if (next_base <= HOLE_640K_END) { +next_base = HOLE_640K_END; +continue; +} +mem_base = HOLE_640K_END; +mem_len = next_base - HOLE_640K_END; +} + /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { -- 2.5.5
[Qemu-devel] [PATCH v6 2/4] hw/acpi-build: Make assignment statement of next_base easy to read
It may be hard to read the assignment statement of "next_base", so S/next_base += (1ULL << 32) - pcms->below_4g_mem_size; /next_base = mem_base + mem_len; ... for readability. No functionality change. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index a0cf3bf..48525a1 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2411,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } mem_base = 1ULL << 32; mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +next_base = mem_base + mem_len; } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, i - 1, -- 2.5.5
Re: [Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
Hi, Eduardo At 09/01/2017 05:36 AM, Eduardo Habkost wrote: On Thu, Aug 31, 2017 at 08:04:26PM +0800, Dou Liyang wrote: From: Eduardo Habkost <ehabk...@redhat.com> Currently, Using the fisrt node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds ACPI SRAT, it regards node 0 as the default node to deal with the memory hole(640K-1M). this means the node0 must have some memory(>1M), but, actually it can have no memory. Fix this problem by cut out the 640K hole in the same way the PCI 4G hole does. Also do some cleanup. Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424..48525a1 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker) (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL); } +#define HOLE_640K_START (640 * 1024) +#define HOLE_640K_END (1024 * 1024) + static void build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) { @@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) next_base = 0; numa_start = table_data->len; -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED); -next_base = 1024 * 1024; for (i = 1; i < pcms->numa_nodes + 1; ++i) { mem_base = next_base; mem_len = pcms->node_mem[i - 1]; -if (i == 1) { -mem_len -= 1024 * 1024; -} next_base = mem_base + mem_len; +/* Cut out the 640K hole */ +if (mem_base <= HOLE_640K_START && +next_base > HOLE_640K_START) { 
+mem_len -= next_base - HOLE_640K_START; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, mem_base, mem_len, i - 1, + MEM_AFFINITY_ENABLED); +} + +/* Check for the rare case: 640K < RAM < 1M */ +if (next_base <= HOLE_640K_END) { +next_base = HOLE_640K_END; +continue; +} +mem_base = HOLE_640K_END; +mem_len = next_base - HOLE_640K_END; +} + /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { @@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } mem_base = 1ULL << 32; mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +next_base = mem_base + mem_len; Is this extra change intentional? Yes, it is, Just for readability. :-) I find the code more readable with it, but it should go in a separate patch because it is unrelated to the bug fix. Indeed, I will split it out. Thanks, dou.
Re: [Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Hi Eduardo, At 08/31/2017 06:38 PM, Dou Liyang wrote: Hi Eduardo, [...] +continue; } Now the nodes will be out of order, if node 0 has no RAM. Why? Because the code parses the other nodes with RAM first, then parses node 0. Why not implement this in the same way the PCI 4GB hole is already implemented? e.g.: Indeed, it is better and more refined. I will do it in the next version. #define HOLE_640K_START (640 * 1024) #define HOLE_640K_END (1024 * 1024) for (node = 0; node < pcms->numa_nodes; node++) { mem_base = next_base; mem_len = pcms->node_mem[node]; next_base = mem_base + mem_len; /* Cut out the 640K hole */ if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) { mem_len -= next_base - HOLE_640K_START; if (mem_len > 0) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, node, MEM_AFFINITY_ENABLED); } mem_base = HOLE_640K_END; /* ... */ BTW, in case 0 |---|---||--- | | || mem_base 640K next_base 1M I want to add a check here, /* Check for the rare case: 640K < RAM < 1M */ if (next_base <= HOLE_640K_END) { next_base = HOLE_640K_END; continue; } mem_base = HOLE_640K_END; mem_len = next_base - HOLE_640K_END; I guess no one would set up a node with so little RAM, so is that necessary? I posted the v5 patches using your code. It may not be very clear; for the details, please see the new version. Thanks, dou. Thanks, dou. } /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { mem_len -= next_base - pcms->below_4g_mem_size; /* ... */ } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, node, MEM_AFFINITY_ENABLED); /* ... */ } -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, mem_base, mem_len, i - 1, - MEM_AFFINITY_ENABLED); +mem_base = build_srat_node_entry(table_data, pcms, i, +mem_base, pcms->node_mem[i]); } slots = (table_data->len - numa_start) / sizeof *numamem; for (; slots < pcms->numa_nodes + 2; slots++) { -- 2.5.5
[Qemu-devel] [PATCH v5 1/3] hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
From: Eduardo Habkost <ehabk...@redhat.com> Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node 0 as the default node to deal with the memory hole (640K-1M). This means node 0 must have some memory (>1M), but actually it can have no memory. Fix this problem by cutting out the 640K hole in the same way the PCI 4G hole is cut out. Also do some cleanup. Signed-off-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424..48525a1 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2318,6 +2318,9 @@ build_tpm2(GArray *table_data, BIOSLinker *linker) (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL); } +#define HOLE_640K_START (640 * 1024) +#define HOLE_640K_END (1024 * 1024) + static void build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) { @@ -2373,17 +2376,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) next_base = 0; numa_start = table_data->len; -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED); -next_base = 1024 * 1024; for (i = 1; i < pcms->numa_nodes + 1; ++i) { mem_base = next_base; mem_len = pcms->node_mem[i - 1]; -if (i == 1) { -mem_len -= 1024 * 1024; -} next_base = mem_base + mem_len; +/* Cut out the 640K hole */ +if (mem_base <= HOLE_640K_START && +next_base > HOLE_640K_START) { +mem_len -= next_base - HOLE_640K_START; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); 
+build_srat_memory(numamem, mem_base, mem_len, i - 1, + MEM_AFFINITY_ENABLED); +} + +/* Check for the rare case: 640K < RAM < 1M */ +if (next_base <= HOLE_640K_END) { +next_base = HOLE_640K_END; +continue; +} +mem_base = HOLE_640K_END; +mem_len = next_base - HOLE_640K_END; +} + /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { @@ -2395,7 +2411,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) } mem_base = 1ULL << 32; mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +next_base = mem_base + mem_len; } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, i - 1, -- 2.5.5
[Qemu-devel] [PATCH v5 2/3] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 5 files changed, 24 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem new file mode 100644 index ..bc703d77ee48720af1b801c518b43511c95c852d GIT binary patch literal 5104 zcmb7IZExE~5}qX~n_5aXWizjC8^LKVXmL$b76k9SQ%DEh9oG8b|DMfK1 z(gvt&6i`^ERG;xE-sw(qx*XF!z}jjPX%ajXzq zw)Zd3`{YZHwS3rmsXyOp`CsgNTR@<vTR!JB&=;FdvFUafY_pP4o8^j?D166dwOO$0 zpf)!u7SpS0h$*RMyVMXMh9Fd<8)dsug#^HNKWm`4@0V&;+OAv8v~jEHGz#F; zjOWu->obq*UN7;@fd>bkHW>7wH}lOw3;x+kz<+;?^`Xya)uvf@QYk*O7srT^929^Z z(63x0Z^@d%=E?%?G|FWMJruoz0sP@PWP*8-tlj#VIsH94;*3Ze}dmP>U!{4qE z3%v<Yz7ErCcB#hT4#;MK2C`YiFWtO!T^5Fk#KdZ<t6LE}4dlMgE@WXI7XGKIqaS zuod<Mo`5v918(>BYS9|OW!Z=at1=2Sc+mq77{0X)+xWdGVZ3hlYIS{M^y2LNdbhD6aE$wxA%i+Vt_Of<O(8ZM1^ za7QC~q$3G=ROHdi>TwcvY}08l4PJ2-Q=9}7b8sV#4=e3<oYGx9is~!MD2)WOm?_;w zMX^?`GffW_B$+nY%#ueKv-Id(SBn<|Bo0R?zdj#~ldzGg9px5vS6`T4h zr?fXL-n8ot@aLqC=R@mdG5);8JN{=aF2R<+`=qr6T!x1|{oo;j;&^P<YO5A(_uf8u z$a<-R7FR~o4s5?jj- <
[Qemu-devel] [PATCH v5 0/3] hw/acpi-build: Fix ACPI SRAT Memory Affinity building
v4 --> v5: - Replace the original approach with Eduardo's method. - Rewrite the testcase. - Drop the SLIT data. - The 2.11 development tree is open, so add a third patch for re-posting it. v3 --> v4: - Add a new testcase. This patchset fixes an ACPI building bug caused by having no RAM in the first NUMA node, and also adds a new testcase for the bug. Dou Liyang (2): ACPI/unit-test: Add a new testcase for RAM allocation in numa node NUMA: Replace MAX_NODES with nb_numa_nodes in for loop Eduardo Habkost (1): hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM hw/i386/acpi-build.c | 30 +++--- numa.c| 2 +- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 7 files changed, 48 insertions(+), 8 deletions(-) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem -- 2.5.5
[Qemu-devel] [PATCH v5 3/3] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
In QEMU, the number of NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, memory_region_allocate_system_memory() uses MAX_NODES, not nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and reduce the number of loop iterations. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> Reviewed-by: Igor Mammedov <imamm...@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org> --- numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/numa.c b/numa.c index e32af04..5f2916d 100644 --- a/numa.c +++ b/numa.c @@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, } memory_region_init(mr, owner, name, ram_size); -for (i = 0; i < MAX_NODES; i++) { +for (i = 0; i < nb_numa_nodes; i++) { uint64_t size = numa_info[i].node_mem; HostMemoryBackend *backend = numa_info[i].node_memdev; if (!backend) { -- 2.5.5
Re: [Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Hi Eduardo, [...] +continue; } Now the nodes will be out of order, if node 0 has no RAM. Why? Because the code parses the other nodes with RAM first, then parses node 0. Why not implement this in the same way the PCI 4GB hole is already implemented? e.g.: Indeed, it is better and more refined. I will do it in the next version. #define HOLE_640K_START (640 * 1024) #define HOLE_640K_END (1024 * 1024) for (node = 0; node < pcms->numa_nodes; node++) { mem_base = next_base; mem_len = pcms->node_mem[node]; next_base = mem_base + mem_len; /* Cut out the 640K hole */ if (mem_base <= HOLE_640K_START && next_base > HOLE_640K_START) { mem_len -= next_base - HOLE_640K_START; if (mem_len > 0) { numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, node, MEM_AFFINITY_ENABLED); } mem_base = HOLE_640K_END; /* ... */ BTW, in case 0 |---|---||--- | | || mem_base 640K next_base 1M I want to add a check here, /* Check for the rare case: 640K < RAM < 1M */ if (next_base <= HOLE_640K_END) { next_base = HOLE_640K_END; continue; } mem_base = HOLE_640K_END; mem_len = next_base - HOLE_640K_END; I guess no one would set up a node with so little RAM, so is that necessary? Thanks, dou. } /* Cut out the ACPI_PCI hole */ if (mem_base <= pcms->below_4g_mem_size && next_base > pcms->below_4g_mem_size) { mem_len -= next_base - pcms->below_4g_mem_size; /* ... */ } numamem = acpi_data_push(table_data, sizeof *numamem); build_srat_memory(numamem, mem_base, mem_len, node, MEM_AFFINITY_ENABLED); /* ... */ } -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, mem_base, mem_len, i - 1, - MEM_AFFINITY_ENABLED); +mem_base = build_srat_node_entry(table_data, pcms, i, +mem_base, pcms->node_mem[i]); } slots = (table_data->len - numa_start) / sizeof *numamem; for (; slots < pcms->numa_nodes + 2; slots++) { -- 2.5.5
[Qemu-devel] [PATCH v5 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- v4 --> v5: - rewrite the testcase. - Drop the SLIT date tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 5104 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 224 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 7788 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 224 bytes tests/bios-tables-test.c | 24 5 files changed, 24 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem new file mode 100644 index ..bc703d77ee48720af1b801c518b43511c95c852d GIT binary patch literal 5104 zcmb7IZExE~5}qX~n_5aXWizjC8^LKVXmL$b76k9SQ%DEh9oG8b|DMfK1 z(gvt&6i`^ERG;xE-sw(qx*XF!z}jjPX%ajXzq zw)Zd3`{YZHwS3rmsXyOp`CsgNTR@<vTR!JB&=;FdvFUafY_pP4o8^j?D166dwOO$0 zpf)!u7SpS0h$*RMyVMXMh9Fd<8)dsug#^HNKWm`4@0V&;+OAv8v~jEHGz#F; zjOWu->obq*UN7;@fd>bkHW>7wH}lOw3;x+kz<+;?^`Xya)uvf@QYk*O7srT^929^Z z(63x0Z^@d%=E?%?G|FWMJruoz0sP@PWP*8-tlj#VIsH94;*3Ze}dmP>U!{4qE z3%v<Yz7ErCcB#hT4#;MK2C`YiFWtO!T^5Fk#KdZ<t6LE}4dlMgE@WXI7XGKIqaS zuod<Mo`5v918(>BYS9|OW!Z=at1=2Sc+mq77{0X)+xWdGVZ3hlYIS{M^y2LNdbhD6aE$wxA%i+Vt_Of<O(8ZM1^ za7QC~q$3G=ROHdi>TwcvY}08l4PJ2-Q=9}7b8sV#4=e3<oYGx9is~!MD2)WOm?_;w zMX^?`GffW_B$+nY%#ueKv-Id(SBn<|Bo0R?zdj#~ldzGg9px5vS6`T4h zr?fXL-n8ot@aLqC=R@mdG5);8JN{=aF2R<+`=qr6T!x1|{oo;j;&^P<YO5A(_uf8u z$a<-R7FR~o4s5?jj- <
Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Igor, At 08/24/2017 08:33 PM, Igor Mammedov wrote: > test_acpi_one(" -numa node -numa node,mem=128", ); > > but, the DSDT didn't match the default one. because, if we support > NUMA, the DSDT will give us "_PXM" to map the CPU to node. Ok, looks like you'll have to include your variant of DSDT along with SRAT Yes, got it. I will modify and post this patch first. Thanks, dou.
Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Igor, At 08/24/2017 01:47 AM, Igor Mammedov wrote: On Wed, 23 Aug 2017 21:35:29 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: Hi Igor, At 08/23/2017 08:45 PM, Igor Mammedov wrote: On Wed, 23 Aug 2017 20:12:51 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: Hi Igor, At 08/23/2017 04:40 PM, Igor Mammedov wrote: On Tue, 22 Aug 2017 11:24:10 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 6463 bytes tests/acpi-test-data/pc/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 264 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes tests/bios-tables-test.c | 30 ++ 7 files changed, 30 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SLIT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SLIT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem considering only SRAT table has been changed and the other tables match with default blobs, I'd suggest to keep only Our testcase is: +test_acpi_one(" -m 128,slots=3,maxmem=1G" + " -numa node -numa node,mem=128" + " -numa dist,src=0,dst=1,val=21", + ); The DSDT and SLIT don't match with default blobs. do you actually need SLIT table /i.e. 
-numa dist/ for test at all? it looks not relevant for the test case at the hand, I'd suggest to drop '-numa dist' option for the test. OK, Got it, will drop '-numa dist' option in next version. So, they can't be dropped. I wonder what's changed, could you post DSDT diff here? Just like memory hot-plug cases, when we use the '-m 128 128,slots=3,maxmem=1G' option, As the ACPI spec said, There may be some Memory Device in the DSDT table. for your case '-numa node -numa node,mem=128', there is no need in enabling memory hotplug. Thank you very much for your kind explanation. Yes, I understood. If I recall it correctly the default memory for x86 is 128Mb, hence removing "-m" would probably make DSDT match default one. Yes, I removed the "-m": test_acpi_one(" -numa node -numa node,mem=128", ); but, the DSDT didn't match the default one. because, if we support NUMA, the DSDT will give us "_PXM" to map the CPU to node. --- a/tmp/asl-7QO54Y.dsl +++ b/tmp/asl-EWM54Y.dsl @@ -5,13 +5,13 @@ * * Disassembling to symbolic ASL+ operators * - * Disassembly of tests/acpi-test-data/pc/DSDT, Thu Aug 24 09:20:29 2017 + * Disassembly of /tmp/aml-1YM54Y, Thu Aug 24 09:20:29 2017 * * Original Table Header: * Signature"DSDT" - * Length 0x13EA (5098) + * Length 0x13F0 (5104) * Revision 0x01 32-bit table (V1), no 64-bit math support - * Checksum 0x78 + * Checksum 0x13 * OEM ID "BOCHS " * OEM Table ID "BXPCDSDT" * OEM Revision 0x0001 (1) @@ -783,6 +783,8 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPCDSDT", 0x0001) { COST (Zero, Arg0, Arg1, Arg2) } + +Name (_PXM, Zero) // _PXM: Device Proximity } } } Thanks, dou. [...]
Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Eduardo, At 08/24/2017 01:25 AM, Eduardo Habkost wrote: On Wed, Aug 23, 2017 at 09:35:29PM +0800, Dou Liyang wrote: Hi Igor, At 08/23/2017 08:45 PM, Igor Mammedov wrote: On Wed, 23 Aug 2017 20:12:51 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: Hi Igor, At 08/23/2017 04:40 PM, Igor Mammedov wrote: On Tue, 22 Aug 2017 11:24:10 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 6463 bytes tests/acpi-test-data/pc/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 264 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes tests/bios-tables-test.c | 30 ++ 7 files changed, 30 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SLIT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SLIT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem considering only SRAT table has been changed and the other tables match with default blobs, I'd suggest to keep only Our testcase is: +test_acpi_one(" -m 128,slots=3,maxmem=1G" + " -numa node -numa node,mem=128" + " -numa dist,src=0,dst=1,val=21", + ); The DSDT and SLIT don't match with default blobs. do you actually need SLIT table /i.e. 
-numa dist/ for test at all? it looks not relevant for the test case at the hand, I'd suggest to drop '-numa dist' option for the test. OK, Got it, will drop '-numa dist' option in next version. So, they can't be dropped. I wonder what's changed, could you post DSDT diff here? Just like memory hot-plug cases, when we use the '-m 128,slots=3,maxmem=1G' option, As the ACPI spec said, There may be some Memory Device in the DSDT table. Do you really need to use -m 128,slots=3,maxmem=1G to test your bug fix? I was wrong, As the default memory for x86 is 128Mb, I will remove this option to make one case just do one thing. Thanks, dou.
Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Igor, At 08/23/2017 08:45 PM, Igor Mammedov wrote: On Wed, 23 Aug 2017 20:12:51 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: Hi Igor, At 08/23/2017 04:40 PM, Igor Mammedov wrote: On Tue, 22 Aug 2017 11:24:10 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 6463 bytes tests/acpi-test-data/pc/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 264 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes tests/bios-tables-test.c | 30 ++ 7 files changed, 30 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SLIT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SLIT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem considering only SRAT table has been changed and the other tables match with default blobs, I'd suggest to keep only Our testcase is: +test_acpi_one(" -m 128,slots=3,maxmem=1G" + " -numa node -numa node,mem=128" + " -numa dist,src=0,dst=1,val=21", + ); The DSDT and SLIT don't match with default blobs. do you actually need SLIT table /i.e. -numa dist/ for test at all? it looks not relevant for the test case at the hand, I'd suggest to drop '-numa dist' option for the test. 
OK, Got it, will drop '-numa dist' option in next version. So, they can't be dropped. I wonder what's changed, could you post DSDT diff here? Just like memory hot-plug cases, when we use the '-m 128,slots=3,maxmem=1G' option, As the ACPI spec said, There may be some Memory Device in the DSDT table. (to get diff run 'make V=1 check' without DSDT.numamem being present) diff --git a/asl-YJ034Y.dsl b/asl-JLX34Y.dsl index c7b187b..6cd9e5d 100644 --- a/asl-YJ034Y.dsl +++ b/asl-JLX34Y.dsl @@ -5,13 +5,13 @@ * * Disassembling to symbolic ASL+ operators * - * Disassembly of tests/acpi-test-data/pc/DSDT, Wed Aug 23 21:22:56 2017 + * Disassembly of /tmp/aml-8IX34Y, Wed Aug 23 21:22:56 2017 * * Original Table Header: * Signature"DSDT" - * Length 0x13EA (5098) + * Length 0x193F (6463) * Revision 0x01 32-bit table (V1), no 64-bit math support - * Checksum 0x78 + * Checksum 0x7B * OEM ID "BOCHS " * OEM Table ID "BXPCDSDT" * OEM Revision 0x0001 (1) @@ -783,6 +783,8 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPCDSDT", 0x0001) { COST (Zero, Arg0, Arg1, Arg2) } + +Name (_PXM, Zero) // _PXM: Device Proximity } } } @@ -792,6 +794,310 @@ DefinitionBlock ("", "DSDT", 1, "BOCHS ", "BXPCDSDT", 0x0001) \_SB.CPUS.CSCN () } +Device (\_SB.PCI0.MHPD) +{ +Name (_HID, "PNP0A06" /* Generic Container Device */) // _HID: Hardware ID +Name (_UID, "Memory hotplug resources") // _UID: Unique ID +Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings +{ +IO (Decode16, +0x0A00, // Range Minimum +0x0A00, // Range Maximum +0x00, // Alignment +0x18, // Length +) +}) +OperationRegion (HPMR, SystemIO, 0x0A00, 0x18) +} + +Device (\_SB.MHPC) +{ +Name (_HID, "PNP0A06" /* Generic Container Device */) // _HID: Hardware ID +Name (_UID, "DIMM devices") // _UID: Unique ID +Name (MDNR, 0x03) +Field (\_SB.PCI0.MHPD.HPMR, DWordAcc, NoLock, Preserve) +{ +MRBL, 32, +MRBH, 32, +MRLL, 32, +MRLH, 32, +MPX,32 +} + +Field (\_SB.PCI0.MHPD.HPMR, ByteAcc, NoLock, WriteAsZeros) +{ +
Re: [Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
Hi Igor, At 08/23/2017 04:40 PM, Igor Mammedov wrote: On Tue, 22 Aug 2017 11:24:10 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote: As QEMU supports the memory-less node, it is possible that there is no RAM in the first numa node(also be called as node0). eg: ... \ -m 128,slots=3,maxmem=1G \ -numa node -numa node,mem=128M \ But, this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for guest. Suggested-by: Eduardo Habkost <ehabk...@redhat.com> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 6463 bytes tests/acpi-test-data/pc/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 264 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes tests/bios-tables-test.c | 30 ++ 7 files changed, 30 insertions(+) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SLIT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SLIT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem considering only SRAT table has been changed and the other tables match with default blobs, I'd suggest to keep only Our testcase is: +test_acpi_one(" -m 128,slots=3,maxmem=1G" + " -numa node -numa node,mem=128" + " -numa dist,src=0,dst=1,val=21", + ); The DSDT and SLIT don't match with default blobs. So, they can't be dropped. Thanks, dou. tests/acpi-test-data/[pc|q35]/SRAT.numamem in this patch and drop the rest of *.numamem tables as test case should fallback to default tables when .numamem variant doesn't exists
Re: [Qemu-devel] [PATCH v4 0/2] hw/acpi-build: Fix ACPI SRAT Memory Affinity building
Hi Michael, At 08/23/2017 10:48 AM, Michael S. Tsirkin wrote: On Tue, Aug 22, 2017 at 11:24:08AM +0800, Dou Liyang wrote: V3 --> v4: -add a new testcase. This patchset fixs an ACPI building bug which caused by no RAM in the first NUAM node. and also add a new testcase for the bug. thanks! Pls remember to ping or repost after release is out. OK, I will. Thanks, dou. Dou Liyang (2): hw/acpi-build: Fix SRAT memory building when there is no memory in node0 ACPI/unit-test: Add a new testcase for RAM allocation in numa node hw/i386/acpi-build.c | 78 ++ tests/acpi-test-data/pc/DSDT.numamem | Bin 0 -> 6463 bytes tests/acpi-test-data/pc/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/pc/SRAT.numamem | Bin 0 -> 264 bytes tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes tests/bios-tables-test.c | 30 + 8 files changed, 80 insertions(+), 28 deletions(-) create mode 100644 tests/acpi-test-data/pc/DSDT.numamem create mode 100644 tests/acpi-test-data/pc/SLIT.numamem create mode 100644 tests/acpi-test-data/pc/SRAT.numamem create mode 100644 tests/acpi-test-data/q35/DSDT.numamem create mode 100644 tests/acpi-test-data/q35/SLIT.numamem create mode 100644 tests/acpi-test-data/q35/SRAT.numamem -- 2.5.5
[Qemu-devel] [PATCH] NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
In QEMU, the number of NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example: for (i = 0; i < nb_numa_nodes; i++) However, memory_region_allocate_system_memory() uses MAX_NODES, not nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and reduce the number of loop iterations. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/numa.c b/numa.c index e32af04..5f2916d 100644 --- a/numa.c +++ b/numa.c @@ -567,7 +567,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner, } memory_region_init(mr, owner, name, ram_size); -for (i = 0; i < MAX_NODES; i++) { +for (i = 0; i < nb_numa_nodes; i++) { uint64_t size = numa_info[i].node_mem; HostMemoryBackend *backend = numa_info[i].node_memdev; if (!backend) { -- 2.5.5
[Qemu-devel] [PATCH v4 1/2] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory. Fix this problem by replacing node0 with the first node that has memory on it. Factor the per-node handling out into a new function. Also do some cleanup. Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> --- hw/i386/acpi-build.c | 78 +--- 1 file changed, 50 insertions(+), 28 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 98dd424..f93d712 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker) (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL); } +static uint64_t +build_srat_node_entry(GArray *table_data, PCMachineState *pcms, +int i, uint64_t mem_base, uint64_t mem_len) +{ +AcpiSratMemoryAffinity *numamem; +uint64_t next_base; + +next_base = mem_base + mem_len; + +/* Cut out the ACPI_PCI hole */ +if (mem_base <= pcms->below_4g_mem_size && +next_base > pcms->below_4g_mem_size) { +mem_len -= next_base - pcms->below_4g_mem_size; +if (mem_len > 0) { +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, mem_base, mem_len, i, + MEM_AFFINITY_ENABLED); +} +mem_base = 1ULL << 32; +mem_len = next_base - pcms->below_4g_mem_size; +next_base += (1ULL << 32) - pcms->below_4g_mem_size; +} +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, mem_base, mem_len, i, + MEM_AFFINITY_ENABLED); +return next_base; +} + static void build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) { 
AcpiSystemResourceAffinityTable *srat; AcpiSratMemoryAffinity *numamem; -int i; +int i, node; int srat_start, numa_start, slots; -uint64_t mem_len, mem_base, next_base; +uint64_t mem_len, mem_base; MachineClass *mc = MACHINE_GET_CLASS(machine); const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine); PCMachineState *pcms = PC_MACHINE(machine); @@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine) /* the memory map is a bit tricky, it contains at least one hole * from 640k-1M and possibly another one from 3.5G-4G. */ -next_base = 0; + numa_start = table_data->len; -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED); -next_base = 1024 * 1024; -for (i = 1; i < pcms->numa_nodes + 1; ++i) { -mem_base = next_base; -mem_len = pcms->node_mem[i - 1]; -if (i == 1) { -mem_len -= 1024 * 1024; +/* get the first node which has memory and map the hole from 640K-1M */ +for (node = 0; node < pcms->numa_nodes; node++) { +if (pcms->node_mem[node] != 0) { +break; } -next_base = mem_base + mem_len; - -/* Cut out the ACPI_PCI hole */ -if (mem_base <= pcms->below_4g_mem_size && -next_base > pcms->below_4g_mem_size) { -mem_len -= next_base - pcms->below_4g_mem_size; -if (mem_len > 0) { -numamem = acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, mem_base, mem_len, i - 1, - MEM_AFFINITY_ENABLED); -} -mem_base = 1ULL << 32; -mem_len = next_base - pcms->below_4g_mem_size; -next_base += (1ULL << 32) - pcms->below_4g_mem_size; +} +numamem = acpi_data_push(table_data, sizeof *numamem); +build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED); + +/* map the rest of memory from 1M */ +mem_base = 1024 * 1024; +mem_len = pcms->node_mem[node] - mem_base; +mem_base = build_srat_node_entry(table_data, pcms, node, +mem_base, mem_len); + +for (i = 0; i < pcms->numa_nodes; i++) { +if (i == node) { +continue; } -numamem = 
acpi_data_push(table_data, sizeof *numamem); -build_srat_memory(numamem, mem_base, mem_len, i - 1, -
[Qemu-devel] [PATCH v4 0/2] hw/acpi-build: Fix ACPI SRAT Memory Affinity building
V3 --> v4:
 - add a new testcase.

This patchset fixes an ACPI building bug caused by having no RAM in the first NUMA node, and also adds a new testcase for the bug.

Dou Liyang (2):
  hw/acpi-build: Fix SRAT memory building when there is no memory in node0
  ACPI/unit-test: Add a new testcase for RAM allocation in numa node

 hw/i386/acpi-build.c                  | 78 ++
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c              | 30 +
 8 files changed, 80 insertions(+), 28 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

--
2.5.5
[Qemu-devel] [PATCH v4 2/2] ACPI/unit-test: Add a new testcase for RAM allocation in numa node
As QEMU supports memory-less nodes, it is possible that there is no RAM in the first NUMA node (also called node0), eg:

    ... \
    -m 128,slots=3,maxmem=1G \
    -numa node -numa node,mem=128M \

But this makes it hard for QEMU to build a known-to-work ACPI SRAT table. Only fixing it is not enough. Add a testcase for this situation to make sure the ACPI table is correct for the guest.

Suggested-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 tests/acpi-test-data/pc/DSDT.numamem  | Bin 0 -> 6463 bytes
 tests/acpi-test-data/pc/SLIT.numamem  | Bin 0 -> 48 bytes
 tests/acpi-test-data/pc/SRAT.numamem  | Bin 0 -> 264 bytes
 tests/acpi-test-data/q35/DSDT.numamem | Bin 0 -> 9147 bytes
 tests/acpi-test-data/q35/SLIT.numamem | Bin 0 -> 48 bytes
 tests/acpi-test-data/q35/SRAT.numamem | Bin 0 -> 264 bytes
 tests/bios-tables-test.c              | 30 ++
 7 files changed, 30 insertions(+)
 create mode 100644 tests/acpi-test-data/pc/DSDT.numamem
 create mode 100644 tests/acpi-test-data/pc/SLIT.numamem
 create mode 100644 tests/acpi-test-data/pc/SRAT.numamem
 create mode 100644 tests/acpi-test-data/q35/DSDT.numamem
 create mode 100644 tests/acpi-test-data/q35/SLIT.numamem
 create mode 100644 tests/acpi-test-data/q35/SRAT.numamem

diff --git a/tests/acpi-test-data/pc/DSDT.numamem b/tests/acpi-test-data/pc/DSDT.numamem
new file mode 100644
index ..53f6d5824359ff0ca20179b19b3b5cb79f017f82
GIT binary patch
literal 6463
[base85-encoded binary table blob omitted]
Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Hi, Igor

[...]

Ok, I will do it right now.

see commit 6b9c1dd2c for an example of adding a test case. Currently the test has base SRAT tables that are reused by all tests. When you create a testcase that produces a different SRAT table, 'make check' will print warnings that tables do not match; you can add the "V=1" env. var. ('make V=1 check') to get a detailed diff of what's changed.

also see how 'test_data.variant' is handled to get an idea of how to use it to differentiate a specific test case vs the shared ACPI tables baseline blobs.

Thanks for teaching me that, it's very helpful to me.

Apart from adding a new case, can we extend the memhp testcase with a node-RAM check? Just like the following shows:

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 564da45..c79ece4 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -788,7 +788,7 @@ static void test_acpi_q35_tcg_memhp(void)
     data.machine = MACHINE_Q35;
     data.variant = ".memhp";
     test_acpi_one(" -m 128,slots=3,maxmem=1G"
-                  " -numa node -numa node"
+                  " -numa node -numa node,mem=128"
                   " -numa dist,src=0,dst=1,val=21", &data);
     free_test_data(&data);
@@ -802,7 +802,7 @@ static void test_acpi_piix4_tcg_memhp(void)
     data.machine = MACHINE_PC;
     data.variant = ".memhp";
     test_acpi_one(" -m 128,slots=3,maxmem=1G"
-                  " -numa node -numa node"
+                  " -numa node -numa node,mem=128"
                   " -numa dist,src=0,dst=1,val=21", &data);
     free_test_data(&data);

Thanks, dou.
Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Hi Eduardo,

At 08/19/2017 12:48 AM, Eduardo Habkost wrote:
On Thu, Aug 17, 2017 at 01:58:40PM +0800, Dou Liyang wrote:
Hi Igor,

I tested this patch with the following guests:
1. RHEL 6.5 with Linux 2.6.32
2. RHEL 7.0 with Linux 3.10.0
3. Fedora 23 with Linux 4.13.0-rc5
4. Windows Server 2003
5. Windows 7
6. Windows 10

What's the command-line(s) you have tested with each OS? Have you tested both the node0-with-RAM and node0-without-RAM cases?

I am sorry; one of the command-lines for the node0-without-RAM case:

./x86_64-softmmu/qemu-system-x86_64 \
    -hda /home/douly/image/rhel6.5.qcow2 \
    -m 2G,slots=4,maxmem=32G \
    -enable-kvm \
    -smp 2,maxcpus=8,sockets=2,cores=2,threads=2 \
    -object memory-backend-ram,id=mem0,size=1024M \
    -object memory-backend-ram,id=mem1,size=1024M \
    -object memory-backend-ram,id=mem2,size=1024M \
    -object memory-backend-ram,id=mem3,size=1024M \
    -device pc-dimm,id=dimm0,memdev=mem0,slot=0,node=0 \
    -device pc-dimm,id=dimm1,memdev=mem1,slot=1,node=1 \
    -device pc-dimm,id=dimm2,memdev=mem2,slot=2,node=2 \
    -device pc-dimm,id=dimm3,memdev=mem3,slot=3,node=3 \
    -device qemu64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0 \
    -device qemu64-x86_64-cpu,id=cpu3,socket-id=0,core-id=1,thread-id=1 \
    -device qemu64-x86_64-cpu,id=cpu4,socket-id=1,core-id=0,thread-id=0 \
    -device qemu64-x86_64-cpu,id=cpu5,socket-id=1,core-id=0,thread-id=1 \
    -device qemu64-x86_64-cpu,id=cpu6,socket-id=1,core-id=1,thread-id=0 \
    -device qemu64-x86_64-cpu,id=cpu7,socket-id=1,core-id=1,thread-id=1 \
    -numa node,nodeid=0,cpus=0-1 \
    -numa node,mem=2G,nodeid=1,cpus=2-3 \
    -numa node,nodeid=2,cpus=4-5 \
    -numa node,nodeid=3,cpus=6-7 \
    -serial stdio \
    -monitor telnet:127.0.0.1:,server,nowait \
#   -kernel /home/douly/openSource/linux/arch/x86_64/boot/bzImage \
#   -append "root=/dev/mapper/fedora_s3101490-root console=tty0 console=ttyS0 earlyprintk=ttyS0,115200n8 movable_node" \
#   -initrd /home/douly/openSource/initramfs_image/4.13.0.img \

Yes, I tested three cases for each guest:

Case 1) all nodes with RAM:
    -numa node,nodeid=0,cpus=0-1 \
    -numa node,nodeid=1,cpus=2-3 \
    -numa node,nodeid=2,cpus=4-5 \
    -numa node,nodeid=3,cpus=6-7 \

Case 2) node0-with-RAM:
    -numa node,mem=2G,nodeid=0,cpus=0-1 \
    -numa node,nodeid=1,cpus=2-3 \
    -numa node,nodeid=2,cpus=4-5 \
    -numa node,nodeid=3,cpus=6-7 \

Case 3) node0-without-RAM:
    -numa node,nodeid=0,cpus=0-1 \
    -numa node,mem=2G,nodeid=1,cpus=2-3 \
    -numa node,nodeid=2,cpus=4-5 \
    -numa node,nodeid=3,cpus=6-7 \

I would be interested in a demonstration that no bytes in the ACPI table are changed by this patch when there is some RAM configured in node 0. (Is that already covered by our existing ACPI test cases?)

Yes, you are right, there is no change if node 0 has some RAM. This patch only takes effect in the node0-without-RAM situation.

I am not sure; could you tell me where the SRAT test cases are? It seems there are no test cases about SRAT in bios-tables-test.c.

A new test case in bios-tables-test.c for the bug you are fixing would be nice to have.

Ok, I will do it right now.

Thanks, dou.

At 08/16/2017 09:45 AM, Dou Liyang wrote:
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory. Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.
Re: [Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Hi Igor,

I tested this patch with the following guests:
1. RHEL 6.5 with Linux 2.6.32
2. RHEL 7.0 with Linux 3.10.0
3. Fedora 23 with Linux 4.13.0-rc5
4. Windows Server 2003
5. Windows 7
6. Windows 10

Thanks, dou.

At 08/16/2017 09:45 AM, Dou Liyang wrote:
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory. Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>

[...]
Re: [Qemu-devel] [PATCH v2] hw/acpi: Select an node with memory for mapping memory hole to
Hi Igor,

At 08/16/2017 05:18 PM, Igor Mammedov wrote:
On Wed, 16 Aug 2017 09:26:51 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:

Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory. Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.

It seems harmless, but one never knows for sure; could you test it with different guests, including old Windows (up to XP) / Linux (2.6 stable kernel) versions?

I have tested it with the following guests:
Windows 7.
Linux 4.13-rc5.

Will test it on Linux 2.6 and 3.10 stable kernels.

Thanks, dou.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>

[...]
[Qemu-devel] [PATCH v3] hw/acpi-build: Fix SRAT memory building when there is no memory in node0
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory.

Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V3 --> V2:
 - Modify the title

V2 --> V1:
 - Fix a coding style problem. Replace

       for (node = 0; node < pcms->numa_nodes && pcms->node_mem[node] == 0; node++);

   with

       for (node = 0; node < pcms->numa_nodes; node++) {
           if (pcms->node_mem[node] != 0) {
               break;
           }
       }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
                  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+                      int i, uint64_t mem_base, uint64_t mem_len)
+{
+    AcpiSratMemoryAffinity *numamem;
+    uint64_t next_base;
+
+    next_base = mem_base + mem_len;
+
+    /* Cut out the ACPI_PCI hole */
+    if (mem_base <= pcms->below_4g_mem_size &&
+        next_base > pcms->below_4g_mem_size) {
+        mem_len -= next_base - pcms->below_4g_mem_size;
+        if (mem_len > 0) {
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+            build_srat_memory(numamem, mem_base, mem_len, i,
+                              MEM_AFFINITY_ENABLED);
+        }
+        mem_base = 1ULL << 32;
+        mem_len = next_base - pcms->below_4g_mem_size;
+        next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    }
+    numamem = acpi_data_push(table_data, sizeof *numamem);
+    build_srat_memory(numamem, mem_base, mem_len, i,
+                      MEM_AFFINITY_ENABLED);
+    return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratMemoryAffinity *numamem;
-    int i;
+    int i, node;
     int srat_start, numa_start, slots;
-    uint64_t mem_len, mem_base, next_base;
+    uint64_t mem_len, mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
     PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     /* the memory map is a bit tricky, it contains at least one hole
      * from 640k-1M and possibly another one from 3.5G-4G.
      */
-    next_base = 0;
+
     numa_start = table_data->len;
-    numamem = acpi_data_push(table_data, sizeof *numamem);
-    build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-    next_base = 1024 * 1024;
-    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-        mem_base = next_base;
-        mem_len = pcms->node_mem[i - 1];
-        if (i == 1) {
-            mem_len -= 1024 * 1024;
+    /* get the first node which has memory and map the hole from 640K-1M */
+    for (node = 0; node < pcms->numa_nodes; node++) {
+        if (pcms->node_mem[node] != 0) {
+            break;
         }
-        next_base = mem_base + mem_len;
-
-        /* Cut out the ACPI_PCI hole */
-        if (mem_base <= pcms->below_4g_mem_size &&
-            next_base > pcms->below_4g_mem_size) {
-            mem_len -= next_base - pcms->below_4g_mem_size;
-            if (mem_len > 0) {
-                numamem = acpi_data_push(table_data, sizeof *numamem);
-                build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                                  MEM_AFFINITY_ENABLED);
-            }
-            mem_base = 1ULL << 32;
-            mem_len = next_base - pcms->below_4g_mem_size;
-            next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    }
+    numamem = acpi_data_push(table_data, sizeof *numamem);
+    build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+    /* map the rest of memory from 1M */
+    mem_base = 1024 * 1024;
+    mem_len = pcms->node_mem[node] - mem_base;
+    mem_base = build
[Qemu-devel] [PATCH v2] hw/acpi: Select an node with memory for mapping memory hole to
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory.

Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
V2 --> V1:
 - Fix a coding style problem. Replace

       for (node = 0; node < pcms->numa_nodes && pcms->node_mem[node] == 0; node++);

   with

       for (node = 0; node < pcms->numa_nodes; node++) {
           if (pcms->node_mem[node] != 0) {
               break;
           }
       }

 hw/i386/acpi-build.c | 78 +---
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..f93d712 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
                  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+                      int i, uint64_t mem_base, uint64_t mem_len)
+{
+    AcpiSratMemoryAffinity *numamem;
+    uint64_t next_base;
+
+    next_base = mem_base + mem_len;
+
+    /* Cut out the ACPI_PCI hole */
+    if (mem_base <= pcms->below_4g_mem_size &&
+        next_base > pcms->below_4g_mem_size) {
+        mem_len -= next_base - pcms->below_4g_mem_size;
+        if (mem_len > 0) {
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+            build_srat_memory(numamem, mem_base, mem_len, i,
+                              MEM_AFFINITY_ENABLED);
+        }
+        mem_base = 1ULL << 32;
+        mem_len = next_base - pcms->below_4g_mem_size;
+        next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    }
+    numamem = acpi_data_push(table_data, sizeof *numamem);
+    build_srat_memory(numamem, mem_base, mem_len, i,
+                      MEM_AFFINITY_ENABLED);
+    return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratMemoryAffinity *numamem;
-    int i;
+    int i, node;
     int srat_start, numa_start, slots;
-    uint64_t mem_len, mem_base, next_base;
+    uint64_t mem_len, mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
     PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,30 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     /* the memory map is a bit tricky, it contains at least one hole
      * from 640k-1M and possibly another one from 3.5G-4G.
      */
-    next_base = 0;
+
     numa_start = table_data->len;
-    numamem = acpi_data_push(table_data, sizeof *numamem);
-    build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-    next_base = 1024 * 1024;
-    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-        mem_base = next_base;
-        mem_len = pcms->node_mem[i - 1];
-        if (i == 1) {
-            mem_len -= 1024 * 1024;
+    /* get the first node which has memory and map the hole from 640K-1M */
+    for (node = 0; node < pcms->numa_nodes; node++) {
+        if (pcms->node_mem[node] != 0) {
+            break;
         }
-        next_base = mem_base + mem_len;
-
-        /* Cut out the ACPI_PCI hole */
-        if (mem_base <= pcms->below_4g_mem_size &&
-            next_base > pcms->below_4g_mem_size) {
-            mem_len -= next_base - pcms->below_4g_mem_size;
-            if (mem_len > 0) {
-                numamem = acpi_data_push(table_data, sizeof *numamem);
-                build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                                  MEM_AFFINITY_ENABLED);
-            }
-            mem_base = 1ULL << 32;
-            mem_len = next_base - pcms->below_4g_mem_size;
-            next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    }
+    numamem = acpi_data_push(table_data, sizeof *numamem);
+    build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+    /* map the rest of memory from 1M */
+    mem_base = 1024 * 1024;
+    mem_len = pcms->node_mem[node] - mem_base;
+    mem_base = build_srat_node_entry(table_data,
Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not
Hi Igor,

At 08/15/2017 04:04 PM, Igor Mammedov wrote:
On Tue, 15 Aug 2017 09:26:46 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:
Hi Eduardo,

At 08/14/2017 08:44 PM, Eduardo Habkost wrote:
On Mon, Aug 14, 2017 at 06:11:11PM +0800, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M) in the first place. Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
         if (i == nb_numa_nodes) {
             assert(mc->numa_auto_assign_ram);
             mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+        } else if (i != 0) {
+            error_report("The first NUMA node must have some memory"
+                         " for building ACPI SRAT");
+            exit(1);

This doesn't belong in numa.c. numa.c is generic code, and the requirement you described is specific to PC. Anyway, adding this check would make existing VM configurations refuse to run after a QEMU upgrade. I suggest fixing the bug in the ACPI code instead.

I see. If fixing the bug in the ACPI code, I have two solutions:

1) Add a check in build_srat(). If the first node has no memory, QEMU will exit.
2) Fix the initialization of the memory affinity structure to cover this situation, using the first node which has memory to deal with the memory hole.

I prefer solution 2. What about you?

I'd go for the 2nd solution.

Yeah, I see. And I just sent the 2nd solution; waiting for your reply.

Thanks, dou.
[Qemu-devel] [PATCH] hw/acpi: Select an node with memory for mapping memory hole to
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M). This means node0 must have some memory (>1M), but actually it can have no memory.

Fix this problem by replacing node0 with the first node which has memory on it. Add a new function to build the SRAT entry for each node. Also do some cleanup.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 hw/i386/acpi-build.c | 76 +---
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 98dd424..e5f57d2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2318,15 +2318,43 @@ build_tpm2(GArray *table_data, BIOSLinker *linker)
                  (void *)tpm2_ptr, "TPM2", sizeof(*tpm2_ptr), 4, NULL, NULL);
 }
 
+static uint64_t
+build_srat_node_entry(GArray *table_data, PCMachineState *pcms,
+                      int i, uint64_t mem_base, uint64_t mem_len)
+{
+    AcpiSratMemoryAffinity *numamem;
+    uint64_t next_base;
+
+    next_base = mem_base + mem_len;
+
+    /* Cut out the ACPI_PCI hole */
+    if (mem_base <= pcms->below_4g_mem_size &&
+        next_base > pcms->below_4g_mem_size) {
+        mem_len -= next_base - pcms->below_4g_mem_size;
+        if (mem_len > 0) {
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+            build_srat_memory(numamem, mem_base, mem_len, i,
+                              MEM_AFFINITY_ENABLED);
+        }
+        mem_base = 1ULL << 32;
+        mem_len = next_base - pcms->below_4g_mem_size;
+        next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    }
+    numamem = acpi_data_push(table_data, sizeof *numamem);
+    build_srat_memory(numamem, mem_base, mem_len, i,
+                      MEM_AFFINITY_ENABLED);
+    return next_base;
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
     AcpiSystemResourceAffinityTable *srat;
     AcpiSratMemoryAffinity *numamem;
-    int i;
+    int i, node;
     int srat_start, numa_start, slots;
-    uint64_t mem_len, mem_base, next_base;
+    uint64_t mem_len, mem_base;
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     const CPUArchIdList *apic_ids = mc->possible_cpu_arch_ids(machine);
     PCMachineState *pcms = PC_MACHINE(machine);
@@ -2370,36 +2398,28 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
     /* the memory map is a bit tricky, it contains at least one hole
      * from 640k-1M and possibly another one from 3.5G-4G.
      */
-    next_base = 0;
+
     numa_start = table_data->len;
+    /* get the first node which has memory and map the hole from 640K-1M */
+    for (node = 0;
+        node < pcms->numa_nodes && pcms->node_mem[node] == 0;
+        node++);
     numamem = acpi_data_push(table_data, sizeof *numamem);
-    build_srat_memory(numamem, 0, 640 * 1024, 0, MEM_AFFINITY_ENABLED);
-    next_base = 1024 * 1024;
-    for (i = 1; i < pcms->numa_nodes + 1; ++i) {
-        mem_base = next_base;
-        mem_len = pcms->node_mem[i - 1];
-        if (i == 1) {
-            mem_len -= 1024 * 1024;
-        }
-        next_base = mem_base + mem_len;
-
-        /* Cut out the ACPI_PCI hole */
-        if (mem_base <= pcms->below_4g_mem_size &&
-            next_base > pcms->below_4g_mem_size) {
-            mem_len -= next_base - pcms->below_4g_mem_size;
-            if (mem_len > 0) {
-                numamem = acpi_data_push(table_data, sizeof *numamem);
-                build_srat_memory(numamem, mem_base, mem_len, i - 1,
-                                  MEM_AFFINITY_ENABLED);
-            }
-            mem_base = 1ULL << 32;
-            mem_len = next_base - pcms->below_4g_mem_size;
-            next_base += (1ULL << 32) - pcms->below_4g_mem_size;
+    build_srat_memory(numamem, 0, 640 * 1024, node, MEM_AFFINITY_ENABLED);
+
+    /* map the rest of memory from 1M */
+    mem_base = 1024 * 1024;
+    mem_len = pcms->node_mem[node] - mem_base;
+    mem_base = build_srat_node_entry(table_data, pcms, node,
+                                     mem_base, mem_len);
+
+    for (i = 0; i < pcms->numa_nodes; i++) {
+        if (i == node) {
+            continue;
         }
-        numamem = acpi_data_push(table_data, sizeof *numamem);
-        build_srat_memory(numamem, mem_base, mem_len, i
- 1, - MEM_AFFINITY_ENABLED); +mem_base = build_srat_node_ent
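The hole-cutting arithmetic in the patch can be modeled as a small self-contained sketch (the struct and function names below are illustrative, not the QEMU API): a node's RAM range that straddles `below_4g_mem_size` yields a low entry ending at the hole plus a high entry relocated to start at 4G, so the total length is preserved.

```c
#include <assert.h>
#include <stdint.h>

/* One SRAT-style memory range: [base, base + len). */
typedef struct { uint64_t base, len; } MemRange;

/* Split a node's RAM range around the PCI hole that starts at below_4g.
 * Writes up to two ranges into out[] and returns how many were produced. */
int split_around_pci_hole(uint64_t mem_base, uint64_t mem_len,
                          uint64_t below_4g, MemRange out[2])
{
    uint64_t next_base = mem_base + mem_len;
    int n = 0;

    if (mem_base <= below_4g && next_base > below_4g) {
        /* Low part: everything below the hole, if any. */
        uint64_t low_len = mem_len - (next_base - below_4g);
        if (low_len > 0) {
            out[n].base = mem_base;
            out[n].len = low_len;
            n++;
        }
        /* High part: the remainder, relocated to start at 4G. */
        out[n].base = 1ULL << 32;
        out[n].len = next_base - below_4g;
        n++;
    } else if (mem_len > 0) {
        /* No overlap with the hole: emit the range unchanged. */
        out[n].base = mem_base;
        out[n].len = mem_len;
        n++;
    }
    return n;
}
```

With `below_4g = 100`, a range `[10, 210)` splits into `[10, 100)` plus 110 units starting at 4G — the same splitting `build_srat_node_entry()` performs with real addresses.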
Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not
Hi Eduardo,

At 08/14/2017 08:44 PM, Eduardo Habkost wrote:

On Mon, Aug 14, 2017 at 06:11:11PM +0800, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

... \
-m 1024M,slots=4,maxmem=32G \
-numa node,nodeid=0 \
-numa node,mem=1024M,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \

the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because, when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M); this means node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
     if (i == nb_numa_nodes) {
         assert(mc->numa_auto_assign_ram);
         mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+    } else if (i != 0) {
+        error_report("The first NUMA node must have some memory"
+                     " for building ACPI SRAT");
+        exit(1);

This doesn't belong to numa.c. numa.c is generic code, and the requirement you described is specific to PC. Anyway, adding this check would make existing VM configurations refuse to run after a QEMU upgrade. I suggest fixing the bug in the ACPI code instead.

I see. If we fix the bug in the ACPI code, I have two solutions:

1) Add a check in build_srat(): if the first node has no memory, QEMU will exit.
2) Fix the initialization of the memory affinity structure to cover this situation, using the first node which has memory to deal with the memory hole.

I prefer solution 2. What about you?

Thanks,
dou.
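Solution 2 can be sketched as a standalone scan (the `node_mem`/`nb_nodes` names are illustrative stand-ins, not the QEMU API): find the first node that has memory and use it for the 640K-1M hole, instead of hard-coding node 0.

```c
#include <stdint.h>

/* Return the index of the first node with RAM,
 * or nb_nodes if every node is memory-less. */
int first_node_with_mem(const uint64_t *node_mem, int nb_nodes)
{
    int node;

    for (node = 0; node < nb_nodes && node_mem[node] == 0; node++) {
        /* skip memory-less nodes */
    }
    return node;
}
```

For the example command line above (`node_mem = {0, 1G, 0, 0}`), this picks node 1, so the SRAT no longer assigns the low hole to the empty node 0.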
Re: [Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not
I'm sorry, I forgot to cc Michael S. Tsirkin.

At 08/14/2017 06:11 PM, Dou Liyang wrote:

Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

... \
-m 1024M,slots=4,maxmem=32G \
-numa node,nodeid=0 \
-numa node,mem=1024M,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \

the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because, when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M); this means node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
     if (i == nb_numa_nodes) {
         assert(mc->numa_auto_assign_ram);
         mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+    } else if (i != 0) {
+        error_report("The first NUMA node must have some memory"
+                     " for building ACPI SRAT");
+        exit(1);
     }
 
     numa_total = 0;
[Qemu-devel] [PATCH] numa: Add a check whether the node0 has memory or not
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

... \
-m 1024M,slots=4,maxmem=32G \
-numa node,nodeid=0 \
-numa node,mem=1024M,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \

the guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because, when QEMU builds the ACPI SRAT, it regards node0 as the default node to deal with the memory hole (640K-1M); this means node0 must have some memory (>1M) in the first place.

Add a check in parse_numa_opts to avoid this situation.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 numa.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/numa.c b/numa.c
index e32af04..1d6f73f 100644
--- a/numa.c
+++ b/numa.c
@@ -464,6 +464,10 @@ void parse_numa_opts(MachineState *ms)
     if (i == nb_numa_nodes) {
         assert(mc->numa_auto_assign_ram);
         mc->numa_auto_assign_ram(mc, numa_info, nb_numa_nodes, ram_size);
+    } else if (i != 0) {
+        error_report("The first NUMA node must have some memory"
+                     " for building ACPI SRAT");
+        exit(1);
     }
 
     numa_total = 0;
-- 
2.5.5
Re: [Qemu-devel] Memory Hotplug : Does QEmu support cold-plugged memory?
Hi Igor,

At 06/05/2017 04:45 PM, Igor Mammedov wrote:

On Mon, 5 Jun 2017 15:53:02 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:

Hi all, I want to boot up a system with some removable memory, so I used '-object' and '-device' to cold-plug memory in QEMU as follows:

./x86_64-softmmu/qemu-system-x86_64 \
...
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
...

Then I want to hot-remove this memory with 'device-del' and 'object-del'. But unfortunately, I can't remove the memory device. In the guest OS, I found the memory's removable feature is 0; but in the QEMU monitor, 'info memory-devices' shows hotplugged is false and hotpluggable is true.

It's an issue on the guest kernel side. In short, it's not guaranteed that the guest kernel will be able to hot-remove a DIMM module (even if the hardware is capable of it), since the kernel might use its memory as not MOVABLE, so the DIMM might be temporarily or permanently pinned by kernel allocations.

I see.

There is ongoing refactoring/cleanup work upstream that tries to make memory-hotplug behavior consistent/reasonable. If you are interested, I'd suggest reviewing/testing the changes Michal is working on upstream, to make sure that the refactoring won't break your use case.

OK, I will.

You can find the latest series Michal posted at: http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1410289.html

Um, I found the patchset, and I am reading the discussions and the test cases you gave in v2.

Could you help me:

1. Can the RAM-based memory backend be plugged like above?
yes, it can be used as above
2. Does the 'info memory-devices' command show wrong info?
nope, the command shows the hotpluggable status the way the hw (QEMU) sees it
3. Can a file memory backend (memory-backend-file) be cold-plugged?
yes, it can.

[...]

Thanks a lot ;)

Thanks,
dou.
[Qemu-devel] Memory Hotplug : Does QEmu support cold-plugged memory?
Hi all,

I want to boot up a system with some removable memory, so I used '-object' and '-device' to cold-plug memory in QEMU as follows:

./x86_64-softmmu/qemu-system-x86_64 \
...
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
...

Then I want to hot-remove this memory with 'device-del' and 'object-del'. But unfortunately, I can't remove the memory device. In the guest OS, I found the memory's removable feature is 0; but in the QEMU monitor, 'info memory-devices' shows hotplugged is false and hotpluggable is true.

Could you help me:

1. Can the RAM-based memory backend be plugged like above?
2. Does the 'info memory-devices' command show wrong info?
3. Can a file memory backend (memory-backend-file) be cold-plugged?

PS: here is my whole script:

gdb --arg ./x86_64-softmmu/qemu-system-x86_64 \
-hda /image/fedora.img \
-m 256M,slots=4,maxmem=1G \
-enable-kvm \
-smp 2,maxcpus=16,sockets=2,cores=2,threads=2 \
-object memory-backend-ram,id=mem1,size=128M \
-object memory-backend-ram,id=mem2,size=128M \
-object memory-backend-ram,id=mem3,size=128M \
-device pc-dimm,id=dimm3,memdev=mem3 \
-device qemu64-x86_64-cpu,id=cpu2,socket-id=0,core-id=1,thread-id=0 \
-device qemu64-x86_64-cpu,id=cpu3,socket-id=0,core-id=1,thread-id=1 \
-numa node,memdev=mem1,nodeid=0 \
-numa node,memdev=mem2,nodeid=1 \
-serial stdio \
-monitor telnet:127.0.0.1:,server,nowait \
-kernel /home/douly/openSource/orignal_linux/arch/x86_64/boot/bzImage \
-append "root=/dev/mapper/fedora_s3101490-root mem=256M movable_node nokaslr earlyprintk=serial,ttyS0,115200" \
-initrd /home/douly/openSource/initramfs_image/4.12.0-rc3.img \

Thanks,
dou.
Re: [Qemu-devel] [PATCH v4 2/2] block/qapi: reduce the execution time of qmp_query_blockstats
Hi, Max

At 02/01/2017 11:02 AM, Max Reitz wrote:

On 15.01.2017 09:01, Dou Liyang wrote:

In order to reduce the execution time, this patch optimizes qmp_query_blockstats():

- Remove the next_query_bds function.
- Remove the bdrv_query_stats function.
- Remove some conditional checks.

The original qmp_query_blockstats() calls next_query_bds() to get the next objects in each loop iteration. next_query_bds() checks query_nodes and blk. It also calls bdrv_query_stats() to get the stats; bdrv_query_stats() checks blk and bs each time. This wastes time and may stall the main loop a bit. If there are many disks and the dataplane feature is not used, this may affect the performance of the main loop thread.

This patch removes those two functions and makes the structure clearer.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 block/qapi.c | 73
 1 file changed, 29 insertions(+), 44 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index bc622cd..45e9d47 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -456,23 +456,6 @@ static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
     return s;
 }
 
-static BlockStats *bdrv_query_stats(BlockBackend *blk,
-                                    const BlockDriverState *bs,
-                                    bool query_backing)
-{
-    BlockStats *s;
-
-    s = bdrv_query_bds_stats(bs, query_backing);
-
-    if (blk) {
-        s->has_device = true;
-        s->device = g_strdup(blk_name(blk));
-        bdrv_query_blk_stats(s->stats, blk);
-    }
-
-    return s;
-}
-
 BlockInfoList *qmp_query_block(Error **errp)
 {
     BlockInfoList *head = NULL, **p_next = &head;
@@ -496,42 +479,44 @@ BlockInfoList *qmp_query_block(Error **errp)
     return head;
 }
 
-static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
-                           bool query_nodes)
-{
-    if (query_nodes) {
-        *bs = bdrv_next_node(*bs);
-        return !!*bs;
-    }
-
-    *blk = blk_next(*blk);
-    *bs = *blk ? blk_bs(*blk) : NULL;
-
-    return !!*blk;
-}
-
 BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
                                      bool query_nodes,
                                      Error **errp)
 {
     BlockStatsList *head = NULL, **p_next = &head;
-    BlockBackend *blk = NULL;
-    BlockDriverState *bs = NULL;
+    BlockBackend *blk;
+    BlockDriverState *bs;
 
     /* Just to be safe if query_nodes is not always initialized */
-    query_nodes = has_query_nodes && query_nodes;
-
-    while (next_query_bds(&blk, &bs, query_nodes)) {
-        BlockStatsList *info = g_malloc0(sizeof(*info));
-        AioContext *ctx = blk ? blk_get_aio_context(blk)
-                              : bdrv_get_aio_context(bs);
+    if (has_query_nodes && query_nodes) {
+        for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
+            BlockStatsList *info = g_malloc0(sizeof(*info));
+            AioContext *ctx = bdrv_get_aio_context(bs);
 
-        aio_context_acquire(ctx);
-        info->value = bdrv_query_stats(blk, bs, !query_nodes);
-        aio_context_release(ctx);
+            aio_context_acquire(ctx);
+            info->value = bdrv_query_bds_stats(bs, false);
+            aio_context_release(ctx);
 
-        *p_next = info;
-        p_next = &info->next;
+            *p_next = info;
+            p_next = &info->next;
+        }
+    } else {
+        for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+            BlockStatsList *info = g_malloc0(sizeof(*info));
+            AioContext *ctx = blk_get_aio_context(blk);
+            BlockStats *s;
+
+            aio_context_acquire(ctx);
+            info->value = s = bdrv_query_bds_stats(blk_bs(blk), true);

Superfluous assignment to info->value here...

Yes, it is.

+            s->has_device = true;
+            s->device = g_strdup(blk_name(blk));
+            bdrv_query_blk_stats(s->stats, blk);
+            aio_context_release(ctx);
+
+            info->value = s;

...since it's written here anyway. I'll remove the first assignment when applying the patch to my tree.

Thanks very much!

Max

+            *p_next = info;
+            p_next = &info->next;
+        }
     }
 
     return head;

Thanks, Liyang
[Qemu-devel] [PATCH v3] vl: Move the cpu_synchronize_all_post_init() after generic devices initialization
At QEMU initialization, we call cpu_synchronize_all_post_init() in ./vl.c::main() to synchronize all CPU states to KVM.

Currently, it is called before we initialize the CPUs that are created with the "-device" command and parsed during generic device initialization, so these CPUs may be skipped by the synchronization.

This patch moves cpu_synchronize_all_post_init() after the generic device initialization to make sure that all the CPUs are included.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Change log:
v2 -> v3:
1. Rebase the patch.
2. Rewrite the log.
v1 -> v2:
1. Split it from https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index 0b72b12..bceb835 100644
--- a/vl.c
+++ b/vl.c
@@ -4490,8 +4490,6 @@ int main(int argc, char **argv, char **envp)
 
     audio_init();
 
-    cpu_synchronize_all_post_init();
-
     if (hax_enabled()) {
         hax_sync_vcpus();
     }
@@ -4517,6 +4515,8 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
+    cpu_synchronize_all_post_init();
+
     numa_post_machine_init();
 
     rom_reset_order_override();
-- 
2.5.5
Re: [Qemu-devel] [PATCH v2] vl: Ensure the cpu_synchronize_all_post_init() in the appropriate location
Hi, Alex

Sorry for the late reply.

At 01/27/2017 10:03 PM, Alex Bennée wrote:

Dou Liyang <douly.f...@cn.fujitsu.com> writes:

At the Qemu initialization, we call the cpu_synchronize_all_post_init() to synchronize All CPU states to KVM in the ./vl.c::main(). Currently, it is called before we initialize the CPUs, which created by "-device" command, So, these CPUs may be ignored to synchronize. The patch moves the numa_post_machine_init func in the appropriate location to make sure that all the CPUs has already been created when it is called.

This doesn't match what the patch does (the function is cpu_synchronise_all_post_init) but I see another patch is already merged that moved the NUMA one.

Oops, yes, I made that mistake.

I'm afraid you'll need to re-base anyway because it no longer applies cleanly.

Yes, I will re-base it.

Thanks, Liyang

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Change log v1 -> v2:
1. Split it from https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index c643d3f..38269be 100644
--- a/vl.c
+++ b/vl.c
@@ -4547,8 +4547,6 @@ int main(int argc, char **argv, char **envp)
 
     audio_init();
 
-    cpu_synchronize_all_post_init();
-
     numa_post_machine_init();
 
     if (qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -4571,6 +4569,9 @@ int main(int argc, char **argv, char **envp)
         device_init_func, NULL, NULL)) {
         exit(1);
     }
+
+    cpu_synchronize_all_post_init();
+
     rom_reset_order_override();
 
     /* Did we create any drives that we failed to create a device for? */

--
Alex Bennée
Re: [Qemu-devel] [PATCH v4 0/2] block/qapi: refactor and optimize the qmp_query_blockstats()
Hi, Markus

Here is the non-RFC patch. Any suggestions are welcome. :)

Thanks, Liyang.

At 01/15/2017 04:01 PM, Dou Liyang wrote:

Change log v3 -> v4:
1. Develop these into the non-RFC patches.
2. Fix some comments.
3. Do declarations first.

Change log v2 -> v3:
1. Remove the unnecessary code for the bdrv_next_node().
2. Remove the change of the locking rules. Even if this change can improve the performance, it may affect the consistency.

QEMU uses the qmp_query_blockstats() function to get the blockstats. As the function has been changed several times, it has become longer and more complex. For a multi-disk guest, if we want to execute I/O operations and this function at the same time, we can use the dataplane feature to keep the I/O performance from dropping. But normally, without this feature, how do we reduce the decline in performance?

These patches refactor qmp_query_blockstats() to make the code easier to follow, and shorter.

[The cover letter contained two ASCII call-graph diagrams here, garbled in this archive. Before the refactor: qmp_query_blockstats (0) called next_query_bds (1) and bdrv_query_stats (2), which in turn called bdrv_query_blk_stats (3) and bdrv_query_bds_stats (4). After the refactor: qmp_query_blockstats (1) calls bdrv_query_blk_stats (2) and bdrv_query_bds_stats (3) directly.]

They also optimize the function by reducing its running time.

1. Function running time, i.e. the time it takes (ns) per request:

the disk numbers     | 10    | 500
---------------------|-------|-------
before these patches | 19429 | 667722
after these patches  | 18536 | 627945

2. For the performance, a dd command like this was used to test: dd if=date_1.dat of=date_2.dat conv=fsync oflag=direct bs=1k count=100k. The I/O performance degradation (%) during the monitor query:

the disk numbers     | 10  | 500
---------------------|-----|-----
before these patches | 1.3 | 14.2
after these patches  | 1.0 | 11.3

Dou Liyang (2):
  block/qapi: reduce the coupling between the bdrv_query_stats and bdrv_query_bds_stats
  block/qapi: reduce the execution time of qmp_query_blockstats

 block/qapi.c | 95 ++--
 1 file changed, 41 insertions(+), 54 deletions(-)
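The core of the refactor — replacing one generic loop that re-tests the query mode and both iterator kinds on every pass with two tight, specialized loops — can be modeled abstractly (all names hypothetical, not the QEMU block-layer API):

```c
#include <stddef.h>

/* A stand-in for either a graph node or a block backend. */
typedef struct Item { int value; struct Item *next; } Item;

/* The mode is decided once; each branch then runs a loop with no
 * per-iteration branching, mirroring the patched qmp_query_blockstats(). */
int sum_stats(const Item *nodes, const Item *backends, int query_nodes)
{
    const Item *it;
    int total = 0;

    if (query_nodes) {
        for (it = nodes; it; it = it->next) {
            total += it->value;          /* node-level stats only */
        }
    } else {
        for (it = backends; it; it = it->next) {
            total += it->value * 2;      /* backend adds extra per-item work */
        }
    }
    return total;
}
```

The numbers in the tables above are consistent with this shape: the per-iteration savings are small for 10 disks but add up noticeably at 500.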
Re: [Qemu-devel] [RFC 05/13] pc: move pcms->possible_cpus init out of pc_cpus_init()
At 01/19/2017 01:13 AM, Igor Mammedov wrote:

possible_cpus could be initialized earlier then cpu objects,

s/then/than/

i.e. when -smp is parsed so move init code to possible_cpu_arch_ids()

[...]
Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes
Hi, Eduardo

At 01/19/2017 01:06 AM, Eduardo Habkost wrote:

On Wed, Jan 18, 2017 at 09:26:36PM +0800, Dou Liyang wrote:

Hi, All

** ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:668:test_migrate: assertion failed: (qdict_haskey(rsp, "return"))
GTester: last random seed: R02Sf52546c4daff8087416f43fa7c146db8
ftruncate: Permission denied
ftruncate: Permission denied
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe

I don't know what this log means. Is it that qemu-system-aarch64 can't recognize qom/cpu.c:346: assert(cpu->cpu_index < max_cpus); ?

This means the assert() line is being triggered for some reason, and cpu_index is >= max_cpus when cpu_common_map_numa_node() gets called. We need to investigate why.

I have investigated why it failed. Not all targets (aarch64-linux-user, x86_64-linux-user ...) need to compile vl.c (which contains max_cpus) and numa.c, but all of them may compile ./qom/cpu.c. So, when we link those targets, we may not find the vl.o or numa.o symbols we want. Add "#ifdef CONFIG_NUMA" to fix it:

+static void cpu_common_map_numa_node(CPUState *cpu)
+{
+#ifdef CONFIG_NUMA
+    int i;
+
+    assert(cpu->cpu_index < max_cpus);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
+            cpu->numa_node = i;
+            return;
+        }
+    }
+#endif
+}
+

And I am not sure whether it is necessary to resend this patch to fix the bug before Igor's patches are completely ready?

Thanks, Liyang.
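The guard proposed in the reply can be illustrated with a toy standalone example (the macro and function names are hypothetical, not QEMU's build system): when the feature macro is absent, the body compiles away and the function falls back to a default, so targets that never link the feature's globals still build.

```c
/* Toggle this to simulate a target built with or without the feature. */
#define CONFIG_FEATURE 1

/* With CONFIG_FEATURE defined, do the real lookup; otherwise the
 * function is effectively a no-op that returns the default node. */
int map_node(int cpu_index)
{
    int node = 0;                    /* default node */
#ifdef CONFIG_FEATURE
    node = cpu_index % 2;            /* stand-in for the real bitmap lookup */
#endif
    return node;
}
```

Removing the `#define` line makes `map_node()` always return 0 without any link-time dependency on the lookup's data, which is the effect the `#ifdef CONFIG_NUMA` guard has in qom/cpu.c.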
Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes
Hi, Igor,

At 01/18/2017 09:46 PM, Igor Mammedov wrote:

On Wed, 18 Jan 2017 20:40:04 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:

As we fixed a bug (Bug 1) in the links below, named "Method-A": https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html Then Eduardo gave us many suggestions. Thanks very much! When we tried them, we also found another bug, named "Bug 2".

I have an alternative fix for both issues, for which I've been writing a cover letter when I saw this series. My fix series is more intrusive though, as its goal isn't just to fix 'info numa' but rather to switch away from cpu-index based mapping to socket/core/thread based mapping and to stop using bitmaps for mapping. And as 'info numa' was getting in the way, I fixed both issues in a slightly different way. So please wait a bit; once the travis build is completed, I'll post the series here.

It doesn't matter. I am learning from your patches. I think my method is temporary; your fix is better than mine.

Thanks, Liyang.
[Qemu-devel] [PATCH v2] vl: Ensure the cpu_synchronize_all_post_init() in the appropriate location
At the Qemu initialization, we call the cpu_synchronize_all_post_init() to synchronize All CPU states to KVM in the ./vl.c::main(). Currently, it is called before we initialize the CPUs, which created by "-device" command, So, these CPUs may be ignored to synchronize.

The patch moves the numa_post_machine_init func in the appropriate location to make sure that all the CPUs has already been created when it is called.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
Change log v1 -> v2:
1. Split it from https://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03354.html
2. Rewrite the log.

 vl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index c643d3f..38269be 100644
--- a/vl.c
+++ b/vl.c
@@ -4547,8 +4547,6 @@ int main(int argc, char **argv, char **envp)
 
     audio_init();
 
-    cpu_synchronize_all_post_init();
-
     numa_post_machine_init();
 
     if (qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -4571,6 +4569,9 @@ int main(int argc, char **argv, char **envp)
         device_init_func, NULL, NULL)) {
         exit(1);
     }
+
+    cpu_synchronize_all_post_init();
+
     rom_reset_order_override();
 
     /* Did we create any drives that we failed to create a device for? */
-- 
2.5.5
Re: [Qemu-devel] [PATCH 0/3] cpu: numa: Fix the mapping initialization of VCPUs and NUMA nodes
Hi, All

** ERROR:/tmp/qemu-test/src/tests/vhost-user-test.c:668:test_migrate: assertion failed: (qdict_haskey(rsp, "return"))
GTester: last random seed: R02Sf52546c4daff8087416f43fa7c146db8
ftruncate: Permission denied
ftruncate: Permission denied
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe

I don't know what this log means. Is it that qemu-system-aarch64 can't recognize qom/cpu.c:346: assert(cpu->cpu_index < max_cpus); ?

Thanks, Liyang

GTester: last random seed: R02Sa39aa674143b4b48a89276d59eee19b3
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02Sbd557f67296de3764183db6b2105d88b
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S79c36cf5f50e19956d11fbc925f86df0
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S1186d0871ff44bed258ec179722caf91
qemu-system-aarch64: /tmp/qemu-test/src/qom/cpu.c:346: cpu_common_map_numa_node: Assertion `cpu->cpu_index < max_cpus' failed.
Broken pipe
GTester: last random seed: R02S3429594491b10489f9abab73dcc15151
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
make[1]: *** [docker-run] Error 2
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-3q9yfmbs/src'
make: *** [docker-run-test-quick@centos6] Error 2
=== OUTPUT END ===
Test command exited with code: 2
---
Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@freelists.org
Re: [Qemu-devel] [PATCH 1/3] cpu: Make the mapping of CPUs and NUMA nodes in cpu_common_realizefn
Hi, Eduardo

At 01/18/2017 08:56 PM, Eduardo Habkost wrote:

On Wed, Jan 18, 2017 at 08:40:05PM +0800, Dou Liyang wrote:

The current default way of setting CPUState::numa_node might be wrong in the case of cold/hot-plugged CPUs, making users confused about why the NUMA info differs between the guest and the monitor.

Make the mapping of CPUs and NUMA nodes in qom/cpu.c: cpu_common_realizefn(), where each VCPU needs to be realized.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>

parse_numa_opts() is called a long time before any CPU is created, so this should be safe.

Reviewed-by: Eduardo Habkost <ehabk...@redhat.com>

(But we can squash patch 2/3 and patch 3/3 into this patch.)

OK, :)

Thanks, Liyang.

---
 qom/cpu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/qom/cpu.c b/qom/cpu.c
index 61ee0cb..e08dceb 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -23,6 +23,7 @@
 #include "qemu-common.h"
 #include "qom/cpu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/numa.h"
 #include "qemu/notify.h"
 #include "qemu/log.h"
 #include "exec/log.h"
@@ -338,6 +339,18 @@ static void cpu_common_parse_features(const char *typename, char *features,
     }
 }
 
+static void cpu_common_map_numa_node(CPUState *cpu)
+{
+    int i;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        assert(cpu->cpu_index < max_cpus);
+        if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
+            cpu->numa_node = i;
+        }
+    }
+}
+
 static void cpu_common_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cpu = CPU(dev);
@@ -347,6 +360,8 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
         cpu_resume(cpu);
     }
 
+    cpu_common_map_numa_node(cpu);
+
     /* NOTE: latest generic point where the cpu is fully realized */
     trace_init_vcpu(cpu);
 }
-- 
2.5.5
[Qemu-devel] [PATCH 3/3] cpu: make the cpu_common_map_numa_node function more efficient
The current function does some unnecessary operations: it runs the assert() inside the loop, and the loop is not stopped in time.

This patch moves the assert() out of the loop and stops the loop as soon as a match is found.

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 qom/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qom/cpu.c b/qom/cpu.c
index e08dceb..3c655b2 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -343,10 +343,11 @@ static void cpu_common_map_numa_node(CPUState *cpu)
 {
     int i;
 
+    assert(cpu->cpu_index < max_cpus);
     for (i = 0; i < nb_numa_nodes; i++) {
-        assert(cpu->cpu_index < max_cpus);
         if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
             cpu->numa_node = i;
+            return;
         }
     }
 }
-- 
2.5.5
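The effect of the patch — hoisting the loop-invariant assert and returning on the first match — can be modeled standalone (node bitmaps reduced to plain word masks here; names are illustrative, not the QEMU API):

```c
#include <assert.h>

/* Return the first node whose cpu mask contains cpu_index, or 0 (the
 * default node) if no node claims it.  The bounds check runs once,
 * before the loop, and the scan stops at the first match. */
int map_numa_node(unsigned cpu_index, const unsigned long *node_cpu,
                  int nb_nodes, unsigned max_cpus)
{
    int i;

    assert(cpu_index < max_cpus);
    for (i = 0; i < nb_nodes; i++) {
        if (node_cpu[i] & (1UL << cpu_index)) {
            return i;               /* first match wins; stop scanning */
        }
    }
    return 0;
}
```

With masks `{0x2, 0x5}`, CPU 1 maps to node 0 and CPUs 0 and 2 map to node 1; the early return means later nodes are never inspected once a match is found.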