On Mon, 18 Sep 2017 11:54:50 +0800 Dou Liyang <douly.f...@cn.fujitsu.com> wrote:
> At 09/15/2017 06:05 PM, Dou Liyang wrote: > > Hi Daniel, > > > > At 09/15/2017 04:40 PM, Daniel P. Berrange wrote: > >> On Fri, Sep 15, 2017 at 04:33:18PM +0800, Dou Liyang wrote: > >>> In QEMU, if we enable NUMA and have nodes, QEMU will build ACPI SRAT > >>> table > >>> for transfering NUMA configuration to the guest. So, the maximum > >>> memory in > >>> SRAT can be used to determine whether to use the swiotlb for IOMMU or > >>> not. > >>> > >>> However, if QEmu doesn't enable NUMA explicitly on CLI, The SRAT will > >>> never be built. When memory hotplug is enabled, some guest's devices may > >>> start failing due to SWIOTLB is disabled. > >>> > >>> Add numa_implicit_add_node0 in struct MachineClass, Invoke it before > >>> QEMU > >>> parse NUMA options to enable adding NUMA node implicitly. > >>> > >>> Reported-by: Thadeu Lima de Souza Cascardo <casca...@canonical.com> > >>> Suggested-by: Igor Mammedov <imamm...@redhat.com> > >>> Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com> > >>> Cc: Paolo Bonzini <pbonz...@redhat.com> > >>> Cc: Richard Henderson <r...@twiddle.net> > >>> Cc: Eduardo Habkost <ehabk...@redhat.com> > >>> Cc: "Michael S. Tsirkin" <m...@redhat.com> > >>> Cc: Marcel Apfelbaum <mar...@redhat.com> > >>> Cc: Igor Mammedov <imamm...@redhat.com> > >>> Cc: David Hildenbrand <da...@redhat.com> > >>> Cc: Thomas Huth <th...@redhat.com> > >>> Cc: Alistair Francis <alistai...@gmail.com> > >>> Cc: f4...@amsat.org > >>> Cc: Takao Indoh <indou.ta...@jp.fujitsu.com> > >>> Cc: Izumi Taku <izumi.t...@jp.fujitsu.com> > >>> > >>> --- > >>> hw/i386/pc.c | 6 ++++++ > >>> include/hw/boards.h | 4 ++++ > >>> vl.c | 14 ++++++++++++++ > >>> 3 files changed, 24 insertions(+) > >>> > >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c > >>> index 2108104..3c40117 100644 > >>> --- a/hw/i386/pc.c > >>> +++ b/hw/i386/pc.c > >>> @@ -2308,6 +2308,11 @@ static const CPUArchIdList > >>> *pc_possible_cpu_arch_ids(MachineState *ms) > >>> return ms->possible_cpus; > >>> } > >>> > >>> +static void numa_implicit_add_node0(void) > >>> +{ > >>> + qemu_opts_parse_noisily(qemu_find_opts("numa"), "node", true); > >>> +} > >>> + > >>> static void x86_nmi(NMIState *n, int cpu_index, Error **errp) > >>> { > >>> /* cpu index isn't used */ > >>> @@ -2349,6 +2354,7 @@ static void pc_machine_class_init(ObjectClass > >>> *oc, void *data) > >>> mc->get_hotplug_handler = pc_get_hotpug_handler; > >>> mc->cpu_index_to_instance_props = pc_cpu_index_to_props; > >>> mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; > >>> + mc->numa_implicit_add_node0 = numa_implicit_add_node0; > >>> mc->has_hotpluggable_cpus = true; > >>> mc->default_boot_order = "cad"; > >>> mc->hot_add_cpu = pc_hot_add_cpu; > >>> diff --git a/include/hw/boards.h b/include/hw/boards.h > >>> index 7f044d1..898d841 100644 > >>> --- a/include/hw/boards.h > >>> +++ b/include/hw/boards.h > >>> @@ -141,6 +141,8 @@ typedef struct { > >>> * should instead use "unimplemented-device" for all memory > >>> ranges where > >>> * the guest will attempt to probe for a device that QEMU doesn't > >>> * implement and a stub device is required. > >>> + * @numa_implicit_add_node0: > >>> + * Enable NUMA implicitly by add a NUMA node. > >>> */ > >>> struct MachineClass { > >>> /*< private >*/ > >>> @@ -191,6 +193,8 @@ struct MachineClass { > >>> CpuInstanceProperties > >>> (*cpu_index_to_instance_props)(MachineState *machine, > >>> unsigned > >>> cpu_index); > >>> const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState > >>> *machine); > >>> + > >>> + void (*numa_implicit_add_node0)(void); > >>> }; > >>> > >>> /** > >>> diff --git a/vl.c b/vl.c > >>> index fb1f05b..814a5fa 100644 > >>> --- a/vl.c > >>> +++ b/vl.c > >>> @@ -3030,6 +3030,7 @@ int main(int argc, char **argv, char **envp) > >>> Error *main_loop_err = NULL; > >>> Error *err = NULL; > >>> bool list_data_dirs = false; > >>> + bool has_numa_config_in_CLI = false; > >>> typedef struct BlockdevOptions_queue { > >>> BlockdevOptions *bdo; > >>> Location loc; > >>> @@ -3293,6 +3294,7 @@ int main(int argc, char **argv, char **envp) > >>> if (!opts) { > >>> exit(1); > >>> } > >>> + has_numa_config_in_CLI = true; > >>> break; > >>> case QEMU_OPTION_display: > >>> display_type = select_display(optarg); > >>> @@ -4585,6 +4587,18 @@ int main(int argc, char **argv, char **envp) > >>> default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS); > >>> default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS); > >>> > >>> + /* > >>> + * If memory hotplug is enabled i.e. slots > 0 and user hasn't add > >>> + * NUMA nodes explicitly on CLI > >>> + * > >>> + * Enable NUMA implicitly for guest to know the maximum memory > >>> + * from ACPI SRAT table, which is used for SWIOTLB. > >>> + */ > >>> + if (ram_slots > 0 && !has_numa_config_in_CLI) { > >>> + if (machine_class->numa_implicit_add_node0) { > >>> + machine_class->numa_implicit_add_node0(); > >>> + } > >>> + } > >> > >> Won't this change guest ABI > > Hi Daniel, > > No, this will change the ACPI table in following case without NUMA > configuration: it also will affect FW_CFG_NUMA file length (read as break ABI), you didn't see any breakage because currnet SeaBIOS don't use this legacy file anymore and even if it would used it's still a race one has to at boot. So you need to disable feature on old machine types as you've suggested > > ... > -m 128M,slots=4,maxmem=1G > ... > > How about fixing it by adding > > m->numa_implicit_add_node0 = NULL; > > in pc_i440fx/q35_2_10_machine_options() ? yep, pls do so. Also take a look spapr which also supports numa, to make sure it won't regress. CCing David. > > > and so break migration/save/restore ? > > I did some tests(save/restore), this don't break migration. > > Are some situations I didn't consider ? > > > Thanks, > dou. > >> > > Thank you for your reply, I can't answer this immediately, I will > > look into it and reply you soon. > > > > This patch works for X86, has no influence on other arch, and we > > can regard it's functionality as "-numa node" in CLI. > > > > Thanks, > > dou. > > > > > >> Regards, > >> Daniel > >> > > >