On Tue, Apr 15, 2014 at 05:55:22PM +0200, Igor Mammedov wrote:
> On Tue, 15 Apr 2014 14:37:01 +0800
> Hu Tao <hu...@cn.fujitsu.com> wrote:
>
> > On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> > > On Mon, 14 Apr 2014 15:25:01 +0800
> > > Hu Tao <hu...@cn.fujitsu.com> wrote:
> > >
> > > > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> > > > > Needed for Windows to use hotplugged memory device, otherwise
> > > > > it complains that server is not configured for memory hotplug.
> > > > > Tests show that afterwards it uses dynamically provided
> > > > > proximity value from _PXM() method if available.
> > > > >
> > > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > > ---
> > > > >  hw/i386/acpi-build.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > index ef89e99..012b100 100644
> > > > > --- a/hw/i386/acpi-build.c
> > > > > +++ b/hw/i386/acpi-build.c
> > > > > @@ -1197,6 +1197,8 @@ build_srat(GArray *table_data, GArray *linker,
> > > > >      uint64_t curnode;
> > > > >      int srat_start, numa_start, slots;
> > > > >      uint64_t mem_len, mem_base, next_base;
> > > > > +    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > > > +    ram_addr_t hotplug_as_size = memory_region_size(&pcms->hotplug_memory);
> > > > >
> > > > >      srat_start = table_data->len;
> > > > >
> > > > > @@ -1261,6 +1263,18 @@ build_srat(GArray *table_data, GArray *linker,
> > > > >          acpi_build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> > > > >      }
> > > > >
> > > > > +    /*
> > > > > +     * Fake entry required by Windows to enable memory hotplug in OS.
> > > > > +     * Individual DIMM devices override proximity set here via _PXM method,
> > > > > +     * which returns associated with it NUMA node id.
> > > > > +     */
> > > > > +    if (hotplug_as_size) {
> > > > > +        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > > > +        acpi_build_srat_memory(numamem, pcms->hotplug_memory_base,
> > > > > +                               hotplug_as_size, 0, MEM_AFFINITY_HOTPLUGGABLE |
> > > > > +                               MEM_AFFINITY_ENABLED);
> > > > > +    }
> > > > > +
> > > >
> > > > Hi Igor,
> > > >
> > > > With the faked entry, memory unplug doesn't work. Entries should be set
> > > > up for each node with correct flags (enabled, hotpluggable) to make memory
> > > > unplug work.
> > > Could you be more specific, what and how doesn't work and why there is
> > > need for SRAT entries per DIMM?
> > > I've briefly tested with your unplug patches and linux seemed to be ok
> > > with unplug, i.e. device node was removed from /sys after receiving
> > > remove notification.
> >
> > Following are fail cases:
>
> I did some testing using upstream kernel with hot-remove enabled.
> tested only "this patch" case
>
> > ------------------------------------------------------------------------+----------------------------------------------
> > guest commands                                                          | this patch                 | hacked SRAT
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online' > /sys/devices/system/memory/memory32/state && \          |                            |
> > echo 'offline' > /sys/devices/system/memory/memory32/state              | fail                       | success
> works for me, but offline might fail (and is allowed to) since page
> migration may fail if memory section or its part is not movable.

You're right. More tests (latest kernel) show that offlining memory randomly
succeeds or fails in both cases.

> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online' > /sys/devices/system/memory/memory32/state && \          |                            |
> > echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | fail                       | success
> the same as #1
>
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory32/state       | fail[first memory block]   | fail
> it's linux implementation specific, should be fixed in guest and has
> nothing to do with qemu side.
> PS: all hot-added memory sections could be onlined with 'online_movable'
> in reverse order.

Correct.

> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory35/state && \  |                            |
> > echo 'offline' > /sys/devices/system/memory/memory35/state              | success[last memory block] | success
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory32/state && \  |                            |
> > echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | success[last memory block] | success
> > ------------------------------------------------------------------------+----------------------------------------------
> movable memory section is guaranteed to succeed, hence no issue.
>
> Reading upstream kernel code, it honors PNP0C80._PXM value and overrides
> anything that was provided in SRAT. So I don't see why hacked SRAT
> would make any difference.
>
> Could you verify with the latest upstream kernel?
> PS: do not forget to check "removable" attribute before marking case as
> failed.
>
> One time, I've seen guest panic on "successful" eject of ZONE_NORMAL memory
> section since it was still using it (so there are still hot-remove bugs in
> kernel) and "removable" doesn't guarantee anything for ZONE_NORMAL memory
> section.
>
> >
> > Hacked SRAT memory entry:
> >
> > PXM: 0
> > range: 4G ~ 4G + 512M
> > flags: Enabled Hot-Pluggable
> >
> > PXM: 1
> > range: 4G + 512M ~ 5G
> > flags: Enabled Hot-Pluggable
> >
> > So I think we should add maxmem to -numa and build SRAT accordingly.
> > But there is something I'm not sure about. I added dimm in node 1, but
> > its memory range fell in node 0. Users can always cause the mismatch
> > with dimm,start,node.
> >
> > This is the relevant part of the command line:
> >
> > qemu command line: -m 512M,slots=4,maxmem=2G \
> >     -object memory-ram,id=foo,size=512M \
> >     -numa node,id=n0,mem=256M -numa node,id=n1,mem=256M
> >
> > (qemu monitor) device_add dimm,id=d0,memdev=foo,node=1
> >
> > > >
> > > > Windows has not been tested yet. I encountered a problem that there is
> > > > no SRAT in Windows so even memory hotplug doesn't work. (but there is
> > > > in Linux with the same configuration).
> > > For Windows to work one needs to add "-numa node" CLI option so that
> > > SRAT would be exposed to guest.
> >
> > Thanks. I need to double-check.
> >
> > > Paolo suggested to enable -numa node by default, I guess we can do it
> > > once NUMA re-factoring is merged.
> > >
> > > That said, I haven't found any information that Windows supports
> > > memory hot-remove. Google tells that only hot-add is supported
> > > for up to WS2008R2. I've tested WS2012R2, it doesn't work either,
> > > i.e. it sees but ignores Notify request.
> > >
> > > >
> > > > Regards,
> > > > Hu Tao
> > >
> >
>
> --
> Regards,
>   Igor
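
The advice quoted above (check the "removable" attribute before marking a case as failed) could be scripted roughly as follows. This is only a sketch, not something taken from the thread: `try_offline` and the `MEM_ROOT` override are hypothetical names introduced here, and it assumes the guest kernel exposes `removable` and `state` under /sys/devices/system/memory/memoryN. As noted above, "removable" reading 1 is still no guarantee for a ZONE_NORMAL section.

```shell
#!/bin/sh
# Sketch only: classify an offline attempt for one memory block.
# MEM_ROOT is a hypothetical override so the function can be pointed at a
# scratch directory; on a real guest it defaults to the sysfs path.
MEM_ROOT="${MEM_ROOT:-/sys/devices/system/memory}"

# try_offline BLOCK   (for example: try_offline memory32)
# Prints one of: skipped, success, fail
try_offline() {
    dir="$MEM_ROOT/$1"

    # A block that does not report removable=1 is not expected to go
    # offline, so it is skipped rather than counted as a failure.
    if [ "$(cat "$dir/removable" 2>/dev/null)" != "1" ]; then
        echo "skipped"
        return 0
    fi

    # The write may fail (for instance when page migration fails); ignore
    # the error here and judge by the state read back afterwards.
    { echo offline > "$dir/state"; } 2>/dev/null || true

    if [ "$(cat "$dir/state" 2>/dev/null)" = "offline" ]; then
        echo "success"
    else
        echo "fail"
    fi
}
```

In the guest this would be run as root, e.g. `try_offline memory32`, and only a "fail" result would be recorded as a failed case in the table.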