On Tue, Apr 15, 2014 at 05:55:22PM +0200, Igor Mammedov wrote:
> On Tue, 15 Apr 2014 14:37:01 +0800
> Hu Tao <hu...@cn.fujitsu.com> wrote:
> 
> > On Mon, Apr 14, 2014 at 06:44:42PM +0200, Igor Mammedov wrote:
> > > On Mon, 14 Apr 2014 15:25:01 +0800
> > > Hu Tao <hu...@cn.fujitsu.com> wrote:
> > > 
> > > > On Fri, Apr 04, 2014 at 03:36:58PM +0200, Igor Mammedov wrote:
> > > > > Needed for Windows to use a hotplugged memory device; otherwise
> > > > > it complains that the server is not configured for memory hotplug.
> > > > > Tests show that afterwards it uses the dynamically provided
> > > > > proximity value from the _PXM() method if available.
> > > > > 
> > > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> > > > > ---
> > > > >  hw/i386/acpi-build.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > > > index ef89e99..012b100 100644
> > > > > --- a/hw/i386/acpi-build.c
> > > > > +++ b/hw/i386/acpi-build.c
> > > > > @@ -1197,6 +1197,8 @@ build_srat(GArray *table_data, GArray *linker,
> > > > >      uint64_t curnode;
> > > > >      int srat_start, numa_start, slots;
> > > > >      uint64_t mem_len, mem_base, next_base;
> > > > > +    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > > > +    ram_addr_t hotplug_as_size = memory_region_size(&pcms->hotplug_memory);
> > > > >  
> > > > >      srat_start = table_data->len;
> > > > >  
> > > > > @@ -1261,6 +1263,18 @@ build_srat(GArray *table_data, GArray *linker,
> > > > >          acpi_build_srat_memory(numamem, 0, 0, 0, MEM_AFFINITY_NOFLAGS);
> > > > >      }
> > > > >  
> > > > > +    /*
> > > > > +     * Fake entry required by Windows to enable memory hotplug in OS.
> > > > > +     * Individual DIMM devices override the proximity set here via the _PXM
> > > > > +     * method, which returns the NUMA node id associated with the DIMM.
> > > > > +     */
> > > > > +    if (hotplug_as_size) {
> > > > > +        numamem = acpi_data_push(table_data, sizeof *numamem);
> > > > > +        acpi_build_srat_memory(numamem, pcms->hotplug_memory_base,
> > > > > +                               hotplug_as_size, 0, MEM_AFFINITY_HOTPLUGGABLE |
> > > > > +                               MEM_AFFINITY_ENABLED);
> > > > > +    }
> > > > > +
> > > > 
> > > > Hi Igor,
> > > > 
> > > > With the faked entry, memory unplug doesn't work. Entries should be set
> > > > up for each node with the correct flags (enabled, hot-pluggable) to make
> > > > memory unplug work.
> > > Could you be more specific about what doesn't work, how it fails, and why
> > > there is a need for SRAT entries per DIMM?
> > > I've briefly tested with your unplug patches and Linux seemed to be OK with
> > > unplug, i.e. the device node was removed from /sys after receiving the
> > > remove notification.
> > 
> > 
> > Following are fail cases:
> > 
> I did some testing using an upstream kernel with hot-remove enabled.
> I tested only the "this patch" case.
>  
> > ------------------------------------------------------------------------+----------------------------------------------
> > guest commands                                                          | this patch                    | hacked SRAT
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online' > /sys/devices/system/memory/memory32/state && \          |                               |
> > echo 'offline' > /sys/devices/system/memory/memory32/state              | fail                          | success
> works for me, but offlining is allowed to fail here since page
> migration may fail if the memory section, or part of it, is not movable.

You're right. More tests (latest kernel) show that offlining memory
randomly succeeds or fails in both cases.

> 
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online' > /sys/devices/system/memory/memory32/state && \          |                               |
> > echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | fail                          | success
> the same as #1
> 
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory32/state       | fail[first memory block]      | fail
> it's Linux-implementation specific, should be fixed in the guest, and has
> nothing to do with the QEMU side.
> PS: all hot-added memory sections can be onlined with 'online_movable'
> in reverse order.

Correct.
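
For reference, a minimal sketch of onlining in reverse order, assuming the
hot-added 512M DIMM maps to memory blocks 32-35 as in the table above (block
numbers will differ on other configurations):

  # online the hot-added blocks as movable, last block first
  for i in 35 34 33 32; do
      echo online_movable > /sys/devices/system/memory/memory$i/state
  done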

> 
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory35/state && \  |                               |
> > echo 'offline' > /sys/devices/system/memory/memory35/state              | success[last memory block]    | success
> > ------------------------------------------------------------------------+----------------------------------------------
> > echo 'online_movable' > /sys/devices/system/memory/memory32/state && \  |                               |
> > echo 1 > /sys/devices/LNXSYSTM\:00/device\:00/PNP0C80\:00/eject         | success[last memory block]    | success
> > ------------------------------------------------------------------------+----------------------------------------------
> offlining a movable memory section is guaranteed to succeed, hence no issue.
> 
> Reading the upstream kernel code, it honors the PNP0C80._PXM value and
> overrides anything that was provided in SRAT. So I don't see why a hacked
> SRAT would make any difference.
> 
> Could you verify with the latest upstream kernel?
> PS: do not forget to check the "removable" attribute before marking a case
> as failed.
> 
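Just to make sure I check the right thing: you mean reading the per-block
attribute before each offline attempt, e.g. (using memory32 from the tests
above):

  cat /sys/devices/system/memory/memory32/removable

where 1 means the block is expected to be offlinable?
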
> Once I've seen the guest panic on a "successful" eject of a ZONE_NORMAL
> memory section because the guest was still using it (so there are still
> hot-remove bugs in the kernel), and "removable" doesn't guarantee anything
> for a ZONE_NORMAL memory section.
> 
> > 
> > Hacked SRAT memory entries:
> > 
> > PXM: 0
> > range: 4G ~ 4G + 512M
> > flags: Enabled Hot-Pluggable
> > 
> > PXM: 1
> > range: 4G + 512M ~ 5G
> > flags: Enabled Hot-Pluggable
> > 
> > So I think we should add maxmem to -numa and build the SRAT accordingly.
> > But there is something I'm not sure about: I added a dimm in node 1, but
> > its memory range fell in node 0. Users can always cause such a mismatch
> > with dimm,start,node.
> > 
> > 
> > 
> > This is the relevant part of the command line:
> > 
> > qemu command line: -m 512M,slots=4,maxmem=2G \
> >                    -object memory-ram,id=foo,size=512M \
> >                    -numa node,id=n0,mem=256M -numa node,id=n1,mem=256M 
> > 
> > (qemu monitor) device_add dimm,id=d0,memdev=foo,node=1
> > 
> > > 
> > > > 
> > > > Windows has not been tested yet. I encountered a problem: there is
> > > > no SRAT in Windows, so even memory hotplug doesn't work (but there
> > > > is one in Linux with the same configuration).
> > > For Windows to work, one needs to add the "-numa node" CLI option so that
> > > an SRAT is exposed to the guest.
> > 
> > Thanks. I need to double-check.
> > 
> > > Paolo suggested enabling -numa node by default; I guess we can do that
> > > once the NUMA re-factoring is merged.
> > > 
> > > That said, I haven't found any information that Windows supports
> > > memory hot-remove. Googling suggests that only hot-add is supported,
> > > up to WS2008R2. I've tested WS2012R2 and it doesn't work either,
> > > i.e. it sees but ignores the Notify request.
> > > 
> > > > 
> > > > Regards,
> > > > Hu Tao
> > > > 
> 
> 
> -- 
> Regards,
>   Igor
