On Wed, 27 Apr 2016 14:07:10 -0500 Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> Quoting Igor Mammedov (2016-04-27 09:34:53)
> > On Wed, 27 Apr 2016 15:59:52 +0200
> > Thomas Huth <th...@redhat.com> wrote:
> >
> > > On 27.04.2016 15:37, Igor Mammedov wrote:
> > > > On Tue, 26 Apr 2016 16:03:37 -0500
> > > > Michael Roth <mdr...@linux.vnet.ibm.com> wrote:
> > > >
> > > >> Quoting Igor Mammedov (2016-04-26 02:52:36)
> > > >>> On Tue, 26 Apr 2016 10:39:23 +0530
> > > >>> Bharata B Rao <bhar...@linux.vnet.ibm.com> wrote:
> > > >>>
> > > >>>> On Mon, Apr 25, 2016 at 11:20:50AM +0200, Igor Mammedov wrote:
> > > >>>>> On Wed, 16 Mar 2016 10:11:54 +0530
> > > >>>>> Bharata B Rao <bhar...@linux.vnet.ibm.com> wrote:
> > > >>>>>
> > > >>>>>> On Wed, Mar 16, 2016 at 12:36:05PM +1100, David Gibson wrote:
> > > >>>>>>
> > > >>>>>>> On Tue, Mar 15, 2016 at 10:08:56AM +0530, Bharata B Rao wrote:
> > > >>>>>>>
> > > >>>>>>>> Add support to hot remove pc-dimm memory devices.
> > > >>>>>>>>
> > > >>>>>>>> Signed-off-by: Bharata B Rao <bhar...@linux.vnet.ibm.com>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>> Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>
> > > >>>>>>>
> > > >>>>>>> Looks correct, but again, needs to wait on the PAPR change.
> > > >>>>>>>
> > > >>>>> [...]
> > > >>>>>>
> > > >>>>>> While we are here, I would also like to get some opinion on the real
> > > >>>>>> need for memory unplug. Is there anything that memory unplug gives us
> > > >>>>>> which memory ballooning (shrinking mem via ballooning) can't give ?
> > > >>>>> Sure ballooning can complement memory hotplug but turning it on would
> > > >>>>> effectively reduce hotplug to ballooning as it would enable overcommit
> > > >>>>> capability instead of hard partitioning pc-dimms provides. So one
> > > >>>>> could just use ballooning only and not bother with hotplug at all.
> > > >>>>>
> > > >>>>> On the other hand memory hotplug/unplug (at least on x86) tries
> > > >>>>> to model real hardware, thus removing need in paravirt ballooning
> > > >>>>> solution in favor of native guest support.
> > > >>>>
> > > >>>> Thanks for your views.
> > > >>>>
> > > >>>>>
> > > >>>>> PS:
> > > >>>>> Guest wise, currently hot-unplug is not well supported in linux,
> > > >>>>> i.e. it's not guaranteed that guest will honor unplug request
> > > >>>>> as it may pin dimm by using it as a non migratable memory. So
> > > >>>>> there is something to work on guest side to make unplug more
> > > >>>>> reliable/guaranteed.
> > > >>>>
> > > >>>> In the above scenario where the guest doesn't allow removal of certain
> > > >>>> parts of DIMM memory, what is the expected behaviour as far as QEMU
> > > >>>> DIMM device is concerned ? I seem to be running into this situation
> > > >>>> very often with PowerPC mem unplug where I am left with a DIMM device
> > > >>>> that has only some memory blocks released. In this situation, I would
> > > >>>> like to block further unplug requests on the same device, but QEMU seems
> > > >>>> to allow more such unplug requests to come in via the monitor. So
> > > >>>> qdev won't help me here ? Should I detect such condition from the
> > > >>>> machine unplug() handler and take required action ?
> > > >>> I think offlining is a guest's task along with recovering from
> > > >>> inability to offline (i.e. offline all + eject or restore original
> > > >>> state).
> > > >>> QEMU does its job by notifying guest what dimm it wants to remove
> > > >>> and removes it when guest asks it (at least in x86 world).
> > > >>
> > > >> In the case of pseries, the DIMM abstraction isn't really exposed to
> > > >> the guest, but rather the memory blocks we use to make the backing
> > > >> memdev memory available to the guest. During unplug, the guest
> > > >> completely releases these blocks back to QEMU, and if it can only
> > > >> release a subset of what's requested it does not attempt to recover.
> > > >> We can potentially change that behavior on the guest side, since
> > > >> partially-freed DIMMs aren't currently useful on the host-side...
> > > >>
> > > >> But, in the case of pseries, I wonder if it makes sense to maybe go
> > > >> ahead and MADV_DONTNEED the ranges backing these released blocks so the
> > > >> host can at least partially reclaim the memory from a partially
> > > >> unplugged DIMM?
> > > > It's a little bit confusing, one asked to remove device but it's still
> > > > there but not completely usable/available.
> > > > What will happen when user wants that memory plugged back?
> > >
> > > As far as I've understood MADV_DONTNEED, you can use the memory again at
> > > any time - just the previous contents will be gone, which is ok in this
> > > case since the guest previously marked this area as unavailable.
> > If host gave returned memory to someone else there might not be enough
> > resources to give it back (what would happen I can't tell may be VM will
> > stall or just get exception).
>
> It's not really an issue for pseries, since once the LMB is released
> it's totally gone as far as the guest is concerned, and there's no
> way to plug it back in via the still-present DIMM until removal
> completes after, say, reset time.
>
> But, either way, I agree if we'll intend to let the guest recover, it
> would be immediately upon being unable to satisfy the whole unplug and
> not some future time.
>
> >
> > Anyhow I'd suggest ballooning if one needs partial unplug and fix
> > physical unplug to unplug whole pc-dimm or none instead of
> > turning pc-dimm device model into some hybrid with balloon device
> > and making users/mgmt even more confused.
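To make the "whole pc-dimm or none" idea concrete, here is a rough sketch of
what I mean by "offline all + eject or restore original state". It is only a
toy model: struct mem_block, block_offline() and
dimm_unplug_all_or_nothing() are made-up names for illustration, not QEMU
or kernel code, and "pinned" just stands in for a block that holds
non-migratable pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory block (e.g. an LMB) backing a DIMM. */
struct mem_block {
    bool online;   /* currently usable by the guest */
    bool pinned;   /* holds non-migratable pages, cannot be offlined */
};

/* Try to offline a single block; fails if the block is pinned. */
static bool block_offline(struct mem_block *b)
{
    if (b->pinned)
        return false;
    b->online = false;
    return true;
}

/*
 * All-or-nothing unplug: either every block goes offline and the DIMM
 * can be ejected (returns true), or the blocks already offlined are
 * brought back online and the request fails (returns false).
 * Assumption for this sketch: all blocks are online on entry.
 */
bool dimm_unplug_all_or_nothing(struct mem_block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(blocks[i].online);
        if (!block_offline(&blocks[i])) {
            /* Roll back: restore the original state of earlier blocks. */
            for (size_t j = 0; j < i; j++)
                blocks[j].online = true;
            return false;
        }
    }
    return true;
}
```

With one pinned block the call fails and leaves every block online, so the
DIMM is never left half-released; with no pinned blocks it succeeds and the
whole DIMM can go.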
>
> That seems reasonable, I can see why recovering memory from partially
> removed DIMMs overlaps a lot with the ballooning use case...
>
> But I think that kind of leaves the question of how to make memory
> unplug useful in practice? In practice, memory unplug seems quite
> likely to fail in all-or-nothing scenarios.
I'd work on improving the not-yet-mature native unplug support on the
guest side, so that guaranteed unplug becomes available. That would
benefit not only virt, which would be the first big consumer, but
physical systems as well. It would also allow dropping ballooning
support on the guest side in favor of the native solution.

> So if we expect
> all-or-nothing removal in the guest, then it seems like some work
> needs to be done with the balloon driver or elsewhere to provide the
> sort of specificity management would need to know to determine if a
> DIMM has become fully unpluggable, and let the guest make ballooning
> decisions that complement eventual DIMM unplug more effectively.
Currently, using ballooning effectively bars pc-dimm unplug, because the
balloon driver pins all unused pages to itself. So using the two
together might need some work on the ballooning side; I can't tell how
much, though, as I'm familiar neither with ballooning nor with how the
kernel memory allocator works.
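
PS: on the MADV_DONTNEED semantics Thomas described further up ("you can
use the memory again at any time - just the previous contents will be
gone"), a small standalone check along these lines shows the behaviour on
a Linux host. It is only a sketch and assumes a private anonymous mapping;
file-backed or shared memory behaves differently. dontneed_zeroes_page()
is a made-up helper name.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a private anonymous page reads back as zeroes after
 * MADV_DONTNEED and can still be written, 0 if not, -1 if the
 * mapping or the madvise() call itself fails.
 */
int dontneed_zeroes_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAA, len);               /* dirty the page */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int ok = 1;
    for (size_t i = 0; i < len; i++) {  /* old contents should be gone */
        if (p[i] != 0) {
            ok = 0;
            break;
        }
    }

    p[0] = 0x55;                        /* range is still mapped and usable */
    if (p[0] != 0x55)
        ok = 0;

    munmap(p, len);
    return ok;
}
```

On Linux this should return 1. Note that it only shows the range stays
mapped and reads back as zeroes; it says nothing about whether the host
still has free memory to back the page when it is touched again, which is
the concern I raised above.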