On Wed, 27 Mar 2019 11:11:46 +1100
David Gibson <da...@gibson.dropbear.id.au> wrote:

> On Tue, Mar 26, 2019 at 03:08:20PM +0100, Igor Mammedov wrote:
> > On Tue, 26 Mar 2019 14:50:58 +1100
> > David Gibson <da...@gibson.dropbear.id.au> wrote:
> >   
> > > qemu_getrampagesize() works out the minimum host page size backing any of
> > > guest RAM.  This is required in a few places, such as for POWER8 PAPR KVM
> > > guests, because limitations of the hardware virtualization mean the guest
> > > can't use pagesizes larger than the host pages backing its memory.
> > > 
> > > However, it currently checks against *every* memory backend, whether or 
> > > not
> > > it is actually mapped into guest memory at the moment.  This is incorrect.
> > > 
> > > This can cause a problem attempting to add memory to a POWER8 pseries KVM
> > > guest which is configured to allow hugepages in the guest (e.g.
> > > -machine cap-hpt-max-page-size=16m).  If you attempt to add non-hugepage,
> > > you can (correctly) create a memory backend, however it (correctly) will
> > > throw an error when you attempt to map that memory into the guest by
> > > 'device_add'ing a pc-dimm.
> > > 
> > > What's not correct is that if you then reset the guest a startup check
> > > against qemu_getrampagesize() will cause a fatal error because of the new
> > > memory object, even though it's not mapped into the guest.  
> > I'd say that backend should be remove by mgmt app since device_add failed
> > instead of leaving it to hang around. (but fatal error either not a nice
> > behavior on QEMU part)  
> 
> I agree, but reset could be guest initiated, so even if management is
> doing the right thing there's a window where it could break.

We could (log term) get rid of qemu_getrampagesize() (it's not really good
to push machine/target specific condition into generic function) and move
pagesize calculation closer to machine. In this case machine (spapr) knows
exactly when and what is mapped and can enumerate NOT backends but mapped
memory regions and/or pc-dimms (lets say we have memory_region_page_size()
and apply whatever policy to the results.

> 
> > > This patch corrects the problem by adjusting find_max_supported_pagesize()
> > > (called from qemu_getrampagesize() via object_child_foreach) to exclude
> > > non-mapped memory backends.  
> > I'm not sure about if it's ok do so. It depends on where from
> > qemu_getrampagesize() is called. For example s390 calls it rather early
> > when initializing KVM, so there isn't anything mapped yet.  
> 
> Oh.  Drat.
> 
> > And once I replace -mem-path with hostmem backend and drop
> > qemu_mempath_getpagesize(mem_path) /which btw aren't guarantied to be 
> > mapped either/
> > this patch might lead to incorrect results for initial memory as
> > well.  
> 
> Uh.. I don't immediately see why.
It depends on when qemu_getrampagesize() is called with this patch.

Maybe qemu_getrampagesize() is not a good interface anymore?
I also see it being called directly from device side hw/vfio/spapr.c, which
in my opinion should be propagated from machine, plus adhoc call
from kvm parts.




Reply via email to