On Wed, Mar 27, 2019 at 09:57:45AM +0100, Igor Mammedov wrote:
> On Wed, 27 Mar 2019 11:11:46 +1100
> David Gibson <da...@gibson.dropbear.id.au> wrote:
> 
> > On Tue, Mar 26, 2019 at 03:08:20PM +0100, Igor Mammedov wrote:
> > > On Tue, 26 Mar 2019 14:50:58 +1100
> > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > >   
> > > > qemu_getrampagesize() works out the minimum host page size backing any 
> > > > of
> > > > guest RAM.  This is required in a few places, such as for POWER8 PAPR 
> > > > KVM
> > > > guests, because limitations of the hardware virtualization mean the 
> > > > guest
> > > > can't use pagesizes larger than the host pages backing its memory.
> > > > 
> > > > However, it currently checks against *every* memory backend, whether or 
> > > > not
> > > > it is actually mapped into guest memory at the moment.  This is 
> > > > incorrect.
> > > > 
> > > > This can cause a problem attempting to add memory to a POWER8 pseries 
> > > > KVM
> > > > guest which is configured to allow hugepages in the guest (e.g.
> > > > -machine cap-hpt-max-page-size=16m).  If you attempt to add 
> > > > non-hugepage,
> > > > you can (correctly) create a memory backend, however it (correctly) will
> > > > throw an error when you attempt to map that memory into the guest by
> > > > 'device_add'ing a pc-dimm.
> > > > 
> > > > What's not correct is that if you then reset the guest a startup check
> > > > against qemu_getrampagesize() will cause a fatal error because of the 
> > > > new
> > > > memory object, even though it's not mapped into the guest.  
> > > I'd say that backend should be remove by mgmt app since device_add failed
> > > instead of leaving it to hang around. (but fatal error either not a nice
> > > behavior on QEMU part)  
> > 
> > I agree, but reset could be guest initiated, so even if management is
> > doing the right thing there's a window where it could break.
> 
> We could (log term) get rid of qemu_getrampagesize() (it's not really good
> to push machine/target specific condition into generic function) and move
> pagesize calculation closer to machine. In this case machine (spapr) knows
> exactly when and what is mapped and can enumerate NOT backends but mapped
> memory regions and/or pc-dimms (lets say we have memory_region_page_size()
> and apply whatever policy to the results.

So.. it used to be in the machine specific code, but was made generic
because it's used in the vfio code.  Where it is ppc specific, but not
keyed into the machine specific code in a way that we can really
re-use it there.

> > > > This patch corrects the problem by adjusting 
> > > > find_max_supported_pagesize()
> > > > (called from qemu_getrampagesize() via object_child_foreach) to exclude
> > > > non-mapped memory backends.  
> > > I'm not sure about if it's ok do so. It depends on where from
> > > qemu_getrampagesize() is called. For example s390 calls it rather early
> > > when initializing KVM, so there isn't anything mapped yet.  
> > 
> > Oh.  Drat.
> > 
> > > And once I replace -mem-path with hostmem backend and drop
> > > qemu_mempath_getpagesize(mem_path) /which btw aren't guarantied to be 
> > > mapped either/
> > > this patch might lead to incorrect results for initial memory as
> > > well.  
> > 
> > Uh.. I don't immediately see why.
> It depends on when qemu_getrampagesize() is called with this patch.
> 
> Maybe qemu_getrampagesize() is not a good interface anymore?
> I also see it being called directly from device side hw/vfio/spapr.c, which
> in my opinion should be propagated from machine,

Um.. I'm not sure what you mean by that.

> plus adhoc call
> from kvm parts.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature

Reply via email to