Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-04-27 Thread Paolo Bonzini
Il 09/04/2014 17:12, Daniel P. Berrange ha scritto:
> On Tue, Apr 08, 2014 at 03:06:13PM -0300, Marcelo Tosatti wrote:
>> On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote:
>>> We recently had a bunch more feature requests around huge page support
>>> in libvirt, so I think it is preferrable not to merge this currently.
>>> We need to examine the broader problem to come up with a coherant
>>> design for the whole problemspace.
>>
>> There is a number of options implemented in the kernel and qemu that
>> should be exposed via libvirt to the user.
>>
>> Can you explain what are the properties to be verified in the coherent
>> design that you mention which are being ignored in the proposed patchset
> 
> Libvirt needs to be able to report huge pages available per NUMA node,
> to report how many of the pages are free, what size they are. Also need
> to consider how we tell QEMU to use hugepages from specific host NUMA
> nodes when launching guests, possibly also mapping those to specific
> guest NUMA nodes. With NUMA there's more policy considerations if
> enough hugepages aren't available, do we fallback to non-hugepages on
> the same numa node, or to hugepages frorm a different numa node. I
> don't believe we're going to be able to solve all this in libvirt alone
> and so I think we're going to need improve QEMU command line args for
> setting up hugepages

Yes, this is already queued for 2.1.  The same backend device in QEMU
is used for both advanced memory placement and memory hotplug.  It
goes something like this:


  -object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0 \
  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
  -numa node,nodeid=1,cpus=1,memdev=ram-node1

For a non-NUMA guest, just put a single "-numa node,memdev=ram-node0".  The
defaults will apply.

To simulate the effect of "-mem-path ...", instead of memory-ram you use
"-object memory-file,mem-path=...".


You can see above the properties that control policies for host NUMA nodes.
Other properties (merge, dump, prealloc) handle characteristics of the backend
that previously were split across multiple command-line options.  Notably,
prealloc now will work for mmap-backed RAM, not just for hugetlbfs.

Paolo

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-04-09 Thread Daniel P. Berrange
On Tue, Apr 08, 2014 at 03:06:13PM -0300, Marcelo Tosatti wrote:
> On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote:
> > We recently had a bunch more feature requests around huge page support
> > in libvirt, so I think it is preferrable not to merge this currently.
> > We need to examine the broader problem to come up with a coherant
> > design for the whole problemspace.
> 
> There is a number of options implemented in the kernel and qemu that
> should be exposed via libvirt to the user.
> 
> Can you explain what are the properties to be verified in the coherent
> design that you mention which are being ignored in the proposed patchset

Libvirt needs to be able to report huge pages available per NUMA node,
to report how many of the pages are free, what size they are. Also need
to consider how we tell QEMU to use hugepages from specific host NUMA
nodes when launching guests, possibly also mapping those to specific
guest NUMA nodes. With NUMA there's more policy considerations if
enough hugepages aren't available, do we fallback to non-hugepages on
the same numa node, or to hugepages frorm a different numa node. I
don't believe we're going to be able to solve all this in libvirt alone
and so I think we're going to need improve QEMU command line args for
setting up hugepages

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-04-09 Thread Marcelo Tosatti
On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote:
> On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote:
> > On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote:
> > > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> > > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > > > > 
> > > > > Add an element named "strict-hugepages" to control whether to 
> > > > > refuse guest initialization in case hugepage allocation cannot 
> > > > > be performed.
> > > > > 
> > > > > Signed-off-by: Marcelo Tosatti 
> > > > > 
> > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > > > index ff50214..e79f5e6 100644
> > > > > --- a/docs/formatdomain.html.in
> > > > > +++ b/docs/formatdomain.html.in
> > > > > @@ -632,6 +632,9 @@
> > > > >hugepages
> > > > >This tells the hypervisor that the guest should have its 
> > > > > memory
> > > > >  allocated using hugepages instead of the normal native page 
> > > > > size.
> > > > > +  strict-hugepages
> > > > > +  This tells the hypervisor that the guest should refuse to 
> > > > > start
> > > > > +  in case of failure to allocate guest memory with 
> > > > > hugepages
> > > > 
> > > > Huh, we already supply the -mem-prealloc command line flag alongside
> > > > the -mem-path flag which should cause QEMU to exit if it cannot allocate
> > > > all memory upfront.
> > 
> > Daniel,
> > 
> > Can we merge this patch please? 
> > 
> > Paolo thinks it belongs to libvirt.
> 
> We recently had a bunch more feature requests around huge page support
> in libvirt, so I think it is preferrable not to merge this currently.
> We need to examine the broader problem to come up with a coherant
> design for the whole problemspace.

There is a number of options implemented in the kernel and qemu that
should be exposed via libvirt to the user.

Can you explain what are the properties to be verified in the coherent
design that you mention which are being ignored in the proposed patchset
?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-03-17 Thread Daniel P. Berrange
On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote:
> On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote:
> > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > > > 
> > > > Add an element named "strict-hugepages" to control whether to 
> > > > refuse guest initialization in case hugepage allocation cannot 
> > > > be performed.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti 
> > > > 
> > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > > index ff50214..e79f5e6 100644
> > > > --- a/docs/formatdomain.html.in
> > > > +++ b/docs/formatdomain.html.in
> > > > @@ -632,6 +632,9 @@
> > > >hugepages
> > > >This tells the hypervisor that the guest should have its 
> > > > memory
> > > >  allocated using hugepages instead of the normal native page 
> > > > size.
> > > > +  strict-hugepages
> > > > +  This tells the hypervisor that the guest should refuse to 
> > > > start
> > > > +  in case of failure to allocate guest memory with 
> > > > hugepages
> > > 
> > > Huh, we already supply the -mem-prealloc command line flag alongside
> > > the -mem-path flag which should cause QEMU to exit if it cannot allocate
> > > all memory upfront.
> 
> Daniel,
> 
> Can we merge this patch please? 
> 
> Paolo thinks it belongs to libvirt.

We recently had a bunch more feature requests around huge page support
in libvirt, so I think it is preferrable not to merge this currently.
We need to examine the broader problem to come up with a coherant
design for the whole problemspace.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-03-14 Thread Marcelo Tosatti
On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote:
> On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote:
> > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > > > 
> > > > Add an element named "strict-hugepages" to control whether to 
> > > > refuse guest initialization in case hugepage allocation cannot 
> > > > be performed.
> > > > 
> > > > Signed-off-by: Marcelo Tosatti 
> > > > 
> > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > > index ff50214..e79f5e6 100644
> > > > --- a/docs/formatdomain.html.in
> > > > +++ b/docs/formatdomain.html.in
> > > > @@ -632,6 +632,9 @@
> > > >hugepages
> > > >This tells the hypervisor that the guest should have its 
> > > > memory
> > > >  allocated using hugepages instead of the normal native page 
> > > > size.
> > > > +  strict-hugepages
> > > > +  This tells the hypervisor that the guest should refuse to 
> > > > start
> > > > +  in case of failure to allocate guest memory with 
> > > > hugepages
> > > 
> > > Huh, we already supply the -mem-prealloc command line flag alongside
> > > the -mem-path flag which should cause QEMU to exit if it cannot allocate
> > > all memory upfront.
> 
> Daniel,
> 
> Can we merge this patch please? 
> 
> Paolo thinks it belongs to libvirt.
> 

Err, actually, the huge page size patch:


http://www.spinics.net/linux/fedora/libvir/msg92526.html

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-03-14 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote:
> On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > > 
> > > Add an element named "strict-hugepages" to control whether to 
> > > refuse guest initialization in case hugepage allocation cannot 
> > > be performed.
> > > 
> > > Signed-off-by: Marcelo Tosatti 
> > > 
> > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > index ff50214..e79f5e6 100644
> > > --- a/docs/formatdomain.html.in
> > > +++ b/docs/formatdomain.html.in
> > > @@ -632,6 +632,9 @@
> > >hugepages
> > >This tells the hypervisor that the guest should have its memory
> > >  allocated using hugepages instead of the normal native page 
> > > size.
> > > +  strict-hugepages
> > > +  This tells the hypervisor that the guest should refuse to start
> > > +  in case of failure to allocate guest memory with hugepages
> > 
> > Huh, we already supply the -mem-prealloc command line flag alongside
> > the -mem-path flag which should cause QEMU to exit if it cannot allocate
> > all memory upfront.

Daniel,

Can we merge this patch please? 

Paolo thinks it belongs to libvirt.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-05 Thread Marcelo Tosatti
On Wed, Feb 05, 2014 at 09:49:53AM +, Daniel P. Berrange wrote:
> On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote:
> > On 02/04/2014 02:57 PM, Marcelo Tosatti wrote:
> > > 
> > >> So perhaps we do need some "policy" attribute on the 
> > >> element to indicate desired behaviour here.
> > > 
> > > What about the following new element under  ?
> > > 
> > > enforce_hugepage_size=integer 
> > 
> > Which feels a bit redundant (we're already under the 
> > element, after all).  Maybe:
> > 
> > 
> >   1
> > 
> > 
> > where strict could be no if we are giving a hint but don't care if the
> > hint cannot be honored (default yes if omitted), and where unit + value
> > allows the user to input the size in a sensible unit (on output, we'd
> > probably want to use unit='k' and spell out 1048576, for similarity with
> > all our other memory interfaces that output in k for back-compat reasons).
> 
> I don't think strict=yes|no is neccessarily the best. Per my previous
> mail in this thread there are at least 3 possible policies that could
> be implemented. 
> 
>  - Require memory size multiple of hugepage size

Awkward. If the memory size is not multiple of hugepage size it should 
still be possible to start the guest (although using smaller page
sizes).

>  - Round memory size upto multiple of huge page size

QEMU decides, today, whether or not to do this.

>  - Fill in with smaller huge pages

Again, QEMU can decide this automatically.

> So I'd say   policy="round|exact|bestfit"  even if QEMU doesn't decide
> to actually implement all 3 possible policies, we at least futureproof
> ourselves by not using a boolean.

The request is the following: certain applications require the
TLB performance characteristics provided by certain page sizes.
For this type of application, the minimum page size is given. Below that
page size, its known that the application cannot perform as expected, so 
guest initialization should instead fail.

Except that case, there is no necessity to specify the options you list
above. 

IMO the options should not be created unless their practical use has
been verified. So going for


  x


Which is very precise and can be extended.

Thanks for the suggestions

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-05 Thread Daniel P. Berrange
On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote:
> On 02/04/2014 02:57 PM, Marcelo Tosatti wrote:
> > 
> >> So perhaps we do need some "policy" attribute on the 
> >> element to indicate desired behaviour here.
> > 
> > What about the following new element under  ?
> > 
> > enforce_hugepage_size=integer 
> 
> Which feels a bit redundant (we're already under the 
> element, after all).  Maybe:
> 
> 
>   1
> 
> 
> where strict could be no if we are giving a hint but don't care if the
> hint cannot be honored (default yes if omitted), and where unit + value
> allows the user to input the size in a sensible unit (on output, we'd
> probably want to use unit='k' and spell out 1048576, for similarity with
> all our other memory interfaces that output in k for back-compat reasons).

I don't think strict=yes|no is neccessarily the best. Per my previous
mail in this thread there are at least 3 possible policies that could
be implemented. 

 - Require memory size multiple of hugepage size
 - Round memory size upto multiple of huge page size
 - Fill in with smaller huge pages

So I'd say   policy="round|exact|bestfit"  even if QEMU doesn't decide
to actually implement all 3 possible policies, we at least futureproof
ourselves by not using a boolean.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Eric Blake
On 02/04/2014 07:42 PM, Marcelo Tosatti wrote:

>>
>> 
>>   1
>> 
>>
>> where strict could be no if we are giving a hint but don't care if the
>> hint cannot be honored (default yes if omitted), and where unit + value
> 
> Its awkward because you'd always want the largest hugepages size
> possible, to reduce TLB pressure. So its hard to find use for 
> 
> 1
> 
> So what about x ?

XML attributes don't work that way.  Either you have strict='yes' or you
omit it; but if there is no reason to ever not have strict, then it
would be sufficient to have:


  x


where omitting the  subelement is a sign that it doesn't matter.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote:
> On 02/04/2014 02:57 PM, Marcelo Tosatti wrote:
> > 
> >> So perhaps we do need some "policy" attribute on the 
> >> element to indicate desired behaviour here.
> > 
> > What about the following new element under  ?
> > 
> > enforce_hugepage_size=integer 
> 
> Which feels a bit redundant (we're already under the 
> element, after all).  Maybe:
> 
> 
>   1
> 
> 
> where strict could be no if we are giving a hint but don't care if the
> hint cannot be honored (default yes if omitted), and where unit + value

Its awkward because you'd always want the largest hugepages size
possible, to reduce TLB pressure. So its hard to find use for 

1

So what about x ?

> allows the user to input the size in a sensible unit (on output, we'd
> probably want to use unit='k' and spell out 1048576, for similarity with
> all our other memory interfaces that output in k for back-compat reasons).
> 
> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Eric Blake
On 02/04/2014 02:57 PM, Marcelo Tosatti wrote:
> 
>> So perhaps we do need some "policy" attribute on the 
>> element to indicate desired behaviour here.
> 
> What about the following new element under  ?
> 
> enforce_hugepage_size=integer 

Which feels a bit redundant (we're already under the 
element, after all).  Maybe:


  1


where strict could be no if we are giving a hint but don't care if the
hint cannot be honored (default yes if omitted), and where unit + value
allows the user to input the size in a sensible unit (on output, we'd
probably want to use unit='k' and spell out 1048576, for similarity with
all our other memory interfaces that output in k for back-compat reasons).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 05:26:31PM +, Daniel P. Berrange wrote:
> On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote:
> > 
> > You're OK with StrictHugePageSize=N element ?
> 
> I'm not entirely sure what that would do ? Is that for when people want
> to have a specific size of hugepages ? If so, then I guess we might
> actually need some kind of granularity here.

Yes, they want to enforce a specific hugepage size (so as to fail guest
initialization if guest RAM cannot be backed by the particular size).

> eg if the user has a 1200 MB guest, and they want to use 1 GB huge
> pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB
> and n * 4M huge pages, or an error requiring that their guest RAM is
> a multiple of requested huge page size.

ATM QEMU does not support different huge page sizes, so the only 
option would be 2 * 1GB pages.

> So perhaps we do need some "policy" attribute on the 
> element to indicate desired behaviour here.

What about the following new element under  ?

enforce_hugepage_size=integer 


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 05:26:31PM +, Daniel P. Berrange wrote:
> On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote:
> > 
> > You're OK with StrictHugePageSize=N element ?
> 
> I'm not entirely sure what that would do ? Is that for when people want
> to have a specific size of hugepages ? If so, then I guess we might
> actually need some kind of granularity here.
> 
> eg if the user has a 1200 MB guest, and they want to use 1 GB huge
> pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB
> and n * 4M huge pages, or an error requiring that their guest RAM is
> a multiple of requested huge page size.

Yes, they want to fail VM initialization if it cannot be backed by a specific
hugepage size.

> So perhaps we do need some "policy" attribute on the 
> element to indicate desired behaviour here.

OK, thanks.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Daniel P. Berrange
On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote:
> 
> You're OK with StrictHugePageSize=N element ?

I'm not entirely sure what that would do ? Is that for when people want
to have a specific size of hugepages ? If so, then I guess we might
actually need some kind of granularity here.

eg if the user has a 1200 MB guest, and they want to use 1 GB huge
pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB
and n * 4M huge pages, or an error requiring that their guest RAM is
a multiple of requested huge page size.

So perhaps we do need some "policy" attribute on the 
element to indicate desired behaviour here.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 05:10:13PM +, Daniel P. Berrange wrote:
> > Because there is no guarantee with -mem-prealloc. For instance, if the
> > hugepage path is not actually hugetlbfs backed, QEMU falls back to
> > malloc().
> 
> Well if you can't fix -mem-prealloc to properly report errors for reasons
> of back compat, then it is certainly possible to add a further 'strict=yes|no'
> option to the CLI arg request that it verify this. I don't see any reason
> why this checking code should be in libvirt rather than QEMU.
> 
> Regards,
> Daniel

OK, do you want the StrictHugepage element to exist, and default
libvirt behaviour to remain as it is today (fallback to malloc), 

or

do you want libvirt to pass strict mode to -mem-prealloc as default
when using hugepages element? (which would change behaviour of libvirt
for existing guests)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti

You're OK with StrictHugePageSize=N element ?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Daniel P. Berrange
On Tue, Feb 04, 2014 at 12:18:34PM -0500, Marcelo Tosatti wrote:
> On Tue, Feb 04, 2014 at 05:10:13PM +, Daniel P. Berrange wrote:
> > > Because there is no guarantee with -mem-prealloc. For instance, if the
> > > hugepage path is not actually hugetlbfs backed, QEMU falls back to
> > > malloc().
> > 
> > Well if you can't fix -mem-prealloc to properly report errors for reasons
> > of back compat, then it is certainly possible to add a further 
> > 'strict=yes|no'
> > option to the CLI arg request that it verify this. I don't see any reason
> > why this checking code should be in libvirt rather than QEMU.
> > 
> > Regards,
> > Daniel
> 
> OK, do you want the StrictHugepage element to exist, and default
> libvirt behaviour to remain as it is today (fallback to malloc), 
> 
> or
> 
> do you want libvirt to pass strict mode to -mem-prealloc as default
> when using hugepages element? (which would change behaviour of libvirt
> for existing guests)

I'd say the latter - if the user is requesting hugepages they should
be guaranteed to get them. Libvirt doesn't ever use -mem-path with
a non-hugetlbfs filesystem path AFAIR, so the fallback to malloc
doesn't really make any sense for our needs.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Daniel P. Berrange
On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote:
> On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > > 
> > > Add an element named "strict-hugepages" to control whether to 
> > > refuse guest initialization in case hugepage allocation cannot 
> > > be performed.
> > > 
> > > Signed-off-by: Marcelo Tosatti 
> > > 
> > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > > index ff50214..e79f5e6 100644
> > > --- a/docs/formatdomain.html.in
> > > +++ b/docs/formatdomain.html.in
> > > @@ -632,6 +632,9 @@
> > >hugepages
> > >This tells the hypervisor that the guest should have its memory
> > >  allocated using hugepages instead of the normal native page 
> > > size.
> > > +  strict-hugepages
> > > +  This tells the hypervisor that the guest should refuse to start
> > > +  in case of failure to allocate guest memory with hugepages
> > 
> > Huh, we already supply the -mem-prealloc command line flag alongside
> > the -mem-path flag which should cause QEMU to exit if it cannot allocate
> > all memory upfront.
> 
>-mem-path path
>Allocate guest RAM from a temporarily created file in path.
> 
>-mem-prealloc
>Preallocate memory when using -mem-path.
> 
> "should cause QEMU to exit if it cannot allocate all memory upfront" =>
> no it does not. See more below.

I'd say that is a bug - if it is failing to pre-allocate all the
memory when asked todo so, then it should exit with an error IMHO.

> > libvirt to the private implementation detail of QEMU's backing file
> 
> Right. I am going to add a comment to QEMU saying libvirt interprets the
> string (so that changing it breaks libvirt). Other tools already do
> this.

It is just plain bad design to expose internals of QEMU to libvirt in
this way. QEMU is perfectly capable of enforcing this itself and thus
retaining the flexibility to change its impl later, without binding
together libvirt + QEMU in this way.

> > naming. On top of which I don't see why we need this when we're already
> > setting -mem-prealloc to enforce exactly this.
> 
> Because there is no guarantee with -mem-prealloc. For instance, if the
> hugepage path is not actually hugetlbfs backed, QEMU falls back to
> malloc().

Well if you can't fix -mem-prealloc to properly report errors for reasons
of back compat, then it is certainly possible to add a further 'strict=yes|no'
option to the CLI arg request that it verify this. I don't see any reason
why this checking code should be in libvirt rather than QEMU.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > 
> > Add an element named "strict-hugepages" to control whether to 
> > refuse guest initialization in case hugepage allocation cannot 
> > be performed.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > index ff50214..e79f5e6 100644
> > --- a/docs/formatdomain.html.in
> > +++ b/docs/formatdomain.html.in
> > @@ -632,6 +632,9 @@
> >hugepages
> >This tells the hypervisor that the guest should have its memory
> >  allocated using hugepages instead of the normal native page 
> > size.
> > +  strict-hugepages
> > +  This tells the hypervisor that the guest should refuse to start
> > +  in case of failure to allocate guest memory with hugepages
> 
> Huh, we already supply the -mem-prealloc command line flag alongside
> the -mem-path flag which should cause QEMU to exit if it cannot allocate
> all memory upfront.

   -mem-path path
   Allocate guest RAM from a temporarily created file in path.

   -mem-prealloc
   Preallocate memory when using -mem-path.

"should cause QEMU to exit if it cannot allocate all memory upfront" =>
no it does not. See more below.

> > +/*
> > + * Returns bool: whether to fail guest initialization.
> > + *
> > + */
> > +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, 
> > virQEMUDriverConfigPtr cfg)
> > +{
> > +bool ret = false;
> > +char **maps = NULL;
> > +int i;
> > +char *buf;
> > +
> > +if (!vm->def->mem.strict_hugepages)
> > +return ret;
> > +
> > +ret = true;
> > +
> > +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) {
> > +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> > +_("strict huge pages depends on huge pages"));
> > +return ret;
> > +}
> > +
> > +buf = malloc(strlen(cfg->hugepagePath) + 50);
> > +
> > +/* The parser requires /proc/pid, which only exists on platforms
> > + * like Linux where pid_t fits in int.  */
> > +if ((int) vm->pid != vm->pid ||
> > +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0)
> > +goto cleanup;
> > +
> > +for (i = 0; maps && maps[i]; i++) {
> > +char *endptr;
> > +unsigned long start, end;
> > +const char *map = maps[i];
> > +bool found = false;
> > +
> > +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath);
> > +if (strstr(map,buf) != NULL)
> > +found = true;
> > +
> > +sprintf(buf, "%s/kvm.", cfg->hugepagePath);
> > +if (strstr(map,buf) != NULL)
> > +found = true;
> > +
> > +if (!found)
> > +continue;
> > +
> > +errno = 0;
> > +start = strtol(map, &endptr, 16);
> > +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN))
> > +|| (errno != 0 && start == 0)) {
> > +continue;
> > +}
> > +
> > +if (endptr && *endptr == '-')
> > +endptr++;
> > +
> > +if (!*endptr)
> > +continue;
> > +
> > +errno = 0;
> > +end = strtol(endptr, NULL, 16);
> > +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN))
> > +|| (errno != 0 && end == 0)) {
> > +continue;
> > +}
> > +
> > +if (end-start >= vm->def->mem.max_balloon * 1024) {
> > +ret = false;
> > +break;
> > +}
> > +}
> 
> This code is really very nasty. It is both Linux specific and exposes

Hugetlbfs is Linux specific. So all Hugetlbfs code in libvirt is already
Linux specific.

> libvirt to the private implementation detail of QEMU's backing file

Right. I am going to add a comment to QEMU saying libvirt interprets the
string (so that changing it breaks libvirt). Other tools already do
this.

> naming. On top of which I don't see why we need this when we're already
> setting -mem-prealloc to enforce exactly this.

Because there is no guarantee with -mem-prealloc. For instance, if the
hugepage path is not actually hugetlbfs backed, QEMU falls back to
malloc(). 

A new element StrictHugePageSize=N has been requested, where it is
necessary to fail in case hugepagesize != N.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti
On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote:
> On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> > 
> > Add an element named "strict-hugepages" to control whether to 
> > refuse guest initialization in case hugepage allocation cannot 
> > be performed.
> > 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> > index ff50214..e79f5e6 100644
> > --- a/docs/formatdomain.html.in
> > +++ b/docs/formatdomain.html.in
> > @@ -632,6 +632,9 @@
> >hugepages
> >This tells the hypervisor that the guest should have its memory
> >  allocated using hugepages instead of the normal native page 
> > size.
> > +  strict-hugepages
> > +  This tells the hypervisor that the guest should refuse to start
> > +  in case of failure to allocate guest memory with hugepages
> 
> Huh, we already supply the -mem-prealloc command line flag alongside
> the -mem-path flag which should cause QEMU to exit if it cannot allocate
> all memory upfront.
> 
> > +/*
> > + * Returns bool: whether to fail guest initialization.
> > + *
> > + */
> > +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, 
> > virQEMUDriverConfigPtr cfg)
> > +{
> > +bool ret = false;
> > +char **maps = NULL;
> > +int i;
> > +char *buf;
> > +
> > +if (!vm->def->mem.strict_hugepages)
> > +return ret;
> > +
> > +ret = true;
> > +
> > +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) {
> > +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> > +_("strict huge pages depends on huge pages"));
> > +return ret;
> > +}
> > +
> > +buf = malloc(strlen(cfg->hugepagePath) + 50);
> > +
> > +/* The parser requires /proc/pid, which only exists on platforms
> > + * like Linux where pid_t fits in int.  */
> > +if ((int) vm->pid != vm->pid ||
> > +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0)
> > +goto cleanup;
> > +
> > +for (i = 0; maps && maps[i]; i++) {
> > +char *endptr;
> > +unsigned long start, end;
> > +const char *map = maps[i];
> > +bool found = false;
> > +
> > +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath);
> > +if (strstr(map,buf) != NULL)
> > +found = true;
> > +
> > +sprintf(buf, "%s/kvm.", cfg->hugepagePath);
> > +if (strstr(map,buf) != NULL)
> > +found = true;
> > +
> > +if (!found)
> > +continue;
> > +
> > +errno = 0;
> > +start = strtol(map, &endptr, 16);
> > +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN))
> > +|| (errno != 0 && start == 0)) {
> > +continue;
> > +}
> > +
> > +if (endptr && *endptr == '-')
> > +endptr++;
> > +
> > +if (!*endptr)
> > +continue;
> > +
> > +errno = 0;
> > +end = strtol(endptr, NULL, 16);
> > +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN))
> > +|| (errno != 0 && end == 0)) {
> > +continue;
> > +}
> > +
> > +if (end-start >= vm->def->mem.max_balloon * 1024) {
> > +ret = false;
> > +break;
> > +}
> > +}
> 
> This code is really very nasty. It is both Linux specific and exposes
> libvirt to the private implementation detail of QEMU's backing file
> naming. On top of which I don't see why we need this when we're already
> setting -mem-prealloc to enforce exactly this.

BTW, you already do /proc parsing and already do hugetlbfs which is
Linux specific.

So if you can detail in what new way this code is nasty (accordingly to
your judgement above), i can look into improving it.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Daniel P. Berrange
On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote:
> 
> Add an element named "strict-hugepages" to control whether to 
> refuse guest initialization in case hugepage allocation cannot 
> be performed.
> 
> Signed-off-by: Marcelo Tosatti 
> 
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index ff50214..e79f5e6 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -632,6 +632,9 @@
>hugepages
>This tells the hypervisor that the guest should have its memory
>  allocated using hugepages instead of the normal native page 
> size.
> +  strict-hugepages
> +  This tells the hypervisor that the guest should refuse to start
> +  in case of failure to allocate guest memory with hugepages

Huh, we already supply the -mem-prealloc command line flag alongside
the -mem-path flag which should cause QEMU to exit if it cannot allocate
all memory upfront.

> +/*
> + * Returns bool: whether to fail guest initialization.
> + *
> + */
> +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, 
> virQEMUDriverConfigPtr cfg)
> +{
> +bool ret = false;
> +char **maps = NULL;
> +int i;
> +char *buf;
> +
> +if (!vm->def->mem.strict_hugepages)
> +return ret;
> +
> +ret = true;
> +
> +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) {
> +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> +_("strict huge pages depends on huge pages"));
> +return ret;
> +}
> +
> +buf = malloc(strlen(cfg->hugepagePath) + 50);
> +
> +/* The parser requires /proc/pid, which only exists on platforms
> + * like Linux where pid_t fits in int.  */
> +if ((int) vm->pid != vm->pid ||
> +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0)
> +goto cleanup;
> +
> +for (i = 0; maps && maps[i]; i++) {
> +char *endptr;
> +unsigned long start, end;
> +const char *map = maps[i];
> +bool found = false;
> +
> +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath);
> +if (strstr(map,buf) != NULL)
> +found = true;
> +
> +sprintf(buf, "%s/kvm.", cfg->hugepagePath);
> +if (strstr(map,buf) != NULL)
> +found = true;
> +
> +if (!found)
> +continue;
> +
> +errno = 0;
> +start = strtol(map, &endptr, 16);
> +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN))
> +|| (errno != 0 && start == 0)) {
> +continue;
> +}
> +
> +if (endptr && *endptr == '-')
> +endptr++;
> +
> +if (!*endptr)
> +continue;
> +
> +errno = 0;
> +end = strtol(endptr, NULL, 16);
> +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN))
> +|| (errno != 0 && end == 0)) {
> +continue;
> +}
> +
> +if (end-start >= vm->def->mem.max_balloon * 1024) {
> +ret = false;
> +break;
> +}
> +}

This code is really very nasty. It is both Linux specific and exposes
libvirt to the private implementation detail of QEMU's backing file
naming. On top of which I don't see why we need this when we're already
setting -mem-prealloc to enforce exactly this.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM

2014-02-04 Thread Marcelo Tosatti

Add an element named "strict-hugepages" to control whether to 
refuse guest initialization in case hugepage allocation cannot 
be performed.

Signed-off-by: Marcelo Tosatti 

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index ff50214..e79f5e6 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -632,6 +632,9 @@
   hugepages
   This tells the hypervisor that the guest should have its memory
 allocated using hugepages instead of the normal native page size.
+  strict-hugepages
+  This tells the hypervisor that the guest should refuse to start
+  in case of failure to allocate guest memory with hugepages
   nosharepages
   Instructs hypervisor to disable shared pages (memory merge, KSM) for
 this domain. Since 1.0.6
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 28e24f9..f16ef0b 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -11226,6 +11226,9 @@ virDomainDefParseXML(xmlDocPtr xml,
 if (virXPathBoolean("boolean(./memoryBacking/locked)", ctxt))
 def->mem.locked = true;
 
+if ((node = virXPathNode("./memoryBacking/stricthugepages", ctxt)))
+def->mem.strict_hugepages = true;
+
 /* Extract blkio cgroup tunables */
 if (virXPathUInt("string(./blkiotune/weight)", ctxt,
  &def->blkio.weight) < 0)
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index d8f2e49..8ea5cf0 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1977,6 +1977,7 @@ struct _virDomainDef {
 unsigned long long max_balloon; /* in kibibytes */
 unsigned long long cur_balloon; /* in kibibytes */
 bool hugepage_backed;
+bool strict_hugepages;
 bool nosharepages;
 bool locked;
 int dump_core; /* enum virDomainMemDump */
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 96b8825..3f8d0a4 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -12133,10 +12133,9 @@ cleanup:
 return def;
 }
 
-
-static int qemuParseProcFileStrings(int pid_value,
-const char *name,
-char ***list)
+int qemuParseProcFileStrings(int pid_value,
+ const char *name,
+ char ***list)
 {
 char *path = NULL;
 int ret = -1;
diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h
index de7683d..bcdfefa 100644
--- a/src/qemu/qemu_command.h
+++ b/src/qemu/qemu_command.h
@@ -226,7 +226,9 @@ virDomainDefPtr qemuParseCommandLinePid(virCapsPtr qemuCaps,
 char **pidfile,
 virDomainChrSourceDefPtr *monConfig,
 bool *monJSON);
-
+int qemuParseProcFileStrings(int pid_value,
+ const char *name,
+ char ***list);
 int qemuDomainAssignAddresses(virDomainDefPtr def,
   virQEMUCapsPtr qemuCaps,
   virDomainObjPtr obj)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 8bcd98e..cb8298e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #if defined(__linux__)
 # include 
 #elif defined(__FreeBSD__)
@@ -3507,6 +3508,95 @@ error:
 }
 
 
+/*
+ * Returns bool: whether to fail guest initialization.
+ *
+ */
+static bool qemuValidateStrictHugepage(virDomainObjPtr vm, 
virQEMUDriverConfigPtr cfg)
+{
+bool ret = false;
+char **maps = NULL;
+int i;
+char *buf;
+
+if (!vm->def->mem.strict_hugepages)
+return ret;
+
+ret = true;
+
+if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+_("strict huge pages depends on huge pages"));
+return ret;
+}
+
+buf = malloc(strlen(cfg->hugepagePath) + 50);
+
+/* The parser requires /proc/pid, which only exists on platforms
+ * like Linux where pid_t fits in int.  */
+if ((int) vm->pid != vm->pid ||
+qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0)
+goto cleanup;
+
+for (i = 0; maps && maps[i]; i++) {
+char *endptr;
+unsigned long start, end;
+const char *map = maps[i];
+bool found = false;
+
+sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath);
+if (strstr(map,buf) != NULL)
+found = true;
+
+sprintf(buf, "%s/kvm.", cfg->hugepagePath);
+if (strstr(map,buf) != NULL)
+found = true;
+
+if (!found)
+continue;
+
+errno = 0;
+start = strtol(map, &endptr, 16);
+if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN))
+|| (errno != 0 && start == 0)) {
+continue;
+}
+
+