Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
Il 09/04/2014 17:12, Daniel P. Berrange ha scritto: > On Tue, Apr 08, 2014 at 03:06:13PM -0300, Marcelo Tosatti wrote: >> On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote: >>> We recently had a bunch more feature requests around huge page support >>> in libvirt, so I think it is preferrable not to merge this currently. >>> We need to examine the broader problem to come up with a coherant >>> design for the whole problemspace. >> >> There is a number of options implemented in the kernel and qemu that >> should be exposed via libvirt to the user. >> >> Can you explain what are the properties to be verified in the coherent >> design that you mention which are being ignored in the proposed patchset > > Libvirt needs to be able to report huge pages available per NUMA node, > to report how many of the pages are free, what size they are. Also need > to consider how we tell QEMU to use hugepages from specific host NUMA > nodes when launching guests, possibly also mapping those to specific > guest NUMA nodes. With NUMA there's more policy considerations if > enough hugepages aren't available, do we fallback to non-hugepages on > the same numa node, or to hugepages frorm a different numa node. I > don't believe we're going to be able to solve all this in libvirt alone > and so I think we're going to need improve QEMU command line args for > setting up hugepages Yes, this is already queued for 2.1. The same backend device in QEMU is used for both advanced memory placement and memory hotplug. It goes something like this: -object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0 \ -numa node,nodeid=0,cpus=0,memdev=ram-node0 \ -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \ -numa node,nodeid=1,cpus=1,memdev=ram-node1 For a non-NUMA guest, just put a single "-numa node,memdev=ram-node0". The defaults will apply. To simulate the effect of "-mem-path ...", instead of memory-ram you use "-object memory-file,mem-path=...". You can see above the properties that control policies for host NUMA nodes. Other properties (merge, dump, prealloc) handle characteristics of the backend that previously were split across multiple command-line options. Notably, prealloc now will work for mmap-backed RAM, not just for hugetlbfs. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Apr 08, 2014 at 03:06:13PM -0300, Marcelo Tosatti wrote: > On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote: > > We recently had a bunch more feature requests around huge page support > > in libvirt, so I think it is preferrable not to merge this currently. > > We need to examine the broader problem to come up with a coherant > > design for the whole problemspace. > > There is a number of options implemented in the kernel and qemu that > should be exposed via libvirt to the user. > > Can you explain what are the properties to be verified in the coherent > design that you mention which are being ignored in the proposed patchset Libvirt needs to be able to report huge pages available per NUMA node, to report how many of the pages are free, what size they are. Also need to consider how we tell QEMU to use hugepages from specific host NUMA nodes when launching guests, possibly also mapping those to specific guest NUMA nodes. With NUMA there's more policy considerations if enough hugepages aren't available, do we fallback to non-hugepages on the same numa node, or to hugepages frorm a different numa node. I don't believe we're going to be able to solve all this in libvirt alone and so I think we're going to need improve QEMU command line args for setting up hugepages Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Mon, Mar 17, 2014 at 09:39:54AM +, Daniel P. Berrange wrote: > On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote: > > On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote: > > > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > > > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > > > > > > > Add an element named "strict-hugepages" to control whether to > > > > > refuse guest initialization in case hugepage allocation cannot > > > > > be performed. > > > > > > > > > > Signed-off-by: Marcelo Tosatti > > > > > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > > > index ff50214..e79f5e6 100644 > > > > > --- a/docs/formatdomain.html.in > > > > > +++ b/docs/formatdomain.html.in > > > > > @@ -632,6 +632,9 @@ > > > > >hugepages > > > > >This tells the hypervisor that the guest should have its > > > > > memory > > > > > allocated using hugepages instead of the normal native page > > > > > size. > > > > > + strict-hugepages > > > > > + This tells the hypervisor that the guest should refuse to > > > > > start > > > > > + in case of failure to allocate guest memory with > > > > > hugepages > > > > > > > > Huh, we already supply the -mem-prealloc command line flag alongside > > > > the -mem-path flag which should cause QEMU to exit if it cannot allocate > > > > all memory upfront. > > > > Daniel, > > > > Can we merge this patch please? > > > > Paolo thinks it belongs to libvirt. > > We recently had a bunch more feature requests around huge page support > in libvirt, so I think it is preferrable not to merge this currently. > We need to examine the broader problem to come up with a coherant > design for the whole problemspace. There is a number of options implemented in the kernel and qemu that should be exposed via libvirt to the user. Can you explain what are the properties to be verified in the coherent design that you mention which are being ignored in the proposed patchset ? -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote: > On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote: > > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > > > > > Add an element named "strict-hugepages" to control whether to > > > > refuse guest initialization in case hugepage allocation cannot > > > > be performed. > > > > > > > > Signed-off-by: Marcelo Tosatti > > > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > > index ff50214..e79f5e6 100644 > > > > --- a/docs/formatdomain.html.in > > > > +++ b/docs/formatdomain.html.in > > > > @@ -632,6 +632,9 @@ > > > >hugepages > > > >This tells the hypervisor that the guest should have its > > > > memory > > > > allocated using hugepages instead of the normal native page > > > > size. > > > > + strict-hugepages > > > > + This tells the hypervisor that the guest should refuse to > > > > start > > > > + in case of failure to allocate guest memory with > > > > hugepages > > > > > > Huh, we already supply the -mem-prealloc command line flag alongside > > > the -mem-path flag which should cause QEMU to exit if it cannot allocate > > > all memory upfront. > > Daniel, > > Can we merge this patch please? > > Paolo thinks it belongs to libvirt. We recently had a bunch more feature requests around huge page support in libvirt, so I think it is preferrable not to merge this currently. We need to examine the broader problem to come up with a coherant design for the whole problemspace. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Fri, Mar 14, 2014 at 06:52:19PM -0300, Marcelo Tosatti wrote: > On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote: > > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > > > > > Add an element named "strict-hugepages" to control whether to > > > > refuse guest initialization in case hugepage allocation cannot > > > > be performed. > > > > > > > > Signed-off-by: Marcelo Tosatti > > > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > > index ff50214..e79f5e6 100644 > > > > --- a/docs/formatdomain.html.in > > > > +++ b/docs/formatdomain.html.in > > > > @@ -632,6 +632,9 @@ > > > >hugepages > > > >This tells the hypervisor that the guest should have its > > > > memory > > > > allocated using hugepages instead of the normal native page > > > > size. > > > > + strict-hugepages > > > > + This tells the hypervisor that the guest should refuse to > > > > start > > > > + in case of failure to allocate guest memory with > > > > hugepages > > > > > > Huh, we already supply the -mem-prealloc command line flag alongside > > > the -mem-path flag which should cause QEMU to exit if it cannot allocate > > > all memory upfront. > > Daniel, > > Can we merge this patch please? > > Paolo thinks it belongs to libvirt. > Err, actually, the huge page size patch: http://www.spinics.net/linux/fedora/libvir/msg92526.html -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote: > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > > > Add an element named "strict-hugepages" to control whether to > > > refuse guest initialization in case hugepage allocation cannot > > > be performed. > > > > > > Signed-off-by: Marcelo Tosatti > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > index ff50214..e79f5e6 100644 > > > --- a/docs/formatdomain.html.in > > > +++ b/docs/formatdomain.html.in > > > @@ -632,6 +632,9 @@ > > >hugepages > > >This tells the hypervisor that the guest should have its memory > > > allocated using hugepages instead of the normal native page > > > size. > > > + strict-hugepages > > > + This tells the hypervisor that the guest should refuse to start > > > + in case of failure to allocate guest memory with hugepages > > > > Huh, we already supply the -mem-prealloc command line flag alongside > > the -mem-path flag which should cause QEMU to exit if it cannot allocate > > all memory upfront. Daniel, Can we merge this patch please? Paolo thinks it belongs to libvirt. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Wed, Feb 05, 2014 at 09:49:53AM +, Daniel P. Berrange wrote: > On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote: > > On 02/04/2014 02:57 PM, Marcelo Tosatti wrote: > > > > > >> So perhaps we do need some "policy" attribute on the > > >> element to indicate desired behaviour here. > > > > > > What about the following new element under ? > > > > > > enforce_hugepage_size=integer > > > > Which feels a bit redundant (we're already under the > > element, after all). Maybe: > > > > > > 1 > > > > > > where strict could be no if we are giving a hint but don't care if the > > hint cannot be honored (default yes if omitted), and where unit + value > > allows the user to input the size in a sensible unit (on output, we'd > > probably want to use unit='k' and spell out 1048576, for similarity with > > all our other memory interfaces that output in k for back-compat reasons). > > I don't think strict=yes|no is neccessarily the best. Per my previous > mail in this thread there are at least 3 possible policies that could > be implemented. > > - Require memory size multiple of hugepage size Awkward. If the memory size is not multiple of hugepage size it should still be possible to start the guest (although using smaller page sizes). > - Round memory size upto multiple of huge page size QEMU decides, today, whether or not to do this. > - Fill in with smaller huge pages Again, QEMU can decide this automatically. > So I'd say policy="round|exact|bestfit" even if QEMU doesn't decide > to actually implement all 3 possible policies, we at least futureproof > ourselves by not using a boolean. The request is the following: certain applications require the TLB performance characteristics provided by certain page sizes. For this type of application, the minimum page size is given. Below that page size, its known that the application cannot perform as expected, so guest initialization should instead fail. Except that case, there is no necessity to specify the options you list above. IMO the options should not be created unless their practical use has been verified. So going for x Which is very precise and can be extended. Thanks for the suggestions -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote: > On 02/04/2014 02:57 PM, Marcelo Tosatti wrote: > > > >> So perhaps we do need some "policy" attribute on the > >> element to indicate desired behaviour here. > > > > What about the following new element under ? > > > > enforce_hugepage_size=integer > > Which feels a bit redundant (we're already under the > element, after all). Maybe: > > > 1 > > > where strict could be no if we are giving a hint but don't care if the > hint cannot be honored (default yes if omitted), and where unit + value > allows the user to input the size in a sensible unit (on output, we'd > probably want to use unit='k' and spell out 1048576, for similarity with > all our other memory interfaces that output in k for back-compat reasons). I don't think strict=yes|no is neccessarily the best. Per my previous mail in this thread there are at least 3 possible policies that could be implemented. - Require memory size multiple of hugepage size - Round memory size upto multiple of huge page size - Fill in with smaller huge pages So I'd say policy="round|exact|bestfit" even if QEMU doesn't decide to actually implement all 3 possible policies, we at least futureproof ourselves by not using a boolean. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On 02/04/2014 07:42 PM, Marcelo Tosatti wrote: >> >> >> 1 >> >> >> where strict could be no if we are giving a hint but don't care if the >> hint cannot be honored (default yes if omitted), and where unit + value > > Its awkward because you'd always want the largest hugepages size > possible, to reduce TLB pressure. So its hard to find use for > > 1 > > So what about x ? XML attributes don't work that way. Either you have strict='yes' or you omit it; but if there is no reason to ever not have strict, then it would be sufficient to have: x where omitting the subelement is a sign that it doesn't matter. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 03:04:11PM -0700, Eric Blake wrote: > On 02/04/2014 02:57 PM, Marcelo Tosatti wrote: > > > >> So perhaps we do need some "policy" attribute on the > >> element to indicate desired behaviour here. > > > > What about the following new element under ? > > > > enforce_hugepage_size=integer > > Which feels a bit redundant (we're already under the > element, after all). Maybe: > > > 1 > > > where strict could be no if we are giving a hint but don't care if the > hint cannot be honored (default yes if omitted), and where unit + value Its awkward because you'd always want the largest hugepages size possible, to reduce TLB pressure. So its hard to find use for 1 So what about x ? > allows the user to input the size in a sensible unit (on output, we'd > probably want to use unit='k' and spell out 1048576, for similarity with > all our other memory interfaces that output in k for back-compat reasons). > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On 02/04/2014 02:57 PM, Marcelo Tosatti wrote: > >> So perhaps we do need some "policy" attribute on the >> element to indicate desired behaviour here. > > What about the following new element under ? > > enforce_hugepage_size=integer Which feels a bit redundant (we're already under the element, after all). Maybe: 1 where strict could be no if we are giving a hint but don't care if the hint cannot be honored (default yes if omitted), and where unit + value allows the user to input the size in a sensible unit (on output, we'd probably want to use unit='k' and spell out 1048576, for similarity with all our other memory interfaces that output in k for back-compat reasons). -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 05:26:31PM +, Daniel P. Berrange wrote: > On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote: > > > > You're OK with StrictHugePageSize=N element ? > > I'm not entirely sure what that would do ? Is that for when people want > to have a specific size of hugepages ? If so, then I guess we might > actually need some kind of granularity here. Yes, they want to enforce a specific hugepage size (so as to fail guest initialization if guest RAM cannot be backed by the particular size). > eg if the user has a 1200 MB guest, and they want to use 1 GB huge > pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB > and n * 4M huge pages, or an error requiring that their guest RAM is > a multiple of requested huge page size. ATM QEMU does not support different huge page sizes, so the only option would be 2 * 1GB pages. > So perhaps we do need some "policy" attribute on the > element to indicate desired behaviour here. What about the following new element under ? enforce_hugepage_size=integer -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 05:26:31PM +, Daniel P. Berrange wrote: > On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote: > > > > You're OK with StrictHugePageSize=N element ? > > I'm not entirely sure what that would do ? Is that for when people want > to have a specific size of hugepages ? If so, then I guess we might > actually need some kind of granularity here. > > eg if the user has a 1200 MB guest, and they want to use 1 GB huge > pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB > and n * 4M huge pages, or an error requiring that their guest RAM is > a multiple of requested huge page size. Yes, they want to fail VM initialization if it cannot be backed by a specific hugepage size. > So perhaps we do need some "policy" attribute on the > element to indicate desired behaviour here. OK, thanks. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 12:22:00PM -0500, Marcelo Tosatti wrote: > > You're OK with StrictHugePageSize=N element ? I'm not entirely sure what that would do ? Is that for when people want to have a specific size of hugepages ? If so, then I guess we might actually need some kind of granularity here. eg if the user has a 1200 MB guest, and they want to use 1 GB huge pages, then there's a choice between 2 * 1GB huge pages, or 1 x 1GB and n * 4M huge pages, or an error requiring that their guest RAM is a multiple of requested huge page size. So perhaps we do need some "policy" attribute on the element to indicate desired behaviour here. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 05:10:13PM +, Daniel P. Berrange wrote: > > Because there is no guarantee with -mem-prealloc. For instance, if the > > hugepage path is not actually hugetlbfs backed, QEMU falls back to > > malloc(). > > Well if you can't fix -mem-prealloc to properly report errors for reasons > of back compat, then it is certainly possible to add a further 'strict=yes|no' > option to the CLI arg request that it verify this. I don't see any reason > why this checking code should be in libvirt rather than QEMU. > > Regards, > Daniel OK, do you want the StrictHugepage element to exist, and default libvirt behaviour to remain as it is today (fallback to malloc), or do you want libvirt to pass strict mode to -mem-prealloc as default when using hugepages element? (which would change behaviour of libvirt for existing guests) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
You're OK with StrictHugePageSize=N element ? -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 12:18:34PM -0500, Marcelo Tosatti wrote: > On Tue, Feb 04, 2014 at 05:10:13PM +, Daniel P. Berrange wrote: > > > Because there is no guarantee with -mem-prealloc. For instance, if the > > > hugepage path is not actually hugetlbfs backed, QEMU falls back to > > > malloc(). > > > > Well if you can't fix -mem-prealloc to properly report errors for reasons > > of back compat, then it is certainly possible to add a further > > 'strict=yes|no' > > option to the CLI arg request that it verify this. I don't see any reason > > why this checking code should be in libvirt rather than QEMU. > > > > Regards, > > Daniel > > OK, do you want the StrictHugepage element to exist, and default > libvirt behaviour to remain as it is today (fallback to malloc), > > or > > do you want libvirt to pass strict mode to -mem-prealloc as default > when using hugepages element? (which would change behaviour of libvirt > for existing guests) I'd say the latter - if the user is requesting hugepages they should be guaranteed to get them. Libvirt doesn't ever use -mem-path with a non-hugetlbfs filesystem path AFAIR, so the fallback to malloc doesn't really make any sense for our needs. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 11:54:22AM -0500, Marcelo Tosatti wrote: > On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > > > Add an element named "strict-hugepages" to control whether to > > > refuse guest initialization in case hugepage allocation cannot > > > be performed. > > > > > > Signed-off-by: Marcelo Tosatti > > > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > > index ff50214..e79f5e6 100644 > > > --- a/docs/formatdomain.html.in > > > +++ b/docs/formatdomain.html.in > > > @@ -632,6 +632,9 @@ > > >hugepages > > >This tells the hypervisor that the guest should have its memory > > > allocated using hugepages instead of the normal native page > > > size. > > > + strict-hugepages > > > + This tells the hypervisor that the guest should refuse to start > > > + in case of failure to allocate guest memory with hugepages > > > > Huh, we already supply the -mem-prealloc command line flag alongside > > the -mem-path flag which should cause QEMU to exit if it cannot allocate > > all memory upfront. > >-mem-path path >Allocate guest RAM from a temporarily created file in path. > >-mem-prealloc >Preallocate memory when using -mem-path. > > "should cause QEMU to exit if it cannot allocate all memory upfront" => > no it does not. See more below. I'd say that is a bug - if it is failing to pre-allocate all the memory when asked todo so, then it should exit with an error IMHO. > > libvirt to the private implementation detail of QEMU's backing file > > Right. I am going to add a comment to QEMU saying libvirt interprets the > string (so that changing it breaks libvirt). Other tools already do > this. It is just plain bad design to expose internals of QEMU to libvirt in this way. QEMU is perfectly capable of enforcing this itself and thus retaining the flexibility to change its impl later, without binding together libvirt + QEMU in this way. > > naming. On top of which I don't see why we need this when we're already > > setting -mem-prealloc to enforce exactly this. > > Because there is no guarantee with -mem-prealloc. For instance, if the > hugepage path is not actually hugetlbfs backed, QEMU falls back to > malloc(). Well if you can't fix -mem-prealloc to properly report errors for reasons of back compat, then it is certainly possible to add a further 'strict=yes|no' option to the CLI arg request that it verify this. I don't see any reason why this checking code should be in libvirt rather than QEMU. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > Add an element named "strict-hugepages" to control whether to > > refuse guest initialization in case hugepage allocation cannot > > be performed. > > > > Signed-off-by: Marcelo Tosatti > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > index ff50214..e79f5e6 100644 > > --- a/docs/formatdomain.html.in > > +++ b/docs/formatdomain.html.in > > @@ -632,6 +632,9 @@ > >hugepages > >This tells the hypervisor that the guest should have its memory > > allocated using hugepages instead of the normal native page > > size. > > + strict-hugepages > > + This tells the hypervisor that the guest should refuse to start > > + in case of failure to allocate guest memory with hugepages > > Huh, we already supply the -mem-prealloc command line flag alongside > the -mem-path flag which should cause QEMU to exit if it cannot allocate > all memory upfront. -mem-path path Allocate guest RAM from a temporarily created file in path. -mem-prealloc Preallocate memory when using -mem-path. "should cause QEMU to exit if it cannot allocate all memory upfront" => no it does not. See more below. > > +/* > > + * Returns bool: whether to fail guest initialization. > > + * > > + */ > > +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, > > virQEMUDriverConfigPtr cfg) > > +{ > > +bool ret = false; > > +char **maps = NULL; > > +int i; > > +char *buf; > > + > > +if (!vm->def->mem.strict_hugepages) > > +return ret; > > + > > +ret = true; > > + > > +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) { > > +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > > +_("strict huge pages depends on huge pages")); > > +return ret; > > +} > > + > > +buf = malloc(strlen(cfg->hugepagePath) + 50); > > + > > +/* The parser requires /proc/pid, which only exists on platforms > > + * like Linux where pid_t fits in int. */ > > +if ((int) vm->pid != vm->pid || > > +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0) > > +goto cleanup; > > + > > +for (i = 0; maps && maps[i]; i++) { > > +char *endptr; > > +unsigned long start, end; > > +const char *map = maps[i]; > > +bool found = false; > > + > > +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath); > > +if (strstr(map,buf) != NULL) > > +found = true; > > + > > +sprintf(buf, "%s/kvm.", cfg->hugepagePath); > > +if (strstr(map,buf) != NULL) > > +found = true; > > + > > +if (!found) > > +continue; > > + > > +errno = 0; > > +start = strtol(map, &endptr, 16); > > +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN)) > > +|| (errno != 0 && start == 0)) { > > +continue; > > +} > > + > > +if (endptr && *endptr == '-') > > +endptr++; > > + > > +if (!*endptr) > > +continue; > > + > > +errno = 0; > > +end = strtol(endptr, NULL, 16); > > +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN)) > > +|| (errno != 0 && end == 0)) { > > +continue; > > +} > > + > > +if (end-start >= vm->def->mem.max_balloon * 1024) { > > +ret = false; > > +break; > > +} > > +} > > This code is really very nasty. It is both Linux specific and exposes Hugetlbfs is Linux specific. So all Hugetlbfs code in libvirt is already Linux specific. > libvirt to the private implementation detail of QEMU's backing file Right. I am going to add a comment to QEMU saying libvirt interprets the string (so that changing it breaks libvirt). Other tools already do this. > naming. On top of which I don't see why we need this when we're already > setting -mem-prealloc to enforce exactly this. Because there is no guarantee with -mem-prealloc. For instance, if the hugepage path is not actually hugetlbfs backed, QEMU falls back to malloc(). A new element StrictHugePageSize=N has been requested, where it is necessary to fail in case hugepagesize != N. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 04:42:02PM +, Daniel P. Berrange wrote: > On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > > > Add an element named "strict-hugepages" to control whether to > > refuse guest initialization in case hugepage allocation cannot > > be performed. > > > > Signed-off-by: Marcelo Tosatti > > > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > > index ff50214..e79f5e6 100644 > > --- a/docs/formatdomain.html.in > > +++ b/docs/formatdomain.html.in > > @@ -632,6 +632,9 @@ > >hugepages > >This tells the hypervisor that the guest should have its memory > > allocated using hugepages instead of the normal native page > > size. > > + strict-hugepages > > + This tells the hypervisor that the guest should refuse to start > > + in case of failure to allocate guest memory with hugepages > > Huh, we already supply the -mem-prealloc command line flag alongside > the -mem-path flag which should cause QEMU to exit if it cannot allocate > all memory upfront. > > > +/* > > + * Returns bool: whether to fail guest initialization. > > + * > > + */ > > +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, > > virQEMUDriverConfigPtr cfg) > > +{ > > +bool ret = false; > > +char **maps = NULL; > > +int i; > > +char *buf; > > + > > +if (!vm->def->mem.strict_hugepages) > > +return ret; > > + > > +ret = true; > > + > > +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) { > > +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > > +_("strict huge pages depends on huge pages")); > > +return ret; > > +} > > + > > +buf = malloc(strlen(cfg->hugepagePath) + 50); > > + > > +/* The parser requires /proc/pid, which only exists on platforms > > + * like Linux where pid_t fits in int. */ > > +if ((int) vm->pid != vm->pid || > > +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0) > > +goto cleanup; > > + > > +for (i = 0; maps && maps[i]; i++) { > > +char *endptr; > > +unsigned long start, end; > > +const char *map = maps[i]; > > +bool found = false; > > + > > +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath); > > +if (strstr(map,buf) != NULL) > > +found = true; > > + > > +sprintf(buf, "%s/kvm.", cfg->hugepagePath); > > +if (strstr(map,buf) != NULL) > > +found = true; > > + > > +if (!found) > > +continue; > > + > > +errno = 0; > > +start = strtol(map, &endptr, 16); > > +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN)) > > +|| (errno != 0 && start == 0)) { > > +continue; > > +} > > + > > +if (endptr && *endptr == '-') > > +endptr++; > > + > > +if (!*endptr) > > +continue; > > + > > +errno = 0; > > +end = strtol(endptr, NULL, 16); > > +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN)) > > +|| (errno != 0 && end == 0)) { > > +continue; > > +} > > + > > +if (end-start >= vm->def->mem.max_balloon * 1024) { > > +ret = false; > > +break; > > +} > > +} > > This code is really very nasty. It is both Linux specific and exposes > libvirt to the private implementation detail of QEMU's backing file > naming. On top of which I don't see why we need this when we're already > setting -mem-prealloc to enforce exactly this. BTW, you already do /proc parsing and already do hugetlbfs which is Linux specific. So if you can detail in what new way this code is nasty (accordingly to your judgement above), i can look into improving it. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
On Tue, Feb 04, 2014 at 11:32:47AM -0500, Marcelo Tosatti wrote: > > Add an element named "strict-hugepages" to control whether to > refuse guest initialization in case hugepage allocation cannot > be performed. > > Signed-off-by: Marcelo Tosatti > > diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in > index ff50214..e79f5e6 100644 > --- a/docs/formatdomain.html.in > +++ b/docs/formatdomain.html.in > @@ -632,6 +632,9 @@ >hugepages >This tells the hypervisor that the guest should have its memory > allocated using hugepages instead of the normal native page > size. > + strict-hugepages > + This tells the hypervisor that the guest should refuse to start > + in case of failure to allocate guest memory with hugepages Huh, we already supply the -mem-prealloc command line flag alongside the -mem-path flag which should cause QEMU to exit if it cannot allocate all memory upfront. > +/* > + * Returns bool: whether to fail guest initialization. > + * > + */ > +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, > virQEMUDriverConfigPtr cfg) > +{ > +bool ret = false; > +char **maps = NULL; > +int i; > +char *buf; > + > +if (!vm->def->mem.strict_hugepages) > +return ret; > + > +ret = true; > + > +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) { > +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", > +_("strict huge pages depends on huge pages")); > +return ret; > +} > + > +buf = malloc(strlen(cfg->hugepagePath) + 50); > + > +/* The parser requires /proc/pid, which only exists on platforms > + * like Linux where pid_t fits in int. */ > +if ((int) vm->pid != vm->pid || > +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0) > +goto cleanup; > + > +for (i = 0; maps && maps[i]; i++) { > +char *endptr; > +unsigned long start, end; > +const char *map = maps[i]; > +bool found = false; > + > +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath); > +if (strstr(map,buf) != NULL) > +found = true; > + > +sprintf(buf, "%s/kvm.", cfg->hugepagePath); > +if (strstr(map,buf) != NULL) > +found = true; > + > +if (!found) > +continue; > + > +errno = 0; > +start = strtol(map, &endptr, 16); > +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN)) > +|| (errno != 0 && start == 0)) { > +continue; > +} > + > +if (endptr && *endptr == '-') > +endptr++; > + > +if (!*endptr) > +continue; > + > +errno = 0; > +end = strtol(endptr, NULL, 16); > +if ((errno == ERANGE && (end == LONG_MAX || end == LONG_MIN)) > +|| (errno != 0 && end == 0)) { > +continue; > +} > + > +if (end-start >= vm->def->mem.max_balloon * 1024) { > +ret = false; > +break; > +} > +} This code is really very nasty. It is both Linux specific and exposes libvirt to the private implementation detail of QEMU's backing file naming. On top of which I don't see why we need this when we're already setting -mem-prealloc to enforce exactly this. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH] add flag to enforce hugepage backing of guest RAM
Add an element named "strict-hugepages" to control whether to refuse guest initialization in case hugepage allocation cannot be performed. Signed-off-by: Marcelo Tosatti diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index ff50214..e79f5e6 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -632,6 +632,9 @@ hugepages This tells the hypervisor that the guest should have its memory allocated using hugepages instead of the normal native page size. + strict-hugepages + This tells the hypervisor that the guest should refuse to start + in case of failure to allocate guest memory with hugepages nosharepages Instructs hypervisor to disable shared pages (memory merge, KSM) for this domain. Since 1.0.6 diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 28e24f9..f16ef0b 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -11226,6 +11226,9 @@ virDomainDefParseXML(xmlDocPtr xml, if (virXPathBoolean("boolean(./memoryBacking/locked)", ctxt)) def->mem.locked = true; +if ((node = virXPathNode("./memoryBacking/stricthugepages", ctxt))) +def->mem.strict_hugepages = true; + /* Extract blkio cgroup tunables */ if (virXPathUInt("string(./blkiotune/weight)", ctxt, &def->blkio.weight) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index d8f2e49..8ea5cf0 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1977,6 +1977,7 @@ struct _virDomainDef { unsigned long long max_balloon; /* in kibibytes */ unsigned long long cur_balloon; /* in kibibytes */ bool hugepage_backed; +bool strict_hugepages; bool nosharepages; bool locked; int dump_core; /* enum virDomainMemDump */ diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 96b8825..3f8d0a4 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -12133,10 +12133,9 @@ cleanup: return def; } - -static int qemuParseProcFileStrings(int pid_value, -const char *name, -char ***list) +int qemuParseProcFileStrings(int pid_value, + const char *name, + char ***list) { char *path = NULL; int ret = -1; diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index de7683d..bcdfefa 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -226,7 +226,9 @@ virDomainDefPtr qemuParseCommandLinePid(virCapsPtr qemuCaps, char **pidfile, virDomainChrSourceDefPtr *monConfig, bool *monJSON); - +int qemuParseProcFileStrings(int pid_value, + const char *name, + char ***list); int qemuDomainAssignAddresses(virDomainDefPtr def, virQEMUCapsPtr qemuCaps, virDomainObjPtr obj) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 8bcd98e..cb8298e 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -25,6 +25,7 @@ #include #include #include +#include #if defined(__linux__) # include #elif defined(__FreeBSD__) @@ -3507,6 +3508,95 @@ error: } +/* + * Returns bool: whether to fail guest initialization. + * + */ +static bool qemuValidateStrictHugepage(virDomainObjPtr vm, virQEMUDriverConfigPtr cfg) +{ +bool ret = false; +char **maps = NULL; +int i; +char *buf; + +if (!vm->def->mem.strict_hugepages) +return ret; + +ret = true; + +if (!vm->def->mem.hugepage_backed || !cfg->hugepagePath) { +virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", +_("strict huge pages depends on huge pages")); +return ret; +} + +buf = malloc(strlen(cfg->hugepagePath) + 50); + +/* The parser requires /proc/pid, which only exists on platforms + * like Linux where pid_t fits in int. */ +if ((int) vm->pid != vm->pid || +qemuParseProcFileStrings(vm->pid, "maps", &maps) < 0) +goto cleanup; + +for (i = 0; maps && maps[i]; i++) { +char *endptr; +unsigned long start, end; +const char *map = maps[i]; +bool found = false; + +sprintf(buf, "%s/qemu_back_mem.pc.ram.", cfg->hugepagePath); +if (strstr(map,buf) != NULL) +found = true; + +sprintf(buf, "%s/kvm.", cfg->hugepagePath); +if (strstr(map,buf) != NULL) +found = true; + +if (!found) +continue; + +errno = 0; +start = strtol(map, &endptr, 16); +if ((errno == ERANGE && (start == LONG_MAX || start == LONG_MIN)) +|| (errno != 0 && start == 0)) { +continue; +} + +