Re: [libvirt] [PATCH RFC]: Support numad
I think numad will probably work best with just the #vcpus and the #MBs of memory in the guest as the requested job-size parameters. Sorry for the lack of clarity here... Numad should work -- pending bugs -- with any numbers passed. If the requested parameters are bigger than the actual physical resources available, numad is supposed to just return all the nodes in the system -- so the effective recommendation in that case would be "use the entire system". If the requested resources are a subset of the system, numad is supposed to return a recommended subset of the system nodes to use for the process, based on the current amount of free memory and idle CPUs on the various nodes.

On 03/01/2012 02:31 PM, Dave Allan wrote:
On Wed, Feb 29, 2012 at 06:29:55AM -0500, Bill Burns wrote:
On 02/28/2012 11:34 PM, Osier Yang wrote:
On 02/29/2012 12:40 AM, Daniel P. Berrange wrote:
On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:

numad is a user-level daemon that monitors NUMA topology and processes resource consumption to facilitate good NUMA resource alignment of applications/virtual machines, to improve performance and minimize the cost of remote memory latencies. It provides a pre-placement advisory interface, so significant processes can be pre-bound to nodes with sufficient available resources.

More details: http://fedoraproject.org/wiki/Features/numad

"numad -w ncpus:memory_amount" is the advisory interface numad currently provides. This patch adds support by introducing new XML like:

Isn't the usual case going to be the vcpus and memory in the guest? IMO we should default to passing those numbers to numad if required_cpus and required_memory are not provided explicitly.

Indeed, why would you want to specify anything different? At first glance my reaction was to just skip the XML and call numad internally, automatically, with the guest's configured allocation.

Here "required_cpus" stands for the number of physical CPUs, which will be used by numad to choose the proper nodeset. So, from a semantics point of view, it is different from 4; I can imagine two problems if we reuse the vCPU number for numad's use:

1) Suppose there are 16 pCPUs, but the specified vCPU number is "64". I'm not sure whether numad will work properly in this case, but isn't it a bad use case? :-)

2) Suppose there are 128 pCPUs, but the specified vCPU number is "2". numad will definitely work, but is that the result the user wants to see? It is no good for performance.

The basic thought is that we provide the interface, and how to configure the provided XML for good performance is then up to the end user. If we mix the two different semantics and do things secretly in the code, then I could imagine there will be performance problems.

"required_memory" could be omitted, though; we can reuse "524288", but I'm not sure it's good to always pass a "memory amount" on the numad command line; it may not be good in some cases. @Bill(s), correct me if I'm not right. :-)

Perhaps we could have a bool attribute then, such as:

Please keep Bill Gray on this thread. He is the author of numad and is the best person to answer the above questions.

Bill (Gray), Can you weigh in here?

Dave

Bill

Regards,
Osier

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
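The advisory behavior Bill describes above can be sketched in a few lines of C. This is an illustration of the described policy only -- the types and function are invented here, and real numad weighs free memory and idle CPUs per node rather than accumulating in index order:

```c
#include <stddef.h>

/* Hypothetical model of the advisory policy described above: if the
 * requested CPUs/memory exceed the machine totals, recommend every node
 * ("use the entire system"); otherwise accumulate nodes until the
 * request is covered.  Not numad's actual implementation. */
typedef struct {
    int free_cpus;     /* idle CPUs on this node */
    long free_mem_mb;  /* free memory on this node, in MB */
} node_info;

/* Fills 'picked' with node indices; returns how many nodes were picked. */
static size_t
advise_nodes(const node_info *nodes, size_t nnodes,
             int req_cpus, long req_mem_mb, size_t *picked)
{
    int total_cpus = 0;
    long total_mem = 0;
    size_t i, n = 0;

    for (i = 0; i < nnodes; i++) {
        total_cpus += nodes[i].free_cpus;
        total_mem += nodes[i].free_mem_mb;
    }

    /* Request bigger than the whole machine: recommend all nodes. */
    if (req_cpus > total_cpus || req_mem_mb > total_mem) {
        for (i = 0; i < nnodes; i++)
            picked[n++] = i;
        return n;
    }

    /* Otherwise take nodes until the request is satisfied. */
    for (i = 0; i < nnodes && (req_cpus > 0 || req_mem_mb > 0); i++) {
        picked[n++] = i;
        req_cpus -= nodes[i].free_cpus;
        req_mem_mb -= nodes[i].free_mem_mb;
    }
    return n;
}
```

This matches the two cases in the mail: an oversubscribed request degenerates to the full system, a small request yields a subset.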
Re: [libvirt] [PATCH RFC]: Support numad
On 03/01/2012 02:31 PM, Dave Allan wrote:
On Wed, Feb 29, 2012 at 06:29:55AM -0500, Bill Burns wrote:
On 02/28/2012 11:34 PM, Osier Yang wrote:
On 02/29/2012 12:40 AM, Daniel P. Berrange wrote:
On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:

numad is a user-level daemon that monitors NUMA topology and processes resource consumption to facilitate good NUMA resource alignment of applications/virtual machines, to improve performance and minimize the cost of remote memory latencies. It provides a pre-placement advisory interface, so significant processes can be pre-bound to nodes with sufficient available resources.

More details: http://fedoraproject.org/wiki/Features/numad

"numad -w ncpus:memory_amount" is the advisory interface numad currently provides. This patch adds support by introducing new XML like:

Isn't the usual case going to be the vcpus and memory in the guest? IMO we should default to passing those numbers to numad if required_cpus and required_memory are not provided explicitly.

Indeed, why would you want to specify anything different? At first glance my reaction was to just skip the XML and call numad internally, automatically, with the guest's configured allocation.

Here "required_cpus" stands for the number of physical CPUs, which will be used by numad to choose the proper nodeset. So, from a semantics point of view, it is different from 4; I can imagine two problems if we reuse the vCPU number for numad's use:

1) Suppose there are 16 pCPUs, but the specified vCPU number is "64". I'm not sure whether numad will work properly in this case, but isn't it a bad use case? :-)

2) Suppose there are 128 pCPUs, but the specified vCPU number is "2". numad will definitely work, but is that the result the user wants to see? It is no good for performance.

The basic thought is that we provide the interface, and how to configure the provided XML for good performance is then up to the end user. If we mix the two different semantics and do things secretly in the code, then I could imagine there will be performance problems.

"required_memory" could be omitted, though; we can reuse "524288", but I'm not sure it's good to always pass a "memory amount" on the numad command line; it may not be good in some cases. @Bill(s), correct me if I'm not right. :-)

Perhaps we could have a bool attribute then, such as:

Please keep Bill Gray on this thread. He is the author of numad and is the best person to answer the above questions.

Bill (Gray), Can you weigh in here?

Am sure he will, but he is on PTO, back sometime over the weekend, Monday at the latest ;-)

the other Bill

Dave

Bill

Regards,
Osier
Re: [libvirt] [PATCH RFC]: Support numad
On Wed, Feb 29, 2012 at 06:29:55AM -0500, Bill Burns wrote:
> On 02/28/2012 11:34 PM, Osier Yang wrote:
> > On 02/29/2012 12:40 AM, Daniel P. Berrange wrote:
> > > On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
> > > > On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:
> > > > > numad is a user-level daemon that monitors NUMA topology and
> > > > > processes resource consumption to facilitate good NUMA resource
> > > > > alignment of applications/virtual machines, to improve performance
> > > > > and minimize the cost of remote memory latencies. It provides a
> > > > > pre-placement advisory interface, so significant processes can
> > > > > be pre-bound to nodes with sufficient available resources.
> > > > >
> > > > > More details: http://fedoraproject.org/wiki/Features/numad
> > > > >
> > > > > "numad -w ncpus:memory_amount" is the advisory interface numad
> > > > > currently provides.
> > > > >
> > > > > This patch adds support by introducing new XML like:
> > > >
> > > > Isn't the usual case going to be the vcpus and memory in the guest?
> > > > IMO we should default to passing those numbers to numad if
> > > > required_cpus and required_memory are not provided explicitly.
> > >
> > > Indeed, why would you want to specify anything different? At first
> > > glance my reaction was to just skip the XML and call numad internally,
> > > automatically, with the guest's configured allocation.
> >
> > Here "required_cpus" stands for the number of physical CPUs, which
> > will be used by numad to choose the proper nodeset. So, from a
> > semantics point of view, it is different from 4; I can imagine two
> > problems if we reuse the vCPU number for numad's use:
> >
> > 1) Suppose there are 16 pCPUs, but the specified vCPU number is "64".
> > I'm not sure whether numad will work properly in this case, but isn't
> > it a bad use case? :-)
> >
> > 2) Suppose there are 128 pCPUs, but the specified vCPU number is "2".
> > numad will definitely work, but is that the result the user wants to
> > see? It is no good for performance.
> >
> > The basic thought is that we provide the interface, and how to
> > configure the provided XML for good performance is then up to the end
> > user. If we mix the two different semantics and do things secretly in
> > the code, then I could imagine there will be performance problems.
> >
> > "required_memory" could be omitted, though; we can reuse "524288",
> > but I'm not sure it's good to always pass a "memory amount" on the
> > numad command line; it may not be good in some cases. @Bill(s),
> > correct me if I'm not right. :-)
> >
> > Perhaps we could have a bool attribute then, such as:
>
> Please keep Bill Gray on this thread. He is the author of numad and
> is the best person to answer the above questions.

Bill (Gray), Can you weigh in here?

Dave

> Bill
>
> > Regards,
> > Osier
Re: [libvirt] [PATCH RFC]: Support numad
On 02/28/2012 11:34 PM, Osier Yang wrote:
On 02/29/2012 12:40 AM, Daniel P. Berrange wrote:
On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:

numad is a user-level daemon that monitors NUMA topology and processes resource consumption to facilitate good NUMA resource alignment of applications/virtual machines, to improve performance and minimize the cost of remote memory latencies. It provides a pre-placement advisory interface, so significant processes can be pre-bound to nodes with sufficient available resources.

More details: http://fedoraproject.org/wiki/Features/numad

"numad -w ncpus:memory_amount" is the advisory interface numad currently provides. This patch adds support by introducing new XML like:

Isn't the usual case going to be the vcpus and memory in the guest? IMO we should default to passing those numbers to numad if required_cpus and required_memory are not provided explicitly.

Indeed, why would you want to specify anything different? At first glance my reaction was to just skip the XML and call numad internally, automatically, with the guest's configured allocation.

Here "required_cpus" stands for the number of physical CPUs, which will be used by numad to choose the proper nodeset. So, from a semantics point of view, it is different from 4; I can imagine two problems if we reuse the vCPU number for numad's use:

1) Suppose there are 16 pCPUs, but the specified vCPU number is "64". I'm not sure whether numad will work properly in this case, but isn't it a bad use case? :-)

2) Suppose there are 128 pCPUs, but the specified vCPU number is "2". numad will definitely work, but is that the result the user wants to see? It is no good for performance.

The basic thought is that we provide the interface, and how to configure the provided XML for good performance is then up to the end user. If we mix the two different semantics and do things secretly in the code, then I could imagine there will be performance problems.

"required_memory" could be omitted, though; we can reuse "524288", but I'm not sure it's good to always pass a "memory amount" on the numad command line; it may not be good in some cases. @Bill(s), correct me if I'm not right. :-)

Perhaps we could have a bool attribute then, such as:

Please keep Bill Gray on this thread. He is the author of numad and is the best person to answer the above questions.

Bill

Regards,
Osier
Re: [libvirt] [PATCH RFC]: Support numad
On 02/29/2012 12:40 AM, Daniel P. Berrange wrote:
On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:

numad is a user-level daemon that monitors NUMA topology and processes resource consumption to facilitate good NUMA resource alignment of applications/virtual machines, to improve performance and minimize the cost of remote memory latencies. It provides a pre-placement advisory interface, so significant processes can be pre-bound to nodes with sufficient available resources.

More details: http://fedoraproject.org/wiki/Features/numad

"numad -w ncpus:memory_amount" is the advisory interface numad currently provides. This patch adds support by introducing new XML like:

Isn't the usual case going to be the vcpus and memory in the guest? IMO we should default to passing those numbers to numad if required_cpus and required_memory are not provided explicitly.

Indeed, why would you want to specify anything different? At first glance my reaction was to just skip the XML and call numad internally, automatically, with the guest's configured allocation.

Here "required_cpus" stands for the number of physical CPUs, which will be used by numad to choose the proper nodeset. So, from a semantics point of view, it is different from 4; I can imagine two problems if we reuse the vCPU number for numad's use:

1) Suppose there are 16 pCPUs, but the specified vCPU number is "64". I'm not sure whether numad will work properly in this case, but isn't it a bad use case? :-)

2) Suppose there are 128 pCPUs, but the specified vCPU number is "2". numad will definitely work, but is that the result the user wants to see? It is no good for performance.

The basic thought is that we provide the interface, and how to configure the provided XML for good performance is then up to the end user. If we mix the two different semantics and do things secretly in the code, then I could imagine there will be performance problems.

"required_memory" could be omitted, though; we can reuse "524288", but I'm not sure it's good to always pass a "memory amount" on the numad command line; it may not be good in some cases. @Bill(s), correct me if I'm not right. :-)

Perhaps we could have a bool attribute then, such as:

Regards,
Osier
Re: [libvirt] [PATCH RFC]: Support numad
On Tue, Feb 28, 2012 at 04:40:06PM +0000, Daniel P. Berrange wrote:
> On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
> > On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:
> > > numad is a user-level daemon that monitors NUMA topology and
> > > processes resource consumption to facilitate good NUMA resource
> > > alignment of applications/virtual machines, to improve performance
> > > and minimize the cost of remote memory latencies. It provides a
> > > pre-placement advisory interface, so significant processes can
> > > be pre-bound to nodes with sufficient available resources.
> > >
> > > More details: http://fedoraproject.org/wiki/Features/numad
> > >
> > > "numad -w ncpus:memory_amount" is the advisory interface numad
> > > currently provides.
> > >
> > > This patch adds support by introducing new XML like:
> >
> > Isn't the usual case going to be the vcpus and memory in the guest?
> > IMO we should default to passing those numbers to numad if
> > required_cpus and required_memory are not provided explicitly.
>
> Indeed, why would you want to specify anything different? At first
> glance my reaction was to just skip the XML and call numad internally,
> automatically, with the guest's configured allocation.

That seems reasonable to me.

Dave

> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [PATCH RFC]: Support numad
On Tue, Feb 28, 2012 at 11:33:03AM -0500, Dave Allan wrote:
> On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:
> > numad is a user-level daemon that monitors NUMA topology and
> > processes resource consumption to facilitate good NUMA resource
> > alignment of applications/virtual machines, to improve performance
> > and minimize the cost of remote memory latencies. It provides a
> > pre-placement advisory interface, so significant processes can
> > be pre-bound to nodes with sufficient available resources.
> >
> > More details: http://fedoraproject.org/wiki/Features/numad
> >
> > "numad -w ncpus:memory_amount" is the advisory interface numad
> > currently provides.
> >
> > This patch adds support by introducing new XML like:
>
> Isn't the usual case going to be the vcpus and memory in the guest?
> IMO we should default to passing those numbers to numad if
> required_cpus and required_memory are not provided explicitly.

Indeed, why would you want to specify anything different? At first glance my reaction was to just skip the XML and call numad internally, automatically, with the guest's configured allocation.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [PATCH RFC]: Support numad
On Tue, Feb 28, 2012 at 10:10:50PM +0800, Osier Yang wrote:
> numad is a user-level daemon that monitors NUMA topology and
> processes resource consumption to facilitate good NUMA resource
> alignment of applications/virtual machines, to improve performance
> and minimize the cost of remote memory latencies. It provides a
> pre-placement advisory interface, so significant processes can
> be pre-bound to nodes with sufficient available resources.
>
> More details: http://fedoraproject.org/wiki/Features/numad
>
> "numad -w ncpus:memory_amount" is the advisory interface numad
> currently provides.
>
> This patch adds support by introducing new XML like:

Isn't the usual case going to be the vcpus and memory in the guest? IMO we should default to passing those numbers to numad if required_cpus and required_memory are not provided explicitly.

Dave

> And the corresponding numad command line will be:
> numad -w 4:500
>
> The advisory nodeset returned from numad will then be used to set
> domain process CPU affinity (e.g. qemuProcessInitCpuAffinity).
>
> If the user specifies both a CPU affinity policy (e.g. (4) and XML
> indicating to use numad for the advisory nodeset, the specified CPU
> affinity will be overridden by the nodeset returned from numad.
>
> If there is no XML specifying the CPU affinity policy, and XML
> indicating to use numad is specified, the returned nodeset will be
> printed in 4.
>
> Only QEMU/KVM and LXC drivers support it now.
> ---
>  configure.ac                  |    8 +++
>  docs/formatdomain.html.in     |   18 ++-
>  docs/schemas/domaincommon.rng |   12 
>  src/conf/domain_conf.c        |  125 +++--
>  src/conf/domain_conf.h        |    5 ++
>  src/lxc/lxc_controller.c      |   98 
>  src/qemu/qemu_process.c       |   99 +
>  7 files changed, 311 insertions(+), 54 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index c9cdd7b..31f0835 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1445,6 +1445,14 @@ AM_CONDITIONAL([HAVE_NUMACTL], [test "$with_numactl" != "no"])
>  AC_SUBST([NUMACTL_CFLAGS])
>  AC_SUBST([NUMACTL_LIBS])
>
> +dnl Do we have numad?
> +if test "$with_qemu" = "yes"; then
> +    AC_PATH_PROG([NUMAD], [numad], [], [/bin:/usr/bin:/usr/local/bin:$PATH])
> +
> +    if test -n "$NUMAD"; then
> +        AC_DEFINE_UNQUOTED([NUMAD],["$NUMAD"], [Location or name of the numad program])
> +    fi
> +fi
>
>  dnl pcap lib
>  LIBPCAP_CONFIG="pcap-config"
> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
> index 6fcca94..d8e70a6 100644
> --- a/docs/formatdomain.html.in
> +++ b/docs/formatdomain.html.in
> @@ -505,6 +505,7 @@
>  ...
>  ...
> @@ -519,7 +520,7 @@
>  Since 0.9.3
>  memory
>
> -The optional memory element specify how to allocate memory
> +The optional memory element specifies how to allocate memory
>  for the domain process on a NUMA host. It contains two attributes,
>  attribute mode is either 'interleave', 'strict',
>  or 'preferred',
> @@ -527,6 +528,21 @@
>  syntax with attribute cpuset of element
>  vcpu.
>  Since 0.9.3
>
> +
> +The optional cpu element indicates pinning the virtual CPUs
> +to the nodeset returned by querying "numad" (a system daemon that monitors
> +NUMA topology and usage). It has two attributes, attribute
> +required_cpus specifies the number of physical CPUs the guest
> +process want to use. And the optional attribute required_memory
> +specifies the amount of free memory the guest process want to see on a node,
> +"numad" will pick the physical CPUs on the node which has enough free
> +memory of amount specified by required_memory.
> +
> +NB, with using this element, the physical CPUs specified by attribute
> +cpuset (of element vcpu) will be overridden by the
> +nodeset returned from "numad".
> +Since 0.9.11 (QEMU/KVM and LXC only)
> +
>
> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
> index 3908733..d0f443d 100644
> --- a/docs/schemas/domaincommon.rng
> +++ b/docs/schemas/domaincommon.rng
> @@ -549,6 +549,18 @@
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index f9654f1..aa03c05 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -7125,7
> +
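The advisory call discussed above boils down to formatting the two job-size numbers into a single `ncpus:memory_amount` argument (e.g. "numad -w 4:500"). A minimal sketch in C -- the helper name is invented for illustration; only the command syntax comes from the patch:

```c
#include <stdio.h>

/* Builds the advisory command line described in the patch, e.g.
 * "numad -w 4:500" for 4 CPUs and 500 MB of memory.  Returns the
 * number of characters written (as snprintf does), or a negative
 * value on error.  Illustrative only; libvirt assembles the command
 * with its own command helpers. */
static int
numad_advise_cmdline(char *buf, size_t buflen, int ncpus, long memory_mb)
{
    return snprintf(buf, buflen, "numad -w %d:%ld", ncpus, memory_mb);
}
```

numad then prints its recommended nodeset on stdout, which the patch feeds into CPU affinity setup.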
[libvirt] [PATCH RFC]: Support numad
numad is a user-level daemon that monitors NUMA topology and processes resource consumption to facilitate good NUMA resource alignment of applications/virtual machines, to improve performance and minimize the cost of remote memory latencies. It provides a pre-placement advisory interface, so significant processes can be pre-bound to nodes with sufficient available resources.

More details: http://fedoraproject.org/wiki/Features/numad

"numad -w ncpus:memory_amount" is the advisory interface numad currently provides.

This patch adds support by introducing new XML like:

And the corresponding numad command line will be:

numad -w 4:500

The advisory nodeset returned from numad will then be used to set domain process CPU affinity (e.g. qemuProcessInitCpuAffinity).

If the user specifies both a CPU affinity policy (e.g. (4) and XML indicating to use numad for the advisory nodeset, the specified CPU affinity will be overridden by the nodeset returned from numad.

If there is no XML specifying the CPU affinity policy, and XML indicating to use numad is specified, the returned nodeset will be printed in 4.

Only QEMU/KVM and LXC drivers support it now.

---
 configure.ac                  |    8 +++
 docs/formatdomain.html.in     |   18 ++-
 docs/schemas/domaincommon.rng |   12 
 src/conf/domain_conf.c        |  125 +++--
 src/conf/domain_conf.h        |    5 ++
 src/lxc/lxc_controller.c      |   98 
 src/qemu/qemu_process.c       |   99 +
 7 files changed, 311 insertions(+), 54 deletions(-)

diff --git a/configure.ac b/configure.ac
index c9cdd7b..31f0835 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1445,6 +1445,14 @@ AM_CONDITIONAL([HAVE_NUMACTL], [test "$with_numactl" != "no"])
 AC_SUBST([NUMACTL_CFLAGS])
 AC_SUBST([NUMACTL_LIBS])

+dnl Do we have numad?
+if test "$with_qemu" = "yes"; then
+    AC_PATH_PROG([NUMAD], [numad], [], [/bin:/usr/bin:/usr/local/bin:$PATH])
+
+    if test -n "$NUMAD"; then
+        AC_DEFINE_UNQUOTED([NUMAD],["$NUMAD"], [Location or name of the numad program])
+    fi
+fi

 dnl pcap lib
 LIBPCAP_CONFIG="pcap-config"

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 6fcca94..d8e70a6 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -505,6 +505,7 @@
 ...
 ...
@@ -519,7 +520,7 @@
 Since 0.9.3
 memory

-The optional memory element specify how to allocate memory
+The optional memory element specifies how to allocate memory
 for the domain process on a NUMA host. It contains two attributes,
 attribute mode is either 'interleave', 'strict',
 or 'preferred',
@@ -527,6 +528,21 @@
 syntax with attribute cpuset of element
 vcpu.
 Since 0.9.3

+
+The optional cpu element indicates pinning the virtual CPUs
+to the nodeset returned by querying "numad" (a system daemon that monitors
+NUMA topology and usage). It has two attributes, attribute
+required_cpus specifies the number of physical CPUs the guest
+process want to use. And the optional attribute required_memory
+specifies the amount of free memory the guest process want to see on a node,
+"numad" will pick the physical CPUs on the node which has enough free
+memory of amount specified by required_memory.
+
+NB, with using this element, the physical CPUs specified by attribute
+cpuset (of element vcpu) will be overridden by the
+nodeset returned from "numad".
+Since 0.9.11 (QEMU/KVM and LXC only)
+

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 3908733..d0f443d 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -549,6 +549,18 @@
+
+
+
+
+
+
+
+
+
+
+
+

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index f9654f1..aa03c05 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -7125,7 +7125,6 @@ error:
     goto cleanup;
 }
-
 static int
 virDomainDefMaybeAddController(virDomainDefPtr def,
                                int type, int idx)
@@ -7185,6 +7184,7 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
     bool uuid_generated = false;
     virBitmapPtr bootMap = NULL;
     unsigned long bootMapSize = 0;
+    xmlNodePtr cur;

     if (VIR_ALLOC(def) < 0) {
         virReportOOMError();
@@ -7454,47 +7454,100 @@ static virDomainDefPtr virDomainDefPars
+
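The patch consumes numad's recommendation as a nodeset string before it can drive CPU affinity. Assuming the usual kernel-style list syntax such as "0-1,3" (an assumption -- the thread never shows numad's exact output format), a standalone parser might look like this; libvirt itself uses its own bitmap helpers for this job:

```c
#include <stdlib.h>

/* Parses a kernel-style node list such as "0-1,3" into an array of
 * node ids.  Returns the number of nodes, or -1 on malformed input or
 * overflow of the output array.  Illustrative sketch only. */
static int
parse_nodeset(const char *str, int *nodes, int max)
{
    int n = 0;
    const char *p = str;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p || lo < 0)     /* no digits, or negative id */
            return -1;
        p = end;
        if (*p == '-') {            /* a range like "0-1" */
            hi = strtol(p + 1, &end, 10);
            if (end == p + 1 || hi < lo)
                return -1;
            p = end;
        }
        for (long v = lo; v <= hi; v++) {
            if (n >= max)
                return -1;
            nodes[n++] = (int)v;
        }
        if (*p == ',')
            p++;                    /* more entries follow */
        else if (*p != '\0')
            return -1;              /* unexpected character */
    }
    return n;
}
```

A recommendation of "0-1" from "numad -w 4:500" would parse to nodes 0 and 1, which is what the patch then turns into the domain process's CPU affinity mask.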